
Version 4 (modified by alain, 9 years ago)


GIET-VM / Barrier access functions

The kernel_barriers.c and kernel_barriers.h files define the functions used by the kernel to access synchronisation barriers between several concurrent tasks.

The GIET_VM kernel defines two types of barriers:

  1. The simple_barrier_t implements a non-distributed toggle barrier. The number of expected tasks is set by software when the barrier is initialised, and the barrier can safely be reused several times.
  2. The sqt_barrier_t is physically distributed across all clusters, and is intended to avoid contention on a single cluster when a barrier is shared by a large number of tasks. It is implemented as a Synchronisation Quad Tree (SQT). For now, the number of expected tasks is defined by the number of processors specified in the mapping (NB_TOTAL_PROCS).

All access functions are prefixed by "_" to indicate that they can only be executed by a processor in kernel mode.

The simple_barrier_t and sqt_barrier_t structures are implemented so that a single barrier node fits in one 64-byte cache line, and they should be aligned on a cache line boundary.
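To illustrate the cache-line constraint, the following C11 sketch shows how a toggle barrier and an SQT node could be laid out so that each occupies exactly one 64-byte line. The type names and fields (sense, count, ntasks, arity, level, parent, child) are hypothetical and are not taken from kernel_barriers.h.

```c
#include <assert.h>   /* static_assert (C11) */

#define CACHE_LINE_SIZE 64

/* Hypothetical layout of a non-distributed toggle barrier. */
typedef struct example_simple_barrier_s {
    /* _Alignas on the first member aligns the whole structure */
    _Alignas(CACHE_LINE_SIZE) volatile unsigned int sense;
    volatile unsigned int count;    /* tasks arrived in the current round */
    unsigned int          ntasks;   /* number of expected tasks           */
} example_simple_barrier_t;

/* Hypothetical layout of one SQT barrier node. */
typedef struct example_sqt_node_s {
    _Alignas(CACHE_LINE_SIZE) volatile unsigned int sense;
    volatile unsigned int       count;     /* arrivals in the current round */
    unsigned int                arity;     /* expected arrivals at the node */
    unsigned int                level;     /* 0 (one cluster) .. 4          */
    struct example_sqt_node_s * parent;    /* NULL for the root node        */
    struct example_sqt_node_s * child[4];  /* NULL when a child is absent   */
} example_sqt_node_t;

/* The compiler pads each structure up to a multiple of its alignment,
   so both fit exactly in one line as long as their fields do. */
static_assert(sizeof(example_simple_barrier_t) == CACHE_LINE_SIZE,
              "simple barrier must fill one cache line");
static_assert(sizeof(example_sqt_node_t) == CACHE_LINE_SIZE,
              "SQT node must fill one cache line");
```

_Alignas only governs objects whose placement the compiler controls; nodes obtained from a heap allocator (such as the kernel heap used by _remote_malloc()) are only line-aligned if the allocator itself returns 64-byte aligned addresses.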

Simple barrier access functions

void _simple_barrier_init( simple_barrier_t * barrier, unsigned int ntasks )

This function initialises the barrier.

  • barrier : pointer to the barrier.
  • ntasks : number of expected tasks.

void _simple_barrier_wait( simple_barrier_t * barrier )

This function blocks until all expected tasks have reached the barrier. It uses a toggle condition to avoid race conditions when the same barrier is used several times, and the _atomic_increment() kernel function to count the tasks that have arrived.
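The toggle condition corresponds to the classic sense-reversal technique. Below is a minimal sketch, assuming C11 <stdatomic.h> is available: atomic_fetch_add() stands in for the kernel's _atomic_increment(), whose exact interface is not described on this page, and the names toggle_barrier_t, toggle_barrier_init() and toggle_barrier_wait() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical toggle (sense-reversing) barrier. */
typedef struct {
    atomic_uint  sense;   /* flipped by the last task of each round */
    atomic_uint  count;   /* tasks arrived in the current round     */
    unsigned int ntasks;  /* number of expected tasks               */
} toggle_barrier_t;

void toggle_barrier_init(toggle_barrier_t *barrier, unsigned int ntasks)
{
    assert(ntasks > 0);
    atomic_init(&barrier->sense, 0);
    atomic_init(&barrier->count, 0);
    barrier->ntasks = ntasks;
}

void toggle_barrier_wait(toggle_barrier_t *barrier)
{
    /* record the sense of the current round before announcing arrival */
    unsigned int sense = atomic_load(&barrier->sense);

    /* atomic_fetch_add() returns the previous value (stand-in for
       the kernel's _atomic_increment()) */
    unsigned int arrived = atomic_fetch_add(&barrier->count, 1) + 1;

    if (arrived == barrier->ntasks) {
        /* last task: reset the counter first, then release everyone */
        atomic_store(&barrier->count, 0);
        atomic_store(&barrier->sense, !sense);
    } else {
        /* busy-wait until the last task flips the sense */
        while (atomic_load(&barrier->sense) == sense) {
            /* spin */
        }
    }
}
```

The counter is reset before the sense is flipped, so a fast task that re-enters the barrier for the next round already sees a zero counter, while slower tasks of the previous round are still spinning on the old sense value and never touch the counter.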

Distributed barrier access functions

void _sqt_barrier_init( sqt_barrier_t* barrier )

This function allocates and initialises the distributed SQT barrier nodes in the clusters. The number of expected tasks is defined by the NB_TOTAL_PROCS parameter in the hard_config.h file. The SQT footprint is computed to cover all clusters containing processors in the 2D mesh (X_SIZE / Y_SIZE). The SQT can be "incomplete", as SQT barrier nodes are only built in clusters containing processors. The actual number of SQT barrier nodes in a cluster[x][y] depends on (x,y), with at least 1 and at most 5 nodes per cluster:

  • a barrier node arbitrating between all processors of 1 cluster has level 0,
  • a barrier node arbitrating between all processors of 4 clusters has level 1,
  • a barrier node arbitrating between all processors of 16 clusters has level 2,
  • a barrier node arbitrating between all processors of 64 clusters has level 3,
  • a barrier node arbitrating between all processors of 256 clusters has level 4.
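The node counts listed above can be reproduced with a few helpers, under two assumptions that this page does not state: a level-l node is placed in the cluster whose x and y coordinates are both multiples of 2^l, and every cluster of the X_SIZE x Y_SIZE mesh contains processors. With that placement, cluster[0][0] of a 16 x 16 mesh holds the five nodes of levels 0 to 4, matching the maximum given above. The functions sqt_levels(), sqt_nodes_in_cluster() and sqt_children() are hypothetical.

```c
#include <assert.h>

/* Number of SQT levels needed to cover an x_size * y_size mesh:
   the smallest L such that 2^(L-1) >= max(x_size, y_size). */
unsigned int sqt_levels(unsigned int x_size, unsigned int y_size)
{
    assert(x_size > 0 && y_size > 0);
    unsigned int side   = (x_size > y_size) ? x_size : y_size;
    unsigned int levels = 1;
    while ((1u << (levels - 1)) < side) {
        levels++;
    }
    return levels;
}

/* Number of SQT nodes hosted by cluster[x][y], assuming a level-l node
   sits in the cluster whose coordinates are multiples of 2^l. */
unsigned int sqt_nodes_in_cluster(unsigned int x, unsigned int y,
                                  unsigned int levels)
{
    unsigned int n = 0;
    for (unsigned int l = 0; l < levels; l++) {
        unsigned int step = 1u << l;
        if ((x % step == 0) && (y % step == 0)) {
            n++;
        }
    }
    return n;
}

/* Number of child nodes of the level-l node in cluster[x][y]. Children
   outside the mesh are not built, which makes the SQT "incomplete".
   Level-0 nodes have processors, not nodes, as children: return 0. */
unsigned int sqt_children(unsigned int x, unsigned int y, unsigned int level,
                          unsigned int x_size, unsigned int y_size)
{
    if (level == 0) {
        return 0;
    }
    unsigned int half = 1u << (level - 1);
    unsigned int n = 0;
    for (unsigned int dx = 0; dx < 2; dx++) {
        for (unsigned int dy = 0; dy < 2; dy++) {
            if (x + dx * half < x_size && y + dy * half < y_size) {
                n++;
            }
        }
    }
    return n;
}
```

For example, in a 3 x 3 mesh the level-1 node of cluster[2][2] has a single child, since its three other quadrants fall outside the mesh.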

This function uses the _remote_malloc() function to allocate the nodes in the distributed kernel heap[x][y] segments.

void _sqt_barrier_wait( sqt_barrier_t* barrier )

This function blocks until all expected tasks have reached the barrier. It uses a toggle condition to avoid race conditions when the same barrier is used several times, and the _atomic_increment() kernel function to count the tasks that have arrived.
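The distributed wait can be sketched as a combining tree built from the same toggle mechanism. This is a minimal sketch under assumptions not taken from kernel_barriers.c: each node has its own sense and counter, each task enters at the level-0 node of its own cluster, and the last task to arrive at a node forwards the arrival to the parent node before releasing that node. C11 atomics again stand in for _atomic_increment(), and all names (sqt_node_sketch_t, sqt_node_init(), sqt_barrier_wait_sketch()) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical SQT node: a toggle barrier linked to its parent. */
typedef struct sqt_node_sketch_s {
    atomic_uint               sense;   /* flipped when the node is released */
    atomic_uint               count;   /* arrivals in the current round     */
    unsigned int              arity;   /* expected arrivals at this node    */
    struct sqt_node_sketch_s *parent;  /* NULL for the root node            */
} sqt_node_sketch_t;

void sqt_node_init(sqt_node_sketch_t *node, unsigned int arity,
                   sqt_node_sketch_t *parent)
{
    assert(arity > 0);
    atomic_init(&node->sense, 0);
    atomic_init(&node->count, 0);
    node->arity  = arity;
    node->parent = parent;
}

static void sqt_node_arrive(sqt_node_sketch_t *node)
{
    unsigned int sense   = atomic_load(&node->sense);
    unsigned int arrived = atomic_fetch_add(&node->count, 1) + 1;

    if (arrived == node->arity) {
        /* last arrival: climb to the parent, which only returns once the
           whole tree is complete, then release the tasks of this node */
        if (node->parent != NULL) {
            sqt_node_arrive(node->parent);
        }
        atomic_store(&node->count, 0);
        atomic_store(&node->sense, !sense);
    } else {
        /* wait until this node is released */
        while (atomic_load(&node->sense) == sense) {
            /* spin */
        }
    }
}

/* Each task calls this with the level-0 node of its own cluster. */
void sqt_barrier_wait_sketch(sqt_node_sketch_t *leaf)
{
    sqt_node_arrive(leaf);
}
```

Only one task per node climbs to the next level, so each counter above level 0 is touched by at most four children, which is how the tree spreads the contention that a single shared counter would concentrate in one cluster.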