Timestamp:
Nov 14, 2019, 11:44:12 AM (4 years ago)
Author:
alain
Message:

Simplify the pthread_parallel_create() syscall.

File:
1 edited

  • trunk/libs/libalmosmkh/almosmkh.h

    r647 r650  
    101101 * @ level    : [in]  macro-cluster level in [1,2,3,4,5].
    102102 * @ cxy      : [out] selected core cluster identifier.
    103  * @ lid      : [out] selectod core local index.
     103 * @ lid      : [out] selected core local index.
    104104 * @ return 0 if success / 1 if no core in macro-cluster / -1 if illegal arguments.
    105105 **************************************************************************************/
     
    415415 * This function releases the memory buffer identified by the <ptr> argument,
    416416 * to the store identified by the <cxy> argument.
    417  * It displays an error message, but does nothing if the ptr is illegal.
     417 * It does nothing, but displays an error message, if the ptr is illegal.
    418418 *****************************************************************************************
    419419 * @ ptr   : pointer on the released buffer.
     
    456456
    457457//////////////////////////////////////////////////////////////////////////////////////////
    458 // This system call can be used to parallelize the creation and the termination
    459 // of a parallel multi-threaded application. It removes the loop in the main thread that
    460 // creates the N working threads (N  sequencial pthread_create() ). It also removes the
    461 // loop that waits completion of these N working threads (N sequencial pthread_join() ).
    462 // It creates one "work" thread (in detached mode) per core in the target architecture.
    463 // Each "work" thread is identified by the [cxy][lid] indexes (cluster / local core).
    464 // The pthread_parallel_create() function returns only when all "work" threads completed
     458// This syscall can be used to parallelize the creation and the termination
     459// of a parallel multi-threaded application.
     460// It removes from the main thread the sequential loop that creates the N working threads
     461// (N pthread_create() ), and also removes the sequential loop that waits for completion
     462// of these N working threads (N pthread_join() ).
     463// It creates one <work> thread (in detached mode) per core in the target architecture.
     464// Each <work> thread is identified by a continuous [tid] index.
     465// For a regular architecture, defined by the [x_size , y_size , ncores] parameters,
     466// the number of working threads can be simply computed as (x_size * y_size * ncores),
     467// and the coordinates [x,y,lid] of the core running the thread[tid] can be directly
     468// derived from the [tid] value with the following relations:
     469//     . cid = (x * y_size) + y
     470//     . tid = (cid * ncores ) + lid
     471//     . lid = tid % ncores
     472//     . cid = tid / ncores
     473//     . y   = cid % y_size
     474//     . x   = cid / y_size
     475// The pthread_parallel_create() function returns only when all <work> threads completed
    465476// (successfully or not).
    466477//
    467 // To use this system call, the application code must define the following structures:
    468 // - To define the arguments to pass to the <work> function the application must allocate
    469 //   and initialize a first 2D array, indexed by [cxy] and [lid] indexes, where each slot
    470 //   contains an application specific structure, and another 2D array, indexed by the same
    471 //   indexes, containing pointers on these structures. This array of pointers is one
    472 //   argument of the pthread_parallel_create() function.
    473 // - To detect the completion of the <work> threads, the application must allocate a 1D
    474 //   array, indexed by the cluster index [cxy], where each slot contains a pthread_barrier
    475 //   descriptor. This barrier is initialised by the pthread_parallel_create() function,
    476 //   in all cluster containing at least one work thread. This array of barriers is another
    477 //   argument of the pthread_parallel_create() function.
     478// WARNING : The function executed by the working thread is application specific,
     479// but the structure defining the arguments passed to this function is imposed.
     480// The "pthread_parallel_work_args_t" structure is defined below, and contains
     481// two fields: the tid value, and a pointer on a pthread_barrier_t.
     482// This barrier must be used by each working thread to signal completion before exit.
     483// The global variables implementing these structures for each working thread
     484// are allocated and initialised by the pthread_parallel_create() function.
    478485//
    479 // Implementation note:
    480 // To parallelize the "work" threads creation and termination, the pthread_parallel_create()
    481 // function creates a distributed quad-tree (DQT) of "build" threads covering all cores
    482 // required to execute the parallel application.
     486// Implementation note: the pthread_parallel_create() function creates a distributed
     487// quad-tree (DQT) of <build> threads covering all cores required to execute the parallel
     488// application. This quad tree is entirely defined by the root_level parameter.
    483489// Depending on the hardware topology, this DQT can be truncated, (i.e. some
    484490// parent nodes can have fewer than 4 children), if (x_size != y_size), or if one size
    485 // is not a power of 2. Each "build" thread is identified by two indexes [cxy][level].
    486 // Each "build" thread makes the following tasks:
     491// is not a power of 2. Each <build> thread is identified by two indexes [cid][level].
     492// Each <build> thread performs the following tasks:
    487493// 1) It calls the pthread_create() function to create up to 4 children threads, that
    488 //    are are "work" threads when (level == 0), or "build" threads, when (level > 0).
    489 // 2) It initializes the barrier (global variable), used to block/unblock
    490 //    the parent thread until children completion.
     494//    are <work> threads when (level == 0), or <build> threads, when (level > 0).
     495// 2) It allocates and initializes the barrier, used to block the parent thread until
     496//    children completion.
    491497// 3) It calls the pthread_barrier_wait( self ) to wait until all children threads
    492498//    completed (successfully or not).
     
    495501
    496502/*****************************************************************************************
    497  * This blocking function creates N working threads that execute the code defined
    498  * by the <work_func> and <work_args> arguments, and returns only when all working
    499  * threads completed.
    500  * The number N of created threads is entirely defined by the <root_level> argument.
    501  * This value defines an abstract quad-tree, with a square base : level in [0,1,2,3,4],
     503 *    structure defining the arguments for the <build> thread function
     504 ****************************************************************************************/
     505typedef struct pthread_parallel_build_args_s           
     506{
     507    unsigned char        cid;                    // this <build> thread cluster index
     508    unsigned char        level;                  // this <build> thread level in quad-tree
     509    unsigned char        parent_cid;             // parent <build> thread cluster index
     510    pthread_barrier_t  * parent_barrier;         // pointer on parent <build> thread barrier
     511    unsigned char        root_level;             // quad-tree root level
     512    void               * work_func;              // pointer on working thread function
     513    unsigned int         x_size;                 // platform global parameter
     514    unsigned int         y_size;                 // platform global parameter
     515    unsigned int         ncores;                 // platform global parameter
     516    unsigned int         error;                  // return value : 0 if success
     517}
     518pthread_parallel_build_args_t;
     519
     520/*****************************************************************************************
     521 *    structure defining the arguments for the <work> thread function
     522 ****************************************************************************************/
     523typedef struct pthread_parallel_work_args_s           
     524{
     525    unsigned int         tid;                    // thread identifier
     526    pthread_barrier_t  * barrier;                // to signal completion
     527}
     528pthread_parallel_work_args_t;           
     529
     530/*****************************************************************************************
     531 * This blocking function creates N working threads identified by the [tid] continuous
     532 * index, that execute the code defined by the <work_func> argument, and returns only
     533 * when all working threads completed.
     534 * The number N of created threads is entirely defined by the <root_level> argument,
     535 * that defines an abstract quad-tree, with a square base : root_level in [0,1,2,3,4],
    502536 * side in [1,2,4,8,16], nclusters in [1,4,16,64,256]. This base is called  macro_cluster.
    503  * A working thread is created on all cores contained in the specified macro-cluster.
     537 * A working thread is created on all cores contained in this abstract macro-cluster.
    504538 * The actual number of physical clusters containing cores can be smaller than the number
    505  * of clusters covered by the quad tree. The actual number of cores in a cluster can be
    506  * less than the max value.
    507  *
    508  * In the current implementation, all threads execute the same <work_func> function,
    509  * on different arguments, that are specified as a 2D array of pointers <work_args>.
    510  * This can be modified in a future version, where the <work_func> argument can become
    511  * a 2D array of pointers, to have one specific function for each thread.
     539 * of clusters covered by the abstract quad tree.
     540 * All threads execute the same <work_func> function, on different arguments, that are
     541 * specified as an array of structures pthread_parallel_work_args_t, allocated and
     542 * initialised by this function.
    512543 *****************************************************************************************
    513544 * @ root_level            : [in]  DQT root level in [0,1,2,3,4].
    514545 * @ work_func             : [in]  pointer on start function.
    515  * @ work_args_array       : [in]  pointer on a 2D array of pointers.
    516  * @ parent_barriers_array : [in]  pointer on a 1D array of barriers.
    517546 * @ return 0 if success / return -1 if failure.
    518547 ****************************************************************************************/
    519 int pthread_parallel_create( unsigned int   root_level,
    520                              void         * work_func,
    521                              void         * work_args_array,
    522                              void         * parent_barriers_array );
     548int pthread_parallel_create( unsigned int         root_level,
     549                             void               * work_func );
     550
     551
     552
     553
    523554
    524555/********* Non standard (ALMOS-MKH specific) Frame Buffer access syscalls   *************/
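
The [tid] relations documented in the r650 comment (lines 469 to 474 of the new revision) can be sketched in C as follows. This is only an illustration of the arithmetic for a regular architecture: the `core_coords_t` type and the `coords_to_tid()` / `tid_to_coords()` helpers are hypothetical names introduced here, and are not part of almosmkh.h.

```c
#include <assert.h>

/* Hypothetical helper type (not in almosmkh.h): coordinates of one core. */
typedef struct core_coords_s
{
    unsigned int x;      // cluster X coordinate
    unsigned int y;      // cluster Y coordinate
    unsigned int lid;    // core local index in cluster
}
core_coords_t;

/* Forward relations:  cid = (x * y_size) + y  and  tid = (cid * ncores) + lid */
static unsigned int coords_to_tid( core_coords_t c,
                                   unsigned int  y_size,
                                   unsigned int  ncores )
{
    unsigned int cid = (c.x * y_size) + c.y;
    return (cid * ncores) + c.lid;
}

/* Inverse relations:  lid = tid % ncores,  cid = tid / ncores,
 *                     y   = cid % y_size,  x   = cid / y_size   */
static core_coords_t tid_to_coords( unsigned int tid,
                                    unsigned int y_size,
                                    unsigned int ncores )
{
    core_coords_t c;
    unsigned int  cid;

    assert( (y_size != 0) && (ncores != 0) );   // avoid division by zero

    cid   = tid / ncores;
    c.lid = tid % ncores;
    c.y   = cid % y_size;
    c.x   = cid / y_size;
    return c;
}
```

For instance, with y_size = 4 and ncores = 2, the core [x=3, y=1, lid=1] gives cid = 13 and tid = 27, and decoding tid = 27 gives back the same coordinates.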