Changeset 650 for trunk/libs/libalmosmkh/almosmkh.h
Timestamp: Nov 14, 2019, 11:44:12 AM
File: 1 edited
trunk/libs/libalmosmkh/almosmkh.h
--- trunk/libs/libalmosmkh/almosmkh.h (r647)
+++ trunk/libs/libalmosmkh/almosmkh.h (r650)
@@ -101,5 +101,5 @@
  * @ level : [in] macro-cluster level in [1,2,3,4,5].
  * @ cxy : [out] selected core cluster identifier.
- * @ lid : [out] select od core local index.
+ * @ lid : [out] selected core local index.
  * @ return 0 if success / 1 if no core in macro-cluster / -1 if illegal arguments.
  **************************************************************************************/
@@ -415,5 +415,5 @@
  * This function releases the memory buffer identified by the <ptr> argument,
  * to the store identified by the <cxy> argument.
- * It displays an error message, but does nothing if the ptr is illegal.
+ * It does nothing, but displays an error message, if the ptr is illegal.
  *****************************************************************************************
  * @ ptr : pointer on the released buffer.
@@ -456,37 +456,43 @@
 
 //////////////////////////////////////////////////////////////////////////////////////////
-// This system call can be used to parallelize the creation and the termination
-// of a parallel multi-threaded application. It removes the loop in the main thread that
-// creates the N working threads (N sequencial pthread_create() ). It also removes the
-// loop that waits completion of these N working threads (N sequencial pthread_join() ).
-// It creates one "work" thread (in detached mode) per core in the target architecture.
-// Each "work" thread is identified by the [cxy][lid] indexes (cluster / local core).
-// The pthread_parallel_create() function returns only when all "work" threads completed
+// This syscall can be used to parallelize the creation, and the termination
+// of a parallel multi-threaded application.
+// It removes in the main thread the sequencial loop that creates the N working threads
+// (N pthread_create() ), and removes also the sequencial loop that waits completion
+// of these N working threads (N pthread_join() ).
+// It creates one <work> thread (in detached mode) per core in the target architecture.
+// Each <work> thread is identified by a continuous [tid] index.
+// For a regular architecture, defined by the [x_size , y_size , ncores] parameters,
+// the number of working threads can be the simply computed as (x_size * y_size * ncores),
+// and the coordinates[x,y,lid] of the core running the thread[tid] cand be directly
+// derived from the [tid] value with the following relations:
+// . cid = (x * y_size) + y
+// . tid = (cid * ncores ) + lid
+// . lid = tid % ncores
+// . cid = tid / ncores
+// . y = cid % y_size
+// . x = cid / y_size
+// The pthread_parallel_create() function returns only when all <work> threads completed
 // (successfully or not).
 //
-// To use this system call, the application code must define the following structures:
-// - To define the arguments to pass to the <work> function the application must allocate
-// and initialize a first 2D array, indexed by [cxy] and [lid] indexes, where each slot
-// contains an application specific structure, and another 2D array, indexed by the same
-// indexes, containing pointers on these structures. This array of pointers is one
-// argument of the pthread_parallel_create() function.
-// - To detect the completion of the <work> threads, the application must allocate a 1D
-// array, indexed by the cluster index [cxy], where each slot contains a pthread_barrier
-// descriptor. This barrier is initialised by the pthread_parallel_create() function,
-// in all cluster containing at least one work thread. This array of barriers is another
-// argument of the pthread_parallel_create() function.
+// WARNING : The function executed by the working thread is application specific,
+// but the structure defining the arguments passed to this function is imposed.
+// The "pthread_parallel_work_args_t" structure is defined below, and contains
+// two fields: the tid value, and a pointer on a pthread_barrier_t.
+// This barrier must be used by each working thread to signal completion before exit.
+// The global variables implementing these stuctures for each working thread
+// are allocated and initialised by the pthread_parallel_create() function.
 //
-// Implementation note:
-// To parallelize the "work" threads creation and termination, the pthread_parallel_create()
-// function creates a distributed quad-tree (DQT) of "build" threads covering all cores
-// required to execute the parallel application.
+// Implementation note: the pthread_parallel_create()a function creates a distributed
+// quad-tree (DQT) of <build> threads covering all cores required to execute the parallel
+// application. This quad tree is entirely defined by the root_level parameter.
 // Depending on the hardware topology, this DQT can be truncated, (i.e. some
 // parent nodes can have less than 4 chidren), if (x_size != y_size), or if one size
-// is not a power of 2. Each "build" thread is identified by two indexes [cxy][level].
-// Each "build" thread makes the following tasks:
+// is not a power of 2. Each <build> thread is identified by two indexes [cid][level].
+// Each <build> thread makes the following tasks:
 // 1) It calls the pthread_create() function to create up to 4 children threads, that
-// are are "work" threads when (level == 0), or "build" threads, when (level > 0).
-// 2) It initializes the barrier (global variable), used to block/unblock
-// the parent thread until children completion.
+// are <work> threads when (level == 0), or <build> threads, when (level > 0).
+// 2) It allocates and initializes the barrier, used to block the parent thread until
+// children completion.
 // 3) It calls the pthread_barrier_wait( self ) to wait until all children threads
 // completed (successfully or not).
@@ -495,30 +501,55 @@
 
 /*****************************************************************************************
- * This blocking function creates N working threads that execute the code defined
- * by the <work_func> and <work_args> arguments, and returns only when all working
- * threads completed.
- * The number N of created threads is entirely defined by the <root_level> argument.
- * This value defines an abstract quad-tree, with a square base : level in [0,1,2,3,4],
+ * structure defining the arguments for the <build> thread function
+ ****************************************************************************************/
+typedef struct pthread_parallel_build_args_s
+{
+    unsigned char       cid;             // this <build> thread cluster index
+    unsigned char       level;           // this <build> thread level in quad-tree
+    unsigned char       parent_cid;      // parent <build> thread cluster index
+    pthread_barrier_t * parent_barrier;  // pointer on parent <build> thread barrier
+    unsigned char       root_level;      // quad-tree root level
+    void              * work_func;       // pointer on working thread function
+    unsigned int        x_size;          // platform global parameter
+    unsigned int        y_size;          // platform global parameter
+    unsigned int        ncores;          // platform global parameter
+    unsigned int        error;           // return value : 0 if success
+}
+pthread_parallel_build_args_t;
+
+/*****************************************************************************************
+ * structure defining the arguments for the <work> thread function
+ ****************************************************************************************/
+typedef struct pthread_parallel_work_args_s
+{
+    unsigned int        tid;             // thread identifier
+    pthread_barrier_t * barrier;         // to signal completion
+}
+pthread_parallel_work_args_t;
+
+/*****************************************************************************************
+ * This blocking function creates N working threads identified by the [tid] continuous
+ * index, that execute the code defined by the <work_func> argument, and returns only
+ * when all working threads completed.
+ * The number N of created threads is entirely defined by the <root_level> argument,
+ * that defines an abstract quad-tree, with a square base : root_level in [0,1,2,3,4],
  * side in [1,2,4,8,16], nclusters in [1,4,16,64,256]. This base is called macro_cluster.
- * A working thread is created on all cores contained in the specified macro-cluster.
+ * A working thread is created on all cores contained in this abstract macro-cluster.
  * The actual number of physical clusters containing cores can be smaller than the number
- * of clusters covered by the quad tree. The actual number of cores in a cluster can be
- * less than the max value.
- *
- * In the current implementation, all threads execute the same <work_func> function,
- * on different arguments, that are specified as a 2D array of pointers <work_args>.
- * This can be modified in a future version, where the <work_func> argument can become
- * a 2D array of pointers, to have one specific function for each thread.
+ * of clusters covered by the abstract quad tree.
+ * All threads execute the same <work_func> function, on different arguments, that are
+ * specified as an array of structures pthread_parallel_work_args_t, allocated and
+ * initialised by this function.
  *****************************************************************************************
  * @ root_level : [in] DQT root level in [0,1,2,3,4].
  * @ work_func : [in] pointer on start function.
- * @ work_args_array : [in] pointer on a 2D array of pointers.
- * @ parent_barriers_array : [in] pointer on a 1D array of barriers.
  * @ return 0 if success / return -1 if failure.
  ****************************************************************************************/
-int pthread_parallel_create( unsigned int   root_level,
-                             void         * work_func,
-                             void         * work_args_array,
-                             void         * parent_barriers_array );
+int pthread_parallel_create( unsigned int   root_level,
+                             void         * work_func );
+
+
+
+
 
 /********* Non standard (ALMOS-MKH specific) Frame Buffer access syscalls *************/
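The index relations introduced by r650 (tid, cid, lid, x, y) and the quad-tree sizes quoted in the function comment can be checked with a small standalone sketch. This is only an illustration under stated assumptions: the type core_coords_t and the helpers tid_to_coords(), coords_to_tid(), dqt_side() and dqt_nclusters() are hypothetical names that do not appear in almosmkh.h, and the sketch assumes a regular architecture in which every cluster holds exactly ncores cores.

```c
/* Hypothetical helper type (not part of almosmkh.h): coordinates of one core. */
typedef struct core_coords_s
{
    unsigned int x;      /* cluster X coordinate              */
    unsigned int y;      /* cluster Y coordinate              */
    unsigned int lid;    /* core local index in its cluster   */
}
core_coords_t;

/* Derive [x,y,lid] from a continuous tid, following the relations
 * quoted in the header comment:
 *   lid = tid % ncores ; cid = tid / ncores
 *   y   = cid % y_size ; x   = cid / y_size                           */
static core_coords_t tid_to_coords( unsigned int tid,
                                    unsigned int y_size,
                                    unsigned int ncores )
{
    core_coords_t c;
    unsigned int  cid = tid / ncores;

    c.lid = tid % ncores;
    c.y   = cid % y_size;
    c.x   = cid / y_size;
    return c;
}

/* Inverse mapping: cid = (x * y_size) + y ; tid = (cid * ncores) + lid */
static unsigned int coords_to_tid( core_coords_t c,
                                   unsigned int  y_size,
                                   unsigned int  ncores )
{
    unsigned int cid = (c.x * y_size) + c.y;
    return (cid * ncores) + c.lid;
}

/* Side of the abstract macro-cluster: 1,2,4,8,16 for root_level 0..4 */
static unsigned int dqt_side( unsigned int root_level )
{
    return 1u << root_level;
}

/* Clusters covered by the abstract quad-tree: 1,4,16,64,256 */
static unsigned int dqt_nclusters( unsigned int root_level )
{
    unsigned int side = dqt_side( root_level );
    return side * side;
}
```

For instance, with y_size = 4 and ncores = 2, tid 13 maps to cid 6, hence x = 1, y = 2, lid = 1, and coords_to_tid() maps these coordinates back to 13. Whether the library computes the indexes exactly this way is not shown in the changeset; the sketch only restates the relations given in the comment.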