Changeset 637 for trunk/libs/libalmosmkh/almosmkh.h
Timestamp: Jul 18, 2019, 2:06:55 PM
Files: 1 edited
trunk/libs/libalmosmkh/almosmkh.h
--- trunk/libs/libalmosmkh/almosmkh.h	(r629)
+++ trunk/libs/libalmosmkh/almosmkh.h	(r637)
@@ -2,5 +2,5 @@
  * almosmkh.h - User level ALMOS-MKH specific library definition.
  *
- * Author Alain Greiner (2016,2017,2018)
+ * Author Alain Greiner (2016,2017,2018,2019)
  *
  * Copyright (c) UPMC Sorbonne Universites
@@ -72,5 +72,6 @@
 
 /***************************************************************************************
- * This syscall returns the cluster an local index for the calling core.
+ * This syscall returns the cluster identifier and the local index
+ * for the calling core.
  ***************************************************************************************
  * @ cxy    : [out] cluster identifier.
@@ -78,9 +79,36 @@
  * @ return always 0.
  **************************************************************************************/
-int get_core( unsigned int * cxy,
-              unsigned int * lid );
+int get_core_id( unsigned int * cxy,
+                 unsigned int * lid );
+
+/***************************************************************************************
+ * This syscall returns the number of cores in a given cluster.
+ ***************************************************************************************
+ * @ cxy    : [in]  target cluster identifier.
+ * @ ncores : [out] number of cores in target cluster.
+ * @ return always 0.
+ **************************************************************************************/
+int get_nb_cores( unsigned int   cxy,
+                  unsigned int * ncores );
+
+/***************************************************************************************
+ * This syscall uses the DQDT to search, in a macro-cluster specified by the
+ * <cxy_base> and <level> arguments, the core with the lowest load.
+ * It writes in the <cxy> and <lid> buffers the selected core cluster identifier
+ * and the local core index.
+ ***************************************************************************************
+ * @ cxy_base : [in]  any cluster identifier in macro-cluster.
+ * @ level    : [in]  macro-cluster level in [1,2,3,4,5].
+ * @ cxy      : [out] selected core cluster identifier.
+ * @ lid      : [out] selected core local index.
+ * @ return 0 if success / 1 if no core in macro-cluster / -1 if illegal arguments.
+ **************************************************************************************/
+int get_best_core( unsigned int   cxy_base,
+                   unsigned int   level,
+                   unsigned int * cxy,
+                   unsigned int * lid );
 
 /***************************************************************************************
- * This function returns the calling core cycles counter,
+ * This function returns the value contained in the calling core cycles counter,
  * taking into account a possible overflow on 32 bits architectures.
  ***************************************************************************************
@@ -414,4 +442,72 @@
                  unsigned int   cxy );
 
+/********* Non standard (ALMOS-MKH specific) pthread_parallel_create() syscall *********/
+
+//////////////////////////////////////////////////////////////////////////////////////////
+// This system call can be used to parallelize the creation and the termination
+// of a parallel multi-threaded application. It removes the loop in the main thread that
+// creates the N working threads (N sequential pthread_create() ). It also removes the
+// loop that waits for completion of these N working threads (N sequential pthread_join() ).
+// It creates one "work" thread (in detached mode) per core in the target architecture.
+// Each "work" thread is identified by the [cxy][lid] indexes (cluster / local core).
+// The pthread_parallel_create() function returns only when all "work" threads have
+// completed (successfully or not).
+//
+// To use this system call, the application code must define the following structures:
+// - To define the arguments to pass to the <work> function, the application must allocate
+//   and initialize a first 2D array, indexed by [cxy] and [lid] indexes, where each slot
+//   contains an application specific structure, and another 2D array, indexed by the same
+//   indexes, containing pointers on these structures. This array of pointers is one
+//   argument of the pthread_parallel_create() function.
+// - To detect the completion of the <work> threads, the application must allocate a 1D
+//   array, indexed by the cluster index [cxy], where each slot contains a pthread_barrier
+//   descriptor. This barrier is initialised by the pthread_parallel_create() function,
+//   in all clusters containing at least one work thread. This array of barriers is another
+//   argument of the pthread_parallel_create() function.
+//
+// Implementation note:
+// To parallelize the "work" threads creation and termination, the pthread_parallel_create()
+// function creates a distributed quad-tree (DQT) of "build" threads covering all cores
+// required to execute the parallel application.
+// Depending on the hardware topology, this DQT can be truncated (i.e. some
+// parent nodes can have fewer than 4 children), if (x_size != y_size), or if one size
+// is not a power of 2. Each "build" thread is identified by two indexes [cxy][level].
+// Each "build" thread performs the following tasks:
+// 1) It calls the pthread_create() function to create up to 4 children threads, that
+//    are "work" threads when (level == 0), or "build" threads when (level > 0).
+// 2) It initializes the barrier (global variable), used to block/unblock
+//    the parent thread until children completion.
+// 3) It calls the pthread_barrier_wait( self ) to wait until all children threads
+//    have completed (successfully or not).
+// 4) It calls the pthread_barrier_wait( parent ) to unblock the parent thread.
+//////////////////////////////////////////////////////////////////////////////////////////
+
+/*****************************************************************************************
+ * This blocking function creates N working threads that execute the code defined
+ * by the <work_func> and <work_args> arguments.
+ * The number N of created threads is entirely defined by the <root_level> argument.
+ * This value defines an abstract quad-tree, with a square base : level in [0,1,2,3,4],
+ * side in [1,2,4,8,16], nclusters in [1,4,16,64,256]. This base is called macro-cluster.
+ * A working thread is created on all cores contained in the specified macro-cluster.
+ * The actual number of physical clusters containing cores can be smaller than the number
+ * of clusters covered by the quad tree. The actual number of cores in a cluster can be
+ * less than the max value.
+ *
+ * In the current implementation, all threads execute the same <work_func> function,
+ * on different arguments, that are specified as a 2D array of pointers <work_args>.
+ * This can be modified in a future version, where the <work_func> argument can become
+ * a 2D array of pointers, to have one specific function for each thread.
+ *****************************************************************************************
+ * @ root_level            : [in] DQT root level in [0,1,2,3,4].
+ * @ work_func             : [in] pointer on start function.
+ * @ work_args_array       : [in] pointer on a 2D array of pointers.
+ * @ parent_barriers_array : [in] pointer on a 1D array of barriers.
+ * @ return 0 if success / return -1 if failure.
+ ****************************************************************************************/
+int pthread_parallel_create( unsigned int   root_level,
+                             void         * work_func,
+                             void         * work_args_array,
+                             void         * parent_barriers_array );
+
 #endif  /* _LIBALMOSMKH_H_ */
 
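The new get_best_core() prototype documents three return values (0 / 1 / -1). A caller might decode them as in the sketch below. The enum and the classify_best_core() helper are hypothetical names introduced only for illustration; they are not part of libalmosmkh. The syscall itself is shown in a comment, since it can only be linked on ALMOS-MKH.

```c
/* Hypothetical status names for the three documented return values. */
typedef enum
{
    BEST_CORE_FOUND,      /*  0 : <cxy> and <lid> have been written     */
    BEST_CORE_EMPTY,      /*  1 : no core in the macro-cluster          */
    BEST_CORE_BAD_ARGS,   /* -1 : illegal arguments                     */
    BEST_CORE_UNKNOWN     /* any other value (not documented)           */
} best_core_status_t;

/* Map the raw return value of get_best_core() to a status name. */
best_core_status_t classify_best_core( int ret )
{
    switch( ret )
    {
        case  0: return BEST_CORE_FOUND;
        case  1: return BEST_CORE_EMPTY;
        case -1: return BEST_CORE_BAD_ARGS;
        default: return BEST_CORE_UNKNOWN;
    }
}

/* Possible use on ALMOS-MKH (not compiled here):
 *
 *   unsigned int cxy;
 *   unsigned int lid;
 *   int ret = get_best_core( cxy_base , level , &cxy , &lid );
 *   if( classify_best_core( ret ) == BEST_CORE_FOUND )
 *   {
 *       // cxy and lid identify the least loaded core
 *   }
 */
```

Only the BEST_CORE_FOUND case guarantees that the two output buffers hold a usable value, so the other cases should not read them.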
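The relation between <root_level>, the macro-cluster side and the number of clusters stated in the pthread_parallel_create() comment (level in [0,1,2,3,4], side in [1,2,4,8,16], nclusters in [1,4,16,64,256]) is simple power-of-two arithmetic. The helper names below are hypothetical and only illustrate that arithmetic; the cores_max parameter is an assumed per-cluster maximum, not a value taken from the header.

```c
/* Side of the square macro-cluster, in clusters: 1, 2, 4, 8, 16.
 * Returns 0 when root_level is outside the documented range [0,4]. */
unsigned int macro_cluster_side( unsigned int root_level )
{
    if( root_level > 4 ) return 0;
    return 1u << root_level;
}

/* Number of clusters covered by the quad-tree: 1, 4, 16, 64, 256. */
unsigned int macro_cluster_nclusters( unsigned int root_level )
{
    unsigned int side = macro_cluster_side( root_level );
    return side * side;
}

/* Upper bound on the number N of "work" threads, assuming at most
 * cores_max cores per cluster (hypothetical parameter). The actual N
 * can be smaller, since some clusters may hold fewer cores or none. */
unsigned int max_work_threads( unsigned int root_level,
                               unsigned int cores_max )
{
    return macro_cluster_nclusters( root_level ) * cores_max;
}
```

For example, root_level 2 covers a 4 x 4 macro-cluster, so at most 16 * cores_max threads are created.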
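The two 2D arrays required by pthread_parallel_create() (one holding the application structures, one holding pointers on them) could be laid out as in the following sketch. The array sizes, the work_args_t structure and the init_work_args() helper are all hypothetical, and the sketch assumes that [cxy] can be used directly as a dense array index, which the header does not state. The barrier array and the call itself appear only in a comment, since they depend on the ALMOS-MKH libraries.

```c
#define CLUSTERS_MAX  4    /* hypothetical number of cluster slots      */
#define CORES_MAX     2    /* hypothetical max number of cores/cluster  */

/* Hypothetical application-specific argument structure. */
typedef struct work_args_s
{
    unsigned int cxy;      /* cluster index of the "work" thread        */
    unsigned int lid;      /* local core index of the "work" thread     */
} work_args_t;

/* First 2D array: one structure per [cxy][lid] slot. */
work_args_t   work_args[CLUSTERS_MAX][CORES_MAX];

/* Second 2D array: pointers on the structures, same indexes. */
work_args_t * work_ptrs[CLUSTERS_MAX][CORES_MAX];

/* Fill both arrays so that work_ptrs[cxy][lid] points on work_args[cxy][lid]. */
void init_work_args( void )
{
    unsigned int cxy;
    unsigned int lid;

    for( cxy = 0 ; cxy < CLUSTERS_MAX ; cxy++ )
    {
        for( lid = 0 ; lid < CORES_MAX ; lid++ )
        {
            work_args[cxy][lid].cxy = cxy;
            work_args[cxy][lid].lid = lid;
            work_ptrs[cxy][lid]     = &work_args[cxy][lid];
        }
    }
}

/* On ALMOS-MKH, the application would also allocate a 1D array of
 * barrier descriptors indexed by [cxy], then call (not compiled here):
 *
 *   pthread_parallel_create( root_level,
 *                            (void *)work_func,
 *                            work_ptrs,
 *                            barriers );
 */
```

Keeping the structures and the pointers in separate arrays follows the description in the header comment: only the array of pointers is passed to the syscall.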