Changeset 652 for trunk/user/transpose


Timestamp:
Nov 14, 2019, 3:56:51 PM
Author:
alain
Message:

Introduce the three placement modes in the "transpose", "convol" and "fft" applications.

Location:
trunk/user/transpose
Files:
2 edited

  • trunk/user/transpose/transpose.c

    r646 r652  
    55//////////////////////////////////////////////////////////////////////////////////////////
    66// This multi-threaded application reads a raw image (one byte per pixel)
    7 // stored on disk, transpose it, display the result on the frame buffer,
    8 // and store the transposed image on disk.
    9 // It can run on a multi-cores, multi-clusters architecture, with one thread
     7// stored on disk, transposes it, displays the result on the frame buffer,
     8// and stores the transposed image on disk.
    109//
    11 // per core, and uses the POSIX threads API.
    12 // It uses the mmap() syscall to directly access the input and output files
    13 // and the fbf_write() syscall to display the images.
     10// The image size and the pixel encoding type are defined by the IMAGE_SIZE and
     11// IMAGE_TYPE global parameters.
    1412//
    15 // The main() function can be launched on any core[cxy,l].
    16 // It makes the initialisations, launch (N-1) threads to run the execute() function
    17 // on the (N-1) other cores, calls himself the execute() function, and finally calls
    18 // the instrument() function to display instrumentation results when the parallel
    19 // execution is completed. The placement of threads on the cores can be done
    20 // automatically by the operating system, or can be done explicitely by the main thread
    21 // (when the EXPLICIT_PLACEMENT global parameter is set).
     13// It can run on a multi-core, multi-cluster architecture, where (X_SIZE * Y_SIZE)
     14// is the number of clusters and NCORES the number of cores per cluster.
     15// A core is identified by two indexes [cxy,lid]: cxy is the cluster identifier
     16// (which is NOT required to be a continuous index), and lid is the local core index
     17// (which must be in the [0,NCORES-1] range).
    2218//
    23 // The buf_in[x,y] and buf_out[put buffers containing the direct ans transposed images
    24 // are distributed in clusters: In each cluster[cxy], the thread running on core[cxy,0]
    25 // map the buf_in[cxy] and // buf_out[cxy] buffers containing a subset of lines.
    26 // Then, all threads in cluster[xy] read pixels from the local buf_in[cxy] buffer, and
    27 // write the pixels to all remote buf_out[cxy] buffers. Finally, each thread display
    28 // a part of the transposed image to the frame buffer.
     19// The main() function can run on any core in any cluster. This main thread
     20// performs the initialisations, uses the pthread_create() syscall to launch (NTHREADS-1)
     21// other threads in "attached" mode that run the execute() function in parallel, calls
     22// the execute() function itself, waits for the completion of the (NTHREADS-1) other
     23// threads with pthread_join(), and finally calls the instrument() function to display
     24// and register the instrumentation results when execution is completed.
     25// All threads run the execute() function, but each thread transposes only
     26// (NLINES / NTHREADS) lines. This requires that NLINES == k * NTHREADS.
     27//
     28// The number N of working threads is always defined by the number of cores available
     29// in the architecture, but this application supports three placement modes.
     30// In all modes, the working threads are identified by the [tid] continuous index
     31// in range [0, NTHREADS-1], which defines how the lines are shared amongst the threads.
     32// This continuous index can always be decomposed into two continuous sub-indexes:
     33// tid == cid * ncores + lid,  where cid is in [0,NCLUSTERS-1] and lid in [0,NCORES-1].
     34//
     35// - NO_PLACEMENT: the main thread is itself a working thread. The (N-1) other working
     36//   threads are created by the main thread, but the placement is done by the OS, using
     37//   the DQDT for load balancing, and two working threads can be placed on the same core.
     38//   The [cid,lid] are only abstract identifiers, and cannot be associated with a physical
     39//   cluster or a physical core. In this mode, the main thread runs on any cluster,
     40//   but has tid = 0 (i.e. cid = 0 & lid = 0).
     41//
     42// - EXPLICIT_PLACEMENT: the main thread is again a working thread, but the placement
     43//   of the threads on the cores is explicitly controlled by the main thread to have
     44//   exactly one working thread per core, and the [cxy][lid] core coordinates for a given
     45//   thread[tid] can be directly derived from the [tid] value: [cid] is an alias for the
     46//   physical cluster identifier, and [lid] is the local core index.
     47//
     48// - PARALLEL_PLACEMENT: the main thread is no longer a working thread, and uses the
     49//   non standard pthread_parallel_create() function to avoid the costly sequential
     50//   loops for pthread_create() and pthread_join(). It guarantees one working thread
     51//   per core, and the same relation between the thread[tid] and the core[cxy][lid].
     52//   
     53// The buf_in[x,y] and buf_out[x,y] buffers containing the direct and transposed images
     54// are distributed in clusters: each thread[cid][0] allocates a local input buffer
     55// and loads into this buffer, from the mapper of the input image file, all lines
     56// that must be handled by the threads sharing the same cid.
     57// In the execute function, all threads in the group defined by the cid index read pixels
     58// from the local buf_in[cid] buffer, and write pixels to all remote buf_out[cid] buffers.
     59// Finally, each thread displays a part of the transposed image to the frame buffer.
    2960//
    3061// - The image must fit the frame buffer size, which must be a power of 2.
    3162// - The number of clusters must be a power of 2 no larger than 256.
    3263// - The number of cores per cluster must be a power of 2 no larger than 4.
    33 // - The number of clusters cannot be larger than (IMAGE_SIZE * IMAGE_SIZE) / 4096,
    34 //   because the size of buf_in[x,y] and buf_out[x,y] must be multiple of 4096.
     64// - The number of threads cannot be larger than IMAGE_SIZE.
    3565//
    3666//////////////////////////////////////////////////////////////////////////////////////////
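The header states that each working thread transposes (NLINES / NTHREADS) lines. A minimal sketch of that partition, assuming contiguous bands of lines per thread and a single flat IMAGE_SIZE x IMAGE_SIZE buffer (one byte per pixel) instead of the distributed buf_in / buf_out buffers; thread_band() and transpose_band() are hypothetical names, not functions of this application.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: first line and number of lines handled by thread [tid],
 * assuming each thread owns a contiguous band of nlines / nthreads lines.     */
static void thread_band( unsigned int tid, unsigned int nthreads,
                         unsigned int nlines,
                         unsigned int * first, unsigned int * count )
{
    assert( nthreads > 0 && nlines % nthreads == 0 );  /* NLINES == k * NTHREADS */
    assert( tid < nthreads );
    *count = nlines / nthreads;
    *first = tid * (*count);
}

/* Hypothetical helper: transpose lines [first, first+count) of a size*size
 * image. Input pixel (line l, column p) becomes output pixel (line p, column l),
 * so each thread writes a disjoint band of output columns.                    */
static void transpose_band( const unsigned char * in, unsigned char * out,
                            unsigned int size,
                            unsigned int first, unsigned int count )
{
    for( unsigned int l = first ; l < first + count ; l++ )
        for( unsigned int p = 0 ; p < size ; p++ )
            out[(size_t)p * size + l] = in[(size_t)l * size + p];
}
```

Applying transpose_band() for every tid in [0, NTHREADS-1] produces the complete transposed image. Each thread writes a disjoint set of output columns, so the threads need no locking between them, only a barrier before the result is used.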
     
    5080#define CORES_MAX             4                            // max number of cores per cluster
    5181#define CLUSTERS_MAX          (X_MAX * Y_MAX)              // max number of clusters
    52 
    53 #define IMAGE_SIZE            256                          // image size
     82#define THREADS_MAX           (X_MAX * Y_MAX * CORES_MAX)  // max number of threads
     83
     84#define IMAGE_SIZE            512                          // image size
    5485#define IMAGE_TYPE            420                          // pixel encoding type
    55 #define INPUT_FILE_PATH       "/misc/lena_256.raw"         // input file pathname
    56 #define OUTPUT_FILE_PATH      "/home/trsp_256.raw"         // output file pathname
    57 
     86#define INPUT_FILE_PATH       "/misc/couple_512.raw"       // input file pathname
     87#define OUTPUT_FILE_PATH      "/misc/transposed_512.raw"   // output file pathname
     88
     89#define SAVE_RESULT_IMAGE     0                            // save result image on disk
    5890#define USE_DQT_BARRIER       1                            // quad-tree barrier if non zero
    59 #define EXPLICIT_PLACEMENT    1                            // explicit thread placement
    60 #define VERBOSE               1                            // print comments on TTY
     91
     92#define NO_PLACEMENT          0                            // uncontrolled thread placement
     93#define EXPLICIT_PLACEMENT    0                            // explicit threads placement
     94#define PARALLEL_PLACEMENT    1                            // parallel threads placement
     95
     96#define VERBOSE_MAIN          0                            // main function print comments
     97#define VERBOSE_EXEC          0                            // exec function print comments
     98#define VERBOSE_INSTRU        0                            // instru function print comments
    6199
    62100
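main() rejects an illegal placement at run time. Since the three placement modes and IMAGE_SIZE are preprocessor constants, the same constraint and the power-of-2 requirement stated in the header could also be checked at compile time. A sketch, assuming a C11 toolchain (for static_assert) and using the values of this revision:

```c
#include <assert.h>   /* provides the C11 static_assert macro */

#define IMAGE_SIZE            512
#define NO_PLACEMENT          0
#define EXPLICIT_PLACEMENT    0
#define PARALLEL_PLACEMENT    1

/* exactly one placement mode must be selected */
#if (NO_PLACEMENT + EXPLICIT_PLACEMENT + PARALLEL_PLACEMENT) != 1
#error "select exactly one of NO_PLACEMENT, EXPLICIT_PLACEMENT, PARALLEL_PLACEMENT"
#endif

/* the image must fit a frame buffer whose size is a power of 2 */
static_assert( IMAGE_SIZE > 0 && (IMAGE_SIZE & (IMAGE_SIZE - 1)) == 0,
               "IMAGE_SIZE must be a power of 2" );
```

The run-time check on nthreads > IMAGE_SIZE still has to stay in main(), because the number of threads is only known after get_config(). GCC's -Wundef option additionally warns when an undefined identifier is evaluated in an #if directive, which catches misspelled configuration macros.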
     
    65103///////////////////////////////////////////////////////
    66104
    67 // instrumentation counters for each processor in each cluster
    68 unsigned int MMAP_START[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
    69 unsigned int MMAP_END  [CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
     105// global instrumentation counters for the main thread
     106unsigned int SEQUENCIAL_TIME = 0;
     107unsigned int PARALLEL_TIME   = 0;
     108
     109// instrumentation counters for each thread in each cluster
     110// indexed by [cid][lid] : cluster continuous index / thread local index
     111unsigned int LOAD_START[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
     112unsigned int LOAD_END  [CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
    70113unsigned int TRSP_START[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
    71114unsigned int TRSP_END  [CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
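The body of instrument() is not part of this diff. As one plausible use of the per-thread counters indexed by [cid][lid], the duration of a phase can be taken as the interval from the earliest start to the latest end over all working threads. phase_span() and NCORES_EX are hypothetical; NCORES_EX stands in for CORES_MAX to keep the example small.

```c
#include <assert.h>

#define NCORES_EX 2   /* hypothetical bound standing in for CORES_MAX */

/* Span of a phase over all threads: latest end minus earliest start.
 * start[cid][lid] / end[cid][lid] hold the cycle counts of thread [cid][lid]. */
static unsigned int phase_span( unsigned int start[][NCORES_EX],
                                unsigned int end  [][NCORES_EX],
                                unsigned int nclusters,
                                unsigned int ncores )
{
    assert( nclusters > 0 && ncores > 0 && ncores <= NCORES_EX );

    unsigned int min_start = start[0][0];
    unsigned int max_end   = end[0][0];

    for( unsigned int cid = 0 ; cid < nclusters ; cid++ )
        for( unsigned int lid = 0 ; lid < ncores ; lid++ )
        {
            if( start[cid][lid] < min_start ) min_start = start[cid][lid];
            if( end[cid][lid]   > max_end   ) max_end   = end[cid][lid];
        }
    return max_end - min_start;
}
```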
     
    73116unsigned int DISP_END  [CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
    74117
    75 // arrays of pointers on distributed buffers
    76 // one input buffer & one output buffer per cluster
    77 unsigned char *  buf_in [CLUSTERS_MAX];
    78 unsigned char *  buf_out[CLUSTERS_MAX];
    79 
    80 // synchronisation barrier (all threads)
     118// pointer to the buffer containing the input image, mapped by main() to the input file
     119unsigned char *  image_in;
     120
     121// pointer to the buffer containing the output image, mapped by main() to the output file
     122unsigned char *  image_out;
     123
     124// arrays of pointers to distributed buffers, indexed by [cid] : cluster continuous index
     125unsigned char *  buf_in_ptr [CLUSTERS_MAX];
     126unsigned char *  buf_out_ptr[CLUSTERS_MAX];
     127
     128// synchronisation barrier (all working threads)
    81129pthread_barrier_t   barrier;
    82130
    83131// platform parameters
    84 unsigned int  x_size;                       // number of clusters in a row
    85 unsigned int  y_size;                       // number of clusters in a column
    86 unsigned int  ncores;                       // number of processors per cluster
    87 
    88 // cluster identifier & local index of core running the main thread
    89 unsigned int  cxy_main;
    90 unsigned int  lid_main;
    91 
    92 // input & output file descriptors
    93 int  fd_in;
    94 int  fd_out;
    95 
    96 #if EXPLICIT_PLACEMENT
    97 
    98 // thread index allocated by the kernel
    99 pthread_t        trdid[CLUSTERS_MAX][CORES_MAX];   
    100 
    101 // user defined continuous thread index
    102 unsigned int     tid[CLUSTERS_MAX][CORES_MAX];
    103 
    104 // thread attributes only used if explicit placement
    105 pthread_attr_t   attr[CLUSTERS_MAX][CORES_MAX];
    106 
    107 #else
    108 
    109 // thread index allocated by the kernel
    110 pthread_t        trdid[CLUSTERS_MAX * CORES_MAX];   
    111 
    112 // user defined continuous thread index
    113 unsigned int     tid[CLUSTERS_MAX * CORES_MAX];
    114 
    115 #endif
     132unsigned int  x_size;              // number of clusters in a row
     133unsigned int  y_size;              // number of clusters in a column
     134unsigned int  ncores;              // number of cores per cluster
     135
     136// main thread continuous index
     137unsigned int     tid_main;
    116138
    117139//return values at thread exit
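A single pthread_barrier_t shared by all working threads ensures that no thread starts a phase, for instance writing into remote output buffers, before every thread has finished the previous one. A self-contained POSIX sketch of that pattern with a fixed, hypothetical thread count; as in PARALLEL_PLACEMENT, the creating thread is not itself a worker. The DQT barrier selected by USE_DQT_BARRIER is not shown in this diff and is not modelled here.

```c
#define _POSIX_C_SOURCE 200809L   /* expose pthread_barrier_* under strict C */
#include <assert.h>
#include <pthread.h>

#define NWORKERS 4                       /* hypothetical number of working threads */

static pthread_barrier_t phase_barrier;
static int loaded [NWORKERS];            /* written in phase 1, one slot per thread */
static int checked[NWORKERS];            /* written in phase 2                      */

static void * worker( void * arg )
{
    unsigned int tid = *(unsigned int *)arg;
    assert( tid < NWORKERS );

    loaded[tid] = 1;                          /* phase 1: stand-in for loading lines */
    pthread_barrier_wait( &phase_barrier );   /* nobody continues until all arrive   */

    int all = 1;                              /* phase 2: read every phase-1 result  */
    for( unsigned int i = 0 ; i < NWORKERS ; i++ ) all &= loaded[i];
    checked[tid] = all;
    return NULL;
}

/* Returns 1 if every worker observed all phase-1 results, 0 otherwise. */
static int run_phases( void )
{
    static pthread_t    trdid[NWORKERS];
    static unsigned int tids [NWORKERS];      /* static: must outlive the threads */

    if( pthread_barrier_init( &phase_barrier, NULL, NWORKERS ) ) return 0;
    for( unsigned int t = 0 ; t < NWORKERS ; t++ )
    {
        tids[t] = t;
        if( pthread_create( &trdid[t], NULL, worker, &tids[t] ) ) return 0;
    }
    for( unsigned int t = 0 ; t < NWORKERS ; t++ )
        if( pthread_join( trdid[t], NULL ) ) return 0;
    pthread_barrier_destroy( &phase_barrier );

    int ok = 1;
    for( unsigned int t = 0 ; t < NWORKERS ; t++ ) ok &= checked[t];
    return ok;
}
```

The barrier count is the number of working threads only, which is why the main thread in PARALLEL_PLACEMENT, not being a worker, never waits on it.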
     
    119141unsigned int THREAD_EXIT_FAILURE = 1;
    120142
     143// array of kernel thread identifiers / indexed by [tid]
     144pthread_t                     exec_trdid[THREADS_MAX];   
     145
     146// array of execute function arguments / indexed by [tid]
     147pthread_parallel_work_args_t  exec_args[THREADS_MAX];
     148
     149// array of thread attributes / indexed by [tid]
     150pthread_attr_t                exec_attr[THREADS_MAX];
     151
    121152////////////////////////////////////////////////////////////////
    122153//             functions declaration
    123154////////////////////////////////////////////////////////////////
    124155
    125 void execute( unsigned int * ptid );
    126 
    127 void instrument( void );
    128 
    129 ///////////
    130 void main()
     156void execute( pthread_parallel_work_args_t * args );
     157
     158void instrument( FILE * f , char * filename );
     159
     160/////////////////
     161void main( void )
    131162{
    132     unsigned long long date;
     163    unsigned long long start_cycle;
     164    unsigned long long end_sequencial_cycle;
     165    unsigned long long end_parallel_cycle;
     166
     167    char               filename[32];      // instrumentation file name
     168    char               pathname[64];      // instrumentation file pathname
    133169
    134170    int error;
    135171
    136 printf("\n bloup 0\n");
    137 
    138     // get identifiers for core executing main
    139     get_core_id( &cxy_main , &lid_main );
    140 
    141 printf("\n bloup 1\n");
     172    /////////////////////////////////////////////////////////////////////////////////
     173    get_cycle( &start_cycle );
     174    /////////////////////////////////////////////////////////////////////////////////
     175
     176    if( (NO_PLACEMENT + EXPLICIT_PLACEMENT + PARALLEL_PLACEMENT) != 1 )
     177    {
     178        printf("\n[transpose error] illegal placement\n");
     179        exit( 0 );
     180    }
    142181
    143182    // get & check plat-form parameters
    144     get_config( &x_size , &y_size , &ncores );
    145 
    146 printf("\n bloup 2\n");
    147 
    148     if((ncores != 1) && (ncores != 2) && (ncores == 4))
     183    get_config( &x_size,
     184                &y_size,
     185                &ncores );
     186
     187    if((ncores != 1) && (ncores != 2) && (ncores != 4))
    149188    {
    150189        printf("\n[transpose error] number of cores per cluster must be 1/2/4\n");
     
    166205    }
    167206       
    168 printf("\n bloup 3\n");
     207    // main thread gets the identifiers of the core executing main
     208    unsigned int  cxy_main;
     209    unsigned int  lid_main;
     210    get_core_id( &cxy_main , &lid_main );
    169211
    170212    // compute number of threads
     
    172214    unsigned int nthreads  = nclusters * ncores;
    173215
    174 printf("\n bloup 4\n");
    175 
    176     // get FBF ownership and FBF size
     216    // main thread gets FBF size and type
    177217    unsigned int   fbf_width;
    178218    unsigned int   fbf_height;
     
    180220    fbf_get_config( &fbf_width , &fbf_height , &fbf_type );
    181221
    182 printf("\n bloup 5\n");
    183 
    184222    if( (fbf_width != IMAGE_SIZE) || (fbf_height != IMAGE_SIZE) || (fbf_type != IMAGE_TYPE) )
    185223    {
     
    188226    }
    189227
    190     get_cycle( &date );
    191     printf("\n[transpose] starts at cycle %d on %d cores / FBF = %d * %d pixels\n",
    192     (unsigned int)date , nthreads , fbf_width , fbf_height );
    193 
    194     // open input file
    195     fd_in = open( INPUT_FILE_PATH , O_RDONLY , 0 );    // read-only
    196     if ( fd_in < 0 )
     228    if( nthreads > IMAGE_SIZE )
     229    {
     230        printf("\n[transpose error] number of threads larger than number of lines\n");
     231        exit( 0 );
     232    }
     233
     234    unsigned int npixels = IMAGE_SIZE * IMAGE_SIZE;
     235
     236    // define instrumentation file name
     237    if( NO_PLACEMENT )
     238    {
     239        printf("\n[transpose] %d cluster(s) / %d core(s) / FBF[%d*%d] / PID %x / NO_PLACE\n",
     240        nclusters, ncores, fbf_width, fbf_height, getpid() );
     241
     242        // build instrumentation file name
     243        if( USE_DQT_BARRIER )
     244        snprintf( filename , 32 , "trsp_dqt_no_place_%d_%d_%d",
     245        IMAGE_SIZE , x_size * y_size , ncores );
     246        else
     247        snprintf( filename , 32 , "trsp_smp_no_place_%d_%d_%d",
     248        IMAGE_SIZE , x_size * y_size , ncores );
     249    }
     250
     251    if( EXPLICIT_PLACEMENT )
     252    {
     253        printf("\n[transpose] %d cluster(s) / %d core(s) / FBF[%d*%d] / PID %x / EXPLICIT\n",
     254        nclusters, ncores, fbf_width, fbf_height, getpid() );
     255
     256        // build instrumentation file name
     257        if( USE_DQT_BARRIER )
     258        snprintf( filename , 32 , "trsp_dqt_explicit_%d_%d_%d",
     259        IMAGE_SIZE , x_size * y_size , ncores );
     260        else
     261        snprintf( filename , 32 , "trsp_smp_explicit_%d_%d_%d",
     262        IMAGE_SIZE , x_size * y_size , ncores );
     263    }
     264
     265    if( PARALLEL_PLACEMENT )
     266    {
     267        printf("\n[transpose] %d cluster(s) / %d core(s) / FBF[%d*%d] / PID %x / PARALLEL\n",
     268        nclusters, ncores, fbf_width, fbf_height, getpid() );
     269
     270        // build instrumentation file name
     271        if( USE_DQT_BARRIER )
     272        snprintf( filename , 32 , "trsp_dqt_parallel_%d_%d_%d",
     273        IMAGE_SIZE , x_size * y_size , ncores );
     274        else
     275        snprintf( filename , 32 , "trsp_smp_parallel_%d_%d_%d",
     276        IMAGE_SIZE , x_size * y_size , ncores );
     277    }
     278
     279    // open instrumentation file
     280    snprintf( pathname , 64 , "/home/%s", filename );
     281    FILE * f = fopen( pathname , NULL );
     282    if ( f == NULL )
    197283    {
    198         printf("\n[transpose error] main cannot open file %s\n", INPUT_FILE_PATH );
    199         exit( 0 );
    200     }
    201 
    202 #if VERBOSE
    203 printf("\n[transpose] main open file %s / fd = %d\n", INPUT_FILE_PATH , fd_in );
    204 #endif
    205 
    206     // open output file
    207     fd_out = open( OUTPUT_FILE_PATH , O_CREAT , 0 );   // create if required
    208     if ( fd_out < 0 )
    209     {
    210         printf("\n[transpose error] main cannot open file %s\n", OUTPUT_FILE_PATH );
    211         exit( 0 );
    212     }
    213 
    214 #if  VERBOSE
    215 printf("\n[transpose] main open file %s / fd = %d\n", OUTPUT_FILE_PATH , fd_out );
    216 #endif
    217 
    218     // initialise barrier
     284        printf("\n[transpose error] cannot open instrumentation file %s\n", pathname );
     285        exit( 0 );
     286    }
     287
     288#if  VERBOSE_MAIN
     289printf("\n[transpose] main on core[%x,%d] open instrumentation file %s\n",
     290cxy_main, lid_main, pathname );
     291#endif
     292
     293    // main thread initializes barrier
    219294    if( USE_DQT_BARRIER )
    220295    {
     
    236311    }
    237312
    238     get_cycle( &date );
    239     printf("\n[transpose] main on core[%x,%d] completes initialisation at cycle %d\n"
    240            "- CLUSTERS     = %d\n"
    241            "- PROCS        = %d\n"
    242            "- THREADS      = %d\n",
    243            cxy_main, lid_main, (unsigned int)date, nclusters, ncores, nthreads );
    244 
    245 //////////////////////
    246 #if EXPLICIT_PLACEMENT
    247 
    248     // main thread launch other threads
    249     unsigned int x;
    250     unsigned int y;
    251     unsigned int l;
    252     unsigned int cxy;
    253     for( x = 0 ; x < x_size ; x++ )
    254     {
    255         for( y = 0 ; y < y_size ; y++ )
     313#if  VERBOSE_MAIN
     314printf("\n[transpose] main on core[%x,%d] completes barrier initialisation\n",
     315cxy_main, lid_main );
     316#endif
     317
     318    // main thread opens input file
     319    int fd_in = open( INPUT_FILE_PATH , O_RDONLY , 0 );
     320
     321    if ( fd_in < 0 )
     322    {
     323        printf("\n[transpose error] main cannot open file %s\n", INPUT_FILE_PATH );
     324        exit( 0 );
     325    }
     326
     327#if  VERBOSE_MAIN
     328printf("\n[transpose] main open file <%s> / fd = %d\n", INPUT_FILE_PATH , fd_in );
     329#endif
     330
     331    // main thread maps image_in buffer to input image file
     332    image_in = (unsigned char *)mmap( NULL,
     333                                      npixels,
     334                                      PROT_READ,
     335                                      MAP_FILE | MAP_SHARED,
     336                                      fd_in,
     337                                      0 );     // offset
     338    if ( image_in == NULL )
     339    {
     340        printf("\n[transpose error] main cannot map buffer to file %s\n", INPUT_FILE_PATH );
     341        exit( 0 );
     342    }
     343
     344#if  VERBOSE_MAIN
     345printf("\n[transpose] main map buffer to file <%s>\n", INPUT_FILE_PATH );
     346#endif
     347
     348    // main thread displays input image on FBF
     349    if( fbf_write( image_in,
     350                   npixels,
     351                   0 ) )
     352    {
     353        printf("\n[transpose error] main cannot access FBF\n");
     354        exit( 0 );
     355    }
     356
     357#if SAVE_RESULT_IMAGE
     358
     359    // main thread opens output file
     360    int fd_out = open( OUTPUT_FILE_PATH , O_CREAT , 0 );
     361
     362    if ( fd_out < 0 )
     363    {
     364        printf("\n[transpose error] main cannot open file %s\n", OUTPUT_FILE_PATH );
     365        exit( 0 );
     366    }
     367
     368#if  VERBOSE_MAIN
     369printf("\n[transpose] main open file <%s> / fd = %d\n", OUTPUT_FILE_PATH , fd_out );
     370#endif
     371
     372    // main thread maps image_out buffer to output image file
     373    image_out = (unsigned char *)mmap( NULL,
     374                                       npixels,
     375                                       PROT_WRITE,
     376                                       MAP_FILE | MAP_SHARED,
     377                                       fd_out,
     378                                       0 );     // offset
     379    if ( image_out == NULL )
     380    {
     381        printf("\n[transpose error] main cannot map buf_out to file %s\n", OUTPUT_FILE_PATH );
     382        exit( 0 );
     383    }
     384
     385#if  VERBOSE_MAIN
     386printf("\n[transpose] main map buffer to file <%s>\n", OUTPUT_FILE_PATH );
     387#endif
     388
     389#endif  // SAVE_RESULT_IMAGE
     390
     391    /////////////////////////////////////////////////////////////////////////////////////
     392    get_cycle( &end_sequencial_cycle );
     393    SEQUENCIAL_TIME = (unsigned int)(end_sequencial_cycle - start_cycle);
     394    /////////////////////////////////////////////////////////////////////////////////////
     395
     396    //////////////////
     397    if( NO_PLACEMENT )
     398    {
     399        // the tid value for the main thread is always 0
     400        // main thread creates new threads with tid in [1,nthreads-1] 
     401        unsigned int tid;
     402        for ( tid = 0 ; tid < nthreads ; tid++ )
    256403        {
    257             cxy = HAL_CXY_FROM_XY( x , y );
    258             for( l = 0 ; l < ncores ; l++ )
     404            // register tid value in exec_args[tid] array
     405            exec_args[tid].tid = tid;
     406           
     407            // create other threads
     408            if( tid > 0 )
    259409            {
    260                 // no other thread on the core running the main
    261                 if( (cxy != cxy_main) || (l != lid_main) )
     410                if ( pthread_create( &exec_trdid[tid],
     411                                     NULL,                  // no attribute
     412                                     &execute,
     413                                     &exec_args[tid] ) )
    262414                {
    263                     // define thread attributes
    264                     attr[cxy][l].attributes = PT_ATTR_CLUSTER_DEFINED | PT_ATTR_CORE_DEFINED;
    265                     attr[cxy][l].cxy        = cxy;
    266                     attr[cxy][l].lid        = l;
    267 
    268                     tid[cxy][l] = (((x  * y_size) + y) * ncores) + l;
     415                    printf("\n[transpose error] cannot create thread %d\n", tid );
     416                    exit( 0 );
     417                }
     418
     419#if VERBOSE_MAIN
     420printf("\n[transpose] main created thread %d\n", tid );
     421#endif
     422
     423            }
     424            else
     425            {
     426                tid_main = 0;
     427            }
     428        }  // end for tid
     429
     430        // main thread calls the execute() function itself
     431        execute( &exec_args[0] );
     432
     433        // main thread waits for the other threads to complete
     434        for ( tid = 1 ; tid < nthreads ; tid++ )
     435        {
     436            unsigned int * status;
     437
     438            // main thread waits for thread[tid] status
     439            if ( pthread_join( exec_trdid[tid], (void*)(&status)) )
     440            {
     441                printf("\n[transpose error] main cannot join thread %d\n", tid );
     442                exit( 0 );
     443            }
     444       
     445            // check status
     446            if( *status != THREAD_EXIT_SUCCESS )
     447            {
     448                printf("\n[transpose error] thread %d returned failure\n", tid );
     449                exit( 0 );
     450            }
     451
     452#if VERBOSE_MAIN
     453printf("\n[transpose] main successfully joined thread %d\n", tid );
     454#endif
     455       
     456        }  // end for tid
     457
     458    }  // end if no_placement
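The NO_PLACEMENT branch follows the usual POSIX create/run/join pattern: the main thread keeps tid 0, creates tids 1 to N-1 with default attributes, runs its own share, then joins the others and checks their exit status. A portable sketch of the same control flow, where work_args_t and work() are hypothetical stand-ins for pthread_parallel_work_args_t and execute(). The status values are globals, so the pointers returned by the threads remain valid after they exit, and, like exec_args[], the argument array is global so that it outlives the loop that fills it.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4                          /* hypothetical thread count */

typedef struct { unsigned int tid; } work_args_t;   /* hypothetical stand-in */

static unsigned int STATUS_OK   = 0;        /* globals: returned pointers stay valid */
static unsigned int STATUS_FAIL = 1;

static pthread_t    trdid[NTHREADS];        /* like exec_trdid[], indexed by tid */
static work_args_t  args [NTHREADS];        /* like exec_args[],  indexed by tid */
static unsigned int done [NTHREADS];        /* records which shares were run     */

static void * work( void * p )              /* hypothetical stand-in for execute() */
{
    work_args_t * a = p;
    assert( a != NULL );
    if( a->tid >= NTHREADS ) return &STATUS_FAIL;
    done[a->tid] = 1;                       /* this thread's share of the work */
    return &STATUS_OK;
}

/* Returns 0 if all NTHREADS shares succeeded, -1 otherwise. */
static int run_no_placement( void )
{
    for( unsigned int tid = 0 ; tid < NTHREADS ; tid++ )
    {
        args[tid].tid = tid;
        /* tid 0 is the main thread: only tids 1..NTHREADS-1 are created */
        if( tid > 0 && pthread_create( &trdid[tid], NULL, work, &args[tid] ) )
            return -1;
    }

    /* main thread runs its own share before joining the others */
    int ok = ( *(unsigned int *)work( &args[0] ) == STATUS_OK );

    for( unsigned int tid = 1 ; tid < NTHREADS ; tid++ )
    {
        void * status;
        if( pthread_join( trdid[tid], &status ) ) return -1;
        ok &= ( *(unsigned int *)status == STATUS_OK );
    }
    return ok ? 0 : -1;
}
```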
     459
     460    ////////////////////////
     461    if( EXPLICIT_PLACEMENT )
     462    {
     463        // main thread places each of the other threads on a specific core[cxy][lid]
     464        // but the actual thread creation is sequential
     465        unsigned int x;
     466        unsigned int y;
     467        unsigned int l;
     468        unsigned int cxy;                   // cluster identifier
     469        unsigned int tid;                   // thread continuous index
     470
     471        for( x = 0 ; x < x_size ; x++ )
     472        {
     473            for( y = 0 ; y < y_size ; y++ )
     474            {
     475                cxy = HAL_CXY_FROM_XY( x , y );
     476                for( l = 0 ; l < ncores ; l++ )
     477                {
     478                    // compute thread continuous index
     479                    tid = (((x  * y_size) + y) * ncores) + l;
     480
     481                    // register tid value in exec_args[tid] array
     482                    exec_args[tid].tid = tid;
     483
     484                    // no thread created on the core running the main
     485                    if( (cxy != cxy_main) || (l != lid_main) )
     486                    {
     487                        // define thread attributes
     488                        exec_attr[tid].attributes = PT_ATTR_CLUSTER_DEFINED |
     489                                                    PT_ATTR_CORE_DEFINED;
     490                        exec_attr[tid].cxy        = cxy;
     491                        exec_attr[tid].lid        = l;
    269492 
    270                     // create thread on core[cxy,l]
    271                     if (pthread_create( &trdid[cxy][l],   
    272                                         &attr[cxy][l],   
    273                                         &execute,
    274                                         &tid[cxy][l] ) )       
     493                        // create thread[tid] on core[cxy][l]
     494                        if ( pthread_create( &exec_trdid[tid],   
     495                                             &exec_attr[tid],   
     496                                             &execute,
     497                                             &exec_args[tid] ) )       
     498                        {
     499                            printf("\n[transpose error] cannot create thread %d\n", tid );
     500                            exit( 0 );
     501                        }
     502#if VERBOSE_MAIN
     503printf("\n[transpose] main created thread[%d] on core[%x,%d]\n", tid, cxy, l );
     504#endif
     505                    }
     506                    else
    275507                    {
    276                         printf("\n[convol error] created thread %x on core[%x][%d]\n",
    277                         trdid[cxy][l] , cxy , l );
    278                         exit( 0 );
     508                        tid_main = tid;
    279509                    }
    280 #if VERBOSE
    281 printf("\n[transpose] main created thread[%x,%d]\n", cxy, l );
    282 #endif
    283510                }
    284511            }
    285512        }
    286     }   
    287 
    288     // main thread calls itself the execute() function
    289     execute( &tid[cxy_main][lid_main] );
    290 
    291     // main thread wait other threads completion
    292     for( x = 0 ; x < x_size ; x++ )
    293     {
    294         for( y = 0 ; y < y_size ; y++ )
     513
     514        // main thread calls the execute() function itself
     515        execute( &exec_args[tid_main] );
     516
     517        // main thread waits for the other threads to complete
     518        for( tid = 0 ; tid < nthreads ; tid++ )
    295519        {
    296             cxy = HAL_CXY_FROM_XY( x , y );
    297             for( l = 0 ; l < ncores ; l++ )
     520            // no other thread on the core running the main
     521            if( tid != tid_main )
    298522            {
    299                 // no other thread on the core running the main
    300                 if( (cxy != cxy_main) || (l != lid_main) )
     523                unsigned int * status;
     524
     525                // wait thread[tid]
     526                if( pthread_join( exec_trdid[tid] , (void*)(&status) ) )
    301527                {
    302                     unsigned int * status;
    303 
    304                     // wait thread[cxy][l]
    305                     if( pthread_join( trdid[cxy][l] , (void*)(&status) ) )
    306                     {
    307                         printf("\n[transpose error] main cannot join thread[%x,%d]\n", cxy, l );
    308                         exit( 0 );
    309                     }
     528                    printf("\n[transpose error] main cannot join thread %d\n", tid );
     529                    exit( 0 );
     530                }
    310531       
    311                     // check status
    312                     if( *status != THREAD_EXIT_SUCCESS )
    313                     {
    314                         printf("\n[transpose error] thread[%x,%d] returned failure\n", cxy, l );
    315                         exit( 0 );
    316                     }
    317 #if VERBOSE
    318 printf("\n[transpose] main joined thread[%x,%d]\n", cxy, l );
    319 #endif
     532                // check status
     533                if( *status != THREAD_EXIT_SUCCESS )
     534                {
     535                    printf("\n[transpose error] thread %d returned failure\n", tid );
     536                    exit( 0 );
    320537                }
     538#if VERBOSE_MAIN
     539printf("\n[transpose] main joined thread %d\n", tid );
     540#endif
    321541            }
    322542        }
    323     }
    324 
    325 ///////////////////////////////
    326 #else  // no explicit placement
    327 
    328     // main thread launch other threads
    329     unsigned int n;
    330     for ( n = 1 ; n < nthreads ; n++ )
    331     {
    332         tid[n] = n;
    333         if ( pthread_create( &trdid[n],
    334                              NULL,                  // no attribute
    335                              &execute,
    336                              &tid[n] ) )
     543    }  // end if explicit_placement
     544
     545    ////////////////////////
     546    if( PARALLEL_PLACEMENT )
     547    {
     548        // compute covering DQT size and level
     549        unsigned int z          = (x_size > y_size) ? x_size : y_size;
     550        unsigned int root_level = ((z == 1) ? 0 :
     551                                  ((z == 2) ? 1 :
     552                                  ((z == 4) ? 2 :
     553                                  ((z == 8) ? 3 : 4))));
     554
     555        // create & execute the working threads
     556        if( pthread_parallel_create( root_level , &execute ) )
    337557        {
    338             printf("\n[transpose error] cannot create thread %d\n", n );
     558            printf("\n[transpose error] in %s\n", __FUNCTION__ );
    339559            exit( 0 );
    340560        }
    341 
    342 #if VERBOSE
    343 printf("\n[transpose] main created thread %d\n", tid[n] );
    344 #endif
    345 
    346     }
    347 
    348     // main thread calls itself the execute() function
    349     execute( &tid[0] );
    350 
    351     // main thread wait other threads completion
    352     for ( n = 1 ; n < nthreads ; n++ )
    353     {
    354         unsigned int * status;
    355 
    356         // main wait thread[n] status
    357         if ( pthread_join( trdid[n], (void*)(&status)) )
    358         {
    359             printf("\n[transpose error] main cannot join thread %d\n", n );
    360             exit( 0 );
    361         }
    362        
    363         // check status
    364         if( *status != THREAD_EXIT_SUCCESS )
    365         {
    366             printf("\n[transpose error] thread %x returned failure\n", n );
    367             exit( 0 );
    368         }
    369 
    370 #if VERBOSE
    371 printf("\n[transpose] main successfully joined thread %x\n", tid[n] );
    372 #endif
    373        
    374     }
    375 
    376 #endif
    377 
    378     // instrumentation
    379     instrument();
    380 
    381     // close input and output files
     561    }  // end if parallel_placement
     562
     563
     564    /////////////////////////////////////////////////////////////////////////////
     565    get_cycle( &end_parallel_cycle );
     566    PARALLEL_TIME = (unsigned int)(end_parallel_cycle - end_sequencial_cycle);
     567    /////////////////////////////////////////////////////////////////////////////
     568
     569    // main thread register instrumentation results
     570    instrument( f , filename );
     571
     572    // main thread close input file
    382573    close( fd_in );
     574
     575#if SAVE_RESULT_IMAGE
     576
     577    // main thread close output file
    383578    close( fd_out );
    384579
    385     // suicide
     580#endif
     581
     582    // main close instrumentation file
     583    fclose( f );
     584
     585    // main thread suicide
    386586    exit( 0 );
    387587   
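In the PARALLEL_PLACEMENT branch of main(), the nested conditional selects the level of the distributed quad tree (DQT) that covers the cluster mesh. Assuming that a DQT of level L spans a 2^L x 2^L square of clusters, the same choice can be expressed as a loop. This is a minimal sketch: `dqt_root_level()` is a hypothetical helper, not part of the ALMOS-MKH API.

```c
#include <assert.h>

/* Hypothetical helper: smallest level L such that a 2^L x 2^L quad tree
   covers an x_size x y_size cluster mesh (assumed to be at most 16 x 16). */
unsigned int dqt_root_level(unsigned int x_size, unsigned int y_size)
{
    unsigned int z     = (x_size > y_size) ? x_size : y_size;
    unsigned int level = 0;

    assert(z >= 1 && z <= 16);

    /* grow the tree until its side (1 << level) reaches the mesh side z */
    while ((1u << level) < z)
        level++;

    return level;
}
```

For mesh sizes of 1, 2, 4, 8 and 16 the loop returns the same values as main(). For a non-power-of-two size such as 3 it returns 2, whereas the conditional in main() falls through to 4; both trees cover the mesh, but the loop picks the smaller one.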
     
    390590
    391591
    392 ///////////////////////////////////
    393 void execute( unsigned int * ptid )
     592
     593///////////////////////////////////////////////////
     594void execute( pthread_parallel_work_args_t * args )
    394595{
    395596    unsigned long long   date;
    396597 
    397     unsigned int l;                         // line index for loops
    398     unsigned int p;                         // pixel index for loops
    399 
    400     // get thread continuous index
    401     unsigned int my_tid = *ptid;
     598    unsigned int l;                         // line index for loop
     599    unsigned int p;                         // pixel index for loop
     600
     601    // WARNING
     602    // A thread is identified by the tid index, defined in the "args" structure.
     603    // Since this index is in range [0,nclusters*ncores-1], we can always write
     604    //       tid == cid * ncores + lid
     605    // with cid in [0,nclusters-1] and lid in [0,ncores-1].
     606    // With NO_PLACEMENT, there is no relation between these abstract
     607    // [cid][lid] thread indexes and the physical core coordinates [cxy][lpid].
     608
     609    // get thread abstract identifiers
     610    unsigned int tid = args->tid;
     611    unsigned int cid = tid / ncores;   
     612    unsigned int lid = tid % ncores;
     613
     614#if VERBOSE_EXEC
     615unsigned int cxy;
     616unsigned int lpid;
     617get_core_id( &cxy , &lpid );   // get core physical identifiers
     618printf("\n[transpose] exec[%d] on core[%x,%d] enters parallel exec\n",
     619tid , cxy , lpid );
     620#endif
     621
     622    get_cycle( &date );
     623    LOAD_START[cid][lid] = (unsigned int)date;
    402624
    403625    // build total number of pixels per image
    404626    unsigned int npixels = IMAGE_SIZE * IMAGE_SIZE;     
    405627
    406     // nuild total number of threads and clusters
    407     unsigned int nthreads  = x_size * y_size * ncores;
     628    // build total number of threads and clusters
    408629    unsigned int nclusters = x_size * y_size;
    409 
    410     // get cluster continuous index and core index from tid
    411     // we use (tid == cid * ncores + lid)
    412     unsigned int cid = my_tid / ncores;     // continuous index   
    413     unsigned int lid = my_tid % ncores;     // core local index
    414 
    415     // get cluster identifier from cid
    416     // we use (cid == x * y_size + y)
    417     unsigned int x   = cid / y_size;        // X cluster coordinate
    418     unsigned int y   = cid % y_size;        // Y cluster coordinate
    419     unsigned int cxy = HAL_CXY_FROM_XY(x,y);
    420    
    421 #if VERBOSE
    422 printf("\n[transpose] thread[%d] start on core[%x,%d]\n", my_tid , cxy , lid );
    423 #endif
    424 
    425     // In each cluster cxy,  thread[cxy,0] map input file
    426     // to buf_in[cxy] and map output file to buf_in[cxy]
    427 
    428     get_cycle( &date );
    429     MMAP_START[cxy][lid] = (unsigned int)date;
    430 
    431     if ( lid == 0 )
    432     {
    433         unsigned int length = npixels / nclusters;
    434         unsigned int offset = length * cid;
    435        
    436         // map buf_in
    437         buf_in[cid] =  mmap( NULL,
    438                              length,
    439                              PROT_READ,
    440                              MAP_SHARED,
    441                              fd_in,
    442                              offset );
    443 
    444         if ( buf_in[cid] == NULL )
     630    unsigned int nthreads  = nclusters * ncores;
     631
     632    unsigned int buf_size = npixels / nclusters;     // number of bytes in buf_in & buf_out
     633    unsigned int offset   = cid * buf_size;       // offset in file (bytes)
     634
     635    unsigned char  * buf_in = NULL;        // private pointer on local input buffer
     636    unsigned char  * buf_out = NULL;       // private pointer on local output buffer
     637
     638    // Each thread[cid,0] allocates a local buffer buf_in, and registers
     639    // its base address in the global array buf_in_ptr[cid].
     640    // This local buffer is shared by all threads with the same cid.
     641    if( lid == 0 )
     642    {
     643        // allocate buf_in
     644        buf_in = (unsigned char *)malloc( buf_size );
     645
     646        if( buf_in == NULL )
    445647        {
    446             printf("\n[transpose error] thread[%x,%d] cannot map input file\n", cxy, lid);
     648            printf("\n[transpose error] thread[%d] cannot allocate buf_in\n", tid );
    447649            pthread_exit( &THREAD_EXIT_FAILURE );
    448650        }
    449                  
    450 #if VERBOSE
    451 printf("\n[transpose] thread[%x,%d] map input file / length %x / offset %x / buf_in %x\n",
    452 cxy, lid, length, offset, buf_in[cid] );
    453 #endif
    454 
    455         // map buf_out           
    456         buf_out[cid] = mmap( NULL,
    457                              length,
    458                              PROT_WRITE,
    459                              MAP_SHARED,
    460                              fd_out,
    461                              offset );
    462 
    463         if ( buf_out[cid] == NULL )
     651
     652        // register buf_in buffer in global array of pointers
     653        buf_in_ptr[cid] = buf_in;
     654
     655#if VERBOSE_EXEC
     656printf("\n[transpose] exec[%d] on core[%x,%d] allocated buf_in = %x\n",
     657tid , cxy , lpid , buf_in );
     658#endif
     659
     660    }
     661
     662    // Each thread[cid,0] copies the relevant part of image_in to buf_in
     663    if( lid == 0 )
     664    {
     665        memcpy( buf_in,
     666                image_in + offset,
     667                buf_size );
     668    }
     669
     670#if VERBOSE_EXEC
     671printf("\n[transpose] exec[%d] on core[%x,%d] loaded buf_in[%d]\n",
     672tid , cxy , lpid , cid );
     673#endif
     674
     675    // Each thread[cid,0] allocates a local buffer buf_out, and registers
     676    // its base address in the global array buf_out_ptr[cid].
     677    if( lid == 0 )
     678    {
     679        // allocate buf_out
     680        buf_out = (unsigned char *)malloc( buf_size );
     681
     682        if( buf_out == NULL )
    464683        {
    465             printf("\n[transpose error] thread[%x,%d] cannot map output file\n", cxy, lid);
     684            printf("\n[transpose error] thread[%d] cannot allocate buf_out\n", tid );
    466685            pthread_exit( &THREAD_EXIT_FAILURE );
    467686        }
    468                    
    469 #if VERBOSE
    470 printf("\n[transpose] thread[%x,%d] map output file / length %x / offset %x / buf_out %x\n",
    471 cxy, lid, length, offset, buf_out[cid] );
    472 #endif
    473 
    474     }
    475 
     687
     688        // register buf_out buffer in global array of pointers
     689        buf_out_ptr[cid] = buf_out;
     690
     691#if VERBOSE_EXEC
     692printf("\n[transpose] exec[%d] on core[%x,%d] allocated buf_out = %x\n",
     693tid , cxy , lpid , buf_out );
     694#endif
     695
     696    }
     697   
    476698    get_cycle( &date );
    477     MMAP_END[cxy][lid] = (unsigned int)date;
     699    LOAD_END[cid][lid] = (unsigned int)date;
    478700
    479701    /////////////////////////////////
    480702    pthread_barrier_wait( &barrier );
    481703
    482     // parallel transpose from buf_in to buf_out
    483     // each thread makes the transposition for nlt lines (nlt = IMAGE_SIZE/nthreads)
     704    get_cycle( &date );
     705    TRSP_START[cid][lid] = (unsigned int)date;
     706
     707    // All threads contribute to parallel transpose from buf_in to buf_out
     708    // each thread makes the transposition for nlt lines (nlt = IMAGE_SIZE/nthreads)
    484709    // from line [tid*nlt] to line [(tid + 1)*nlt - 1]
    485710    // (p,l) are the absolute pixel coordinates in the source image
     711    // (l,p) are the absolute pixel coordinates in the source image
     712    // (p,l) are the absolute pixel coordinates in the dest image
    486713
    487714    get_cycle( &date );
    488     TRSP_START[cxy][lid] = (unsigned int)date;
     715    TRSP_START[cid][lid] = (unsigned int)date;
    489716
    490717    unsigned int nlt   = IMAGE_SIZE / nthreads;    // number of lines per thread
    491718    unsigned int nlc   = IMAGE_SIZE / nclusters;   // number of lines per cluster
    492719
    493     unsigned int src_cluster;
     720    unsigned int src_cid;
    494721    unsigned int src_index;
    495     unsigned int dst_cluster;
     722    unsigned int dst_cid;
    496723    unsigned int dst_index;
    497724
    498725    unsigned char byte;
    499726
    500     unsigned int first = my_tid * nlt;     // first line index for a given thread
     727    unsigned int first = tid * nlt;     // first line index for a given thread
    501728    unsigned int last  = first + nlt;      // one past the last line index for a given thread
    502729
     730    // loop on lines handled by this thread
    503731    for ( l = first ; l < last ; l++ )
    504732    {
    505         // in each iteration we transfer one byte
     733        // loop on pixels in one line (one pixel per iteration)
    506734        for ( p = 0 ; p < IMAGE_SIZE ; p++ )
    507735        {
    508736            // read one byte from local buf_in
    509             src_cluster = l / nlc;
    510             src_index   = (l % nlc) * IMAGE_SIZE + p;
    511             byte        = buf_in[src_cluster][src_index];
     737            src_cid   = l / nlc;
     738            src_index = (l % nlc) * IMAGE_SIZE + p;
     739
     740            byte        = buf_in_ptr[src_cid][src_index];
    512741
    513742            // write one byte to remote buf_out
    514             dst_cluster = p / nlc;
    515             dst_index   = (p % nlc) * IMAGE_SIZE + l;
    516 
    517             buf_out[dst_cluster][dst_index] = byte;
     743            dst_cid  = p / nlc;
     744            dst_index = (p % nlc) * IMAGE_SIZE + l;
     745
     746            buf_out_ptr[dst_cid][dst_index] = byte;
    518747        }
    519748    }
    520749
    521 #if VERBOSE
    522 printf("\n[transpose] thread[%x,%d] completes transposed\n", cxy, lid );
     750#if VERBOSE_EXEC
     751printf("\n[transpose] exec[%d] on core[%x,%d] completes transpose\n",
     752tid , cxy , lpid );
    523753#endif
    524754
    525755    get_cycle( &date );
    526     TRSP_END[cxy][lid] = (unsigned int)date;
     756    TRSP_END[cid][lid] = (unsigned int)date;
    527757
    528758    /////////////////////////////////
    529759    pthread_barrier_wait( &barrier );
    530760
    531     // parallel display from local buf_out to frame buffer
    532     // all threads contribute to display
    533 
    534761    get_cycle( &date );
    535     DISP_START[cxy][lid] = (unsigned int)date;
    536 
     762    DISP_START[cid][lid] = (unsigned int)date;
     763
     764    // All threads contribute to parallel display
     765    // from local buf_out to frame buffer
    537766    unsigned int  npt   = npixels / nthreads;   // number of pixels per thread
    538767
    539     if( fbf_write( &buf_out[cid][lid * npt],
     768    if( fbf_write( &buf_out_ptr[cid][lid * npt],
    540769                   npt,
    541                    npt * my_tid ) )
    542     {
    543         printf("\n[transpose error] thread[%x,%d] cannot access FBF\n", cxy, lid );
     770                   npt * tid ) )
     771    {
     772        printf("\n[transpose error] thread[%d] cannot access FBF\n", tid );
    544773        pthread_exit( &THREAD_EXIT_FAILURE );
    545774    }
    546775
    547 #if VERBOSE
    548 printf("\n[transpose] thread[%x,%d] completes display\n", cxy, lid );
     776#if VERBOSE_EXEC
     777printf("\n[transpose] exec[%d] on core [%x,%d] completes display\n",
     778tid, cxy , lpid );
    549779#endif
    550780
    551781    get_cycle( &date );
    552     DISP_END[cxy][lid] = (unsigned int)date;
     782    DISP_END[cid][lid] = (unsigned int)date;
    553783
    554784    /////////////////////////////////
    555785    pthread_barrier_wait( &barrier );
    556786
    557     // all threads, but thread[0,0,0], suicide
    558     if ( (cxy != cxy_main) || (lid !=  lid_main) )
    559     {
     787#if SAVE_RESULT_IMAGE
     788
     789    // Each thread[cid,0] copies buf_out to the relevant part of image_out
     790    if( lid == 0 )
     791    {
     792        memcpy( image_out + offset,
     793                buf_out,
     794                buf_size );
     795    }
     796
     797#if VERBOSE_EXEC
     798printf("\n[transpose] exec[%d] on core[%x,%d] saved buf_out[%d]\n",
     799tid , cxy , lpid , cid );
     800#endif
     801
     802#endif
     803
     804    // Each thread[cid,0] releases its local buffers buf_in and buf_out
     805    if( lid == 0 )
     806    {
     807        // release buf_in and buf_out
     808        free( buf_in );
     809        free( buf_out );
     810    }
     811   
     812    // thread termination depends on the placement policy
     813    if( PARALLEL_PLACEMENT )   
     814    {
     815        // <work> threads are running in detached mode
     816        // each thread must signal completion by calling the barrier
     817        // passed in arguments before exiting
     818
     819        pthread_barrier_wait( args->barrier );
     820
    560821        pthread_exit( &THREAD_EXIT_SUCCESS );
    561822    }
     823    else
     824    {
     825        // <work> threads are running in attached mode
     826        // each thread, except the main thread, simply exits
     827        if ( tid != tid_main )  pthread_exit( &THREAD_EXIT_SUCCESS );
     828    }
    562829
    563830} // end execute()
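The index arithmetic in execute() can be checked without threads or a frame buffer. The sketch below is a sequential model under the following assumptions: a row-major IMAGE_SIZE x IMAGE_SIZE image with one byte per pixel, IMAGE_SIZE divisible by the thread count, cluster buffers held in ordinary malloc'ed arrays, and memcpy() into a flat array standing in for fbf_write(). `model_transpose()` is a hypothetical name. The loops visit the threads one after another, and the barriers reduce to the boundaries between the load, transpose and display phases.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sequential model of execute(): returns 0 on success,
   -1 if a buffer cannot be allocated. `fbf` receives the transposed image. */
int model_transpose(const unsigned char *in, unsigned char *fbf,
                    unsigned int size, unsigned int nclusters,
                    unsigned int ncores)
{
    unsigned int nthreads = nclusters * ncores;
    unsigned int npixels  = size * size;
    unsigned int buf_size = npixels / nclusters;  /* bytes per cluster buffer */
    unsigned int nlc      = size / nclusters;     /* lines per cluster */
    unsigned int nlt      = size / nthreads;      /* lines per thread */
    unsigned int npt      = npixels / nthreads;   /* pixels per thread */
    int rc = -1;

    assert(size % nthreads == 0);

    unsigned char **in_ptr  = calloc(nclusters, sizeof *in_ptr);
    unsigned char **out_ptr = calloc(nclusters, sizeof *out_ptr);
    if (in_ptr == NULL || out_ptr == NULL) goto done;

    /* load phase: "thread[cid,0]" allocates both buffers and copies its band */
    for (unsigned int cid = 0; cid < nclusters; cid++) {
        in_ptr[cid]  = malloc(buf_size);
        out_ptr[cid] = malloc(buf_size);
        if (in_ptr[cid] == NULL || out_ptr[cid] == NULL) goto done;
        memcpy(in_ptr[cid], in + cid * buf_size, buf_size);
    }

    /* transpose phase: thread tid handles lines [tid*nlt, (tid+1)*nlt) */
    for (unsigned int tid = 0; tid < nthreads; tid++) {
        for (unsigned int l = tid * nlt; l < (tid + 1) * nlt; l++) {
            for (unsigned int p = 0; p < size; p++) {
                unsigned char byte = in_ptr[l / nlc][(l % nlc) * size + p];
                out_ptr[p / nlc][(p % nlc) * size + l] = byte;
            }
        }
    }

    /* display phase: thread tid = cid*ncores + lid writes npt pixels of its
       cluster buffer at offset npt*tid (memcpy stands in for fbf_write) */
    for (unsigned int tid = 0; tid < nthreads; tid++) {
        unsigned int cid = tid / ncores;
        unsigned int lid = tid % ncores;
        memcpy(fbf + npt * tid, &out_ptr[cid][lid * npt], npt);
    }
    rc = 0;

done:
    for (unsigned int cid = 0; cid < nclusters; cid++) {
        if (in_ptr != NULL)  free(in_ptr[cid]);
        if (out_ptr != NULL) free(out_ptr[cid]);
    }
    free(in_ptr);
    free(out_ptr);
    return rc;
}
```

Because each cluster buffer holds npt * ncores == buf_size bytes, the frame-buffer offset npt * tid equals cid * buf_size + lid * npt, so the display phase writes every slice of a cluster band back to its own position in the transposed image.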
     
    565832
    566833
    567 ///////////////////////
    568 void instrument( void )
     834///////////////////////////
     835void instrument( FILE * f,
     836                 char * filename )
    569837{
    570838    unsigned int x, y, l;
     839
     840#if VERBOSE_EXEC
     841printf("\n[transpose] main enters instrument\n" );
     842#endif
    571843
    572844    unsigned int min_load_start = 0xFFFFFFFF;
     
    583855    unsigned int max_disp_ended = 0;
    584856 
    585     char string[64];
    586 
    587     snprintf( string , 64 , "/home/transpose_%d_%d_%d" , x_size , y_size , ncores );
    588 
    589     // open instrumentation file
    590     FILE * f = fopen( string , NULL );
    591     if ( f == NULL )
    592     {
    593         printf("\n[transpose error] cannot open instrumentation file %s\n", string );
    594         exit( 0 );
    595     }
    596 
    597857    for (x = 0; x < x_size; x++)
    598858    {
    599859        for (y = 0; y < y_size; y++)
    600860        {
    601             unsigned int cxy = HAL_CXY_FROM_XY( x , y );
     861            unsigned int cid = y_size * x + y;
    602862
    603863            for ( l = 0 ; l < ncores ; l++ )
    604864            {
    605                 if (MMAP_START[cxy][l] < min_load_start)  min_load_start = MMAP_START[cxy][l];
    606                 if (MMAP_START[cxy][l] > max_load_start)  max_load_start = MMAP_START[cxy][l];
    607                 if (MMAP_END[cxy][l]   < min_load_ended)  min_load_ended = MMAP_END[cxy][l];
    608                 if (MMAP_END[cxy][l]   > max_load_ended)  max_load_ended = MMAP_END[cxy][l];
    609                 if (TRSP_START[cxy][l] < min_trsp_start)  min_trsp_start = TRSP_START[cxy][l];
    610                 if (TRSP_START[cxy][l] > max_trsp_start)  max_trsp_start = TRSP_START[cxy][l];
    611                 if (TRSP_END[cxy][l]   < min_trsp_ended)  min_trsp_ended = TRSP_END[cxy][l];
    612                 if (TRSP_END[cxy][l]   > max_trsp_ended)  max_trsp_ended = TRSP_END[cxy][l];
    613                 if (DISP_START[cxy][l] < min_disp_start)  min_disp_start = DISP_START[cxy][l];
    614                 if (DISP_START[cxy][l] > max_disp_start)  max_disp_start = DISP_START[cxy][l];
    615                 if (DISP_END[cxy][l]   < min_disp_ended)  min_disp_ended = DISP_END[cxy][l];
    616                 if (DISP_END[cxy][l]   > max_disp_ended)  max_disp_ended = DISP_END[cxy][l];
     865                if (LOAD_START[cid][l] < min_load_start)  min_load_start = LOAD_START[cid][l];
     866                if (LOAD_START[cid][l] > max_load_start)  max_load_start = LOAD_START[cid][l];
     867                if (LOAD_END[cid][l]   < min_load_ended)  min_load_ended = LOAD_END[cid][l];
     868                if (LOAD_END[cid][l]   > max_load_ended)  max_load_ended = LOAD_END[cid][l];
     869                if (TRSP_START[cid][l] < min_trsp_start)  min_trsp_start = TRSP_START[cid][l];
     870                if (TRSP_START[cid][l] > max_trsp_start)  max_trsp_start = TRSP_START[cid][l];
     871                if (TRSP_END[cid][l]   < min_trsp_ended)  min_trsp_ended = TRSP_END[cid][l];
     872                if (TRSP_END[cid][l]   > max_trsp_ended)  max_trsp_ended = TRSP_END[cid][l];
     873                if (DISP_START[cid][l] < min_disp_start)  min_disp_start = DISP_START[cid][l];
     874                if (DISP_START[cid][l] > max_disp_start)  max_disp_start = DISP_START[cid][l];
     875                if (DISP_END[cid][l]   < min_disp_ended)  min_disp_ended = DISP_END[cid][l];
     876                if (DISP_END[cid][l]   > max_disp_ended)  max_disp_ended = DISP_END[cid][l];
    617877            }
    618878        }
    619879    }
    620880
    621     printf( "\n ------ %s ------\n" , string );
    622     fprintf( f , "\n ------ %s ------\n" , string );
    623 
    624     printf( " - MMAP_START : min = %d / max = %d / med = %d / delta = %d\n",
    625            min_load_start, max_load_start, (min_load_start+max_load_start)/2,
    626            max_load_start-min_load_start );
    627 
    628     fprintf( f , " - MMAP_START : min = %d / max = %d / med = %d / delta = %d\n",
    629            min_load_start, max_load_start, (min_load_start+max_load_start)/2,
    630            max_load_start-min_load_start );
    631 
    632     printf( " - MMAP_END   : min = %d / max = %d / med = %d / delta = %d\n",
    633            min_load_ended, max_load_ended, (min_load_ended+max_load_ended)/2,
    634            max_load_ended-min_load_ended );
    635 
    636     fprintf( f , " - MMAP_END   : min = %d / max = %d / med = %d / delta = %d\n",
    637            min_load_ended, max_load_ended, (min_load_ended+max_load_ended)/2,
    638            max_load_ended-min_load_ended );
    639 
    640     printf( " - TRSP_START : min = %d / max = %d / med = %d / delta = %d\n",
    641            min_trsp_start, max_trsp_start, (min_trsp_start+max_trsp_start)/2,
    642            max_trsp_start-min_trsp_start );
    643 
    644     fprintf( f , " - TRSP_START : min = %d / max = %d / med = %d / delta = %d\n",
    645            min_trsp_start, max_trsp_start, (min_trsp_start+max_trsp_start)/2,
    646            max_trsp_start-min_trsp_start );
    647 
    648     printf( " - TRSP_END   : min = %d / max = %d / med = %d / delta = %d\n",
    649            min_trsp_ended, max_trsp_ended, (min_trsp_ended+max_trsp_ended)/2,
    650            max_trsp_ended-min_trsp_ended );
    651 
    652     fprintf( f , " - TRSP_END   : min = %d / max = %d / med = %d / delta = %d\n",
    653            min_trsp_ended, max_trsp_ended, (min_trsp_ended+max_trsp_ended)/2,
    654            max_trsp_ended-min_trsp_ended );
    655 
    656     printf( " - DISP_START : min = %d / max = %d / med = %d / delta = %d\n",
    657            min_disp_start, max_disp_start, (min_disp_start+max_disp_start)/2,
    658            max_disp_start-min_disp_start );
    659 
    660     fprintf( f , " - DISP_START : min = %d / max = %d / med = %d / delta = %d\n",
    661            min_disp_start, max_disp_start, (min_disp_start+max_disp_start)/2,
    662            max_disp_start-min_disp_start );
    663 
    664     printf( " - DISP_END   : min = %d / max = %d / med = %d / delta = %d\n",
    665            min_disp_ended, max_disp_ended, (min_disp_ended+max_disp_ended)/2,
    666            max_disp_ended-min_disp_ended );
    667 
    668     fprintf( f , " - DISP_END   : min = %d / max = %d / med = %d / delta = %d\n",
    669            min_disp_ended, max_disp_ended, (min_disp_ended+max_disp_ended)/2,
    670            max_disp_ended-min_disp_ended );
    671 
    672     fclose( f );
     881    printf( "\n ------ %s ------\n" , filename );
     882    fprintf( f , "\n ------ %s ------\n" , filename );
     883
     884    printf( " - LOAD_START : min = %d / max = %d / delta = %d\n",
     885           min_load_start, max_load_start, max_load_start-min_load_start );
     886    fprintf( f , " - LOAD_START : min = %d / max = %d / delta = %d\n",
     887           min_load_start, max_load_start, max_load_start-min_load_start );
     888
     889    printf( " - LOAD_END   : min = %d / max = %d / delta = %d\n",
     890           min_load_ended, max_load_ended, max_load_ended-min_load_ended );
     891    fprintf( f , " - LOAD_END   : min = %d / max = %d / delta = %d\n",
     892           min_load_ended, max_load_ended, max_load_ended-min_load_ended );
     893
     894    printf( " - TRSP_START : min = %d / max = %d / delta = %d\n",
     895           min_trsp_start, max_trsp_start, max_trsp_start-min_trsp_start );
     896    fprintf( f , " - TRSP_START : min = %d / max = %d / delta = %d\n",
     897           min_trsp_start, max_trsp_start, max_trsp_start-min_trsp_start );
     898
     899    printf( " - TRSP_END   : min = %d / max = %d / delta = %d\n",
     900           min_trsp_ended, max_trsp_ended, max_trsp_ended-min_trsp_ended );
     901    fprintf( f , " - TRSP_END   : min = %d / max = %d / delta = %d\n",
     902           min_trsp_ended, max_trsp_ended, max_trsp_ended-min_trsp_ended );
     903
     904    printf( " - DISP_START : min = %d / max = %d / delta = %d\n",
     905           min_disp_start, max_disp_start, max_disp_start-min_disp_start );
     906    fprintf( f , " - DISP_START : min = %d / max = %d / delta = %d\n",
     907           min_disp_start, max_disp_start, max_disp_start-min_disp_start );
     908
     909    printf( " - DISP_END   : min = %d / max = %d / delta = %d\n",
     910           min_disp_ended, max_disp_ended, max_disp_ended-min_disp_ended );
     911    fprintf( f , " - DISP_END   : min = %d / max = %d / delta = %d\n",
     912           min_disp_ended, max_disp_ended, max_disp_ended-min_disp_ended );
     913
     914    printf( "\n   Sequential = %d / Parallel = %d\n", SEQUENCIAL_TIME, PARALLEL_TIME );
     915    fprintf( f , "\n   Sequential = %d / Parallel = %d\n", SEQUENCIAL_TIME, PARALLEL_TIME );
    673916
    674917}  // end instrument()
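instrument() reduces each of the six date arrays (LOAD_START, LOAD_END, TRSP_START, TRSP_END, DISP_START, DISP_END) to a minimum, a maximum and their difference. A minimal sketch of that reduction for one array, assuming the [cid][lid] array is flattened as dates[cid * ncores + lid]; `phase_stats()` and `phase_stats_t` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical summary of one instrumentation array */
typedef struct {
    unsigned int min;
    unsigned int max;
    unsigned int delta;   /* max - min: spread of the phase across threads */
} phase_stats_t;

/* dates[] holds n = nclusters * ncores cycle counts */
phase_stats_t phase_stats(const unsigned int *dates, unsigned int n)
{
    /* same initial values as instrument(): min starts at 0xFFFFFFFF, max at 0 */
    phase_stats_t s = { 0xFFFFFFFFu, 0u, 0u };

    assert(n > 0);

    for (unsigned int i = 0; i < n; i++) {
        if (dates[i] < s.min) s.min = dates[i];
        if (dates[i] > s.max) s.max = dates[i];
    }
    s.delta = s.max - s.min;
    return s;
}
```

A small delta for a START array indicates that the threads entered the phase almost simultaneously; a large delta for an END array points to load imbalance or contention during that phase.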
  • trunk/user/transpose/transpose.ld

    r646 r652  
    1 /****************************************************************************
     1/***************************************************************************
    22* Definition of the base address for all virtual segments
    3 *****************************************************************************/
     3***************************************************************************/
    44
    55seg_code_base      = 0x400000;
     6
     7/***************************************************************************
     8* Define code entry point (e_entry field in .elf file)
     9***************************************************************************/
     10
     11ENTRY( main )
    612
    713/***************************************************************************