Changeset 676 for trunk/user/convol


Timestamp:
Nov 20, 2020, 12:11:35 AM
Author:
alain
Message:

Introduce chat application to test the named pipes.

File:
1 edited

  • trunk/user/convol/convol.c

    r659 r676  
    88// per core, and uses the POSIX threads API.
    99//
    10 // The main() function can be launched on any processor P[x,y,l].
    11 // It makes the initialisations, launch (N-1) threads to run the execute() function
    12 // on the (N-1) other processors than P[x,y,l], call himself the execute() function,
    13 // and finally call the instrument() function to display instrumentation results
    14 // when the parallel execution is completed.
     10// The input image is read from a file and the output image is saved to another file.
     11//
     12// - number of clusters containing processors must be power of 2 no larger than 256.
     13// - number of processors per cluster must be power of 2 no larger than 4.
     14// - number of working threads is the number of cores available in the hardware
     15//   architecture : nthreads = nclusters * ncores.
    1516//
    1617// The convolution kernel is defined in the execute() function.
    1718// It can be factored in two independent line and column convolution products.
    18 // The five buffers containing the image are distributed in clusters.
    19 // For the philips image, it is a [201]*[35] pixels rectangle, and the.
    20 //
    21 // The (1024 * 1024) pixels image is read from a file (2 bytes per pixel).
    2219//
    23 // - number of clusters containing processors must be power of 2 no larger than 256.
    24 // - number of processors per cluster must be power of 2 no larger than 4.
     20// The main() function can be launched on any processor.
     21// - It checks software requirements versus the hardware resources.
     22// - It opens & maps the input file to a global <image_in> buffer.
     23// - it opens & maps the output file to another global <image_out> buffer.
     24// - it opens the instrumentation file.
     25// - it creates & activates two FBF windows to display input & output images.
     26// - it launches other threads to run in parallel the execute() function.
     27// - it saves the instrumentation results on disk.
     28// - it closes the input, output, & instrumentation files.
     29// - it deletes the FBF input & output windows.
    2530//
    26 // The number N of working threads is always defined by the number of cores availables
    27 // in the architecture, but this application supports three placement modes.
     31// The execute() function is executed in parallel by all threads. These threads are
     32// working on 5 arrays of distributed buffers, indexed by the cluster index [cid].
     33// - A[cid]: contains the distributed initial image (NL/NCLUSTERS lines per cluster).
     34// - B[cid]: is the result of the horizontal filter, then transpose : B <= Trsp(HF(A))
     35// - C[cid]: is the result of the vertical filter, then transpose : C <= Trsp(VF(B))
     36// - D[cid]: is the difference between A and HF(A) : D <= A - HF(A)
     37// - Z[cid]: contains the distributed final image : Z <= C + D
     38//
     39// It can be split in four phases separated by synchronisation barriers:
     40// 1. Initialisation:
     41//    Allocates the 5 A[cid],B[cid],C[cid],D[cid],Z[cid] buffers, initialises A[cid]
     42//    from the <image_in> buffer, and displays the initial image on FBF if required.
     43// 2. Horizontal Filter:
     44//    Set B[cid] and D[cid] from A[cid]. Read data accesses are local, write data
     45//    accesses are remote, to implement the transpose.
     46// 3. Vertical Filter: 
     47//    Set C[cid] from B[cid]. Read data accesses are local, write data accesses
     48//    are remote, to implement the transpose.
     49// 4. Save results:
     50//    Set the Z[cid] from C[cid] and D[cid]. All read and write access are local.
     51//    Move the final image (Z[cid] buffer) to the <image_out> buffer.   
     52//
     53// This application supports three placement modes, implemented in the main() function.
    2854// In all modes, the working threads are identified by the [tid] continuous index
    2955// in range [0, NTHREADS-1], and defines how the lines are shared amongst the threads.
    3056// This continuous index can always be decomposed in two continuous sub-indexes:
    31 // tid == cid * ncores + lid,  where cid is in [0,NCLUSTERS-1] and lid in [0,NCORES-1].
     57// tid == cid * NCORES + lid,  where cid is in [0,NCLUSTERS-1] and lid in [0,NCORES-1].
    3258//
    3359// - NO_PLACEMENT: the main thread is itself a working thread. The (N-1) other working
     
    3864//   but has tid = 0 (i.e. cid = 0 & lid = 0).
    3965//
    40 // - EXPLICIT_PLACEMENT: the main thread is again a working thread, but the placement of
     66// - EXPLICIT_PLACEMENT: the main thread is again a working thread, but the placement
    4167//   of the threads on the cores is explicitly controlled by the main thread to have
    4268//   exactly one working thread per core, and the [cxy][lpid] core coordinates for a given
     
    4672// - PARALLEL_PLACEMENT: the main thread is not anymore a working thread, and uses the
    4773//   non-standard pthread_parallel_create() function to avoid the costly sequential
    48 //   loops for pthread_create() and pthread_join(). It garanty one working thread
     74//   loops for pthread_create() and pthread_join(). It guarantees one working thread
    4975//   per core, and the same relation between the thread[tid] and the core[cxy][lpid].
    5076//
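The index decomposition described in the header comment above can be sketched as follows. This is only an illustration under stated assumptions: `slice_t` and `thread_slice()` are hypothetical names that do not exist in convol.c, and the sketch assumes that the number of lines is an exact multiple of the number of threads.

```c
#include <assert.h>

// Hypothetical type (not in convol.c): where a working thread sits
// and which global lines it handles.
typedef struct
{
    unsigned int cid;    // cluster index, in [0, nclusters-1]
    unsigned int lid;    // local core index, in [0, ncores-1]
    unsigned int lmin;   // first global line handled (inclusive)
    unsigned int lmax;   // last global line handled (exclusive)
} slice_t;

// Hypothetical helper: decompose the continuous index tid = cid * ncores + lid
// and derive the band of lines owned by thread[tid].
static slice_t thread_slice( unsigned int tid,
                             unsigned int ncores,
                             unsigned int nthreads,
                             unsigned int nlines )
{
    assert( ncores > 0 );
    assert( nthreads > 0 );
    assert( tid < nthreads );

    unsigned int lines_per_thread = nlines / nthreads;

    slice_t s;
    s.cid  = tid / ncores;
    s.lid  = tid % ncores;
    s.lmin = tid * lines_per_thread;
    s.lmax = s.lmin + lines_per_thread;
    return s;
}
```

For example, with 2 clusters of 4 cores (8 threads) and a 512-line image, thread 5 maps to cid 1, lid 1, and the 64 lines starting at line 320.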
     
    6591
    6692#define VERBOSE_MAIN               1
    67 #define VERBOSE_EXEC               0
     93#define VERBOSE_EXEC               1
    6894#define SUPER_VERBOSE              0
    6995
     
    74100#define THREADS_MAX                (X_MAX * Y_MAX * CORES_MAX)
    75101
    76 #define IMAGE_IN_PATH              "misc/philips_1024_2.raw"
    77 #define IMAGE_IN_PIXEL_SIZE        2                               // 2 bytes per pixel
    78 
    79 #define IMAGE_OUT_PATH             "misc/philips_after_1O24.raw"
    80 #define IMAGE_OUT_PIXEL_SIZE       1                               // 1 bytes per pixel
    81 
    82 #define FBF_TYPE                   420
    83 #define NL                         1024
    84 #define NP                         1024
    85 #define NB_PIXELS                  (NP * NL)
     102#define IMAGE_TYPE                 420                         // pixel encoding type
     103#define INPUT_IMAGE_PATH           "misc/couple_512.raw"       // default image_in
     104#define OUTPUT_IMAGE_PATH          "misc/couple_conv_512.raw"  // default image_out
     105#define NL                         512                         // default nlines
     106#define NP                         512                         // default npixels
    86107
    87108#define NO_PLACEMENT               0
     
    89110#define PARALLEL_PLACEMENT         1
    90111
     112#define INTERACTIVE_MODE           0
    91113#define USE_DQT_BARRIER            1
    92114#define INITIAL_DISPLAY_ENABLE     1
     
    116138unsigned int V_BEG[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
    117139unsigned int V_END[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
    118 unsigned int D_BEG[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
    119 unsigned int D_END[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
     140unsigned int F_BEG[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
     141unsigned int F_END[CLUSTERS_MAX][CORES_MAX] = {{ 0 }};
    120142
    121143// pointer on buffer containing the input image, maped by the main to the input file
     
    128150unsigned int THREAD_EXIT_SUCCESS = 0;
    129151unsigned int THREAD_EXIT_FAILURE = 1;
     152
     153// pointer and identifier for FBF windows
     154void   *  in_win_buf;
     155int       in_wid;
     156void   *  out_win_buf;
     157int       out_wid;
    130158
    131159// synchronization barrier
     
    137165unsigned int  ncores;              // number of processors per cluster
    138166
     167// main thread continuous index
     168unsigned int     tid_main;
     169
    139170// arrays of pointers on distributed buffers in all clusters
    140 unsigned short * GA[CLUSTERS_MAX];
     171unsigned char * GA[CLUSTERS_MAX];
    141172int            * GB[CLUSTERS_MAX];
    142173int            * GC[CLUSTERS_MAX];
     
    153184pthread_parallel_work_args_t exec_args[THREADS_MAX];
    154185
    155 // main thread continuous index
    156 unsigned int     tid_main;
     186// image features
     187unsigned int   image_nl;
     188unsigned int   image_np;
     189char           input_image_path[128];
     190char           output_image_path[128];
    157191
    158192/////////////////////////////////////////////////////////////////////////////////////
     
    166200/////////////////
    167201void main( void )
     202/////////////////
    168203{
    169204    unsigned long long start_cycle;
     
    222257    unsigned int nthreads  = nclusters * ncores;
    223258
     259    // get input and output images pathnames and size
     260    if( INTERACTIVE_MODE )
     261    {
     262        // get image size
     263        printf("\n[convol] image nlines      : ");
     264        get_uint32( &image_nl );
     265
     266        printf("\n[convol] image npixels     : ");
     267        get_uint32( &image_np );
     268
     269        printf("\n[convol] input image path  : ");
     270        get_string( input_image_path , 128 );
     271
     272        printf("[convol] output image path : ");
     273        get_string( output_image_path , 128 );
     274    }
     275    else
     276    {
     277        image_nl = NL;
     278        image_np = NP;
     279        strcpy( input_image_path  , INPUT_IMAGE_PATH );
     280        strcpy( output_image_path , OUTPUT_IMAGE_PATH );
     281    }
     282
    224283    // main thread get FBF size and type
    225     unsigned int   fbf_width;
    226     unsigned int   fbf_height;
    227     unsigned int   fbf_type;
     284    int   fbf_width;
     285    int   fbf_height;
     286    int   fbf_type;
    228287    fbf_get_config( &fbf_width , &fbf_height , &fbf_type );
    229288
    230     if( (fbf_width != NP) || (fbf_height != NL) || (fbf_type != FBF_TYPE) )
    231     {
    232         printf("\n[convol error] image does not fit FBF size or type\n");
    233         exit( 0 );
    234     }
    235 
    236     if( nthreads > NL )
    237     {
    238         printf("\n[convol error] number of threads larger than number of lines\n");
     289    if( ((unsigned int)fbf_width  < image_np) ||
     290        ((unsigned int)fbf_height < image_nl) ||
     291        (fbf_type != IMAGE_TYPE) )
     292    {
     293        printf("\n[convol error] image not acceptable\n"
     294               "FBF width  = %d / npixels  = %d\n"
     295               "FBF height = %d / nlines   = %d\n"
     296               "FBF type   = %d / expected = %d\n",
     297               fbf_width, image_np, fbf_height, image_nl, fbf_type, IMAGE_TYPE );
     298        exit( 0 );
     299    }
     300
     301    if( nthreads > image_nl )
     302    {
     303        printf("\n[convol error] nthreads (%d) larger than nlines (%d)\n",
     304        nthreads , image_nl );
    239305        exit( 0 );
    240306    }
     
    248314        // build instrumentation file name
    249315        if( USE_DQT_BARRIER )
    250         snprintf( instru_name , 32 , "conv_dqt_no_place_%d_%d", x_size * y_size , ncores );
     316        snprintf( instru_name , 32 , "dqt_no_place_%d_%d", x_size * y_size , ncores );
    251317        else
    252         snprintf( instru_name , 32 , "conv_smp_no_place_%d_%d", x_size * y_size , ncores );
     318        snprintf( instru_name , 32 , "smp_no_place_%d_%d", x_size * y_size , ncores );
    253319    }
    254320
     
    260326        // build instrumentation file name
    261327        if( USE_DQT_BARRIER )
    262         snprintf( instru_name , 32 , "conv_dqt_explicit_%d_%d", x_size * y_size , ncores );
     328        snprintf( instru_name , 32 , "dqt_explicit_%d_%d", x_size * y_size , ncores );
    263329        else
    264         snprintf( instru_name , 32 , "conv_smp_explicit_%d_%d", x_size * y_size , ncores );
     330        snprintf( instru_name , 32 , "smp_explicit_%d_%d", x_size * y_size , ncores );
    265331    }
    266332
     
    272338        // build instrumentation file name
    273339        if( USE_DQT_BARRIER )
    274         snprintf( instru_name , 32 , "conv_dqt_parallel_%d_%d", x_size * y_size , ncores );
     340        snprintf( instru_name , 32 , "dqt_parallel_%d_%d", x_size * y_size , ncores );
    275341        else
    276         snprintf( instru_name , 32 , "conv_smp_parallel_%d_%d", x_size * y_size , ncores );
     342        snprintf( instru_name , 32 , "smp_parallel_%d_%d", x_size * y_size , ncores );
    277343    }
    278344
    279345    // open instrumentation file
    280     snprintf( instru_path , 64 , "/home/%s", instru_name );
     346    snprintf( instru_path , 64 , "/home/convol/%s", instru_name );
    281347    FILE * f_instru = fopen( instru_path , NULL );
    282348    if ( f_instru == NULL )
     
    289355printf("\n[convol] main on core[%x,%d] open instrumentation file %s\n",
    290356cxy_main, lid_main, instru_path );
     357#endif
     358
     359    // main create an FBF window for input image
     360    in_wid = fbf_create_window( 0,                   // l_zero
     361                                0,                   // p_zero
     362                                image_nl,            // lines
     363                                image_np,            // pixels
     364                                &in_win_buf );
     365    if( in_wid < 0 )
     366    {
     367        printf("\n[convol error] cannot open FBF window for %s\n",
     368        input_image_path);
     369        exit( 0 );
     370    }
     371
     372    // activate window
     373    error = fbf_active_window( in_wid , 1 );
     374
     375    if( error )
     376    {
     377        printf("\n[convol error] cannot activate window for %s\n",
     378        input_image_path );
     379        exit( 0 );
     380    }
     381
     382#if  VERBOSE_MAIN
     383printf("\n[convol] main on core[%x,%d] created FBF window (wid %d) for <%s>\n",
     384cxy_main, lid_main, in_wid, input_image_path );
     385#endif
     386
     387    // main create an FBF window for output image
     388    out_wid = fbf_create_window( 0,                   // l_zero
     389                                 image_np,            // p_zero
     390                                 image_nl,            // lines
     391                                 image_np,            // pixels
     392                                 &out_win_buf );
     393    if( out_wid < 0 )
     394    {
     395        printf("\n[convol error] cannot create FBF window for %s\n",
     396        output_image_path);
     397        exit( 0 );
     398    }
     399
     400    // activate window
     401    error = fbf_active_window( out_wid , 1 );
     402
     403    if( error )
     404    {
     405        printf("\n[convol error] cannot activate window for %s\n",
     406        output_image_path );
     407        exit( 0 );
     408    }
     409
     410#if  VERBOSE_MAIN
     411printf("\n[convol] main on core[%x,%d] created FBF window (wid %d) for <%s>\n",
     412cxy_main, lid_main, out_wid, output_image_path );
    291413#endif
    292414
     
    312434
    313435#if VERBOSE_MAIN
    314 printf("\n[convol] main on core[%x,%d] completes barrier init\n",
     436printf("\n[convol] main on core[%x,%d] completed barrier init\n",
    315437cxy_main, lid_main );
    316438#endif
    317439
    318440    // main open input file
    319     int fd_in = open( IMAGE_IN_PATH , O_RDONLY , 0 );
     441    int fd_in = open( input_image_path , O_RDONLY , 0 );
    320442
    321443    if ( fd_in < 0 )
    322444    {
    323         printf("\n[convol error] cannot open input file <%s>\n", IMAGE_IN_PATH );
    324         exit( 0 );
    325     }
    326 
    327 #if VERBOSE_MAIN
    328 printf("\n[convol] main on core[%x,%d] open file <%s>\n",
    329 cxy_main, lid_main, IMAGE_IN_PATH );
    330 #endif
    331    
    332     // main thread map image_in buffer to input file
     445        printf("\n[convol error] cannot open input file <%s>\n", input_image_path );
     446        exit( 0 );
     447    }
     448
     449    // main thread map input file to image_in buffer
    333450    image_in = (unsigned char *)mmap( NULL,
    334                                       NB_PIXELS * IMAGE_IN_PIXEL_SIZE,
     451                                      image_np * image_nl,
    335452                                      PROT_READ,
    336453                                      MAP_FILE | MAP_SHARED,
     
    339456    if ( image_in == NULL )
    340457    {
    341         printf("\n[convol error] main cannot map buffer to file %s\n", IMAGE_IN_PATH );
     458        printf("\n[convol error] main cannot map buffer to file %s\n", input_image_path );
    342459        exit( 0 );
    343460    }
    344461
    345462#if  VERBOSE_MAIN
    346 printf("\n[convol] main on core[%x,%x] map buffer to file <%s>\n",
    347 cxy_main, lid_main, IMAGE_IN_PATH );
     463printf("\n[convol] main on core[%x,%x] map <image_in> buffer to file <%s>\n",
     464cxy_main, lid_main, input_image_path );
    348465#endif
    349466
    350467    // main thread open output file
    351     int fd_out = open( IMAGE_OUT_PATH , O_CREAT , 0 );
     468    int fd_out = open( output_image_path , O_CREAT , 0 );
    352469
    353470    if ( fd_out < 0 )
    354471    {
    355         printf("\n[convol error] main cannot open file %s\n", IMAGE_OUT_PATH );
    356         exit( 0 );
    357     }
    358 
    359 #if  VERBOSE_MAIN
    360 printf("\n[convol] main on core[%x,%d] open file <%s>\n",
    361 cxy_main, lid_main, IMAGE_OUT_PATH );
    362 #endif
     472        printf("\n[convol error] main cannot open file %s\n", output_image_path );
     473        exit( 0 );
     474    }
    363475
    364476    // main thread map image_out buffer to output file
    365477    image_out = (unsigned char *)mmap( NULL,
    366                                        NB_PIXELS + IMAGE_OUT_PIXEL_SIZE,
     478                                       image_np * image_nl,
    367479                                       PROT_WRITE,
    368480                                       MAP_FILE | MAP_SHARED,
     
    371483    if ( image_out == NULL )
    372484    {
    373         printf("\n[convol error] main cannot map buffer to file %s\n", IMAGE_OUT_PATH );
     485        printf("\n[convol error] main cannot map buffer to file %s\n", output_image_path );
    374486        exit( 0 );
    375487    }
    376488
    377489#if  VERBOSE_MAIN
    378 printf("\n[convol] main on core[%x,%x] map buffer to file <%s>\n",
    379 cxy_main, lid_main, IMAGE_OUT_PATH );
     490printf("\n[convol] main on core[%x,%x] map <image_out> buffer to file <%s>\n",
     491cxy_main, lid_main, output_image_path );
    380492#endif
    381493
     
    389501{
    390502    // the tid value for the main thread is always 0
    391     // main thread creates new threads with tid in [1,nthreads-1] 
     503    // main thread creates other threads with tid in [1,nthreads-1] 
    392504    unsigned int tid;
    393505    for ( tid = 0 ; tid < nthreads ; tid++ )
     
    587699#endif
    588700
     701    // ask confirm for exit
     702    if( INTERACTIVE_MODE )
     703    {
     704        char byte;
     705        printf("\n[convol] press any key to delete FBF windows and exit\n");
     706        getc( &byte );
     707    }
     708 
     709    // main thread delete FBF windows
     710    fbf_delete_window( in_wid );
     711    fbf_delete_window( out_wid );
     712
     713#if VERBOSE_MAIN
     714printf("\n[convol] main deleted FBF windows\n" );
     715#endif
     716
    589717    // main thread suicide
    590718    exit( 0 );
     
    597725
    598726
     727
     728
     729
     730
    599731//////////////////////////////////
    600732void * execute( void * arguments )
    601 
     733//////////////////////////////////
    602734{
    603735    unsigned long long date;
     
    628760    // thread [cid][lid] indexes, and the core coordinates [cxy][lpid]
    629761
    630     // get thread abstract identifiers 
     762    // get thread abstract identifiers [cid,lid] from tid
    631763    unsigned int tid = args->tid;
    632764    unsigned int cid = tid / ncores;   
     
    642774#endif
    643775
    644     // build total number of threads and clusters from global variables
     776    // compute nthreads and nclusters from global variables
    645777    unsigned int nclusters = x_size * y_size;
    646778    unsigned int nthreads  = nclusters * ncores;
     
    652784    unsigned int z;                 // vertical filter index
    653785
    654     unsigned int lines_per_thread   = NL / nthreads;
    655     unsigned int lines_per_cluster  = NL / nclusters;
    656     unsigned int pixels_per_thread  = NP / nthreads;
    657     unsigned int pixels_per_cluster = NP / nclusters;
    658 
    659     // compute number of pixels stored in one abstract cluster cid
    660     unsigned int local_pixels = NL * NP / nclusters;       
    661 
    662     unsigned int first, last;
     786    unsigned int lines_per_thread   = image_nl / nthreads;
     787    unsigned int lines_per_cluster  = image_nl / nclusters;
     788    unsigned int pixels_per_thread  = image_np / nthreads;
     789    unsigned int pixels_per_cluster = image_np / nclusters;
     790
     791    // compute number of pixels stored in one cluster
     792    unsigned int local_pixels = image_nl * image_np / nclusters;       
    663793
    664794    get_cycle( &date );
    665795    START[cid][lid] = (unsigned int)date;
    666796
    667     // Each thread[cid][0] allocates 5 local buffers,
     797    // Each thread[cid][0] allocates 5 buffers in local cluster cid
    668798    // and registers these 5 pointers in the global arrays
    669799    if ( lid == 0 )
    670800    {
    671         GA[cid] = malloc( local_pixels * sizeof( unsigned short ) );
     801        GA[cid] = malloc( local_pixels * sizeof( unsigned char ) );
    672802        GB[cid] = malloc( local_pixels * sizeof( int ) );
    673803        GC[cid] = malloc( local_pixels * sizeof( int ) );
     
    675805        GZ[cid] = malloc( local_pixels * sizeof( unsigned char ) );
    676806
    677         if( (GA[cid] == NULL) || (GB[cid] == NULL) || (GC[cid] == NULL) ||
    678             (GD[cid] == NULL) || (GZ[cid] == NULL) )
     807        if( (GA[cid] == NULL) ||
     808            (GB[cid] == NULL) ||
     809            (GC[cid] == NULL) ||
     810            (GD[cid] == NULL) ||
     811            (GZ[cid] == NULL) )
    679812        {
    680813            printf("\n[convol error] thread[%d] cannot allocate buf_in\n", tid );
     
    684817#if VERBOSE_EXEC
    685818get_cycle( &date );
    686 printf( "\n[convol] exec[%d] on core[%x,%d] allocated shared buffers / cycle %d\n"
     819printf("\n[convol] exec[%d] on core[%x,%d] allocated shared buffers / cycle %d\n"
    687820" GA %x / GB %x / GC %x / GD %x / GZ %x\n",
    688821tid, cxy , lpid, (unsigned int)date, GA[cid], GB[cid], GC[cid], GD[cid], GZ[cid] );
     
    694827    pthread_barrier_wait( &barrier );
    695828
    696     // Each thread[cid,lid] allocate and initialise in its private stack
     829    // Each thread[tid] allocates and initialises in its private stack
    697830    // a copy of the arrays of pointers on the distributed buffers.
    698     unsigned short * A[CLUSTERS_MAX];
     831    unsigned char * A[CLUSTERS_MAX];
    699832    int            * B[CLUSTERS_MAX];
    700833    int            * C[CLUSTERS_MAX];
     
    711844    }
    712845
    713     // Each thread[cid,0] access the file containing the input image, to load
    714     // the local A[cid] buffer. Other threads are waiting on the barrier.
    715     if ( lid==0 )
    716     {
    717         unsigned int size   = local_pixels * sizeof( unsigned short );
    718         unsigned int offset = size * cid;
    719 
    720         memcpy( A[cid],
    721                 image_in + offset,
    722                 size );
     846    unsigned int npixels  = image_np * lines_per_thread;     // pixels moved by any thread
     847    unsigned int g_offset = npixels * tid;             // offset in global buffer for tid
     848    unsigned int l_offset = npixels * lid;             // offset in local buffer for tid
     849
     850    // min and max line indexes handled by thread[tid] for a global buffer
     851    unsigned int global_lmin = tid * lines_per_thread;   
     852    unsigned int global_lmax = global_lmin + lines_per_thread; 
     853
     854    // min and max line indexes handled by thread[tid] for a local buffer
     855    unsigned int local_lmin  = lid * lines_per_thread;   
     856    unsigned int local_lmax  = local_lmin + lines_per_thread; 
     857
     858    // pmin and pmax pixel indexes handled by thread[tid] in a column
     859    unsigned int column_pmin = tid * pixels_per_thread; 
     860    unsigned int column_pmax = column_pmin + pixels_per_thread;
     861
     862    // Each thread[tid] copies npixels from image_in buffer to local A[cid] buffer
     863    memcpy( A[cid]   + l_offset,
     864            image_in + g_offset,
     865            npixels );
    723866 
    724867#if VERBOSE_EXEC
     
    728871#endif
    729872
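The new per-thread copy replaces the former one-copy-per-cluster scheme: every thread now moves its own band of lines, using a global offset derived from tid and a local offset derived from lid. A minimal sketch of that offset arithmetic, assuming one byte per pixel and an image height that is a multiple of the thread count (`copy_thread_slice()` is a hypothetical name, not a function of convol.c):

```c
#include <assert.h>
#include <string.h>

// Hypothetical helper: copy the band of lines owned by thread[tid]
// from the global image buffer into the buffer of its cluster.
static void copy_thread_slice( unsigned char       * local_buf,  // buffer of cluster cid
                               const unsigned char * image,      // global image buffer
                               unsigned int          tid,
                               unsigned int          ncores,
                               unsigned int          nthreads,
                               unsigned int          nl,         // number of lines
                               unsigned int          np )        // pixels per line
{
    assert( ncores > 0 );
    assert( nthreads > 0 );
    assert( tid < nthreads );

    unsigned int lid              = tid % ncores;
    unsigned int lines_per_thread = nl / nthreads;
    unsigned int npixels          = np * lines_per_thread;  // bytes moved by one thread
    unsigned int g_offset         = npixels * tid;          // offset in global buffer
    unsigned int l_offset         = npixels * lid;          // offset in cluster buffer

    memcpy( local_buf + l_offset , image + g_offset , npixels );
}
```

Because the bands of two different threads never overlap in either buffer, the copies can run concurrently without extra synchronisation, as long as a barrier separates them from the first read of the cluster buffer.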
    730     }
    731 
    732     // Optionnal parallel display of the initial image stored in A[c] buffers.
    733     // Eah thread[cid,lid] displays (NL/nthreads) lines.
    734 
     873    // Optional parallel display for the initial image
    735874    if ( INITIAL_DISPLAY_ENABLE )
    736875    {
    737         unsigned int line;
    738         unsigned int offset = lines_per_thread * lid;
    739 
    740         for ( l = 0 ; l < lines_per_thread ; l++ )
    741         {
    742             line = offset + l;
    743 
    744             // copy TA[cid] to TZ[cid]
    745             for ( p = 0 ; p < NP ; p++ )
    746             {
    747                 TZ(cid, line, p) = (unsigned char)(TA(cid, line, p) >> 8);
    748             }
    749 
    750             // display one line to frame buffer
    751             if (fbf_write( &TZ(cid, line, 0),                     // first pixel in TZ
    752                            NP,                                    // number of bytes
    753                            NP*(l + (tid * lines_per_thread))))    // offset in FBF
    754             {
    755                 printf("\n[convol error] in %s : thread[%d] cannot access FBF\n",
    756                 __FUNCTION__ , tid );
    757                 pthread_exit( &THREAD_EXIT_FAILURE );
    758             }
     876        // each thread[tid] copies npixels from A[cid] to in_win_buf buffer
     877        memcpy( in_win_buf + g_offset,
     878                A[cid]     + l_offset,
     879                npixels );
     880
     881        // refresh the FBF window
     882        if( fbf_refresh_window( in_wid , global_lmin , global_lmax ) )
     883        {
     884            printf("\n[convol error] in %s : thread[%d] cannot access FBF\n",
     885            __FUNCTION__ , tid );
     886            pthread_exit( &THREAD_EXIT_FAILURE );
    759887        }
    760888
     
    771899    ////////////////////////////////////////////////////////////
    772900    // parallel horizontal filter :
    773     // B <= convol(FH(A))
     901    // B <= Transpose(FH(A))
    774902    // D <= A - FH(A)
    775     // Each thread computes (NL/nthreads) lines.
     903    // Each thread computes (image_nl/nthreads) lines.
    776904    // The image must be extended :
    777905    // if (z<0)    TA(cid,l,z) == TA(cid,l,0)
    778     // if (z>NP-1) TA(cid,l,z) == TA(cid,l,NP-1)
     906    // if (z>image_np-1) TA(cid,l,z) == TA(cid,l,image_np-1)
    779907    ////////////////////////////////////////////////////////////
    780908
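The "image must be extended" rule in the comment above amounts to clamping out-of-range pixel indexes to the nearest border. The sketch below shows that rule with a running-sum update on a single line. It is an assumption-laden illustration, not the kernel of convol.c: it uses a plain box filter of width `2 * hrange + 1`, and `clamp_index()` and `box_filter_line()` are hypothetical names.

```c
#include <assert.h>

// Clamp z into [0, n-1]: pixels outside the line take the border value.
static int clamp_index( int z , int n )
{
    if( z < 0 )     return 0;
    if( z > n - 1 ) return n - 1;
    return z;
}

// Hypothetical box filter on one line of n pixels.
// The window for pixel p is [p - hrange, p + hrange]; moving from p-1 to p
// adds the entering pixel (p + hrange) and removes the leaving one (p - hrange - 1).
static void box_filter_line( const unsigned char * src,
                             int                 * dst,
                             int                   n,
                             int                   hrange )
{
    assert( n > 0 );
    assert( hrange >= 0 );

    int hnorm = 2 * hrange + 1;   // assumed normalisation factor
    int sum   = 0;
    int z;
    int p;

    // initial window, centered on pixel 0
    for( z = -hrange ; z <= hrange ; z++ ) sum += src[clamp_index( z , n )];
    dst[0] = sum / hnorm;

    // slide the window one pixel at a time
    for( p = 1 ; p < n ; p++ )
    {
        sum += (int)src[clamp_index( p + hrange , n )]
             - (int)src[clamp_index( p - hrange - 1 , n )];
        dst[p] = sum / hnorm;
    }
}
```

The three domains in the diff avoid the per-pixel clamping of this sketch: only the first and last `hrange` pixels need the border value, so the central loop can index the line directly.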
     
    782910    H_BEG[cid][lid] = (unsigned int)date;
    783911
    784     // l = absolute line index / p = absolute pixel index 
    785     // first & last define which lines are handled by a given thread
    786 
    787     first = tid * lines_per_thread;
    788     last  = first + lines_per_thread;
    789 
    790     for (l = first; l < last; l++)
     912    // l = global line index / p = absolute pixel index 
     913
     914    for (l = global_lmin; l < global_lmax; l++)
    791915    {
    792916        // src_c and src_l are the cluster index and the line index for A & D
     
    814938            TD(src_c, src_l, p) = (int) TA(src_c, src_l, p) - sum_p / hnorm;
    815939        }
    816         // second domain : from (hrange+1) to (NP-hrange-1)
    817         for (p = hrange + 1; p < NP - hrange; p++)
     940        // second domain : from (hrange+1) to (image_np-hrange-1)
     941        for (p = hrange + 1; p < image_np - hrange; p++)
    818942        {
    819943            // dst_c and dst_p are the cluster index and the pixel index for B
     
    825949            TD(src_c, src_l, p) = (int) TA(src_c, src_l, p) - sum_p / hnorm;
    826950        }
    827         // third domain : from (NP-hrange) to (NP-1)
    828         for (p = NP - hrange; p < NP; p++)
     951        // third domain : from (image_np-hrange) to (image_np-1)
     952        for (p = image_np - hrange; p < image_np; p++)
    829953        {
    830954            // dst_c and dst_p are the cluster index and the pixel index for B
    831955            int dst_c = p / pixels_per_cluster;
    832956            int dst_p = p % pixels_per_cluster;
    833             sum_p = sum_p + (int) TA(src_c, src_l, NP - 1)
     957            sum_p = sum_p + (int) TA(src_c, src_l, image_np - 1)
    834958                          - (int) TA(src_c, src_l, p - hrange - 1);
    835959            TB(dst_c, dst_p, l) = sum_p / hnorm;
     
    858982    ///////////////////////////////////////////////////////////////
    859983    // parallel vertical filter :
    860     // C <= transpose(FV(B))
    861     // Each thread computes (NP/nthreads) columns
     984    // C <= Transpose(FV(B))
     985    // Each thread computes (image_np/nthreads) columns
    862986    // The image must be extended :
    863987    // if (l<0)    TB(cid,p,l) == TB(cid,p,0)
    864     // if (l>NL-1)   TB(cid,p,l) == TB(cid,p,NL-1)
     988    // if (l>image_nl-1)   TB(cid,p,l) == TB(cid,p,image_nl-1)
    865989    ///////////////////////////////////////////////////////////////
    866990
     
    868992    V_BEG[cid][lid] = (unsigned int)date;
    869993
    870     // l = absolute line index / p = absolute pixel index
    871     // first & last define which pixels are handled by a given thread
    872 
    873     first = tid * pixels_per_thread;
    874     last  = first + pixels_per_thread;
    875 
    876     for (p = first; p < last; p++)
     994    // l = global line index / p = pixel index in column
     995
     996    for (p = column_pmin; p < column_pmax ; p++)
    877997    {
    878998        // src_c and src_p are the cluster index and the pixel index for B
     
    8831003
    8841004        // We use the specific values of the vertical ep-filter
    885         // To minimize the number of tests, the NL lines are split in three domains
     1005        // To minimize the number of tests, the image_nl lines are split in three domains
    8861006
    8871007        // first domain : explicit computation for the first 18 values
     
    8991019        }
    9001020        // second domain
    901         for (l = 18; l < NL - 17; l++)
     1021        for (l = 18; l < image_nl - 17; l++)
    9021022        {
    9031023            // dst_c and dst_l are the cluster index and the line index for C
     
    9191039        }
    9201040        // third domain
    921         for (l = NL - 17; l < NL; l++)
     1041        for (l = image_nl - 17; l < image_nl; l++)
    9221042        {
    9231043            // dst_c and dst_l are the cluster index and the line index for C
     
    9251045            int dst_l = l % lines_per_cluster;
    9261046
    927             sum_l = sum_l + TB(src_c, src_p, min(l + 4, NL - 1))
    928                   + TB(src_c, src_p, min(l + 8, NL - 1))
    929                   + TB(src_c, src_p, min(l + 11, NL - 1))
    930                   + TB(src_c, src_p, min(l + 15, NL - 1))
    931                   + TB(src_c, src_p, min(l + 17, NL - 1))
     1047            sum_l = sum_l + TB(src_c, src_p, min(l + 4, image_nl - 1))
     1048                  + TB(src_c, src_p, min(l + 8, image_nl - 1))
     1049                  + TB(src_c, src_p, min(l + 11, image_nl - 1))
     1050                  + TB(src_c, src_p, min(l + 15, image_nl - 1))
     1051                  + TB(src_c, src_p, min(l + 17, image_nl - 1))
    9321052                  - TB(src_c, src_p, l - 5)
    9331053                  - TB(src_c, src_p, l - 9)
     
    9581078    pthread_barrier_wait( &barrier );
    9591079
    960     // Optional parallel display of the final image Z <= D + C
    961     // Each thread[x,y,p] displays (NL/nthreads) lines.
    962 
     1080    ///////////////////////////////////////////////////////////////
     1081    // build final image in local Z buffer from C & D local buffers
     1082    // store it in output image file, and display it on FBF.
     1083    // Z <= C + D
     1084    ///////////////////////////////////////////////////////////////
     1085
     1086    get_cycle( &date );
     1087    F_BEG[cid][lid] = (unsigned int)date;
     1088
     1089    // Each thread[tid] sets local buffer Z[cid] from local buffers C[cid] & D[cid]
     1090
     1091    for( l = local_lmin ; l < local_lmax ; l++ )
     1092    {
     1093        for( p = 0 ; p < image_np ; p++ )
     1094        {
     1095            TZ(cid,l,p) = TC(cid,l,p) + TD(cid,l,p);
     1096        }
     1097    }
     1098
     1099    // Each thread[tid] copies npixels from Z[cid] buffer to image_out buffer
     1100    memcpy( image_out + g_offset,
     1101            Z[cid]    + l_offset,
     1102            npixels );
     1103
     1104    // Optional parallel display of the final image
    9631105    if ( FINAL_DISPLAY_ENABLE )
    9641106    {
    965         get_cycle( &date );
    966         D_BEG[cid][lid] = (unsigned int)date;
    967 
    968         unsigned int line;
    969         unsigned int offset = lines_per_thread * lid;
    970 
    971         for ( l = 0 ; l < lines_per_thread ; l++ )
    972         {
    973             line = offset + l;
    974 
    975             for ( p = 0 ; p < NP ; p++ )
    976             {
    977                 TZ(cid, line, p) =
    978                    (unsigned char)( (TD(cid, line, p) +
    979                                      TC(cid, line, p) ) >> 8 );
    980             }
    981 
    982             if (fbf_write( &TZ(cid, line, 0),                   // first pixel in TZ
    983                            NP,                                  // number of bytes
    984                            NP*(l + (tid * lines_per_thread))))  // offset in FBF
    985             {
    986                 printf("\n[convol error] thread[%d] cannot access FBF\n", tid );
    987                 pthread_exit( &THREAD_EXIT_FAILURE );
    988             }
    989         }
    990 
    991         get_cycle( &date );
    992         D_END[cid][lid] = (unsigned int)date;
    993 
    994 #if VERBOSE_EXEC
     1107        // each thread[tid] copies npixels from Z[cid] to out_win_buf buffer
     1108        memcpy( out_win_buf + g_offset,
     1109                Z[cid]      + l_offset,
     1110                npixels );
     1111
     1112        // refresh the FBF window
     1113        if( fbf_refresh_window( out_wid , global_lmin , global_lmax ) )
     1114        {
     1115            printf("\n[convol error] in %s : thread[%d] cannot access FBF\n",
     1116            __FUNCTION__ , tid );
     1117            pthread_exit( &THREAD_EXIT_FAILURE );
     1118        }
     1119
     1120#if VERBOSE_EXEC
    9951121get_cycle( &date );
    9961122printf( "\n[convol] exec[%d] on core[%x,%d] completed final display / cycle %d\n",
    997 tid , cxy , lid , (unsigned int)date );
     1123tid , cxy , lpid , (unsigned int)date );
    9981124#endif
    9991125
     
    10101136    }
    10111137
     1138    get_cycle( &date );
     1139    F_END[cid][lid] = (unsigned int)date;
     1140
    10121141    // thread termination depends on the placement policy
    10131142    if( PARALLEL_PLACEMENT )   
     
    10311160
    10321161} // end execute()
     1162
     1163
     1164
    10331165
    10341166
     
    10571189    unsigned int max_v_end = 0;
    10581190
    1059     unsigned int min_d_beg = 0xFFFFFFFF;
    1060     unsigned int max_d_beg = 0;
    1061 
    1062     unsigned int min_d_end = 0xFFFFFFFF;
    1063     unsigned int max_d_end = 0;
     1191    unsigned int min_f_beg = 0xFFFFFFFF;
     1192    unsigned int max_f_beg = 0;
     1193
     1194    unsigned int min_f_end = 0xFFFFFFFF;
     1195    unsigned int max_f_end = 0;
    10641196
    10651197    for (cc = 0; cc < nclusters; cc++)
     
    10821214            if (V_END[cc][pp] > max_v_end) max_v_end = V_END[cc][pp];
    10831215
    1084             if (D_BEG[cc][pp] < min_d_beg) min_d_beg = D_BEG[cc][pp];
    1085             if (D_BEG[cc][pp] > max_d_beg) max_d_beg = D_BEG[cc][pp];
    1086 
    1087             if (D_END[cc][pp] < min_d_end) min_d_end = D_END[cc][pp];
    1088             if (D_END[cc][pp] > max_d_end) max_d_end = D_END[cc][pp];
     1216            if (F_BEG[cc][pp] < min_f_beg) min_f_beg = F_BEG[cc][pp];
     1217            if (F_BEG[cc][pp] > max_f_beg) max_f_beg = F_BEG[cc][pp];
     1218
     1219            if (F_END[cc][pp] < min_f_end) min_f_end = F_END[cc][pp];
     1220            if (F_END[cc][pp] > max_f_end) max_f_end = F_END[cc][pp];
    10891221        }
    10901222    }
     
    11091241
    11101242    printf(" - D_BEG : min = %d / max = %d / med = %d / delta = %d\n",
    1111            min_d_beg, max_d_beg, (min_d_beg+max_d_beg)/2, max_d_beg-min_d_beg);
     1243           min_f_beg, max_f_beg, (min_f_beg+max_f_beg)/2, max_f_beg-min_f_beg);
    11121244
    11131245    printf(" - D_END : min = %d / max = %d / med = %d / delta = %d\n",
    1114            min_d_end, max_d_end, (min_d_end+max_d_end)/2, max_d_end-min_d_end);
     1246           min_f_end, max_f_end, (min_f_end+max_f_end)/2, max_f_end-min_f_end);
    11151247
    11161248    printf( "\n General Scenario   (Kcycles)\n" );
     
    11191251    printf( " - BARRIER HORI/VERT = %d\n", (min_v_beg - max_h_end)/1000 );
    11201252    printf( " - V_FILTER          = %d\n", (max_v_end - min_v_beg)/1000 );
    1121     printf( " - BARRIER VERT/DISP = %d\n", (min_d_beg - max_v_end)/1000 );
    1122     printf( " - DISPLAY           = %d\n", (max_d_end - min_d_beg)/1000 );
     1253    printf( " - BARRIER VERT/DISP = %d\n", (min_f_beg - max_v_end)/1000 );
     1254    printf( " - DISPLAY           = %d\n", (max_f_end - min_f_beg)/1000 );
    11231255    printf( " \nSEQUENCIAL = %d / PARALLEL = %d\n",
    11241256            SEQUENCIAL_TIME/1000, PARALLEL_TIME/1000 );
     
    11431275
    11441276    fprintf( f , " - D_BEG : min = %d / max = %d / med = %d / delta = %d\n",
    1145            min_d_beg, max_d_beg, (min_d_beg+max_d_beg)/2, max_d_beg-min_d_beg);
     1277           min_f_beg, max_f_beg, (min_f_beg+max_f_beg)/2, max_f_beg-min_f_beg);
    11461278
    11471279    fprintf( f , " - D_END : min = %d / max = %d / med = %d / delta = %d\n",
    1148            min_d_end, max_d_end, (min_d_end+max_d_end)/2, max_d_end-min_d_end);
     1280           min_f_end, max_f_end, (min_f_end+max_f_end)/2, max_f_end-min_f_end);
    11491281
    11501282    fprintf( f ,  "\n General Scenario (Kcycles)\n" );
     
    11531285    fprintf( f ,  " - BARRIER HORI/VERT = %d\n", (min_v_beg - max_h_end)/1000 );
    11541286    fprintf( f ,  " - V_FILTER          = %d\n", (max_v_end - min_v_beg)/1000 );
    1155     fprintf( f ,  " - BARRIER VERT/DISP = %d\n", (min_d_beg - max_v_end)/1000 );
    1156     fprintf( f ,  " - DISPLAY           = %d\n", (max_d_end - min_d_beg)/1000 );
     1287    fprintf( f ,  " - BARRIER VERT/DISP = %d\n", (min_f_beg - max_v_end)/1000 );
     1288    fprintf( f ,  " - SAVE              = %d\n", (max_f_end - min_f_beg)/1000 );
    11571289    fprintf( f ,  " \nSEQUENCIAL = %d / PARALLEL = %d\n",
    11581290    SEQUENCIAL_TIME/1000, PARALLEL_TIME/1000 );
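
Reviewer sketch: the instrument() hunks above repeat the same min / max / med / delta reduction for each timestamp array (V_END, F_BEG, F_END, ...). The reduction can be sketched as one helper. The names stamp_stats_t and stamp_stats() are hypothetical and do not appear in convol.c; the timestamps are assumed to be stored as a flat array of nclusters * ncores unsigned values with at least one entry.

```c
#include <limits.h>

/* Hypothetical summary record, not part of convol.c. */
typedef struct
{
    unsigned int min;    /* earliest timestamp over all cores   */
    unsigned int max;    /* latest timestamp over all cores     */
    unsigned int med;    /* midpoint between min and max        */
    unsigned int delta;  /* spread between slowest and fastest  */
} stamp_stats_t;

/* Reduce a flat [nclusters][ncores] timestamp array to its summary.
 * Assumes nclusters * ncores >= 1. */
static stamp_stats_t stamp_stats( const unsigned int * stamps,
                                  int                  nclusters,
                                  int                  ncores )
{
    stamp_stats_t s = { UINT_MAX , 0 , 0 , 0 };

    for( int cc = 0 ; cc < nclusters ; cc++ )
    {
        for( int pp = 0 ; pp < ncores ; pp++ )
        {
            unsigned int v = stamps[cc * ncores + pp];
            if( v < s.min ) s.min = v;
            if( v > s.max ) s.max = v;
        }
    }

    /* Same value as (min + max) / 2, written so that the sum cannot wrap. */
    s.med   = s.min + (s.max - s.min) / 2;
    s.delta = s.max - s.min;
    return s;
}
```

With such a helper, each pair of printf / fprintf lines would read its four values from one record, which might make it harder for a label and the variables it prints to drift apart.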
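
Reviewer sketch: the vertical filter comments state that the image is extended at its borders (a line index below 0 reads line 0, an index above image_nl-1 reads line image_nl-1), and the third domain applies this with min(l + k, image_nl - 1). A minimal illustration of that replicate-edge rule follows. It assumes a single column stored as a flat int array rather than the distributed TB buffers, and clamp_line(), column_at() and window_sum() are hypothetical names, not functions from convol.c.

```c
/* Clamp a line index into [0, nl-1], i.e. replicate the border lines. */
static int clamp_line( int l , int nl )
{
    if( l < 0 )      return 0;
    if( l > nl - 1 ) return nl - 1;
    return l;
}

/* Read one pixel of a column with border replication. */
static int column_at( const int * column , int nl , int l )
{
    return column[ clamp_line( l , nl ) ];
}

/* Plain box sum over lines [l - r , l + r] of the extended column.
 * This is only a reference form: the changeset itself updates a running
 * sum with the specific taps of its vertical filter. */
static int window_sum( const int * column , int nl , int l , int r )
{
    int sum = 0;
    for( int k = l - r ; k <= l + r ; k++ )
    {
        sum += column_at( column , nl , k );
    }
    return sum;
}
```

Splitting the lines into three domains, as the changeset does, keeps the clamping tests out of the middle domain, where every tap is known to fall inside the image.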