
Version 14 (modified by alain, 16 years ago)

--

DSX tool specification

A) Goals and general principles

DSX stands for Design Space eXplorer. It helps the system designer map a multi-threaded software application onto a multi-processor hardware architecture (MP-SoC) modeled with the SoCLib components.

It supports the hardware/software codesign approach, allowing the designer to define successively:

  • the software application structure: number of tasks and communication channels
  • the hardware architecture: number of processors, number of memory banks, etc.
  • the mapping of the software application on the hardware architecture

A specific goal of DSX is to allow the system designer to control not only the placement of the tasks on the processors, but also the placement of the software objects (execution stacks, communication buffers, synchronization locks, etc.) on the memory banks. In shared memory multi-processor architectures with several physically distributed memory banks, such control is mandatory to optimize both performance and power consumption.

The two targeted application domains are telecommunication applications (where the tasks handle packets or packet descriptors) and multi-media applications (where the tasks handle audio or video streams).

The general principles of the DSX tool are the following:

  • The coarse grain parallelism of the software application must be statically defined as a Task & Communication Graph (TCG). The number of tasks and the communication channels between tasks must not change during execution.
  • The software tasks are expected to be written in C or C++ but, for portability reasons, the tasks must use an abstract System Resource Layer (SRL) API to access the communication and synchronization resources.
  • Each task in the TCG can be implemented as a software task (software running on an embedded processor), or as a hardware task (running as a dedicated hardware coprocessor).
  • DSX allows the programmer to use unprotected shared memory spaces, but the preferred inter-task communication mechanism uses the MWMR middleware. MWMR (Multi-Writer, Multi-Reader) communication channels are implemented as software FIFOs and can be shared by software tasks and hardware tasks.
  • DSX provides classical synchronization mechanisms such as barriers and locks, but inter-task synchronization is mainly done through the data availability in the MWMR channels.
  • The target hardware architecture is a shared memory multi-processor system on chip (MP-SoC) using the SoCLib library of IP cores. However, in order to validate the multi-threaded software application, DSX can also generate executable binary code for a standard POSIX workstation.
  • DSX supports the POSIX compliant Mutek OS kernel for embedded MP-SoCs.
  • Finally, DSX defines the DSX/L language, based on Python, which allows the system designer to describe in a single file the Task & Communication Graph (TCG), the MP-SoC hardware architecture, and various mappings of the TCG on the MP-SoC architecture.

Executing the DSX/L script generates the binary code executable on the workstation, the SystemC model of the top cell corresponding to the MP-SoC architecture, and the binary code that will be loaded into the MP-SoC embedded memory.

B) System Resources Layer

We want to map the multi-threaded software application on several hardware platforms, without any modification of the task code. One platform is a POSIX compliant workstation, as we want to validate the multi-threaded software application on a workstation before starting the mapping on the MP-SoC architecture.

DSX defines a System Resource Layer (SRL) API, which is an abstraction of the synchronization and communication services provided by the various target platforms. The SRL API helps the C programmer to separate the embedded application code from the system code used for inter-task communication and synchronization. It provides the following services:

  • blocking Read & Write access to a MWMR channel
    • srl_mwmr_read( )
    • srl_mwmr_write( )
  • non blocking Read & Write access to a MWMR channel
    • srl_mwmr_try_read( )
    • srl_mwmr_try_write( )
  • flush a MWMR channel
    • srl_mwmr_flush( )
  • Synchronization barrier
    • srl_barrier_wait( )
  • taking and releasing a lock
    • srl_lock_lock( )
    • srl_lock_unlock( )
  • accessing a shared memory space (address and size)
    • srl_memspace_addr( )
    • srl_memspace_size( )

Three platforms are presently supported:

  • any Linux (or Unix) workstation supporting POSIX threads,
  • MP-SoC architecture using the MUTEK/D operating system,
  • MP-SoC architecture using the MUTEK/S operating system.

MUTEK/D is an embedded, POSIX compliant, distributed operating system for MP-SoCs, while MUTEK/S is an optimized version: performance is improved and the memory footprint is reduced, at the cost of losing POSIX compatibility.

C) Software application definition

This section describes the DSX/L syntax used to define the Task & Communication Graph structure. The TCG is a bipartite graph: the two types of nodes are the tasks and the communication channels.

As an example, the following figure describes the TCG corresponding to an MJPEG decoder application.

The two tasks TG and RAMDAC will be implemented as hardware coprocessors: the TG component implements a wireless receiver for the MJPEG stream, and the RAMDAC component is a graphic display controller. The 5 other tasks can be implemented as software tasks or as hardware tasks. In this particular example, all MWMR communication channels have a single producer and a single consumer, which is frequent for stream oriented multi-media applications.
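
The bipartite property can be illustrated with a few lines of plain Python. The sketch below is not DSX/L: the task and channel names are hypothetical, and the graph is a three-task pipeline rather than the MJPEG decoder of the figure. It checks that every edge links a task to a channel, and lists the producers and consumers of a channel.

```python
# Plain-Python sketch (not DSX/L): a TCG seen as a bipartite graph.
# Task and channel names are hypothetical.
tasks = {"producer", "filter", "consumer"}
channels = {"fifo0", "fifo1"}

# Directed edges: task -> channel (write) or channel -> task (read).
edges = [
    ("producer", "fifo0"),
    ("fifo0", "filter"),
    ("filter", "fifo1"),
    ("fifo1", "consumer"),
]


def is_bipartite_tcg(tasks, channels, edges):
    """True if every edge links a task and a channel, never two nodes of the same kind."""
    for src, dst in edges:
        task_to_channel = src in tasks and dst in channels
        channel_to_task = src in channels and dst in tasks
        if not (task_to_channel or channel_to_task):
            return False
    return True


def producers_and_consumers(channel, edges):
    """Return (writers, readers) of one channel."""
    writers = [src for src, dst in edges if dst == channel]
    readers = [dst for src, dst in edges if src == channel]
    return writers, readers
```

A direct task-to-task edge, such as ("producer", "consumer"), makes is_bipartite_tcg() return False: in a TCG, tasks only communicate through channels.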

C1) Task Model definition

As a software application can instantiate several instances of the same task, we must distinguish the task from the task model. A task model defines the code associated with the task, and the task interface (corresponding to the system resources used by the task: MWMR communication channels, synchronization barriers, locks, and memspaces).

task_model = TaskModel( 'model_name',
                    infifos = [ 'inport_name', ... ] ,
                    outfifos = [ 'outport_name', ... ] ,
                    locks = [ 'lock_name', ... ] ,
                    barriers = [ 'barrier_name', ... ] ,
                    memspaces = [ 'memspace_name', ... ] ,
                    signals = [ 'signal_name', ... ] ,
                    impls = [ SwTask( 'func', stack_size = 1024 , sources = [ 'func.c' ] ) ] )

If a task does not use a given type of resource, the corresponding parameter can be skipped.
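
One way to picture the optional parameters is a constructor where every resource list defaults to empty. The class below is a stand-in written for illustration only (TaskModelSketch, its attributes and port_kind() are hypothetical names); the real TaskModel class is provided by DSX and may behave differently.

```python
# Stand-in for illustration only; the real TaskModel class is provided by DSX.
class TaskModelSketch:
    RESOURCE_KINDS = ("infifos", "outfifos", "locks",
                      "barriers", "memspaces", "signals")

    def __init__(self, name, impls=(), **ports):
        unknown = set(ports) - set(self.RESOURCE_KINDS)
        if unknown:
            raise TypeError("unknown resource kind(s): %s" % sorted(unknown))
        self.name = name
        self.impls = list(impls)
        # A skipped parameter simply yields an empty list of ports.
        self.ports = {kind: list(ports.get(kind, []))
                      for kind in self.RESOURCE_KINDS}

    def port_kind(self, port_name):
        """Return the resource kind a port belongs to, or None if undeclared."""
        for kind, names in self.ports.items():
            if port_name in names:
                return kind
        return None


# A model using only FIFOs: locks, barriers, memspaces and signals are skipped.
model = TaskModelSketch("idct", infifos=["input"], outfifos=["output"])
```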

C2) MWMR communication channel definition

A MWMR communication channel is a memory buffer handled as a software FIFO that can have several producers and several consumers. Each channel is protected by an implicit lock for exclusive access. Any MWMR transaction can be decomposed into five memory accesses:

  1. get the lock protecting the MWMR (READ access).
  2. test the status of the MWMR (READ access).
  3. transfer a burst of data between a local buffer and the MWMR (READ/WRITE access).
  4. update the status of the MWMR (WRITE access).
  5. release the lock (WRITE access).

Any data transfer to or from a MWMR channel must be an integer number of items. The item width is an intrinsic property of the channel. It is defined as a number of bytes, and it defines the channel width. The channel depth is a number of items, and defines the total channel capacity. For performance reasons, the channel width must be a multiple of 4 bytes.

my_channel = Mwmr( 'channel_name', width, depth )
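
The width and depth rules, and the five accesses listed above, can be modeled in plain Python. This is a behavioral sketch, not the MWMR middleware: the class name MwmrSketch and its methods are hypothetical, and the convention of returning a partial transfer when the channel is full or empty is an assumption of the sketch, not the documented behaviour of srl_mwmr_try_read() or srl_mwmr_try_write().

```python
import threading


class MwmrSketch:
    """Software FIFO of `depth` items, each `width` bytes wide (illustrative model)."""

    def __init__(self, name, width, depth):
        if width % 4 != 0:
            raise ValueError("channel width must be a multiple of 4 bytes")
        self.name, self.width, self.depth = name, width, depth
        self.lock = threading.Lock()   # implicit lock protecting the channel
        self.buffer = bytearray()      # stored items, oldest first

    def _check_whole_items(self, nbytes):
        if nbytes % self.width != 0:
            raise ValueError("transfer must be a whole number of items")

    def try_write(self, data):
        """Store as many whole items as fit; return the number of bytes accepted."""
        self._check_whole_items(len(data))
        with self.lock:                                        # 1. get the lock
            free = self.depth * self.width - len(self.buffer)  # 2. test the status
            accepted = min(len(data), free)
            self.buffer += data[:accepted]                     # 3. transfer, 4. update status
        return accepted                                        # 5. lock released on exit

    def try_read(self, nbytes):
        """Remove and return up to `nbytes` bytes (whole items only)."""
        self._check_whole_items(nbytes)
        with self.lock:
            taken = min(nbytes, len(self.buffer))
            data = bytes(self.buffer[:taken])
            del self.buffer[:taken]
        return data
```

With width 4 and depth 2 the capacity is 8 bytes, so writing three items stores only the first two.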

In the mapping section of the DSX/L program, the following 4 software objects must be placed:

  1. desc : read-only information regarding the communication channel
  2. status : channel state (number of stored items, read & write pointers)
  3. buffer : channel buffer containing the data
  4. lock : lock protecting exclusive access

C3) Synchronization barrier definition

Synchronization barriers can be used when the synchronization through data availability in the MWMR communication channels is not enough. The set of tasks linked to a given barrier is defined when the tasks are instantiated. Exclusive access to the barrier is protected by an implicit lock.

my_barrier = Barrier( 'barrier_name' )

In the mapping section of the DSX/L program, the following 3 software objects must be placed:

  1. desc : read-only information regarding the synchronization barrier
  2. status : barrier state
  3. lock : lock protecting exclusive access

C4) Memspace definition

Direct communication through shared memory buffers is supported by DSX, but there is no protection mechanism, and synchronization is the programmer's responsibility. A shared memory space is defined by two parameters: its name, and its size (the number of bytes to be reserved).

my_buffer = Memspace( 'buffer_name', size )

In the mapping section of the DSX/L program, the following 2 software objects must be placed:

  1. desc : read-only information regarding the memspace
  2. mem : the shared memory buffer

C5) Lock definition

A lock is a variable that can be used to protect exclusive access to a shared resource such as a shared memory space. It is implemented as a spinlock: the srl_lock_lock() function returns only when the lock has been obtained.

my_lock = Lock( 'lock_name' )
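
The spinlock behaviour can be sketched in plain Python: lock() keeps retrying a non-blocking test until it succeeds, and only then returns. This is an illustration of the principle, not the SRL implementation; SpinLockSketch, worker and the iteration counts are hypothetical, and a threading.Lock stands in for the lock word in memory.

```python
import threading


class SpinLockSketch:
    def __init__(self):
        self._flag = threading.Lock()   # stands in for the lock word in memory

    def lock(self):
        # Busy-wait: retry the non-blocking test until the lock is obtained.
        while not self._flag.acquire(blocking=False):
            pass

    def unlock(self):
        self._flag.release()


counter = 0
guard = SpinLockSketch()


def worker(iterations):
    """Increment the shared counter under the protection of the spinlock."""
    global counter
    for _ in range(iterations):
        guard.lock()
        counter += 1
        guard.unlock()


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because a waiting task keeps its processor busy, a spinlock is best suited to short critical sections.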

In the mapping section of the DSX/L program, the lock can be explicitly placed in the memory space.

C6) Task instantiation

A task is an instance of a task model. The constructor arguments are the task name task_name, the task model task_model (created by the TaskModel() constructor), and a list of resources (MWMR channels, synchronization barriers, locks or memspaces) that must be associated with the task ports. DSX performs type checking between the port name and the associated resource.

my_task = Task( 'task_name',
                         task_model ,
                         portmap = { 'port_name' : my_channel, 'barrier_name' : my_barrier, ... } )
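
The kind of check performed on a portmap can be sketched as follows. This is a guess at the principle, not the DSX implementation: FakeMwmr, FakeBarrier, EXPECTED and check_portmap() are hypothetical names, and only FIFO and barrier ports are covered.

```python
# Illustrative stand-ins for DSX resources; names are hypothetical.
class FakeMwmr:
    def __init__(self, name):
        self.name = name


class FakeBarrier:
    def __init__(self, name):
        self.name = name


# Resource class expected for each kind of port declared in the task model.
EXPECTED = {"infifos": FakeMwmr, "outfifos": FakeMwmr, "barriers": FakeBarrier}


def check_portmap(model_ports, portmap):
    """model_ports: {kind: [port names]}; portmap: {port name: resource}.

    Returns a list of error messages, empty when the portmap is consistent.
    """
    declared = {port: kind
                for kind, ports in model_ports.items()
                for port in ports}
    errors = []
    for port, kind in declared.items():
        if port not in portmap:
            errors.append("port '%s' is not connected" % port)
        elif not isinstance(portmap[port], EXPECTED[kind]):
            errors.append("port '%s' expects a %s" % (port, EXPECTED[kind].__name__))
    for port in portmap:
        if port not in declared:
            errors.append("'%s' is not a port of the task model" % port)
    return errors
```

For a model declaring an input FIFO 'input' and a barrier 'sync', associating a barrier with 'input' is reported as an error, as is an unconnected or unknown port.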

In the mapping section of the DSX/L program, 4 software objects must be placed :

  1. desc : read-only information associated with the task
  2. status : state of the task
  3. stack : execution stack
  4. run : processor running the task

C8) TCG definition

The Task and Communication Graph must be defined:

my_tcg = Tcg(
             Task(  'task_name1',
                     Task_Model1,
                     portmap = { 'in':input, 'out':output } ),
             Task(  'task2',
                     Task_Model2,
                     portmap = { 'in':input2, 'out':output2 } ),
              ... )

D) Hardware architecture definition

This section describes the DSX/L syntax used to define the MP-SoC hardware architecture, using the hardware components defined in the SoCLib library.

D1) SoCLib components

In the present version of DSX, each hardware component must be described by a Python class that defines the component interface and the component parameters. The list of available components can be found in SoclibComponents. For all components, the instance name is mandatory, but all other parameters have default values and can be skipped:

# creation of a MIPS R3000 processor core
my_proc = Mips( 'proc' )

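# creation of a cache controller
# (the original listing repeated the dcache_* arguments;
#  the second pair is assumed to describe the instruction cache)
my_cache = Xcache( 'cache',
                        dcache_lines = 32,
                        dcache_words = 8,
                        icache_lines = 32,
                        icache_words = 8)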

D2) Connecting the components

Hardware components have input/output ports, and are connected through signals, but those signals are implicit in the DSX/L description. To connect the port a of component c1 to the port b of component c2, DSX/L defines the // operator:

c1.a // c2.b

Depending on the component type, the port designation can vary:

  • When the number of ports is fixed, the ports are attributes: my_proc0.cache designates the cache port of the MIPS processor.
  • When the number of ports is not fixed (typically for interconnect components), the ports are accessed through a dedicated method: the getTarget() method of the LocalCrossbar component returns a VCI target port.
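
Since DSX/L is based on Python, such an operator can be obtained by overloading __floordiv__. The sketch below shows the mechanism with stand-in classes (Port, FixedPortsComponent and InterconnectSketch are hypothetical names, not DSX classes). That each call to getTarget() creates a new port is an assumption of the sketch.

```python
class Port:
    def __init__(self, owner, name):
        self.owner, self.name, self.peer = owner, name, None

    def __floordiv__(self, other):
        """`a // b` connects two ports; the signal between them stays implicit."""
        if self.peer is not None or other.peer is not None:
            raise ValueError("port already connected")
        self.peer, other.peer = other, self
        return self

    def __repr__(self):
        return "%s.%s" % (self.owner, self.name)


class FixedPortsComponent:
    """Fixed number of ports: each port is an attribute of the component."""

    def __init__(self, name, *port_names):
        self.name = name
        for port_name in port_names:
            setattr(self, port_name, Port(name, port_name))


class InterconnectSketch:
    """Variable number of ports: getTarget() hands out a new port on each call."""

    def __init__(self, name):
        self.name, self.targets = name, []

    def getTarget(self):
        port = Port(self.name, "target%d" % len(self.targets))
        self.targets.append(port)
        return port


proc = FixedPortsComponent("proc0", "cache")
cache = FixedPortsComponent("cache0", "cache", "vci")
xbar = InterconnectSketch("crossbar")

proc.cache // cache.cache       # attribute ports on both sides
xbar.getTarget() // cache.vci   # method-provided port on the interconnect side
```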

The following example describes a simple system with two processors and one embedded memory:

# components instantiation
my_proc0 = Mips( 'proc0' )
my_cache0 = Xcache( 'cache0' )
my_proc1 = Mips( 'proc1' )
my_cache1 = Xcache( 'cache1' )
my_ram = MultiRam( 'ram' )
my_crossbar = LocalCrossbar( 'crossbar' )
                     
# components connection
my_proc0.cache // my_cache0.cache
my_proc1.cache // my_cache1.cache
my_crossbar.getTarget() // my_cache0.vci
my_crossbar.getTarget() // my_cache1.vci
# the embedded memory is connected on the other side of the crossbar
my_crossbar.getInitiator() // my_ram.vci

D3) Address space segmentation

All hardware components defining the hardware architecture should be grouped in a single object. As any MP-SoC architecture built with the SoCLib library uses a VCI interconnect hardware component, this component must be declared as the root of the architecture:

my_architecture = Hardware( vgmn )

In any shared memory architecture, the address space is a shared resource. This resource is structured in several segments. A segment has a name, a base address, a size (number of bytes), and a cacheability attribute (Boolean). A segment is a physical entity associated with a given VCI target. Several segments can be associated with the same VCI target, but a given segment cannot be distributed over several VCI targets.

The DSX/L language allows the system designer to define the various segments used by a given application, and to link those segments to the hardware components. The base address and the segment size are optional parameters:

# segments definition
seg_data1 = Segment( 'seg1', Cached )
seg_data2 = Segment( 'seg2', Uncached )
seg_reset = Segment( 'reset', Cached, addr = 0xBFC00000 )

# Instantiating a VCI target hardware component
# and linking the segments to this component
my_ram = MultiRam ( 'ram', seg_data1, seg_data2, seg_reset )

As a segment is defined by the MSB bits of the VCI address, the hardware interconnect must decode those MSB bits to select the proper VCI target. The corresponding decoder is generally implemented as a ROM. Those hardware decoders are automatically constructed using the Mapping Table. The mapping table is an associative table. Each entry corresponds to a physical segment. This object must be constructed, and initialised:

# mapping table construction
my_mt = MappingTable()
# mapping table initialisation
my_architecture.setConfig( 'maptab', my_mt )
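
The MSB decoding performed by the interconnect can be sketched as a lookup table indexed by the upper address bits. This is a plain-Python illustration, not the SoCLib mapping table: build_rom() and decode() are hypothetical helpers, the choice of 8 decoded bits is an assumption, and apart from the reset address 0xBFC00000 the base addresses, sizes and target names are invented for the example.

```python
ADDRESS_BITS = 32
MSB_BITS = 8   # assumption: the 8 most significant address bits are decoded


def build_rom(segments):
    """segments: list of (name, base, size, target).

    Returns a table indexed by the MSB value, giving the VCI target.
    """
    rom = [None] * (1 << MSB_BITS)
    shift = ADDRESS_BITS - MSB_BITS
    for name, base, size, target in segments:
        first = base >> shift
        last = (base + size - 1) >> shift
        for index in range(first, last + 1):
            # Two targets cannot share the same MSB value.
            if rom[index] not in (None, target):
                raise ValueError("MSB value 0x%02x is shared by two targets" % index)
            rom[index] = target
    return rom


def decode(rom, address):
    """Select the VCI target from the MSB bits of the address."""
    return rom[address >> (ADDRESS_BITS - MSB_BITS)]


# Several segments may be attached to the same target ('ram' here).
segments = [
    ("reset", 0xBFC00000, 0x1000, "ram"),
    ("code", 0x10000000, 0x10000, "ram"),
    ("tty", 0xC0000000, 0x100, "tty"),
]
rom = build_rom(segments)
```

Here decode(rom, 0xBFC00010) selects 'ram' and decode(rom, 0xC0000004) selects 'tty'; an address whose MSB value matches no segment yields None.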

D4) Generic platforms

As DSX/L is based on Python, it is possible to define generic, parameterized architectures that can be reused for various applications. Those reusable architectures are classes derived from the basic Architecture class. The implementation is defined in the architecture() method.

As an example we define a parameterized multi-processor architecture, called MultiProc, containing a variable number of processors. The parameters must be named, and the actual parameter value is defined when the architecture is instantiated. The parameter is referenced with the getParam() method, and it is possible to define a default value.

#################################
# generic architecture definition
#################################
class MultiProc(Architecture):
    defaults = { 'nbcpu' : 2 }

    def architecture(self):
        # segments definition
        self.reset = Segment( 'reset', address = 0xbfc00000, type = Cached )
        self.code = Segment( 'code', type = Cached )
        self.data = Segment( 'data', type = Uncached )

        # components instantiation and connection
        self.vgmn = Vgmn( 'vgmn' )
        self.ram = MultiRam( 'ram', self.reset, self.code, self.data )

        # processors and caches: one cache per processor,
        # each cache being connected to the interconnect
        self.cpus = []
        for i in range( self.getParam( 'nbcpu' ) ):
            m = Mips( 'mips%d' % i )
            self.cpus.append( m )
            c = Xcache( 'cache%d' % i )
            c.cache // m.cache
            c.vci // self.vgmn.getTarget()
        self.vgmn.getInit() // self.ram

        # base definition
        self.setBase( self.vgmn )

        # segment table initialization
        self.setConfig( 'mapping_table', MappingTable() )

####################################
# generic architecture instantiation
####################################
my_board = MultiProc( nbcpu = 4 )
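
The interplay between the defaults dictionary, the named parameters and getParam() can be sketched with a minimal base class. ArchitectureSketch and MultiProcSketch are stand-ins written for this illustration; the real Architecture class is provided by DSX and its internals are not described here.

```python
class ArchitectureSketch:
    """Minimal stand-in for the base class: stores the parameters, then builds."""

    defaults = {}

    def __init__(self, **params):
        # Actual values given at instantiation override the class defaults.
        self._params = dict(self.defaults)
        self._params.update(params)
        self.architecture()

    def getParam(self, name):
        return self._params[name]

    def architecture(self):
        raise NotImplementedError("derived classes describe the platform here")


class MultiProcSketch(ArchitectureSketch):
    defaults = {"nbcpu": 2}

    def architecture(self):
        # Only the processor names are generated in this sketch.
        self.cpus = ["mips%d" % i for i in range(self.getParam("nbcpu"))]


default_board = MultiProcSketch()          # uses the default value nbcpu = 2
big_board = MultiProcSketch(nbcpu=4)       # overrides the default
```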

E) Mapping the software on the hardware

At this point, we have defined the object my_tcg (defining the software application), and the object my_architecture (defining the hardware architecture). This section describes the DSX/L syntax used to map the TCG on the hardware architecture.

E1) Mapper declaration

As it is possible to define various mappings for a given TCG and a given architecture, we must define a third object: the mapper, which will contain all the mapping directives defined by the system designer.

my_mapper = Mapper( my_tcg, my_architecture )

E2) Mapper definition

The mapper has a map() method that is used to assign a software object to a hardware component. A hardware component can be a processor, a segment associated with an embedded memory bank, or a segment associated with an addressable peripheral.

# Mapper definition
my_mapper = Mapper( my_tcg, my_architecture )

# a segment seg_x is designated by my_architecture.seg_x

# For a MWMR channel, 4 software elements must be placed
my_mapper.map( my_channel,
        lock = my_architecture.seg_locks,    # The lock protecting the channel is placed in segment seg_locks
        status = my_architecture.seg_data,   # The channel status is placed in segment seg_data
        desc = my_architecture.seg_data,     # The channel descriptor is placed in segment seg_data
        buffer = my_architecture.seg_data )  # The channel buffer is placed in segment seg_data

# For a software task, 4 software objects must be placed
my_mapper.map( my_task,
        desc = my_architecture.seg_data,     # The task descriptor is placed in segment seg_data
        status = my_architecture.seg_data,   # The task state is placed in segment seg_data
        stack = my_architecture.seg_data,    # The private task stack is placed in segment seg_data
        run = my_architecture.cpu0 )         # The task will be running on cpu0
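
The placement rules stated in sections C2, C3, C4 and C6 can be summarized as a table of required elements per kind of object. The sketch below uses that table to reject incomplete mapping directives. MapperSketch is a stand-in, not the DSX Mapper: objects, segments and processors are plain strings, and the explicit kind argument is an assumption made to keep the example self-contained.

```python
# Elements that must be placed for each kind of software object
# (taken from sections C2, C3, C4 and C6).
REQUIRED = {
    "mwmr": ("desc", "status", "buffer", "lock"),
    "barrier": ("desc", "status", "lock"),
    "memspace": ("desc", "mem"),
    "task": ("desc", "status", "stack", "run"),
}


class MapperSketch:
    def __init__(self):
        self.placements = {}

    def map(self, obj_name, kind, **where):
        """Record where each element of a software object is placed."""
        missing = [key for key in REQUIRED[kind] if key not in where]
        extra = [key for key in where if key not in REQUIRED[kind]]
        if missing or extra:
            raise ValueError("%s: missing %s, unexpected %s"
                             % (obj_name, missing, extra))
        self.placements[obj_name] = dict(where)

    def objects_in(self, location):
        """List the (object, element) pairs placed on a segment or processor."""
        return sorted((obj, element)
                      for obj, where in self.placements.items()
                      for element, place in where.items()
                      if place == location)


mapper = MapperSketch()
mapper.map("my_channel", "mwmr", lock="seg_locks", status="seg_data",
           desc="seg_data", buffer="seg_data")
mapper.map("my_task", "task", desc="seg_data", status="seg_data",
           stack="seg_data", run="cpu0")
```

A helper such as objects_in() lets the designer review what ends up in each memory bank, which is the kind of control over placement that DSX aims to provide.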

F) Code generation