
GIET_VM / User Applications

The following applications use the GIET_VM system calls and user libraries. The multi-threaded applications have been designed to analyse the scalability of the TSAR manycore architecture.

Display

This mono-processor application reads a stream of images (128 lines * 128 pixels, 1 byte per pixel) from a file stored on a FAT32 disk, and displays them interactively on the frame buffer. The images.raw file, available in the application directory, contains 20 images. This application can be used to test peripherals such as block devices, the frame buffer, and the dynamic allocation of TTY terminals.
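The main loop can be sketched as follows. This is a minimal, illustrative sketch: the wrapper names giet_fat_open(), giet_fat_read(), giet_fbf_sync_write() and giet_exit() are assumed stand-ins for the actual GIET_VM system calls, and the real prototypes should be taken from the GIET_VM user libraries.

{{{
#define NL   128                /* lines per image       */
#define NP   128                /* pixels per line       */
#define NIMG 20                 /* images in images.raw  */

/* Assumed GIET_VM system-call wrappers (illustrative prototypes). */
extern int  giet_fat_open ( char* pathname, unsigned int flags );
extern int  giet_fat_read ( int fd, void* buffer, unsigned int count );
extern void giet_fbf_sync_write( unsigned int offset, void* buffer, unsigned int length );
extern void giet_exit( char* string );

unsigned char image[NL * NP];   /* one frame: 1 byte per pixel */

void main( void )
{
    int fd = giet_fat_open( "images.raw", 0 );
    if ( fd < 0 ) giet_exit( "cannot open images.raw" );

    for ( int i = 0 ; i < NIMG ; i++ )
    {
        /* read one frame from the FAT32 file system ... */
        if ( giet_fat_read( fd, image, NL * NP ) != NL * NP )
            giet_exit( "unexpected end of file" );

        /* ... and display it on the frame buffer */
        giet_fbf_sync_write( 0, image, NL * NP );
    }
    giet_exit( "display completed" );
}
}}}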

The source code can be found here.

Transpose

This multi-threaded application reads a stream of images (128 lines * 128 pixels), transposes them (X <-> Y), and displays them on the frame buffer. It can run on a multi-processor, multi-cluster architecture, with one thread per processor. The input and output buffers containing the image are distributed in all clusters.
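The per-thread work amounts to the loop below. This is a simplified sketch that assumes contiguous, already-allocated in/out buffers (in the real application they are distributed across clusters); thread_id and nthreads are illustrative names for values derived from the processor coordinates.

{{{
#define NL 128                      /* lines  */
#define NP 128                      /* pixels */

extern unsigned char in [NL][NP];   /* distributed input buffer  */
extern unsigned char out[NP][NL];   /* distributed output buffer */

/* One thread per processor: thread k transposes the slice of lines
 * [ k*NL/nthreads , (k+1)*NL/nthreads ). */
void transpose_slice( unsigned int thread_id, unsigned int nthreads )
{
    unsigned int first = ( thread_id       * NL ) / nthreads;
    unsigned int last  = ( (thread_id + 1) * NL ) / nthreads;

    for ( unsigned int l = first ; l < last ; l++ )
        for ( unsigned int p = 0 ; p < NP ; p++ )
            out[p][l] = in[l][p];   /* X <-> Y */
}
}}}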

The number of clusters must be a power of 2, no larger than 32. The number of processors per cluster must be a power of 2, no larger than 4.

For each image, the application performs a self-test (a checksum computed for each line). The actual display on the frame buffer depends on frame buffer availability.
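The self-test can be as simple as the helper below (an illustrative sketch: after transposition, the checksum of input line l must match the checksum computed over output column l, which holds the same pixels).

{{{
/* Per-line checksum used by the self-test. */
unsigned int line_checksum( const unsigned char* line, unsigned int np )
{
    unsigned int sum = 0;
    for ( unsigned int p = 0 ; p < np ; p++ ) sum += line[p];
    return sum;
}
}}}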

The source code can be found here.

Convol

Classif

This multi-threaded application takes a stream of Gigabit Ethernet packets and performs packet analysis and classification, based on the source MAC address. It uses the multi-channel NIC peripheral and the chained-buffer DMA (CMA) controller to receive and send packets on the Gigabit Ethernet port.

It can run on architectures containing up to 256 clusters, and up to 8 processors per cluster.

This application is described as a TCG (Task and Communication Graph) containing (N+2) tasks per cluster: one "load" task, one "store" task, and N "analyse" tasks. The packets are grouped into containers: each container has a fixed size of 4 Kbytes and can hold from 2 to 60 packets. These containers are distributed in the clusters:

  • one RX container per cluster (part of the kernel rx_chbuf), in the kernel heap.
  • one TX container per cluster (part of the kernel tx_chbuf), in the kernel heap.
  • N working containers per cluster (one per "analyse" task), in the user heap.

In each cluster, the "load", "analyse", and "store" tasks communicate through three local MWMR FIFOs:

  • fifo_l2a : transfers a full container from the "load" task to an "analyse" task.
  • fifo_a2s : transfers a full container from an "analyse" task to the "store" task.
  • fifo_s2l : transfers an empty container from the "store" task to the "load" task.

For each FIFO, one item is a 32-bit word containing the index of an available working container.
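As an illustration of this protocol, here is a sketch of the "load" side of one container exchange, assuming mwmr_read()/mwmr_write() functions in the spirit of the GIET_VM MWMR library (the real names and prototypes should be checked in the library headers):

{{{
typedef struct mwmr_channel_s mwmr_channel_t;   /* MWMR FIFO descriptor */

/* Assumed MWMR library functions (illustrative prototypes). */
extern void mwmr_read ( mwmr_channel_t* fifo, unsigned int* buffer, unsigned int nwords );
extern void mwmr_write( mwmr_channel_t* fifo, unsigned int* buffer, unsigned int nwords );

void load_one_container( mwmr_channel_t* fifo_s2l,
                         mwmr_channel_t* fifo_l2a )
{
    unsigned int index;
    mwmr_read( fifo_s2l, &index, 1 );    /* one item = one 32-bit word     */
    /* ... copy one container from the kernel rx_chbuf to the working
     *     container identified by index ... */
    mwmr_write( fifo_l2a, &index, 1 );   /* hand the full container to an
                                            "analyse" task                 */
}
}}}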

The pointers to the working containers and the pointers to the MWMR FIFOs are defined in global arrays stored in cluster[0][0], and the array of MWMR FIFO descriptors is itself a global variable in cluster[0][0].
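These global arrays can be pictured as follows (a sketch with illustrative array names and an illustrative mesh size; the real declarations live in the application source):

{{{
#define X_SIZE          4          /* illustrative number of clusters in X */
#define Y_SIZE          4          /* illustrative number of clusters in Y */
#define NB_ANALYSE      4          /* N analyse tasks per cluster          */
#define CONTAINER_SIZE  4096       /* fixed container size: 4 Kbytes       */

typedef struct mwmr_channel_s mwmr_channel_t;

/* Stored in cluster[0][0]: pointers to the per-cluster working containers
 * (allocated from each local user heap) ... */
unsigned int* container[X_SIZE][Y_SIZE][NB_ANALYSE];

/* ... and pointers to the three local MWMR FIFOs of each cluster. */
mwmr_channel_t* mwmr_l2a[X_SIZE][Y_SIZE];
mwmr_channel_t* mwmr_a2s[X_SIZE][Y_SIZE];
mwmr_channel_t* mwmr_s2l[X_SIZE][Y_SIZE];
}}}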

Initialisation is done in three steps by the "load" and "store" tasks (a sketch of the synchronisation pattern follows the list):

  1. The "load" task in cluster[0][0] initialises the heaps in all clusters. The other tasks wait on the global_sync synchronisation variable.
  2. The "load" task in cluster[0][0] initialises the barrier between all "load" tasks, allocates the NIC & CMA RX channels, and starts the NIC_CMA RX transfer. The other "load" tasks wait on the load_sync synchronisation variable. Similarly, the "store" task in cluster[0][0] initialises the barrier between all "store" tasks, allocates the NIC & CMA TX channels, and starts the NIC_CMA TX transfer. The other "store" tasks wait on the store_sync synchronisation variable.
  3. When this global initialisation is completed, the "load" task in each cluster allocates the working containers and the MWMR FIFO descriptors from the local user heap. In each cluster, the "analyse" and "store" tasks wait for this local initialisation to complete on the local_sync[x][y] variables.
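The synchronisation pattern for these steps can be sketched as follows (illustrative code: the variable names global_sync and load_sync come from the description above, everything else is assumed):

{{{
/* Synchronisation variables, stored in cluster[0][0]. */
volatile unsigned int global_sync = 0;
volatile unsigned int load_sync   = 0;

/* Non-initialising tasks simply spin until the flag is raised. */
static void wait_flag( volatile unsigned int* flag )
{
    while ( *flag == 0 );    /* busy wait */
}

/* Sketch of the "load" task initialisation in cluster[x][y]. */
void load_init( unsigned int x, unsigned int y )
{
    if ( (x == 0) && (y == 0) )
    {
        /* step 1: initialise the heaps in all clusters, then ...     */
        global_sync = 1;
        /* step 2: initialise the "load" barrier, allocate the NIC &
         * CMA RX channels, start the NIC_CMA RX transfer, then ...   */
        load_sync = 1;
    }
    else
    {
        wait_flag( &global_sync );
        wait_flag( &load_sync );
    }
    /* step 3: allocate the local working containers and MWMR FIFOs,
     * then raise local_sync[x][y] for the local tasks.               */
}
}}}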

When initialisation is completed, all tasks loop on containers (a sketch of the "analyse" loop follows the list):

  1. The "load" task gets an empty working container from fifo_s2l, transfers one container from the kernel rx_chbuf to this user container, and transfers ownership of the container to an "analyse" task by writing its index into fifo_l2a.
  2. The "analyse" task gets a full working container from fifo_l2a, analyses each packet header, computes the packet type (depending on the SRC MAC address), increments the corresponding classification counter, and swaps the SRC and DST MAC addresses for TX transmission, before passing the container to the "store" task through fifo_a2s.
  3. The "store" task gets a full working container from fifo_a2s, transfers this user container content to the kernel tx_chbuf, and transfers ownership of the (now empty) container back to the "load" task by writing its index into fifo_s2l.
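A sketch of the "analyse" loop is given below. The container format (packet layout, classification rule) is assumed here for illustration and is not taken from the GIET_VM sources; the mwmr_read()/mwmr_write() prototypes are the same assumed ones as above, and the body is shown for a single packet, where the real task iterates over every packet in the container.

{{{
typedef struct mwmr_channel_s mwmr_channel_t;
extern void mwmr_read ( mwmr_channel_t* fifo, unsigned int* buffer, unsigned int nwords );
extern void mwmr_write( mwmr_channel_t* fifo, unsigned int* buffer, unsigned int nwords );

void analyse_task( mwmr_channel_t* fifo_l2a,
                   mwmr_channel_t* fifo_a2s,
                   unsigned char** container,   /* working containers      */
                   unsigned int*   counter )    /* classification counters */
{
    while ( 1 )
    {
        unsigned int index;
        mwmr_read( fifo_l2a, &index, 1 );        /* take a full container  */
        unsigned char* pkt = container[index];   /* assumed: packet starts
                                                    at offset 0 with 6-byte
                                                    DST then SRC MAC       */

        unsigned int type = pkt[11] & 0xF;       /* classify on a SRC MAC
                                                    byte (assumed rule)    */
        counter[type]++;

        for ( int i = 0 ; i < 6 ; i++ )          /* swap SRC <-> DST       */
        {
            unsigned char tmp = pkt[i];
            pkt[i]     = pkt[i + 6];
            pkt[i + 6] = tmp;
        }

        mwmr_write( fifo_a2s, &index, 1 );       /* hand it to "store"     */
    }
}
}}}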

The instrumentation results are displayed by the "store" task in cluster[0][0], when all "store" tasks have processed the number of containers specified by the CONTAINERS_MAX parameter.