Version 61 (modified by alain, 3 years ago)


Input/Output Operations

A) Peripheral identification

ALMOS-MKH identifies a peripheral by a composite index (func,impl). The func index defines a functional type, the impl index defines a specific hardware implementation.

  • Each value of the functional index func defines a generic (implementation-independent) device, characterized by a specific set of access functions. This API allows the kernel to access the peripheral without regard to the actual hardware implementation.
  • For each generic device func, several hardware implementations can exist. Each value of the implementation index impl is associated with a specific driver, which must implement the generic API defined for this func device.

ALMOS-MKH supports two types of peripheral components:

  • External peripherals are accessed through a bridge located in one single cluster (called cluster_io, identified by the io_cxy parameter in the arch_info description). External devices are shared resources that can be used by any thread running in any cluster. Examples are the generic IOC device (Block Device Controller), the generic NIC device (Network Interface Controller), the generic TXT device (Text Terminal), the generic FBF device (Frame Buffer for Graphical Display Controller).
  • Internal peripherals are replicated in all clusters. Each internal peripheral is controlled by the local kernel instance, but can be accessed by any thread running in any cluster. There are very few internal peripherals. Examples are the generic DMA device (Direct Memory Access), or the generic MMC device (L2 Cache Configuration and coherence management).

ALMOS-MKH supports multi-channel peripherals, where one single peripheral controller contains N channels that can run in parallel. Each channel has a separate set of addressable registers, and each channel can be used by the OS as an independent device. Examples are the TXT peripheral (one channel per text terminal), or the NIC peripheral (one channel per MAC interface).

The set of available peripherals and their locations in a given many-core architecture are described in the file. For each peripheral, the composite (FUNC,IMPL) type is implemented as a 32-bit integer, where the 16 most significant bits define the functional type, and the 16 least significant bits define the implementation type.
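The packing described above can be sketched in a few lines of C. The helper names (periph_type_make, periph_type_func, periph_type_impl) are hypothetical and chosen only for illustration; they are not taken from the ALMOS-MKH sources.

```c
#include <stdint.h>

/* Hypothetical helpers: build and decode the composite (FUNC,IMPL) type.
 * The 16 most significant bits hold the functional type,
 * the 16 least significant bits hold the implementation type. */
static inline uint32_t periph_type_make(uint16_t func, uint16_t impl)
{
    return ((uint32_t)func << 16) | (uint32_t)impl;
}

/* Extract the functional index from a composite type. */
static inline uint16_t periph_type_func(uint32_t type)
{
    return (uint16_t)(type >> 16);
}

/* Extract the implementation index from a composite type. */
static inline uint16_t periph_type_impl(uint32_t type)
{
    return (uint16_t)(type & 0xFFFFu);
}
```

With this encoding, a peripheral of functional type 3 and implementation type 1 would be stored as 0x00030001.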

B) Generic Devices APIs

To represent the peripherals, ALMOS-MKH uses device descriptors. For multi-channel peripherals, ALMOS-MKH defines one chdev_t structure per channel. This descriptor contains the functional index, the implementation index, the channel index, and the physical base address of the segment containing the addressable registers for this peripheral channel. It also contains the root of a waiting queue of pending commands registered by the various client threads.

The generic chdev API is defined in the chdev.h and chdev.c files.

For each functional type, a specific API defines the list of available commands, and the specific structure defining the command type and the arguments values. As an IO operation is blocking for the calling thread, a client thread can only post one command at a given time. This command is registered in the client thread descriptor, to be passed to the hardware specific driver.

The set of supported generic devices, and their associated APIs are defined below:

Functional type   Usage                               API definition
IOC               Block Device controller             ioc_api
TXT               Text Terminal controller            txt_api
NIC               Network interface controller        nic_api
FBF               Frame Buffer controller             fbf_api
PIC               Programmable Interrupt controller   pic_api

WARNING: The PIC device in this list has a special role: it does NOT perform I/O operations, but is used as a configurable interrupt router to dynamically link a peripheral channel interrupt to a given core. Therefore, the functions defined by the PIC API are service functions, called by the functions of the other devices. These PIC functions don't use the waiting queue implemented in the generic device descriptor, but call the PIC driver directly.

C) Devices Descriptors Placement

Internal peripherals are replicated in all clusters. In each cluster, the device descriptor is stored in the same cluster as the device itself. These device descriptors are shared resources: they are mostly accessed by the local kernel instance, but can also be accessed by threads running in another cluster. This is the case for both the ICU and the MMC devices.

External peripherals are shared resources, accessed through the bridge located in the I/O cluster. To minimize contention, the corresponding device descriptors are distributed over all clusters, as uniformly as possible. Therefore, an I/O operation generally involves three clusters:

  • the client cluster contains the client thread.
  • the I/O cluster contains the external peripheral.
  • the server cluster contains the chdev, and the server thread.

The devices_directory_t structure contains extended pointers to all device descriptors available in a given many-core architecture. This structure is organized as a set of arrays:

  • There is one entry per channel for each external peripheral, and the corresponding array is indexed by the channel index.
  • There is one entry per cluster for each internal peripheral, and the corresponding array is indexed by the cluster index (it is not indexed by the cluster identifier cxy, because cxy is not a continuous index).

This device directory, implemented as a global variable, is replicated in all clusters, and is initialized in the kernel initialization phase.
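As a rough illustration of the two indexing schemes, here is a minimal sketch in C. Everything in it is an assumption made for the example: the array sizes, the field names, the 64-bit model of an extended pointer, and in particular the cxy encoding (x in the upper bits, y in the lower Y_WIDTH bits), which is only one possible way a cluster identifier can end up non-continuous.

```c
#include <stdint.h>

/* All sizes and names below are assumptions made for illustration. */
#define MAX_TXT_CHANNELS 8
#define MAX_CLUSTERS     16
#define Y_WIDTH          4      /* assumed number of bits used for y in cxy */

typedef uint64_t xptr_t;        /* extended pointer, modeled as 64 bits */

typedef struct devices_directory_sketch_s
{
    xptr_t txt[MAX_TXT_CHANNELS];  /* external peripheral: one entry per channel */
    xptr_t mmc[MAX_CLUSTERS];      /* internal peripheral: one entry per cluster */
} devices_directory_sketch_t;

/* Convert a (possibly sparse) cxy identifier into a continuous cluster
 * index, assuming a 2D mesh with y_size clusters per column. */
static inline uint32_t cxy_to_index(uint32_t cxy, uint32_t y_size)
{
    uint32_t x = cxy >> Y_WIDTH;
    uint32_t y = cxy & ((1u << Y_WIDTH) - 1u);
    return x * y_size + y;
}

/* External peripheral lookup: indexed by channel. */
static inline xptr_t dir_get_txt(const devices_directory_sketch_t *dir,
                                 uint32_t channel)
{
    return dir->txt[channel];
}

/* Internal peripheral lookup: indexed by cluster index, not by cxy. */
static inline xptr_t dir_get_mmc(const devices_directory_sketch_t *dir,
                                 uint32_t cxy, uint32_t y_size)
{
    return dir->mmc[cxy_to_index(cxy, y_size)];
}
```

Under these assumptions, a 2x2 mesh has cxy values 0x00, 0x01, 0x10 and 0x11, which map to the continuous indexes 0 to 3.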

D) Waiting queue Management

The commands waiting queue is implemented as a distributed XLIST, rooted in the chdev descriptor. To launch an I/O operation, a client thread, running in any cluster, calls a function of the device API. This function registers the command in the thread descriptor, and registers the thread in the waiting queue.

For most I/O operations, ALMOS-MKH implements a blocking policy: the thread calling a command function is blocked on the THREAD_BLOCKED_IO condition, and descheduled. It will be re-activated by the driver signaling the completion of the I/O operation.

The waiting queue is handled as a Multi-Writers / Single-Reader FIFO, protected by a remote_lock. The N writers are the client threads, whose number is not bounded. The single reader is the server thread associated with the device descriptor, and created at kernel initialization. This thread is in charge of consuming the pending commands from the waiting queue. When the queue is empty, the server thread blocks, because a DEV thread cannot be selected by the scheduler when the associated device queue is empty.
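The FIFO behaviour described above can be modeled with a simplified, single-address-space sketch. It uses an ordinary local linked list instead of the distributed XLIST, and only marks with comments where the remote_lock would be taken. All type and function names are hypothetical, and the point at which a client is removed from the queue is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Client thread descriptor: the command is embedded in the thread,
 * since a thread can post only one command at a time. */
typedef struct thread_sketch_s
{
    int                     cmd;        /* embedded command (illustrative) */
    bool                    blocked_io; /* models THREAD_BLOCKED_IO        */
    struct thread_sketch_s *next;       /* link in the waiting queue       */
} thread_sketch_t;

/* Channel device descriptor: root of the waiting queue. */
typedef struct chdev_sketch_s
{
    thread_sketch_t *head;
    thread_sketch_t *tail;
    bool             server_blocked;    /* server thread has nothing to do */
} chdev_sketch_t;

/* Writer side: called by any client thread. */
void chdev_sketch_register(chdev_sketch_t *c, thread_sketch_t *t, int cmd)
{
    /* the remote_lock would be taken here */
    t->cmd        = cmd;
    t->blocked_io = true;               /* the client blocks until completion */
    t->next       = NULL;
    if (c->tail != NULL) c->tail->next = t;
    else                 c->head       = t;
    c->tail           = t;
    c->server_blocked = false;          /* the queue is no longer empty */
    /* the remote_lock would be released here */
}

/* Reader side: called only by the server thread.
 * Returns the oldest pending client, or NULL if the queue is empty. */
thread_sketch_t *chdev_sketch_next(chdev_sketch_t *c)
{
    /* the remote_lock would be taken here */
    thread_sketch_t *t = c->head;
    if (t == NULL)
    {
        c->server_blocked = true;       /* empty queue: the server blocks */
        return NULL;
    }
    c->head = t->next;
    if (c->head == NULL) c->tail = NULL;
    t->next = NULL;
    /* the remote_lock would be released here */
    return t;
}
```

Commands are consumed in registration order, and the server is marked blocked as soon as it finds the queue empty.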

Finally, each generic device descriptor contains a link to the specific driver associated with the available hardware implementation. This link is established in the kernel initialization phase.

E) Drivers API

To start an I/O operation, the server thread associated to the device must call the specific driver corresponding to the hardware peripheral available in the architecture.

Any driver must therefore implement the three following functions:


This function initialises both the peripheral hardware registers, and the specific variables or data structures required by a given hardware implementation. It is called in the kernel initialization phase.

driver_cmd( xptr_t thread_xp , device_t * device )

This function is called by the server thread. It accesses the peripheral hardware registers to start the I/O operation. Depending on the hardware peripheral implementation, it can be blocking or non-blocking for the server thread.

  • It is blocking on the THREAD_BLOCKED_DEV_ISR condition, if the hardware peripheral supports only one simultaneous I/O operation. Examples are a simple disk controller, or a text terminal controller. The blocked server thread must be re-activated by the ISR signaling completion of the current I/O operation.
  • It is non-blocking if the hardware peripheral supports several simultaneous I/O operations. An example is an AHCI-compliant disk controller. It blocks only if the number of simultaneous I/O operations becomes larger than the maximum number of concurrent operations supported by the hardware.

The thread_xp argument is the extended pointer to the client thread, containing the embedded command descriptor. The device argument is the local pointer to the device descriptor.

driver_isr( chdev_t * device )

This function is executed in the server cluster, by the core executing the server thread, because the IRQ signaling the completion of an I/O operation by a given peripheral channel is statically linked to this core during kernel initialization. It accesses the peripheral hardware registers to get the I/O operation error status, acknowledges the IRQ, and unblocks the server thread. The device argument is the pointer to the device descriptor.
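One common way to attach such a set of driver functions to a device descriptor in C is a table of function pointers. The sketch below illustrates that idea for the single-operation (blocking) case only. It is not the ALMOS-MKH implementation: the structure names, the init entry name, and the fake driver are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t xptr_t;                /* extended pointer, modeled as 64 bits */

typedef struct drv_chdev_s drv_chdev_t; /* hypothetical device descriptor */

/* Hypothetical driver entry points: initialization, command, ISR. */
typedef struct driver_ops_s
{
    void (*init)(drv_chdev_t *chdev);
    void (*cmd)(xptr_t thread_xp, drv_chdev_t *chdev);
    void (*isr)(drv_chdev_t *chdev);
} driver_ops_t;

struct drv_chdev_s
{
    const driver_ops_t *ops;            /* link to the hardware-specific driver */
    xptr_t              current;        /* client being served, 0 if none       */
    bool                server_blocked; /* models THREAD_BLOCKED_DEV_ISR        */
    int                 status;         /* error status of the last operation   */
};

/* Fake driver for a peripheral supporting one I/O operation at a time. */
static void fake_init(drv_chdev_t *c)
{
    c->current        = 0;
    c->server_blocked = false;
    c->status         = 0;
}

static void fake_cmd(xptr_t thread_xp, drv_chdev_t *c)
{
    c->current        = thread_xp;      /* start the operation for this client */
    c->server_blocked = true;           /* server waits for the completion IRQ */
}

static void fake_isr(drv_chdev_t *c)
{
    c->status         = 0;              /* read the status, acknowledge the IRQ */
    c->current        = 0;
    c->server_blocked = false;          /* unblock the server thread */
}

const driver_ops_t fake_driver = { fake_init, fake_cmd, fake_isr };
```

The server thread would call ops->cmd() for each pending client, and the interrupt handler would call ops->isr() on completion.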

F) General I/O operation scenario

For an external peripheral, the I/O operation mechanism generally involves three clusters: client cluster / server cluster / I/O cluster. It does not use any RPC, but uses only remote accesses to execute the three steps of any I/O operation:

  1. To post a new command in the waiting queue of a given (remote) device descriptor, the client thread uses only a few remote accesses to be registered in the distributed XLIST rooted in the server cluster.
  2. To launch the I/O operation on the (remote) peripheral, the server thread uses only remote accesses to the physical registers located in the I/O cluster.
  3. To complete the I/O operation, the ISR running on the server cluster accesses peripheral registers in the I/O cluster, reports the I/O operation status in the command descriptor located in the client cluster, and unblocks the server thread. Finally, the server thread uses a remote access to unblock the client thread.

G) Interrupts Routing

The completion of an I/O operation is signaled by the involved peripheral using an interrupt. In ALMOS-MKH, this interrupt is handled by the core running the server thread that launched the I/O operation. Therefore, the interrupt must be routed to the cluster containing the chdev involved in the I/O operation.

ALMOS-MKH assumes that interrupt routing (from peripherals to cores) is done by a dedicated device, called PIC (Programmable Interrupt Controller). This device also helps the kernel interrupt handler select the relevant ISR (Interrupt Service Routine) to be executed.

This generic PIC device is generally implemented as a distributed hardware infrastructure containing two types of hardware components:

  • The IOPIC component (one single component in the I/O cluster) interfaces the external IRQs (one IRQ per channel) to the PIC infrastructure.
  • The LAPIC components (one component per cluster) interface the PIC infrastructure to the local cores in a given cluster.

The PIC device handles four types of interrupts, detailed below:

1) EXT_IRQ (External IRQ)

These interrupts are generated by the external (shared) peripherals.

Each external IRQ is identified by an irq_id index, used as an identifier by the kernel. For a given hardware architecture, this index is defined, for each external device channel, by the arch_info file describing the architecture. It is registered by the kernel in the iopic_input structure, a global variable replicated in all clusters and allocated in the kernel_init.c file.

The interrupt routing is statically defined during the PIC device initialization, by the architecture specific PIC driver, using the dev_pic_bind_irq() function. This function statically links the EXT_IRQ identified by its irq_id to the core running the server thread associated to the external device channel that is the source of the IRQ. As the external devices server threads are distributed on all cores in all clusters, the corresponding IRQ is routed to the relevant core.
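The static binding of an external IRQ to a core can be pictured as a small routing table filled once at initialization. The sketch below is only a model of that idea: the table, its size, and the functions pic_bind_sketch() and pic_route_sketch() are hypothetical, and it does not reproduce the actual signature or behaviour of dev_pic_bind_irq().

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_EXT_IRQ 32              /* assumed upper bound, for illustration */

/* Target of one external IRQ: a core (lidx) in a cluster (cxy). */
typedef struct irq_route_s
{
    uint32_t cxy;                   /* cluster containing the server thread */
    uint32_t lidx;                  /* local index of the core              */
    int      bound;                 /* non-zero once the IRQ is routed      */
} irq_route_t;

static irq_route_t ext_routes[MAX_EXT_IRQ];

/* Bind irq_id to the core running the server thread.
 * Returns 0 on success, -1 if irq_id is out of range or already bound
 * (the routing is static in this sketch, so rebinding is refused). */
int pic_bind_sketch(uint32_t irq_id, uint32_t cxy, uint32_t lidx)
{
    if (irq_id >= MAX_EXT_IRQ || ext_routes[irq_id].bound) return -1;
    ext_routes[irq_id].cxy   = cxy;
    ext_routes[irq_id].lidx  = lidx;
    ext_routes[irq_id].bound = 1;
    return 0;
}

/* Return the route registered for irq_id, or NULL if it is not bound. */
const irq_route_t *pic_route_sketch(uint32_t irq_id)
{
    if (irq_id >= MAX_EXT_IRQ || !ext_routes[irq_id].bound) return NULL;
    return &ext_routes[irq_id];
}
```

Refusing a second binding is a design choice of this sketch that reflects the "statically linked" wording; a real driver may handle this case differently.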

2) INT_IRQ (Internal IRQ)

These interrupts are generated by the internal (replicated) peripherals.

Each internal IRQ is identified by a cluster index and a local irq_id index, used as an identifier by the kernel. For a given hardware architecture, this index is defined, for each internal device channel, by the arch_info file describing the architecture. It is registered by the kernel in the local lapic_input structure, a global variable defined in each cluster.

The interrupt routing is local: for an internal peripheral, the server thread is always placed on a local core. The INT_IRQ, identified by its irq_id, is statically linked to the local core running the server thread by the dev_pic_bind_irq() function.

3) TIM_IRQ (Timer IRQ)

These interrupts are generated, in each cluster, by the timers used for context switches. They are assumed to be implemented in the local LAPIC component. There is one timer and one timer IRQ per local core, identified by an irq_id index, used as an identifier by the kernel.

The TIM_IRQ identified by its irq_id is statically linked to the local core that has the same local index value.

4) IPI_IRQ (Inter Processor IRQ)

These inter-processor interrupts are used by the kernel to force scheduling on a given core in a given cluster.

To reduce the latency associated with synchronisation mechanisms, ALMOS-MKH uses IPIs (Inter Processor Interrupts): any kernel instance, running on any core in any cluster, can send an IPI to any other core in the architecture, using a remote write access to the relevant register in the LAPIC component. An IPI simply forces a scheduling on the target core.

H) Text Terminals

The hardware architecture generally provides a variable, but bounded, number of text terminals (called TXT channels in ALMOS-MKH). The actual number of terminals (NB_TXT_CHANNELS) is defined in the arch_info.bin file. We describe here how ALMOS-MKH uses these terminals:

1) The TXT[0] terminal is reserved for the kernel. It is normally used by the kernel to display log and/or debug messages. It can also be used by user processes for debug, through the specific display_xxx() system calls, which should not be used in normal operation.

2) The other (NB_TXT_CHANNELS - 1) terminals TXT[i] are used by user processes. The first INIT user process creates (NB_TXT_CHANNELS - 1) KSH user processes (one shell per user text terminal). All processes created by a KSH[i] process share the same TXT[i] terminal as the parent process, and belong to the same group of processes.

3) In the present implementation, the INIT process and the KSH[i] processes are never deleted: they do not call the exit() syscall, and cannot be killed.

4) Regarding the WRITE accesses, all processes attached to the same TXT_TX[i] terminal can atomically display character strings. There is no guarantee on the order when these strings are issued by different processes, because these strings are simply serialized by the kernel thread associated with the shared TXT_TX[i] device.

5) Regarding the READ accesses, only one process in the group of processes attached to the TXT[i] terminal (called the foreground process) is the owner of the TXT_RX[i] terminal, and can read characters. The other processes (called background processes) should not try to read characters. If a background process P tries to read, it receives a SIGSTOP signal, and remains blocked until the user uses the fg shell command to give P the ownership of the TXT_RX[i] terminal.

6) The control characters (ctrlC, ctrlZ) typed in a TXT_RX[i] terminal are only routed to the foreground process attached to this terminal.

7) When a new process is launched in a KSH (using the load command), it becomes the TXT terminal owner, and runs in the foreground, unless it is launched with the & attribute on the command line. When a child process is directly created by a parent process (using the fork() syscall), it does not get the TXT terminal ownership, and runs in the background.
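The ownership rules for the TXT_RX[i] terminal can be summarized by a small model. It is only a sketch under simplifying assumptions: processes are reduced to integer pids, the terminal to a single owner field, and all names are hypothetical.

```c
#include <stdbool.h>

/* Model of one TXT_RX[i] terminal: only the owner may read. */
typedef struct txt_rx_sketch_s
{
    int owner_pid;                      /* foreground process */
} txt_rx_sketch_t;

typedef enum
{
    TXT_READ_OK,                        /* caller is the foreground process */
    TXT_READ_SIGSTOP                    /* caller is a background process   */
} txt_read_result_t;

/* A read attempt succeeds only for the foreground process. */
txt_read_result_t txt_read_check(const txt_rx_sketch_t *t, int pid)
{
    return (pid == t->owner_pid) ? TXT_READ_OK : TXT_READ_SIGSTOP;
}

/* load command: the new process takes ownership unless launched with '&'. */
void txt_on_load(txt_rx_sketch_t *t, int new_pid, bool background)
{
    if (!background) t->owner_pid = new_pid;
}

/* fork(): the child never takes ownership, so nothing changes. */
void txt_on_fork(txt_rx_sketch_t *t, int child_pid)
{
    (void)t;
    (void)child_pid;
}

/* fg shell command: give the ownership to process pid. */
void txt_on_fg(txt_rx_sketch_t *t, int pid)
{
    t->owner_pid = pid;
}
```

In this model, a forked child that tries to read is stopped until the user runs fg on it, at which point its parent becomes a background process.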

I) Hardware Specific

1. TSAR_MIPS32 architecture

In the TSAR_MIPS32 architecture, the IOPIC is an external hardware controller providing two services:

  1. It translates each external IRQ identified by its irq_id into a write transaction targeting a specific mailbox contained in a local LAPIC controller, for a given core in a given cluster.
  2. It allows the kernel to selectively enable/disable any external IRQ identified by its irq_id index.

In the TSAR_MIPS32 architecture, the LAPIC controller (called XCU) is replicated in all clusters containing at least one core. It handles three types of events:

  1. A HWI (Hardware Interrupt) is generated by local internal peripherals. This type implements directly the internal interrupts.
  2. A PTI (Programmable Timer Interrupt) is generated by a software programmable timer implemented in the XCU controller. This type implements directly the timer interrupts required for context switch.
  3. A WTI (Write Triggered Interrupt) is actually a mailbox implemented in the local XCU. WTIs are used both to implement inter-processor interrupts and to register the external interrupts generated by the IOPIC controller. The first WTI mailboxes are used for IPIs (one IPI per local core). The other WTI mailboxes are used for external IRQs.

The actual numbers of events of each type supported by a given XCU component are defined in the XCU_CONFIG register of the XCU component, and cannot be larger than the SOCLIB_MAX_HWI, SOCLIB_MAX_WTI, SOCLIB_MAX_PTI constants defined in the soclib_pic.h file.
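The split of the WTI mailboxes between IPIs and external IRQs can be expressed as simple index arithmetic. The sketch below assumes that the IPI mailbox of a core has the same index as the core, and that external IRQs use the following mailboxes in order. The source only states that the first mailboxes are used for IPIs, so this exact layout and the function names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* WTI mailbox used for the IPI of local core lidx
 * (assumption: mailbox index equals the core local index). */
static inline uint32_t wti_for_ipi(uint32_t lidx)
{
    return lidx;
}

/* True if the mailbox is one of the first nb_cores mailboxes,
 * which are reserved for IPIs. */
static inline bool wti_is_ipi(uint32_t wti, uint32_t nb_cores)
{
    return wti < nb_cores;
}

/* WTI mailbox for the external IRQ occupying the given slot
 * (slot 0 is the first mailbox after the IPI mailboxes).
 * Returns -1 if the XCU does not provide enough mailboxes. */
static inline int32_t wti_for_ext(uint32_t slot, uint32_t nb_cores,
                                  uint32_t nb_wti)
{
    uint32_t wti = nb_cores + slot;
    return (wti < nb_wti) ? (int32_t)wti : -1;
}
```

For a cluster with 4 cores and 16 WTI mailboxes, mailboxes 0 to 3 would carry IPIs and mailboxes 4 to 15 would be available for external IRQs.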

2. X86_64 architecture