= ALMOS-MKH Specification =

[[PageOutline]]

This document describes the general principles of ALMOS-MKH, an operating system targeting manycore architectures with a CC-NUMA (Cache-Coherent, Non-Uniform Memory Access) shared address space, such as the TSAR architecture, which can support up to 1024 32-bit MIPS cores. ALMOS-MKH also targets Intel / AMD multi-core architectures using 64-bit x86 cores. Targeted architectures are assumed to be clustered, with one or more cores and a physical memory bank per cluster. These architectures must support POSIX-compliant multi-threaded parallel applications.

ALMOS-MKH is derived from the ALMOS system, developed by Ghassan Almaless. The general principles of the ALMOS system are described in his thesis.

A first version of ALMOS-MKH, and in particular the distributed file system and the RPC communication mechanism, was developed by Mohamed Karaoui. The general principles of the proposed Multi-Kernel approach are described in his thesis. That version was called ALMOS-MK, without the H.

ALMOS-MKH is based on the "Multi-Kernel" approach to ensure scalability and to support the distribution of system services. In this approach, each cluster of the architecture contains an instance of the kernel. Each instance controls the local resources (memory and computing cores). These multiple instances cooperate to give applications the image of a single system controlling all resources. They communicate with each other using both (i) the client / server model, sending a Remote Procedure Call to a remote cluster for a complex service, and (ii) the shared memory paradigm, directly accessing remote memory when required.

To reduce energy consumption, ALMOS-MKH supports architectures using 32-bit cores. In this case, each cluster has a 32-bit physical address space, and the local physical addresses (internal to a cluster) are therefore 32 bits wide.
To access the physical address space of other clusters, ALMOS-MKH uses 64-bit global physical addresses. For example, the physical address space of the TSAR architecture uses 40 bits, and the 8 most significant bits define the target cluster number. ALMOS-MKH thus explicitly distinguishes two types of access:
 * Local accesses (internal to a cluster) use 32-bit addresses.
 * Remote accesses (to another cluster) use 64-bit addresses.

On a hardware platform containing 32-bit cores, ALMOS-MKH runs entirely in physical addressing: the MMU is only used by the application code. The MMU is deactivated as soon as a core enters the kernel, and it is reactivated when the core leaves the kernel. 32-bit physical addresses allow the kernel instance of a cluster K to directly access all local resources (memory or devices). To directly access the address space of another cluster, ALMOS-MKH uses the ''remote_read'' and ''remote_write'' primitives, which take 64-bit extended physical addresses (CXY / PTR). CXY is the 32-bit identifier of the target cluster, and PTR is the 32-bit local physical address in the target cluster. These primitives are used to implement the RPC mechanism, but they are also used to speed up some accesses to performance-critical distributed kernel data structures.

On a hardware platform containing 64-bit cores, it is no longer necessary to run the kernel in physical addressing, since the whole physical space can be mapped into the 64-bit virtual space. However, to reinforce access locality while minimizing contention points, ALMOS-MKH continues to distinguish between local and remote accesses, and the communication model between kernel instances is unchanged.

In both cases, communications between kernel instances are therefore implemented by a mix of RPCs (on the client / server model) and direct accesses to remote memory (when this is useful for performance). This hybrid approach is the main originality of ALMOS-MKH, and the reason for the H added after MK.
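
As an illustration of the (CXY / PTR) extended addresses described above, here is a minimal C sketch. All type and function names (`xptr_t`, `xptr_make`, `xptr_cxy`, `xptr_ptr`, `tsar_paddr`) are hypothetical and chosen for this example only. Placing CXY in the upper 32 bits of the 64-bit value is an assumption, as is the way the TSAR 40-bit physical address is derived from it (low 8 bits of CXY above the 32-bit local address).

```c
#include <stdint.h>

/* Hypothetical types, for illustration only. */
typedef uint32_t cxy_t;   /* 32-bit target cluster identifier (CXY)        */
typedef uint32_t lptr_t;  /* 32-bit local physical address (PTR)           */
typedef uint64_t xptr_t;  /* 64-bit extended physical address (CXY / PTR)  */

/* Build an extended address. Assumed layout: CXY in bits 63..32,
 * PTR in bits 31..0. */
static inline xptr_t xptr_make(cxy_t cxy, lptr_t ptr)
{
    return ((xptr_t)cxy << 32) | (xptr_t)ptr;
}

/* Extract the target cluster identifier. */
static inline cxy_t xptr_cxy(xptr_t xp)
{
    return (cxy_t)(xp >> 32);
}

/* Extract the local physical address inside the target cluster. */
static inline lptr_t xptr_ptr(xptr_t xp)
{
    return (lptr_t)(xp & 0xFFFFFFFFu);
}

/* TSAR-like case: 40-bit physical address whose 8 most significant bits
 * select the cluster. Assumption: only the low 8 bits of CXY are used. */
static inline uint64_t tsar_paddr(xptr_t xp)
{
    return ((uint64_t)(xptr_cxy(xp) & 0xFFu) << 32) | (uint64_t)xptr_ptr(xp);
}
```

With this layout, a local access only needs the 32-bit PTR part, while a remote access carries the full 64-bit value, which keeps the local / remote distinction visible in the types.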

== A) [wiki:arch_info Hardware Platform Definition] ==

This section describes the general assumptions made by ALMOS-MKH regarding the hardware architecture, and the mechanism used to configure ALMOS-MKH for a given target architecture.

== B) [wiki:processus_thread Process & threads creation/destruction] ==

ALMOS-MKH supports the POSIX threads API. To avoid contention in massively multi-threaded applications, ALMOS-MKH replicates the user process descriptors in all clusters containing threads of this process. This section describes the mechanisms for process and thread creation / destruction.

== C) [wiki:replication_distribution Data replication & distribution policy] ==

This section describes the general policy for replicating and distributing information across the physical memory banks. There are two main goals: enforce memory access locality, and avoid contention when several threads simultaneously access the same information. To control placement and replication on the physical memory banks, the kernel uses paged virtual memory.

== D) [wiki:page_tables GPT & VSL implementation] ==

To avoid contention when several threads access the same page table to handle TLB misses, ALMOS-MKH replicates the page tables. For each multi-threaded user application P, the Generic Page Table (GPT) and the Virtual Segments List (VSL) are replicated in each cluster K containing at least one thread of the application. According to the "on-demand paging" principle, these replicated structures GPT(K,P) and VSL(K,P) are dynamically updated when page faults are detected.
This section describes the mechanism used to build these copies and the coherence protocol they require.

== E) [wiki:thead_scheduling Trans-cluster lists of threads] ==

ALMOS-MKH must handle dynamic sets of threads, such as the set of all threads waiting to access a given peripheral device. These sets of threads are implemented as circular doubly linked lists. As these threads can be running on any cluster, these linked lists are ''trans-cluster'' and require specific techniques in a multi-kernel OS, where each kernel instance handles only the resources located in a single cluster.

== F) [wiki:rpc_implementation Remote Procedure Calls] ==

To enforce locality for complex operations requiring a large number of remote memory accesses, the kernel instances can communicate using RPCs (Remote Procedure Calls), following the client / server model. This section describes the RPC mechanism implemented by ALMOS-MKH.

== G) [wiki:io_operations Input/Output Operations] ==

== H) [wiki:boot_procedure Boot procedure] ==

This section describes the ALMOS-MKH boot procedure.

== I) [wiki:scheduler Threads Scheduling] ==

This section describes the ALMOS-MKH thread scheduling policy.

== J) [wiki:kernel_synchro Kernel level synchronisations] ==

This section describes the synchronisation primitives used by ALMOS-MKH, namely the barriers used during the parallel kernel initialization, and the locks used to protect concurrent accesses to the shared kernel data structures.

== K) [wiki:kernel_synchro User level synchronisations] ==

This section describes the ALMOS-MKH implementation of the POSIX-compliant user-level synchronisation services: mutex, condition variable, barrier and semaphore.