Changes between Version 51 and Version 52 of replication_distribution


Timestamp:
Dec 9, 2019, 3:01:22 PM (3 years ago)
Author:
alain
Comment:

--

  • replication_distribution

    v51 v52  
    1 = Virtual segments replication & distribution policy =
     1= Data replication & distribution policy =
    22
    33[[PageOutline]]
    44
    5 The replication / distribution policy of segments has two goals: enforce locality (as much as possible), and avoid contention (it is the main goal).
    6  
    7 To actually control data placement on the physical memory banks, the kernel uses the paged virtual memory MMU to map a virtual segment to a given physical memory bank in a given cluster.
     5The replication / distribution policy of data on the physical memory banks has two goals: enforce locality (as much as possible), and avoid contention (it is the main goal).
     6
     7The data to be placed are the virtual segments defined - at compilation time - in the virtual space of the various user processes currently running, or in the virtual space of the operating system itself. 
     8
     9To actually control the placement of all these virtual segments on the physical memory banks, the kernel uses the paged virtual memory MMU to map a virtual segment to a given physical memory bank in a given cluster.
    810
    911A '''vseg''' is a contiguous memory zone in the process virtual space, defined by the two (base, size) values. All addresses in this interval can be accessed by this process without segmentation violation: if the corresponding page is not mapped, the page fault will be handled by the kernel, and a physical page will be dynamically allocated (and initialized if required). A '''vseg''' always occupies an integer number of pages, as a given page cannot be shared by two different vsegs.
     
    1416 * A '''public''' vseg can be '''localized''' (all vseg pages are mapped in the same cluster), or '''distributed''' (different pages are mapped on different clusters). A '''private''' vseg is always '''localized'''.
    1517
    16 To avoid contention, in case of parallel applications defining a large number of threads in one single process P, almos-mkh replicates, the process descriptor in all clusters containing at least one thread of P, and these clusters are called active clusters.
    17 The virtual memory manager VMM(P,K) of process P in cluster K, contains two main structures:
     18To avoid contention, in case of parallel applications defining a large number of threads in one single process P, almos-mkh replicates the process descriptor in all clusters containing at least one thread of P, and these clusters are called active clusters. Each process descriptor contains a VMM structure (Virtual Memory Manager)
     19that registers all the information required by the MMU to perform the address translation for the process. For a process P in cluster K, the VMM(P,K) structure contains two main sub-structures:
    1820 * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vmm.h VSL(P,K)] is the list of all vsegs registered for process P in cluster K,
    1921 * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vmm.h GPT(P,K)] is the generic page table, defining the actual physical mapping of these vsegs.
    2022
    21 For a given process P, all VMM(P,K) descriptors in different clusters can have different contents for several reasons :
     23For a given process P, the different VMM(P,K) in different clusters can have different contents for several reasons :
    2224 1. A '''private''' vseg can be registered in only one VSL(P,K) in cluster K, and be totally undefined in the other VSL(P,K').
    2325 1. A '''public''' vseg can be replicated in several VSL(P,K), but the registration of a vseg in a given VSL(P,K) is ''on demand'': the vseg is only registered in VSL(P,K) when a thread of process P running in cluster K tries to access this vseg.
    2426 1. Similarly, the mapping of a given virtual page VPN of a given vseg (i.e. the allocation of a physical page PPN to a virtual page VPN, and the registration of this PPN in the GPT(P,K)) is ''on demand'': the page table entry will be updated in the GPT(P,K) only when a thread of process P in cluster K tries to access this VPN. 
    2527 
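The ''on demand'' mapping described in the list above can be sketched as follows. This is only an illustration: the `gpt_sketch_t` type, the `gpt_access()` function and the trivial physical page allocator are hypothetical names introduced here, not the almos-mkh API, and the sketch ignores the interaction with the reference cluster.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified GPT(P,K): one entry per virtual page. */
#define GPT_ENTRIES 64u

typedef struct {
    bool     mapped[GPT_ENTRIES];   /* true once a PPN is registered   */
    uint32_t ppn[GPT_ENTRIES];      /* physical page number per VPN    */
    uint32_t next_free_ppn;         /* trivial physical page allocator */
    uint32_t faults;                /* number of page faults handled   */
} gpt_sketch_t;

/* Returns the PPN mapped to vpn. A physical page is allocated and
 * registered in the table on the first access only. */
uint32_t gpt_access(gpt_sketch_t *gpt, uint32_t vpn)
{
    assert(vpn < GPT_ENTRIES);
    if (!gpt->mapped[vpn]) {            /* page fault: entry not mapped yet */
        gpt->ppn[vpn]    = gpt->next_free_ppn++;
        gpt->mapped[vpn] = true;
        gpt->faults++;
    }
    return gpt->ppn[vpn];
}
```

In this sketch, two accesses to the same VPN trigger a single page fault: the second access finds the entry already registered.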
    26 The replication of the VSL(P,K) and GPT(P,K) kernel structures creates a coherence problem for the public vsegs:
     28The replication of the VSL(P,K) and GPT(P,K) kernel structures in several clusters creates a coherence problem for the ''public'' vsegs:
    2729 * A VSL(P,K) contains all private vsegs in cluster K, but contains only the public vsegs that have been actually accessed by a thread of P running in cluster K. Only the '''reference''' process descriptor stored in the reference cluster KREF contains the complete list VSL(P,KREF) of all public vsegs for the P process.
    2830 * A GPT(P,K) contains all mapped entries corresponding to private vsegs but  for public vsegs, it contains only the entries corresponding to pages that have been accessed by a thread running in cluster K. Only the reference cluster KREF contains the complete GPT(P,KREF) of all mapped entries of public vsegs for process P.
     
    4244The Virtual Memory Manager API is defined in the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vmm.h almos_mkh/kernel/mm/vmm.h] and [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/mm/vmm.c almos-mkh/kernel/mm/vmm.c] files.
    4345
    44 == __1. User segments types__  ==
     46== __1. User segments__  ==
    4547
    4648This section describes the six types of user virtual segments defined by almos-mkh:
    4749
    48 || Type     ||           ||                  ||    Access     ||      Replication                                     ||    Placement                              ||  Allocation policy in virtual  space           ||
    49 || STACK    ||  private  || localized   || Read Write || one physical mapping per thread    || same cluster as thread using it  || dynamic (one stack allocator per cluster)   ||
    50 || CODE     ||  private  || localized   || Read Only   || one physical mapping per cluster   || same cluster as thread using it  || static  (defined in .elf file)                           ||
    51 || DATA     ||  public   || distributed || Read Write || same mapping for all threads              || distributed on all clusters          || static  (defined in .elf file)                            ||
    52 || ANON     ||  public   || localized   || Read Write || same mapping for all threads              || same cluster as calling thread   || dynamic (one heap allocator per process   ||
    53 || FILE     ||  public   || localized   || Read Write || same mapping for all threads              || same cluster as the file cache   || dynamic (one heap allocator per process)  ||
    54 || REMOTE ||  public   || localized   || Read Write || same mapping for all threads              || cluster defined by user             || dynamic (one heap allocator per process)  ||
     50=== 1.1 CODE vsegs ===
    5551
    56  1. '''CODE''' : This '''private''' vseg contains the application code. It is replicated in all clusters. ALMOS-MK creates one CODE vseg per active cluster. For a process P, the CODE vseg is registered in the VSL(P,Z) when the process is created in reference cluster KREF. In the other clusters K, the CODE vseg is registered in VSL(P,K) when a page fault is signaled by a thread of P running in cluster K. In each active cluster K, the CODE vseg is localized, and physically mapped in cluster K.
    57  1. '''DATA''' : This '''public''' vseg contains the user application global data. ALMOS-MK creates one DATA vseg, that is registered in the reference VSL(P,KREF) when the process P is created in reference cluster KREF.  In the other clusters K, the DATA vseg is registered  in VSL(P,K) when a page fault is signaled by a thread of P running in cluster K. To avoid contention, this vseg is physically distributed on all clusters, with a page granularity. For each page, the physical mapping is defined by the LSB bits of the page VPN.
    58  1. '''STACK''' : This '''private''' vseg contains the execution stack of a thread. Almos-mkh creates one STACK vseg for each thread of P running in cluster K. This vseg is registered in the VSL(P,K) when the thread descriptor is created in cluster K. To enforce locality, this vseg is of course mapped in cluster K.
    59  1. '''ANON''' : This '''public''' vseg is dynamically created by ALMOS-MK to serve an anonymous [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] system call executed by a client thread running in a cluster K. The first vseg registration and the physical mapping are done in the reference cluster KREF, but the vseg is mapped in the client cluster K.
    60  1. '''FILE''' : This '''public''' vseg is dynamically created by ALMOS-MK to serve a file based [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] system call executed by a client thread running in a cluster K. The first vseg registration and the physical mapping are done in the reference cluster KREF, but the vseg is mapped in cluster Y containing the file cache.
    61  1. '''REMOTE''' : This '''public''' vseg is dynamically created by ALMOS-MK to serve a remote [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] system call, where a client thread running in a cluster X requests to create a new vseg mapped in another cluster Y. The first vseg registration and the physical mapping are done by the reference cluster K, but the vseg is mapped in cluster Y specified by the user.
     52This '''private''' vseg contains the application code. It is replicated in all clusters. ALMOS-MK creates one CODE vseg per active cluster. For a process P, the CODE vseg is registered in the VSL(P,KREF) when the process is created in reference cluster KREF. In the other clusters K, the CODE vseg is registered in VSL(P,K) when a page fault is signaled by a thread of P running in cluster K. In each active cluster K, the CODE vseg is mapped in cluster K.
     53
     54=== 1.2 DATA vseg ===
     55 
     56This '''public''' vseg contains the user application global data. ALMOS-MK creates one single DATA vseg, that is registered in the reference VSL(P,KREF) when the process P is created in reference cluster KREF.  In the other clusters K, the DATA vseg is registered  in VSL(P,K) when a page fault is signaled by a thread of P running in cluster K. To avoid contention, this vseg is physically '''distributed''' on all clusters, with a page granularity. For each page, the physical mapping is defined by the LSB bits of the VPN.
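
A minimal sketch of this page-granularity distribution, assuming 4 Kbytes pages and a number of clusters that is a power of two. The `vpn_of()` and `cluster_of_vpn()` helpers are hypothetical names introduced for the illustration; the actual almos-mkh mapping function may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption (not from the almos-mkh sources): 4 Kbytes pages. */
#define SKETCH_PAGE_SHIFT 12u

/* Virtual page number of a virtual address. */
uint32_t vpn_of(uint32_t vaddr)
{
    return vaddr >> SKETCH_PAGE_SHIFT;
}

/* Index of the cluster holding a page of a distributed vseg:
 * the LSB bits of the VPN select the cluster, so that consecutive
 * pages are spread over all clusters.
 * Assumption: nb_clusters is a power of two. */
uint32_t cluster_of_vpn(uint32_t vpn, uint32_t nb_clusters)
{
    assert(nb_clusters != 0 && (nb_clusters & (nb_clusters - 1)) == 0);
    return vpn & (nb_clusters - 1);
}
```

With 4 clusters, the pages of VPN 0x403 and 0x404 land in clusters 3 and 0 respectively, so a linear scan of the DATA vseg touches all memory banks in turn.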
     57
     58=== 1.3 STACK vseg ===
     59
     60This '''private''' vseg contains the execution stack of a thread. Almos-mkh creates one STACK vseg for each thread of P running in cluster K. This vseg is registered in the VSL(P,K) when the thread descriptor is created in cluster K. To enforce locality, this vseg is of course mapped in cluster K.
     61
     62=== 1.4 ANON vseg ===
     63
     64This '''public''' vseg is dynamically created by ALMOS-MK to serve an anonymous [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] system call executed by a client thread running in a cluster K. The vseg is registered in VSL(P,KREF), but the vseg is mapped in the client cluster K.
     65
     66=== 1.5 FILE vseg ===
     67
     68This '''public''' vseg is dynamically created by ALMOS-MK to serve a file based [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] system call executed by a client thread running in a cluster K. The vseg is registered in VSL(P,KREF), but the vseg is mapped in cluster Y containing the file cache.
     69
     70=== 1.6 REMOTE vseg ===
     71
     72This '''public''' vseg is dynamically created by ALMOS-MK to serve a remote [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] system call, where a client thread running in a cluster X requests to create a new vseg mapped in another cluster Y. The vseg is registered in VSL(P,KREF), but the vseg is mapped in cluster Y specified by the user.
     73
     74This table summarizes the features of the user vsegs:
     75
     76|| Type        ||              ||                  ||    Access     ||    Replication     ||  Mapping in physical space       ||  Allocation policy in virtual  space              ||
     77|| STACK    ||  private  || localized   || Read Write  ||  one  per thread  || same cluster as thread using it  || dynamic (one stack allocator per cluster)   ||
     78|| CODE     ||  private  || localized   || Read Only   ||  one  per cluster || same cluster as thread using it  || static  (defined in .elf file)                           ||
     79|| DATA      ||  public   || distributed || Read Write ||  non replicated    || distributed on all clusters          || static  (defined in .elf file)                           ||
     80|| ANON     ||  public   || localized   || Read Write ||  non replicated    || same cluster as calling thread   || dynamic (one heap allocator per process)  ||
     81|| FILE        ||  public   || localized   || Read Write ||  non replicated    || same cluster as the file cache   || dynamic (one heap allocator per process)  ||
     82|| REMOTE ||  public   || localized   || Read Write ||  non replicated    || cluster defined by user              || dynamic (one heap allocator per process)  ||
    6283
    6384
     85== __ 2. kernel segments__==
    6486
    65 == __ 2. kernel segments types__==
     87For any process descriptor P in a cluster K, the VMM(P,K) contains not only the user vsegs defined above, but also the kernel vsegs, because all user threads can make system calls, that must access both the kernel instructions and the kernel data structures, and this requires address translation. This section describes the four types of kernel virtual segments defined by almos-mkh.
    6688
    67 For any process descriptor P in a cluster K, the VSL(P,K) and the GPT(P,K) contains not only the user vsegs, but also the kernel vsegs, because all user theads can make system calls, that must access to these kernel vsegs, and this requires address translation. This section describes the three types of kernel virtual segments defined by almost-mkh
     89=== 2.1. KCODE vsegs ===
    6890
    69 || Type        ||               ||                  ||    Access     ||      Replication                                     ||    Placement                              ||  Allocation policy in virtual space           ||
    70 || KCODE    ||  private  || localized   || Read Only   || one physical mapping per cluster       || same cluster as thread using it  || static  (defined in .elf file)                           ||
    71 || KDATA     ||  public   || localized   || Read Write || same mapping for all threads              || distributed on all cl         || static  (defined in .elf file)                            ||
    72 || KHEAP    || public
    73 || KDEV      ||  public   || localized   || Read Write || one physical mapping per thread        || same cluster as thread using it  || dynamic (one stack allocator per cluster)   ||
     91The KCODE vseg contains the kernel code defined in the ''kernel.elf'' file.  Almos-mkh creates one KCODE vseg in each cluster K, to avoid contention. It is a ''private'' vseg, that is accessed only by the threads running in cluster K. Such a thread can be a user thread executing a syscall, or a specialized kernel thread (such as an IDLE thread, a DEV thread, or an RPC thread). In each cluster K, the KCODE vseg is registered in the VMM(0,K) associated to the kernel ''process_zero'', that contains all kernel threads, and in the VMM(P,K) of each user process P that has at least one thread running in cluster K. This vseg uses only big pages, that are mapped by the kernel_init function (no on demand paging for this vseg).
    7492
    75  1. '''KCODE''' : This '''private''' vseg contains the kernel code. Almost-mkh creates one KCODE vseg per cluster. For a process P, the CODE vseg is registered in the VSL(P,Z) when the process is created in reference cluster KREF. In the other clusters K, the CODE vseg is registered in VSL(P,K) when a page fault is signaled by a thread of P running in cluster K. In each active cluster K, the CODE vseg is localized, and physically mapped in cluster K.
    76  1. '''KDATA''' : This '''public''' vseg contains the global data,  statically allocated at compilation time. The initial values are identical in all clusters, but th global data. ALMOS-MK creates one DATA vseg in each vseg, that is registered in the reference VSL(P,KREF) when the process P is created in reference cluster KREF.  In the other clusters K, the DATA vseg is registered  in VSL(P,K) when a page fault is signaled by a thread of P running in cluster K. To avoid contention, this vseg is physically distributed on all clusters, with a page granularity. For each page, the physical mapping is defined by the LSB bits of the page VPN.* The read-only segment containing the user code is replicated in all clusters where there is at least one thread using it.
     93=== 2.2. KDATA vsegs ===
    7794
    78 To enforce locality, there is one KDATA segment per cluster, containing a copy of all global variables statically allocated at compilation time. But these vsegs are not
    79 read-only, and can evolve differently in different clusters. On the other hand, all structures dynamically allocated by the kernel (to create a new process descriptor, a new thread descriptor, a new file descriptor, etc.) are allocated in the KHEAP segment of the target cluster, and will be mainly handled by a kernel instance running in this same kernel. Therefore, most kernel memory accesses expected to be local.
     95This '''public''' vseg contains the global data,  statically allocated at compilation time. This vseg is also replicated in all clusters. The values initially contained in these KDATA vsegs are identical, as they are defined in the ''kernel.elf'' file. But they are not read-only,  and can evolve differently in different clusters.  As the KDATA vsegs are replicated in all clusters, most accesses to a KDATA segment are expected to be done by local threads. These local accesses can use the normal pointers in virtual kernel space.
     96
     97But there is only one vseg defined in the ''kernel.elf'' file, and there are as many KDATA segments as clusters. Even if most accesses are local, a thread running in cluster K must be able to access a global variable stored in another cluster X, or to send a request to another kernel instance in cluster X, or to scan a globally distributed structure, such as the DQDT or the VFS. To support this cooperation between kernel instances, almos-mkh defines the ''remote_load( cxy , ptr )'' and ''remote_store( cxy , ptr )'' functions, where ''ptr'' is a normal pointer in kernel virtual space on a variable stored in the KDATA vseg, and ''cxy'' is the remote cluster identifier. Notice that a given global variable is now identified by an extended pointer ''XPTR( cxy , ptr )''. With these remote access primitives, any kernel instance in cluster K can access any global variable in any cluster.
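
The extended pointer idea can be illustrated by the sketch below. It rests on assumptions: the cluster identifier is packed in the upper 32 bits of a 64-bit value and the local pointer in the lower 32 bits, the KDATA replicas are simulated by a plain array, and the store function takes an explicit value argument. All names (`xptr_sketch_t`, `xptr_make()`, `remote_load_sketch()`, `remote_store_sketch()`) are hypothetical; the real primitives are those declared in hal_remote.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: an extended pointer packs a cluster identifier in the
 * upper 32 bits and a local 32-bit pointer in the lower 32 bits.
 * All names below are hypothetical, chosen for this sketch. */
typedef uint64_t xptr_sketch_t;

#define SKETCH_NB_CLUSTERS 4u
#define SKETCH_KDATA_WORDS 16u

/* Simulated KDATA replicas: one private copy per cluster. */
static uint32_t kdata[SKETCH_NB_CLUSTERS][SKETCH_KDATA_WORDS];

xptr_sketch_t xptr_make(uint32_t cxy, uint32_t ptr)
{
    return ((uint64_t)cxy << 32) | (uint64_t)ptr;
}

uint32_t xptr_cxy(xptr_sketch_t xp) { return (uint32_t)(xp >> 32); }
uint32_t xptr_ptr(xptr_sketch_t xp) { return (uint32_t)(xp & 0xFFFFFFFFu); }

/* In this simulation the "local pointer" is a word index in the
 * replica of the target cluster. */
uint32_t remote_load_sketch(uint32_t cxy, uint32_t ptr)
{
    assert(cxy < SKETCH_NB_CLUSTERS && ptr < SKETCH_KDATA_WORDS);
    return kdata[cxy][ptr];
}

void remote_store_sketch(uint32_t cxy, uint32_t ptr, uint32_t value)
{
    assert(cxy < SKETCH_NB_CLUSTERS && ptr < SKETCH_KDATA_WORDS);
    kdata[cxy][ptr] = value;
}
```

The same local pointer designates a different variable instance in each cluster: a store to (cluster 2, index 5) leaves the copy at (cluster 1, index 5) unchanged.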
     98
     99=== 2.3. KHEAP vsegs ===
     100
     101Besides the statically allocated global variables, a large number of kernel structures, such as the user ''process'' descriptors, the ''thread'' descriptors, the ''vseg'' descriptors, the ''file'' descriptors, etc. are dynamically allocated.
     102
     103=== 2.4. KDEV vsegs ===
     104
     105
     106==__3. Physical mapping of kernel vsegs__ ==
     107
     108The implementation of these remote access functions depends on the target architecture.
     109
     110
     111The values initially contained in this vseg are identical in all clusters, but this vseg being read/write,
     112the content can evolve differently in different clusters.
     113
     114The remote_access primitives are defined in the  [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/hal/generic/hal_remote.h  almos_mkh/hal/generic/hal_remote.h] file.
     115
     116The TSAR implementation is available in the  [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/hal/tsar_mips32/core/hal_remote.c  almos_mkh/hal/tsar_mips32/core/hal_remote.c] file.
     117
     118
     119=== 3.1 TSAR-MIPS32 ===
     120
     121As the TSAR architecture uses 32 bits cores, to reduce the power consumption, the virtual space is much  smaller (4 Gbytes) than the physical space.
     122Even if
     123
     124To enforce locality, there is one KDATA segment per cluster, containing a copy of all global variables statically allocated at compilation time. On the other hand, all structures dynamically allocated by the kernel (to create a new process descriptor, a new thread descriptor, a new file descriptor, etc.) are allocated in the KHEAP segment of the target cluster, and will be mainly handled by a kernel instance running in this same cluster. Therefore, most kernel memory accesses are expected to be local.
    80125
    81126In the - rare - situations where the kernel running in cluster K must access data in a remote cluster K' (to access a globally distributed structure such as the DQDT, or for inter-cluster client/server communication) almos-mkh uses specific remote access primitives defined in the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/hal/generic/hal_remote.h  hal_remote.h] file.
    82127
    83 === 2.1 TSAR-MIPS32 ===
     128
    84129
    85130In the TSAR architecture, and for any process P in any cluster K, almos-mkh registers only one extra KCODE vseg in the VMM(P,K), because almos-mkh does not use the DATA-MMU during kernel execution: each time a core enters the kernel, to handle a syscall, an interrupt, or an exception, the DATA-MMU is deactivated, and it is reactivated when the core returns to user code.
     
    90135
    91136
    92 === 2.2 Intel 64 bits ===
     137=== 3.2 Intel 64 bits ===
    93138
    94139TODO
    95140
    96 == __3. virtual space organisation__ ==
     141== __4. virtual space organisation__ ==
    97142
    98 === 3.1 TSAR-MIP32 ===
     143=== 4.1 TSAR-MIPS32 ===
    99144
    100145The virtual address space of a user process P is split into 5 fixed-size zones, defined by configuration parameters in [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h].  Each zone contains one or several vsegs, as described below.
    101146
    102 '''3.1.1 The ''kernel'' zone'''
     147'''4.1.1 The ''kernel'' zone'''
    103148
    104149It contains the ''kcode'' vseg (type KCODE), that must be mapped in all user processes.
    105150It is located in the lower part of the virtual space, and starts at address 0. Its size cannot be less than a big page size (2 Mbytes for the TSAR architecture), because it will be mapped as one (or several) big pages.
    106151 
    107 '''3.1.2 The ''utils'' zone'''
     152'''4.1.2 The ''utils'' zone'''
    108153
    109154It contains the two ''args'' and ''envs'' vsegs, whose sizes are defined by [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config. specific configuration parameters].  The ''args'' vseg (DATA type) contains the process main() arguments.
     
    111156It is located on top of the '''kernel''' zone, and starts at the address defined by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h  CONFIG_VMM_ELF_BASE] parameter.
    112157
    113 '''3.1.3 The ''elf'' zone'''
     158'''4.1.3 The ''elf'' zone'''
    114159
    115160It contains the ''text'' (CODE type) and ''data'' (DATA type) vsegs, defining the process binary code and global data. The actual vsegs base addresses and sizes are defined in the .elf file and reported in the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/tools/arch_info/boot_info.h boot_info_t] structure by the boot loader.
    116161
    117 '''3.1.4 The ''heap'' zone'''
     162'''4.1.4 The ''heap'' zone'''
    118163
    119164It contains all vsegs dynamically allocated / released  by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_mmap.c mmap] / [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_munmap.c munmap] system calls (i.e. FILE / ANON / REMOTE types).
    120165It is located on top of the '''elf''' zone, and starts at the address defined by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h CONFIG_VMM_HEAP_BASE]  parameter. The VMM defines a specific MMAP allocator for this zone, implementing the ''buddy'' algorithm. The mmap( FILE ) syscall maps directly a file in user space. The user level ''malloc'' library uses the mmap( ANON ) syscall to allocate virtual memory from the heap and map it in the same cluster as the calling thread. Besides the standard malloc() function, this library implements a non-standard [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/libs/libalmosmkh/almosmkh.c remote_malloc()] function, that uses the mmap( REMOTE ) syscall to dynamically allocate virtual memory from the heap, and map it to a remote physical cluster. 
    121166
    122 '''3.1.5 The ''stack'' zone'''
     167'''4.1.5 The ''stack'' zone'''
    123168
    124169It is located on top of the '''heap''' zone and starts at the address defined by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h CONFIG_VMM_STACK_BASE] parameter. It contains an array of fixed size slots, and each slot contains one ''stack'' vseg. The size of a slot is defined by the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kernel_config.h CONFIG_VMM_STACK_SIZE] parameter. In each slot, the first page is not mapped, in order to detect stack overflows. As threads are dynamically created and destroyed, the VMM implements a specific STACK allocator for this zone, using a bitmap vector. As the ''stack'' vsegs are private (the same virtual address can have different mappings, depending on the cluster) the number of slots in the '''stack''' zone actually defines the max number of threads for a given process in a given cluster.
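
A bitmap-based slot allocator of this kind can be sketched as below. The values are assumptions made for the illustration: 32 slots (so that one 32-bit word holds the whole bitmap), 4 Kbytes pages, and made-up base and slot size constants standing in for CONFIG_VMM_STACK_BASE and CONFIG_VMM_STACK_SIZE. The type and function names are hypothetical, not those of the almos-mkh STACK allocator.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up values for this sketch (see the assumptions above). */
#define SKETCH_STACK_BASE 0xC0000000u
#define SKETCH_STACK_SIZE 0x00200000u
#define SKETCH_NB_SLOTS   32u
#define SKETCH_PAGE_SIZE  4096u

/* One bit per slot: bit i set means slot i is in use. */
typedef struct { uint32_t bitmap; } stack_mgr_t;

/* Allocates the first free slot; returns its index, or -1 when all
 * slots are in use (max number of threads reached). */
int stack_slot_alloc(stack_mgr_t *mgr)
{
    for (uint32_t i = 0; i < SKETCH_NB_SLOTS; i++) {
        if ((mgr->bitmap & (1u << i)) == 0) {
            mgr->bitmap |= (1u << i);
            return (int)i;
        }
    }
    return -1;
}

/* Releases a slot when the thread using it is destroyed. */
void stack_slot_free(stack_mgr_t *mgr, int slot)
{
    assert(slot >= 0 && (uint32_t)slot < SKETCH_NB_SLOTS);
    mgr->bitmap &= ~(1u << slot);
}

/* Base address of the stack vseg in a slot: the first page of the
 * slot is skipped, as it stays unmapped to detect stack overflows. */
uint32_t stack_vseg_base(int slot)
{
    return SKETCH_STACK_BASE + (uint32_t)slot * SKETCH_STACK_SIZE + SKETCH_PAGE_SIZE;
}
```

A freed slot is reused by the next allocation, and the allocation fails once the 32 slots are taken, which is how the number of slots bounds the number of threads per process and per cluster in this sketch.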
    125170
    126 === 3.2 Intel 64 bits ===
     171=== 4.2 Intel 64 bits ===
    127172
    128173TODO