wiki:replication_distribution

Version 11 (modified by alain, 6 years ago)

--

Data replication & distribution policy

The replication / distribution policy has two goals: avoid contention (the main goal), and enforce locality as much as possible.

  • The read-only segments (type CODE) are replicated in all clusters where they are used.
  • The private segments (type STACK) are placed in the same cluster as the thread using them.
  • The shared segments (types DATA, HEAP, etc.) are distributed over all clusters as evenly as possible to avoid contention.

To actually control data placement on the physical memory banks, the kernel uses the paged virtual memory MMU.

This policy is implemented by the Virtual Memory Manager (vmm.h / vmm.c files), a service replicated in all clusters for all processes. The VMM(P,K) is the virtual memory manager of process P in cluster K.

A vseg is a contiguous memory zone in the process virtual space. Its size is always an integer number of pages. Depending on its type, a vseg has specific attributes regarding access rights, replication policy, and distribution policy. The vseg descriptor is defined by the vseg_t structure (in the vseg.h file).
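As a rough illustration, a simplified vseg descriptor could look like the sketch below. This is an assumption for clarity only: the actual vseg_t in vseg.h contains more fields (list links, mapping information, etc.), and the names here are illustrative.

```c
#include <stdint.h>

/* Hypothetical, simplified vseg descriptor (NOT the actual vseg_t). */
typedef enum { VSEG_CODE, VSEG_DATA, VSEG_STACK,
               VSEG_HEAP, VSEG_REMOTE, VSEG_FILE } vseg_type_t;

typedef struct vseg_s {
    vseg_type_t type;      /* one of the six vseg types                 */
    uint32_t    vpn_base;  /* first virtual page number                 */
    uint32_t    vpn_size;  /* size in pages (always an integer number)  */
    uint32_t    flags;     /* access rights / replication attributes    */
} vseg_t;

/* returns 1 if a virtual page number falls inside the vseg */
static int vseg_contains(const vseg_t *vseg, uint32_t vpn)
{
    return (vpn >= vseg->vpn_base) &&
           (vpn <  vseg->vpn_base + vseg->vpn_size);
}
```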

The virtual memory manager VMM(P,K) contains a list of vsegs that can be accessed by the threads of P running in cluster K.

1) User segments types

  • A vseg is public when it can be accessed by any thread of the process, whatever the cluster where the thread is running. It is private when it can only be accessed by the threads running in the cluster containing the physical memory bank where this vseg is mapped. A private vseg is entirely mapped in one single cluster K. It is registered in the VSL of cluster K only, and not in the other clusters.
  • A vseg can be localised (all vseg pages are mapped in the same cluster), or distributed (different pages are mapped on different clusters, using the virtual page number (VPN) LSB bits as distribution key). A private vseg is always localised.
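The distribution key for a distributed vseg can be sketched in C. Assuming the number of clusters is a power of two, the target cluster of a page is simply the low-order bits of its VPN (the function name is illustrative, not the kernel's):

```c
#include <stdint.h>

/* Target cluster for a page of a distributed vseg: the VPN LSB bits
 * are used as the distribution key. Assumes nclusters is a power of 2. */
static uint32_t vpn_to_cluster(uint32_t vpn, uint32_t nclusters)
{
    return vpn & (nclusters - 1);
}
```

With 16 clusters, consecutive pages land on consecutive clusters, which spreads accesses to a large shared segment over all memory banks.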

For each process P, the process descriptor is replicated in all clusters containing at least one thread of P (called active clusters). The virtual memory manager VMM(P,K) is stored in the process descriptor, and contains two main structures: VSL(P,K) is the list of all vsegs registered for process P in cluster K; GPT(P,K) is the generic page table, defining the actual physical mapping of those vsegs. The replication of the VSL and GPT structures creates a coherence problem for non-private vsegs.

  • A VSL(P,K) contains all private vsegs in cluster K, but contains only the public vsegs that have been actually accessed by a thread of P running in cluster K. Only the reference process descriptor stored in the reference cluster Z contains the complete list VSL(P,Z) of all public vsegs for the P process.
  • A GPT(P,K) contains all entries corresponding to private vsegs. For public vsegs, it contains only the entries corresponding to pages that have been accessed by a thread running in cluster K. Only the reference cluster Z contains the complete page table GPT(P,Z) of all pages mapped in all clusters for process P.
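The caching role of the non-reference descriptors can be sketched as a two-level lookup: a miss in the local VSL falls back to the complete VSL of the reference cluster Z. All structures and names below are illustrative assumptions, not the actual kernel API (in particular, the real kernel would also register the vseg locally after a remote hit, which is not shown):

```c
#include <stddef.h>

/* Toy vseg and vseg list, for illustration only. */
typedef struct vseg_s { unsigned base, size; } vseg_t;

typedef struct vsl_s {
    vseg_t  *segs;   /* array of vsegs          */
    unsigned count;  /* number of vsegs in list */
} vsl_t;

/* linear search of a virtual address in one VSL */
static vseg_t *vsl_lookup(vsl_t *vsl, unsigned vaddr)
{
    for (unsigned i = 0; i < vsl->count; i++) {
        vseg_t *v = &vsl->segs[i];
        if (vaddr >= v->base && vaddr < v->base + v->size)
            return v;
    }
    return NULL;
}

/* local VSL(P,K) used as a cache: on a miss, query the complete
 * VSL(P,Z) of the reference cluster */
static vseg_t *vmm_resolve(vsl_t *local, vsl_t *reference, unsigned vaddr)
{
    vseg_t *v = vsl_lookup(local, vaddr);
    if (v == NULL)
        v = vsl_lookup(reference, vaddr);
    return v;
}
```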

Therefore, the process descriptors - other than the reference one - are used as read-only caches.

There exist six vseg types:

  type   | attributes           | comment
  CODE   | private / localised  | one per active cluster / same virtual addresses / same content
  DATA   | public / distributed | one per process
  STACK  | private / localised  | one per thread / in the same cluster as the thread
  HEAP   | public / distributed | one per mmap(anon) / also used by the malloc() library
  REMOTE | public / localised   | one per remote_malloc()
  FILE   | public / localised   | one per mmap(file) / in the same cluster as the file cache itself
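The attributes in the table above can be encoded as two predicates on the vseg type. A minimal sketch with illustrative names (the kernel likely stores these attributes as flags rather than computing them):

```c
/* The six vseg types, and the private/distributed attributes
 * from the table above (illustrative encoding). */
typedef enum { VSEG_CODE, VSEG_DATA, VSEG_STACK,
               VSEG_HEAP, VSEG_REMOTE, VSEG_FILE } vseg_type_t;

/* accessible only from the cluster where it is mapped */
static int vseg_is_private(vseg_type_t t)
{
    return (t == VSEG_CODE) || (t == VSEG_STACK);
}

/* pages spread over all clusters using the VPN LSB bits */
static int vseg_is_distributed(vseg_type_t t)
{
    return (t == VSEG_DATA) || (t == VSEG_HEAP);
}
```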

For a process P, the CODE and DATA vsegs are registered in the VSL of a cluster K when the first thread of P (the main thread) is created in cluster K. A STACK vseg is registered in the VSL of cluster K when the corresponding thread is created in cluster K. The HEAP, REMOTE, and FILE vsegs are registered in the VSL of the reference cluster Z during the mmap() and malloc() system calls, because only the reference cluster can dynamically allocate space in the process virtual space. They are registered in the VSL of the other clusters only when those clusters detect page faults on them (on-demand registration).

The page tables are updated progressively in response to page faults (on-demand paging).
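The on-demand mechanism can be illustrated with a toy page table in C: an entry is allocated and mapped only when the page is first touched. The flat-array GPT and the fake frame allocator are simplifications, not the kernel's actual structures:

```c
#include <stdint.h>

#define GPT_SIZE 16

/* toy generic page table: 0 means "not mapped" */
static uint32_t gpt[GPT_SIZE];
static uint32_t next_ppn = 1;   /* fake physical frame allocator */

/* on-demand paging: map the page only on its first access */
static uint32_t gpt_get_or_map(uint32_t vpn)
{
    if (vpn >= GPT_SIZE)
        return 0;               /* outside the toy address range */
    if (gpt[vpn] == 0)          /* page fault: entry not yet mapped */
        gpt[vpn] = next_ppn++;  /* allocate a frame and fill the entry */
    return gpt[vpn];
}
```

Repeated accesses to the same page reuse the existing mapping; only the first access pays the cost of the fault.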

2) User process virtual space organisation

The virtual space of a user process P in a given cluster K is split in four zones called vzones. Each vzone contains one or several vsegs.

  1. The utils vzone has a fixed size, and is located in the lower part of the virtual space. It contains the three vsegs kentry, args, envs, whose sizes are defined by configuration parameters. These vsegs are set by the kernel each time a new process is created. The kentry vseg has CODE type and contains the code that must be executed to enter the kernel from user space. The args vseg has DATA type, and contains the process main() thread arguments. The envs vseg has DATA type and contains the process environment variables.
  1. The elf vzone has a variable size, and contains the text and data vsegs holding the process binary code and global data. The size is defined in the .elf file and reported in the boot_info structure by the boot loader. It is located on top of the utils vzone.
  1. The stack vzone has a fixed size, and is located in the upper part of the virtual space. It contains as many vsegs of type STACK as the max number of threads for a process in a single cluster. The total size is defined as CONFIG_VSPACE_STACK_SIZE * CONFIG_PTHREAD_MAX_NR.
  1. The heap vzone has a variable size, and occupies all space between the top of the elf vzone and the base of the stack zone. It contains all vsegs of type HEAP, REMOTE or FILE that are dynamically allocated by the reference VMM manager.
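The layout above can be sketched as simple address arithmetic. All constants below are illustrative placeholders: the real values come from the configuration parameters (CONFIG_VSPACE_STACK_SIZE, CONFIG_PTHREAD_MAX_NR) and from the .elf file:

```c
#include <stdint.h>

/* Illustrative constants, NOT the kernel's actual configuration values. */
#define UTILS_BASE   0x00000000u
#define UTILS_SIZE   0x00100000u   /* kentry + args + envs (fixed size)   */
#define ELF_SIZE     0x00400000u   /* from the .elf file (variable size)  */
#define VSPACE_TOP   0x80000000u
#define STACK_ZONE   (0x00100000u * 8u)  /* stands for
                        CONFIG_VSPACE_STACK_SIZE * CONFIG_PTHREAD_MAX_NR */

static uint32_t elf_base(void)   { return UTILS_BASE + UTILS_SIZE; }
static uint32_t heap_base(void)  { return elf_base() + ELF_SIZE; }
static uint32_t stack_base(void) { return VSPACE_TOP - STACK_ZONE; }

/* the heap vzone absorbs all space between the elf and stack vzones */
static uint32_t heap_size(void)  { return stack_base() - heap_base(); }
```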

3) Segments used by the kernel

Since the kernel instances work only with physical addressing, the kernel segments are defined in the physical address space. As there is one kernel instance per core, the kernel segments - including the global data - are replicated in each cluster.

A kernel segment is private when it can be accessed only by the local kernel instance. It is public when it can be accessed by any kernel instance.

For now, the following segments are identified:

  • KDATA : private
  • KCODE : private
  • KSTACK : private
  • KHEAP : private
  • SHARED : public