Version 18 (modified by alain, 8 years ago)

--

Dynamic process and thread creation

1) Process

The PID (Process Identifier) is coded on 32 bits. It is unique in the system and has a fixed format: the 16 MSB (CXY) contain the owner cluster identifier, and the 16 LSB (LPID) contain the local process index in the owner cluster. The owner cluster is therefore defined by the 16 MSB of the PID.
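The fixed PID format described above can be sketched with a few helpers. This is only an illustration: the function names pid_make(), pid_get_cxy() and pid_get_lpid() are hypothetical and are not taken from the ALMOS-MK sources.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 32-bit PID: owner cluster (CXY) in the 16 MSB,
 * local process index (LPID) in the 16 LSB.
 * All names in this sketch are hypothetical. */
static inline uint32_t pid_make(uint16_t cxy, uint16_t lpid)
{
    return ((uint32_t)cxy << 16) | (uint32_t)lpid;
}

/* Owner cluster identifier: the 16 MSB of the PID. */
static inline uint16_t pid_get_cxy(uint32_t pid)
{
    return (uint16_t)(pid >> 16);
}

/* Local process index in the owner cluster: the 16 LSB of the PID. */
static inline uint16_t pid_get_lpid(uint32_t pid)
{
    return (uint16_t)(pid & 0xFFFFu);
}

/* Usage: a PID round-trips through the encoding. */
void pid_example(void)
{
    uint32_t pid = pid_make(0x0102, 0x0003);
    assert(pid == 0x01020003u);
    assert(pid_get_cxy(pid) == 0x0102);
    assert(pid_get_lpid(pid) == 0x0003);
}
```

Because the owner cluster is encoded in the PID itself, any kernel can find the owner of a process without a lookup table.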

To avoid contention, the descriptor of a process P and its associated structures, such as the Page Table or the Open File Descriptors Table, are (partially) replicated in every cluster containing at least one thread of P.

Since several copies of a process descriptor exist, ALMOS-MK defines a reference process descriptor, located in the reference cluster; the other copies are used as local caches.

Warning: to support process migration, the reference cluster can be different from the owner cluster.

In each cluster K, the local cluster manager (cluster_t type in ALMOS-MK) contains a process manager (pmgr_t type in ALMOS-MK) that maintains three structures for all processes owned by K:

  • The PREF_TBL[lpid] array is indexed by the local process index. Each entry contains an extended pointer to the reference process descriptor.
  • The COPIES_ROOT[lpid] array is also indexed by the local process index. Each entry contains the root of the global list of copies for one process owned by cluster K.
  • The LOCAL_ROOT is the root of the local list of all process descriptors in cluster K. A process descriptor copy of P is present in K as soon as P has a thread in cluster K.

The process descriptor (process_t in ALMOS-MK) contains the following information:

  • PID : process identifier.
  • PPID : parent process identifier.
  • XMIN, XMAX, YMIN, YMAX : rectangle covering all active clusters.
  • PREF : extended pointer on the reference process descriptor.
  • VMM : virtual memory manager containing the PG_TBL and the VSEG_LIST.
  • FD_TBL : open file descriptors table.
  • TH_TBL : local table of threads owned by this process in this cluster.
  • LOCAL_LIST : member of local list of all process descriptors in same cluster.
  • COPIES_LIST : member of global list of all copies of same process.
  • CHILDREN_LIST : member of global list of all children of same parent process.
  • CHILDREN_ROOT : root of global list of child processes.

All elements of a local list are in the same cluster, so ALMOS-MK uses local pointers. Elements of a global list can be distributed over all clusters, so ALMOS-MK uses extended pointers.
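The difference between local and extended pointers can be illustrated as follows. This is a sketch under assumptions: it supposes that an extended pointer is a 64-bit value holding the cluster identifier above a 32-bit local address. The layout and the names xptr_t, lptr_t, xptr_make(), xptr_get_cxy() and xptr_get_local() are chosen for illustration, not taken from this page.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t lptr_t;   /* local address inside one cluster (assumed 32 bits) */
typedef uint64_t xptr_t;   /* extended pointer: cluster id + local address (assumed layout) */

/* Pack a cluster identifier and a local address into an extended pointer. */
static inline xptr_t xptr_make(uint16_t cxy, lptr_t local)
{
    return ((xptr_t)cxy << 32) | (xptr_t)local;
}

/* Cluster that holds the pointed object. */
static inline uint16_t xptr_get_cxy(xptr_t xp)
{
    return (uint16_t)(xp >> 32);
}

/* Local address of the pointed object inside that cluster. */
static inline lptr_t xptr_get_local(xptr_t xp)
{
    return (lptr_t)(xp & 0xFFFFFFFFu);
}

/* Usage: an extended pointer round-trips through the packing. */
void xptr_example(void)
{
    xptr_t xp = xptr_make(0x0102, 0x80001000u);
    assert(xptr_get_cxy(xp) == 0x0102);
    assert(xptr_get_local(xp) == 0x80001000u);
}
```

A local list only needs the lptr_t part, since the cluster is implicit; a global list must carry the cluster identifier with every link.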

2) Thread

ALMOS-MK defines four types of threads:

  • a USER thread is created by a pthread_create() system call.
  • a KERNEL thread is dynamically created by the kernel to execute a kernel service in a cluster.
  • an RPC thread is activated by the kernel to execute one or several pending RPC requests.
  • the IDLE thread is executed when there is no other thread to execute on a core.

From the point of view of scheduling, a thread can be in six states, as described in section D.

This implementation of ALMOS-MK does not support thread migration: a thread is pinned to a given core in a given cluster. The only exception is the main thread of a process, which is automatically created by the kernel when a new process is created: this main thread follows its owner process in case of process migration.

A user thread is identified by a fixed-format TRDID identifier, coded on 32 bits: the 16 MSB (CXY) define the cluster where the thread has been pinned, and the 16 LSB (LTID) define the thread local index in the local TH_TBL(K,P) of the descriptor of process P in cluster K. This LTID index is allocated by the local process descriptor when the thread is created.
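The LTID allocation can be sketched as a search for a free slot in the local thread table, followed by the construction of the TRDID. The table capacity, the structure fields and the function name th_tbl_register() are assumptions made for this example, and the locking a real kernel would need is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TH_TBL_SIZE 8u            /* hypothetical capacity of one TH_TBL(K,P) */

typedef struct {
    uint32_t trdid;               /* CXY in the 16 MSB, LTID in the 16 LSB */
} thread_desc_t;                  /* hypothetical, reduced thread descriptor */

typedef struct {
    uint16_t       cxy;                   /* cluster K holding this copy */
    thread_desc_t *th_tbl[TH_TBL_SIZE];   /* NULL marks a free slot */
} process_copy_t;                 /* hypothetical, reduced process descriptor */

/* Register thread t in the local table of p: the first free index becomes
 * the LTID. Returns 0 on success, -1 when the table is full. */
int th_tbl_register(process_copy_t *p, thread_desc_t *t)
{
    assert(p != NULL && t != NULL);
    for (uint32_t ltid = 0; ltid < TH_TBL_SIZE; ltid++) {
        if (p->th_tbl[ltid] == NULL) {
            p->th_tbl[ltid] = t;
            t->trdid = ((uint32_t)p->cxy << 16) | ltid;
            return 0;
        }
    }
    return -1;
}
```

Since the LTID is only meaningful inside one TH_TBL(K,P), two threads of the same process in different clusters can share the same LTID and still have distinct TRDIDs.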

Therefore, the TH_TBL(K,P) thread table for a given process in a given cluster contains only the threads of P placed in cluster K. The set of all threads of a given process is the union of the TH_TBL(K,P) over all active clusters K. To scan the set of all threads of a process P, ALMOS-MK traverses the COPIES_LIST of all process descriptors associated with P.
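The scan over all threads of a process can be pictured as two nested loops: one over the copies of the process descriptor, one over the local thread table of each copy. In this sketch the COPIES_LIST is modelled by an ordinary pointer in a single address space, whereas the kernel links copies with extended pointers. The names proc_copy_t, copies_next and process_count_threads() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TH_TBL_SIZE 4u    /* hypothetical capacity of one TH_TBL(K,P) */

typedef struct proc_copy_s {
    const void         *th_tbl[TH_TBL_SIZE];  /* NULL marks a free slot */
    struct proc_copy_s *copies_next;          /* stand-in for the COPIES_LIST link */
} proc_copy_t;

/* Count the threads of a process by visiting every descriptor copy. */
unsigned process_count_threads(const proc_copy_t *first_copy)
{
    unsigned count = 0;
    for (const proc_copy_t *c = first_copy; c != NULL; c = c->copies_next) {
        for (size_t i = 0; i < TH_TBL_SIZE; i++) {
            if (c->th_tbl[i] != NULL) {
                count++;
            }
        }
    }
    return count;
}

/* Usage: two copies holding two threads and one thread respectively. */
void process_count_threads_example(void)
{
    int t0, t1, t2;
    proc_copy_t in_k2 = { .th_tbl = { &t2 }, .copies_next = NULL };
    proc_copy_t in_k1 = { .th_tbl = { &t0, NULL, &t1 }, .copies_next = &in_k2 };
    assert(process_count_threads(&in_k1) == 3);
}
```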

The thread descriptor (thread_t in ALMOS-MK) contains the following information:

  • TRDID : thread identifier
  • PTRDID : parent thread identifier
  • TYPE : KERNEL / USER / IDLE / RPC
  • FLAGS : thread attributes
  • PROCESS : pointer on the local process descriptor
  • STATE : CREATE / READY / USER / KERNEL / WAIT / DEAD
  • LOCKS_COUNT : current number of locks taken by this thread
  • PWS : save area for the core registers.
  • XLIST : member of the global list of threads waiting on the same resource.
  • SCHED : pointer on the scheduler in charge of this thread.
  • CORE : pointer on the owner processor core.
  • IO : allocated devices (in case of privately allocated devices).
  • SIGNALS : bit vector recording the signals received by the thread
  • etc.

3) Process creation

Process creation in a remote cluster implements the POSIX fork / exec mechanism. Note that ALMOS-MK implements one private scheduler per processor core.

3.1) fork()

The parent process P is running on a core in cluster K, and executes the fork() system call to create a new process F. The kernel uses the DQDT to select a remote cluster Z, which will become the owner of the F process. Cluster K will contain the reference process descriptor. The fork() system call executes the following steps:

  1. the kernel K allocates memory in cluster K to store the reference process descriptor of F, and gets a pointer to this process descriptor, using the process_alloc() function.
  2. the kernel K asks the kernel Z to allocate a PID for the F process, and to register the extended pointer to the process descriptor in the PREF_TBL(Z) of the cluster Z manager. This is done by the RPC_PROCESS_PID_ALLOC, which takes the process descriptor pointer as argument and returns the PID.
  3. after RPC completion, the kernel K initializes the F process descriptor from the information found in the descriptor of the parent process P. This includes the inherited ...
  4. the kernel K locally creates the main thread of process F, and registers this thread in the TH_TBL(K,F).
  5. the kernel K registers this new thread in the scheduler of the core executing the fork() system call, and returns.

At the end of the fork(), the owner cluster for the F process is cluster Z, and the reference cluster is K. This F process contains one single thread running on K.
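Step 2 of the fork() sequence can be sketched from the point of view of the owner cluster Z: the handler looks for a free entry in PREF_TBL, stores the extended pointer to the reference descriptor, and builds the PID from its own cluster identifier and the entry index. The table size, the structure layout, the XPTR_NULL value and the function name pmgr_pid_alloc() are assumptions made for this example; only the pmgr_t and PREF_TBL names come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREF_TBL_SIZE 4u          /* hypothetical number of processes per owner cluster */
#define XPTR_NULL     0ull        /* assumed encoding of a free entry */

typedef uint64_t xptr_t;          /* extended pointer (assumed 64 bits) */

typedef struct {
    uint16_t cxy;                         /* identifier of this cluster (Z) */
    xptr_t   pref_tbl[PREF_TBL_SIZE];     /* PREF_TBL[lpid]: reference descriptors */
} pmgr_t;                         /* reduced sketch of the process manager */

/* Allocate a PID in the owner cluster and register the reference descriptor.
 * Returns 0 and writes *pid on success, -1 when no LPID is free. */
int pmgr_pid_alloc(pmgr_t *pm, xptr_t ref_xp, uint32_t *pid)
{
    assert(pm != NULL && pid != NULL && ref_xp != XPTR_NULL);
    for (uint32_t lpid = 0; lpid < PREF_TBL_SIZE; lpid++) {
        if (pm->pref_tbl[lpid] == XPTR_NULL) {
            pm->pref_tbl[lpid] = ref_xp;
            *pid = ((uint32_t)pm->cxy << 16) | lpid;
            return 0;
        }
    }
    return -1;
}
```

After this call the PID designates Z as owner, while PREF_TBL in Z points back to the reference descriptor that lives in K.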

3.2) exec()

After a fork() system call, the F process can execute an exec() system call. This system call forces the F process to migrate from cluster K to the owner cluster Z, to execute new code while keeping the same PID. A new reference process descriptor must therefore be created and initialized in cluster Z, which becomes both the owner and the reference cluster of the F process. The old reference process descriptor in K must be deleted. The exec() system call executes the following steps:

  1. The kernel K sends an RPC_PROCESS_MIGRATE to cluster Z with the following arguments: the extended pointer to the F process descriptor in cluster K, the binary file, and ...
  2. To execute this RPC, the kernel Z allocates a new reference process descriptor in cluster Z, and initializes it from the information found in the process descriptor in cluster K, using a remote_memcpy().
  3. The kernel Z allocates and initializes the structures contained in the process VMM: PG_TBL(Z,F), VSEG_LIST(Z,F).
  4. The kernel Z creates the main thread associated with process F in cluster Z, initializes it, and registers it in the TH_TBL(Z,F).
  5. The kernel Z registers this thread in the scheduler of the core selected by the Z kernel, and acknowledges the RPC. When this thread starts execution, the binary code is loaded into the memory of cluster Z on demand, as page faults occur.
  6. When receiving the RPC acknowledgement, the kernel K destroys the F process descriptor and the associated thread in cluster K, which is no longer involved in the execution of process F.

At the end of the exec() system call, the cluster Z is both the owner and the reference cluster for process F. This F process contains one single thread in cluster Z.

Once these structures are initialized, the main thread of the child process is attached to the scheduler of the target core. The binary code (code and data segments) will be loaded into the memory of the target cluster when page faults are handled.

4) Thread creation

Any thread T of any process P, running in any cluster K, can create a new thread NT in any cluster M. This creation is driven by the pthread_create() system call. The target M cluster is called the host cluster. If the M cluster does not contain a process descriptor copy for process P (because the NT thread is the first thread of process P in cluster M), a new process descriptor must be created in cluster M.

  • The target cluster M can be specified by the user application, using the CXY field of the pthread_attr_t argument. If CXY is not defined (value larger than the max number of clusters), the target cluster M is selected by the kernel K, using the DQDT.
  • The target core in cluster M can be specified by the user application, using the CORE_LID field of the pthread_attr_t argument. If CORE_LID is not defined (value larger than the max number of cores in a cluster), the target core is selected by the target kernel M.
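Both placement rules follow the same pattern: use the value requested by the application if it is valid, otherwise fall back to the choice of the kernel. A minimal sketch follows, assuming for simplicity that CXY and CORE_LID are plain indexes, and that any value greater than or equal to the limit means "not defined". The constants and the name placement_resolve() are hypothetical; the kernel_choice argument stands for whatever the DQDT (for the cluster) or the target kernel M (for the core) would select.

```c
#include <assert.h>
#include <stdint.h>

#define NB_CLUSTERS          16u   /* hypothetical platform size */
#define NB_CORES_PER_CLUSTER  4u   /* hypothetical number of cores per cluster */

/* Return the requested index when it is below the limit,
 * otherwise the index selected by the kernel. */
static inline uint32_t placement_resolve(uint32_t requested,
                                         uint32_t limit,
                                         uint32_t kernel_choice)
{
    assert(limit > 0 && kernel_choice < limit);
    return (requested < limit) ? requested : kernel_choice;
}
```

For example, placement_resolve(attr_cxy, NB_CLUSTERS, dqdt_cxy) keeps a valid cluster requested by the application and falls back to the DQDT choice otherwise; attr_cxy and dqdt_cxy are placeholder variables here.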

4.1) phase 1

The kernel K selects a target cluster M, and sends an RPC_THREAD_USER_CREATE request to cluster M. The argument is a complete pthread_attr_t structure (defined in the thread.h file in ALMOS-MK), containing the PID, the function to execute and its arguments, and optionally the target cluster and target core. This RPC should return the TRDID of the new thread.

4.2) phase 2

To execute this RPC, the kernel M will make a local copy of the pthread_attr_t structure, and execute the following steps:

  1. The kernel M checks whether it contains a copy of the P process descriptor.
  2. If not, the kernel M creates a process descriptor copy from the reference P process descriptor, using a remote_memcpy(), and using cluster_get_reference_process_from_pid() to get the extended pointer to the reference process descriptor. It allocates memory for the associated structures PG_TBL(M,P), VSEG_LIST(M,P), FD_TBL(M,P). It (partially) initializes these structures by using remote_memcpy() from the reference cluster. The PG_TBL structure will be filled by the page faults.
  3. The kernel M registers this new process descriptor in the COPIES_LIST and LOCAL_LIST.
  4. When the local process descriptor is set, the kernel M selects the core that will execute the thread, allocates a TRDID to this thread, and creates the thread descriptor for NT.
  5. The kernel M registers the thread descriptor in the local process descriptor TH_TBL(M,P), and in the selected core scheduler.
  6. The kernel M returns the TRDID to the client cluster K, and acknowledges the RPC.
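Steps 1 to 3 amount to a "get or create" operation on the local copy of the process descriptor. The sketch below models it in a single address space: memcpy() stands in for remote_memcpy(), and the descriptor is reduced to a few fields. The field names and the function cluster_get_or_create_copy() are hypothetical, and the registration in the COPIES_LIST is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct process_s {
    uint32_t          pid;
    uint32_t          ppid;
    uint32_t          nthreads;     /* stand-in for the occupancy of TH_TBL */
    struct process_s *local_next;   /* LOCAL_LIST link (local pointer) */
} process_t;                        /* reduced sketch of the process descriptor */

typedef struct {
    process_t *local_root;          /* LOCAL_ROOT of this cluster */
} cluster_t;                        /* reduced sketch of the cluster manager */

/* Return the local copy of the process described by ref, creating it from
 * the reference descriptor when this cluster has none. NULL on allocation failure. */
process_t *cluster_get_or_create_copy(cluster_t *m, const process_t *ref)
{
    assert(m != NULL && ref != NULL);
    for (process_t *p = m->local_root; p != NULL; p = p->local_next) {
        if (p->pid == ref->pid) {
            return p;                       /* step 1: a copy already exists */
        }
    }
    process_t *copy = malloc(sizeof *copy);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, ref, sizeof *copy);        /* step 2: stands in for remote_memcpy() */
    copy->nthreads = 0;                     /* the local thread table starts empty */
    copy->local_next = m->local_root;       /* step 3: register in LOCAL_LIST */
    m->local_root = copy;
    return copy;
}
```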

5) Thread destruction

The destruction of a thread TRDID running in cluster K can be caused by the thread itself, with the pthread_exit() system call. It can also be caused by a signal (local or remote) requesting the thread to stop execution. In both cases, the host kernel K is in charge of the destruction.

5.1) phase 1

5.2) phase 2

5.3) phase 3

The kernel of cluster K registers the KILL signal in the thread descriptor of TRDID, using an atomic operation.

When this signal is handled, the thread TRDID is removed from the scheduler and from the TRDL(P,X), and the memory allocated to the thread descriptor in cluster K is released.
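Registering a signal atomically in a bit vector can be sketched with C11 atomics. The bit index chosen for KILL, the reduced descriptor and the function names are assumptions made for this example; the kernel may rely on its own atomic primitives rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SIG_KILL_BIT 9u            /* hypothetical position of KILL in SIGNALS */

typedef struct {
    _Atomic uint32_t signals;      /* SIGNALS bit vector of the thread descriptor */
} thread_sig_t;                    /* hypothetical, reduced thread descriptor */

/* Atomically set one signal bit, so that concurrent senders
 * (local or remote) never lose each other's updates. */
static inline void thread_signal_set(thread_sig_t *t, unsigned bit)
{
    assert(bit < 32);
    atomic_fetch_or(&t->signals, UINT32_C(1) << bit);
}

/* Check whether a given signal is pending for the thread. */
static inline bool thread_signal_pending(thread_sig_t *t, unsigned bit)
{
    assert(bit < 32);
    return ((atomic_load(&t->signals) >> bit) & 1u) != 0;
}
```

The target thread can then test thread_signal_pending() at a safe point, for instance when it is next scheduled, and start its own destruction.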

6) Process destruction

6.1) phase 1

If the exit() system call is executed in a cluster K different from the owner cluster Z, the kernel of cluster K sends a TASK_EXIT_RPC to the owner cluster Z. The argument is the PID of process P.

6.2) phase 2

To execute the TASK_EXIT_RPC, the kernel of cluster Z, owner of P, broadcasts a TASK_EXIT_RPC to all clusters M containing at least one thread of process P. In each cluster M, the kernel receiving a TASK_EXIT_RPC registers the KILL signal in the descriptors of all threads of P. When it detects that the TRDL(P,M) is empty, it releases all data structures allocated to process P in cluster M, and returns the response to cluster Z.

6.3) phase 3

When all responses to the TASK_EXIT_RPC have been received by cluster Z, it releases all data structures allocated to process P in cluster Z.