wiki:processus_thread

Version 15 (modified by alain, 8 years ago) (diff)

--

Dynamic process and thread creation

1) Process

The PID (Process Identifier) is coded on 32 bits. It is unique in the system, and has a fixed format: The 16 MSB (CXY) contain the owner cluster identifier. The 16 LSB bits (LPID) contain the local process index in owner cluster. The owner cluster is therefore defined by PID MSB bits.

To avoid contention, the process descriptor of a P process, and the associated structures, such as the Page Table, or the Open File Descriptors Table are (partially) replicated in all clusters containing at least one thread of P.

As it exists several copies of the process descriptors, ALMOS-MK defines a reference process descriptor, located in the reference cluster, and the other copies are used as local caches.

Warning : to support process migration, the reference cluster can be different from the owner process.

In each cluster K, the local cluster manager (cluster_t type in ALMOS-MK) contains a process manager (pmgr_t type in ALMOS-MK) that maintains three structures for all process owned by K :

  • The PREF_TBL[lpid]] is an array indexed by the local process index. Each entry contains an extended pointer on the reference process descriptor.
  • The COPIES_ROOT[lpid] array is also indexed by the local process index. Each entry contains the root of the global list of copies for each process owned by cluster K.
  • The LOCAL_ROOT is the root of the local list of all process descriptors in cluster K. A process descriptor copy of P is present in K, as soon as P has a thread in cluster K.

The process descriptor (process_t in ALMOS-MK) contains the following informations:

  • PID : proces identifier.
  • PPID : parent process identifier,
  • XMIN, XMAX, YMIN, YMAX : rectangle covering all active clusters.
  • PREF : extended pointer on the reference process descriptor.
  • VMM : virtual memory manager containing the PG_TBL and the VSEG_LIST.
  • FD_TBL : open file descriptors table.
  • TH_TBL : local table of threads owned by this process in this cluster.
  • LOCAL_LIST : member of local list of all process descriptors in same cluster.
  • COPIES_LIST : member of global list (globale) of all copies of same process.
  • CHILDREN_LIST : member of global list of all children of same parent process.
  • CHILDREN_ROOT : root of global list of children process.

Elements of a local list are in the same clusters, and ALMOS-MK uses local pointers. Elements of a global list can be distributed on all clusters, and ALMOS-MK uses extended pointers.

2) Thread

ALMOS-MK defines four types of threads :

  • one USER thread is created by a pthread_create() system call.
  • one KERNEL thread is dynamically created by the kernel to execute a kernel service in a cluster.
  • one RPC thread is activated by the kernel to execute one or several pending RPC requests.
  • the IDLE thread is executed when there is no other thread to execute on a core.

From the point of view of scheduling, a thread can be in six states, as described in section D.

This implementation of ALMOS-MK does not support thread migration: a thread is pinned on a given core in a given cluster. The only exception is the main thread of a process, that is automatically created by the kernel when a new process is created: This main thread follow its owner process in case of process migration.

An user thread is identified by a fixed format TRDID identifier, coded on 32 bits : The 16 MSB bits (CXY) define the cluster where the thread has been pinned. The 16 LSB bits (LTID) define the thread local index in the local TH_TBL[K,P] of a process descriptor P in a cluster K. This LTID index is allocated by the local process descriptor when the thread is created.

Therefore, the TH_TBL(K,P) thread table for a given process in a given clusters contains only the threads of P placed in cluster K. The set of all threads of a given process is defined by the union of all TH_TBL(K,P) for all active clusters K. To scan the set off all threads of a process P, ALMOS-MK traverse the COPIES_LIST of all process_descriptors associated to P process.

The thread descriptor (thread_t in ALMOS-MK) contains the following informations:

  • TRDID : thread identifier
  • PTRDID : parent thread identifier
  • TYPE : KERNEL / USER / IDLE / RPC
  • FLAGS : attributs du thread
  • PROCESS : pointer on the local process descriptor
  • STATE : CREATE / READY / USER / KERNEL / WAIT / DEAD
  • LOCKS_COUNT : current number of locks taken by this thread
  • PWS : zone de sauvegarde des registres du coeur.
  • XLIST : member of the global list of threads waiting on the same resource.
  • SCHED : pointer on the scheduler in charge of this thread.
  • CORE : pointer on the owner processor core.
  • IO : allocated devices (in case of privately allocated devices).
  • SIGNALS : bit vector permettant d’enregistrer les signaux reçus par le thread
  • etc.

3) Process creation

The process creation in a remote cluster implement the POSIX fork / exec mechanism. Notice that ALMOS-MK implements one private scheduler per processor core.

3.1) fork()

The parent process P is running on a core in cluster K, and execute the fork() system call to create a new process F on a remote cluster Z. The target cluster Z will become the owner of the F process. It is selected by the kernel using the DQDT. Cluster K will contain the reference process descriptor. The fork() sys call execute the following steps:

  1. the process P in cluster K allocates memory in K to store the reference process descriptor of F, and get a pointer on this process descriptor using the process_alloc() function.
  2. the process P in cluster K ask to kernel Z to allocate a PID for the F process, and to register the process descriptor extended pointer in PREF_TBL(Z) of cluster Z manager. This is done by the RPC_PROCESS_PID_ALLOC that takes the process descriptor pointer as argument and returns the PID.
  3. the process P in kernel K initializes the F process descriptor from informations found in the P parent process descriptor. This includes the inherited ...
  4. the process P in kernel K creates locally the main thread of process F, and register this thread in the TH_TBL(K,P),
  5. the process P in kernel K register this new thread in the scheduler of the core executing the fork() system call, an return.

At the end of the fork(), the owner cluster for the F process is cluster Z, and the reference cluster is K. This F process contains one single thread running on K.

3.2) exec()

After a fork() system call, the F process can execute an exec() system call. This system call forces the F process to migrate from the K cluster to the owner cluster Z, to execute a new code, while keeping the same PID. Therefore a new reference process descriptor must be created in the Z cluster, and initialized. The Z cluster will become both the owner and the reference cluster of the F process. The old reference process descriptor in K must be deleted. The exec() system call execute the following steps:

  1. The kernel K send an RPC_PROCESS_MIGRATE to cluster Z with the following arguments : the extended pointer on the F process descriptor in cluster K, the binary file, and ...
  2. To execute this RPC, the kernel Z allocates a new reference process descriptor in cluster Z, and initializes it from the

informations found in process descriptor in cluster K, using a remote_memcpy().

  1. The kernel Z allocates and initializes the structures contained in the process VMM: PG_TBL(Z,F), VSEG_LIST(Z,F).
  2. The kernel Z creates the main thread associated to process P in cluster Z, initializes it, and register it in the TH_TBL(Z,P).
  3. The kernel Z registers this thread in the scheduler of the core selected by the Z kernel an acknowledges the RPC. When this thread starts execution, the binary code will be loaded in the kernel Z memory, as required by the page faults.
  4. When receiving the RPC acknowledge, the kernel K destroy the F process descriptor and the associated thread in cluster K, that is nor anymore involved in process F execution.

At the end of the exec() system call, the cluster Z is both the owner and the reference cluster for process F. This F process contains one single thread in cluster Z.

process Une fois que les structures sont initialisées, le thread principal du processus fils est attaché à l'ordonnanceur du cœur cible. Le code binaire (segments code et data) sera chargé dans la mémoire du cluster cible, lors du traitement des défauts de page.

4) New thread creation

Any thread T of any process P, running in any cluster K, can create a new thread NT, on any cluster M, using the pthread_create() system call. The new thread NT will be placed on a core of the M cluster. If the M cluster does not contain a process descriptor copy for process P (because the NT thread is the first thread of process P in cluster M), a new process descriptor must be created in cluster M.

  • The target cluster M can be specified by the user application, using the CXY field of the pthread_attr_t argument. If the CXY is not defined (value larger than the max number of cluster), the target cluster M is selected by the kernel K, using the DQDT.
  • The Target core in cluster M can be specified by the user application, using the CORE_LID field of the pthread_attr_t argument. If the CORE_LID is not defined (value larger than the max number of cores in a cluster), the target core is selected by the M kernel.

Only the local kernel can create or destroy a thread in cluster M for process P, and modify the TH_TBL(M,P).

4.1) phase 1

Le noyau du cluster K connait l'identifiant du cluster Z propriétaire de P, puisque cet identifiant est contenu dans le PID. le noyau de K envoie une RPC_THREAD_USER_CREATE au cluster Z propriétaire. L'argument est une structure pthread_attr_t contenant le PID, le TRDID, la fonction à exécuter et ses arguments, et optionnellement le cluster cible et le core cible. Le noyau du cluster Z propriétaire de P, ne répond pas immédiatement au cluster demandeur K, mais envoie au cluster cible M une RPC_THREAD_USER_CREATE, avec les mêmes arguments.

4.2) phase 2

Le noyau du cluster M vérifie s’il possède une copie du descripteur du processus P. Si ce n’est pas le cas il crée et un nouveau descripteur du processus P dans M, et alloue les structures PT(P,M), VSL(P,M), FDT(P,M), TRDL(P,M). Il initialise les structures VSL et FDT en utilisant des remote_memcpy() depuis le cluster Z vers le cluster M. La structure PT(P,M) reste vide, et se remplira à la demande lors du traitement des défauts de page. Quand il est sûr de posséder une copie du descripteur de processus P, il sélectionne un coeur charger d’exécuter le nouveau thread T’, Il attribue un TRDID au thread à T’, crée un descripteur de thread et l’enregistre dans la TRDL(P,M). Finalement, il enregistre T’ dans l’ordonnanceur du coeur sélectionné, et renvoie le TRDID dans la réponse RPC au cluster Z.

4.3) Phase 3

Le noyau du cluster Z propriétaire de P alloue un descripteur de thread et l’enregistre dans sa TRDL(P,Z). Cette TRDL(P,Z) doit en effet contenir la totalité des descripteurs de threads du processus P. Puis le noyau de Z renvoie une réponse RPC au cluster K.

5) Destruction d’un thread

La destruction d’un thread TRDID s’exécutant sur un cluster K est déclenchée :

  • soit par le thread lui-même (appel système pthread_exit() ),
  • soit par le noyau du cluster hôte K où s’exécute le thread,
  • soit par un signal provenant d’un autre cluster.

Dans tous les cas, c’est le noyau du cluster hôte K, qui pilote cette destruction.

5.1) phase 1

Le noyau du cluster K envoie une THREAD_EXIT_RPC au cluster Z propriétaire du processus P, pour qu’il mette à jour la TRDL(P,Z) qui contient tous les thread de P. Les arguments sont le PID et le TRDID.

5.2) phase 2

Le noyau du cluster Z propriétaire de P met à jour sa TRDL(P,Z), en supprimant le thread TRDID, puis envoie la réponse RPC au cluster K.

5.3) phase 3

Le noyau du cluster K enregistre le signal KILL dans le descripteur de thread TRDID au moyen d’une opération atomique.

Lors du traitement de ce signal, le thread TRDID est éliminé de l’ordonnanceur, il est éliminé de la TRDL(P,X), et la mémoire allouée pour le descripteur de thread dans le cluster K est libérée.

5) Destruction d’un processus

La destruction d’un processus P est déclenchée :

  • soit par un thread du processus P s’exécutant sur un cluster X quelconque (appel système exit() ),
  • soit par le noyau du cluster Z propriétaire P,
  • soit par un signal provenant d’un autre cluster

Dans tous les cas, c’est le noyau du cluster Z propriétaire de P, qui pilote cette destruction.

5.1) phase 1

Si l’appel système exit() est exécuté sur un cluster K différent du cluster Z propriétaire, le noyau du cluster K envoie une TASK_EXIT_RPC vers le cluster Z propriétaire. L’argument est le PID du processus P.

5.2) phase 2

Pour exécuter la TASK_EXIT_RPC, le noyau du cluster Z propriétaire de P broadcaste une TASK_EXIT_RPC vers tous les clusters M qui contiennent au moins un thread du processus P. Dans chaque cluster M, le noyau recevant une TASK_EXIT_RPC enregistre le signal KILL dans les descripteurs detous les threadsde P. Quand il détecte que la TRDL(P,M) est vide, il libère toutes les structures de données allouées au processus P dans le cluster M, et retourne la réponse au cluster Z.

5.3) phase 3

Lorsque toutes les réponses au TASK_EXIT_RCP ont été reçues par le cluster Z, celui-ci libère toutes les structures de données allouées au processus P dans le cluster Z.