Changes between Version 102 and Version 103 of processus_thread


Timestamp:
Jun 26, 2018, 2:57:25 PM (6 years ago)
Author:
nicolas.van.phan@…
Comment:

Add more links to referenced files and structures

  • processus_thread

    v102 v103  
    1414As ALMOS-MKH supports process migration, the '''reference cluster''' can be different from the '''owner cluster'''. The '''owner cluster''' cannot change (because the PID is fixed), but the '''reference cluster''' can change in case of process migration.
    1515
    16 In each cluster K, the local cluster manager ( [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/cluster.h cluster_t] type in ALMOS-MKH ) contains a process manager ( [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/cluster.h pmgr_t]  type in ALMOS-MKH ) that maintains three structures for all processes owned by K :
    17  * The '''PREF_TBL[lpid]''' is an array indexed by the local process index. Each entry contains an extended pointer on the reference process descriptor.
    18  * The '''COPIES_ROOT[lpid]''' array is also indexed by the local process index. Each entry contains the root of the global list of copies for each process owned by cluster K.   
    19  * The '''LOCAL_ROOT''' is the local list of all process descriptors in cluster K. A process descriptor copy of P is present in K, as soon as P has a thread in cluster K.
     16In each cluster K, the local cluster manager ( [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/cluster.h#L97 cluster_t] type in ALMOS-MKH ) contains a process manager ( [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/cluster.h#L75 pmgr_t]  type in ALMOS-MKH ) that maintains three structures for all processes owned by K :
     17 * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/cluster.h#L77 pref_tbl[lpid]] is an array indexed by the local process index. Each entry contains an extended pointer on the reference process descriptor.
     18 * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/cluster.h#L85 copies_root[lpid]] array is also indexed by the local process index. Each entry contains the root of the global list of copies for each process owned by cluster K.   
     19 * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/cluster.h#L81 local_root] is the local list of all process descriptors in cluster K. A process descriptor copy of P is present in K, as soon as P has a thread in cluster K.
    2020
    2121A process can be in four states:
     
    2525 * '''KILLED''' : the process received a SIGKILL signal. It will be destroyed by the parent process executing a wait() sys call.
    2626
    27 You can find below a partial list of information stored in a process descriptor ( [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/process.h process_t] in ALMOS-MKH ):
     27You can find below a partial list of information stored in a process descriptor ( [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/process.h#L111 process_t] in ALMOS-MKH ):
    2828- '''PID''' : process identifier.
    2929- '''PPID''' : parent process identifier,
     
    3939- '''CHILDREN_ROOT''' : root of the global list of child processes.
    4040
    41 All elements of a ''local'' list are in the same cluster, so ALMOS-MKH uses local pointers. Elements of a ''global'' list can be distributed on all clusters, so ALMOS-MKH uses extended pointers. 
     41All elements of a ''local'' list are in the same cluster, so ALMOS-MKH uses local pointers. Elements of a ''global'' list can be distributed on all clusters, so ALMOS-MKH uses extended pointers.
    4242
    4343== __2) Thread definition__ ==
    4444
    45 ALMOS-MKH defines in [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h thread_type_t] four types of threads :
     45ALMOS-MKH defines in [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L58 thread_type_t] four types of threads :
    4646 * one '''USR''' thread is created by a pthread_create() system call.
    4747 * one '''DEV''' thread is created by the kernel to execute all I/O operations for a given channel device.
    4848 * one '''RPC''' thread is activated by the kernel to execute pending RPC requests in the local RPC FIFO.
    49  * the  '''IDL''' thread is executed when there is no other thread to execute on a core.
     49 * the '''IDL''' thread is executed when there is no other thread to execute on a core.
    5050
    5151From the point of view of scheduling, a thread can be in three states : RUNNING, RUNNABLE or BLOCKED.
    5252
    53 In a given process, a thread is identified by a fixed format TRDID kernel identifier, coded on 32 bits : The 16 MSB bits (CXY) define the cluster K where the thread has been created. The 16 LSB bits (LTID) define the thread local index in the local TH_TBL[K,P] of a process descriptor P in a cluster K. This LTID index is allocated by the local process descriptor when the thread is created.
     53In a given process, a thread is identified by a fixed format TRDID kernel identifier, coded on 32 bits : The 16 MSB bits (CXY) define the cluster K where the thread has been created. The 16 LSB bits (LTID) define the thread local index in the local [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/process.h#L139 TH_TBL[K,P]] of a process descriptor P in a cluster K. This LTID index is allocated by the local process descriptor when the thread is created.
    5454
    5555Therefore, the TH_TBL(K,P) thread table for a given process in a given cluster contains only the threads of P placed in cluster K. The set of all threads of a given process is defined by the union of all TH_TBL(K,P) for all active clusters K.
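
As an illustration of the TRDID format described above, here is a minimal sketch in C. Only the 16/16 bit split (CXY in the MSB half, LTID in the LSB half) comes from the text; the type name `trdid_t` and the helper names are hypothetical and are not taken from the ALMOS-MKH sources.

```c
#include <stdint.h>

/* Hypothetical type name: a 32-bit kernel thread identifier. */
typedef uint32_t trdid_t;

/* Build a TRDID from the cluster identifier (CXY, 16 MSB bits)
 * and the local thread index (LTID, 16 LSB bits). */
static inline trdid_t trdid_make(uint16_t cxy, uint16_t ltid)
{
    return ((trdid_t)cxy << 16) | (trdid_t)ltid;
}

/* Extract the cluster where the thread was created. */
static inline uint16_t trdid_cxy(trdid_t trdid)
{
    return (uint16_t)(trdid >> 16);
}

/* Extract the index of the thread in the local TH_TBL[K,P]. */
static inline uint16_t trdid_ltid(trdid_t trdid)
{
    return (uint16_t)(trdid & 0xFFFFu);
}
```

With this layout, a lookup of a thread from its TRDID would first select the cluster with `trdid_cxy()`, then index the thread table of the process copy in that cluster with `trdid_ltid()`.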
     
    5858This implementation of ALMOS-MKH does not support thread migration: a thread is pinned on a given core in a given cluster. In the future process migration mechanism, all threads of a given process in a given cluster will be able to migrate to another cluster for load balancing. This mechanism is not implemented yet (February 2018), and will require distinguishing the kernel thread identifier (TRDID, which will be modified by a migration) from the user thread identifier (THREAD, which cannot be modified by a migration). In the current implementation, the user identifier (returned by the pthread_create() sys call) is identical to the kernel identifier.
    5959
    60 You can find below a partial list of information stored in a thread descriptor ([https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h thread_t] in ALMOS-MKH):
     60You can find below a partial list of information stored in a thread descriptor ([https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L134 thread_t] in ALMOS-MKH):
    6161 * '''TRDID''' : thread identifier
    6262 * '''TYPE''' : KERNEL / USER / IDLE / RPC
     
    115115
    116116Any user thread T of any process P, running in any cluster K, can create a new thread NT in any cluster K'. This creation is initiated by the ''pthread_create'' system call.
    117  * The target cluster K' can be specified by the user application, using the CXY field of the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/shared_include/shared_pthread.h pthread_attr_t] argument. If the CXY is not defined by the user, the target cluster K' is selected by the kernel K, using the DQDT.
     117 * The target cluster K' can be specified by the user application, using the CXY field of the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/shared_include/shared_pthread.h#L44 pthread_attr_t] argument. If the CXY is not defined by the user, the target cluster K' is selected by the kernel K, using the DQDT.
    118118 * The target core in cluster K' can be specified by the user application, using the CORE_LID field of the pthread_attr_t argument. If the CORE_LID is not defined by the user, the target core is selected by the target kernel K'.
    119119
     
    125125== __5) Thread destruction__ ==
    126126
    127 The destruction of a target thread T can be caused by another thread K, executing  the '''pthread_cancel()''' sys call (see [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_thread_cancel.c sys_thread_cancel.c]) requesting the target thread T to stop execution. It can be caused by the thread T itself, executing the '''pthread_exit()''' syscall (see [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_thread_exit.c sys_thread_exit.c]) to suicide. Finally, it can be caused by the ''exit()'' or ''kill()'' syscalls requesting the destruction of all threads of a given process.
     127The destruction of a target thread T can be caused by another thread K, executing  the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_thread_cancel.c pthread_cancel()] syscall requesting the target thread T to stop execution. It can be caused by the thread T itself, executing the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_thread_exit.c pthread_exit()] syscall to suicide. Finally, it can be caused by the ''exit()'' or ''kill()'' syscalls requesting the destruction of all threads of a given process.
    128128
    129129The only way to destroy a thread is to call the '''thread_kill()''' function, which sets the THREAD_FLAG_REQ_DELETE bit in the ''flags'' field of the target thread descriptor. The thread will be asynchronously deleted by the scheduler at the next scheduling point.
    130130The scheduler calls the ''thread_destroy()'' function that detaches the thread from the scheduler, detaches the thread from the local process descriptor, and releases the memory allocated to the thread descriptor. The '''thread_kill()''' function can be called by the target thread itself (for an exit), or by another thread (for a kill).
    131131
    132 If the target thread is running in attached mode, the '''thread_kill()''' function synchronizes with the joining thread, waiting the actual execution of the pthread_join() syscall (see [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_thread_join.c sys_thread_join.c]) before marking the target thread for delete.
     132If the target thread is running in attached mode, the '''thread_kill()''' function synchronizes with the joining thread, waiting the actual execution of the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_thread_join.c pthread_join()] syscall before marking the target thread for delete.
    133133
    134134If the target thread is the main thread (i.e. the thread 0 in the process owner cluster) the  '''thread_kill()''' does not mark the target thread for delete, because this must be done by the parent process main thread executing the ''sys_wait()'' syscall (see section [6] below).   
     
    138138The scenario is rather simple when the target thread T is not running in ATTACHED mode.
    139139The killer thread (that can be the target thread itself for an exit) calls the thread_kill() function that does the following actions:
    140  * the killer thread sets the THREAD_BLOCKED_GLOBAL bit in the target thread ''blocked'' field,
    141  * the killer thread sets the THREAD_FLAG_REQ_DELETE bit in the target thread ''flags'' field,
     140 * the killer thread sets the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L82 THREAD_BLOCKED_GLOBAL] bit in the target thread ''blocked'' field,
     141 * the killer thread sets the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L76 THREAD_FLAG_REQ_DELETE] bit in the target thread ''flags'' field,
    142142 * the killer thread returns without waiting for the actual deletion.
    143143
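
The three steps of the non-attached scenario can be sketched as follows. This is a simplified model in plain C11, assuming a single address space: the structure `thread_sketch_t`, the function names, and the numeric bit values are placeholders chosen for illustration, and the real kernel may have to reach a target thread descriptor located in a remote cluster.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values: the real ones are defined in thread.h. */
#define THREAD_BLOCKED_GLOBAL   0x0001u
#define THREAD_FLAG_REQ_DELETE  0x0001u

/* Hypothetical, reduced thread descriptor: only the two fields
 * used by the detached kill scenario are modeled. */
typedef struct {
    _Atomic uint32_t blocked;   /* blocking causes (bit vector) */
    _Atomic uint32_t flags;     /* thread flags (bit vector)    */
} thread_sketch_t;

/* Killer side: block the target, request its deletion, and return
 * immediately without waiting for the actual deletion. */
void thread_kill_detached(thread_sketch_t *target)
{
    atomic_fetch_or(&target->blocked, THREAD_BLOCKED_GLOBAL);
    atomic_fetch_or(&target->flags, THREAD_FLAG_REQ_DELETE);
}

/* Scheduler side: test performed at a scheduling point to decide
 * whether the thread must be destroyed. */
bool scheduler_must_delete(thread_sketch_t *thread)
{
    return (atomic_load(&thread->flags) & THREAD_FLAG_REQ_DELETE) != 0;
}
```

Setting the blocking bit before the delete request means the target can no longer be elected by the scheduler by the time the deletion request becomes visible.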
     
    148148This destruction mechanism can involve three threads: the target thread T, the killer thread K, and the joining thread J:
    149149
    150 It uses two specific fields in the thread descriptor: the ''join_lock'' field is a remote_spin_lock, and the ''join_xp'' field contains an extended pointer on the first arrived thread. It uses also two specific THREAD_FLAG_JOIN_DONE and THREAD_FLAG_KILL_DONE flags in the target thread descriptor ''flags'' field, and one specific blocking bit THREAD_BLOCKED_JOIN, in the ''blocked'' field.
     150It uses two specific fields in the thread descriptor: the ''join_lock'' field is a remote_spin_lock, and the ''join_xp'' field contains an extended pointer on the first arrived thread. It uses also two specific [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L72 THREAD_FLAG_JOIN_DONE] and [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L73 THREAD_FLAG_KILL_DONE] flags in the target thread descriptor ''flags'' field, and one specific blocking bit [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L86 THREAD_BLOCKED_JOIN], in the ''blocked'' field.
    151151
    152152 * Both the killer thread K, executing the thread_kill() function, and the joining thread J, executing the sys_thread_join() function, try to take the ''join_lock'' implemented in the T thread descriptor (the ''join_lock'' in the J thread is not used).
     
    166166The process descriptor copies (other than the process descriptor in the owner cluster) are simply deleted by the scheduler when the last thread of a given process in a given cluster is deleted. The process descriptor copy is removed from the list of copies in the owner process cluster descriptor, and the process copy disappears.
    167167
    168 The process destruction in the owner cluster is more complex, because the child process destruction must be reported to the parent process when the main thread of the parent process executes the blocking [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_wait.c sys_wait()] system call (in the parent owner cluster). Therefore, the child process in owner cluster cannot be destroyed before the parent calls the sys_wait() function. As the '''sys_wait()''' and the '''sys_kill()''' (or '''sys_exit()''') functions are executed by different threads running in different clusters, this requires a parent/child synchronization. To keep a process descriptor in ''zombie'' state after a sys-kill() or sys_exit(), the main thread (i.e. thread 0 in process owner cluster) is not deleted until the sys_wait() syscall is executed by the parent process main thread. This synchronization uses the '''term_state'''  field in process descriptor, that contains the following information :
    169  * The PROCESS_FLAG_KILL indicates that a KILL request has been received by the child; 
    170  * The PROCESS_FLAG_EXIT indicates that an EXIT request has been made by the child; 
    171  * The PROCESS_FLAG_BLOCK indicates that a SIGSTOP signal has been received by the child;
    172  * The PROCESS_FLAG_WAIT flag indicates that a WAIT  request from parent has been received by the child.
    173  * Moreover, for an exit(),  the exit() argument value is registered in this ''term_state'' field.
     168The process destruction in the owner cluster is more complex, because the child process destruction must be reported to the parent process when the main thread of the parent process executes the blocking [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/sys_wait.c#L34 sys_wait()] system call (in the parent owner cluster). Therefore, the child process in owner cluster cannot be destroyed before the parent calls the sys_wait() function. As the '''sys_wait()''' and the '''sys_kill()''' (or '''sys_exit()''') functions are executed by different threads running in different clusters, this requires a parent/child synchronization. To keep a process descriptor in ''zombie'' state after a sys_kill() or sys_exit(), the main thread (i.e. thread 0 in process owner cluster) is not deleted until the sys_wait() syscall is executed by the parent process main thread. This synchronization uses the '''term_state'''  field in process descriptor, that contains the following information :
     169 * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/shared_include/shared_wait.h#L36 PROCESS_TERM_KILL] indicates that a KILL request has been received by the child; 
     170 * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/shared_include/shared_wait.h#L37 PROCESS_TERM_EXIT] indicates that an EXIT request has been made by the child; 
     171 * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/shared_include/shared_wait.h#L35 PROCESS_TERM_BLOCK] indicates that a SIGSTOP signal has been received by the child;
     172 * The [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/syscalls/shared_include/shared_wait.h#L38 PROCESS_TERM_WAIT] flag indicates that a WAIT  request from parent has been received by the child.
     173 * Moreover, for an exit(),  the exit() argument value is registered in this [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/process.h#L147 term_state] field.
    174174
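
One possible encoding of the ''term_state'' field is sketched below. The flag names are those listed above, but the numeric values and the choice of the 8 low-order bits for the exit() argument are assumptions made for this example; the authoritative definitions are in shared_wait.h. The helper function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define PROCESS_TERM_BLOCK  0x100u   /* SIGSTOP received by the child  */
#define PROCESS_TERM_KILL   0x200u   /* KILL request received          */
#define PROCESS_TERM_EXIT   0x400u   /* EXIT request made by the child */
#define PROCESS_TERM_WAIT   0x800u   /* WAIT request from the parent   */

/* Assumption: the exit() argument is kept in the 8 low-order bits. */
#define TERM_EXIT_VALUE_MASK 0xFFu

/* Value written in term_state when the child calls exit(status). */
uint32_t term_state_on_exit(int status)
{
    return PROCESS_TERM_EXIT | ((uint32_t)status & TERM_EXIT_VALUE_MASK);
}

/* Exit value reported to the parent. */
uint32_t term_state_exit_value(uint32_t term_state)
{
    return term_state & TERM_EXIT_VALUE_MASK;
}

/* A child is terminated once a KILL or an EXIT has been registered;
 * a blocked (SIGSTOP) child is not terminated. */
bool term_state_is_terminated(uint32_t term_state)
{
    return (term_state & (PROCESS_TERM_KILL | PROCESS_TERM_EXIT)) != 0;
}
```

Keeping the flags and the exit value in a single word lets the parent read the complete termination state of a child with one access to the child owner process descriptor.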
    175 The actual deletion of the child owner process descriptor and child main thread are done by the sys_wait() function, executed by the parent main thread (i.e. thread 0 in parent owner cluster). This sys_wait() function executes an infinite loop. At each iteration, the parent main thread scans all children owner descriptors. When it detects that one child terminated, it sets the PROCESS_FLAG_WAIT in child owner process descriptor, sets the THREAD_FLAG_DELETE in the child main thread, and returns to report the child termination state to parent process. It is the responsibility of the parent process to re-enter the sys_wait() syscall for the other children. When the parent process does not detect a terminated child at the end of an iteration, it deschedules without blocking. 
     175The actual deletion of the child owner process descriptor and of the child main thread is done by the sys_wait() function, executed by the parent main thread (i.e. thread 0 in parent owner cluster). This sys_wait() function executes an infinite loop. At each iteration, the parent main thread scans all children owner descriptors. When it detects that one child terminated, it sets the PROCESS_TERM_WAIT flag in the child owner process descriptor, sets the [https://www-soc.lip6.fr/trac/almos-mkh/browser/trunk/kernel/kern/thread.h#L76 THREAD_FLAG_REQ_DELETE] flag in the child main thread, and returns to report the child termination state to the parent process. It is the responsibility of the parent process to re-enter the sys_wait() syscall for the other children. When the parent process does not detect a terminated child at the end of an iteration, it deschedules without blocking.
    176176
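
One iteration of the scan performed by the parent main thread can be modeled as below. This is a sketch under stated assumptions, not the code of sys_wait.c: the children are represented by a plain array of a hypothetical `child_sketch_t` structure, the flag values are placeholders, and no locking or remote access is shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder values, for illustration only. */
#define PROCESS_TERM_KILL       0x200u
#define PROCESS_TERM_EXIT       0x400u
#define PROCESS_TERM_WAIT       0x800u
#define THREAD_FLAG_REQ_DELETE  0x001u

/* Hypothetical view of one child: its owner process descriptor
 * term_state and the flags of its main thread. */
typedef struct {
    int      pid;
    uint32_t term_state;
    uint32_t main_thread_flags;
} child_sketch_t;

/* Scan all children once. If a terminated child that has not been
 * reported yet is found, mark it as waited, request the deletion of
 * its main thread, copy its termination state to *status, and return
 * its PID. Return -1 when no terminated child is found. */
int wait_scan_once(child_sketch_t *children, size_t count, uint32_t *status)
{
    for (size_t i = 0; i < count; i++) {
        child_sketch_t *child = &children[i];
        uint32_t ts = child->term_state;
        int terminated = (ts & (PROCESS_TERM_KILL | PROCESS_TERM_EXIT)) != 0;
        int reported   = (ts & PROCESS_TERM_WAIT) != 0;

        if (terminated && !reported) {
            child->term_state        |= PROCESS_TERM_WAIT;
            child->main_thread_flags |= THREAD_FLAG_REQ_DELETE;
            *status = ts;
            return child->pid;
        }
    }
    return -1;
}
```

In this model, the caller would repeat `wait_scan_once()` and yield the core whenever it returns -1, which corresponds to descheduling without blocking at the end of an unsuccessful iteration.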
    177177
    178 === 6.2) detailed exit scenario ===
     178=== 6.2) Detailed exit scenario ===
    179179
    180180This section describes the termination of a process caused by a sys_exit().
     
    187187 1. The main thread, and the owner process descriptor on one hand, the calling thread and the associated process will be destroyed by the scheduler at the next scheduling point.
    188188
    189 === 6.3) detailed kill scenario ===
     189=== 6.3) Detailed kill scenario ===
    190190
    191191This section describes the termination of a target process caused by a sys_kill(  SIGKILL ).