Version 26 (modified by alain, 5 years ago)

--

Remote Procedure Call

To enforce locality for complex operations requiring a large number of remote memory accesses, the various kernel instances can communicate using RPCs (Remote Procedure Calls), following the client/server model. This section describes the RPC mechanism implemented by ALMOS-MKH. The corresponding code is defined in the rpc.c and rpc.h files. The software FIFO implementing the communication channel is defined in the remote_fifo.c and remote_fifo.h files.

1) Hardware platform assumptions

The target architecture is clusterised: the physical address space is shared by all cores, but it is physically distributed, with one physical memory bank per cluster. The following assumptions are made:

  • The physical addresses - also called extended addresses - are encoded on 64 bits. Therefore the global physical address space cannot be larger than (4 G * 4 G) bytes, that is 2^64 bytes.
  • The max size of the physical address space in a single cluster is defined by the CONFIG_CLUSTER_SPAN configuration parameter, which must be a power of 2. It is 4 Gbytes for the TSAR architecture, but it can be larger for Intel based architectures.
  • For a given architecture, the physical address is therefore split in two fixed size fields: the LPA field (Local Physical Address) contains the LSB bits, and defines the physical address inside a given cluster. The CXY field (Cluster Identifier Index) contains the MSB bits, and directly identifies the cluster.
  • Each cluster can contain several cores (including 0), several peripherals, and a physical memory bank of any size (including 0 bytes). This is defined in the arch_info file.
  • There is one kernel instance in each cluster containing at least one core, one interrupt controller, and one physical memory bank.
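
The CXY / LPA split described above can be sketched in C as follows. This is only an illustration, assuming a 4 Gbytes cluster span (the TSAR value), hence a 32 bits LPA field. The names LPA_WIDTH, xaddr_build(), xaddr_cxy() and xaddr_lpa() are hypothetical and are not taken from the ALMOS-MKH sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: CONFIG_CLUSTER_SPAN is 4 Gbytes (TSAR), so the LPA field is 32 bits wide. */
#define LPA_WIDTH  32
#define LPA_MASK   ((UINT64_C(1) << LPA_WIDTH) - 1)

/* Build an extended address by concatenating a cluster identifier (MSB bits)
 * and a local physical address (LSB bits). */
static inline uint64_t xaddr_build(uint32_t cxy, uint64_t lpa)
{
    assert(lpa <= LPA_MASK);              /* the LPA must fit in its field */
    return ((uint64_t)cxy << LPA_WIDTH) | lpa;
}

/* Extract the cluster identifier from an extended address. */
static inline uint32_t xaddr_cxy(uint64_t xaddr)
{
    return (uint32_t)(xaddr >> LPA_WIDTH);
}

/* Extract the local physical address from an extended address. */
static inline uint64_t xaddr_lpa(uint64_t xaddr)
{
    return xaddr & LPA_MASK;
}
```

With a larger cluster span, only LPA_WIDTH would change in this sketch; the CXY field would then use the remaining MSB bits.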

2) Unique access point in cluster

ALMOS-MKH replicates the KDATA segment (containing the kernel global variables) in all clusters, using the same LPA (Local Physical Address) for the KDATA segment base. Therefore, in two different clusters, a given global variable, identified by its LPA, can have different values. This feature is used by ALMOS-MKH to allow a kernel instance in a cluster K to access the global variables of another kernel instance in cluster K', building a physical address by concatenation of a given LPA with the CXY cluster identifier of the target cluster K'.
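
The effect of this replication can be illustrated with a small user-space simulation. It is a sketch under stated assumptions, not kernel code: the per-cluster memory banks are modelled by a two-dimensional array, the LPA is reduced to a word index, and NB_CLUSTERS, KDATA_WORDS, remote_read() and remote_write() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define NB_CLUSTERS  4    /* hypothetical number of clusters          */
#define KDATA_WORDS  16   /* hypothetical KDATA size, in 32-bit words */

/* One KDATA replica per cluster: a given index (standing for the LPA)
 * exists in every cluster, but each replica holds its own value. */
static uint32_t kdata[NB_CLUSTERS][KDATA_WORDS];

/* Stand-in for a remote read: the pair (cxy, lpa) selects one replica
 * of one global variable. */
static uint32_t remote_read(uint32_t cxy, uint32_t lpa)
{
    assert(cxy < NB_CLUSTERS && lpa < KDATA_WORDS);
    return kdata[cxy][lpa];
}

/* Stand-in for a remote write to the replica located in cluster cxy. */
static void remote_write(uint32_t cxy, uint32_t lpa, uint32_t value)
{
    assert(cxy < NB_CLUSTERS && lpa < KDATA_WORDS);
    kdata[cxy][lpa] = value;
}
```

In this model, writing the variable at index 3 in cluster 1 leaves the variable at the same index in cluster 2 unchanged, which is the property the RPC_FIFO relies on.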

ALMOS-MKH defines a software RPC_FIFO, implemented as a global variable. This RPC_FIFO is a member of the "cluster_manager" structure, which is replicated in all clusters containing a kernel instance. It is used by client threads to register RPC requests to a given target cluster K.

This RPC_FIFO has a large number (N) of writers, and a small number (M) of readers:

  • N is the number of client threads (practically unbounded), running in any cluster, that can send an RPC request to a given target cluster K. To synchronize these multiple client threads, the RPC FIFO implements a "ticket based" policy, defining a first arrived / first served priority to register a new request into the FIFO.
  • M is the number of server threads (bounded by the CONFIG_RPC_THREAD_MAX parameter), created in cluster K to handle the RPC requests. To synchronize these multiple server threads (called RPC threads), the RPC FIFO implements a "light lock", that is a non blocking lock: only one RPC thread at a given time can take the lock and become the FIFO owner. This FIFO owner is in charge of handling pending RPC requests. When the light lock is taken by RPC thread T, another RPC thread T' failing to take the lock simply deschedules.
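
The two synchronization policies above can be sketched with C11 atomics. This is a simplified, single address space illustration and not the remote_fifo.c code: the structure layout, the FIFO_SLOTS depth and all function names are assumptions, and the waiting loop stands for the point where a kernel thread would deschedule.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SLOTS 8u   /* hypothetical depth */
static_assert((FIFO_SLOTS & (FIFO_SLOTS - 1u)) == 0, "FIFO_SLOTS must be a power of 2");

typedef struct {
    atomic_uint wr_ticket;          /* next ticket handed to a writer       */
    atomic_uint rd_count;           /* number of items consumed so far      */
    atomic_bool valid[FIFO_SLOTS];  /* slot holds an item not yet consumed  */
    uint64_t    data[FIFO_SLOTS];   /* payload (e.g. descriptor pointer)    */
    atomic_flag owner;              /* light lock: set while a reader owns  */
} rpc_fifo_t;

static void fifo_init(rpc_fifo_t *f)
{
    atomic_init(&f->wr_ticket, 0);
    atomic_init(&f->rd_count, 0);
    for (unsigned i = 0; i < FIFO_SLOTS; i++) atomic_init(&f->valid[i], false);
    atomic_flag_clear(&f->owner);
}

/* Writer side: each writer atomically takes a ticket, so slots are
 * assigned in arrival order (first arrived / first served). */
static void fifo_put(rpc_fifo_t *f, uint64_t item)
{
    unsigned ticket = atomic_fetch_add(&f->wr_ticket, 1);
    /* Wait until the slot of this ticket is free (a kernel would deschedule). */
    while (ticket - atomic_load(&f->rd_count) >= FIFO_SLOTS) { }
    f->data[ticket % FIFO_SLOTS] = item;
    atomic_store(&f->valid[ticket % FIFO_SLOTS], true);
}

/* Light lock: a single non blocking attempt, no waiting on failure. */
static bool fifo_try_own(rpc_fifo_t *f) { return !atomic_flag_test_and_set(&f->owner); }
static void fifo_release(rpc_fifo_t *f) { atomic_flag_clear(&f->owner); }

/* Reader side: must only be called by the current FIFO owner.
 * Returns false when no item is available. */
static bool fifo_get(rpc_fifo_t *f, uint64_t *item)
{
    unsigned slot = atomic_load(&f->rd_count) % FIFO_SLOTS;
    if (!atomic_load(&f->valid[slot])) return false;
    *item = f->data[slot];
    atomic_store(&f->valid[slot], false);
    atomic_fetch_add(&f->rd_count, 1);
    return true;
}
```

A reader that fails fifo_try_own() does not spin: in the policy described above it simply deschedules and leaves the pending requests to the current owner.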

3) Client blocking policy

All RPCs are blocking for the client thread. ALMOS-MKH implements a descheduling policy: the client thread blocks on the specific RPC_WAIT condition, and deschedules. It is unblocked by the RPC thread, when the RPC is completed.

In order to reduce the RPC latency for the client thread, the RPC thread completing an RPC request sends an IPI to force a scheduling on the core running the client thread. If there is no other thread with a higher priority, the client thread can be immediately rescheduled.

3) Parallel RPC requests handling

Each cluster contains a pool of RPC threads, in charge of RPC requests handling. As the scheduler takes into account the thread type, it can take into account the RPC_FIFO state for the RPC threads. An RPC thread is not runnable when the RPC_FIFO is empty. When the RPC_FIFO is not empty, only one RPC thread has the FIFO ownership and can consume RPC requests from the FIFO.

Nevertheless, ALMOS-MKH supports several RPC threads (up to CONFIG_RPC_THREAD_MAX per cluster), because a given RPC thread T handling a given request can block, waiting for a shared resource, such as a peripheral. In that case, the blocked RPC thread T releases the FIFO ownership before blocking and descheduling. This RPC thread T will complete the current RPC request when the blocking condition is resolved, and the thread T is rescheduled. If the RPC FIFO is not empty, another RPC thread T' will be scheduled to handle the pending RPC requests. If all existing RPC threads are blocked, a new RPC thread is dynamically created.

Therefore, there can be N simultaneously active RPC threads: the running one is the FIFO owner, and the (N-1) others are blocked on a wait condition.

4) Placement of RPC threads on cores

The client thread, after registering the RPC request in the target cluster FIFO, and before descheduling, sends an IPI to a specific core in the target cluster, and forces this core to run its scheduler. The RPC threads, being kernel threads, have a higher priority than the client threads, which are generally user threads. Therefore this IPI helps reduce the RPC latency for the client thread.

To distribute the load associated with RPC handling over all cores in a given target cluster, ALMOS-MKH implements the following policy: the client thread selects in the target cluster the core that has the same local index as the core running the client thread in the client cluster.
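
This selection policy is small enough to be written as a single function. The sketch below is illustrative only: rpc_select_target_core() is a hypothetical name, and the wrap-around used when the target cluster has fewer cores than the client cluster is an assumption, since the text does not say how that case is handled.

```c
#include <assert.h>
#include <stdint.h>

/* Return the local index of the core that receives the IPI in the target cluster.
 *   client_lid      : local index of the core running the client thread
 *   target_nb_cores : number of cores in the target cluster */
static uint32_t rpc_select_target_core(uint32_t client_lid, uint32_t target_nb_cores)
{
    /* A cluster running a kernel instance contains at least one core. */
    assert(target_nb_cores > 0);

    /* Same local index as the client core. Assumption (not stated in the
     * text): wrap around when the target cluster has fewer cores. */
    return client_lid % target_nb_cores;
}
```

When both clusters have the same number of cores, the function returns client_lid unchanged, which is the policy stated above.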

5) RPC request format

ALMOS-MKH implements two types of RPCs:

  • simple RPC: the client thread sends the RPC request to one single server, and waits for one single response.
  • multicast RPC: the client thread sends the same RPC request to several servers, and expects several responses.

Each entry in the RPC_FIFO contains a remote pointer (xptr_t) to an RPC descriptor (rpc_desc_t), which is stored on the client side (in the client thread stack). This RPC descriptor has a fixed format, and contains the following information:

  • The index field defines the required service type.
  • The args field is an array of 10 uint64_t, containing the service arguments (both input & output).
  • The thread field defines the client thread (used by the server thread to unblock the client thread).
  • The lid field defines the client core local index (used by the server thread to send the completion IPI).
  • The response field defines the number of expected responses.

The semantics of the args array depend on the RPC index field.

This format supports both simple and multicast RPCs: the client thread initializes the response field with the number of expected responses, and each server thread atomically decrements this counter when the RPC request has been satisfied.
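
The descriptor layout and the response counter can be sketched as follows. The field names follow the list above, but the field types, the field order, and the helper functions rpc_desc_init() and rpc_desc_complete() are assumptions made for this illustration; the actual rpc_desc_t definition is in rpc.h.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RPC_ARGS_MAX 10

typedef struct {
    uint32_t    index;               /* required service type                  */
    uint64_t    args[RPC_ARGS_MAX];  /* input and output arguments             */
    void       *thread;              /* client thread (opaque in this sketch)  */
    uint32_t    lid;                 /* client core local index                */
    atomic_uint response;            /* number of responses still expected     */
} rpc_desc_t;

/* Client side: nb_servers is 1 for a simple RPC, more for a multicast RPC. */
static void rpc_desc_init(rpc_desc_t *d, uint32_t index, void *thread,
                          uint32_t lid, unsigned nb_servers)
{
    assert(nb_servers >= 1);
    d->index  = index;
    d->thread = thread;
    d->lid    = lid;
    for (int i = 0; i < RPC_ARGS_MAX; i++) d->args[i] = 0;
    atomic_init(&d->response, nb_servers);
}

/* Server side: atomically decrement the counter once the request is satisfied.
 * Returns true only for the server delivering the last expected response,
 * which is the one that should unblock the client thread. */
static bool rpc_desc_complete(rpc_desc_t *d)
{
    return atomic_fetch_sub(&d->response, 1) == 1;
}
```

Because the decrement is atomic, exactly one server observes the transition from 1 to 0, even when several servers complete a multicast RPC at the same time.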

6) How to define a new RPC

To introduce a new RPC service rpc_new_service in ALMOS-MKH, you need to modify the rpc.c and rpc.h files:

  • You must identify (or possibly implement if it does not exist) the kernel function kernel_service() that you want to execute remotely. This function should not use more than 10 arguments.
  • You must register the new RPC in the enum rpc_index_t (in the rpc.h file), and in the rpc_server[] array (in the rpc.c file).
  • You must implement the marshaling function rpc_kernel_service_client(), which is executed on the client side by the client thread, in the rpc.c and rpc.h files. This function (1) registers the input arguments in the RPC descriptor, (2) registers the RPC request in the target cluster RPC_FIFO, (3) extracts the output arguments from the RPC descriptor.
  • You must implement the marshaling function rpc_kernel_service_server(), which is executed on the server side by the RPC thread, in the rpc.c and rpc.h files. This function (1) extracts the input arguments from the RPC descriptor, (2) calls the kernel_service() function, (3) registers the output arguments in the RPC descriptor.
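
The steps above can be sketched for a hypothetical kernel_service() that adds two integers. This is a self-contained illustration, not the ALMOS-MKH code: the descriptor is reduced to the index and args fields, the argument layout is arbitrary, and rpc_send() is a stand-in that serves the request in place instead of registering it in a remote RPC_FIFO and blocking the client.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced descriptor: only the fields needed by the marshaling functions. */
typedef struct { uint32_t index; uint64_t args[10]; } rpc_desc_t;

/* Step 2: register the new service in the index enum. */
typedef enum { RPC_KERNEL_SERVICE = 0, RPC_MAX_INDEX } rpc_index_t;

/* Step 1: hypothetical kernel function to execute remotely. */
static int kernel_service(uint32_t a, uint32_t b, uint32_t *sum)
{
    *sum = a + b;
    return 0;
}

/* Step 4: server side marshaling function. */
static void rpc_kernel_service_server(rpc_desc_t *rpc)
{
    uint32_t a = (uint32_t)rpc->args[0];      /* (1) extract input arguments  */
    uint32_t b = (uint32_t)rpc->args[1];
    uint32_t sum;
    int error = kernel_service(a, b, &sum);   /* (2) call the kernel function */
    rpc->args[2] = sum;                       /* (3) register the outputs     */
    rpc->args[3] = (uint64_t)error;
}

/* Step 2: register the server function in the rpc_server[] array. */
typedef void (*rpc_server_t)(rpc_desc_t *);
static rpc_server_t rpc_server[RPC_MAX_INDEX] = {
    [RPC_KERNEL_SERVICE] = rpc_kernel_service_server,
};

/* Stand-in for FIFO registration and client blocking: served in place. */
static void rpc_send(uint32_t cxy, rpc_desc_t *rpc)
{
    (void)cxy;                                /* target cluster unused here   */
    assert(rpc->index < (uint32_t)RPC_MAX_INDEX);
    rpc_server[rpc->index](rpc);
}

/* Step 3: client side marshaling function. */
static void rpc_kernel_service_client(uint32_t cxy, uint32_t a, uint32_t b,
                                      uint32_t *sum, int *error)
{
    rpc_desc_t rpc = { .index = RPC_KERNEL_SERVICE };
    rpc.args[0] = a;                          /* (1) register input arguments */
    rpc.args[1] = b;
    rpc_send(cxy, &rpc);                      /* (2) register the request     */
    *sum   = (uint32_t)rpc.args[2];           /* (3) extract the outputs      */
    *error = (int)rpc.args[3];
}
```

A caller would then write rpc_kernel_service_client(cxy, 40, 2, &sum, &error) exactly as it would call kernel_service() locally, with the target cluster identifier as an extra first argument.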