Changes between Version 12 and Version 13 of AtomicOperations


Timestamp: Dec 7, 2015, 3:54:11 PM (7 years ago)
Author: alain
Comment: --

== 1.  Goals ==

~~The TSAR architecture implements atomic read-then-write operations to support various software synchronization mechanisms. The constraints are the following:~~
 * ~~A software program must be able to read a data word at address X, test it, and write the (possibly modified) value at the same address X, with the guarantee that no other access to this data occurred between the read and the write.~~
 * ~~As we want to support commodity operating systems and existing software applications, any memory address can be the target of an atomic access.~~
 * ~~As the atomic access can be used to implement spin-locks, the lock address must be cacheable in order to benefit from the general coherence protocol and avoid unnecessary transactions on the interconnection network.~~
The TSAR architecture implements two atomic read-then-write operations to support various synchronization mechanisms:
 * The '''LL/SC''' (Linked Load / Store Conditional) operations are implemented as TWO specific Command/Response VCI transactions. As the LL and SC instructions belong to the MIPS32 instruction set, both kernel code and application code can use them to read a data word at address X, test it, and write the (possibly modified) value back to the same address X, with the guarantee that no other access to this address occurred between the read and the write.
 * The '''CAS''' (Compare and Swap) operation is implemented as ONE specific Command/Response VCI transaction. As there is no CAS instruction in the MIPS32 instruction set, this operation cannot be used by software. It is only used by some hardware components, such as the MMU contained in the L1 cache controller (to update the DIRTY bit in the page tables), or some DMA peripherals such as the vci_mwmr_dma component (to atomically access the lock protecting a shared communication channel).
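The CAS semantics can be illustrated with a minimal Python model, given as a sketch rather than as the TSAR implementation. It assumes a word-addressed memory in which each method call stands for one complete VCI Command/Response transaction, and shows how a hardware client such as the MMU could use CAS in a read/compare/retry loop to set a DIRTY bit. The class `Memory`, the function `set_dirty_bit`, the boolean success code and the `DIRTY_BIT` position are all hypothetical.

```python
# Hypothetical model of the CAS operation: one method call stands for one
# complete VCI Command/Response transaction handled by the memory side.

DIRTY_BIT = 1 << 6  # hypothetical DIRTY bit position in a page table entry


class Memory:
    """Word-addressed memory (address -> data word), zero by default."""

    def __init__(self):
        self.words = {}

    def read(self, addr):
        return self.words.get(addr, 0)

    def cas(self, addr, old, new):
        """Write `new` at `addr` only if the current value equals `old`.

        Returns True on success; on failure the memory is left unchanged.
        """
        if self.read(addr) != old:
            return False
        self.words[addr] = new
        return True


def set_dirty_bit(mem, pte_addr):
    """Set DIRTY_BIT in the entry at `pte_addr`; returns the final entry value."""
    while True:
        pte = mem.read(pte_addr)
        if pte & DIRTY_BIT:
            return pte  # already dirty: no write needed
        if mem.cas(pte_addr, pte, pte | DIRTY_BIT):
            return pte | DIRTY_BIT
        # The entry changed between the read and the CAS: read it again.
```

The retry branch is only taken when another agent writes the page table entry between the read and the CAS. With a single caller, as in this model, the first CAS always succeeds.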

== 2.  LL/SC mechanism ==

~~The TSAR memory sub-system supports the LL/SC mechanism. The LL & SC commands are defined in the VCI/OCP protocol, and the LL and SC instructions must be defined in the processor Instruction Set Architecture. This is natively the case for the MIPS32 & PowerPC processors.~~
~~On the direct network, the VCI CMD field can take four values: READ, WRITE, LINKED_LOAD (LL), and STORE_CONDITIONAL (SC). From a conceptual point of view, the atomicity is handled on the memory controller side, which must maintain a list of all pending atomic operations in a ''reservation table'':~~
As we want to support commodity operating systems and existing software applications, any memory address can be the target of an atomic access.
As the atomic access can be used to implement spin-locks, the address must be cacheable in order to benefit from the general coherence protocol and avoid unnecessary transactions on the interconnection network.
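A spin-lock on a cacheable address can be sketched as follows, assuming a deliberately simplified LL/SC memory (`LinkedMemory`) with one reservation per processor, cancelled by any write to the reserved address. This stand-in is not the key-based TSAR mechanism of section 2.1; it only gives the LL/SC loop something to run against. The `lock` function uses the common test-and-test-and-set pattern (spin on ordinary reads, attempt LL/SC only when the lock looks free), which is one way to exploit a cacheable lock address, not necessarily the pattern used by TSAR software. All names are hypothetical.

```python
class LinkedMemory:
    """Simplified LL/SC memory with at most one reservation per processor.

    Hypothetical stand-in, not the key-based TSAR mechanism.
    """

    def __init__(self):
        self.words = {}
        self.reservations = {}  # processor id -> reserved address

    def read(self, addr):
        return self.words.get(addr, 0)

    def write(self, addr, value):
        self.words[addr] = value
        # A write to addr cancels every reservation on addr.
        for cpu in [c for c, a in self.reservations.items() if a == addr]:
            del self.reservations[cpu]

    def ll(self, cpu, addr):
        self.reservations[cpu] = addr  # replaces any previous reservation of cpu
        return self.read(addr)

    def sc(self, cpu, addr, value):
        if self.reservations.get(cpu) != addr:
            return False  # reservation lost: the write is not done
        self.write(addr, value)
        return True


def try_lock(mem, cpu, lock_addr):
    """One LL/SC attempt: read the lock, test it, conditionally write it."""
    if mem.ll(cpu, lock_addr) != 0:
        return False  # already held
    return mem.sc(cpu, lock_addr, 1)


def lock(mem, cpu, lock_addr):
    """Test-and-test-and-set: spin on plain reads, use LL/SC only when free."""
    while True:
        while mem.read(lock_addr) != 0:
            pass  # with a cacheable lock, these reads can hit in the L1 cache
        if try_lock(mem, cpu, lock_addr):
            return


def unlock(mem, lock_addr):
    mem.write(lock_addr, 0)
```

Spinning on ordinary reads keeps the waiting processors off the interconnection network as long as the lock line stays valid in their L1 caches; the atomic LL/SC pair is only issued when the lock is observed free.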

=== 2.1 General principle ===

 * ~~When a processor, identified by its SRCID, executes the LL(X) instruction to an address X, the memory controller registers an entry (SRCID, X) in the reservation table, and returns the memory value stored at address X in the VCI RDATA field. If there was another reservation for the same processor SRCID, but for another address X', the previous reservation for X' is lost (the previous reservation is cancelled).~~
 * ~~When a processor, identified by its SRCID, executes the SC(X) instruction, there are two possibilities. If there is a valid reservation entry (SRCID, X), indicating that no other access to the X address has been received, the atomic operation is a success: the write is done, the memory cache controller returns a ''success'' value in the RDATA VCI field, and all entries in the reservation table for the X address are cancelled. If there is no valid reservation entry (SRCID, X) in the reservation table, the write is not done, and the memory cache returns a ''fail'' value in the RDATA field.~~
From a conceptual point of view, the atomicity is handled on the memory controller side, which in the TSAR architecture is the L2 cache controller. Each L2 cache controller keeps all pending LL/SC atomic operations in an associative ''reservation table'' containing 32 entries.

~~Clearly, in case of concurrent access, the winner is defined by the first SC instruction received by the memory controller.~~
 * When a processor P executes the LL(X) instruction to an address X, the L1 cache sends this reservation request to the L2 cache. The L2 cache allocates a 32-bit authentication key K for this reservation, registers both the X address and the K key in the associative ''reservation table'', and returns both the value stored at address X and the K value to the L1 cache. Both the X address and the K key are also registered in the L1 cache. If another processor P' requests a reservation for the same address X, it receives the same K value from the L2 cache.

=== ~~2.2 Failure / Success encoding~~ ===
 * When a processor P executes the SC(X,D) instruction to an address X, the L1 cache sends this conditional write to the L2 cache; the command also contains the reservation key K and the data D to be written. The L2 cache makes an associative search in the ''reservation table''. If both the address X and the key K match, the atomic operation is a success: the reservation is cancelled in the ''reservation table'', the D value is written at address X, and a ''success'' value is returned to the L1 cache in the response. If there is no match in the ''reservation table'', the atomic operation is a failure: the D value is not written at address X, the ''reservation table'' is not modified, and a ''failure'' value is returned to the L1 cache.
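The L2 side of the key-based protocol can be modeled as below. This is a hypothetical sketch: `L2ReservationTable` maps each reserved address X to its key K, so the associative search is a dictionary lookup. It covers only the LL and SC commands and, since no replacement policy is given here, simply raises an error when all 32 entries are in use.

```python
# Hypothetical model of the L2 side of the key-based LL/SC protocol.
# Only LL and SC commands are modeled.

TABLE_ENTRIES = 32          # reservation table size given in the text
KEY_MASK = (1 << 32) - 1    # keys are 32-bit values


class L2ReservationTable:
    def __init__(self):
        self.memory = {}    # address -> data word
        self.table = {}     # associative table: address X -> key K
        self.next_key = 0   # 32-bit key counter

    def ll(self, addr):
        """LL(X): return (value at X, key K). Same X gives same K while reserved."""
        if addr not in self.table:
            if len(self.table) >= TABLE_ENTRIES:
                # The replacement policy is not specified, so the sketch refuses.
                raise RuntimeError("reservation table full")
            self.table[addr] = self.next_key
            self.next_key = (self.next_key + 1) & KEY_MASK
        return self.memory.get(addr, 0), self.table[addr]

    def sc(self, addr, key, data):
        """SC(X, K, D): True (success) only if (X, K) matches a table entry."""
        if self.table.get(addr) != key:
            return False          # no match: no write, table not modified
        del self.table[addr]      # success: the reservation is cancelled
        self.memory[addr] = data
        return True
```

For instance, if P and P' both issue LL(X), they receive the same K. Whichever SC(X) reaches `sc` first returns True and deletes the entry, so the other SC(X), carrying the same K, returns False: the first SC(X) received by the L2 cache wins.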
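
Clearly, in case of concurrent LL/SC accesses to the same address X by two or more L1 caches, the winner is the L1 cache whose SC(X) command is received first by the L2 cache: this first SC(X) cancels the reservation, so the following SC(X) commands, carrying the same K key, find no match and fail.

=== 2.2 Key allocation policy ===
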
The key allocator is a simple 32-bit counter, incremented each time a new K value is allocated to satisfy an LL requiring a new reservation. As there is a finite number of key values, the mechanism requires that a K value allocated for a given reservation has a finite ''time of life''.
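The key allocator is a counter modulo 2**32. The hypothetical `KeyAllocator` class below shows the wrap-around that makes a bounded ''time of life'' necessary: after 2**32 allocations, the same K value is handed out again.

```python
KEY_MASK = (1 << 32) - 1  # keys are 32-bit values


class KeyAllocator:
    """Hypothetical 32-bit key counter, incremented on each new reservation."""

    def __init__(self, start=0):
        self.counter = start & KEY_MASK

    def allocate(self):
        key = self.counter
        self.counter = (self.counter + 1) & KEY_MASK  # wraps after 2**32 keys
        return key
```

Starting from 0xFFFFFFFE, three allocations return 0xFFFFFFFE, 0xFFFFFFFF and then 0: without a lifetime bound, an old reservation holding key 0 could be mistaken for the new one.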

 * In the ''reservation table'' implemented in the L2 cache, the maximum ''time of life'' of an entry containing the K value is defined by a maximum number of 2**31 allocated keys: each time a new K' value is allocated, any entry in the ''reservation table'' containing a K value such that (|K-K'| == 2**31) is invalidated.

 * In the L1 cache, the bounded ''time of life'' of the registered reservation is implemented as a cycle counter: the counter starts when a new reservation is registered in the L1 cache, and the reservation is invalidated when the counter reaches 2**31 cycles.
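Both lifetime rules can be written down directly. In the sketch below (hypothetical names), `must_invalidate` implements the L2 test |K-K'| == 2**31 on unsigned 32-bit keys, `allocate_and_expire` applies it to a table mapping addresses to keys, and `L1Reservation` ages the single L1 reservation with a cycle counter. Because the allocator increments modulo 2**32, the key allocated 2**31 steps after K is (K + 2**31) mod 2**32, and |K-K'| equals 2**31 whether or not the counter wrapped in between (for example K = 0xFFFFFFF0 gives K' = 0x7FFFFFF0).

```python
HALF_RANGE = 1 << 31  # 2**31


def must_invalidate(entry_key, new_key):
    """L2 rule: an entry with key K expires when K' with |K - K'| == 2**31 is allocated."""
    return abs(entry_key - new_key) == HALF_RANGE


def allocate_and_expire(table, new_key):
    """Drop every (address -> key) entry made too old by the allocation of new_key."""
    for addr in [a for a, k in table.items() if must_invalidate(k, new_key)]:
        del table[addr]


class L1Reservation:
    """L1 side: a cycle counter bounds the life of the single registered reservation."""

    MAX_CYCLES = HALF_RANGE

    def __init__(self):
        self.valid = False
        self.cycles = 0

    def register(self):
        self.valid = True
        self.cycles = 0  # the counter starts when a new reservation is registered

    def tick(self, n=1):
        """Advance the counter by n cycles; invalidate when it reaches 2**31."""
        if self.valid:
            self.cycles += n
            if self.cycles >= self.MAX_CYCLES:
                self.valid = False
```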

=== 2.3 Replacement policy ===

=== 2.4 Detailed specification for the L1 cache ===

Each L1 cache controller contains 4 registers to store one single LL/SC reservation:
 * physical address  : 40 bits
 * reservation key   : 32 bits
 * cycle counter     : 31 bits
 * valid reservation :  1 bit

We summarize below the actions performed by the L1 cache controller when it receives an LL(X), SC(X,D) or SW(X,D) request from the processor:
 * '''LL(X)''' : The L1 cache registers the X address and the K key in the reservation registers, activates the cycle counter, and sends a single-flit VCI LL command containing the X address to the L2 cache.
 * '''SC(X,D)''' : The L1 cache checks the X address against the registered address. In case of miss, it returns a ''failure'' code to the processor, without any VCI transaction on the network. In case of hit, it invalidates the local reservation and sends to the L2 cache a two-flit VCI SC command containing the X address, the registered K value, and the D value.
 * '''SW(X,D)''' : The L1 cache checks the X address against the registered address. In case of hit, the reservation is invalidated. In case of miss, the reservation is not modified. In both cases, the write request is sent to the L2 cache.
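The L1 actions listed above can be sketched as a small state machine over the four reservation registers. `L1LinkState` and the `StubL2` test double are hypothetical: the stub grants the same arbitrary key to every LL and logs each VCI command it receives, which makes it visible that an SC miss generates no network transaction. Here the K key is taken from the LL response, as described in section 2.1; the ageing of the cycle counter is not modeled.

```python
ADDR_MASK = (1 << 40) - 1  # 40-bit physical address register


class StubL2:
    """Hypothetical stand-in for the L2 cache; logs every VCI command received."""

    def __init__(self):
        self.memory = {}
        self.key = 0x1234   # arbitrary key returned to every LL
        self.commands = []  # log of (command, address) pairs

    def ll(self, addr):
        self.commands.append(("LL", addr))
        return self.memory.get(addr, 0), self.key

    def sc(self, addr, key, data):
        self.commands.append(("SC", addr))
        if key != self.key:
            return False
        self.memory[addr] = data
        return True

    def write(self, addr, data):
        self.commands.append(("WRITE", addr))
        self.memory[addr] = data


class L1LinkState:
    """The L1 reservation registers and the LL / SC / SW actions."""

    def __init__(self, l2):
        self.l2 = l2
        self.address = 0     # 40 bits
        self.key = 0         # 32 bits
        self.cycles = 0      # 31-bit cycle counter (ageing not modeled)
        self.valid = False   # 1 bit

    def _hit(self, addr):
        return self.valid and self.address == addr & ADDR_MASK

    def ll(self, addr):
        value, key = self.l2.ll(addr)  # single-flit LL command
        self.address = addr & ADDR_MASK
        self.key = key
        self.cycles = 0                # cycle counter activated
        self.valid = True
        return value

    def sc(self, addr, data):
        if not self._hit(addr):
            return False               # local failure, no VCI transaction
        self.valid = False             # local reservation invalidated
        return self.l2.sc(addr, self.key, data)  # two-flit SC command

    def sw(self, addr, data):
        if self._hit(addr):
            self.valid = False         # a store to X cancels the reservation
        self.l2.write(addr, data)      # the write is always sent to the L2
```

A typical sequence is `l1.ll(x)` followed by `l1.sc(x, d)`: the SC succeeds only if no SW to x and no other SC intervened, and a second SC without a new LL fails locally.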

=== 2.5 Detailed specification for the L2 cache ===

We summarize

=== 2.6 Failure / Success encoding ===

The actual encoding of the (success/failure) return value for an SC access depends on the processor core: For the MIPS2
     
}}}

== ~~3.  Cacheable atomic operations~~ ==
== 3.  CAS mechanism ==

~~In order to support cacheable spin-locks and better scalability, the TSAR memory cache controller and the L1 cache controller cooperate to implement the LL/SC mechanism. But the standard semantics of the LL/SC mechanism has to be modified:~~
 * ~~The LL operation is implemented by the L1 cache controller as a standard Read operation.~~
 * ~~The SC operation is implemented as a Compare and Swap operation.~~
~~Furthermore, the LL/SC mechanism is extended to support both 32-bit and 64-bit atomic accesses.~~

=== 3.1 New semantics ===
     
  The L1 cache is updated or invalidated.
}}}