Changes between Version 13 and Version 14 of AtomicOperations


Timestamp: Dec 7, 2015, 8:24:30 PM
Author: alain

== 1.  Goals ==

The TSAR architecture implements two read-then-write atomic operations to support various synchronization mechanisms:
 * The '''LL/SC''' (Linked Load / Store Conditional) operation is implemented as two specific VCI transactions. As the LL and SC instructions are part of the MIPS32 instruction set, they can be used by both the kernel code and the application code to read a data word at address X, test it, and write the (possibly modified) value back to the same address X, with the guarantee that no other write to this address occurred between the read and the write access.

 * The '''CAS''' (Compare and Swap) operation is implemented as a single specific VCI transaction. As there is no CAS instruction in the MIPS32 instruction set, this operation cannot be used by software. It is only used by some hardware components, such as the L1 cache controller (to allow the MMU to update the DIRTY bit in the page tables), or by some DMA peripherals such as the vci_mwmr_dma component (to atomically access the lock protecting a shared communication channel).

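The CAS semantics can be modelled in C as below. This is a minimal behavioural sketch. It assumes that the memory controller performs the compare and the conditional write as one indivisible step. The names `cas_model`, `set_dirty` and `PTE_DIRTY_MASK`, the DIRTY bit position, and the choice of returning a success flag are hypothetical and are not taken from the TSAR VCI specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Behavioural model of a CAS(X, OLD, NEW) transaction as seen by the memory
 * controller. In hardware the compare and the conditional write form one
 * indivisible step; this sequential model simply assumes that no other
 * access can interleave between them. */
static bool cas_model(uint32_t *x, uint32_t expected, uint32_t desired)
{
    if (*x != expected)
        return false;       /* failure: memory is left unchanged */
    *x = desired;           /* success: the new value is written */
    return true;
}

/* Hypothetical position of the DIRTY flag in a page table entry,
 * chosen only for illustration. */
#define PTE_DIRTY_MASK (1u << 26)

/* Illustrative use, in the style of an MMU setting the DIRTY bit:
 * read the entry, build the new value, and retry if another agent
 * modified the entry in between. */
static void set_dirty(uint32_t *pte)
{
    for (;;) {
        uint32_t old = *pte;
        if (old & PTE_DIRTY_MASK)
            return;                         /* already dirty */
        if (cas_model(pte, old, old | PTE_DIRTY_MASK))
            return;
    }
}
```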
== 2.  LL/SC operation ==

As we want to support commodity operating systems and existing software applications, any memory address can be the target of an atomic access.
     
From a conceptual point of view, atomicity is handled on the memory controller side, which in the TSAR architecture is the L2 cache controller. Each L2 cache controller keeps all pending LL/SC reservations in an associative ''reservation table'' containing 32 entries.

 * When a processor P executes the LL(X) instruction for an address X, the L1 cache sends this reservation request to the L2 cache. The L2 cache allocates a 32-bit authentication key K for this reservation, registers both the address X and the key K in the associative ''reservation table'', and returns both the value stored at address X and the key K to the L1 cache. The address X and the key K are also registered in the L1 cache. If another processor P' requests a reservation for the same address X, it receives the same key K from the L2 cache.

 * When a processor P executes the SC(X,D) instruction for an address X, the L1 cache sends this conditional write to the L2 cache. The command contains both the reservation key K and the data D to be written. The L2 cache makes an associative search in the ''reservation table''. If both the address X and the key K match, the atomic operation is a success: the reservation is canceled in the ''reservation table'', the value D is written at address X, and a ''success'' value is returned to the L1 cache. If there is no match in the ''reservation table'', the atomic operation is a failure: the value D is not written at address X, the ''reservation table'' is not modified, and a ''failure'' value is returned to the L1 cache.

Clearly, in case of concurrent LL/SC accesses to the same address X by two or more L1 caches, the winner is the L1 cache whose SC(X) command is received first by the L2 cache. That SC cancels the shared reservation, so the SC commands arriving later fail.
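The following C sketch replays the scenario above using a single reservation slot. Two processors perform LL(X) and obtain the same key. The first SC succeeds, and the second one fails because the reservation has already been canceled. This is a sequential behavioural model, not the TSAR hardware. The names `l2_model`, `l2_ll` and `l2_sc`, as well as the counter-based key allocator, are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal model of ONE reservation slot for a single address X.
 * Key allocation by a simple counter is an assumption for illustration. */
struct l2_model {
    uint32_t mem;           /* data stored at address X */
    bool     valid;         /* a reservation on X is pending */
    uint32_t key;           /* key K of that reservation */
    uint32_t next_key;      /* hypothetical key allocator */
};

/* LL(X): reuse the pending reservation if any, otherwise allocate a key. */
static uint32_t l2_ll(struct l2_model *l2, uint32_t *data)
{
    if (!l2->valid) {
        l2->key = l2->next_key++;
        l2->valid = true;
    }
    *data = l2->mem;
    return l2->key;
}

/* SC(X,D,K): succeed only if the reservation still exists with key K. */
static bool l2_sc(struct l2_model *l2, uint32_t key, uint32_t d)
{
    if (!l2->valid || l2->key != key)
        return false;       /* failure: nothing is modified */
    l2->valid = false;      /* success: cancel the reservation ... */
    l2->mem = d;            /* ... and write D at address X */
    return true;
}
```

For instance, if P and P' both call `l2_ll` before either of them issues its SC, they receive the same key. `l2_sc` then returns true for whichever call comes first and false for the other.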
     
=== 2.3 Replacement policy ===

As the capacity of the ''reservation table'' is limited to 32 entries, the table can be full when a reservation request LL(X) is received by the L2 cache controller.
In that case, an existing reservation entry must be evicted from the associative table. To improve the robustness of the mechanism against malicious attacks, the victim selection algorithm is unbalanced: not all slots have the same probability of being evicted from the ''reservation table''. Some slots are selected with a high probability (1/2), and some other slots with a very low probability (1/4096).

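The exact victim selection algorithm is not given here. The C sketch below shows one hypothetical way to obtain such an unbalanced distribution. The victim index is the position of the lowest set bit of a 32-bit random value, so slot 0 is chosen with probability 1/2, slot 1 with probability 1/4, slot 11 with probability 1/4096, and higher slots even less often. This only illustrates the idea and is not the TSAR selection logic. The function name and the source of the random bits are assumptions.

```c
#include <stdint.h>

#define RESERVATION_SLOTS 32

/* Hypothetical unbalanced victim selection: return the index of the
 * lowest set bit of a 32-bit random value. P(slot i) = 2^-(i+1) for
 * i < 31, so slot 0 is evicted with probability 1/2 and slot 11 with
 * probability 1/4096. A zero random value maps to the last slot. */
static unsigned select_victim(uint32_t random_bits)
{
    unsigned slot = 0;
    if (random_bits == 0)
        return RESERVATION_SLOTS - 1;
    while ((random_bits & 1u) == 0) {
        random_bits >>= 1;
        slot++;
    }
    return slot;
}
```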
=== 2.4 Detailed specification for the L1 cache ===

Each L1 cache controller contains 4 registers to store a single reservation:
 * physical address  : 40 bits
 * reservation key   : 32 bits
     
=== 2.5 Detailed specification for the L2 cache ===

Each entry in the ''reservation table'' contains 3 fields to store one reservation:
 * physical address  : 40 bits
 * reservation key   : 32 bits
 * valid reservation :  1 bit

We summarize below the actions performed by the L2 cache controller when it receives an LL(X), SC(X,D,K) or SW(X,D) VCI command from an L1 cache controller:
 * '''LL(X)''' : The L2 cache makes an associative search on the X address in the ''reservation table''. In case of hit (X = Xr, where Xr is the address registered in a valid entry), the L2 cache returns both the value D stored at address X and the key K stored in the ''reservation table'' to the L1 cache. In case of miss, the L2 cache allocates a new K value from the key allocator, registers a new entry in the ''reservation table'' (this can require a victim eviction), and returns the D and K values to the L1 cache.
 * '''SC(X,D,K)''' : The L2 cache makes an associative search on both the X address and the K key in the ''reservation table''. In case of hit, the reservation is invalidated in the ''reservation table'', the value D is written at address X, and a ''success'' value is returned to the L1 cache. In case of miss, the value D is not written at address X, the ''reservation table'' is not modified, and a ''failure'' value is returned to the L1 cache.
 * '''SW(X,D)''' : The L2 cache makes an associative search on the X address in the ''reservation table''. As the write command can be a burst covering the address range Xmin to Xmax, the HIT condition for an entry containing an address Xr is actually Xmin <= Xr <= Xmax. In case of hit, the reservation Xr is invalidated. In case of miss, the entry containing Xr is not invalidated. In both cases, the value D is written at address X.

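The three rules above can be captured in a small sequential model of the ''reservation table''. This C sketch is a simplification in several ways. The names are hypothetical. Only the reservation bookkeeping is modelled, not the data values. Keys come from a simple counter. When the table is full, the model uses round-robin eviction as a placeholder rather than the unbalanced policy of section 2.3.

```c
#include <stdbool.h>
#include <stdint.h>

#define TABLE_SIZE 32

/* One entry of the reservation table (the 40-bit address is held in 64 bits). */
struct reservation {
    uint64_t addr;   /* physical address Xr */
    uint32_t key;    /* reservation key K */
    bool     valid;  /* valid reservation bit */
};

struct reservation_table {
    struct reservation entry[TABLE_SIZE];
    uint32_t next_key;       /* hypothetical key allocator */
    unsigned next_victim;    /* placeholder round-robin eviction pointer */
};

/* LL(X): return the key of an existing reservation on X, or create one. */
static uint32_t rt_ll(struct reservation_table *t, uint64_t x)
{
    int free_slot = -1;
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (t->entry[i].valid && t->entry[i].addr == x)
            return t->entry[i].key;                 /* hit: share key K */
        if (!t->entry[i].valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0) {                            /* table full: evict */
        free_slot = (int)t->next_victim;
        t->next_victim = (t->next_victim + 1) % TABLE_SIZE;
    }
    t->entry[free_slot] = (struct reservation){ x, t->next_key++, true };
    return t->entry[free_slot].key;
}

/* SC(X,D,K): success only if (X, K) matches a valid entry, which is then
 * invalidated. On success the caller would write D at address X. */
static bool rt_sc(struct reservation_table *t, uint64_t x, uint32_t k)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (t->entry[i].valid && t->entry[i].addr == x && t->entry[i].key == k) {
            t->entry[i].valid = false;
            return true;
        }
    }
    return false;
}

/* SW burst covering [xmin, xmax]: invalidate every reservation Xr in range.
 * The write itself always takes place and is not modelled here. */
static void rt_sw(struct reservation_table *t, uint64_t xmin, uint64_t xmax)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        if (t->entry[i].valid && xmin <= t->entry[i].addr && t->entry[i].addr <= xmax)
            t->entry[i].valid = false;
}
```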
=== 2.6 Failure / Success encoding ===

The actual encoding of the (success/failure) response for an SC(X,D) instruction depends on the processor core: for the MIPS2 and ARM processors, a success is encoded as a non-zero value. For the PPC processor, a success is encoded as a zero value.
In the TSAR architecture, the memory cache controller returns the value 0 for a success and the value 1 for a failure in response to an SC(X,D,K) VCI command.
If the architecture uses a MIPS or ARM processor, the SC value must therefore be transcoded by the L1 cache controller before being transmitted to the processor core.

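Below is a minimal sketch of this transcoding for a MIPS core. On that core, the SC instruction writes 1 into its register operand on success and 0 on failure. The macro and function names are hypothetical. In TSAR this conversion is done in hardware by the L1 cache controller, not in software.

```c
#include <stdint.h>

/* Response of the TSAR memory cache controller to an SC(X,D,K) command. */
#define L2_SC_SUCCESS 0u
#define L2_SC_FAILURE 1u

/* Hypothetical helper: convert the L2 response into the value a MIPS
 * core expects from SC (1 = success, 0 = failure). */
static uint32_t sc_transcode_mips(uint32_t l2_response)
{
    return (l2_response == L2_SC_SUCCESS) ? 1u : 0u;
}
```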
=== 2.7 Software example on MIPS32 processor ===

As shown below, the LL/SC mechanism can be used to implement a spin-lock on any memory address:
     
}}}

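For comparison, here is a portable C11 sketch of the same spin-lock idea. It assumes a compiler that maps `atomic_compare_exchange_weak_explicit` onto an LL/SC retry loop on MIPS32, as GCC typically does for this target. The `spinlock_t` type and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: 0 = free, 1 = taken. Any word in memory can
 * serve as the lock, since LL/SC accepts any address. */
typedef struct {
    atomic_uint word;
} spinlock_t;

static void spin_lock(spinlock_t *lock)
{
    for (;;) {
        unsigned expected = 0;
        /* On MIPS32 this typically compiles to an LL / test / SC loop.
         * The weak form may fail spuriously, like an SC that lost its
         * reservation, so a failure is simply retried. */
        if (atomic_compare_exchange_weak_explicit(&lock->word, &expected, 1u,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return;
        /* Wait with relaxed loads until the lock looks free. */
        while (atomic_load_explicit(&lock->word, memory_order_relaxed) != 0)
            ;
    }
}

static bool spin_trylock(spinlock_t *lock)
{
    unsigned expected = 0;
    return atomic_compare_exchange_strong_explicit(&lock->word, &expected, 1u,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static void spin_unlock(spinlock_t *lock)
{
    atomic_store_explicit(&lock->word, 0u, memory_order_release);
}
```

The inner read loop (test-and-test-and-set) keeps a waiting core on ordinary loads instead of issuing atomic requests repeatedly while the lock is held.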
== 3.  CAS operation ==
