Changes between Version 1 and Version 2 of CacheCoherence


Timestamp:
Jul 1, 2009, 6:11:11 PM (15 years ago)
Author:
alain
Comment:

--

  • CacheCoherence

    v1 v2  
    1 [PageOutline]]
     1[[PageOutline]]
    22
    33= Cache coherence protocol =
     
    1414In the TSAR architecture, the memory controller is distributed: it is implemented by the distributed memory caches (one per cluster). Therefore, the global directory itself is distributed. The memory cache is inclusive: a cache line L that is present in at least one L1 cache must be present in the corresponding memory cache (in the home cluster). With this property, the Global Directory can be implemented as an extension of the memory cache directory.
    1515
    3416On a MISS, the memory cache controller must evict a victim line to bring in the missing line. To maintain the inclusive property, all copies of the evicted cache line in L1 caches must be invalidated. To do so, the memory cache controller must send INVALIDATE requests to all L1 caches containing a copy.
    3517
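The eviction rule above can be illustrated with a minimal sketch. The names `MemCacheLine`, `l1_copies` and `evict` are hypothetical and chosen only for illustration; a real controller would also wait for the L1 responses before reusing the slot, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class MemCacheLine:
    """One line of the memory cache (hypothetical model, not the TSAR RTL)."""
    address: int
    # Identifiers of the L1 caches assumed to hold a copy of this line.
    l1_copies: set = field(default_factory=set)


def evict(victim: MemCacheLine) -> list:
    """Return the INVALIDATE requests required to evict `victim`.

    One request is produced per L1 cache holding a copy, so that no L1
    copy survives the eviction and the inclusive property is preserved.
    """
    requests = [("INVALIDATE", l1_id, victim.address)
                for l1_id in sorted(victim.l1_copies)]
    victim.l1_copies.clear()
    return requests
```

For example, evicting a line held by L1 caches 1 and 3 yields two INVALIDATE requests, one per holder.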
    36 In summary, there are two types of coherence transactions sent by the memory cache controller to the L1 cache controllers: UPDATE requests, in case of a WRITE to a multi-replicated cache line, and INVALIDATE requests, in case of line eviction by the memory cache.
     18The TSAR architecture aims to guarantee cache coherence in hardware, for both the data and instruction caches (L1 caches). Reflecting the different behaviour of data and instruction caches, the DHCCP protocol defines different strategies, depending on the number of copies:
     19 * Regarding the data, modifications of shared data are very frequent events, but, on average, the number of copies is not very high. Therefore, the DHCCP protocol preferably uses a ''multicast/update'' strategy for the data caches.
     20 * Regarding the instructions, modifications of shared code are rather rare events (in case of self-modifying code, or dynamic libraries), but the number of replicated copies can be very large (the system call handler or the libc are likely to be replicated in all L1 caches). Therefore, the DHCCP protocol generally uses a ''broadcast/invalidate'' policy for instruction caches.
    3721
    38 The TSAR architecture aims to guarantee cache coherence in hardware, for both the DATA and INSTRUCTION caches (L1 caches). Reflecting the different behaviour of data & instruction caches, the coherence policy is different for data and instructions. Regarding the instructions, the modifications of shared code are rather rare events (in case of self-modifying code, or dynamic libraries), but the number of replicated copies can be very large (the system call handler or the libc are likely to be replicated in all L1 caches). Therefore, the TSAR architecture implements a broadcast/invalidate policy for instructions. Regarding the data, the modifications of shared data are very frequent events, but, on average, the number of copies is not very high. Therefore, the TSAR architecture implements a multicast/update policy for data.
    39 4.2     Types of transaction
     22== 2.  Types of transaction ==
    4023
    41 Three types of transactions, have been identified in the TSAR architecture
    42 -       Direct transactions : READ / WRITE / LL / SC
    43 -       Coherence transactions : UPDATE / INVALIDATE / CLEANUP
    44 -       External Transactions : PUT / GET
    45 -       
     24Three types of transactions have been identified:
     25 * Direct transactions : READ / WRITE / LL / SC
     26 * Coherence transactions : UPDATE / INVALIDATE / CLEANUP
     27 * External Transactions : PUT / GET
     28       
    4629For deadlock prevention, these three types of transaction must be transported on three (virtually or physically) separate networks.
    4730
    4831As a general rule, all these transactions respect the VCI advanced packet format, and there is one response packet for each command packet: for a burst transaction, a READ command packet contains one single flit, and the corresponding READ response packet contains N flits. Symmetrically, a WRITE command packet contains N flits, and the corresponding WRITE response packet contains one single flit.
    4932
    50 There is one exception : For a BROADCAST_INVALIDATE transaction, the initiator sends one single flit VCI packet, but receives several single flit VCI response packets (see section 4.2.2). 
    51 4.2.1   READ / WRITE / LL / SC
    52 Those transactions are initiated by a processor (actually the L1 cache controller), or by another initiator ( an I/O peripheral with a DMA capability, or a specialized hardware coprocessor). This initiator can be located in any cluster. For those transactions, the target is a memory cache controller, acting as a physical memory bank, or another VCI target peripheral. This target can be located in any cluster.
     33There is one exception : For a BROADCAST_INVALIDATE transaction, the initiator sends one single flit VCI packet, but receives several single flit VCI response packets (see section 2.2).
     34 
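As a small illustration of the general packet rule stated above (one response packet per command packet, with the burst carried by the READ response or by the WRITE command), the flit counts can be sketched as follows. The function name `packet_flits` is hypothetical, and the BROADCAST_INVALIDATE exception is deliberately not modelled.

```python
def packet_flits(kind: str, n_words: int) -> tuple:
    """Return (command_flits, response_flits) for a burst of n_words.

    READ:  single-flit command, N-flit response.
    WRITE: N-flit command, single-flit response.
    """
    if n_words < 1:
        raise ValueError("a transaction carries at least one word")
    if kind == "READ":
        return (1, n_words)
    if kind == "WRITE":
        return (n_words, 1)
    raise ValueError(f"unsupported transaction kind: {kind}")
```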
     35=== 2.1  READ / WRITE / LL / SC ===
     36 
     37These transactions are initiated by a processor (actually the L1 cache controller), or by another initiator ( an I/O peripheral or hardware coprocessor with a DMA capability). This initiator can be located in any cluster. For those transactions, the target is a memory cache controller, acting as a physical memory bank, or another VCI target peripheral. This target can be located in any cluster.
    5338
    54 •       A READ transaction can be a single word request (in case of uncached access), or a burst, corresponding to a complete cache line (16 words). A READ burst transaction initiated by any DMA controller must respect the same 16-word cache line format. For all READ transactions, the VCI command packet contains one single VCI flit. The VCI CMD field must contain the VCI_READ code. The VCI PLEN field is used to define the burst length. A READ transaction has a type, encoded with two bits in the VCI TRDID field: bit 0 of the TRDID field is 0 for an uncached access, and 1 for a cached access. Bit 1 of the TRDID field is 0 for a data cache request, and 1 for an instruction cache request. The response packet contains one VCI flit (single word) or 16 VCI flits (cache line). The VCI PKTID field is not used.
     39 * A '''READ''' transaction can be a single word request (in case of uncached access), or a burst, corresponding to a complete cache line (16 words). A READ burst transaction initiated by any DMA controller must respect the same 16-word cache line format. For all READ transactions, the VCI command packet contains one single VCI flit. The VCI CMD field contains the VCI_READ code. The VCI PLEN field is used to define the burst length. A READ transaction has a type, encoded with two bits in the VCI TRDID field: bit 0 of the TRDID field is 0 for an uncached access, and 1 for a cached access. Bit 1 of the TRDID field is 0 for a data cache request, and 1 for an instruction cache request. The response packet contains one VCI flit (single word) or 16 VCI flits (cache line). The VCI PKTID field is not used.
    5540
    56 •       A WRITE transaction can be a single word request or a variable length burst request. In case of burst, all words must belong to the same cache line, with consecutive addresses. Therefore, the VCI command packet contains at most 16 VCI flits. The VCI BE field can have different values for each flit (including the zero value). The VCI response packet contains one VCI flit. A WRITE burst transaction initiated by any DMA controller must respect the same 16-word format. For a WRITE transaction, the VCI CMD field must contain the VCI_WRITE code. When the VCI PKTID field contains a non-zero value, it signals that the write request is “posted”: the VCI target must send a response to respect the VCI protocol, but this response can be sent before the write is actually performed. This can be used by the VCI/HT bridge. The VCI PKTID field is not used. If the modified cache line is replicated in one or several other L1 caches, all copies must be updated or invalidated before the WRITE transaction is acknowledged.
     41 * A '''WRITE''' transaction can be a single word request or a variable length burst request. In case of burst, the VCI command packet contains at most 8 VCI flits, with consecutive addresses. All words belong to the same half cache line, and the VCI BE field can have different values for each flit (including the zero value). The VCI response packet contains one VCI flit. A WRITE burst transaction initiated by any DMA controller must respect the same constraint of 8 aligned words. The VCI CMD field contains the VCI_WRITE code. When the VCI TRDID field contains a non-zero value, it signals that the write request is “posted”: the VCI target must send a response to respect the VCI protocol, but this response can be sent before the write is actually performed. This can be used by the VCI/HT bridge. The VCI PKTID field is not used. If the modified cache line is replicated in one or several other L1 caches, all copies must be updated or invalidated before the WRITE transaction is acknowledged.
    5742
    58 •       The TSAR architecture support the LL/SC mechanism for atomic operations (see section 5). For both an LL (Linked Load) and an SC (Store Conditional) transaction, the VCI command packet and the VCI response packet contain one single VCI flit. The VCI CMD field must contain the VCI_LINKED_READ (resp. VCI_STORE_CONDITIONNAL) value. The VCI PKTID and TRDID fields are not used.
    59 4.2.2   MULTI_UPDATE / MULTI_INVAL / BROADCAST_INVAL / CLEANUP
     43 * The TSAR architecture supports the '''LL/SC''' mechanism for atomic operations (see AtomicOperation). For both an LL (Linked Load) and an SC (Store Conditional) transaction, the VCI command packet and the VCI response packet contain one single VCI flit. The VCI CMD field must contain the VCI_LINKED_READ (resp. VCI_STORE_CONDITIONNAL) value. The VCI PKTID and TRDID fields are not used.
     44
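The two-bit READ type carried in the VCI TRDID field (bit 0: cached access, bit 1: instruction cache request) can be sketched as a pair of helpers. The function names are hypothetical, and only the two bits described above are interpreted; any other TRDID bits are ignored here.

```python
TRDID_CACHED = 0b01       # bit 0: 0 = uncached access, 1 = cached access
TRDID_INSTRUCTION = 0b10  # bit 1: 0 = data cache request, 1 = instruction cache request


def encode_read_trdid(cached: bool, instruction: bool) -> int:
    """Build the two low-order TRDID bits for a READ command."""
    value = 0
    if cached:
        value |= TRDID_CACHED
    if instruction:
        value |= TRDID_INSTRUCTION
    return value


def decode_read_trdid(trdid: int) -> dict:
    """Interpret the two low-order TRDID bits of a READ command."""
    return {
        "cached": bool(trdid & TRDID_CACHED),
        "instruction": bool(trdid & TRDID_INSTRUCTION),
    }
```

With this encoding, a cached instruction fetch corresponds to the value 3, and an uncached data read to the value 0.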
     45=== 2.2 MULTI_UPDATE / MULTI_INVAL / BROADCAST_INVAL / CLEANUP ===
    6046These transactions are initiated by a memory cache controller to update or invalidate copies in the L1 caches. For each cache line stored in the memory cache, the memory cache maintains an INS bit indicating that this cache line is replicated in at least one L1 instruction cache. This bit is set as soon as the memory cache receives a cache line READ request with the INS bit set in the TRDID field. When the cache line is marked as data (INS = 0), the memory cache keeps an explicit set of the SRCIDs of all L1 caches containing a copy. When the cache line is marked as instruction (INS = 1), the memory cache keeps a counter containing the number of copies in L1 caches.
    6147
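The per-line directory state described above can be sketched as a small data structure. The class and method names are hypothetical. The text does not say what happens to the SRCID set when a data line becomes an instruction line, so the conversion below (the counter starts from the size of the set) is an assumption made for illustration only, as is counting every instruction READ as a new copy.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Directory state for one memory cache line (illustrative model)."""
    ins: bool = False                          # INS bit: line held by an L1 instruction cache
    copies: set = field(default_factory=set)   # SRCIDs of the copies, used while INS = 0
    count: int = 0                             # number of copies, used while INS = 1

    def record_read(self, srcid: int, ins_request: bool) -> None:
        """Register a cache line READ from the L1 cache identified by srcid."""
        if ins_request and not self.ins:
            # ASSUMPTION: on the first instruction READ, switch from the
            # explicit SRCID set to the counter, keeping the copy count.
            self.ins = True
            self.count = len(self.copies)
            self.copies.clear()
        if self.ins:
            self.count += 1          # ASSUMPTION: each READ adds one copy
        else:
            self.copies.add(srcid)
```

Keeping an explicit set only for data lines matches the expectation that data lines have few copies, while instruction lines, which may be replicated in every L1 cache, only need a count.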