Changes between Version 4 and Version 5 of CacheCoherence


Timestamp:
Jul 2, 2009, 2:20:18 PM (15 years ago)
Author:
alain
Comment:

--

  • CacheCoherence

    v4 v5  
    1616In case of MISS, the memory cache controller must evict a victim line to bring in the missing line. In order to maintain the inclusive property, all copies of the evicted cache line in L1 caches must be invalidated. To do so, the memory cache controller must send INVALIDATE requests to all L1 caches containing a copy.
    1717
    18 The TSAR architecture wants to guaranty the cache coherence by hardware, for both the data and instruction L1 caches. Reflecting the different behaviour of data & instruction caches, the DHCCP protocol defines two different strategies, depending on the number of copies :
    19  * '''MULTICAST_UPDATE''' :  the modifications of shared data are very frequent events, but – in average – the number of copies is not very high. Therefore, when the number of copies is smaller than a given threshold, the cache controller registers the locations of all the copies, and use a ''multicast/update'' transaction.
    20  * Regarding the instructions, the modifications of shared code are rather rare events ( in case of self modifying code, or dynamic libraries ), but the number of replicated copies can be very large ( the system call handler, or the libc are likely replicated in all L1 caches ). Therefore, the DHCCP ptotocol will generally use a ''broadcast/invalidate'' policy for instruction caches. 
     18The TSAR architecture aims to guarantee cache coherence in hardware, for both the data and instruction L1 caches. Reflecting the different behaviour of data & instruction caches, the ''hybrid'' cache coherence protocol defines two different strategies, depending on the number of copies :
     19 * '''MULTICAST_UPDATE''' :  the modifications of shared data are very frequent events, but the number of copies is generally not very high. When the number of copies is smaller than the DHCCP threshold, the cache controller registers the locations of all the copies, and sends a ''multicast_update'' transaction
     20to the concerned L1 caches.
     21 * '''BROADCAST_INVAL''' : the modifications of shared code are rare events ( self modifying code, or dynamic libraries ), but the number of replicated copies can be very large ( the exception handler, or the libc are generally replicated in all L1 caches ). When the number of copies is larger than the DHCCP threshold, the memory cache controller simply stores the number of copies (without their locations) and sends a ''broadcast_inval'' transaction to all L1 caches. 
    2122
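The choice between the two strategies can be sketched as follows. This is only an illustration: the function name `choose_strategy` and the threshold value used in the example are hypothetical, and the behaviour when the number of copies is exactly equal to the threshold is an assumption (the text only says "smaller than" and "larger than").

```python
def choose_strategy(copies: int, threshold: int) -> str:
    """Pick the coherence strategy for a write to a replicated line.

    Assumption: a line whose copy count equals the threshold is still
    handled by multicast; the text does not settle this boundary case.
    """
    if copies < 0 or threshold < 0:
        raise ValueError("copies and threshold must be non-negative")
    if copies <= threshold:
        # Few copies: their locations are registered, update each one.
        return "MULTICAST_UPDATE"
    # Many copies: only the count is kept, invalidate everywhere.
    return "BROADCAST_INVAL"


# Hypothetical threshold of 4 copies.
print(choose_strategy(2, 4))
print(choose_strategy(64, 4))
```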
    2223== 2.  Types of transaction ==
     
    3334There is one exception : For a BROADCAST_INVALIDATE transaction, the initiator sends one single flit VCI packet, but receives several single flit VCI response packets (see section 2.2).
    3435 
    35 === 2.1  READ / WRITE / LL / SC transactions ===
     36=== 2.1  Direct transactions ===
    3637 
    3738These transactions are initiated by a processor (actually the L1 cache controller), or by another initiator ( an I/O peripheral or hardware coprocessor with a DMA capability). This initiator can be located in any cluster. For those transactions, the target is a memory cache controller, acting as a physical memory bank, or another VCI target peripheral. This target can be located in any cluster.
     
    4344 * The TSAR architecture supports the '''LL/SC''' mechanism for atomic operations (see AtomicOperations). For both an LL (Linked Load) and an SC (Store Conditional) transaction, the VCI command packet and the VCI response packet contain one single VCI flit. The VCI CMD field must contain the VCI_LINKED_READ (resp. VCI_STORE_CONDITIONNAL) value. The VCI PKTID and TRDID fields are not used.
    4445
    45 === 2.2 MULTI_UPDATE / MULTI_INVAL / BROADCAST_INVAL / CLEANUP transactions ===
     46=== 2.2 Coherence transactions ===
    4647
    47 These 4 transactions implement the DHCCP protocol : For each cache line stored in the memory cache, the memory cache implement a Registration Table that contain the copies replicated in the L1 caches. Each entry in this Registration Table contains the SRCID of a L1 cache that contains a copy, as well as the type of the copy (instruction/data). When the same cache line is replicated in both the instruction cache and the data cache of a processor, this defines two separated entries in the Registration Table. When the number copies for a given cache line L exceeds the DHCCP threshold, the corresponding Registration Table is flushed, and the memory cache register only the number of copies.
     48These transactions implement the DHCCP protocol : For each cache line stored in the memory cache, the memory cache implements a Registration Table that contains the copies replicated in the L1 caches. Each entry in this Registration Table contains the SRCID of the L1 cache that contains a copy, as well as the type of the copy (instruction/data). When the same cache line is replicated in both the instruction cache and the data cache of a processor, this defines two separate entries in the Registration Table. When the number of copies for a given cache line L exceeds the DHCCP threshold, the corresponding Registration Table is flushed, and the memory cache registers only the number of copies.
     49
     50The coherence transactions use a logically separated ''coherence network'', implementing a separated address space.
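
The Registration Table behaviour described above can be modelled roughly as below. This is a minimal sketch, not the hardware implementation: the class name `LineDirectory` and its fields are hypothetical, and it assumes that once the table has been flushed the line stays in counter-only mode.

```python
from dataclasses import dataclass, field


@dataclass
class LineDirectory:
    """Per-line copy tracking: a list of owners, or only a count."""
    threshold: int
    owners: set = field(default_factory=set)  # (srcid, is_instruction)
    count: int = 0
    counter_only: bool = False

    def register(self, srcid: int, is_instruction: bool) -> None:
        if self.counter_only:
            self.count += 1
            return
        # Instruction and data copies of one processor are two entries.
        self.owners.add((srcid, is_instruction))
        self.count = len(self.owners)
        if self.count > self.threshold:
            # Threshold exceeded: flush the table, keep only the count.
            self.owners.clear()
            self.counter_only = True

    def cleanup(self, srcid: int, is_instruction: bool) -> None:
        if self.counter_only:
            self.count = max(0, self.count - 1)
        else:
            self.owners.discard((srcid, is_instruction))
            self.count = len(self.owners)


d = LineDirectory(threshold=2)
d.register(1, False)
d.register(1, True)
d.register(2, False)
print(d.counter_only, d.count, d.owners)
```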
    4851 
    49  * A '''MULTI_UPDATE''' transaction is a multi-cast transaction sent by the memory cache controller when it receives a WRITE request to a replicated cache line and the number of copies does not exceeds the DHCCP threshold. It sends as many VCI transactions as the number of registered copies (but the writer). The VCI command packet contains (N+2) flits. The VCI ADDRESS field is constant & contains the address of the memory mapped UPDATE register in the L1 cache. The VCI CMD field contains the WRITE value. As the memory cache controller can handle several simultaneous update/invalidate transactions, the VCI TRDID field contains the transaction index. The VCI PLEN field contains the value  4*N, where N is the actual number of modified words in the cache line. The line index (34 bits) is transported in the VCI WDATA and VCI BE fields, of the first flit. The first modified word index (3 bits) is transported in the WDATA field of the second flit, and the N modified words in the WDATA and BE fields of the  N following flits. For each modified word, the VCI BE field can have a different value (including the 0x0 value). The VCI response packet contains one single flit. The memory cache controller counts the number of VCI responses to detect the completion of the MULTI_UPDATE transaction.
     52 * A '''MULTICAST_UPDATE''' transaction is a multi-cast transaction sent by the memory cache controller when it receives a WRITE request to a replicated cache line and the number of copies does not exceed the DHCCP threshold. It sends as many VCI transactions as the number of registered copies (excluding the writer). The VCI command packet contains (N+2) flits. The VCI ADDRESS field is constant & contains the address of the memory mapped UPDATE register in the L1 cache. The VCI CMD field contains the WRITE value. As the memory cache controller can handle several simultaneous update/invalidate transactions, the VCI TRDID field contains the transaction index. The VCI PLEN field contains the value 4*N, where N is the actual number of modified words in the cache line. The line index (34 bits) is transported in the VCI WDATA and VCI BE fields of the first flit. The first modified word index (3 bits) is transported in the WDATA field of the second flit, and the N modified words in the WDATA and BE fields of the N following flits. For each modified word, the VCI BE field can have a different value (including the 0x0 value). The VCI response packet contains one single flit. The memory cache controller counts the number of VCI responses to detect the completion of the MULTICAST_UPDATE transaction.
    5053
    51  * A '''MULTI_INVAL''' transaction is a multi-cast transaction, that is composed of several VCI transactions. When a memory cache makes a cache line replacement (following a MISS), and the victim line has the data type (INS = 0), it sends as many VCI transactions as the number of registered copies. Both the VCI command packet and the VCI response packet contain only one flit. The VCI CMD field contains the WRITE value. The VCI ADDRESS field contains the address of the memory mapped INVAL register in the L1 cache. The VCI CMD field contains the WRITE value. As the memory cache controller can handle several update/invalidate transactions simultaneously, the VCI TRDID field contains the transaction index.The VCI WDATA field contains the line index. The memory cache controller counts the number of VCI responses to detect the completion of the  MULTI_INVAL transaction.
     54 * A '''MULTICAST_INVAL''' transaction is a multi-cast transaction that is composed of several VCI transactions. When a memory cache makes a cache line replacement (following a MISS), and the victim line has a number of copies smaller than the DHCCP threshold, it sends as many VCI transactions as the number of registered copies. Both the VCI command packet and the VCI response packet contain only one flit. The VCI ADDRESS field contains the address of the memory mapped INVAL register in the L1 cache. The VCI CMD field contains the WRITE value. As the memory cache controller can handle several update/invalidate transactions simultaneously, the VCI TRDID field contains the transaction index. The VCI WDATA & VCI BE fields contain the 34-bit line index. The memory cache controller counts the number of VCI responses to detect the completion of the MULTICAST_INVAL transaction.
    5255
    53 •       A BROADCAST_INVAL transaction is a broadcast transaction. This transaction is initiated when a memory cache controller replace a line that has the instruction type (INS = 1), or when the memory cache receives a WRITE request to a replicated cache line that has the instruction type (INS = 1). The VCI command packet contains one single flit. This packet is replicated & dynamically broadcasted by the network itself. The VCI CMD field contains the WRITE value. The VCI ADDRESS field contains the global broadcast address 0x000000003 (only the two LSB bits are set). The VCI WDATA field contains the line index. This VCI command is broadcasted to all L1 caches in the system, but only L1 caches that have a copy send a VCI response packet. All VCI response packets are independently returned to the memory cache initiator, that counts the number of VCI responses to detect the completion of the BROADCAST_INVAL transaction. If a L1 cache contains two copies of a cache line (i.e. the line is replicated in both the DATA cache, and the INSTRUCTION cache), it must send two VCI responses.
     56 * A '''BROADCAST_INVAL''' transaction is a broadcast transaction. This transaction is initiated when a memory cache controller replaces a line, or receives a WRITE request to a replicated cache line, that has a number of copies larger than the DHCCP threshold. The VCI command packet contains one single flit. This packet is replicated & dynamically broadcast by the network itself. The VCI CMD field contains the WRITE value. The VCI ADDRESS field contains the global broadcast address 0x000000003 (only the two LSB bits are set). The VCI WDATA field contains the line index. This VCI command is broadcast to all L1 caches in the system, but only L1 caches that have a copy send a VCI response packet. All VCI response packets are independently returned to the memory cache initiator, which counts the number of VCI responses to detect the completion of the BROADCAST_INVAL transaction. If an L1 cache contains two copies of a cache line (i.e. the line is replicated in both the DATA cache and the INSTRUCTION cache), it must send two VCI responses.
    5457
    55 •       A CLEANUP transaction is initiated by a L1 cache controller to a memory cache controller, to signal that a cache line copy has been removed from an instruction or data cache. Both the VCI command packet and the VCI response packet contain one single flit. For a CLEANUP transaction, the VCI ADDRESS field must contain the removed cache line address, and the VCI TRDID field must contain a non zero value.
     58 * A '''CLEANUP''' transaction is initiated by an L1 cache controller to a memory cache controller, to signal that a cache line copy has been removed from an instruction or data cache. Both the VCI command packet and the VCI response packet contain one single flit. For a CLEANUP transaction, the VCI ADDRESS field must contain the removed cache line address, and the VCI TRDID field must contain a nonzero value.
    5659
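
The (N+2)-flit layout of the update command packet described in the list above can be sketched as follows. This is an illustration under stated assumptions: the function name and the dictionary representation are hypothetical, and the split of the 34-bit line index (low 32 bits in WDATA, upper 2 bits in BE) is an assumption, since the text only says that both fields carry it. The BE value of the second flit is left unset because the text does not specify it.

```python
def build_update_command(line_index, first_word, words, trdid):
    """Build the command packet of one update transaction.

    words is a list of (wdata, be) pairs, one per modified word.
    """
    if not 0 <= line_index < 2 ** 34:
        raise ValueError("line index must fit in 34 bits")
    if not 0 <= first_word < 2 ** 3:
        raise ValueError("first word index must fit in 3 bits")
    if not words:
        raise ValueError("at least one modified word is required")

    flits = [
        # Flit 0: line index. ASSUMED split between WDATA and BE.
        {"wdata": line_index & 0xFFFFFFFF, "be": line_index >> 32},
        # Flit 1: index of the first modified word (BE not specified).
        {"wdata": first_word, "be": None},
    ]
    # Flits 2..N+1: one modified word each; BE may differ per word.
    flits += [{"wdata": w, "be": be} for (w, be) in words]

    return {
        "cmd": "WRITE",
        "trdid": trdid,            # transaction index
        "plen": 4 * len(words),    # 4*N bytes
        "flits": flits,            # N+2 flits
    }


pkt = build_update_command(0x200000001, 3, [(0xAAAA, 0xF), (0xBBBB, 0x0)], 5)
print(len(pkt["flits"]), pkt["plen"])
```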
    57 4.2.3   PUT / GET
     60=== 2.3 External transactions ===
    5861
    59 The PUT and GET transactions are initiated by the memory caches, to get or save a complete cache line in case of MISS. The targets are always the external RAM controller(s). All these transactions use a separated network, and a separated address space. The memory cache and external RAM controller ports respect an simplified version of the VCI advanced format : the VCI fields PLEN, PKTID, CONST, CONTIG and BE are not used. The VCI ADDRESS field contains 30 bits (a 64 bytes cache line index). (30 bits). The VCI WDATA & RDATA fields contain 64 bits, in order to improve the bandwidth. The VCI SRCID field contains the memory cache index (cluster index). As the memory cache controller can process several PU and/or GET transaction simultaneously, the VCI TRDID field contains the transaction index.
     62These transactions are initiated by the memory caches, to fetch or save a complete cache line in case of MISS in the memory cache. The general policy between the memory caches and the external memory is WRITE_BACK : The external memory is only updated in case of line replacement. The target is always the external RAM controller.
    6063
    61 •         For a GET transaction, the VCI command packet contains one single flit. The VCI CMD field contains the READ value. The VCI response packet contains 8 flits (corresponding to the 64 bytes of a cache line).
     64All the external transactions use a separated ''external network'', implementing a separated address space. The memory cache and external RAM controller ports used to access the external network respect a simplified version of the VCI advanced format : the VCI fields PLEN, PKTID, CONST, CONTIG and BE are not used. The VCI ADDRESS field contains 30 bits (the index of a 64-byte cache line). The VCI WDATA & RDATA fields contain 64 bits, in order to improve the bandwidth. The VCI SRCID field contains the memory cache index (cluster index). As the memory cache controller can process several external transactions simultaneously, the VCI TRDID field contains the transaction index.
    6265
    63 •         For a PUT transaction, the VCI command packet contains 8 flits. The VCI CMD field contains the WRITE value. The VCI response packet contains 1 flit.
     66 * For a '''GET''' transaction, the VCI command packet contains one single flit. The VCI CMD field contains the READ value. The VCI response packet contains 8 flits (corresponding to the 64 bytes of a cache line).
     67
     68 * For a '''PUT''' transaction, the VCI command packet contains 8 flits. The VCI CMD field contains the WRITE value. The VCI response packet contains 1 flit.
    6469
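
The flit counts above follow from the sizes involved: a 64-byte cache line carried over 64-bit data fields takes 8 flits. A minimal sketch of that packing is given below. The helper names are hypothetical and the little-endian byte order is an assumption, as the text does not state the byte ordering.

```python
LINE_BYTES = 64   # cache line size
FLIT_BYTES = 8    # 64-bit WDATA / RDATA fields


def line_to_flits(line: bytes) -> list:
    """Split one cache line into the 8 data flits of a PUT command
    (or, symmetrically, of a GET response)."""
    if len(line) != LINE_BYTES:
        raise ValueError("a cache line is exactly 64 bytes")
    return [
        int.from_bytes(line[i:i + FLIT_BYTES], "little")  # ASSUMED order
        for i in range(0, LINE_BYTES, FLIT_BYTES)
    ]


def flits_to_line(flits: list) -> bytes:
    """Rebuild the cache line from its 8 data flits."""
    if len(flits) != LINE_BYTES // FLIT_BYTES:
        raise ValueError("expected 8 flits")
    return b"".join(f.to_bytes(FLIT_BYTES, "little") for f in flits)


line = bytes(range(64))
flits = line_to_flits(line)
print(len(flits), flits_to_line(flits) == line)
```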
    6570