[[PageOutline]]

= Communication Infrastructure =

== 1. The 3 interconnection networks ==

The TSAR architecture defines three logically independent, VCI-compliant networks, which are fully separated to prevent deadlocks:

 * The '''Direct Network''' implements the 40-bit TSAR physical address space that is visible to software. It transports the direct READ, WRITE, LL, & SC transactions from any VCI initiator (typically an L1 cache controller or another hardware coprocessor with a DMA capability) to any VCI target (typically a memory cache controller or a memory-mapped peripheral).
 * The '''Coherence Network''' implements a separate 40-bit physical address space, used to transport the coherence transactions: MULTI_UPDATE, MULTI_INVAL, BROADCAST_INVAL (from memory cache controllers to L1 cache controllers) and CLEANUP (from the L1 cache controllers to the memory cache controllers). This address space is not visible to software.
 * The '''External Network''' implements a 34-bit physical address space. This network transports the PUT and GET transactions from the memory cache controller to the external RAM controller, in case of a MISS or a cache line replacement in the memory cache. This address space is not visible to software.

== 2. VCI initiators & targets indexing ==

A given hardware component can have several VCI ports. For example, the L1 cache has three VCI ports: one initiator port on the direct network, one initiator port on the coherence network, and one target port on the coherence network. Each VCI port can have a different identifier, defined by three indexes:

 * '''X_ID''' is the cluster X-coordinate.
 * '''Y_ID''' is the cluster Y-coordinate.
 * '''L_ID''' is the local index inside the cluster.

A hardware component that has several VCI ports can have several different values for the L_ID local index. The X_ID, Y_ID and L_ID are coded on NX, NY, NL bits respectively.
The NX, NY and NL parameters are global for a given instance of the TSAR architecture. NX and NY cannot be larger than 5 (no more than 1024 clusters), but can be smaller if the number of clusters is smaller than 1024. NL is equal to 4 (no more than 16 target ports or 16 initiator ports per cluster).

In order to simplify the hardware implementation of the memory coherence protocol, the L_ID values are standardized on the coherence network, and the same value is used for an initiator port and for a target port: if the number of processors per cluster is NPROCS, the processor L_ID value is between 0 and (NPROCS-1). The memory cache L_ID is equal to NPROCS.

=== 2.1 Target identification ===

The target identification is required to route a command packet. For both the direct and coherence networks, a VCI target is identified by the (NX + NY + NLADR) most significant bits of the VCI ADDRESS field:

|| X || Y || LADR || OFFSET ||
|| NX bits || NY bits || NLADR bits || 40-NX-NY-NLADR ||

 * According to the NUMA characteristics of the TSAR architecture, there is no transcoding of the X & Y fields, which directly define the target cluster coordinates (X_INDEX, Y_INDEX).
 * The network decodes the LADR field to obtain the target L_ID, using a local routing table (implemented as a wired decoder in each local interconnect controller). The local routing tables and the number of bits NLADR to be decoded depend on the cluster.

=== 2.2 Initiator identification ===

The initiator identification is required to route a response packet. For both the direct and coherence networks, a VCI initiator is identified by the VCI SRCID & RSRCID fields (NX + NY + NL bits):

|| X_ID || Y_ID || L_ID ||
|| NX bits || NY bits || NL bits ||

Therefore, the total SRCID width cannot be larger than 14 bits. It can use fewer than 14 bits when the number of clusters is smaller than 1024.

== 3. Direct Network & Coherence Network ==

These two networks are implemented by the DSPIN network-on-chip general infrastructure:

 * The '''local interconnect''' is implemented as two physically independent local rings, and the coherence ring supports a broadcast service for single-flit VCI commands. Note: these two physically independent rings could be implemented later as one single physical ring supporting two virtual networks.
 * The '''global interconnect''' is implemented as one DSPIN network supporting two virtual sub-networks, and the coherence sub-network supports a broadcast service for single-flit VCI commands.

=== 3.1 VCI encoding of the various transaction types on the direct network ===

There are 5 transaction types ('''READ''', '''WRITE''', '''LL''', '''SC''', '''CAS''') on the direct network, and four sub-types for '''READ''' transactions. These types are encoded through the pair of VCI fields '''CMD''' and '''PKTID'''. When a given initiator can send several simultaneous transactions of a given type (such as several simultaneous '''WRITE''' transactions), the VCI '''TRDID''' field is used to discriminate them.

Possible values for the VCI '''CMD''' field are:

|| encoding (2 bits) || value ||
|| || ||
|| 00 || CMD_NOP / CMD_STORE_COND ||
|| 01 || CMD_READ ||
|| 10 || CMD_WRITE ||
|| 11 || CMD_LOCKED_READ ||

The '''PKTID''' field in TSAR is 4 bits long. Only 8 types of transaction are used: the MSB is ignored (reserved for future use).
A specific VCI '''CMD''' value must be used for each '''PKTID''' value, as described in the table below:

|| encoding (4 bits) || '''PKTID''' value || '''CMD''' value ||
|| || || ||
|| X000 || TYPE_READ_DATA_UNC || CMD_READ ||
|| X001 || TYPE_READ_DATA_MISS || CMD_READ ||
|| X010 || TYPE_READ_INS_UNC || CMD_READ ||
|| X011 || TYPE_READ_INS_MISS || CMD_READ ||
|| X100 || TYPE_WRITE || CMD_WRITE ||
|| X101 || TYPE_CAS || CMD_NOP ||
|| X110 || TYPE_LL || CMD_LOCKED_READ ||
|| X111 || TYPE_SC || CMD_STORE_COND ||

Remarks on the '''PKTID''' field encoding:

 * for a TYPE_READ, bit 0 is set (resp. not set) for a miss (resp. uncached) request
 * for a TYPE_READ, bit 1 is set (resp. not set) for an instruction (resp. data) request
 * bit 2 can be used to check for a TYPE_READ (bit 2 = 0)

==== 3.1.1 VCI READ transaction ====

A VCI '''READ''' command packet contains one flit.

 * The VCI '''CMD''' field must be set to CMD_READ.
 * The VCI '''TRDID''' field is not used by the L1 cache. It is used by I/O controllers with multi-channel DMA capabilities to transmit the channel index.
 * The VCI '''PKTID''' field can be any of the 4 TYPE_READ_* values of the previous table.

A VCI '''READ''' response packet returns either:

 * a single flit containing the uncached data in the '''RDATA''' field (for a '''PKTID''' = TYPE_READ_*_UNC), or
 * up to 16 flits containing one word per flit in the '''RDATA''' field (for a '''PKTID''' = TYPE_READ_*_MISS).

==== 3.1.2 VCI WRITE transaction ====

A VCI '''WRITE''' command packet contains from 1 to 16 flits within the same cache line.

 * The VCI '''CMD''' field must be set to CMD_WRITE.
 * The VCI '''TRDID''' field is used by the L1 cache to index its write buffer (4 write buffer slots of 4 words each). It is used by I/O controllers with multi-channel DMA capabilities to transmit the channel index.
 * The VCI '''PKTID''' field must be TYPE_WRITE.

A VCI '''WRITE''' response packet always returns a single flit with a 0 value in the '''RDATA''' field.
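The CMD / PKTID pairing described in section 3.1 can be sketched as a small lookup table. This is only an illustrative model, not part of the TSAR hardware description: the constant names mirror the tables above, while the helper functions (`pktid_type`, `is_read`, `is_miss`, `is_instruction`, `is_legal`) are hypothetical names introduced for this example.

```python
# Illustrative model of the direct-network CMD / PKTID encoding (section 3.1).
# Constant names follow the tables; the helper functions are hypothetical.

# 2-bit VCI CMD values. CMD_NOP and CMD_STORE_COND share the encoding 00.
CMD_NOP = 0b00
CMD_STORE_COND = 0b00
CMD_READ = 0b01
CMD_WRITE = 0b10
CMD_LOCKED_READ = 0b11

# PKTID values on the 3 low bits (bit 3, the MSB, is reserved and ignored).
TYPE_READ_DATA_UNC = 0b000
TYPE_READ_DATA_MISS = 0b001
TYPE_READ_INS_UNC = 0b010
TYPE_READ_INS_MISS = 0b011
TYPE_WRITE = 0b100
TYPE_CAS = 0b101
TYPE_LL = 0b110
TYPE_SC = 0b111

# CMD value required for each PKTID value, copied from the table.
REQUIRED_CMD = {
    TYPE_READ_DATA_UNC: CMD_READ,
    TYPE_READ_DATA_MISS: CMD_READ,
    TYPE_READ_INS_UNC: CMD_READ,
    TYPE_READ_INS_MISS: CMD_READ,
    TYPE_WRITE: CMD_WRITE,
    TYPE_CAS: CMD_NOP,
    TYPE_LL: CMD_LOCKED_READ,
    TYPE_SC: CMD_STORE_COND,
}


def pktid_type(pktid):
    """Drop the reserved MSB of a 4-bit PKTID."""
    return pktid & 0b0111


def is_read(pktid):
    """Bit 2 cleared identifies a TYPE_READ_* packet."""
    return (pktid & 0b0100) == 0


def is_miss(pktid):
    """For a read, bit 0 set means a miss (cleared means uncached)."""
    return is_read(pktid) and (pktid & 0b0001) != 0


def is_instruction(pktid):
    """For a read, bit 1 set means instruction (cleared means data)."""
    return is_read(pktid) and (pktid & 0b0010) != 0


def is_legal(cmd, pktid):
    """Check that a (CMD, PKTID) pair matches the table."""
    return REQUIRED_CMD[pktid_type(pktid)] == cmd
```

Because CMD_NOP and CMD_STORE_COND share the same 2-bit encoding, a check on CMD alone cannot tell a CAS from an SC; in this sketch only the PKTID value distinguishes them.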
==== 3.1.3 VCI LL (Linked Load) transaction ====

A VCI '''LL (Linked Load)''' command packet contains a single flit. ('''N.B.''': this request is only sent by an L1 cache and can only target a memory cache.)

 * The VCI '''CMD''' field must be set to CMD_LOCKED_READ.
 * The VCI '''TRDID''' field is not used by the L1 cache.
 * The VCI '''PKTID''' field must be TYPE_LL.

A VCI '''LL (Linked Load)''' response packet contains 2 flits:

 * The first flit contains in the '''RDATA''' field a signature returned by the memory cache for this LL reservation.
 * The second flit contains in the '''RDATA''' field the data that has been read in the memory cache.

==== 3.1.4 VCI SC (Store Conditional) transaction ====

A VCI '''SC (Store Conditional)''' command packet contains 2 flits. ('''N.B.''': this request is only sent by an L1 cache and can only target a memory cache.)

 * The VCI '''CMD''' field must be set to CMD_STORE_COND.
 * The VCI '''TRDID''' field is not used by the L1 cache.
 * The VCI '''PKTID''' field must be TYPE_SC.
 * The first flit contains in the '''WDATA''' field the signature obtained with the last LL operation at this address.
 * The second flit contains in the '''WDATA''' field the data to be written.

A VCI '''SC (Store Conditional)''' response packet contains 1 flit.

 * The '''RDATA''' field contains 0 (resp. 1) to indicate an SC success (resp. failure).

==== 3.1.5 VCI CAS (Compare & Swap) transaction ====

A VCI '''CAS (Compare & Swap)''' command packet contains 2 flits. ('''N.B.''': this request is only sent by an L1 cache and can only target a memory cache.)

 * The VCI '''CMD''' field must be set to CMD_NOP.
 * The VCI '''TRDID''' field is not used by the L1 cache.
 * The VCI '''PKTID''' field must be TYPE_CAS.
 * The first flit contains in the '''WDATA''' field the old value of the data to be overwritten.
 * The second flit contains in the '''WDATA''' field the new value to be written.

A VCI '''CAS (Compare & Swap)''' response packet contains 1 flit.
 * The '''RDATA''' field contains 0 (resp. 1) to indicate a CAS success (resp. failure).

=== 3.2 VCI encoding of the various transaction types on the coherence network ===

On the coherence network, the VCI encoding is defined by the hardware with the following policy: for all command packets (multi-update, multi-invalidate, broadcast-invalidate, and cleanup), the VCI CMD field is a WRITE. The line index (up to 34 bits if we use 40-bit addresses) is transported in the WDATA and BE fields of the first VCI flit. The WDATA field contains the 32 LSB bits of the line index, and the BE field contains the 2 MSB bits of the line index.

The multicast invalidate, broadcast invalidate, and cleanup packets contain one single VCI flit. The multicast update packets contain (2+N) flits: the WDATA field of the second flit contains the index of the first word to be updated in the cache line. The following flits (at most 16 flits) contain the values to be written.

 * In a '''multicast''' command packet from a memory cache controller to an L1 cache controller, the address is obtained by copying the target L1 cache SRCID in the MSB bits of the VCI ADDRESS (left aligned): the L1 cache L_ID is actually used as the LADR address field. UPDATE/INVAL requests are distinguished by the bit ADDRESS[3] (0 for INVAL, 1 for UPDATE). DATA/INSTRUCTION caches are distinguished by the bit ADDRESS[2] (0 for DATA, 1 for INSTRUCTION).
 * In a '''cleanup''' command packet from an L1 cache controller to a memory cache controller, the address is obtained by copying the (NX + NY) MSB bits of the line index in the VCI ADDRESS field (left aligned). The NPROCS value for the LADR address field is used to select the memory cache.
 * In a '''broadcast_invalidate''' command packet, from a memory cache controller to an L1 cache controller, the ADDRESS[1:0] bits must be equal to 0x3.
   The 20 bits ADDRESS[39:20] contain the XMIN, XMAX, YMIN, YMAX values defining the bounding box of the broadcast:

|| XMIN || XMAX || YMIN || YMAX || reserved || 11 ||
|| (5) || (5) || (5) || (5) || (18) || (2) ||

=== 3.3 VCI parameters ===

All hardware components connected to the direct network or to the coherence network respect the VCI/OCP communication interface. The direct network and the coherence network, being ''time-multiplexed'' on the DSPIN infrastructure, have identical VCI formats:

|| VCI Field || width ||
|| || ||
|| ADDRESS || 40 bits ||
|| WDATA, RDATA || 32 bits ||
|| PLEN || 8 bits ||
|| SRCID, RSRCID || 14 bits ||
|| TRDID, RTRDID || 4 bits ||
|| PKTID, RPKTID || 4 bits ||
|| RERROR || 1 bit ||

The TSAR architecture uses one single bit for the VCI RERROR field, even though the DSPIN infrastructure supports 2 bits for the error field.

=== 3.3 DSPIN Packet format ===

The VCI command & response packets are translated (actually serialized) to a more convenient DSPIN network format by the VCI/RING wrappers (in platforms using the RING local interconnect) or by the VCI/DSPIN wrappers (in platforms using the XBAR local interconnect). These wrappers are located between the VCI initiator and target components and the DSPIN network.

The DSPIN command packet width is 40 bits, and the DSPIN response packet width is 33 bits. The DSPIN interconnection network uses only the following information to route the DSPIN packets to the proper destination:

 * The MSB bit is the EOP flag, defining the last flit of a DSPIN packet.
 * The LSB bit of the first flit is the BC flag, defining a DSPIN broadcast packet.
 * For a response packet, BC = 0 and the RSRCID field is used to route the packet to the proper destination.
 * For a non-broadcast command packet, BC = 0, and the (NX+NY+NL) MSB bits of the ADDRESS field are used to route the packet to the proper destination.
 * For a broadcast packet, BC = 1, and the XMIN, XMAX, YMIN, YMAX fields (5 bits each) are used by the network to limit the broadcast.

The DSPIN format has been designed to transport a 40-bit VCI ADDRESS and a 14-bit VCI SRCID. If the VCI ADDRESS uses fewer than 40 bits (for example 32 bits), the VCI ADDRESS field is left aligned, and the LSB bits of the DSPIN field are completed with "0". If the SRCID field uses fewer than 14 bits (NX < 5 or NY < 5), the SRCID field is left aligned, and the LSB bits of the DSPIN field are completed with "0".

The five types of DSPIN packets are defined below:

==== 3.3.1 DSPIN Read Command packet format (40 bits) ====

A single-flit VCI Read Command packet (this includes LL packets) is translated to a 2-flit DSPIN Read Command packet:

Flit 0 :
||EOP||----------------ADDRESS--------------------||BC ||
|| 0 || (38) || 0 ||

Flit 1 :
||EOP||SRCID||CMD||CGT||PLEN||TRDID||PKTID||BE ||res||
|| 1 || (14)||(2)||(2)|| (8)|| (4) || (4) ||(4)||(1)||

==== 3.3.2 DSPIN Write Command packet format (40 bits) ====

An N-flit VCI Write Command packet (this includes SC packets) is translated to an (N+2)-flit DSPIN Write Command packet:

Flit 0 :
||EOP||----------------ADDRESS--------------------||BC||
|| 0 || (38) || 0||

Flit 1 :
||EOP||SRCID||CMD||CGT||PLEN||TRDID||PKTID||BE ||res||
|| 0 || (14)||(2)||(2)|| (8)|| (4) || (4) ||(4)||(1)||

Flit N :
||EOP||-res-||BE ||--------------WDATA---------------||
|| 1 || (3) ||(4)|| (32) ||

==== 3.3.3 DSPIN Broadcast Command packet format (40 bits) ====

The single-flit VCI Write Broadcast is translated to a 2-flit DSPIN Broadcast Command packet. The CID field contains the 10 MSB bits of the VCI SRCID (actually the source cluster coordinates). The XMIN, XMAX, YMIN, YMAX fields are the 20 MSB bits of the VCI ADDRESS, used by the network to limit the broadcast.
Flit 0 :
||EOP||XMIN||XMAX||YMIN||YMAX||SRCID||TRDID ||BC||
|| 0 || (5)|| (5)|| (5)|| (5)|| (14)|| (4) || 1||

Flit 1 :
||EOP||-res-||----------------NLINE-----------------||
|| 1 || (5) || (34) ||

==== 3.3.4 DSPIN Response packet format (33 bits) ====

A single-flit DSPIN Response packet is built for the following VCI response packets:

 * a single-flit VCI response packet to a WRITE command (no data transmitted),
 * a single-flit VCI response packet to a READ command, where the RDATA field has value 0,
 * a single-flit VCI response packet to an SC or CAS command, where the RDATA field has value 0.

For all other VCI response packets (multi-flit VCI response packet, or non-zero RDATA value), a multi-flit DSPIN response packet is built: an N-flit VCI response packet is translated to an (N+1)-flit DSPIN response packet.

Flit 0 :
||EOP||RSRCID||RERROR||RTRDID||RPKTID||res||BC||
|| 0 || (14) || (2) || (4) || (4) ||(7)|| 0||

Flit 1 :
||EOP||---------------RDATA------------------------||
|| 1 || (32) ||

==== 3.3.5 DSPIN Write response packet format (33 bits) ====

A single-flit VCI Write Response packet is translated to a single-flit DSPIN Write Response packet.

Flit 0 :
||EOP||RSRCID||RERROR||RTRDID||RPKTID||res||BC||
|| 1 || (14) || (2) || (4) || (4) ||(7)|| 0||

Note: this format is also used for the response packets to a broadcast command, as each VCI response packet to a broadcast command is actually a VCI response packet to a single-flit write command.

== 4. External Network ==

This network has a specific topology, because its communication scheme is unusual: all PUT/GET transactions go from N initiators (one initiator per cluster) to one single target (the external RAM controller).

=== 4.1 VCI parameters ===

The external network, which only transports cache lines, does not use all VCI fields. The address is coded on 34 bits (it is actually a cache line index), and the data field is 64 bits wide, to increase the bandwidth.
|| VCI Field || width ||
|| || ||
|| ADDRESS || 34 bits ||
|| WDATA, RDATA || 64 bits ||
|| PLEN || unused ||
|| SRCID, RSRCID || 10 bits ||
|| TRDID, RTRDID || 4 bits ||
|| PKTID, RPKTID || unused ||
|| RERROR || 1 bit ||
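
As an illustration of the widths above, the sketch below derives a 34-bit line index from a 40-bit direct-network address. It assumes a 64-byte cache line (16 words of 32 bits), which is consistent with the 16-flit limit of section 3.1 and with the 34-bit line index, but is not stated explicitly on this page. The names `line_index` and `flits_per_line` are hypothetical.

```python
# Illustrative arithmetic only; the 64-byte line size is an assumption
# (16 words x 4 bytes), not a value stated in the tables above.

DIRECT_ADDR_BITS = 40                       # direct network address width
LINE_BYTES = 64                             # assumed cache line size
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 6 bits of byte offset
LINE_INDEX_BITS = DIRECT_ADDR_BITS - OFFSET_BITS
EXT_DATA_BYTES = 8                          # 64-bit WDATA / RDATA fields


def line_index(addr):
    """Return the cache line index of a 40-bit physical address."""
    if not 0 <= addr < (1 << DIRECT_ADDR_BITS):
        raise ValueError("address does not fit on 40 bits")
    return addr >> OFFSET_BITS


def flits_per_line():
    """Number of 64-bit data flits needed to carry one assumed 64-byte line."""
    return LINE_BYTES // EXT_DATA_BYTES
```

Under this assumption the largest 40-bit address maps to the largest 34-bit line index, which matches the ADDRESS width given for the external network.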