Changes between Version 2 and Version 3 of Specification


Timestamp:
Jun 27, 2009, 2:17:53 PM (13 years ago)
Author:
alain
Comment:

--

  • Specification

    v2 v3  
    77The main technical issue is scalability, as this architecture is intended to integrate up to 4096 cores (even if the first prototype will contain only 16 cores). The second technical issue is power consumption, and all technical choices described below are driven by these two goals.
    88
    9 == The processor core ==
     9== 1. Processor core ==
    1010
    1111In order to obtain the best MIPS/MicroWatt ratio, the TSAR processor core is a simple 32-bit, single-issue RISC processor, with no superscalar features, no out-of-order execution, no branch prediction, and no speculative execution. In order to avoid the enormous effort of developing a brand new compiler, TSAR will use an existing processor core. The choice is not important: it could be a MIPS32, a PPC405, a SPARC V8, or an ARM7 core, as all these processor cores have similar performance.
     
    1313The first TSAR architecture demonstrator will use a MIPS32 processor core.
    1414
    15 == The memory layout ==
     15== 2. Memory layout ==
    1616
    1717The physical address space size is a parameter. The maximal value is 1 Tbyte (40-bit physical address). For scalability reasons, the TSAR physical memory is logically shared, but physically distributed: the architecture is clustered, and has a 2D mesh topology. Each cluster contains up to 4 processors, a local interconnect and one physical memory bank. The architecture is NUMA (Non Uniform Memory Access): all processors can access all memory banks, but the access time and the power consumption depend on the distance between the processor and the memory bank.
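The NUMA property described above can be illustrated with a small sketch. It assumes a square mesh and uses the Manhattan (hop) distance between cluster coordinates as a stand-in for access cost; the specification only says that time and power depend on distance, so both the metric and the function names are illustrative assumptions.

```python
import math

CORES_PER_CLUSTER = 4  # from the text: up to 4 processors per cluster


def mesh_side(cores: int, per_cluster: int = CORES_PER_CLUSTER) -> int:
    """Side of the smallest square mesh with enough clusters (square mesh is an assumption)."""
    clusters = math.ceil(cores / per_cluster)
    side = math.isqrt(clusters)
    return side if side * side == clusters else side + 1


def hop_distance(src: tuple, dst: tuple) -> int:
    """Manhattan distance between two (x, y) cluster coordinates (assumed cost proxy)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])
```

With 4 processors per cluster, 4096 cores need 1024 clusters, which fit a 32 x 32 mesh under the square-mesh assumption; the 16-core prototype would fit a 2 x 2 mesh.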
     
    2323
    2424
    25 ==      The virtual memory support ==
     25==      3. Virtual memory support ==
    2626
    2727The TSAR architecture implements a paged virtual memory. It defines a generic MMU (Memory Management Unit), physically implemented in the L1 cache controller. This generic MMU is independent of the processor core, and can be used with any 32-bit, single-issue RISC processor. To be independent of the processor core, TLB misses are handled by a hardwired FSM, and do not use any specific instructions.
     
    3131In order to help the operating system implement efficient page replacement policies, each entry in the page table contains three bits that are updated by the hardware MMU: a dirty bit to indicate modifications, and two separate access bits for “local access” (processor and memory cache located in the same cluster) and “remote access” (processor and memory cache located in different clusters).
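A minimal sketch of how the three hardware-updated bits could behave. The bit positions, the constant names, and the `mark_access` function are hypothetical; the specification excerpt names the bits but does not give their encoding.

```python
# Hypothetical bit positions; the real page table entry layout is not given here.
DIRTY = 1 << 0
LOCAL_ACCESS = 1 << 1
REMOTE_ACCESS = 1 << 2


def mark_access(pte_flags: int, proc_cluster: tuple, mem_cluster: tuple, is_write: bool) -> int:
    """Return the flags after one access, as the hardware MMU might update them."""
    flags = pte_flags
    # Same cluster sets the "local access" bit, different clusters the "remote access" bit.
    flags |= LOCAL_ACCESS if proc_cluster == mem_cluster else REMOTE_ACCESS
    if is_write:
        flags |= DIRTY  # a modification marks the page dirty
    return flags
```

An operating system could then read the two access bits separately, for instance to detect pages that are mostly used from a remote cluster.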
    3232
    33 == The DHCCP cache coherence protocol ==
     33== 4. DHCCP cache coherence protocol ==
    3434
    3535The shared memory TSAR architecture implements the DHCCP protocol (Distributed Hybrid Cache Coherence Protocol). As it is not possible to monitor all simultaneous transactions in a distributed network on chip, the DHCCP protocol is based on the global directory paradigm.
     
    4141Finally, the DHCCP protocol is called “hybrid”, as it uses a multicast/update policy when the number of copies is lower than a given threshold, and automatically switches to a broadcast/invalidate policy when this number of copies exceeds this threshold.
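The hybrid policy switch can be sketched as a single decision function. This is only an illustration: the function name and return strings are hypothetical, and the case where the number of copies equals the threshold is not specified in the text, so it is treated here as broadcast/invalidate by assumption.

```python
def coherence_action(copies: int, threshold: int) -> str:
    """Pick the coherence policy for a written cache line with `copies` sharers."""
    if copies < threshold:
        # Few copies: send an update to each registered copy.
        return "multicast_update"
    # Many copies (the equality case is an assumption): invalidate everywhere.
    return "broadcast_invalidate"
```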
    4242
    43 == The interconnection networks ==
     43== 5. Interconnection networks ==
    4444 
    4545The TSAR architecture requires a hierarchical two-level interconnect: each cluster must contain a local interconnect, and communication between clusters relies on a global interconnect.
    4646
    47 As described in section, the DHCCP protocol defines three classes of transactions that must use three separate interconnection networks: the D_network, used for the direct read/write transactions; the C_network, used for coherence transactions; the X_network, used to access the external memory in case of a miss in the memory cache.
     47As described in [CacheCoherence the cache coherence section], the DHCCP protocol defines three classes of transactions that must use three separate interconnection networks: the D_network, used for the direct read/write transactions; the C_network, used for coherence transactions; the X_network, used to access the external memory in case of a miss in the memory cache.
    4848
    4949The DSPIN network on chip (developed by the LIP6 laboratory) implements the D_network and the C_network. It has the required 2D mesh topology, and provides the shared memory TSAR architecture with truly scalable bandwidth. It supports the VCI/OCP standard, and implements a logically “flat” address space. It is well suited to power consumption management, as it relies on the GALS (Globally Asynchronous, Locally Synchronous) approach: both the voltage and the clock frequency can be independently adjusted in each cluster. It provides two fully separated virtual channels for the direct traffic and for the coherence traffic. It provides the broadcast service required by the DHCCP protocol.
     
    5252
    5353
    54 ==      Atomic instructions ==
     54==      6. Atomic operations ==
    5555
    5656Any multi-processor architecture must provide hardware support for atomic operations. These “read-then-write” atomic operations are used by the software for synchronization.
     
    6060Each processor instruction set defines a different set of atomic instructions. The TSAR architecture implements the LL/SC mechanism, which is natively defined by the MIPS32 and PPC405 processors, and is directly supported by the VCI/OCP standard. Other atomic instructions, such as the SWAP or LDSTUB instructions defined by the SPARC processor, can be emulated using the LL/SC instructions.
    6161
    62 With this mechanism, the TSAR architecture allows the system developers to use cacheable spin-locks.
     62With this mechanism, the TSAR architecture allows the software developers to use cacheable spin-locks.
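To make the LL/SC discussion concrete, here is a toy, single-threaded software model of one memory word with LL/SC reservations, together with a SWAP emulation and a spin-lock attempt built on it. The class and function names are hypothetical, and the model is a simplification for illustration; it does not reproduce the VCI/OCP or MIPS32 semantics in detail.

```python
class LLSCMemory:
    """Toy model of one memory word with per-processor LL/SC reservations."""

    def __init__(self, value: int = 0):
        self.value = value
        self._reservations = set()  # ids of processors holding a valid reservation

    def ll(self, proc: int) -> int:
        """Load-linked: read the word and register a reservation for `proc`."""
        self._reservations.add(proc)
        return self.value

    def sc(self, proc: int, new_value: int) -> bool:
        """Store-conditional: succeeds only if `proc` still holds its reservation."""
        if proc not in self._reservations:
            return False
        self.store(new_value)
        return True

    def store(self, new_value: int) -> None:
        """Any write to the word breaks every outstanding reservation."""
        self.value = new_value
        self._reservations.clear()


def swap(mem: LLSCMemory, proc: int, new_value: int) -> int:
    """Emulate an atomic SWAP with an LL/SC retry loop; returns the old value."""
    while True:
        old = mem.ll(proc)
        if mem.sc(proc, new_value):
            return old


def try_lock(mem: LLSCMemory, proc: int) -> bool:
    """One spin-lock attempt: the lock is taken if the previous value was 0."""
    return swap(mem, proc, 1) == 0


def unlock(mem: LLSCMemory) -> None:
    """Release the lock with a plain store."""
    mem.store(0)
```

In this model, if two processors both execute `ll` and one then succeeds with `sc`, the other's `sc` fails and it must retry, which is the property the spin-lock relies on.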
    6363