Changes between Version 1 and Version 2 of Specification


Timestamp:
Jun 27, 2009, 2:06:44 PM (15 years ago)
Author:
alain
Comment:

--

  • Specification

    v1 v2  
    11[[PageOutline]]
    22
    3 = General Principles =
     3= Architecture Overview =
    44
    55The TSAR shared memory architecture is a scalable, cache-coherent, general-purpose multicore architecture. It is intended to support commodity applications and operating systems that run on standard PCs, such as Linux or FreeBSD. Therefore, cache coherence must be entirely guaranteed by the hardware. Moreover, the TSAR architecture must provide hardware support for paged virtual memory and efficient atomic operations for synchronization.
     
    2525== The virtual memory support ==
    2626
    27 The TSAR architecture implements a paged virtual memory. It defines a generic MMU (Memory Management Unit), physically implemented in the L1 cache controller. This generic MMU is independent of the processor core, and can be used with any 32-bit, single-issue RISC processor. TLB misses are handled by a hardwired FSM, and do not use any specific instructions.
     27The TSAR architecture implements a paged virtual memory. It defines a generic MMU (Memory Management Unit), physically implemented in the L1 cache controller. This generic MMU is independent of the processor core, and can be used with any 32-bit, single-issue RISC processor. To remain independent of the processor core, TLB misses are handled by a hardwired FSM, and do not use any specific instructions.
    2828
    2929The virtual address is 32 bits wide, and the physical address has up to 40 bits. The architecture defines two page sizes (4 Kbyte pages and 2 Mbyte pages). The page tables are mapped in memory and have a classical two-level hierarchical structure. There are two separate TLBs (Translation Look-aside Buffers), one for instruction addresses and one for data addresses.
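
The address widths above can be illustrated with a small sketch. It assumes (this is not stated on this page) that a first-level entry covers one 2 Mbyte region, so that a 32-bit virtual address splits into an 11-bit first-level index, a 9-bit second-level index, and a 12-bit offset for 4 Kbyte pages. The names `split_vaddr` and `physical_address` are hypothetical.

```python
VADDR_BITS = 32      # virtual address width (from the text)
PADDR_BITS = 40      # maximum physical address width (from the text)
SMALL_PAGE_BITS = 12 # 4 Kbyte page -> 12 offset bits
BIG_PAGE_BITS = 21   # 2 Mbyte page -> 21 offset bits


def split_vaddr(vaddr):
    """Split a 32-bit virtual address for a 4 Kbyte page.

    Assumption: one first-level entry spans a 2 Mbyte region.
    Returns (first_level_index, second_level_index, offset).
    """
    assert 0 <= vaddr < (1 << VADDR_BITS)
    ix1 = vaddr >> BIG_PAGE_BITS
    ix2 = (vaddr >> SMALL_PAGE_BITS) & ((1 << (BIG_PAGE_BITS - SMALL_PAGE_BITS)) - 1)
    offset = vaddr & ((1 << SMALL_PAGE_BITS) - 1)
    return ix1, ix2, offset


def physical_address(ppn, offset):
    """Concatenate a physical page number and a 4 Kbyte page offset."""
    assert 0 <= ppn < (1 << (PADDR_BITS - SMALL_PAGE_BITS))  # at most 28 bits
    assert 0 <= offset < (1 << SMALL_PAGE_BITS)
    return (ppn << SMALL_PAGE_BITS) | offset
```

Under these assumptions, `split_vaddr(0x00201234)` yields `(1, 1, 0x234)`: the address lies in the second 2 Mbyte region, in its second 4 Kbyte page.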
    3030
    3131To help the operating system implement efficient page replacement policies, each page table entry contains three bits that are updated by the hardware MMU: a dirty bit to indicate modifications, and two separate access bits for “local access” (processor and memory cache located in the same cluster) and “remote access” (processor and memory cache located in different clusters).
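
As a rough software model of these three bits, the sketch below shows how they might be updated on each access. The bit positions and the name `mark_access` are assumptions made for illustration; this page does not define the entry layout.

```python
# Hypothetical bit positions; the actual entry layout is not given here.
DIRTY = 1 << 0          # set when the page is written
LOCAL_ACCESS = 1 << 1   # processor and memory cache in the same cluster
REMOTE_ACCESS = 1 << 2  # processor and memory cache in different clusters


def mark_access(flags, is_write, same_cluster):
    """Return the entry flags after one access, as the MMU would update them."""
    flags |= LOCAL_ACCESS if same_cluster else REMOTE_ACCESS
    if is_write:
        flags |= DIRTY
    return flags
```

An operating system could then, for example, prefer to evict pages whose access bits are both clear, or migrate pages that only show remote accesses.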
    32 1.4     The DHCCP protocol
     32
     33== The DHCCP cache coherence protocol ==
     34
    3335The shared memory TSAR architecture implements the DHCCP protocol (Distributed Hybrid Cache Coherence Protocol). As it is not possible to monitor all simultaneous transactions in a distributed network-on-chip, the DHCCP protocol is based on the global directory paradigm.
    3436
     
    3739This choice increases the number of write transactions and reinforces the importance of proper data placement on this NUMA architecture. This is the price to pay for scalability.
    3840
    39 Finally, the DHCCP protocol is called “hybrid”, as it uses a multicast/update policy for data cache, and a broadcast/invalidate policy for instruction caches.
    40 1.5     The interconnection networks 
     41Finally, the DHCCP protocol is called “hybrid”, as it uses a multicast/update policy when the number of copies is lower than a given threshold, and automatically switches to a broadcast/invalidate policy when this number of copies exceeds this threshold.
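
The version 2 wording of the hybrid policy can be sketched as a simple decision function. The function name and the handling of the case where the number of copies equals the threshold are assumptions; the text only covers "lower than" and "exceeds".

```python
def coherence_action(n_copies, threshold):
    """Pick the coherence action for a write to a shared cache line.

    Assumption: when n_copies equals the threshold, the sketch falls back
    to broadcast/invalidate; the text leaves this case unspecified.
    """
    if n_copies < 0 or threshold < 0:
        raise ValueError("counts must be non-negative")
    if n_copies < threshold:
        return "multicast_update"      # update each registered copy
    return "broadcast_invalidate"      # invalidate every possible copy
```

With a hypothetical threshold of 4, a line with 2 copies would be updated by multicast, while a line with 10 copies would be invalidated by broadcast.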
     42
     43== The interconnection networks ==
     44 
    4145The TSAR architecture requires a hierarchical two-level interconnect: each cluster must contain a local interconnect, and communication between clusters relies on a global interconnect.
    4246
     
    4852
    4953
    50 1.6     Atomic instructions
    51 Any multi-processor architecture must provide hardware support for atomic operations. These “read-then-write” atomic operations are used by the software for synchronization.
     54== Atomic instructions ==
    5255
    53 In a distributed, yet shared memory, architecture using a NoC, these atomic operations must be implemented in both the memory controller (in our case, the memory caches), and the L1 cache controller.
     56Any multi-processor architecture must provide hardware support for atomic operations. These “read-then-write” atomic operations are used by the software for synchronization.
    5457
    55 Each processor instruction set defines its own set of atomic instructions. The TSAR architecture implements the LL/SC mechanism, which is natively defined by the MIPS32 and PPC405 processors and is directly supported by the VCI/OCP standard. Other atomic instructions, such as the SWAP or LDSTUB instructions defined by the SPARC processor, can be emulated using the LL/SC instructions.
     58In a distributed architecture using a NoC, these atomic operations must be implemented in both the memory controller (in our case, the memory caches), and the L1 cache controller.
     59
     60Each processor instruction set defines its own set of atomic instructions. The TSAR architecture implements the LL/SC mechanism, which is natively defined by the MIPS32 and PPC405 processors and is directly supported by the VCI/OCP standard. Other atomic instructions, such as the SWAP or LDSTUB instructions defined by the SPARC processor, can be emulated using the LL/SC instructions.
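
To illustrate how a SWAP can be emulated with LL/SC, here is a minimal single-threaded software model. The `Memory` class, its per-processor reservation table, and `atomic_swap` are hypothetical names for this sketch; they do not describe the actual TSAR hardware or the VCI/OCP encoding.

```python
class Memory:
    """Toy memory with one LL/SC reservation per processor."""

    def __init__(self):
        self.words = {}         # address -> value
        self.reservations = {}  # processor id -> reserved address

    def ll(self, cpu, addr):
        """Load-linked: read the word and register a reservation."""
        self.reservations[cpu] = addr
        return self.words.get(addr, 0)

    def sc(self, cpu, addr, value):
        """Store-conditional: write only if the reservation still holds."""
        if self.reservations.get(cpu) != addr:
            return False
        self.words[addr] = value
        # A successful store cancels every reservation on this address.
        for other in [c for c, a in self.reservations.items() if a == addr]:
            del self.reservations[other]
        return True


def atomic_swap(mem, cpu, addr, new_value):
    """Emulate SWAP: retry the LL/SC pair until the store succeeds."""
    while True:
        old = mem.ll(cpu, addr)
        if mem.sc(cpu, addr, new_value):
            return old
```

If two processors both execute `ll` on the same address, only the first `sc` succeeds; the loser retries, which is what makes the read-then-write sequence appear atomic.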
    5661
    5762With this mechanism, the TSAR architecture allows system developers to use cacheable spin-locks.
    5863
    59 = Virtual memory =
    60 
    61 = Cache Coherence Protocol =
    62 
    63 = Atomic Operations =
    64 
    65 = Interconnection Networks =
    66 
    67 = VCI/OCP parameters =