[[PageOutline]]

= TSAR MMU =

The TSAR MMU (Memory Management Unit) is a hardware component implemented as an L1 cache controller. This generic component can be used with any single-instruction-issue, 32-bit processor. As the processor core can issue an instruction request and a data request simultaneously, there are actually two separate MMUs, one for data and one for instructions. These two MMUs share the same physical access to the VCI/OCP interconnect. Each MMU contains a set-associative cache and a TLB (Translation Look-aside Buffer), which is in charge of the virtual-to-physical address translation and performs the access right checks.

The TSAR generic MMU implements a paged virtual memory, supporting two page sizes: 4 Kbytes pages and 2 Mbytes pages. In order to be independent of the processor core choice, TLB misses are handled by a hardwired Finite State Machine (called a Table Walk), without any software action.

[[Image(generic_mmu.png, nolink)]]

== 1. Virtual Memory ==

The TSAR architecture defines two page sizes: 4 Kbytes pages and 2 Mbytes pages. The virtual address space size is 4 Gbytes (32-bit virtual addresses). The physical address space is limited to 1 Tbyte (40-bit physical addresses). The page tables are built by the operating system and stored in main memory. Each execution context (such as a UNIX process) has its own page table.

The MMU performs the translation from the VPN (Virtual Page Number) to the PPN (Physical Page Number).
 * For a 4 Kbytes page, the VPN uses 20 bits, and the PPN requires 28 bits.
 * For a 2 Mbytes page, the VPN uses 11 bits, and the PPN requires 19 bits.

=== 1.1 Two-level Page Table structure ===

As described below, the page table has a hierarchical two-level structure:

[[Image(two_levels_pages_tables.png, nolink)]]

 * All page tables (first and second level) must be aligned: the page table base address must be a multiple of 8 Kbytes for a first-level page table, and a multiple of 4 Kbytes for a second-level page table.
 * The page tables can be placed anywhere in the physical address space.
 * The PTPR register is located in the generic MMU, and is re-initialised by the OS at each context switch. It contains the 27 MSB bits of the first-level page table base address, and is extended (left-shifted) to 40 bits by the Table Walk FSM in case of TLB miss.

=== 1.2 First Level Page Table Entry Format ===

Each entry in a first-level page table can contain either a 2 Mbytes page descriptor (called PTE1), or a second-level page table descriptor (called PTD1). It is implemented as a single 32-bit word:

 * PTE1 :
||V||T||L||R||C||W||X||U||G||D|| reserved (3 bits) || PPN1 (19 bits) ||
 * PTD1 :
||V||T|| reserved (2 bits) || PTBA (28 bits) ||

The various fields are defined as follows:

|| V || Valid bit || Valid entry when 1 (set by the OS) ||
|| T || Type bit || PTD1 when 1 (set by the OS) ||
|| L || Local access bit || Used by the OS for page replacement (set by the hardware) ||
|| R || Remote access bit || Used by the OS for page replacement (set by the hardware) ||
|| C || Cacheable bit || The page is cacheable in the L1 cache when 1 (set by the OS) ||
|| W || Writable bit || The page is writable when 1 (set by the OS) ||
|| X || eXecutable bit || The page can contain instructions when 1 (set by the OS) ||
|| U || User bit || The page is accessible in user mode when 1 (set by the OS) ||
|| G || Global bit || Entry not invalidated in TLB flush when 1 (set by the OS) ||
|| D || Dirty bit || The page has been modified when 1 (set by the hardware) ||
|| PPN1 || Physical Page Number || Concatenated with the page offset to build the physical address ||
|| PTBA || Page Table Base Address || Second-level page table base address / 4096 ||

The L, R, D bits are used by the operating system to implement the page replacement policy.
 * The D bit is set by the hardware, when a page is written and when it is not already set, using an atomic access (CAS).
 * The L bit is set by the hardware, when the page is accessed by a local processor or coprocessor, after a TLB miss, and when it is not already set.
 * The R bit is set by the hardware, when the page is accessed by a remote processor or coprocessor, after a TLB miss, and when it is not already set.
These page table updates use atomic access (CAS).

If the entry is a PTE1, the PPN1 value (19 bits) must be concatenated with the page offset (21 bits) to build the 40-bit physical address.

If the entry is a PTD1, the PTBA value (28 bits) must be left-shifted by 12 bits to define the base address of the second-level page table. The page table being aligned in memory, the 12 LSB bits of this base address have a 0 value.

The ''reserved'' bits are specified for future hardware extensions, and must not be used by the operating system.

=== 1.3 Second Level Page Table Entry Format ===

Each entry in a second-level page table contains a 4 Kbytes page descriptor (called PTE2). It is implemented as two 32-bit words: the first word contains the flags; the second word contains the 28-bit physical page number (PPN2).
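The address computations for the three kinds of entries (PTE1, PTD1 and PTE2) can be sketched in C as follows. This is only an illustration: it assumes that the fields of each word are listed from the most significant bit (V in bit 31) down to the least significant one, which this page does not state explicitly, and all function and macro names are hypothetical.

{{{
#include <stdint.h>
#include <stdbool.h>

/* Assumed bit positions: fields are listed from bit 31 downwards. */
#define PTE_V_MASK 0x80000000u  /* Valid bit */
#define PTE_T_MASK 0x40000000u  /* Type bit: 1 = PTD1, 0 = PTE1 */

/* True when a first-level entry is valid and describes a second-level table. */
static bool pt1_entry_is_ptd(uint32_t entry)
{
    return (entry & PTE_V_MASK) && (entry & PTE_T_MASK);
}

/* PTE1: 19-bit PPN1 concatenated with the 21-bit offset of a 2 Mbytes page. */
static uint64_t pte1_paddr(uint32_t pte1, uint32_t vaddr)
{
    uint64_t ppn1   = pte1 & 0x0007FFFFu;   /* 19 LSB bits of the entry */
    uint64_t offset = vaddr & 0x001FFFFFu;  /* 21 LSB bits of the address */
    return (ppn1 << 21) | offset;
}

/* PTD1: 28-bit PTBA left-shifted by 12 bits gives the second-level table base. */
static uint64_t ptd1_pt2_base(uint32_t ptd1)
{
    uint64_t ptba = ptd1 & 0x0FFFFFFFu;     /* 28 LSB bits of the entry */
    return ptba << 12;
}

/* PTE2: 28-bit PPN2 (second word) concatenated with the 12-bit offset
 * of a 4 Kbytes page. */
static uint64_t pte2_paddr(uint32_t pte2_second_word, uint32_t vaddr)
{
    uint64_t ppn2   = pte2_second_word & 0x0FFFFFFFu;
    uint64_t offset = vaddr & 0x00000FFFu;
    return (ppn2 << 12) | offset;
}
}}}

In the three cases the result fits in 40 bits, which matches the physical address width of the architecture.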
 * PTE2 first word :
||V||T||L||R||C||W||X||U||G||D|| reserved (14 bits) || soft (8 bits) ||
 * PTE2 second word :
|| reserved (4 bits) || PPN2 (28 bits) ||

The various fields are defined as follows:

|| V || Valid bit || Valid entry when 1 (set by the OS) ||
|| T || Type bit || Must be 0 for a PTE2 (set by the OS) ||
|| L || Local access bit || Used by the OS for page replacement (set by the hardware) ||
|| R || Remote access bit || Used by the OS for page replacement (set by the hardware) ||
|| C || Cacheable bit || The page is cacheable in the L1 cache when 1 (set by the OS) ||
|| W || Writable bit || The page is writable when 1 (set by the OS) ||
|| X || eXecutable bit || The page can contain instructions when 1 (set by the OS) ||
|| U || User bit || The page is accessible in user mode when 1 (set by the OS) ||
|| G || Global bit || Entry not invalidated in TLB flush when 1 (set by the OS) ||
|| D || Dirty bit || The page has been modified when 1 (set by the hardware) ||
|| PPN2 || Physical Page Number || Concatenated with the page offset to build the 40-bit address ||

The L, R, D bits are used by the operating system to implement the page replacement policy.
 * The D bit is set by the hardware, when a page is written and when it is not already set, using an atomic access (CAS).
 * The L bit is set by the hardware, when the page is accessed by a local processor or coprocessor, after a TLB miss, and when it is not already set.
 * The R bit is set by the hardware, when the page is accessed by a remote processor or coprocessor, after a TLB miss, and when it is not already set.
These page table updates use atomic access (CAS).

The PPN2 value (28 bits) must be concatenated with the page offset (12 bits) to build the 40-bit physical address.

The ''reserved'' bits are specified for future hardware extensions, and must not be used by the operating system.

The ''soft'' bits can be used by the operating system, and will not be modified by the hardware MMU.

== 2. MMU/processor interface ==

In order to be used with the various (32-bit, single-instruction-issue) processor cores available in the SoCLib library, the TSAR generic MMU defines a generic processor/MMU interface for data and instructions.

=== 2.1 Instruction MMU interface ===

The Instruction MMU interface is defined by the following signals:

{{{
struct InstructionRequest {
    bool valid;
    uint30_t addr;        /* virtual word address, coded on 30 bits */
    enum ExecMode mode;
};

struct InstructionResponse {
    bool valid;
    bool error;
    uint32_t instruction;
};
}}}

The addr field is the virtual address of a 32-bit word. It is coded on 30 bits.

The possible values for the Execution Mode are defined below:

|| ExecMode || Value ||
|| || ||
|| Hyper || *1 ||
|| Kernel || 00 ||
|| User || 10 ||

=== 2.2 Data MMU interface ===

The Data MMU interface is defined by the following signals:

{{{
struct DataRequest {
    bool valid;
    uint30_t addr;                 /* virtual word address, coded on 30 bits */
    uint32_t wdata;
    enum DataOperationType type;
    uint4_t be;                    /* one enable bit per byte of wdata */
    enum ExecMode mode;
};

struct DataResponse {
    bool valid;
    bool error;
    uint32_t rdata;
};
}}}

The addr field is the virtual address of a 32-bit word. It is coded on 30 bits.

The wdata field is only significant for the bytes selected by the be field:
 * wdata[7:0] is at ![addr], masked by be[0]
 * wdata[15:8] is at [addr+1], masked by be[1]
 * wdata[23:16] is at [addr+2], masked by be[2]
 * wdata[31:24] is at [addr+3], masked by be[3]

The possible values for the execution mode are the same as for the instructions.
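The byte selection rule given above for wdata and be can be sketched in C as a merge into an existing 32-bit word. This only illustrates the masking semantics and is not a description of the hardware; the helper name `apply_byte_enable` is hypothetical.

{{{
#include <stdint.h>

/* Merge the bytes of wdata selected by be into old_word.
 * Bit i of be selects wdata[8*i+7 : 8*i], located at addr + i. */
static uint32_t apply_byte_enable(uint32_t old_word, uint32_t wdata, uint8_t be)
{
    uint32_t mask = 0;
    for (int i = 0; i < 4; i++) {
        if (be & (1u << i)) {
            mask |= 0xFFu << (8 * i);
        }
    }
    return (old_word & ~mask) | (wdata & mask);
}
}}}

For instance, with be = 0x1 only the byte at ![addr] is replaced, and with be = 0x0 the stored word is left unchanged.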
The possible values for the OperationType field are defined below:

|| Data.OperationType || R X L Z S || semantic ||
|| || || ||
|| DATA_READ || 1 0 0 0 0 || load 32 bits from memory address space ||
|| DATA_WRITE || 0 0 0 0 0 || store 32 bits to memory address space ||
|| DATA_LL || 1 0 1 0 0 || load 32 bits from memory address space with reservation ||
|| DATA_SC || 0 0 1 0 0 || conditional store 32 bits to memory address space ||
|| XTN_READ || 1 1 0 0 0 || load 32 bits from MMU register ||
|| XTN_WRITE || 0 1 0 0 0 || store 32 bits to MMU register ||
|| DATA_LLRST || * * * 1 0 || reset a previous LL reservation ||
|| DATA_SYNC || * * * * 1 || flush the write buffer before returning ||

Note: instruction requests and data requests are independent: the processor can issue simultaneous data and instruction requests that have different execution modes.

In case of access to an eXTerNal register (XTN_READ or XTN_WRITE), the MMU register is identified by the Data.Address field (see section 4).

== 3. MMU architecture ==

The generic MMU is implemented in the L1 cache controller. As the processor core can issue an instruction request and a data request simultaneously, there are actually two separate data and instruction caches, sharing the same physical access to the VCI/OCP interconnect. These caches are set-associative, and each has a total capacity of 16 Kbytes:
 * cache line width = 64 bytes
 * number of sets = 64 sets
 * number of associative ways = 4 ways

The L1 data cache implements a write-through policy, in order to simplify the cache-coherence protocol.
It contains a write buffer that is in charge of building write bursts, with the following constraints:
 * the burst length is variable
 * the maximal burst length is eight 32-bit words
 * all addresses in a burst belong to the same "half cache line" (32 bytes aligned)
 * each address in a burst can have a different Byte Enable value (including the 0 value)

Similarly, the L1 cache controller contains two separate hardware MMUs for instructions and data. Each MMU contains a 64-entry TLB (Translation Look-aside Buffer). These TLBs are implemented as set-associative caches (16 sets of 4 ways). Each entry in these TLBs can contain either a 4 Kbytes page descriptor, or a 2 Mbytes page descriptor.

The figure below illustrates the general structure of the TSAR L1 caches.

[[Image(cache_tlb.png, nolink)]]

For both data and instructions, the TSAR L1 caches use physical addresses: the cache directories are indexed by the physical addresses, and the tags contained in the directories are obtained from the physical addresses. The access to the L1 cache being a critical path, the TSAR MMU uses a speculative approach to avoid serializing the TLB access and the L1 cache access:
 * After each TLB hit, the input VPN and the resulting PPN values are saved in two registers, VPN_save and PPN_save.
 * During access (n), the PPN_save value, corresponding to access (n-1), is used to access the cache. Simultaneously, the cache controller checks that the VPN value is equal to the VPN_save value (no page change).
 * In case of TLB hit with a page change, the cache must be accessed twice, which means a one-cycle penalty.

=== 3.1 MMU activation ===

After a general RESET, the MMU is deactivated. As long as the MMU is not activated, the 32-bit virtual address is simply extended to 40 bits (for both data and instructions) by adding 8 null MSB bits, and is directly used as a physical address. As long as the caches are not activated, all read requests are considered ''uncached'' by the cache controller.
The instruction cache, the data cache, the instruction MMU and the data MMU can be separately activated by the software, by writing in the MMU_MODE register.

=== 3.2 Generic MMU exceptions ===

The hardware MMU can signal exceptions by raising the general instruction_bus_error and data_bus_error signals (for an instruction or data access respectively). The access type (Read or Write) and the error type are written in the MMU_IETR & MMU_DETR registers, as described below:

|| Exception type || code || cause || severity ||
|| || || || ||
||MMU_WRITE_PT1_UNMAPPED || 0x0001 || Write access : Page fault on Table 1 (invalid PTE) || non fatal error ||
||MMU_WRITE_PT2_UNMAPPED || 0x0002 || Write access : Page fault on Table 2 (invalid PTE) || non fatal error ||
||MMU_WRITE_PRIVILEGE_VIOLATION || 0x0004 || Write access : Protected access in user mode || user error ||
||MMU_WRITE_ACCES_VIOLATION || 0x0008 || Write access : Write to a non writable page || user error ||
||MMU_WRITE_UNDEFINED_XTN || 0x0020 || Write access : Undefined external access address || user error ||
||MMU_WRITE_PT1_ILLEGAL_ACCESS || 0x0040 || Write access : Bus Error in Table 1 access || kernel error ||
||MMU_WRITE_PT2_ILLEGAL_ACCESS || 0x0080 || Write access : Bus Error in Table 2 access || kernel error ||
||MMU_WRITE_DATA_ILLEGAL_ACCESS || 0x0100 || Write access : Bus Error during the cache access || kernel error ||
||MMU_READ_PT1_UNMAPPED || 0x1001 || Read access : Page fault on Table 1 (invalid PTE) || non fatal error ||
||MMU_READ_PT2_UNMAPPED || 0x1002 || Read access : Page fault on Table 2 (invalid PTE) || non fatal error ||
||MMU_READ_PRIVILEGE_VIOLATION || 0x1004 || Read access : Protected access in user mode || user error ||
||MMU_READ_EXEC_VIOLATION || 0x1010 || Read access : Exec access to a non exec page || user error ||
||MMU_READ_UNDEFINED_XTN || 0x1020 || Read access : Undefined external access address || user error ||
||MMU_READ_PT1_ILLEGAL_ACCESS || 0x1040 || Read access : Bus Error in Table 1 access || kernel error ||
||MMU_READ_PT2_ILLEGAL_ACCESS || 0x1080 || Read access : Bus Error in Table 2 access || kernel error ||
||MMU_READ_DATA_ILLEGAL_ACCESS || 0x1100 || Read access : Bus Error during the cache access || kernel error ||

== 4. Generic MMU registers ==

The generic MMU contains a set of 32-bit registers (or pseudo-registers) that can be accessed by the operating system, through a dedicated MMU driver. The register index is contained in the 7 LSB bits of the address field (dreq.addr>>2). The value to be written in the register is contained in the data field (dreq.wdata). In the case of the MIPS processor, these registers are implemented in coprocessor 2, and are accessed using the ''mtc2'' (write) and ''mfc2'' (read) instructions.

These registers are described below:

|| register name || index || description || mode ||
|| || || || ||
|| MMU_PTPR || 0 || Page Table Pointer Register || R/W ||
|| MMU_MODE || 1 || Mode Register || R/W ||
|| MMU_ICACHE_FLUSH || 2 || Instruction Cache flush || W ||
|| MMU_DCACHE_FLUSH || 3 || Data Cache flush || W ||
|| MMU_ITLB_INVAL || 4 || Instruction TLB line invalidation || W ||
|| MMU_DTLB_INVAL || 5 || Data TLB line invalidation || W ||
|| MMU_ICACHE_INVAL || 6 || Instruction Cache line invalidation || W ||
|| MMU_DCACHE_INVAL || 7 || Data Cache line invalidation || W ||
|| MMU_ICACHE_PREFETCH || 8 || Instruction Cache line prefetch || W ||
|| MMU_DCACHE_PREFETCH || 9 || Data Cache line prefetch || W ||
|| MMU_SYNC || 10 || Complete pending writes || W ||
|| MMU_IETR || 11 || Instruction Exception Type Register || R ||
|| MMU_DETR || 12 || Data Exception Type Register || R ||
|| MMU_IBVAR || 13 || Instruction Bad Virtual Address Register || R ||
|| MMU_DBVAR || 14 || Data Bad Virtual Address Register || R ||
|| MMU_PARAMS || 15 || Caches & TLBs hardware parameters || R ||
|| MMU_RELEASE || 16 || Generic MMU release number || R ||
|| MMU_WORD_LO || 17 || Lowest part of a double word || R/W ||
|| MMU_WORD_HI || 18 || Highest part of a double word || R/W ||
|| MMU_ICACHE_PA_INVAL || 19 || Instruction cache inval physical address || W ||
|| MMU_DCACHE_PA_INVAL || 20 || Data cache inval physical address || W ||
|| MMU_LL_RESET || 21 || LLSC reservation buffer invalidation || W ||
|| MMU_DOUBLE_LL || 22 || 64 bits linked-load transaction || W ||
|| MMU_DOUBLE_SC || 23 || 64 bits store-conditional transaction || W ||
|| MMU_DATA_PADDR_EXT || 24 || Physical address extension for data access || W ||
|| MMU_INST_PADDR_EXT || 25 || Physical address extension for inst access || W ||

'''Note''': A change to this table should be synchronised with https://www.soclib.fr/trac/dev/wiki/Component/Iss2Api

=== 4.1 MMU_PTPR ===

The '''MMU_PTPR''' register contains the base address of the currently used first-level page table. The PTPR is a 32-bit register, and the physical address can be (up to) 40 bits. As the base address is aligned on an 8 Kbytes boundary, the PTPR register contains only the 27 MSB bits of the base address:

|| 00000 || BASE_ADDRESS[39:13] ||

=== 4.2 MMU_MODE ===

The '''MMU_MODE''' register has four significant bits, described below. A device is activated when the corresponding bit is set to 1.

|| MODE3 || MODE2 || MODE1 || MODE0 ||
|| (INS TLB) || (DATA TLB) || (INS CACHE) || (DATA CACHE) ||

=== 4.3 MMU_ICACHE_FLUSH & MMU_DCACHE_FLUSH ===

Writing any value in the '''MMU_ICACHE_FLUSH''' register (resp. '''MMU_DCACHE_FLUSH''' register) invalidates all cache lines stored in the instruction cache (resp. data cache).

=== 4.4 MMU_ITLB_INVAL & MMU_DTLB_INVAL ===

The value written in the 32-bit '''MMU_ITLB_INVAL''' register (resp. '''MMU_DTLB_INVAL''' register) is interpreted as a virtual address. If the instruction TLB (resp. the data TLB) contains an entry corresponding to this address, this entry is invalidated. This is a blocking request for the processor.
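The encodings of the MMU_PTPR and MMU_MODE registers (sections 4.1 and 4.2) can be sketched in C as follows. The sketch assumes that MODE0 is bit 0 of the register and that the five upper bits of PTPR are zero, as the two layouts suggest; the macro and function names are hypothetical and are not part of any SoCLib API.

{{{
#include <stdint.h>

/* MMU_MODE bits, assuming MODE0 is the least significant bit. */
#define MMU_MODE_DATA_CACHE (1u << 0)
#define MMU_MODE_INS_CACHE  (1u << 1)
#define MMU_MODE_DATA_TLB   (1u << 2)
#define MMU_MODE_INS_TLB    (1u << 3)

/* Value to write in MMU_PTPR: the 27 MSB bits of a 40-bit base address
 * aligned on an 8 Kbytes boundary (13 LSB bits equal to 0). */
static uint32_t ptpr_from_base(uint64_t pt1_base)
{
    return (uint32_t)((pt1_base >> 13) & 0x07FFFFFFu);
}

/* Inverse operation: left-shift the PTPR value back to a 40-bit address. */
static uint64_t base_from_ptpr(uint32_t ptpr)
{
    return (uint64_t)(ptpr & 0x07FFFFFFu) << 13;
}
}}}

For example, a first-level page table located at physical address 0x2000 gives a PTPR value of 1, and activating both caches and both TLBs corresponds to a MMU_MODE value of 0xF under the assumption above.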
=== 4.5 MMU_ICACHE_INVAL & MMU_DCACHE_INVAL ===

The value written in the 32-bit '''MMU_ICACHE_INVAL''' register (resp. '''MMU_DCACHE_INVAL''' register) is interpreted as a virtual address. This address is translated to a physical address. If the instruction cache (resp. the data cache) contains an entry corresponding to this address, this entry is invalidated. This is a blocking request for the processor.

=== 4.6 MMU_ICACHE_PREFETCH & MMU_DCACHE_PREFETCH ===

The value written in the 32-bit '''MMU_ICACHE_PREFETCH''' register (resp. '''MMU_DCACHE_PREFETCH''' register) is interpreted as a virtual address. This address is translated to a physical address. If the instruction cache (resp. the data cache) does not contain the corresponding cache line, this cache line is fetched from memory. This is a non-blocking request for the processor.

=== 4.7 MMU_SYNC ===

Writing any value in the '''MMU_SYNC''' register forces the execution of posted write requests. This is a blocking request for the processor.

=== 4.8 MMU_IETR & MMU_DETR ===

MMU exceptions are reported in these two registers, as described in section 3.2.

=== 4.9 MMU_IBVAR & MMU_DBVAR ===

Faulty virtual addresses are written in these two registers.
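As an illustration, an exception handler could classify the value read from MMU_IETR or MMU_DETR as sketched below. The sketch relies on an observation made on the table of section 3.2 (every MMU_READ_* code has bit 12 set, and the two page faults use the cause values 0x001 and 0x002); this structure is inferred from the listed codes, not stated by the specification, and the names are hypothetical.

{{{
#include <stdint.h>
#include <stdbool.h>

#define MMU_ETR_READ_FLAG  0x1000u  /* set in all MMU_READ_* codes */
#define MMU_ETR_CAUSE_MASK 0x0FFFu  /* remaining bits: cause of the error */

/* True when the faulty access was a read (or an instruction fetch). */
static bool mmu_error_is_read(uint32_t etr)
{
    return (etr & MMU_ETR_READ_FLAG) != 0;
}

/* True for the two "non fatal error" codes: page fault on Table 1 or Table 2. */
static bool mmu_error_is_page_fault(uint32_t etr)
{
    uint32_t cause = etr & MMU_ETR_CAUSE_MASK;
    return cause == 0x001u || cause == 0x002u;
}
}}}

A handler would typically read the matching MMU_IBVAR or MMU_DBVAR register when `mmu_error_is_page_fault` returns true, in order to find the page to be mapped.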
=== 4.10 MMU_PARAMS ===

The '''MMU_PARAMS''' register defines the characteristics of the instruction and data caches & TLBs:

||WTD||STD||WCD||SCD||WTI||STI||WCI||SCI||NBL||

 * WTD (3 bits) : log2(number of associative ways for the Data TLB)
 * STD (4 bits) : log2(number of sets for the Data TLB)
 * WCD (3 bits) : log2(number of associative ways for the Data Cache)
 * SCD (4 bits) : log2(number of sets for the Data Cache)
 * WTI (3 bits) : log2(number of associative ways for the Instruction TLB)
 * STI (4 bits) : log2(number of sets for the Instruction TLB)
 * WCI (3 bits) : log2(number of associative ways for the Instruction Cache)
 * SCI (4 bits) : log2(number of sets for the Instruction Cache)
 * NBL (4 bits) : log2(number of bytes per Data or Instruction cache line)

=== 4.11 MMU_RELEASE ===

The '''MMU_RELEASE''' register contains the release number for a given hardware implementation:

|| SPECIFICATION_INDEX || IMPLEMENTATION_INDEX ||

 * SPECIFICATION_INDEX (16 bits)
 * IMPLEMENTATION_INDEX (16 bits)

=== 4.12 MMU_WORD_HI & MMU_WORD_LO ===

The two 32-bit '''MMU_WORD_HI''' & '''MMU_WORD_LO''' registers implement a double word data storage. They are used to support cache line invalidation in physical addressing (section 4.13), and to support double word LL & SC accesses (sections 4.14 & 4.15).

=== 4.13 MMU_ICACHE_PA_INV & MMU_DCACHE_PA_INV ===

Writing any value in the '''MMU_ICACHE_PA_INV''' register (resp. '''MMU_DCACHE_PA_INV''' register) can invalidate a cache line in the instruction cache (resp. data cache). The values stored in the MMU_WORD_HI & MMU_WORD_LO registers are interpreted together as a physical address. If the instruction cache (resp. data cache) contains a cache line corresponding to this address, it is invalidated. This is a blocking request for the processor.

=== 4.14 MMU_DOUBLE_LL ===

The value written in the '''MMU_DOUBLE_LL''' register is interpreted as a virtual address.
It is translated to a physical address, and a double word (64 bits) Linked Load transaction is initiated. The access must be aligned on a double word boundary (the 3 LSB bits of the address are ignored). The read value is written in the MMU_WORD_HI & MMU_WORD_LO registers. This is a blocking request for the processor.

=== 4.15 MMU_DOUBLE_SC ===

The value written in the '''MMU_DOUBLE_SC''' register is interpreted as a virtual address. It is translated to a physical address, and a double word (64 bits) Store Conditional transaction is initiated. The access must be aligned on a double word boundary (the 3 LSB bits of the address are ignored). The transmitted data are the values stored in the MMU_WORD_HI & MMU_WORD_LO registers. The returned value is written in the MMU_WORD_LO register. This is a blocking request for the processor.

=== 4.16 MMU_DATA_PADDR_EXT ===

The value written in the '''MMU_DATA_PADDR_EXT''' register is used as a physical address extension during a data access. It is only used when the DTLB is deactivated. It is used to access a memory location which is beyond the 4 Gbytes address space.

=== 4.17 MMU_INST_PADDR_EXT ===

The value written in the '''MMU_INST_PADDR_EXT''' register is used as a physical address extension during an instruction access. It is only used when the ITLB is deactivated. It is used to access a memory location which is beyond the 4 Gbytes address space.
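The effect of the two extension registers can be sketched in C as follows. The sketch assumes that the extension provides the 8 physical address bits located above the 32-bit address (bits 39 to 32) and that only the 8 LSB bits of the register are significant; this page does not give the exact layout, and the function name is hypothetical.

{{{
#include <stdint.h>

/* Physical address built when the corresponding TLB is deactivated:
 * the (assumed 8-bit) extension is concatenated with the 32-bit address. */
static uint64_t paddr_with_ext(uint32_t paddr_ext, uint32_t vaddr)
{
    return ((uint64_t)(paddr_ext & 0xFFu) << 32) | vaddr;
}
}}}

With an extension of 1 and a 32-bit address of 0, the resulting physical address is 0x100000000, the first byte above the 4 Gbytes boundary.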