GIET_VM / Boot Procedure

The boot procedure is done in three phases:

  • The generic preloader (also called reset code) is hard-coded in the external ROM. It is executed by processor P(0,0,0), and loads the GIET_VM boot-loader code into memory.
  • The GIET_VM boot-loader code is executed in parallel by all processors P(x,y,0): one processor per cluster. It loads the map.bin file, builds the page tables, initializes the schedulers as specified in the mapping, and loads the kernel code and the user application(s) code into memory.
  • Finally, the GIET_VM kernel_init() function is executed by all processors P(x,y,p), and completes the kernel initialization.

Phase 1 : Reset Initialization

After hard reset, all processors execute the same reset code (also called preloader code) stored in the external ROM. The work done depends on the processor global index:

  • Processor P(0,0,0) loads the GIET_VM boot-loader code from the external block device into the physical memory bank in cluster(0,0). It loads the two segments seg_boot_code and seg_boot_data from the boot.elf file. This boot.elf file must be stored at (lba = 2) on the external block device. The first 4 Mbytes in cluster(0,0) are reserved for the four segments used by the boot-loader (seg_boot_mapping, seg_boot_code, seg_boot_stack, and seg_boot_data), and the base addresses and sizes defined in the mapping must fit in these reserved 4 Mbytes.
  • All other processors initialize their private interrupt controller, to be able to receive an inter-processor interrupt (WTI), and enter the wait_state, in low-power mode.

This reset code is generic, and can be used to boot any operating system.

Phase 2 : Boot Initialisation

The GIET_VM boot-loader is defined in the boot.c and boot_entry.S files. Steps 0 and 1 are executed by processor P(0,0,0) only, but the other steps are executed in parallel by all processors P(x,y,0): one processor per cluster.

step 0

The boot_entry.S file defines the entry point in the GIET_VM bootloader. This assembly code is executed by all processors. It allocates a stack from the seg_boot_stack segment (defined by the SEG_BOOT_STACK_BASE and SEG_BOOT_STACK_SIZE parameters in the hard_config.h file), and initialises the CP0_SP (stack pointer) register.

  • The stack size for P(x,y,0) is 1.25 Kbytes (0x500 bytes).
  • The stack size for other processors is 0.25 Kbytes (0x100 bytes).

Therefore, SEG_BOOT_STACK_SIZE cannot be smaller than: (0x100 * (NB_PROCS_MAX-1) + 0x500) * X_SIZE * Y_SIZE
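
The lower bound on SEG_BOOT_STACK_SIZE can be checked with a small host-side helper. This is only a sketch: the function name min_boot_stack_size() and the two macro names are hypothetical, and only the constants 0x500 and 0x100 come from the text above.

```c
#include <assert.h>

#define BOOT_STACK_FIRST_PROC 0x500 /* 1.25 Kbytes for P(x,y,0) */
#define BOOT_STACK_OTHER_PROC 0x100 /* 0.25 Kbytes for each other processor */

/* Hypothetical helper: smallest legal SEG_BOOT_STACK_SIZE, in bytes,
 * for a mesh of x_size * y_size clusters with nb_procs_max processors
 * per cluster. */
static unsigned int min_boot_stack_size(unsigned int x_size,
                                        unsigned int y_size,
                                        unsigned int nb_procs_max)
{
    assert(nb_procs_max >= 1); /* a cluster with processors has at least P(x,y,0) */

    unsigned int per_cluster = BOOT_STACK_OTHER_PROC * (nb_procs_max - 1)
                             + BOOT_STACK_FIRST_PROC;
    return per_cluster * x_size * y_size;
}
```

For example, a 2 x 2 mesh with 4 processors per cluster needs at least (0x100 * 3 + 0x500) * 4 = 0x2000 bytes (8 Kbytes).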

step 1

Processor P(0,0,0) initializes the FAT, initializes the TTY0 lock, initializes the synchronization barrier, and loads the map.bin file into the physical memory bank in cluster(0,0), in segment seg_boot_mapping. Then processor P(0,0,0) uses inter-processor interrupts (WTI) to start the parallel execution, and activates one processor per cluster (processor P(x,y,0)) in all clusters containing processors.

step 2

In each cluster(x,y), processor P(x,y,0) initializes the physical memory allocators (function boot_pmem_init()). The GIET_VM uses two types of pages: BPP (Big Physical Page, 2 Mbytes), and SPP (Small Physical Page, 4 Kbytes). There is one SPP allocator and one BPP allocator per cluster containing a physical memory bank. All physical memory allocation is done by the boot-loader in the boot phase, and these memory allocators should not be used by the kernel in the execution phase.
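
The two page sizes can be related with a few constants and a rounding helper. This is a minimal sketch and not the boot_pmem_init() code: the macro names and the pages_needed() function are hypothetical, and only the sizes (4 Kbytes and 2 Mbytes) come from the text.

```c
#include <assert.h>
#include <stdint.h>

#define SPP_SIZE    0x1000u               /* Small Physical Page: 4 Kbytes */
#define BPP_SIZE    0x200000u             /* Big Physical Page: 2 Mbytes */
#define SPP_PER_BPP (BPP_SIZE / SPP_SIZE) /* small pages in one big page */

/* Hypothetical helper: number of pages of size page_size needed to
 * hold a segment of `bytes` bytes (rounded up to a whole page). */
static uint64_t pages_needed(uint64_t bytes, uint32_t page_size)
{
    assert(page_size == SPP_SIZE || page_size == BPP_SIZE);
    return (bytes + page_size - 1) / page_size;
}
```

With these values one big page holds 512 small pages, and a segment of 0x1001 bytes needs two small pages.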

step 3

In each cluster(x,y), processor P(x,y,0) initializes the local page tables (function boot_ptabs_init()) as specified in the mapping. There is one page table per user application (vspace) defined in the mapping, and it is replicated in all clusters containing processors. In each cluster, all page tables are packed in one segment (seg_ptab) occupying one single big page (2 Mbytes). Global vsegs are mapped in all vspaces. As the kernel read-only segments (seg_kcode and seg_kinit) are replicated in all clusters to avoid contention, the content of the page tables depends on the cluster coordinates: for the kernel code, a given virtual address is mapped to different physical addresses, depending on the cluster coordinates.
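
The effect of replicating the kernel code can be illustrated by computing, for the same offset inside seg_kcode, a different physical address in each cluster. The sketch below rests on assumptions that this page does not state: a physical address whose most significant bits encode the cluster coordinates (4 bits for x, 4 bits for y) above a 32-bit local address. The names X_WIDTH, Y_WIDTH, LOCAL_ADDR_BITS and replicated_paddr() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed address layout (not taken from this page). */
#define X_WIDTH         4  /* bits encoding the x coordinate */
#define Y_WIDTH         4  /* bits encoding the y coordinate */
#define LOCAL_ADDR_BITS 32 /* bits addressing memory inside one cluster */

/* Hypothetical helper: physical address of the local copy, in
 * cluster(x,y), of a replicated segment located at local_addr. */
static uint64_t replicated_paddr(unsigned int x, unsigned int y,
                                 uint32_t local_addr)
{
    assert(x < (1u << X_WIDTH) && y < (1u << Y_WIDTH));

    uint64_t cluster = ((uint64_t)x << Y_WIDTH) | y;
    return (cluster << LOCAL_ADDR_BITS) | local_addr;
}
```

Under these assumptions, the page tables of cluster(0,0) and cluster(1,2) would map the same kernel virtual address to 0x0000001000 and 0x1200001000 respectively, so each processor fetches kernel code from its own memory bank.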

step 4

In each cluster(x,y), processor P(x,y,0) initializes the schedulers of all processors in the cluster (function boot_schedulers_init()) as specified in the mapping:

  • There is one scheduler per processor.
  • The HWI, PTI, and WTI interrupt vectors are initialised.
  • The local IRQ routing (defined by the XCU masks) is statically defined.
  • Any task defined in any application can be allocated to any processor.
  • The allocation of tasks to processors is fully static (no task migration).
  • One single processor cannot schedule more than 14 tasks (including the idle_task).
  • One scheduler occupies 8 Kbytes, and contains the contexts of all tasks allocated to the processor (256 bytes per task).
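
The sizes given above can be cross-checked with simple arithmetic. The sketch below is not the GIET_VM scheduler definition: the macro and function names are hypothetical, and only the three constants (8 Kbytes, 256 bytes, 14 tasks) come from the list.

```c
#include <assert.h>

#define SCHED_SIZE    0x2000 /* 8 Kbytes per scheduler */
#define TASK_CTX_SIZE 0x100  /* 256 bytes per task context */
#define MAX_TASKS     14     /* per processor, including the idle_task */

/* Hypothetical helper: bytes occupied by the contexts of nb_tasks tasks. */
static unsigned int contexts_bytes(unsigned int nb_tasks)
{
    assert(nb_tasks <= MAX_TASKS);
    return nb_tasks * TASK_CTX_SIZE;
}

/* Hypothetical helper: bytes of the scheduler not used by task contexts. */
static unsigned int sched_spare_bytes(unsigned int nb_tasks)
{
    return SCHED_SIZE - contexts_bytes(nb_tasks);
}
```

With the maximum of 14 tasks, the contexts occupy 14 * 256 = 3584 bytes, which leaves 4608 bytes of the 8 Kbytes for the rest of the scheduler; this page does not detail how that remaining space is laid out.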

step 5

For each kernel.elf or application.elf file, processor P(0,0,0) loads the binary file into the dedicated boot_elf_buffer, and all processors P(x,y,0) copy it into the distributed physical memory banks, as specified in the mapping (function boot_elf_load()).

step 6

Finally, in each cluster(x,y), processor P(x,y,0) starts all other processors in the cluster, using an inter-processor interrupt (WTI). Each processor initializes its own CP0_SCHED register, its own CP2_MODE register to activate its MMU, and its own CP0_SR register to use the GIET_VM exception handler, then jumps to the kernel_init() function.

Phase 3 : Kernel Initialisation

This code is executed in parallel by all processors P(x,y,p). All processors enter the same kernel_init.c code and execute the following steps, separated by synchronization barriers. Step 0 is done by processor P(0,0,0) only; the other steps are done by all processors in parallel.

step 0

Processor P(0,0,0) performs the initializations that cannot be done in parallel: kernel_heap[x][y] array, MMC distributed locks, TTY0 sqt lock, kernel barrier, external IRQ controller, and external peripherals.

step 1

Each processor P(x,y,p) gets its scheduler virtual address from the CP0_SCHED register and contributes to the initialization of the _schedulers[x][y][p] array.

step 2

Each processor P(x,y,p) loops on all its allocated tasks to build the _ptabs_vaddr[vspace] and _ptabs_ptprs[vspace] arrays from the task contexts, and to complete the initialization of the task contexts (CTX_RA and CTX_EPC slots).

step 3

Each processor P(x,y,p) completes its private idle_task context, and starts its TICK timer if it has at least one task allocated. Processor P(0,0,0) initializes the kernel FAT.

step 4

Each processor P(x,y,p) computes the index of the first task to be executed (which can be the idle_task), sets the SP, SR, PTPR, and EPC registers to the values corresponding to this task, and jumps to the task entry point.

Last modified on Feb 17, 2016, 6:07:10 PM