= Boot procedure =

[[PageOutline]]

== A) __General Principles__ ==

The ALMOS-MKH boot procedure can be decomposed into two phases:
 * The architecture dependent phase, implemented by an architecture specific '''boot_loader''' procedure.
 * The architecture independent phase, implemented by a generic (architecture independent) '''kernel-init''' procedure.

As the generic (i.e. architecture independent) kernel initialization procedure is executed in parallel by all kernel instances in all clusters containing at least one core and one memory bank, the main task of the boot-loader is to load - in each cluster - a local copy of
the ALMOS-MKH kernel code, and a description of the hardware architecture, contained in a local ''boot_info_t'' data-structure.

This fixed size ''boot_info_t'' structure is built by the boot-loader, and stored at the beginning of the local copy of the kdata segment. As it contains both general and cluster specific information, the content depends on the cluster:
 * general hardware architecture features : number of clusters, topology, etc.
 * available external (shared) peripherals : types and features. 
 * number of cores in cluster.
 * available internal (private) peripherals in cluster : types and features.
 * available physical memory in cluster.

This boot_info_t structure is defined in the '''boot_info.h''' file.

To build the various boot_info_t structures (one per cluster), the boot-loader uses the '''arch_info_t''' binary structure, that is described in
section [wiki:arch_info Hardware Platform Definition]. This binary structure is contained in the '''arch_info.bin''' file, and must be stored
in the file system root directory. 

This method allows an intelligent boot_loader to check and - if required - reconfigure the hardware components, to guarantee that the generated boot_info_t structures contain only functionally tested hardware components.

We describe below the boot_loader for the TSAR architecture, the boot_loader for the I86 architecture, and the generic kernel initialization procedure.

== B) __Boot-loader for the TSAR architecture__ ==

The TSAR boot-loader relies on an OS-independent '''pre-loader''', stored in an addressable, non-volatile device, that loads the TSAR
'''boot-loader''' code from an external block-device into the cluster 0 physical memory. This pre-loader is specific to the TSAR architecture, but independent of the operating system. It is used by ALMOS-MKH, but also by LINUX, NetBSD, or the GIET-VM.

The TSAR boot_loader allocates - in each cluster containing a physical memory bank - six fixed size memory zones, to store various
binary files or data structures. The first two zones are permanently allocated: the '''PRE_LOADER''' zone is only defined in cluster 0, and contains the pre-loader code. The '''KERNEL_CODE''' zone, containing the ''kcode'' and ''kdata'' segments, is directly used by the kernel when the boot_loader transfers control - in each cluster - to the kernel_init procedure. The '''BOOT_CODE''', '''ARCH_INFO''', '''KERNEL_ELF''', and '''BOOT_STACK''' zones are temporary: they are only used - in each cluster - by the boot-loader code, and the corresponding physical memory can be freely re-allocated by the local kernel instance when it starts execution.

 || name        || description                 || base address (physical)  || size                                 ||
 || PRE_LOADER  || pre-loader code             || PRELOADER_BASE (0)       || PRELOADER_MAX_SIZE (16 KB)           ||
 || KERNEL_CODE || kernel code and data        || KERNEL_CODE_BASE (16 KB) || KERNEL_CODE_MAX_SIZE (2 MB - 16 KB)  ||
 || BOOT_CODE   || boot-loader code and data   || BOOT_CODE_BASE (2 MB)    || BOOT_CODE_MAX_SIZE (1 MB)            ||
 || ARCH_INFO   || arch_info.bin file copy     || ARCH_INFO_BASE (3 MB)    || ARCH_INFO_MAX_SIZE (1 MB)            ||
 || KERNEL_ELF  || kernel.elf file copy        || KERNEL_ELF_BASE (4 MB)   || KERN_ELF_MAX_SIZE (2 MB)             ||
 || BOOT_STACK  || boot stacks (one per core)  || BOOT_STACK_BASE (6 MB)   || BOOT_STACK_MAX_SIZE (1 MB)           ||

The values given in this table are indicative. The actual values are defined by configuration parameters in the ''boot_config.h'' file.
The two main constraints are the following:
   * the ''kcode'' and ''kdata'' segments (in the KERNEL_CODE zone) must be entirely contained in one single big physical page (2 Mbytes), because it will be mapped as one single big page in all process virtual spaces.
   * the BOOT_CODE zone (containing the boot loader instructions and data) must be entirely contained in the
next big physical page, because it will be mapped in the boot-loader page table to allow the cores to access locally the boot code as soon as it has been copied in the local cluster.
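
The two constraints can be checked mechanically. The sketch below uses the indicative values from the table above; the `zone_in_big_page()` helper and the `BIG_PAGE_SIZE` name are hypothetical and are not taken from ''boot_config.h''.

```c
#include <assert.h>
#include <stdint.h>

#define KB 1024u
#define MB (1024u * KB)
#define BIG_PAGE_SIZE (2u * MB)   /* hypothetical name for the 2 Mbytes big page */

/* Indicative values copied from the zone table; the real ones are in boot_config.h. */
#define KERNEL_CODE_BASE      (16u * KB)
#define KERNEL_CODE_MAX_SIZE  (2u * MB - 16u * KB)
#define BOOT_CODE_BASE        (2u * MB)
#define BOOT_CODE_MAX_SIZE    (1u * MB)

/* Returns 1 if the zone [base, base + size) lies entirely inside
 * big page number 'page', and 0 otherwise. */
static int zone_in_big_page(uint32_t base, uint32_t size, uint32_t page)
{
    uint32_t first = page * BIG_PAGE_SIZE;   /* first byte of the big page */

    if (size == 0 || size > BIG_PAGE_SIZE || base < first)
        return 0;
    return (base - first) <= (BIG_PAGE_SIZE - size);
}
```

With these values, `zone_in_big_page(KERNEL_CODE_BASE, KERNEL_CODE_MAX_SIZE, 0)` and `zone_in_big_page(BOOT_CODE_BASE, BOOT_CODE_MAX_SIZE, 1)` both hold, which is what the two constraints require.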

A core is identified by two indexes: '''cxy''' is the cluster identifier, and '''lid''' is the core local index in the cluster.
The CP0 register containing the core gid (global hardware identifier) has a fixed format:   '''gid''' = ('''cxy''' << 2) + '''lid'''
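
The fixed '''gid''' format can be sketched in C as follows. Only the formula comes from the text above; the helper names and the `LID_WIDTH` constant are hypothetical. Note that a 2-bit local index implies at most four cores per cluster.

```c
#include <assert.h>
#include <stdint.h>

/* Number of low-order gid bits holding lid, from gid = (cxy << 2) + lid. */
#define LID_WIDTH 2u
#define LID_MASK  ((1u << LID_WIDTH) - 1u)

/* Build the global hardware identifier from the (cxy, lid) pair. */
static inline uint32_t gid_make(uint32_t cxy, uint32_t lid)
{
    assert(lid <= LID_MASK);   /* lid must fit in the two low-order bits */
    return (cxy << LID_WIDTH) + lid;
}

/* Extract the cluster identifier from a gid. */
static inline uint32_t gid_to_cxy(uint32_t gid)
{
    return gid >> LID_WIDTH;
}

/* Extract the core local index from a gid. */
static inline uint32_t gid_to_lid(uint32_t gid)
{
    return gid & LID_MASK;
}
```

For instance, core 1 in cluster 3 has gid 13, and decoding 13 gives back the pair (3, 1).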

All cores contribute to the boot procedure, but not all cores are active simultaneously:
 * In the first phase - fully sequential - only core[0][0] is running (core 0 in cluster 0).
 * In the second phase - partially parallel - only core[cxy][0] is running in each cluster.
 * In the last phase - fully parallel - all core[cxy][lid] are running.

We describe below the four phases of the TSAR boot-loader:

=== B1. Pre-loader phase ===

 * In the TSAR_LETI architecture, the pre-loader code is stored in the first 16 kbytes of the physical address space in cluster 0.
 * In the TSAR_IOB architecture, the pre-loader is stored in an external ROM, which is accessed through the IO_bridge located in cluster 0.

At reset, the MMU is de-activated (for both data and instructions), and the extension address registers supporting direct access to remote memory banks (for data only)  contain the 0 value.
Therefore, all cores can only access the physical address space of cluster 0.

All cores execute the same pre-loader code, but the work done depends on the core identifier:
   * The core[0][0] loads the boot-loader code stored on disk into the BOOT_CODE zone of cluster 0.
   * All other cores do only one task before going to sleep (i.e. low-power state): each core activates its private WTI channel in the local ICU (Interrupt Controller Unit) to be later activated by an IPI (Inter Processor Interrupt).  

=== B2. Boot-loader sequential phase ===

In this phase, only core[0][0] is running, while all other cores are blocked in the pre-loader, waiting to be activated by an IPI.

The first instructions of the boot-loader are defined in the ''boot_entry.S'' file. This assembly code is executed by all cores entering the boot-loader, but not at the same time.

Each core running this assembly code performs the following three actions:
   * It initializes the core stack pointer depending on the '''lid''' value extracted from the '''gid''', using the BOOT_STACK_BASE and BOOT_STACK_SIZE parameters defined in the ''boot_config.h'' file.
   * It changes the value of the DATA address extension CP2 register, using the '''cxy''' value extracted from the '''gid''', to force all cores to use the local stack segments.
   * It jumps to the boot_loader() C function defined in the ''boot.c'' file, passing the two (cxy , lid) arguments.
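
The stack pointer computation of the first action can be illustrated in C, although the real code is written in assembly in ''boot_entry.S''. This is only a sketch under two assumptions that the text does not state: the per-core stacks are contiguous slices of the BOOT_STACK zone, and the stack grows downward, so each core starts at the top of its slice. The `BOOT_STACK_SIZE` value used here is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Indicative values; the real ones are defined in boot_config.h. */
#define BOOT_STACK_BASE 0x600000u   /* 6 MB, from the zone table */
#define BOOT_STACK_SIZE 0x4000u     /* hypothetical per-core stack size (16 KB) */

/* Initial stack pointer for core 'lid' in its cluster.
 * Assumes a downward-growing stack: the pointer starts at the
 * highest address of the slice reserved for this core. */
static uint32_t boot_stack_top(uint32_t lid)
{
    assert(lid < 4u);   /* the gid format leaves two bits for lid */
    return BOOT_STACK_BASE + (lid + 1u) * BOOT_STACK_SIZE;
}
```

Because the DATA address extension register is then set to the local '''cxy''', the same address designates a different physical stack in each cluster.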

In this sequential phase, the core[0][0] executing this C function performs the following actions:
   * The core[0][0] initializes 2 peripherals: the '''TTY''' terminal (channel 0) to display log messages, and the '''IOC''' peripheral to access the disk file system.
   * The core[0][0] initializes the boot-loader FAT32, allowing the boot loader to access files stored in the FAT32 file system on disk.
   * The core[0][0] loads the ''kernel.elf'' file from the disk file system into the KERNEL_ELF zone.
   * Then it copies the ''kcode'' and ''kdata'' segments into the KERNEL_CODE zone, using the addresses contained in the .elf file (identity mapping).
   * The core[0][0] loads the ''arch_info.bin'' file from the disk file system into the ARCH_INFO zone.
   * Then it builds from this ''arch_info_t'' structure the specific ''boot_info_t'' structure for cluster 0, and stores it in the ''kdata'' segment.
   * The core[0][0] sends IPIs to activate all cores [cxy][0] in all other clusters.
    
=== B3. Boot-loader partially parallel phase ===

In this phase, every core[cxy][0] other than core[0][0] is running.

At this point, all DATA extension registers already point to the local cluster (to use the local stack).

Each core[cxy][0] executes the following tasks:
    * To access the global data stored in cluster cxy, the core[cxy][0] copies the boot-loader code from the BOOT_CODE zone in cluster 0 to the BOOT_CODE zone in cluster cxy.
    * To access the instructions stored in cluster cxy, the core[cxy][0] creates a minimal page table containing two big pages mapping respectively the local BOOT_CODE zone and the local KERNEL_CODE zone, and activates the instruction MMU. '''[TO BE DONE]'''
    * The core[cxy][0] copies the ''arch_info.bin'' structure from the ARCH_INFO zone in cluster 0 to the ARCH_INFO zone in cluster cxy.
    * The core[cxy][0] copies the ''kcode'' and ''kdata'' segments from the KERNEL_CODE zone in cluster 0 to the KERNEL_CODE zone in cluster cxy.
    * The core[cxy][0] builds from the ''arch_info_t'' structure the specific ''boot_info_t'' structure for cluster cxy, and stores it in the local ''kdata'' segment.
    * All core[cxy][0], including core[0][0], synchronize using a global barrier.
    * In each cluster cxy, the core[cxy][0] activates the other cores that are blocked in the pre-loader.

=== B4. Boot-loader fully parallel phase ===

In this phase all core[cxy][lid] are running.

Each core must initialize a few registers, as described below, and jump to the kernel_entry address. This address is defined in the ''kernel.elf'' file, and registered in the ''kernel_entry'' global variable.

   * '''argument''' : the unique argument of the kernel_init() function is a pointer to the ''boot_info_t'' structure, which is the first variable in the ''kdata'' segment.
   * '''stack pointer''' : In each cluster an array of idle thread descriptors, indexed by the local core index, is defined in the ''kdata'' segment, on top of the ''boot_info_t'' structure. For any thread, the thread descriptor contains the kernel stack, and this is used to initialize the stack pointer.
   * '''base register''' : in each core, the cp0_ebase register defines the kernel entry point in case of interrupt, exception, or syscall, and must be initialized. '''[TO BE DONE in kernel_init()]'''
   * '''status register''' : in each core, the cp0_sr register defines the core state, and must be initialized (UM bit reset / IE bit reset / BEV bit reset).
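
The status register setting listed above can be expressed as a bit mask. The sketch below assumes the standard MIPS32 layout of the CP0 Status register (IE in bit 0, UM in bit 4, BEV in bit 22); the macro and function names are hypothetical, and the actual write to cp0_sr, which requires a privileged instruction, is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in the MIPS32 CP0 Status register (assumed layout). */
#define SR_IE   (1u << 0)    /* interrupt enable */
#define SR_UM   (1u << 4)    /* user mode */
#define SR_BEV  (1u << 22)   /* bootstrap exception vectors */

/* Value to write back to cp0_sr: 'sr' with the UM, IE and BEV bits
 * cleared, i.e. kernel mode, interrupts masked, normal exception vectors. */
static uint32_t sr_boot_value(uint32_t sr)
{
    return sr & ~(SR_UM | SR_IE | SR_BEV);
}
```

All other bits of the register are left unchanged by this helper.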

At this point, the boot-loader has completed its job:
 * The kernel code ''kcode'' and ''kdata'' segments are loaded - in all clusters - in the first ''offset'' physical pages.
 * The hardware architecture described by the '''arch_info.bin''' file has been analyzed, and copied - in each cluster - in the '''boot_info_t''' structure, stored in the kdata segment.
 * Each  local kernel instance can use all the physical memory that is not used to store the kernel ''kcode'' and ''kdata'' segments themselves.

== C) __Boot-loader for the I86 architecture__ ==

TODO

== D) __Generic kernel initialization procedure__ ==

The kernel_init( boot_info_t * info ) function is the kernel entry point when the boot_loader transfers control to the kernel. The argument is a pointer to the fixed size boot_info_t structure, which is stored in the kernel data segment.

All cores execute this procedure in parallel, but some tasks are only executed by the CP0 core, i.e. core[cxy][0] in each cluster.
This procedure uses two synchronisation barriers, defined as global variables in the data segment:
 * the global_barrier variable is used to synchronize all CP0 cores in all clusters containing a kernel instance. 
 * the local_barrier variable is used to synchronize all cores in a given cluster.
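
To illustrate how such a barrier variable can work, here is a generic sense-reversing barrier written with C11 atomics. It is a sketch only, not the ALMOS-MKH implementation: the `barrier_t` type, its fields, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
    atomic_uint count;   /* cores that have arrived in the current round */
    atomic_uint sense;   /* flips each time the barrier opens */
} barrier_t;

static void barrier_init(barrier_t *b)
{
    atomic_init(&b->count, 0);
    atomic_init(&b->sense, 0);
}

/* Block until 'expected' cores have called barrier_wait() on 'b'. */
static void barrier_wait(barrier_t *b, unsigned expected)
{
    /* Remember the sense of the current round before arriving. */
    unsigned my_sense = atomic_load(&b->sense);

    if (atomic_fetch_add(&b->count, 1) + 1 == expected) {
        /* Last arrival: reset the counter first, then release the waiters. */
        atomic_store(&b->count, 0);
        atomic_store(&b->sense, my_sense ^ 1u);
    } else {
        /* Spin until the last arrival flips the sense. */
        while (atomic_load(&b->sense) == my_sense) {
            /* busy wait */
        }
    }
}
```

Resetting the counter before flipping the sense is what makes the same variable reusable for the successive barriers of the initialization procedure.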

The kernel initialization procedure executes the following steps sequentially:

=== D1) Core and cluster identification ===

Each core has a unique hardware identifier, called '''gid''', that is hard-wired in a read-only register.
From the kernel point of view a core is identified by a composite index (cxy,lid), where '''cxy''' is the cluster identifier, and '''lid''' is a local (continuous) index in the cluster. The association between the gid hardware index and the (cxy,lid) composite index is defined in the boot_info_t structure.
In this first step, each core makes an associative search in the boot_info_t structure to obtain the ('''cxy,lid''') indexes from the '''gid''' index.
Then the CP0 initializes the global variable '''local_cxy''' defining the local cluster identifier.
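
The associative search can be pictured as a linear scan over a table of core entries. The types below are hypothetical stand-ins: the real layout of boot_info_t is defined in ''boot_info.h'' and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CORES 64   /* hypothetical bound on the number of table entries */

/* Hypothetical per-core entry associating a gid with its (cxy, lid) pair. */
typedef struct {
    uint32_t gid;   /* hardware identifier */
    uint32_t cxy;   /* cluster identifier */
    uint32_t lid;   /* local index in cluster */
} core_entry_t;

/* Hypothetical, reduced stand-in for the boot_info_t structure. */
typedef struct {
    uint32_t     cores_nr;          /* number of valid entries */
    core_entry_t core[MAX_CORES];
} boot_info_sketch_t;

/* Linear search for 'gid'. Returns 0 and fills (*cxy, *lid) on success,
 * or -1 if the gid is not registered. */
static int core_lookup(const boot_info_sketch_t *info, uint32_t gid,
                       uint32_t *cxy, uint32_t *lid)
{
    assert(info->cores_nr <= MAX_CORES);
    for (uint32_t i = 0; i < info->cores_nr; i++) {
        if (info->core[i].gid == gid) {
            *cxy = info->core[i].cxy;
            *lid = info->core[i].lid;
            return 0;
        }
    }
    return -1;
}
```

Going through a table rather than decoding the gid arithmetically keeps this generic step independent of the gid format of a particular architecture.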

=== D2) TXT0 device initialization ===

The core[io_cxy][0] (i.e. CP0 in I/O cluster) initializes the chdev descriptor associated to the kernel text terminal TXT0. This terminal is used by any kernel instance running on any core to display log or debug messages. This terminal is configured in ''non-descheduling'' mode:
the calling thread directly calls the relevant TXT driver, without using a server thread.

A first synchronization barrier is used to prevent other cores from using the TXT0 terminal before its initialization is complete.

=== D3) Cluster manager initialization ===

In each cluster, the CP0 initializes the cluster manager, namely the core descriptors array and the physical memory allocators.
Then it initializes the local process_zero, containing all kernel threads in a given cluster.

A second synchronization barrier is used to prevent other cores from accessing the cluster manager before its initialization is complete.

=== D4) Internal & external devices initialization ===

In each cluster, the CP0 initializes the devices. For multi-channel devices, there is one channel device descriptor (called chdev_t) per channel.
For internal (replicated) devices, the chdev descriptors are allocated in the local cluster. For external (shared) devices, the chdev descriptors are evenly distributed over all clusters. These external chdev are indexed by a global index, and the host cluster is computed from this
index by a modulo.
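
The modulo placement of external chdev descriptors can be sketched as follows. The function name is hypothetical, and it returns a cluster index in the range 0 to clusters_nr - 1, which would still have to be translated into a '''cxy''' identifier; how that translation is done is not specified here.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the cluster hosting the external chdev with global index
 * 'chdev_index', for a platform containing 'clusters_nr' clusters. */
static uint32_t chdev_host_cluster(uint32_t chdev_index, uint32_t clusters_nr)
{
    assert(clusters_nr > 0);
    return chdev_index % clusters_nr;
}
```

With four clusters, external chdev descriptors 0 to 7 are hosted by clusters 0, 1, 2, 3, 0, 1, 2, 3, so each cluster hosts the same number of descriptors.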

The internal device descriptors are created first (ICU, then MMC, then DMA), because the ICU device is used by all other devices.
Then the WTI mailboxes used for IPIs (Inter Processor Interrupt) are allocated in local ICU : one WTI mailbox per core.
Then each external chdev descriptor is created by the CP0 in the cluster where it must be created.

A third synchronization barrier is used to prevent other cores from accessing the devices before their initialization is complete.

=== D5) Idle thread initialization ===

In this step, each core creates and initializes its private idle thread descriptor.

=== D6) File system initialization ===

The core[io_cxy][0] (i.e. CP0 in I/O cluster) initializes the file system.

A fourth synchronization barrier is used to prevent other cores from accessing the file system before its initialization is complete.

=== D7) Scheduler activation ===

Finally, each core enables its private timer IRQ to activate its private scheduler, and jumps to the idle thread code.