Changes between Initial Version and Version 1 of ALMPhDAbsEN


Timestamp:
May 9, 2014, 2:14:41 PM
Author:
almaless

Nowadays, single-chip cache-coherent many-core processors with up to 100 cores are a reality. Many-core processors with hundreds or even a thousand cores are planned for the near future. In these architectures, the locality of the traffic caused by L1 cache misses (data, instruction and TLB) is essential for both scalability and power consumption (energy per moved bit). Our thesis is that: (i) handling the locality of memory accesses should be done at the kernel level of an operating system, transparently to user applications; and (ii) current monolithic kernels are unable to enforce the locality of the memory accesses of multi-threaded applications, because the concept of thread in these kernels is inherently unsuitable for many-core processors. We therefore believe that the evolutionary approach taken with monolithic kernels until now is insufficient, and that it is imperative to put the question of the locality of memory accesses at the heart of this evolution.
     14
To prove our thesis, we designed and implemented ALMOS (Advanced Locality Management Operating System), an experimental operating system based on a distributed monolithic kernel. ALMOS introduces a new concept of thread, called the Hybrid Process, which allows its kernel to enforce the locality of the memory accesses of each thread. Resource management (cores and physical memory) in the ALMOS kernel is distributed, enforcing the locality of memory accesses when performing system services. Decision making for memory allocation, task placement and load balancing in the ALMOS kernel is decentralized, multi-criteria and lock-free. It is based on a distributed infrastructure that coordinates accesses to resources in a scalable manner.
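The general shape of such a decentralized, multi-criteria placement decision can be sketched as follows. This is only an illustration, not ALMOS code: the `Cluster` fields, the scoring weights and the locality bonus are all hypothetical, standing in for whatever criteria the kernel actually combines. The key property shown is that a core decides by reading per-cluster counters only, without taking any global lock.

```python
# Illustrative sketch (not ALMOS code): a decentralized, multi-criteria
# placement decision. Each cluster publishes its own counters; a requesting
# core scores every cluster from those counters alone (no global lock) and
# prefers its local cluster to preserve locality. Weights are hypothetical.

from dataclasses import dataclass

@dataclass
class Cluster:
    cid: int            # cluster id
    free_pages: int     # per-cluster free-memory counter
    runnable: int       # per-cluster load counter

def placement_score(c: Cluster, local_cid: int) -> float:
    """Higher is better: reward free memory, penalize load, and add a
    locality bonus when the cluster is the requester's own."""
    score = c.free_pages - 4 * c.runnable
    if c.cid == local_cid:
        score += 64     # locality bonus (hypothetical weight)
    return score

def choose_cluster(clusters: list[Cluster], local_cid: int) -> int:
    """Pick a target cluster for a new task or allocation by scoring
    each cluster's published counters."""
    return max(clusters, key=lambda c: placement_score(c, local_cid)).cid
```

With these weights, a core stays on its own cluster when resources are comparable, but spills to a remote cluster once the local one is exhausted or overloaded, which is the trade-off any locality-aware placement policy must arbitrate.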
     25
Using the cycle-accurate and bit-accurate virtual prototype of the TSAR many-core processor, we experimentally demonstrated that: (i) the performance (scalability and execution time) on 256 cores of the distributed scheduling scheme of the ALMOS kernel outperforms that of the shared scheduling scheme found in existing monolithic kernels; (ii) the distributed realization of the fork system call enables this system service to scale to 512 cores; (iii) updating the distributed decision-making infrastructure of the ALMOS kernel costs just 0.05 % of the total computing power of the TSAR processor; (iv) the performance (scalability, execution time and remote traffic) of the memory-affinity strategy of the ALMOS kernel, called Auto-Next-Touch, outperforms that of the two existing strategies First-Touch and Interleave on 64 cores; (v) the Hybrid Process concept of the ALMOS kernel scales two existing highly multi-threaded applications to 256 cores and a third one to 1024 cores; and finally (vi) the ALMOS/TSAR pair (64 cores) systematically gives much better scalability than the Linux/AMD pair (Interlagos, 64 cores) for 8 multi-threaded applications from the HPC and image-processing domains.
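The difference between the memory-affinity strategies compared in (iv) can be illustrated with a toy model. This sketch is not the thesis's implementation: it only counts, for a made-up access trace, how many accesses land on a remote cluster under first-touch placement, round-robin interleaving, and a next-touch policy that migrates a page to the cluster that touches it.

```python
# Toy model (not the thesis's implementation) of three page-placement
# policies. A "page" lives on a cluster; an access is remote when the
# accessing core's cluster differs from the page's current home cluster.
# A trace is a list of (cluster, page) access pairs.

def first_touch(trace):
    """Each page stays on the cluster of the first core that touched it."""
    home, remote = {}, 0
    for cluster, page in trace:
        if home.setdefault(page, cluster) != cluster:
            remote += 1
    return remote

def interleave(trace, n_clusters):
    """Pages are spread round-robin over clusters, ignoring usage."""
    return sum(1 for cluster, page in trace if page % n_clusters != cluster)

def next_touch(trace):
    """A touched page migrates to the toucher's cluster, making the
    following accesses from that cluster local."""
    home, remote = {}, 0
    for cluster, page in trace:
        if home.get(page, cluster) != cluster:
            remote += 1
        home[page] = cluster    # migrate on touch
    return remote
```

For a page first touched on cluster 0 and then used repeatedly from cluster 1, first-touch and interleaving pay a remote access on every later touch, while next-touch pays one migration and then serves all remaining accesses locally; this is the kind of remote-traffic reduction that a next-touch-style strategy targets.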