Notes on MH RSS Feed

Introducing NUX, a kernel framework

History and motivation

Circa 2018, I decided that the Murgia Hack System needed a fresh start to support newer architectures.

MH's kernel is quite clean and simple, but suffers from ageing low-level support. Incredibly, some of that i386 code can be traced back to my early experiments (in 1999!) and to code that I wrote for my first SMP machine – a dual Pentium III bought in Akihabara in the early 2000s!

Unfortunately, emotional attachment to code doesn't create great engineering, and I had to start from scratch.

The driving principle behind this effort – that later became NUX – was to rationalise my kernel development.

At its core, a kernel is an executable running in privileged mode. It's special because it handles exceptions, IRQs and syscalls (essentially events), so it can be seen as an event-based program. And because it runs on multiple CPUs concurrently, we can even draw similarities with multi-threading.

The very annoying and often project-specific part of a kernel is the bootstrap. A kernel usually starts in a mode that is either very limited (think x86 legacy boot) or very different in terms of runtime (think EFI).

A kernel is thus required to set up its own data structures (and virtual memory), and then jump into them (through magic pieces of assembler called trampolines).

In a nutshell, NUX can be seen as an attempt to solve all the abovementioned problems that differentiate a kernel from a normal executable.

Solving the bootstrapping problem.

To solve the setup of the kernel executable's data structures, NUX introduces APXH, an ELF bootloader. APXH (the upper case of αρχη, Greek for beginning) is a portable bootloader whose goal is to load an ELF executable, create the page tables based on the ELF's program headers, and jump to the entry point. It attempts to be the closest thing to an exec() you can possibly have at boot.
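As a rough illustration of the "page tables from program headers" idea, here is a minimal sketch in C. It is not APXH's code: the `phdr64` struct only mirrors the standard ELF64 program header layout, and `map_load_segments`, `map_page_fn` and the `SK_` constants are hypothetical names invented for this example. A 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096u /* assumed page size */
#define SK_PT_LOAD   1u    /* ELF p_type value for a loadable segment */
#define SK_PF_X      1u    /* ELF p_flags: executable */
#define SK_PF_W      2u    /* ELF p_flags: writable */
#define SK_PF_R      4u    /* ELF p_flags: readable */

/* Mirrors the field order of the ELF64 program header. */
struct phdr64 {
    uint32_t p_type;
    uint32_t p_flags;
    uint64_t p_offset;
    uint64_t p_vaddr;
    uint64_t p_paddr;
    uint64_t p_filesz;
    uint64_t p_memsz;
    uint64_t p_align;
};

/* Hypothetical callback: create a mapping for one virtual page. */
typedef void (*map_page_fn)(uint64_t va, uint32_t flags, void *opaque);

/*
 * Walk the program headers and request a mapping for every page covered
 * by a loadable segment. Returns the number of pages requested.
 * 'map' may be NULL, in which case pages are only counted.
 * Toy code: no overflow or overlap checks.
 */
size_t map_load_segments(const struct phdr64 *ph, size_t n,
                         map_page_fn map, void *opaque)
{
    const uint64_t mask = ~(uint64_t)(SK_PAGE_SIZE - 1);
    size_t pages = 0;

    assert(ph != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != SK_PT_LOAD || ph[i].p_memsz == 0)
            continue;
        /* Round the segment out to whole pages. */
        uint64_t start = ph[i].p_vaddr & mask;
        uint64_t end = (ph[i].p_vaddr + ph[i].p_memsz + SK_PAGE_SIZE - 1) & mask;
        for (uint64_t va = start; va < end; va += SK_PAGE_SIZE) {
            if (map != NULL)
                map(va, ph[i].p_flags, opaque);
            pages++;
        }
    }
    return pages;
}
```

A real loader would also copy `p_filesz` bytes from the file and zero the remainder up to `p_memsz`; the sketch only covers the mapping walk.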

APXH also supports special program header entries (such as Frame Buffer, 1:1 Physical Map, and Boot Information page) that allow the kernel to immediately use system features discoverable at boot, further reducing low-level initialisation.

APXH is extremely portable: it currently works on i386, AMD64 and RISCV64, and supports booting from multiple environments, currently EFI, GRUB's multiboot and OpenSBI.

Creating an embedded executable: the need for a small libc.

In order to create an executable in C, you'll have to link against a C Runtime (crt) and a C Library.

This is why NUX introduces libec, an embedded quasi-standard libc.

libec is based on the NetBSD libc, guaranteeing extreme portability and simplicity. It is meant to be used as a small, embedded libc.

Every binary built by NUX – whether APXH, a NUX kernel, or the example kernel's userspace program – is compiled against libec.

A kernel as a C executable.

As for any C program, the kernel will have to define a main function, which is called after the C runtime has initialised. libec is complex enough to support constructors, so that you can define initialisation functions that run before main.

A special function of NUX, which diverges from normal C programs, is main_ap. This is a main function that is called on secondary processors, that is, processors other than the bootstrapping CPU.
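To make the constructor idea concrete, here is a small sketch using the GCC/Clang `__attribute__((constructor))` extension. That NUX kernels declare constructors this way is an assumption; the subsystem names and the `subsys_register` helper are hypothetical and only serve the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBSYS 8

static const char *subsys_names[MAX_SUBSYS];
static size_t subsys_count;

/* Hypothetical registry: records a subsystem that initialised itself. */
void subsys_register(const char *name)
{
    /* Registering more than MAX_SUBSYS entries is a programming error. */
    assert(subsys_count < MAX_SUBSYS);
    subsys_names[subsys_count++] = name;
}

/* Both functions below run before main(); their relative order is
 * unspecified unless constructor priorities are given. */
__attribute__((constructor))
static void timer_subsys_init(void)
{
    subsys_register("timer");
}

__attribute__((constructor))
static void console_subsys_init(void)
{
    subsys_register("console");
}

/* By the time main() runs, every constructor has already registered. */
size_t subsys_registered(void)
{
    return subsys_count;
}
```

The appeal of this pattern is that each subsystem lives in its own file and announces itself, so main does not need a hand-maintained list of init calls.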

Kernel entries as events.

As mentioned above, a kernel has to deal with requests from userspace and hardware events. In NUX, this is done by defining entry functions for these events.

The whole state of the running kernel can be defined by the actions of these entry functions.

A kernel entry has a uctxt passed as a parameter and returns a uctxt. uctxt is a User Context, the state of the userspace program. The kernel can modify the User Context passed as an argument and return the same one, or can return a completely new one.

The former is how system calls return a value, the latter is how you implement threads and process switches.
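The two cases can be sketched in a few lines of C. Everything here is illustrative: the real uctxt is architecture-specific, and the `struct uctxt` fields, the two-slot thread table and the `entry_*_sketch` names are assumptions made up for this example, not the NUX interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user context: just enough state for the example. */
struct uctxt {
    uint64_t pc;     /* user program counter */
    uint64_t sp;     /* user stack pointer */
    uint64_t retval; /* register used to return a syscall result */
};

#define NTHREADS 2
static struct uctxt threads[NTHREADS]; /* saved contexts, one per thread */
static size_t current;                 /* index of the running thread */

/* Syscall-style entry: modify the context passed in and return it,
 * so the same program resumes and sees the return value. */
struct uctxt *entry_sysc_sketch(struct uctxt *u, uint64_t a, uint64_t b)
{
    assert(u != NULL);
    u->retval = a + b; /* toy "add" system call */
    return u;
}

/* Timer-style entry: save the interrupted context and return a
 * different one, which is how a thread switch happens. */
struct uctxt *entry_irq_sketch(struct uctxt *u)
{
    assert(u != NULL);
    threads[current] = *u;
    current = (current + 1) % NTHREADS; /* round-robin */
    return &threads[current];
}
```

Because the return value is the context to resume, the scheduler needs no separate "switch" primitive: choosing which uctxt to return is the switch.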

The NUX library interface

Finally, NUX provides three libraries:

  1. libnux: a machine-independent library that provides the higher-level functionalities you need to develop a fully functional OS kernel. The 'libnux' interface is here.
  2. libhal: This is a machine-dependent layer. Exports a common interface to handle low-level CPU functionalities. The HAL interface is here.
  3. libplt: This is a machine-dependent layer. Exports a common interface to handle low-level Platform functionalities, such as device discovery, interrupt controller configuration and timer handling. The Platform Driver interface is here.

The separation between hal and plt is possibly a unique choice of NUX and, like many other design choices of NUX, allows for gradual and quick porting to new architectures.

For example, when the AMD64 support was added, the ACPI platform library needed no changes, as the CPU mode was different but the platform was exactly the same.

Similarly, upcoming ACPI support for RISC-V consists mostly of expanding the ACPI libplt to support RISC-V specific tables and the different interrupt controllers.

A useful tool for kernel prototyping.

NUX's goal is to remove the burden of bootstrapping a kernel, and to be portable.

The hope is that NUX will be useful to others the same way it has been useful to me: experimenting with kernel and OS architectures, while skipping the hard part of low level initialisation and handling.

Full Article and Comments

Overview of a MH System.

In a system running MH, each process has its own local bus. The kernel maintains a global list of devices.
If it has enough permissions, a process can add a device to its own local bus and use it.

There are three broad kinds of devices:

  • User devices. These are devices created by processes, and they are the only inter-process communication mechanism present in the system.
  • Kernel Devices. These are special devices created by the kernel, used to provide kernel services to a process – timers, memory allocation, etc.
  • Hardware Devices. This is a mapping from real hardware to I/O devices. A special hardware device is the platform device, which gives access to the whole machine to a process.

The Process-Device interface

A process and a device communicate using three mechanisms:

  • IO ports. Similar to the Intel I/O space, a process can write an inline value to, or read one from, a specified port of the device. The meaning assigned to these actions is device specific.
  • DMA. A device can read or write directly to a process's address space. An I/O MMU mechanism is present so a device can only access memory explicitly exported by the process.
  • IRQs. A device can send interrupts to a process.
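The three mechanisms can be modelled with a toy device in C. This is a userspace simulation written for illustration, not MH's API: `struct toy_dev` and the `dev_*` / `proc_export` functions are hypothetical names, and the export window stands in for the I/O MMU check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NPORTS 4

/* Toy model of a device attached to a process's local bus. */
struct toy_dev {
    uint64_t ports[NPORTS]; /* IO ports: inline values */
    uint8_t *export_base;   /* memory exported by the process, or NULL */
    size_t export_len;
    uint32_t pending_irqs;  /* bitmask of interrupts raised to the process */
};

/* IO ports: the process writes an inline value to a port. */
int dev_out(struct toy_dev *d, unsigned port, uint64_t val)
{
    assert(d != NULL);
    if (port >= NPORTS)
        return -1;
    d->ports[port] = val;
    return 0;
}

/* IO ports: the process reads an inline value from a port. */
int dev_in(const struct toy_dev *d, unsigned port, uint64_t *val)
{
    assert(d != NULL && val != NULL);
    if (port >= NPORTS)
        return -1;
    *val = d->ports[port];
    return 0;
}

/* The process explicitly exports a memory region to the device. */
void proc_export(struct toy_dev *d, void *base, size_t len)
{
    assert(d != NULL);
    d->export_base = base;
    d->export_len = len;
}

/* DMA: the device writes into process memory, but only inside the
 * exported window (the role played by the I/O MMU). */
int dev_dma_write(struct toy_dev *d, size_t off, const void *src, size_t len)
{
    assert(d != NULL);
    if (d->export_base == NULL || off > d->export_len ||
        len > d->export_len - off)
        return -1;
    memcpy(d->export_base + off, src, len);
    return 0;
}

/* IRQs: the device raises an interrupt towards the process. */
int dev_raise_irq(struct toy_dev *d, unsigned irq)
{
    assert(d != NULL);
    if (irq >= 32)
        return -1;
    d->pending_irqs |= (uint32_t)1 << irq;
    return 0;
}
```

A typical exchange would be: the process exports a buffer, writes a command to a port, and later receives an IRQ once the device has filled the buffer via DMA.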

The MH Process/Device Interface


Virtual Memory Mappings States in MH

I like to go back to the drawing board. It gives you the opportunity to think, to look at what you have, and to reconsider what you need.

Moving MH's memory map system (pmap) to the machine-independent part (similar to the change that did this for the CPU and IPI management subsystem) offered such an opportunity for this key component.

Compared to other memory subsystems I have seen in the past, I always thought of MH's virtual memory as extremely simple. So, before blindly porting code that shows signs of ageing and where different strata are starting to emerge, I decided to look at the states of a virtual mapping. After all, I expected it to be simple.

The first layer was easy to draw:

Top-level view

I.e., a user PTE is either unmapped, or mapped to a hardware I/O page, or mapped to a memory page. The dot points to the default state, that is at start we expect all virtual memory to be unmapped.

Of course, it's not that simple. Mapped pages can be wired by the kernel, i.e. made impossible for userspace programs to unmap. The act of wiring a page is still initiated by the user, by exporting memory to a hwdev. Wiring is needed because exporting a page to a hwdev allows it to do DMA on that memory region.

An updated graph will look like this:

Refinement 1

Which is – still – very simple. The square represents what's left of the node called Mapped in the earlier figure.

But this is ignoring something really important: copy-on-write. MH supports forking of processes. Like essentially any other OS that supports paging, it uses copy-on-write as an optimization for duplicating address spaces: instead of being copied, the memory is shared but set read-only.

When a process tries to write to a shared page, it will generate a page fault, which will be forwarded as an interrupt to the process. It is userspace's responsibility to create a new page, copy the shared page's content into it, and substitute the read-only shared mapping with a writable copy.

This then means that this state, RO Shared, can only be changed by userspace by unmapping it. There is an exception of course, which is due to an optimization: when all processes but one have created their own copies, the page is effectively not shared anymore, and can be made private again.

An important note to add at this point is that copy-on-write is the only case in MH where memory is shared between processes.

Updating the graph yields:

Refinement 2

This starts to look interesting, and messy enough to look real. It is not complete, yet.

As said earlier, the read/write information is lost during the process of copy-on-write, so in the current implementation the page returns to being private, but read-only, no matter the original state. This is not a problem, though, as long as the userspace process can handle unexpected RO page faults by fixing them back to writable. The MRG system library supports this.

Adding details about the R/W state leads to a more complete graph, with clear signs of state explosion:
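The states discussed so far can be written down as a small state machine. This is a sketch reconstructed from the prose, not MH's pmap code. The enum and event names are hypothetical, and three points are assumptions of mine: new memory mappings start writable, a wired page can be un-exported back to private, and a wired page cannot be forked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum map_kind {
    MAP_UNMAPPED,  /* default state */
    MAP_IO,        /* mapped to a hardware I/O page */
    MAP_PRIVATE,   /* mapped to a private memory page */
    MAP_WIRED,     /* exported to a hwdev; userspace cannot unmap it */
    MAP_RO_SHARED  /* copy-on-write: shared and read-only */
};

enum map_event {
    EV_MAP_MEM, EV_MAP_IO, EV_UNMAP,
    EV_EXPORT,       /* user exports the page to a hwdev (wiring) */
    EV_UNEXPORT,     /* assumption: export is withdrawn */
    EV_FORK,         /* address space duplicated */
    EV_LAST_SHARER,  /* every other process made its own copy */
    EV_SET_WRITABLE  /* userspace fixes a read-only page back to R/W */
};

struct mapping {
    enum map_kind kind;
    bool writable;
};

static bool set_state(struct mapping *m, enum map_kind k, bool w)
{
    m->kind = k;
    m->writable = w;
    return true;
}

/* Apply one event. Returns false, leaving the mapping untouched,
 * when the transition is not allowed from the current state. */
bool mapping_step(struct mapping *m, enum map_event ev)
{
    assert(m != NULL);
    switch (m->kind) {
    case MAP_UNMAPPED:
        if (ev == EV_MAP_MEM) return set_state(m, MAP_PRIVATE, true);
        if (ev == EV_MAP_IO)  return set_state(m, MAP_IO, true);
        return false;
    case MAP_IO:
        if (ev == EV_UNMAP) return set_state(m, MAP_UNMAPPED, false);
        return false;
    case MAP_PRIVATE:
        if (ev == EV_UNMAP)  return set_state(m, MAP_UNMAPPED, false);
        if (ev == EV_EXPORT) return set_state(m, MAP_WIRED, m->writable);
        if (ev == EV_FORK)   return set_state(m, MAP_RO_SHARED, false);
        if (ev == EV_SET_WRITABLE) return set_state(m, MAP_PRIVATE, true);
        return false;
    case MAP_WIRED:
        /* EV_UNMAP is deliberately rejected here. */
        if (ev == EV_UNEXPORT) return set_state(m, MAP_PRIVATE, m->writable);
        return false;
    case MAP_RO_SHARED:
        if (ev == EV_UNMAP) return set_state(m, MAP_UNMAPPED, false);
        /* The R/W information was lost: private again, but read-only. */
        if (ev == EV_LAST_SHARER) return set_state(m, MAP_PRIVATE, false);
        return false;
    }
    return false;
}
```

Writing the transitions as a table like this makes the state explosion visible: every extra attribute multiplies the number of cases each event has to consider.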

Refinement 3

Is this complete? Of course not. We're missing information about the executable state of a page. A page can be executable or not, and as long as it is private and not wired its state can be changed.

I won't display this, though, as it is essentially an orthogonal state with regards to the lifetime of a page.

History and motivation

As opposed, probably, to other kernels and microkernels, MH is based on a completely random ideology, picked arbitrarily in a Cambridge pub after evidently too many beers.

Unimpressed by the lack of shape in modern software, one day in 2014 I thought that it would be really cool to build a system made of tangible abstractions. A system described in terms of objects that can be very easily understood would be – I decided – very pleasant to play with, and to use as a base for complex systems!

A system built with a single tangible abstraction – I continued – would be even more pleasant and simple!

Abstractions and inspirations

The search for that abstraction wasn't easy. Having ruled out exokernels and L4 pretty quickly, I decided to have a look at the classics.

Mach is beautiful, if you don't look at the code. I had my share of fun hacking it and it is definitely made of abstractions that are easy to tinker with, and that have proven themselves capable of building complex systems since 1985. But its abstractions are not clearly linkable to well-defined existing objects: every introduction to Mach needs to explain what a port is, and port sets, and port rights, and memory objects.

The beauty of Mach, and what I wanted to take from it, is that it defines its basic abstractions, and a set of principles, and maps every possible activity of a computer into these abstractions. Mach calls you into experimenting with it. That's what I wanted to have.

Another system I wanted to steal from is the UNIX operating system. "Everything is a file", despite being a lie since at least the addition of networking, is an incredibly powerful principle.

The early UNIXes loosely presented to the user a model of what a machine was at the time: a single cpu, interrupts (signals), a disk (filesystem), and a terminal.

World views

The world of a userspace program in Mach is made of ports and memory objects; in UNIX it is that of a simplified computer. I liked the latter. A computer is understandable by a programmer.

I decided to move toward a system that presented something familiar, a UNIX process model, in a world where a computer is not made only of internal disks and not many terminals are around. Furthermore, I wanted to achieve an extensibility similar to that of Mach by letting userspace processes export the same abstractions that the kernel uses to export its services. And finally, I wanted a system fun to use and extend.
