Foreword

Hermes is a capability-based microkernel. As a microkernel, Hermes is responsible for certain core operations of an operating system, including:

  • Physical memory allocation
  • Address space isolation
  • Task scheduling
  • Interprocess communication
  • Interrupt handling and dispatch
  • Capability rights enforcement
  • Userspace I/O port mediation
  • Userspace MMIO mediation

Hermes is a low-level part of a broader operating system design and requires a separate set of drivers and low-level services to provide a higher-level application platform (e.g. POSIX).

Hermes is designed to replace its predecessor, the Helios microkernel.

Important

This documentation is a work in progress. Pages indicated with “🔸” in the sidebar are stubs.

Community

The source code, bug tracker, and mailing lists for Hermes are available on SourceHut.

We meet in #ares on Libera Chat to collaborate and discuss Hermes development.

High-level design

The high-level design of Hermes is to a certain extent inspired by the design of the seL4 kernel, but with many notable differences.

Hermes schedules tasks to execute user code on available CPUs. Each task is provided with a capability space and an address space, either of which may be shared with other tasks in whole or in part, allowing for the implementation of threads, processes, or shared memory IPC.

The capabilities which are available in a task’s capability space define the scope of I/O and IPC operations available to that task, and task isolation is achieved by curating the list of capabilities available to a task. These capabilities may be based on IPC, to communicate with services running in other tasks or processes, or may offer other rights, such as access to IRQs and memory-mapped I/O, to facilitate the implementation of drivers in userspace.

Getting Hermes

The source code for Hermes is available on SourceHut under the GNU GPLv3 license. Information about dependencies and compiling the kernel is provided in the repository.

Booting Hermes

Bootloaders are provided for each supported target in the kernel repository. See System initialization for information about giving the kernel some work to do once booted.

x86_64 EFI

An EFI bootloader is available in ./boot/efi and will be compiled if ENABLE_EFI=1 in your config.mk file. The built EFI bootloader is written to ./boot/efi/bootx64.efi and should be installed at /EFI/boot/bootx64.efi on the EFI System Partition of your boot media.

The EFI bootloader loads the kernel from /hermes of the boot media and loads boot modules from /modules in (ASCII) alphabetical order. The first boot module is used as init by the kernel.

x86_64 multiboot

A multiboot-compatible legacy BIOS bootloader is provided in ./boot/multiboot and will be compiled if ENABLE_LEGACY=1 in your config.mk file. The bootloader is written to ./boot/multiboot/sysboot.mb and should be loaded by a multiboot-compatible bootloader such as syslinux or grub.

Multiboot modules are loaded as boot modules and passed to the kernel in the order defined by the multiboot environment. The first boot module is used as init by the kernel.

System design

Hermes is designed primarily around the use of capabilities. A capability is an unforgeable object that offers its bearer various operations associated with a kernel object, such as a page of memory or an address space.

Capabilities represent rights for various object types, including resources managed by the kernel or IPC objects used to communicate with services and other processes. Capabilities supported by the kernel are enumerated and documented in the Capability API.

The following verbs are associated with capabilities:

  • Send: A send operation sends a message to a capability.
  • Recv: A receive operation receives a message from a capability.
  • Call: A call operation sends a message to a capability and then blocks until a reply is received.
  • Reply: Replying to a call will unblock the sender and deliver the outcome of an operation to it.
  • Invoke: Calling, receiving, and replying are all ways of invoking a capability.
  • Transfer: IPC interactions via endpoints may cause capabilities to be transferred, copying or moving them from one task to another.

A capability resides in a capability slot, or “cslot”. Most capabilities reside in a capability space, or “CSpace”, which provides addressable storage for capability slots.

Memory management

Memory capabilities provide access to general-purpose memory to the bearer, and from these capabilities, various kernel objects may be allocated, such as address spaces and page tables, tasks, IPC objects, and so on.

On startup, after loading the kernel and the user init program, all remaining general-purpose memory is enumerated and provided to the user init program in the form of these Memory capabilities. The user init program may allocate resources from its Memory capabilities, subdivide them into further Memory capabilities, or transfer them to other processes.

Allocating objects

Objects are allocated via Memory::ALLOC. Some objects have a fixed size, such as Tasks, while others accept a size parameter that governs the size of the resulting object. For example, a CSpace has a radix parameter that determines the number of available capability slots.

Reference counts

Objects allocated from memory capabilities have reference counts. Copying or destroying the capability associated with an object generally updates its reference count. The maximum number of references to any kernel object is 256; further attempts to copy capabilities that reference such objects will result in ERANGE errors.

Overhead

Each memory area stores a bitmap of allocations and a list of reference counts, as well as some metadata. The overhead is subtracted from the total amount of available memory in a Memory capability.

Contiguous allocations

Over time, a Memory capability can become fragmented, and allocations are not guaranteed to occupy contiguous ranges of physical memory.

If contiguous allocations are required, for example to establish a large area for DMA, it is recommended to allocate a new Memory capability and then allocate Pages from that capability. Allocations on a new Memory capability are guaranteed to be contiguous until the first object is freed, after which point the allocations may be non-contiguous.

Object sizes

The amount of memory required for a kernel object is either a fixed size, or is a function of its initialization parameters. The semantics for memory allocation are described on each allocatable capability’s page in the documentation.

By scrutinizing these semantics, userspace programs may reckon the amount of memory available in a Memory capability before and after any allocation. This information may also be queried at runtime via Memory::GET_FREE_PAGES.

Capability spaces

Capability spaces (CSpaces) are kernel resources that hold an addressable table of capability slots, or “cslots”. The capabilities stored in these slots can be used by any task that can address the enclosing CSpace.

Tasks refer to a capability stored in their cspace via a capability address, or “caddr”, and pass these addresses in as arguments to system calls to perform operations against the capabilities at the corresponding address.

A capability slot is 2^6 bytes (64 bytes), and a capability space stores 2^R capability slots, where the radix, R, is configurable at the time the CSpace is allocated. A CSpace with a radix of 8, for example, stores 256 capability slots, which requires 16 KiB of memory. The last slot of any CSpace is reserved for the kernel to store runtime state concerning the CSpace.
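The size arithmetic above can be checked with a few lines of Hare. This is a minimal sketch: the names SLOT_SIZE, cspace_slots and cspace_bytes are illustrative and are not taken from the kernel sources.

```hare
use fmt;

// Size of one capability slot in bytes (2^6), per the text above.
def SLOT_SIZE: size = 64;

// Number of capability slots in a CSpace with the given radix (2^R).
fn cspace_slots(radix: size) size = 1z << radix;

// Memory needed for the slot table of a CSpace with the given radix.
fn cspace_bytes(radix: size) size = cspace_slots(radix) * SLOT_SIZE;

export fn main() void = {
	const radix = 8z;
	fmt::printfln("radix {}: {} slots, {} bytes", radix,
		cspace_slots(radix), cspace_bytes(radix))!;
	// The last slot is reserved for the kernel.
	fmt::printfln("usable slots: {}", cspace_slots(radix) - 1)!;
};
```

For a radix of 8 this yields 256 slots and 16384 bytes (16 KiB), of which 255 slots remain usable once the kernel-reserved slot is set aside.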

Capability slot operations

A capability space provides a number of functions which operate on the capability slots in the CSpace. These include:

  • Destroy: Destroys a capability and replaces its capability slot with a Null capability. The reference count of the resource in question is decremented, and if there are no further references, the resource is reclaimed (e.g. by returning its memory to a Memory capability). All capabilities can be destroyed, but the last reference to a capability that is in use cannot always be destroyed; for example, one must unmap all pages in an address space before destroying the last reference to a VSpace capability.
  • Copy: Copies a capability from one slot to another, incrementing the reference count. Some capabilities can be copied as often as one wishes, but most support only up to 256 copies before ERANGE is returned. Some capability types cannot be copied (namely Reply capabilities).
  • Move: Moves a capability from one capability slot to another without updating the reference count, leaving a Null capability in the source slot. All capabilities can be moved.
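The effect of these three operations on reference counts can be illustrated with a toy model. This is only a sketch of the semantics described above, not the kernel's implementation: the types object, cslot and erange and the slot_* functions are hypothetical, and copying a Null capability is modelled as producing another Null capability, which the text does not specify.

```hare
use fmt;

// Maximum number of references to one kernel object, per the text above.
def MAX_REFS: uint = 256;

// Stand-in for the kernel's ERANGE error (hypothetical type for this model).
type erange = !void;

// A toy kernel object that carries only its reference count.
type object = struct {
	refs: uint,
};

// A toy capability slot; a null pointer plays the role of a Null capability.
type cslot = struct {
	obj: nullable *object,
};

// Copy: adds a reference, unless the object is already at the limit.
fn slot_copy(dest: *cslot, src: *cslot) (void | erange) = {
	match (src.obj) {
	case null =>
		dest.obj = null;
	case let o: *object =>
		if (o.refs >= MAX_REFS) {
			return erange;
		};
		o.refs += 1;
		dest.obj = o;
	};
};

// Move: the reference count is untouched and the source becomes Null.
fn slot_move(dest: *cslot, src: *cslot) void = {
	dest.obj = src.obj;
	src.obj = null;
};

// Destroy: drops one reference and leaves a Null capability behind.
fn slot_destroy(slot: *cslot) void = {
	match (slot.obj) {
	case null =>
		return;
	case let o: *object =>
		o.refs -= 1; // at zero, the kernel would reclaim the object
	};
	slot.obj = null;
};

export fn main() void = {
	let page = object { refs = 1 };
	let a = cslot { obj = &page };
	let b = cslot { obj = null };
	let c = cslot { obj = null };

	slot_copy(&b, &a)!; // two references
	slot_move(&c, &a); // still two references, slot a is now Null
	slot_destroy(&b); // one reference left
	fmt::printfln("references remaining: {}", page.refs)!;
};
```

After the copy, move, and destroy, exactly one reference remains, held by slot c.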

Capability addressing

A CSpace may contain capabilities for other CSpaces among its cslots. Through the capability addressing system, one may address capabilities stored in nested CSpaces. These addresses are resolved in a manner similar to seL4 through guarded page tables.

A CSpace capability contains both the radix of the cspace, denoting how many capability slots it contains, as well as a guard value with a variable number of bits. Resolving a capability address through possibly nested CSpaces takes their guards into account, so that a CSpace may be constructed to resolve addresses in any manner the user wishes.

When looking up a capability from its address, the kernel begins with the root CSpace associated with the caller’s task. It checks the most significant bits of the address against the guard bits of the capability and, if they match, shifts them away and uses the next R bits to identify a capability slot. If any unprocessed bits remain in the address, and the indicated slot stores a CSpace capability, the process continues with that CSpace.

Consider the following set of CSpaces:

[Figure: A diagram of a series of CSpaces]

CSpace A holds a capability that references CSpace B, and CSpace B holds a reference to CSpace C. Each has its own radix and guard. For the task shown here to invoke the Page capability in CSpace C, the 32-bit capability address 0x5DE1F0CA is used.

The first four bits are matched against the 4-bit guard of CSpace A, 0b0101 (0x5), then the next 8 bits, 0xDE, are used to identify a slot among the 2^8 slots in CSpace A. 20 bits remain, 0x1F0CA, and so resolution continues from CSpace B. CSpace B has a radix of 4 and no guard bits, so the next 4 most-significant bits, 0x1, are used to identify a slot. 16 bits remain, 0xF0CA, so the resolver continues to CSpace C and resolves the guard bits 0b11110000 (0xF0). The final 8 bits identify one of the 2^8 slots in CSpace C, namely slot 0xCA, where the Page capability resides.

By arranging capability spaces with the necessary radices and guard bits, one may construct any desired 32-bit capability address space.
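The walk-through of 0x5DE1F0CA can be reproduced with a simplified model of the lookup. This sketch only tracks guards and radices; it does not model the capability slots themselves, and the level type and the take and resolve functions are illustrative names, not kernel APIs.

```hare
use fmt;

// One level of the lookup: the guard and radix of a CSpace.
type level = struct {
	guard_bits: u32,
	guard: u32,
	radix: u32,
};

// Extracts n bits (n < 32) from a 32-bit address, skipping the `used` most
// significant bits that were consumed by earlier steps.
fn take(addr: u32, used: u32, n: u32) u32 = {
	if (n == 0) {
		return 0;
	};
	return (addr >> (32 - used - n)) & ((1u32 << n) - 1);
};

// Walks each CSpace in turn, printing the slot selected at each level.
// Returns false if any guard fails to match.
fn resolve(addr: u32, levels: []level) bool = {
	let used = 0u32;
	for (let i = 0z; i < len(levels); i += 1) {
		const lv = levels[i];
		if (take(addr, used, lv.guard_bits) != lv.guard) {
			return false;
		};
		used += lv.guard_bits;
		const slot = take(addr, used, lv.radix);
		used += lv.radix;
		fmt::printfln("CSpace {}: slot 0x{:x}", i, slot)!;
	};
	return true;
};

export fn main() void = {
	// CSpaces A, B and C from the example above.
	const levels: []level = [
		level { guard_bits = 4, guard = 0x5, radix = 8 },
		level { guard_bits = 0, guard = 0, radix = 4 },
		level { guard_bits = 8, guard = 0xF0, radix = 8 },
	];
	assert(resolve(0x5DE1F0CA, levels));
};
```

The three levels select slots 0xde, 0x1 and 0xca, matching the slots named in the text for CSpaces A, B and C.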

Address spaces

Tasks

Interprocess Communication (IPC)

Fault handling

Memory-mapped I/O and I/O ports

Interrupt processing

System initialization

When Hermes boots, in addition to initializing the system and its kernel subsystems, the kernel loads and executes a user-provided program image as the first task, referred to as the “init” task, or simply “init”.

User image loading

The user image is an ELF executable file built for the system architecture. The image is loaded using the smallest pages supported on the target architecture (usually 4 KiB) and each program header must be located at a page boundary.

The kernel allocates the user image pages (and any intermediate page tables necessary to map them) from the largest available Memory capability on the system. The kernel additionally allocates and maps a stack for the init process.

Warning

The kernel makes no attempt to initialize thread-local storage for the init task.

Bootinfo

Information about the system, boot environment, and the capabilities allocated to init during system startup is available to userspace through the bootinfo structure.

Initial task environment

The init task is configured with a CSpace which is populated with capabilities that enumerate all of the resources available on the system.

The initial task affinity is set to the first CPU in the bootinfo’s cpu_info list.

x86_64-specific details

Registers are initialized as follows:

Register  Initial value    Purpose
--------  ---------------  ---------------------------
%rip      ELF entry point  Entry point
%rsp      0x7fff80000000   Initial stack (grows down)
%rdi      0x7fff80000000   Bootinfo address (grows up)

Initial CSpace

The init task receives a CSpace allocated with a fixed radix of 16, which can store up to 65,536 capabilities. The initial CSpace has 16 guard bits set to 0x0000, allowing the init task to address capabilities using their index in the initial CSpace.

The init task receives the following capabilities at fixed capability addresses:

caddr          Type    Notes
-------------  ------  ----------------------------------
0x0000         CSPACE  Init task CSpace
0x0001         VSPACE  Init task VSpace
0x0002         TASK    Init task
0x0008-0x0010  Varies  Arch-specific capabilities
0x0010+        Varies  Dynamically allocated capabilities

The dynamically allocated capabilities include a variable number of capabilities for Memory areas installed on the system, page and page table capabilities for the mapped user image, stack, and bootinfo, and so on. The address assignments are noted in the bootinfo structure.

Any unused capability slots contain Null capabilities.
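Because the initial CSpace has a 16-bit guard of 0x0000 and a radix of 16, a capability address in the init task is simply a slot index in the low 16 bits. The following sketch illustrates this; the constant names and the init_slot function are illustrative and are not defined by the kernel.

```hare
use fmt;

// Fixed capability addresses from the table above (illustrative names).
def CADDR_CSPACE: u32 = 0x0000;
def CADDR_VSPACE: u32 = 0x0001;
def CADDR_TASK: u32 = 0x0002;
def CADDR_DYNAMIC: u32 = 0x0010;

// Returns the slot index addressed in the initial CSpace, or void if the
// high 16 bits do not match the all-zero guard.
fn init_slot(addr: u32) (u32 | void) = {
	if ((addr >> 16) != 0) {
		return; // guard mismatch
	};
	return addr & 0xFFFF;
};

export fn main() void = {
	match (init_slot(CADDR_TASK)) {
	case let slot: u32 =>
		fmt::printfln("TASK capability is in slot {}", slot)!;
	case void =>
		fmt::println("guard mismatch")!;
	};
};
```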

x86_64-specific initial capabilities

The init task on x86_64 receives the following additional capabilities:

caddr   Type        Notes
------  ----------  ------------------
0x0008  IOCONTROL   Global I/O control
0x0009  IRQCONTROL  Global IRQ control

Note

All unallocated capability addresses below address 0x0010 are reserved for future use and should not be used by the init task.

Bootinfo

The bootinfo structure is stored at a fixed address, which is passed to the init task via the first parameter using the system ABI. This structure describes the boot environment, resources available to the user, and the allocation of capabilities required to bootstrap the init task, as well as other architecture-specific information.

// Information passed to the init task about the boot environment.
export type bootinfo = struct {
	argv: str,

	// Capability ranges
	// Page capabilities for the bootinfo structure
	bootinfo: cap_range,
	// Memory capabilities
	memory: cap_range,
	// DeviceMemory capabilities
	devmem: cap_range,
	// Page capabilities for the user image
	userimage: cap_range,
	// Page capabilities for the user stack
	stack: cap_range,
	// Unallocated (Null) capabilities
	unused: cap_range,

	// Description of the Memory capabilities in the memory cap_range, in order
	memory_info: []memory_desc,
	// Description of the DeviceMemory capabilities in the devmem cap_range, in
	// order
	devmem_info: []memory_desc,
	// Information about the installed CPUs
	cpu_info: []cpu_desc,

	// Arch-specific details
	arch_bootinfo,
};

// Indicates a range of capabilities.
export type cap_range = struct {
	// Inclusive
	start: u32,
	// Exclusive
	end: u32,
};

// A description of a region of memory.
export type memory_desc = struct {
	phys: uintptr,
	pages: uint,
};
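As an illustration of how these structures relate, the sketch below walks a memory cap_range alongside its memory_info descriptions, relying on the documented guarantee that the descriptions appear in the same order as the capabilities. The values are made up for the example, the type declarations are local copies so that the program is self-contained, and range_len is a hypothetical helper.

```hare
use fmt;

// Local copies of the declarations above.
type cap_range = struct {
	start: u32,
	end: u32,
};

type memory_desc = struct {
	phys: uintptr,
	pages: uint,
};

// Number of capability addresses in a range (start inclusive, end exclusive).
fn range_len(r: cap_range) u32 = r.end - r.start;

export fn main() void = {
	// Made-up values standing in for a real bootinfo structure.
	const memory = cap_range { start = 0x10, end = 0x12 };
	const memory_info: []memory_desc = [
		memory_desc { phys = 0x100000, pages = 768 },
		memory_desc { phys = 0x40000000, pages = 4096 },
	];
	assert(range_len(memory): size == len(memory_info));

	let total = 0u;
	for (let i = 0z; i < len(memory_info); i += 1) {
		// The i-th description belongs to the i-th capability in the range.
		const caddr = memory.start + i: u32;
		fmt::printfln("caddr 0x{:x}: {} pages at 0x{:x}", caddr,
			memory_info[i].pages, memory_info[i].phys: u64)!;
		total += memory_info[i].pages;
	};
	fmt::printfln("total pages: {}", total)!;
};
```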

x86_64 bootinfo

// x86_64-specific boot information
export type arch_bootinfo = struct {
	// Page table capabilities used to load the user image
	pdpt: cap_range,
	pd: cap_range,
	pt: cap_range,

	// TSC rate in Hz
	tsc_rate: u64,

	// Framebuffer provisioned by bootloader, if applicable
	fb: bootfb,
};

// Details about a CPU core installed on the system.
export type cpu_desc = struct {
	id: u32,
};

// Framebuffer provided by bootloader.
export type bootfb = struct {
	// fb_base is set to zero if no framebuffer was prepared by the bootloader
	fb_base: uintptr,
	fb_size: size,
	fmt: pixel_format,
	width: u32,
	height: u32,
	stride: u32,
};

// Framebuffer pixel format.
export type pixel_format = enum uint {
	RGBX8,
	BGRX8,
};

System calls

Access to kernel resources and IPC primitives is accomplished primarily through system calls (syscalls). There are a small number of syscalls that are used primarily to interact with capabilities. Note that the syscall API differs from the capability API, which is largely built on top of SYS_call.

Common ABI types

Type declarations for Hare code are provided by the uapi module in the kernel source tree.

Type   Description
-----  -------------------------
u8     8-bit unsigned integer
u16    16-bit unsigned integer
u32    32-bit unsigned integer
u64    64-bit unsigned integer
uaddr  64-bit memory address
caddr  32-bit capability address
ctype  8-bit capability type

Capability types

ID  Type          Description
--  ------------  ----------------------
0   NULL          Empty capability slot
1   MEMORY        General-purpose memory
2   CSPACE        Capability space
3   VSPACE        Virtual address space
4   TASK          Schedulable task
5   PAGE          Page of memory
6   NOTIFICATION  IPC notification
7   ENDPOINT      IPC endpoint
8   REPLY         IPC reply

x86_64 specific

ID  Type        Description
--  ----------  ----------------------------
9   PDPT        Page-directory pointer table
10  PD          Page directory
11  PT          Page table
12  IOCONTROL   I/O control
13  IOPORT      I/O port
14  IRQCONTROL  IRQ control
15  IRQ         IRQ
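The capability type IDs in the two tables above could be expressed as an 8-bit enum, matching the ctype ABI type. This is a sketch only: the uapi module may name or organize these values differently, and is_arch_specific is a hypothetical helper.

```hare
use fmt;

// Capability type IDs from the tables above.
type ctype = enum u8 {
	NULL = 0,
	MEMORY = 1,
	CSPACE = 2,
	VSPACE = 3,
	TASK = 4,
	PAGE = 5,
	NOTIFICATION = 6,
	ENDPOINT = 7,
	REPLY = 8,
	// x86_64 specific
	PDPT = 9,
	PD = 10,
	PT = 11,
	IOCONTROL = 12,
	IOPORT = 13,
	IRQCONTROL = 14,
	IRQ = 15,
};

// Reports whether a capability type belongs to the x86_64-specific table.
fn is_arch_specific(t: ctype) bool = (t: u8) >= (ctype::PDPT: u8);

export fn main() void = {
	const raw: u8 = 13; // a raw type ID, such as one reported by SYS_ident
	const t = raw: ctype;
	fmt::printfln("type {} arch-specific: {}", raw, is_arch_specific(t))!;
};
```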

x86_64 ABI

The x86_64 syscall ABI is based on the System-V ABI. All arguments are passed in registers. Note that floating point registers are not used by the ABI (and are preserved by the kernel).

Input registers

Register  Purpose
--------  -------------------------------
%rax      Syscall number (8 bits) & flags
%rdi      (a1) 1st argument register
%rsi      (a2) 2nd argument register
%rdx      (a3) 3rd argument register
%r10      (a4) 4th argument register [1]
%r8       (a5) 5th argument register
%r9       (a6) 6th argument register

Output registers

Register  Purpose
--------  ---------------------------------------------------
%rax      Syscall outcome (8 bits) + 56 syscall-specific bits
%rsi      (r1) 1st return register
%rdx      (r2) 2nd return register

The syscall outcome is 0 on success or an error code on failure.
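A caller might split the returned %rax value as sketched below. Note the assumption: the text gives the widths of the two fields but not their positions, so this sketch assumes the outcome occupies the low 8 bits and the syscall-specific bits occupy the upper 56. The function names are illustrative.

```hare
use fmt;

// ASSUMPTION: the 8-bit outcome is stored in the low byte of %rax.
fn outcome(rax: u64) u8 = (rax & 0xFF): u8;

// ASSUMPTION: the 56 syscall-specific bits are the remaining upper bits.
fn extra(rax: u64) u64 = rax >> 8;

export fn main() void = {
	const rax = 0x1205u64; // made-up return value
	if (outcome(rax) == 0) {
		fmt::println("success")!;
	} else {
		fmt::printfln("error code {}, extra bits 0x{:x}",
			outcome(rax), extra(rax))!;
	};
};
```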

Other registers

Register   Purpose
---------  ------------
%r12-%r15  Kernel-saved
%rbp       Kernel-saved
%rbx       Kernel-saved
%fs, %gs   Kernel-saved

Note

Certain syscalls, notably SYS_call and SYS_recv, differ from the standard register allocation.

Error codes

The following error codes are defined:

Code  Name         Description
----  -----------  -----------------------------------------------------------
0     n/a          Indicates a successful outcome.
1     EWRONGTYPE   An incorrect resource type was used in an operation.
2     ENOMEM       Insufficient memory available for requested operation.
3     EINVALID     An invalid parameter was supplied for an operation.
4     EINVALCADDR  An invalid capability address was used.
5     ERANGE       A parameter exceeds the permissible range.
6     EEXIST       A resource already exists at the given address.
7     ENOENT       A required resource does not exist.
8     EBUSY        A required resource is currently in use.
9     ENOTSUP      The requested operation is not supported.
10    ENOSYS       Invalid syscall or function.
11    EFAULT       Use of invalid address.
12    EL1PT        VSpace mapping is missing a required level 1 page table.
13    EL2PT        VSpace mapping is missing a required level 2 page table.
14    EL3PT        VSpace mapping is missing a required level 3 page table.
15    EL4PT        VSpace mapping is missing a required level 4 page table.
16    EDESTROYED   A resource was destroyed during the operation.
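For diagnostics, userspace code might map these codes back to their names. The following is a minimal sketch built directly from the table above; the names array and the errname function are hypothetical and are not part of the uapi module.

```hare
use fmt;

// Error names indexed by error code, in the order of the table above.
const names: [_]str = [
	"success", "EWRONGTYPE", "ENOMEM", "EINVALID", "EINVALCADDR",
	"ERANGE", "EEXIST", "ENOENT", "EBUSY", "ENOTSUP", "ENOSYS",
	"EFAULT", "EL1PT", "EL2PT", "EL3PT", "EL4PT", "EDESTROYED",
];

// Returns the name for an error code, or "unknown" if it is out of range.
fn errname(code: u8) str = {
	if ((code: size) >= len(names)) {
		return "unknown";
	};
	return names[code: size];
};

export fn main() void = {
	fmt::printfln("5 is {}", errname(5))!;
	fmt::printfln("200 is {}", errname(200))!;
};
```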

  1. Note that the System-V ABI assigns the 4th argument to %rcx.

SYS_schedule

Yields the current task slice to the scheduler.

Inputs

None.

Outputs

None.

SYS_cpu_stat

Returns timing statistics about a given CPU.

Inputs

Register  Type   Purpose
--------  -----  ------------------
a1        u32    Target CPU ID
a2        uaddr  cpu_stat structure

The ID of each CPU is noted in the bootinfo structure’s cpu_info list.

cpu_stat structure

Each field of the cpu_stat structure represents the amount of time this CPU has spent in each state from an arbitrarily defined epoch. Time increases monotonically at the rate defined by the tick_rate parameter of the bootinfo structure.

// CPU usage statistics.
export type cpu_stat = struct {
	idle: u64,
	user: u64,
	kern: u64,
};
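Since the counters only ever increase, utilization over an interval can be derived from the difference between two samples. The sketch below assumes that idle, user and kern together account for all elapsed time on the CPU, which the text implies but does not state outright; busy_percent is a hypothetical helper and the sample values are made up.

```hare
use fmt;

// Local copy of the declaration above.
type cpu_stat = struct {
	idle: u64,
	user: u64,
	kern: u64,
};

// Percentage of time spent outside the idle state between two samples.
fn busy_percent(prev: cpu_stat, cur: cpu_stat) u64 = {
	const idle = cur.idle - prev.idle;
	const user = cur.user - prev.user;
	const kern = cur.kern - prev.kern;
	const total = idle + user + kern;
	if (total == 0) {
		return 0; // no time has passed between the samples
	};
	return (user + kern) * 100 / total;
};

export fn main() void = {
	// Made-up samples; real values would be filled in by SYS_cpu_stat.
	const prev = cpu_stat { idle = 1000, user = 200, kern = 50 };
	const cur = cpu_stat { idle = 1600, user = 500, kern = 150 };
	fmt::printfln("busy: {}%", busy_percent(prev, cur))!;
};
```

With these samples the CPU was idle for 600 of 1000 ticks, so the result is 40%.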

Outputs

The cpu_stat structure is filled in with the latest timing info.

Errors

  • ENOENT: No CPU by this ID was found
  • EFAULT: The address of the cpu_stat structure is not valid

SYS_ident

Identifies a capability type.

Inputs

Register  Type   Purpose
--------  -----  ------------------
a1        caddr  Capability address

Outputs

Register  Type   Purpose
--------  -----  ---------------
r1        ctype  Capability type

Errors

  • EINVALCADDR: Capability address is invalid

SYS_signal

SYS_wait

SYS_call

SYS_recv

SYS_reply

Capabilities

General purpose capabilities

NULL

MEMORY

CSPACE

VSPACE

TASK

PAGE

NOTIFICATION

ENDPOINT

REPLY

x86_64-specific capabilities

PDPT

PD

PT

IOCONTROL

IOPORT

IRQCONTROL

IRQ

Examples