Foreword

Hermes is a capability-based microkernel. As a microkernel, Hermes is responsible for certain core operations of an operating system, including:

Physical memory allocation
Address space isolation
Task scheduling
Interprocess communication
Interrupt handling and dispatch
Capability rights enforcement
Userspace I/O port mediation
Userspace MMIO mediation

Hermes is a low-level part of a broader operating system design and requires a separate set of drivers and low-level services to provide a higher-level application platform (e.g. POSIX).

Hermes is designed to replace its predecessor, the Helios microkernel.

Important

This documentation is a work in progress. Pages indicated with “🔸” in the sidebar are stubs.

Community

The source code, bug tracker, and mailing lists for Hermes are available on SourceHut.

We meet in #ares on Libera Chat to collaborate and discuss Hermes development.

High-level design

The high-level design of Hermes is partly inspired by the design of the seL4 kernel, though it differs from seL4 in many notable ways.

Hermes schedules tasks to execute user code on available CPUs. Each task is provided with a capability space and an address space, either of which may be shared with other tasks in whole or in part, allowing for the implementation of threads, processes, or shared memory IPC.

The capabilities which are available in a task’s capability space define the scope of I/O and IPC operations available to that task, and task isolation is achieved by curating the list of capabilities available to a task. These capabilities may be based on IPC, to communicate with services running in other tasks or processes, or may offer other rights, such as access to IRQs and memory-mapped I/O, to facilitate the implementation of drivers in userspace.

Getting Hermes

The source code for Hermes is available on SourceHut under the GNU GPLv3 license. Information about dependencies and compiling the kernel is provided in the repository.

Booting Hermes

Bootloaders are provided for each supported target in the kernel repository. See System initialization for information about giving the kernel some work to do once booted.

x86_64 EFI

An EFI bootloader is available in ./boot/efi and will be compiled if ENABLE_EFI=1 in your config.mk file. The built EFI bootloader is written to ./boot/efi/bootx64.efi and should be installed at /EFI/boot/bootx64.efi on the EFI System Partition of your boot media.

The EFI bootloader loads the kernel from /hermes on the boot media and loads boot modules from /modules in (ASCII) alphabetical order. The first boot module is used as init by the kernel.

x86_64 multiboot

A multiboot-compatible legacy BIOS bootloader is provided in ./boot/multiboot and will be compiled if ENABLE_LEGACY=1 in your config.mk file. The bootloader is written to ./boot/multiboot/sysboot.mb and should be loaded by a multiboot-compatible bootloader such as syslinux or grub.

Multiboot modules are loaded as boot modules and passed to the kernel in the order defined by the multiboot environment. The first boot module is used as init by the kernel.

System design

Hermes is designed primarily around the use of capabilities. A capability is an unforgeable object which offers its bearer various operations associated with a kernel object, such as a page of memory or an address space.

Capabilities represent rights for various object types, including resources managed by the kernel or IPC objects used to communicate with services and other processes. Capabilities supported by the kernel are enumerated and documented in the Capability API.

The following verbs are associated with capabilities:

Send: A send operation sends a message to a capability.
Recv: A receive operation receives a message from a capability.
Call: A call operation sends a message to a capability and then blocks until a reply is received.
Reply: Replying to a call will unblock the sender and deliver the outcome of an operation to it.
Invoke: Calling, receiving, and replying are all ways to invoke a capability.
Transfer: IPC interactions via endpoints may cause capabilities to be transferred, copying or moving them from one task to another.

A capability resides in a capability slot, or “cslot”. Most capabilities reside in a capability space, or “CSpace”, which provides addressable storage for capability slots.

Memory management

Memory capabilities provide access to general-purpose memory to the bearer, and from these capabilities, various kernel objects may be allocated, such as address spaces and page tables, tasks, IPC objects, and so on.

On startup, after loading the kernel and the user init program, all remaining general-purpose memory is enumerated and provided to the user init program in the form of these Memory capabilities. The user init program may allocate resources from its Memory capabilities, subdivide them into further Memory capabilities, or transfer them to other processes.

Allocating objects

Objects are allocated via Memory::ALLOC. Some objects have a fixed size, such as Tasks, while others accept a size parameter that governs the size of the resulting object. For example, a CSpace has a radix parameter that determines the number of available capability slots.

Reference counts

Objects allocated from Memory capabilities have reference counts. Copying or destroying the capability associated with an object generally updates its reference count. The maximum number of references to any kernel object is 256; further attempts to copy capabilities that reference such an object will result in ERANGE errors.

Overhead

Each memory area stores a bitmap of allocations and a list of reference counts, as well as some metadata. The overhead is subtracted from the total amount of available memory in a Memory capability.

Contiguous allocations

Over time, a Memory capability can become fragmented, and allocations are not guaranteed to occupy contiguous ranges of physical memory.

If contiguous allocations are required, for example to establish a large area for DMA, it is recommended to allocate a new Memory capability and then allocate Pages from that capability. Allocations on a new Memory capability are guaranteed to be contiguous until the first object is freed, after which point the allocations may be non-contiguous.

Object sizes

The amount of memory required for a kernel object is either a fixed size, or is a function of its initialization parameters. The semantics for memory allocation are described on each allocatable capability’s page in the documentation.

Using these semantics, userspace programs may calculate the amount of memory available in a Memory capability before and after any allocation. This information may also be queried at runtime via Memory::GET_FREE_PAGES.

Capability spaces

Capability spaces (CSpaces) are kernel resources that hold an addressable table of capability slots, or “cslots”. The capabilities stored in these slots can be used by any task that can address the enclosing CSpace.

Tasks refer to a capability stored in their CSpace via a capability address, or “caddr”, and pass these addresses as arguments to system calls to perform operations on the capabilities at the corresponding addresses.

A capability slot is 2⁶ bytes (64 bytes) and a capability space stores 2^R capability slots, where the radix, R, is configurable at the time the CSpace is allocated. A CSpace with a radix of 8, for example, stores 256 capability slots, which requires 16 KiB of memory. The last slot of any CSpace is reserved for the kernel to store runtime state concerning the CSpace.
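
The size arithmetic above can be sketched in a few lines of Hare. This is only an illustration: CSLOT_SHIFT and the cspace_* helpers are hypothetical names, not part of the kernel or the uapi module.

```hare
use fmt;

// log2 of the size of one capability slot in bytes (2^6 = 64 bytes).
def CSLOT_SHIFT: u64 = 6;

// Number of capability slots in a CSpace with the given radix.
fn cspace_slots(radix: u64) u64 = 1u64 << radix;

// Memory occupied by the slots of a CSpace with the given radix, in bytes.
fn cspace_bytes(radix: u64) u64 = 1u64 << (radix + CSLOT_SHIFT);

// Slots left for userspace once the kernel's reserved last slot is excluded.
fn cspace_usable(radix: u64) u64 = cspace_slots(radix) - 1;

export fn main() void = {
	fmt::printfln("radix 8: {} slots, {} bytes, {} usable",
		cspace_slots(8), cspace_bytes(8), cspace_usable(8))!;
	// 256 slots of 64 bytes each occupy 16 KiB.
	assert(cspace_bytes(8) == 16 * 1024);
};
```

Because the last slot is reserved, a CSpace of radix R leaves 2^R - 1 slots for userspace capabilities.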

Capability slot operations

A capability space provides a number of functions which operate on the capability slots in the CSpace. These include:

Destroy: Destroys a capability and leaves a Null capability in its capability slot. The reference count of the underlying resource is decremented, and if no references remain, the resource is reclaimed (e.g. by returning its memory to a Memory capability). All capabilities can be destroyed, but the last reference to a resource that is still in use cannot always be destroyed; for example, one must unmap all pages in an address space before destroying the last reference to a VSpace capability.
Copy: Copies a capability from one slot to another, incrementing the reference count. Some capabilities can be copied as often as one wishes, but most support only up to 256 copies before ERANGE is returned. Some capability types cannot be copied (namely Reply capabilities).
Move: Moves a capability from one capability slot to another without updating the reference count, leaving a Null capability in the source slot. All capabilities can be moved.
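
The effect of these three operations on slots and reference counts can be modeled with a small table. The sketch below is a userspace model only; the table type, the cap_* functions, and the convention that object id 0 stands for a Null capability are all assumptions made for illustration, and the real operations are performed by the kernel.

```hare
// Hypothetical model of Copy, Move and Destroy; not kernel code.
def NULL_CAP: size = 0;  // object id 0 stands for a Null capability
def MAX_REFS: u32 = 256; // reference limit stated in the text

// Stand-in for the kernel's ERANGE outcome.
type erange = !void;

type table = struct {
	slots: [4]size, // object id held by each capability slot
	refs: [3]u32,   // reference count per object id (index 0 unused)
};

// Copy: duplicates src into dst and takes one more reference.
fn cap_copy(t: *table, src: size, dst: size) (void | erange) = {
	const id = t.slots[src];
	if (t.refs[id] >= MAX_REFS) {
		return erange;
	};
	t.refs[id] += 1;
	t.slots[dst] = id;
};

// Move: relocates the capability; the reference count is untouched.
fn cap_move(t: *table, src: size, dst: size) void = {
	t.slots[dst] = t.slots[src];
	t.slots[src] = NULL_CAP;
};

// Destroy: leaves a Null capability behind and drops one reference.
// Returns true when the last reference is gone and the object is reclaimed.
fn cap_destroy(t: *table, slot: size) bool = {
	const id = t.slots[slot];
	if (id == NULL_CAP) {
		return false;
	};
	t.slots[slot] = NULL_CAP;
	t.refs[id] -= 1;
	return t.refs[id] == 0;
};

export fn main() void = {
	// Slot 0 holds the only capability for object 1.
	let t = table { slots = [1, 0, 0, 0], refs = [0, 1, 0] };
	cap_copy(&t, 0, 1)!;
	assert(t.refs[1] == 2);
	cap_move(&t, 1, 2);
	assert(t.slots[1] == NULL_CAP && t.refs[1] == 2);
	assert(!cap_destroy(&t, 0)); // slot 2 still references the object
	assert(cap_destroy(&t, 2));  // last reference: object reclaimed
};
```

The model ignores the cases the text calls out as exceptions, such as Reply capabilities that cannot be copied and in-use resources whose last reference cannot be destroyed.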

Capability addressing

A CSpace may contain capabilities for other CSpaces among its cslots. Through the capability addressing system, one may address capabilities stored in nested CSpaces. These addresses are resolved through guarded page tables, in a manner similar to seL4.

A CSpace capability contains both the radix of the CSpace, denoting how many capability slots it contains, and a guard value with a variable number of bits. Resolving a capability address through possibly nested CSpaces takes their guards into account, so that a CSpace may be constructed to resolve addresses in any manner the user wishes.

When looking up a capability from its address, the kernel begins with the root CSpace associated with the caller’s task. It checks the most significant bits of the address against the guard bits of the CSpace capability and, if they match, discards them and uses the next R bits to identify a capability slot. If any unprocessed bits remain in the address, and the indicated slot stores a CSpace capability, the process continues with that CSpace.

Consider the following set of CSpaces:

A diagram of a series of CSpaces

CSpace A holds a capability that references CSpace B, and CSpace B holds a capability that references CSpace C. Each CSpace has its own radix and guard. For the task shown here to invoke the Page capability in CSpace C, the 32-bit capability address 0x5DE1F0CA is used.

The first four bits are matched against the 4-bit guard of CSpace A, 0b0101 (0x5), then the next 8 bits are used to identify a slot among the 2⁸ slots in CSpace A. 20 bits remain, 0x1F0CA, and so resolution continues from CSpace B. CSpace B has a radix of 4 and no guard bits, so the next 4 most-significant bits, 0x1, are used to identify a slot. 16 bits remain, so the resolver continues to CSpace C and resolves the guard bits 0b11110000 (0xF0). The final 8 bits identify one of the 2⁸ slots in CSpace C, namely slot 0xCA, where the Page capability resides.

By arranging capability spaces with the necessary radii and guard bits, one may construct any desired 32-bit capability address space.
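
The walk through CSpaces A, B and C can be reproduced with a short Hare sketch. It models only the bit arithmetic of the example; the level type, take, resolve and badaddr are hypothetical names, and the kernel's actual resolver (which also inspects the slot contents at each step) is not shown in this documentation.

```hare
// Hypothetical model of one CSpace on the resolution path; not a kernel type.
type level = struct {
	guard: u32,      // expected guard value
	guard_bits: u32, // width of the guard in bits
	radix: u32,      // log2 of the number of slots
};

// Returned when a guard does not match or the address has too few bits.
type badaddr = !void;

// Extracts the top n bits of the low "remaining" bits of addr.
fn take(addr: u32, remaining: u32, n: u32) u32 = {
	if (n == 0) {
		return 0;
	};
	const shift = remaining - n;
	const mask = (1u32 << n) - 1;
	return (addr >> shift) & mask;
};

// Resolves addr through a chain of nested CSpaces, storing the slot index
// selected at each level in slots. Returns the number of unprocessed bits.
fn resolve(addr: u32, chain: []level, slots: []u32) (u32 | badaddr) = {
	let remaining = 32u32;
	for (let i = 0z; i < len(chain); i += 1) {
		const l = chain[i];
		if (remaining < l.guard_bits + l.radix) {
			return badaddr;
		};
		if (take(addr, remaining, l.guard_bits) != l.guard) {
			return badaddr;
		};
		remaining -= l.guard_bits;
		slots[i] = take(addr, remaining, l.radix);
		remaining -= l.radix;
	};
	return remaining;
};

export fn main() void = {
	// CSpaces A, B and C from the example above.
	const chain: [3]level = [
		level { guard = 0x5, guard_bits = 4, radix = 8 },
		level { guard = 0x0, guard_bits = 0, radix = 4 },
		level { guard = 0xf0, guard_bits = 8, radix = 8 },
	];
	let slots: [3]u32 = [0...];
	const left = resolve(0x5de1f0ca, chain[..], slots[..])!;
	assert(left == 0);
	assert(slots[0] == 0xde && slots[1] == 0x1 && slots[2] == 0xca);
};
```

The bit widths add up to the full address: 4 + 8 bits for A, 0 + 4 for B, and 8 + 8 for C give 32 bits, so no unprocessed bits remain after the final slot is selected.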

Address spaces

Tasks

Interprocess Communication (IPC)

Fault handling

Memory-mapped I/O and I/O ports

Interrupt processing

System initialization

When Hermes boots, in addition to initializing the system and its kernel subsystems, the kernel loads and executes a user-provided program image as the first task, referred to as the “init” task, or simply “init”.

User image loading

The user image is an ELF executable file built for the system architecture. The image is loaded using the smallest pages supported on the target architecture (usually 4 KiB) and each program header must be located at a page boundary.

The kernel allocates the user image pages (and any intermediate page tables necessary to map them) from the largest available Memory capability on the system. The kernel additionally allocates and maps a stack for the init process.

Warning

The kernel makes no attempt to initialize thread-local storage for the init task.

Bootinfo

Information about the system, boot environment, and the capabilities allocated to init during system startup is available to userspace through the bootinfo structure.

Initial task environment

The init task is configured with a CSpace which is populated with capabilities that enumerate all of the resources available on the system.

The initial task affinity is set to the first CPU in the bootinfo’s cpu_info list.

x86_64-specific details

Registers are initialized as follows:

Register	Initial value	Purpose
%rip	ELF entry point	Entry point
%rsp	`0x7fff80000000`	Initial stack (grows down)
%rdi	`0x7fff80000000`	Bootinfo address (grows up)

Initial CSpace

The init task receives a CSpace allocated with a fixed radix of 16, which provides 65,536 capability slots. The initial CSpace has 16 guard bits set to 0x0000, allowing the init task to address capabilities using their index in the initial CSpace.

The init task receives the following capabilities at fixed capability addresses:

caddr	Type	Notes
0x0000	CSPACE	Init task CSpace
0x0001	VSPACE	Init task VSpace
0x0002	TASK	Init task
0x0008-0x000F	Varies	Arch-specific capabilities
0x0010+	Varies	Dynamically allocated capabilities

The dynamically allocated capabilities include a variable number of capabilities for Memory areas installed on the system, page and page table capabilities for the mapped user image, stack, and bootinfo, and so on. The address assignments are noted in the bootinfo structure.

Any unused capability slots contain Null capabilities.

x86_64-specific initial capabilities

The init task on x86_64 receives the following additional capabilities:

caddr	Type	Notes
0x0008	IOCONTROL	Global I/O control
0x0009	IRQCONTROL	Global IRQ control

Note

All unallocated capability addresses below address 0x0010 are reserved for future use and should not be used by the init task.
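
An init program might name the fixed addresses from the tables above as constants. The following is a minimal sketch; every identifier in it is hypothetical, and the uapi module may already provide equivalents under different names.

```hare
// Fixed capability addresses in the init task's CSpace, as listed above.
// The constant names are illustrative, not taken from the kernel sources.
def CADDR_CSPACE: u32 = 0x0000;
def CADDR_VSPACE: u32 = 0x0001;
def CADDR_TASK: u32 = 0x0002;
def CADDR_IOCONTROL: u32 = 0x0008;  // x86_64 only
def CADDR_IRQCONTROL: u32 = 0x0009; // x86_64 only
def CADDR_DYNAMIC: u32 = 0x0010;    // first dynamically allocated address
def INIT_RADIX: u32 = 16;

// With 16 zero guard bits followed by a 16-bit slot index, the 32-bit
// capability address equals the slot index in the initial CSpace.
fn init_caddr(slot: u16) u32 = slot: u32;

// Reports whether addr lies in the dynamically allocated part of the
// initial CSpace, whose layout is described by the bootinfo structure.
fn is_dynamic(addr: u32) bool =
	addr >= CADDR_DYNAMIC && addr < (1u32 << INIT_RADIX);

export fn main() void = {
	assert(init_caddr(0x0002) == CADDR_TASK);
	assert(!is_dynamic(CADDR_IRQCONTROL));
	assert(is_dynamic(0x0010));
	assert(!is_dynamic(0x10000)); // nonzero guard bits: not a valid address
};
```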

Bootinfo

The bootinfo structure is stored at a fixed address, which is passed to the init task via the first parameter using the system ABI. This structure describes the boot environment, resources available to the user, and the allocation of capabilities required to bootstrap the init task, as well as other architecture-specific information.

// Information passed to the init task about the boot environment.
export type bootinfo = struct {
	argv: str,

	// Capability ranges
	// Page capabilities for the bootinfo structure
	bootinfo: cap_range,
	// Memory capabilities
	memory: cap_range,
	// DeviceMemory capabilities
	devmem: cap_range,
	// Page capabilities for the user image
	userimage: cap_range,
	// Page capabilities for the user stack
	stack: cap_range,
	// Unallocated (Null) capabilities
	unused: cap_range,

	// Description of the Memory capabilities in the memory cap_range, in order
	memory_info: []memory_desc,
	// Description of the DeviceMemory capabilities in the devmem cap_range, in
	// order
	devmem_info: []memory_desc,
	// Information about the installed CPUs
	cpu_info: []cpu_desc,

	// Arch-specific details
	arch_bootinfo,
};

// Indicates a range of capabilities.
export type cap_range = struct {
	// Inclusive
	start: u32,
	// Exclusive
	end: u32,
};

// A description of a region of memory.
export type memory_desc = struct {
	// Physical address of the start of the region
	phys: uintptr,
	// Length of the region in pages
	pages: uint,
};
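
Since cap_range is half-open (inclusive start, exclusive end) and the memory_info entries are listed in the same order as the capabilities in the memory range, the i-th description belongs to the capability at memory.start + i. The sketch below illustrates this with made-up values; range_len, nth_cap, total_bytes and the 4 KiB PAGE_SIZE are assumptions for illustration only.

```hare
// Local copies of the bootinfo types shown above, so the sketch stands alone.
type cap_range = struct { start: u32, end: u32 };
type memory_desc = struct { phys: uintptr, pages: uint };

// Assumption: 4 KiB pages, as is usual on x86_64.
def PAGE_SIZE: u64 = 4096;

// Number of capabilities in a half-open range.
fn range_len(r: cap_range) u32 = r.end - r.start;

// Capability address of the i-th capability in a range.
fn nth_cap(r: cap_range, i: u32) u32 = {
	assert(i < range_len(r));
	return r.start + i;
};

// Total number of bytes described by a list of memory regions.
fn total_bytes(info: []memory_desc) u64 = {
	let total = 0u64;
	for (let i = 0z; i < len(info); i += 1) {
		total += (info[i].pages: u64) * PAGE_SIZE;
	};
	return total;
};

export fn main() void = {
	// Made-up values standing in for a real bootinfo structure.
	const memory = cap_range { start = 0x10, end = 0x12 };
	const info: [2]memory_desc = [
		memory_desc { phys = 0x100000: uintptr, pages = 256 },
		memory_desc { phys = 0x1000000: uintptr, pages = 1024 },
	];
	assert(range_len(memory) == 2);
	assert(nth_cap(memory, 1) == 0x11);
	assert(total_bytes(info[..]) == 1280 * 4096);
};
```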

x86_64 bootinfo

// x86_64-specific boot information
export type arch_bootinfo = struct {
	// Page table capabilities used to load the user image
	pdpt: cap_range,
	pd: cap_range,
	pt: cap_range,

	// TSC rate in Hz
	tsc_rate: u64,

	// Framebuffer provisioned by bootloader, if applicable
	fb: bootfb,
};

// Details about a CPU core installed on the system.
export type cpu_desc = struct {
	id: u32,
};

// Framebuffer provided by bootloader.
export type bootfb = struct {
	// fb_base is set to zero if no framebuffer was prepared by the bootloader
	fb_base: uintptr,
	fb_size: size,
	fmt: pixel_format,
	width: u32,
	height: u32,
	stride: u32,
};

// Framebuffer pixel format.
export type pixel_format = enum uint {
	RGBX8,
	BGRX8,
};

System calls

Access to kernel resources and IPC primitives is accomplished primarily through system calls (syscalls). There are a small number of syscalls that are used primarily to interact with capabilities. Note that the syscall API differs from the capability API, which is largely built on top of SYS_call.

Common ABI types

Type declarations for Hare code are provided by the uapi module in the kernel source tree.

Type	Description
u8	8-bit unsigned integer
u16	16-bit unsigned integer
u32	32-bit unsigned integer
u64	64-bit unsigned integer
uaddr	64-bit memory address
caddr	32-bit capability address
ctype	8-bit capability type

Capability types

ID	Type	Description
0	NULL	Empty capability slot
1	MEMORY	General-purpose memory
2	CSPACE	Capability space
3	VSPACE	Virtual address space
4	TASK	Schedulable task
5	PAGE	Page of memory
6	NOTIFICATION	IPC notification
7	ENDPOINT	IPC endpoint
8	REPLY	IPC reply

x86_64 specific

ID	Type	Description
9	PDPT	Page-directory pointer table
10	PD	Page directory
11	PT	Page table
12	IOCONTROL	I/O control
13	IOPORT	I/O port
14	IRQCONTROL	IRQ control
15	IRQ	IRQ
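
The IDs in the two tables above map naturally onto a Hare enum with u8 storage, matching the 8-bit ctype ABI type. The enum and helpers below are an illustrative sketch; the uapi module may define these differently. The is_copyable helper encodes only the one rule stated earlier in this document, that Reply capabilities cannot be copied.

```hare
// Capability type IDs from the tables above (illustrative definition).
type ctype = enum u8 {
	NULL = 0,
	MEMORY = 1,
	CSPACE = 2,
	VSPACE = 3,
	TASK = 4,
	PAGE = 5,
	NOTIFICATION = 6,
	ENDPOINT = 7,
	REPLY = 8,
	// x86_64 specific
	PDPT = 9,
	PD = 10,
	PT = 11,
	IOCONTROL = 12,
	IOPORT = 13,
	IRQCONTROL = 14,
	IRQ = 15,
};

// Reports whether a capability type is specific to x86_64.
fn is_arch_specific(t: ctype) bool = (t: u8) >= (ctype::PDPT: u8);

// Reports whether a capability type may be copied. Only the rule for Reply
// capabilities is modeled here.
fn is_copyable(t: ctype) bool = t != ctype::REPLY;

export fn main() void = {
	assert(!is_arch_specific(ctype::ENDPOINT));
	assert(is_arch_specific(ctype::IOPORT));
	assert(is_copyable(ctype::PAGE));
	assert(!is_copyable(ctype::REPLY));
};
```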

x86_64 ABI

The x86_64 syscall ABI is based on the System-V ABI. All arguments are passed in registers. Note that floating point registers are not used by the ABI (and are preserved by the kernel).

Input registers

Register	Purpose
%rax	Syscall number (8 bits) & flags
%rdi	(a1) 1st argument register
%rsi	(a2) 2nd argument register
%rdx	(a3) 3rd argument register
%r10	(a4) 4th argument register¹
%r8	(a5) 5th argument register
%r9	(a6) 6th argument register

Output registers

Register	Purpose
%rax	Syscall outcome (8 bits) + 56 syscall-specific bits
%rsi	(r1) 1st return register
%rdx	(r2) 2nd return register

The syscall outcome is 0 on success or an error code on failure.

Other registers

Register	Purpose
%r12-%r15	Kernel-saved
%rbp	Kernel-saved
%rbx	Kernel-saved
%fs, %gs	Kernel-saved

Note

Certain syscalls, notably SYS_call and SYS_recv, differ from the standard register allocation.

Error codes

The following error codes are defined:

Code	Name	Description
0	n/a	Indicates a successful outcome.
1	EWRONGTYPE	An incorrect resource type was used in an operation.
2	ENOMEM	Insufficient memory available for requested operation.
3	EINVALID	An invalid parameter was supplied for an operation.
4	EINVALCADDR	An invalid capability address was used.
5	ERANGE	A parameter exceeds the permissible range.
6	EEXIST	A resource already exists at the given address.
7	ENOENT	A required resource does not exist.
8	EBUSY	A required resource is currently in use.
9	ENOTSUP	The requested operation is not supported.
10	ENOSYS	Invalid syscall or function.
11	EFAULT	Use of invalid address.
12	EL1PT	VSpace mapping is missing a required level 1 page table.
13	EL2PT	VSpace mapping is missing a required level 2 page table.
14	EL3PT	VSpace mapping is missing a required level 3 page table.
15	EL4PT	VSpace mapping is missing a required level 4 page table.
16	EDESTROYED	A resource was destroyed during the operation.
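
Code that maps pages into a VSpace may want to know which page table level is missing so that it can allocate and map that table before retrying. The sketch below encodes the table above as an enum and derives the level from the EL1PT to EL4PT codes. The names errcode and missing_level are hypothetical and not taken from the uapi module.

```hare
// Syscall outcome codes from the table above (illustrative definition).
type errcode = enum u8 {
	OK = 0,
	EWRONGTYPE = 1,
	ENOMEM = 2,
	EINVALID = 3,
	EINVALCADDR = 4,
	ERANGE = 5,
	EEXIST = 6,
	ENOENT = 7,
	EBUSY = 8,
	ENOTSUP = 9,
	ENOSYS = 10,
	EFAULT = 11,
	EL1PT = 12,
	EL2PT = 13,
	EL3PT = 14,
	EL4PT = 15,
	EDESTROYED = 16,
};

// Returns the page table level (1-4) that a VSpace mapping is missing, or 0
// if the outcome is not one of EL1PT..EL4PT.
fn missing_level(e: errcode) u8 = {
	const code = e: u8;
	const first = errcode::EL1PT: u8;
	const last = errcode::EL4PT: u8;
	if (code < first || code > last) {
		return 0;
	};
	return code - first + 1;
};

export fn main() void = {
	assert(missing_level(errcode::EL1PT) == 1);
	assert(missing_level(errcode::EL3PT) == 3);
	assert(missing_level(errcode::ENOMEM) == 0);
};
```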

¹ Note that the System-V ABI assigns the 4th argument to %rcx.

SYS_schedule

Yields the current task slice to the scheduler.

Inputs

None.

Outputs

None.

SYS_cpu_stat

Returns timing statistics about a given CPU.

Inputs

Register	Type	Purpose
a1	u32	Target CPU ID
a2	uaddr	cpu_stat structure

The ID of each CPU is noted in the bootinfo structure’s cpu_info list.

cpu_stat structure

Each field of the cpu_stat structure represents the amount of time this CPU has spent in each state from an arbitrarily defined epoch. Time increases monotonically at the rate defined by the tick_rate parameter of the bootinfo structure.

// CPU usage statistics.
export type cpu_stat = struct {
	// Time spent idle
	idle: u64,
	// Time spent executing user code
	user: u64,
	// Time spent executing kernel code
	kern: u64,
};

Outputs

The cpu_stat structure is filled in with the latest timing info.

Errors

ENOENT: No CPU by this ID was found
EFAULT: The address of the cpu_stat structure is not valid
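
Because the counters increase monotonically from an arbitrary epoch, a single sample says little on its own; utilization is usually derived from the difference between two samples. The sketch below shows that arithmetic on made-up values. It does not perform the syscall itself, and busy_percent is a hypothetical helper.

```hare
// Local copy of the cpu_stat structure, so the sketch stands alone.
type cpu_stat = struct {
	idle: u64,
	user: u64,
	kern: u64,
};

// Percentage of time (0-100) the CPU spent in user or kernel code between
// two samples, where prev was taken before cur.
fn busy_percent(prev: cpu_stat, cur: cpu_stat) u64 = {
	const idle = cur.idle - prev.idle;
	const busy = (cur.user - prev.user) + (cur.kern - prev.kern);
	const total = idle + busy;
	if (total == 0) {
		return 0;
	};
	return busy * 100 / total;
};

export fn main() void = {
	// Made-up samples standing in for two SYS_cpu_stat results.
	const prev = cpu_stat { idle = 1000, user = 200, kern = 100 };
	const cur = cpu_stat { idle = 1600, user = 500, kern = 200 };
	// 600 idle ticks and 400 busy ticks elapsed between the samples.
	assert(busy_percent(prev, cur) == 40);
};
```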

SYS_ident

Identifies a capability type.

Inputs

Register	Type	Purpose
a1	caddr	Capability address

Outputs

Register	Type	Purpose
r1	ctype	Capability type
