Foreword
Hermes is a capability-based microkernel. As a microkernel, Hermes is responsible for certain core operations of an operating system, including:
- Physical memory allocation
- Address space isolation
- Task scheduling
- Interprocess communication
- Interrupt handling and dispatch
- Capability rights enforcement
- Userspace I/O port mediation
- Userspace MMIO mediation
Hermes is a low-level part of a broader operating system design and requires a separate set of drivers and low-level services to provide a higher-level application platform (e.g. POSIX).
Hermes is designed to replace its predecessor, the Helios microkernel.
Important
This documentation is a work in progress. Pages indicated with “🔸” in the sidebar are stubs.
Community
The source code, bug tracker, and mailing lists for Hermes are available on SourceHut.
We meet in #ares on Libera Chat to collaborate and discuss Hermes development.
High-level design
The high-level design of Hermes is to a certain extent inspired by the design of the seL4 kernel, but with many notable differences.
Hermes schedules tasks to execute user code on available CPUs. Each task is provided with a capability space and an address space, either of which may be shared with other tasks in whole or in part, allowing for the implementation of threads, processes, or shared memory IPC.
The capabilities which are available in a task’s capability space define the scope of I/O and IPC operations available to that task, and task isolation is achieved by curating the list of capabilities available to a task. These capabilities may be based on IPC, to communicate with services running in other tasks or processes, or may offer other rights, such as access to IRQs and memory-mapped I/O, to facilitate the implementation of drivers in userspace.
Getting Hermes
The source code for Hermes is available on SourceHut under the GNU GPLv3 license. Information about dependencies and compiling the kernel is provided in the repository.
Booting Hermes
Bootloaders are provided for each supported target in the kernel repository. See System initialization for information about giving the kernel some work to do once booted.
x86_64 EFI
An EFI bootloader is available in ./boot/efi and will be compiled if
ENABLE_EFI=1 in your config.mk file. The built EFI bootloader is written to
./boot/efi/bootx64.efi and should be installed at /EFI/boot/bootx64.efi on
the EFI System Partition of your boot media.
The EFI bootloader loads the kernel from /hermes on the boot media and loads
boot modules from /modules in (ASCII) alphabetical order. The first boot
module is used as init by the kernel.
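Only the first module becomes init, so it helps to be able to predict how the EFI loader will order the files in /modules. The C sketch below assumes that "(ASCII) alphabetical order" means a plain byte-wise comparison, which is what `strcmp` performs. The helpers `cmp_names` and `init_module` are hypothetical and are not part of the bootloader.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. strcmp compares bytes as
 * unsigned char, which yields ASCII order for ASCII file names. */
static int cmp_names(const void *a, const void *b) {
	const char *const *x = a;
	const char *const *y = b;
	return strcmp(*x, *y);
}

/* Sorts module file names into load order in place and returns the name
 * that would be used as init (the first one), or NULL if there are none. */
static const char *init_module(const char **names, size_t n) {
	assert(names != NULL || n == 0);
	if (n == 0)
		return NULL;
	qsort(names, n, sizeof names[0], cmp_names);
	return names[0];
}
```

Under that assumption, a module named 00-serial sorts ahead of init. Zfs also sorts ahead of init, because uppercase letters precede lowercase ones in ASCII.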
x86_64 multiboot
A multiboot-compatible legacy BIOS bootloader is provided in ./boot/multiboot
and will be compiled if ENABLE_LEGACY=1 in your config.mk file. The
bootloader is written to ./boot/multiboot/sysboot.mb and should be loaded by a
multiboot-compatible bootloader such as syslinux or grub.
Multiboot modules are loaded as boot modules and passed to the kernel in the order defined by the multiboot environment. The first boot module is used as init by the kernel.
System design
Hermes is designed primarily around the use of capabilities. A capability is an unforgeable token that offers its bearer various operations associated with a kernel object, such as a page of memory or an address space.
Capabilities represent rights for various object types, including resources managed by the kernel or IPC objects used to communicate with services and other processes. Capabilities supported by the kernel are enumerated and documented in the Capability API.
The following verbs are associated with capabilities:
- Send: A send operation sends a message to a capability.
- Recv: A receive operation receives a message from a capability.
- Call: A call operation sends a message to a capability and then blocks until a reply is received.
- Reply: Replying to a call will unblock the sender and deliver the outcome of an operation to it.
- Invoke: Calling, receiving from, or replying to a capability are all ways to invoke it.
- Transfer: IPC interactions via endpoints may cause capabilities to be transferred, copying or moving them from one task to another.
A capability resides in a capability slot, or “cslot”. Most capabilities reside in a capability space, or “CSpace”, which provides addressable storage for capability slots.
Memory management
Memory capabilities provide their bearer with access to general-purpose memory, from which various kernel objects may be allocated, such as address spaces and page tables, tasks, IPC objects, and so on.
On startup, after loading the kernel and the user init program, all remaining general-purpose memory is enumerated and provided to the user init program in the form of these Memory capabilities. The user init program may allocate resources from its Memory capabilities, subdivide them into further Memory capabilities, or transfer them to other processes.
Allocating objects
Objects are allocated via Memory::ALLOC. Some objects have a fixed size,
such as Tasks, while others accept a size parameter that governs the size of the
resulting object. For example, a CSpace has a radix parameter that determines
the number of available capability slots.
Reference counts
Objects allocated from Memory capabilities have reference counts. Copying or destroying the capability associated with an object generally updates its reference count. The maximum number of references to any kernel object is 256; further attempts to copy capabilities that reference such an object result in ERANGE errors.
Overhead
Each memory area stores a bitmap of allocations and a list of reference counts, as well as some metadata. The overhead is subtracted from the total amount of available memory in a Memory capability.
Contiguous allocations
Over time, a Memory capability can become fragmented, so allocations are not guaranteed to occupy contiguous ranges of physical memory.
If contiguous allocations are required, for example to establish a large area for DMA, it is recommended to allocate a new Memory capability and then allocate Pages from that capability. Allocations on a new Memory capability are guaranteed to be contiguous until the first object is freed, after which point the allocations may be non-contiguous.
Object sizes
The amount of memory required for a kernel object is either a fixed size, or is a function of its initialization parameters. The semantics for memory allocation are described on each allocatable capability’s page in the documentation.
By studying these semantics, userspace programs can calculate the amount of
memory available in a Memory capability before and after any allocation. This
information may also be queried at runtime via Memory::GET_FREE_PAGES.
Capability spaces
Capability spaces (CSpaces) are kernel resources that hold an addressable table of capability slots, or “cslots”. The capabilities stored in these slots can be used by any task that can address the enclosing CSpace.
Tasks refer to a capability stored in their cspace via a capability address, or “caddr”, and pass these addresses in as arguments to system calls to perform operations against the capabilities at the corresponding address.
A capability slot is 64 bytes (2^6 bytes), and a capability space stores 2^R capability slots, where the radix, R, is configurable at the time the CSpace is allocated. A CSpace with a radix of 8, for example, stores 256 capability slots, which requires 16 KiB of memory. The last slot of any CSpace is reserved for the kernel to store runtime state concerning the CSpace.
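The sizing rules above are easy to turn into arithmetic. This C sketch encodes only the stated facts: 64-byte slots, 2^R slots per CSpace, and a last slot reserved for the kernel. The helper names are hypothetical. The radix is capped at 32 on the assumption that a CSpace larger than the 32-bit capability address space would be of no use.

```c
#include <assert.h>
#include <stdint.h>

#define CSLOT_SIZE 64u /* bytes per capability slot (2^6) */

/* Number of capability slots in a CSpace of the given radix: 2^R. */
static uint64_t cspace_slots(unsigned radix) {
	assert(radix <= 32); /* assumed bound: caddrs are 32 bits wide */
	return UINT64_C(1) << radix;
}

/* Bytes of memory needed to back a CSpace of the given radix. */
static uint64_t cspace_bytes(unsigned radix) {
	return cspace_slots(radix) * CSLOT_SIZE;
}

/* Slots available to userspace: the last slot is reserved for the kernel. */
static uint64_t cspace_usable_slots(unsigned radix) {
	return cspace_slots(radix) - 1;
}
```

For the init task's radix-16 CSpace, this gives 65,536 slots (65,535 usable) backed by 4 MiB of memory.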
Capability slot operations
A capability space provides a number of functions which operate on the capability slots in the CSpace. These include:
- Destroy: Destroys a capability and replaces the contents of its slot with a Null capability. The reference count of the resource in question is decremented, and if there are no further references, the resource is reclaimed (e.g. by returning its memory to a Memory capability). Any capability can be destroyed, but the last reference to a resource that is still in use cannot always be destroyed; for example, one must unmap all pages in an address space before destroying the last reference to its VSpace capability.
- Copy: Copies a capability from one slot to another, incrementing the reference count. Some capabilities can be copied as often as one wishes, but most support only up to 256 copies before ERANGE is returned. Some capability types cannot be copied (namely Reply capabilities).
- Move: Moves a capability from one capability slot to another without updating the reference count, leaving a Null capability in the source slot. All capabilities can be moved.
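The following C sketch models the three operations on a toy array of slots. It applies the rules from Reference counts: at most 256 references per object, and ERANGE on any further copy. The types and function names are hypothetical, and a real cslot is a 64-byte kernel structure. The sketch does not model Reply capabilities (which cannot be copied) or objects that are still in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 256    /* maximum references to one kernel object */
#define HERMES_ERANGE 5 /* ERANGE, from the error code table */

/* Hypothetical reference-counted kernel object. */
struct object {
	unsigned refs;
	bool reclaimed; /* set once the last reference is destroyed */
};

/* Hypothetical slot: obj == NULL stands for a Null capability. */
struct cslot {
	struct object *obj;
};

/* Copy: increments the reference count, or fails with ERANGE at the limit. */
static int slot_copy(struct cslot *dst, const struct cslot *src) {
	assert(dst->obj == NULL && src->obj != NULL);
	if (src->obj->refs >= MAX_REFS)
		return HERMES_ERANGE;
	src->obj->refs++;
	dst->obj = src->obj;
	return 0;
}

/* Move: transfers the capability without touching the reference count. */
static void slot_move(struct cslot *dst, struct cslot *src) {
	assert(dst->obj == NULL);
	dst->obj = src->obj;
	src->obj = NULL; /* the source slot now holds a Null capability */
}

/* Destroy: empties the slot and reclaims the object on its last reference. */
static void slot_destroy(struct cslot *slot) {
	struct object *obj = slot->obj;
	slot->obj = NULL;
	if (obj != NULL && --obj->refs == 0)
		obj->reclaimed = true; /* e.g. memory returned to its Memory capability */
}
```

Move never touches the reference count. In this model, that is why it cannot fail with ERANGE the way Copy can.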
Capability addressing
A CSpace may contain capabilities for other CSpaces among its cslots. Through the capability addressing system, one may address capabilities stored in nested CSpaces. These addresses are resolved through guarded page tables, in a manner similar to seL4.
A CSpace capability contains both the radix of the cspace, denoting how many capability slots it contains, as well as a guard value with a variable number of bits. Resolving a capability address through possibly nested CSpaces takes their guards into account, so that a CSpace may be constructed to resolve addresses in any manner the user wishes.
When looking up a capability from its address, the kernel begins with the root CSpace associated with the caller’s task. It checks the most significant unprocessed bits of the address against the guard bits of the CSpace capability and, if they match, shifts them away and uses the next R bits to identify a capability slot. If any unprocessed bits remain in the address and the indicated slot stores a CSpace capability, the process continues with that CSpace.
Consider the following set of CSpaces.
CSpace A holds a capability that references CSpace B, and CSpace B holds a
reference to CSpace C. Each CSpace has a different radix and guard. For a task
whose root CSpace is CSpace A to invoke the Page capability in CSpace C, the
32-bit capability address 0x5DE1F0CA is used.
The first four bits are matched against the 4-bit guard of CSpace A, 0b0101
(0x5), then the next 8 bits, 0xDE, are used to identify a slot among the
2^8 (256) slots in CSpace A. 20 bits remain, 0x1F0CA, and so resolution
continues from CSpace B. CSpace B has a radix of 4 and no guard bits, so the
next 4 most-significant bits, 0x1, are used to identify a slot. 16 bits remain,
so the resolver continues to CSpace C and matches its guard bits, 0b11110000
(0xF0). The final 8 bits identify one of the 2^8 (256) slots in CSpace C,
namely slot 0xCA, where the Page capability resides.
By arranging capability spaces with the necessary radii and guard bits, one may construct any desired 32-bit capability address space.
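The resolution procedure can be sketched in C as follows. Several parts are assumptions for illustration, not the kernel's implementation: the struct layout, the names `resolve`, `take_bits`, and `example_root`, and the choice to fail when leftover address bits land on a slot that is not a CSpace. `example_root` rebuilds CSpaces A, B, and C from the example above. The guard and radix are stored in the capability that references each CSpace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CADDR_BITS 32 /* capability addresses are 32 bits wide */

/* Type IDs from the Capability types table. */
enum { CT_NULL = 0, CT_CSPACE = 2, CT_PAGE = 5 };

/* Hypothetical, simplified capability; real cslots are 64 bytes. */
struct cap {
	uint8_t type;
	/* The fields below are only meaningful when type == CT_CSPACE. */
	unsigned radix;      /* the referenced CSpace holds 2^radix slots */
	unsigned guard_bits; /* length of the guard */
	uint32_t guard;      /* expected value of the guard bits */
	struct cap *slots;   /* the slots of the referenced CSpace */
};

/* Returns the n most significant bits among the low `remaining` bits of caddr. */
static uint32_t take_bits(uint32_t caddr, unsigned remaining, unsigned n) {
	assert(n <= remaining && remaining <= CADDR_BITS);
	uint64_t mask = (UINT64_C(1) << n) - 1;
	return (uint32_t)(((uint64_t)caddr >> (remaining - n)) & mask);
}

/* Resolves caddr from the task's root CSpace capability. Returns the
 * addressed slot, or NULL where the kernel would report EINVALCADDR. */
static struct cap *resolve(struct cap *root, uint32_t caddr) {
	struct cap *cs = root;
	unsigned remaining = CADDR_BITS;
	for (;;) {
		unsigned need = cs->guard_bits + cs->radix;
		/* Assumption: leftover bits that do not lead to a CSpace are an error. */
		if (cs->type != CT_CSPACE || need == 0 || need > remaining)
			return NULL;
		if (take_bits(caddr, remaining, cs->guard_bits) != cs->guard)
			return NULL; /* guard mismatch */
		remaining -= cs->guard_bits;
		struct cap *slot = &cs->slots[take_bits(caddr, remaining, cs->radix)];
		remaining -= cs->radix;
		if (remaining == 0)
			return slot; /* all 32 bits consumed */
		cs = slot;
	}
}

static struct cap slots_a[1u << 8], slots_b[1u << 4], slots_c[1u << 8];

/* Builds the example: A (radix 8, guard 0x5/4 bits) -> B (radix 4, no guard)
 * -> C (radix 8, guard 0xF0/8 bits), with a Page capability in C's slot 0xCA. */
static struct cap *example_root(void) {
	static struct cap root;
	slots_c[0xCA] = (struct cap){ .type = CT_PAGE };
	slots_b[0x1] = (struct cap){ .type = CT_CSPACE, .radix = 8,
		.guard_bits = 8, .guard = 0xF0, .slots = slots_c };
	slots_a[0xDE] = (struct cap){ .type = CT_CSPACE, .radix = 4,
		.guard_bits = 0, .guard = 0, .slots = slots_b };
	root = (struct cap){ .type = CT_CSPACE, .radix = 8,
		.guard_bits = 4, .guard = 0x5, .slots = slots_a };
	return &root;
}
```

With this setup, `resolve(example_root(), 0x5DE1F0CA)` returns the slot holding the Page capability. Changing the leading guard nibble to 0x6, or C's guard byte to 0xF1, makes the lookup fail.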
Address spaces
Tasks
Interprocess Communication (IPC)
Fault handling
Memory-mapped I/O and I/O ports
Interrupt processing
System initialization
When Hermes is booted, the kernel initializes the system and its various subsystems and, as part of this process, loads and executes a user-provided program image as the first task, referred to as the “init” task, or simply “init”.
User image loading
The user image is an ELF executable file built for the system architecture. The image is loaded using the smallest pages supported on the target architecture (usually 4KiB) and each program header must be located at a page boundary.
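Before packaging an init image, it can be checked against this requirement. The C sketch below uses the standard `<elf.h>` definitions. It assumes that "each program header must be located at a page boundary" means the `p_vaddr` of every `PT_LOAD` segment is a multiple of 4 KiB. A loader might impose stricter rules, for example on `p_offset`. The helper names are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

#define USER_PAGE_SIZE 4096u /* smallest page size on x86_64 */

/* Returns 1 if every PT_LOAD segment begins on a page boundary, else 0.
 * Assumption: "located at a page boundary" refers to p_vaddr. */
static int segments_page_aligned(const Elf64_Phdr *phdrs, size_t n) {
	assert(phdrs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (phdrs[i].p_type == PT_LOAD && phdrs[i].p_vaddr % USER_PAGE_SIZE != 0)
			return 0;
	}
	return 1;
}

/* Number of pages needed to map a segment's in-memory size. */
static uint64_t segment_pages(const Elf64_Phdr *ph) {
	return (ph->p_memsz + USER_PAGE_SIZE - 1) / USER_PAGE_SIZE;
}
```

`segment_pages` rounds up, so a segment one byte larger than a page needs two pages.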
The kernel allocates the user image pages (and any intermediate page tables necessary to map them) from the largest available Memory capability on the system. The kernel additionally allocates and maps a stack for the init process.
Warning
The kernel makes no attempt to initialize thread-local storage for the init task.
Bootinfo
Information about the system, boot environment, and the capabilities allocated to init during system startup is available to userspace through the bootinfo structure.
Initial task environment
The init task is configured with a CSpace which is populated with capabilities that enumerate all of the resources available on the system.
The initial task affinity is set to the first CPU in the bootinfo’s cpu_info list.
x86_64-specific details
Registers are initialized as follows:
| Register | Initial value | Purpose |
|---|---|---|
| %rip | ELF entry point | Entry point |
| %rsp | 0x7fff80000000 | Initial stack (grows down) |
| %rdi | 0x7fff80000000 | Bootinfo address (grows up) |
Initial CSpace
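The init task receives a CSpace allocated with a fixed radix of 16, which
provides 65,536 capability slots (the last of which is reserved for the kernel).
The initial CSpace has 16 guard bits set to 0x0000. Together with the 16-bit
radix, these consume all 32 bits of a capability address, allowing the init
task to address capabilities using their index in the initial CSpace.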
The init task receives the following capabilities at fixed capability addresses:
| caddr | Type | Notes |
|---|---|---|
| 0x0000 | CSPACE | Init task CSpace |
| 0x0001 | VSPACE | Init task VSpace |
| 0x0002 | TASK | Init task |
| 0x0008-0x000F | Varies | Arch-specific capabilities |
| 0x0010+ | Varies | Dynamically allocated capabilities |
The dynamically allocated capabilities include a variable number of capabilities for Memory areas installed on the system, page and page table capabilities for the mapped user image, stack, and bootinfo, and so on. The address assignments are noted in the bootinfo structure.
Any unused capability slots contain Null capabilities.
x86_64-specific initial capabilities
The init task on x86_64 receives the following additional capabilities:
| caddr | Type | Notes |
|---|---|---|
| 0x0008 | IOCONTROL | Global I/O control |
| 0x0009 | IRQCONTROL | Global IRQ control |
Note
All unallocated capability addresses below address 0x0010 are reserved for future use and should not be used by the init task.
Bootinfo
The bootinfo structure is stored at a fixed address, which is passed to the init task via the first parameter using the system ABI. This structure describes the boot environment, resources available to the user, and the allocation of capabilities required to bootstrap the init task, as well as other architecture-specific information.
```hare
// Information passed to the init task about the boot environment.
export type bootinfo = struct {
	argv: str,
	// Capability ranges
	// Page capabilities for the bootinfo structure
	bootinfo: cap_range,
	// Memory capabilities
	memory: cap_range,
	// DeviceMemory capabilities
	devmem: cap_range,
	// Page capabilities for the user image
	userimage: cap_range,
	// Page capabilities for the user stack
	stack: cap_range,
	// Unallocated (Null) capabilities
	unused: cap_range,
	// Description of the Memory capabilities in the memory cap_range, in order
	memory_info: []memory_desc,
	// Description of the DeviceMemory capabilities in the devmem cap_range, in
	// order
	devmem_info: []memory_desc,
	// Information about the installed CPUs
	cpu_info: []cpu_desc,
	// Arch-specific details
	arch_bootinfo,
};

// Indicates a range of capabilities.
export type cap_range = struct {
	// Inclusive
	start: u32,
	// Exclusive
	end: u32,
};

// A description of a region of memory.
export type memory_desc = struct {
	phys: uintptr,
	pages: uint,
};
```
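A cap_range is half-open, and memory_info lists its entries in the same order as the memory range. So memory_info[i] describes the capability at caddr memory.start + i. The C sketch below uses that correspondence to total the available memory and to find the largest Memory capability. The C structs mirror the Hare types for illustration only and are not ABI-exact. The 4 KiB unit for `pages` is an assumption, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u /* assumed unit of memory_desc.pages */

/* Illustrative C mirrors of cap_range and memory_desc (not ABI-exact). */
struct cap_range { uint32_t start, end; }; /* start inclusive, end exclusive */
struct memory_desc { uintptr_t phys; unsigned int pages; };

static uint32_t cap_range_len(struct cap_range r) {
	return r.end > r.start ? r.end - r.start : 0;
}

static int cap_range_contains(struct cap_range r, uint32_t caddr) {
	return caddr >= r.start && caddr < r.end;
}

/* Total bytes described by memory_info (one entry per Memory capability). */
static uint64_t total_memory_bytes(const struct memory_desc *info, size_t n) {
	uint64_t total = 0;
	for (size_t i = 0; i < n; i++)
		total += (uint64_t)info[i].pages * PAGE_SIZE_BYTES;
	return total;
}

/* caddr of the Memory capability with the most pages, where info[i]
 * describes caddr r.start + i. Returns r.end if the range is empty. */
static uint32_t largest_memory_cap(struct cap_range r, const struct memory_desc *info) {
	assert(info != NULL || cap_range_len(r) == 0);
	uint32_t best = r.end;
	unsigned int best_pages = 0;
	for (uint32_t i = 0; i < cap_range_len(r); i++) {
		if (best == r.end || info[i].pages > best_pages) {
			best = r.start + i;
			best_pages = info[i].pages;
		}
	}
	return best;
}
```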
x86_64 bootinfo
```hare
// x86_64-specific boot information
export type arch_bootinfo = struct {
	// Page table capabilities used to load the user image
	pdpt: cap_range,
	pd: cap_range,
	pt: cap_range,
	// TSC rate in Hz
	tsc_rate: u64,
	// Framebuffer provisioned by bootloader, if applicable
	fb: bootfb,
};

// Details about a CPU core installed on the system.
export type cpu_desc = struct {
	id: u32,
};

// Framebuffer provided by bootloader.
export type bootfb = struct {
	// fb_base is set to zero if no framebuffer was prepared by the bootloader
	fb_base: uintptr,
	fb_size: size,
	fmt: pixel_format,
	width: u32,
	height: u32,
	stride: u32,
};

// Framebuffer pixel format.
export type pixel_format = enum uint {
	RGBX8,
	BGRX8,
};
```
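The framebuffer description is enough to locate individual pixels. This C sketch checks for a missing framebuffer (fb_base of zero), computes the byte offset of a pixel, and packs a color for either pixel format. This documentation does not specify several details, so they are assumptions here. First, stride is taken to count pixels per scanline rather than bytes. Second, each pixel is 4 bytes. Third, RGBX8 is taken to store red in the first byte and BGRX8 to store blue first. The C struct and helper names are illustrative, and actually writing to fb_base would additionally require the memory to be mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative C mirrors of the Hare bootfb and pixel_format types. */
enum pixel_format { RGBX8, BGRX8 };

struct bootfb {
	uintptr_t fb_base; /* zero if the bootloader prepared no framebuffer */
	size_t fb_size;
	enum pixel_format fmt;
	uint32_t width, height, stride;
};

#define BYTES_PER_PIXEL 4 /* assumption: 8 bits each for R, G, B, and X */

static int fb_present(const struct bootfb *fb) {
	return fb->fb_base != 0;
}

/* Byte offset of pixel (x, y) from fb_base.
 * Assumption: stride counts pixels per scanline, not bytes. */
static size_t fb_pixel_offset(const struct bootfb *fb, uint32_t x, uint32_t y) {
	assert(x < fb->width && y < fb->height);
	return ((size_t)y * fb->stride + x) * BYTES_PER_PIXEL;
}

/* Stores a color in memory byte order. Assumption: RGBX8 puts red in the
 * first byte and BGRX8 puts blue first; the X byte is left as zero. */
static void fb_pack(enum pixel_format fmt, uint8_t r, uint8_t g, uint8_t b,
                    uint8_t out[BYTES_PER_PIXEL]) {
	out[0] = fmt == RGBX8 ? r : b;
	out[1] = g;
	out[2] = fmt == RGBX8 ? b : r;
	out[3] = 0;
}
```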
System calls
Access to kernel resources and IPC primitives is accomplished primarily through system calls (syscalls). There are a small number of syscalls that are used primarily to interact with capabilities. Note that the syscall API differs from the capability API, which is largely built on top of SYS_call.
Common ABI types
Type declarations for Hare code are provided by the uapi module in the kernel source tree.
| Type | Description |
|---|---|
| u8 | 8-bit unsigned integer |
| u16 | 16-bit unsigned integer |
| u32 | 32-bit unsigned integer |
| u64 | 64-bit unsigned integer |
| uaddr | 64-bit memory address |
| caddr | 32-bit capability address |
| ctype | 8-bit capability type |
Capability types
| ID | Type | Description |
|---|---|---|
| 0 | NULL | Empty capability slot |
| 1 | MEMORY | General-purpose memory |
| 2 | CSPACE | Capability space |
| 3 | VSPACE | Virtual address space |
| 4 | TASK | Schedulable task |
| 5 | PAGE | Page of memory |
| 6 | NOTIFICATION | IPC notification |
| 7 | ENDPOINT | IPC endpoint |
| 8 | REPLY | IPC reply |
x86_64 specific
| ID | Type | Description |
|---|---|---|
| 9 | PDPT | Page-directory pointer table |
| 10 | PD | Page directory |
| 11 | PT | Page table |
| 12 | IOCONTROL | I/O control |
| 13 | IOPORT | I/O port |
| 14 | IRQCONTROL | IRQ control |
| 15 | IRQ | IRQ |
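When debugging, it is convenient to turn a ctype value, such as the one SYS_ident returns, into its name. This C lookup table transcribes the two tables above. IDs 9 to 15 are the x86_64 types and may differ on other targets. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Names indexed by capability type ID; IDs 9-15 are the x86_64 types. */
static const char *const ctype_names[] = {
	"NULL", "MEMORY", "CSPACE", "VSPACE", "TASK", "PAGE",
	"NOTIFICATION", "ENDPOINT", "REPLY",
	"PDPT", "PD", "PT", "IOCONTROL", "IOPORT", "IRQCONTROL", "IRQ",
};
#define NCTYPES (sizeof ctype_names / sizeof ctype_names[0])
static_assert(NCTYPES == 16, "one name per ID 0-15");

/* Name for a ctype value (e.g. the r1 output of SYS_ident), or NULL. */
static const char *ctype_name(uint8_t type) {
	return type < NCTYPES ? ctype_names[type] : NULL;
}

/* Reverse lookup: type ID for a name, or -1 if the name is unknown. */
static int ctype_from_name(const char *name) {
	for (size_t i = 0; i < NCTYPES; i++)
		if (strcmp(ctype_names[i], name) == 0)
			return (int)i;
	return -1;
}
```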
x86_64 ABI
The x86_64 syscall ABI is based on the System-V ABI. All arguments are passed in registers. Note that floating point registers are not used by the ABI (and are preserved by the kernel).
Input registers
| Register | Purpose |
|---|---|
| %rax | Syscall number (8 bits) & flags |
| %rdi | (a1) 1st argument register |
| %rsi | (a2) 2nd argument register |
| %rdx | (a3) 3rd argument register |
| %r10 | (a4) 4th argument register¹ |
| %r8 | (a5) 5th argument register |
| %r9 | (a6) 6th argument register |
Output registers
| Register | Purpose |
|---|---|
| %rax | Syscall outcome (8 bits) + 56 syscall-specific bits |
| %rsi | (r1) 1st return register |
| %rdx | (r2) 2nd return register |
The syscall outcome is 0 on success or an error code on failure.
Other registers
| Register | Purpose |
|---|---|
| %r12-%r15 | Kernel-saved |
| %rbp | Kernel-saved |
| %rbx | Kernel-saved |
| %fs, %gs | Kernel-saved |
Note
Certain syscalls, notably SYS_call and SYS_recv, differ from the standard register allocation.
Error codes
The following error codes are defined:
| Code | Name | Description |
|---|---|---|
| 0 | n/a | Indicates a successful outcome. |
| 1 | EWRONGTYPE | An incorrect resource type was used in an operation. |
| 2 | ENOMEM | Insufficient memory available for requested operation. |
| 3 | EINVALID | An invalid parameter was supplied for an operation. |
| 4 | EINVALCADDR | An invalid capability address was used. |
| 5 | ERANGE | A parameter exceeds the permissible range. |
| 6 | EEXIST | A resource already exists at the given address. |
| 7 | ENOENT | A required resource does not exist. |
| 8 | EBUSY | A required resource is currently in use. |
| 9 | ENOTSUP | The requested operation is not supported. |
| 10 | ENOSYS | Invalid syscall or function. |
| 11 | EFAULT | Use of invalid address. |
| 12 | EL1PT | VSpace mapping is missing a required level 1 page table. |
| 13 | EL2PT | VSpace mapping is missing a required level 2 page table. |
| 14 | EL3PT | VSpace mapping is missing a required level 3 page table. |
| 15 | EL4PT | VSpace mapping is missing a required level 4 page table. |
| 16 | EDESTROYED | A resource was destroyed during the operation. |
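Userspace wrappers generally need to split %rax into the outcome and the syscall-specific bits, then report the outcome by name. This C sketch does both and also maps EL1PT through EL4PT to the page table level that is missing. The name table is transcribed from the error code table above. The bit layout is an assumption: the 8-bit outcome is taken to occupy the low byte of %rax, with the 56 syscall-specific bits above it. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Names indexed by error code, from the table above (0 is success). */
static const char *const error_names[] = {
	"success", "EWRONGTYPE", "ENOMEM", "EINVALID", "EINVALCADDR",
	"ERANGE", "EEXIST", "ENOENT", "EBUSY", "ENOTSUP", "ENOSYS",
	"EFAULT", "EL1PT", "EL2PT", "EL3PT", "EL4PT", "EDESTROYED",
};
#define NERRORS (sizeof error_names / sizeof error_names[0])
static_assert(NERRORS == 17, "one name per code 0-16");

/* Assumption: the 8-bit outcome occupies the low byte of %rax and the
 * 56 syscall-specific bits are the upper bits. */
static uint8_t rax_outcome(uint64_t rax) { return (uint8_t)(rax & 0xFF); }
static uint64_t rax_extra(uint64_t rax) { return rax >> 8; }

/* Name of an error code, or NULL if the code is not in the table. */
static const char *error_name(uint8_t code) {
	return code < NERRORS ? error_names[code] : NULL;
}

/* For EL1PT..EL4PT (12-15), returns the missing page table level (1-4), else 0. */
static int missing_pt_level(uint8_t code) {
	return code >= 12 && code <= 15 ? code - 11 : 0;
}
```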
¹ Note that the System-V ABI assigns the 4th argument to %rcx.
SYS_schedule
Yields the remainder of the current task’s time slice to the scheduler.
Inputs
None.
Outputs
None.
SYS_cpu_stat
Returns timing statistics about a given CPU.
Inputs
| Register | Type | Purpose |
|---|---|---|
| a1 | u32 | Target CPU ID |
| a2 | uaddr | cpu_stat structure |
The ID of each CPU is noted in the bootinfo structure’s cpu_info list.
cpu_stat structure
Each field of the cpu_stat structure represents the amount of time this CPU has spent in each state from an arbitrarily defined epoch. Time increases monotonically at the rate defined by the tick_rate parameter of the bootinfo structure.
```hare
// CPU usage statistics.
export type cpu_stat = struct {
	idle: u64,
	user: u64,
	kern: u64,
};
```
Outputs
The cpu_stat structure is filled in with the latest timing info.
Errors
- ENOENT: No CPU by this ID was found
- EFAULT: The address of the cpu_stat structure is not valid
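The counters only increase from an arbitrary epoch, so they become meaningful when two samples of the same CPU are compared. The C sketch below computes the fraction of time the CPU was busy (user plus kernel) between two samples, and converts ticks to seconds for a given tick rate. The C struct mirrors the Hare type for illustration. The function names are hypothetical, and the rate is passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative C mirror of the Hare cpu_stat structure. */
struct cpu_stat { uint64_t idle, user, kern; };

/* Fraction of time the CPU was busy (user + kernel) between two samples
 * of the same CPU; returns 0.0 if no time elapsed. */
static double cpu_busy_fraction(struct cpu_stat before, struct cpu_stat after) {
	assert(after.idle >= before.idle && after.user >= before.user &&
	       after.kern >= before.kern); /* counters are monotonic */
	uint64_t idle = after.idle - before.idle;
	uint64_t busy = (after.user - before.user) + (after.kern - before.kern);
	uint64_t total = idle + busy;
	return total == 0 ? 0.0 : (double)busy / (double)total;
}

/* Converts a tick count to seconds, given the tick rate in Hz. */
static double ticks_to_seconds(uint64_t ticks, uint64_t rate_hz) {
	assert(rate_hz != 0);
	return (double)ticks / (double)rate_hz;
}
```

Sampling SYS_cpu_stat periodically and feeding consecutive results to `cpu_busy_fraction` yields a per-CPU utilization figure similar to what tools like `top` display.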
SYS_ident
Identifies a capability type.
Inputs
| Register | Type | Purpose |
|---|---|---|
| a1 | caddr | Capability address |
Outputs
| Register | Type | Purpose |
|---|---|---|
| r1 | ctype | Capability type |
Errors
- EINVALCADDR: Capability address is invalid