PB173/B Kernel Development
P. Ročkai
October 15, 2020

Organisation
• you write a tiny operating system kernel
• use this document as a ‘todo list’ and a springboard
• use OSDev wiki, architecture manuals, specs, …
• use the chat (lounge) to ask questions

Grading
• there are 6 suggested checkpoints
• some have dependencies, others don’t
• meet any 4 to pass the subject
• feel free to negotiate different goals
• set up any schedule you like

Your Own OS (in 6 easy steps)
1. Booting
2. Memory
3. libc &c.
4. System Calls
5. Userland
6. Interrupts

Resources
• the OSDev wiki
• OSKit
• the m.br. book
• open-source kernels
  ∘ Linux, *BSD
  ∘ MINIX 3
  ∘ IncludeOS
• pdclib, libc++, ...

Non-Goals
• writing a realistic kernel
• portability
• long-term maintainability
• hardware / drivers
• file systems
• POSIX

Goals
• learn stuff & have fun
• cross an item off your bucket list

Technical goals (stuff to try)
• something that boots
• memory management basics
• C++ in kernel space
• kernel-user separation

Platform
• protected mode, 32 bit x86
• some assembly required (tm)
• let’s not muck with cross toolchains
• GRUB2 as the bootloader
• qemu as the system emulator
• serial port for IO

Part 1: Booting

The Boot Sequence
• very platform-specific
• on x86, either legacy or UEFI
• all sorts of stuff elsewhere
• man-years of work

The Easy Way Out
• GRUB with multiboot2
• not actually portable either :(

Multiboot2
• lands you in protected mode
• getting to C in under 10 instructions
• module preloading
• example in study materials

Checkpoint 1: Part 1
• get a copy of GRUB and build it from source
  ∘ also grab xorriso to go with it
• read through the multiboot2 spec
• set up version control for your code
• build the example multiboot kernel
  ∘ multiboot.tgz in study materials
• ask questions

Multiboot Modules
• GRUB can load extra files for you
• it dumps them at some location in memory
• and gives you a list of the modules
• along with their load addresses / sizes

Checkpoint 1: Part 2
• print a list of multiboot modules
• load a text file as a module
• copy the text to screen
• we will use this later to load user programs

Checkpoint 1: Part 3
• write a very simple serial port driver
• https://wiki.osdev.org/Serial_ports
• you will need inb and outb
• do not use interrupt mode (for now)
• this lets us get some user input
• more details about this next week

Assembly Syntax
• immediate values get a $ prefix
• registers get a % prefix
• unprefixed numbers are addresses
• opcode source, destination
• note well that there are other conventions

The Calling Convention
• specific to C on x86
• scratch registers: eax, ecx, edx
• return value is in eax

    mov 4(%esp), %eax // arg 1
    mov 8(%esp), %edx // arg 2
    // do stuff
    ret

Sidenote: Calling C Functions

    pushl %edx
    pushl %eax
    pushl $fmt
    call printf
    addl $12, %esp // clean up arguments

Wrapping I/O Instructions
• read with inb port, register
• and write with outb register, port
• note the argument order
• note the C argument order on the stack
• maybe draw a picture

Serial Port (RS 232)
• https://wiki.osdev.org/Serial_ports
• write inb and outb in assembly
• so that they can be called from C
• defining symbols in asm: .global foo
• foo is then a standard label
• don’t forget to write a C prototype for both

Part 2: Memory

Kernels vs Memory
• physical memory
• MMU and page tables
• memory protection
• dynamic memory in kernel

MMU
• part of the CPU
• Memory Management Unit
• responsible for memory protection
• also virtual memory

Address Types
• physical: what shows up on the memory bus
  ∘ not directly accessible to (normal) software
  ∘ shows up as frame addresses in page tables
• virtual
  ∘ normal pointers in C
  ∘ user-mode software only sees this
  ∘ managed by the OS

Paging
• physical memory is split into 4K frames
• virtual memory is split into 4K pages
• i.e. page is the content, frame is a place
• pages can be moved in and out of frames

Properties of Pages
• each page is of a fixed and uniform size
• pages have permission bits (read, write, execute)
• page table decides which pages ‘exist’
• the page table can be changed by the OS
  ∘ useful for context switching

Aside: Segmentation
• different memory protection scheme
• variable-sized segments
• specific use: code, stack & data segments
• not used in modern systems
• we will not use segmentation either

Page Directory
• first level of paging metadata
• lives at a 4K-aligned physical address
• the address of the PD lives in CR3
• lists 1024 pointers to 4K page tables

Page Table
• second level of paging metadata
• also lives at 4K-aligned physical addresses
• lists 1024 physical (frame) addresses
  ∘ the page may or may not be present in the frame
  ∘ the P bit decides this
  ∘ accessing a page with a clear P bit traps (page fault)

Enabling Paging
• paging must be explicitly enabled
• you need to set up a page directory first
• and the page tables to go with it
• then load the physical address of PD into CR3
• and set the PG bit in CR0
  ∘ PG needs PE to be set too, but GRUB already did that

Identity Mapping
• portions of memory can be mapped 1:1
• those virtual addresses will be the same as physical
• this is called identity mapping
• makes your life easier, but limits your flexibility

Reserved Physical Memory
• there are areas you cannot touch
• this includes BIOS data structures
• the PCI address space
• data on this is available from multiboot

Memory Allocation
• there are two levels of allocation in kernels
• one deals with obtaining physical pages
• another deals with fine-grained memory
  ∘ it is hard to live without malloc()
  ∘ linked lists, dynamic arrays &c. &c.

Page Allocator
• the page allocator can be quite simple
• page size is uniform
• the memory chunks are fairly big
  ∘ which makes metadata small in comparison
  ∘ there aren’t that many pages to be had

Implementing malloc()
• malloc works by subdividing bigger chunks of memory
• userspace malloc() typically gets memory from mmap()
• you can use the page allocator as a backend for malloc
• alternative: fixed size memory area for kernel data
  ∘ simpler, but also less flexible

How does malloc work?
• many different approaches
• often size-bucketed storage for small allocations
  ∘ per-bucket bump allocator
  ∘ per-bucket, inline free lists
• alternative: pre-filled free lists
• passthrough of big allocations (page-sized)

Aside: Optimising malloc
• consider cache interaction
• free list used in FIFO or LIFO order?
• separate per-thread arenas/pools
• free still has to work cross-thread

Checkpoint 2: Part 1
• set up page tables
• identity-map your kernel
• make those pages supervisor-only
• write code to map/unmap user pages

Checkpoint 2: Hints
• you can implement most of page management in C
• like with inb/outb, you need asm to load cr3
  ∘ and to change bits in cr0
• identity-mapping the kernel will save you a lot of trouble
  ∘ but you can do a bootstrap with physical/virtual split
  ∘ no bonus points for doing this

Checkpoint 2: Part 2
• pick a range of addresses for kernel data
• obtain physical memory reservations at boot time
• write malloc for in-kernel use
• also write free and realloc
• if you feel adventurous, try a threadsafe implementation

Checkpoint 2 Resources
• https://wiki.osdev.org/Paging
• https://wiki.osdev.org/Setting_Up_Paging
• https://wiki.osdev.org/Page_Frame_Allocation
• https://wiki.osdev.org/Memory_Allocation
• x86 reference manual

Part 3: libc &c.

What is libc
• provides ISO C library functions
  ∘ printf, scanf, strcmp, …
  ∘ malloc, free, …
• and the POSIX syscall interface
  ∘ open, read, write

Using libc in a Kernel
• no system call interface
• reduced file abstraction
• malloc never fails?
• what about thread support?
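
A kernel libc needs some malloc underneath it. The simplest backend mentioned in Part 2 is a fixed-size memory area for kernel data, handed out by a bump allocator. The following is a minimal sketch of that idea, not a recommended design: the names kmalloc, kfree and kheap_free_bytes, the 64K area and the 8-byte alignment are all assumptions made for illustration. It also gives one answer to ‘malloc never fails?’: it does fail, by returning NULL once the area is used up, and the caller has to cope.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KHEAP_SIZE  (64 * 1024) /* hypothetical size of the kernel data area */
#define KHEAP_ALIGN 8           /* every block starts on an 8-byte boundary */

/* The fixed memory area; in a real kernel this could also be a range of
 * virtual addresses backed by pages from the page allocator. */
static _Alignas(KHEAP_ALIGN) unsigned char kheap[KHEAP_SIZE];
static size_t kheap_used = 0;   /* offset of the first unused byte */

/* Hand out the next `size` bytes of the area, or NULL if they do not fit. */
void *kmalloc(size_t size)
{
    assert(kheap_used % KHEAP_ALIGN == 0);

    if (size == 0 || size > KHEAP_SIZE - kheap_used)
        return NULL;

    /* Round the request up so that the next block stays aligned. This
     * cannot overshoot the area: both its size and the current offset
     * are multiples of KHEAP_ALIGN. */
    size_t rounded = (size + KHEAP_ALIGN - 1) & ~(size_t)(KHEAP_ALIGN - 1);

    void *block = &kheap[kheap_used];
    kheap_used += rounded;
    return block;
}

/* A bump allocator cannot reuse memory, so freeing is a no-op. */
void kfree(void *ptr)
{
    (void)ptr;
}

/* How many bytes are still available. */
size_t kheap_free_bytes(void)
{
    return KHEAP_SIZE - kheap_used;
}
```

This is enough to get printf and friends off the ground. The size-bucketed free lists from ‘How does malloc work?’ are what you would add to make kfree actually return memory.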

Support for FILE
• this includes printf and friends
• it makes sense to tie this to console
  ∘ in our case, serial port
• FILE does not need much
  ∘ only a few callbacks

Kernel Threads
• libc may contain pthread support
• this is very much user-level
• probably a bad idea to use this API in kernel
• kernels still need mutexes and the like

Porting libc
• memory allocation (malloc)
  ∘ we did this last time
• file abstraction (FILE *)
• random platform glue
  ∘ exit, atexit, sleep, ...

Porting libc++
• based mostly on libc
• and pthread support code
• also needs libc++abi
  ∘ RTTI, exceptions, ...

Thread Support
• our kernel will be single-threaded
• we still need to provide thread APIs
  ∘ libc++ needs a rudimentary one
• mutex functions can do nothing
• pthread_once (equivalent) has to work though

Dependencies Everywhere
• std::stringstream is nice to have
• but it needs a locale library
  ∘ we need to provide locale stubs for libc++
• normal streams are based on FILE *

Checkpoint 3
• take a libc of your choosing
  ∘ pdclib would be a good candidate
• make it build and run
• adapt it for kernel use
• tie stdout and stdin to the serial port
• printf away

Part 4: System Calls

What is a System Call
• calls from user code into the kernel
• works (almost) like a function call
  ∘ with a special calling convention
• switches the CPU into privileged mode

How?
• software interrupts
  ∘ synchronous
  ∘ saves CPU state
• sysenter or syscall (on x64)
• return with iret, sysexit or sysret

Software Interrupts
• user side: an int instruction
  ∘ you get to pick a number (from 32 up)
• kernel side: IDT
  ∘ interrupt descriptor table
  ∘ address stored in idtr
  ∘ load with lidt

Loading IDT (and GDT)
• lidt and lgdt expect both size and address
  ∘ this is given as a pointer to a 2-tuple
  ∘ the size is given as a limit (size in bytes minus 1)
• the address is a virtual address

IDT Structure
• another table a bit like the page directory
  ∘ or like GDT and LDT (which we don’t use)
  ∘ oops, IDT entries refer to the GDT or LDT
• see also https://wiki.osdev.org/IDT
• set the P (present) bit to 0 in all entries except the system call

IDT Entry
• contains a code reference (segment + offset)
  ∘ segment really means a GDT selector
  ∘ for an interrupt or trap gate, this selects a code segment
  ∘ a TSS selector only goes into a task gate
• and a few control bits / type info

TSS
• task state segment
• used for hardware-assisted context switching
• also needed for ring 3 → ring 0 transition
• you only need to set ss0 and esp0
  ∘ and set iopb to 104 (since we won’t use the bitmap)

User Side
• the exact sequence is up to you
• you want to send syscall number somehow
  ∘ eax is customary
• you want to send in arguments too
  ∘ probably mostly via stack

User Side in C
• you will probably want a syscall function
• implement it in assembly
• needs to cooperate with the kernel side

Checkpoint 4
• implement a system call interface
• testing will be tricky without userland
• but you can do int in kernel
  ∘ you won’t be able to check ring transitions
  ∘ all else should work like normal
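
A minimal sketch of the IDT pieces from Part 4, in C: the 8-byte gate layout, a helper that fills in one gate, and the (limit, base) pair that lidt expects. The names idt_set_gate and idt_describe are hypothetical, and so are vector 0x80 and selector 0x08 in the usage note; the slides do not prescribe any of them. Addresses are passed as plain 32-bit numbers so that the encoding can be checked outside the kernel as well.

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit protected mode gate descriptor (8 bytes). */
struct idt_entry {
    uint16_t offset_low;   /* handler address, bits 0..15 */
    uint16_t selector;     /* GDT selector of the handler's code segment */
    uint8_t  zero;         /* reserved, must be 0 */
    uint8_t  type_attr;    /* P (bit 7), DPL (bits 5..6), gate type (bits 0..3) */
    uint16_t offset_high;  /* handler address, bits 16..31 */
} __attribute__((packed));

/* The 2-tuple that lidt takes a pointer to. */
struct idt_pointer {
    uint16_t limit;        /* size of the table in bytes, minus one */
    uint32_t base;         /* address of the first entry */
} __attribute__((packed));

_Static_assert(sizeof(struct idt_entry) == 8, "gate must be 8 bytes");
_Static_assert(sizeof(struct idt_pointer) == 6, "idt pointer must be 6 bytes");

enum {
    IDT_ENTRIES    = 256,
    IDT_PRESENT    = 0x80, /* the P bit */
    IDT_GATE_INT32 = 0x0E  /* 32-bit interrupt gate */
};

/* Zero-initialised, so every entry starts out with P = 0 (not present). */
struct idt_entry idt[IDT_ENTRIES];

/* Fill in one interrupt gate. `dpl` is the lowest privilege level (highest
 * ring number) allowed to reach this gate with an int instruction. */
void idt_set_gate(unsigned vector, uint32_t handler, uint16_t selector,
                  unsigned dpl)
{
    assert(vector < IDT_ENTRIES && dpl <= 3);

    idt[vector].offset_low  = (uint16_t)(handler & 0xFFFF);
    idt[vector].selector    = selector;
    idt[vector].zero        = 0;
    idt[vector].type_attr   = (uint8_t)(IDT_PRESENT | (dpl << 5) | IDT_GATE_INT32);
    idt[vector].offset_high = (uint16_t)(handler >> 16);
}

/* Build the operand for lidt; `base` is the address of `idt` in the kernel. */
struct idt_pointer idt_describe(uint32_t base)
{
    struct idt_pointer p = { (uint16_t)(sizeof(idt) - 1), base };
    return p;
}
```

In the kernel, the system call gate could then be installed with something like idt_set_gate(0x80, (uint32_t) handler, 0x08, 3), and the result of idt_describe handed to lidt through a small assembly wrapper. The DPL of 3 is what allows ring 3 code to issue the int; with a DPL of 0, the int instruction in user mode would raise a general protection fault instead.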

Part 5: Userland

Checkpoint 5
• build a userland version of libc
• build a user program that uses printf
  ∘ turn it into a multiboot module and load at boot
• prepare memory (including stack) for the program
• execute the program in ring 3

Userland libc
• mostly the same as kernel libc
• link it statically into your program
• don’t forget the syscall mechanism
• hook up file ops into syscalls

Linking
• write a linker script to link the program
• you can use a fixed load address
  ∘ feel free to experiment with PIC/PIE
• the linker will produce an ELF binary

Multiboot Module
• you can use a separate module for each section
  ∘ you’ll probably need text and data
• you can use objcopy to extract the sections
• it’s also OK to keep & use ELF metadata instead

Loading
• GRUB will load your modules wherever
• set up page tables for userspace
• map the module data on the right virtual addresses
  ∘ either those agreed ahead of time
  ∘ or those parsed out of the ELF header

Switching to User Mode
• you will need to do an iret
  ∘ even though no interrupt happened
• set up a stack as if an interrupt just happened
• then do an iret into user mode
• see also https://is.muni.cz/go/ki6k82

A Few Hints
• user mode, stack setup and loading are independent
• you can switch into ring 3 within the kernel
• you can create another stack within the kernel too
• you can load (and execute) a program without user mode

Bonus: Cooperative Multitasking
• allow 2 (different) programs to be loaded
• add a ‘yield’ system call
• let the two tasks alternate in execution
• run them in separate address spaces

Part 6: Interrupts

Hardware Interrupts
• hardware can asynchronously signal events
• typically related to input/output
  ∘ new input available
  ∘ finished processing something
• data is moved some other way
  ∘ DMA, PIO (inb, outb)

Interrupt Enable
• the CPU can mask/unmask interrupts
• on x86, this is controlled by eflags
• instructions:
  ∘ sti enables interrupts
  ∘ cli masks (disables) interrupts
  ∘ popf can change the interrupt flag

Interrupt Service Routine (ISR)
• the bit that runs in response to an IRQ
  ∘ also called the top half
• runs on the interrupt stack
• ends with an iret
  ∘ chances are the iret lands in user mode

Re-entry
• ISRs are concurrent with the rest of the kernel
• if the ISR calls into the rest of the kernel
  ∘ the same function may already be executing
  ∘ similar to POSIX signal handlers
• mutual exclusion will not help

Prohibiting Nesting
• the easiest way is to cli
• this masks all (maskable) interrupts
• do not forget to sti before iret
• this is the easiest (not best) approach

Nested Interrupts
• an interrupt can arrive while an ISR is running
• those are nested interrupts
• in this case, more reentrancy is required
• also, the interrupt stack is finite

Fully Re-entrant ISR
• the worst case is the same ISR running nested
  ∘ only applies to the ‘top half’
  ∘ bottom halves run from a queue
• for example, this is forbidden in Linux
  ∘ but different ISRs can nest on the same CPU

IRQ: Interrupt ReQuest
• the hardware side of interrupts
• (TBD)

PIC
• Programmable Interrupt Controller
• you need to set this up to get IRQs
• IRQs are mapped to interrupts
• https://wiki.osdev.org/PIC

Checkpoint 6
• write an IRQ-driven serial port driver
• IDT principles stay the same as with syscalls
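
A sketch of what ‘IRQs are mapped to interrupts’ can look like for the legacy 8259 PIC pair: the initialisation sequence that moves IRQ 0..15 away from the CPU exception vectors, the resulting IRQ-to-vector arithmetic, and the end-of-interrupt command. The function names (pic_remap, irq_to_vector, pic_send_eoi) are hypothetical, and the port writer is passed in as a function pointer, which is an assumption made here so that the sequence can be checked outside the kernel. In the kernel you would pass your own outb wrapper; record_outb below is only a host-side stand-in.

```c
#include <assert.h>
#include <stdint.h>

enum {
    PIC1_CMD = 0x20, PIC1_DATA = 0x21,  /* master PIC ports */
    PIC2_CMD = 0xA0, PIC2_DATA = 0xA1,  /* slave PIC ports */
    ICW1_INIT_ICW4 = 0x11,              /* start init, ICW4 will follow */
    ICW4_8086      = 0x01,              /* 8086/88 mode */
    PIC_EOI        = 0x20               /* end-of-interrupt command */
};

typedef void (*outb_fn)(uint16_t port, uint8_t value);

/* Reprogram both PICs so that IRQ 0..7 arrive at master_base.. and
 * IRQ 8..15 at slave_base.., then mask every IRQ line. */
void pic_remap(outb_fn out, uint8_t master_base, uint8_t slave_base)
{
    /* the low 3 bits of the vector base are ignored by the hardware */
    assert(master_base % 8 == 0 && slave_base % 8 == 0);

    out(PIC1_CMD, ICW1_INIT_ICW4);
    out(PIC2_CMD, ICW1_INIT_ICW4);
    out(PIC1_DATA, master_base);  /* ICW2: vector base */
    out(PIC2_DATA, slave_base);
    out(PIC1_DATA, 0x04);         /* ICW3: slave sits on master's IRQ 2 */
    out(PIC2_DATA, 0x02);         /* ICW3: slave's cascade identity */
    out(PIC1_DATA, ICW4_8086);
    out(PIC2_DATA, ICW4_8086);
    out(PIC1_DATA, 0xFF);         /* mask all lines; unmask selectively later */
    out(PIC2_DATA, 0xFF);
}

/* Which interrupt vector (IDT index) a given IRQ line ends up on. */
uint8_t irq_to_vector(uint8_t master_base, uint8_t slave_base, uint8_t irq)
{
    assert(irq < 16);
    return (uint8_t)(irq < 8 ? master_base + irq : slave_base + (irq - 8));
}

/* Acknowledge an IRQ at the end of its ISR. */
void pic_send_eoi(outb_fn out, uint8_t irq)
{
    assert(irq < 16);
    if (irq >= 8)
        out(PIC2_CMD, PIC_EOI);   /* the slave needs its own EOI */
    out(PIC1_CMD, PIC_EOI);
}

/* Host-side stand-in for outb: records writes instead of doing port I/O. */
struct port_write { uint16_t port; uint8_t value; };
enum { PORT_LOG_MAX = 16 };
struct port_write port_log[PORT_LOG_MAX];
unsigned port_log_len = 0;

void record_outb(uint16_t port, uint8_t value)
{
    if (port_log_len < PORT_LOG_MAX) {
        port_log[port_log_len].port = port;
        port_log[port_log_len].value = value;
        port_log_len++;
    }
}
```

With bases 32 and 40, the first serial port (conventionally IRQ 4) shows up as interrupt 36, so that is the IDT entry the serial ISR from Checkpoint 6 would go into, and its line still has to be unmasked in the master's mask register. Real hardware may also want a short delay between the initialisation writes, which this sketch leaves out.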