Blog: Real Time Linux Scheduler

One of the most challenging and interesting classes I've taken at SDSU is Dr. Hyunjong Choi's Theory Of Real-Time Systems course. As part of this class, I modified the Linux kernel to support explicit real-time task reservations, periodic execution, and end-to-end latency monitoring. While Linux supports several real-time scheduling mechanisms, implementing textbook real-time scheduling models required working directly with core kernel structures like the task_struct, scheduler, and kernel timers. Using custom kernel system calls, I implemented Rate Monotonic scheduling and partitioned Earliest Deadline First scheduling across multiple CPU cores. This post focuses less on scheduling theory and more on what building these features revealed about how the Linux kernel represents tasks and enforces behavior across context switches.

This project uses classic real-time models, with tasks that are periodic and declare a computation budget C and a period T. Schedulability depends on whether the kernel can allocate CPU time to all tasks without violating these constraints.
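To make the budget/period model concrete, here is a minimal user-space sketch of two textbook schedulability checks for tasks whose deadlines equal their periods. It is not the kernel code from this project: the `rt_task` struct and function names are hypothetical, the EDF check is the standard single-core utilization test, and the Rate Monotonic check uses the hyperbolic bound, which is sufficient but not necessary (a task set that fails it may still be schedulable).

```c
#include <stddef.h>

/* Hypothetical task description: budget C and period T in the same time unit. */
struct rt_task {
    unsigned long budget; /* C */
    unsigned long period; /* T */
};

/* Total utilization U = sum of C_i / T_i. Returns -1.0 if any period is zero. */
double total_utilization(const struct rt_task *tasks, size_t n)
{
    double u = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].period == 0)
            return -1.0;
        u += (double)tasks[i].budget / (double)tasks[i].period;
    }
    return u;
}

/* EDF on one core with deadlines equal to periods: schedulable iff U <= 1. */
int edf_schedulable(const struct rt_task *tasks, size_t n)
{
    double u = total_utilization(tasks, n);

    return u >= 0.0 && u <= 1.0;
}

/* RM sufficient test (hyperbolic bound): product of (U_i + 1) <= 2. */
int rm_schedulable_sufficient(const struct rt_task *tasks, size_t n)
{
    double prod = 1.0;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].period == 0)
            return 0;
        prod *= 1.0 + (double)tasks[i].budget / (double)tasks[i].period;
    }
    return prod <= 2.0;
}
```

For partitioned EDF, a check like `edf_schedulable()` would be applied per core to the subset of tasks assigned to that core.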

Enforcing this model in Linux required more than just choosing a scheduling policy. It required tracking execution time across context switches, associating timing state with each task, and integrating kernel timers into the task lifecycle. These requirements pushed the project into core kernel structures and scheduler hot paths, which became the primary learning focus for me.

For this project, I modified and compiled the Raspberry Pi kernel, version 4.9.80.

CanaKit Raspberry Pi 3

The first thing I had to figure out was how to track all my new information, like the budget and period. It needed to be something that was intrinsically tied to the tasks themselves and would persist between different states and changes. This eventually led me to the monster that is the task_struct. When I started researching how it was put together, I was WOW'd by how massive it was; I didn't even know structs could be that big. If you've ever worked with anything relating to threads or processes in the Linux kernel, you know exactly what I'm talking about. It's basically unavoidable.

For those who don't know, a quick tl;dr: the task_struct is a gigantic struct with 300+ different fields (depending on the kernel version), with many of those being other structs with more fields! The task_struct is the DNA of every process in Linux and has every single piece of information you'd want to know about said process. There are a few different ways to access this magical structure; for this project I used find_task_by_pid_ns() and task_active_pid_ns() to get the struct using the task's pid and namespace.

One of the coolest parts of the project was adding custom system calls and realizing how different kernel code feels. At first I thought that writing a system call would be like writing a normal function, but in practice you have to program with extreme caution every step of the way. The kernel has to treat everything with suspicion: arguments from user space can't be trusted, and no error case can be ignored, because continuing with partially validated state risks corrupting global kernel data.
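As a rough illustration of that "trust nothing" mindset, here is a user-space sketch of the kind of checks a reservation system call might run before touching any state. The function name, parameters, and the specific rules are assumptions for illustration, not the project's actual interface; the only convention taken from the kernel is returning 0 on success and a negative errno value on failure.

```c
#include <errno.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4 /* the Raspberry Pi 3 has four cores */

/*
 * Hypothetical validation step for a set-reservation call.
 * Every argument is checked before any state would be modified.
 */
int validate_reservation(int pid, uint64_t budget_ns, uint64_t period_ns, int cpu)
{
    if (pid < 0)
        return -EINVAL;
    if (budget_ns == 0 || period_ns == 0)
        return -EINVAL;
    if (budget_ns > period_ns)  /* a task cannot need more time than its period */
        return -EINVAL;
    if (cpu < 0 || cpu >= NR_CPUS_SKETCH)
        return -EINVAL;
    return 0;
}
```

Rejecting the whole request up front means no later step ever runs against half-checked input.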

Even the mechanics of adding the system call were a headache. To add new system calls, you have to specify in unistd.h how many entries your system call table has. I lost more time than I'd like to admit toiling over this one line: #define __NR_syscalls (401), not realizing that the table had to be aligned in increments of four...
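The arithmetic behind that alignment can be sketched with a one-line macro. This assumes the requirement is simply "a multiple of four"; the macro name is made up for illustration and is not something from the kernel headers.

```c
/* Round a syscall count up to the next multiple of four (illustrative only). */
#define ROUND_UP_4(n) (((n) + 3u) & ~3u)
```

Under that assumption, a count of 401 would need to be declared as `ROUND_UP_4(401)`, which evaluates to 404.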

Defining the system call itself was also very different. You have to use the SYSCALL_DEFINE* macro, with the asterisk replaced by however many parameters you need. At one point, I needed to pass seven parameters in order to support chain metadata, only to discover that SYSCALL_DEFINE6 was the hard limit. The solution was to pack all the information into my own chain_struct and pass it as a void *. In user space, things like this would probably just feel like an inconvenience, but in the kernel it felt purposeful and made me design with more intention and with efficiency in mind.
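A minimal user-space sketch of the pack-it-into-a-struct pattern follows. The field layout of `chain_args` is invented (the post does not show the real chain_struct), and `copy_in()` is a stand-in for the kernel's `copy_from_user()`, mirroring its convention of returning the number of bytes that could not be copied.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: seven values that would not fit in six syscall arguments. */
struct chain_args {
    int32_t  pid;
    int32_t  chain_id;
    int32_t  chain_pos;
    int32_t  chain_len;
    uint64_t budget_ns;
    uint64_t period_ns;
    uint64_t deadline_ns;
};

/* Stand-in for copy_from_user(): returns the number of bytes NOT copied. */
static unsigned long copy_in(void *dst, const void *src, size_t len)
{
    if (src == NULL)
        return len;
    memcpy(dst, src, len);
    return 0;
}

/* Sketch of a syscall body that takes one opaque pointer instead of seven arguments. */
int set_chain(const void *uargs)
{
    struct chain_args args;

    /* Copy into a private struct first so the caller cannot change it mid-check. */
    if (copy_in(&args, uargs, sizeof(args)) != 0)
        return -EFAULT;

    if (args.chain_len <= 0 || args.chain_pos < 0 || args.chain_pos >= args.chain_len)
        return -EINVAL;
    if (args.budget_ns == 0 || args.budget_ns > args.period_ns)
        return -EINVAL;

    return 0; /* a real implementation would now attach the data to the task */
}
```

Copying the whole struct once and validating the private copy keeps the checks and the later use looking at the same values.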

During development, I leaned on kernel modules to make testing a little easier. I created the system calls with hooks that could be modified with loadable modules. Being able to test functionality without needing to rebuild the entire kernel saved me time and some sanity, but the process was still slow compared to user-space work. I really got a feel for how expensive minor mistakes become at this level.
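One common way to build that kind of hook is a function pointer that the built-in system call forwards to and that a module fills in when it loads. The sketch below shows the shape of the idea in plain user-space C; all names are hypothetical, and a real kernel version would also need proper synchronization (for example RCU or a lock) so a module cannot be unloaded while a call is in flight.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*reserve_hook_t)(int pid, unsigned long budget, unsigned long period);

/* NULL until a "module" registers an implementation. */
static reserve_hook_t reserve_hook;

int register_reserve_hook(reserve_hook_t fn)
{
    if (fn == NULL)
        return -EINVAL;
    if (reserve_hook != NULL)
        return -EBUSY;
    reserve_hook = fn;
    return 0;
}

void unregister_reserve_hook(void)
{
    reserve_hook = NULL;
}

/* The built-in syscall body: just forwards to whatever is currently registered. */
long sys_reserve_sketch(int pid, unsigned long budget, unsigned long period)
{
    reserve_hook_t fn = reserve_hook; /* read the pointer once */

    if (fn == NULL)
        return -ENOSYS;
    return fn(pid, budget, period);
}

/* Example implementation a test module might install. */
long demo_reserve(int pid, unsigned long budget, unsigned long period)
{
    (void)pid;
    return budget <= period ? 0 : -EINVAL;
}
```

With this layout, only the small forwarding stub lives in the kernel image, and the logic behind it can be swapped by reloading a module instead of rebuilding and rebooting.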

Most of the work for this part of the project resides in kernel/sched/core.c, yet another intimidating file. This is where the Linux scheduler's core logic is written. This file decides when a task stops running, when another one begins, and everything in between.

At the center of this is __schedule(). This function is invoked every time we need a context switch, whether because the current task was preempted, blocked, or voluntarily gave up execution. Inside __schedule(), the scheduler picks the next task to run, does some bookkeeping, and triggers the context switch.

This made for the ideal place for time tracking. When a new task is scheduled in, the kernel has a precise moment where CPU ownership changes. By recording timestamps at the point when a new task is scheduled and when said task completes, execution time can be found as the difference between these two events.

One minor detail made this a little more complicated than it seems: a task doesn't necessarily run in one continuous stretch. A task could be preempted, rescheduled, blocked by I/O, etc. Each of these events gives a separate "slice" of the execution time. Because of this, execution time can't be measured with one start and end timestamp. Instead, we have to make use of a per-task time accumulator, where every time a task is scheduled out, the elapsed time since it was last scheduled in is added. The accumulator is then reset at the beginning of the next period.
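The accumulator logic can be sketched in a few lines of user-space C. The struct and function names are hypothetical, and timestamps are passed in explicitly so the sketch stays testable; in the kernel these updates would be driven from the context-switch path and a period timer.

```c
#include <stdint.h>

/* Hypothetical per-task accounting state (would live alongside the task). */
struct rt_accounting {
    uint64_t last_in_ns; /* timestamp of the most recent switch-in */
    uint64_t used_ns;    /* execution time accumulated in the current period */
    int running;         /* nonzero while the task owns the CPU */
};

/* Called when the task is scheduled in: start a new slice. */
void account_switch_in(struct rt_accounting *a, uint64_t now_ns)
{
    a->last_in_ns = now_ns;
    a->running = 1;
}

/* Called when the task is scheduled out: add the finished slice to the total. */
void account_switch_out(struct rt_accounting *a, uint64_t now_ns)
{
    if (!a->running)
        return;
    a->used_ns += now_ns - a->last_in_ns;
    a->running = 0;
}

/* Called at the start of each period: reset the total. */
void account_new_period(struct rt_accounting *a, uint64_t now_ns)
{
    a->used_ns = 0;
    if (a->running)
        a->last_in_ns = now_ns; /* a slice spanning the boundary restarts here */
}
```

Restarting an in-progress slice at the period boundary is one possible design choice; it keeps time consumed before the boundary from being charged to the new period.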

Implementing the accumulator forced me to be precise about where each piece of information lived and when it was updated. This is one of the kernel's hottest paths, and as such the tiniest changes can propagate through the entire system.

Subtle foreshadowing

This leads me to the last night that I had to finish the project, right when I started to think I had a grasp on this whole kernel development business. I was implementing the logic for measuring end-to-end latency for chained tasks and it was finally time to test my code. I ran the cross compiler, had to go back to fix some minor mistakes, and booted up the Pi.

This time, though, I wasn't getting any response from my serial connection. I thought maybe the kernel got corrupted, so I clean built it again but got the same result. I was beginning to panic. I had compiled this kernel hundreds of times at this point and I had never seen this before. Usually if I had some kind of error in my code, the compiler would break and I could go back to fix it, but this time I wasn't getting any errors or warnings.

My next thought was that maybe the hardware had given up on me. I tried multiple different ports and cables. I even borrowed another group's Raspberry Pi kit and tried to run my kernel on all of their hardware, and nothing changed. Maybe the problem was somehow with the serial connection to my laptop? I tore the lab apart trying to find an HDMI cable, keyboard, and mouse so I could run the Pi independently, and when I booted up, I was greeted with this wonderful screen:

I would later learn this is the splash screen from the GPU when the kernel can't boot.

what.

After hours of debugging, reflashing, and second-guessing the hardware, the process started feeling less like engineering and more like faith. Recompile. Reboot. Wait. Stare. Hope. Emotionally, the night felt like I was in Las Vegas hoping that one more pull of the slot machine would finally pay out. Same ritual, different casino.

Recompiling the kernel at 4am feels statistically indistinguishable from gambling

Eventually, I found my error and I felt like the biggest idiot in the world. See, in my __schedule() (what'd I call it? One of the hottest paths in the Linux kernel?), I had implemented logic for tracking chain information in a shared chain_struct that I added a pointer for in the task_struct. But somehow, I failed to think about the fact that this code was running for every instance of every process EVERY time it ran, and I was NOT implementing chain information for every process on the machine. Without checking for the chain_struct first, I was dereferencing a null pointer like a bajillion times a second. So bad was this mistake that the kernel couldn't even boot to a point where I could see any errors. *sigh* You live and you learn.
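The missing guard is tiny, which is what makes the bug sting. Here is a user-space sketch of the pattern, with made-up struct and function names standing in for the task_struct pointer and the chain bookkeeping:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared chain state; only chained tasks point at one. */
struct chain_info {
    uint64_t switches;
};

/* Stand-in for the task: most tasks leave `chain` as NULL. */
struct task_sketch {
    int pid;
    struct chain_info *chain;
};

/*
 * Called on every context switch for every task.
 * Returns 1 if chain state was updated, 0 if the task is not part of a chain.
 */
int chain_on_switch(struct task_sketch *t)
{
    if (t == NULL || t->chain == NULL)
        return 0; /* the check that was missing: ordinary tasks have no chain */

    t->chain->switches++;
    return 1;
}
```

Because the hook runs for every task on the system, the common case is the NULL one, so the early return is also the cheap path.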

And that is no exaggeration. Slow compile times, long reboot cycles, limited feedback; so many factors play into needing to get things right the first time. There is no carelessness that will go unpunished in the kernel, and the benefit of understanding the system and planning ahead is priceless. A whole host of unique challenges are introduced as well.

Early on, every kernel change meant manually copying the configuration, mounting the target filesystem, backing up the kernel image, cross-compiling, loading modules, and unmounting everything again. It worked, but each iteration added friction, and debugging became as much about patience as correctness.

Eventually, I realized the real problem wasn't in the kernel code itself, but the minor inefficiencies at every turn. I scripted the entire process into a single command, turning minutes of repetitive monotony into one line. That small change dramatically reduced the overhead of each iteration and made it easier to focus more on writing the code and less on running the code.

I learned a (pardon my French) sh*t ton in this project, and I'd put it in the top three for impact on my development as a programmer. I had to learn to be more deliberate with my structure: thinking about cleanup paths early, asking questions like "if this fails halfway, what state am I leaving behind?"; making sure I'm maintaining everything properly by canceling timers, freeing memory, resetting fields, and so on; and writing code that future me will be able to debug, with verbose and detailed printouts added as I go rather than print statements sprinkled everywhere after something breaks (I found myself recompiling the kernel multiple times just because I needed to make my printouts more readable for myself).
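The "what state am I leaving behind?" question maps neatly onto the goto-based unwind idiom that is common in kernel C. The sketch below is user-space and hypothetical throughout: the `reservation` struct, the `timer_armed` flag standing in for a real timer, and the `fail_at` parameter (used only to simulate a failure at a given step) are all inventions for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical resources a reservation might own. */
struct reservation {
    int timer_armed; /* stand-in for a running kernel timer */
    void *chain;     /* stand-in for allocated chain metadata */
};

/*
 * Set up a reservation in two steps, undoing completed steps on failure.
 * fail_at simulates an error: 1 fails after allocation, 2 after arming the timer.
 */
int reservation_setup(struct reservation *r, int fail_at)
{
    int err;

    r->chain = malloc(64);
    if (r->chain == NULL || fail_at == 1) {
        err = -ENOMEM;
        goto err_free;
    }

    r->timer_armed = 1;
    if (fail_at == 2) {
        err = -EINVAL;
        goto err_cancel;
    }

    return 0;

err_cancel:
    r->timer_armed = 0;  /* cancel the timer before freeing what it uses */
err_free:
    free(r->chain);      /* free(NULL) is a no-op, so this is safe either way */
    r->chain = NULL;
    return err;
}
```

Each label undoes exactly the steps that completed before the failure, in reverse order, so the caller never sees a half-built reservation.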

The kernel has no safety net, and learning to work without one reshaped how I write code everywhere else. I'm much more deliberate, cautious, and better at breaking down large complex problems. These habits spread across all domains, and they're something I'll carry forward for the rest of my career.