Definitely not Windows 95: What operating systems keep things running in space?
The ESA’s recently launched Solar Orbiter will spend years in one of the most unwelcoming places in the Solar System: the Sun. During its mission, Solar Orbiter will get 10 million kilometers closer to the Sun than Mercury. And, mind you, Mercury is close enough to have sustained temperatures reaching 450°C on its Sun-facing surface.
To withstand such temperatures, Solar Orbiter is going to rely on an intricately designed heat shield. This heat shield, however, will protect the spacecraft only when it is pointed directly at the Sun—there is no sufficient protection on the sides or in the back of the probe. So, accordingly, ESA developed a real-time operating system (RTOS) for Solar Orbiter that can act under very strict requirements. The maximum allowed off-pointing from the Sun is only 6.5 degrees. Any off-pointing exceeding 2.3 degrees is acceptable only for a very brief period of time. When something goes wrong and dangerous off-pointing is detected, Solar Orbiter is going to have only 50 seconds to react.
“We’ve got extremely demanding requirements for this mission,” says Maria Hernek, head of flight software systems section at ESA. “Typically, rebooting the platform such as this takes roughly 40 seconds. Here, we’ve had 50 seconds total to find the issue, have it isolated, have the system operational again, and take recovery action.”
To reiterate: this operating system, located far away in space, needs to remotely reboot and recover in 50 seconds. Otherwise, the Solar Orbiter is getting fried.
Billiard ball OS
To deal with such unforgiving deadlines, spacecraft like Solar Orbiter are almost always run by real-time operating systems that work in an entirely different way than the ones you and I know from the average laptop. The criteria by which we judge Windows or macOS are fairly simple. They perform a computation, and if the result of this computation is correct, then a task is considered to be done correctly. Operating systems used in space add at least one more central criterium: a computation needs to be done correctly within a strictly specified deadline. When a deadline is not met, the task is considered failed and terminated. And in spaceflight, a missed deadline quite often means your spacecraft has already turned into a fireball or strayed into an incorrect orbit. There’s no point in processing such tasks any further; things must adhere to a very precise clock.
The time, as measured by the clock, is divided into singular ticks. To simplify it, space operating systems are typically designed in such a way that each task is performed within a set number of allocated ticks. It can take three ticks to upload data from sensors; four further ticks are devoted to fire up engines and so on. Each possible task is assigned a specific priority, so a higher-priority task can take precedent over the lower-priority task. And this way, a software designer knows exactly which task is going to be performed in any given scenario and how much time it is going to take to get it done.
To compare this to operating systems we all know, just watch any given speed comparison between modern smartphones. In this one made by EverythingApplePro, the iPhone XS Max and Samsung S10 Plus go head to head opening some popular apps. Before the test, both phones are restarted, and the cache is cleared in them. Samsung opens all the apps in 2 minutes 30 seconds, and the iPhone clocks in at 2 minutes 54 seconds. In the second round, all the apps are closed and opened again without restarting or clearing the RAM. Because the apps are still in RAM, Samsung finishes the opening in 46 seconds, and the iPhone does it in 42 seconds. That’s a whopping two-minute time difference between the first try and the second. But if the phones had to run the kind of real-time operating systems used for spaceflight, opening those apps would take exactly the same amount of time no matter how many times you tried it—down to a millisecond.
Beyond time, space operating systems have more tricks up their sleeves. Real-time operation is one thing, and determinism is another. If you somehow convinced Craig Federighi to take part in one of those speed comparisons, gave him full access to the iPhone about to be tested, and asked him to predict exactly how much time it would take for this iPhone to complete the test, he would likely have no idea. Sure, he’d probably say something like “fast,” or “fast enough,” or even “blazingly fast,” but nothing more specific than that. Neither iOS nor Android is a deterministic system. The number of factors that could potentially affect speed results is so huge that making such exact predictions is practically impossible. But if the phone was running a space-grade OS, an engineer with access to the system would know exactly what causes what in a given sequence and could calculate the exact time necessary for any given task. Space-grade software has to be fully predictable and perform within super specific deadlines.
Shooting at the Moon (and beyond) with VxWorks
Back in the Apollo days, operating systems were custom-built for each mission. Sure, some of the code got reused—parts of the software made for the Apollo program made their way to Skylab and the Shuttle program, for instance. But for the most part, things had to be done from scratch.
Eventually, NASA’s preferred OS solution came from WindRiver, a company based in Alameda, California. WindRiver released a fully operational commercial off-the-shelf, real-time operating system called VxWorks back in 1987. While VxWorks wasn’t the first system of this kind, it quickly became the most widely deployed of them all, meaning VxWorks soon caught the eye of NASA mission designers.
The first mission to fly VxWorks was the Clementine Moon probe, otherwise known as the Deep Space Program Science Experiment. Back in the early 1990s, Clementine marked NASA’s shift away from behemoth, Apollo-like programs. Everything was supposed to be lean, developed quickly, and on a tight budget. As such, one of the design choices made for the Clementine probe was to use VxWorks, and the system made a good enough impression to get a second date. VxWorks was the choice for the Mars Pathfinder mission.
But not everything was all rosy for this RTOS, though. A bug—the priority inversion problem—caused a lot of trouble for NASA’s ground control team. Shortly after landing, Pathfinder’s system started to reboot for no apparent reason, which delayed transmitting the collected data back to Earth. It took three weeks to find the problem and another 18 hours to fix it; the issue turned out to be buried deep down in the VxWorks mechanics.
Listing image by Lee Hutchinson (original image)