Real-time operating systems

NASA’s Orion uncrewed capsule uses the Integrity RTOS (Courtesy of Green Hills Software)

Separation kernels are becoming more important for real-time sensor processing in uncrewed systems. Nick Flaherty explains how they work

Real-time operating system (RTOS) implementations are changing as uncrewed system platforms evolve. The technology is used widely, from small operating systems for UAVs to larger and more complex implementations running critical systems in driverless cars.

Implementations in both areas are changing significantly. On a single microcontroller, the code is becoming smaller, more secure and more reliable. The larger implementations, running on microprocessors with multiple cores, are also becoming more secure, as well as scalable, modular and updateable, in order to ease the challenges of certifying the software as safe.

As the complexity of uncrewed platforms increases, larger real-time systems of various architectures are being used in subsystems such as vision systems and Lidar that have to combine the inputs of multiple sensors and handle a wide range of sensing algorithms in real time.

Different architectures are being used for the RTOS, moving from the traditional hypervisor to separation kernels and unikernels.

An RTOS provides a layer directly on a microcontroller or microprocessor, called a ‘bare metal’ implementation, that provides real-time control of the system. However, such a system can be difficult to program and is limited in the applications that can be added, such as Bluetooth or video processing.

To address this, an RTOS provides the basis for a hypervisor to manage various functions in what is called a virtual machine (VM). The hypervisor is either integrated into the RTOS or added as a layer on top to allow higher level code to run on a microprocessor. This is particularly important now with machine learning (ML) engines running on Linux or on other operating systems that support the Posix interface standard and its real-time extensions.

However, a hypervisor has to manage the resources in the processor, so it can leak data between applications or provide an entry route for hackers.

The evolution of RTOS implementations is addressing the challenges of developing and certifying millions of lines of code for uncrewed systems where different levels of system robustness and reliability are required. This creates several challenges for software architects and engineers.

As certification cost is a function of the number of lines of source code being certified, a large operating system such as Linux is impractical. For avionics systems in large UAVs, certifying software to the highest level, DO-178C DAL A, can take months if not years.

With applications sharing resources such as CPU cores, memory and I/O, the software must be able to deliver deterministic behaviour and high reliability in the face of cyber attacks, poorly written code or a failure of system functionality.

With technologies such as AI continuing to evolve, the software must also continue to add functionality to deployed systems through over-the-air updates, which have to be safe, secure and reliable.

This has led to a special type of bare metal hypervisor called a separation kernel, which looks very much like an RTOS. This is a tiny piece of carefully crafted code (as small as 15 kbytes) that uses virtualisation hooks in modern processors to define a VM and control information flows.

2D and 3D views of an RTOS versus separation kernel architecture (Courtesy of Lynx)

Separation kernels contain no device drivers, user model, shell access or dynamic memory; these ancillary tasks are all pushed up into guest software running in the VMs. This simple architecture results in a minimal implementation that is suitable for embedded real-time and safety-critical systems. A separation kernel is much more than just a minimal hypervisor implementation.

Separation kernels can be used to partition processor hardware resources into high-assurance VMs that are tamper-proof and can’t be bypassed, as well as to set up strictly controlled information flows between a VM and the peripherals so that the VM is isolated except where explicitly allowed.

Separation kernels allow modular software design architectures to be extended securely. With a separation kernel foundation, VM modules can support a wider spectrum of operating systems, from bare metal to RTOS, Linux or even Windows.

Separation kernels do not take on ancillary RTOS functions. Even if those functions are really small, as they are in an RTOS, it is not the size that matters but their complete absence that makes a separation kernel clean, secure, robust and RTOS-agnostic.

The PikeOS RTOS (Courtesy of Sysgo)

RTOS-based hypervisors and separation kernels are based on the same hardware virtualisation extensions, but there is far more code present in an RTOS hypervisor. The RTOS scheduler will be there, scheduling the tasks and the VM.

The only code common to a separation kernel and an RTOS-based hypervisor is the code that configures a VM. In a separation kernel, that job is done at boot time and, for security reasons, the code is ‘thrown away’ once each VM is configured. This is why the separation kernel VM configuration is static: the code to change it is purged after boot-up.

All this extra code in an RTOS-based hypervisor is an attack surface that represents a security risk. It is code running at privileged hypervisor level that, if it goes wrong, has the power to break the system’s security. The code is also unfeasibly large to prove correct through the formal methods that are key for certification.

This extra code is valuable, and provides useful features. In a separation kernel architecture though, it must not be in the hypervisor but instead pushed up into a guest VM.

The system is distributed in the sense that the master RTOS is eliminated and the services it provides are distributed into a number of virtual machines. Each VM is able to run a mix of just enough RTOS functions, for example a tight real-time control loop, and be certified to a high safety assurance.

Open source modules can be reused in another bare metal VM, alongside a VM hosting entire open source RTOS implementations.

Each VM benefits from hardware enforcement of the software interfaces. The separation kernel defines which memory regions are accessible to the VM, which regions are read-only and which are read/write, and which regions can be shared with another VM.

This definition is set up by the separation kernel, from the privileged hypervisor mode, in the memory management unit (MMU), the part of a microprocessor that distinguishes it from a microcontroller. It cannot be changed or bypassed by the VM.
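The memory rules described above can be modelled in a few lines. The following is only an illustrative sketch in Python, not how any particular separation kernel is implemented: the VM names, addresses and region sizes are hypothetical, and a read-only mapping stands in for the MMU set-up that cannot be changed once boot-time configuration is complete.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Region:
    """One memory window granted to a VM (all values are hypothetical)."""
    base: int
    size: int
    writable: bool

    def contains(self, address):
        return self.base <= address < self.base + self.size


def build_static_config():
    """Runs once at 'boot'; the result is a read-only view of the table."""
    config = {
        # Private read/write memory plus write access to a shared buffer.
        "flight_control": (
            Region(0x8000_0000, 0x0010_0000, writable=True),
            Region(0x9000_0000, 0x0000_1000, writable=True),
        ),
        # Private read/write memory plus a read-only view of the same buffer.
        "linux_guest": (
            Region(0x8800_0000, 0x0400_0000, writable=True),
            Region(0x9000_0000, 0x0000_1000, writable=False),
        ),
    }
    return MappingProxyType(config)


VM_CONFIG = build_static_config()


def access_allowed(vm, address, write):
    """Model of the hardware check: deny unless a region explicitly allows it."""
    for region in VM_CONFIG[vm]:
        if region.contains(address):
            return region.writable or not write
    return False


# A guest trying to extend the table after boot is refused.
try:
    VM_CONFIG["rogue_vm"] = ()
    config_is_locked = False
except TypeError:
    config_is_locked = True
```

In this model the Linux guest can read the shared buffer at 0x9000_0010 but cannot write to it, and it has no access at all to the flight control VM's private memory, because nothing in its own list of regions covers those addresses.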

Peripherals such as serial ports, network interfaces, graphics and storage devices are also allocated to a VM, and access is enforced in hardware by the separation kernel, which programs the interrupt controller and the MMU.

This allows a highly modular software system architecture. Any guest operating system running inside a VM still requires what is known as a board support package to adapt it to the VM memory map and peripherals, but the separation kernel can map the same minimal set of memory, a serial port and a virtual network interface (NIC).

An engineer using a separation kernel must allocate hardware resources to VMs statically at boot-up time, and any sharing must be specially arranged. It is common to use one CPU core per VM, and useful to have extra NICs and serial ports to get access to the VM, but that is changing in favour of multiple VMs on each core.

S-plane uses the Integrity RTOS for control of its Skeldar rotorcraft (Courtesy of Green Hills Software)

Degraded environments

One area where this separation, either via a secure RTOS or separation kernel, is key for uncrewed systems is in a degraded visual environment (DVE).

This is one of the most challenging conditions for rotorcraft, for example, particularly when part of the degradation is created by rotor wash blowing up clouds of dust, sand or snow.

DVE conditions have many causes, including naturally occurring smoke, fog, smog, clouds, sand, dust, heavy rain, blowing snow, darkness and flat light. These can occur in combination, and some of the most challenging DVEs are induced by the aircraft itself, creating a brownout or whiteout from dust, sand or snow.

The primary problem with a DVE is the loss of visual references, such as the horizon, the ground and any nearby obstacles. Situational awareness of the terrain and obstacles is required for safe operations during all phases of flight, and losing this situational awareness en route can result in a crash or a collision with human-made obstacles.

In an autonomous helicopter, for example, the loss of visual reference during take-off or landing can lead to undetected drift or bank of the helicopter, or even create a sensation of self-motion called vection. These effects significantly increase the risk of dynamic rollover and a hard landing, often resulting in the aircraft being damaged or even lost.

DVE mitigation solutions fall into a few broad categories: enhanced vision, synthetic vision or a combination of the two, and this is where an RTOS has become a major component.

Robust partitioning of the functions in the sensor subsystem is needed to ensure any hosted application has no unintended effect on any other hosted application. In a system with a multi-core processor, robust partitioning is enabled by meeting the objectives of safety standards, including the mitigation of multi-core interference.

A multi-core partitioned RTOS or separation kernel includes the ability to add functions without the need for re-testing and re-verification of the entire system as the operational flight programs evolve.

Mitigating DVE starts with sensors capable of penetrating the environmental conditions. US regulator the FAA refers to a system that provides real-time imagery of the external scene as an ‘enhanced vision’ system.

For example, infrared (IR) has a high frame rate but limited obscurant penetration. Millimetre-wave radar penetrates very well but is low resolution. Lidar has high resolution to detect obstacles and find a flat area to land but doesn’t penetrate obscurants very well, as it takes several scans to form a complete image and has a shorter range than other technologies.

Because no one sensor can handle all types of DVE, a combination of them is used and the data is fused to provide a real-time image of the external scene topography and obstacles, with a latency of under 100 ms. That requires a safety-critical RTOS.

An alternative to enhanced vision is synthetic vision, which is a computer-generated image of the external scene topography relative to the aircraft derived from a database of terrain and obstacles coupled with aircraft attitude and a high-precision navigation solution.

These databases can require a lot of memory, depending on the geographical area coverage, and military synthetic vision systems often combine a civilian terrain database with a more specialised military database.

To accommodate such large databases, the operating system should support 64-bit memory addresses in order to access more than 4 Gbytes.
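The 4 Gbyte figure follows directly from the width of the address. As a quick check, the short Python sketch below works through the arithmetic. The two database sizes are hypothetical values chosen purely for illustration, and the calculation ignores the memory needed by the operating system and applications themselves.

```python
GBYTE = 2**30  # bytes in one Gbyte, using the binary convention
MBYTE = 2**20


def addressable_gbytes(address_bits):
    """Largest memory, in Gbytes, that an address of this width can reach."""
    return 2**address_bits // GBYTE


# Hypothetical database sizes, not taken from any real system.
civilian_terrain_mbytes = 3072
military_overlay_mbytes = 2560
total_bytes = (civilian_terrain_mbytes + military_overlay_mbytes) * MBYTE

fits_in_32_bit = total_bytes <= 2**32
fits_in_64_bit = total_bytes <= 2**64

print(addressable_gbytes(32))          # 4
print(fits_in_32_bit, fits_in_64_bit)  # False True
```

With these example sizes the combined database comes to 5.5 Gbytes, which is beyond the reach of a 32-bit address but far inside the range of a 64-bit one.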

Compared to enhanced vision, synthetic vision does not provide a real-time view of the actual external scene, and it is only as accurate as the database and the GPS location. The database can have errors, and the GPS is subject to interference and jamming.

However, synthetic vision is not limited in range or field of view. That provides a compelling reason to augment enhanced sensing with synthetic vision.

There are several steps in the sensor fusion process to create the combined sensing. The initial 3D terrain database is often augmented with specialised higher resolution data for the area around the target landing zone. As an autonomous helicopter approaches the target landing area before there is any significant brownout or whiteout, a Lidar sensor captures high-resolution terrain data of the area.

Because the Lidar data is real time, it also captures any vehicles, other obstacles or changes to the terrain. The Lidar data is also more reliable than the pre-loaded database, as it does not include any errors in the database or GPS positioning.

The Lidar data is geo-registered and fused with the pre-loaded terrain databases, and the real-time imagery of the IR camera is then fused with the terrain rendering to form a combined real-time image of the landing zone.

This typically has enough image contrast to differentiate between gravel, grass, dirt and so on, which would not be differentiated in the Lidar data if they were all the same height.
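The terrain part of that fusion process can be sketched in Python as follows. This is a deliberately simplified model rather than a description of any fielded system: it assumes the Lidar returns have already been geo-registered onto the same grid as the pre-loaded database, uses `None` for cells with no Lidar return, and picks a hypothetical 1 m threshold to flag obstacles. It leaves out the later step of fusing the IR imagery with the terrain rendering.

```python
OBSTACLE_THRESHOLD_M = 1.0  # hypothetical value, for illustration only


def fuse_heights(database, lidar):
    """Prefer the real-time Lidar height wherever a return exists."""
    fused = []
    for db_row, lidar_row in zip(database, lidar):
        fused.append([
            db_height if scan is None else scan
            for db_height, scan in zip(db_row, lidar_row)
        ])
    return fused


def find_obstacles(database, lidar):
    """Cells where Lidar sees something well above the stored terrain."""
    obstacles = []
    for r, (db_row, lidar_row) in enumerate(zip(database, lidar)):
        for c, (db_height, scan) in enumerate(zip(db_row, lidar_row)):
            if scan is not None and scan - db_height > OBSTACLE_THRESHOLD_M:
                obstacles.append((r, c))
    return obstacles


# Heights in metres on a tiny 2 x 3 grid.
database = [[10.0, 10.0, 10.5],
            [10.0, 10.2, 10.5]]
lidar = [[None, 10.1, 12.4],
         [None, None, 10.6]]

fused = fuse_heights(database, lidar)
obstacles = find_obstacles(database, lidar)
print(fused)      # [[10.0, 10.1, 12.4], [10.0, 10.2, 10.6]]
print(obstacles)  # [(0, 2)]
```

The cell at row 0, column 2 stands nearly 2 m above the stored terrain, so it is flagged as a possible vehicle or other obstacle that the pre-loaded database could not have known about.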

Using an RTOS allows developers to add new software partitions or modify existing ones without having an impact on the critical software partitions that have already been verified. Rather than using a separation kernel, interference mitigation between the partitions can ensure all the functions can run on a multi-core processor system without the need for re-testing and re-verification of the entire system.

An automotive demonstrator of a Lidar 3D perception stack developed by Apex.Ai integrates the latest certifiable robot operating system (ROS 2) stack from Apex.OS with PikeOS running on the Renesas V4H multi-core processor (Courtesy of Renesas Electronics)

Driverless cars

The ability to handle data from Lidar sensors in real time is also key for driverless cars and automated guided vehicles (AGVs) to detect objects that are not visible to cameras or radar sensors. The technology is already used as part of advanced driving assistance systems that boost the safety of cars, as well as AGVs on the factory floor.

Lidar and radar are often used together, along with traditional cameras, with sensor fusion algorithms to bring all the data together. The data is used by the software to identify obstacles and plot potential paths to avoid a collision or take actions to keep passengers and other road users safe.

Detection, acquisition (classification) and tracking of objects at long range are all heavily influenced by laser shot rate. The latest time-of-flight Lidar systems for example can detect small objects and pedestrians at over 200 m, vehicles at 300 m, and a truck at 1 km.

Third-generation Lidar technology, which is due on the market next year, will make it possible to delegate driving to the vehicle in many situations, including at speeds of up to 130 kph on the highway.
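To put the quoted ranges and speed into context, the Python sketch below converts them into the time a vehicle has between first detection and reaching the object. It is simple arithmetic on the figures above, under the assumption that the detected object is stationary; an oncoming vehicle would shorten these times.

```python
def seconds_to_reach(distance_m, speed_kph):
    """Time to cover a distance at constant speed."""
    speed_m_per_s = speed_kph / 3.6
    return distance_m / speed_m_per_s


# Ranges and speed as quoted in the text; the ranges are lower bounds.
DETECTION_RANGES_M = {"pedestrian": 200, "vehicle": 300, "truck": 1000}
HIGHWAY_SPEED_KPH = 130

time_budget_s = {
    target: round(seconds_to_reach(range_m, HIGHWAY_SPEED_KPH), 1)
    for target, range_m in DETECTION_RANGES_M.items()
}
print(time_budget_s)  # {'pedestrian': 5.5, 'vehicle': 8.3, 'truck': 27.7}
```

At 130 kph the vehicle covers about 36 m every second, so a pedestrian detected at 200 m leaves roughly 5.5 s for detection, classification, path planning and braking.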

A typical software stack for a Lidar system has a range of software components, from C++ code to the API for ASIL B safety applications using Posix real-time protocols. These are combined with many different software elements that all need to operate safely and not interact with each other in unexpected ways.

The TCP/IP network stack, MIPI cameras, UART drivers and general-purpose I/Os all feed into various layers of software for the sensor fusion and the ML inference.

A key way to combine this data safely and securely on these processors is to use a separation kernel. If something goes wrong with the software in one partition it does not have an impact on the others, and with a redundant architecture there can be other partitions that can take up any processing slack.

It is also a more secure architecture. If one partition is compromised, there is no way to access the others, and each one can be monitored for any unexpected activity. Processing nodes can be distributed across multiple partitions to provide mixed levels of criticality and redundancy. This also enables more defined deterministic behaviour for the code, as the partitions can limit the impact of unexpected code slowing down a process.

The complexity rises if the CPU has multiple cores, as applications can then run concurrently on all the cores in parallel. Appropriate scheduling mechanisms can handle this in various ways. Ideally, a scheduler should be adaptable to the system configuration and the application design.

If an application is safety-critical then the predictability of the processing is key. Safety standards will mandate a timing analysis to prove that the system can react in a guaranteed time to any event.

The sharing of resources can become a challenge for time-critical scheduling. If resources are shared, delays or even deadlocks can happen if a resource is blocked by another application.

Real-time systems need to be able to make guarantees about the temporal behaviour of the processes that are running. This means the process has to know when it will be able to use the processor and for how long. This is key for processing the Lidar data.

An RTOS partition scheduler uses a combination of priority and time-driven scheduling. The time-driven scheduler is a mechanism to distribute the available CPU time among the partitions.

However, there is one time partition that is active at all times. It runs the non-real-time applications with lower priority whenever resources are available to ensure efficient use of the processor cores.
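The interplay between the two scheduling levels can be illustrated with a small Python simulation. It is a sketch under simplifying assumptions, not the scheduler of any specific RTOS: time is counted in abstract ticks on a single core, a larger number means a higher priority, and the background partition only runs when the partition that owns the current time window has nothing ready. All partition and thread names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    name: str
    priority: int         # larger number = more urgent (an assumption)
    remaining_ticks: int  # work left to do


@dataclass
class Partition:
    name: str
    threads: list = field(default_factory=list)

    def pick(self):
        """Priority scheduling inside the partition."""
        ready = [t for t in self.threads if t.remaining_ticks > 0]
        return max(ready, key=lambda t: t.priority) if ready else None


def run_major_frame(windows, partitions, background):
    """Time-driven level: each (partition, ticks) window owns the CPU in turn."""
    trace = []
    for partition_name, ticks in windows:
        for _ in range(ticks):
            thread = partitions[partition_name].pick() or background.pick()
            if thread is None:
                trace.append("idle")
                continue
            thread.remaining_ticks -= 1
            trace.append(thread.name)
    return trace


partitions = {
    "lidar": Partition("lidar", [Thread("lidar_filter", 2, 3),
                                 Thread("lidar_log", 1, 1)]),
    "control": Partition("control", [Thread("control_loop", 5, 2)]),
}
background = Partition("background", [Thread("diagnostics", 0, 10)])

trace = run_major_frame([("lidar", 4), ("control", 3)], partitions, background)
print(trace)
# ['lidar_filter', 'lidar_filter', 'lidar_filter', 'lidar_log',
#  'control_loop', 'control_loop', 'diagnostics']
```

The control loop finishes one tick before its window ends, and that spare tick goes to the background diagnostics thread rather than being wasted. The Lidar partition never loses any of its own window to the other partitions, which is what makes its timing predictable.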

The MOSA.IC avionics RTOS is a key building block for a unikernel (Courtesy of Lynx)

Unikernels

Unikernels enable programs to link all operating system services in a single address space, avoiding the need for the microprocessor to switch into a special kernel mode to call a system service. In the unikernel architecture, based on a separation kernel, applications just link to the operating system features needed.

Because unikernels are not context-switching or subject to blocking by competing processes, their execution behaviour is much easier to observe and characterise. This reduces the burden of multi-core timing analysis, and makes the safety certification process more manageable. The intrinsic independence and timing properties of a unikernel simply make it a better unit of integration to compose systems where the integrity and predictability of a system are simpler to verify.

A unikernel combined with a hypervisor enables system architects to compose systems with a higher level of fidelity. This allows designers to move applications between the RTOS and safety-critical environments.

Unikernels work best for applications requiring speed, agility and a small attack surface for increased security and certifiability. They run pre-built applications using their own libraries, reducing the attack surface. This also supports containerised applications, which are moving increasingly from enterprise to embedded designs, driven largely by the need to support AI frameworks.

Unikernels are also very well suited to mission-critical systems with mixed workloads that need the coexistence of RTOS, Linux and bare-metal guest operating systems.

The PX5 RTOS implementation (Courtesy of PX5)

Small RTOSs

At the other end of the scale are RTOSs for small UAVs. At its smallest, the latest RTOS to be developed for them requires less than 1 kbyte of RAM for the code and 1 kbyte of flash memory for storage, enabling its use in some of the most memory-constrained devices.

It still provides sub-microsecond context switching and API calls on most microcontrollers though, as well as determinism for real-time platforms. On typical 32-bit microcontrollers running at 80 MHz, most API calls and context switches take less than a microsecond.

The processing for each API and context switch is completely predictable and not a function of how many threads are active. For example, the processing required to obtain a flag is the same whether two or 100 threads are active.

The latest RTOS implementation includes pointer/data verification (PDV) technology, which provides run-time verification of function pointers, system objects, buffers and stacks. This is important in an RTOS, as function pointers provide an easy path to unwanted program execution – both unintentional and intentional.

For example, it is not good practice to place function pointers inside buffers, since a buffer overflow could also overwrite the function pointer. It’s also good to verify function pointers before they are used via a small hash or checksum. Function pointer corruption represents the easiest way for an attacker to initiate unwanted remote execution.
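The idea of checking a function pointer before use can be modelled in Python, with the caveat that Python has no raw function pointers: an integer stands in for the address, and a small dispatch table stands in for the code at that address. This is a generic sketch of the checksum approach described above, not the PDV implementation; the salt, addresses and handler names are all hypothetical.

```python
import zlib

BOOT_SALT = b"per-boot-salt"  # hypothetical value mixed into every check word


def check_word(address):
    """CRC-32 over the salted address, standing in for a small hash."""
    return zlib.crc32(BOOT_SALT + address.to_bytes(8, "little"))


class GuardedPointer:
    """A stored 'function pointer' kept alongside its check word."""

    def __init__(self, address):
        self.address = address
        self.check = check_word(address)

    def verified_address(self):
        if check_word(self.address) != self.check:
            raise RuntimeError("function pointer failed verification")
        return self.address


# Stand-in for code memory: address -> routine.
HANDLERS = {
    0x1000: lambda: "motor_update",
    0x2000: lambda: "telemetry_send",
}


def call(pointer):
    """Verify first, then dispatch."""
    return HANDLERS[pointer.verified_address()]()


pointer = GuardedPointer(0x1000)
result_ok = call(pointer)

# Simulate an overflow that overwrites the address but not the check word.
pointer.address = 0x2000
try:
    call(pointer)
    corruption_detected = False
except RuntimeError:
    corruption_detected = True

print(result_ok, corruption_detected)  # motor_update True
```

The overwritten pointer still refers to a valid routine, which is exactly the case that is hard to catch without a check: the call would have succeeded, just not with the function the programmer intended.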

As the RTOS is implemented with loosely coupled C functions, the size of the code automatically scales based on the application’s use. If the API and its associated functions are not used, they are simply not included in the code.

The API consists of a native implementation of the Posix pthreads standard, which makes PX5 RTOS applications easily portable to any Posix pthread implementation, such as in Linux or even other RTOS implementations.

However, there are always other requirements, so optional Posix pthread extensions are designed specifically for deeply embedded, real-time applications.

Conclusion

A separation kernel combines the capabilities of a real-time operating system with a hypervisor to allow different applications to run safely and securely in their own partitions, while still making use of the increased performance of the latest multi-core automotive processors.

The scheduling capabilities of the RTOS allow safe and secure partitions for the real-time processing of Lidar point clouds and sensor fusion in driverless cars and UAVs. Extending an RTOS to a unikernel provides more scalability to reuse more types of software such as containers.

Acknowledgements

The author would like to thank Bill Lamie at PX5, Jose Almeida at Sysgo, Tim Loveless at Lynx and Dan Mender at Green Hills Software for their help with researching this article.

Examples of real-time operating system suppliers

CANADA

Mannarino Systems & Software (M-RTOS) +1 514 381 1360 www.mss.ca
QNX +1 613 591 0931 www.blackberry.qnx.com

FRANCE

Adacore (GnatPRO) +33 1 49 70 67 16 www.adacore.com

GERMANY

Segger Microcontroller (embOS) +49 21 73 99 31 20 www.segger.com
Sysgo (PikeOS) +49 6136 99480 www.sysgo.com

JAPAN

eSOL (eMCOS) +81 3 5365 1560 www.esol.com

UK

Amazon Web Services (FreeRTOS) www.aws.amazon.com/freertos
ARM (Keil RTX) +44 1223 400400 www.arm.com
Wittenstein High Integrity Systems (SafeRTOS) +44 1782 286427 www.wittenstein.co.uk

USA

DDC-I (HeartOS, DeOS) +1 602 275 7172 www.ddci.com
Green Hills Software (Integrity) +1 805 965 6044 www.ghs.com
Linux Foundation (Zephyr) www.zephyrproject.org
Lynx Software Technologies (LynxOS) +1 408 979 3900 www.lynx.com
Micrium (µC/OS) +1 954 217 2036 www.micrium.com
Micro-ROS micro-ros.github.io
Microsoft (ThreadX) +1 858 613 6640 www.rtos.com
NXP (MQX) +1 800 521 6274 www.nxp.com
PX5  www.px5rtos.com
Siemens (Nucleus) www.mentor.com/embeddedsoftware/nucleus
Wind River (VxWorks) +1 510 748 4100 www.windriver.com