Init Systems in Confidential VMs: An Ongoing Investigation

The init system - the first process that starts during boot and manages all other processes throughout the system’s lifecycle - presents unique challenges in TEE environments. While init system choice in traditional environments often comes down to preference or familiarity, in confidential VMs it significantly impacts security, reliability, and the overall trusted computing base (TCB).

We’ve been investigating suitable init systems for confidential VMs, and our understanding continues to evolve as we explore practical implementations.

Core Challenges

Operating in confidential VMs introduces strict constraints around security and operational isolation. Unlike traditional VMs, these environments are deliberately isolated - SSH access for manual intervention isn’t an option. The init system must therefore handle failures autonomously and gracefully.

This creates a fundamental tension: we need an init system that’s both minimal enough to be securely audited yet robust enough to manage complex failure scenarios without human intervention. The size of the TCB becomes a critical consideration, as all code running in the TEE must be trusted and auditable.

The Requirements

Through our investigation and real-world experience, we’ve identified several key requirements for init systems in TEEs. These fall into three categories:

Must Haves

These are non-negotiable requirements that any init system in a TEE must provide:

  • Process Supervision: The ability to keep critical processes running and handle failures automatically
  • Dependency Management: Services need to start in the right order and only when their dependencies are actually ready

Should Haves

Features that significantly improve security and reliability:

  • Health Checking: Beyond just “is the process running?”, we need to know if services are actually healthy
  • Reasonable size/complexity: A smaller, simpler codebase is easier to audit, though a more feature-complete init system can sometimes be simpler to operate securely
  • Self Recovery: When things go wrong (and they will), the system should try to fix itself
  • Declarative Configs: Shell scripts are flexible but error-prone. We want something more structured

Nice to Haves

Features that make life easier but aren’t critical:

  • Logging Architecture: Built-in log rotation and management
  • Resource Control: Fine-grained control over CPU, memory, and other resources
  • Privilege Management: Running services as different users, in different namespaces, etc.

The Evolution of Our Investigation

Our understanding of the init system landscape has evolved significantly as we’ve dug deeper into practical implementations. What initially seemed like clear alternatives to systemd have revealed unexpected complexities and trade-offs.

Current State: sysvinit

We currently use sysvinit, where a service script looks something like this:

#!/bin/sh
### BEGIN INIT INFO
# Provides:          reth
# Required-Start:    $network $remote_fs fetch-config reth-sync
# Required-Stop:     $network $remote_fs
# Default-Start:     5
# Default-Stop:      0 1 6
### END INIT INFO

RETH_DIR=/persistent/reth
DAEMON=/usr/bin/reth
PIDFILE=/var/run/reth.pid

start() {
    # Manual dependency checking: poll until the sync job signals completion
    while [ ! -f "$RETH_DIR/sync_complete" ]; do
        sleep 10
    done

    # Manual process supervision
    start-stop-daemon -S --background --make-pidfile --pidfile "$PIDFILE" \
        -c reth:eth -x "$DAEMON" -- node --full --datadir "$RETH_DIR"

    # Custom monitoring loop, backgrounded so the init script can return
    while true; do
        if ! pgrep -f "$DAEMON" > /dev/null; then
            echo "Process crashed, restarting..."
            start-stop-daemon -S --background --make-pidfile --pidfile "$PIDFILE" \
                -c reth:eth -x "$DAEMON" -- node --full --datadir "$RETH_DIR"
        fi
        sleep 5
    done &
}

While functional, this approach requires us to implement a lot of basic functionality manually, leading to potential reliability issues and maintenance overhead.

The Search for Alternatives

Supervisord: A Quick Dead End

Early in our investigation, supervisord seemed promising due to its simple configuration and good documentation:

[program:reth]
command=/usr/bin/reth node --full --datadir /persistent/reth
directory=/persistent/reth
user=reth
autostart=true
autorestart=true
startsecs=5
startretries=3
environment=HOME="/persistent/reth"

However, we quickly discovered that its Python dependency would actually increase our TCB more than using systemd itself, adding over 15MB to our image. To put this in context, the raw TCB growth is not even the most worrying part; it's the extra attack surface a full Python runtime would expose. This was a deal-breaker given our security requirements.

The S6 Ecosystem: A Deep Dive

Our investigation into the S6 ecosystem revealed distinct components with different strengths and limitations:

S6 Core

S6 itself is an impressive piece of software - a minimal, security-focused init system with robust process supervision. Its small TCB and reliable process management align well with our needs. However, it feels raw in isolation, lacking built-in dependency management and declarative configuration:

#!/command/execlineb -P
# /service/reth/run
s6-setuidgid reth
fdmove -c 2 1
s6-notifyoncheck -d -n 300 -c "curl -s http://localhost:8545"
reth node --full --datadir /persistent/reth

S6-rc

S6-rc adds dependency management on top of S6, but introduces additional complexity through its database compilation requirement and somewhat clumsy interface. While it solves the dependency problem, it still lacks declarative configuration and feels more complex than necessary for our use case.
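
For a sense of what that compilation step means in practice, here is a rough sketch of an s6-rc source definition for the same reth service. The fetch-config and reth-sync names are carried over from our sysvinit example, and the paths are illustrative:

# s6-rc wants one directory per service under a source tree, e.g.:
# /etc/s6-rc/source/reth/type          -> contains the word: longrun
# /etc/s6-rc/source/reth/run           -> the run script shown above
# /etc/s6-rc/source/reth/dependencies  -> one dependency name per line:
#                                           fetch-config
#                                           reth-sync
#
# The source tree must then be compiled into a binary database before use:
s6-rc-compile /etc/s6-rc/compiled /etc/s6-rc/source

The extra compile-and-swap step is manageable, but it is exactly the kind of moving part that gives the interface its clumsy feel.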

S6-66

66 is a service manager built on top of S6 that adds systemd-like declarative configuration. While it elegantly solves many of S6’s usability challenges, its recent development and limited production deployment make it risky for our security-critical environment. It also includes extensive service management functionality that’s unnecessary in our context, where manual service management isn’t possible.

Emerging Directions

Our investigation has revealed a recurring trade-off in the search for a systemd alternative: the more feature-rich candidates tend to be newer and less battle-tested, which is risky in our security-critical environment, and their added functionality often brings added complexity that needs careful consideration in a TEE context.

This has led us to consider several potential paths forward:

  1. Minimal systemd: Rather than seeking alternatives to systemd’s functionality, we could explore stripping it down to just the components we need. This approach recognizes that systemd, despite its size, is battle-tested and reasonably well-understood due to its wide adoption. The challenge becomes identifying and isolating only the essential components we require so that the result stays auditable (a sketch of what this could look like follows this list).
  2. Minimal Base with Custom Dependencies: Build dependency management on top of a minimal supervision system like s6 core. This approach acknowledges that while s6 provides excellent process supervision, its existing dependency management solutions (s6-rc) feel overly complex for our needs. By building just the dependency management we need, we could maintain a minimal TCB while getting exactly the functionality required.
  3. Service Management API: Instead of implementing complex, fully autonomous self-healing, we could expose a simple, well-documented service management API. This would allow external management of services through basic operations like start, stop, and restart in specific sequences. While this introduces new security considerations, it could significantly simplify our init system requirements.
  4. Container-Native Approach: Taking the service isolation concept further, we could run each service in its own confidential-VM-backed container, along the lines of the Confidential Containers project (https://github.com/confidential-containers). This would leverage existing container orchestration tools and simplify our init system needs within each container.
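
To make option 1 concrete, the functionality we list above (supervision, dependency ordering, declarative config, resource control) fits in a single unit file. The sketch below is illustrative only: the fetch-config.service and reth-sync.service names simply mirror the dependencies from our sysvinit script, and the limits are placeholders.

# /etc/systemd/system/reth.service (illustrative sketch, not our production config)
[Unit]
Description=Reth execution client
# Declarative dependency management
Requires=fetch-config.service reth-sync.service
After=network-online.target fetch-config.service reth-sync.service

[Service]
User=reth
Group=eth
ExecStart=/usr/bin/reth node --full --datadir /persistent/reth
# Built-in supervision and self-recovery
Restart=on-failure
RestartSec=5
# Resource control without extra tooling
MemoryMax=16G

[Install]
WantedBy=multi-user.target

The open question for this path is how much of systemd has to stay in the image to support even a unit this small.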

Open Questions

Several critical challenges remain unsolved:

Minimal vs Secure

How do we quantify the security implications of different approaches? A smaller codebase isn’t automatically more secure - sometimes additional complexity provides important security features.

Dependency Management Scope

How much dependency management do we actually need? Could a simpler approach handle our use cases without the complexity of full dependency resolution?
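
For many of our services the answer may be as small as “wait until a readiness marker appears”. A hypothetical helper like the one below (not something we ship today), called at the top of each run script, would cover the fetch-config/reth-sync style ordering from the examples above without any dependency graph at all:

#!/bin/sh
# wait-for: block until every named readiness marker exists, or give up.
# Usage: wait-for <timeout-seconds> <marker-file>...
timeout="$1"; shift
for marker in "$@"; do
    elapsed=0
    until [ -e "$marker" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out waiting for $marker" >&2
            exit 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
done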

Container Integration

Should container managers handle some of these concerns? Tools like podman offer built-in solutions for process supervision and resource isolation, but splitting responsibilities between an init system and a container manager feels architecturally wrong.
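
For reference, this is roughly what podman gives us out of the box: a restart policy, resource limits, and health checks in one declarative invocation (the image name and limits here are purely illustrative):

podman run -d --name reth \
    --restart=on-failure \
    --memory=16g --cpus=4 \
    --health-cmd='curl -sf http://localhost:8545 || exit 1' \
    --health-interval=30s \
    -v /persistent/reth:/persistent/reth \
    example.org/reth:latest node --full --datadir /persistent/reth

Even so, something still has to supervise podman itself, which is where the split of responsibility mentioned above comes back in.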

Call to Action

These challenges are crucial to solve for the wider adoption of confidential computing. We’re actively working on solutions and would value community input:

  • Join the discussion at our community forum
  • Share your experiences running init systems in TEEs
  • Help us evaluate the trade-offs between different approaches

The init system choice in a TEE environment has far-reaching implications for security, reliability, and developer experience. We recognize that a perfect solution may not exist, but we’re committed to finding the best balance for our needs.

Our investigation continues, and we’ll update this analysis as we learn more. If you’re working on similar problems or have insights to share, please reach out.

Special thanks go out to Alex and the rest of the Andromeda team! They’ve done most of the investigation into this topic; I’ve merely summarized it here.