Modularizing Dstack: SDKs and Default Patterns for Creating P2P CVM Clusters

Over the last week I’ve been quite interested in dstack’s approach of leveraging TDX to create replicable CVMs, and I’ve been thinking about how developers might want to design and build dstack-like clusters.

What I like about @socrates1024’s initial dstack implementation and the migrations proposal is that everything is kept as simple as possible. Still, it is a fairly enshrined implementation, which I think is limiting, since each production application that wants to run in a dstack-like environment will have different requirements regarding:

  • encryption and keypair
  • type system
  • how much and what kind of data is being posted on-chain
  • whether to use additional centralized services for availability (e.g. using or not using pubsub)
  • which chain to use for managing the cluster and how to pull data from it, e.g. using cloud RPCs vs. local clients vs. a hybrid model (both), depending on availability requirements
  • how to handle migrations and how much control the managing entity has over the migrations.
  • probably more I haven’t thought of

I don’t think we can expect all developers to agree on a standard for all of the above, considering that, at least the way I see it, this is not a brand-new network design (and should not become one) but rather “just” an off-chain computation layer that is replicable without trust assumptions.

rs-modular-dstack

For the above reasons (and really to learn more about dstack’s design), I’ve built an MVP of what could be a library to help developers set up dstack implementations following well-defined standard patterns (replication and onboarding threads, standard microservice paths, etc.) while still implementing the actual logic themselves. A good analogy is that the library provides empty, pre-wired building blocks that implementors only need to fill in. The idea is also to have a set of standardized crates for each of these blocks, with the blocks isolated from one another implementation-wise. Implementors can then rely on standard crates (choosing among the existing ones depending on requirements) wherever they don’t need a custom implementation, which speeds up development.

Standard Paths

dstack-core is the crate that defines all the interfaces (guest and host services) and standard paths. For example, guest services currently hold two interfaces (one meant to be host-facing and one guest-only):

#[async_trait]
pub trait GuestServiceInner: TdxOnlyGuestServiceInner {
    type Pubkey: Send + Sync + DeserializeOwned + Serialize;
    type EncryptedMessage: Send + Sync + Serialize;
    type Quote: Send + Sync + DeserializeOwned;
    type SharedKey;

    fn get_secret(&self) -> anyhow::Result<Self::SharedKey>;

    async fn replicate_thread(&self) -> anyhow::Result<Self::SharedKey>;

    async fn onboard_new_node(
        &self,
        quote: Self::Quote,
        pubkeys: Vec<Self::Pubkey>,
    ) -> anyhow::Result<Self::EncryptedMessage>;
}

#[async_trait]
pub trait TdxOnlyGuestServiceInner {
    type Tag: Send + Sync + DeserializeOwned;
    type DerivedKey: Send + Sync + Serialize;

    /// Note: the tag here is not necessarily a string, since we want to allow for more
    /// customizability, e.g. structured tag objects.
    async fn get_derived_key(&self, tag: Self::Tag) -> anyhow::Result<Self::DerivedKey>;
}

and this is how the paths for guest services are exported by dstack-core:

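use std::sync::Arc;
use warp::{reply::Json, Filter, Rejection};

// `with_impl` (not shown) injects the shared implementor into each route.
pub struct GuestPaths<H: GuestServiceInner> {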
    pub inner_guest: Arc<H>,
}

pub mod requests {
    use serde::{Deserialize, Serialize};

    use super::super::GuestServiceInner;

    #[derive(Deserialize, Serialize)]
    pub struct OnboardArgs<H: GuestServiceInner> {
        pub quote: H::Quote,
        pub pubkeys: Vec<H::Pubkey>,
    }

    #[derive(Deserialize, Serialize)]
    pub struct GetKeyArgs<H: GuestServiceInner> {
        pub tag: H::Tag,
    }
}

impl<H: GuestServiceInner + Send + Sync> GuestPaths<H> {
    pub fn new(guest_internal: Arc<H>) -> Self {
        Self {
            inner_guest: guest_internal,
        }
    }

    pub fn status(
        &self,
    ) -> impl Filter<Extract = impl warp::Reply, Error = warp::Rejection> + Clone {
        warp::path("status")
            .and(warp::get())
            .map(|| "Live")
    }

    pub fn onboard_new_node(
        &self,
    ) -> impl Filter<Extract = impl warp::Reply, Error = warp::Rejection> + Clone {
        warp::path("onboard")
            .and(warp::post())
            .and(warp::body::json())
            .and(with_impl(self.inner_guest.clone()))
            .and_then(
                |request: requests::OnboardArgs<H>, guest_impl: Arc<H>| async move {
                    match guest_impl
                        .onboard_new_node(request.quote, request.pubkeys)
                        .await
                    {
                        Ok(encrypted) => Ok::<Json, Rejection>(warp::reply::json(&encrypted)),
                        Err(e) => Ok(warp::reply::json(&serde_json::json!({
                            "error": format!("{:?} while onboarding in inner guest impl", e)
                        }))),
                    }
                },
            )
    }

    // Should only be callable within trusted enclaves.
    pub fn get_derived_key(
        &self,
    ) -> impl Filter<Extract = impl warp::Reply, Error = warp::Rejection> + Clone {
        warp::path!("getkey")
            .and(warp::post())
            .and(warp::body::json())
            .and(with_impl(self.inner_guest.clone()))
            .and_then(
                |request: requests::GetKeyArgs<H>, guest_impl: Arc<H>| async move {
                    match guest_impl.get_derived_key(request.tag).await {
                        Ok(derived) => Ok::<Json, Rejection>(warp::reply::json(&derived)),
                        Err(e) => Ok(warp::reply::json(&serde_json::json!({
                            "error": format!("{:?} while getting derived key in inner guest impl", e)
                        }))),
                    }
                },
            )
    }
}
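
To make the split between implementor logic and library-provided paths concrete, here is a rough, std-only sketch of the same pattern. It is a simplified assumption and not the dstack-core API: the traits are synchronous, errors are plain strings, the JSON is hand-formatted, and `ToyGuest`, `Paths`, and the FNV-style mixing are hypothetical stand-ins (the mixing is not a real key-derivation function).

```rust
use std::fmt::Display;
use std::sync::Arc;

/// Simplified, synchronous stand-in for `TdxOnlyGuestServiceInner`.
trait TdxOnlyGuest {
    type Tag;
    type DerivedKey;
    fn get_derived_key(&self, tag: Self::Tag) -> Result<Self::DerivedKey, String>;
}

/// Simplified stand-in for `GuestServiceInner`.
trait Guest: TdxOnlyGuest {
    type SharedKey;
    fn get_secret(&self) -> Result<Self::SharedKey, String>;
}

/// Hypothetical implementor: the only part an application would write.
struct ToyGuest {
    secret: u64,
}

impl TdxOnlyGuest for ToyGuest {
    // A structured tag (namespace, version) rather than a plain string.
    type Tag = (String, u32);
    type DerivedKey = u64;

    fn get_derived_key(&self, tag: Self::Tag) -> Result<u64, String> {
        if tag.0.is_empty() {
            return Err("empty tag namespace".to_string());
        }
        // Toy FNV-1a-style mixing; NOT a real key-derivation function.
        let mut acc = self.secret ^ 0xcbf2_9ce4_8422_2325;
        for byte in tag.0.bytes().chain(tag.1.to_le_bytes()) {
            acc = (acc ^ u64::from(byte)).wrapping_mul(0x0100_0000_01b3);
        }
        Ok(acc)
    }
}

impl Guest for ToyGuest {
    type SharedKey = u64;
    fn get_secret(&self) -> Result<u64, String> {
        Ok(self.secret)
    }
}

/// Library side: generic wiring that never looks inside the implementor.
struct Paths<H: Guest> {
    inner: Arc<H>,
}

impl<H> Paths<H>
where
    H: Guest,
    H::DerivedKey: Display,
{
    fn new(inner: Arc<H>) -> Self {
        Self { inner }
    }

    /// Mirrors the `getkey` route: errors are folded into the reply body.
    fn get_key(&self, tag: H::Tag) -> String {
        match self.inner.get_derived_key(tag) {
            Ok(key) => format!("{{\"key\":{}}}", key),
            Err(e) => format!("{{\"error\":\"{} while getting derived key\"}}", e),
        }
    }
}
```

Calling `Paths::new(Arc::new(ToyGuest { secret: 7 })).get_key(("app".to_string(), 1))` returns a small JSON string. Swapping `ToyGuest` for another implementor changes the tag and key types without touching `Paths`.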

Default helper interfaces

Since one of the goals of the codebase is to let implementors seamlessly plug in standard implementations of helpers such as attestation or cryptographic primitives, dstack-core provides some default interfaces. These are not required by the guest/host interfaces, but using them is advised so that implementations become standard and swappable with any other that respects the same interface. Currently there are only interfaces for attestation and crypto helpers, though I think interfaces for interacting with the “comms” chain should be standardized too.
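
As a rough illustration of what "swappable" means here, the sketch below defines two helper traits and a guest that is generic over both. All names (`Attestation`, `Crypto`, `DummyAttestation`, `XorCrypto`, `HelperGuest`) and method signatures are assumptions made for this example and are not the interfaces dstack-core actually exposes. The XOR cipher and the pass-through quote are placeholders with no security value.

```rust
/// Hypothetical attestation helper: produce and check quotes over report data.
trait Attestation {
    type Quote;
    fn quote(&self, report_data: &[u8]) -> Self::Quote;
    fn verify(&self, quote: &Self::Quote, expected: &[u8]) -> bool;
}

/// Hypothetical crypto helper: symmetric seal/open under a shared key.
trait Crypto {
    fn seal(&self, key: &[u8], plaintext: &[u8]) -> Vec<u8>;
    fn open(&self, key: &[u8], ciphertext: &[u8]) -> Vec<u8>;
}

/// Dummy attestation: the "quote" is just the report data, no hardware involved.
struct DummyAttestation;

impl Attestation for DummyAttestation {
    type Quote = Vec<u8>;

    fn quote(&self, report_data: &[u8]) -> Self::Quote {
        report_data.to_vec()
    }

    fn verify(&self, quote: &Self::Quote, expected: &[u8]) -> bool {
        quote.as_slice() == expected
    }
}

/// Toy repeating-key XOR cipher, for illustration only; NOT secure.
struct XorCrypto;

impl Crypto for XorCrypto {
    fn seal(&self, key: &[u8], plaintext: &[u8]) -> Vec<u8> {
        plaintext
            .iter()
            .zip(key.iter().cycle())
            .map(|(p, k)| p ^ k)
            .collect()
    }

    fn open(&self, key: &[u8], ciphertext: &[u8]) -> Vec<u8> {
        // XOR is its own inverse.
        self.seal(key, ciphertext)
    }
}

/// A guest that depends only on the helper interfaces, not on concrete helpers.
struct HelperGuest<A: Attestation, C: Crypto> {
    attestation: A,
    crypto: C,
    secret: Vec<u8>,
}

impl<A: Attestation, C: Crypto> HelperGuest<A, C> {
    /// Release the sealed secret only if the quote matches the joining node's key.
    fn onboard(&self, quote: &A::Quote, node_key: &[u8]) -> Option<Vec<u8>> {
        if !self.attestation.verify(quote, node_key) {
            return None;
        }
        Some(self.crypto.seal(node_key, &self.secret))
    }
}
```

Replacing `DummyAttestation` with a real DCAP-backed type, or `XorCrypto` with an AEAD, would only change the type parameters of `HelperGuest`, which is the property the default interfaces are meant to preserve.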

Example Implementation

An example implementation is rs-modular-dstack/new-york, which uses DH key exchange (x25519), Stellar as the comms network, and a dummy TDX DCAP for both generating and verifying quotes.

The cool thing about this is that:

  1. The implementor just fills the interfaces without having to worry about how to bind these together.
  2. The implementor can mix their custom implementations with standard crates that respect the interfaces defined in core, for ease of implementation.

Speaking of the actual implementation, new-york is a base layer for a highly available service: in theory it does not need to delegate trust to external APIs, since everything is on-chain, including the quotes (no pubsub). The modular design of the crate also makes it easy to swap the current single points of failure (pulling data from the chain, attestations) for locally run services.
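
The on-chain onboarding flow can be sketched roughly as follows. This is a toy model under loud assumptions, not the new-york code: an in-memory `Vec` stands in for the Stellar ledger, textbook modular-exponentiation Diffie-Hellman over a small prime stands in for x25519, the "quote" is just the public key bytes, and XOR stands in for encryption. The message layout and every function name are hypothetical.

```rust
/// Toy DH parameters (2^61 - 1 is prime); far too small for real use.
const P: u64 = 2_305_843_009_213_693_951;
const G: u64 = 3;

/// Square-and-multiply modular exponentiation, using u128 to avoid overflow.
fn mod_pow(base: u64, exp: u64, modulus: u64) -> u64 {
    let m = u128::from(modulus);
    let mut result: u128 = 1;
    let mut b = u128::from(base) % m;
    let mut e = exp;
    while e > 0 {
        if e & 1 == 1 {
            result = result * b % m;
        }
        b = b * b % m;
        e >>= 1;
    }
    result as u64
}

/// Hypothetical messages posted to the comms chain.
enum Message {
    JoinRequest { node: u32, pubkey: u64, quote: Vec<u8> },
    Onboarded { node: u32, sender_pubkey: u64, encrypted_secret: u64 },
}

/// In-memory stand-in for the chain: an append-only log.
struct Chain {
    log: Vec<Message>,
}

/// Joining node: publish a public key together with a (dummy) quote.
fn request_join(chain: &mut Chain, node: u32, private: u64) {
    let pubkey = mod_pow(G, private, P);
    // Dummy quote: a real one would bind the key to a TDX measurement.
    let quote = pubkey.to_le_bytes().to_vec();
    chain.log.push(Message::JoinRequest { node, pubkey, quote });
}

/// Existing member: check each request and post the secret under the DH key.
fn onboard_pending(chain: &mut Chain, member_private: u64, secret: u64) {
    let member_pubkey = mod_pow(G, member_private, P);
    let mut replies = Vec::new();
    for msg in &chain.log {
        if let Message::JoinRequest { node, pubkey, quote } = msg {
            // Dummy check standing in for quote verification.
            if quote.as_slice() != &pubkey.to_le_bytes()[..] {
                continue;
            }
            let shared = mod_pow(*pubkey, member_private, P);
            replies.push(Message::Onboarded {
                node: *node,
                sender_pubkey: member_pubkey,
                encrypted_secret: secret ^ shared,
            });
        }
    }
    chain.log.extend(replies);
}

/// Joining node: recover the secret from the first reply addressed to it.
fn replicate(chain: &Chain, node: u32, private: u64) -> Option<u64> {
    chain.log.iter().find_map(|msg| match msg {
        Message::Onboarded { node: n, sender_pubkey, encrypted_secret } if *n == node => {
            let shared = mod_pow(*sender_pubkey, private, P);
            Some(*encrypted_secret ^ shared)
        }
        _ => None,
    })
}
```

Both sides derive the same shared value because (g^a)^b and (g^b)^a are equal mod p, so the joining node can undo the XOR using only its private key and data read from the log. No side channel such as pubsub is needed.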

Feedback requested!

Like other dstack implementations, there’s still a lot missing (besides all of the open questions that already exist for dstack). Mainly I’m interested in hearing feedback around:

  • [idea] is this modular design something that should be further pursued? Curious to know what others think about the whole idea.
  • [implementation] what’s missing in the current interfaces? Should the interfaces be this abstract, or should we at least enforce concrete types? Is this generics + abstractions setup the best way to reach the goal? (The current codebase is pure prototyping, so there hasn’t been much thought about how best to structure the type system and the abstractions.)