Distributed, persistent storage for SUAVE apps

mateusz · July 15, 2024, 2:47pm

What are the currently available solutions for medium-volume distributed & persistent storage? Aim is probably on the order of a gigabyte over the lifetime of an app, so we can’t just put it on chain.

Should we just use IPFS, or are there any more recent alternatives? Not to say that there’s something wrong with IPFS.

Pitch your favorite project

antonio · July 16, 2024, 11:38am

Hello! For persistent storage you can try Swarm: retrievability is pretty awesome, it's permissionless, and it's as private as storage can possibly be.

DM if you want to know more, I can give you some tokens to play with.

dmarz · July 17, 2024, 8:49pm

There are some concerns around censorship in IPFS and DHTs in general. The authors do provide a solution, but it's a bit tricky because it relies on approximating a global view of the network to detect it. Some months ago, I wrote up a summary of the paper here.

antonio wrote: "Hello! For persistent storage you can try Swarm: retrievability is pretty awesome, it's permissionless, and it's as private as storage can possibly be."

@antonio could you provide a link to benchmarks on using Swarm? None jumped out at me from a look at the docs.

dmarz · July 17, 2024, 8:54pm

I started another thread to discuss a simple pricing strawman

antonio · July 18, 2024, 1:59pm

I found this Grafana dashboard made by someone from Datafund, showing data retrieval and gateway latency: Grafana

Have a look. It's a bit messy because the size labels are not in order and are missing for min and max, but it will give you a good idea.

Bottom line, I would say that Swarm is more flexible/adaptable to all storage needs.

antonio · July 26, 2024, 8:48am

At the moment, the Bee client in Swarm tries to retrieve chunks (the canonical storage unit) over up to 32 different paths, on which intermediate nodes can serve the content because of the bandwidth incentives. This also raises the probability of asking an honest node in the target neighborhood and of avoiding "black holes" on the Kademlia path.

As a result, a malicious node needs to maintain as many active and reliable connections as it can with peers on the network to have any chance of preventing successful retrievals. The bigger the network, the costlier it gets.
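To put rough numbers on the multi-path argument, here is a minimal sketch in Python. It is not Bee's actual retrieval logic: it assumes each path is blocked independently with the same probability, which is a simplification, and the function name and parameters are made up for illustration.

```python
def retrieval_success_probability(p_blocked: float, paths: int) -> float:
    """Chance that at least one of `paths` retrieval attempts gets through.

    Assumption (not from the Bee source): every path is blocked
    independently with the same probability `p_blocked`.
    """
    if not 0.0 <= p_blocked <= 1.0:
        raise ValueError("p_blocked must be between 0 and 1")
    if paths < 1:
        raise ValueError("paths must be at least 1")
    # The retrieval fails only if every single path is blocked.
    return 1.0 - p_blocked ** paths


if __name__ == "__main__":
    # Compare a single path with several paths, up to the 32 mentioned above.
    for paths in (1, 4, 32):
        print(paths, retrieval_success_probability(0.5, paths))
```

Under this model the attacker has to block every path at once, so the failure probability shrinks exponentially with the number of paths tried.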

On top of that, if someone succeeds in pulling this off and takes over a neighborhood, the built-in erasure coding can help retrieve the requested data from other honest/closer neighborhoods. Since files are split into 4K chunks, and those are scattered across neighborhoods, with erasure coding in place the attack is even harder to pull off.
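A rough sketch of why chunking plus erasure coding helps, again in Python and not taken from the Bee codebase. It assumes an idealized code where any `data_chunks` out of `data_chunks + parity_chunks` are enough to rebuild the data, and that chunks become unavailable independently. The parity counts and function names below are hypothetical, chosen only for illustration.

```python
from math import comb

CHUNK_SIZE = 4096  # the 4K chunk size mentioned above


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def recovery_probability(data_chunks: int, parity_chunks: int, p_lost: float) -> float:
    """Probability that the data can still be rebuilt.

    Assumptions (illustrative only): each chunk is lost independently with
    probability `p_lost`, and the data survives as long as no more than
    `parity_chunks` chunks in total are lost.
    """
    total = data_chunks + parity_chunks
    # Sum the binomial probabilities of losing 0..parity_chunks chunks.
    return sum(
        comb(total, lost) * p_lost ** lost * (1 - p_lost) ** (total - lost)
        for lost in range(parity_chunks + 1)
    )


if __name__ == "__main__":
    chunks = split_into_chunks(b"x" * 10_000)
    print(len(chunks), "chunks")
    # Hypothetical parity levels: none versus two extra chunks.
    for parity in (0, 2):
        print(parity, recovery_probability(len(chunks), parity, 0.1))
```

Without parity, losing any one chunk loses the file. With parity, the attacker has to knock out more chunks than the parity count, and those chunks sit in different neighborhoods.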