There has been some interest in and demand for using IPFS and Arweave together. Below is a proposal for how we can do this.
Definitions
- Arweave is a blockchain that leverages a combination of cryptographic proofs and (crypto)economic incentives to enable data stored with the chain (not to be confused with data stored in the chain) to be stored perpetually.
- IPFS is a system for handling content-addressable data. It refers to data by a content-based identifier (e.g. a hash) rather than a location identifier (e.g. an IP address) or system identifier (e.g. a blockchain account). Its jobs are primarily finding, fetching, and validating content-addressable data rather than storing it. Because IPFS is not tied to a specific data type, there are many different types of data and content-addressable graphs supported within the IPFS ecosystem.
There are two main forms of interoperability that could be considered with IPFS:
- Make it possible to load existing data stored with Arweave (and identified with Arweave identifiers) using IPFS tooling
- An example might be a browser plugin/translator such that, given some Arweave identifier `ar://<arID>`, it is possible to load an equivalent `ipfs://<cid-and-path>` that fetches the data from a local kubo node hosting a copy of the data. This could help as a performance optimization, if the user is offline, if public Arweave gateways are blocked, etc. (see the translation sketch after this list)
- Make it possible to take existing data stored with IPFS-compatible systems and store it with Arweave such that it is also retrievable using IPFS tooling
- An example might be to take a JPEG hosted on a local kubo node with CID `bafyfoo` and store it with Arweave such that it's possible to turn off the local kubo node and still have another user's Brave browser, or a public gateway like `bafyfoo.ipfs.dweb.link`, be able to load the JPEG
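As a rough illustration of the first option, here is a minimal sketch in Go of the `ar://` → `ipfs://` translation. The `lookupCID` function is hypothetical; where the Arweave-ID → CID mapping comes from (a tag on the transaction, a database built while indexing the chain, etc.) is left open.

```go
package main

import (
	"fmt"
	"net/url"
)

// lookupCID is hypothetical: it stands in for whatever index maps an
// Arweave transaction ID to the CID of the same bytes.
func lookupCID(arID string) (string, bool) {
	return "", false // not implemented in this sketch
}

// translate rewrites ar://<arID>[/path] into ipfs://<cid>[/path] when a
// mapping is known, so a local kubo node can serve the request instead
// of a public Arweave gateway.
func translate(arURL string) (string, error) {
	u, err := url.Parse(arURL)
	if err != nil || u.Scheme != "ar" {
		return "", fmt.Errorf("not an ar:// URL: %q", arURL)
	}
	cid, ok := lookupCID(u.Host)
	if !ok {
		return "", fmt.Errorf("no CID known for Arweave ID %q", u.Host)
	}
	return "ipfs://" + cid + u.Path, nil
}
```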
Below are some proposals for how we can do this. The proposals are based on the following assumptions/understandings from conversations with people more knowledgeable about the Arweave ecosystem than the author.
- It is more important to handle storing existing IPFS data with Arweave (interop option 2) than loading existing Arweave data with IPFS tooling (interop option 1)
- Practically all retrieval by end users in the Arweave ecosystem happens through Arweave gateways rather than miners today, so continuing that trend for IPFS interop is fine (i.e. no need to change the software miners run)
Proposal
When indexing the chain (or receiving chain updates) and seeing a new data item:
- Check if there is an Arweave tag indicating IPFS data (e.g. a new `IPFS-CAR` tag, or reuse the `Content-Type` tag with the existing IPFS media types registered with IANA: `application/vnd.ipld.car` and `application/vnd.ipld.raw`)
- If so, read through the CAR, validating each block, and add an entry to a local database mapping multihash (the relevant portion of the CID) → the data (see the indexing sketch below)
- Note: the most cost-effective way to do this will depend on the Arweave gateway infrastructure. It might be reasonable to store mappings to offsets within the stored CAR, rather than copying the blocks out, so the data can easily be served in the same way
- For serving via an Arweave gateway:
- On inbound requests to an Arweave gateway, if they're IPFS requests, send them to the IPFS handler (see the routing sketch below)
- IPFS requests look like `*.ip(f|n)s.gateway.tld` or `gateway.tld/ip(f|n)s/*`
- Use existing IPFS tooling hooked up to the database to answer gateway requests
- For serving data to the p2p network (e.g. Brave, local kubo nodes, ipfs.io, etc.)
- Advertise the data to IPNI and/or the DHT
- Serve data:
- Short term: set up a system for serving data via Bitswap backed by the gateway infrastructure's storage
- Medium term: reuse the same infrastructure as for serving via the Arweave gateway, since it can support the Trustless Gateway spec (https://specs.ipfs.tech/http-gateways/trustless-gateway/); see the raw-block handler sketch below
- Depending on how valuable you'd find this, we can figure out the timeline for when this will be doable such that you wouldn't need Bitswap support.
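To make the indexing step concrete, below is a minimal sketch in Go using go-car (v2) and go-multihash. The surrounding pipeline is assumed: `putIndexEntry` is a hypothetical stand-in for the gateway's database, and a real deployment might record a (transaction, offset, length) triple instead of the raw bytes, per the note above.

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	carv2 "github.com/ipld/go-car/v2"
	mh "github.com/multiformats/go-multihash"
)

// putIndexEntry is hypothetical: it records that this multihash can be
// served from this data item. Here it takes the raw block bytes; a real
// deployment might store a pointer into the stored CAR instead.
func putIndexEntry(key mh.Multihash, data []byte) error {
	return nil // not implemented in this sketch
}

// indexCAR reads a data item already identified (e.g. by its tags) as a
// CAR, validates each block, and indexes it by multihash.
func indexCAR(r io.Reader) error {
	br, err := carv2.NewBlockReader(r)
	if err != nil {
		return fmt.Errorf("not a valid CAR: %w", err)
	}
	for {
		blk, err := br.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		// Validate the block: re-hash the bytes with the multihash
		// function the CID claims and compare digests.
		want := blk.Cid().Hash()
		dec, err := mh.Decode(want)
		if err != nil {
			return err
		}
		got, err := mh.Sum(blk.RawData(), dec.Code, dec.Length)
		if err != nil {
			return err
		}
		if !bytes.Equal(got, want) {
			return fmt.Errorf("block %s failed validation", blk.Cid())
		}
		// Key by multihash rather than full CID so the same bytes can
		// be found regardless of CID version or codec.
		if err := putIndexEntry(want, blk.RawData()); err != nil {
			return err
		}
	}
}
```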
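For the inbound routing rule, the check can be as simple as matching the subdomain and path patterns above. A sketch, assuming `ipfsHandler` and `arweaveHandler` are the gateway's existing handlers:

```go
package main

import (
	"net/http"
	"regexp"
	"strings"
)

// Matches subdomain-gateway hosts like <cid>.ipfs.gateway.tld or
// <name>.ipns.gateway.tld.
var ipfsSubdomain = regexp.MustCompile(`\.ip[fn]s\.`)

// route sends IPFS-shaped requests to the IPFS handler and everything
// else to the existing Arweave gateway handler.
func route(ipfsHandler, arweaveHandler http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		isIPFS := ipfsSubdomain.MatchString(r.Host) ||
			strings.HasPrefix(r.URL.Path, "/ipfs/") ||
			strings.HasPrefix(r.URL.Path, "/ipns/")
		if isIPFS {
			ipfsHandler.ServeHTTP(w, r)
			return
		}
		arweaveHandler.ServeHTTP(w, r)
	})
}
```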
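And for the medium-term trustless-gateway path, a raw block response (`application/vnd.ipld.raw`) can be answered straight from the multihash database built during indexing; `getIndexEntry` is the hypothetical read side of `putIndexEntry` above. A real implementation would also handle CAR responses, sub-paths, and content negotiation per the spec.

```go
package main

import (
	"net/http"
	"strings"

	"github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

// getIndexEntry is hypothetical: it fetches the raw bytes recorded for
// a multihash during CAR indexing.
func getIndexEntry(key mh.Multihash) ([]byte, bool) {
	return nil, false // not implemented in this sketch
}

// rawBlockHandler serves GET /ipfs/<cid> as a single raw block. Only
// bare-CID requests are sketched here.
func rawBlockHandler(w http.ResponseWriter, r *http.Request) {
	p := strings.TrimPrefix(r.URL.Path, "/ipfs/")
	if i := strings.IndexByte(p, '/'); i >= 0 {
		p = p[:i] // ignore sub-paths in this sketch
	}
	c, err := cid.Decode(p)
	if err != nil {
		http.Error(w, "invalid CID", http.StatusBadRequest)
		return
	}
	data, ok := getIndexEntry(c.Hash())
	if !ok {
		http.Error(w, "block not found", http.StatusNotFound)
		return
	}
	w.Header().Set("Content-Type", "application/vnd.ipld.raw")
	w.Write(data)
}
```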
Alternatives
There are a number of options we can look at here if more native Arweave-style integration is desired, or if there are optimizations you're worried about (e.g. if you want to serve the data using IPFS identifiers and Arweave identifiers backed by the same bytes, and want to save on UnixFS processing time with minimal caching). However, the initial proposal seems like a good way to get the conversation started.