Explain the relationship between MFS, IPFS, and UnixFS. #297

Closed
hsanjuan opened this issue Apr 3, 2020 · 15 comments
Labels
dif/hard (Having worked on the specific codebase is important)
effort/days (Estimated to take multiple days, but less than a week)
help wanted (Seeking public contribution on this issue)
kind/enhancement (A net-new feature or an improvement to an existing feature)
P2 (Medium: Good to have, but can wait until someone steps up)
status/inactive (No significant work in the previous month)
topic/docs (Documentation)

Comments

@hsanjuan
Contributor

hsanjuan commented Apr 3, 2020

Tracking a user request at ipfs/kubo#7084 (comment)

URL of the page in question:

Maybe: https://docs.ipfs.io/guides/concepts/mfs/

What's wrong with this page?

See confusion in above thread.

Related: how to manually create and modify unixfs directories (it has come up several times in the last week).

@tmbdev

tmbdev commented Apr 16, 2020

The doc says:

Because files in IPFS are content-addressed and immutable, they can be complicated to edit. Mutable File System (MFS) is a tool built into IPFS that lets you treat files like you would a normal name-based filesystem — you can add, remove, move, and edit MFS files and have all the work of updating links and hashes taken care of for you.

So, apparently, I can run things like:

$ ipfs files mkdir /foo
$ ipfs files ls -l /foo
$ echo hello | ipfs files write --create /foo/world
$ ipfs files ls -l /foo # hash has changed
$ ipfs files read /foo/world

That's nice. Somewhere, there is a mapping from path names to hashes. The documentation says nothing about how this mapping is established or where it is stored.

My first thought was that ipfs files maintains a local key/value store that eventually gets turned into an mDAG. But apparently ipfs files refers to a rooted tree stored under the hash one gets from ipfs files stat /. Apparently, this root node can be changed with ipfs files chcid. None of that is obvious from the (minimal) documentation.

My suggestion would be to update the documentation to something like this:

"The ipfs files commands allow a Merkle DAG rooted at a given CID to be manipulated somewhat like a file system. Subcommands of ipfs files mirror common UNIX commands. While traditional UNIX commands modify stateful data structures, their ipfs files equivalents simply return a new root CID that reflects the altered file system state; the original state is still available under the original CID. The current root is stored ___. Note that concurrent usage is/is not possible due to serializing operations/race conditions (pick one).

The ipfs files commands are a simple set of convenience functions that maintain file system state by tracking the root CID, but the Merkle DAG they generate under the root CID is identical to one created with the basic commands.

Paths used by ipfs files are indistinguishable from regular UNIX paths, so you cannot tell by looking at a path whether it refers to a file in the UNIX namespace or a file in the ipfs files namespace. The paths used by ipfs files are also different from, and incompatible with, IPFS paths mounted on /ipfs or /ipns; paths in /ipfs always refer to CIDs, and paths in /ipns always refer to entities where name resolution has been set up, while ipfs files paths always refer to whatever CID root is currently in effect."

This probably needs a lot more elaboration and clarity, but at least it identifies some points that need addressing.
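The semantics proposed in the quoted text (each operation yields a new root, and the old root stays readable) can be sketched with a toy content-addressed store in bash. This is an illustration under loud assumptions, not how kubo implements MFS: `put`, `get`, `write`, `read_at`, the `ROOT` pointer file and the one-line-per-entry directory format are all hypothetical, and real MFS stores dag-pb/UnixFS nodes keyed by CIDs. It assumes GNU coreutils (`sha256sum`).

```bash
#!/usr/bin/env bash
# Toy model only, NOT kubo's implementation: an immutable content-addressed
# store plus a single mutable pointer to the current root directory node.
# Assumes GNU coreutils (sha256sum). All names here are hypothetical.
set -euo pipefail

STORE=$(mktemp -d)        # one file per object, named by its SHA-256
ROOT_FILE="$STORE/ROOT"   # the only mutable state: the current root hash

# put: store stdin under the SHA-256 of its bytes and print that hash
put() {
  local tmp hash
  tmp=$(mktemp "$STORE/tmp.XXXXXX")
  cat > "$tmp"
  hash=$(sha256sum "$tmp" | cut -d' ' -f1)
  mv "$tmp" "$STORE/$hash"
  echo "$hash"
}

# get: print the bytes stored under a hash
get() { cat "$STORE/$1"; }

# write NAME: store stdin as a file object, build a NEW directory node
# ("name hash" per line) from the old one, then move the root pointer.
# Nothing that was already stored is modified.
write() {
  local name=$1 file_hash old_root new_root
  file_hash=$(put)
  old_root=$(cat "$ROOT_FILE")
  new_root=$( { get "$old_root" | grep -v "^$name " || true
                echo "$name $file_hash"; } | sort | put )
  echo "$new_root" > "$ROOT_FILE"
}

# read_at ROOT NAME: resolve NAME inside the directory node ROOT
read_at() {
  get "$(get "$1" | awk -v n="$2" '$1 == n { print $2 }')"
}

printf '' | put > "$ROOT_FILE"   # start from an empty directory node

echo hello | write world
ROOT1=$(cat "$ROOT_FILE")
echo bye | write world
ROOT2=$(cat "$ROOT_FILE")

read_at "$ROOT2" world   # prints "bye": the current state
read_at "$ROOT1" world   # prints "hello": the old root is still readable
```

In this model only `ROOT_FILE` ever changes, so restoring whatever directory holds that pointer also rolls the visible tree back, while every older root remains addressable by its hash.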

I think one issue is that the semantics of ipfs files and the choice to have four different namespaces that can't be distinguished syntactically lead to the complexity of the documentation. The best way to improve the documentation and create a better user experience might be to change the commands themselves.

For example, consider this:

ipfs files provides a set of commands similar to basic POSIX file system commands (cp, rm, ls, ln, mount, etc.). These operate on the regular file system with IPFS subtrees mounted on particular locations. Such mounts can be established with ipfs files mount <cid> <path>. The current mount table can be returned with ipfs files mount [--format=json], and the CID corresponding to a particular path can be returned with ipfs files cidof <path>. The mount table is kept in a database at ~/.ipfs/mounts. The ipfs files operations lock any database records that are modified by an operation (e.g., the mount that includes the destination of a cp operation, but not the source mount), so that concurrent accesses from multiple scripts are serialized. The database location can be overridden with the IPFS_FILES_MOUNTS environment variable. For the construction of large directory trees with IPFS, consider using the ipfs maketree <json> command instead, which parallelizes the construction of the Merkle DAG.

ipfs maketree <json> takes a JSON description of a directory tree. The JSON file contains mappings of the form {filepath: ..., destpath: ...} and {entries: [...]} for files and directories. The ipfs maketree command will copy over all files to a destination tree and return the CID for the destination tree. Instead of filepath:, users can also specify cid: (incorporated directly), linkpath: (only CIDs are added, but the original file is used as the underlying storage), and linkurl: (only CIDs are added, and the data at the given URL is used as the underlying storage).

@hsanjuan
Contributor Author

Two quick notes (agreeing on most otherwise):

Apparently, this root node can be changed with ipfs files chcid.

According to the cli docs, this only changes the CID version or hash function of the root node of a given path.
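A rough analogy for why changing only the hash function still changes the root's identifier: byte-identical content hashed with two different functions gets two different identifiers. This bash sketch uses plain SHA digests in place of real CIDs (a real CIDv1 also encodes a version and a codec alongside the multihash); the node bytes are an invented placeholder, and GNU coreutils (`sha256sum`, `sha512sum`) is assumed.

```bash
#!/usr/bin/env bash
# Analogy only (not real CIDs): identical bytes hashed with two different
# hash functions give two different identifiers for the same content.
# Assumes GNU coreutils (sha256sum, sha512sum).
set -euo pipefail

# Placeholder bytes standing in for a serialized root node
node=$'type=directory\nworld child-hash-placeholder'

id_sha256=$(printf '%s' "$node" | sha256sum | cut -d' ' -f1)
id_sha512=$(printf '%s' "$node" | sha512sum | cut -d' ' -f1)

echo "sha2-256 id: $id_sha256"
echo "sha2-512 id: $id_sha512"
# Same content, different identifier: anything referring to the node
# by identifier now sees a different value.
```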

The documentation says nothing about how this mapping is established or where it is stored.

I would be against providing implementation details in user documentation. How that happens does not (or should not) affect the usage of the feature.

@johnnymatthews johnnymatthews changed the title [DOCS ISSUE] Better document MFS and it's relation with non-MFS operations Better document MFS and it's relation with non-MFS operations Apr 17, 2020
@tmbdev

tmbdev commented Apr 18, 2020

I would be against providing implementation details in user documentation. How that happens does not (or should not) affect the usage of the feature.

I'd say where the state is stored is both user-visible and affects the usage. For example, if it's stored in the file system, it is subject to backup, restore, version control, concurrent access, storage on network file systems etc. (i.e., after a local file system restore, a directory on IPFS created with these tools would seem to revert as well, an unexpected behavior for a distributed file system).

@hsanjuan
Contributor Author

There is no difference between that or any ipfs related data, everything is in ~/.ipfs.

(i.e., after a local file system restore, a directory on IPFS created with these tools would seem to revert as well, an unexpected behavior for a distributed file system).

Did you get the idea that somehow MFS is backed up for you somewhere else so that it would keep state even if you reverted your IPFS repo?

@tmbdev

tmbdev commented Apr 23, 2020

There is no difference between that or any ipfs related data, everything is in ~/.ipfs.

MFS could store state in memory/a daemon, in the current working directory, or in IPNS, all places IPFS also already stores state. All of those could be reasonable choices for something like MFS, depending on which use cases you have in mind.

@hsanjuan
Contributor Author

@tmbdev thanks, I see how being more detailed there can help users.

@johnnymatthews I think we have enough feedback to actually write a good guide on MFS.

Do you think this should still be under concepts/MFS, or a different content location? I volunteer myself to write it.

@johnnymatthews
Contributor

Yeah /concepts/mfs works well for the URL. The title of the page should be Mutable File-Systems (MFS), as should the sidebar nav item.

@hsanjuan hsanjuan transferred this issue from ipfs-inactive/docs May 22, 2020
@johnnymatthews johnnymatthews added dif/hard Having worked on the specific codebase is important effort/days Estimated to take multiple days, but less than a week help wanted Seeking public contribution on this issue kind/enhancement A net-new feature or an improvement to an existing feature need/analysis Needs further analysis before proceeding P2 Medium: Good to have, but can wait until someone steps up status/inactive No significant work in the previous month topic/docs Documentation labels May 23, 2020
@johnnymatthews johnnymatthews changed the title Better document MFS and it's relation with non-MFS operations Explain the relationship between MFS, IPFS, and UnixFS. May 23, 2020
@johnnymatthews
Contributor

@hsanjuan do you still have the bandwidth to write this doc, or should I open it up for a bounty?

@johnnymatthews johnnymatthews removed the need/analysis Needs further analysis before proceeding label Jul 16, 2020
@realChainLife
Contributor

@hsanjuan does the updated MFS document explain this issue better, i.e. the distinction between MFS and UnixFS?

@hsanjuan
Contributor Author

@johnnymatthews there has been a big update to this file (c5ed3ec). It added examples using JavaScript (which I'm not sure are helpful at all, as I would expect Go CLI examples). It still fails to explain how anything works with regard to MFS, while it does a better job with UnixFS.

If I were to work on this, I'd throw most of the MFS subsection away, keeping some small CLI example or two at the end for the usual workflow.

@johnnymatthews
Copy link
Contributor

Correct, c5ed3ec was a bounty project completed by @realChainLife (in this thread). Using Go-IPFS or JS-IPFS is a debate for another time, but having any examples at all is a good thing. I'm assuming you no longer have time to write the changes you'd prefer to see on the page. Would you be able to give a few bullet points on what you'd like to see, though?

@johnnymatthews johnnymatthews added the bounty Has bounty! See https://github.com/ipfs/devgrants/projects/1 label Oct 16, 2020
@alexmmueller
Contributor

@johnnymatthews I could try to complete this issue if this makes sense?

@johnnymatthews
Contributor

Assigned to you @alexmmueller! Drop any questions you've got in here.

@johnnymatthews
Contributor

Are you still working on this one @alexmmueller?

@tamagosante

tamagosante commented Jan 28, 2022

Re: Current docs about "File systems and IPFS": MFS and UnixFS

My profile: Someone new to web3 and IPFS

Reading the docs

The docs do a good job of explaining MFS and UnixFS individually.
I came away with a vague understanding that MFS is the "file system" that uses the "format" UnixFS. [U1]

However, I was still unsure whether my understanding [U1] was actually correct, because of these 2 passages in the docs
(both describe how linking is handled for me):

MFS:
https://docs.ipfs.io/concepts/file-systems/#mutable-file-system-mfs

MFS files and have all the work of updating links and hashes taken care of for you.

UnixFS:
https://docs.ipfs.io/concepts/file-systems/#unix-file-system-unixfs

[...] so it needs metadata to link all its blocks together. UnixFS is a protocol-buffers-based format [...]

Suddenly I wondered: "If both handle linking, are they both describing a file system, and are they therefore alternatives?"
(Yes, I'm aware that the definition of UnixFS says it is a format, but the "FS" in its name didn't help make this separation clear.)

Wish to reduce confusion

It would have helped me if this "hierarchy" between MFS and UnixFS were made clearer with a simple diagram, like

#251

     +---------+
     |   MFS   |   File system.
     +---------+
     |  UnixFS |   Files and directories.
     +---------+
     |   DAG   |   Nodes.
     +---------+
     |  Block  |   Stream of bits.
     +---------+

or a sentence like

ipfs/kubo#5051 (comment)

[...] Unixfs is a format. Mfs is the virtual filesystem tree, and the files api is an api interface that gives you filesystem operations over unixfs files/directories backed by mfs.
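The layering in the diagram above can be made concrete with a toy bash sketch. Everything here is an assumption for illustration: the 4-byte chunk size, the `type=file` tag, the `mfs-table` file and the `cat_path` helper are invented, and GNU coreutils is assumed. It only preserves the shape of the stack (blocks at the bottom, a linking node above them, a type tag saying "this is a file", and a mutable path table on top).

```bash
#!/usr/bin/env bash
# Toy illustration of the layers in the diagram; NOT the real formats.
#   Block : opaque bytes stored under their hash
#   DAG   : a node whose body lists child hashes (links)
#   UnixFS: a type tag saying how to interpret the node ("file")
#   MFS   : a mutable path -> hash table pointing at the top node
# Assumes GNU coreutils; chunk size, tag format and file names are invented.
set -euo pipefail

STORE=$(mktemp -d)
put() {   # store stdin under its SHA-256 and print the hash
  local t h
  t=$(mktemp "$STORE/tmp.XXXXXX")
  cat > "$t"
  h=$(sha256sum "$t" | cut -d' ' -f1)
  mv "$t" "$STORE/$h"
  echo "$h"
}

# Block layer: split the input into 4-byte chunks and store each one.
printf 'hello, world\n' > "$STORE/input"
split -b 4 "$STORE/input" "$STORE/chunk."
links=$(for c in "$STORE"/chunk.*; do put < "$c"; done)

# DAG + UnixFS layers: line 1 is the type tag, then one link per line.
file_node=$( { echo "type=file"; echo "$links"; } | put )

# MFS layer: the only mutable part, mapping a path to the top node.
echo "/foo/world $file_node" > "$STORE/mfs-table"

# Reading walks back down: MFS path -> file node -> blocks, in order.
cat_path() {
  local h
  h=$(awk -v p="$1" '$1 == p { print $2 }' "$STORE/mfs-table")
  tail -n +2 "$STORE/$h" | while read -r child; do
    cat "$STORE/$child"
  done
}

cat_path /foo/world   # prints "hello, world"
```

In real IPFS the blocks are keyed by CIDs, UnixFS metadata is typically a protobuf-encoded `Data` field inside dag-pb nodes, and the default chunks are much larger; the sketch only shows why MFS is the sole mutable layer while everything beneath it is immutable and content-addressed.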

Final words

Generally, the docs are well structured and written.
I like all the references and links to tutorials you are giving.
This is just a tiny nitpick!

@johnnymatthews johnnymatthews removed the bounty Has bounty! See https://github.com/ipfs/devgrants/projects/1 label May 24, 2022
@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in Protocol Docs Mar 22, 2023