Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyzing the DAG modifier #5211

Open
schomatis opened this issue Jul 10, 2018 · 0 comments
Open

Analyzing the DAG modifier #5211

schomatis opened this issue Jul 10, 2018 · 0 comments
Labels
topic/docs-ipfs Topic docs-ipfs topic/technical debt Topic technical debt topic/UnixFS Topic UnixFS

Comments

@schomatis
Copy link
Contributor

Some random notes to keep in mind when refactoring the DAG modifier. Complemented by #5192.

Document what are the rules when writing and rewriting a file DAG, what can be expected in terms of DAG structure, block sizes, etc? How is a DAG extended or modified?

Rename and document expandSparse, this is not really sparse, all the zeros are actually being written to the DAG. (It would be interesting to add a codification feature in UnixFS v2 to allow truly sparse file DAGs).

curNode, similar to the DAG reader, but actually used, where and how?

Entry point command: ipfs files write.

Any modification implies copying the node, we don't modify in-place (is this correct?) because of the merkle DAG nature of data, this should be obvious to the experienced reader but it's worth clarifying it, especially when functions like WriteAt are documented as: will modify a dag file in place. The modified node has to be re-added, so it's not in-place, or maybe I should redefine in-place, merkle is not about the where but about the what.

Document and clarify that the write operations are buffered, Sync does the actual writing, Write just writes to a buffer (that needs to be synced).

The DAG modifier contains a DAG reader. Can these be combined in a single structure (and set it to a read-only mode in the cases where the DAG reader was used)?

The appendData function only works with the trickle layout, so a balanced DAG would be appended with the trickle layout rules. Is a balanced append function needed?

The modifyDag function seems to be the core function that does the actual writing in a DAG. It is too long, 100+ LoC. It has a base case for leaf nodes and then a case for internal nodes where it iterates the child sizes in a similar way to the reader (but it doesn't use it for this part, why? because it doesn't use the reader's offset?) and recursively calls itself.

Clarify the use of writeStart compared with curWrOff that seems to be used only for reading (it doesn't leverage the reader offset).

May be useful to extract the FSNodeOverDag cache structure from the builder package into unixfs and start using it here. This may replace the EncodeProtobuf calls.

Truncate calls dagTruncate for the actual operation that does a similar recursive algorithm as modifyDag, that part of the code should be reused (and maybe even replaced with the reader operations). expandSparse, when truncate has a bigger-than-size offset, just uses appendData to fill with zeros.

Seek when called with a bigger offset than the file size is appending data, is this the expected behavior from the Unix API?

@schomatis schomatis added topic/docs-ipfs Topic docs-ipfs topic/technical debt Topic technical debt topic/UnixFS Topic UnixFS labels Jul 10, 2018
@schomatis schomatis added this to the Files API Documentation milestone Jul 10, 2018
@schomatis schomatis self-assigned this Jul 10, 2018
@Stebalien Stebalien removed this from the Files API Documentation milestone Apr 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/docs-ipfs Topic docs-ipfs topic/technical debt Topic technical debt topic/UnixFS Topic UnixFS
Projects
None yet
Development

No branches or pull requests

2 participants