Analyzing the DAG modifier #5211

schomatis · 2018-07-10T14:49:47Z

Some random notes to keep in mind when refactoring the DAG modifier. Complemented by #5192.

Document what are the rules when writing and rewriting a file DAG, what can be expected in terms of DAG structure, block sizes, etc? How is a DAG extended or modified?

Rename and document expandSparse, this is not really sparse, all the zeros are actually being written to the DAG. (It would be interesting to add a codification feature in UnixFS v2 to allow truly sparse file DAGs).

curNode, similar to the DAG reader, but actually used, where and how?

Entry point command: ipfs files write.

Any modification implies copying the node, we don't modify in-place (is this correct?) because of the merkle DAG nature of data, this should be obvious to the experienced reader but it's worth clarifying it, especially when functions like WriteAt are documented as: will modify a dag file in place. The modified node has to be re-added, so it's not in-place, or maybe I should redefine in-place, merkle is not about the where but about the what.

Document and clarify that the write operations are buffered, Sync does the actual writing, Write just writes to a buffer (that needs to be synced).

The DAG modifier contains a DAG reader. Can these be combined in a single structure (and set it to a read-only mode in the cases where the DAG reader was used)?

The appendData function only works with the trickle layout, so a balanced DAG would be appended with the trickle layout rules. Is a balanced append function needed?

The modifyDag function seems to be the core function that does the actual writing in a DAG. It is too long, 100+ LoC. It has a base case for leaf nodes and then a case for internal nodes where it iterates the child sizes in a similar way to the reader (but it doesn't use it for this part, why? because it doesn't use the reader's offset?) and recursively calls itself.

Clarify the use of writeStart compared with curWrOff that seems to be used only for reading (it doesn't leverage the reader offset).

May be useful to extract the FSNodeOverDag cache structure from the builder package into unixfs and start using it here. This may replace the EncodeProtobuf calls.

Truncate calls dagTruncate for the actual operation that does a similar recursive algorithm as modifyDag, that part of the code should be reused (and maybe even replaced with the reader operations). expandSparse, when truncate has a bigger-than-size offset, just uses appendData to fill with zeros.

Seek when called with a bigger offset than the file size is appending data, is this the expected behavior from the Unix API?

The text was updated successfully, but these errors were encountered:

schomatis added topic/docs-ipfs Topic docs-ipfs topic/technical debt Topic technical debt topic/UnixFS Topic UnixFS labels Jul 10, 2018

schomatis added this to the Files API Documentation milestone Jul 10, 2018

schomatis self-assigned this Jul 10, 2018

Stebalien removed this from the Files API Documentation milestone Apr 29, 2020

Stebalien unassigned schomatis Apr 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Analyzing the DAG modifier #5211

Analyzing the DAG modifier #5211

schomatis commented Jul 10, 2018

Analyzing the DAG modifier #5211

Analyzing the DAG modifier #5211

Comments

schomatis commented Jul 10, 2018