Skip to content
This repository was archived by the owner on Jun 29, 2022. It is now read-only.

dag-pb: express as IPLD Schema #247

Merged
merged 2 commits into from
May 27, 2020
Merged

dag-pb: express as IPLD Schema #247

merged 2 commits into from
May 27, 2020

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Mar 10, 2020

Fixes: #245

While I have it open, may as well push something for this. Feel free to edit directly if you have opinions.


type PBNode struct {
links [PBLink]
data Bytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data should be optional? I think PBNode with only lists of links and no data are not uncommon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think PBNode with only lists of links and no data are not uncommon.

They aren't valid. You need to signify the "sub-codec" for the links:

  • is it a bunch of file fragments? ( type 2, formerly type 0 )
  • is it a directory with all links being named ( type 1 )
  • is it a single named symlink ( type 4 )
  • is it a hamted-directory-shard ( type 5 )

Also due to any intermediate file-stream node ( type 2 ) in a graph being a file on its own right, the size of the stream section and the sync-hints go into data

TLDR: not optoinal

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

( above is written in the sense of "there is virtually no use of dag-pb outside of unixfs", if there is such a beast what I wrote above does not apply )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For DagPB, data is optional. However, for unixfs, it's not optional.

And yeah, there are uses outside of unixfs (the webui has a dagpb geolocation database, cluster uses it, the pin datastructure uses it, etc.). They're just not commonly transferred/queried on the network (well, except for the geolocation databse).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of the schema, I wouldn't distinguish between "there is data but its empty" and "there is no data".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we're spec'ing DagPB here, not unixfs(v1), so: on this line we should say data optional Bytes, yep.

(And continue to not say nullable -- saying nullable would be claiming there's a distinguishment between "there is data but its empty(/null)" and "there is no data", which per @Stebalien 's statement would be incorrect.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. So you distinguish between "optional" (does not need to be specified in the serialized data) and "nullable" (null doesn't matter)? In that case, everything in dagpb is optional, and nothing is nullable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so links is optional too then? or will that just be a zero-length array?

I'm getting a bit mixed up here with what we're trying to represent. We're saying this is the "logical" format but I think this pushes it into the language level where the distinction between nullable and optional starts to get weird and maybe not so helpful. When we're talking about representation then yes, crisp and clear, but that's not what this is saying. The representation is described by the protobuf spec above this block. I think we should just be aiming for what is most conceptually useful to the reader of this doc trying to understand what data is in a dag-pb block.

Have pushed a new version but I'm dubious, latest is:

type PBLink struct {
  Hash Bytes
  Name optional String
  Tsize optional Int
}

type PBNode struct {
  Links optional [PBLink]
  Data optional Bytes
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just be aiming for what is most conceptually useful to the reader of this doc trying to understand what data is in a dag-pb block.

The reason I opened this issue is, that go-ipld-prime transforms from the Data Model into the actual DagPB. Rust IPLD does the same and I'd like to make sure the both are actually doing the same thing/all implementers can just look at the schema, implement it and are good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, well, 🤷‍♂, I think this might be good to go but it's a bit hard to tell for sure since concepts are not lining up with crystal clarity

@rvagg
Copy link
Member Author

rvagg commented Mar 10, 2020

reaching the limits of my protobof knowledge, @vmx are you any deeper on this stuff?

@warpfork warpfork force-pushed the rvagg/dag-pb-schema branch from 207d417 to 7ea8df6 Compare March 10, 2020 10:14
@warpfork
Copy link
Contributor

I pushed another commit to the branch with a little more prose. It added section headers to separate description of the serial format and the logical format into their own topics. While I had it open, I also poked at the introduction of the Pathing section a bit, to try clarify that it's one of several mechanisms, and nonstandard compared to other parts of IPLD.

@rvagg you may want to take a look at that and make sure I didn't make a complete mess of it.

@rvagg
Copy link
Member Author

rvagg commented Mar 13, 2020

I've pushed some edits here: took away optional on the Hash and made the names start with uppercase characters. I think I'm synthesizing the conversation here so far but please take a look and let me know if you're unhappy with the state, otherwise I'll assume silence as agreement.

@vmx
Copy link
Member

vmx commented May 26, 2020

@rvagg sorry for screwing up this PR's log. I thought I merge master into it and things will be fine. Somehow there was still a conflict. So I ended pushing a proper rebase, hence the force push. I just updated it to use DAG-PB instead of DagPB to be consistent.

This PR has been open for long and I haven't seen any objections, so I think it's good to go.

@rvagg rvagg force-pushed the rvagg/dag-pb-schema branch from f22bef7 to b94e308 Compare May 27, 2020 03:47
rvagg and others added 2 commits May 27, 2020 13:48
…e pathing

Add section headers for serial and logical format separately.  Retitled pathing section to clarify that it's one of several mechanisms, and nonstandard compared to other parts of IPLD.
@rvagg rvagg force-pushed the rvagg/dag-pb-schema branch from b94e308 to d13bf7d Compare May 27, 2020 03:48
@rvagg rvagg merged commit 81dcd4c into master May 27, 2020
@rvagg rvagg deleted the rvagg/dag-pb-schema branch May 27, 2020 03:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Describe DagPB also as IPLD Schema
6 participants