Draft: base protocol with merkle trees and new hash algorithms #59

the8472 · 2017-02-26T15:19:45Z

For discussion. For now it's a diff against BEP3. Later I can rebase and move it to a new BEP.

Changes so far:

files are piece-aligned with implicit holes
per file merkle trees
merkle leaves are always 16KiB
piece hashes (now part of the merkle tree) are moved to the root dictionary + metadata exchange extension to cover those
directory-trie instead of flat file list
sha2-256
version field for future hash upgrades
btmh magnets using multihash format instead of btih
hybrid torrents as upgrade path
adopted reject message + state machine from fast extensions

Note that things that need super-fast torrent startup do not need to concern themselves with the bulky hash list in the torrent root dictionary. Those things can be handled by extending BEP9 to send those hashes piecemeal after obtaining the more lightweight info dictionary.

Also note I'm not using tr_hashpiece logic since I think the transfer of hashes should be decoupled from the transfer of payload. This is important for some use-cases as discussed in issue #29
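To illustrate the "btmh magnets using multihash format" item, here is a minimal sketch of building the `xt` value. It assumes SHA2-256 as the hash function, with multihash function code 0x12 and digest length 0x20, and assumes the multihash is hex-encoded in the URN. The function name `btmh_urn` is hypothetical.

```python
# Multihash prefix for SHA2-256: function code 0x12, digest length 0x20 (32 bytes).
SHA2_256_MULTIHASH_PREFIX = bytes([0x12, 0x20])


def btmh_urn(info_hash_v2: bytes) -> str:
    """Build a hypothetical btmh xt value from a 32-byte SHA-256 info hash."""
    if len(info_hash_v2) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    # The multihash is the prefix followed by the raw digest, hex-encoded here.
    return "urn:btmh:" + (SHA2_256_MULTIHASH_PREFIX + info_hash_v2).hex()
```

Unlike a bare btih hex string, the leading `1220` identifies the function and digest length. A later hash upgrade would therefore change the prefix rather than require a new URN namespace.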

the8472 · 2017-02-26T15:20:39Z

Related discussion: #58

arvidn · 2017-02-26T21:35:32Z

beps/bep_0003.rst

+  A list of strings. Each string consists of concatenated hashes
+  of an intermediate merkle tree layer for each file. The layer is chosen so that
+  one hash represents one piece. For example if a piece size of 128KiB is used
+  then 3rd layer up from the leaf hashes is used.


This paragraph implies that the leaf size is 16 KiB; perhaps that should be stated explicitly.

It is stated further down in the pieces root definition

arvidn · 2017-02-26T21:36:58Z

beps/bep_0003.rst

@@ -87,70 +92,115 @@ announce

 info
  This maps to a dictionary, with keys described below.
+
+``piece layer``


I interpret this to be a list where each entry represents a file, and the content of each layer is the full tree (except truncated at the piece size). I would expect the key to be called something like "merkle trees" or at least plural "piece layers"

Yes, each list entry represents one file.

No, it does not represent full trees. I'll have to improve that part. It is meant to be only one level of the tree where the data size covered by the hash is equal to the piece length. so if piece length = 16KiB you get leaf hashes, if it's 32KiB then you get 1 level up, if it's 64KiB you get two levels up etc.

I would prefer always having the 16KiB leaves in there but some users today scale their piece length to keep the .torrent file very small, so I assume they would dislike the potentially massive increase in torrent sizes.
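The layer-selection rule above (16 KiB pieces give the leaf hashes, 32 KiB one level up, 64 KiB two levels up) is a base-2 logarithm of the piece length in 16 KiB blocks. The short sketch below assumes the piece length is a power of two no smaller than 16 KiB. The function name is hypothetical.

```python
BLOCK_SIZE = 16 * 1024  # merkle leaves always cover 16 KiB


def piece_layer_height(piece_length: int) -> int:
    """Number of layers above the leaf hashes at which one hash covers one piece."""
    if piece_length < BLOCK_SIZE or piece_length & (piece_length - 1):
        raise ValueError("piece length must be a power of two >= 16 KiB")
    # 16 KiB -> 1 block -> height 0, 32 KiB -> 2 blocks -> height 1, ...
    return (piece_length // BLOCK_SIZE).bit_length() - 1
```

For example, 128 KiB pieces give height 3, matching "the 3rd layer up from the leaf hashes" in the quoted diff.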

(except truncated at the piece size).

Oh yeah, right. Exactly. I can rename it.

Just to clarify once more: It's just one level of the tree for each file. Not the levels above - they can be derived if needed - or the levels below - they would take up too much space.

ah, right, of course

arvidn · 2017-02-26T21:57:46Z

beps/bep_0003.rst

-appear in the files list. The files list is the value
-``files`` maps to, and is a list of dictionaries containing
-the following keys:
+``length``


the distinction between multi-file and single-file torrents is also a frequent source of issues (as the typical client behavior is not necessarily always obvious). But it may not be worth trying to unify these in this proposal

Unification would be possible with a few awkward compromises wrt. legacy compatibility. single-file torrents would have to be represented as multifile ones with a single entry and we'd have to make some "how to interpret the directory layout" rules.

yeah. on the other hand, one could argue that those rules of how to interpret the directory structure for multi-file torrents are already there (just not documented and kind of de-facto). Sometimes people create single-file torrents using the multi-file structure, and have empty torrent "name" fields. To handle that, I put the hex encoded info-hash as the name, and then users get confused.

I hate 'digest func' being required. There should be an assumed default value with the key being a marker so clients can cleanly report that they don't support a new algorithm. I don't want to get into a discussion of what that default should be in this thread, but it should be only one thing.

arvidn · 2017-02-26T21:58:39Z

beps/bep_0003.rst

+``length``
+  Length of the file in bytes.
+
+``pieces root``


I wonder if the word "pieces" may be a bit misleading here, since the tree doesn't (necessarily) terminate at the piece level

I'm not married to any of the names. "block tree root", "block root", "tree root" all work. I would avoid "root hash" since it's already used in BEP 30.

arvidn · 2017-02-26T22:03:14Z

beps/bep_0003.rst

+``pieces root``
+  The root hash of a merkle tree with a branching factor of 2,
+  constructed from 16KiB blocks of the file.
+  The last block of the file may be smaller than 16KiB.


It may be worth being explicit here. The end piece of a file, should it be (1) padded with zeroes for the purposes of calculating the hash, or (2) should the hash be calculated on the truncated byte range?
I imagine the benefit of (1) is that it creates more coherent pieces whereas (2) may be slightly more efficient

I have no strong opinion in either direction. As written it currently is intended to be (2).

regarding easing upgrades, zero-padding pieces at the end of a file may provide some benefits.

I have a feeling that (1) would be simpler to implement, as it would preserve the "dense", fixed size pieces for purposes of hashing. It would mean a backwards compatible torrent would need to pad the last file as well. Without any implementation experience, I can't back this up other than with a gut feeling

Padding with zeros avoids a weird layer violation. It isn't particularly expensive because the hash value of 16KiB of zeros can be cached. The extra cost is that file sizes are rounded up to a multiple of 16KiB. I could go either way on this one.
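To show why padding is cheap, this sketch computes a per-file root over 16 KiB leaves and caches the hash of each all-padding subtree. It does not follow the data zero-padding debated in this thread. Instead it assumes the convention of the published BEP 52, as we read it: the last block is hashed at its real length, and leaf hashes beyond the end of the file are 32 zero bytes. The function names are hypothetical.

```python
import functools
from hashlib import sha256

BLOCK_SIZE = 16 * 1024
ZERO_HASH = bytes(32)  # leaf hash assumed for leaves beyond the end of the file


@functools.lru_cache(maxsize=None)
def pad_hash(height: int) -> bytes:
    """Root of a subtree of the given height whose leaves are all ZERO_HASH.
    It depends only on the height, so it is computed once and cached."""
    h = ZERO_HASH
    for _ in range(height):
        h = sha256(h + h).digest()
    return h


def merkle_root(data: bytes) -> bytes:
    """Root of a binary merkle tree over 16 KiB blocks of a non-empty file,
    equivalent to padding the leaves to the next power of two."""
    if not data:
        raise ValueError("empty files have no merkle root")
    nodes = [sha256(data[i:i + BLOCK_SIZE]).digest()
             for i in range(0, len(data), BLOCK_SIZE)]
    height = 0
    while len(nodes) > 1:
        if len(nodes) % 2:
            # The missing right sibling is an all-padding subtree: use the cached value.
            nodes.append(pad_hash(height))
        nodes = [sha256(nodes[i] + nodes[i + 1]).digest()
                 for i in range(0, len(nodes), 2)]
        height += 1
    return nodes[0]
```

Only one extra hash per layer is needed at the ragged right edge, regardless of how much padding the tree logically contains.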

The padding is already in the current revision; GH should be showing this conversation as outdated for that reason.

Although it's only padded to the nearest multiple of 16KiB.

It isn't particularly expensive because the hash value of 16KiB of zeros can be cached.

You mean that the remaining leaves of the merkle tree should be derived from 16KiB of zeroes too, instead of just initializing the leaf hashes to zero?

Sorry, I misunderstood; I thought the padding went to the end of the piece. There's some benefit to end-of-piece padding: if multiple peers are all downloading just one file, they won't have to download extra data to complete pieces before sending them to each other.

Well, files are piece-aligned now. But the holes created by that alignment can be larger than 16KiB if the piece size is larger than 16KiB. The hashing only rounds up to the nearest 16KiB boundary so implementations can use fixed-sized buffers when hashing.

In other words piece-padding != hash-padding
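To make "piece-padding != hash-padding" concrete, this small sketch computes both quantities as this comment describes them. Files are piece-aligned, so the implicit hole runs to the next piece boundary, while hashing only rounds up to the next 16 KiB block. The function names are hypothetical.

```python
BLOCK_SIZE = 16 * 1024


def piece_padding(file_length: int, piece_length: int) -> int:
    """Bytes of implicit hole after a file so the next file starts on a piece boundary."""
    return -file_length % piece_length


def hash_padding(file_length: int) -> int:
    """Bytes needed to round a file up to the next 16 KiB block for hashing."""
    return -file_length % BLOCK_SIZE
```

For a 20,000-byte file with 64 KiB pieces, the piece-alignment hole is 45,536 bytes, but hashing only rounds up by 12,768 bytes to 32 KiB. That lets an implementation hash with a fixed 16 KiB buffer.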

arvidn · 2017-02-26T22:13:08Z

beps/bep_0003.rst

+
+  Each dictionary contains
+
+  ``path``


I'm a little bit worried about how flexible this structure is. I take it the file a/b/c/foo.txt could be encoded in several different ways:

{ "path": ["a", "b", "c", "foo.txt"], "files": {"length": ..., "pieces root": "..." } }

as well as:

{ "path": ["a"], "files": { "path": ["b"], "files": { "path": ["c"], "files": {"path": ["foo.txt"], "length": ..., "pieces root": "..." } } } }

I appreciate the value of being able to represent paths in more compact ways than to require a separate dictionary at each level though. A strawman alternative could be:

{ "paths": [["a"], ["b"], ["c"], ...], "files": [ { "name": "foo.txt", "path": 0, ...}] }

i.e. store the paths in a separate list and have files specify the index to which directory they are in.

The first case is illegal since path is mandatory, including in leaves.

The second case would be the norm if those entries had siblings; otherwise it should be collapsed to path: ["a", "b", "c", "foo.txt"]

The problem with the last case is that it could not be made into a hybrid torrent that is backwards-compatible with BEP3. It would also be less efficient, since it still would not encode shared prefixes as efficiently as a trie does.

I see. It was not obvious to me that this field was mandatory, and specifically must be present for each leaf (which I think should be clarified). My concern remains though, of having many different ways of encoding the same thing (I consider it to be in the same class of issues as overlong utf-8 encodings).

Your point about this structure supporting coexisting with the current "files" structure is a good point, and I think it should be documented as well (perhaps I just missed it).

As for reusing path prefixes, yes. However, there's a fair amount of overhead in the dictionary keys in the directory tree in this proposal too, you would need a fair amount of path string reuse to "break even". I wonder if there's a more compact (and perhaps simpler) way to represent directory trees.

I wonder if there's a more compact (and perhaps simpler) way to represent directory trees.

Yes, using dictionary keys as path elements. It would also have the neat property of enforcing uniqueness.

We can do that, but then we'll have to encode it twice for backwards-compatible torrents. I'm not sure what is better, compromising in the new format or compromising how to achieve backwards compatibility.

(perhaps I just missed it).

No, it's part of the TODO

I don't think there's much gain from compromising the new format to make it look more like the old format because it's breaking backwards compatibility anyways. We might as well define two different formats: A backwards compatible format which is the same as BEP3 except with the pieces root key added, and a new trie based format. Only one of them would be present in a torrent.

I think that would be a bad idea, since an implementation that only intends to handle the new format would still have to be able to parse the old one, because there could be old-format-but-with-root-hash torrents.
I.e. such a hybrid format would not actually be forward-compatible. We would be creating yet another legacy form.

So I only see two possible approaches for each change

a) have any new field mimic the old form in some way that at least a subset of the allowed values can be used in a hybrid format. this is what I've done with the files, length, piece length and name field
b) let the new format use a different key and encode the information twice. this is what I would do with pieces vs. pieces root + pieces layers which essentially is redundant information which can exist side by side.

So if we want a completely new files format, hybrid torrents will also have to encode it twice.
Now that I think about it, it's probably not so bad, since for many (even if not all) torrents the pieces make up the bulk of the size anyway, so we might as well duplicate that too.

A hybrid format would then have files and pieces existing alongside piece layers, pieces root and a file tree, or something like that.

arvidn · 2017-02-26T22:17:38Z

beps/bep_0003.rst

+multi-file
+----------
+
+``files``


Back in the day, the discussions about how torrents could be made more deterministic (i.e. more likely to generate the same info-hash when the same files are turned into torrents independently) included a few things that may be worth considering with a compatibility break:

requiring files to be listed in a deterministic order

for the purposes of the info-hash, only refer to files by their hash, and not their name (in this case it could be the root hash)

I suppose perhaps the approach people would prefer is to identify duplicate files across torrents by their hash separately

The per-file-tree with fixed block sizes should forever solve the dedup problem.
Putting the file information outside the info dict calculation would be possible, but that would make bep9 more complex since magnets also have to convey file naming information, not just piece payload.

arvidn · 2017-02-26T22:21:28Z

beps/bep_0003.rst

+* pieces field
+* double announce behavior
+* safe hashing. avoid downgrade attacks
+* changes to BEP 9. magnets. send merkle layers.


interesting. I imagine the main value from merkle trees is to extend the PIECE message to include uncle-hashes (like the tribler protocol does). But I suppose extending BEP 9 may provide a simpler upgrade path.

Yes, as I said in the initial comment, this is intentional to decouple hash transfer from pieces transfer. The tribler approach is great for one thing, terrible for everything else.

arvidn · 2017-02-26T22:27:15Z

One use case that per-file hash trees makes worse than current bittorrent is having lots of small files. In the current protocol a lot of files may fit in a single piece, meaning they only add one hash to the .torrent file. With per-file merkle trees, each file will add a hash, which could make the info dictionary significantly larger.

Since the filenames are likely to also use a fair amount of space in the info dictionary, this is a problem currently as well. No obvious solutions to this come to mind though (short of having built-in support for tar files)
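A back-of-the-envelope sketch of the small-files concern, counting only raw hash bytes. It assumes 20-byte SHA-1 piece hashes over the concatenated v1 payload and one 32-byte root per non-empty v2 file. Bencoding overhead, file names and the v2 piece layers are ignored, and the function names are hypothetical.

```python
SHA1_LEN = 20    # bytes per v1 piece hash
SHA256_LEN = 32  # bytes per v2 per-file root


def v1_piece_hash_bytes(file_sizes, piece_length):
    """v1 hashes the concatenation of all files, so small files share pieces."""
    total = sum(file_sizes)
    return -(-total // piece_length) * SHA1_LEN  # ceiling division


def v2_root_bytes(file_sizes):
    """v2 stores one root per non-empty file, whatever its size."""
    return sum(SHA256_LEN for size in file_sizes if size > 0)
```

With 10,000 files of 1 KiB each and 256 KiB pieces, v1 needs 40 piece hashes (800 bytes), while v2 needs 10,000 roots (320,000 bytes). The counter-argument is that a more compact directory encoding can offset part of this.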

the8472 · 2017-02-26T22:33:15Z

I think the cost of additional hashes is paid for by making the directory layout more efficient, which usually is several layers deep for torrents with >10k files.

the8472 · 2017-02-27T18:36:07Z

New version with the dictionary based path trie. Since no operating system permits empty path elements I've used that to specify path properties. It could even be used to specify per-directory properties by other BEPs.
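A sketch of how a flat file list maps onto a dictionary-keyed path trie with the empty key holding file properties. It assumes the property keys `length` and `pieces root` from this draft and omits the root for empty files, as discussed later in the thread. The names `build_file_tree` and `iter_files` are hypothetical. Because path elements become dictionary keys, duplicate paths and file/directory name clashes are rejected by construction.

```python
def build_file_tree(files):
    """files: iterable of (path_elements, length, pieces_root).
    Returns a nested dict keyed by path elements; the "" key holds file properties."""
    tree = {}
    for path, length, root in files:
        if not path or any(p == "" for p in path):
            raise ValueError("path elements must be non-empty")
        node = tree
        for element in path[:-1]:
            node = node.setdefault(element, {})
            if "" in node:
                raise ValueError("a file and a directory share a name")
        leaf = node.setdefault(path[-1], {})
        if leaf:  # already a file or a directory at this path
            raise ValueError("duplicate path")
        props = {"length": length}
        if length > 0:
            props["pieces root"] = root
        leaf[""] = props
    return tree


def iter_files(tree, prefix=()):
    """Yield (path_tuple, properties) in sorted key order."""
    for name, child in sorted(tree.items()):
        if name == "":
            continue
        if "" in child:
            yield prefix + (name,), child[""]
        else:
            yield from iter_files(child, prefix + (name,))
```

Shared directory prefixes such as `a/b/c` are stored once, no matter how many files live under them.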

the8472 · 2017-02-28T20:32:31Z

I have replaced the hash agility with a version field. Which actual function will be used is still open since discussion in #58 is ongoing. I also added the reject message and associated state machine from the fast extension. Also added is emphasis that the piece layers are essential to functionality and thus must be verified when parsing a torrent.

ssiloti · 2017-03-01T23:20:09Z

beps/bep_0003.rst

+  For example if the piece size is 16KiB then the leaf hashes are used.
+  If a piece size of 128KiB is used then 3rd layer up from the leaf hashes is used.
+  Files smaller or equal to the piece size are represented by an empty string since
+  the root hash is sufficient to cover the piece.


We should specify how nodes which only contain leaves beyond the end of the file are handled here. The obvious choice would be to exclude them from the list.

good point, added.
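A sketch of building one file's piece layer with the rule just added: nodes made only of leaves beyond the end of the file are excluded. It assumes a power-of-two piece length of at least 16 KiB and 32 zero bytes as the hash of a missing leaf inside the last piece, which is our reading of the published BEP 52. Files not larger than one piece get no entry. The function names are hypothetical.

```python
from hashlib import sha256

BLOCK_SIZE = 16 * 1024
ZERO_HASH = bytes(32)


def _subtree_root(leaves, width):
    """Root over `leaves`, padded with ZERO_HASH to `width` (a power of two)."""
    nodes = leaves + [ZERO_HASH] * (width - len(leaves))
    while len(nodes) > 1:
        nodes = [sha256(nodes[i] + nodes[i + 1]).digest() for i in range(0, len(nodes), 2)]
    return nodes[0]


def piece_layer(data: bytes, piece_length: int):
    """Concatenated piece-layer hashes for one file, or None if the file is not
    larger than one piece (its pieces root already covers it)."""
    if len(data) <= piece_length:
        return None
    blocks_per_piece = piece_length // BLOCK_SIZE
    leaves = [sha256(data[i:i + BLOCK_SIZE]).digest() for i in range(0, len(data), BLOCK_SIZE)]
    # One hash per piece that contains file data; nothing is emitted for
    # subtrees lying wholly beyond the end of the file.
    return b"".join(_subtree_root(leaves[i:i + blocks_per_piece], blocks_per_piece)
                    for i in range(0, len(leaves), blocks_per_piece))
```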

the8472 · 2017-03-01T23:49:46Z

added a section how to create and handle hybrid torrents
extended BEP9 messages to handle merkle tree layer exchange

bramcohen · 2017-03-20T22:19:30Z

beps/bep_0003.rst

@@ -87,70 +92,153 @@ announce

 info
  This maps to a dictionary, with keys described below.
+
+``piece layers``


A high level comment on this overall structure: There's going to be an overlap in time where there are torrent files which support both new and old style data, and new style clients can connect to either type, then after a transition the old-style data should be dropped. A bit of redundancy between the old and new data isn't terribly problematic. What I'd like to see is that the new-style data all be included in a bencoded string under a single key in the info dict, so that its value is committed to by the old style data, and after a transition all other keys in the info dict are dropped. When new peers connect to each other they should specify the new algorithm hash of just the new data and communicate based on that, and they'll need to use a slightly extended peer protocol to be able to communicate paths to the root of individual chunks.

I mostly agree with the new keys as presented here, but think they should be moved into the new dict.

What is the underlying motivation for that? To be able to transform a hybrid torrent to the new format exclusively without changing the new hash? I guess that's possible, but it would prevent peers that obtained the torrent the new way to interoperate with old peers even if the torrent itself was created in a hybrid format.

But forever lugging around the legacy data on hybrid torrents created during the transition phase is not ideal either.

Hrm, I'll have to think about it.

Handshaking can be done by using one of the handshake bits to specify support for the new format. If both sides handle the new format then the receiving side can send the hash of just the new-style data, even though the first hash sent by the initiating side is still an old-style sha1.

Handshaking is unrelated. Clients have to do 2 announces / dht lookups anyway since the hashes change, so they know which peer supports which format. The issue is the chain-of-trust from the hash to the hybrid data. If the new hash is only derived from the new data but not from the hybrid format then metadata sourced from the new hash can't be used to talk to old clients since the old data cannot be obtained due to a lack of trust.

The client can get disjoint sets of peers from each announce though, so it may receive a peer only under the legacy infohash when it does in fact support both formats. The client would only discover the peer's support for the new format during the handshake, so we need a flag to signal that.

Hrm, yeah, that wasn't what I was aiming for, but it's a good point. If both have hybrid torrents and know both infohashes then that can be used to upgrade the connection.

bramcohen · 2017-03-20T23:34:49Z

It's okay to trust the old-style metadata for now. Sha1 isn't that badly broken. The goal should be to finish this transition before such concerns become serious.

There is a problem with not having paths to the root for pieces downloaded from the old peers. It may be that putting a new style piece hash list outside the info dict (which I think is what you're proposing) is the best thing to do for that, but it kind of sucks having torrent files be so bloated in the interim.

the8472 · 2017-03-21T00:00:11Z

It looks like we have not reached common understanding yet. If I understand this comment correctly you propose that the new format be structured as follows:

{info: {newformat: {...}}} and the new infohash is derived by hashing over torrent['info']['newformat']. This would mean that a client has no way to establish which old infohash corresponds to a new infohash if it only got the new one because they're not hashing the same thing. And when performing a metadata exchange they would only get the new format. That means they cannot talk to old clients. It means you can upgrade, but you cannot downgrade, even if the torrent was originally created with both sets of data.

It may be that putting a new style piece hash list outside the info dict (which I think is what you're proposing)

That is indeed what I am proposing, but that is mostly unrelated to the transition. It is supposed to be a permanent feature of the new format. I have moved them out of the info dictionary so that use cases that need fast startup can use the (now modified) metadata exchange to obtain just the file list with root hashes, without the lower layers, while other use cases that require more fine-grained hash information (deduplication, resuming of partial downloads, interoperation with other protocols such as webseeds) can get the full set.

In short: .torrent files contain fine-grained hashes, metadata exchange initially only transfers the lightweight info dict with per-file roots, and fine-grained hashes can be obtained separately.

see also: the8472@e25fa15

bramcohen · 2017-03-21T00:06:43Z

Oh I see what you're saying now. That is indeed what I proposed, but I don't have a strong objection to making the new-style hash be over the whole info dict, because once the old-style data is dropped canonical representation isn't a real problem any more because the dict only has one thing in it.

Having a new-style piece list outside the info dict is perfectly fine but it should be optional. A torrent which has only support for new style peers doesn't need it, and getting rid of that bloat matters in some use cases.

the8472 · 2017-03-21T00:10:36Z

Having a new-style piece list outside the info dict is perfectly fine but it should be optional.

I have designed the .torrent to be the slow path that contains all the data. This ensures that the data is retained for those who need it. Making it optional would mean it could get lost in a game of telephone. Clients that need the fast path should just use the hash or magnet + metadata exchange.

the8472 · 2017-03-21T00:20:25Z

In fact the data is essential for the use with the core protocol. There is no other way to obtain these hashes short of rehashing a complete file.

Metadata exchange allows them to be transferred incrementally, but it is an extension. A barebones client not supporting it will need the fine-grained hashes in the .torrent file.

the8472 · 2017-05-14T14:23:44Z

Imo it is ready for merging now. Any issues discovered during the first implementations can be addressed in separate PRs.

ssiloti · 2017-05-15T16:58:34Z

beps/bep_0052_torrent_creator.py

-
+
+    def info_hash_v2(self):
+        return binascii.hexlify(sha256(encode(self.info)).digest()).decode('ascii')


there's no need to use binascii here, just call hexdigest() instead of digest()
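Following this review comment, here is a sketch of the info-hash computations using `hexdigest()` directly. It includes a minimal bencoder so it runs on its own; this stand-in is not the `encode` function the reference script imports. For a hybrid torrent, both digests are taken over the same bencoded info dictionary, as in the design discussed above.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder for ints, str/bytes, lists and dicts (keys sorted as raw bytes)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(((k.encode("utf-8") if isinstance(k, str) else k, v)
                        for k, v in value.items()), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash_v1(info: dict) -> str:
    """Legacy SHA-1 info hash (hybrid torrents only)."""
    return hashlib.sha1(bencode(info)).hexdigest()


def info_hash_v2(info: dict) -> str:
    """SHA-256 info hash; hexdigest() replaces binascii.hexlify(...digest())."""
    return hashlib.sha256(bencode(info)).hexdigest()
```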

ssiloti · 2017-05-15T17:00:16Z

beps/bep_0052_torrent_creator.py

@@ -243,3 +253,7 @@ def create(self, tracker, hybrid=True):
   args = parser.parse_args()
   t = Torrent(args.path, args.piece_length)
   open(t.name + '.torrent', 'wb').write(encode(t.create(args.tracker, args.v2_only)))
+   if args.v2_only:


There needs to be a not here.

Wait no it's correct as it is. Clearly the naming here is not the best.

jzelinskie · 2017-07-07T18:08:40Z

I know it's a bit late, but is there a chance we can get BEP48 merged into this specification, too?
Scraping is pretty standard fare at this point and it's already documented in the UDP protocol spec. It seems to make sense for it to be officially built in at this point.

the8472 · 2017-07-08T12:16:08Z

Scrapes are not essential for getting bittorrent to work. BEP3/52 basically describe minimum implementations.

atomashpolskiy · 2017-08-18T13:19:07Z

Hi guys! I'm about to begin implementing the v2 changes and am currently reading the spec. I'm having a hard time with some of the wording though, and would be really grateful if someone could help clarify.

piece layers
A dictionary of strings. For each file in the file tree that is larger than the piece size it contains one string value. The keys are the merkle roots...

What about small files (smaller than or of the same size as the piece size)? Do they still have a key-value pair in piece layers with the value being an empty string, or are they omitted completely?

pieces root
<optional, merkle tree root (string)>
For non-empty files this is the the root hash of a merkle tree with a branching factor of 2, constructed from 16KiB blocks of the file...
...identical files always result in the same root hash.

I understand that hashing is not applicable to empty files, so they don't have the pieces root attribute in the file tree. Does not look like it can be omitted in any other circumstances (as long as the data is expected to be verified), so it might help using a clearer wording, like:

<merkle tree root (string); omitted for empty files>

Being more explicit about the interconnection of new concepts would also be beneficial, i.e. stating that pieces root is used for:

matching the file against piece layers, unless the file is smaller than or of the same size as the piece size
requesting hashes from peers, when the client does not have the .torrent file and piece layers

When verifying an infohash implementations must also check that the piece layers hashes outside the info dictionary match the pieces root fields.

If small files are omitted from the piece layers, then the matching is non-trivial and should be described in the spec, i.e.:

each file in the file tree that is larger than the piece size can be matched by its pieces root value to a key in the piece layers
files that are smaller than or of the same size as the piece size don't have matching entries in piece layers
(optionally) piece layers does not have keys that can't be matched to one or more files in the file tree

Also looks to me like the piece layers may as well be absent from the .torrent file, and a proper implementation should still be able to handle it by resorting to asking peers for hashes...
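The matching rules listed above can be checked mechanically. The sketch below verifies that a file's piece-layer string hashes up to its pieces root. The last layer is padded with the root of an all-zero subtree of piece height, assuming 32 zero bytes per missing leaf as in the published BEP 52. Applying the optional "no unmatched keys" rule strictly is our choice, and all names are hypothetical.

```python
from hashlib import sha256

BLOCK_SIZE = 16 * 1024
HASH_LEN = 32


def pad_hash(height):
    """Root of an all-zero-leaf subtree of the given height."""
    h = bytes(HASH_LEN)
    for _ in range(height):
        h = sha256(h + h).digest()
    return h


def verify_piece_layer(layer: bytes, pieces_root: bytes, piece_length: int) -> bool:
    """Check that a file's piece-layer string hashes up to its pieces root."""
    if len(layer) % HASH_LEN or len(layer) < 2 * HASH_LEN:
        return False  # files not larger than one piece have no entry at all
    nodes = [layer[i:i + HASH_LEN] for i in range(0, len(layer), HASH_LEN)]
    height = (piece_length // BLOCK_SIZE).bit_length() - 1  # piece layer height
    while len(nodes) > 1:
        if len(nodes) % 2:
            nodes.append(pad_hash(height))  # sibling lying wholly beyond end of file
        nodes = [sha256(nodes[i] + nodes[i + 1]).digest() for i in range(0, len(nodes), 2)]
        height += 1
    return nodes[0] == pieces_root


def check_piece_layers(piece_layers, files, piece_length) -> bool:
    """piece_layers: {pieces_root: layer}; files: iterable of (length, pieces_root)."""
    lengths = {root: length for length, root in files if length > piece_length}
    if set(piece_layers) != set(lengths):  # missing or unmatched keys
        return False
    for root, layer in piece_layers.items():
        pieces = -(-lengths[root] // piece_length)
        if len(layer) != pieces * HASH_LEN or not verify_piece_layer(layer, root, piece_length):
            return False
    return True
```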

the8472 · 2017-08-18T13:48:58Z

What about small files (smaller than or of the same size as the piece size)? Do they still have a key-value pair in piece layers with the value being an empty string, or are they omitted completely?

Omitted completely. There is nothing to include if the merkle tree only has a single node.

I understand that hashing is not applicable to empty files, so they don't have the pieces root attribute in the file tree. Does not look like it can be omitted in any other circumstances (as long as the data is expected to be verified), so it might help using a clearer wording, like:

In principle the empty-keyed descriptor dictionary is also allowed for directories, which neither have length nor merkle root, although that possibility currently is not used. That's why it is phrased in a positive manner ("non-empty files") instead of attempting to exclude cases.

If small files are omitted from the piece layers, then the matching is non-trivial and should be described in the spec, i.e.:

This too is formulated in a positive manner, listing a single case, instead of enumerating all the possible cases to exclude: For each file in the file tree that is larger than the piece size it contains one string value.

You could also put it this way: piece layers only contains entries when the merkle tree has other nodes than the root node.

What makes it non-trivial?

Also looks to me like the piece layers may as well be absent from the .torrent file, and a proper implementation should still be able to handle it by resorting to asking peers for hashes...

That's what magnets are for. A .torrent is only in a fully valid state once the piece layers have been included. Those are necessary for partial resumes and stateless torrent clients. So I want to make sure that people don't just go around omitting them because they don't see the use-cases.

That said, I know spec wording can be a bit obtuse. But having written it I am blind on that front, so I need strong arguments why something is confusing or can be misunderstood to be convinced.

If other people also have issues with it, please speak up!

the8472 · 2017-08-18T13:57:07Z

Correction. Since the piece layer can be several layers down in the merkle tree it's not about the merkle tree only having a root node, it's about the merkle tree having fewer layers than where the piece layer would be located.

merkle tree of a small file, in heap order:
R AA BBBB CCCCC000
merkle tree of a large file
R AA BBBB CCCCCCCC PPPPPPPPPPPPPPPP EEEEEEEEEEEEEEEEEEEEEEEEEE000000

P being the piece layer, 0 being beyond-end-of-file leaves.
As you can see there simply is nothing that could be included for the small file.
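The rule in this correction can also be stated with two heights, both counted up from the leaves. A file has a piece-layer entry only if its tree reaches above the piece layer, which for power-of-two piece lengths is the same as the file being longer than one piece. A sketch with hypothetical names, assuming leaves are padded to the next power of two:

```python
BLOCK_SIZE = 16 * 1024


def tree_height(file_length: int) -> int:
    """Layers above the leaves in the merkle tree of a non-empty file."""
    blocks = -(-file_length // BLOCK_SIZE)  # ceiling division
    return (blocks - 1).bit_length()        # leaf count rounded up to a power of two


def piece_layer_height(piece_length: int) -> int:
    return (piece_length // BLOCK_SIZE).bit_length() - 1


def has_piece_layer_entry(file_length: int, piece_length: int) -> bool:
    # If the piece layer is the root or lies above it, there is nothing to list.
    return tree_height(file_length) > piece_layer_height(piece_length)
```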

atomashpolskiy · 2017-08-18T14:17:05Z

Thanks, I agree that the matter of wording is very subjective, and it might as well be me lacking the required mental capacity to connect all the dots at once, so I'd prefer to see more hints and cross-references :)

That's what magnets are for. A .torrent is only in a fully valid state once the piece layers have been included. Those are necessary for partial resumes and stateless torrent clients.

This is an interesting point by the way. hash request description currently says:

'hash request' messages contain ..., base layer, ... . The base layer defines the lowest requested layer of the hash tree. It is the number of layers above the leaf layer that the hash list should start at. A value of zero indicates that leaf hashes are requested. Clients are only required to support setting the base layer to the leaf and piece layers.

I assume this means that aside from the piece hashes (i.e. piece layers) the client must also be able to serve leaf hashes. Given that leaf hashes are not explicitly specified anywhere, this implies that the client has to do one of the following:

persist leaf hashes somewhere and load them after the restart (N/A for stateless clients)
re-hash the existing files on startup
calculate the required hashes upon receiving a hash request (with optional caching)

Is my assumption correct? If yes, then I can't see why a stateless client can't calculate piece hashes in addition to leaf hashes. If my math is correct, it increases the complexity of required hashing by a constant factor in the range of [1.5; 2).

atomashpolskiy · 2017-08-18T14:26:35Z

Another question about hash request:

Index MUST be a multiple of length, this includes zero.

Shouldn't it be a multiple of length*2 ? Otherwise the client will be allowed to request hashes that are not siblings (from different subtrees), and uncle hashes will not be sufficient for verification.

the8472 · 2017-08-18T14:41:17Z

Is my assumption correct?

Kind of, but you're missing two details:

leaf hashes are only needed for realtime applications or maybe superseeding, so the need to create those hashes should be rare. piece hashes should make up the bulk of everything. operating on whole pieces is the encouraged way to do things.
stateless clients also must be able to find files on the filesystem, for that they need to rapidly scan thousands or millions of files, for that merkle roots are inadequate because they would need to hash the whole file instead of being able to randomly probe parts of the file

Shouldn't it be a multiple of length*2 ?

Length is already a power of two. So if your length is 4 you can only use 0, 4, 8, ... as index. That's already aligned to a single subtree, no? Can you construct a case where more than one uncle hash per layer is needed?

the8472 · 2017-08-18T15:11:50Z

You can also cross-check with the example implementation.

http://bittorrent.org/beps/bep_0052.html#metainfo-files

atomashpolskiy · 2017-08-18T15:12:45Z

Is streaming considered real-time? If yes, I can't see how real-time can be considered "rare".

As for the hashes.. what exactly stops me from requesting 2 hashes, starting with index 1? The receiver ought to make a sanity check

the8472 · 2017-08-18T15:20:35Z

Is streaming considered real-time? If yes, I can't see how real-time can be considered "rare".

Streaming clients can buffer in the general case. For startup they might want to use sub-piece hash checking to reduce the initial latency to playback. But again, that's just a transient use.

As for the hashes.. what exactly stops me from requesting 2 hashes, starting with index 1? The receiver ought to make a sanity check

Requesting 2 hashes means length = 2 in the request. Which means you can only request index 0, 2, 4, ...

Index MUST be a multiple of length, this includes zero. Length is the number of hashes to include from the base layer. Length MUST be equal-to-or-greater-than two and a power of two.
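The alignment argument can be checked mechanically. With `length` a power of two and `index` a multiple of it, the requested hashes are exactly the base-layer nodes of one subtree, so one uncle per remaining layer suffices to reach the root. A sketch; the function names and the `base_width` parameter (base-layer node count padded to a power of two) are hypothetical and are not fields of the hash request message.

```python
def validate_hash_request(index: int, length: int) -> None:
    """Apply the quoted constraints: length >= 2 and a power of two, index a multiple of length."""
    if length < 2 or length & (length - 1):
        raise ValueError("length must be a power of two >= 2")
    if index < 0 or index % length:
        raise ValueError("index must be a non-negative multiple of length")


def uncle_positions(index: int, length: int, base_width: int):
    """(height above base layer, position) of each uncle hash that links the
    requested range to the root; base_width must be a power of two."""
    validate_hash_request(index, length)
    if index + length > base_width:
        raise ValueError("range exceeds the base layer")
    pos = index // length               # the range is exactly one subtree at this height
    height = length.bit_length() - 1
    width = base_width // length
    uncles = []
    while width > 1:
        uncles.append((height, pos ^ 1))  # sibling of the current subtree root
        pos //= 2
        width //= 2
        height += 1
    return uncles
```

For example, requesting 2 hashes at index 4 from a base layer of 8 needs the sibling at height 1, position 3, then the one at height 2, position 0. A misaligned request such as index 1, length 2 is rejected before any hashes are looked up.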

atomashpolskiy · 2017-08-18T15:26:52Z

Oh, now I see it, my bad...)) I'm probably losing my sight

atomashpolskiy · 2017-08-18T20:01:42Z

You can also cross-check with the example implementation.
http://bittorrent.org/beps/bep_0052.html#metainfo-files

Just checked; it might be worth noting that when run under Python 2 for a single-file torrent, it formats piece layers as a list of strings instead of a single string value. This is confusing, given that for multi-file torrents it does not work at all due to some missing functions (so it's clear that Python 3 is intended).

the8472 · 2017-08-18T21:08:16Z

The shebang is #!/usr/bin/env python3. @ssiloti can we do something about that?

As for the rest, I guess there's some room for a few simple visualizations wrt. merkle tree layers. Maybe in the style of this comment

ssiloti · 2017-08-18T21:14:38Z

I just pushed a commit to add a 3 to the shebang so that it's clear the script is written for python 3. The change should propagate to bittorrent.org any minute now.

the8472 · 2017-08-18T21:26:55Z

Oh, heh.

arvidn · 2020-06-03T08:56:19Z

I've put up two test torrents, one v2-only and one hybrid, here:

https://libtorrent.org/bittorrent-v2-test.torrent
https://libtorrent.org/bittorrent-v2-hybrid-test.torrent
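A quick way to see the difference between the two test files is to decode them and inspect the info dictionary. The Python sketch below is a minimal, non-validating bdecoder plus a classifier. It assumes the BEP 52 convention that a v2 info dictionary carries `meta version` = 2 and a `file tree`, and that a hybrid torrent additionally keeps the v1 `pieces` key. The function names are hypothetical, and the files are assumed to have been downloaded locally.

```python
import sys


def bdecode(data: bytes):
    """Decode one complete bencoded value (minimal, non-validating)."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data: bytes, i: int):
    kind = data[i:i + 1]
    if kind == b"i":  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if kind in (b"l", b"d"):  # list or dictionary, terminated by 'e'
        items, i = [], i + 1
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        if kind == b"l":
            return items, i + 1
        return dict(zip(items[::2], items[1::2])), i + 1
    colon = data.index(b":", i)  # byte string: <length>:<bytes>
    start = colon + 1
    length = int(data[i:colon])
    return data[start:start + length], start + length


def torrent_kind(torrent: dict) -> str:
    # Hypothetical classifier based on which keys the info dict carries.
    info = torrent[b"info"]
    has_v2 = info.get(b"meta version") == 2 and b"file tree" in info
    has_v1 = b"pieces" in info
    if has_v1 and has_v2:
        return "hybrid"
    return "v2" if has_v2 else "v1" if has_v1 else "unknown"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(torrent_kind(bdecode(f.read())))
```

Running it with the path of a downloaded `.torrent` file as the only argument prints its classification.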

First Draft: base protocol with merkle trees and new hash algorithms (39670fb)

arvidn reviewed Feb 26, 2017

dictionary file trees, zero-paddeding when hashing (6c7b87a)
reject message, replace digest function field with version indicator (a56c940)

ssiloti reviewed Mar 1, 2017

the8472 added 3 commits March 2, 2017 00:43:
add metadata exchange for piece layers (e25fa15)
describe legacy/v2 hybrid format (8628126)
omit hashes which can be reconstructed (57f4f26)

the8472 added 3 commits March 2, 2017 18:12:
v2 and hybrid magnets (f7ef8ee)
choose hash algo, change piece layers to a dictionary (e7d2f7c)
terrible typo (5790658)

bramcohen reviewed Mar 20, 2017

the8472 added 3 commits May 14, 2017 15:38:
move v2 spec to new bep (6a0b817)
restore bep 0003 (bd77a55)
change status to draft (9a34522)

ssiloti reviewed May 15, 2017

from review: simplify hex output (9049d0c)

ssiloti merged commit 51fe877 into bittorrent:master May 15, 2017

the8472 mentioned this pull request Aug 7, 2017: Bittorrent v2 arvidn/libtorrent#2197 (Closed)

atomashpolskiy mentioned this pull request Aug 8, 2017: Adopt BitTorrent v2 specification atomashpolskiy/bt#28 (Open)

fasiha mentioned this pull request Jan 2, 2018: Would a v2-aware DHT allow individual file search? #77 (Closed)

jimmywarting mentioned this pull request May 16, 2019: Idea: WebSockets as an alternative to WebRTC webtorrent/webtorrent#1492 (Closed)

ssiloti mentioned this pull request Jul 15, 2021: Couldn't generate valid v2/hybrid torrent file from torrent_handle::torrent_file() arvidn/libtorrent#6283 (Closed)



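import binascii
from hashlib import sha256
def info_hash_v2(self):  # method in the example script; encode() is assumed to be its bencoder
    return binascii.hexlify(sha256(encode(self.info)).digest()).decode('ascii')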
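The snippet relies on the example script's own `encode` bencoder and torrent object. The self-contained variant below was written for this note rather than taken from the script. It pairs a minimal bencoder with the same SHA-256 computation. Assuming the multihash prefix `1220` (sha2-256, 32-byte digest) used by `btmh` magnets, it also builds the corresponding magnet URI. `bencode`, `info_hash_v2` and `btmh_magnet` here are hypothetical free functions, and the info dictionary is a toy value, not a valid torrent.

```python
import binascii
from hashlib import sha256


def bencode(value) -> bytes:
    """Minimal bencoder for int, str, bytes, list and dict values."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are encoded as byte strings and emitted in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash_v2(info: dict) -> str:
    # Same computation as the snippet: hex of SHA-256 over the bencoded info dict.
    return binascii.hexlify(sha256(bencode(info)).digest()).decode("ascii")


def btmh_magnet(info: dict) -> str:
    # Assumed multihash prefix: 0x12 = sha2-256, 0x20 = 32-byte digest.
    return "magnet:?xt=urn:btmh:1220" + info_hash_v2(info)


info = {"meta version": 2, "name": "example", "piece length": 16384, "file tree": {}}
print(info_hash_v2(info))
print(btmh_magnet(info))
```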


