Draft: base protocol with merkle trees and new hash algorithms #59

Merged: 35 commits, May 15, 2017
Commits
39670fb
First Draft: base protocol with merkle trees and new hash algorithms
the8472 Feb 26, 2017
6c7b87a
dictionary file trees, zero-paddeding when hashing
the8472 Feb 27, 2017
a56c940
reject message, replace digest function field with version indicator
the8472 Feb 28, 2017
e25fa15
add metadata exchange for piece layers
the8472 Mar 1, 2017
8628126
describe legacy/v2 hybrid format
the8472 Mar 1, 2017
57f4f26
omit hashes which can be reconstructed
the8472 Mar 1, 2017
f7ef8ee
v2 and hybrid magnets
the8472 Mar 2, 2017
e7d2f7c
choose hash algo, change piece layers to a dictionary
the8472 Mar 2, 2017
5790658
terrible typo
the8472 Mar 2, 2017
33077e2
add hash request messages to the core protocol
Mar 23, 2017
131b288
updates based on review feedback
Mar 24, 2017
cd1574a
Merge pull request #2 from bittorrent/hash-transfer
the8472 Mar 24, 2017
cfd7cd5
fix missing markup
the8472 Mar 26, 2017
48dfdcb
changes from reviews
the8472 Mar 26, 2017
b0634bc
Document client version field in DHT messages
Apr 28, 2017
369b2bc
add note about possible absence of the version key
May 1, 2017
6034458
Merge pull request #61 from bittorrent/dht-version-string
ssiloti May 2, 2017
940d1c6
regenerate html
May 2, 2017
a160be6
byte[] vs. String vs. path component clarifications
the8472 May 8, 2017
42501e5
minor fixes
the8472 May 9, 2017
5afecdc
relax path name restrictions
the8472 May 9, 2017
4c3f888
typo
the8472 May 9, 2017
4bdc9f6
add v2 torrent creation script
May 12, 2017
6c2e56d
go back to using the file length to filter for 'piece layers'
ssiloti May 13, 2017
655e385
include padding for the last file of a multi-file torrent
ssiloti May 13, 2017
96de012
normalize path before setting name
ssiloti May 13, 2017
055446e
set pad attribute on pad files
ssiloti May 13, 2017
7baf1c9
walk the directory tree in lexicographic order
ssiloti May 13, 2017
96aa178
Merge pull request #3 from bittorrent/new-hash-algos
the8472 May 14, 2017
47ed76d
add infohash output
the8472 May 14, 2017
e14a7a6
link torrent creator script
the8472 May 14, 2017
6a0b817
move v2 spec to new bep
the8472 May 14, 2017
bd77a55
restore bep 0003
the8472 May 14, 2017
9a34522
change status to draft
the8472 May 14, 2017
9049d0c
from review: simplify hex output
the8472 May 15, 2017
170 changes: 119 additions & 51 deletions beps/bep_0003.rst
@@ -1,4 +1,4 @@
:BEP: 3
:BEP: XX
:Title: The BitTorrent Protocol Specification
:Version: $Revision$
:Last-Modified: $Date$
@@ -19,6 +19,7 @@ file happen concurrently, the downloaders upload to each other, making
it possible for the file source to support very large numbers of
downloaders with only a modest increase in its load.

----------------------------------------------------------
A BitTorrent file distribution consists of these entities:
----------------------------------------------------------

@@ -31,6 +32,7 @@ A BitTorrent file distribution consists of these entities:

There are ideally many end users for a single file.

----------------------------------------------------------
To start serving, a host goes through the following steps:
----------------------------------------------------------

@@ -42,6 +44,7 @@ To start serving, a host goes through the following steps:
#. Link to the metainfo (.torrent) file from some other web page.
#. Start a downloader which already has the complete file (the 'origin').

------------------------------------------------
To start downloading, a user does the following:
------------------------------------------------

@@ -52,6 +55,7 @@ To start downloading, a user does the following:
#. Wait for download to complete.
#. Tell downloader to exit (it keeps uploading until this happens).

---------
bencoding
---------

@@ -76,6 +80,7 @@ bencoding
(sorted as raw strings, not alphanumerics).
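The key-ordering rule can be illustrated with a minimal encoder. This is a sketch rather than a reference implementation: the function name ``bencode`` is hypothetical, and text strings are assumed to be UTF-8 encoded before they are written out.

```python
def bencode(value):
    """Encode ints, strings, lists and dicts in bencoded form (illustrative sketch)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not part of bencoding")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        # Assumption: text is stored as UTF-8 bytes.
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        pairs = [
            (key.encode("utf-8") if isinstance(key, str) else key, item)
            for key, item in value.items()
        ]
        # Keys are compared as raw byte strings, not alphanumerically.
        pairs.sort(key=lambda pair: pair[0])
        body = b"".join(bencode(key) + bencode(item) for key, item in pairs)
        return b"d" + body + b"e"
    raise TypeError("unsupported type: %r" % type(value))
```

Because the comparison is byte-wise, an uppercase key such as ``Z`` sorts before a lowercase ``a``.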


--------------
metainfo files
--------------

@@ -87,70 +92,115 @@ announce

info
This maps to a dictionary, with keys described below.

``piece layer``
Contributor

I interpret this to be a list where each entry represents a file, and the content of each layer is the full tree (except truncated at the piece size). I would expect the key to be called something like "merkle trees" or at least plural "piece layers"

Contributor Author
@the8472 Feb 26, 2017

Yes, each list entry represents one file.

No, it does not represent full trees. I'll have to improve that part. It is meant to be only one level of the tree, the one where the data size covered by each hash equals the piece length. So if piece length = 16KiB you get leaf hashes, if it's 32KiB you get one level up, if it's 64KiB you get two levels up, etc.

I would prefer always having the 16KiB leaves in there but some users today scale their piece length to keep the .torrent file very small, so I assume they would dislike the potentially massive increase in torrent sizes.

Contributor Author

(except truncated at the piece size).

Oh yeah, right. Exactly. I can rename it.

Contributor Author

Just to clarify once more: It's just one level of the tree for each file. Not the levels above - they can be derived if needed - or the levels below - they would take up too much space.

Contributor

ah, right, of course

A list of strings. Each string consists of concatenated hashes
of an intermediate merkle tree layer for each file. The layer is chosen so that
one hash represents one piece. For example if a piece size of 128KiB is used
then the 3rd layer up from the leaf hashes is used.
Contributor

this paragraph implies that the leaf size is 16 kiB, perhaps that should be stated explicitly.

Contributor Author

It is stated further down in the pieces root definition

Files smaller or equal to the piece size are represented by an empty string.
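
The layer selection described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the 16KiB leaf size is taken from the ``pieces root`` definition, and the hash count assumes the layer stores only hashes that cover actual file data.

```python
BLOCK_SIZE = 16 * 1024  # leaf size from the ``pieces root`` definition


def layers_above_leaves(piece_length):
    """Number of tree levels between the leaf hashes and the piece layer."""
    is_power_of_two = (piece_length & (piece_length - 1)) == 0
    if piece_length < BLOCK_SIZE or not is_power_of_two:
        raise ValueError("piece length must be a power of two and at least 16KiB")
    # piece_length / BLOCK_SIZE is a power of two, so bit_length() - 1 is its log2.
    return (piece_length // BLOCK_SIZE).bit_length() - 1


def piece_layer_hash_count(file_length, piece_length):
    """Hashes stored for one file; 0 stands for the empty string."""
    if file_length <= piece_length:
        return 0
    # Assumption: one hash per piece that contains file data (ceiling division).
    return -(-file_length // piece_length)
```

With a 128KiB piece length, ``layers_above_leaves`` gives 3, matching the example in the text.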


All strings in a .torrent file that contains text must be UTF-8
encoded.

info dictionary
...............
===============

The ``name`` key maps to a UTF-8 encoded string which is the
suggested name to save the file (or directory) as. It is purely advisory.
``name``
a UTF-8 encoded string which is the suggested name to save the file (or directory) as.
It is purely advisory.

``piece length`` maps to the number of bytes in each piece
the file is split into. For the purposes of transfer, files are
split into fixed-size pieces which are all the same length except for
possibly the last one which may be truncated. ``piece
length`` is almost always a power of two, most commonly 2^18 =
256 K (BitTorrent prior to version 3.2 uses 2^20 = 1 M as
default).
``piece length``
the number of bytes that each logical piece in the peer protocol refers to.
I.e. it sets the granularity of ``piece``, ``request``, ``bitfield`` and ``have``
messages. It must be a power of two and at least 16KiB.

Files are mapped into this piece address space so that each non-empty file starts
at a piece boundary and occurs in the same order as in the file tree.
The last piece of each file may be shorter than the specified piece length.
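
As a rough sketch of this piece address space, the snippet below validates a piece length and assigns each file a starting piece index. The function names are hypothetical, and the treatment of empty files (they occupy no pieces) is an assumption drawn from the "non-empty file" wording.

```python
MIN_PIECE_LENGTH = 16 * 1024


def is_valid_piece_length(piece_length):
    """True if piece_length is a power of two and at least 16KiB."""
    is_power_of_two = piece_length > 0 and (piece_length & (piece_length - 1)) == 0
    return is_power_of_two and piece_length >= MIN_PIECE_LENGTH


def piece_layout(file_lengths, piece_length):
    """Return (first_piece_index, piece_count) per file, in file tree order."""
    if not is_valid_piece_length(piece_length):
        raise ValueError("invalid piece length")
    layout = []
    next_piece = 0
    for length in file_lengths:
        # Ceiling division: the last piece of a file may be short.
        count = -(-length // piece_length)
        layout.append((next_piece, count))
        # The next non-empty file starts on a fresh piece boundary.
        next_piece += count
    return layout
```

For file lengths of 40000, 0, 16384 and 1 bytes with 16KiB pieces, the files start at pieces 0, 3, 3 and 4; the empty file takes up no piece.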

``pieces`` maps to a string whose length is a multiple of
20. It is to be subdivided into strings of length 20, each of which is
the SHA1 hash of the piece at the corresponding index.
``digest func``
the digest used for the calculation of merkle trees and the infohash.
Currently valid values are ``sha3-256`` and ``blake2s``.
Implementations must reject torrents if they encounter an unknown value.
Future revisions may allow additional algorithms if new vulnerabilities are discovered.

There is also a key ``length`` or a key ``files``,
but not both or neither. If ``length`` is present then the
download represents a single file, otherwise it represents a set of
files which go in a directory structure.
The remaining fields differ depending on whether the torrent represents
one or more files.

In the single file case, ``length`` maps to the length of
the file in bytes.
single-file
-----------

For the purposes of the other keys, the multi-file case is treated as
only having a single file by concatenating the files in the order they
appear in the files list. The files list is the value
``files`` maps to, and is a list of dictionaries containing
the following keys:
``length``
Contributor

the distinction between multi-file and single-file torrents is also a frequent source of issues (as the typical client behavior is not necessarily always obvious). But it may not be worth trying to unify these in this proposal

Contributor Author

Unification would be possible with a few awkward compromises wrt. legacy compatibility. single-file torrents would have to be represented as multifile ones with a single entry and we'd have to make some "how to interpret the directory layout" rules.

Contributor

yeah. on the other hand, one could argue that those rules of how to interpret the directory structure for multi-file torrents are already there (just not documented and kind of de-facto). Sometimes people create single-file torrents using the multi-file structure, and have empty torrent "name" fields. To handle that, I put the hex encoded info-hash as the name, and then users get confused.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate 'digest func' being required. There should be an assumed default value with the key being a marker so clients can cleanly report that they don't support a new algorithm. I don't want to get into a discussion of what that default should be in this thread, but it should be only one thing.

Length of the file in bytes.

``pieces root``
Contributor

I wonder if the word "pieces" may be a bit misleading here, since the tree doesn't (necessarily) terminate at the piece level

Contributor Author

I'm not married to any of the names. "block tree root", "block root", "tree root" all work. I would avoid "root hash" since it's already used in BEP 30.

The root hash of a merkle tree with a branching factor of 2,
constructed from 16KiB blocks of the file.
The last block of the file may be smaller than 16KiB.
Contributor

It may be worth being explicit here. The end piece of a file, should it be (1) padded with zeroes for the purposes of calculating the hash, or (2) should the hash be calculated on the truncated byte range?
I imagine the benefit of (1) is that it creates more coherent pieces whereas (2) may be slightly more efficient

Contributor Author

I have no strong opinion in either direction. As written it currently is intended to be (2).

Contributor

regarding easing upgrades, zero-padding pieces at the end of a file may provide some benefits.

Contributor

I have a feeling that (1) would be simpler to implement, as it would preserve the "dense", fixed size pieces for purposes of hashing. It would mean a backwards compatible torrent would need to pad the last file as well. Without any implementation experience, I can't back this up other than with a gut feeling

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Padding with zeros avoids a weird layer violation. It isn't particularly expensive because the hash value of 16KiB of zeros can be cached. The extra cost is that file sizes are rounded up to a multiple of 16KiB. I could go either way on this one.

Contributor Author

The padding is already in the current revision. GH should be showing this conversation as outdated for that reason.

Contributor Author

Although it's only padded to the nearest multiple of 16KiB.

It isn't particularly expensive because the hash value of 16KiB of zeros can be cached.

You mean that the remaining leaves of the merkle tree should be derived from 16KiB of zeroes too, instead of just initializing the leaf hashes to zero?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I misunderstood thinking that padding was to the end of the piece. There's some benefit to end-of-piece padding, in that if multiple peers are all downloading just one file they won't have to download extra stuff to complete pieces to send those to each other.

Contributor Author

Well, files are piece-aligned now. But the holes created by that alignment can be larger than 16KiB if the piece size is larger than 16KiB. The hashing only rounds up to the nearest 16KiB boundary so implementations can use fixed-sized buffers when hashing.

In other words piece-padding != hash-padding

The remaining leaf hashes beyond the end of the file required
to construct upper layers of the merkle tree are set to zero.
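
The tree construction can be sketched as below, following the text of this revision: 16KiB blocks, a possibly short last block hashed as-is, and all-zero leaf hashes beyond the end of the file. Several details are assumptions made for illustration: ``sha3_256`` is just one of the two digests named in this draft, the leaf count is padded to a power of two, and a parent is taken to be the hash of its two children concatenated. The name ``pieces_root`` is hypothetical.

```python
import hashlib

BLOCK_SIZE = 16 * 1024


def pieces_root(data, digest=hashlib.sha3_256):
    """Root of a binary merkle tree over 16KiB blocks of data (sketch)."""
    if not data:
        raise ValueError("zero-length files carry no pieces root")
    leaves = [
        digest(data[offset:offset + BLOCK_SIZE]).digest()
        for offset in range(0, len(data), BLOCK_SIZE)
    ]
    # Assumption: pad to a power-of-two leaf count with all-zero leaf hashes.
    width = 1
    while width < len(leaves):
        width *= 2
    leaves += [bytes(digest().digest_size)] * (width - len(leaves))
    layer = leaves
    while len(layer) > 1:
        # Assumption: parent = digest(left child || right child).
        layer = [
            digest(layer[i] + layer[i + 1]).digest()
            for i in range(0, len(layer), 2)
        ]
    return layer[0]
```

The intermediate ``layer`` lists are the layers that ``piece layer`` draws from; a file of three blocks gets one zero leaf so that the second level has two complete pairs.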

multi-file
----------

``files``
Contributor

Back in the day, discussions about how torrents could be made more deterministic (i.e. more likely to generate the same info-hash when the same files are turned into torrents independently) included a few things that may be worth considering with a compatibility break:

  • requiring files to be listed in a deterministic order
  • for the purposes of the info-hash, only refer to files by their hash, and not their name (in this case it could be the root hash)

I suppose perhaps the approach people would prefer is to identify duplicate files across torrents by their hash separately

Contributor Author

The per-file-tree with fixed block sizes should forever solve the dedup problem.
Putting the file information outside the info dict calculation would be possible, but that would make bep9 more complex since magnets also have to convey file naming information, not just piece payload.

is a list of dictionaries which represent files or directories
containing additional files or directories.

Each dictionary contains

``path``
Contributor

I'm a little bit worried about how flexible this structure is. I take it the file a/b/c/foo.txt could be encoded in several different ways:

{ "path": ["a", "b", "c", "foo.txt"], "files": {"length": ..., "pieces root": "..." } }

as well as:

{ "path": ["a"], "files": { "path": ["b"], "files": { "path": ["c"], "files": {"path": ["foo.txt"], "length": ..., "pieces root": "..." } } } }

I appreciate the value of being able to represent paths in more compact ways than to require a separate dictionary at each level though. A strawman alternative could be:

{ "paths": [["a"], ["b"], ["c"], ...], "files": [ { "name": "foo.txt", path: 0, ...}] }

i.e. store the paths in a separate list and have files specify the index to which directory they are in.

Contributor Author

The first case is illegal since path is mandatory, including in leaves.

The second case would be the norm if those entries had siblings; otherwise it should be collapsed to path: ["a", "b", "c", "foo.txt"]

The problem with the last case is that it could not be made into a hybrid torrent that is backwards-compatible with BEP3. And it would be less efficient since it would still not encode shared prefixes as efficiently as a trie does.

Contributor

I see. It was not obvious to me that this field was mandatory, and specifically must be present for each leaf (which I think should be clarified). My concern remains though, of having many different ways of encoding the same thing (I consider it to be in the same class of issues as overlong utf-8 encodings).

Your point about this structure supporting coexisting with the current "files" structure is a good point, and I think it should be documented as well (perhaps I just missed it).

As for reusing path prefixes, yes. However, there's a fair amount of overhead in the dictionary keys in the directory tree in this proposal too, you would need a fair amount of path string reuse to "break even". I wonder if there's a more compact (and perhaps simpler) way to represent directory trees.

Contributor Author

I wonder if there's a more compact (and perhaps simpler) way to represent directory trees.

Yes, using dictionary keys as path elements. It would also have the neat property of enforcing uniqueness.

We can do that, but then we'll have to encode it twice for backwards-compatible torrents. I'm not sure what is better, compromising in the new format or compromising how to achieve backwards compatibility.

(perhaps I just missed it).

No, it's part of the TODO

Contributor
@ssiloti Feb 27, 2017

I don't think there's much gain from compromising the new format to make it look more like the old format because it's breaking backwards compatibility anyways. We might as well define two different formats: A backwards compatible format which is the same as BEP3 except with the pieces root key added, and a new trie based format. Only one of them would be present in a torrent.

Contributor Author

I think that would be a bad idea. Since then an implementation that only intends to handle the new format would still have to be able to parse the old one because there could be old-format-but-with-root-hash torrents.
I.e. such a hybrid format would not actually be forward-compatible. We would be creating yet another legacy form.

So I only see two possible approaches for each change

a) have any new field mimic the old form in some way that at least a subset of the allowed values can be used in a hybrid format. this is what I've done with the files, length, piece length and name field
b) let the new format use a different key and encode the information twice. this is what I would do with pieces vs. pieces root + pieces layers which essentially is redundant information which can exist side by side.

So if we want a completely new files format, hybrid torrents will also have to encode it twice.
Now that I think about it it's probably not so bad since for many - even if not all - torrents the pieces make up the bulk of the size anyway, so we might as well duplicate that too.

A hybrid format would then look like files, pieces existing alongside with pieces layer, pieces root and files tree or something like that.

A list of UTF-8 encoded strings corresponding to subdirectory names.
If this dictionary represents a file then the last entry is the actual file name.
A zero length list is an error case.

``files``
A list of directory entries nested within this directory.
Mutually exclusive with ``length`` and ``pieces root``.

``length``
The length of the file, in bytes.
Presence indicates that this is a file, not a directory.
Mutually exclusive with ``files``.

``pieces root``
The merkle tree for this file if the file has a non-zero length.
Its construction is identical to the single-file case.
Mutually exclusive with ``files``.


A file's full path consists of the torrent's ``name``, the ``path``
elements of the directory tree and file's own ``path`` elements.
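
To make the path rule concrete, the sketch below walks a nested ``files`` list and joins the torrent ``name``, the directory ``path`` elements and each file's own ``path`` elements. It is illustrative only: the function names are hypothetical, keys are shown as ``str`` for readability (a bdecoder would typically yield byte strings), and ``/`` is used as the separator.

```python
def iter_files(entries, prefix):
    """Yield (full_path, length) for every file below a list of entries."""
    for entry in entries:
        path = entry["path"]
        if not path:
            raise ValueError("a zero length path list is an error case")
        full = prefix + list(path)
        if "files" in entry:
            # A directory: ``files`` excludes ``length`` and ``pieces root``.
            if "length" in entry or "pieces root" in entry:
                raise ValueError("directory entry must not carry file keys")
            yield from iter_files(entry["files"], full)
        else:
            yield "/".join(full), entry["length"]


def full_paths(info):
    """List (full_path, length) pairs for a multi-file info dictionary."""
    return list(iter_files(info["files"], [info["name"]]))
```

A directory entry with ``path`` ``["a"]`` that contains a file with ``path`` ``["b", "c.txt"]`` in a torrent named ``root`` yields ``root/a/b/c.txt``.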


--------
infohash
--------

``length`` - The length of the file, in bytes.
The infohash is calculated by applying ``digest func`` to the bencoded form of the info dictionary,
which is a substring of the metainfo file.

``path`` - A list of UTF-8 encoded strings corresponding to subdirectory
names, the last of which is the actual file name (a zero length list
is an error case).
The info-hash must be the hash of the encoded form as found
in the .torrent file, which is identical to bdecoding the metainfo file,
extracting the info dictionary and encoding it *if and only if* the
bdecoder fully validated the input (e.g. key ordering, absence of leading zeros).
Conversely that means implementations must either reject invalid metainfo files
or extract the substring directly.
They must not perform a decode-encode roundtrip on invalid data.

In the single file case, the name key is the name of a file, in the
muliple file case, it's the name of a directory.
For some uses as torrent identifier it is truncated to 20 bytes.
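
One way to avoid the decode-encode roundtrip is to slice the info dictionary straight out of the raw metainfo bytes. The sketch below does that with a small skipping parser. The helper names are hypothetical, ``sha3_256`` stands in for whatever ``digest func`` names, and the parser performs no validation at all, so it corresponds to the "extract the substring directly" option rather than to a validating bdecoder.

```python
import hashlib


def skip_value(data, pos):
    """Return the index just past the bencoded value that starts at pos."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        return data.index(b"e", pos) + 1
    if lead in (b"l", b"d"):
        pos += 1
        while data[pos:pos + 1] != b"e":
            pos = skip_value(data, pos)
        return pos + 1
    # Otherwise a byte string of the form <length>:<bytes>.
    colon = data.index(b":", pos)
    return colon + 1 + int(data[pos:colon])


def raw_info_dict(metainfo):
    """Return the exact bytes of the info dictionary inside a metainfo file."""
    if metainfo[:1] != b"d":
        raise ValueError("metainfo must be a bencoded dictionary")
    pos = 1
    while metainfo[pos:pos + 1] != b"e":
        key_end = skip_value(metainfo, pos)
        key = metainfo[metainfo.index(b":", pos) + 1:key_end]
        value_end = skip_value(metainfo, key_end)
        if key == b"info":
            return metainfo[key_end:value_end]
        pos = value_end
    raise ValueError("no info dictionary found")


def infohash(metainfo, digest=hashlib.sha3_256):
    """Digest of the info dictionary exactly as it appears in the file."""
    return digest(raw_info_dict(metainfo)).digest()


def truncated_infohash(metainfo, digest=hashlib.sha3_256):
    """The 20 byte form used where a short torrent identifier is needed."""
    return infohash(metainfo, digest)[:20]
```

Truncated or malformed input makes ``bytes.index`` or ``int`` raise ``ValueError``, which a caller can treat as a rejected metainfo file.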

--------
trackers
--------

Tracker GET requests have the following keys:

info_hash
The 20 byte sha1 hash of the bencoded form of the info value from the
metainfo file. This value will almost certainly have to be escaped.

Note that this is a substring of the metainfo file.
The info-hash must be the hash of the encoded form as found
in the .torrent file, which is identical to bdecoding the metainfo file,
extracting the info dictionary and encoding it *if and only if* the
bdecoder fully validated the input (e.g. key ordering, absence of leading zeros).
Conversely that means clients must either reject invalid metainfo files
or extract the substring directly.
They must not perform a decode-encode roundtrip on invalid data.


The 20byte truncated infohash as described above.
This value will almost certainly have to be escaped.

peer_id
A string of length 20 which this downloader uses as its id. Each
@@ -217,6 +267,7 @@ It is common to announce over a `UDP tracker protocol`_ as well.

.. _`UDP tracker protocol`: bep_0015.html

-------------
peer protocol
-------------

@@ -256,7 +307,7 @@ they can all be thrown out when a choke happens.

The peer wire protocol consists of a handshake followed by a
never-ending stream of length-prefixed messages. The handshake starts
with character ninteen (decimal) followed by the string 'BitTorrent
with character nineteen (decimal) followed by the string 'BitTorrent
protocol'. The leading character is a length prefix, put there in the
hope that other new protocols may do the same and thus be trivially
distinguishable from each other.
@@ -269,11 +320,8 @@ zero in all current implementations. If you wish to extend the
protocol using these bytes, please coordinate with Bram Cohen to make
sure all extensions are done compatibly.

Next comes the 20 byte sha1 hash of the bencoded form of the info
value from the metainfo file. (This is the same value which is
announced as ``info_hash`` to the tracker, only here it's raw
instead of quoted here). If both sides don't send the same value, they
sever the connection. The one possible exception is if a downloader
Next comes the 20 byte truncated infohash. If both sides don't send the same value,
they sever the connection. The one possible exception is if a downloader
wants to do multiple downloads over a single port, they may wait for
incoming connections to give a download hash first, and respond with
the same one if it's in their list.
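
A handshake built along these lines might look like the sketch below. The function names are hypothetical. The eight reserved bytes and the trailing 20 byte peer id are taken from the parts of BEP 3 not shown in this hunk, and the infohash argument is assumed to be the already truncated 20 byte value.

```python
PROTOCOL_NAME = b"BitTorrent protocol"  # 19 bytes, hence the leading length byte


def build_handshake(truncated_infohash, peer_id, reserved=bytes(8)):
    """Length prefix + protocol name + reserved bytes + infohash + peer id."""
    if len(truncated_infohash) != 20:
        raise ValueError("infohash must be truncated to 20 bytes")
    if len(peer_id) != 20 or len(reserved) != 8:
        raise ValueError("peer id is 20 bytes and reserved is 8 bytes")
    return (
        bytes([len(PROTOCOL_NAME)])
        + PROTOCOL_NAME
        + reserved
        + truncated_infohash
        + peer_id
    )


def handshake_infohash(handshake):
    """Extract the 20 byte infohash from a received handshake."""
    if handshake[:1] != bytes([len(PROTOCOL_NAME)]):
        raise ValueError("unexpected length prefix")
    if handshake[1:20] != PROTOCOL_NAME:
        raise ValueError("not a BitTorrent handshake")
    return handshake[28:48]
```

A receiver would compare ``handshake_infohash`` against the torrents it serves and sever the connection when there is no match.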
@@ -289,6 +337,7 @@ and ignored. Keepalives are generally sent once every two minutes, but
note that timeouts can be done much more quickly when data is
expected.

-------------
peer messages
-------------

@@ -320,7 +369,7 @@ that downloader just completed and checked the hash of.

'request' messages contain an index, begin, and length. The last
two are byte offsets. Length is generally a power of two unless it
gets truncated by the end of the file. All current implementations use
gets truncated by the end of a file. All current implementations use
2^14 (16 kiB), and close connections which request an amount greater than
that.
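
As an illustration, the requests needed to fetch one piece in 2^14 byte blocks could be generated as follows. The function name is hypothetical. The wire layout (a four byte big-endian length prefix, message id 6, then three four byte big-endian integers) comes from the parts of BEP 3 that this hunk does not show.

```python
import struct

BLOCK_LENGTH = 2 ** 14  # 16 kiB, the request size used by current implementations
REQUEST_ID = 6          # message id of 'request' in BEP 3


def request_messages(index, piece_size):
    """Build the 'request' messages covering one piece of piece_size bytes."""
    messages = []
    for begin in range(0, piece_size, BLOCK_LENGTH):
        # The final request is truncated by the end of the piece or file.
        length = min(BLOCK_LENGTH, piece_size - begin)
        # Length prefix 13 = 1 byte id + 3 * 4 byte integers.
        messages.append(struct.pack(">IBIII", 13, REQUEST_ID, index, begin, length))
    return messages
```

For a 40000 byte final piece this yields three requests, the last one asking for the remaining 7232 bytes.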

@@ -380,6 +429,24 @@ decent chance of getting a complete piece to upload, new connections
are three times as likely to start as the current optimistic unchoke
as anywhere else in the rotation.


------------
Upgrade Path
------------

## TODO ##

* restrict file layout. no nested directories for hybrid torrents
* padding or different piece-space layout?
* pieces field
* double announce behavior
* safe hashing. avoid downgrade attacks
* changes to BEP 9. magnets. send merkle layers.
Contributor

interesting. I imagine the main value from merkle trees is to extend the PIECE message to include uncle-hashes (like the tribler protocol does). But I suppose extending BEP 9 may provide a simpler upgrade path.

Contributor Author

Yes, as I said in the initial comment, this is intentional to decouple hash transfer from pieces transfer. The tribler approach is great for one thing, terrible for everything else.

* incorporate BEP 6 state machine (reject messages)?



---------
Resources
---------

@@ -393,6 +460,7 @@ Resources
existing ones.

__ https://wiki.wireshark.org/BitTorrent



