Draft: base protocol with merkle trees and new hash algorithms #59
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
:BEP: 3 | ||
:BEP: XX | ||
:Title: The BitTorrent Protocol Specification | ||
:Version: $Revision$ | ||
:Last-Modified: $Date$ | ||
|
@@ -19,6 +19,7 @@ file happen concurrently, the downloaders upload to each other, making | |
it possible for the file source to support very large numbers of | ||
downloaders with only a modest increase in its load. | ||
|
||
---------------------------------------------------------- | ||
A BitTorrent file distribution consists of these entities: | ||
---------------------------------------------------------- | ||
|
||
|
@@ -31,6 +32,7 @@ A BitTorrent file distribution consists of these entities: | |
|
||
There are ideally many end users for a single file. | ||
|
||
---------------------------------------------------------- | ||
To start serving, a host goes through the following steps: | ||
---------------------------------------------------------- | ||
|
||
|
@@ -42,6 +44,7 @@ To start serving, a host goes through the following steps: | |
#. Link to the metainfo (.torrent) file from some other web page. | ||
#. Start a downloader which already has the complete file (the 'origin'). | ||
|
||
------------------------------------------------ | ||
To start downloading, a user does the following: | ||
------------------------------------------------ | ||
|
||
|
@@ -52,6 +55,7 @@ To start downloading, a user does the following: | |
#. Wait for download to complete. | ||
#. Tell downloader to exit (it keeps uploading until this happens). | ||
|
||
--------- | ||
bencoding | ||
--------- | ||
|
||
|
@@ -76,6 +80,7 @@ bencoding | |
(sorted as raw strings, not alphanumerics). | ||
|
||
|
||
-------------- | ||
metainfo files | ||
-------------- | ||
|
||
|
@@ -87,70 +92,115 @@ announce | |
|
||
info | ||
This maps to a dictionary, with keys described below. | ||
|
||
``piece layer`` | ||
A list of strings. Each string consists of concatenated hashes | ||
of an intermediate merkle tree layer for each file. The layer is chosen so that | ||
one hash represents one piece. For example, if a piece size of 128KiB is used, | ||
then the 3rd layer up from the leaf hashes is used. | ||
Review comment: this paragraph implies that the leaf size is 16 kiB, perhaps that should be stated explicitly.
Review comment: It is stated further down in the |
||
Files smaller than or equal to the piece size are represented by an empty string. | ||
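The choice of layer can be sketched in a few lines. This is only an illustration of the paragraph above: the function names are hypothetical, the 16KiB leaf size is taken from the ``pieces root`` description, and the hash count per file assumes one hash per (possibly short) piece.

```python
BLOCK_SIZE = 16 * 1024  # leaf block size of the merkle tree in this draft


def layers_above_leaves(piece_length: int) -> int:
    """How many layers above the leaf hashes the piece layer sits."""
    if piece_length < BLOCK_SIZE or piece_length & (piece_length - 1):
        raise ValueError("piece length must be a power of two and at least 16KiB")
    # piece_length / BLOCK_SIZE is a power of two, so this is its base-2 logarithm.
    return (piece_length // BLOCK_SIZE).bit_length() - 1


def piece_layer_hash_count(file_length: int, piece_length: int) -> int:
    """Hashes stored for one file; 0 stands for the empty string (assumption:
    one hash per piece, with the last piece possibly short)."""
    if file_length <= piece_length:
        return 0
    return -(-file_length // piece_length)  # ceiling division
```

With a 128KiB piece size, ``layers_above_leaves`` gives 3, matching the example in the text.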
|
||
|
||
All strings in a .torrent file that contains text must be UTF-8 | ||
encoded. | ||
|
||
info dictionary | ||
............... | ||
=============== | ||
|
||
The ``name`` key maps to a UTF-8 encoded string which is the | ||
suggested name to save the file (or directory) as. It is purely advisory. | ||
``name`` | ||
a UTF-8 encoded string which is the suggested name to save the file (or directory) as. | ||
It is purely advisory. | ||
|
||
``piece length`` maps to the number of bytes in each piece | ||
the file is split into. For the purposes of transfer, files are | ||
split into fixed-size pieces which are all the same length except for | ||
possibly the last one which may be truncated. ``piece | ||
length`` is almost always a power of two, most commonly 2^18 = | ||
256 K (BitTorrent prior to version 3.2 uses 2^20 = 1 M as | ||
default). | ||
``piece length`` | ||
the number of bytes that each logical piece in the peer protocol refers to. | ||
I.e. it sets the granularity of ``piece``, ``request``, ``bitfield`` and ``have`` | ||
messages. It must be a power of two and at least 16KiB. | ||
|
||
Files are mapped into this piece address space so that each non-empty file starts | ||
at a piece boundary and the files occur in the same order as in the file tree. | ||
The last piece of each file may be shorter than the specified piece length. | ||
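A minimal sketch of the piece address space described above, assuming the file lengths are already listed in file-tree order. The function name is hypothetical and empty files are simply given no piece index.

```python
def first_piece_indices(file_lengths, piece_length):
    """Return the index of the first piece of each file, or None for empty files."""
    index = 0
    result = []
    for length in file_lengths:
        if length == 0:
            result.append(None)  # empty files occupy no pieces
            continue
        result.append(index)
        # Every non-empty file starts at a piece boundary, so it consumes a
        # whole number of piece slots even if its last piece is short.
        index += -(-length // piece_length)
    return result
```

For lengths ``[40000, 0, 16384, 1]`` and a 16KiB piece length the first file takes three pieces, so the later files start at pieces 3 and 4.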
|
||
``pieces`` maps to a string whose length is a multiple of | ||
20. It is to be subdivided into strings of length 20, each of which is | ||
the SHA1 hash of the piece at the corresponding index. | ||
``digest func`` | ||
the digest used for the calculation of merkle trees and the infohash. | ||
Currently valid values are ``sha3-256`` and ``blake2s``. | ||
Implementations must reject torrents if they encounter an unknown value. | ||
Future revisions may allow additional algorithms if new vulnerabilities are discovered. | ||
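As an illustration of the rejection rule, a client could map the two listed values to hash constructors and refuse anything else. The sketch assumes a bdecoded info dictionary with byte-string keys and values; the helper name is hypothetical, and treating a missing key as an error is an assumption as well.

```python
import hashlib

# The two values the draft currently lists as valid.
SUPPORTED_DIGESTS = {
    b"sha3-256": hashlib.sha3_256,
    b"blake2s": hashlib.blake2s,
}


def digest_constructor(info: dict):
    """Return the hashlib constructor named by ``digest func`` or raise."""
    name = info.get(b"digest func")
    try:
        return SUPPORTED_DIGESTS[name]
    except KeyError:
        raise ValueError(f"unsupported digest func: {name!r}") from None
```

Both supported algorithms produce 32 byte digests with their default parameters.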
|
||
There is also a key ``length`` or a key ``files``, | ||
but not both or neither. If ``length`` is present then the | ||
download represents a single file, otherwise it represents a set of | ||
files which go in a directory structure. | ||
The remaining fields differ depending on whether the torrent represents | ||
one or more files. | ||
|
||
In the single file case, ``length`` maps to the length of | ||
the file in bytes. | ||
single-file | ||
----------- | ||
|
||
For the purposes of the other keys, the multi-file case is treated as | ||
only having a single file by concatenating the files in the order they | ||
appear in the files list. The files list is the value | ||
``files`` maps to, and is a list of dictionaries containing | ||
the following keys: | ||
``length`` | ||
Review comment: the distinction between multi-file and single-file torrents is also a frequent source of issues (as the typical client behavior is not necessarily always obvious). But it may not be worth trying to unify these in this proposal
Review comment: Unification would be possible with a few awkward compromises wrt. legacy compatibility. single-file torrents would have to be represented as multifile ones with a single entry and we'd have to make some "how to interpret the directory layout" rules.
Review comment: yeah. on the other hand, one could argue that those rules of how to interpret the directory structure for multi-file torrents are already there (just not documented and kind of de-facto). Sometimes people create single-file torrents using the multi-file structure, and have empty torrent "name" fields. To handle that, I put the hex encoded info-hash as the name, and then users get confused.
Review comment: I hate 'digest func' being required. There should be an assumed default value with the key being a marker so clients can cleanly report that they don't support a new algorithm. I don't want to get into a discussion of what that default should be in this thread, but it should be only one thing. |
||
Length of the file in bytes. | ||
|
||
``pieces root`` | ||
Review comment: I wonder if the word "pieces" may be a bit misleading here, since the tree doesn't (necessarily) terminate at the piece level
Review comment: I'm not married to any of the names. "block tree root", "block root", "tree root" all work. I would avoid "root hash" since it's already used in BEP 30. |
||
The root hash of a merkle tree with a branching factor of 2, | ||
constructed from 16KiB blocks of the file. | ||
The last block of the file may be smaller than 16KiB. | ||
Review comment: It may be worth being explicit here. The end piece of a file, should it be (1) padded with zeroes for the purposes of calculating the hash, or (2) should the hash be calculated on the truncated byte range?
Review comment: I have no strong opinion in either direction. As written it currently is intended to be (2).
Review comment: regarding easing upgrades, zero-padding pieces at the end of a file may provide some benefits.
Review comment: I have a feeling that (1) would be simpler to implement, as it would preserve the "dense", fixed size pieces for purposes of hashing. It would mean a backwards compatible torrent would need to pad the last file as well. Without any implementation experience, I can't back this up other than with a gut feeling
Review comment: Padding with zeros avoids a weird layer violation. It isn't particularly expensive because the hash value of 16KiB of zeros can be cached. The extra cost is that file sizes are rounded up to a multiple of 16KiB. I could go either way on this one.
Review comment: The padding already is in the current revision.
GH should be showing this conversation as outdated for that reason.
Review comment: Although it's only padded to the nearest multiple of 16KiB.
You mean that the remaining leaves of the merkle tree to be derived from 16KiB of zeroes too instead of just initializing the leaf hashes to zero?
Review comment: Sorry I misunderstood thinking that padding was to the end of the piece. There's some benefit to end-of-piece padding, in that if multiple peers are all downloading just one file they won't have to download extra stuff to complete pieces to send those to each other.
Review comment: Well, files are piece-aligned now. But the holes created by that alignment can be larger than 16KiB if the piece size is larger than 16KiB. The hashing only rounds up to the nearest 16KiB boundary so implementations can use fixed-sized buffers when hashing. In other words piece-padding != hash-padding |
||
The remaining leaf hashes beyond the end of the file required | ||
to construct upper layers of the merkle tree are set to zero. | ||
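The construction can be sketched as follows. This follows the wording of the diff as shown here (the short last block is hashed as-is, and missing leaves are all-zero hashes). Two details are assumptions rather than statements from the draft: the leaf count is rounded up to the next power of two, and an inner node is the digest of its two children concatenated. The function name is hypothetical.

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # leaf block size stated in the draft


def pieces_root(data: bytes, digest=hashlib.sha3_256) -> bytes:
    """Root of a binary merkle tree over 16KiB blocks of one file."""
    if not data:
        raise ValueError("zero-length files carry no pieces root")
    # Leaf layer: one hash per block; the last block may be short.
    layer = [digest(data[i:i + BLOCK_SIZE]).digest()
             for i in range(0, len(data), BLOCK_SIZE)]
    # Assumption: pad the leaf layer with zero hashes up to a power of two.
    width = 1
    while width < len(layer):
        width *= 2
    layer += [bytes(digest().digest_size)] * (width - len(layer))
    # Assumption: parent = digest(left child + right child).
    while len(layer) > 1:
        layer = [digest(layer[i] + layer[i + 1]).digest()
                 for i in range(0, len(layer), 2)]
    return layer[0]
```

For a file that fits in a single block the root is simply the hash of the file contents under this sketch.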
|
||
multi-file | ||
---------- | ||
|
||
``files`` | ||
Review comment: back in the day when talking about how torrents could be made more deterministic (i.e. more likely to generate the same info-hash when the same files are turned into torrents independently) included a few things that may be worth considering with a compatibility break:
I suppose perhaps the approach people would prefer is to identify duplicate files across torrents by their hash separately
Review comment: The per-file-tree with fixed block sizes should forever solve the dedup problem. |
||
is a list of dictionaries which represent files or directories | ||
containing additional files or directories. | ||
|
||
Each dictionary contains | ||
|
||
``path`` | ||
Review comment: I'm a little bit worried about how flexible this structure is. I take it the file
as well as:
I appreciate the value of being able to represent paths in more compact ways than to require a separate dictionary at each level though. A strawman alternative could be:
i.e. store the paths in a separate list and have files specify the index to which directory they are in.
Review comment: The first case is illegal since The second case would be the norm if those entries had siblings, otherwise it should be collapsed to The problem with the last case is that it could not be made into a hybrid torrent that is backwards-compatible with BEP3. And it would be less efficient since it would still not encode shared prefixes efficiently like a trie does.
Review comment: I see. It was not obvious to me that this field was mandatory, and specifically must be present for each leaf (which I think should be clarified). My concern remains though, of having many different ways of encoding the same thing (I consider it to be in the same class of issues as overlong utf-8 encodings). Your point about this structure supporting coexisting with the current "files" structure is a good point, and I think it should be documented as well (perhaps I just missed it). As for reusing path prefixes, yes. However, there's a fair amount of overhead in the dictionary keys in the directory tree in this proposal too, you would need a fair amount of path string reuse to "break even". I wonder if there's a more compact (and perhaps simpler) way to represent directory trees.
Review comment:
Yes, using dictionary keys as path elements. It would also have the neat property of enforcing uniqueness. We can do that, but then we'll have to encode it twice for backwards-compatible torrents. I'm not sure what is better, compromising in the new format or compromising how to achieve backwards compatibility.
No, it's part of the TODO
Review comment: I don't think there's much gain from compromising the new format to make it look more like the old format because it's breaking backwards compatibility anyways. We might as well define two different formats: A backwards compatible format which is the same as BEP3 except with the
Review comment: I think that would be a bad idea. Since then an implementation that only intends to handle the new format would still have to be able to parse the old one because there could be old-format-but-with-root-hash torrents. So I only see two possible approaches for each change a) have any new field mimic the old form in some way that at least a subset of the allowed values can be used in a hybrid format. this is what I've done with the So if we want a completely new files format and hybrid torrents will also have to encode it twice. A hybrid format would then look like |
||
A list of UTF-8 encoded strings corresponding to subdirectory names. | ||
If this dictionary represents a file then the last entry is the actual file name. | ||
A zero length list is an error case. | ||
|
||
``files`` | ||
A list of directory entries nested within this directory. | ||
Mutually exclusive with ``length`` and ``pieces root``. | ||
|
||
``length`` | ||
The length of the file, in bytes. | ||
Presence indicates that this is a file, not a directory. | ||
Mutually exclusive with ``files``. | ||
|
||
``pieces root`` | ||
The root hash of the merkle tree for this file, present if the file has a non-zero length. | ||
Its construction is identical to the single-file case. | ||
Mutually exclusive with ``files``. | ||
|
||
|
||
A file's full path consists of the torrent's ``name``, the ``path`` | ||
elements of the directory tree and the file's own ``path`` elements. | ||
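One way to read the nested ``files`` structure is a recursive walk that joins path elements as it descends. The sketch below assumes a bdecoded info dictionary with byte-string keys; the function names and the ``/`` separator are illustrative choices, not part of the draft.

```python
def iter_entries(entries, prefix):
    """Yield (full_path, length) for every file below a list of entries."""
    for entry in entries:
        path = [p.decode("utf-8") for p in entry[b"path"]]
        if not path:
            raise ValueError("a zero length path list is an error")
        full = prefix + path
        if b"files" in entry:
            # A directory: ``files`` excludes ``length`` and ``pieces root``.
            if b"length" in entry or b"pieces root" in entry:
                raise ValueError("files is mutually exclusive with length and pieces root")
            yield from iter_entries(entry[b"files"], full)
        elif b"length" in entry:
            # Presence of ``length`` marks a file.
            yield "/".join(full), entry[b"length"]
        else:
            raise ValueError("entry is neither a file nor a directory")


def file_paths(info):
    """Full paths start with the torrent's ``name``."""
    name = info[b"name"].decode("utf-8")
    return list(iter_entries(info[b"files"], [name]))
```

A directory entry with ``path`` of ``a``, ``b`` that contains a file ``c.txt`` yields ``name/a/b/c.txt`` under this reading.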
|
||
|
||
-------- | ||
infohash | ||
-------- | ||
|
||
``length`` - The length of the file, in bytes. | ||
The infohash is calculated by applying ``digest func`` to the bencoded form of the info dictionary, | ||
which is a substring of the metainfo file. | ||
|
||
``path`` - A list of UTF-8 encoded strings corresponding to subdirectory | ||
names, the last of which is the actual file name (a zero length list | ||
is an error case). | ||
The info-hash must be the hash of the encoded form as found | ||
in the .torrent file, which is identical to bdecoding the metainfo file, | ||
extracting the info dictionary and encoding it *if and only if* the | ||
bdecoder fully validated the input (e.g. key ordering, absence of leading zeros). | ||
Conversely that means implementations must either reject invalid metainfo files | ||
or extract the substring directly. | ||
They must not perform a decode-encode roundtrip on invalid data. | ||
|
||
In the single file case, the name key is the name of a file, in the | ||
muliple file case, it's the name of a directory. | ||
For some uses as a torrent identifier, the infohash is truncated to 20 bytes. | ||
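The "extract the substring directly" option can be sketched with a small scanner that skips over bencoded values without decoding them, so the info dictionary is hashed exactly as it appears in the file. The helper names are hypothetical. The scanner only finds value boundaries and does not validate key ordering or leading zeros.

```python
import hashlib


def _skip(buf: bytes, pos: int) -> int:
    """Return the offset just past the bencoded value that starts at pos."""
    lead = buf[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        return buf.index(b"e", pos) + 1
    if lead in (b"l", b"d"):              # list or dict: skip items until 'e'
        pos += 1
        while buf[pos:pos + 1] != b"e":
            pos = _skip(buf, pos)
        return pos + 1
    if lead.isdigit():                    # string: <length>:<bytes>
        colon = buf.index(b":", pos)
        end = colon + 1 + int(buf[pos:colon])
        if end > len(buf):
            raise ValueError("truncated string")
        return end
    raise ValueError("malformed bencoding")


def raw_info(metainfo: bytes) -> bytes:
    """Return the raw bytes of the top-level ``info`` value."""
    if metainfo[:1] != b"d":
        raise ValueError("metainfo must be a dictionary")
    pos = 1
    while metainfo[pos:pos + 1] != b"e":
        if not metainfo[pos:pos + 1].isdigit():
            raise ValueError("dictionary key must be a string")
        key_end = _skip(metainfo, pos)
        key = metainfo[metainfo.index(b":", pos) + 1:key_end]
        value_end = _skip(metainfo, key_end)
        if key == b"info":
            return metainfo[key_end:value_end]
        pos = value_end
    raise ValueError("no info dictionary")


def infohash(metainfo: bytes, digest=hashlib.sha3_256) -> bytes:
    """Apply the digest to the info substring, with no decode-encode roundtrip."""
    return digest(raw_info(metainfo)).digest()


def truncated_infohash(metainfo: bytes, digest=hashlib.sha3_256) -> bytes:
    return infohash(metainfo, digest)[:20]
```

Because the bytes are never re-encoded, an info dictionary with misordered keys still hashes to the value its creator computed, although a strict client may prefer to reject such a file.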
|
||
-------- | ||
trackers | ||
-------- | ||
|
||
Tracker GET requests have the following keys: | ||
|
||
info_hash | ||
The 20 byte sha1 hash of the bencoded form of the info value from the | ||
metainfo file. This value will almost certainly have to be escaped. | ||
|
||
Note that this is a substring of the metainfo file. | ||
The info-hash must be the hash of the encoded form as found | ||
in the .torrent file, which is identical to bdecoding the metainfo file, | ||
extracting the info dictionary and encoding it *if and only if* the | ||
bdecoder fully validated the input (e.g. key ordering, absence of leading zeros). | ||
Conversely that means clients must either reject invalid metainfo files | ||
or extract the substring directly. | ||
They must not perform a decode-encode roundtrip on invalid data. | ||
|
||
|
||
The 20 byte truncated infohash as described above. | ||
This value will almost certainly have to be escaped. | ||
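The escaping mentioned here is ordinary URL percent-encoding of the raw bytes. A minimal sketch using the Python standard library follows; the function name is hypothetical.

```python
from urllib.parse import quote


def escape_info_hash(truncated: bytes) -> str:
    """Percent-encode the 20 raw bytes for use as the info_hash query value."""
    if len(truncated) != 20:
        raise ValueError("the truncated infohash must be exactly 20 bytes")
    # safe="" so that "/" is escaped too; unreserved characters stay literal.
    return quote(truncated, safe="")
```

Bytes that happen to be ASCII letters, digits or one of ``_.-~`` pass through unchanged, and every other byte becomes a ``%XX`` triplet.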
|
||
peer_id | ||
A string of length 20 which this downloader uses as its id. Each | ||
|
@@ -217,6 +267,7 @@ It is common to announce over a `UDP tracker protocol`_ as well. | |
|
||
.. _`UDP tracker protocol`: bep_0015.html | ||
|
||
------------- | ||
peer protocol | ||
------------- | ||
|
||
|
@@ -256,7 +307,7 @@ they can all be thrown out when a choke happens. | |
|
||
The peer wire protocol consists of a handshake followed by a | ||
never-ending stream of length-prefixed messages. The handshake starts | ||
with character ninteen (decimal) followed by the string 'BitTorrent | ||
with character nineteen (decimal) followed by the string 'BitTorrent | ||
protocol'. The leading character is a length prefix, put there in the | ||
hope that other new protocols may do the same and thus be trivially | ||
distinguishable from each other. | ||
|
@@ -269,11 +320,8 @@ zero in all current implementations. If you wish to extend the | |
protocol using these bytes, please coordinate with Bram Cohen to make | ||
sure all extensions are done compatibly. | ||
|
||
Next comes the 20 byte sha1 hash of the bencoded form of the info | ||
value from the metainfo file. (This is the same value which is | ||
announced as ``info_hash`` to the tracker, only here it's raw | ||
instead of quoted here). If both sides don't send the same value, they | ||
sever the connection. The one possible exception is if a downloader | ||
Next comes the 20 byte truncated infohash. If both sides don't send the same value, | ||
they sever the connection. The one possible exception is if a downloader | ||
wants to do multiple downloads over a single port, they may wait for | ||
incoming connections to give a download hash first, and respond with | ||
the same one if it's in their list. | ||
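For orientation, the handshake layout can be sketched as below. The length prefix, protocol string and truncated infohash come from the text above. The eight reserved bytes and the trailing 20 byte peer id follow BEP 3 and are not shown in this hunk, and the function names are hypothetical.

```python
PSTR = b"BitTorrent protocol"  # 19 bytes, so the length prefix is 19


def build_handshake(truncated_infohash: bytes, peer_id: bytes,
                    reserved: bytes = bytes(8)) -> bytes:
    """Assemble the 68 byte handshake."""
    if len(truncated_infohash) != 20 or len(peer_id) != 20 or len(reserved) != 8:
        raise ValueError("unexpected field length")
    return bytes([len(PSTR)]) + PSTR + reserved + truncated_infohash + peer_id


def infohash_from_handshake(handshake: bytes) -> bytes:
    """Return the truncated infohash a peer sent, after checking the prefix."""
    if len(handshake) < 48 or handshake[:20] != bytes([len(PSTR)]) + PSTR:
        raise ValueError("not a BitTorrent handshake")
    return handshake[28:48]
```

A client serving several torrents on one port could look up the value returned by ``infohash_from_handshake`` in its list before replying, as the paragraph above describes.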
|
@@ -289,6 +337,7 @@ and ignored. Keepalives are generally sent once every two minutes, but | |
note that timeouts can be done much more quickly when data is | ||
expected. | ||
|
||
------------- | ||
peer messages | ||
------------- | ||
|
||
|
@@ -320,7 +369,7 @@ that downloader just completed and checked the hash of. | |
|
||
'request' messages contain an index, begin, and length. The last | ||
two are byte offsets. Length is generally a power of two unless it | ||
gets truncated by the end of the file. All current implementations use | ||
gets truncated by the end of a file. All current implementations use | ||
2^14 (16 kiB), and close connections which request an amount greater than | ||
that. | ||
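A sketch of how a 'request' could be serialized, assuming the BEP 3 wire format that this hunk does not repeat: a four byte big-endian length prefix, message id 6, then three four byte big-endian integers. The function name is hypothetical.

```python
import struct

MAX_REQUEST = 2 ** 14  # 16 kiB, the size all current implementations use
REQUEST_ID = 6         # message id of 'request' in BEP 3


def encode_request(index: int, begin: int, length: int) -> bytes:
    """Serialize a request for `length` bytes at offset `begin` of piece `index`."""
    if not 0 < length <= MAX_REQUEST:
        raise ValueError("peers close connections that request more than 16 kiB")
    payload = struct.pack(">BIII", REQUEST_ID, index, begin, length)
    return struct.pack(">I", len(payload)) + payload
```

The length check mirrors the behaviour described above, where larger requests cause the connection to be closed.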
|
||
|
@@ -380,6 +429,24 @@ decent chance of getting a complete piece to upload, new connections | |
are three times as likely to start as the current optimistic unchoke | ||
as anywhere else in the rotation. | ||
|
||
|
||
------------ | ||
Upgrade Path | ||
------------ | ||
|
||
## TODO ## | ||
|
||
* restrict file layout. no nested directories for hybrid torrents | ||
* padding or different piece-space layout? | ||
* pieces field | ||
* double announce behavior | ||
* safe hashing. avoid downgrade attacks | ||
* changes to BEP 9. magnets. send merkle layers. | ||
Review comment: interesting. I imagine the main value from merkle trees is to extend the PIECE message to include uncle-hashes (like the tribler protocol does). But I suppose extending BEP 9 may provide a simpler upgrade path.
Review comment: Yes, as I said in the initial comment, this is intentional to decouple hash transfer from pieces transfer. The tribler approach is great for one thing, terrible for everything else. |
||
* incorporate BEP 6 state machine (reject messages)? | ||
|
||
|
||
|
||
--------- | ||
Resources | ||
--------- | ||
|
||
|
@@ -393,6 +460,7 @@ Resources | |
existing ones. | ||
|
||
__ https://wiki.wireshark.org/BitTorrent | ||
|
||
|
||
|
||
|
||
|
Review comment: I interpret this to be a list where each entry represents a file, and the content of each layer is the full tree (except truncated at the piece size). I would expect the key to be called something like "merkle trees" or at least plural "piece layers"
Review comment: Yes, each list entry represents one file.
No, it does not represent full trees. I'll have to improve that part. It is meant to be only one level of the tree where the data size covered by the hash is equal to the piece length. so if piece length = 16KiB you get leaf hashes, if it's 32KiB then you get 1 level up, if it's 64KiB you get two levels up etc.
I would prefer always having the 16KiB leaves in there but some users today scale their piece length to keep the .torrent file very small, so I assume they would dislike the potentially massive increase in torrent sizes.
Review comment: Oh yeah, right. Exactly. I can rename it.
Review comment: Just to clarify once more: It's just one level of the tree for each file. Not the levels above - they can be derived if needed - or the levels below - they would take up too much space.
Review comment: ah, right, of course