Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial download is too big #2

Closed
icidasset opened this issue Dec 30, 2018 · 17 comments
Closed

Partial download is too big #2

icidasset opened this issue Dec 30, 2018 · 17 comments
Labels
not a bug This doesn't seem to be a bug, has different reason

Comments

@icidasset
Copy link
Contributor

Hey again 👋

I was trying out this add-on and noticed that for most of my files (not all of them), there are multiple partial downloads that are way too big (between 2.5 and 3.2MB). Is this expected?

If not, the only differences I could find between the faulty ones and the working ones is that the faulty ones have a bitrate of 320 kbps and the stereo channel mode (working ones use joint stereo). But I assume that's just a coincidence and none of these things matter? 🤔


Tag formats used in faulty files:

  • ID3v2.2 (mp3)
  • ID3v2.4 + ID3v1 (mp3)

Tag formats used in working files:

  • ID3v2.3 (mp3)
  • ID3v2.4 + ID3v1 (mp3)

Options I used:

streamingHttpTokenReaderOptions = { timeoutInSec: 300, avoidHeadRequests: false }
parserOptions = { duration: false, skipCovers: true, skipPostHeaders: true, loadParser: ... }

Example response of a faulty one:

screen shot 2018-12-30 at 22 08 52

@icidasset
Copy link
Contributor Author

@Borewit Any thoughts? 🙏 Would love to help out if this is an issue.

@Borewit
Copy link
Owner

Borewit commented Jan 14, 2019

It is not a coincidence at all. It has to do with the duration. A constant bit rate (CBR) (like CBR/320) encoded MP3, is easier to predict the duration based on the file size, because the bitrate is constant. A variable bitrate (VBR) requires a Xing header to tell the duration, if that is not present, the only option left is to parse the entire file. But that last resort, is (should) only be taken if the duration flag is set to true. So enabling the duration flag means, the duration is so important to me, if takes you parsing the entire file, then that is what it takes.

If you think there is room for improvement, your help is very welcome.

@icidasset
Copy link
Contributor Author

@Borewit Thanks for the detailed explanation! I don't need the duration though so I've set the duration flag to false (in the parseFromTokenizer options). Is that all it takes to disable it, or am I missing something?

@Borewit
Copy link
Owner

Borewit commented Jan 15, 2019

In theory, yes.

@icidasset
Copy link
Contributor Author

Doesn't seem to be a way around it, so I'm closing this.
Thanks for all the help @Borewit ! ☺️

@Borewit
Copy link
Owner

Borewit commented Jan 19, 2019

Thanks a lot @icidasset for you donation! I was already happy you gave me feedback on music-metadata-browser.
You should join the chat one day, some more elm fans out there.

The MPEG-parser responsible for handling MP3 is a very tricky piece of code, which is the logical consequence of the message specifications with all the slightly different appearances.

After misread your issue last time, it looks like you experience this issue with CBR. With CBR there should be no reason to parse the entire file, because the duration is linear with the size, if you stream, make sure you pass the file/content size to the parser (example: parseStream function)

e.g.:

const metadata = await mm.parseStream(someReadStream, 'audio/mpeg', { 
  fileSize: resp.headers['content-size'] // Pass the Content-Size from the HTTP-response to the parser 
});

Can you check if the size is passed Steven?

I noticed that some CBR encoded streams have some odd frames in the beginning causing the parser to fail to detected the CBR pattern by examine the first frame. So that could be explanation,

This utility helped me as additional reference MPEG-Audio-Frame-Header to decoded information per frame. If you have a Windows machine, you could use that, to get some more clarity.

Let's investigate a bit further before we close the issue. If there is not good justification for music-metadata to request so much data, it should be addressed. Otherwise it defeats this purpose of this great partial download client isn't?.

@Borewit Borewit reopened this Jan 19, 2019
@icidasset
Copy link
Contributor Author

Joined the chat 👍 Thanks for the invite!

I'm using StreamingHttpTokenReader + parseFromTokenizer as shown in the example from this package. Which gets the file size from the file using a HEAD request, right?

I've double checked if the file size is actually saved and is correct, which is the case.
I've confirmed this by doing reader.init().then(_ => console.log(reader.fileSize)).

Here's the timeline of the requests:

  1. ✅ HEAD request
  2. ✅ GET request for the first 1024 bytes (response = Content-Range: bytes 0-1023/12160711)
  3. 🙅‍♂️ GET request that's too big (response = Content-Range: bytes 10-3361496/12160711)
  4. ❔ Another GET request for 1024 bytes (response = Content-Range: bytes 3361497-3362520/12160711)
  5. ❔ Another GET request for 1024 bytes (response = Content-Range: bytes 3361537-3362560/12160711)
  6. ❔ Another GET request for 1024 bytes (response = Content-Range: bytes 3362542-3363565/12160711)
  7. ❔ Another GET request for 1024 bytes (response = Content-Range: bytes 3362582-3363605/12160711)
  8. ❔ Another GET request for 1024 bytes (response = Content-Range: bytes 3363587-3364610/12160711)

I'm on Mac, I'll see if I can find a similar tool to list the frames.

@Borewit Borewit transferred this issue from Borewit/tokenizer-http Jan 30, 2019
@Borewit
Copy link
Owner

Borewit commented Jan 31, 2019

Can you please provide me with a sampe Steven?

@Borewit
Copy link
Owner

Borewit commented Jan 31, 2019

Thanks, but I am getting error on those:

<?xml version="1.0" encoding="UTF-8"?>
<Error>
	<Code>AccessDenied</Code>
	<Message>Access Denied</Message>
	<RequestId>6B7FFABF8C2D5CF4</RequestId>
	<HostId>Gd6lkgIYzXhGRZJ+rboZDYlLc99qySpCA+7ltcFxr9xgFwqILjgMA8gnBo/kYxyVKR7Eul+VMQg=</HostId>
</Error>

@icidasset
Copy link
Contributor Author

icidasset commented Jan 31, 2019

🤦‍♂️ Sorry about that, I thought they were set to public, should work now.

@Borewit
Copy link
Owner

Borewit commented Feb 5, 2019

I think there are two underlying problems Steven.

  1. A bug in music-metadata causing the entire MP3 file to processed.
  2. I think content-length doesn't make to the streaming-http-token-reader, otherwise it would not have been

Although solving just one them would already solve your issue, can you help me debugging the second issue? Can you figure out why tokenizer.fileSize is not set? (https://github.com/Borewit/streaming-http-token-reader/blob/112a0ff0f10b0e85274a622858774d9dc5e4f518/src/streaming-http-token-reader.ts#L58)

It should be filled with the HTTP header Content-Length: https://github.com/Borewit/streaming-http-token-reader/blob/112a0ff0f10b0e85274a622858774d9dc5e4f518/src/streaming-http-token-reader.ts#L168

So the correct behavior should be: with the duration flag set to false (and the fixed duration flag handling), you should get a duration back.

@Borewit
Copy link
Owner

Borewit commented Feb 5, 2019

Released music-metadata-browser v0.7.1, should handle duration properly.

@Borewit Borewit added the bug Something isn't working label Feb 5, 2019
@icidasset
Copy link
Contributor Author

icidasset commented Feb 5, 2019

Thanks @Borewit! Just upgraded to 0.7.1.

In my testing, there doesn't seem to be any issues with the fileSize.
Unless, am I testing fileSize at the wrong time/place?
Here's how I tested it:

var reader = new StreamingHttpTokenReader(url, {})

reader.init().then(_ => {
  console.log(reader.fileSize)
  return musicMetadata.parseFromTokenizer(
    reader,
    reader.contentType,
    { duration: false }
  )
}).then(_ => {
  console.log(reader.fileSize)
})

Further notes:

  • 3.21MB download is still present
  • I need to check if parcel didn't cache my dependencies (in which case I'd still be using 0.7.0)

@Borewit
Copy link
Owner

Borewit commented Feb 7, 2019

This is why:
bonita
2526 x 2526 pixels, 3 MB
hidden in Bonita's id3v2.2 header.

I shrink the embedded covers to a width of 650 pixels and adjust the JPEG quality so that the size ~150 kB, reasonable quality en not much overhead. In addition to that I keep the best quality cover in folder of the release.

Case closed?

@Borewit
Copy link
Owner

Borewit commented Feb 7, 2019

Anyway, released a new version of streaming-http-token-reader, fixed an incorrect dependency on music-metadata: v0.0.3

@Borewit Borewit added not a bug This doesn't seem to be a bug, has different reason and removed bug Something isn't working labels Feb 7, 2019
@icidasset
Copy link
Contributor Author

Case closed! Thanks so much @Borewit !
I didn't realize it fetched the album art automatically as well, my bad.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
not a bug This doesn't seem to be a bug, has different reason
Projects
None yet
Development

No branches or pull requests

2 participants