vikings.pdf: Cross-reference stream has invalid W key. #46

Closed
kleuter opened this issue Oct 6, 2023 · 3 comments

kleuter commented Oct 6, 2023

The pdfiototext tool fails to parse the file:
https://www.dropbox.com/scl/fi/xpktv4rrozm22sxjuyarp/vikings.pdf?rlkey=e6rdkbbgjg45d2mfypdrjwd8a&dl=0

System Information:

  • OS: Windows 10, Visual Studio 2019

michaelrsweet commented

OK, so this file, generated by some sort of Microsoft project, contains a cross-reference stream with a W value of [1 4 3]. The third value specifies the size in bytes of the "generation number" field, where the generation number is an integer between 0 and 65535. Normally this field should be 1 or 2 bytes long (1 byte allows generation numbers from 0 to 255, 2 bytes allow the full range of 0 to 65535). Anything larger is puzzling, but since other PDF-consuming applications seem to deal with this OK, I'll just allow it and peg values greater than 65535 to the 65535 maximum required by the PDF standards.
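
To make the field layout concrete, here is a minimal Python sketch of decoding one cross-reference stream entry from the widths in W and pegging an over-wide generation number to 65535. It is an illustration only, not the PDFio C code; the names `decode_xref_entry` and `MAX_GENERATION` are hypothetical, and limiting the clamp to entry types 0 and 1 (where the third field is a generation number) is an assumption of this sketch.

```python
from typing import Sequence, Tuple

MAX_GENERATION = 65535  # largest generation number the PDF format allows


def decode_xref_entry(entry: bytes, widths: Sequence[int]) -> Tuple[int, int, int]:
    """Split one cross-reference stream entry into its three big-endian fields."""
    if len(widths) != 3 or len(entry) != sum(widths):
        raise ValueError("entry length does not match the W array")

    fields = []
    pos = 0
    for width in widths:
        # A zero-width field decodes to 0 here; defaults are applied below.
        fields.append(int.from_bytes(entry[pos:pos + width], "big"))
        pos += width

    entry_type, second, third = fields
    if widths[0] == 0:
        entry_type = 1  # an absent type field defaults to type 1

    # For types 0 (free) and 1 (in use) the third field is a generation
    # number, so peg anything a wider field could hold down to 65535.
    if entry_type in (0, 1):
        third = min(third, MAX_GENERATION)

    return entry_type, second, third


# With W = [1 4 3] each entry is 8 bytes: type, offset, generation.
entry = bytes([1, 0x00, 0x00, 0x12, 0x34, 0x01, 0x00, 0x00])
print(decode_xref_entry(entry, [1, 4, 3]))
```

The sample entry carries 0x010000 (65536) in its 3-byte third field, which the sketch reports as 65535.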

@michaelrsweet michaelrsweet self-assigned this Oct 6, 2023
@michaelrsweet michaelrsweet added bug Something isn't working priority-low labels Oct 6, 2023
@michaelrsweet michaelrsweet added this to the Stable milestone Oct 6, 2023
michaelrsweet added a commit that referenced this issue Oct 6, 2023
…rting

Services (Issue #46)

- Odd cross-reference stream containing 3-byte generation number field for this
  16-bit value
- Odd empty hex strings

michaelrsweet commented

This file also has empty binary (hex) strings...

[master 7f6ffcd] Fix a couple issues with parsing PDF files produced by Microsoft Reporting Services (Issue #46)
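
For reference, an empty hex string is written as `<>` and should decode to zero bytes rather than be rejected. The following is a minimal Python sketch of a tolerant hex string decoder, not the PDFio implementation; `parse_hex_string` is a hypothetical name used only for illustration.

```python
import string


def parse_hex_string(token: str) -> bytes:
    """Decode a PDF hex string token such as "<48656C6C6F>" into bytes."""
    if len(token) < 2 or not (token.startswith("<") and token.endswith(">")):
        raise ValueError("hex string must be enclosed in < and >")

    # Whitespace between the angle brackets is ignored.
    digits = "".join(ch for ch in token[1:-1] if not ch.isspace())
    if any(ch not in string.hexdigits for ch in digits):
        raise ValueError("invalid character in hex string")

    # With an odd number of digits, the missing final digit is taken as 0.
    if len(digits) % 2:
        digits += "0"

    # An empty digit string is valid and yields an empty byte string.
    return bytes.fromhex(digits)


print(parse_hex_string("<>"))
print(parse_hex_string("<48656C6C6F>"))
```

The first call exercises the empty case from this file; the second decodes an ordinary string for comparison.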

kleuter commented Oct 6, 2023

thanks, Michael!
