Dataset - Too Many "Unknown" Files, Friendly File MIME Type Display Names #2202
Also, I believe we should extend this "friendly name" functionality to support wildcards. Given a mapping such as image/jpeg=JPEG Image, we would provide friendly names for the types we know about, and a generic name for an image of some type image/blah-blah that is not specifically listed.
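A minimal sketch of the wildcard lookup, assuming a `key=value` mapping like the `image/jpeg=JPEG Image` line above plus a hypothetical `image/*` wildcard key. The wildcard syntax and the function names `parse_display_names` and `friendly_name` are illustrative only, not existing Dataverse code. The lookup tries an exact match first, then the major type's wildcard, then a default label.

```python
def parse_display_names(text):
    """Parse 'mime/type=Friendly Name' lines into a dict, skipping blanks and # comments."""
    names = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        names[key.strip().lower()] = value.strip()
    return names


def friendly_name(mime_type, names, default="Unknown"):
    """Exact match first, then the (hypothetical) 'major/*' wildcard, then the default."""
    # Drop parameters such as '; charset=UTF-8' and normalize case before the lookup.
    base = mime_type.split(";", 1)[0].strip().lower()
    if base in names:
        return names[base]
    major = base.split("/", 1)[0]
    wildcard = major + "/*"  # ASSUMPTION: wildcard entries are written as 'image/*'
    if wildcard in names:
        return names[wildcard]
    return default


# Example mapping: one specific type plus a generic fallback for all images.
NAMES = parse_display_names("image/jpeg=JPEG Image\nimage/*=Image\n")
print(friendly_name("image/jpeg", NAMES))       # JPEG Image
print(friendly_name("image/blah-blah", NAMES))  # Image
```

Keeping the exact match ahead of the wildcard means specific entries always win, so adding `image/*` cannot change the label of any type that is already listed.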
@scolapasta I'm passing this to you for a decision on what to do for 4.2.
About the thing I mentioned during standup: yes, JHOVE's XML module hangs for more than an hour if it cannot download the schema listed in the file header. For example, we apparently have a whole bunch of previously unidentified Gephi files, and they all have headers like
and connections to www.getxf.net just time out.
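For illustration only, and not how JHOVE's XML module works: a well-formedness check that cannot hang on the network, because it does no schema validation and refuses to resolve external entities. This is a sketch using Python's standard `xml.sax`; the function name `is_well_formed_offline` is hypothetical. Note that a non-validating parser treats `xsi:schemaLocation` as an ordinary attribute and never fetches it.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges, feature_external_pes


def is_well_formed_offline(data: bytes) -> bool:
    """Return True if `data` is well-formed XML, without fetching any schema or DTD."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(ContentHandler())  # no-op handler: we only care about parse errors
    # Never resolve external general or parameter entities (these are what trigger downloads).
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    try:
        parser.parse(io.BytesIO(data))
        return True
    except xml.sax.SAXParseException:
        return False
```

The trade-off is that this only tells you the file is XML, not which schema-defined format it is; identifying the format would then have to rely on something cheap and local, such as the root element name or namespace.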
The final word on the new version of JHOVE: it works, aside from the new XML plugin, which has the problem described above. That does not seem acceptable, so the XML plugin is going to be excluded from the configuration. The new version gives some modest gains in detecting the types of previously unidentified files (mostly PNG images; text files, including the specific encoding used; gzip files; and web archive files). I can post the exact percentages relative to the number of production files currently listed as unknown.
(That is to say, I'm choosing the manageable-chunks, incremental-improvements approach here, just so that we can close this issue and move forward.)
… search facets and default thumbnail icons. (ref #2202)
At standup I said I wanted to check whether I had documented the new file type redetect API endpoint I added (phew, already done), and I see that @landreev just pushed a release note in ef40804, which looks good. I just moved this to QA. I also looked at the recent code-related commits @landreev made since I last touched the branch, and they all look good to me too.
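A sketch of calling the redetect endpoint from a script, e.g. to spot-check a few formerly unknown files during QA. It assumes the endpoint is `POST /api/files/{id}/redetect` with a `dryRun` query parameter and an `X-Dataverse-key` token header, as described in later versions of the Dataverse API guide; verify the path against the guide for the version you are testing. The helper name `build_redetect_request` is hypothetical, and it only builds the request without sending it.

```python
import urllib.parse
import urllib.request


def build_redetect_request(server_url, file_id, api_token, dry_run=True):
    """Build (but do not send) a POST to the file-type redetect endpoint.

    ASSUMPTION: path /api/files/{id}/redetect, a dryRun query parameter,
    and an X-Dataverse-key header; check your installation's API guide.
    """
    query = urllib.parse.urlencode({"dryRun": "true" if dry_run else "false"})
    url = f"{server_url.rstrip('/')}/api/files/{file_id}/redetect?{query}"
    request = urllib.request.Request(url, method="POST")
    request.add_header("X-Dataverse-key", api_token)
    return request


# To actually send it: urllib.request.urlopen(build_redetect_request(...))
```

Defaulting to `dry_run=True` keeps a stray invocation from changing stored file types; pass `dry_run=False` only once the reported type looks right.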
Something I should've done earlier - notes on how to test/what to look for:
Peeked at the icons mentioned in "6" and suggested tweaks for the data and archive icons. I put my random selection of 84 unknown files into dvn-build, and Data went from last (2 of 84) to first (18 of 84). The number of unknowns was still fairly high, but hopefully we will see greater gains across the full pool of 127,109 unknown files in production, since all 84 of these files were originally unknown there.
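For scale, 2 of 84 is about 2.4% of the sample and 18 of 84 about 21.4%. A sketch of how such a per-facet tally could be computed for a sample of detected MIME types; the `FACET_CATEGORY` mapping below is hypothetical and stands in for whatever type-to-facet assignment the application actually uses.

```python
from collections import Counter

# HYPOTHETICAL mapping for illustration only; the real facet categories and
# their MIME-type assignments come from the application's configuration.
FACET_CATEGORY = {
    "text/csv": "Data",
    "text/tab-separated-values": "Data",
    "application/zip": "Archive",
    "application/gzip": "Archive",
    "image/png": "Image",
}


def facet_tally(mime_types, mapping=FACET_CATEGORY):
    """Return (category, count, percent) tuples, largest facet first; unmapped types count as 'Unknown'."""
    if not mime_types:
        return []
    counts = Counter(mapping.get(m, "Unknown") for m in mime_types)
    total = len(mime_types)
    # most_common() sorts by count, which gives the facet ranking.
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]
```

Running the same sample through the tally before and after redetection shows directly how a facet such as Data moves in the ranking.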
As referenced in #2192, there are files in production that need friendly MIME Type labels.
From @landreev
We should identify as many of these as possible and give them friendlier display names than the one that @pdurbin found.