Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return MD5 hash for form attachments #758

Closed
matthew-white opened this issue Oct 25, 2024 · 7 comments · Fixed by getodk/central-backend#1292
Closed

Return MD5 hash for form attachments #758

matthew-white opened this issue Oct 25, 2024 · 7 comments · Fixed by getodk/central-backend#1292
Assignees
Labels
backend Requires a change to the API server enhancement New feature or behavior

Comments

@matthew-white
Copy link
Member

For #728, Frontend needs to be able to compare the form attachments of the published version of the form with the form attachments of the draft version. We've discussed that Frontend could do so by comparing the MD5 hashes of the attachments. Collect similarly uses the MD5 hash returned via OpenRosa to determine whether a form attachment has changed.

We store the MD5 hash for each blob, but we don't return it over the API. This issue is to return the MD5 hash for form attachments. Maybe it'd be nice for symmetry to also return it for submission attachments, but that's not necessary for this issue. Frontend will just be comparing form attachments, not submission attachments.

We already return the MD5 hash for forms and form versions over the API. For #728, Frontend will be using that hash to determine whether the XForm itself has changed. The name of the MD5 hash in the form response is hash. We also return sha and sha256 in that response. In the database, the column name for the MD5 hash is hash in the form_defs table and md5 in the blobs table. In terms of what to return over the API for this issue, I kind of like md5: it feels a little more descriptive. Then again, maybe hash would be better for consistency. Either one would work well from my perspective.

@matthew-white matthew-white added enhancement New feature or behavior backend Requires a change to the API server labels Oct 25, 2024
@github-project-automation github-project-automation bot moved this to 🕒 backlog in ODK Central Oct 25, 2024
@karntrehan
Copy link

Hie @matthew-white is this a good first issue to be picked up?
cc @lognaturel

@ktuite ktuite assigned ktuite and unassigned ktuite Oct 29, 2024
@ktuite
Copy link
Member

ktuite commented Oct 29, 2024

Hi @karntrehan,

If this is something you are interested in and able to work on soon, that would be great. @matthew-white is working in central-frontend on redesigning the form draft experience, and this is related to that change. He is not blocked on this backend change at the moment, but was hoping it could be ready in the next week or so (by ~Nov 6).

I was planning to work on it, but I have many other things to do, so you are welcome to try it! If so, I will be the one reviewing. If you don't get to it, no worries, I can take it back. I will leave it for you until Nov 6 or until I hear from you.

See current form attachment API docs here.

We chatted today and some decisions are:

  • The property should be called hash in the form attachment API
  • If the attachment is a blob, it should return the md5 hash of that blob data (pulled from a column in the database)
  • If the attachment is an entity list (called "dataset" in the code), I'm not sure what it should return. For OpenRosa, it uses the hash of the last-updated-at timestamp on the entity list. But in the form draft context, I don't think returning a new hash when an entity is added/changed is the intention here. It could return null for the hash until we decide otherwise.

I would look at this code for fetching the md5 from the blobs table for OpenRosa form attachments:
https://github.com/getodk/central-backend/blob/master/lib/model/query/form-attachments.js#L141-L164

And this Form.Attachment.forApi() function:
https://github.com/getodk/central-backend/blob/master/lib/model/frames.js#L115-L119

@brontolosone
Copy link
Contributor

just some considerations:

  • md5 is not supported in the browser Crypto API — picking one of its available digest methods may be a good idea.

  • OTOH Postgres has no out-of-the-box sha256() function like it has an md5() function. But it's easily available via the builtin pgcrypto extension.

@matthew-white
Copy link
Member Author

md5 is not supported in the browser Crypto API

For #728, Frontend won't need to use the browser Crypto API. Frontend won't be downloading the content of form attachments, but we want to make it easy for Frontend to indicate to the user when a form attachment has changed. Backend calculates and stores md5 for form attachments, and the goal of this issue is to surface that value over the API. md5 is the only hash that Backend calculates for form attachments, since it's the one that OpenRosa requires.

OTOH Postgres has no out-of-the-box sha256() function like it has an md5() function. But it's easily available via the builtin pgcrypto extension.

We actually don't use Postgres at all for this, even for md5, though that's an interesting idea.

@karntrehan
Copy link

Hie @matthew-white, thank you for the details.

I will take some time to setup central locally and hence would miss the 6th (today) deadline. I would still go ahead with the installation and start looking into the issue to understand more about the working of central.

Thanks.

@matthew-white
Copy link
Member Author

Sounds good. Note that many issues only require setting up the central-backend repository. Others require central-backend and central-frontend, but not the rest of the stack that's in this central repo. I think @ktuite may assign herself back to this issue, since it's a little time-sensitive. That said, there are certainly other issues for which we would welcome a PR. If you're interested in the backend, here are two issues that you could consider: #598, #654. The first one is a bug with a solid reproduction. The second would require you to learn some about client audit logs, but ultimately the issue is about adding extra columns to an existing CSV export.

@ktuite ktuite moved this from 🕒 backlog to ✏️ in progress in ODK Central Nov 12, 2024
@github-project-automation github-project-automation bot moved this from ✏️ in progress to ✅ done in ODK Central Nov 14, 2024
@matthew-white
Copy link
Member Author

For #728, Frontend needs to be able to compare the form attachments of the published version of the form with the form attachments of the draft version.

We've decided to bump #728 to v2025.1. However, we will still ship the changes for this issue with v2024.3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend Requires a change to the API server enhancement New feature or behavior
Projects
Status: ✅ done
Development

Successfully merging a pull request may close this issue.

4 participants