SciCat retrieval job for selected files #4

Open
4 tasks
sofyalaski opened this issue Oct 7, 2024 · 3 comments

@sofyalaski
Collaborator

To add browsed files to OneDep, they need to be available locally. We need to think about a retrieval strategy.

The SciCat jobs backend should in theory be able to handle retrieving only certain files.
These points need to be clarified:

  • where the specific files are listed: is it /api/v3/datasets/pid/datablocks (?)
  • whether it is really Datablocks or OrigDatablocks (probably Datablocks)
  • an implementation that runs jobs only for certain files
  • a frontend to choose only specific files from the list
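
A minimal sketch of the file-selection step, assuming each datablock is a dict that lists its files under `dataFileList` with a `path` per entry, and that a job accepts a `datasetList` of `{pid, files}` entries. These field names, the `retrieve` type string, and both helper functions are assumptions for illustration and would need checking against the deployed SciCat backend; no request is sent here.

```python
from typing import Iterable


def select_files(datablocks: Iterable[dict], wanted: set[str]) -> list[str]:
    """Return the selected paths in the order they appear in the datablocks."""
    selected = []
    for block in datablocks:
        # Assumption: files of a datablock live under "dataFileList".
        for entry in block.get("dataFileList", []):
            if entry["path"] in wanted:
                selected.append(entry["path"])
    # Fail early if the user picked something no datablock contains.
    missing = wanted - set(selected)
    if missing:
        raise ValueError(f"files not found in any datablock: {sorted(missing)}")
    return selected


def build_retrieve_job(pid: str, files: list[str], email: str) -> dict:
    """Build a hypothetical job payload restricted to the selected files."""
    return {
        "type": "retrieve",  # assumed job type name
        "emailJobInitiator": email,  # assumed field name
        "datasetList": [{"pid": pid, "files": files}],
    }
```

The frontend would only need to pass the chosen paths; validating them against the datablock listing on the server side keeps a job from being created for files that do not exist.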
@sofyalaski sofyalaski self-assigned this Oct 7, 2024
@sbliven
Member

sbliven commented Oct 21, 2024

I would see the following workflow for the case where maps are not deposited in SciCat:

First, EMDB:

  1. User selects the raw dataset and clicks the new retrieve option 'Retrieve as EMDB deposition'.
    • No raw data needs to be uploaded, so this doesn't necessarily require a job. Maybe a retrieve_emdb job gets created just for status tracking.
  2. The user gets redirected to the uploader
  3. User uploads "maps" (meaning all derived files), adds required metadata (e.g. authentication info, depending on the solution to "Authorization in One Dep" #5), and submits.
  4. The uploader sends everything to OneDep via the API
  5. User gets redirected to OneDep

Second, EMPIAR:

  1. User selects the raw dataset and clicks the new retrieve option 'Retrieve as EMPIAR deposition'.
    • Maybe this could also be triggered from the retrieve_emdb job (e.g. through a link in the email), in which case we could associate the EMDB identifier from before.
  2. This creates a new job of type retrieve_empiar
    • jobParams could contain the EMDB identifier, or this could be added manually
  3. The archive system retrieves the data to intermediate storage and updates the job status
  4. The user gets an email with a link to the uploader website
  5. User enters the EMDB identifier if needed, selects which images to include, and submits
    • file selection can now be based on the real files, since they are available in intermediate storage on the server
  6. Server uploads data to EMPIAR via Globus and metadata via the deposition tool/API
  7. Job gets marked as successful
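
The EMPIAR steps above can be sketched as a small job state machine. Only the job type `retrieve_empiar` and the idea of `jobParams` holding the EMDB identifier come from this thread; the status names, the `emdbId` key, and the `RetrieveEmpiarJob` class are hypothetical and only illustrate how the status updates could be constrained.

```python
from dataclasses import dataclass, field

# Hypothetical status names and allowed transitions.
TRANSITIONS = {
    "submitted": {"retrieving", "failed"},
    "retrieving": {"awaiting_user", "failed"},  # step 3: data in intermediate storage
    "awaiting_user": {"uploading", "failed"},   # steps 4-5: email sent, user selects images
    "uploading": {"successful", "failed"},      # steps 6-7: upload to EMPIAR, then success
}


@dataclass
class RetrieveEmpiarJob:
    dataset_pid: str
    job_params: dict = field(default_factory=dict)
    status: str = "submitted"
    job_type: str = "retrieve_empiar"

    def set_emdb_id(self, emdb_id: str) -> None:
        """Attach the EMDB identifier, e.g. when the user enters it manually."""
        self.job_params["emdbId"] = emdb_id  # assumed key name

    def advance(self, new_status: str) -> None:
        """Move to new_status, rejecting transitions the workflow does not allow."""
        allowed = TRANSITIONS.get(self.status, set())
        if new_status not in allowed:
            raise ValueError(f"cannot go from {self.status!r} to {new_status!r}")
        self.status = new_status
```

Terminal states (`successful`, `failed`) have no outgoing transitions, so a finished job cannot be moved back into the workflow by accident.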

If we want to support getting maps from a derived dataset, it would work similarly to the EMPIAR case where the derived dataset is retrieved from storage and then the user gets an email to continue the EMDB process.

@sofyalaski
Collaborator Author

@sbliven,
I thought of a similar workflow.
My initial idea was not to put the dataset in a cart at all, but rather, for datasets that have an OpenEM keyword (this one will be added by the Ingestor, as I remember), to introduce a "OneDep" button that redirects to the uploader page. We can still create a job to track the status.
I think of the cart as a place where you put multiple datasets so that you can create a task on many datasets simultaneously. Since we are working on only one dataset, I thought it would be more intuitive to go to the dataset first.

@sbliven
Member

sbliven commented Oct 21, 2024

If you first filter for 'Retrievable' jobs you can directly retrieve a dataset without putting it in the cart. This "feature" of the frontend is terrible and should be completely redesigned IMO, but that's what SciCat provides currently.
