SciCat retrieval job for selected files #4

sofyalaski · 2024-10-07T12:10:25Z

To add browsed files to OneDep, they need to be available locally. We need to think about a retrieval strategy.

The SciCat jobs backend should, in theory, be able to handle retrieving only certain files.
These points need to be clarified:

- The location of the specific files is at /api/v3/datasets/pid/datablocks (?)
- Is it really Datablocks or OrigDatablocks? (Probably Datablocks.)
- Implementation of jobs that cover only certain files
- Frontend for choosing only specific files from the list
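A minimal sketch of the file-selection step, to make the open points concrete. It assumes the endpoint above is correct (still unconfirmed) and that each datablock record carries an `id` and a `dataFileList` whose entries have a `path`; both field names are assumptions to check against the actual SciCat response. `datablocks_url` and `select_files` are hypothetical helper names, and no request is sent here.

```python
from urllib.parse import quote


def datablocks_url(base_url: str, pid: str) -> str:
    """Build the (unconfirmed) datablocks endpoint for a dataset PID."""
    # PIDs usually contain a slash, so the whole PID is percent-encoded.
    encoded_pid = quote(pid, safe="")
    return f"{base_url.rstrip('/')}/api/v3/datasets/{encoded_pid}/datablocks"


def select_files(datablocks, wanted_paths):
    """Return {datablock id: [file entries]} for the requested paths only.

    Assumes each datablock is a dict with "id" and "dataFileList" keys.
    Datablocks without any requested file are left out of the result.
    """
    wanted = set(wanted_paths)
    selection = {}
    for block in datablocks:
        hits = [f for f in block.get("dataFileList", []) if f.get("path") in wanted]
        if hits:
            selection[block.get("id")] = hits
    return selection
```

Grouping the hits by datablock keeps open the option of retrieving only the datablocks that actually contain a selected file.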

sbliven · 2024-10-21T11:55:36Z

I would see the following workflow for the case where maps are not deposited in SciCat:

First, EMDB:

1. User selects the raw dataset and clicks the new retrieve option 'Retrieve as EMDB deposition'.
   - No raw data needs to be uploaded, so this doesn't necessarily require a job. Maybe a retrieve_emdb job gets created just for status tracking.
2. The user gets redirected to the uploader
3. User uploads "maps" (meaning all derived files), adds required metadata (eg authentication info depending on solution to Authorization in One Dep #5), and submits.
4. The uploader sends everything to OneDep via the API
5. User gets redirected to OneDep

Second, EMPIAR:

1. User selects the raw dataset and clicks the new retrieve option 'Retrieve as EMPIAR deposition'.
   - Maybe this could also be triggered from the retrieve_emdb job (eg through a link in the email), in which case we could associate the EMDB identifier from before.
2. This creates a new job of type retrieve_empiar
   - jobParams could contain the EMDB identifier, or this could be added manually
3. The archive system retrieves the data to intermediate storage and updates the job status
4. The user gets an email with a link to the uploader website
5. User enters the EMDB identifier if needed, selects which images to include, and submits
   - file selection can now be based on the real files, since they are available in intermediate storage on the server
6. Server uploads data to EMPIAR via globus and metadata via the deposition tool/API
7. Job gets marked as successful
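As a rough illustration of what such a job request could look like, here is a sketch that assembles the body for either proposed job type. The job types `retrieve_emdb` and `retrieve_empiar` are the ones proposed above and do not exist yet; the field names `emailJobInitiator` and `datasetList` are assumptions modelled on the SciCat job schema, and the `emdbId` key inside `jobParams` and the helper name `build_retrieve_job` are purely hypothetical.

```python
def build_retrieve_job(job_type, pid, files, email, emdb_id=None):
    """Assemble a job request body for retrieving selected files of one dataset."""
    # Only the two job types proposed in this thread are accepted.
    if job_type not in {"retrieve_emdb", "retrieve_empiar"}:
        raise ValueError(f"unsupported job type: {job_type}")

    job_params = {}
    if emdb_id is not None:
        # Hypothetical key name; the identifier could also be entered later.
        job_params["emdbId"] = emdb_id

    return {
        "type": job_type,
        "emailJobInitiator": email,
        "jobParams": job_params,
        # One dataset per job, restricted to the selected files.
        "datasetList": [{"pid": pid, "files": list(files)}],
    }
```

Leaving `emdb_id` optional matches both variants above: it is filled in when the EMPIAR job is triggered from an earlier EMDB job, and omitted when the user adds it manually in the uploader.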

If we want to support getting maps from a derived dataset, it would work similarly to the EMPIAR case where the derived dataset is retrieved from storage and then the user gets an email to continue the EMDB process.

sofyalaski · 2024-10-21T12:38:05Z

@sbliven,
I thought of a similar workflow.
My initial idea was not to put the dataset in a cart in the first place. Instead, for datasets that have an OpenEM keyword (as I remember, the Ingestor will add this), we would introduce a "OneDep" button that redirects to the uploader page. We can still create a job to track the status.
I think of the cart as a place where you put multiple datasets so that you can create a task on many datasets simultaneously. Since we are working on one dataset only, I thought it would be more intuitive to go to the dataset first.

sbliven · 2024-10-21T13:12:03Z

If you first filter for 'Retrievable' jobs you can directly retrieve a dataset without putting it in the cart. This "feature" of the frontend is terrible and should be completely redesigned IMO, but that's what SciCat provides currently.

sofyalaski self-assigned this Oct 7, 2024
