Support file list in POST /dataset #40

Open
sbliven opened this issue Nov 29, 2024 · 3 comments
Labels
enhancement New feature or request question Further information is requested


sbliven (Member) commented Nov 29, 2024

Support passing a list of files for ingestion. File paths should be relative to `sourceFolder`.

This is required for the SwissFel use case. They would like to replace calling `datasetIngestor metadata.json fileListing.txt` with a REST call to the PSI Ingestor service.
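As a rough illustration of that migration, here is a minimal Python sketch that turns the two inputs the CLI consumes into one request body of the `{metadata, fileListing}` shape proposed in this issue. It assumes `fileListing.txt` holds one path per line, relative to `sourceFolder`, and that `metadata` is sent as a JSON object as in the proposal (the current endpoint takes it as a string). Both function names are hypothetical and are not part of datasetIngestor or the ingestor service.

```python
import json
from typing import Optional


def parse_file_listing(text: str) -> list[str]:
    """Turn the contents of a fileListing.txt into a list of paths.

    Assumption: one path per line, relative to sourceFolder.
    Blank lines and surrounding whitespace are dropped.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_dataset_request(metadata_json: str, file_listing_text: Optional[str] = None) -> dict:
    """Build a POST /dataset body from the CLI's two input files (hypothetical helper).

    metadata_json: contents of metadata.json, parsed into an object.
    file_listing_text: contents of fileListing.txt, or None to omit fileListing.
    """
    body = {"metadata": json.loads(metadata_json)}
    if file_listing_text is not None:
        body["fileListing"] = parse_file_listing(file_listing_text)
    return body
```

Leaving `fileListing` out entirely when no listing is given matches the proposal, where only `metadata` is required.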

`POST /dataset` currently takes the body `{metadata: string}`.

I suggest changing this to something like:

```json
{
  "type": "object",
  "properties": {
    "metadata": {"$ref": "https://scicat.development.psi.ch/explorer-json#/components/schemas/CreateRawDatasetObsoleteDto"},
    "fileListing": {
      "type": "array",
      "items": {"type": "string"}
    }
  },
  "required": ["metadata"]
}
```

(or whatever this path stabilizes to)
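A minimal Python sketch of the top-level checks this schema implies, written without a JSON Schema library: `metadata` must be present and be an object, and `fileListing`, if given, must be an array of strings. Validating `metadata` against `CreateRawDatasetObsoleteDto` is deliberately left out, and `validate_dataset_body` is a hypothetical name.

```python
from typing import Any


def validate_dataset_body(body: Any) -> list[str]:
    """Return a list of problems with a POST /dataset body (empty if valid).

    Mirrors only the top-level schema proposed in this issue; the
    metadata DTO itself is not checked here.
    """
    if not isinstance(body, dict):
        return ["body must be a JSON object"]

    errors = []
    # "metadata" is the only required property.
    if "metadata" not in body:
        errors.append("missing required property 'metadata'")
    elif not isinstance(body["metadata"], dict):
        errors.append("'metadata' must be an object")

    # "fileListing" is optional, but if present must be an array of strings.
    if "fileListing" in body:
        listing = body["fileListing"]
        if not isinstance(listing, list):
            errors.append("'fileListing' must be an array")
        else:
            for i, item in enumerate(listing):
                if not isinstance(item, str):
                    errors.append(f"'fileListing[{i}]' must be a string")
    return errors
```

Collecting all messages instead of stopping at the first one would let the service report every problem with a request at once.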

consolethinks self-assigned this Feb 18, 2025
consolethinks (Collaborator) commented:

I have a question about this: as far as I know, `sourceFolder` is a mandatory field in the metadata, and it normally marks the folder that should contain all of the dataset's files. Should I verify that every file in the given list is inside that folder (i.e. its path, as given or with symlinks resolved, lies under `sourceFolder`), or should I just ignore it? @sbliven
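If the answer is to verify, one way to handle both `..` segments and symlinks is to resolve each entry against `sourceFolder` with `os.path.realpath` and compare the result with `os.path.commonpath`. A plain string-prefix test is not enough: `"/data/src2/x".startswith("/data/src")` is `True`. This is a sketch assuming POSIX paths; `files_outside_source_folder` is a hypothetical helper, not existing ingestor code.

```python
import os


def files_outside_source_folder(source_folder: str, file_listing: list[str]) -> list[str]:
    """Return the entries of file_listing that resolve outside source_folder.

    Each entry is joined to source_folder and normalized with os.path.realpath,
    which collapses ".." and resolves any symlinks that exist on disk.
    Note that os.path.join discards source_folder when an entry is absolute,
    so absolute entries are checked as given.
    """
    root = os.path.realpath(source_folder)
    outside = []
    for entry in file_listing:
        resolved = os.path.realpath(os.path.join(root, entry))
        # commonpath compares whole path components, unlike str.startswith.
        if os.path.commonpath([root, resolved]) != root:
            outside.append(entry)
    return outside
```

Because `realpath` can only follow symlinks that actually exist, this check is only meaningful when run on a host that sees the same filesystem as the dataset.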

consolethinks added the `enhancement` and `question` labels Feb 18, 2025
consolethinks (Collaborator) commented:

Also, it's not clear to me whether you want this feature exposed in the front-end. If so, a new file picker has to be implemented in the front-end, along with a way for the back-end to let the front-end browse the filesystem (additional endpoint[s]).

sbliven (Member, Author) commented Feb 24, 2025

How do facilities locate datasets? Some ideas:

  • Can't assume all datasets are in a single directory (e.g. PSI uses /emf/instrument/camera/group/project/dataset)
  • Don't necessarily want to show all files
  • Maybe facilities could configure some validation? E.g. a regex that is matched against directory paths
  • Could also auto-detect dataset folders for some software, e.g. EPU.
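The regex idea could look roughly like this: the facility supplies a list of patterns (a hypothetical configuration, not an existing ingestor option), and a candidate dataset folder is accepted only if it fully matches one of them. The example pattern encodes the `/emf/instrument/camera/group/project/dataset` layout mentioned above; the concrete folder names in the usage are placeholders.

```python
import re
from typing import Callable, Iterable


def make_dataset_path_validator(patterns: Iterable[str]) -> Callable[[str], bool]:
    """Build a checker from facility-configured regexes (hypothetical config).

    A path is accepted if it fully matches at least one pattern.
    """
    compiled = [re.compile(p) for p in patterns]

    def is_allowed(path: str) -> bool:
        # fullmatch, not search: a pattern must describe the whole path,
        # so shallower or deeper directories are rejected.
        return any(rx.fullmatch(path) for rx in compiled)

    return is_allowed


if __name__ == "__main__":
    # /emf followed by exactly five components: instrument/camera/group/project/dataset
    check = make_dataset_path_validator([r"/emf/[^/]+/[^/]+/[^/]+/[^/]+/[^/]+/?"])
    print(check("/emf/krios1/falcon4/groupA/p1234/run_001"))  # True
    print(check("/emf/krios1/falcon4/groupA/p1234"))  # False (project level, not a dataset)
```

Anchoring with `fullmatch` keeps each pattern an exact description of what counts as a dataset folder, which also limits what a file-picker endpoint would need to expose.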
