Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Seqera containers in pipeline #36

Open
wants to merge 12 commits into
base: dev
Choose a base branch
from
Open

Use Seqera containers in pipeline #36

wants to merge 12 commits into from

Conversation

drpatelh
Copy link
Member

@drpatelh drpatelh commented May 28, 2024

  • Added a seqera_containers profile to build and push containers to community registry via Wave
  • Bumped minimum NF version required by pipeline to 24.04.0
  • Noticed an important discrepancy when using the conda declaration within a process vs a separate environment.yml (like was updated on nf-core recently for all modules e.g. fastqc). The latter impacts the name of the container being built as NF will use the name entry in the YAML as opposed to the package names in the conda declaration. That makes sense, but it will make it trickier to reuse containers with the same packages across modules. I have always preferred having an explicit conda definition in the process itself to avoid having to open numerous files to figure out software versions. Something worth looking into before rolling this out on nf-core more widely.
  • Removed --singularity_pull_docker_container for Wave to implicitly handle pulling the correct container

@drpatelh
Copy link
Member Author

drpatelh commented May 28, 2024

I ran the command below within the pipeline root to build and push the containers to the community registry:

$ nextflow inspect . -profile test,seqera_containers -concretize

{
    "processes": [
        {
            "name": "NF_AGGREGATE:SEQERA_RUNS_DUMP",
            "container": "community.wave.seqera.io/library/tower-cli:0.9.2--dc544a7e574ab73b"
        },
        {
            "name": "NF_AGGREGATE:MULTIQC",
            "container": "community.wave.seqera.io/library/multiqc:1.21--95b60bb51371925c"
        },
        {
            "name": "NF_AGGREGATE:PLOT_RUN_GANTT",
            "container": "community.wave.seqera.io/library/click_pandas_plotly_express_typing:21adb9e2d1b605a5"
        }
    ]
}

However, I can't pull all of the containers:

docker pull community.wave.seqera.io/library/multiqc:1.21--95b60bb51371925c
docker pull community.wave.seqera.io/library/tower-cli:0.9.2--dc544a7e574ab73b
docker pull community.wave.seqera.io/library/click_pandas_plotly_express_typing:21adb9e2d1b605a5

How can I check if these containers have been built or am I doing something wrong?

// Set default registry for Apptainer, Docker, Podman and Singularity independent of -profile
// Will not be used unless Apptainer / Docker / Podman / Singularity are enabled
// Set to your registry if you have a mirror of containers
apptainer.registry = 'quay.io'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could set these to community.wave.seqera.io if we want think. For the same reasons we added quay.io here in the first place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

community.wave.seqera.io/library/tower-cli:0.9.2--dc544a7e574ab73b

What does the library mean in the URI? Will this always be the same for different registries? So do we need:

apptainer.registry   = 'community.wave.seqera.io'
docker.registry      = 'community.wave.seqera.io'
podman.registry      = 'community.wave.seqera.io'
singularity.registry = 'community.wave.seqera.io'

or this:

apptainer.registry   = 'community.wave.seqera.io/library'
docker.registry      = 'community.wave.seqera.io/library'
podman.registry      = 'community.wave.seqera.io/library'
singularity.registry = 'community.wave.seqera.io/library'

or none of these based on @pditommaso comment?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to use container with Community registry, hence you don't need registry either

Copy link
Contributor

@adamrtalbot adamrtalbot May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for including this is so people can quickly repoint to their own registry if they are using freeze, i.e. set all of these to myprivateecr.registry.io. On the other hand, they can just do this themselves and disregard all the community containers 🤔 . I don't think we ever worked out the developer workflow here.

@drpatelh you should do either 1 of two things here:

  1. docker.registry = 'community.wave.seqera.io/library' and the container directive is container 'click_pandas_plotly_express_typing:21adb9e2d1b605a5'
  2. Do not set docker.registry and use the directivecontainer 'community.wave.seqera.io/library/click_pandas_plotly_express_typing:21adb9e2d1b605a5'

The docker.registry is added as a prefix if it does not exist in the URI of the container already (i.e. it's idempotent). What you have done here is fine but redundant and the registry config option will be ignored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. not certain about library, this depends on Wave + ECR behaviour.

@@ -2,7 +2,7 @@ process PLOT_RUN_GANTT {
tag "$meta.id"

conda 'click=8.0.1 pandas=1.1.5 plotly_express=0.4.1 typing=3.10.0.0'
container 'seqeralabs/nf-aggregate:click-8.0.1_pandas-1.1.5_plotly_express-0.4.1_typing-3.10.0.0--ccea219dc6c3d6a1'
container 'community.wave.seqera.io/library/click_pandas_plotly_express_typing:21adb9e2d1b605a5'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOOOOOOOOOO

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We made this to have containers made via the Conda packages declaration. Why bumping the community container image?

Copy link
Member Author

@drpatelh drpatelh May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we figured out how to run these pipelines in offline environments? Is this going to be inherently taken care of by nf-core download and not handled by NF/Wave anymore @ewels ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - if we want to run offline then we cannot use Wave.

The only way that I can see around this is to have the Nextflow Wave plugin create the conda + dockerfile locally and from that figure out what the image name should be. Then this could be checked against the local docker / singularity cache without doing any calls to the internet. But I've spoken to @pditommaso about this already and I think that it was difficult / impossible.

So until we figure that out, I can't see any way to use Wave directly in nf-core pipelines, as it will break all offline usage. So the hardcoded docker + singularity URIs have to stay.

We can of course use Wave to build those images and so my hope is that the developer can specify just the conda requirements and then the automation will pin the image names into a config without human intervention.

If anyone has any better ideas, shout!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are near to have nextflow inspect to report the HTTP uri of the Singularity image files. Would not that allow handling them in the same manner as it's done for Biocontainers right now?

Copy link
Contributor

@adamrtalbot adamrtalbot May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way that I can see around this is to have the Nextflow Wave plugin create the conda + dockerfile locally and from that figure out what the image name should be.

Is it possible to use the hash and check for existing containers before hitting Wave? This doesn't help with offline usage, but does bypass Wave when it isn't needed. Furthermore, it could check for existing Singularity containers for some offline use right now. I'm assuming the hash in the filename is complete enough to do this check which seems like quite a large assumption.

We can of course use Wave to build those images and so my hope is that the developer can specify just the conda requirements and then the automation will pin the image names into a config without human intervention.

I assume this is the end goal but don't see how we'd get there and still include offline support?

Currently the workflow for nf-core would be essentially:

  • play with 'interactive' Wave + Conda for development
  • use https://seqera.io/containers/ to build the 'static' container and grab the URI
  • add this to process container directive(s)
  • open PR to nf-core/modules

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use the hash and check for existing containers before hitting Wave?

Wave generates the hash, so no

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we'll have to stick with process.container for now.

Comment on lines +71 to +75
apptainer.ociAutoPull = true
singularity.ociAutoPull = true
wave.enabled = true
wave.freeze = true
wave.strategy = 'conda'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are using Wave behind the scenes to get the container URI, do I need to move these out of a profile and into the top-level config so they are always on?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's keep into a separate profile

@drpatelh
Copy link
Member Author

docker pull community.wave.seqera.io/library/tower-cli:0.9.2--dc544a7e574ab73b
docker pull community.wave.seqera.io/library/click_pandas_plotly_express_typing:21adb9e2d1b605a5

I also still don't know why these containers weren't created 😏

Comment on lines +71 to +72
apptainer.ociAutoPull = true
singularity.ociAutoPull = true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
apptainer.ociAutoPull = true
singularity.ociAutoPull = true

Not needed. Those forces the use of Docker image what's converted by Singularity

@ewels
Copy link
Member

ewels commented May 29, 2024

docker pull community.wave.seqera.io/library/tower-cli:0.9.2--dc544a7e574ab73b
docker pull community.wave.seqera.io/library/click_pandas_plotly_express_typing:21adb9e2d1b605a5

I also still don't know why these containers weren't created 😏

If you have your tower auth token set locally then you should get emails when you run nextflow inspect, potentially with error messages for these two.

For now that's the only method I know of to be able to figure out the build status.

@adamrtalbot
Copy link
Contributor

docker pull community.wave.seqera.io/library/tower-cli:0.9.2--dc544a7e574ab73b
docker pull community.wave.seqera.io/library/click_pandas_plotly_express_typing:21adb9e2d1b605a5

I also still don't know why these containers weren't created 😏

I was able to generate the tower-cli one but I get a different hash: community.wave.seqera.io/library/tower-cli:0.9.2--28258d337ec30808

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants