Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

entrypoint-aws-batch: Keep ../ path parts in ZIP archive members during extraction #241

Merged
merged 2 commits into from
Mar 4, 2025

Conversation

tsibley
Copy link
Member

@tsibley tsibley commented Feb 12, 2025

entrypoint-aws-batch: Keep ../ path parts in ZIP archive members during extraction

The default of stripping ../ parts in member paths is a (good!) restriction for safety and security, but such paths do not pose any (additional) risk in the context of our Nextstrain runtime containers. We're already downloading and executing arbitrary user-supplied code, so the ability to potentially overwrite system files with ZIP archive members is not any additional privilege. And it's only potential at that due to most files being owned by root in the image, not the default container user of nextstrain.

Keeping the ../ parts will allow Nextstrain CLI to construct ZIP archives for jobs which write to new sibling paths of /nextstrain/build in the container. This will be used for including pathogen workflow source separate (e.g. in /nextstrain/pathogen) from the analysis working directory (/nextstrain/build). It can also be used to support Nextstrain CLI's existing --augur, --auspice, etc. overlays on AWS Batch, though a few other changes are required for that too (coming soon).

Note that Nextstrain CLI does not permit ../ path parts when extracting from these same ZIP archives (e.g. after a job completes to download results), as that would be additional risk. Currently it strips ../ parts, like unzip's default behaviour, but that will change soon to entirely skip archive members containing ../ parts.

@tsibley tsibley mentioned this pull request Nov 7, 2024
18 tasks
@tsibley tsibley force-pushed the trs/workflows-as-programs branch 2 times, most recently from b759049 to 4630445 Compare February 13, 2025 06:13
…ng extraction

The default of stripping ../ parts in member paths is a (good!)
restriction for safety and security, but such paths do not pose any
(additional) risk in the context of our Nextstrain runtime containers.
We're already downloading and executing arbitrary user-supplied code, so
the ability to potentially overwrite system files with ZIP archive
members is not any additional privilege.  And it's only potential at
that due to most files being owned by root in the image, not the default
container user of nextstrain.

Keeping the ../ parts will allow Nextstrain CLI to construct ZIP
archives for jobs which write to new sibling paths of /nextstrain/build
in the container.  This will be used for including pathogen workflow
source separate (e.g. in /nextstrain/pathogen) from the analysis working
directory (/nextstrain/build).  It can also be used to support
Nextstrain CLI's existing --augur, --auspice, etc. overlays on AWS
Batch, though a few other changes are required for that too (coming
soon).

Note that Nextstrain CLI does *not* permit ../ path parts when
extracting from these same ZIP archives (e.g. after a job completes to
download results), as that *would* be additional risk.  Currently it
strips ../ parts, like unzip's default behaviour, but that will change
soon to entirely skip archive members containing ../ parts.
@tsibley tsibley force-pushed the trs/workflows-as-programs branch from 4630445 to e05ddfb Compare February 13, 2025 06:18
…archive extraction

This allows Nextstrain CLI's --augur, --auspice, etc. overlays to start
working with AWS Batch when previously they did not, by bundling them up
with appropriate ../ path parts into the workdir ZIP archive.

See "entrypoint-aws-batch: Keep ../ path parts in ZIP archive members
during extraction" (e05ddfb) for the rationale of why this is not
particularly unsafe.
@tsibley
Copy link
Member Author

tsibley commented Feb 13, 2025

This can (and should) be merged ahead of Nextstrain CLI's usage of ../ paths for nextstrain run and overlays (happening in nextstrain/cli#407).

Copy link
Member

@jameshadfield jameshadfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't test, but the concept seems sensible!

@tsibley tsibley merged commit 02bfb1f into master Mar 4, 2025
60 checks passed
@tsibley tsibley deleted the trs/workflows-as-programs branch March 4, 2025 04:09
tsibley added a commit to nextstrain/cli that referenced this pull request Mar 4, 2025
The latest image, nextstrain/base:build-20250304T041009Z, provides a
mechanism in the entrypoint to support bundling of overlays in the
workdir ZIP archive.¹

Extending overlay support to AWS Batch has been very low priority and
something I thought was unlikely to ever happen.  However, in the course
of working on AWS Batch support for `nextstrain run`, it turned out to
be easiest/most straightforward/most minimal changes to bundle the
pathogen source directory with the working analysis directory in the
workdir ZIP archive, i.e. as a "pathogen" overlay.  This naturally led
to supporting overlays more generally, which I've done here.

¹ <nextstrain/docker-base#241>
tsibley added a commit to nextstrain/cli that referenced this pull request Mar 5, 2025
The latest image, nextstrain/base:build-20250304T041009Z, provides a
mechanism in the entrypoint to support bundling of overlays in the
workdir ZIP archive by way of upwards-traversing archive member paths.¹
For example, an Augur overlay is bundled into the workdir ZIP archive
with member paths starting with ../augur/ and ends up overwriting files
in the image's /nextstrain/augur/ since the AWS Batch workdir is always
/nextstrain/build/.

Extending overlay support to AWS Batch has been very low priority and
something I thought was unlikely to ever happen.  However, in the course
of working on AWS Batch support for `nextstrain run`, it turned out to
be easiest/most straightforward/most minimal changes to bundle the
pathogen source directory with the working analysis directory in the
workdir ZIP archive, i.e. as a "pathogen" overlay.  This naturally led
to supporting overlays more generally, which I've done here.

One caveat compared to overlays in runtimes with the concept of volume
mounts (Docker, Singularity) is that any files in the image that do not
exist in the overlaid files will remain present since nothing removes
them.  This is potentially problematic and will be annoying if run into
but most of the time should be a non-issue.  It is also solvable if we
care to exert the effort and extra code to do so.  I don't right now.

¹ <nextstrain/docker-base#241>
tsibley added a commit to nextstrain/cli that referenced this pull request Mar 5, 2025
The latest image, nextstrain/base:build-20250304T041009Z, provides a
mechanism in the entrypoint to support bundling of overlays in the
workdir ZIP archive by way of upwards-traversing archive member paths.¹
For example, an Augur overlay is bundled into the workdir ZIP archive
with member paths starting with ../augur/ and ends up overwriting files
in the image's /nextstrain/augur/ since the AWS Batch workdir is always
/nextstrain/build/.

Extending overlay support to AWS Batch has been very low priority and
something I thought was unlikely to ever happen.  However, in the course
of working on AWS Batch support for `nextstrain run`, it turned out to
be easiest/most straightforward/most minimal changes to bundle the
pathogen source directory with the working analysis directory in the
workdir ZIP archive, i.e. as a "pathogen" overlay.  This naturally led
to supporting overlays more generally, which I've done here.

One caveat compared to overlays in runtimes with the concept of volume
mounts (Docker, Singularity) is that any files in the image that do not
exist in the overlaid files will remain present since nothing removes
them.  This is potentially problematic and will be annoying if run into
but most of the time should be a non-issue.  It is also solvable if we
care to exert the effort and extra code to do so.  I don't right now.

¹ <nextstrain/docker-base#241>
tsibley added a commit to nextstrain/cli that referenced this pull request Mar 11, 2025
The latest image, nextstrain/base:build-20250304T041009Z, provides a
mechanism in the entrypoint to support bundling of overlays in the
workdir ZIP archive by way of upwards-traversing archive member paths.¹
For example, an Augur overlay is bundled into the workdir ZIP archive
with member paths starting with ../augur/ and ends up overwriting files
in the image's /nextstrain/augur/ since the AWS Batch workdir is always
/nextstrain/build/.

Extending overlay support to AWS Batch has been very low priority and
something I thought was unlikely to ever happen.  However, in the course
of working on AWS Batch support for `nextstrain run`, it turned out to
be easiest/most straightforward/most minimal changes to bundle the
pathogen source directory with the working analysis directory in the
workdir ZIP archive, i.e. as a "pathogen" overlay.  This naturally led
to supporting overlays more generally, which I've done here.

One caveat compared to overlays in runtimes with the concept of volume
mounts (Docker, Singularity) is that any files in the image that do not
exist in the overlaid files will remain present since nothing removes
them.  This is potentially problematic and will be annoying if run into
but most of the time should be a non-issue.  It is also solvable if we
care to exert the effort and extra code to do so.  I don't right now.

¹ <nextstrain/docker-base#241>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants