forked from galaxyproject/galaxy
Add Expression Tools to Galaxy #27
jmchilton added a commit that referenced this issue on Feb 5, 2016
CWL Support:
--------------

- Implemented integer, boolean, and data parameters, arrays thereof, and ["null", <simple_type>] union parameters.
- Draft 3 CreateFileRequirements are supported (see the test_rename test case).
- Draft 3 InlineJavascriptRequirements are supported for defining output files (see the test_cat3 test case).
- EnvVarRequirement requirements are supported (see the test_env_tool1 and test_env_tool2 test cases).
- Secondary files are supported at least partially; see the index1 and showindex1 CWL tools as well as the test_index1 test case.
- Docker integration is only partial (a simple docker pull is supported), so cat3-tool.cwl works, for example. The full semantics of CWL Docker support have yet to be implemented. The remaining work is straightforward and tracked in the meta-issue galaxyproject#1684.
- Non-File CWL outputs are represented as 'expression.json' files. Traditionally Galaxy hasn't supported non-File outputs from tools, but CWL Galaxy has work in progress on bringing native Galaxy support for such outputs (see #27).

Implementation Notes:
----------------------

- CWL secondary files are stored in a ``__secondary_files__`` directory in the dataset's extra_files_path directory.
- The tool execution API has been extended with an ``inputs_representation`` parameter that can now be set to "cwl". The ``cwl`` representation for running tools corresponds to the CWL job JSON format with {"class": "File", "path": "/path/to/file"} inputs replaced with {"src": "hda", "id": "<dataset_id>"}. Code for building these requests from CWL job JSON is available in the test class.
- Since the CWL <-> Galaxy parameter translation may change over time (for instance, if Galaxy develops or refines parameter classes), the CWL state and CWL state version are tracked in the database, so that for reruns, etc., the Galaxy state could hopefully be updated from an older version to a new one.
- CWL allows output parameters to be either File or non-File, determined at runtime, so galaxy.json is used to dynamically adjust the output extension as needed for non-File parameters.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ".json" or ".cwl", and proxy objects are created to adapt these tools to Galaxy representations. In particular, input and output descriptions are loaded from the tool.

When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command line is generated from this Job object. As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.

Galaxy writes a description of the CWL job to the job working directory in a form it can reload. After the process is complete (on the Galaxy compute server, but outside the Docker container), this representation is reloaded and the dynamic outputs are discovered and moved to the fixed locations expected by Galaxy. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL outputs to Galaxy outputs.

Currently all "File" outputs are sniffed to determine a Galaxy datatype. CWL draft 3 allows refinement on this, and this remains work to be done.

1) CWL should support EDAM declaration of types, and Galaxy should provide a mapping to core datatypes to skip sniffing if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)
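To make proposal 1) concrete, here is a minimal sketch of how an EDAM format declaration might short-circuit sniffing. Nothing here is taken from the Galaxy or cwltool code: the names ``EDAM_TO_GALAXY_EXT`` and ``choose_galaxy_ext`` are hypothetical, the ``sniff`` callback stands in for whatever sniffing Galaxy actually performs, and the pairing of each EDAM IRI with a Galaxy extension is an assumption for illustration.

```python
from typing import Callable, Optional

# Hypothetical table: EDAM format IRIs mapped to Galaxy datatype extensions.
# The specific pairings are illustrative assumptions, not Galaxy's mapping.
EDAM_TO_GALAXY_EXT = {
    "http://edamontology.org/format_1929": "fasta",
    "http://edamontology.org/format_2572": "bam",
    "http://edamontology.org/format_3016": "vcf",
}


def choose_galaxy_ext(declared_format: Optional[str],
                      sniff: Callable[[], str]) -> str:
    """Pick a Galaxy extension, sniffing only when no mapping applies.

    declared_format is the format IRI declared on the CWL output, if any.
    sniff is a zero-argument callable used as the fallback.
    """
    if declared_format is not None:
        ext = EDAM_TO_GALAXY_EXT.get(declared_format)
        if ext is not None:
            # A known declaration lets us skip sniffing entirely.
            return ext
    # No declaration, or one we cannot map: fall back to sniffing.
    return sniff()
```

A table like this cannot express Galaxy-specific distinctions on its own, which is why proposal 2) asks for a way to set the Galaxy output type directly.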
Testing:
---------------------

% git clone https://github.com/common-workflow-language/galaxy.git
% cd galaxy
% git checkout cwl
% virtualenv .venv
% . .venv/bin/activate
% pip install cwltool

Start Galaxy.

% GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh --reload

Open http://localhost:8080/ and see the CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to config/job_conf.xml. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run the API tests demonstrating the various CWL demo tools with the following command.

```
./run_tests.sh -api test/api/test_tools_cwl.py
```

Issues
---------------------------------

Work remaining on CWL support for Galaxy is tracked at https://github.com/common-workflow-language/galaxy/issues.
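The API tests mentioned above exercise the ``inputs_representation`` request form from the implementation notes. As a rough illustration of that translation, the sketch below rewrites a CWL job document so that each File entry becomes an ``{"src": "hda", "id": ...}`` reference. This is not the code from the test class: the function name ``to_galaxy_cwl_inputs``, the ``path_to_dataset_id`` lookup table, and the example path and dataset id are all hypothetical.

```python
def to_galaxy_cwl_inputs(cwl_job, path_to_dataset_id):
    """Replace CWL File objects with Galaxy dataset references.

    cwl_job is a parsed CWL job document (plain dicts, lists, scalars).
    path_to_dataset_id maps each File path to the id of a dataset that
    was already uploaded to a Galaxy history (an assumption of this sketch).
    """
    def convert(value):
        if isinstance(value, dict):
            if value.get("class") == "File":
                # Swap the file description for an hda reference.
                return {"src": "hda", "id": path_to_dataset_id[value["path"]]}
            return {key: convert(item) for key, item in value.items()}
        if isinstance(value, list):
            # Arrays of files or of plain values are handled element-wise.
            return [convert(item) for item in value]
        # Integers, booleans, strings, and null pass through unchanged.
        return value

    return convert(cwl_job)


# Example values; the path and dataset id are made up.
example_job = {
    "file1": {"class": "File", "path": "/data/whale.txt"},
    "reverse": True,
    "counts": [1, 2],
}
example_request = to_galaxy_cwl_inputs(
    example_job, {"/data/whale.txt": "f2db41e1fa331b3e"}
)
```

The resulting dictionary would then be sent as the tool inputs together with ``inputs_representation`` set to "cwl".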
jmchilton added a commit that referenced this issue on Feb 8, 2016
jmchilton added a commit that referenced this issue on Feb 10, 2016
jmchilton added a commit that referenced this issue on Feb 12, 2016
jmchilton added a commit that referenced this issue on Feb 12, 2016
jmchilton added a commit that referenced this issue on Feb 12, 2016
jmchilton added a commit that referenced this issue on Feb 12, 2016
CWL Support:
--------------

- Implemented integer, boolean, and data parameters, arrays thereof, and ["null", <simple_type>] union parameters.
- Draft 3 CreateFileRequirements are supported (see the test_rename test case).
- Draft 3 InlineJavascriptRequirement is supported for defining output files (see the test_cat3 test case).
- EnvVarRequirements are supported (see the test_env_tool1 and test_env_tool2 test cases).
- Secondary files are supported at least partially; see the index1 and showindex1 CWL tools as well as the test_index1 test case.
- Docker integration is only partial: a simple docker pull is supported, so cat3-tool.cwl works, for example. The full semantics of CWL Docker support have yet to be implemented. The remaining work is straightforward and tracked in the meta-issue galaxyproject#1684.
- Non-File CWL outputs are represented as 'expression.json' files. Traditionally Galaxy hasn't supported non-File outputs from tools, but the CWL Galaxy fork has work in progress on bringing native Galaxy support for such outputs (#27).

Implementation Notes:
----------------------

- CWL secondary files are stored in a ``__secondary_files__`` directory in the dataset's extra_files_path directory.
- The tool execution API has been extended with an ``inputs_representation`` parameter that can now be set to "cwl". The ``cwl`` representation for running tools corresponds to the CWL job JSON format, with {"class": "File", "path": "/path/to/file"} inputs replaced by {"src": "hda", "id": "<dataset_id>"}. Code for building these requests from CWL job JSON is available in the test class.
- Because the CWL <-> Galaxy parameter translation may change over time (for instance, if Galaxy develops or refines parameter classes), the CWL state and the CWL state version are tracked in the database. The hope is that for reruns and similar operations the Galaxy state could be updated from an older version to a newer one.
- CWL allows output parameters to be either File or non-File, determined at runtime, so galaxy.json is used to dynamically adjust the output extension as needed for non-File parameters.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz, https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ".json" or ".cwl", and proxy objects are created to adapt these tools to Galaxy representations. In particular, input and output descriptions are loaded from the tool.

When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command line is generated from this Job object. As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, and so on.

Galaxy writes a description of the CWL job to the job working directory so that it can be reloaded later. After the process is complete (on the Galaxy compute server, but outside the Docker container), this representation is reloaded and the dynamic outputs are discovered and moved to the fixed locations Galaxy expects. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL outputs to Galaxy outputs.

Currently all "File" outputs are sniffed to determine a Galaxy datatype. CWL draft 3 allows refinement of this, which remains work to be done:

1) CWL should support EDAM declaration of types, and Galaxy should provide a mapping to core datatypes so that sniffing can be skipped if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)

Testing:
---------------------

% git clone https://github.com/common-workflow-language/galaxy.git
% cd galaxy
% git checkout cwl
% virtualenv .venv
% . .venv/bin/activate
% pip install cwltool

Start Galaxy.

% GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh --reload

Open http://localhost:8080/ and see the CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to config/job_conf.xml. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following command.

```
./run_tests.sh -api test/api/test_tools_cwl.py
```

Issues
---------------------------------

Work remaining on CWL support for Galaxy is tracked at https://github.com/common-workflow-language/galaxy/issues.
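
The ``inputs_representation`` note above describes CWL job JSON in which File inputs are swapped for Galaxy dataset references. A minimal sketch of that translation follows. The function name ``to_galaxy_inputs`` and the path-to-dataset-id mapping are hypothetical and introduced only for illustration; the actual request-building code lives in the test class mentioned above and may differ.

```python
def to_galaxy_inputs(cwl_job, path_to_dataset_id):
    """Return a copy of a CWL job dict with File entries replaced by HDA references.

    `path_to_dataset_id` is a hypothetical mapping from local file paths to
    Galaxy dataset ids; it is an assumption made for this sketch.
    """
    def convert(value):
        if isinstance(value, dict):
            if value.get("class") == "File":
                # {"class": "File", "path": ...} -> {"src": "hda", "id": ...}
                return {"src": "hda", "id": path_to_dataset_id[value["path"]]}
            return {key: convert(item) for key, item in value.items()}
        if isinstance(value, list):
            # Arrays of parameters are converted element by element.
            return [convert(item) for item in value]
        return value  # ints, booleans, strings, None pass through unchanged

    return convert(cwl_job)


cwl_job = {
    "file1": {"class": "File", "path": "/path/to/file"},
    "numbering": True,
}
galaxy_inputs = to_galaxy_inputs(cwl_job, {"/path/to/file": "<dataset_id>"})
```

A payload built this way would be sent together with ``inputs_representation`` set to "cwl"; the remaining request fields are not shown here because the source does not specify them.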
jmchilton
added a commit
that referenced
this issue
Feb 15, 2016
jmchilton
added a commit
that referenced
this issue
Feb 15, 2016
CWL Support:
--------------

- Implemented integer, long, float, double, boolean, and File parameters, arrays thereof, ``["null", <simple_type>]`` union parameters, and Any-type parameters. More complex unions of datatypes are still unsupported (unions of two or more non-null parameters, unions of ``["null", Any]``, etc.).
- Draft 3 ``CreateFileRequirement``s are supported (see the ``test_rename`` test case).
- Draft 3 ``InlineJavascriptRequirement`` is supported for defining output files (see the ``test_cat3`` test case).
- ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases).
- Secondary files are supported at least partially; see the ``index1`` and ``showindex1`` CWL tools created to verify this, as well as the ``test_index1`` test case.
- Docker integration is only partial: a simple docker pull is supported, so ``cat3-tool.cwl`` works, for example. The full semantics of CWL Docker support have yet to be implemented. The remaining work is straightforward and tracked in the meta-issue galaxyproject#1684.
- Expression tools are supported (see the ``parseInt-tool`` test case).
- Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools, but the CWL Galaxy fork has work in progress on bringing native Galaxy support for such outputs (#27).

Implementation Notes:
----------------------

- CWL secondary files are stored in a ``__secondary_files__`` directory in the dataset's extra_files_path directory.
- The tool execution API has been extended with an ``inputs_representation`` parameter that can now be set to "cwl". The ``cwl`` representation for running tools corresponds to the CWL job JSON format, with {"class": "File", "path": "/path/to/file"} inputs replaced by {"src": "hda", "id": "<dataset_id>"}. Code for building these requests from CWL job JSON is available in the test class.
- Because the CWL <-> Galaxy parameter translation may change over time (for instance, if Galaxy develops or refines parameter classes), the CWL state and the CWL state version are tracked in the database. The hope is that for reruns and similar operations the Galaxy state could be updated from an older version to a newer one.
- CWL allows output parameters to be either ``File`` or non-``File``, determined at runtime, so ``galaxy.json`` is used to dynamically adjust the output extension as needed for non-``File`` parameters.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz, https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl``, and proxy objects are created to adapt these tools to Galaxy representations. In particular, input and output descriptions are loaded from the tool.

When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command line is generated from this Job object. As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, and so on.

Galaxy writes a description of the CWL job to the job working directory so that it can be reloaded later. After the process is complete (on the Galaxy compute server, but outside the Docker container), this representation is reloaded and the dynamic outputs are discovered and moved to the fixed locations Galaxy expects. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL outputs to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype. CWL draft 3 allows refinement of this, which remains work to be done:

1) CWL should support EDAM declaration of types, and Galaxy should provide a mapping to core datatypes so that sniffing can be skipped if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)

Testing:
---------------------

% git clone https://github.com/common-workflow-language/galaxy.git
% cd galaxy
% git checkout cwl
% virtualenv .venv
% . .venv/bin/activate
% pip install cwltool

Start Galaxy.

% GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh --reload

Open http://localhost:8080/ and see the CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following command.

```
./run_tests.sh -api test/api/test_tools_cwl.py
```

Issues
---------------------------------

Work remaining on CWL support for Galaxy is tracked at https://github.com/common-workflow-language/galaxy/issues.
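
The supported-versus-unsupported union rule from the first bullet of this commit message can be illustrated with a small check. This is only a sketch: the helper ``optional_simple_type`` and the ``SIMPLE_TYPES`` set are hypothetical names, and the set uses CWL's own spellings of the simple types listed above (``int`` for integer), which is an assumption about how the types appear in a tool description.

```python
# Simple types named in the commit message, in CWL spelling (assumed).
SIMPLE_TYPES = {"int", "long", "float", "double", "boolean", "File"}


def optional_simple_type(cwl_type):
    """Return X if cwl_type is the union ["null", X] with X a simple type.

    Returns None for anything else, including unions of two or more
    non-null types and ["null", "Any"], which the message lists as
    still unsupported.
    """
    if not isinstance(cwl_type, list) or len(cwl_type) != 2:
        return None
    non_null = [member for member in cwl_type if member != "null"]
    if len(non_null) != 1:
        return None
    candidate = non_null[0]
    if isinstance(candidate, str) and candidate in SIMPLE_TYPES:
        return candidate
    return None


supported = optional_simple_type(["null", "int"])
unsupported_any = optional_simple_type(["null", "Any"])
unsupported_pair = optional_simple_type(["int", "File"])
```

Here ``supported`` holds the inner type to build an optional Galaxy parameter from, while the other two results are None and would be reported as unsupported unions.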
jmchilton
added a commit
that referenced
this issue
Feb 18, 2016
Refactor toward workflow support.
jmchilton
added a commit
that referenced
this issue
Feb 20, 2016
CWL Support:
--------------

- Implemented integer, long, float, double, boolean, and File parameters, arrays thereof, some simple unions of these parameters, and Any-type parameters. More complex unions of datatypes are still unsupported.
- Draft 3 ``CreateFileRequirement``s are supported (see the ``test_rename`` test case).
- Draft 3 ``InlineJavascriptRequirement`` is supported for defining output files (see the ``test_cat3`` test case).
- ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases).
- Secondary files are supported at least partially; see the ``index1`` and ``showindex1`` CWL tools created to verify this, as well as the ``test_index1`` test case.
- Docker integration is only partial: a simple docker pull is supported, so ``cat3-tool.cwl`` works, for example. The full semantics of CWL Docker support have yet to be implemented. The remaining work is straightforward and tracked in the meta-issue galaxyproject#1684.
- Expression tools are supported (see the ``parseInt-tool`` test case).
- Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools, but the CWL Galaxy fork has work in progress on bringing native Galaxy support for such outputs (#27).

Implementation Notes:
----------------------

- CWL secondary files are stored in a ``__secondary_files__`` directory in the dataset's extra_files_path directory.
- The tool execution API has been extended with an ``inputs_representation`` parameter that can now be set to "cwl". The ``cwl`` representation for running tools corresponds to the CWL job JSON format, with {"class": "File", "path": "/path/to/file"} inputs replaced by {"src": "hda", "id": "<dataset_id>"}. Code for building these requests from CWL job JSON is available in the test class.
- Because the CWL <-> Galaxy parameter translation may change over time (for instance, if Galaxy develops or refines parameter classes), the CWL state and the CWL state version are tracked in the database. The hope is that for reruns and similar operations the Galaxy state could be updated from an older version to a newer one.
- CWL allows output parameters to be either ``File`` or non-``File``, determined at runtime, so ``galaxy.json`` is used to dynamically adjust the output extension as needed for non-``File`` parameters.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz, https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl``, and proxy objects are created to adapt these tools to Galaxy representations. In particular, input and output descriptions are loaded from the tool.

When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command line is generated from this Job object. As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, and so on.

Galaxy writes a description of the CWL job to the job working directory so that it can be reloaded later. After the process is complete (on the Galaxy compute server, but outside the Docker container), this representation is reloaded and the dynamic outputs are discovered and moved to the fixed locations Galaxy expects. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL outputs to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype. CWL draft 3 allows refinement of this, which remains work to be done:

1) CWL should support EDAM declaration of types, and Galaxy should provide a mapping to core datatypes so that sniffing can be skipped if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)

Testing:
---------------------

% git clone https://github.com/common-workflow-language/galaxy.git
% cd galaxy
% git checkout cwl
% virtualenv .venv
% . .venv/bin/activate
% pip install cwltool

Start Galaxy.

% GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh --reload

Open http://localhost:8080/ and see the CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following command.

```
./run_tests.sh -api test/api/test_tools_cwl.py
```

Issues
---------------------------------

Work remaining on CWL support for Galaxy is tracked at https://github.com/common-workflow-language/galaxy/issues.

Refactor toward workflow support.
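
The runtime File versus non-File distinction described in the notes above could be handled along the following lines. This is a rough sketch under stated assumptions: ``describe_output`` and the dictionary layout it returns are hypothetical and are not the format of ``galaxy.json``; only the ``expression.json`` representation of non-File values and the sniffing of File outputs come from the commit message.

```python
import json


def describe_output(name, value):
    """Classify one CWL output value at runtime (hypothetical helper).

    File outputs keep their path so the file can be moved to a fixed
    location and sniffed; any other value is serialized as JSON and
    given the expression.json extension.
    """
    if isinstance(value, dict) and value.get("class") == "File":
        return {"name": name, "kind": "file", "path": value["path"]}
    return {
        "name": name,
        "kind": "expression",
        "ext": "expression.json",
        "content": json.dumps(value),
    }


file_output = describe_output("out_file", {"class": "File", "path": "/tmp/out.txt"})
int_output = describe_output("count", 42)
```

Deciding per value rather than per declared type matches the note that the output kind is only known once the job has finished.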
jmchilton
added a commit
that referenced
this issue
Feb 22, 2016
jmchilton
added a commit
that referenced
this issue
Feb 25, 2016
Conflicts:
static/scripts/bundled/analysis.bundled.js
static/scripts/bundled/analysis.bundled.js.map
static/scripts/bundled/libs.bundled.js
static/scripts/bundled/libs.bundled.js.map
jmchilton
added a commit
that referenced
this issue
Mar 1, 2016
CWL Support:
--------------

- Implemented integer, long, float, double, boolean, and File parameters, and arrays thereof, as well as some simple unions of these parameters and Any-type parameters. More complex unions of datatypes are still unsupported.
- Draft 3 ``CreateFileRequirement``s are supported (see the ``test_rename`` test case).
- Draft 3 ``InlineJavascriptRequirement`` is supported for defining output files (see the ``test_cat3`` test case).
- ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases).
- Secondary files are supported at least partially; see the ``index1`` and ``showindex1`` CWL tools created to verify this as well as the ``test_index1`` test case.
- Docker integration is only partial (simple docker pull is supported), so ``cat3-tool.cwl`` works, for example. Full semantics of CWL Docker support have yet to be implemented. The remaining work is straightforward and tracked in the meta-issue galaxyproject#1684.
- Expression tools are supported (see the ``parseInt-tool`` test case).
- Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools, but CWL Galaxy has work in progress on bringing native Galaxy support for such outputs (#27).

Implementation Notes:
----------------------

- CWL secondary files are stored in a ``__secondary_files__`` directory in the dataset's extra_files_path directory.
- The tool execution API has been extended to add an ``inputs_representation`` parameter that can now be set to "cwl". The ``cwl`` representation for running tools corresponds to the CWL job JSON format with {"class": "File", "path": "/path/to/file"} inputs replaced with {"src": "hda", "id": "<dataset_id>"}. Code for building these requests from CWL job JSON is available in the test class.
- Since the CWL <-> Galaxy parameter translation may change over time (for instance if Galaxy develops or refines parameter classes), the CWL state and CWL state version are tracked in the database. For reruns and similar operations, this should hopefully allow updating the Galaxy state from an older version to a newer one.
- CWL allows output parameters to be either ``File`` or non-``File``, determined at runtime, so ``galaxy.json`` is used to dynamically adjust the output extension as needed for non-``File`` parameters.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl``, and proxy objects are created to adapt these tools to Galaxy representations. In particular, input and output descriptions are loaded from the tool.

When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command line is generated from this Job object. As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.

Galaxy writes a description of the CWL job to the job working directory so that it can be reloaded later. After the process is complete (on the Galaxy compute server, but outside the Docker container), this representation is reloaded and the dynamic outputs are discovered and moved to fixed locations as expected by Galaxy. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype. CWL draft 3 allows refinement on this, and this remains work to be done.

1) CWL should support EDAM declaration of types and Galaxy should provide a mapping to core datasets to skip sniffing if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)

Testing:
---------------------

    % git clone https://github.com/common-workflow-language/galaxy.git
    % cd galaxy
    % git checkout cwl
    % virtualenv .venv
    % . .venv/bin/activate
    % pip install cwltool

Start Galaxy.

    % GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh --reload

Open http://localhost:8080/ and see the CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following command.

```
./run_tests.sh -api test/api/test_tools_cwl.py
```

Issues
---------------------------------

Work remaining on CWL support for Galaxy is tracked at https://github.com/common-workflow-language/galaxy/issues.

Refactor toward workflow support.

Conflicts:
    static/scripts/bundled/analysis.bundled.js
    static/scripts/bundled/analysis.bundled.js.map
    static/scripts/bundled/libs.bundled.js
    static/scripts/bundled/libs.bundled.js.map
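The ``cwl`` inputs representation described in the implementation notes of the commit message above (CWL ``File`` objects replaced by ``hda`` references) can be illustrated with a small sketch. This is not the code from the test class; the function name ``to_galaxy_cwl_inputs`` and the ``path_to_hda_id`` mapping are hypothetical, and the sketch assumes each file has already been uploaded to Galaxy so that its dataset id is known.

```python
def to_galaxy_cwl_inputs(value, path_to_hda_id):
    """Recursively replace CWL File objects with Galaxy HDA references.

    ``value`` is a CWL job JSON structure (dicts, lists, scalars).
    ``path_to_hda_id`` is an assumed mapping from local file path to the
    id of the already-uploaded Galaxy dataset.
    """
    if isinstance(value, dict):
        if value.get("class") == "File":
            # {"class": "File", "path": ...} -> {"src": "hda", "id": ...}
            return {"src": "hda", "id": path_to_hda_id[value["path"]]}
        return {k: to_galaxy_cwl_inputs(v, path_to_hda_id) for k, v in value.items()}
    if isinstance(value, list):
        # Arrays of files (or of scalars) are converted element by element.
        return [to_galaxy_cwl_inputs(v, path_to_hda_id) for v in value]
    # Integers, booleans, strings, and null pass through unchanged.
    return value


job = {"file1": {"class": "File", "path": "/path/to/file"}, "numbering": True}
request_inputs = to_galaxy_cwl_inputs(job, {"/path/to/file": "<dataset_id>"})
```

The resulting ``request_inputs`` would then be sent as the tool inputs together with ``inputs_representation`` set to "cwl".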
jmchilton
added a commit
that referenced
this issue
Apr 2, 2016
jmchilton
added a commit
that referenced
this issue
Apr 15, 2016
jmchilton
added a commit
that referenced
this issue
Apr 25, 2016
jmchilton
added a commit
that referenced
this issue
Apr 27, 2016
jmchilton
added a commit
that referenced
this issue
Apr 28, 2016
jmchilton
added a commit
that referenced
this issue
May 10, 2016
CWL Support:
--------------

- Implemented integer, long, float, double, boolean, and File parameters, and arrays thereof, as well as some simple unions of these parameters and Any-type parameters. More complex unions of datatypes are still unsupported.
- Draft 3 ``CreateFileRequirement``s are supported (see the ``test_rename`` test case).
- Draft 3 ``InlineJavascriptRequirement``s are supported for defining output files (see the ``test_cat3`` test case).
- ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases).
- Secondary files are supported at least partially; see the ``index1`` and ``showindex1`` CWL tools created to verify this, as well as the ``test_index1`` test case.
- Docker integration is only partial (simple docker pull is supported), so ``cat3-tool.cwl`` works, for example. The full semantics of CWL Docker support have yet to be implemented. The remaining work is straightforward and tracked in the meta-issue galaxyproject#1684.
- Expression tools are supported (see the ``parseInt-tool`` test case).
- Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools, but CWL Galaxy has work in progress on bringing native Galaxy support for such outputs (#27).

Implementation Notes:
----------------------

- CWL secondary files are stored in a ``__secondary_files__`` directory in the dataset's extra_files_path directory.
- The tool execution API has been extended with an ``inputs_representation`` parameter that can now be set to "cwl". The ``cwl`` representation for running tools corresponds to the CWL job JSON format with {"class": "File", "path": "/path/to/file"} inputs replaced with {"src": "hda", "id": "<dataset_id>"}. Code for building these requests from CWL job JSON is available in the test class.
- Since the CWL <-> Galaxy parameter translation may change over time (for instance, if Galaxy develops or refines parameter classes), the CWL state and CWL state version are tracked in the database, so that for reruns, etc., the Galaxy state could hopefully be updated from an older version to a newer one.
- CWL allows output parameters to be either ``File`` or non-``File``, determined at runtime, so ``galaxy.json`` is used to dynamically adjust the output extension as needed for non-``File`` parameters.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl``, and proxy objects are created to adapt these tools to Galaxy representations. In particular, input and output descriptions are loaded from the tool.

When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command line is generated from this Job object. As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.

Galaxy writes a description of the CWL job, which it can later reload, to the job working directory. After the process is complete (on the Galaxy compute server, but outside the Docker container), this representation is reloaded and the dynamic outputs are discovered and moved to the fixed locations expected by Galaxy. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL outputs to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype. CWL draft 3 allows refinement of this, and this remains work to be done.

1) CWL should support EDAM declaration of types, and Galaxy should provide a mapping to core datatypes so that sniffing can be skipped if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)

Testing:
---------------------

% git clone https://github.com/common-workflow-language/galaxy.git
% cd galaxy
% git checkout cwl
% virtualenv .venv
% . .venv/bin/activate
% pip install cwltool

Start Galaxy.

% GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh --reload

Open http://localhost:8080/ and see the CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run the API tests demonstrating the various CWL demo tools with the following command.

```
./run_tests.sh -api test/api/test_tools_cwl.py
```

Issues
---------------------------------

Work remaining on CWL support for Galaxy is tracked at https://github.com/common-workflow-language/galaxy/issues.

Refactor toward workflow support.

Conflicts:
static/scripts/bundled/analysis.bundled.js
static/scripts/bundled/analysis.bundled.js.map
static/scripts/bundled/libs.bundled.js
static/scripts/bundled/libs.bundled.js.map
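The ``inputs_representation`` note in the Implementation Notes describes replacing CWL ``File`` objects with Galaxy dataset references. A minimal sketch of that substitution follows. It is not the code from the test class mentioned in the commit message: the function name ``cwl_job_to_galaxy_inputs`` and the ``path_to_dataset_id`` lookup (which assumes each file has already been uploaded and its dataset id recorded) are hypothetical.

```python
from typing import Any, Dict


def cwl_job_to_galaxy_inputs(job: Any, path_to_dataset_id: Dict[str, str]) -> Any:
    """Return a copy of a CWL job document with File objects replaced.

    Every {"class": "File", "path": ...} object, at any nesting depth, becomes
    {"src": "hda", "id": <dataset id>}. Other values are copied unchanged.
    A path missing from the lookup raises KeyError.
    """
    if isinstance(job, dict):
        if job.get("class") == "File":
            return {"src": "hda", "id": path_to_dataset_id[job["path"]]}
        return {key: cwl_job_to_galaxy_inputs(value, path_to_dataset_id)
                for key, value in job.items()}
    if isinstance(job, list):
        return [cwl_job_to_galaxy_inputs(item, path_to_dataset_id) for item in job]
    return job


# Example job with one File input, one File array, and one plain parameter.
example_job = {
    "file1": {"class": "File", "path": "/data/a.txt"},
    "others": [{"class": "File", "path": "/data/b.txt"}],
    "count": 3,
}
example_ids = {"/data/a.txt": "abc123", "/data/b.txt": "def456"}
galaxy_inputs = cwl_job_to_galaxy_inputs(example_job, example_ids)
```

The result would then be sent as the tool inputs together with ``inputs_representation`` set to "cwl"; the rest of the request payload is not specified in the commit message and is left out here.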
jmchilton
added a commit
that referenced
this issue
May 11, 2016
jmchilton
added a commit
that referenced
this issue
May 12, 2016
jmchilton
added a commit
that referenced
this issue
May 31, 2016
jmchilton
added a commit
that referenced
this issue
Jun 13, 2016
jmchilton
added a commit
that referenced
this issue
Mar 6, 2017
jmchilton
added a commit
that referenced
this issue
Mar 6, 2017
CWL Support:
--------------

- Implemented integer, long, float, double, boolean, and File parameters, and arrays thereof, as well as some simple unions of these parameters and Any-type parameters. More complex unions of datatypes are still unsupported.
- Draft 3 ``CreateFileRequirement``s are supported (see the ``test_rename`` test case).
- Draft 3 ``InlineJavascriptRequirement`` is supported for defining output files (see the ``test_cat3`` test case).
- ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases).
- Secondary files are supported at least partially; see the ``index1`` and ``showindex1`` CWL tools created to verify this, as well as the ``test_index1`` test case.
- Docker integration is only partial (simple docker pull is supported) - so ``cat3-tool.cwl`` works, for example. The full semantics of CWL Docker support have yet to be implemented. The remaining work is straightforward and tracked in the meta-issue galaxyproject#1684.
- Expression tools are supported (see the ``parseInt-tool`` test case).
- Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools, but CWL Galaxy has work in progress on bringing native Galaxy support for such outputs (#27).

Implementation Notes:
----------------------

- CWL secondary files are stored in a ``__secondary_files__`` directory in the dataset's extra_files_path directory.
- The tool execution API has been extended with an ``inputs_representation`` parameter that can now be set to "cwl". The ``cwl`` representation for running tools corresponds to the CWL job json format with {"class": "File", "path": "/path/to/file"} inputs replaced with {"src": "hda", "id": "<dataset_id>"}. Code for building these requests from CWL job json is available in the test class.
- Since the CWL <-> Galaxy parameter translation may change over time, for instance if Galaxy develops or refines parameter classes, the CWL state and CWL state version are tracked in the database. Hopefully, for reruns etc., we could then update the Galaxy state from an older version to a new one.
- CWL allows output parameters to be either ``File`` or non-``File``, determined at runtime, so ``galaxy.json`` is used to dynamically adjust the output extension as needed for non-``File`` parameters.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl``, and proxy objects are created to adapt these tools to Galaxy representations. In particular, input and output descriptions are loaded from the tool.

When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command line is generated from this Job object. As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.

Galaxy writes a description of the CWL job, which it can later reload, to the job working directory. After the process is complete (on the Galaxy compute server, but outside the Docker container), this representation is reloaded and the dynamic outputs are discovered and moved to the fixed locations Galaxy expects. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL outputs to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype. CWL draft 3 allows refinement of this, and this remains work to be done.

1) CWL should support EDAM declaration of types, and Galaxy should provide a mapping to core datatypes so that sniffing can be skipped if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)

Testing:
---------------------

    % git clone https://github.com/common-workflow-language/galaxy.git
    % cd galaxy
    % git checkout cwl
    % virtualenv .venv
    % . .venv/bin/activate
    % pip install cwltool

Start Galaxy.

    % GALAXY_RUN_WITH_TEST_TOOLS=1 run.sh --reload

Open http://localhost:8080/ and see the CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following command.

```
./run_tests.sh -api test/api/test_tools_cwl.py
```

Issues
---------------------------------

Work remaining on CWL support for Galaxy is tracked at https://github.com/common-workflow-language/galaxy/issues.

Refactor toward workflow support.
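The ``cwl`` value of ``inputs_representation`` described in the implementation notes amounts to a small rewrite of the CWL job json. The following is only a minimal sketch of that rewrite, not the code from the test class: the helper name ``to_galaxy_inputs`` and the ``dataset_ids`` mapping (file path to an already uploaded dataset id) are assumptions made for illustration.

```python
def to_galaxy_inputs(cwl_job, dataset_ids):
    """Replace CWL File objects in a job dict with Galaxy dataset references.

    ``dataset_ids`` is a hypothetical mapping from a file path to the id of
    a dataset that has already been uploaded to a Galaxy history.
    """
    def convert(value):
        if isinstance(value, dict):
            if value.get("class") == "File":
                # {"class": "File", "path": ...} -> {"src": "hda", "id": ...}
                return {"src": "hda", "id": dataset_ids[value["path"]]}
            return {key: convert(item) for key, item in value.items()}
        if isinstance(value, list):
            # Arrays of Files (File[]) are converted element by element.
            return [convert(item) for item in value]
        # Integers, strings, booleans, etc. pass through unchanged.
        return value

    return convert(cwl_job)


job = {
    "file1": {"class": "File", "path": "/path/to/file"},
    "numbers": [1, 2, 3],
}
galaxy_inputs = to_galaxy_inputs(job, {"/path/to/file": "<dataset_id>"})
```

Non-File values are left untouched, so the same job json can mix dataset references with literal parameters.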
nsoranzo
added a commit
that referenced
this issue
Apr 2, 2024
…rmats.

This should support a subset of [draft-3](http://www.commonwl.org/draft-3/) and [v1.0](http://www.commonwl.org/v1.0/) tools.

CWL Support (Tools):
--------------------

- Implemented integer, long, float, double, boolean, string, File, Directory, "null", Any, as well as records and arrays thereof. There are two approaches to handling more complex parameters discussed here (#59).
- ``secondaryFiles`` that are actual Files are implemented; ``secondaryFiles`` containing directories are not yet implemented.
- ``InlineJavascriptRequirement`` is supported for defining output files (see the ``test_cat3`` test case).
- ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases).
- Expression tools are supported (see the ``parseInt-tool`` test case).
- Shell tools are also supported (see the record output test case).
- Default File values are very un-Galaxy and have been hacked in to work with tools - they still don't work with workflows.
- Partial Docker support - this supports the most simple and common pullFrom semantics but not additional ways to fetch containers or additional options such as output directory configuration (https://github.com/common-workflow-language/galaxy/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20Docker). Additionally, Galaxy mounts the inputs and outputs where it wants instead of at the CWL-required mount points - this needs to be fixed for the conformance tests but may not matter much in practice (I'm not sure).

CWL Support (Workflows):
------------------------

- Simple connections and tool execution.
- Overriding tool input defaults via literal values and simple expressions.
- MultipleInputFeatureRequirements to glue together multiple file inputs into a File[] or multiple File[] into a single flat File[] (nested merge is still a TODO).
- Simple scatter semantics for Files and non-Files (e.g. count-lines3).
- Simple subworkflows (e.g. count-lines10).
- Simple valueFrom expressions (e.g. ``step-valueFrom`` and ``step-valueFrom2``). This work doesn't yet model non-tool parameters to steps, so complex ``valueFrom`` expressions like the one in ``step-valueFrom3`` do not work yet.

Remaining Work
---------------------------------

The work remaining is vast and will be tracked at https://github.com/common-workflow-language/galaxy/issues for the time being.

Implementation Notes:
----------------------

Tools:

- Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools, but CWL Galaxy has work in progress on bringing native Galaxy support for such outputs (#27).
- CWL secondary files are just normal datasets with extra files stored in a ``__secondary_files__`` directory in the dataset's extra_files_path directory and indexed in a file called __secondary_files_index.json in extra_files_path. The upload tool has been augmented to allow attaching arbitrary extra files as a tar file to support getting data into this format initially. CWL requires staged files to include their parent File's ``basename`` - but tools describe inputs as just the extension. I'm not sure which way Galaxy should store __secondary_files__ in its objectstore - just with the extension or with the basename and extension - so both options are implemented and can be swapped by setting the boolean STORE_SECONDARY_FILES_WITH_BASENAME in galaxy.tools.cwl.util.
- CWL Directory types are datasets of a new type "directory" implemented earlier in this branch.
- The tool execution API has been extended with an ``inputs_representation`` parameter that can now be set to "cwl". The ``cwl`` representation for running tools corresponds to the CWL job json format with {"class": "File", "path": "/path/to/file"} inputs replaced with {"src": "hda", "id": "<dataset_id>"}. Code for building these requests from CWL job json is available in the test class.
- Since the CWL <-> Galaxy parameter translation may change over time, for instance if Galaxy develops or refines parameter classes, the CWL state and CWL state version are tracked in the database. Hopefully, for reruns etc., we could then update the Galaxy state from an older version to a new one.
- CWL allows output parameters to be either ``File`` or non-``File``, determined at runtime, so ``galaxy.json`` is used to dynamically adjust the output extension as needed for non-``File`` parameters.

Workflows:

- This work serializes embedded and referenced tools into the database - this will allow reuse and tracing without requiring the path to exist forever on the filesystem - but it will have problems with default file references in workflows.
- Implements re-mapping CWL workflow connections to Galaxy input connections.
- Fix tool serialization for jobs for path-less tools (such as embedded tools).
- Hack tool state during workflow import for CWL.
- The sort of dynamic shaping of inputs CWL allows has required enhancing Galaxy's map/reduce stuff to allow mapping over dynamic collections that don't yet exist at the time of tool execution and need to be created on the fly. This commit creates them as HDCAs - but likely they should be something else that doesn't appear in the history panel.
- Multi-input scattering is implemented, but only scatterMethod == "dotproduct" is currently supported. Other scatter methods (nested_crossproduct and flat_crossproduct) are not used by workflows in the GA4GH challenge.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl``, and proxy objects are created to adapt these tools to Galaxy representations. In particular, input and output descriptions are loaded from the tool.

When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command line is generated from this Job object. As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.

Galaxy writes a description of the CWL job, which it can later reload, to the job working directory. After the process is complete (on the Galaxy compute server, but outside the Docker container), this representation is reloaded and the dynamic outputs are discovered and moved to the fixed locations Galaxy expects. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL outputs to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype. CWL allows refinement of this, and this remains work to be done.

1) CWL should support EDAM declaration of types, and Galaxy should provide a mapping to core datatypes so that sniffing can be skipped if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)

Implementation Links:
----------------------

Hundreds of commits have been rebased into this one, so the details of individual parts of the implementation and how they built on each other are not entirely clear. To see the original ideas behind individual features, here are some relevant links:

- Implement merge_nested link semantics for workflow steps (a903abd).
- Implement subworkflows in CWL (9933c3c).
- MultipleInputFeatureRequirements:
  - Second attempt: ed8307f
  - First attempt: ae11f56
- Basic, implicit dotproduct scattering of workflows - d1ad64e.
- Simple input StepInputExpressionRequirements - 819a27b
- StepInputExpressionRequirements for multiple inputs - 5e7f622
- Record Types in CWL - e6be28a
- Rework original approach at mapping CWL state to tool state - 669ea55
- Rework approach at mapping CWL state to tool state again to use "FieldTypeToolParameter"s - implements default values, optional parameters, and union types for workflow inputs - d1ca22f
- Initial tracking of "cwl_filename" for CWL jobs (67ffc55).
- Reworked secondary file staging, implement testing and indexing of secondary files - 03d1636.

Testing:
---------------------

    % git clone https://github.com/common-workflow-language/galaxy.git
    % cd galaxy
    % git checkout cwl-1.0

Start Galaxy.

    % GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh

Open http://localhost:8080/ and see the CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following commands.

```
./run_tests.sh -api test/api/test_tools_cwl.py
./run_tests.sh -api test/api/test_workflows_cwl.py
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py
```

An individual conformance test can be run using this pattern:

```
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py:CwlConformanceTestCase.test_conformance_v1_0_6
```

The first two execute various tool and workflow test cases manually crafted during implementation of this work. The third is an auto-generated test case class that contains Python tests for every CWL conformance test found with the reference specification.

Issues and Contact
---------------------------------

Report issues at https://github.com/common-workflow-language/galaxy/issues and feel free to ping jmchilton on the CWL [Gitter channel](https://gitter.im/common-workflow-language/common-workflow-language).

Co-authored-by: Hervé MENAGER <[email protected]>
Co-authored-by: John Chilton <[email protected]>
Co-authored-by: Michael R. Crusoe <[email protected]>
Co-authored-by: jra001k
Co-authored-by: mvdbeek <[email protected]>
Co-authored-by: Nicola Soranzo <[email protected]>
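The workflow notes in this commit message say that multi-input scattering only handles the "dotproduct" scatter method. As a rough illustration of what that means, and not of how Galaxy maps it onto collections, the sketch below pairs scattered inputs element by element and contrasts this with a flat cross product. The function names and the plain-dict job representation are assumptions made for this example.

```python
from itertools import product


def dotproduct_scatter(scattered):
    """Pair the i-th element of every scattered input into one job.

    ``scattered`` maps an input name to its list of values; all lists must
    have the same length, as the dotproduct method requires.
    """
    names = list(scattered)
    lengths = {len(scattered[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("dotproduct scatter needs inputs of equal length")
    return [dict(zip(names, values)) for values in zip(*(scattered[n] for n in names))]


def flat_crossproduct_scatter(scattered):
    """For contrast only: one job per combination of values, flattened."""
    names = list(scattered)
    return [dict(zip(names, values)) for values in product(*(scattered[n] for n in names))]


inputs = {"file": ["a.txt", "b.txt"], "count": [1, 2]}
dot_jobs = dotproduct_scatter(inputs)           # 2 jobs
cross_jobs = flat_crossproduct_scatter(inputs)  # 4 jobs
```

With two inputs of length two, the dotproduct yields two jobs while the cross product yields four, which is why the two methods need different output shapes.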
nsoranzo
added a commit
that referenced
this issue
Jul 13, 2024
nsoranzo
added a commit
that referenced
this issue
Jul 15, 2024
…rmats. This should support a subset of [draft-3](http://www.commonwl.org/draft-3/) and [v1.0](http://www.commonwl.org/v1.0/) tools.

CWL Support (Tools):
--------------------

- Implemented integer, long, float, double, boolean, string, File, Directory, "null", Any, as well as records and arrays thereof. There are two approaches to handling more complex parameters discussed here (#59).
- ``secondaryFiles`` that are actual Files are implemented; secondaryFiles containing directories are not yet implemented.
- ``InlineJavascriptRequirement`` is supported for defining output files (see the ``test_cat3`` test case).
- ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases).
- Expression tools are supported (see the ``parseInt-tool`` test case).
- Shell tools are also supported (see the record output test case).
- Default File values are very un-Galaxy and have been hacked in to work with tools - they still don't work with workflows.
- Partial Docker support - this supports the most simple and common pullFrom semantics but not additional ways to fetch containers or additional options such as output directory configuration (https://github.com/common-workflow-language/galaxy/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20Docker). Additionally, Galaxy mounts the inputs and outputs where it wants instead of at the mount points CWL requires - this needs to be fixed for the conformance tests but may not matter much in practice (I'm not sure).

CWL Support (Workflows):
------------------------

- Simple connections and tool execution.
- Overriding tool input defaults via literal values and simple expressions.
- MultipleInputFeatureRequirements to glue together multiple file inputs into a File[] or multiple File[] into a single flat File[] (nested merge is still a TODO).
- Simple scatter semantics for Files and non-Files (e.g. count-lines3).
- Simple subworkflows (e.g. count-lines10).
- Simple valueFrom expressions (e.g. ``step-valueFrom`` and ``step-valueFrom2``). This work doesn't yet model non-tool parameters to steps, so complex ``valueFrom`` expressions like the one in ``step-valueFrom3`` do not work yet.

Remaining Work
---------------------------------

The work remaining is vast and will be tracked at https://github.com/common-workflow-language/galaxy/issues for the time being.

Implementation Notes:
----------------------

Tools:

- Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools, but work is in progress on bringing native Galaxy support for such outputs (#27).
- CWL secondary files are just normal datasets with extra files stored in a ``__secondary_files__`` directory in the dataset's extra_files_path directory and indexed in a file called __secondary_files_index.json in extra_files_path. The upload tool has been augmented to allow attaching arbitrary extra files as a tar file to support getting data into this format initially. CWL requires staged files to include their parent File's ``basename`` - but tools describe inputs as just the extension. I'm not sure which way Galaxy should store __secondary_files__ in its objectstore - just with the extension or with the basename and extension - both options are implemented and can be swapped by setting the boolean STORE_SECONDARY_FILES_WITH_BASENAME in galaxy.tools.cwl.util.
- CWL Directory types are datasets of a new type "directory" implemented earlier in this branch.
- The tool execution API has been extended to add an ``inputs_representation`` parameter that can be set to "cwl" now. The ``cwl`` representation for running tools corresponds to the CWL job json format with {"class": "File", "path": "/path/to/file"} inputs replaced with {"src": "hda", "id": "<dataset_id>"}. Code for building these requests from CWL job json is available in the test class.
- Since the CWL <-> Galaxy parameter translation may change over time, for instance if Galaxy develops or refines parameter classes, the CWL state and CWL state version are tracked in the database so that, hopefully, for reruns etc. we could update the Galaxy state from an older version to a new one.
- CWL allows output parameters to be either ``File`` or non-``File`` and determined at runtime, so ``galaxy.json`` is used to dynamically adjust the output extension as needed for non-``File`` parameters.

Workflows:

- This work serializes embedded and referenced tools into the database - this will allow reuse and tracing without requiring the path to exist forever on the filesystem - this will have problems with default file references in workflows.
- Implements re-mapping CWL workflow connections to Galaxy input connections.
- Fix tool serialization for jobs for path-less tools (such as embedded tools).
- Hack tool state during workflow import for CWL.
- The sort of dynamic shaping of inputs CWL allows has required enhancing Galaxy's map/reduce stuff to allow mapping over dynamic collections that don't yet exist at the time of tool execution and need to be created on the fly. This commit creates them as HDCAs - but likely they should be something else that doesn't appear in the history panel.
- Multi-input scattering, but only scatterMethod == "dotproduct" is currently supported. Other scatter methods (nested_crossproduct and flat_crossproduct) are not used by workflows in the GA4GH challenge.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl`` and proxy objects are created to adapt these tools to Galaxy representations. In particular input and output descriptions are loaded from the tool.

When the tool is submitted, a specialized tool class is used to build a cwltool compatible job description from the supplied Galaxy inputs and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command-line is generated from this Job object.

As a result of this - Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.

Galaxy writes a description of the CWL job that it can reload to the job working directory. After the process is complete (on the Galaxy compute server, but outside the Docker container) this representation is reloaded and the dynamic outputs are discovered and moved to fixed locations as expected by Galaxy. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype. CWL allows refinement on this and this remains work to be done.

1) CWL should support EDAM declaration of types and Galaxy should provide a mapping to core datasets to skip sniffing if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (Distinction between fastq and fastqsanger in Galaxy is very important for instance.)

Implementation Links:
----------------------

Hundreds of commits have been rebased into this one and so the details of individual parts of the implementation and how they built on each other are not entirely clear. To see the original ideas behind individual features - here are some relevant links:

- Implement merge_nested link semantics for workflow steps (a903abd).
- Implement subworkflows in CWL (9933c3c)
- MultipleInputFeatureRequirements:
  - Second attempt: ed8307f
  - First attempt: ae11f56
- Basic, implicit dotproduct scattering of workflows - d1ad64e.
- Simple input StepInputExpressionRequirements - 819a27b
- StepInputExpressionRequirements for multiple inputs - 5e7f622
- Record Types in CWL - e6be28a
- Rework original approach at mapping CWL state to tool state - 669ea55
- Rework approach at mapping CWL state to tool state again to use "FieldTypeToolParameter"s - implements default values, optional parameters, and union types for workflow inputs. d1ca22f
- Initial tracking of "cwl_filename" for CWL jobs (67ffc55).
- Reworked secondary file staging, implement testing and indexing of secondary files - 03d1636.

Testing:
---------------------

    % git clone https://github.com/common-workflow-language/galaxy.git
    % cd galaxy
    % git checkout cwl-1.0

Start Galaxy.

    % GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh

Open http://localhost:8080/ and see CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following commands.

```
./run_tests.sh -api test/api/test_tools_cwl.py
./run_tests.sh -api test/api/test_workflows_cwl.py
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py
```

An individual conformance test can be run using this pattern:

```
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py:CwlConformanceTestCase.test_conformance_v1_0_6
```

The first two execute various tool and workflow test cases manually crafted during implementation of this work. The third is an auto-generated test case class that contains Python tests for every CWL conformance test found with the reference specification.

Issues and Contact
---------------------------------

Report issues at https://github.com/common-workflow-language/galaxy/issues and feel free to ping jmchilton on the CWL [Gitter channel](https://gitter.im/common-workflow-language/common-workflow-language).

Co-authored-by: Hervé MENAGER <[email protected]>
Co-authored-by: John Chilton <[email protected]>
Co-authored-by: Michael R. Crusoe <[email protected]>
Co-authored-by: jra001k
Co-authored-by: mvdbeek <[email protected]>
Co-authored-by: Nicola Soranzo <[email protected]>
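To illustrate the ``cwl`` inputs representation described in the implementation notes, here is a minimal sketch of how a CWL job json document might be rewritten so that ``File`` objects become Galaxy dataset references. The function name ``to_galaxy_cwl_inputs`` and the ``path_to_hda_id`` lookup table are hypothetical and exist only for this example; the real request-building code lives in the test class and may well differ.

```python
from typing import Any, Dict


def to_galaxy_cwl_inputs(job: Any, path_to_hda_id: Dict[str, str]) -> Any:
    """Recursively replace CWL File objects with Galaxy HDA references.

    `path_to_hda_id` is a hypothetical mapping from a local file path to the
    id of the Galaxy history dataset that the file was uploaded as.
    """
    if isinstance(job, dict):
        if job.get("class") == "File":
            # {"class": "File", "path": ...} -> {"src": "hda", "id": ...}
            return {"src": "hda", "id": path_to_hda_id[job["path"]]}
        return {key: to_galaxy_cwl_inputs(value, path_to_hda_id)
                for key, value in job.items()}
    if isinstance(job, list):
        return [to_galaxy_cwl_inputs(item, path_to_hda_id) for item in job]
    # Non-File values (ints, strings, booleans, null) pass through unchanged.
    return job


cwl_job = {
    "file1": {"class": "File", "path": "/path/to/file"},
    "numbers": [1, 2, 3],
}
galaxy_inputs = to_galaxy_cwl_inputs(cwl_job, {"/path/to/file": "<dataset_id>"})
```

The resulting dictionary is what would be sent as the tool inputs together with ``inputs_representation`` set to "cwl". Directory inputs and secondary files are deliberately left out of this sketch.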
nsoranzo
added a commit
that referenced
this issue
Jul 21, 2024
nsoranzo
added a commit
that referenced
this issue
Jul 27, 2024
nsoranzo
added a commit
that referenced
this issue
Sep 4, 2024
…rmats. This should support a subset of [draft-3](http://www.commonwl.org/draft-3/) and [v1.0](http://www.commonwl.org/v1.0/) tools. CWL Support (Tools): -------------------- - Implemented integer, long, float, double, boolean, string, File, Directory, "null", Any, as well as records and arrays thereof. There are two approaches to handling more complex parameters discussed here (#59). - ``secondaryFiles`` that are actual Files are implemented, secondaryFiles containing directories are not yet implemented. - ``InlineJavascriptRequirement`` are support to define output files (see ``test_cat3`` test case). - ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases). - Expression tools are supported (see ``parseInt-tool`` test case). - Shell tools are also support (see record output test case). - Default File values are very un-Galaxy and have been hacked into work with Tools - they still don't work with workflows. - Partial Docker support - this supports the most simple and common pullFrom semantics but not additional ways to fetch containers or additional options such as output directory configuration (https://github.com/common-workflow-language/galaxy/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20Docker). Additionally, Galaxy mounts the inputs and outputs where it wants instead of CWL required mount points - this needs to be fixed for the conformance tests but may not matter much in practice (I'm not sure). CWL Support (Workflows): ------------------------ - Simple connections and tool execution. - Overriding tool input defaults via literal values and simple expressions. - MultipleInputFeatureRequirements to glue together multiple file inputs into a File[] or multiple File[] into a single flat File[]. (nested merge is still a TODO). - Simple scatter semantics for Files and non-Files (e.g. count-lines3). - Simple subworkflows (e.g. count-lines10). - Simple valueFrom expressions (e.g. 
``step-valueFrom`` and ``step-valueFrom2``). This work doesn't yet model non-tool parameters to steps - for complex ``valueFrom`` expressions like in ``step-valueFrom3`` do not work yet. Remaining Work --------------------------------- The work remaining is vast and will be tracked at https://github.com/common-workflow-language/galaxy/issues for the time being. Implementation Notes: ---------------------- Tools: - Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools but CWL Galaxy has work in progress on bringing native Galaxy support for such outputs #27. - CWL secondary files are just normal datasets with extra files stored in ``__secondary_files__`` directory in the dataset's extra_files_path directory and indexed in a file called __secondary_files_index.json in extra_files_path. The upload tools has been augmented to allow attaching arbitrary extra files as a tar file to support getting data into this format initially. CWL requires staging files to include their parent File's ``basename`` - but tools describe inputs as just the extension. I'm not sure which way Galaxy should store __secondary_files__ in its objectstore - just with the extension or with the basename and extension - both options are implemented and can be swapped by setting the boolean STORE_SECONDARY_FILES_WITH_BASENAME in galaxy.tools.cwl.util. - CWL Directory types are datasets of a new type "directory" implemented earlier in this branch. - The tool execution API has been extended to add a ``inputs_representation`` parameter that can be set to "cwl" now. The ``cwl`` representation for running tools corresonding to the CWL job json format with {class: "File: path: "/path/to/file"} inputs replaced with {"src": "hda", "id": "<dataset_id>"}. Code for building these requests for CWL job json is available in the test class. 
- Since the CWL <-> Galaxy parameter translation may change over time (for instance if Galaxy develops or refines parameter classes), the CWL state and CWL state version are tracked in the database; hopefully for reruns, etc. we could update the Galaxy state from an older version to a new one.
- CWL allows output parameters to be either ``File`` or non-``File``, determined at runtime, so ``galaxy.json`` is used to dynamically adjust the output extension as needed for non-``File`` parameters.

Workflows:

- This work serializes embedded and referenced tools into the database - this allows reuse and tracing without requiring the path to exist forever on the filesystem, but it will have problems with default file references in workflows.
- Implements re-mapping CWL workflow connections to Galaxy input connections.
- Fix tool serialization for jobs for path-less tools (such as embedded tools).
- Hack tool state during workflow import for CWL.
- The sort of dynamic shaping of inputs CWL allows has required enhancing Galaxy's map/reduce stuff to allow mapping over dynamic collections that don't yet exist at the time of tool execution and need to be created on the fly. This commit creates them as HDCAs - but likely they should be something else that doesn't appear in the history panel.
- Multi-input scattering is implemented, but only scatterMethod == "dotproduct" is currently supported. Other scatter methods (nested_crossproduct and flat_crossproduct) are not used by workflows in the GA4GH challenge.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl``, and proxy objects are created to adapt these tools to Galaxy representations. In particular, input and output descriptions are loaded from the tool.

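The ``cwl`` inputs representation mentioned in the implementation notes can be illustrated with a minimal sketch. This is not the code from the test class; the function name ``to_galaxy_cwl_inputs`` and the ``dataset_ids`` mapping (file path to an already-uploaded Galaxy dataset id) are hypothetical names used only for illustration.

```python
def to_galaxy_cwl_inputs(job, dataset_ids):
    """Return a copy of a CWL job dict with File objects replaced by HDA references.

    ``dataset_ids`` is a hypothetical mapping from a File's ``path`` to the id
    of a Galaxy dataset that already holds that file's contents.
    """

    def convert(value):
        if isinstance(value, dict):
            if value.get("class") == "File":
                # Swap the CWL File object for a Galaxy dataset reference.
                return {"src": "hda", "id": dataset_ids[value["path"]]}
            # Recurse into records and other nested mappings.
            return {key: convert(item) for key, item in value.items()}
        if isinstance(value, list):
            # Recurse into arrays such as File[].
            return [convert(item) for item in value]
        # Integers, strings, booleans, and null pass through unchanged.
        return value

    return convert(job)


job = {"file1": {"class": "File", "path": "/path/to/file"}, "numbering": True}
request_inputs = to_galaxy_cwl_inputs(job, {"/path/to/file": "<dataset_id>"})
```

The sketch builds a new structure rather than mutating the job, so the original CWL job JSON stays available for comparison.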
When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command line is generated from this Job object. As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.

Galaxy writes a description of the CWL job that it can reload to the job working directory. After the process is complete (on the Galaxy compute server, but outside the Docker container), this representation is reloaded and the dynamic outputs are discovered and moved to fixed locations as expected by Galaxy. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype; CWL allows refinement on this and this remains work to be done.

1) CWL should support EDAM declaration of types and Galaxy should provide a mapping to core datasets to skip sniffing if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)

Implementation Links:
----------------------

Hundreds of commits have been rebased into this one, so the details of individual parts of the implementation and how they built on each other are not entirely clear. To see the original ideas behind individual features, here are some relevant links:

- Implement merge_nested link semantics for workflow steps (a903abd).
- Implement subworkflows in CWL (9933c3c)
- MultipleInputFeatureRequirements:
  - Second attempt: ed8307f
  - First attempt: ae11f56
- Basic, implicit dotproduct scattering of workflows - d1ad64e.
- Simple input StepInputExpressionRequirements - 819a27b
- StepInputExpressionRequirements for multiple inputs - 5e7f622
- Record Types in CWL - e6be28a
- Rework original approach at mapping CWL state to tool state - 669ea55
- Rework approach at mapping CWL state to tool state again to use "FieldTypeToolParameter"s - implements default values, optional parameters, and union types for workflow inputs. d1ca22f
- Initial tracking of "cwl_filename" for CWL jobs (67ffc55).
- Reworked secondary file staging, implement testing and indexing of secondary files - 03d1636.

Testing:
---------------------

    % git clone https://github.com/common-workflow-language/galaxy.git
    % cd galaxy
    % git checkout cwl-1.0

Start Galaxy.

    % GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh

Open http://localhost:8080/ and see the CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following commands.

```
./run_tests.sh -api test/api/test_tools_cwl.py
./run_tests.sh -api test/api/test_workflows_cwl.py
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py
```

An individual conformance test can be run using this pattern:

```
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py:CwlConformanceTestCase.test_conformance_v1_0_6
```

The first two execute various tool and workflow test cases manually crafted during implementation of this work. The third is an auto-generated test case class that contains Python tests for every CWL conformance test found with the reference specification.

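The selector pattern for a single conformance test shown above is regular enough to generate. The following is a small sketch, assuming the other generated tests follow the same ``test_conformance_v1_0_<n>`` naming as the example; the helper name ``conformance_test_selector`` is hypothetical and not part of Galaxy.

```python
def conformance_test_selector(index, version="v1_0"):
    """Build the run_tests.sh selector for one generated conformance test.

    Assumes the module and method naming seen in the example above, e.g.
    ``test_conformance_v1_0_6`` for conformance test number 6.
    """
    module = f"test/api/test_cwl_conformance_{version}.py"
    method = f"test_conformance_{version}_{index}"
    return f"{module}:CwlConformanceTestCase.{method}"


# Compose the full command line for conformance test 6.
command = ["./run_tests.sh", "-api", conformance_test_selector(6)]
```

The resulting list can be passed to ``subprocess.run`` from the Galaxy checkout root, or joined with spaces to paste into a shell.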
Issues and Contact
---------------------------------

Report issues at https://github.com/common-workflow-language/galaxy/issues and feel free to ping jmchilton on the CWL [Gitter channel](https://gitter.im/common-workflow-language/common-workflow-language).

Co-authored-by: Hervé MENAGER <[email protected]>
Co-authored-by: John Chilton <[email protected]>
Co-authored-by: Michael R. Crusoe <[email protected]>
Co-authored-by: jra001k
Co-authored-by: mvdbeek <[email protected]>
Co-authored-by: Nicola Soranzo <[email protected]>
nsoranzo
added a commit
that referenced
this issue
Sep 4, 2024
nsoranzo
added a commit
that referenced
this issue
Sep 17, 2024
nsoranzo
added a commit
that referenced
this issue
Sep 17, 2024
…rmats. This should support a subset of [draft-3](http://www.commonwl.org/draft-3/) and [v1.0](http://www.commonwl.org/v1.0/) tools. CWL Support (Tools): -------------------- - Implemented integer, long, float, double, boolean, string, File, Directory, "null", Any, as well as records and arrays thereof. There are two approaches to handling more complex parameters discussed here (#59). - ``secondaryFiles`` that are actual Files are implemented, secondaryFiles containing directories are not yet implemented. - ``InlineJavascriptRequirement`` are support to define output files (see ``test_cat3`` test case). - ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases). - Expression tools are supported (see ``parseInt-tool`` test case). - Shell tools are also support (see record output test case). - Default File values are very un-Galaxy and have been hacked into work with Tools - they still don't work with workflows. - Partial Docker support - this supports the most simple and common pullFrom semantics but not additional ways to fetch containers or additional options such as output directory configuration (https://github.com/common-workflow-language/galaxy/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20Docker). Additionally, Galaxy mounts the inputs and outputs where it wants instead of CWL required mount points - this needs to be fixed for the conformance tests but may not matter much in practice (I'm not sure). CWL Support (Workflows): ------------------------ - Simple connections and tool execution. - Overriding tool input defaults via literal values and simple expressions. - MultipleInputFeatureRequirements to glue together multiple file inputs into a File[] or multiple File[] into a single flat File[]. (nested merge is still a TODO). - Simple scatter semantics for Files and non-Files (e.g. count-lines3). - Simple subworkflows (e.g. count-lines10). - Simple valueFrom expressions (e.g. 
``step-valueFrom`` and ``step-valueFrom2``). This work doesn't yet model non-tool parameters to steps - for complex ``valueFrom`` expressions like in ``step-valueFrom3`` do not work yet. Remaining Work --------------------------------- The work remaining is vast and will be tracked at https://github.com/common-workflow-language/galaxy/issues for the time being. Implementation Notes: ---------------------- Tools: - Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools but CWL Galaxy has work in progress on bringing native Galaxy support for such outputs #27. - CWL secondary files are just normal datasets with extra files stored in ``__secondary_files__`` directory in the dataset's extra_files_path directory and indexed in a file called __secondary_files_index.json in extra_files_path. The upload tools has been augmented to allow attaching arbitrary extra files as a tar file to support getting data into this format initially. CWL requires staging files to include their parent File's ``basename`` - but tools describe inputs as just the extension. I'm not sure which way Galaxy should store __secondary_files__ in its objectstore - just with the extension or with the basename and extension - both options are implemented and can be swapped by setting the boolean STORE_SECONDARY_FILES_WITH_BASENAME in galaxy.tools.cwl.util. - CWL Directory types are datasets of a new type "directory" implemented earlier in this branch. - The tool execution API has been extended to add a ``inputs_representation`` parameter that can be set to "cwl" now. The ``cwl`` representation for running tools corresonding to the CWL job json format with {class: "File: path: "/path/to/file"} inputs replaced with {"src": "hda", "id": "<dataset_id>"}. Code for building these requests for CWL job json is available in the test class. 
- Since the CWL <-> Galaxy parameter translation may change over time, for instance if Galaxy develops or refines parameter classes - CWL state and CWL state version is tracked in the database and hopefully for reruns, etc... we could update the Galaxy state from an older version to a new one. - CWL allows output parameters to be either ``File`` or non-``File`` and determined at runtime, so ``galaxy.json`` is used to dynamically adjust output extension as needed for non-``File`` parameters. Workflows: - This work serializes embedded and referenced tools into the database - this will allow reuse and tracing without require the path to exist forever on the filesystem - this will have problems with default file references in workflows. - Implements re-mapping CWL workflow connections to Galaxy input connections. - Fix tool serialization for jobs for path-less tools (such as embedded tools). - Hack tool state during workflow import for CWL. - The sort of dynamic shaping of inputs CWL allows has required enhancing Galaxy's map/reduce stuff to allow mapping over dynamic collections that don't yet exist at the time of tool execution and need to be created on the fly. This commit creates them as HDCAs - but likely they should be something else that doesn't appear in the history panel. - Multi-input scattering but only scatterMethod == "dotproduct" is currently support. Other scatter methods (nested_crossproduct and flatcross_product) are not used by workflows in GA4GH challenge. Implementation Description: ----------------------------- The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl`` and proxy objects are created to adapt these tools to Galaxy representations. In particular input and output descriptions are loaded from the tool. 
When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command line is generated from this Job object. As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.

Galaxy writes a description of the CWL job that it can reload to the job working directory. After the process is complete (on the Galaxy compute server, but outside the Docker container) this representation is reloaded and the dynamic outputs are discovered and moved to fixed locations as expected by Galaxy. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype. CWL allows refinement on this, and this remains work to be done.

1) CWL should support EDAM declaration of types and Galaxy should provide a mapping to core datasets to skip sniffing if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)

Implementation Links:
----------------------

Hundreds of commits have been rebased into this one, so the details of individual parts of the implementation and how they built on each other are not entirely clear. To see the original ideas behind individual features, here are some relevant links:

- Implement merge_nested link semantics for workflow steps (a903abd).
- Implement subworkflows in CWL (9933c3c)
- MultipleInputFeatureRequirements:
  - Second attempt: ed8307f
  - First attempt: ae11f56
- Basic, implicit dotproduct scattering of workflows - d1ad64e.
- Simple input StepInputExpressionRequirements - 819a27b
- StepInputExpressionRequirements for multiple inputs - 5e7f622
- Record Types in CWL - e6be28a
- Rework original approach at mapping CWL state to tool state - 669ea55
- Rework approach at mapping CWL state to tool state again to use "FieldTypeToolParameter"s - implements default values, optional parameters, and union types for workflow inputs. d1ca22f
- Initial tracking of "cwl_filename" for CWL jobs (67ffc55).
- Reworked secondary file staging, implement testing and indexing of secondary files - 03d1636.

Testing:
---------------------

% git clone https://github.com/common-workflow-language/galaxy.git
% cd galaxy
% git checkout cwl-1.0

Start Galaxy.

% GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh

Open http://localhost:8080/ and see CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following commands.

```
./run_tests.sh -api test/api/test_tools_cwl.py
./run_tests.sh -api test/api/test_workflows_cwl.py
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py
```

An individual conformance test can be run using this pattern:

```
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py:CwlConformanceTestCase.test_conformance_v1_0_6
```

The first two execute various tool and workflow test cases manually crafted during implementation of this work. The third is an auto-generated test case class that contains Python tests for every CWL conformance test found with the reference specification.
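To illustrate how a test class can end up with one method per conformance test (so that a single test such as ``test_conformance_v1_0_6`` can be selected by name), here is a minimal sketch. It is only an assumption about the general pattern: the branch may well generate a source file instead of attaching methods at import time, and the ``CONFORMANCE_TESTS`` entries, ``run_conformance_test`` and ``_make_test`` are hypothetical placeholders rather than names from the repository.

```python
import unittest

# Hypothetical stand-ins for entries parsed from the CWL conformance test list.
CONFORMANCE_TESTS = [
    {"tool": "example-a.cwl", "job": "example-a-job.json"},
    {"tool": "example-b.cwl", "job": "example-b-job.json"},
]


class CwlConformanceTestCase(unittest.TestCase):
    def run_conformance_test(self, spec):
        # Placeholder check. A real test would submit the tool and job to
        # Galaxy and compare the outputs with the expected ones.
        self.assertIn("tool", spec)
        self.assertIn("job", spec)


def _make_test(spec):
    # Bind ``spec`` through a factory so each method keeps its own entry.
    def test(self):
        self.run_conformance_test(spec)

    return test


# Attach one test method per conformance entry, named by its index.
for index, spec in enumerate(CONFORMANCE_TESTS):
    name = "test_conformance_v1_0_%d" % index
    setattr(CwlConformanceTestCase, name, _make_test(spec))


if __name__ == "__main__":
    unittest.main()
```

Naming each method after its index is what makes the ``Class.method`` selector in the command above usable for a single conformance test.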

Issues and Contact
---------------------------------

Report issues at https://github.com/common-workflow-language/galaxy/issues and feel free to ping jmchilton on the CWL [Gitter channel](https://gitter.im/common-workflow-language/common-workflow-language).

Co-authored-by: Hervé MENAGER
Co-authored-by: John Chilton
Co-authored-by: Michael R. Crusoe
Co-authored-by: jra001k
Co-authored-by: mvdbeek
Co-authored-by: Nicola Soranzo
nsoranzo
added a commit
that referenced
this issue
Sep 17, 2024
nsoranzo
added a commit
that referenced
this issue
Sep 27, 2024
nsoranzo
added a commit
that referenced
this issue
Oct 2, 2024
…rmats. This should support a subset of [draft-3](http://www.commonwl.org/draft-3/) and [v1.0](http://www.commonwl.org/v1.0/) tools. CWL Support (Tools): -------------------- - Implemented integer, long, float, double, boolean, string, File, Directory, "null", Any, as well as records and arrays thereof. There are two approaches to handling more complex parameters discussed here (#59). - ``secondaryFiles`` that are actual Files are implemented, secondaryFiles containing directories are not yet implemented. - ``InlineJavascriptRequirement`` are support to define output files (see ``test_cat3`` test case). - ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases). - Expression tools are supported (see ``parseInt-tool`` test case). - Shell tools are also support (see record output test case). - Default File values are very un-Galaxy and have been hacked into work with Tools - they still don't work with workflows. - Partial Docker support - this supports the most simple and common pullFrom semantics but not additional ways to fetch containers or additional options such as output directory configuration (https://github.com/common-workflow-language/galaxy/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20Docker). Additionally, Galaxy mounts the inputs and outputs where it wants instead of CWL required mount points - this needs to be fixed for the conformance tests but may not matter much in practice (I'm not sure). CWL Support (Workflows): ------------------------ - Simple connections and tool execution. - Overriding tool input defaults via literal values and simple expressions. - MultipleInputFeatureRequirements to glue together multiple file inputs into a File[] or multiple File[] into a single flat File[]. (nested merge is still a TODO). - Simple scatter semantics for Files and non-Files (e.g. count-lines3). - Simple subworkflows (e.g. count-lines10). - Simple valueFrom expressions (e.g. 
``step-valueFrom`` and ``step-valueFrom2``). This work doesn't yet model non-tool parameters to steps - for complex ``valueFrom`` expressions like in ``step-valueFrom3`` do not work yet. Remaining Work --------------------------------- The work remaining is vast and will be tracked at https://github.com/common-workflow-language/galaxy/issues for the time being. Implementation Notes: ---------------------- Tools: - Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools but CWL Galaxy has work in progress on bringing native Galaxy support for such outputs #27. - CWL secondary files are just normal datasets with extra files stored in ``__secondary_files__`` directory in the dataset's extra_files_path directory and indexed in a file called __secondary_files_index.json in extra_files_path. The upload tools has been augmented to allow attaching arbitrary extra files as a tar file to support getting data into this format initially. CWL requires staging files to include their parent File's ``basename`` - but tools describe inputs as just the extension. I'm not sure which way Galaxy should store __secondary_files__ in its objectstore - just with the extension or with the basename and extension - both options are implemented and can be swapped by setting the boolean STORE_SECONDARY_FILES_WITH_BASENAME in galaxy.tools.cwl.util. - CWL Directory types are datasets of a new type "directory" implemented earlier in this branch. - The tool execution API has been extended to add a ``inputs_representation`` parameter that can be set to "cwl" now. The ``cwl`` representation for running tools corresonding to the CWL job json format with {class: "File: path: "/path/to/file"} inputs replaced with {"src": "hda", "id": "<dataset_id>"}. Code for building these requests for CWL job json is available in the test class. 
- Since the CWL <-> Galaxy parameter translation may change over time (for instance if Galaxy develops or refines parameter classes), CWL state and CWL state version are tracked in the database, and hopefully for reruns, etc. we could update the Galaxy state from an older version to a new one.
- CWL allows output parameters to be either ``File`` or non-``File`` and determined at runtime, so ``galaxy.json`` is used to dynamically adjust the output extension as needed for non-``File`` parameters.

Workflows:

- This work serializes embedded and referenced tools into the database. This will allow reuse and tracing without requiring the path to exist forever on the filesystem, but it will have problems with default file references in workflows.
- Implements re-mapping CWL workflow connections to Galaxy input connections.
- Fix tool serialization for jobs for path-less tools (such as embedded tools).
- Hack tool state during workflow import for CWL.
- The sort of dynamic shaping of inputs CWL allows has required enhancing Galaxy's map/reduce stuff to allow mapping over dynamic collections that don't yet exist at the time of tool execution and need to be created on the fly. This commit creates them as HDCAs, but likely they should be something else that doesn't appear in the history panel.
- Multi-input scattering, but only scatterMethod == "dotproduct" is currently supported. Other scatter methods (nested_crossproduct and flat_crossproduct) are not used by workflows in the GA4GH challenge.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl``, and proxy objects are created to adapt these tools to Galaxy representations. In particular, input and output descriptions are loaded from the tool.

When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command line is generated from this Job object. As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.

Galaxy writes a description of the CWL job that it can reload to the job working directory. After the process is complete (on the Galaxy compute server, but outside the Docker container), this representation is reloaded and the dynamic outputs are discovered and moved to fixed locations as expected by Galaxy. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype; CWL allows refinement of this, and this remains work to be done.

1) CWL should support EDAM declaration of types, and Galaxy should provide a mapping to core datasets to skip sniffing if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)

Implementation Links:
----------------------

Hundreds of commits have been rebased into this one, so the details of individual parts of the implementation and how they built on each other are not entirely clear. To see the original ideas behind individual features, here are some relevant links:

- Implement merge_nested link semantics for workflow steps (a903abd).
- Implement subworkflows in CWL (9933c3c)
- MultipleInputFeatureRequirements:
  - Second attempt: ed8307f
  - First attempt: ae11f56
- Basic, implicit dotproduct scattering of workflows - d1ad64e.
- Simple input StepInputExpressionRequirements - 819a27b
- StepInputExpressionRequirements for multiple inputs - 5e7f622
- Record Types in CWL - e6be28a
- Rework original approach at mapping CWL state to tool state - 669ea55
- Rework approach at mapping CWL state to tool state again to use "FieldTypeToolParameter"s - implements default values, optional parameters, and union types for workflow inputs. d1ca22f
- Initial tracking of "cwl_filename" for CWL jobs (67ffc55).
- Reworked secondary file staging, implement testing and indexing of secondary files - 03d1636.

Testing:
---------------------

% git clone https://github.com/common-workflow-language/galaxy.git
% cd galaxy
% git checkout cwl-1.0

Start Galaxy.

% GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh

Open http://localhost:8080/ and see CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following commands.

```
./run_tests.sh -api test/api/test_tools_cwl.py
./run_tests.sh -api test/api/test_workflows_cwl.py
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py
```

An individual conformance test can be run using this pattern:

```
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py:CwlConformanceTestCase.test_conformance_v1_0_6
```

The first two execute various tool and workflow test cases manually crafted during implementation of this work. The third is an auto-generated test case class that contains Python tests for every CWL conformance test found with the reference specification.

Issues and Contact
---------------------------------

Report issues at https://github.com/common-workflow-language/galaxy/issues and feel free to ping jmchilton on the CWL [Gitter channel](https://gitter.im/common-workflow-language/common-workflow-language).

Co-authored-by: Hervé MENAGER <[email protected]>
Co-authored-by: John Chilton <[email protected]>
Co-authored-by: Michael R. Crusoe <[email protected]>
Co-authored-by: jra001k
Co-authored-by: mvdbeek <[email protected]>
Co-authored-by: Nicola Soranzo <[email protected]>
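For readers unfamiliar with the scatter methods named in the workflow notes above, the following is a rough sketch of how "dotproduct" differs from a flat cross product. It is not code from this commit: the function names ``dotproduct`` and ``flat_crossproduct`` and the dict-of-lists input shape are assumptions made purely for illustration.

```python
from itertools import product
from typing import Any, Dict, List, Sequence


def dotproduct(inputs: Dict[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """Pair up the i-th element of every scattered input (hypothetical helper)."""
    lengths = {len(values) for values in inputs.values()}
    if len(lengths) > 1:
        raise ValueError("dotproduct needs all scattered inputs to have equal length")
    names = list(inputs)
    return [dict(zip(names, values)) for values in zip(*inputs.values())]


def flat_crossproduct(inputs: Dict[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """Combine every element with every other element, as one flat list of jobs."""
    names = list(inputs)
    return [dict(zip(names, values)) for values in product(*inputs.values())]


if __name__ == "__main__":
    scattered = {"file": ["a.txt", "b.txt"], "count": [1, 2]}
    print(dotproduct(scattered))         # two jobs, one per position
    print(flat_crossproduct(scattered))  # four jobs, one per combination
```

With two scattered inputs of length two, the dotproduct yields two jobs while the cross product yields four, which is why the dotproduct maps most directly onto mapping a tool over same-length collections.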
nsoranzo
added a commit
that referenced
this issue
Oct 8, 2024
…rmats.

This should support a subset of [draft-3](http://www.commonwl.org/draft-3/) and [v1.0](http://www.commonwl.org/v1.0/) tools.

CWL Support (Tools):
--------------------

- Implemented integer, long, float, double, boolean, string, File, Directory, "null", Any, as well as records and arrays thereof. There are two approaches to handling more complex parameters discussed here (#59).
- ``secondaryFiles`` that are actual Files are implemented; ``secondaryFiles`` containing directories are not yet implemented.
- ``InlineJavascriptRequirement`` is supported for defining output files (see the ``test_cat3`` test case).
- ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases).
- Expression tools are supported (see the ``parseInt-tool`` test case).
- Shell tools are also supported (see the record output test case).
- Default File values are very un-Galaxy and have been hacked in to work with tools; they still don't work with workflows.
- Partial Docker support: this supports the simplest and most common pullFrom semantics, but not additional ways to fetch containers or additional options such as output directory configuration (https://github.com/common-workflow-language/galaxy/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20Docker). Additionally, Galaxy mounts the inputs and outputs where it wants instead of at the mount points CWL requires; this needs to be fixed for the conformance tests but may not matter much in practice (I'm not sure).

CWL Support (Workflows):
------------------------

- Simple connections and tool execution.
- Overriding tool input defaults via literal values and simple expressions.
- MultipleInputFeatureRequirements to glue together multiple file inputs into a File[] or multiple File[] into a single flat File[] (nested merge is still a TODO).
- Simple scatter semantics for Files and non-Files (e.g. count-lines3).
- Simple subworkflows (e.g. count-lines10).
- Simple valueFrom expressions (e.g. ``step-valueFrom`` and ``step-valueFrom2``). This work doesn't yet model non-tool parameters to steps, so complex ``valueFrom`` expressions like the one in ``step-valueFrom3`` do not work yet.

Remaining Work
---------------------------------

The work remaining is vast and will be tracked at https://github.com/common-workflow-language/galaxy/issues for the time being.

Implementation Notes:
----------------------

Tools:

- Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools, but work is in progress on bringing native Galaxy support for such outputs (#27).
- CWL secondary files are just normal datasets with extra files stored in a ``__secondary_files__`` directory in the dataset's ``extra_files_path`` directory and indexed in a file called ``__secondary_files_index.json`` in ``extra_files_path``. The upload tool has been augmented to allow attaching arbitrary extra files as a tar file to support getting data into this format initially. CWL requires staged files to include their parent File's ``basename``, but tools describe inputs as just the extension. I'm not sure which way Galaxy should store ``__secondary_files__`` in its objectstore (just with the extension, or with the basename and extension); both options are implemented and can be swapped by setting the boolean ``STORE_SECONDARY_FILES_WITH_BASENAME`` in ``galaxy.tools.cwl.util``.
- CWL Directory types are datasets of a new type "directory" implemented earlier in this branch.
- The tool execution API has been extended to add an ``inputs_representation`` parameter that can be set to "cwl" now. The ``cwl`` representation for running tools corresponds to the CWL job JSON format with {"class": "File", "path": "/path/to/file"} inputs replaced with {"src": "hda", "id": "<dataset_id>"}. Code for building these requests for CWL job JSON is available in the test class.
- Since the CWL <-> Galaxy parameter translation may change over time (for instance if Galaxy develops or refines parameter classes), CWL state and CWL state version are tracked in the database, and hopefully for reruns, etc. we could update the Galaxy state from an older version to a new one.
- CWL allows output parameters to be either ``File`` or non-``File`` and determined at runtime, so ``galaxy.json`` is used to dynamically adjust the output extension as needed for non-``File`` parameters.

Workflows:

- This work serializes embedded and referenced tools into the database. This will allow reuse and tracing without requiring the path to exist forever on the filesystem, but it will have problems with default file references in workflows.
- Implements re-mapping CWL workflow connections to Galaxy input connections.
- Fix tool serialization for jobs for path-less tools (such as embedded tools).
- Hack tool state during workflow import for CWL.
- The sort of dynamic shaping of inputs CWL allows has required enhancing Galaxy's map/reduce stuff to allow mapping over dynamic collections that don't yet exist at the time of tool execution and need to be created on the fly. This commit creates them as HDCAs, but likely they should be something else that doesn't appear in the history panel.
- Multi-input scattering, but only scatterMethod == "dotproduct" is currently supported. Other scatter methods (nested_crossproduct and flat_crossproduct) are not used by workflows in the GA4GH challenge.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl``, and proxy objects are created to adapt these tools to Galaxy representations. In particular, input and output descriptions are loaded from the tool.

When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command line is generated from this Job object. As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.

Galaxy writes a description of the CWL job that it can reload to the job working directory. After the process is complete (on the Galaxy compute server, but outside the Docker container), this representation is reloaded and the dynamic outputs are discovered and moved to fixed locations as expected by Galaxy. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype; CWL allows refinement of this, and this remains work to be done.

1) CWL should support EDAM declaration of types, and Galaxy should provide a mapping to core datasets to skip sniffing if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)

Implementation Links:
----------------------

Hundreds of commits have been rebased into this one, so the details of individual parts of the implementation and how they built on each other are not entirely clear. To see the original ideas behind individual features, here are some relevant links:

- Implement merge_nested link semantics for workflow steps (a903abd).
- Implement subworkflows in CWL (9933c3c)
- MultipleInputFeatureRequirements:
  - Second attempt: ed8307f
  - First attempt: ae11f56
- Basic, implicit dotproduct scattering of workflows - d1ad64e.
- Simple input StepInputExpressionRequirements - 819a27b
- StepInputExpressionRequirements for multiple inputs - 5e7f622
- Record Types in CWL - e6be28a
- Rework original approach at mapping CWL state to tool state - 669ea55
- Rework approach at mapping CWL state to tool state again to use "FieldTypeToolParameter"s - implements default values, optional parameters, and union types for workflow inputs. d1ca22f
- Initial tracking of "cwl_filename" for CWL jobs (67ffc55).
- Reworked secondary file staging, implement testing and indexing of secondary files - 03d1636.

Testing:
---------------------

% git clone https://github.com/common-workflow-language/galaxy.git
% cd galaxy
% git checkout cwl-1.0

Start Galaxy.

% GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh

Open http://localhost:8080/ and see CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following commands.

```
./run_tests.sh -api test/api/test_tools_cwl.py
./run_tests.sh -api test/api/test_workflows_cwl.py
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py
```

An individual conformance test can be run using this pattern:

```
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py:CwlConformanceTestCase.test_conformance_v1_0_6
```

The first two execute various tool and workflow test cases manually crafted during implementation of this work. The third is an auto-generated test case class that contains Python tests for every CWL conformance test found with the reference specification.

Issues and Contact
---------------------------------

Report issues at https://github.com/common-workflow-language/galaxy/issues and feel free to ping jmchilton on the CWL [Gitter channel](https://gitter.im/common-workflow-language/common-workflow-language).

Co-authored-by: Hervé MENAGER <[email protected]>
Co-authored-by: John Chilton <[email protected]>
Co-authored-by: Michael R. Crusoe <[email protected]>
Co-authored-by: jra001k
Co-authored-by: mvdbeek <[email protected]>
Co-authored-by: Nicola Soranzo <[email protected]>
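As an illustration of the ``inputs_representation`` note above, here is a minimal sketch of turning a CWL job JSON document into the "cwl" representation by swapping File objects for dataset references. It is not the code from the test class mentioned in the notes: the function name ``to_galaxy_cwl_inputs`` and the ``upload`` callback (path in, dataset id out) are hypothetical, and it only looks at the ``path`` key shown in the commit message.

```python
from typing import Any, Callable


def to_galaxy_cwl_inputs(job: Any, upload: Callable[[str], str]) -> Any:
    """Recursively replace CWL File objects with {"src": "hda", "id": ...}.

    `upload` is a hypothetical callback: it receives a local path and returns
    the id of the history dataset (HDA) that now holds that file.
    """
    if isinstance(job, dict):
        if job.get("class") == "File":
            return {"src": "hda", "id": upload(job["path"])}
        return {key: to_galaxy_cwl_inputs(value, upload) for key, value in job.items()}
    if isinstance(job, list):
        return [to_galaxy_cwl_inputs(item, upload) for item in job]
    # Integers, strings, booleans and null pass through unchanged.
    return job


if __name__ == "__main__":
    # Stand-in for a real upload step: a fixed path -> dataset id table.
    fake_ids = {"/path/to/file": "<dataset_id>"}
    cwl_job = {
        "file1": {"class": "File", "path": "/path/to/file"},
        "numbering": True,
    }
    print(to_galaxy_cwl_inputs(cwl_job, fake_ids.__getitem__))
```

Keeping the upload step behind a callback means the same traversal works whether datasets are uploaded on the fly or looked up from ones already in the history.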
nsoranzo
added a commit
that referenced
this issue
Oct 10, 2024
nsoranzo
added a commit
that referenced
this issue
Oct 27, 2024
nsoranzo added a commit that referenced this issue Dec 5, 2024
…rmats. This should support a subset of [draft-3](http://www.commonwl.org/draft-3/) and [v1.0](http://www.commonwl.org/v1.0/) tools.

CWL Support (Tools):
--------------------

- Implemented integer, long, float, double, boolean, string, File, Directory, "null", Any, as well as records and arrays thereof. There are two approaches to handling more complex parameters discussed here (#59).
- ``secondaryFiles`` that are actual Files are implemented; secondaryFiles containing directories are not yet implemented.
- ``InlineJavascriptRequirement`` is supported for defining output files (see the ``test_cat3`` test case).
- ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases).
- Expression tools are supported (see the ``parseInt-tool`` test case).
- Shell tools are also supported (see the record output test case).
- Default File values are very un-Galaxy and have been hacked to work with tools - they still don't work with workflows.
- Partial Docker support - this supports the simplest and most common ``dockerPull`` semantics but not additional ways to fetch containers or additional options such as output directory configuration (https://github.com/common-workflow-language/galaxy/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20Docker). Additionally, Galaxy mounts the inputs and outputs where it wants instead of at the mount points CWL requires - this needs to be fixed for the conformance tests but may not matter much in practice (I'm not sure).

CWL Support (Workflows):
------------------------

- Simple connections and tool execution.
- Overriding tool input defaults via literal values and simple expressions.
- MultipleInputFeatureRequirements to glue together multiple file inputs into a File[] or multiple File[] into a single flat File[] (nested merge is still a TODO).
- Simple scatter semantics for Files and non-Files (e.g. count-lines3).
- Simple subworkflows (e.g. count-lines10).
- Simple valueFrom expressions (e.g. ``step-valueFrom`` and ``step-valueFrom2``). This work doesn't yet model non-tool parameters to steps, so complex ``valueFrom`` expressions like the one in ``step-valueFrom3`` do not work yet.

Remaining Work
---------------------------------

The work remaining is vast and will be tracked at https://github.com/common-workflow-language/galaxy/issues for the time being.

Implementation Notes:
----------------------

Tools:

- Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools, but work is in progress on bringing native Galaxy support for such outputs (#27).
- CWL secondary files are just normal datasets with extra files stored in a ``__secondary_files__`` directory in the dataset's extra_files_path directory and indexed in a file called __secondary_files_index.json in extra_files_path. The upload tool has been augmented to allow attaching arbitrary extra files as a tar file to support getting data into this format initially. CWL requires staged secondary files to include their parent File's ``basename``, but tools describe these inputs as just the extension. I'm not sure which way Galaxy should store __secondary_files__ in its objectstore - just with the extension or with the basename and extension - so both options are implemented and can be swapped by setting the boolean STORE_SECONDARY_FILES_WITH_BASENAME in galaxy.tools.cwl.util.
- CWL Directory types are datasets of a new type "directory" implemented earlier in this branch.
- The tool execution API has been extended with an ``inputs_representation`` parameter that can now be set to "cwl". The ``cwl`` representation for running tools corresponds to the CWL job JSON format, with ``{"class": "File", "path": "/path/to/file"}`` inputs replaced with ``{"src": "hda", "id": "<dataset_id>"}``. Code for building these requests from CWL job JSON is available in the test class.
- Since the CWL <-> Galaxy parameter translation may change over time (for instance if Galaxy develops or refines parameter classes), the CWL state and CWL state version are tracked in the database, so that hopefully for reruns, etc. we could update the Galaxy state from an older version to a new one.
- CWL allows output parameters to be either ``File`` or non-``File``, determined at runtime, so ``galaxy.json`` is used to dynamically adjust the output extension as needed for non-``File`` parameters.

Workflows:

- This work serializes embedded and referenced tools into the database - this will allow reuse and tracing without requiring the path to exist forever on the filesystem, but it will have problems with default file references in workflows.
- Implements re-mapping CWL workflow connections to Galaxy input connections.
- Fix tool serialization for jobs for path-less tools (such as embedded tools).
- Hack tool state during workflow import for CWL.
- The sort of dynamic shaping of inputs CWL allows has required enhancing Galaxy's map/reduce machinery to allow mapping over dynamic collections that don't yet exist at the time of tool execution and need to be created on the fly. This commit creates them as HDCAs - but likely they should be something else that doesn't appear in the history panel.
- Multi-input scattering is implemented, but only ``scatterMethod == "dotproduct"`` is currently supported. The other scatter methods (nested_crossproduct and flat_crossproduct) are not used by workflows in the GA4GH challenge.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl``, and proxy objects are created to adapt these tools to Galaxy representations. In particular, input and output descriptions are loaded from the tool.
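
The ``cwl`` inputs representation mentioned in the implementation notes above can be illustrated with a small sketch. This is not the code from the test class; ``to_galaxy_cwl_inputs`` and the ``upload`` callback are hypothetical names introduced here, and the sketch assumes only what the notes state: CWL ``File`` objects in the job JSON are swapped for ``{"src": "hda", "id": "<dataset_id>"}`` references while every other value is passed through unchanged.

```python
import json
from typing import Any, Callable


def to_galaxy_cwl_inputs(value: Any, upload: Callable[[str], str]) -> Any:
    """Recursively replace CWL File objects with Galaxy HDA references.

    `upload` is a hypothetical callback that takes a local path and
    returns the id of the Galaxy dataset holding that file.
    """
    if isinstance(value, dict):
        if value.get("class") == "File":
            # A CWL File becomes a reference to an existing history dataset.
            return {"src": "hda", "id": upload(value["path"])}
        # Records and the top-level job object: convert each field.
        return {key: to_galaxy_cwl_inputs(item, upload) for key, item in value.items()}
    if isinstance(value, list):
        # Arrays (for example File[]): convert each element.
        return [to_galaxy_cwl_inputs(item, upload) for item in value]
    # Integers, strings, booleans, null, etc. are left as-is.
    return value


if __name__ == "__main__":
    # Stand-in for a real upload step: a fixed path -> dataset id table.
    dataset_ids = {"/path/to/file": "<dataset_id>"}
    cwl_job = {
        "file1": {"class": "File", "path": "/path/to/file"},
        "reverse": True,
    }
    print(json.dumps(to_galaxy_cwl_inputs(cwl_job, dataset_ids.__getitem__), indent=2))
```

The resulting dictionary would then be sent as the tool inputs together with ``inputs_representation`` set to "cwl"; how the request itself is assembled is left out because it is not described here.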

When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command line is generated from this Job object. As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.

Galaxy writes a description of the CWL job to the job working directory, from which it can be reloaded. After the process is complete (on the Galaxy compute server, but outside the Docker container) this representation is reloaded and the dynamic outputs are discovered and moved to the fixed locations expected by Galaxy. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL outputs to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype. CWL allows refinement on this, and this remains work to be done:

1) CWL should support EDAM declaration of types and Galaxy should provide a mapping to core datatypes to skip sniffing if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)

Implementation Links:
----------------------

Hundreds of commits have been rebased into this one, so the details of individual parts of the implementation and how they built on each other are not entirely clear. To see the original ideas behind individual features, here are some relevant links:

- Implement merge_nested link semantics for workflow steps (a903abd).
- Implement subworkflows in CWL (9933c3c).
- MultipleInputFeatureRequirements:
  - Second attempt: ed8307f
  - First attempt: ae11f56
- Basic, implicit dotproduct scattering of workflows - d1ad64e.
- Simple input StepInputExpressionRequirements - 819a27b
- StepInputExpressionRequirements for multiple inputs - 5e7f622
- Record Types in CWL - e6be28a
- Rework original approach at mapping CWL state to tool state - 669ea55
- Rework approach at mapping CWL state to tool state again to use "FieldTypeToolParameter"s - implements default values, optional parameters, and union types for workflow inputs. d1ca22f
- Initial tracking of "cwl_filename" for CWL jobs (67ffc55).
- Reworked secondary file staging, implement testing and indexing of secondary files - 03d1636.

Testing:
---------------------

    % git clone https://github.com/common-workflow-language/galaxy.git
    % cd galaxy
    % git checkout cwl-1.0

Start Galaxy.

    % GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh

Open http://localhost:8080/ and see the CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following commands.

```
./run_tests.sh -api test/api/test_tools_cwl.py
./run_tests.sh -api test/api/test_workflows_cwl.py
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py
```

An individual conformance test can be run using this pattern:

```
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py:CwlConformanceTestCase.test_conformance_v1_0_6
```

The first two execute various tool and workflow test cases manually crafted during implementation of this work. The third is an auto-generated test case class that contains Python tests for every CWL conformance test found with the reference specification.
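
To give a feel for what an auto-generated test case class with one method per conformance test might look like, here is a minimal sketch. It is an assumption-laden illustration, not the generator used in this branch (which may well write out a Python source file instead): ``build_conformance_test_case`` and the ``runner`` callback are hypothetical, and numbering the methods by list position is a guess based only on the ``test_conformance_v1_0_6`` name shown above.

```python
import unittest
from typing import Any, Callable, Dict, List


def build_conformance_test_case(
    specs: List[Dict[str, Any]],
    runner: Callable[[Dict[str, Any]], None],
) -> type:
    """Build a TestCase subclass with one test method per conformance spec.

    `specs` stands in for the parsed conformance test entries and `runner`
    for whatever actually executes one of them against Galaxy; both are
    hypothetical placeholders.
    """
    methods = {}
    for index, spec in enumerate(specs):
        # Bind `spec` as a default argument so each method keeps its own entry.
        def test_method(self, spec=spec):
            runner(spec)

        test_method.__doc__ = spec.get("doc")
        methods[f"test_conformance_v1_0_{index}"] = test_method
    return type("CwlConformanceTestCase", (unittest.TestCase,), methods)


if __name__ == "__main__":
    # Two made-up entries and a runner that only checks a field exists.
    example_specs = [
        {"doc": "first example", "tool": "a.cwl"},
        {"doc": "second example", "tool": "b.cwl"},
    ]

    def example_runner(spec: Dict[str, Any]) -> None:
        assert "tool" in spec

    CwlConformanceTestCase = build_conformance_test_case(example_specs, example_runner)
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CwlConformanceTestCase)
    unittest.TextTestRunner().run(suite)
```

Generating named methods, rather than looping over the specs inside one test, is what makes it possible to select a single conformance test by name as in the pattern shown above.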

Issues and Contact
---------------------------------

Report issues at https://github.com/common-workflow-language/galaxy/issues and feel free to ping jmchilton on the CWL [Gitter channel](https://gitter.im/common-workflow-language/common-workflow-language).

Co-authored-by: Hervé MENAGER
Co-authored-by: John Chilton
Co-authored-by: Michael R. Crusoe
Co-authored-by: jra001k
Co-authored-by: mvdbeek
Co-authored-by: Nicola Soranzo
nsoranzo added a commit that referenced this issue Dec 5, 2024
nsoranzo added a commit that referenced this issue Dec 30, 2024
nsoranzo added a commit that referenced this issue Jan 15, 2025
…rmats. This should support a subset of [draft-3](http://www.commonwl.org/draft-3/) and [v1.0](http://www.commonwl.org/v1.0/) tools.

CWL Support (Tools):
--------------------

- Implemented integer, long, float, double, boolean, string, File, Directory, "null", Any, as well as records and arrays thereof. There are two approaches to handling more complex parameters discussed here (#59).
- ``secondaryFiles`` that are actual Files are implemented; secondaryFiles containing directories are not yet implemented.
- ``InlineJavascriptRequirement`` is supported for defining output files (see ``test_cat3`` test case).
- ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases).
- Expression tools are supported (see ``parseInt-tool`` test case).
- Shell tools are also supported (see record output test case).
- Default File values are very un-Galaxy and have been hacked in to work with tools - they still don't work with workflows.
- Partial Docker support - this supports the most simple and common pullFrom semantics but not additional ways to fetch containers or additional options such as output directory configuration (https://github.com/common-workflow-language/galaxy/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20Docker). Additionally, Galaxy mounts the inputs and outputs where it wants instead of at the CWL-required mount points - this needs to be fixed for the conformance tests but may not matter much in practice (I'm not sure).

CWL Support (Workflows):
------------------------

- Simple connections and tool execution.
- Overriding tool input defaults via literal values and simple expressions.
- MultipleInputFeatureRequirements to glue together multiple file inputs into a File[] or multiple File[] into a single flat File[] (nested merge is still a TODO).
- Simple scatter semantics for Files and non-Files (e.g. count-lines3).
- Simple subworkflows (e.g. count-lines10).
- Simple valueFrom expressions (e.g. ``step-valueFrom`` and ``step-valueFrom2``). This work doesn't yet model non-tool parameters to steps, so complex ``valueFrom`` expressions like the one in ``step-valueFrom3`` do not work yet.

Remaining Work
---------------------------------

The work remaining is vast and will be tracked at https://github.com/common-workflow-language/galaxy/issues for the time being.

Implementation Notes:
----------------------

Tools:

- Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools, but CWL Galaxy has work in progress on bringing native Galaxy support for such outputs (#27).
- CWL secondary files are just normal datasets with extra files stored in a ``__secondary_files__`` directory in the dataset's extra_files_path directory and indexed in a file called __secondary_files_index.json in extra_files_path. The upload tool has been augmented to allow attaching arbitrary extra files as a tar file to support getting data into this format initially. CWL requires staged files to include their parent File's ``basename`` - but tools describe inputs as just the extension. I'm not sure which way Galaxy should store __secondary_files__ in its objectstore - just with the extension or with the basename and extension - both options are implemented and can be swapped by setting the boolean STORE_SECONDARY_FILES_WITH_BASENAME in galaxy.tools.cwl.util.
- CWL Directory types are datasets of a new type "directory" implemented earlier in this branch.
- The tool execution API has been extended to add an ``inputs_representation`` parameter that can be set to "cwl" now. The ``cwl`` representation for running tools corresponds to the CWL job json format with {class: "File", path: "/path/to/file"} inputs replaced with {"src": "hda", "id": "<dataset_id>"}. Code for building these requests for CWL job json is available in the test class.
- Since the CWL <-> Galaxy parameter translation may change over time, for instance if Galaxy develops or refines parameter classes, CWL state and CWL state version are tracked in the database so that, hopefully, for reruns etc. we can update the Galaxy state from an older version to a new one.
- CWL allows output parameters to be either ``File`` or non-``File`` and determined at runtime, so ``galaxy.json`` is used to dynamically adjust the output extension as needed for non-``File`` parameters.

Workflows:

- This work serializes embedded and referenced tools into the database - this will allow reuse and tracing without requiring the path to exist forever on the filesystem - this will have problems with default file references in workflows.
- Implements re-mapping CWL workflow connections to Galaxy input connections.
- Fix tool serialization for jobs for path-less tools (such as embedded tools).
- Hack tool state during workflow import for CWL.
- The sort of dynamic shaping of inputs CWL allows has required enhancing Galaxy's map/reduce stuff to allow mapping over dynamic collections that don't yet exist at the time of tool execution and need to be created on the fly. This commit creates them as HDCAs - but likely they should be something else that doesn't appear in the history panel.
- Multi-input scattering is implemented, but only scatterMethod == "dotproduct" is currently supported. The other scatter methods (nested_crossproduct and flat_crossproduct) are not used by workflows in the GA4GH challenge.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl`` and proxy objects are created to adapt these tools to Galaxy representations. In particular input and output descriptions are loaded from the tool.
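
As an aside on the scatter methods named in the workflow notes above, here is a minimal sketch of how ``dotproduct`` differs from ``flat_crossproduct``. It is plain Python for illustration only, not the code in this branch; the function names and the shape of the ``inputs`` dict are assumptions.

```python
from itertools import product


def dotproduct(inputs):
    """Pair the i-th element of every scattered input (hypothetical helper).

    `inputs` maps an input name to the list being scattered over.
    """
    lengths = {len(values) for values in inputs.values()}
    if len(lengths) > 1:
        raise ValueError("dotproduct requires scattered inputs of equal length")
    names = list(inputs)
    return [dict(zip(names, values)) for values in zip(*inputs.values())]


def flat_crossproduct(inputs):
    """Every combination of elements, returned as one flat list of jobs."""
    names = list(inputs)
    return [dict(zip(names, values)) for values in product(*inputs.values())]


scattered = {"file": ["a.txt", "b.txt"], "count": [1, 2]}
dot_jobs = dotproduct(scattered)           # one job per index
cross_jobs = flat_crossproduct(scattered)  # one job per combination
```

With two inputs of length two, ``dotproduct`` yields two jobs while ``flat_crossproduct`` yields four, which is why the two need different collection shapes on the Galaxy side.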
When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command-line is generated from this Job object. As a result of this, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.

Galaxy writes a description of the CWL job that it can reload to the job working directory. After the process is complete (on the Galaxy compute server, but outside the Docker container) this representation is reloaded and the dynamic outputs are discovered and moved to fixed locations as expected by Galaxy. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype; CWL allows refinement on this and this remains work to be done.

1) CWL should support EDAM declaration of types and Galaxy should provide a mapping to core datasets to skip sniffing if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (Distinction between fastq and fastqsanger in Galaxy is very important for instance.)

Implementation Links:
----------------------

Hundreds of commits have been rebased into this one and so the details of individual parts of the implementation and how they built on each other are not entirely clear. To see the original ideas behind individual features - here are some relevant links:

- Implement merge_nested link semantics for workflow steps (a903abd).
- Implement subworkflows in CWL (9933c3c)
- MultipleInputFeatureRequirements:
  - Second attempt: ed8307f
  - First attempt: ae11f56
- Basic, implicit dotproduct scattering of workflows - d1ad64e.
- Simple input StepInputExpressionRequirements - 819a27b
- StepInputExpressionRequirements for multiple inputs - 5e7f622
- Record Types in CWL - e6be28a
- Rework original approach at mapping CWL state to tool state - 669ea55
- Rework approach at mapping CWL state to tool state again to use "FieldTypeToolParameter"s - implements default values, optional parameters, and union types for workflow inputs - d1ca22f
- Initial tracking of "cwl_filename" for CWL jobs (67ffc55).
- Reworked secondary file staging, implement testing and indexing of secondary files - 03d1636.

Testing:
---------------------

    % git clone https://github.com/common-workflow-language/galaxy.git
    % cd galaxy
    % git checkout cwl-1.0

Start Galaxy.

    % GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh

Open http://localhost:8080/ and see CWL test tools (along with all Galaxy test tools) in the left hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following commands.

```
./run_tests.sh -api test/api/test_tools_cwl.py
./run_tests.sh -api test/api/test_workflows_cwl.py
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py
```

An individual conformance test can be run using this pattern:

```
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py:CwlConformanceTestCase.test_conformance_v1_0_6
```

The first two execute various tool and workflow test cases manually crafted during implementation of this work. The third is an auto-generated test case class that contains Python tests for every CWL conformance test found with the reference specification.
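
For readers who want to see what the ``cwl`` inputs representation from the implementation notes looks like in practice, the following is a rough sketch of rewriting a CWL job object into a tool execution payload. It is not the code from the test class: the helper name, the ``path_to_dataset_id`` lookup, the tool id, and the ids are all hypothetical, and only the ``inputs_representation`` value and the ``{"src": "hda", "id": ...}`` shape come from the notes above.

```python
def to_galaxy_cwl_inputs(job, path_to_dataset_id):
    """Replace CWL File objects with Galaxy HDA references (hypothetical helper).

    `path_to_dataset_id` maps a local file path to the id of a dataset that
    is assumed to have been uploaded to the history already.
    """
    def convert(value):
        if isinstance(value, dict):
            if value.get("class") == "File":
                return {"src": "hda", "id": path_to_dataset_id[value["path"]]}
            return {key: convert(item) for key, item in value.items()}
        if isinstance(value, list):
            return [convert(item) for item in value]
        return value  # ints, strings, booleans, None pass through unchanged

    return convert(job)


# A CWL job json document as it might appear on disk.
job = {
    "file1": {"class": "File", "path": "/path/to/file"},
    "numbering": True,
}

payload = {
    "tool_id": "cat3-tool",             # hypothetical tool id
    "history_id": "<history_id>",       # placeholder, as in the notes
    "inputs_representation": "cwl",
    "inputs": to_galaxy_cwl_inputs(job, {"/path/to/file": "<dataset_id>"}),
}
```

Non-File values are left untouched, so the payload stays as close to the original job json as possible; how the payload is encoded and posted depends on the API client in use.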
Issues and Contact
---------------------------------

Report issues at https://github.com/common-workflow-language/galaxy/issues and feel free to ping jmchilton on the CWL [Gitter channel](https://gitter.im/common-workflow-language/common-workflow-language).

Co-authored-by: Hervé MENAGER <[email protected]>
Co-authored-by: John Chilton <[email protected]>
Co-authored-by: Michael R. Crusoe <[email protected]>
Co-authored-by: jra001k
Co-authored-by: mvdbeek <[email protected]>
Co-authored-by: Nicola Soranzo <[email protected]>
nsoranzo
added a commit
that referenced
this issue
Jan 17, 2025
nsoranzo
added a commit
that referenced
this issue
Jan 23, 2025
nsoranzo
added a commit
that referenced
this issue
Jan 27, 2025
…rmats. This should support a subset of [draft-3](http://www.commonwl.org/draft-3/) and [v1.0](http://www.commonwl.org/v1.0/) tools. CWL Support (Tools): -------------------- - Implemented integer, long, float, double, boolean, string, File, Directory, "null", Any, as well as records and arrays thereof. There are two approaches to handling more complex parameters discussed here (#59). - ``secondaryFiles`` that are actual Files are implemented, secondaryFiles containing directories are not yet implemented. - ``InlineJavascriptRequirement`` are support to define output files (see ``test_cat3`` test case). - ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases). - Expression tools are supported (see ``parseInt-tool`` test case). - Shell tools are also support (see record output test case). - Default File values are very un-Galaxy and have been hacked into work with Tools - they still don't work with workflows. - Partial Docker support - this supports the most simple and common pullFrom semantics but not additional ways to fetch containers or additional options such as output directory configuration (https://github.com/common-workflow-language/galaxy/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20Docker). Additionally, Galaxy mounts the inputs and outputs where it wants instead of CWL required mount points - this needs to be fixed for the conformance tests but may not matter much in practice (I'm not sure). CWL Support (Workflows): ------------------------ - Simple connections and tool execution. - Overriding tool input defaults via literal values and simple expressions. - MultipleInputFeatureRequirements to glue together multiple file inputs into a File[] or multiple File[] into a single flat File[]. (nested merge is still a TODO). - Simple scatter semantics for Files and non-Files (e.g. count-lines3). - Simple subworkflows (e.g. count-lines10). - Simple valueFrom expressions (e.g. 
``step-valueFrom`` and ``step-valueFrom2``). This work doesn't yet model non-tool parameters to steps - for complex ``valueFrom`` expressions like in ``step-valueFrom3`` do not work yet. Remaining Work --------------------------------- The work remaining is vast and will be tracked at https://github.com/common-workflow-language/galaxy/issues for the time being. Implementation Notes: ---------------------- Tools: - Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools but CWL Galaxy has work in progress on bringing native Galaxy support for such outputs #27. - CWL secondary files are just normal datasets with extra files stored in ``__secondary_files__`` directory in the dataset's extra_files_path directory and indexed in a file called __secondary_files_index.json in extra_files_path. The upload tools has been augmented to allow attaching arbitrary extra files as a tar file to support getting data into this format initially. CWL requires staging files to include their parent File's ``basename`` - but tools describe inputs as just the extension. I'm not sure which way Galaxy should store __secondary_files__ in its objectstore - just with the extension or with the basename and extension - both options are implemented and can be swapped by setting the boolean STORE_SECONDARY_FILES_WITH_BASENAME in galaxy.tools.cwl.util. - CWL Directory types are datasets of a new type "directory" implemented earlier in this branch. - The tool execution API has been extended to add a ``inputs_representation`` parameter that can be set to "cwl" now. The ``cwl`` representation for running tools corresonding to the CWL job json format with {class: "File: path: "/path/to/file"} inputs replaced with {"src": "hda", "id": "<dataset_id>"}. Code for building these requests for CWL job json is available in the test class. 
- Since the CWL <-> Galaxy parameter translation may change over time (for instance if Galaxy develops or refines parameter classes), CWL state and the CWL state version are tracked in the database, so that hopefully for reruns, etc. we could update the Galaxy state from an older version to a new one.
- CWL allows output parameters to be either ``File`` or non-``File``, determined at runtime, so ``galaxy.json`` is used to dynamically adjust the output extension as needed for non-``File`` parameters.

Workflows:

- This work serializes embedded and referenced tools into the database - this will allow reuse and tracing without requiring the path to exist forever on the filesystem - this will have problems with default file references in workflows.
- Implements re-mapping CWL workflow connections to Galaxy input connections.
- Fix tool serialization for jobs for path-less tools (such as embedded tools).
- Hack tool state during workflow import for CWL.
- The sort of dynamic shaping of inputs CWL allows has required enhancing Galaxy's map/reduce stuff to allow mapping over dynamic collections that don't yet exist at the time of tool execution and need to be created on the fly. This commit creates them as HDCAs - but likely they should be something else that doesn't appear in the history panel.
- Multi-input scattering, but only scatterMethod == "dotproduct" is currently supported. Other scatter methods (nested_crossproduct and flat_crossproduct) are not used by workflows in the GA4GH challenge.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl`` and proxy objects are created to adapt these tools to Galaxy representations. In particular input and output descriptions are loaded from the tool.

When the tool is submitted, a specialized tool class is used to build a cwltool compatible job description from the supplied Galaxy inputs and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command-line is generated from this Job object.

As a result of this, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.

Galaxy writes a description of the CWL job to the job working directory so that it can be reloaded later. After the process is complete (on the Galaxy compute server, but outside the Docker container) this representation is reloaded and the dynamic outputs are discovered and moved to fixed locations as expected by Galaxy. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype; CWL allows refinement on this and this remains work to be done.

1) CWL should support EDAM declaration of types and Galaxy should provide a mapping to core datasets to skip sniffing if types are found.
2) For finer grain control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)

Implementation Links:
----------------------

Hundreds of commits have been rebased into this one and so the details of individual parts of the implementation and how they built on each other are not entirely clear. To see the original ideas behind individual features, here are some relevant links:

- Implement merge_nested link semantics for workflow steps (a903abd).
- Implement subworkflows in CWL (9933c3c)
- MultipleInputFeatureRequirements:
  - Second attempt: ed8307f
  - First attempt: ae11f56
- Basic, implicit dotproduct scattering of workflows - d1ad64e.
- Simple input StepInputExpressionRequirements - 819a27b
- StepInputExpressionRequirements for multiple inputs - 5e7f622
- Record Types in CWL - e6be28a
- Rework original approach at mapping CWL state to tool state - 669ea55
- Rework approach at mapping CWL state to tool state again to use "FieldTypeToolParameter"s - implements default values, optional parameters, and union types for workflow inputs. d1ca22f
- Initial tracking of "cwl_filename" for CWL jobs (67ffc55).
- Reworked secondary file staging, implement testing and indexing of secondary files - 03d1636.

Testing:
---------------------

    % git clone https://github.com/common-workflow-language/galaxy.git
    % cd galaxy
    % git checkout cwl-1.0

Start Galaxy.

    % GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh

Open http://localhost:8080/ and see the CWL test tools (along with all Galaxy test tools) in the left hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker).

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following commands.

```
./run_tests.sh -api test/api/test_tools_cwl.py
./run_tests.sh -api test/api/test_workflows_cwl.py
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py
```

An individual conformance test can be run using this pattern:

```
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py:CwlConformanceTestCase.test_conformance_v1_0_6
```

The first two execute various tool and workflow test cases manually crafted during implementation of this work. The third is an auto-generated test case class that contains Python tests for every CWL conformance test found with the reference specification.
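
For readers who want a concrete picture of the ``inputs_representation`` "cwl" translation mentioned in the implementation notes, here is a minimal sketch of replacing CWL ``File`` objects in a job document with Galaxy dataset references. This is not the code from the test class: the function name ``to_galaxy_cwl_inputs``, the ``path_to_hda_id`` lookup, and the dataset ids are all hypothetical, and it assumes the files have already been uploaded to a history.

```python
import json


def to_galaxy_cwl_inputs(value, path_to_hda_id):
    """Recursively replace CWL File objects with Galaxy dataset references.

    ``path_to_hda_id`` is a hypothetical mapping from a local file path to
    the id of an already-uploaded history dataset (HDA).
    """
    if isinstance(value, dict):
        if value.get("class") == "File":
            # {"class": "File", "path": ...} -> {"src": "hda", "id": ...}
            return {"src": "hda", "id": path_to_hda_id[value["path"]]}
        # Records and other mappings: translate each field.
        return {k: to_galaxy_cwl_inputs(v, path_to_hda_id) for k, v in value.items()}
    if isinstance(value, list):
        # Arrays (e.g. File[]): translate each element.
        return [to_galaxy_cwl_inputs(v, path_to_hda_id) for v in value]
    # Integers, strings, booleans and null pass through unchanged.
    return value


# Example CWL job document (made-up parameter names and paths).
cwl_job = {
    "input_file": {"class": "File", "path": "/path/to/file"},
    "reverse": True,
    "extra_files": [{"class": "File", "path": "/path/to/other"}],
}
# Made-up dataset ids standing in for real uploaded datasets.
uploaded = {"/path/to/file": "hda_id_1", "/path/to/other": "hda_id_2"}

galaxy_inputs = to_galaxy_cwl_inputs(cwl_job, uploaded)
print(json.dumps(galaxy_inputs, indent=2, sort_keys=True))
```

The recursion is there because CWL job documents can nest ``File`` objects inside arrays and records; non-File values are left untouched.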

Issues and Contact
---------------------------------

Report issues at https://github.com/common-workflow-language/galaxy/issues and feel free to ping jmchilton on the CWL [Gitter channel](https://gitter.im/common-workflow-language/common-workflow-language).

Co-authored-by: Hervé MENAGER <[email protected]>
Co-authored-by: John Chilton <[email protected]>
Co-authored-by: Michael R. Crusoe <[email protected]>
Co-authored-by: jra001k
Co-authored-by: mvdbeek <[email protected]>
Co-authored-by: Nicola Soranzo <[email protected]>
nsoranzo
added a commit
that referenced
this issue
Jan 27, 2025
nsoranzo
added a commit
that referenced
this issue
Jan 28, 2025
nsoranzo
added a commit
that referenced
this issue
Jan 30, 2025
…rmats. This should support a subset of [draft-3](http://www.commonwl.org/draft-3/) and [v1.0](http://www.commonwl.org/v1.0/) tools. CWL Support (Tools): -------------------- - Implemented integer, long, float, double, boolean, string, File, Directory, "null", Any, as well as records and arrays thereof. There are two approaches to handling more complex parameters discussed here (#59). - ``secondaryFiles`` that are actual Files are implemented, secondaryFiles containing directories are not yet implemented. - ``InlineJavascriptRequirement`` are support to define output files (see ``test_cat3`` test case). - ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases). - Expression tools are supported (see ``parseInt-tool`` test case). - Shell tools are also support (see record output test case). - Default File values are very un-Galaxy and have been hacked into work with Tools - they still don't work with workflows. - Partial Docker support - this supports the most simple and common pullFrom semantics but not additional ways to fetch containers or additional options such as output directory configuration (https://github.com/common-workflow-language/galaxy/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20Docker). Additionally, Galaxy mounts the inputs and outputs where it wants instead of CWL required mount points - this needs to be fixed for the conformance tests but may not matter much in practice (I'm not sure). CWL Support (Workflows): ------------------------ - Simple connections and tool execution. - Overriding tool input defaults via literal values and simple expressions. - MultipleInputFeatureRequirements to glue together multiple file inputs into a File[] or multiple File[] into a single flat File[]. (nested merge is still a TODO). - Simple scatter semantics for Files and non-Files (e.g. count-lines3). - Simple subworkflows (e.g. count-lines10). - Simple valueFrom expressions (e.g. 
``step-valueFrom`` and ``step-valueFrom2``). This work doesn't yet model non-tool parameters to steps - for complex ``valueFrom`` expressions like in ``step-valueFrom3`` do not work yet. Remaining Work --------------------------------- The work remaining is vast and will be tracked at https://github.com/common-workflow-language/galaxy/issues for the time being. Implementation Notes: ---------------------- Tools: - Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools but CWL Galaxy has work in progress on bringing native Galaxy support for such outputs #27. - CWL secondary files are just normal datasets with extra files stored in ``__secondary_files__`` directory in the dataset's extra_files_path directory and indexed in a file called __secondary_files_index.json in extra_files_path. The upload tools has been augmented to allow attaching arbitrary extra files as a tar file to support getting data into this format initially. CWL requires staging files to include their parent File's ``basename`` - but tools describe inputs as just the extension. I'm not sure which way Galaxy should store __secondary_files__ in its objectstore - just with the extension or with the basename and extension - both options are implemented and can be swapped by setting the boolean STORE_SECONDARY_FILES_WITH_BASENAME in galaxy.tools.cwl.util. - CWL Directory types are datasets of a new type "directory" implemented earlier in this branch. - The tool execution API has been extended to add a ``inputs_representation`` parameter that can be set to "cwl" now. The ``cwl`` representation for running tools corresonding to the CWL job json format with {class: "File: path: "/path/to/file"} inputs replaced with {"src": "hda", "id": "<dataset_id>"}. Code for building these requests for CWL job json is available in the test class. 
- Since the CWL <-> Galaxy parameter translation may change over time, for instance if Galaxy develops or refines parameter classes - CWL state and CWL state version is tracked in the database and hopefully for reruns, etc... we could update the Galaxy state from an older version to a new one. - CWL allows output parameters to be either ``File`` or non-``File`` and determined at runtime, so ``galaxy.json`` is used to dynamically adjust output extension as needed for non-``File`` parameters. Workflows: - This work serializes embedded and referenced tools into the database - this will allow reuse and tracing without require the path to exist forever on the filesystem - this will have problems with default file references in workflows. - Implements re-mapping CWL workflow connections to Galaxy input connections. - Fix tool serialization for jobs for path-less tools (such as embedded tools). - Hack tool state during workflow import for CWL. - The sort of dynamic shaping of inputs CWL allows has required enhancing Galaxy's map/reduce stuff to allow mapping over dynamic collections that don't yet exist at the time of tool execution and need to be created on the fly. This commit creates them as HDCAs - but likely they should be something else that doesn't appear in the history panel. - Multi-input scattering but only scatterMethod == "dotproduct" is currently support. Other scatter methods (nested_crossproduct and flatcross_product) are not used by workflows in GA4GH challenge. Implementation Description: ----------------------------- The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl`` and proxy objects are created to adapt these tools to Galaxy representations. In particular input and output descriptions are loaded from the tool. 

When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command line is generated from this Job object. As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.

Galaxy writes a description of the CWL job to the job working directory so that it can be reloaded later. After the process is complete (on the Galaxy compute server, but outside the Docker container) this representation is reloaded and the dynamic outputs are discovered and moved to fixed locations as expected by Galaxy. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype. CWL allows refinement on this and this remains work to be done.

1) CWL should support EDAM declaration of types and Galaxy should provide a mapping to core datasets to skip sniffing if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)

Implementation Links:
----------------------

Hundreds of commits have been rebased into this one and so the details of individual parts of the implementation and how they built on each other are not entirely clear. To see the original ideas behind individual features, here are some relevant links:

- Implement merge_nested link semantics for workflow steps (a903abd).
- Implement subworkflows in CWL (9933c3c)
- MultipleInputFeatureRequirements:
  - Second attempt: ed8307f
  - First attempt: ae11f56
- Basic, implicit dotproduct scattering of workflows - d1ad64e.
- Simple input StepInputExpressionRequirements - 819a27b
- StepInputExpressionRequirements for multiple inputs - 5e7f622
- Record Types in CWL - e6be28a
- Rework original approach at mapping CWL state to tool state - 669ea55
- Rework approach at mapping CWL state to tool state again to use "FieldTypeToolParameter"s - implements default values, optional parameters, and union types for workflow inputs. d1ca22f
- Initial tracking of "cwl_filename" for CWL jobs (67ffc55).
- Reworked secondary file staging, implement testing and indexing of secondary files - 03d1636.

Testing:
---------------------

    % git clone https://github.com/common-workflow-language/galaxy.git
    % cd galaxy
    % git checkout cwl-1.0

Start Galaxy.

    % GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh

Open http://localhost:8080/ and see CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following commands.

```
./run_tests.sh -api test/api/test_tools_cwl.py
./run_tests.sh -api test/api/test_workflows_cwl.py
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py
```

An individual conformance test can be run using this pattern:

```
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py:CwlConformanceTestCase.test_conformance_v1_0_6
```

The first two test files execute various tool and workflow test cases manually crafted during implementation of this work. The third is an auto-generated test case class that contains Python tests for every CWL conformance test found with the reference specification.
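For readers unfamiliar with generated test classes, here is a rough sketch of how one test method per conformance entry could be produced. It is not the generator used in this branch (which may well emit a source file instead); ``build_test_case``, the ``run`` callback, the spec dictionaries and the zero-based numbering are all assumptions made for illustration. Only the class name and the ``test_conformance_v1_0_<n>`` naming pattern come from the command shown above.

```python
import unittest
from typing import Any, Callable, Dict, List


def make_test(index: int, spec: Dict[str, Any], run: Callable[[Dict[str, Any]], None]):
    """Create one test method bound to a single conformance entry."""
    def test(self):
        # `run` is a hypothetical callback that executes the entry and
        # raises on a mismatch between actual and expected outputs.
        run(spec)
    test.__name__ = f"test_conformance_v1_0_{index}"
    test.__doc__ = spec.get("doc")
    return test


def build_test_case(specs: List[Dict[str, Any]], run: Callable[[Dict[str, Any]], None]):
    """Build a TestCase subclass with one method per conformance entry."""
    methods = {}
    for index, spec in enumerate(specs):
        method = make_test(index, spec, run)
        methods[method.__name__] = method
    return type("CwlConformanceTestCase", (unittest.TestCase,), methods)


if __name__ == "__main__":
    # Placeholder entries; real ones would be parsed from the conformance listing.
    demo_specs = [{"doc": "placeholder entry A"}, {"doc": "placeholder entry B"}]
    case = build_test_case(demo_specs, run=lambda spec: None)
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    unittest.TextTestRunner().run(suite)
```

Giving every entry its own method is what makes the single-test invocation pattern above possible, since the runner can address one method by name.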

Issues and Contact
---------------------------------

Report issues at https://github.com/common-workflow-language/galaxy/issues and feel free to ping jmchilton on the CWL [Gitter channel](https://gitter.im/common-workflow-language/common-workflow-language).

Co-authored-by: Hervé MENAGER <[email protected]>
Co-authored-by: John Chilton <[email protected]>
Co-authored-by: Michael R. Crusoe <[email protected]>
Co-authored-by: jra001k
Co-authored-by: mvdbeek <[email protected]>
Co-authored-by: Nicola Soranzo <[email protected]>
nsoranzo
added a commit
that referenced
this issue
Feb 3, 2025
nsoranzo
added a commit
that referenced
this issue
Mar 11, 2025
nsoranzo
added a commit
that referenced
this issue
Mar 13, 2025
…rmats. This should support a subset of [draft-3](http://www.commonwl.org/draft-3/) and [v1.0](http://www.commonwl.org/v1.0/) tools. CWL Support (Tools): -------------------- - Implemented integer, long, float, double, boolean, string, File, Directory, "null", Any, as well as records and arrays thereof. There are two approaches to handling more complex parameters discussed here (#59). - ``secondaryFiles`` that are actual Files are implemented, secondaryFiles containing directories are not yet implemented. - ``InlineJavascriptRequirement`` are support to define output files (see ``test_cat3`` test case). - ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases). - Expression tools are supported (see ``parseInt-tool`` test case). - Shell tools are also support (see record output test case). - Default File values are very un-Galaxy and have been hacked into work with Tools - they still don't work with workflows. - Partial Docker support - this supports the most simple and common pullFrom semantics but not additional ways to fetch containers or additional options such as output directory configuration (https://github.com/common-workflow-language/galaxy/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20Docker). Additionally, Galaxy mounts the inputs and outputs where it wants instead of CWL required mount points - this needs to be fixed for the conformance tests but may not matter much in practice (I'm not sure). CWL Support (Workflows): ------------------------ - Simple connections and tool execution. - Overriding tool input defaults via literal values and simple expressions. - MultipleInputFeatureRequirements to glue together multiple file inputs into a File[] or multiple File[] into a single flat File[]. (nested merge is still a TODO). - Simple scatter semantics for Files and non-Files (e.g. count-lines3). - Simple subworkflows (e.g. count-lines10). - Simple valueFrom expressions (e.g. 
``step-valueFrom`` and ``step-valueFrom2``). This work doesn't yet model non-tool parameters to steps - for complex ``valueFrom`` expressions like in ``step-valueFrom3`` do not work yet. Remaining Work --------------------------------- The work remaining is vast and will be tracked at https://github.com/common-workflow-language/galaxy/issues for the time being. Implementation Notes: ---------------------- Tools: - Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools but CWL Galaxy has work in progress on bringing native Galaxy support for such outputs #27. - CWL secondary files are just normal datasets with extra files stored in ``__secondary_files__`` directory in the dataset's extra_files_path directory and indexed in a file called __secondary_files_index.json in extra_files_path. The upload tools has been augmented to allow attaching arbitrary extra files as a tar file to support getting data into this format initially. CWL requires staging files to include their parent File's ``basename`` - but tools describe inputs as just the extension. I'm not sure which way Galaxy should store __secondary_files__ in its objectstore - just with the extension or with the basename and extension - both options are implemented and can be swapped by setting the boolean STORE_SECONDARY_FILES_WITH_BASENAME in galaxy.tools.cwl.util. - CWL Directory types are datasets of a new type "directory" implemented earlier in this branch. - The tool execution API has been extended to add a ``inputs_representation`` parameter that can be set to "cwl" now. The ``cwl`` representation for running tools corresonding to the CWL job json format with {class: "File: path: "/path/to/file"} inputs replaced with {"src": "hda", "id": "<dataset_id>"}. Code for building these requests for CWL job json is available in the test class. 
- Since the CWL <-> Galaxy parameter translation may change over time, for instance if Galaxy develops or refines parameter classes - CWL state and CWL state version is tracked in the database and hopefully for reruns, etc... we could update the Galaxy state from an older version to a new one. - CWL allows output parameters to be either ``File`` or non-``File`` and determined at runtime, so ``galaxy.json`` is used to dynamically adjust output extension as needed for non-``File`` parameters. Workflows: - This work serializes embedded and referenced tools into the database - this will allow reuse and tracing without require the path to exist forever on the filesystem - this will have problems with default file references in workflows. - Implements re-mapping CWL workflow connections to Galaxy input connections. - Fix tool serialization for jobs for path-less tools (such as embedded tools). - Hack tool state during workflow import for CWL. - The sort of dynamic shaping of inputs CWL allows has required enhancing Galaxy's map/reduce stuff to allow mapping over dynamic collections that don't yet exist at the time of tool execution and need to be created on the fly. This commit creates them as HDCAs - but likely they should be something else that doesn't appear in the history panel. - Multi-input scattering but only scatterMethod == "dotproduct" is currently support. Other scatter methods (nested_crossproduct and flatcross_product) are not used by workflows in GA4GH challenge. Implementation Description: ----------------------------- The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl`` and proxy objects are created to adapt these tools to Galaxy representations. In particular input and output descriptions are loaded from the tool. 
When the tool is submitted, a special specialized tool class is used to build a cwltool compatible job description from the supplied Galaxy inputs and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command-line is generated from this Job object. As a result of this - Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.... Galaxy writes a description of the CWL job that it can reload to the job working directory. After the process is complete (on the Galaxy compute server, but outside the Docker container) this representation is reloaded and the dynamic outputs are discovered and moved to fixed locations as expected by Galaxy. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL to Galaxy outputs. Currently all ``File`` outputs are sniffed to determined a Galaxy datatype, CWL allows refinement on this and this remains work to be done. 1) CWL should support EDAM declaration of types and Galaxy should provide a mapping to core datasets to skip sniffing is types are found. 2) For finer grain control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (Distinction between fastq and fastqsanger in Galaxy is very important for instance.) Implementation Links: ---------------------- Hundreds of commits have been rebased into this one and so the details of individual parts of the implementation and how they built on each other are not enitrely clear. To see the original ideas behind individual features - here are some relevant links: - Implement merge_nested link semantics for workflow steps (a903abd). - Implement subworkflows in CWL (9933c3c) - MultipleInputFeatureRequirements: - Second attempt: ed8307f - First attempt: ae11f56 - Basic, implicit dotproduct scattering of workflows - d1ad64e. 
- Simple input StepInputExpressionRequirements - 819a27b
- StepInputExpressionRequirements for multiple inputs - 5e7f622
- Record Types in CWL - e6be28a
- Rework original approach at mapping CWL state to tool state - 669ea55
- Rework approach at mapping CWL state to tool state again to use "FieldTypeToolParameter"s - implements default values, optional parameters, and union types for workflow inputs. d1ca22f
- Initial tracking of "cwl_filename" for CWL jobs (67ffc55).
- Reworked secondary file staging, implement testing and indexing of secondary files - 03d1636.

Testing:
---------------------

    % git clone https://github.com/common-workflow-language/galaxy.git
    % cd galaxy
    % git checkout cwl-1.0

Start Galaxy.

    % GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh

Open http://localhost:8080/ and see the CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following commands.

```
./run_tests.sh -api test/api/test_tools_cwl.py
./run_tests.sh -api test/api/test_workflows_cwl.py
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py
```

An individual conformance test can be run using this pattern:

```
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py:CwlConformanceTestCase.test_conformance_v1_0_6
```

The first two execute various tool and workflow test cases manually crafted during implementation of this work. The third is an auto-generated test case class that contains Python tests for every CWL conformance test found with the reference specification.
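The single-test pattern above can be generated rather than typed by hand. A minimal sketch, assuming the generated test methods keep the ``test_conformance_v1_0_<N>`` naming shown in the example; the helper names ``conformance_target`` and ``conformance_command`` are made up for illustration.

```python
# Module path and class name copied from the commands above.
TEST_MODULE = "test/api/test_cwl_conformance_v1_0.py"
TEST_CLASS = "CwlConformanceTestCase"


def conformance_target(number):
    """Build the run_tests.sh target string for one conformance test."""
    return f"{TEST_MODULE}:{TEST_CLASS}.test_conformance_v1_0_{number}"


def conformance_command(number):
    """Return the argv list for running a single conformance test."""
    return ["./run_tests.sh", "-api", conformance_target(number)]
```

The argv list can be handed to ``subprocess.run`` from the Galaxy checkout root.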
Issues and Contact
---------------------------------

Report issues at https://github.com/common-workflow-language/galaxy/issues and feel free to ping jmchilton on the CWL [Gitter channel](https://gitter.im/common-workflow-language/common-workflow-language).

Co-authored-by: Hervé MENAGER <[email protected]>
Co-authored-by: John Chilton <[email protected]>
Co-authored-by: Michael R. Crusoe <[email protected]>
Co-authored-by: jra001k
Co-authored-by: mvdbeek <[email protected]>
Co-authored-by: Nicola Soranzo <[email protected]>
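For readers unfamiliar with the scatter methods named in the Workflows notes of this commit message, here is a small sketch of how the three CWL scatter methods combine two scattered inputs. It is plain Python for illustration only, not the code used in this branch, and the function names are hypothetical.

```python
from itertools import product


def dotproduct(a, b):
    """Pair elements position by position; arrays must have equal length."""
    if len(a) != len(b):
        raise ValueError("dotproduct requires arrays of equal length")
    return list(zip(a, b))


def flat_crossproduct(a, b):
    """Every combination of elements, returned as one flat list."""
    return list(product(a, b))


def nested_crossproduct(a, b):
    """Every combination of elements, nested with the first array outermost."""
    return [[(x, y) for y in b] for x in a]
```

Each tuple stands for one job's scattered inputs, so ``dotproduct`` over two length-2 arrays yields two jobs while either cross product yields four.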
nsoranzo added a commit that referenced this issue Mar 13, 2025
…rmats.

This should support a subset of [draft-3](http://www.commonwl.org/draft-3/) and [v1.0](http://www.commonwl.org/v1.0/) tools.

CWL Support (Tools):
--------------------

- Implemented integer, long, float, double, boolean, string, File, Directory, "null", Any, as well as records and arrays thereof. There are two approaches to handling more complex parameters discussed here (#59).
- ``secondaryFiles`` that are actual Files are implemented; ``secondaryFiles`` containing directories are not yet implemented.
- ``InlineJavascriptRequirement`` is supported to define output files (see the ``test_cat3`` test case).
- ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases).
- Expression tools are supported (see the ``parseInt-tool`` test case).
- Shell tools are also supported (see the record output test case).
- Default File values are very un-Galaxy and have been hacked in to work with tools - they still don't work with workflows.
- Partial Docker support - this supports the most simple and common pullFrom semantics but not additional ways to fetch containers or additional options such as output directory configuration (https://github.com/common-workflow-language/galaxy/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20Docker). Additionally, Galaxy mounts the inputs and outputs where it wants instead of at the CWL-required mount points - this needs to be fixed for the conformance tests but may not matter much in practice (I'm not sure).

CWL Support (Workflows):
------------------------

- Simple connections and tool execution.
- Overriding tool input defaults via literal values and simple expressions.
- MultipleInputFeatureRequirements to glue together multiple file inputs into a File[] or multiple File[] into a single flat File[] (nested merge is still a TODO).
- Simple scatter semantics for Files and non-Files (e.g. count-lines3).
- Simple subworkflows (e.g. count-lines10).
- Simple valueFrom expressions (e.g. ``step-valueFrom`` and ``step-valueFrom2``). This work doesn't yet model non-tool parameters to steps, so complex ``valueFrom`` expressions like the one in ``step-valueFrom3`` do not work yet.

Remaining Work
---------------------------------

The work remaining is vast and will be tracked at https://github.com/common-workflow-language/galaxy/issues for the time being.

Implementation Notes:
----------------------

Tools:

- Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools, but CWL Galaxy has work in progress on bringing native Galaxy support for such outputs (#27).
- CWL secondary files are just normal datasets with extra files stored in a ``__secondary_files__`` directory in the dataset's extra_files_path directory and indexed in a file called __secondary_files_index.json in extra_files_path. The upload tool has been augmented to allow attaching arbitrary extra files as a tar file to support getting data into this format initially. CWL requires staged files to include their parent File's ``basename``, but tools describe inputs as just the extension. I'm not sure which way Galaxy should store __secondary_files__ in its objectstore (just with the extension, or with the basename and extension). Both options are implemented and can be swapped by setting the boolean STORE_SECONDARY_FILES_WITH_BASENAME in galaxy.tools.cwl.util.
- CWL Directory types are datasets of a new type "directory" implemented earlier in this branch.
- The tool execution API has been extended to add an ``inputs_representation`` parameter that can be set to "cwl" now. The ``cwl`` representation for running tools corresponds to the CWL job JSON format, with {"class": "File", "path": "/path/to/file"} inputs replaced with {"src": "hda", "id": "<dataset_id>"}. Code for building these requests for CWL job JSON is available in the test class.
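To make the ``cwl`` inputs representation from the Implementation Notes concrete, the sketch below rewrites a CWL job document by swapping each ``File`` entry for a Galaxy dataset reference. It assumes a hypothetical ``path_to_dataset_id`` mapping supplied by the caller (for example, built while uploading the files), and the function name is invented; the actual request-building code is the one in the test class mentioned in the notes.

```python
def to_galaxy_cwl_inputs(job, path_to_dataset_id):
    """Replace CWL File entries with {"src": "hda", "id": ...} references.

    job: a CWL job document as nested dicts and lists.
    path_to_dataset_id: hypothetical mapping from file path to dataset id.
    """

    def convert(value):
        if isinstance(value, dict):
            if value.get("class") == "File":
                # Look up the dataset that holds this file's contents.
                return {"src": "hda", "id": path_to_dataset_id[value["path"]]}
            return {key: convert(item) for key, item in value.items()}
        if isinstance(value, list):
            return [convert(item) for item in value]
        # Scalars (int, string, boolean, None) pass through unchanged.
        return value

    return convert(job)


# Made-up job document and dataset ids, for illustration only.
job = {
    "file1": {"class": "File", "path": "/data/a.txt"},
    "count": 3,
    "extras": [{"class": "File", "path": "/data/b.txt"}],
}
converted = to_galaxy_cwl_inputs(job, {"/data/a.txt": "abc123", "/data/b.txt": "def456"})
```

Recursing through lists and nested dicts matters because CWL arrays and records can contain ``File`` values at any depth.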
WIP: in this branch - see 0daf498.