Prediction module #14

Merged
merged 52 commits into from
Nov 30, 2021

Conversation

ljyanesm
Contributor

No description provided.

Minor refactoring
Adds Augustus to conda environment definition file
Models are now generated according to the training specification; still missing are some tests and printing the output either as a list that gff2gb can use for extraction or as GFF

HashableTranscript provides hash and simpler repr for Mikado.Transcripts
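A minimal sketch of the idea behind a hashable wrapper, so that transcript-like records can be deduplicated in sets or used as dict keys. This is not the PR's HashableTranscript: the class name `HashableRecord`, the choice of key fields, and the attribute names (`id`, `chrom`, `start`, `end`, `strand`, `exons`) are assumptions made for illustration, and a stand-in object is used instead of a real Mikado transcript.

```python
from types import SimpleNamespace


class HashableRecord:
    """Hypothetical wrapper giving a transcript-like object a hash and a short repr."""

    def __init__(self, record):
        self.record = record

    def _key(self):
        # Assumption: identity is defined by coordinates and exon structure.
        r = self.record
        return (r.chrom, r.start, r.end, r.strand, tuple(r.exons))

    def __hash__(self):
        return hash(self._key())

    def __eq__(self, other):
        return isinstance(other, HashableRecord) and self._key() == other._key()

    def __repr__(self):
        r = self.record
        return f"{r.id}:{r.chrom}:{r.start}-{r.end}({r.strand})"


# Stand-in records; a real run would wrap transcript objects instead.
a = SimpleNamespace(id="t1", chrom="chr1", start=100, end=900, strand="+",
                    exons=[(100, 300), (600, 900)])
b = SimpleNamespace(id="t2", chrom="chr1", start=100, end=900, strand="+",
                    exons=[(100, 300), (600, 900)])
unique = {HashableRecord(a), HashableRecord(b)}
```

Because `a` and `b` share coordinates and exon structure, `unique` holds a single element even though their ids differ.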
Collate augustus output predictions
Add CLI arguments for:
  - Re-training if the species is already present
  - Intron hints
  - Optionally optimising augustus and number of folds parameter for optimisation
  - Make certain CLI inputs required
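The CLI additions listed above could be wired up roughly as follows with argparse. This is a sketch only: the program name, flag names, and defaults are hypothetical and are not taken from the PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # All names below are illustrative assumptions, not the project's real flags.
    parser = argparse.ArgumentParser(prog="prediction")
    # A required input: argparse exits with an error if it is missing.
    parser.add_argument("--species", required=True,
                        help="Name of the Augustus species to train or reuse")
    # Re-train even if the species is already present.
    parser.add_argument("--force-train", action="store_true",
                        help="Re-train when the species already exists")
    # Optional intron hints file.
    parser.add_argument("--intron-hints", default=None,
                        help="Path to a file with intron hints")
    # Optional optimisation step and its number of folds.
    parser.add_argument("--optimise-augustus", action="store_true",
                        help="Run the optimisation step after training")
    parser.add_argument("--kfold", type=int, default=8,
                        help="Number of folds used during optimisation")
    return parser


args = build_parser().parse_args(["--species", "my_species", "--force-train"])
```

Using `store_true` for the re-training and optimisation switches keeps them off by default, so existing invocations keep their behaviour unless the flag is passed explicitly.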

Selective training based on input config_path
Adding intron hints parameter and force training parameter (for existing species)
Preparing for many runs (gold+silver, homology preference, etc)
Safer bash with set -euxo pipefail
# Conflicts:
#	annotation/__init__.py
Fix augustus training condition for existing species
Ensure sampling is reproducible and generates the right ratio of mono/multi exonic models
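One way to get reproducible sampling with a fixed mono/multi-exonic ratio is to seed a private random generator and sample each class separately. The sketch below illustrates that approach and is not the PR's code: the function name `sample_models`, its parameters, and the default seed are assumptions.

```python
import random


def sample_models(mono, multi, total, mono_fraction, seed=0):
    """Pick `total` models with roughly `mono_fraction` mono-exonic ones.

    Hypothetical helper: a dedicated Random instance with a fixed seed makes
    the selection repeatable, and sorting first removes any dependence on the
    order in which the models were read.
    """
    rng = random.Random(seed)
    n_mono = min(len(mono), round(total * mono_fraction))
    n_multi = min(len(multi), total - n_mono)
    return rng.sample(sorted(mono), n_mono) + rng.sample(sorted(multi), n_multi)


mono_ids = [f"mono{i}" for i in range(20)]
multi_ids = [f"multi{i}" for i in range(50)]
picked = sample_models(mono_ids, multi_ids, total=10, mono_fraction=0.2, seed=42)
```

Calling `sample_models` again with the same seed returns the same ten identifiers, two of them mono-exonic.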
Make scripts executable

Carry the index of the fasta reference alongside it
Always turn on UTR training on augustus
Use a file of sources and priorities for each evidence type for each augustus run; this allows running as many augustus instances as there are parameter files in the CLI input.
Quiet down wget download
- Add a standalone python3 compatible bam2wig script based on rseqc's bam2wig, this new script also accepts csi indices
- Improve checks on generate_augustus_hint_parameters script
- Generate a single 'base' expressed exon hints file that is then customised per augustus run
- Collate repeats, expression, and intron hints into the WDL
- Add EVM to reat.yml
- EVM process for combining predictions
- Input of weights for EVM
- Minor updates to augustus.wdl (bugfix)
- Add another CQ run which predicts all models (including for loci where a model was provided)
Basic functionality completed, now need to add chunking for Augustus
Rename AugustusTest to AugustusAbinitio
# Conflicts:
#	annotation/__init__.py
- HQ/LQ assembly and protein_alignments can be used as evidence or for defining alternative splicing or UTRs in mikado (if present)
- Add parameters for controlling gene predictor runs
- Automatically determine if running Augustus with or without hints from the parameters
- Preprocess repeat gff for use as Augustus hints
- PrepareTranscriptHints: Make sure the name is correctly computed
- gff_to_aug_hints: Add validation for the parsing of the Parent
- ChangeSource: Keep non exon/CDS records and for exon/CDS records keep previous attributes
Remove unused __init__.py in script
Pass length check on both transcriptome and homology models
Fix protein_alignments inputs to EVM
Keep raw output from the gene predictors
Fix zff_to_gff script
@ljyanesm ljyanesm merged commit 46fc9c8 into main Nov 30, 2021
@ljyanesm ljyanesm deleted the prediction_module branch November 30, 2021 16:34