Disentangling failed: 'Unable to generate result with single copy vertex percentage < 50%' #322
Comments
Hi, I am also facing the same issue stated above. I have 120 GB of plant sequencing data from which I want to do Chloroplast and Mitochondria assemblies. While assembling chloroplast from the data I got the same error:
I would highly appreciate some help in solving this issue. Thank you! |
@SowmyaPulapet @sanhuacat |
This is the assembly graph I got. I am confused about why it shows both embplant_pt and embplant_mt. I am assembling only the chloroplast. The command I used:
Along with this I also got another error:
This happened when I reran the command with a smaller word size ( Feel free to let me know if you need any other information. Thank you! |
Hi @JianjunJin, I am running out of time. Could you please help me with this? |
Dr. Jin, I reran the program and have provided the information from log.txt and the assembly graph png. As mentioned previously, the output files are not normal, so I have provided the extended_spades\K105\assembly_graph.fastg instead of the assembly graph in the regular output directory. The log file indicates that the SPAdes software does not seem to have run successfully. Does this mean that the assembly process was completed but the circular graph resolution failed? I believe that the data was recognized as Sanger due to the sequencing platform, which should not be the reason for the issues with assembly. I would like to understand the meaning of "Unable to generate result with single copy vertex percentage < 50%" and find a solution for it.
GetOrganelle v1.7.7.0
get_organelle_from_reads.py assembles organelle genomes from genome skimming data.
Python 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
2024-04-03 16:28:36,368 - INFO: Pre-reading fastq ...
2024-04-03 16:40:21,314 - INFO: Making seed reads ...
2024-04-03 16:41:38,632 - INFO: Checking seed reads and parameters ...
2024-04-03 16:50:53,103 - INFO: Making read index ...
2024-04-03 16:52:54,793 - INFO: Extending ...
2024-04-03 17:02:29,446 - INFO: Separating extended fastq file ...
2024-04-03 17:30:59,629 - INFO: Slimming /nfs/lou/cpg30/GM003/extended_spades/K105/assembly_graph.fastg finished!
2024-04-03 17:30:59,631 - INFO: Extracting embplant_pt from the assemblies ...
Total cost 3840.43 s |
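On the "Sanger" question: in FASTQ terminology, "Sanger" usually names the quality-score encoding (Phred offset 33), not the sequencing instrument, and the full log in this thread reports "Identified quality encoding format = Sanger" together with "Phred offset = 33". A minimal sketch of how such an offset can be guessed from quality strings is below. It is a common heuristic, not GetOrganelle's own code, and `guess_phred_offset` is a hypothetical name.

```python
def guess_phred_offset(quality_lines):
    """Guess the Phred offset (33 or 64) from FASTQ quality strings.

    Heuristic only (not GetOrganelle's implementation):
    - characters below ';' (ASCII 59) can only occur with offset 33,
      because offset-64 style encodings start at ASCII 59 or higher;
    - characters above 'J' (ASCII 74) are not expected with offset 33
      for typical short-read data, so they suggest offset 64;
    - anything else is ambiguous and returns None.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None


# Quality strings as they might appear on every 4th line of a FASTQ file.
print(guess_phred_offset(["IIIIFFFF#!", "FFFFIIII"]))
```

Under this reading, short-read data from current Illumina or BGI platforms would normally be labelled "Sanger" as well, which would fit the observation that successful and failed samples get the same label.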
FYI, the graph is not organelle-sufficient, because a few high-depth embplant_pt contigs have dead ends (i.e. they are terminal contigs). Try to solve this issue first. |
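Dead ends of this kind can also be listed without opening Bandage. The sketch below assumes the SPAdes-style FASTG header convention, in which each header reads `>NAME:SUCCESSOR,SUCCESSOR;`, a trailing `'` marks the reverse-complement orientation, and the depth is embedded in the name as `_cov_<number>`. That format and the function names are assumptions for illustration, so check them against your own `assembly_graph.fastg` first.

```python
import re

# Assumed header shape: ">NAME;" or ">NAME:NEXT1,NEXT2;"
HEADER = re.compile(r"^>(?P<name>[^:;]+)(?::(?P<next>[^;]*))?;\s*$")
COV = re.compile(r"_cov_([0-9]+(?:\.[0-9]+)?)")


def find_dead_ends(fastg_lines):
    """Map contig name -> set of open ends ("head" and/or "tail")."""
    successors = {}
    for line in fastg_lines:
        match = HEADER.match(line)
        if match is None:
            continue  # sequence line, not a header
        nexts = match.group("next") or ""
        successors[match.group("name")] = [n for n in nexts.split(",") if n]

    dead_ends = {}
    for oriented_name, nexts in successors.items():
        if nexts:
            continue
        # No successor in forward orientation: nothing follows the tail.
        # No successor in reverse orientation ('): nothing precedes the head.
        end = "head" if oriented_name.endswith("'") else "tail"
        dead_ends.setdefault(oriented_name.rstrip("'"), set()).add(end)
    return dead_ends


def contig_depth(name):
    """Read the depth from the contig name, or None if it is absent."""
    match = COV.search(name)
    return float(match.group(1)) if match else None
```

Sorting the result by `contig_depth` would put the high-depth terminal contigs, the ones described above as the problem, at the top of the list.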
Hi @JianjunJin , Could you please give some input on how that can be achieved? Is the error "No new connections" also due to the same issue? |
You may either 1) do manual curation using |
@SowmyaPulapet The "No new connections" message was printed because GetOrganelle tried to fix the terminal contigs but failed. It stems from the same problem that leaves the graph organelle-insufficient. |
Hi @JianjunJin, I am already aware of this Wiki section, and I appreciate how informative and detailed the Wiki for this tool is. Among the suggested solutions, I have already tried the following:
In all those runs, I got organelle-insufficient graphs with the above-mentioned errors. I will try a run with a related genome as the seed, but I am not sure whether that is also recommended for chloroplast genomes. Please let me know what you would suggest, given that I have already made all the above modifications. Thank you |
@SowmyaPulapet I didn't see your complete log; however, further reducing the word size and/or using a related genome as the seed may help. |
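A rerun with a smaller word size could be scripted roughly as follows. This is only a sketch: the flags are copied from the command recorded in the log in this thread (`-1`, `-2`, `-o`, `-R`, `-k`, `-F`, `-t`) plus `-w`, which the same log names as adjustable. The word sizes 85 and 65, the input file names, and the output directory naming are hypothetical placeholders.

```python
import shlex


def build_rerun_command(fq1, fq2, out_dir, word_size, rounds=15,
                        kmers="21,45,65,85,105", threads=10):
    """Build (but do not run) a get_organelle_from_reads.py command line."""
    args = [
        "get_organelle_from_reads.py",
        "-1", fq1, "-2", fq2,
        "-o", out_dir,
        "-R", str(rounds),
        "-k", kmers,
        "-F", "embplant_pt",
        "-t", str(threads),
        "-w", str(word_size),  # explicit word size instead of the estimate
    ]
    return shlex.join(args)


# Hypothetical word sizes below the automatically estimated value.
for w in (85, 65):
    print(build_rerun_command("sample_R1.fq.gz", "sample_R2.fq.gz",
                              f"sample_w{w}", w))
```

Printing the commands instead of executing them keeps the sketch side-effect free, so each one can be reviewed before submission to a cluster.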
Alright, will try and let you know.
…On Wed, Apr 10, 2024 at 7:32 PM JianJun Jin ***@***.***> wrote:
@SowmyaPulapet <https://github.com/SowmyaPulapet>
It's not clear enough through this graph (depth not turned on), but if you
set the depth in Bandage, you would likely get rid of real embplant_mt
contigs. Although the SSC region is not clear, my intuition here is that
there is only one gap in the LSC.
I didn't see your complete log; However, further reducing word size and/or
using related as the seed may help.
|
Thank you for your answer. I tried to solve it manually. Although I didn't understand why the error occurred, I eventually got a usable genome. |
@SowmyaPulapet @sanhuacat |
As suggested I did a rerun with closely related species as seed. This is the command used:
Unfortunately, this time the .fastg graph was not generated at all. I am also attaching the log file here. Please have a look and provide your suggestions. Thank you. |
@SowmyaPulapet |
This is the graph I got from the path. What can I do further? |
@JianjunJin Hi, any input from your side? |
@SowmyaPulapet It's not clear to me, but it is likely organelle-sufficient now. Try to load the CSV (with blast info) and manually curate the graph in Bandage, e.g. remove the contigs with low depth coverage and see what remains. |
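Before curating in Bandage, it may help to list which contigs would fall below a chosen depth. The sketch below assumes SPAdes-style contig names that embed the depth as `_cov_<number>`. The helper name and the threshold are hypothetical, and the cutoff should come from the depths actually seen in your graph.

```python
import re

COV = re.compile(r"_cov_([0-9]+(?:\.[0-9]+)?)")


def split_by_depth(contig_names, min_depth):
    """Split contig names into (kept, low_depth) by their embedded depth."""
    kept, low_depth = [], []
    for name in contig_names:
        match = COV.search(name)
        if match is None:
            kept.append(name)  # unknown depth: leave for manual inspection
        elif float(match.group(1)) >= min_depth:
            kept.append(name)
        else:
            low_depth.append(name)
    return kept, low_depth


names = [
    "EDGE_1_length_1000_cov_300.5",
    "EDGE_7_length_800_cov_12.0",
    "EDGE_9_length_400_cov_295.1",
]
kept, low_depth = split_by_depth(names, min_depth=100)
print(low_depth)
```

The low-depth list is only a set of candidates for removal; whether a contig belongs to the plastome is still decided by looking at the graph and the blast labels.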
Yes, I figured it out and got the complete genome. Thanks for your inputs @JianjunJin ! |
Hello, Dr. Jin,
I encountered a problem when assembling the chloroplast genome. I have attached my log file and a list of output files.
The dataset is 30x depth of second-generation sequencing data, with about 450 samples. More than 320 samples were successfully assembled, while over 100 samples encountered the same error: "Unable to generate result with single copy vertex percentage < 50%".
What confuses me is that the log file detected the data as Sanger, but I checked all the successful cases and they were also Sanger data. In fact, I used the BGI solution for second-generation genome sequencing. I have previously assembled 5x resequencing data without encountering this issue.
Is this a problem with the data itself? Can it be remedied by adjusting parameters?
Thanks in advance.
`
GetOrganelle v1.7.7.0
get_organelle_from_reads.py assembles organelle genomes from genome skimming data.
Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.
Python 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
PLATFORM: Linux localhost.localdomain 4.18.0-80.11.2.el8_0.x86_64 #1 SMP Tue Sep 24 11:32:19 UTC 2019 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.7.0; numpy 1.24.3; sympy 1.12; scipy 1.10.1
DEPENDENCIES: Bowtie2 2.4.1; SPAdes 3.13.1; Blast 2.12.0
GETORG_PATH=/home/lou/.GetOrganelle
SEED DB: embplant_pt 0.0.1; embplant_mt 0.0.1
LABEL DB: embplant_pt 0.0.1; embplant_mt 0.0.1
WORKING DIR: /nfs/Cold_storage/Sbi_data/CAU_SbiReseq/clean_data
/nfs/lou/miniconda3/envs/getorganelle/bin/get_organelle_from_reads.py -1 GM008.final.R1.fq.gz -2 GM008.final.R2.fq.gz -o /nfs/lou/cpg30/GM008 -R 15 -k 21,45,65,85,105 -F embplant_pt -t 10
2024-03-26 15:22:08,775 - INFO: Pre-reading fastq ...
2024-03-26 15:22:08,776 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf --max-reads inf')
2024-03-26 15:22:08,930 - INFO: Tasting 100000+100000 reads ...
2024-03-26 15:22:24,045 - INFO: Estimating reads to use finished.
2024-03-26 15:22:24,046 - INFO: Unzipping reads file: GM008.final.R1.fq.gz (13660406230 bytes)
2024-03-26 15:22:42,849 - INFO: Unzipping reads file: GM008.final.R2.fq.gz (13550063857 bytes)
2024-03-26 15:23:01,662 - INFO: Counting read qualities ...
2024-03-26 15:23:02,198 - INFO: Identified quality encoding format = Sanger
2024-03-26 15:23:02,198 - INFO: Phred offset = 33
2024-03-26 15:23:02,199 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2024-03-26 15:23:02,284 - INFO: Mean error rate = 0.0038
2024-03-26 15:23:02,288 - INFO: Counting read lengths ...
2024-03-26 15:23:23,881 - INFO: Mean = 139.2 bp, maximum = 150 bp.
2024-03-26 15:23:23,881 - INFO: Reads used = 3949578+3949578
2024-03-26 15:23:23,881 - INFO: Pre-reading fastq finished.
2024-03-26 15:23:23,881 - INFO: Making seed reads ...
2024-03-26 15:23:23,883 - INFO: Seed bowtie2 index existed!
2024-03-26 15:23:23,883 - INFO: Mapping reads to seed bowtie2 index ...
2024-03-26 15:24:05,025 - INFO: Mapping finished.
2024-03-26 15:24:05,100 - INFO: Seed reads made: /nfs/lou/cpg30/GM008/seed/embplant_pt.initial.fq (248653346 bytes)
2024-03-26 15:24:05,137 - INFO: Making seed reads finished.
2024-03-26 15:24:05,137 - INFO: Checking seed reads and parameters ...
2024-03-26 15:24:05,137 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2024-03-26 15:24:05,137 - INFO: If the result graph is not a circular organelle genome,
2024-03-26 15:24:05,138 - INFO: you could adjust the value(s) of '-w'/'-R' for another new run.
2024-03-26 15:24:44,702 - INFO: Pre-assembling mapped reads ...
2024-03-26 15:25:37,324 - INFO: Pre-assembling mapped reads finished.
2024-03-26 15:25:37,325 - INFO: Estimated embplant_pt-hitting base-coverage = 784.83
2024-03-26 15:25:37,858 - INFO: Reads reduced to = 2516200+2516200
2024-03-26 15:25:37,858 - INFO: Adjusting expected embplant_pt base coverage to 500.00
2024-03-26 15:25:37,859 - INFO: Estimated word size(s): 104
2024-03-26 15:25:37,859 - INFO: Setting '-w 104'
2024-03-26 15:25:37,859 - INFO: Setting '--max-extending-len inf'
2024-03-26 15:25:38,853 - INFO: Checking seed reads and parameters finished.
2024-03-26 15:25:38,854 - INFO: Making read index ...
2024-03-26 15:25:51,459 - INFO: For /nfs/lou/cpg30/GM008/1-GM008.final.R1.fq.gz.fastq, only top 2516200 reads are used in downstream analysis.
2024-03-26 15:26:03,817 - INFO: For /nfs/lou/cpg30/GM008/2-GM008.final.R2.fq.gz.fastq, only top 2516200 reads are used in downstream analysis.
2024-03-26 15:26:14,095 - INFO: 3830292 candidates in all 5032400 reads
2024-03-26 15:26:14,096 - INFO: Pre-grouping reads ...
2024-03-26 15:26:14,096 - INFO: Setting '--pre-w 104'
2024-03-26 15:26:14,391 - INFO: 200000/842175 used/duplicated
2024-03-26 15:26:22,998 - INFO: 5869 groups made.
2024-03-26 15:26:23,356 - INFO: Making read index finished.
2024-03-26 15:26:23,356 - INFO: Extending ...
2024-03-26 15:26:23,356 - INFO: Adding initial words ...
2024-03-26 15:26:40,518 - INFO: AW 8917240
2024-03-26 15:27:03,133 - INFO: Round 1: 3830292/3830292 AI 262316 AW 9079382
2024-03-26 15:27:19,835 - INFO: Round 2: 3830292/3830292 AI 263791 AW 9100240
2024-03-26 15:27:36,894 - INFO: Round 3: 3830292/3830292 AI 264604 AW 9111448
2024-03-26 15:27:53,810 - INFO: Round 4: 3830292/3830292 AI 265305 AW 9119778
2024-03-26 15:28:10,535 - INFO: Round 5: 3830292/3830292 AI 265807 AW 9125648
2024-03-26 15:28:27,172 - INFO: Round 6: 3830292/3830292 AI 266247 AW 9131894
2024-03-26 15:28:44,791 - INFO: Round 7: 3830292/3830292 AI 266781 AW 9138324
2024-03-26 15:29:01,295 - INFO: Round 8: 3830292/3830292 AI 267329 AW 9144342
2024-03-26 15:29:17,730 - INFO: Round 9: 3830292/3830292 AI 267920 AW 9150174
2024-03-26 15:29:34,383 - INFO: Round 10: 3830292/3830292 AI 268324 AW 9154904
2024-03-26 15:29:51,522 - INFO: Round 11: 3830292/3830292 AI 268739 AW 9159568
2024-03-26 15:30:08,871 - INFO: Round 12: 3830292/3830292 AI 269036 AW 9163302
2024-03-26 15:30:26,308 - INFO: Round 13: 3830292/3830292 AI 269290 AW 9166078
2024-03-26 15:30:43,654 - INFO: Round 14: 3830292/3830292 AI 269569 AW 9169796
2024-03-26 15:31:00,722 - INFO: Round 15: 3830292/3830292 AI 269896 AW 9173578
2024-03-26 15:31:00,722 - INFO: Hit the round limit 15 and terminated ...
2024-03-26 15:31:08,666 - INFO: Extending finished.
2024-03-26 15:31:08,840 - INFO: Separating extended fastq file ...
2024-03-26 15:31:12,127 - INFO: Setting '-k 21,45,65,85,105'
2024-03-26 15:31:12,127 - INFO: Assembling using SPAdes ...
2024-03-26 15:31:12,180 - INFO: spades.py -t 10 --phred-offset 33 -1 /nfs/lou/cpg30/GM008/extended_1_paired.fq -2 /nfs/lou/cpg30/GM008/extended_2_paired.fq --s1 /nfs/lou/cpg30/GM008/extended_1_unpaired.fq --s2 /nfs/lou/cpg30/GM008/extended_2_unpaired.fq -k 21,45,65,85,105 -o /nfs/lou/cpg30/GM008/extended_spades
2024-03-26 15:32:28,897 - INFO: Insert size = 159.306, deviation = 36.3284, left quantile = 119, right quantile = 207
2024-03-26 15:32:28,900 - INFO: Assembling finished.
2024-03-26 15:32:38,969 - INFO: Slimming /nfs/lou/cpg30/GM008/extended_spades/K105/assembly_graph.fastg finished!
2024-03-26 15:32:38,970 - INFO: Slimming assembly graphs finished.
2024-03-26 15:32:38,971 - INFO: Extracting embplant_pt from the assemblies ...
2024-03-26 15:32:38,975 - INFO: Disentangling /nfs/lou/cpg30/GM008/extended_spades/K105/assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg as a circular genome ...
2024-03-26 15:32:39,148 - INFO: Disentangling failed: 'Unable to generate result with single copy vertex percentage < 50%'
2024-03-26 15:32:39,148 - INFO: Scaffolding disconnected contigs using SPAdes scaffolds ...
2024-03-26 15:32:39,148 - WARNING: Assembly based on scaffolding may not be as accurate as the ones directly exported from the assembly graph.
2024-03-26 15:32:39,148 - INFO: Disentangling /nfs/lou/cpg30/GM008/extended_spades/K105/assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg as a circular genome ...
2024-03-26 15:32:39,156 - INFO: Disentangling failed: 'No new connections.'
2024-03-26 15:32:39,157 - INFO: Disentangling /nfs/lou/cpg30/GM008/extended_spades/K105/assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg as a/an embplant_pt-insufficient graph ...
2024-03-26 15:32:39,225 - INFO: Disentangling failed: 'Unable to generate result with single copy vertex percentage < 50%'
2024-03-26 15:32:39,225 - INFO: Please ...
2024-03-26 15:32:39,225 - INFO: load the graph file 'assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg,assembly_graph.fastg' in K105
2024-03-26 15:32:39,225 - INFO: load the CSV file 'assembly_graph.fastg.extend-embplant_pt-embplant_mt.csv' in K105
2024-03-26 15:32:39,225 - INFO: visualize and export your result in Bandage.
2024-03-26 15:32:39,225 - INFO: If you have questions for us, please provide us with the get_org.log.txt file and the post-slimming graph in the format you like!
2024-03-26 15:32:39,226 - INFO: Extracting embplant_pt from the assemblies failed.
Total cost 636.20 s
Thank you!
`
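The read counts in the log above are consistent with simple proportional downsampling: the estimated embplant_pt coverage was 784.83, the expected coverage was adjusted to 500.00, and the reads dropped from 3949578 to 2516200 per file. The sketch below reproduces that arithmetic. It assumes plain proportional scaling with rounding, which matches these numbers but has not been checked against GetOrganelle's source.

```python
def reads_for_target_coverage(reads_used, estimated_coverage, target_coverage):
    """Scale the read count so the expected coverage drops to the target."""
    if estimated_coverage <= target_coverage:
        return reads_used  # already at or below the target, keep everything
    return round(reads_used * target_coverage / estimated_coverage)


# Values taken from the log: 3949578 reads per file, 784.83x estimated, 500x target.
print(reads_for_target_coverage(3949578, 784.83, 500.0))  # 2516200
```

So roughly 64% of the pre-read pairs were carried forward, and the word-size estimate and the extension step both work on that reduced set.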
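With several hundred samples, it can be useful to tabulate which runs failed and why before changing any parameters. The sketch below scans each `get_org.log.txt` for the "Disentangling failed" messages shown in the log. The directory layout (one output directory per sample under a common root, as in `/nfs/lou/cpg30/GM008`) and the function names are assumptions.

```python
import re
from pathlib import Path

FAILED = re.compile(r"Disentangling failed: '([^']*)'")


def summarize_log(log_text):
    """Return whether extraction failed and the distinct failure reasons."""
    return {
        "failed": "from the assemblies failed." in log_text,
        "reasons": sorted(set(FAILED.findall(log_text))),
    }


def summarize_runs(root):
    """Summarize every <root>/<sample>/get_org.log.txt (assumed layout)."""
    return {
        log.parent.name: summarize_log(log.read_text(errors="replace"))
        for log in sorted(Path(root).glob("*/get_org.log.txt"))
    }
```

Grouping samples by failure reason shows whether the 100+ failures share one cause, such as the single-copy-vertex error, or fall into several categories that need different fixes.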