Disentangling failed: 'Unable to generate result with single copy vertex percentage < 50%' #322

Closed
sanhuacat opened this issue Mar 28, 2024 · 20 comments

Comments

@sanhuacat

Hello, Dr. Jin,

I encountered a problem when assembling the chloroplast genome. I have attached my log file and a list of output files.

The dataset is 30x depth of second-generation sequencing data, with about 450 samples. More than 320 samples were successfully assembled, while over 100 samples encountered the same error: "Unable to generate result with single copy vertex percentage < 50%".

What confuses me is that the log file detected the data as Sanger, but I checked all the successful cases and they were also Sanger data. In fact, I used the BGI solution for second-generation genome sequencing. I have previously assembled 5x resequencing data without encountering this issue.

Is this a problem with the data itself? Can it be remedied by adjusting parameters?

Thanks in advance.

Screenshot 2024-03-28 230150

`
GetOrganelle v1.7.7.0

get_organelle_from_reads.py assembles organelle genomes from genome skimming data.
Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
PLATFORM: Linux localhost.localdomain 4.18.0-80.11.2.el8_0.x86_64 #1 SMP Tue Sep 24 11:32:19 UTC 2019 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.7.0; numpy 1.24.3; sympy 1.12; scipy 1.10.1
DEPENDENCIES: Bowtie2 2.4.1; SPAdes 3.13.1; Blast 2.12.0
GETORG_PATH=/home/lou/.GetOrganelle
SEED DB: embplant_pt 0.0.1; embplant_mt 0.0.1
LABEL DB: embplant_pt 0.0.1; embplant_mt 0.0.1
WORKING DIR: /nfs/Cold_storage/Sbi_data/CAU_SbiReseq/clean_data
/nfs/lou/miniconda3/envs/getorganelle/bin/get_organelle_from_reads.py -1 GM008.final.R1.fq.gz -2 GM008.final.R2.fq.gz -o /nfs/lou/cpg30/GM008 -R 15 -k 21,45,65,85,105 -F embplant_pt -t 10

2024-03-26 15:22:08,775 - INFO: Pre-reading fastq ...
2024-03-26 15:22:08,776 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf --max-reads inf')
2024-03-26 15:22:08,930 - INFO: Tasting 100000+100000 reads ...
2024-03-26 15:22:24,045 - INFO: Estimating reads to use finished.
2024-03-26 15:22:24,046 - INFO: Unzipping reads file: GM008.final.R1.fq.gz (13660406230 bytes)
2024-03-26 15:22:42,849 - INFO: Unzipping reads file: GM008.final.R2.fq.gz (13550063857 bytes)
2024-03-26 15:23:01,662 - INFO: Counting read qualities ...
2024-03-26 15:23:02,198 - INFO: Identified quality encoding format = Sanger
2024-03-26 15:23:02,198 - INFO: Phred offset = 33
2024-03-26 15:23:02,199 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2024-03-26 15:23:02,284 - INFO: Mean error rate = 0.0038
2024-03-26 15:23:02,288 - INFO: Counting read lengths ...
2024-03-26 15:23:23,881 - INFO: Mean = 139.2 bp, maximum = 150 bp.
2024-03-26 15:23:23,881 - INFO: Reads used = 3949578+3949578
2024-03-26 15:23:23,881 - INFO: Pre-reading fastq finished.

2024-03-26 15:23:23,881 - INFO: Making seed reads ...
2024-03-26 15:23:23,883 - INFO: Seed bowtie2 index existed!
2024-03-26 15:23:23,883 - INFO: Mapping reads to seed bowtie2 index ...
2024-03-26 15:24:05,025 - INFO: Mapping finished.
2024-03-26 15:24:05,100 - INFO: Seed reads made: /nfs/lou/cpg30/GM008/seed/embplant_pt.initial.fq (248653346 bytes)
2024-03-26 15:24:05,137 - INFO: Making seed reads finished.

2024-03-26 15:24:05,137 - INFO: Checking seed reads and parameters ...
2024-03-26 15:24:05,137 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2024-03-26 15:24:05,137 - INFO: If the result graph is not a circular organelle genome,
2024-03-26 15:24:05,138 - INFO: you could adjust the value(s) of '-w'/'-R' for another new run.
2024-03-26 15:24:44,702 - INFO: Pre-assembling mapped reads ...
2024-03-26 15:25:37,324 - INFO: Pre-assembling mapped reads finished.
2024-03-26 15:25:37,325 - INFO: Estimated embplant_pt-hitting base-coverage = 784.83
2024-03-26 15:25:37,858 - INFO: Reads reduced to = 2516200+2516200
2024-03-26 15:25:37,858 - INFO: Adjusting expected embplant_pt base coverage to 500.00
2024-03-26 15:25:37,859 - INFO: Estimated word size(s): 104
2024-03-26 15:25:37,859 - INFO: Setting '-w 104'
2024-03-26 15:25:37,859 - INFO: Setting '--max-extending-len inf'
2024-03-26 15:25:38,853 - INFO: Checking seed reads and parameters finished.

2024-03-26 15:25:38,854 - INFO: Making read index ...
2024-03-26 15:25:51,459 - INFO: For /nfs/lou/cpg30/GM008/1-GM008.final.R1.fq.gz.fastq, only top 2516200 reads are used in downstream analysis.
2024-03-26 15:26:03,817 - INFO: For /nfs/lou/cpg30/GM008/2-GM008.final.R2.fq.gz.fastq, only top 2516200 reads are used in downstream analysis.
2024-03-26 15:26:14,095 - INFO: 3830292 candidates in all 5032400 reads
2024-03-26 15:26:14,096 - INFO: Pre-grouping reads ...
2024-03-26 15:26:14,096 - INFO: Setting '--pre-w 104'
2024-03-26 15:26:14,391 - INFO: 200000/842175 used/duplicated
2024-03-26 15:26:22,998 - INFO: 5869 groups made.
2024-03-26 15:26:23,356 - INFO: Making read index finished.

2024-03-26 15:26:23,356 - INFO: Extending ...
2024-03-26 15:26:23,356 - INFO: Adding initial words ...
2024-03-26 15:26:40,518 - INFO: AW 8917240
2024-03-26 15:27:03,133 - INFO: Round 1: 3830292/3830292 AI 262316 AW 9079382
2024-03-26 15:27:19,835 - INFO: Round 2: 3830292/3830292 AI 263791 AW 9100240
2024-03-26 15:27:36,894 - INFO: Round 3: 3830292/3830292 AI 264604 AW 9111448
2024-03-26 15:27:53,810 - INFO: Round 4: 3830292/3830292 AI 265305 AW 9119778
2024-03-26 15:28:10,535 - INFO: Round 5: 3830292/3830292 AI 265807 AW 9125648
2024-03-26 15:28:27,172 - INFO: Round 6: 3830292/3830292 AI 266247 AW 9131894
2024-03-26 15:28:44,791 - INFO: Round 7: 3830292/3830292 AI 266781 AW 9138324
2024-03-26 15:29:01,295 - INFO: Round 8: 3830292/3830292 AI 267329 AW 9144342
2024-03-26 15:29:17,730 - INFO: Round 9: 3830292/3830292 AI 267920 AW 9150174
2024-03-26 15:29:34,383 - INFO: Round 10: 3830292/3830292 AI 268324 AW 9154904
2024-03-26 15:29:51,522 - INFO: Round 11: 3830292/3830292 AI 268739 AW 9159568
2024-03-26 15:30:08,871 - INFO: Round 12: 3830292/3830292 AI 269036 AW 9163302
2024-03-26 15:30:26,308 - INFO: Round 13: 3830292/3830292 AI 269290 AW 9166078
2024-03-26 15:30:43,654 - INFO: Round 14: 3830292/3830292 AI 269569 AW 9169796
2024-03-26 15:31:00,722 - INFO: Round 15: 3830292/3830292 AI 269896 AW 9173578
2024-03-26 15:31:00,722 - INFO: Hit the round limit 15 and terminated ...
2024-03-26 15:31:08,666 - INFO: Extending finished.

2024-03-26 15:31:08,840 - INFO: Separating extended fastq file ...
2024-03-26 15:31:12,127 - INFO: Setting '-k 21,45,65,85,105'
2024-03-26 15:31:12,127 - INFO: Assembling using SPAdes ...
2024-03-26 15:31:12,180 - INFO: spades.py -t 10 --phred-offset 33 -1 /nfs/lou/cpg30/GM008/extended_1_paired.fq -2 /nfs/lou/cpg30/GM008/extended_2_paired.fq --s1 /nfs/lou/cpg30/GM008/extended_1_unpaired.fq --s2 /nfs/lou/cpg30/GM008/extended_2_unpaired.fq -k 21,45,65,85,105 -o /nfs/lou/cpg30/GM008/extended_spades
2024-03-26 15:32:28,897 - INFO: Insert size = 159.306, deviation = 36.3284, left quantile = 119, right quantile = 207
2024-03-26 15:32:28,900 - INFO: Assembling finished.

2024-03-26 15:32:38,969 - INFO: Slimming /nfs/lou/cpg30/GM008/extended_spades/K105/assembly_graph.fastg finished!
2024-03-26 15:32:38,970 - INFO: Slimming assembly graphs finished.

2024-03-26 15:32:38,971 - INFO: Extracting embplant_pt from the assemblies ...
2024-03-26 15:32:38,975 - INFO: Disentangling /nfs/lou/cpg30/GM008/extended_spades/K105/assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg as a circular genome ...
2024-03-26 15:32:39,148 - INFO: Disentangling failed: 'Unable to generate result with single copy vertex percentage < 50%'
2024-03-26 15:32:39,148 - INFO: Scaffolding disconnected contigs using SPAdes scaffolds ...
2024-03-26 15:32:39,148 - WARNING: Assembly based on scaffolding may not be as accurate as the ones directly exported from the assembly graph.
2024-03-26 15:32:39,148 - INFO: Disentangling /nfs/lou/cpg30/GM008/extended_spades/K105/assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg as a circular genome ...
2024-03-26 15:32:39,156 - INFO: Disentangling failed: 'No new connections.'
2024-03-26 15:32:39,157 - INFO: Disentangling /nfs/lou/cpg30/GM008/extended_spades/K105/assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg as a/an embplant_pt-insufficient graph ...
2024-03-26 15:32:39,225 - INFO: Disentangling failed: 'Unable to generate result with single copy vertex percentage < 50%'
2024-03-26 15:32:39,225 - INFO: Please ...
2024-03-26 15:32:39,225 - INFO: load the graph file 'assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg,assembly_graph.fastg' in K105
2024-03-26 15:32:39,225 - INFO: load the CSV file 'assembly_graph.fastg.extend-embplant_pt-embplant_mt.csv' in K105
2024-03-26 15:32:39,225 - INFO: visualize and export your result in Bandage.
2024-03-26 15:32:39,225 - INFO: If you have questions for us, please provide us with the get_org.log.txt file and the post-slimming graph in the format you like!
2024-03-26 15:32:39,226 - INFO: Extracting embplant_pt from the assemblies failed.

Total cost 636.20 s
Thank you!
`
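A side note on the log above: the "Reads reduced to" value looks consistent with scaling the read pairs by the ratio of target coverage to estimated coverage. The sketch below only reproduces that arithmetic from the logged numbers. The function name is hypothetical, and the actual down-sampling logic inside GetOrganelle may differ.

```python
def reduced_read_pairs(read_pairs, estimated_cov, target_cov):
    """Scale the read-pair count so coverage drops from estimated_cov to target_cov.

    Assumption: plain proportional down-sampling, inferred from the log only.
    """
    if estimated_cov <= target_cov:
        return read_pairs  # coverage is already at or below the target
    return round(read_pairs * target_cov / estimated_cov)


# Values copied from the GM008 log: 3949578 read pairs, 784.83x estimated, 500x target.
gm008 = reduced_read_pairs(3949578, 784.83, 500.0)
print(gm008)
```

For GM008 this lands within a few read pairs of the logged 2516200+2516200; the small difference is expected because the logged coverage is rounded to two decimals.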

@SowmyaPulapet

Hi,

I am facing the same issue. I have 120 GB of plant sequencing data from which I want to assemble the chloroplast and mitochondrial genomes. While assembling the chloroplast genome, I got the same error:

Unable to generate result with single copy vertex percentage < 50%

I would highly appreciate any help in solving this issue.

Thank you!

@JianjunJin
Collaborator

@SowmyaPulapet @sanhuacat
Please provide the assembly graph, either as a FASTG file or as a rendered PNG, for troubleshooting.

@SowmyaPulapet

Assembly graph

This is the assembly graph I got. I am confused about why it shows both embplant_pt and embplant_mt. I am assembling only the chloroplast.

The command I used:

get_organelle_from_reads.py -1 test_1.fq.gz -2 test_2.fq.gz -o PS-plastome -F embplant_pt -t 20

Along with this I also got another error:

Disentangling failed: 'No new connections.'

This happened when I reran the command with a smaller word size (-w 75) and a larger --max-reads.

Feel free to let me know if you need any other information.

Thank you!

@SowmyaPulapet

Hi @JianjunJin,

I am running out of time. Could you please help me with this?

@sanhuacat
Author

Dr. Jin,

I reran the program and have provided the information from log.txt and the assembly graph png.

As mentioned previously, the output files are not normal, so I have provided the extended_spades\K105\assembly_graph.fastg instead of the assembly graph in the regular output directory.

The log file indicates that the SPAdes software does not seem to have run successfully. Does this mean that the assembly process was completed but the circular graph resolution failed?

I believe that the data was recognized as Sanger due to the sequencing platform, which should not be the reason for the issues with assembly. I would like to understand the meaning of "Unable to generate result with single copy vertex percentage < 50%" and find a solution for it.

Thanks again.
Screenshot 2024-04-10 154439

GetOrganelle v1.7.7.0

get_organelle_from_reads.py assembles organelle genomes from genome skimming data.
Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
PLATFORM: Linux localhost.localdomain 4.18.0-80.11.2.el8_0.x86_64 #1 SMP Tue Sep 24 11:32:19 UTC 2019 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.7.0; numpy 1.24.3; sympy 1.12; scipy 1.10.1
DEPENDENCIES: Bowtie2 2.4.1; SPAdes 3.13.1; Blast 2.12.0
GETORG_PATH=/home/lou/.GetOrganelle
LABEL DB: embplant_pt 0.0.1; embplant_mt 0.0.1
WORKING DIR: /nfs/Cold_storage/Sbi_data/CAU_SbiReseq/clean_data
/nfs/lou/miniconda3/envs/getorganelle/bin/get_organelle_from_reads.py -1 GM003.final.R1.fq.gz -2 GM003.final.R2.fq.gz -F embplant_pt -o /nfs/lou/cpg30/GM003 -R 15 -t 10 -k 21,45,65,85,105 -s /nfs/lou/Sorghum_bicolor_cp.fasta

2024-04-03 16:28:36,368 - INFO: Pre-reading fastq ...
2024-04-03 16:28:36,398 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf --max-reads inf')
2024-04-03 16:28:41,822 - INFO: Tasting 100000+100000 reads ...
2024-04-03 16:36:29,871 - INFO: Estimating reads to use finished.
2024-04-03 16:36:30,056 - INFO: Unzipping reads file: GM003.final.R1.fq.gz (13761484241 bytes)
2024-04-03 16:37:28,923 - INFO: Unzipping reads file: GM003.final.R2.fq.gz (13696338551 bytes)
2024-04-03 16:38:26,289 - INFO: Counting read qualities ...
2024-04-03 16:38:27,521 - INFO: Identified quality encoding format = Sanger
2024-04-03 16:38:27,522 - INFO: Phred offset = 33
2024-04-03 16:38:27,523 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2024-04-03 16:38:27,610 - INFO: Mean error rate = 0.0049
2024-04-03 16:38:27,611 - INFO: Counting read lengths ...
2024-04-03 16:40:21,282 - INFO: Mean = 140.4 bp, maximum = 150 bp.
2024-04-03 16:40:21,313 - INFO: Reads used = 7482673+7482673
2024-04-03 16:40:21,314 - INFO: Pre-reading fastq finished.

2024-04-03 16:40:21,314 - INFO: Making seed reads ...
2024-04-03 16:40:22,373 - INFO: Making seed - bowtie2 index ...
2024-04-03 16:40:31,160 - INFO: Making seed - bowtie2 index finished.
2024-04-03 16:40:31,161 - INFO: Mapping reads to seed bowtie2 index ...
2024-04-03 16:41:38,434 - INFO: Mapping finished.
2024-04-03 16:41:38,573 - INFO: Seed reads made: /nfs/lou/cpg30/GM003/seed/embplant_pt.initial.fq (357841720 bytes)
2024-04-03 16:41:38,632 - INFO: Making seed reads finished.

2024-04-03 16:41:38,632 - INFO: Checking seed reads and parameters ...
2024-04-03 16:41:38,633 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2024-04-03 16:41:38,633 - INFO: If the result graph is not a circular organelle genome,
2024-04-03 16:41:38,633 - INFO: you could adjust the value(s) of '-w'/'-R' for another new run.
2024-04-03 16:42:50,385 - INFO: Pre-assembling mapped reads ...
2024-04-03 16:50:25,529 - INFO: Pre-assembling mapped reads finished.
2024-04-03 16:50:25,560 - INFO: Estimated embplant_pt-hitting base-coverage = 958.33
2024-04-03 16:50:51,432 - INFO: Reads reduced to = 3904016+3904016
2024-04-03 16:50:51,432 - INFO: Adjusting expected embplant_pt base coverage to 500.00
2024-04-03 16:50:51,433 - INFO: Estimated word size(s): 105
2024-04-03 16:50:51,433 - INFO: Setting '-w 105'
2024-04-03 16:50:51,433 - INFO: Setting '--max-extending-len inf'
2024-04-03 16:50:53,103 - INFO: Checking seed reads and parameters finished.

2024-04-03 16:50:53,103 - INFO: Making read index ...
2024-04-03 16:51:10,926 - INFO: For /nfs/lou/cpg30/GM003/1-GM003.final.R1.fq.gz.fastq, only top 3904016 reads are used in downstream analysis.
2024-04-03 16:52:15,306 - INFO: For /nfs/lou/cpg30/GM003/2-GM003.final.R2.fq.gz.fastq, only top 3904016 reads are used in downstream analysis.
2024-04-03 16:52:39,843 - INFO: 6005450 candidates in all 7808032 reads
2024-04-03 16:52:39,878 - INFO: Pre-grouping reads ...
2024-04-03 16:52:39,879 - INFO: Setting '--pre-w 105'
2024-04-03 16:52:40,392 - INFO: 200000/1246567 used/duplicated
2024-04-03 16:52:51,162 - INFO: 5757 groups made.
2024-04-03 16:52:54,792 - INFO: Making read index finished.

2024-04-03 16:52:54,793 - INFO: Extending ...
2024-04-03 16:52:54,793 - INFO: Adding initial words ...
2024-04-03 16:53:23,774 - INFO: AW 11410352
2024-04-03 16:54:12,897 - INFO: Round 1: 6005450/6005450 AI 271161 AW 11476374
2024-04-03 16:54:44,603 - INFO: Round 2: 6005450/6005450 AI 271981 AW 11489456
2024-04-03 16:55:41,275 - INFO: Round 3: 6005450/6005450 AI 272615 AW 11498544
2024-04-03 16:56:15,546 - INFO: Round 4: 6005450/6005450 AI 273294 AW 11507744
2024-04-03 16:56:49,671 - INFO: Round 5: 6005450/6005450 AI 273984 AW 11515272
2024-04-03 16:57:23,559 - INFO: Round 6: 6005450/6005450 AI 274415 AW 11520702
2024-04-03 16:57:58,129 - INFO: Round 7: 6005450/6005450 AI 274861 AW 11525516
2024-04-03 16:58:31,470 - INFO: Round 8: 6005450/6005450 AI 275375 AW 11531414
2024-04-03 16:59:02,604 - INFO: Round 9: 6005450/6005450 AI 275816 AW 11536476
2024-04-03 16:59:35,025 - INFO: Round 10: 6005450/6005450 AI 276242 AW 11541920
2024-04-03 17:00:08,163 - INFO: Round 11: 6005450/6005450 AI 276721 AW 11547248
2024-04-03 17:00:39,730 - INFO: Round 12: 6005450/6005450 AI 277212 AW 11551604
2024-04-03 17:01:11,540 - INFO: Round 13: 6005450/6005450 AI 277522 AW 11555108
2024-04-03 17:01:40,362 - INFO: Round 14: 6005450/6005450 AI 277894 AW 11559594
2024-04-03 17:02:11,451 - INFO: Round 15: 6005450/6005450 AI 278322 AW 11564216
2024-04-03 17:02:11,452 - INFO: Hit the round limit 15 and terminated ...
2024-04-03 17:02:28,882 - INFO: Extending finished.

2024-04-03 17:02:29,446 - INFO: Separating extended fastq file ...
2024-04-03 17:02:38,535 - INFO: Setting '-k 21,45,65,85,105'
2024-04-03 17:02:38,535 - INFO: Assembling using SPAdes ...
2024-04-03 17:02:39,179 - INFO: spades.py -t 10 --phred-offset 33 -1 /nfs/lou/cpg30/GM003/extended_1_paired.fq -2 /nfs/lou/cpg30/GM003/extended_2_paired.fq --s1 /nfs/lou/cpg30/GM003/extended_1_unpaired.fq --s2 /nfs/lou/cpg30/GM003/extended_2_unpaired.fq -k 21,45,65,85,105 -o /nfs/lou/cpg30/GM003/extended_spades
2024-04-03 17:28:50,851 - INFO: Insert size = 162.362, deviation = 36.9732, left quantile = 121, right quantile = 209
2024-04-03 17:28:50,852 - INFO: Assembling finished.

2024-04-03 17:30:59,629 - INFO: Slimming /nfs/lou/cpg30/GM003/extended_spades/K105/assembly_graph.fastg finished!
2024-04-03 17:30:59,630 - INFO: Slimming assembly graphs finished.

2024-04-03 17:30:59,631 - INFO: Extracting embplant_pt from the assemblies ...
2024-04-03 17:30:59,635 - INFO: Disentangling /nfs/lou/cpg30/GM003/extended_spades/K105/assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg as a circular genome ...
2024-04-03 17:30:59,778 - INFO: Disentangling failed: 'Unable to generate result with single copy vertex percentage < 50%'
2024-04-03 17:30:59,779 - INFO: Scaffolding disconnected contigs using SPAdes scaffolds ...
2024-04-03 17:30:59,779 - WARNING: Assembly based on scaffolding may not be as accurate as the ones directly exported from the assembly graph.
2024-04-03 17:30:59,779 - INFO: Disentangling /nfs/lou/cpg30/GM003/extended_spades/K105/assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg as a circular genome ...
2024-04-03 17:30:59,787 - INFO: Disentangling failed: 'No new connections.'
2024-04-03 17:30:59,787 - INFO: Disentangling /nfs/lou/cpg30/GM003/extended_spades/K105/assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg as a/an embplant_pt-insufficient graph ...
2024-04-03 17:30:59,837 - INFO: Disentangling failed: 'Unable to generate result with single copy vertex percentage < 50%'
2024-04-03 17:30:59,837 - INFO: Please ...
2024-04-03 17:30:59,837 - INFO: load the graph file 'assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg,assembly_graph.fastg' in K105
2024-04-03 17:30:59,837 - INFO: load the CSV file 'assembly_graph.fastg.extend-embplant_pt-embplant_mt.csv' in K105
2024-04-03 17:30:59,837 - INFO: visualize and export your result in Bandage.
2024-04-03 17:30:59,837 - INFO: If you have questions for us, please provide us with the get_org.log.txt file and the post-slimming graph in the format you like!
2024-04-03 17:30:59,837 - INFO: Extracting embplant_pt from the assemblies failed.

Total cost 3840.43 s
Thank you!

@JianjunJin
Collaborator

@SowmyaPulapet FYI, your graph is not organelle-sufficient, because a few high-depth embplant_pt contigs have dead ends (they are terminal contigs). Try to solve this issue first.
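To check for terminal contigs without opening Bandage, one option is to scan the FASTG headers for records that list no successor. This is a minimal sketch, assuming the SPAdes-style header layout (`>EDGE_<id>_length_<n>_cov_<depth>:<successors>;`, with a trailing `'` marking the reverse strand). The function name and the toy graph are hypothetical, and this is not how GetOrganelle itself inspects the graph.

```python
def find_dead_ends(fastg_text):
    """Return the header names of edge records that have no successor edges."""
    successors = {}
    for line in fastg_text.splitlines():
        if not line.startswith(">"):
            continue  # skip sequence lines
        header = line[1:].strip().rstrip(";")
        name, _, targets = header.partition(":")
        successors[name] = [t for t in targets.split(",") if t]
    # No successors on the forward record: the contig's tail is a dead end.
    # No successors on the reverse record (name ends with '): its head is.
    return {name for name, nxt in successors.items() if not nxt}


# Hypothetical two-contig graph: EDGE_1 leads into EDGE_2 and nothing closes the loop.
SAMPLE_FASTG = """\
>EDGE_1_length_100_cov_500.0:EDGE_2_length_80_cov_480.0;
ACGT
>EDGE_1_length_100_cov_500.0';
ACGT
>EDGE_2_length_80_cov_480.0;
ACGT
>EDGE_2_length_80_cov_480.0':EDGE_1_length_100_cov_500.0';
ACGT
"""

dead_ends = find_dead_ends(SAMPLE_FASTG)
print(sorted(dead_ends))
```

The depth is part of each returned name, so high-depth dead ends are easy to pick out by eye.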

@SowmyaPulapet

Hi @JianjunJin ,

Could you please give some input on how that can be achieved? Is the error "No new connections" also due to the same issue?

@JianjunJin
Collaborator

@sanhuacat

  • Sanger is the quality encoding format (see https://en.wikipedia.org/wiki/FASTQ_format), not the sequencing technology.
  • SPAdes ran fine.
  • To some extent, this is a good result: an organelle-sufficient and relatively clean graph. It is slightly complex because the LSC shares a small repeat with the IR, but GetOrganelle usually handles that well. I don't know why GetOrganelle failed to recognize the multiplicities correctly in this simple case (which triggered the <50% issue). It is probably due to uneven coverage, which I cannot check because depth is not displayed in the image you provided.
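To illustrate the first point: the Phred offset can usually be guessed from the range of quality characters alone, regardless of the sequencing platform. The sketch below is a common heuristic, not GetOrganelle's actual detection code. The function name and the cutoffs (59 and 74) are assumptions based on the ranges listed in the linked FASTQ format article.

```python
def guess_phred_offset(quality_lines):
    """Guess the Phred offset (33 or 64) from FASTQ quality strings.

    Returns None when the observed characters fit both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:
        return 33  # characters below ';' only occur with offset 33 ("Sanger")
    if max(codes) > 74:
        return 64  # characters above 'J' are typical of the older offset-64 encodings
    return None  # ambiguous range


print(guess_phred_offset(["IIIIFFFF!5#"]))
print(guess_phred_offset(["hhhhffffdddd"]))
```

In the logs above, the trimmed quality character is `!` (ASCII 33), which is only possible with offset 33, so the "Sanger" label is what one would expect for current short-read data.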

You may either 1) curate the graph manually and then use get_organelle_from_assembly.py to extract the plastome (pt) automatically from the curated graph, or 2) try GetOrganelle v1.8.0, which uses an updated disentangling module but has not been formally released yet. You may send the FASTG file to me if you want me to test it.
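For option 1, the call could look roughly like the command assembled below. This is a hedged sketch: the `-F`, `-g` and `-o` flags are written from memory of the GetOrganelle README and should be confirmed with `get_organelle_from_assembly.py -h`. The file and directory names are hypothetical placeholders.

```python
import shlex


def from_assembly_command(curated_graph, out_dir, organelle="embplant_pt"):
    """Build (but do not run) a get_organelle_from_assembly.py call."""
    return [
        "get_organelle_from_assembly.py",
        "-F", organelle,      # organelle type, same value as in the original run
        "-g", curated_graph,  # graph exported from Bandage after manual curation
        "-o", out_dir,        # assumed output-directory flag
    ]


# Hypothetical file names, for illustration only.
cmd = from_assembly_command("GM003_curated.gfa", "GM003_from_assembly")
print(shlex.join(cmd))
```

Building the command as a list keeps it easy to pass to `subprocess.run` later without shell quoting problems.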

@JianjunJin
Collaborator

@SowmyaPulapet
Please see https://github.com/Kinggerm/GetOrganelle/wiki/FAQ#what-should-i-do-with-incomplete-resultbroken-assembly-graph for fine-tuning.

The "No new connections" message was printed because GetOrganelle tried to fix the terminal contigs but failed. It stems from the same problem that leaves the graph organelle-insufficient.

@SowmyaPulapet

Hi @JianjunJin ,

I am already aware of this Wiki section, and I appreciate how informative and detailed the Wiki for this tool is.

Among the suggested solutions, I have already tried the following:

  1. Reduced the word size from 89 to 75
  2. Increased the input reads with these options: --reduce-reads-for-coverage or --max-reads
  3. Increased the number of rounds

In all of those runs, I got organelle-insufficient graphs with the errors mentioned above. I will try a run with a related genome as the seed, but I am not sure whether that is also recommended for chloroplast genomes.

Please let me know what you would suggest, given that I have already made all of the above modifications.

Thank you

@JianjunJin
Collaborator

@SowmyaPulapet
It's not clear enough from this graph (depth is not displayed), but if you set the depth in Bandage, you would likely be able to get rid of the real embplant_mt contigs. Although the SSC region is not clear, my intuition is that there is only one gap, in the LSC.

I haven't seen your complete log. However, further reducing the word size and/or using a related genome as the seed may help.
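The reruns suggested here can be generated systematically. The sketch below only assembles command lines from flags that already appear in this thread (`-w`, `-R`, `-s`, `--max-reads`, `--reduce-reads-for-coverage`). The helper name, the seed file name, and the particular word sizes tried are hypothetical.

```python
import shlex


def rerun_command(r1, r2, out_dir, word_size=None, rounds=None,
                  seed=None, use_all_reads=False, threads=20):
    """Build (but do not run) one get_organelle_from_reads.py variant."""
    cmd = ["get_organelle_from_reads.py", "-1", r1, "-2", r2,
           "-o", out_dir, "-F", "embplant_pt", "-t", str(threads)]
    if word_size is not None:
        cmd += ["-w", str(word_size)]  # smaller word size, as suggested above
    if rounds is not None:
        cmd += ["-R", str(rounds)]     # more extension rounds
    if seed is not None:
        cmd += ["-s", seed]            # related plastome used as the seed
    if use_all_reads:
        # The "inf" values are taken from the hint printed in the run log.
        cmd += ["--reduce-reads-for-coverage", "inf", "--max-reads", "inf"]
    return cmd


# Hypothetical sweep: three word sizes, each with a related genome as the seed.
variants = [
    rerun_command("test_1.fq.gz", "test_2.fq.gz", f"PS-plastome-w{w}",
                  word_size=w, seed="related_plastome.fasta")
    for w in (75, 65, 55)
]
for cmd in variants:
    print(shlex.join(cmd))
```

Writing each variant to its own output directory keeps the resulting graphs side by side for comparison in Bandage.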

@SowmyaPulapet

SowmyaPulapet commented Apr 10, 2024 via email

@sanhuacat
Author

@JianjunJin

Thank you for your answer. I tried to solve it manually. Although I still don't understand why the error occurred, I eventually got a usable genome.
Looking forward to new updates!

@JianjunJin
Collaborator

@SowmyaPulapet @sanhuacat
Please note that "Disentangling failed" is not an error indicating abnormal execution, but rather an expected outcome in many runs. It is analogous to obtaining low support or unusual results in a statistical estimation, which can occur because of limitations in the data or because the model or algorithm is an imperfect fit for the given problem.
The log message probably sounds more alarming than it should.
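For large batches such as the roughly 450 samples mentioned at the top of this thread, it can help to tally these messages per sample rather than treating each one as a crash. The sketch below is an assumption-laden example: it matches only the two message shapes visible in the logs posted here, and the function and key names are hypothetical.

```python
import re
from collections import Counter

# Patterns taken from the log lines posted in this thread.
FAILED = re.compile(r"Disentangling failed: '([^']*)'")
GAVE_UP = re.compile(r"Extracting \S+ from the assemblies failed\.")


def summarize_log(log_text):
    """Count disentangling failure reasons and flag runs that need manual curation."""
    reasons = Counter(FAILED.findall(log_text))
    return {
        "needs_manual_curation": bool(GAVE_UP.search(log_text)),
        "reasons": dict(reasons),
    }


SAMPLE_LOG = """\
2024-03-26 15:32:39,148 - INFO: Disentangling failed: 'Unable to generate result with single copy vertex percentage < 50%'
2024-03-26 15:32:39,156 - INFO: Disentangling failed: 'No new connections.'
2024-03-26 15:32:39,225 - INFO: Disentangling failed: 'Unable to generate result with single copy vertex percentage < 50%'
2024-03-26 15:32:39,226 - INFO: Extracting embplant_pt from the assemblies failed.
"""

summary = summarize_log(SAMPLE_LOG)
print(summary)
```

Running this over every `get_org.log.txt` in a batch gives a quick list of samples whose graphs should be opened in Bandage.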

@SowmyaPulapet

@JianjunJin

As suggested, I did a rerun with a closely related species as the seed. This is the command I used:

~/Tools/GetOrganelle/get_organelle_from_reads.py -1 ../Trimming/test_R1_val_1.fq.gz -2 ../Trimming/test_R2_val_2.fq.gz -o plastome -F embplant_pt -t 20 -s ../Reference/CP_Genome.fasta -w 65

Unfortunately, this time no .fastg graph was generated at all. I am attaching the log file here.

Please have a look and provide your suggestions.

Thank you.
get_org.log.txt

@JianjunJin
Collaborator

@SowmyaPulapet
According to your log file, the graph is available at plastome/extended_spades/K115/assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg.
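When the final graph is not copied to the top of the output directory, the slimmed graphs can be located under `extended_spades/K*/`. This is a small sketch based on the directory layout seen in the logs in this thread. The helper name is hypothetical, and the temporary directory only simulates a finished run.

```python
import re
import tempfile
from pathlib import Path


def find_slimmed_graphs(out_dir):
    """Return slimmed graph files under out_dir, largest k-mer size first."""
    hits = []
    pattern = "K*/assembly_graph.fastg.extend-*.fastg"
    for path in (Path(out_dir) / "extended_spades").glob(pattern):
        match = re.fullmatch(r"K(\d+)", path.parent.name)
        if match:
            hits.append((int(match.group(1)), path))
    hits.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in hits]


# Simulate an output directory containing two k-mer folders.
with tempfile.TemporaryDirectory() as tmp:
    for k in (85, 105):
        k_dir = Path(tmp) / "extended_spades" / f"K{k}"
        k_dir.mkdir(parents=True)
        (k_dir / "assembly_graph.fastg.extend-embplant_pt-embplant_mt.fastg").touch()
    found = [p.parent.name for p in find_slimmed_graphs(tmp)]

print(found)
```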

@SowmyaPulapet

@JianjunJin

graph

This is the graph I got from the path. What can I do further?

@SowmyaPulapet

@JianjunJin Hi, any input from your side?

@JianjunJin
Collaborator

@SowmyaPulapet I can't tell for certain, but the graph is likely organelle-sufficient now. Try loading the CSV (with BLAST info) and manually curating the graph in Bandage, e.g. remove the contigs with shallow depth and see what remains.
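Before curating in Bandage, the shallow contigs can be listed directly from the FASTG headers. This is a rough sketch, assuming SPAdes-style headers (`EDGE_<id>_length_<n>_cov_<depth>`). The function name, the toy graph, and the depth cutoff are hypothetical, and a sensible cutoff depends on the depth of the plastid contigs in your own graph.

```python
import re

HEADER = re.compile(
    r">EDGE_(?P<id>[^_]+)_length_(?P<length>\d+)_cov_(?P<cov>\d+(?:\.\d+)?)(?P<rev>')?"
)


def shallow_contigs(fastg_text, min_depth):
    """List (id, length, depth) for contigs whose depth is below min_depth."""
    found = []
    for line in fastg_text.splitlines():
        match = HEADER.match(line)
        if match is None or match.group("rev"):
            continue  # skip sequence lines and reverse-strand duplicates
        depth = float(match.group("cov"))
        if depth < min_depth:
            found.append((match.group("id"), int(match.group("length")), depth))
    return sorted(found, key=lambda item: item[2])


# Hypothetical graph: two deep plastid-like contigs and one shallow contig.
SAMPLE_GRAPH = """\
>EDGE_1_length_86000_cov_480.5:EDGE_2_length_26000_cov_950.2;
ACGT
>EDGE_2_length_26000_cov_950.2;
ACGT
>EDGE_3_length_1200_cov_12.3;
ACGT
>EDGE_3_length_1200_cov_12.3';
ACGT
"""

to_remove = shallow_contigs(SAMPLE_GRAPH, min_depth=50.0)
print(to_remove)
```

The listed IDs can then be selected by name in Bandage and deleted before exporting the curated graph.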

@SowmyaPulapet

Yes, I figured it out and got the complete genome.

Thanks for your input, @JianjunJin!


3 participants