(ERR): "Arabidopsis_simulated.plastome/seed/embplant_pt.index" does not exist or is not a Bowtie 2 index #139

Open
tallnuttrbgv opened this issue Feb 22, 2022 · 11 comments

Comments

@tallnuttrbgv

Hi,

I installed GetOrganelle manually (git clone, etc.), but it fails on the test data while building the Bowtie 2 index.

Thanks,

GetOrganelle v1.7.5.3

get_organelle_from_reads.py assembles organelle genomes from genome skimming data.
Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.10.0 (default, Nov 16 2021, 09:41:50) [GCC 8.4.1 20200928 (Red Hat 8.4.1-1)]
PLATFORM: Linux gadi-login-03.gadi.nci.org.au 4.18.0-348.2.1.el8.nci.x86_64 #1 SMP Fri Nov 26 03:20:41 UTC 2021 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.5.3; numpy 1.21.4; sympy 1.9; scipy 1.7.2; psutil 5.9.0
DEPENDENCIES: Bowtie2 2.3.5.1; /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades.py:13: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
from distutils import dir_util
SPAdes 3.13.0; Blast 2.9.0
GETORG_PATH=/home/554/ta0341/.GetOrganelle
SEED DB: embplant_pt 0.0.1; embplant_mt 0.0.1
LABEL DB: embplant_pt 0.0.1; embplant_mt 0.0.1
WORKING DIR: /g/data/nm31/d/r3.22_paftol_validation/getorgtest
/g/data/nm31/bin/GetOrganelle/get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10 --overwrite

2022-02-22 15:37:02,907 - INFO: Pre-reading fastq ...
2022-02-22 15:37:02,907 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf --max-reads inf')
2022-02-22 15:37:03,028 - INFO: Estimating reads to use finished.
2022-02-22 15:37:03,029 - INFO: Unzipping reads file: Arabidopsis_simulated.1.fq.gz (8796915 bytes)
2022-02-22 15:37:07,535 - INFO: Unzipping reads file: Arabidopsis_simulated.2.fq.gz (9067061 bytes)
2022-02-22 15:37:12,807 - INFO: Counting read qualities ...
2022-02-22 15:37:12,959 - INFO: Identified quality encoding format = Illumina 1.8+
2022-02-22 15:37:12,959 - INFO: Phred offset = 33
2022-02-22 15:37:12,960 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2022-02-22 15:37:13,012 - INFO: Mean error rate = 0.0019
2022-02-22 15:37:13,013 - INFO: Counting read lengths ...
2022-02-22 15:37:13,181 - INFO: Mean = 150.0 bp, maximum = 150 bp.
2022-02-22 15:37:13,182 - INFO: Reads used = 91563+91563
2022-02-22 15:37:13,182 - INFO: Pre-reading fastq finished.

2022-02-22 15:37:13,182 - INFO: Making seed reads ...
2022-02-22 15:37:18,147 - INFO: Making seed - bowtie2 index ...
2022-02-22 15:37:18,212 - INFO: Making seed - bowtie2 index finished.
2022-02-22 15:37:18,213 - INFO: Mapping reads to seed bowtie2 index ...
2022-02-22 15:37:18,316 - ERROR:
(ERR): "Arabidopsis_simulated.plastome/seed/embplant_pt.index" does not exist or is not a Bowtie 2 index
Exiting now ...

2022-02-22 15:37:18,316 - ERROR:
Traceback (most recent call last):
File "/g/data/nm31/bin/GetOrganelle/get_organelle_from_reads.py", line 3941, in main
seed_fq, seed_sam, new_seed_f = making_seed_reads_using_mapping(
File "/g/data/nm31/bin/GetOrganelle/get_organelle_from_reads.py", line 3040, in making_seed_reads_using_mapping
map_with_bowtie2(seed_file=seed_file, original_fq_files=original_fq_files,
File "/g/data/nm31/bin/GetOrganelle/GetOrganelleLib/pipe_control_func.py", line 399, in map_with_bowtie2
raise Exception("")
Exception

Total cost 22.27 s
For trouble-shooting, please
Firstly, check https://github.com/Kinggerm/GetOrganelle/wiki/FAQ
Secondly, check if there are open/closed issues related at https://github.com/Kinggerm/GetOrganelle/issues
If your problem was still not solved,
please open an issue at https://github.com/Kinggerm/GetOrganelle/issues
please provide the get_org.log.txt and the assembly graph (can be *.png to protect your data privacy) if possible!

@jaktykusuma

Hi,

Sorry to interrupt and add another problem to this thread. You might want to download the database first and put it in your main directory; see https://github.com/Kinggerm/GetOrganelle/wiki/Initialization

I had the same trouble but succeeded after downloading it.

However, I notice that your log shows the same message as mine: under DEPENDENCIES, something is reported as deprecated. I hope the authors can help us fix this problem.

See my log file. Something is wrong with SPAdes.

get_org.log.txt

Thank you.

@Kinggerm
Owner

@jaktykusuma

Your error was different from the current thread. The deprecated dependency issue is currently a harmless warning, not an error.

SPAdes failed in your case because of the space in your working directory path, specifically "IRD Works". Also, please update to 1.7.5+, which gives clearer immediate feedback in the space-in-working-directory case and fixes some essential bugs.
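
The space-in-path case can be caught before a run starts. The following is only a minimal sketch of such a pre-flight check; the function name `path_has_whitespace` and the example paths are hypothetical and are not part of GetOrganelle.

```python
import os


def path_has_whitespace(path):
    """Return True if the absolute form of `path` contains any whitespace.

    Hypothetical helper for illustration; not a GetOrganelle function.
    """
    return any(ch.isspace() for ch in os.path.abspath(path))


if __name__ == "__main__":
    # Check the directory the command would be launched from.
    if path_has_whitespace(os.getcwd()):
        print("Working directory contains whitespace; consider moving or renaming it.")
```

Checking the absolute path matters because the offending directory may be a parent of the folder you are actually in.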

@Kinggerm
Owner


Could you please

  1. run ls -lah Arabidopsis_simulated.plastome/seed/embplant_pt.index to list the files,
  2. run bowtie2-build -h to see what it prints, and
  3. rerun the command with "--verbose" added and attach the new log file here?

Thanks!
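
A note on step 1: embplant_pt.index is an index prefix rather than a single file, since Bowtie 2 writes several files named <prefix>.*.bt2 (or .bt2l for large indexes). A minimal sketch of a check for those files follows; the helper name `bowtie2_index_files` is hypothetical and not part of GetOrganelle or Bowtie 2.

```python
import glob
import os


def bowtie2_index_files(prefix):
    """List Bowtie 2 index files (*.bt2 or *.bt2l) written for `prefix`.

    Hypothetical helper for illustration only.
    """
    found = []
    for ext in ("bt2", "bt2l"):
        # Matches e.g. <prefix>.1.bt2 and <prefix>.rev.1.bt2
        found.extend(glob.glob(glob.escape(prefix) + ".*." + ext))
    return sorted(found)


if __name__ == "__main__":
    prefix = os.path.join("Arabidopsis_simulated.plastome", "seed", "embplant_pt.index")
    files = bowtie2_index_files(prefix)
    if files:
        print("\n".join(files))
    else:
        print("No Bowtie 2 index files found for prefix: " + prefix)
```

An empty result right after the "bowtie2 index finished" log line would suggest that bowtie2-build itself did not run successfully.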

@tallnuttrbgv
Author

ls -lah Arabidopsis_simulated.plastome/seed/embplant_pt.index
ls: cannot access 'Arabidopsis_simulated.plastome/seed/embplant_pt.index': No such file or directory

ls -lah Arabidopsis_simulated.plastome/seed/
total 15M
drwxr-sr-x 2 ta0341 nm31 33K Feb 22 15:44 .
drwxr-sr-x 3 ta0341 nm31 33K Feb 22 15:44 ..
-rw-r--r-- 1 ta0341 nm31 15M Feb 22 15:44 embplant_pt.fasta

bowtie2-build -h
=== ERROR ===
The use of the #!/usr/bin/env python interpreter line in python scripts
has been deprecated.

Please modify this script:
/g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/bowtie2/bowtie2-build

To use either #!/usr/bin/env python3 or #!/usr/bin/env python2
depending on which version of python you require
Alternatively, if you are unable to modify this script
You can load the python2-as-python or python3-as-python
modules depending on which version of python you require

I fixed the interpreter line in bowtie2-build, then got the error below.
The verbose log is also attached.
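
For anyone hitting the same message, the interpreter-line fix can be sketched as below. This is only an illustration under the assumption that the script's first line is exactly "#!/usr/bin/env python"; the function name `fix_python_shebang` is hypothetical, and editing the file by hand works just as well.

```python
def fix_python_shebang(script_path, interpreter="python3"):
    """Rewrite a bare '#!/usr/bin/env python' first line to name an explicit interpreter.

    Returns True if the file was changed, False if the first line was
    something else (including an already-fixed shebang).
    Hypothetical helper for illustration only.
    """
    with open(script_path, "r") as handle:
        lines = handle.readlines()
    if not lines or lines[0].strip() != "#!/usr/bin/env python":
        return False
    lines[0] = "#!/usr/bin/env %s\n" % interpreter
    with open(script_path, "w") as handle:
        handle.writelines(lines)
    return True
```

Usage would be along the lines of fix_python_shebang("/path/to/bowtie2-build"), with the path taken from the error message above. Keep a backup copy of the script first.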

get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10 --overwrite
GetOrganelle v1.7.5.3

get_organelle_from_reads.py assembles organelle genomes from genome skimming data.
Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.10.0 (default, Nov 16 2021, 09:41:50) [GCC 8.4.1 20200928 (Red Hat 8.4.1-1)]
PLATFORM: Linux gadi-login-06.gadi.nci.org.au 4.18.0-348.2.1.el8.nci.x86_64 #1 SMP Fri Nov 26 03:20:41 UTC 2021 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.5.3; numpy 1.21.4; sympy 1.9; scipy 1.7.2; psutil 5.9.0
DEPENDENCIES: Bowtie2 2.3.5.1; /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades.py:13: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
from distutils import dir_util
SPAdes 3.13.0; Blast 2.9.0
GETORG_PATH=/home/554/ta0341/.GetOrganelle
SEED DB: embplant_pt 0.0.1; embplant_mt 0.0.1
LABEL DB: embplant_pt 0.0.1; embplant_mt 0.0.1
WORKING DIR: /g/data/nm31/d/r3.22_paftol_validation/getorgtest
/g/data/nm31/bin/GetOrganelle/get_organelle_from_reads.py -1 Arabidopsis_simulated.1.fq.gz -2 Arabidopsis_simulated.2.fq.gz -t 1 -o Arabidopsis_simulated.plastome -F embplant_pt -R 10 --overwrite

2022-02-23 10:50:16,977 - INFO: Pre-reading fastq ...
2022-02-23 10:50:16,977 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf --max-reads inf')
2022-02-23 10:50:17,177 - INFO: Estimating reads to use finished.
2022-02-23 10:50:17,177 - INFO: Unzipping reads file: Arabidopsis_simulated.1.fq.gz (8796915 bytes)
2022-02-23 10:50:17,513 - INFO: Unzipping reads file: Arabidopsis_simulated.2.fq.gz (9067061 bytes)
2022-02-23 10:50:20,697 - INFO: Counting read qualities ...
2022-02-23 10:50:20,851 - INFO: Identified quality encoding format = Illumina 1.8+
2022-02-23 10:50:20,851 - INFO: Phred offset = 33
2022-02-23 10:50:20,852 - INFO: Trimming bases with qualities (0.00%): 33..33 !
2022-02-23 10:50:20,901 - INFO: Mean error rate = 0.0019
2022-02-23 10:50:20,902 - INFO: Counting read lengths ...
2022-02-23 10:50:21,068 - INFO: Mean = 150.0 bp, maximum = 150 bp.
2022-02-23 10:50:21,068 - INFO: Reads used = 91563+91563
2022-02-23 10:50:21,068 - INFO: Pre-reading fastq finished.

2022-02-23 10:50:21,068 - INFO: Making seed reads ...
2022-02-23 10:50:24,278 - INFO: Making seed - bowtie2 index ...
2022-02-23 10:50:33,840 - INFO: Making seed - bowtie2 index finished.
2022-02-23 10:50:33,840 - INFO: Mapping reads to seed bowtie2 index ...
2022-02-23 10:50:42,532 - INFO: Mapping finished.
2022-02-23 10:50:42,534 - INFO: Seed reads made: Arabidopsis_simulated.plastome/seed/embplant_pt.initial.fq (14144302 bytes)
2022-02-23 10:50:42,535 - INFO: Making seed reads finished.

2022-02-23 10:50:42,535 - INFO: Checking seed reads and parameters ...
2022-02-23 10:50:42,535 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2022-02-23 10:50:42,535 - INFO: If the result graph is not a circular organelle genome,
2022-02-23 10:50:42,535 - INFO: you could adjust the value(s) of '-w'/'-R' for another new run.
2022-02-23 10:50:45,524 - INFO: Pre-assembling mapped reads ...
2022-02-23 10:50:47,545 - INFO: Retrying with more reads ..
2022-02-23 10:51:06,399 - WARNING: Pre-assembling failed. The estimations for embplant_pt-hitting base-coverage and word size may be misleading.
2022-02-23 10:51:07,664 - INFO: Estimated embplant_pt-hitting base-coverage = 52.85
2022-02-23 10:51:07,876 - INFO: Estimated word size(s): 98
2022-02-23 10:51:07,877 - INFO: Setting '-w 98'
2022-02-23 10:51:07,877 - INFO: Setting '--max-extending-len inf'
2022-02-23 10:51:07,958 - INFO: Checking seed reads and parameters finished.

2022-02-23 10:51:07,958 - INFO: Making read index ...
2022-02-23 10:51:09,003 - INFO: Mem 0.324 G, 178623 candidates in all 183126 reads
2022-02-23 10:51:09,003 - INFO: Pre-grouping reads ...
2022-02-23 10:51:09,004 - INFO: Setting '--pre-w 98'
2022-02-23 10:51:09,030 - INFO: Mem 0.324 G, 4074/4074 used/duplicated
2022-02-23 10:51:09,287 - INFO: Mem 0.324 G, 517 groups made.
2022-02-23 10:51:09,298 - INFO: Making read index finished.

2022-02-23 10:51:09,298 - INFO: Extending ...
2022-02-23 10:51:09,298 - INFO: Adding initial words ...
2022-02-23 10:51:10,821 - INFO: AW 1113742
2022-02-23 10:51:12,411 - INFO: Round 1: 178623/178623 AI 40378 AW 1126044 Mem 0.437
2022-02-23 10:51:13,216 - INFO: Round 2: 178623/178623 AI 40411 AW 1126346 Mem 0.437
2022-02-23 10:51:14,071 - INFO: Round 3: 178623/178623 AI 40411 AW 1126346 Mem 0.437
2022-02-23 10:51:14,072 - INFO: No more reads found and terminated ...
2022-02-23 10:51:14,782 - INFO: Extending finished.

2022-02-23 10:51:14,795 - INFO: Separating extended fastq file ...
2022-02-23 10:51:15,137 - INFO: Setting '-k 21,55,85,115'
2022-02-23 10:51:15,137 - INFO: Assembling using SPAdes ...
2022-02-23 10:51:15,152 - INFO: /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades.py -t 1 --phred-offset 33 -1 Arabidopsis_simulated.plastome/extended_1_paired.fq -2 Arabidopsis_simulated.plastome/extended_2_paired.fq --s1 Arabidopsis_simulated.plastome/extended_1_unpaired.fq --s2 Arabidopsis_simulated.plastome/extended_2_unpaired.fq -k 21,55,85,115 -o Arabidopsis_simulated.plastome/extended_spades
2022-02-23 10:51:15,714 - WARNING: Assembling exited halfway.

2022-02-23 10:51:17,441 - ERROR: No valid assembly graph found!

get_org.log.txt

@tallnuttrbgv
Author

I also checked that all the other python scripts were #!/usr/bin/env python3, as is required for my system.

@Kinggerm
Owner

Kinggerm commented Feb 23, 2022

I would try removing SPAdes under GetOrganelleDep

rm -r /g/data/nm31/bin/GetOrganelle/GetOrganelleDep/linux/SPAdes/

Then install the latest SPAdes using apt install, or conda, or from the source.

Let me know your updates.

@Kinggerm
Owner

BTW, whether or not the latest SPAdes fixes your issue in the Gadi environment, GetOrganelleDep needs an update.
I will leave this issue open until that update is released.

@tallnuttrbgv
Author

I deleted the bundled dependency version of SPAdes and am now using my (working) system version. I get the same error; see the attached log. Thanks.
get_org.log.txt

@Kinggerm
Owner

What is the result of spades.py --test?

@tallnuttrbgv
Author

ah yes, spades problem..

spades.py --test

== Warning == No assembly mode was specified! If you intend to assemble high-coverage multi-cell/isolate data, use '--isolate' option.

Command line: /g/data/nm31/bin/SPAdes-3.15.2-Linux/bin/spades.py --test

System information:
SPAdes version: 3.15.2
Python version: 3.10.0
OS: Linux-4.18.0-348.2.1.el8.nci.x86_64-x86_64-with-glibc2.28

Output dir: /g/data/nm31/d/r3.21_aatol_extra_samples_2022/spades_test
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
Standard mode
For multi-cell/isolate data we recommend to use '--isolate' option; for single-cell MDA data use '--sc'; for metagenomic data use '--meta'; for RNA-Seq use '--rna'.
Reads:
Traceback (most recent call last):
File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/bin/spades.py", line 651, in <module>
main(sys.argv)
File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/bin/spades.py", line 591, in main
print_params(log, log_filename, command_line, args, cfg)
File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/bin/spades.py", line 327, in print_params
print_used_values(cfg, log)
File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/bin/spades.py", line 117, in print_used_values
dataset_data = pyyaml.load(open(cfg["dataset"].yaml_filename))
File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/share/spades/pyyaml3/__init__.py", line 72, in load
return loader.get_single_data()
File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/share/spades/pyyaml3/constructor.py", line 37, in get_single_data
return self.construct_document(node)
File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/share/spades/pyyaml3/constructor.py", line 46, in construct_document
for dummy in generator:
File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/share/spades/pyyaml3/constructor.py", line 398, in construct_yaml_map
value = self.construct_mapping(node)
File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/share/spades/pyyaml3/constructor.py", line 204, in construct_mapping
return super().construct_mapping(node, deep=deep)
File "/g/data/nm31/bin/SPAdes-3.15.2-Linux/share/spades/pyyaml3/constructor.py", line 126, in construct_mapping
if not isinstance(key, collections.Hashable):
AttributeError: module 'collections' has no attribute 'Hashable'

@tallnuttrbgv
Author

Updated to SPAdes 3.15.4, which works with Python 3.10, and the issue is now solved. Thanks.
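
For context on the traceback above: the AttributeError is consistent with Python 3.10 having removed the old abstract-base-class aliases from the collections module; the classes themselves live in collections.abc. The sketch below only illustrates that difference. The function name `is_hashable_key` is hypothetical, and this is not the code SPAdes uses.

```python
import collections
import collections.abc


def is_hashable_key(key):
    """Check hashability via collections.abc, which works on old and new Python 3."""
    return isinstance(key, collections.abc.Hashable)


# On Python 3.10 and later this is expected to be False, which is why
# code that still refers to collections.Hashable raises AttributeError.
legacy_alias_present = hasattr(collections, "Hashable")

if __name__ == "__main__":
    print("collections.Hashable available:", legacy_alias_present)
    print("str key hashable:", is_hashable_key("reads"))
    print("list key hashable:", is_hashable_key(["reads"]))
```

Running spades.py --test, as suggested earlier in the thread, is a quick way to confirm that a given SPAdes install works with the Python version in use.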
