Skip to content

Releases: jltsiren/gbwtgraph

GBWTGraph v1.1

08 Mar 06:19
Compare
Choose a tag to compare
  • Bug fixes:
    • GFA parsing now understands W-lines with with missing (*) SeqStart / SeqEnd fields.
  • Improved support for parallelizing GBWT construction over graph components:
    • MetadataBuilder for generating GBWT metadata.
    • assing_paths() and insert_paths() for including reference/generic paths in the built GBWT.
    • ConstructionJobs::contig_names() for determining contig names for graph components.
  • Subgraph queries similar to GBZ-base:
    • Extract a context within a given distance of a reference path offset/interval or a node.
    • Path-based queries require building a PathIndex for the GBZ graph.
    • The resulting Subgraph object can be converted to GFA.
  • Multithreaded path cover algorithms.
  • path_lcs() for finding the longest common subsequence of two paths, weighted by sequence length in the nodes.
  • Methods for changing the reference samples in a GBZ graph.
  • Sequence extraction using the gbz_extract tool.
    • The extracted sequences maintain the same order as the corresponding paths in the GBWT.
  • Minimizer index version 10.
    • No actual changes. Version number was incremented due to a breaking change in minimizer payloads used in vg.

GBWTGraph v1.0

05 Dec 04:02
Compare
Choose a tag to compare

Things have stabilized enough to call this version 1.0. It contains a number of changes from the last year and a half.

  • Faster copying / moving / swapping of GBZ objects.
  • GFA compression/decompression changes:
    • L-line overlap is now written as 0M instead of *.
    • Option to extract GFA without using the node-to-segment translation.
  • Minimizer index version 9:
    • Payload is now 128 bits per hit (already in version 8).
    • Parameterized over both key type and value type, allowing the use minimizer indexes without a payload.
    • Option to provide a set of frequent kmers that should be avoided as minimizers, as in Winnowmap.
    • Plain kmer indexes can also be used.
    • Iterator over the kmers in the index.
    • Compatible with version 8 but not with earlier versions.
  • New graph operations:
    • subgraph(): Return a subgraph based on the given GBWT index (faster than using a constructor).
  • New algorithms:
    • gbwt_construction_jobs(): Partition a graph into sets of components for parallel GBWT construction.
    • partition_chains(): Partition top-level chains in a snarl decomposition between GBWT construction jobs.

GBWTGraph v0.9

13 May 21:47
Compare
Choose a tag to compare
  • Terminology change consistent with libhandlegraph senses:
    • Old reference paths are now generic (named) paths.
    • Some samples may be interpreted as reference paths.
    • All other paths are haplotype paths.
  • GFA compression/decompression changes:
    • The output is GFA 1.1 instead of GFA 1.0.
    • Path names are now interpreted as contig names instead of sample names by default.
    • Preset --pan-sn for parsing path names and extracting paths in the PanSN format.
    • --ref-only mode that extracts only graph topology and named paths as P-lines.
    • An optional list of reference samples can be stored using GFA header tag RS and GBWT tag reference_samples.
    • The API supports multiple regexes for parsing path names and assigning path senses by the first regex that matches the path name.
  • GBWTGraph implements the PathMetadata interface from libhandlegraph.
    • Metadata that depends on the sense (type) of the path.
    • Haplotype paths are not exposed through the PathHandleGraph interface.
  • Construction from HandleGraph takes an optional NamedNodeBackTranslation for building the node-to-segment translation.
  • Path cover / subsampled GBWTs now include (a selected subset of) paths from the original graph as generic paths.

GBWTGraph v0.8.1

18 Feb 06:32
Compare
Choose a tag to compare

This is a minor patch release for the GBZ paper.

  • Faster GFA decompression in some degenerate cases by caching large GBWT records.

GBWTGraph v0.8

21 Jan 16:34
Compare
Choose a tag to compare
  • GBWTGraph implements more libhandlegraph interfaces:
    • PathHandleGraph: Reference paths corresponding to sample name _gbwt_ref are reported as named paths.
    • NamedNodeBackTranslation: Node chopping + node id / segment name translation for GFA.
    • CachedGBWTGraph is only a HandleGraph.
  • Multithreaded GFA (de)compression in gfa2gbwt and with gfa_to_gbwt() / gbwt_to_gfa().
    • Faster and more memory-efficient compression, faster decompression.
    • Path order in the GBWT and path/edge order in the decompressed GFA file is different from the old algorithms.

GBWTGraph v0.7

20 Nov 02:21
Compare
Choose a tag to compare
  • File format version 3:
  • GBZ format for space-efficient storage of GFA with many paths.
    • Wrapper for GBWT and GBWTGraph serialized using the simple-sds format.
    • Better defined semantics.
  • Serialization and loading use exceptions to handle failures.
  • Requires version 2.3 of the vgteam fork of SDSL and GBWT version 1.3.
  • gfa2gbwt outputs the compressed simple-sds .gbz format by default.

GBWTGraph v0.6

17 Mar 20:20
Compare
Choose a tag to compare

Some work on using GBWTGraph for GFA storage.

  • Uses the vgteam fork of SDSL.
    • In CMake builds, uses the same SDSL as GBWT.
  • When a GFA file contains both P-lines and W-lines, use the P-lines as reference paths.
    • Reference paths have sample name _gbwt_ref and the path name as contig name.
  • GBWTGraph file format version 2.
    • Optional node-to-segment translation for GFA files.
    • Compatible with version 1.
  • Preliminary SegmentHandleGraph interface for GBWTGraph.
    • If node-to-segment translation is included in the graph, each handle maps to a (segment name, starting offset) pair.
    • Each node also maps to a (segment name, node id range) pair.
    • Iteration over segments and links.
  • Compressed GBWTGraph file format for GFA storage.
    • Both GBWT and GBWTGraph in the same file. GBWTGraph typically compresses to ~20% of the original space.
    • gfa2gbwt can output plain and compressed formats and convert between the two.
  • GFA extraction from GBWTGraph.

GBWTGraph v0.5

16 Jan 02:21
Compare
Choose a tag to compare

Major improvements to GFA parsing:

  • Multi-pass algorithm that validates the input before starting GBWT construction.
  • Segment names are automatically translated into integer ids if necessary.
  • Long segments are broken down into smaller nodes (at most 1024 bp by default) if necessary.
  • Segment-to-node translation table is written to basename.trans.
  • GBWT metadata can be generated by parsing path names.
  • Support for proposed W-lines.

GBWTGraph v0.4

06 Nov 05:38
Compare
Choose a tag to compare
  • CachedGBWTGraph: A GBWTGraph overlay that uses the cached interface automatically. Intended for algorithms that repeatedly access the edges in a small subgraph.
  • Minimizer index v7 (compatible with v6):
    • An option to use bounded syncmers instead of minimizers.
  • Graph algorithms in algorithms.h:
    • topological_order(): Find a topological order for all handles in the subgraph induced by a subset of nodes.
  • New queries:
    • GBWTGraph::cached_follow_edges(): A version of follow_edges() using CachedGBWT.
    • hits_in_subgraph(): Report minimizer hits in a subgraph induced by a set of nodes.
  • Changes to path cover:
    • local_haplotypes(): Revert to the path cover algorithm if there are no haplotypes in the component.
    • augment_gbwt(): Augment an existing GBWT with a path cover of missing components.

GBWTGraph v0.3

14 Apr 08:04
Compare
Choose a tag to compare
  • Implemented the new SerializableHandleGraph interface.
  • Query MinimizerIndex::count_and_find() that returns the occurrence count and a pointer to the internal representation of the occurrences.
  • Better path cover for acyclic component by always starting from a head node in such components.
  • Avoid querying for nonexistent nodes during construction when the source graph has gaps between node ids.
  • Store 64 bits of payload for each position in the minimizer index.
  • Minimizer index file format v6 (not compatible with the earlier versions).