Lexiconfree beam search #101

SimBe195 · 2025-02-19T18:21:36Z

Simple time-synchronous beam search algorithm based on the new SearchAlgorithmV2 interface. It does not use a (proper) pronunciation lexicon, a word-level LM, or a transition model. It performs special handling of blank if a blank index is set. The main purpose is open-vocabulary search with CTC/Neural Transducer (or similar) models.

Supports global pruning by maximum beam size and by score difference to the best hypothesis. Uses a LabelScorer for context initialization/extension and scoring.
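The two pruning criteria can be sketched as follows. This is a minimal illustration, not the code from this PR: the `Hyp` struct and the function names are hypothetical, and it assumes that lower scores are better (e.g. negative log probabilities).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical hypothesis type for illustration only.
// Assumption: lower score = better (e.g. negative log probability).
struct Hyp {
    int    token;
    double score;
};

static bool betterThan(const Hyp& a, const Hyp& b) {
    return a.score < b.score;
}

// Score pruning: drop every hypothesis whose score is more than
// `scoreThreshold` worse than the best hypothesis in the beam.
void scorePruning(std::vector<Hyp>& beam, double scoreThreshold) {
    if (beam.empty()) {
        return;
    }
    const double best = std::min_element(beam.begin(), beam.end(), betterThan)->score;
    beam.erase(std::remove_if(beam.begin(), beam.end(),
                              [best, scoreThreshold](const Hyp& h) {
                                  return h.score > best + scoreThreshold;
                              }),
               beam.end());
}

// Beam-size pruning: keep at most `maxBeamSize` hypotheses, namely the best ones.
void beamSizePruning(std::vector<Hyp>& beam, std::size_t maxBeamSize) {
    assert(maxBeamSize > 0);
    if (beam.size() <= maxBeamSize) {
        return;
    }
    auto nth = beam.begin() + static_cast<std::ptrdiff_t>(maxBeamSize);
    // Partial sort: afterwards the first maxBeamSize elements are the best ones
    // (in unspecified order).
    std::nth_element(beam.begin(), nth, beam.end(), betterThan);
    beam.erase(nth, beam.end());
}
```

Applying the score threshold first shrinks the beam before the partial sort, but either order yields the same surviving set.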

The search requires a lexicon that represents the vocabulary. Each lemma is viewed as a token with its index in the lexicon corresponding to the associated output index of the LabelScorer.
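To illustrate the index correspondence and the role of the blank index, here is a small sketch. It is not taken from this PR: `Vocabulary`, `kNoBlank`, and `collapsePath` are hypothetical names, a plain vector stands in for the lexicon, and the collapsing rule shown is the standard CTC convention (merge repeated labels, then drop blanks), which may differ from how the search handles transducer models.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the vocabulary lexicon: position i holds the token
// that belongs to scorer output index i.
using Vocabulary = std::vector<std::string>;

// Sentinel meaning "no blank index configured" (an assumption of this sketch).
constexpr int kNoBlank = -1;

// Collapse a frame-level index path into a token sequence, CTC style:
// consecutive repetitions of the same index are merged, and the blank index
// (if set) emits nothing.
std::vector<std::string> collapsePath(const std::vector<int>& path,
                                      const Vocabulary&       vocab,
                                      int                     blankIndex) {
    std::vector<std::string> tokens;
    int previous = -1;  // no previous frame yet
    for (int index : path) {
        assert(index >= 0 && static_cast<std::size_t>(index) < vocab.size());
        const bool isBlank  = (blankIndex != kNoBlank) && (index == blankIndex);
        const bool isRepeat = (index == previous);
        if (!isBlank && !isRepeat) {
            tokens.push_back(vocab[static_cast<std::size_t>(index)]);
        }
        previous = index;
    }
    return tokens;
}
```

With a vocabulary of `{"<blank>", "a", "b"}` and blank index 0, the path `1 1 0 1 2 2 0` collapses to `a a b`: the blank between the two runs of `a` is what keeps them from being merged.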

Depends on #103 and #104.

larissakl · 2025-02-20T08:25:16Z

Two general points:

1. How should this search algorithm be used? Are you planning to put the `enum SearchTypeV2`, the related `ParameterChoice` and `Module_::createSearchAlgorithm()` to Module.hh/.cc later? Or do you have a different plan?
2. Would it maybe be a good idea to factor out the `struct TimeStatistic` and its related code so that it can be reused in other search algorithms?

SimBe195 · 2025-02-20T09:01:16Z

> 1. How should this search algorithm be used? Are you planning to put the `enum SearchTypeV2`, the related `ParameterChoice` and `Module_::createSearchAlgorithm()` to Module.hh/.cc later? Or do you have a different plan?

With this PR alone, the search algorithm is not usable yet. I will make separate PRs for an Flf node and Python bindings. But I can already include the createSearchAlgorithm() function here.

> 2. Would it maybe be a good idea to factor out the `struct TimeStatistic` and its related code so that it can be reused in other search algorithms?

Yeah, probably. Maybe even into Core?
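For reference, a reusable timing helper of this kind could look roughly like the sketch below. This is only an illustration built on `std::chrono`: the class name `SimpleStopWatch` and its methods are hypothetical and do not describe the interface of `TimeStatistic` or of any class in Core.

```cpp
#include <chrono>

// Hypothetical accumulating stopwatch (illustration only).
class SimpleStopWatch {
private:
    using Clock = std::chrono::steady_clock;

    Clock::duration   elapsed_ = Clock::duration::zero();
    Clock::time_point startTime_;
    bool              running_ = false;

public:
    // Start (or resume) timing; does nothing if already running.
    void start() {
        if (running_) {
            return;
        }
        startTime_ = Clock::now();
        running_   = true;
    }

    // Stop timing and add the interval since start() to the total.
    void stop() {
        if (!running_) {
            return;
        }
        elapsed_ += Clock::now() - startTime_;
        running_ = false;
    }

    // Discard the accumulated time.
    void reset() {
        elapsed_ = Clock::duration::zero();
        running_ = false;
    }

    // Total accumulated time in milliseconds, including a running interval.
    double elapsedMilliseconds() const {
        Clock::duration total = elapsed_;
        if (running_) {
            total += Clock::now() - startTime_;
        }
        return std::chrono::duration<double, std::milli>(total).count();
    }
};
```

A search algorithm could keep one such object per phase (e.g. scoring, pruning) and call `start()`/`stop()` around the respective code in each step.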

larissakl · 2025-03-13T12:29:41Z

src/Search/LexiconfreeTimesyncBeamSearch/LexiconfreeTimesyncBeamSearch.hh

+    static const Core::ParameterBool  paramUseSentenceEnd;
+    static const Core::ParameterBool  paramSentenceEndIndex;


These two are currently not used. I don't know whether you want to keep them in case you introduce sentence-end handling at some point, or whether you want to remove them for now.

Simon Berger added 2 commits February 19, 2025 19:10

Implement simple lexiconfree time-sync beam search

1e7035e

Add some comments

bf0a8ce

SimBe195 requested review from curufinwe and larissakl February 19, 2025 18:21

Add createSearchAlgorithm to Search::Module

d6689b4

larissakl reviewed Feb 24, 2025

larissakl reviewed Feb 26, 2025

Simon Berger added 6 commits February 26, 2025 16:29

Fix compilation

664945c

Refactor traceback/lattice building and construct proper (nonlinear) lattice from beam

488fb0e

Factor out time statistics into new Core::StopWatch class

1599302

Don't copy sibling from predecessor

9a60916

Better handling of blank index

8e96423

Apply suggestions from code review

536ac82

This was referenced Mar 4, 2025

StopWatch #103

Open

Lattice and Traceback building #104

Open

larissakl mentioned this pull request Mar 5, 2025

Add VocabTextLexiconParser for simple text-based lexica #105

Open

SimBe195 added 5 commits March 5, 2025 14:00

Merge remote-tracking branch 'origin/lattice_traces' into lexiconfree_beam_search

f2f4cf7

Merge remote-tracking branch 'origin/lattice_traces' into lexiconfree_beam_search

04b6ac4

Merge remote-tracking branch 'origin/lattice_traces' into lexiconfree_beam_search

b3d5f02

Merge remote-tracking branch 'origin/lattice_traces' into lexiconfree_beam_search

b1ed20e

Merge remote-tracking branch 'origin/stopwatch' into lexiconfree_beam_search

f112113

SimBe195 changed the base branch from master to lattice_traces March 5, 2025 14:15

SimBe195 added 3 commits March 5, 2025 20:10

Update traceback/lattice building logic

d67cf45

Merge branch 'stopwatch' into lexiconfree_beam_search

f0832f8

Merge branch 'lattice_traces' into lexiconfree_beam_search

46ee1a8

SimBe195 mentioned this pull request Mar 5, 2025

Add Flf::RecognizerNodeV2 #106

Open

larissakl reviewed Mar 13, 2025
