edu.cmu.sphinx.linguist.lextree
Class LexTreeLinguist
java.lang.Object
edu.cmu.sphinx.linguist.lextree.LexTreeLinguist
- All Implemented Interfaces:
- Linguist, Configurable
public class LexTreeLinguist
- extends java.lang.Object
- implements Linguist
A linguist that can represent large vocabularies efficiently. This class implements the Linguist interface. The main
role of any linguist is to represent the search space for the decoder. The initial state in the search space can be
retrieved by a SearchManager via a call to getInitialSearchState. This method returns a SearchState. Successor
states can be retrieved via calls to SearchState.getSuccessors(). There are a number of search state subinterfaces
that are used to indicate different types of states in the search space:
- WordSearchState - represents a word in the search space.
- UnitSearchState - represents a unit in the search space.
- HMMSearchState - represents an HMM state in the search space.
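The traversal described above (take the initial state, then repeatedly ask each state for its successors) can be
sketched as follows. ToyState and ToyNode are hypothetical stand-ins written for this illustration; they are not the
Sphinx interfaces, and getSuccessors here simply returns a list of states.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a search state; not the real Sphinx API.
interface ToyState {
    List<ToyState> getSuccessors();
    String label();
}

// A fixed, pre-built state used only to give the walk something to visit.
final class ToyNode implements ToyState {
    private final String label;
    private final List<ToyState> successors = new ArrayList<>();

    ToyNode(String label) { this.label = label; }

    ToyNode add(ToyState next) { successors.add(next); return this; }

    public List<ToyState> getSuccessors() { return successors; }

    public String label() { return label; }
}

final class SearchSpaceWalk {
    /** Breadth-first walk from the initial state, visiting each state once. */
    static List<String> walk(ToyState initial) {
        List<String> order = new ArrayList<>();
        Set<ToyState> seen = new HashSet<>();
        Deque<ToyState> queue = new ArrayDeque<>();
        queue.add(initial);
        seen.add(initial);
        while (!queue.isEmpty()) {
            ToyState state = queue.remove();
            order.add(state.label());
            for (ToyState next : state.getSuccessors()) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return order;
    }

    /** A tiny chain: initial -> unit -> HMM -> word. */
    static ToyState sample() {
        ToyNode word = new ToyNode("word:dog");
        ToyNode hmm = new ToyNode("hmm:d").add(word);
        ToyNode unit = new ToyNode("unit:d").add(hmm);
        return new ToyNode("initial").add(unit);
    }

    public static void main(String[] args) {
        System.out.println(walk(sample()));
    }
}
```

In this toy chain the word state comes last, which loosely mirrors a tree-style linguist that emits word states at
the end of a word.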
A linguist has a great deal of latitude about the order in which it returns states. For instance, a 'flat' linguist
may return a WordState at the beginning of a word, while a 'tree' linguist may return WordStates at the end of a
word. Likewise, a linguist may omit certain state types completely (such as a unit state). Some Search Managers may
want to know a priori the order in which states will be generated by the linguist. The method getSearchStateOrder
can be used to retrieve the order of the states returned by the linguist.
Depending on the vocabulary size and topology, the search space represented by the linguist may include a very large
number of states. Some linguists will generate the search states dynamically, that is, the object representing a
particular state in the search space is not created until it is needed by the SearchManager. SearchManagers often
need to be able to determine if a particular state has been entered before by comparing states. Because SearchStates
may be generated dynamically, the SearchState.equals() call (as opposed to the reference equality operator '==')
should be used to determine if states are equal. The states returned by the linguist will generally provide very
efficient implementations of equals and hashCode. This will allow a SearchManager to maintain collections of states
in HashMaps efficiently.
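A minimal sketch of why value equality matters for dynamically generated states. DynState is a hypothetical class
invented for this example (its two fields are assumptions, not the fields of any Sphinx state); the point is only
that two separately created objects for the same logical state compare equal and collapse to one HashMap key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical state type; real linguist states carry different fields.
final class DynState {
    private final int nodeId;
    private final String wordHistory;

    DynState(int nodeId, String wordHistory) {
        this.nodeId = nodeId;
        this.wordHistory = wordHistory;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DynState)) return false;
        DynState other = (DynState) o;
        return nodeId == other.nodeId && wordHistory.equals(other.wordHistory);
    }

    @Override
    public int hashCode() {
        return Objects.hash(nodeId, wordHistory);
    }
}

final class StateEqualityDemo {
    /** Counts logically distinct states by using them as HashMap keys. */
    static int countDistinct(DynState... states) {
        Map<DynState, Integer> visits = new HashMap<>();
        for (DynState s : states) {
            visits.merge(s, 1, Integer::sum);
        }
        return visits.size();
    }

    public static void main(String[] args) {
        DynState a = new DynState(7, "the dog");
        DynState b = new DynState(7, "the dog"); // same logical state, created again on the fly
        System.out.println(a == b);      // reference comparison: different objects
        System.out.println(a.equals(b)); // value comparison: same state
        System.out.println(countDistinct(a, b, new DynState(8, "the dog")));
    }
}
```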
LexTreeLinguist Characteristics
Some characteristics of this linguist:
- Dynamic - the linguist generates search states on the fly, greatly reducing the required memory footprint.
- Tree topology - this linguist represents the search space as an inverted tree. Units near the root of the tree
(the beginnings of words) are shared among many different words. This reduces the number of states that need to be
considered during the search.
- HMM sharing - because of state tying in the acoustic models, it is often the case that triphone units that differ
in the right context are actually represented by the same HMM. This linguist recognizes this case and will use a
single state to represent the HMM instead of two states. This can greatly reduce the number of states generated by
the linguist.
- Small-footprint - this linguist uses a few other techniques to reduce the overall footprint of the search space.
One technique that is particularly helpful is to share the end word units (where the largest fanout of states
occurs) across all of the words. For a 60K word vocabulary, this can reduce the number of tree nodes from about
2 million to around 3,000.
- Quick loading - this linguist can compile the search space very quickly. A 60K word vocabulary can be made ready
in less than 10 seconds.
This linguist is not a general purpose linguist. It does impose some constraints:
- unit size - this linguist will only use units that are no larger than triphones.
- n-gram grammars - this linguist will generate the search space directly from the N-Gram language model. The
vocabulary supported is the intersection of the words found in the language model and the words that exist in the
Dictionary. It is assumed that all sequences of words in the vocabulary are valid. This linguist doesn't support
arbitrary grammars.
Design Notes
The following are some notes describing the design of this linguist. They may be helpful to those who want to
understand how this linguist works but are not necessary if you are only interested in using this linguist.
Search Space Representation
It has been shown that representing the search space as a tree can greatly reduce the number of active states in a
search since the units at the beginnings of words can be shared across multiple words. For example, with a large
vocabulary (60K words), at the end of a word, with a flat representation, we have to provide transitions to the
initial state of each possible word. That is 60K transitions. In a tree-based system we only need to provide
transitions to each initial phone (within its context). That is about 1600 transitions. This is a substantial
reduction. Conceptually, this tree consists of a node for each possible initial unit. Each node can have an
arbitrary number of children which can be either unit nodes or word nodes.
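The prefix sharing described above can be illustrated with a toy lexical tree. This sketch uses context-independent
phones and four made-up pronunciations, so it is a simplification of the idea rather than a model of HMMTree; all
names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LexTreeSketch {
    /** A unit node; children are keyed by the next phone. */
    static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();
        final List<String> words = new ArrayList<>(); // word nodes attached to this unit
    }

    final Node root = new Node();

    /** Adds a word, reusing any existing nodes for its leading phones. */
    void addWord(String word, String... phones) {
        Node node = root;
        for (String phone : phones) {
            node = node.children.computeIfAbsent(phone, k -> new Node());
        }
        node.words.add(word);
    }

    /** Number of transitions needed at a word boundary in the tree. */
    int initialUnits() {
        return root.children.size();
    }

    /** Total number of unit nodes below the given node. */
    static int countUnitNodes(Node node) {
        int total = 0;
        for (Node child : node.children.values()) {
            total += 1 + countUnitNodes(child);
        }
        return total;
    }

    /** Illustrative pronunciations only. */
    static LexTreeSketch sample() {
        LexTreeSketch tree = new LexTreeSketch();
        tree.addWord("dog", "D", "AO", "G");
        tree.addWord("doll", "D", "AO", "L");
        tree.addWord("dot", "D", "AA", "T");
        tree.addWord("cat", "K", "AE", "T");
        return tree;
    }

    public static void main(String[] args) {
        LexTreeSketch tree = sample();
        System.out.println("initial units: " + tree.initialUnits());
        System.out.println("unit nodes: " + countUnitNodes(tree.root));
    }
}
```

With a flat layout these four words would need four entry transitions and twelve unit states; in the tree the words
starting with the same phones share their leading nodes.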
This linguist uses the HMMTree class to build and represent the tree. The HMMTree is given the dictionary and
language model and builds the lex tree. Instead of representing the nodes in the tree as phonemes and words as is
typically done, the HMMTree represents the tree as HMMs and words. The HMM is essentially a unit within its context.
This is typically a triphone (although for some units, such as SIL, it is a simple phone). Representing the nodes as
HMMs instead of phonemes yields a much larger tree, but also has some advantages:
- Because of state-tying in the acoustic models, many distinct triphones actually share an HMM. Representing
the nodes as HMMs allows these shared HMMs to be represented in the tree only once instead of many times, as would
happen if we represented the nodes as phones or triphones. This leads to a reduction in the actual number of states
that are considered during a search. Experiments have shown that this can reduce the required beam by a factor of
2 or 3.
- By representing the nodes as HMMs, we avoid having to look up the HMM for a particular triphone during the search.
This is a modest savings.
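A sketch of the first advantage, assuming a made-up tying table that maps triphones (written as left-base+right) to
tied HMM identifiers. The triphone names and HMM ids are invented for illustration and do not come from any real
acoustic model.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class HmmSharingSketch {
    /** Hypothetical tying table: two of the three triphones share one HMM. */
    static Map<String, String> tying() {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("D-AO+G", "hmm_AO_17");
        table.put("D-AO+K", "hmm_AO_17");
        table.put("D-AO+L", "hmm_AO_42");
        return table;
    }

    /** Returns the tree nodes needed when nodes are HMMs rather than triphones. */
    static Set<String> hmmNodes(Iterable<String> triphones, Map<String, String> tying) {
        Set<String> nodes = new LinkedHashSet<>();
        for (String triphone : triphones) {
            nodes.add(tying.get(triphone)); // tied triphones collapse into one node
        }
        return nodes;
    }

    public static void main(String[] args) {
        Map<String, String> table = tying();
        System.out.println("triphones: " + table.keySet().size());
        System.out.println("HMM nodes: " + hmmNodes(table.keySet(), table));
    }
}
```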
There are some disadvantages in representing the tree with HMMs:
- size - since HMMs represent units in their context, we have many more copies of each node. For instance, instead
of having a single unit representing the initial 'd' in the word 'dog' we would have about 40 HMMs, one for each
possible left context.
- speed - building the much larger HMM tree can take much more time, since many more nodes are needed to represent
the tree.
- complexity - representing the tree with HMMs is more complex. There are multiple entry points for each word/unit
that have to be dealt with.
Luckily the size and speed issues can be mitigated (by adding a bit more complexity of course). The bulk of the nodes
in the HMM tree are the word ending nodes. There is a word ending node for each possible right context. To reduce
space, all of the word ending nodes are replaced by a single EndNode. During the search, the actual HMM nodes for a
particular EndNode are generated on request. These sets of HMM nodes can be shared among different word endings, and
therefore are cached. The effect of using this EndNode optimization is to reduce the space required by the tree by
about 300 MB and the time required to generate the tree from about 60 seconds to about 6 seconds.
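The generate-on-request-and-cache idea can be sketched as below. This is not the EndNode implementation; it is a
minimal illustration in which an ending is identified by a string key and its expansion is one entry per right
context. The key format and class name are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EndNodeSketch {
    private final List<String> rightContexts;
    private final Map<String, List<String>> cache = new HashMap<>();
    private int expansions = 0;

    EndNodeSketch(List<String> rightContexts) {
        this.rightContexts = rightContexts;
    }

    /** Expands a word ending into one node per right context, building it only once. */
    List<String> expand(String endKey) {
        return cache.computeIfAbsent(endKey, key -> {
            expansions++;
            List<String> nodes = new ArrayList<>();
            for (String rightContext : rightContexts) {
                nodes.add(key + "+" + rightContext);
            }
            return nodes;
        });
    }

    /** Number of times an expansion was actually built. */
    int expansionCount() {
        return expansions;
    }

    public static void main(String[] args) {
        EndNodeSketch ends = new EndNodeSketch(List.of("AA", "B", "SIL"));
        List<String> first = ends.expand("AO-G");  // requested when one word ends
        List<String> second = ends.expand("AO-G"); // another word with the same ending
        System.out.println(first);
        System.out.println(first == second);
        System.out.println(ends.expansionCount());
    }
}
```

The second request returns the cached list, so words that end the same way share one set of ending nodes and the
tree itself only has to store a single placeholder per ending.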
Field Summary |
static java.lang.String |
PROP_ACOUSTIC_MODEL
A sphinx property used to define the acoustic model to use when building the search graph |
static java.lang.String |
PROP_ADD_FILLER_WORDS
Property that controls whether filler words are automatically added to the vocabulary |
static java.lang.String |
PROP_CACHE_SIZE
A sphinx property that defines the size of the arc cache (zero to disable the cache). |
static java.lang.String |
PROP_DICTIONARY
Property that defines the dictionary to use for this grammar |
static java.lang.String |
PROP_FULL_WORD_HISTORIES
A sphinx property that determines whether or not full word histories are used to determine when two states are
equal. |
static java.lang.String |
PROP_GENERATE_UNIT_STATES
Property to control whether or not the linguist will generate unit states. |
static java.lang.String |
PROP_GRAMMAR
A sphinx property used to define the grammar to use when building the search graph |
static java.lang.String |
PROP_LANGUAGE_MODEL
A sphinx property for the language model to be used by this grammar |
static java.lang.String |
PROP_LOG_MATH
Sphinx property that defines the name of the logmath to be used by this search manager. |
static java.lang.String |
PROP_UNIGRAM_SMEAR_WEIGHT
A sphinx property that determines the weight of the smear |
static java.lang.String |
PROP_UNIT_MANAGER
A sphinx property used to define the unit manager to use when building the search graph |
static java.lang.String |
PROP_WANT_UNIGRAM_SMEAR
A sphinx property that determines whether or not unigram probabilities are smeared through the lex tree |
Methods inherited from class java.lang.Object |
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
PROP_GRAMMAR
@S4Component(type=Grammar.class)
public static final java.lang.String PROP_GRAMMAR
- A sphinx property used to define the grammar to use when building the search graph
- See Also:
- Constant Field Values
PROP_ACOUSTIC_MODEL
@S4Component(type=AcousticModel.class)
public static final java.lang.String PROP_ACOUSTIC_MODEL
- A sphinx property used to define the acoustic model to use when building the search graph
- See Also:
- Constant Field Values
PROP_UNIT_MANAGER
@S4Component(type=UnitManager.class,
defaultClass=UnitManager.class)
public static final java.lang.String PROP_UNIT_MANAGER
- A sphinx property used to define the unit manager to use when building the search graph
- See Also:
- Constant Field Values
PROP_LOG_MATH
@S4Component(type=LogMath.class)
public static final java.lang.String PROP_LOG_MATH
- Sphinx property that defines the name of the logmath to be used by this search manager.
- See Also:
- Constant Field Values
PROP_FULL_WORD_HISTORIES
@S4Boolean(defaultValue=true)
public static final java.lang.String PROP_FULL_WORD_HISTORIES
- A sphinx property that determines whether or not full word histories are used to determine when two states are
equal.
- See Also:
- Constant Field Values
PROP_LANGUAGE_MODEL
@S4Component(type=LanguageModel.class)
public static final java.lang.String PROP_LANGUAGE_MODEL
- A sphinx property for the language model to be used by this grammar
- See Also:
- Constant Field Values
PROP_DICTIONARY
@S4Component(type=Dictionary.class)
public static final java.lang.String PROP_DICTIONARY
- Property that defines the dictionary to use for this grammar
- See Also:
- Constant Field Values
PROP_CACHE_SIZE
@S4Integer(defaultValue=0)
public static final java.lang.String PROP_CACHE_SIZE
- A sphinx property that defines the size of the arc cache (zero to disable the cache).
- See Also:
- Constant Field Values
PROP_ADD_FILLER_WORDS
@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_ADD_FILLER_WORDS
- Property that controls whether filler words are automatically added to the vocabulary
- See Also:
- Constant Field Values
PROP_GENERATE_UNIT_STATES
@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_GENERATE_UNIT_STATES
- Property to control whether or not the linguist will generate unit states. When this property is false the
linguist may omit UnitSearchState states. For some search algorithms this will allow for a faster search with
more compact results.
- See Also:
- Constant Field Values
PROP_WANT_UNIGRAM_SMEAR
@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_WANT_UNIGRAM_SMEAR
- A sphinx property that determines whether or not unigram probabilities are smeared through the lex tree
- See Also:
- Constant Field Values
PROP_UNIGRAM_SMEAR_WEIGHT
@S4Double(defaultValue=1.0)
public static final java.lang.String PROP_UNIGRAM_SMEAR_WEIGHT
- A sphinx property that determines the weight of the smear
- See Also:
- Constant Field Values
LexTreeLinguist
public LexTreeLinguist()
newProperties
public void newProperties(PropertySheet ps)
throws PropertyException
- Description copied from interface:
Configurable
- This method is called when this configurable component needs to be reconfigured.
- Specified by:
newProperties
in interface Configurable
- Parameters:
ps
- a property sheet holding the new data
- Throws:
PropertyException
- if there is a problem with the properties.
allocate
public void allocate()
throws java.io.IOException
- Description copied from interface:
Linguist
- Allocates the linguist. Resources allocated by the linguist are allocated here. This method may take many seconds
to complete depending upon the linguist.
Implementor's Note - A well written linguist will allow allocate to be called multiple times without harm. This
will allow a linguist to be shared by multiple search managers.
- Specified by:
allocate
in interface Linguist
- Throws:
java.io.IOException
- if an IO error occurs
deallocate
public void deallocate()
- Description copied from interface:
Linguist
- Deallocates the linguist. Any resources allocated by this linguist are released.
Implementor's Note - if the linguist is being shared by multiple searches, the deallocate should only actually
deallocate things when the last call to deallocate is made. Two approaches for dealing with this:
(1) Keep an allocation counter that is incremented during allocate and decremented during deallocate. Only when
the counter reaches zero should the actual deallocation be performed.
(2) Do nothing in deallocate - just let the GC take care of things
- Specified by:
deallocate
in interface Linguist
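Approach (1) from the implementor's note can be sketched as a small reference counter. SharedResourceSketch is a
hypothetical class written for this illustration and says nothing about how LexTreeLinguist itself handles repeated
calls.

```java
final class SharedResourceSketch {
    private int allocations = 0;
    private boolean loaded = false;
    private int loads = 0;

    /** The first caller triggers the expensive load; later callers only bump the counter. */
    synchronized void allocate() {
        if (allocations++ == 0) {
            loaded = true; // stand-in for building the search space
            loads++;
        }
    }

    /** Only the last matching deallocate actually releases anything. */
    synchronized void deallocate() {
        if (allocations == 0) {
            return; // unmatched call, nothing to release
        }
        if (--allocations == 0) {
            loaded = false; // stand-in for dropping references
        }
    }

    synchronized boolean isLoaded() {
        return loaded;
    }

    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        SharedResourceSketch shared = new SharedResourceSketch();
        shared.allocate();   // first search manager
        shared.allocate();   // second search manager
        shared.deallocate(); // first one finishes
        System.out.println(shared.isLoaded());
        shared.deallocate(); // last one finishes
        System.out.println(shared.isLoaded());
    }
}
```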
getSearchGraph
public SearchGraph getSearchGraph()
- Description copied from interface:
Linguist
- Retrieves search graph. The search graph represents the search space to be used to guide the search.
Implementor's note: This method is typically called at the beginning of each recognition and therefore should be
- Specified by:
getSearchGraph
in interface Linguist
- Returns:
- the search graph
startRecognition
public void startRecognition()
- Called before a recognition
- Specified by:
startRecognition
in interface Linguist
stopRecognition
public void stopRecognition()
- Called after a recognition
- Specified by:
stopRecognition
in interface Linguist
getLanguageModel
public LanguageModel getLanguageModel()
- Retrieves the language model for this linguist
- Returns:
- the language model (or null if there is none)
getDictionary
public Dictionary getDictionary()