edu.cmu.sphinx.linguist
Interface Linguist

All Superinterfaces:
Configurable
All Known Implementing Classes:
DynamicFlatLinguist, FlatLinguist, LexTreeLinguist

public interface Linguist
extends Configurable

The linguist is responsible for representing and managing the search space for the decoder. The role of the linguist is to provide, upon request, the search graph that is to be used by the decoder. The linguist is a generic interface that provides language model services.

The main role of any linguist is to represent the search space for the decoder. The search space can be retrieved by a SearchManager via a call to getSearchGraph. This method returns a SearchGraph. The initial state in the search graph can be retrieved via a call to getInitialState. Successor states can be retrieved via calls to SearchState.getSuccessors(). There are a number of search state subinterfaces that are used to indicate different types of states in the search space:

A linguist has a great deal of latitude about the order in which it returns states. For instance, a 'flat' linguist may return a WordState at the beginning of a word, while a 'tree' linguist may return WordStates at the end of a word. Likewise, a linguist may omit certain state types completely (such as a unit state). Some SearchManagers may want to know a priori the order in which different state types will be generated by the linguist. The method SearchGraph.getNumStateOrder() can be used to retrieve the number of state types that will be returned by the linguist. The method SearchState.getOrder() returns the ranking for a particular state.
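One way a search manager might use these two values is to keep a separate list of states per state type. The following is a minimal sketch, not the Sphinx implementation: OrderedState and OrderBuckets are hypothetical stand-ins, and the sketch assumes that getOrder() values fall in the range 0 to getNumStateOrder() - 1, which the text above does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a search state that reports its rank (hypothetical).
interface OrderedState {
    int getOrder();
}

final class OrderBuckets {
    // Groups states into one list per order value.
    // Assumption: every order lies in 0 .. numStateOrder - 1.
    static List<List<OrderedState>> bucket(int numStateOrder, List<OrderedState> states) {
        List<List<OrderedState>> buckets = new ArrayList<>();
        for (int i = 0; i < numStateOrder; i++) {
            buckets.add(new ArrayList<>());
        }
        for (OrderedState s : states) {
            buckets.get(s.getOrder()).add(s);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<OrderedState> states = new ArrayList<>();
        states.add(() -> 0);
        states.add(() -> 2);
        states.add(() -> 0);
        List<List<OrderedState>> buckets = bucket(3, states);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("order " + i + ": " + buckets.get(i).size() + " state(s)");
        }
    }
}
```

Sizing the buckets up front is only possible because the number of state types is known before the search starts, which is the point of asking the graph for it a priori.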

Depending on the vocabulary size and topology, the search space represented by the linguist may include a very large number of states. Some linguists will generate the search states dynamically, that is, the object representing a particular state in the search space is not created until it is needed by the SearchManager. SearchManagers often need to be able to determine if a particular state has been entered before by comparing states. Because SearchStates may be generated dynamically, the SearchState.equals() call (as opposed to the reference-equality operator '==') should be used to determine if states are equal. The states returned by the linguist will generally provide very efficient implementations of equals and hashCode. This will allow a SearchManager to maintain collections of states in HashMaps efficiently.
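The difference between equals() and '==' for dynamically generated states can be illustrated with a small sketch. DynamicState and VisitedDemo are hypothetical, as are the nodeId and hmmPosition fields used to identify a state; a real linguist would choose its own identifying data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical dynamically generated state: a fresh object is created on
// every request, so identity (==) cannot be used to detect revisits.
final class DynamicState {
    private final int nodeId;      // assumed identifier of the graph node
    private final int hmmPosition; // assumed position within the node

    DynamicState(int nodeId, int hmmPosition) {
        this.nodeId = nodeId;
        this.hmmPosition = hmmPosition;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DynamicState)) return false;
        DynamicState other = (DynamicState) o;
        return nodeId == other.nodeId && hmmPosition == other.hmmPosition;
    }

    @Override
    public int hashCode() {
        return 31 * nodeId + hmmPosition;
    }
}

final class VisitedDemo {
    // Simulates a linguist that builds the state object on demand.
    static DynamicState generate(int nodeId, int hmmPosition) {
        return new DynamicState(nodeId, hmmPosition);
    }

    // Counts how often each logical state is entered, keyed by value equality.
    static Map<DynamicState, Integer> countVisits(int[][] path) {
        Map<DynamicState, Integer> visits = new HashMap<>();
        for (int[] step : path) {
            visits.merge(generate(step[0], step[1]), 1, Integer::sum);
        }
        return visits;
    }

    public static void main(String[] args) {
        DynamicState first = generate(3, 1);
        DynamicState second = generate(3, 1);
        System.out.println(first == second);      // false: distinct objects
        System.out.println(first.equals(second)); // true: same logical state
    }
}
```

Because hashCode is consistent with equals, the HashMap in countVisits treats both objects as the same key even though they were created separately.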

The lifecycle of a linguist is as follows:


Field Summary
static java.lang.String PROP_FILLER_INSERTION_PROBABILITY
          Filler insertion probability property
static java.lang.String PROP_LANGUAGE_WEIGHT
          Sphinx property that defines the language weight for the search
static java.lang.String PROP_SILENCE_INSERTION_PROBABILITY
          Silence insertion probability property
static java.lang.String PROP_UNIT_INSERTION_PROBABILITY
          Unit insertion probability property
static java.lang.String PROP_WORD_INSERTION_PROBABILITY
          Word insertion probability property
 
Method Summary
 void allocate()
          Allocates the linguist.
 void deallocate()
          Deallocates the linguist.
 SearchGraph getSearchGraph()
          Retrieves search graph.
 void startRecognition()
          Called before a recognition.
 void stopRecognition()
          Called after a recognition.
 
Methods inherited from interface edu.cmu.sphinx.util.props.Configurable
newProperties
 

Field Detail

PROP_WORD_INSERTION_PROBABILITY

@S4Double(defaultValue=1.0)
static final java.lang.String PROP_WORD_INSERTION_PROBABILITY
Word insertion probability property

See Also:
Constant Field Values

PROP_UNIT_INSERTION_PROBABILITY

@S4Double(defaultValue=1.0)
static final java.lang.String PROP_UNIT_INSERTION_PROBABILITY
Unit insertion probability property

See Also:
Constant Field Values

PROP_SILENCE_INSERTION_PROBABILITY

@S4Double(defaultValue=1.0)
static final java.lang.String PROP_SILENCE_INSERTION_PROBABILITY
Silence insertion probability property

See Also:
Constant Field Values

PROP_FILLER_INSERTION_PROBABILITY

@S4Double(defaultValue=1.0)
static final java.lang.String PROP_FILLER_INSERTION_PROBABILITY
Filler insertion probability property

See Also:
Constant Field Values

PROP_LANGUAGE_WEIGHT

@S4Double(defaultValue=1.0)
static final java.lang.String PROP_LANGUAGE_WEIGHT
Sphinx property that defines the language weight for the search

See Also:
Constant Field Values

Method Detail

getSearchGraph

SearchGraph getSearchGraph()
Retrieves search graph. The search graph represents the search space to be used to guide the search.

Implementor's note: This method is typically called at the beginning of each recognition and therefore should be

Returns:
the search graph
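
As an illustration of how a caller might explore the returned graph, the sketch below walks from the initial state through successor states. SimpleState, SimpleGraph, NamedState and GraphWalk are hypothetical stand-ins written for this example; in particular, getSuccessors() is assumed here to return the successor states directly, which may differ from the actual SearchState signature.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-ins for the search graph interfaces (hypothetical).
interface SimpleState {
    List<SimpleState> getSuccessors();
}

interface SimpleGraph {
    SimpleState getInitialState();
}

// Hypothetical state identified by a name.
final class NamedState implements SimpleState {
    private final String name;
    private final List<SimpleState> successors = new ArrayList<>();

    NamedState(String name) { this.name = name; }

    void addSuccessor(SimpleState s) { successors.add(s); }

    @Override public List<SimpleState> getSuccessors() { return successors; }
    @Override public String toString() { return name; }
}

final class GraphWalk {
    // Breadth-first walk from the initial state; returns states in visit order.
    static List<SimpleState> reachable(SimpleGraph graph) {
        List<SimpleState> order = new ArrayList<>();
        Set<SimpleState> seen = new HashSet<>();
        Deque<SimpleState> queue = new ArrayDeque<>();
        SimpleState start = graph.getInitialState();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            SimpleState s = queue.remove();
            order.add(s);
            for (SimpleState next : s.getSuccessors()) {
                if (seen.add(next)) {   // add() returns false for revisits
                    queue.add(next);
                }
            }
        }
        return order;
    }

    // Builds a three-state graph that contains a cycle between a and b.
    static SimpleGraph demoGraph() {
        NamedState start = new NamedState("start");
        NamedState a = new NamedState("a");
        NamedState b = new NamedState("b");
        start.addSuccessor(a);
        start.addSuccessor(b);
        a.addSuccessor(b);
        b.addSuccessor(a);
        return () -> start;
    }

    public static void main(String[] args) {
        System.out.println(reachable(demoGraph())); // prints [start, a, b]
    }
}
```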

startRecognition

void startRecognition()
Called before a recognition. This method gives a linguist the opportunity to prepare itself before a recognition begins.

Implementor's Note - Some linguists (or underlying language or acoustic models) may keep caches or pools that need to be initialized before a recognition. A linguist may implement this method to perform such initialization. Note, however, that an ideal linguist will, once allocated, be stateless. This will allow the linguist to be shared by multiple simultaneous searches. Reliance on 'startRecognition' may prevent a linguist from being used in a multi-threaded search.


stopRecognition

void stopRecognition()
Called after a recognition. This method gives a linguist the opportunity to clean up after a recognition has been completed.

Implementor's Note - Some linguists (or underlying language or acoustic models) may keep caches or pools that need to be flushed after a recognition. A linguist may implement this method to perform such flushing. Note, however, that an ideal linguist will, once allocated, be stateless. This will allow the linguist to be shared by multiple simultaneous searches. Reliance on 'stopRecognition' may prevent a linguist from being used in a multi-threaded search.
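
A caller that does use startRecognition and stopRecognition would typically want the two calls to stay paired. The following sketch shows one way to arrange that with try/finally. MiniLinguist, RecordingLinguist and LifecycleDemo are hypothetical; MiniLinguist only mirrors the five method names from the Method Summary, and the call order shown is an assumption based on the method descriptions rather than a documented requirement.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in mirroring the five methods in the Method Summary (hypothetical);
// getSearchGraph() returns Object to keep the sketch self-contained.
interface MiniLinguist {
    void allocate() throws IOException;
    void startRecognition();
    Object getSearchGraph();
    void stopRecognition();
    void deallocate();
}

// Hypothetical implementation that records the calls it receives.
final class RecordingLinguist implements MiniLinguist {
    final List<String> calls = new ArrayList<>();

    @Override public void allocate() { calls.add("allocate"); }
    @Override public void startRecognition() { calls.add("startRecognition"); }
    @Override public Object getSearchGraph() { calls.add("getSearchGraph"); return new Object(); }
    @Override public void stopRecognition() { calls.add("stopRecognition"); }
    @Override public void deallocate() { calls.add("deallocate"); }
}

final class LifecycleDemo {
    // Runs one recognition, pairing each start with a stop and each
    // allocate with a deallocate even if the search throws.
    static void recognizeOnce(MiniLinguist linguist) throws IOException {
        linguist.allocate();
        try {
            linguist.startRecognition();
            try {
                Object graph = linguist.getSearchGraph();
                // A search manager would walk the graph here.
            } finally {
                linguist.stopRecognition();
            }
        } finally {
            linguist.deallocate();
        }
    }

    public static void main(String[] args) throws IOException {
        RecordingLinguist linguist = new RecordingLinguist();
        recognizeOnce(linguist);
        System.out.println(linguist.calls);
    }
}
```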


allocate

void allocate()
              throws java.io.IOException
Allocates the linguist. Resources allocated by the linguist are allocated here. This method may take many seconds to complete depending upon the linguist.

Implementor's Note - A well written linguist will allow allocate to be called multiple times without harm. This will allow a linguist to be shared by multiple search managers.

Throws:
java.io.IOException - if an IO error occurs

deallocate

void deallocate()
Deallocates the linguist. Any resources allocated by this linguist are released.

Implementor's Note - If the linguist is being shared by multiple searches, deallocate should release resources only when the last call to deallocate is made. Two approaches for dealing with this:

(1) Keep an allocation counter that is incremented during allocate and decremented during deallocate. Only when the counter reaches zero should the actual deallocation be performed.

(2) Do nothing in deallocate - just let the GC take care of things.
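
Approach (1) can be sketched as follows. RefCountedResource is a hypothetical class, not part of Sphinx, and the int array merely stands in for whatever expensive structures a real linguist would build. Ignoring an unbalanced deallocate call is a design choice of this sketch.

```java
import java.io.IOException;

// Hypothetical linguist-like class that shares one allocation among callers.
final class RefCountedResource {
    private int allocationCount = 0;
    private int[] table; // stands in for an expensive search structure

    // Safe to call multiple times: the expensive work happens only once.
    synchronized void allocate() throws IOException {
        if (allocationCount == 0) {
            table = new int[1024];
        }
        allocationCount++;
    }

    // Releases the resource only when the last user deallocates.
    synchronized void deallocate() {
        if (allocationCount == 0) {
            return; // unbalanced call: nothing to release
        }
        allocationCount--;
        if (allocationCount == 0) {
            table = null;
        }
    }

    synchronized boolean isAllocated() {
        return table != null;
    }

    public static void main(String[] args) throws IOException {
        RefCountedResource shared = new RefCountedResource();
        shared.allocate();   // first search manager
        shared.allocate();   // second search manager
        shared.deallocate();
        System.out.println(shared.isAllocated()); // true: one user remains
        shared.deallocate();
        System.out.println(shared.isAllocated()); // false: last user released it
    }
}
```

The methods are synchronized so that the counter stays consistent when several search managers allocate and deallocate from different threads.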