edu.cmu.sphinx.linguist.language.grammar
Class FSTGrammar
java.lang.Object
edu.cmu.sphinx.linguist.language.grammar.Grammar
edu.cmu.sphinx.linguist.language.grammar.FSTGrammar
- All Implemented Interfaces:
- GrammarInterface, Configurable
public class FSTGrammar
- extends Grammar
Loads a grammar from a file that represents a finite-state transducer (FST) in the 'ARPA' grammar format. An example
of the ARPA FST format follows, with a key to the line types after it:
I 2
F 0 2.30259
T 0 1 <unknown> <unknown> 2.30259
T 0 4 wood wood 1.60951
T 0 5 cindy cindy 1.60951
T 0 6 pittsburgh pittsburgh 1.60951
T 0 7 jean jean 1.60951
F 1 2.89031
T 1 0 , , 0.587725
T 1 4 wood wood 0.58785
F 2 3.00808
T 2 0 , , 0.705491
T 2 1 <unknown> <unknown> 0.58785
F 3 2.30259
T 3 0
F 4 2.89031
T 4 0 , , 0.587725
T 4 6 pittsburgh pittsburgh 0.58785
F 5 2.89031
T 5 0 , , 0.587725
T 5 7 jean jean 0.58785
F 6 2.89031
T 6 0 , , 0.587725
T 6 5 cindy cindy 0.58785
F 7 1.28093
T 7 0 , , 0.454282
T 7 4 wood wood 1.28093
Key:
I - initial node, so "I 2" means node 2 is the initial node
F - final node, e.g., "F 0 2.30259" means that node 0 is a final node,
and the probability of finishing at node 0 is 2.30259 (in -ln, i.e., about 0.1)
T - transition, e.g., "T 0 4 wood wood 1.60951" means "transitioning from
node 0 to node 4, the output is wood and the machine is now
in the node wood, and the probability associated with the
transition is 1.60951 (in -ln, i.e., about 0.2)". "T 6 0 , , 0.587725" is
a backoff transition: the output is null (epsilon in
the picture), and the machine is now in the null node.
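The key above can be illustrated with a small line parser. This is a minimal sketch, not the parser FSTGrammar uses: the names FstLineParser, FstLine and outputsWord are hypothetical, and treating a bare line such as "T 3 0" as an unlabeled transition with cost 0 is an assumption, since the format description does not cover that case.

```java
class FstLineParser {

    /** One parsed line of the FST file (hypothetical holder type). */
    static final class FstLine {
        char kind;          // 'I', 'F' or 'T'
        int from = -1;      // node ID; the source node for a transition
        int to = -1;        // destination node ID, transitions only
        String input;       // input symbol, null if the line has none
        String output;      // output symbol, null if the line has none
        double negLnProb;   // value in -ln; 0.0 if the line omits it (assumption)

        /** True if the transition emits a word, i.e. is not a "," backoff or unlabeled. */
        boolean outputsWord() {
            return kind == 'T' && output != null && !output.equals(",");
        }
    }

    static FstLine parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("Blank line");
        }
        String[] tok = trimmed.split("\\s+");
        FstLine r = new FstLine();
        r.kind = tok[0].charAt(0);
        r.from = Integer.parseInt(tok[1]);
        switch (r.kind) {
            case 'I':                       // "I 2": initial node only
                break;
            case 'F':                       // "F 0 2.30259": final node and its cost
                if (tok.length > 2) {
                    r.negLnProb = Double.parseDouble(tok[2]);
                }
                break;
            case 'T':                       // "T 0 4 wood wood 1.60951"
                r.to = Integer.parseInt(tok[2]);
                if (tok.length >= 6) {
                    r.input = tok[3];
                    r.output = tok[4];
                    r.negLnProb = Double.parseDouble(tok[5]);
                }
                break;
            default:
                throw new IllegalArgumentException("Unknown line type: " + line);
        }
        return r;
    }

    public static void main(String[] args) {
        String[] samples = {"I 2", "F 0 2.30259", "T 0 4 wood wood 1.60951", "T 6 0 , , 0.587725"};
        for (String s : samples) {
            FstLine l = parse(s);
            System.out.println(l.kind + " from=" + l.from + " to=" + l.to
                    + " output=" + l.output + " word=" + l.outputsWord());
        }
    }
}
```

The outputsWord check reflects the key's distinction between word transitions and backoff transitions marked with ",".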
Probabilities read in from the FST file are in negative natural log format and are converted to the internal logMath
log base.
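The base conversion follows from the change-of-base rule: if v = -ln(p), then log_b(p) = -v / ln(b). The sketch below shows only the arithmetic; the class and method names are hypothetical, and the base 1.0001 is an illustrative assumption. In practice the conversion is performed by the configured logMath component with whatever base it defines.

```java
class NegLnConverter {

    /** Converts a value stored as -ln(p) into log_base(p). */
    static double negLnToLogBase(double negLn, double base) {
        return -negLn / Math.log(base);
    }

    /** Converts log_base(p) back to the linear probability p. */
    static double toLinear(double logValue, double base) {
        return Math.pow(base, logValue);
    }

    public static void main(String[] args) {
        double base = 1.0001;              // assumed base, for illustration only
        double fromFile = 2.30259;         // "F 0 2.30259" in the example above
        double internal = negLnToLogBase(fromFile, base);
        System.out.println("internal log value: " + internal);
        System.out.println("linear probability: " + toLinear(internal, base));
    }
}
```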
As the FST file is read in, a Grammar object that is structurally equivalent to the FST is created. The steps of
converting the FST file to a Grammar object are:
- Create all the Grammar nodes
Go through the entire FST file and for each word transition, take the
destination node ID and create a grammar node using that ID. These nodes are kept in a hashtable to make sure they
are created once for each ID. Therefore, we get one word per grammar node.
- Create an end node for each Grammar node
This end node is used for backoff transitions into the
Grammar node, so that they do not go through the word itself, but instead go directly to the end of the word.
An optional silence node is also added between the grammar node and its end node. The result of this
step on each grammar node (shown in Figure 1 below as the circle with "word") is as follows. The end node is the empty
circle at the far right:
Figure 1: Addition of end node and the optional silence.
- Create the transitions
Read through the entire FST file, and for each line indicating a transition,
connect up the corresponding Grammar nodes. Backoff transitions and null transitions (i.e., the ones that do not
output a word) will be linked to the end node of a grammar node.
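The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the FSTGrammar implementation: Node is a hypothetical stand-in rather than the Sphinx grammar node class, node IDs that no word transition reaches are given an empty node, transitions are assumed to leave from the end node of their source, and arcs inside a word (to the silence and end nodes) are given cost 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FstGrammarSketch {

    /** Hypothetical grammar node: a label plus outgoing arcs with -ln costs. */
    static final class Node {
        final String label;
        final List<Node> successors = new ArrayList<>();
        final List<Double> costs = new ArrayList<>();
        Node(String label) { this.label = label; }
        void addArc(Node next, double negLnCost) {
            successors.add(next);
            costs.add(negLnCost);
        }
    }

    final Map<Integer, Node> nodes = new HashMap<>();     // grammar node per FST node ID
    final Map<Integer, Node> endNodes = new HashMap<>();  // end node per FST node ID

    static boolean isWordTransition(String[] tok) {
        return tok[0].equals("T") && tok.length >= 6 && !tok[4].equals(",");
    }

    void build(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                rows.add(line.trim().split("\\s+"));
            }
        }
        // Step 1: one grammar node per destination ID of a word transition.
        for (String[] tok : rows) {
            if (isWordTransition(tok)) {
                nodes.putIfAbsent(Integer.parseInt(tok[2]), new Node(tok[4]));
            }
        }
        // Assumption: IDs never reached by a word transition get an empty node.
        for (String[] tok : rows) {
            nodes.putIfAbsent(Integer.parseInt(tok[1]), new Node(""));
            if (tok[0].equals("T")) {
                nodes.putIfAbsent(Integer.parseInt(tok[2]), new Node(""));
            }
        }
        // Step 2: an end node per grammar node, reachable directly or via optional silence.
        for (Map.Entry<Integer, Node> e : nodes.entrySet()) {
            Node end = new Node("");
            Node silence = new Node("<sil>");
            e.getValue().addArc(end, 0.0);
            e.getValue().addArc(silence, 0.0);
            silence.addArc(end, 0.0);
            endNodes.put(e.getKey(), end);
        }
        // Step 3: word transitions enter the word node; all others enter the end node.
        for (String[] tok : rows) {
            if (!tok[0].equals("T")) {
                continue;
            }
            int from = Integer.parseInt(tok[1]);
            int to = Integer.parseInt(tok[2]);
            double cost = tok.length >= 6 ? Double.parseDouble(tok[5]) : 0.0;
            Node target = isWordTransition(tok) ? nodes.get(to) : endNodes.get(to);
            endNodes.get(from).addArc(target, cost);
        }
    }

    public static void main(String[] args) {
        FstGrammarSketch g = new FstGrammarSketch();
        List<String> lines = new ArrayList<>();
        lines.add("I 2");
        lines.add("T 0 4 wood wood 1.60951");
        lines.add("T 4 0 , , 0.587725");
        g.build(lines);
        System.out.println("grammar nodes: " + g.nodes.size());
    }
}
```

Keeping the nodes in a map keyed by ID mirrors the hashtable mentioned in step 1, so each ID yields exactly one grammar node.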
Field Summary
static java.lang.String PROP_LOG_MATH
Sphinx property that defines the logMath component.
static java.lang.String PROP_PATH
The SphinxProperty for the location of the FST n-gram file.
Method Summary
void newProperties(PropertySheet ps)
This method is called when this configurable component needs to be reconfigured.
Methods inherited from class edu.cmu.sphinx.linguist.language.grammar.Grammar
allocate, deallocate, dumpGrammar, dumpRandomSentences, dumpRandomSentences, dumpStatistics, getDictionary, getGrammarNodes, getInitialNode, getNumNodes, getRandomSentence
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
PROP_PATH
@S4String(defaultValue="default.arpa_gram")
public static final java.lang.String PROP_PATH
- The SphinxProperty for the location of the FST n-gram file.
- See Also:
- Constant Field Values
PROP_LOG_MATH
@S4Component(type=LogMath.class)
public static final java.lang.String PROP_LOG_MATH
- Sphinx property that defines the logMath component.
- See Also:
- Constant Field Values
FSTGrammar
public FSTGrammar()
newProperties
public void newProperties(PropertySheet ps) throws PropertyException
- Description copied from interface: Configurable
- This method is called when this configurable component needs to be reconfigured.
- Specified by: newProperties in interface Configurable
- Overrides: newProperties in class Grammar
- Parameters: ps - a property sheet holding the new data
- Throws: PropertyException - if there is a problem with the properties.