edu.cmu.sphinx.linguist.language.grammar
Class FSTGrammar

java.lang.Object
  extended by edu.cmu.sphinx.linguist.language.grammar.Grammar
      extended by edu.cmu.sphinx.linguist.language.grammar.FSTGrammar
All Implemented Interfaces:
GrammarInterface, Configurable

public class FSTGrammar
extends Grammar

Loads a grammar from a file that represents a finite-state transducer (FST) in the 'ARPA' grammar format. A sample file in the ARPA FST format follows; the key below it explains each line type:

  I 2
  F 0 2.30259
  T 0 1 <unknown> <unknown> 2.30259
  T 0 4 wood wood 1.60951
  T 0 5 cindy cindy 1.60951
  T 0 6 pittsburgh pittsburgh 1.60951
  T 0 7 jean jean 1.60951
  F 1 2.89031
  T 1 0 , , 0.587725
  T 1 4 wood wood 0.58785
  F 2 3.00808
  T 2 0 , , 0.705491
  T 2 1 <unknown> <unknown> 0.58785
  F 3 2.30259
  T 3 0
  F 4 2.89031
  T 4 0 , , 0.587725
  T 4 6 pittsburgh pittsburgh 0.58785
  F 5 2.89031
  T 5 0 , , 0.587725
  T 5 7 jean jean 0.58785
  F 6 2.89031
  T 6 0 , , 0.587725
  T 6 5 cindy cindy 0.58785
  F 7 1.28093
  T 7 0 , , 0.454282
  T 7 4 wood wood 1.28093
   

Key:

  I - initial node, e.g., "I 2" means that node 2 is the initial node.
  F - final node, e.g., "F 0 2.30259" means that node 0 is a final node
      and that the probability of finishing at node 0 has a negative
      natural log (-ln) of 2.30259.
  T - transition, e.g., "T 0 4 wood wood 1.60951" means that the machine
      moves from node 0 to node 4, outputs "wood", is now in the node
      for "wood", and that the transition probability is 1.60951 (in -ln).
      "T 6 0 , , 0.587725" is a backoff transition: the output is null
      (epsilon in the picture) and the machine is now in the null node.
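
As an illustration of the key above, the following is a minimal sketch of how one line of the file could be split into its fields. The class FstLineSketch and its members are hypothetical names and are not part of Sphinx-4. The sketch assumes that both symbol columns of a T line carry the same symbol (as in the sample) and reads only the first one, and it tolerates a bare transition such as "T 3 0".

```java
/** Hypothetical helper, not part of Sphinx-4: splits one ARPA FST line into fields. */
public class FstLineSketch {

    /** One parsed line. Fields that a line does not carry are -1, null, or NaN. */
    public static final class FstLine {
        public final char kind;      // 'I', 'F', or 'T'
        public final int from;       // node ID (for 'T': the source node)
        public final int to;         // for 'T': the destination node, otherwise -1
        public final String symbol;  // first symbol column of a 'T' line, or null
        public final double negLn;   // weight as -ln(probability), or NaN if absent

        public FstLine(char kind, int from, int to, String symbol, double negLn) {
            this.kind = kind;
            this.from = from;
            this.to = to;
            this.symbol = symbol;
            this.negLn = negLn;
        }

        /** In the sample, a transition whose symbol is "," is a backoff transition. */
        public boolean isBackoff() {
            return kind == 'T' && ",".equals(symbol);
        }
    }

    public static FstLine parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 2 || f[0].length() != 1) {
            throw new IllegalArgumentException("Not an FST line: " + line);
        }
        char kind = f[0].charAt(0);
        int from = Integer.parseInt(f[1]);
        if (kind == 'I') {
            return new FstLine(kind, from, -1, null, Double.NaN);
        }
        if (kind == 'F') {
            double w = f.length > 2 ? Double.parseDouble(f[2]) : Double.NaN;
            return new FstLine(kind, from, -1, null, w);
        }
        if (kind == 'T' && f.length >= 3) {
            int to = Integer.parseInt(f[2]);
            // "T 3 0" in the sample has neither symbols nor a weight.
            String symbol = f.length > 3 ? f[3] : null;
            double w = f.length > 5 ? Double.parseDouble(f[5]) : Double.NaN;
            return new FstLine(kind, from, to, symbol, w);
        }
        throw new IllegalArgumentException("Unknown line type: " + line);
    }

    public static void main(String[] args) {
        FstLine t = parse("  T 0 4 wood wood 1.60951");
        System.out.println(t.from + " -> " + t.to + " [" + t.symbol + "] " + t.negLn);
        System.out.println(parse("T 6 0 , , 0.587725").isBackoff());
    }
}
```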
   

Probabilities read in from the FST file are in negative natural log format and are converted to the internal logMath log base.
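
The conversion described above is a change of log base. The sketch below shows the arithmetic only and does not call the Sphinx-4 LogMath component; the class name NegLnSketch and the base value 1.0001 are assumptions chosen for illustration, since the actual base comes from the configured logMath.

```java
/** Hypothetical sketch, not part of Sphinx-4: converts -ln weights to another log base. */
public class NegLnSketch {

    /** Assumed internal log base, for illustration only. */
    static final double LOG_BASE = 1.0001;

    /** Converts a weight stored as -ln(p) into log_base(p). */
    public static double negLnToLogBase(double negLn, double base) {
        double ln = -negLn;          // undo the negation to get ln(p)
        return ln / Math.log(base);  // change of base: log_b(p) = ln(p) / ln(b)
    }

    /** Recovers the linear probability p from -ln(p). */
    public static double toLinear(double negLn) {
        return Math.exp(-negLn);
    }

    public static void main(String[] args) {
        double weight = 2.30259;  // from "F 0 2.30259" in the sample
        // About 0.1, because ln(10) is about 2.302585.
        System.out.println(toLinear(weight));
        System.out.println(negLnToLogBase(weight, LOG_BASE));
    }
}
```

A small base such as the one assumed here spreads log values over a wide numeric range; whatever base is used, raising it to the converted value yields the original linear probability.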

As the FST file is read in, a Grammar object that is structurally equivalent to the FST is created. The steps of converting the FST file to a Grammar object are:

  1. Create all the Grammar nodes
    Go through the entire FST file and for each word transition, take the destination node ID and create a grammar node using that ID. These nodes are kept in a hashtable to make sure they are created once for each ID. Therefore, we get one word per grammar node.

  2. Create an end node for each Grammar node
    This end node is used for backoff transitions into the Grammar node, so that they do not pass through the word itself but go directly to the end of the word. An optional silence node is also added between the grammar node and its end node. The result of this step for each grammar node (shown in Figure 1 below as the circle with "word") is as follows. The end node is the empty circle at the far right:

    Figure 1: Addition of end node and the optional silence.

  3. Create the transitions
    Read through the entire FST file, and for each line indicating a transition, connect up the corresponding Grammar nodes. Backoff transitions and null transitions (i.e., the ones that do not output a word) will be linked to the end node of a grammar node.
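
The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the FSTGrammar implementation: Node is a hypothetical stand-in for a grammar node, IDs that no word transition reaches get an empty node, the silence links carry a weight of 0 (probability 1 in -ln), and every arc is assumed to leave from the end node of its source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the three build steps; these are not the Sphinx-4 classes. */
public class FstBuildSketch {

    /** Minimal stand-in for a grammar node. */
    public static final class Node {
        public final String label;  // a word, "<sil>", or "" for an empty node
        public final List<Node> arcs = new ArrayList<Node>();
        public final List<Double> negLn = new ArrayList<Double>();
        public Node end;            // set in step 2 for nodes that have an ID

        Node(String label) { this.label = label; }

        void link(Node to, double weight) {
            arcs.add(to);
            negLn.add(weight);
        }
    }

    public static Map<Integer, Node> build(List<String> lines) {
        List<String[]> rows = new ArrayList<String[]>();
        for (String line : lines) {
            String t = line.trim();
            if (!t.isEmpty()) rows.add(t.split("\\s+"));
        }
        Map<Integer, Node> nodes = new LinkedHashMap<Integer, Node>();

        // Step 1: one node per destination ID of a word transition.
        for (String[] f : rows) {
            if (f[0].equals("T") && f.length > 3 && !f[3].equals(",")) {
                int to = Integer.parseInt(f[2]);
                if (!nodes.containsKey(to)) nodes.put(to, new Node(f[3]));
            }
        }
        // Assumption: any other ID that appears in the file gets an empty node.
        for (String[] f : rows) {
            int idCount = f[0].equals("T") ? 2 : 1;
            for (int i = 1; i <= idCount; i++) {
                int id = Integer.parseInt(f[i]);
                if (!nodes.containsKey(id)) nodes.put(id, new Node(""));
            }
        }

        // Step 2: an end node per node, with optional silence in between.
        for (Node n : nodes.values()) {
            Node sil = new Node("<sil>");
            n.end = new Node("");
            n.link(n.end, 0.0);   // path that skips the silence
            n.link(sil, 0.0);     // path through the silence
            sil.link(n.end, 0.0);
        }

        // Step 3: word arcs enter the node; backoff and null arcs enter its end node.
        for (String[] f : rows) {
            if (!f[0].equals("T")) continue;
            Node from = nodes.get(Integer.parseInt(f[1]));
            Node to = nodes.get(Integer.parseInt(f[2]));
            boolean wordArc = f.length > 3 && !f[3].equals(",");
            double weight = f.length > 5 ? Double.parseDouble(f[5]) : 0.0;
            from.end.link(wordArc ? to : to.end, weight);
        }
        return nodes;
    }
}
```

Keying the map by node ID mirrors the hashtable mentioned in step 1: a node is created the first time its ID is seen and reused afterwards.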


Field Summary
static java.lang.String PROP_LOG_MATH
          Sphinx property that defines the logMath component.
static java.lang.String PROP_PATH
          The SphinxProperty for the location of the FST n-gram file.
 
Fields inherited from class edu.cmu.sphinx.linguist.language.grammar.Grammar
PROP_ADD_FILLER_WORDS, PROP_ADD_SIL_WORDS, PROP_DICTIONARY, PROP_OPTIMIZE_GRAMMAR, PROP_SHOW_GRAMMAR
 
Constructor Summary
FSTGrammar()
           
 
Method Summary
 void newProperties(PropertySheet ps)
          This method is called when this configurable component needs to be reconfigured.
 
Methods inherited from class edu.cmu.sphinx.linguist.language.grammar.Grammar
allocate, deallocate, dumpGrammar, dumpRandomSentences, dumpRandomSentences, dumpStatistics, getDictionary, getGrammarNodes, getInitialNode, getNumNodes, getRandomSentence
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PROP_PATH

@S4String(defaultValue="default.arpa_gram")
public static final java.lang.String PROP_PATH
The SphinxProperty for the location of the FST n-gram file.

See Also:
Constant Field Values

PROP_LOG_MATH

@S4Component(type=LogMath.class)
public static final java.lang.String PROP_LOG_MATH
Sphinx property that defines the logMath component.

See Also:
Constant Field Values
Constructor Detail

FSTGrammar

public FSTGrammar()
Method Detail

newProperties

public void newProperties(PropertySheet ps)
                   throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component needs to be reconfigured.

Specified by:
newProperties in interface Configurable
Overrides:
newProperties in class Grammar
Parameters:
ps - a property sheet holding the new data
Throws:
PropertyException - if there is a problem with the properties.