edu.cmu.sphinx.linguist.language.ngram.large
Class LargeTrigramModel

java.lang.Object
  extended by edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel
All Implemented Interfaces:
LanguageModel, Configurable

public class LargeTrigramModel
extends java.lang.Object
implements LanguageModel

Queries a binary language model file generated by the CMU-Cambridge Statistical Language Modelling Toolkit.

Note that all probabilities in the grammar are stored in LogMath log base format. Language probabilities in the language model file are stored in log base 10. They are converted to the LogMath log base.
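
The base conversion described above follows from the change-of-base identity log_b(x) = log10(x) / log10(b). The following is a minimal sketch of that arithmetic only. The class name LogBaseConverter and its method are hypothetical and are not part of the Sphinx API; the target base is passed in because this page does not state which base LogMath is configured with.

```java
/** Hypothetical helper (not part of Sphinx): converts log10 values to another log base. */
final class LogBaseConverter {
    /** log10 of the target base, computed once. */
    private final double log10OfBase;

    LogBaseConverter(double targetBase) {
        if (targetBase <= 1.0) {
            throw new IllegalArgumentException("target base must be greater than 1");
        }
        this.log10OfBase = Math.log10(targetBase);
    }

    /** Applies log_b(x) = log10(x) / log10(b) to a value already in log10. */
    float fromLog10(float log10Value) {
        return (float) (log10Value / log10OfBase);
    }
}
```

For example, with a target base of 100, a stored log10 value of -2 becomes -1.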


Field Summary
static int BYTES_PER_BIGRAM
          The number of bytes per bigram in the LM file generated by the CMU-Cambridge Statistical Language Modelling Toolkit.
static int BYTES_PER_TRIGRAM
          The number of bytes per trigram in the LM file generated by the CMU-Cambridge Statistical Language Modelling Toolkit.
static java.lang.String PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP
          Sphinx property that controls whether or not the language model will apply the language weight and word insertion probability
static boolean PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP_DEFAULT
          The default value for PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP
static java.lang.String PROP_BIGRAM_CACHE_SIZE
          A sphinx property that defines the maximum number of bigrams to be cached.
static int PROP_BIGRAM_CACHE_SIZE_DEFAULT
          The default value for the PROP_BIGRAM_CACHE_SIZE property
static java.lang.String PROP_CLEAR_CACHES_AFTER_UTTERANCE
          A sphinx property that controls whether the bigram and trigram caches are cleared after every utterance
static boolean PROP_CLEAR_CACHES_AFTER_UTTERANCE_DEFAULT
          The default value for the PROP_CLEAR_CACHES_AFTER_UTTERANCE property
static java.lang.String PROP_FULL_SMEAR
          If true, use full bigram information to determine smear
static boolean PROP_FULL_SMEAR_DEFAULT
          Default value for PROP_FULL_SMEAR
static java.lang.String PROP_LANGUAGE_WEIGHT
          Sphinx property that defines the language weight for the search
static float PROP_LANGUAGE_WEIGHT_DEFAULT
          The default value for the PROP_LANGUAGE_WEIGHT property
static java.lang.String PROP_LOG_MATH
          Sphinx property that defines the logMath component.
static java.lang.String PROP_QUERY_LOG_FILE
          Sphinx property for the name of the file that logs all the queried N-grams.
static java.lang.String PROP_QUERY_LOG_FILE_DEFAULT
          The default value for PROP_QUERY_LOG_FILE.
static java.lang.String PROP_TRIGRAM_CACHE_SIZE
          A sphinx property that defines the maximum number of trigrams to be cached
static int PROP_TRIGRAM_CACHE_SIZE_DEFAULT
          The default value for the PROP_TRIGRAM_CACHE_SIZE property
static java.lang.String PROP_WORD_INSERTION_PROBABILITY
          Word insertion probability property
static double PROP_WORD_INSERTION_PROBABILITY_DEFAULT
          The default value for PROP_WORD_INSERTION_PROBABILITY
 
Fields inherited from interface edu.cmu.sphinx.linguist.language.ngram.LanguageModel
PROP_DICTIONARY, PROP_FORMAT, PROP_FORMAT_DEFAULT, PROP_LOCATION, PROP_LOCATION_DEFAULT, PROP_MAX_DEPTH, PROP_MAX_DEPTH_DEFAULT, PROP_UNIGRAM_WEIGHT, PROP_UNIGRAM_WEIGHT_DEFAULT
 
Constructor Summary
LargeTrigramModel()
           
 
Method Summary
 void allocate()
          Create the language model
 void deallocate()
          Deallocate resources allocated to this language model
 float getBackoff(WordSequence wordSequence)
          Returns the backoff probability for the given sequence of words
 int getBigramMisses()
          Returns the number of times a bigram was queried but not found in the LM (in which case the backoff probabilities are used).
 java.util.logging.Logger getLogger()
          Used for reporting errors and warnings during loading
 int getMaxDepth()
          Returns the maximum depth of the language model
 java.lang.String getName()
           
 float getProbability(WordSequence wordSequence)
          Gets the ngram probability of the word sequence represented by the word list
 float getSmear(WordSequence wordSequence)
          Gets the smear term for the given wordSequence
 float getSmearOld(WordSequence wordSequence)
          Gets the smear term for the given wordSequence
 int getTrigramHits()
          Returns the number of trigram hits.
 int getTrigramMisses()
          Returns the number of times a trigram was queried but not found in the LM (in which case the backoff probabilities are used).
 java.util.Set getVocabulary()
          Returns the set of words in the language model.
 int getWordID(Word word)
          Returns the ID of the given word.
 void newProperties(PropertySheet ps)
          This method is called when this configurable component needs to be reconfigured.
 void start()
          Called before a recognition
 void stop()
          Called after a recognition
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PROP_QUERY_LOG_FILE

@S4String(mandatory=false)
public static final java.lang.String PROP_QUERY_LOG_FILE
Sphinx property for the name of the file that logs all the queried N-grams. If this property is set to null, it means that the queried N-grams are not logged.

See Also:
Constant Field Values

PROP_QUERY_LOG_FILE_DEFAULT

public static final java.lang.String PROP_QUERY_LOG_FILE_DEFAULT
The default value for PROP_QUERY_LOG_FILE.


PROP_TRIGRAM_CACHE_SIZE

@S4Integer(defaultValue=100000)
public static final java.lang.String PROP_TRIGRAM_CACHE_SIZE
A sphinx property that defines the maximum number of trigrams to be cached

See Also:
Constant Field Values
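
One common way to cap a cache at a maximum number of entries is a least-recently-used map built on java.util.LinkedHashMap. The sketch below illustrates that general technique only. The class name BoundedCache is hypothetical, and this page does not say which eviction policy LargeTrigramModel applies once the limit is reached.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical size-bounded cache (not part of Sphinx); evicts the least recently used entry. */
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    /** Maximum number of entries kept, e.g. the value of a cache-size property. */
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // The third argument 'true' selects access order, so get() refreshes an entry.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insertion; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a limit of 2, inserting "a" and "b", reading "a", and then inserting "c" leaves "a" and "c" in the cache, because "b" is the least recently used entry.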

PROP_TRIGRAM_CACHE_SIZE_DEFAULT

public static final int PROP_TRIGRAM_CACHE_SIZE_DEFAULT
The default value for the PROP_TRIGRAM_CACHE_SIZE property

See Also:
Constant Field Values

PROP_BIGRAM_CACHE_SIZE

@S4Integer(defaultValue=50000)
public static final java.lang.String PROP_BIGRAM_CACHE_SIZE
A sphinx property that defines the maximum number of bigrams to be cached.

See Also:
Constant Field Values

PROP_BIGRAM_CACHE_SIZE_DEFAULT

public static final int PROP_BIGRAM_CACHE_SIZE_DEFAULT
The default value for the PROP_BIGRAM_CACHE_SIZE property

See Also:
Constant Field Values

PROP_CLEAR_CACHES_AFTER_UTTERANCE

@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_CLEAR_CACHES_AFTER_UTTERANCE
A sphinx property that controls whether the bigram and trigram caches are cleared after every utterance

See Also:
Constant Field Values

PROP_CLEAR_CACHES_AFTER_UTTERANCE_DEFAULT

public static final boolean PROP_CLEAR_CACHES_AFTER_UTTERANCE_DEFAULT
The default value for the PROP_CLEAR_CACHES_AFTER_UTTERANCE property

See Also:
Constant Field Values

PROP_LANGUAGE_WEIGHT

@S4Double(defaultValue=1.0)
public static final java.lang.String PROP_LANGUAGE_WEIGHT
Sphinx property that defines the language weight for the search

See Also:
Constant Field Values

PROP_LANGUAGE_WEIGHT_DEFAULT

public static final float PROP_LANGUAGE_WEIGHT_DEFAULT
The default value for the PROP_LANGUAGE_WEIGHT property

See Also:
Constant Field Values

PROP_LOG_MATH

@S4Component(type=LogMath.class)
public static final java.lang.String PROP_LOG_MATH
Sphinx property that defines the logMath component.

See Also:
Constant Field Values

PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP

@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP
Sphinx property that controls whether or not the language model will apply the language weight and word insertion probability

See Also:
Constant Field Values
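
This page does not give the formula used when the language weight and word insertion probability are applied. A convention common in speech decoders, shown here as an assumption rather than as this class's confirmed behavior, is to scale the log probability by the language weight and add the log of the word insertion probability. The class name WeightedScore and its method are hypothetical.

```java
/** Hypothetical sketch (not part of Sphinx) of a common weighting convention in the log domain. */
final class WeightedScore {
    private WeightedScore() {
    }

    /**
     * Scales a log probability by the language weight and adds the log of the
     * word insertion probability. All log values must share the same log base.
     */
    static float apply(float logProbability, float languageWeight,
                       float logWordInsertionProbability) {
        return logProbability * languageWeight + logWordInsertionProbability;
    }
}
```

Under this convention, a language weight of 1.0 and a word insertion probability of 1.0 (whose log is 0 in any base) leave the score unchanged.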

PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP_DEFAULT

public static final boolean PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP_DEFAULT
The default value for PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP

See Also:
Constant Field Values

PROP_WORD_INSERTION_PROBABILITY

@S4Double(defaultValue=1.0)
public static final java.lang.String PROP_WORD_INSERTION_PROBABILITY
Word insertion probability property

See Also:
Constant Field Values

PROP_WORD_INSERTION_PROBABILITY_DEFAULT

public static final double PROP_WORD_INSERTION_PROBABILITY_DEFAULT
The default value for PROP_WORD_INSERTION_PROBABILITY

See Also:
Constant Field Values

PROP_FULL_SMEAR

@S4Boolean(defaultValue=false)
public static final java.lang.String PROP_FULL_SMEAR
If true, use full bigram information to determine smear

See Also:
Constant Field Values

PROP_FULL_SMEAR_DEFAULT

public static final boolean PROP_FULL_SMEAR_DEFAULT
Default value for PROP_FULL_SMEAR

See Also:
Constant Field Values

BYTES_PER_BIGRAM

public static final int BYTES_PER_BIGRAM
The number of bytes per bigram in the LM file generated by the CMU-Cambridge Statistical Language Modelling Toolkit.

See Also:
Constant Field Values

BYTES_PER_TRIGRAM

public static final int BYTES_PER_TRIGRAM
The number of bytes per trigram in the LM file generated by the CMU-Cambridge Statistical Language Modelling Toolkit.

See Also:
Constant Field Values
Constructor Detail

LargeTrigramModel

public LargeTrigramModel()
Method Detail

getLogger

public java.util.logging.Logger getLogger()
Description copied from interface: LanguageModel
Used for reporting errors and warnings during loading

Specified by:
getLogger in interface LanguageModel
Returns:
the logger used by the LanguageModel

newProperties

public void newProperties(PropertySheet ps)
                   throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component needs to be reconfigured.

Specified by:
newProperties in interface Configurable
Parameters:
ps - a property sheet holding the new data
Throws:
PropertyException - if there is a problem with the properties.

getName

public java.lang.String getName()

allocate

public void allocate()
              throws java.io.IOException
Description copied from interface: LanguageModel
Create the language model

Specified by:
allocate in interface LanguageModel
Throws:
java.io.IOException

deallocate

public void deallocate()
Description copied from interface: LanguageModel
Deallocate resources allocated to this language model

Specified by:
deallocate in interface LanguageModel

start

public void start()
Called before a recognition

Specified by:
start in interface LanguageModel

stop

public void stop()
Called after a recognition

Specified by:
stop in interface LanguageModel

getProbability

public float getProbability(WordSequence wordSequence)
Gets the ngram probability of the word sequence represented by the word list

Specified by:
getProbability in interface LanguageModel
Parameters:
wordSequence - the word sequence
Returns:
the probability of the word sequence. Probability is in logMath log base

getWordID

public final int getWordID(Word word)
Returns the ID of the given word.

Parameters:
word - the word to find the ID
Returns:
the ID of the word

getSmearOld

public float getSmearOld(WordSequence wordSequence)
Gets the smear term for the given wordSequence

Parameters:
wordSequence - the word sequence
Returns:
the smear term associated with this word sequence

getSmear

public float getSmear(WordSequence wordSequence)
Description copied from interface: LanguageModel
Gets the smear term for the given wordSequence

Specified by:
getSmear in interface LanguageModel
Parameters:
wordSequence - the word sequence
Returns:
the smear term associated with this word sequence

getBackoff

public float getBackoff(WordSequence wordSequence)
Returns the backoff probability for the given sequence of words

Parameters:
wordSequence - the sequence of words
Returns:
the backoff probability in LogMath log base

getMaxDepth

public int getMaxDepth()
Returns the maximum depth of the language model

Specified by:
getMaxDepth in interface LanguageModel
Returns:
the maximum depth of the language model

getVocabulary

public java.util.Set getVocabulary()
Returns the set of words in the language model. The set is unmodifiable.

Specified by:
getVocabulary in interface LanguageModel
Returns:
the unmodifiable set of words

getBigramMisses

public int getBigramMisses()
Returns the number of times a bigram was queried but not found in the LM (in which case the backoff probabilities are used).

Returns:
the number of bigram misses

getTrigramMisses

public int getTrigramMisses()
Returns the number of times a trigram was queried but not found in the LM (in which case the backoff probabilities are used).

Returns:
the number of trigram misses
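
The hit and miss counters relate to back-off lookup: when an N-gram is missing, the score is built from a back-off weight and a shorter N-gram. The sketch below shows a generic Katz-style back-off in the log domain, where multiplication becomes addition. It is an illustration under stated assumptions, not the lookup code of this class: the class BackoffSketch, its string-keyed maps, and the rule that an absent back-off weight counts as 0 are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical Katz-style back-off lookup (not part of Sphinx); all scores are log values. */
final class BackoffSketch {
    final Map<String, Float> trigrams = new HashMap<>();        // "w1 w2 w3" -> log P(w3 | w1 w2)
    final Map<String, Float> bigrams = new HashMap<>();         // "w1 w2"    -> log P(w2 | w1)
    final Map<String, Float> unigrams = new HashMap<>();        // "w"        -> log P(w)
    final Map<String, Float> bigramBackoffs = new HashMap<>();  // "w1 w2"    -> log back-off weight
    final Map<String, Float> unigramBackoffs = new HashMap<>(); // "w"        -> log back-off weight

    int trigramHits;
    int trigramMisses;
    int bigramMisses;

    /** Returns the stored trigram score, or backs off to the bigram (w2, w3). */
    float trigram(String w1, String w2, String w3) {
        Float stored = trigrams.get(w1 + " " + w2 + " " + w3);
        if (stored != null) {
            trigramHits++;
            return stored;
        }
        trigramMisses++;
        // Assumption: a missing back-off weight contributes log(1) = 0.
        float backoff = bigramBackoffs.getOrDefault(w1 + " " + w2, 0f);
        return backoff + bigram(w2, w3);
    }

    /** Returns the stored bigram score, or backs off to the unigram w2. */
    float bigram(String w1, String w2) {
        Float stored = bigrams.get(w1 + " " + w2);
        if (stored != null) {
            return stored;
        }
        bigramMisses++;
        Float unigram = unigrams.get(w2);
        if (unigram == null) {
            throw new IllegalArgumentException("word not in vocabulary: " + w2);
        }
        return unigramBackoffs.getOrDefault(w1, 0f) + unigram;
    }
}
```

In this sketch, a trigram miss always costs one extra bigram lookup, and a bigram miss one extra unigram lookup, which is why a high miss count can indicate both a poorer model fit and more work per query.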

getTrigramHits

public int getTrigramHits()
Returns the number of trigram hits.

Returns:
the number of trigram hits