edu.cmu.sphinx.linguist.language.ngram.large
Class BinaryLoader

java.lang.Object
  extended by edu.cmu.sphinx.linguist.language.ngram.large.BinaryLoader

public class BinaryLoader
extends java.lang.Object

Reads a binary language model file generated by the CMU-Cambridge Statistical Language Modelling Toolkit.

Note that all probabilities in the grammar are stored in the LogMath log base. Language probabilities in the language model file are stored in log base 10 and are converted to the LogMath log base.
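
The base conversion described above follows the change-of-base identity log_b(x) = log_10(x) / log_10(b). The following is a minimal standalone sketch of that arithmetic, not the loader's actual code: the class name `LogBaseConversion`, the method `log10ToLogBase`, and the base value 1.0001 are assumptions chosen for illustration, and the real conversion is performed through the `LogMath` instance passed to the constructor.

```java
// Illustrative only: mirrors the log10 -> target-base conversion in plain Java.
final class LogBaseConversion {

    /**
     * Converts a base-10 logarithm to a logarithm in the given base,
     * using log_b(x) = log_10(x) / log_10(b).
     */
    static float log10ToLogBase(float log10Value, double base) {
        return (float) (log10Value / Math.log10(base));
    }

    public static void main(String[] args) {
        double base = 1.0001;      // assumed log base, for illustration only
        float log10Prob = -2.0f;   // log10 of a probability of 0.01

        float converted = log10ToLogBase(log10Prob, base);

        // Raising the base to the converted value recovers the linear probability.
        double recovered = Math.pow(base, converted);
        System.out.println("converted = " + converted + ", recovered = " + recovered);
    }
}
```

A base close to 1 spreads log values over a wide numeric range, so the converted value is much larger in magnitude than the log10 value it came from.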


Constructor Summary
BinaryLoader(java.lang.String format, java.io.File location, boolean applyLanguageWeightAndWip, LogMath logMath, float languageWeight, double wip, float unigramWeight)
          Initializes the binary loader
 
Method Summary
 boolean getBigEndian()
          Returns true if the loaded file is in big-endian byte order.
 int getBigramOffset()
          Returns the location (or offset) into the file where bigrams start.
 float[] getBigramProbabilities()
          Returns all the bigram probabilities.
 int getLogBigramSegmentSize()
          Returns the log of the bigram segment size
 int getMaxDepth()
          Returns the maximum depth of the language model
 int getNumberBigrams()
          Returns the number of bigrams
 int getNumberTrigrams()
          Returns the number of trigrams
 int getNumberUnigrams()
          Returns the number of unigrams
 float[] getTrigramBackoffWeights()
          Returns all the trigram backoff weights
 int getTrigramOffset()
          Returns the location (or offset) into the file where trigrams start.
 float[] getTrigramProbabilities()
          Returns all the trigram probabilities.
 int[] getTrigramSegments()
          Returns the trigram segment table.
 edu.cmu.sphinx.linguist.language.ngram.large.UnigramProbability[] getUnigrams()
          Returns all the unigrams
 java.lang.String[] getWords()
          Returns all the words.
 byte[] loadBuffer(long position, int size)
          Loads the contents of the memory-mapped file starting at the given position and for the given size, into a byte buffer.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BinaryLoader

public BinaryLoader(java.lang.String format,
                    java.io.File location,
                    boolean applyLanguageWeightAndWip,
                    LogMath logMath,
                    float languageWeight,
                    double wip,
                    float unigramWeight)
             throws java.io.IOException
Initializes the binary loader

Parameters:
format - the file format
location - the location of the model
applyLanguageWeightAndWip - if true, apply the language weight and the word insertion probability
logMath - the LogMath to use
languageWeight - the language weight
wip - the word insertion probability
unigramWeight - the unigram weight
Throws:
java.io.IOException - if an I/O error occurs
Method Detail

getNumberUnigrams

public int getNumberUnigrams()
Returns the number of unigrams

Returns:
the number of unigrams

getNumberBigrams

public int getNumberBigrams()
Returns the number of bigrams

Returns:
the number of bigrams

getNumberTrigrams

public int getNumberTrigrams()
Returns the number of trigrams

Returns:
the number of trigrams

getUnigrams

public edu.cmu.sphinx.linguist.language.ngram.large.UnigramProbability[] getUnigrams()
Returns all the unigrams

Returns:
all the unigrams

getBigramProbabilities

public float[] getBigramProbabilities()
Returns all the bigram probabilities.

Returns:
all the bigram probabilities

getTrigramProbabilities

public float[] getTrigramProbabilities()
Returns all the trigram probabilities.

Returns:
all the trigram probabilities

getTrigramBackoffWeights

public float[] getTrigramBackoffWeights()
Returns all the trigram backoff weights

Returns:
all the trigram backoff weights

getTrigramSegments

public int[] getTrigramSegments()
Returns the trigram segment table.

Returns:
the trigram segment table

getLogBigramSegmentSize

public int getLogBigramSegmentSize()
Returns the log of the bigram segment size

Returns:
the log of the bigram segment size
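
As a rough illustration of how a log segment size is typically used together with a segment table, the sketch below assumes the logarithm is base 2, so that the segment size is a power of two and a bigram's segment can be found with a right shift. The class `SegmentLookup`, its methods, and the idea that a per-bigram relative offset is added to the segment's base entry are all assumptions made for this example; they are not taken from the loader's source.

```java
// Illustrative only: power-of-two segment arithmetic under the stated assumptions.
final class SegmentLookup {

    /** Number of bigrams per segment, assuming the stored value is log base 2. */
    static int segmentSize(int logSegmentSize) {
        return 1 << logSegmentSize;
    }

    /** Index of the segment that contains the given bigram. */
    static int segmentIndex(int bigramIndex, int logSegmentSize) {
        return bigramIndex >> logSegmentSize;
    }

    /**
     * Hypothetical absolute trigram index: the segment's base entry from the
     * segment table plus a small offset stored relative to that base.
     */
    static int firstTrigram(int[] trigramSegments, int bigramIndex,
                            int relativeOffset, int logSegmentSize) {
        return trigramSegments[segmentIndex(bigramIndex, logSegmentSize)] + relativeOffset;
    }

    public static void main(String[] args) {
        int logSize = 9;                    // example value: 2^9 = 512 bigrams per segment
        int[] segments = {0, 70000};        // made-up segment base entries
        System.out.println(segmentSize(logSize));                       // prints 512
        System.out.println(segmentIndex(1000, logSize));                // prints 1
        System.out.println(firstTrigram(segments, 1000, 25, logSize));  // prints 70025
    }
}
```

Storing offsets relative to a per-segment base is a common way to keep per-entry indices small while still addressing a large table.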

getWords

public java.lang.String[] getWords()
Returns all the words.

Returns:
all the words

getBigramOffset

public int getBigramOffset()
Returns the location (or offset) into the file where bigrams start.

Returns:
the location of the bigrams

getTrigramOffset

public int getTrigramOffset()
Returns the location (or offset) into the file where trigrams start.

Returns:
the location of the trigrams

getMaxDepth

public int getMaxDepth()
Returns the maximum depth of the language model

Returns:
the maximum depth of the language model

getBigEndian

public boolean getBigEndian()
Returns true if the loaded file is in big-endian byte order.

Returns:
true if the loaded file is big-endian
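
A caller that decodes raw bytes from the model file needs to honor the byte order reported here. The sketch below shows one way to do that with the standard `java.nio.ByteBuffer` API; the class `EndianDecode` and the method `readInt` are hypothetical names for this example, and the `bigEndian` argument stands in for the value returned by `getBigEndian()`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: decoding a 32-bit integer in a caller-selected byte order.
final class EndianDecode {

    /** Reads a 4-byte int at the given offset using the requested byte order. */
    static int readInt(byte[] bytes, int offset, boolean bigEndian) {
        ByteOrder order = bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;
        return ByteBuffer.wrap(bytes).order(order).getInt(offset);
    }

    public static void main(String[] args) {
        byte[] raw = {0, 0, 1, 0};
        System.out.println(readInt(raw, 0, true));   // prints 256
        System.out.println(readInt(raw, 0, false));  // prints 65536
    }
}
```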

loadBuffer

public byte[] loadBuffer(long position,
                         int size)
                  throws java.io.IOException
Loads the contents of the memory-mapped file starting at the given position and for the given size, into a byte buffer. This method is implemented because MappedByteBuffer.load() does not work properly.

Parameters:
position - the starting position in the file
size - the number of bytes to load
Returns:
the loaded bytes, as a byte array
Throws:
java.io.IOException - if an I/O error occurs
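
For readers who want to see what a positional read of this kind can look like, here is a minimal sketch that copies `size` bytes starting at `position` into a byte array using `FileChannel`. It is not the loader's implementation: the class `PositionalRead` and the method `readAt` are hypothetical, and the sketch reads through the channel directly rather than through a memory-mapped buffer.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Illustrative only: read a fixed-size slice of a file at an absolute position.
final class PositionalRead {

    /** Reads exactly size bytes starting at position, or throws EOFException. */
    static byte[] readAt(Path file, long position, int size) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(size);
            long offset = position;
            // A single read may return fewer bytes than requested, so loop until full.
            while (buffer.hasRemaining()) {
                int read = channel.read(buffer, offset);
                if (read < 0) {
                    throw new EOFException("File ended before " + size + " bytes were read");
                }
                offset += read;
            }
            return buffer.array();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("positional-read-demo", ".bin");
        try {
            Files.write(tmp, new byte[] {10, 20, 30, 40, 50});
            byte[] slice = readAt(tmp, 1, 3);
            System.out.println(Arrays.toString(slice));  // prints [20, 30, 40]
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The bytes returned by such a read are raw file contents, so any multi-byte values in them still need to be decoded in the file's byte order.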