edu.cmu.sphinx.frontend.endpoint
Class SpeechMarker

java.lang.Object
  extended by edu.cmu.sphinx.frontend.BaseDataProcessor
      extended by edu.cmu.sphinx.frontend.endpoint.SpeechMarker
All Implemented Interfaces:
DataProcessor, Configurable

public class SpeechMarker
extends BaseDataProcessor

Converts a stream of SpeechClassifiedData objects, each marked as speech or non-speech, into a stream in which the regions considered speech are marked out. This is done by inserting SPEECH_START and SPEECH_END signals into the stream.

The algorithm for inserting the two signals is as follows.

The algorithm is always in one of two states: 'in-speech' or 'out-of-speech'. In the 'out-of-speech' state, it reads audio until it encounters audio classified as speech. Once it has read more than 'startSpeech' of continuous speech, it considers speech to have started and inserts a SPEECH_START signal 'speechLeader' before the point where the speech first started. The state then changes to 'in-speech'.

Now consider the 'in-speech' state. If the algorithm reads audio that is speech, that audio is output. If the audio is non-speech, the algorithm reads ahead until it has 'endSilence' of continuous non-speech, at which point it considers speech to have ended. A SPEECH_END signal is inserted 'speechTrailer' after the first non-speech audio, and the algorithm returns to the 'out-of-speech' state. If any speech audio is encountered before 'endSilence' is reached, the count of non-speech starts over.
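
The two-state algorithm described above can be sketched roughly as follows. This is a simplified illustration, not the SpeechMarker source: the class name SpeechMarkerSketch, the mark method, the boolean-per-frame input, and the fixed frame length are all assumptions made for the example. The constants mirror the documented defaults (200, 500, 100, 100 ms). The sketch reports only where each signal would be placed, as a time offset in milliseconds, and it reads "more than 'startSpeech'" as a strict comparison and "'endSilence' amount" as an inclusive one, which is an interpretation of the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the endpointing state machine; not the real SpeechMarker.
final class SpeechMarkerSketch {
    // Values taken from the documented property defaults (milliseconds).
    static final int START_SPEECH_MS = 200;
    static final int END_SILENCE_MS = 500;
    static final int SPEECH_LEADER_MS = 100;
    static final int SPEECH_TRAILER_MS = 100;

    /**
     * Takes one speech/non-speech flag per fixed-length frame and returns
     * marker descriptions such as "SPEECH_START@100" (offset in ms).
     */
    static List<String> mark(boolean[] isSpeech, int frameMs) {
        List<String> markers = new ArrayList<>();
        boolean inSpeech = false;
        // Start of the current speech run (out-of-speech state) or of the
        // current non-speech run (in-speech state); -1 means no run is open.
        int runStartMs = -1;

        for (int i = 0; i < isSpeech.length; i++) {
            int frameStartMs = i * frameMs;
            int frameEndMs = frameStartMs + frameMs;

            if (!inSpeech) {
                if (!isSpeech[i]) {
                    runStartMs = -1;            // speech run broken, start over
                    continue;
                }
                if (runStartMs < 0) {
                    runStartMs = frameStartMs;  // speech first started here
                }
                if (frameEndMs - runStartMs > START_SPEECH_MS) {
                    // Place the start marker speechLeader before the run began.
                    markers.add("SPEECH_START@" + Math.max(0, runStartMs - SPEECH_LEADER_MS));
                    inSpeech = true;
                    runStartMs = -1;
                }
            } else {
                if (isSpeech[i]) {
                    runStartMs = -1;            // speech resumed, restart the silence count
                    continue;
                }
                if (runStartMs < 0) {
                    runStartMs = frameStartMs;  // first non-speech audio
                }
                if (frameEndMs - runStartMs >= END_SILENCE_MS) {
                    // Place the end marker speechTrailer after the first non-speech audio.
                    markers.add("SPEECH_END@" + (runStartMs + SPEECH_TRAILER_MS));
                    inSpeech = false;
                    runStartMs = -1;
                }
            }
        }
        return markers;
    }

    public static void main(String[] args) {
        // 100 ms frames: 200 ms silence, 300 ms speech, 500 ms silence.
        boolean[] frames = {false, false, true, true, true,
                            false, false, false, false, false};
        System.out.println(mark(frames, 100)); // prints [SPEECH_START@100, SPEECH_END@600]
    }
}
```

In this example speech first starts at 200 ms, so the start marker lands at 100 ms (200 minus the 100 ms leader); the first non-speech audio is at 500 ms, so the end marker lands at 600 ms (500 plus the 100 ms trailer). A speech run of exactly 200 ms would not trigger a start marker under the strict comparison assumed here.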


Field Summary
static java.lang.String PROP_END_SILENCE
          The SphinxProperty for the amount of time in silence (in milliseconds) to be considered as utterance end.
static java.lang.String PROP_SPEECH_LEADER
          The SphinxProperty for the amount of time (in milliseconds) before speech start to be included as speech data.
static java.lang.String PROP_SPEECH_TRAILER
          The SphinxProperty for the amount of time (in milliseconds) after speech ends to be included as speech data.
static java.lang.String PROP_START_SPEECH
          The Sphinx4 property for the minimum amount of time in speech (in milliseconds) to be considered as utterance start.
 
Constructor Summary
SpeechMarker()
           
 
Method Summary
 int getAudioTime(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData audio)
          Returns the amount of audio data in milliseconds in the given SpeechClassifiedData object.
 Data getData()
          Returns the next Data object.
 void initialize()
          Initializes this SpeechMarker.
 void newProperties(PropertySheet ps)
          This method is called when this configurable component needs to be reconfigured.
 
Methods inherited from class edu.cmu.sphinx.frontend.BaseDataProcessor
getPredecessor, getTimer, setPredecessor, toString
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PROP_START_SPEECH

@S4Integer(defaultValue=200)
public static final java.lang.String PROP_START_SPEECH
The Sphinx4 property for the minimum amount of time in speech (in milliseconds) to be considered as utterance start.

See Also:
Constant Field Values

PROP_END_SILENCE

@S4Integer(defaultValue=500)
public static final java.lang.String PROP_END_SILENCE
The SphinxProperty for the amount of time in silence (in milliseconds) to be considered as utterance end.

See Also:
Constant Field Values

PROP_SPEECH_LEADER

@S4Integer(defaultValue=100)
public static final java.lang.String PROP_SPEECH_LEADER
The SphinxProperty for the amount of time (in milliseconds) before speech start to be included as speech data.

See Also:
Constant Field Values

PROP_SPEECH_TRAILER

@S4Integer(defaultValue=100)
public static final java.lang.String PROP_SPEECH_TRAILER
The SphinxProperty for the amount of time (in milliseconds) after speech ends to be included as speech data.

See Also:
Constant Field Values
Constructor Detail

SpeechMarker

public SpeechMarker()
Method Detail

newProperties

public void newProperties(PropertySheet ps)
                   throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component needs to be reconfigured.

Specified by:
newProperties in interface Configurable
Overrides:
newProperties in class BaseDataProcessor
Parameters:
ps - a property sheet holding the new data
Throws:
PropertyException - if there is a problem with the properties.

initialize

public void initialize()
Initializes this SpeechMarker.

Specified by:
initialize in interface DataProcessor
Overrides:
initialize in class BaseDataProcessor

getData

public Data getData()
             throws DataProcessingException
Returns the next Data object.

Specified by:
getData in interface DataProcessor
Specified by:
getData in class BaseDataProcessor
Returns:
the next Data object, or null if none available
Throws:
DataProcessingException - if a data processing error occurs

getAudioTime

public int getAudioTime(edu.cmu.sphinx.frontend.endpoint.SpeechClassifiedData audio)
Returns the amount of audio data in milliseconds in the given SpeechClassifiedData object.

Parameters:
audio - the SpeechClassifiedData object
Returns:
the amount of audio data in milliseconds
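
As a rough illustration of how a duration in milliseconds can be derived from audio data, the sketch below converts a sample count and a sample rate into milliseconds. It is an assumption that getAudioTime works this way; the class AudioTimeSketch, the method audioTimeMs, and both parameters are hypothetical and do not appear in the Sphinx4 API.

```java
// Hypothetical helper; not part of the Sphinx4 API.
final class AudioTimeSketch {
    /**
     * Converts a number of samples at the given sample rate (Hz) into
     * whole milliseconds, truncating any fractional remainder.
     */
    static int audioTimeMs(int sampleCount, int sampleRateHz) {
        if (sampleCount < 0 || sampleRateHz <= 0) {
            throw new IllegalArgumentException("sampleCount must be >= 0 and sampleRateHz > 0");
        }
        // Use long arithmetic so sampleCount * 1000 cannot overflow an int.
        return (int) (sampleCount * 1000L / sampleRateHz);
    }

    public static void main(String[] args) {
        System.out.println(audioTimeMs(160, 16000)); // prints 10
    }
}
```

With 16 kHz audio, 160 samples correspond to 10 ms, so the default 'startSpeech' of 200 ms would span 3200 samples under this assumption.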