sapi_lipsync Class Reference

#include <sapi_lipsync.h>

Inheritance diagram for sapi_lipsync:

sapi_textbased_lipsync sapi_textless_lipsync

Detailed Description

base class for lipsync

This class owns all the SAPI objects and manages the audio stream.

Subclasses override the callback method and also handle SAPI grammar initialization. Both differ depending on whether text-based or textless lipsync is used.

The lipsync results are stored in sapi_lipsync::m_results. The raw results (before sapi_lipsync::finalize_phoneme_alignment is called) contain the recognized words, their phonemes, and the word timings, but do not include timing for each phoneme within the word.

sapi_lipsync::finalize_phoneme_alignment estimates the per-phoneme timings and also inserts silence markers (empty orthography with a phoneme label 'x').

Applications can either use the raw results and generate their own timings, or use the default phoneme_estimator to get the out-of-the-box behavior.
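As a rough illustration of what generating your own timings could look like, the sketch below spreads a word's duration evenly across its phonemes and inserts silence entries (empty orthography, phoneme label 'x') into gaps between words. It is not the library's implementation: word_alignment, spread_evenly and insert_silence are hypothetical names, the real alignment_result layout is not shown on this page, and the even split and the leading-silence handling are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one aligned word. The real alignment_result
// layout is not documented on this page.
struct word_alignment {
    long start_ms;                       // word start time, milliseconds
    long end_ms;                         // word end time, milliseconds
    std::wstring orthography;            // recognized word, empty for silence
    std::vector<std::wstring> phonemes;  // phoneme labels in the word
    std::vector<long> phoneme_ends_ms;   // end time per phoneme, filled in below
};

// Assumed heuristic: divide the word's duration evenly among its phonemes.
// The last phoneme always ends exactly at end_ms.
inline void spread_evenly(word_alignment& w) {
    w.phoneme_ends_ms.clear();
    const long n = static_cast<long>(w.phonemes.size());
    if (n == 0) return;
    const long duration = w.end_ms - w.start_ms;
    for (long i = 1; i <= n; ++i) {
        w.phoneme_ends_ms.push_back(w.start_ms + duration * i / n);
    }
}

// Insert a silence entry wherever a word starts later than the previous
// word ended (or later than time 0 for the first word; an assumption).
inline std::vector<word_alignment> insert_silence(
        const std::vector<word_alignment>& words) {
    std::vector<word_alignment> out;
    long cursor = 0;
    for (std::size_t i = 0; i < words.size(); ++i) {
        const word_alignment& w = words[i];
        if (w.start_ms > cursor) {
            word_alignment sil;
            sil.start_ms = cursor;
            sil.end_ms = w.start_ms;
            sil.phonemes.push_back(L"x");  // silence label, orthography stays empty
            sil.phoneme_ends_ms.push_back(w.start_ms);
            out.push_back(sil);
        }
        out.push_back(w);
        cursor = w.end_ms;
    }
    return out;
}
```

An application that prefers a different heuristic (for example, weighting vowels longer than consonants) would replace spread_evenly and leave the silence handling as is.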

See also:
sapi_textbased_lipsync, sapi_textless_lipsync, alignment_result, phoneme_estimator.

run_sapi_textbased_lipsync for example code

run_sapi_textless_lipsync for example code

Definition at line 122 of file sapi_lipsync.h.

Public Member Functions

 sapi_lipsync ()
 default constructor
 sapi_lipsync (phoneme_estimator *pEstimator)
 constructor with estimator object
virtual void close ()
 destroy objects
bool initializeObjects ()
 this method performs common initialization of SAPI objects
bool loadAudio (const std::wstring &audioFile)
 this method loads the audio file into the ISpStream
const std::wstring & getErrorString ()
 retrieve the error string
virtual bool isDone ()
 returns true if subclass thinks we are finished with async lipsync
std::vector< alignment_result > & get_phoneme_alignment ()
 this method returns the current best phoneme alignment
virtual void finalize_phoneme_alignment ()
 pretties up the phoneme alignment
virtual void print_results (std::ostream &os)
 prints the current best results to the specified stream
virtual void callback ()=0
 pure virtual function implemented by subclasses
long sapi_time_to_milli (ULONGLONG ts)
 converts a SAPI timestamp into a millisecond time
long bytes_to_milli (DWORD dwBytes)
 converts audio bytes into milliseconds using the sample rate of the audio stream
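
The two conversion helpers can be sketched roughly as follows. This is an illustration, not the class's actual code. It assumes SAPI time values are in 100-nanosecond units and that the byte conversion uses the average byte rate from the wave format (WAVEFORMATEX::nAvgBytesPerSec). The names sapi_units_to_milli, audio_bytes_to_milli and wave_format are hypothetical, and fixed-width integer types stand in for ULONGLONG and DWORD so the sketch compiles without the Windows headers.

```cpp
#include <cstdint>

// Assumption: SAPI time values are in 100-nanosecond units,
// so 10,000 units make one millisecond.
inline long sapi_units_to_milli(std::uint64_t ts) {
    return static_cast<long>(ts / 10000ULL);
}

// Portable stand-in for the single WAVEFORMATEX field used here.
struct wave_format {
    std::uint32_t avg_bytes_per_sec;  // mirrors WAVEFORMATEX::nAvgBytesPerSec
};

// Convert a byte count in the audio stream to milliseconds.
inline long audio_bytes_to_milli(std::uint32_t bytes, const wave_format& fmt) {
    if (fmt.avg_bytes_per_sec == 0) return 0;  // guard against division by zero
    // Widen before multiplying so large byte counts do not overflow 32 bits.
    return static_cast<long>(
        static_cast<std::uint64_t>(bytes) * 1000ULL / fmt.avg_bytes_per_sec);
}
```

For 16 kHz, 16-bit mono PCM the byte rate is 32,000 bytes per second, so 16,000 bytes corresponds to half a second of audio.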

Static Protected Member Functions

static void _stdcall sapi_callback (WPARAM wParam, LPARAM lParam)
 static method used to receive notifications from SAPI
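
A static notification function usually forwards to an instance through a pointer packed into one of its parameters. The sketch below shows that pattern with portable stand-ins for the real types. Everything in it is hypothetical: lipsync_base, notify and counting_lipsync are illustrative names, and the choice of lParam to carry the object pointer is an assumption rather than something this page states.

```cpp
#include <cstdint>

// Portable stand-ins for the Windows WPARAM / LPARAM typedefs.
typedef std::uintptr_t wparam_t;
typedef std::intptr_t  lparam_t;

// Minimal stand-in for the base class: a pure virtual callback plus a done flag.
class lipsync_base {
public:
    lipsync_base() : m_bDone(false) {}
    virtual ~lipsync_base() {}

    virtual void callback() = 0;               // implemented by subclasses
    virtual bool isDone() { return m_bDone; }

    // Static entry point: recover the object and forward to its callback.
    // Assumption: the object pointer was registered as lParam.
    static void notify(wparam_t /*wParam*/, lparam_t lParam) {
        lipsync_base* self = reinterpret_cast<lipsync_base*>(lParam);
        if (self) self->callback();
    }

protected:
    bool m_bDone;  // subclasses set this when processing is finished
};

// Example subclass: declares itself done after two notifications.
class counting_lipsync : public lipsync_base {
public:
    counting_lipsync() : m_events(0) {}
    void callback() override {
        if (++m_events >= 2) m_bDone = true;
    }
    int events() const { return m_events; }

private:
    int m_events;
};
```

A caller would register notify together with the object pointer cast to lparam_t, then poll isDone() until the subclass reports completion.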

Protected Attributes

CComPtr< ISpRecognizer > m_recog
 the recognizer COM object.
CComPtr< ISpRecoContext > m_recogCntxt
 the recognizer context COM object
CComPtr< ISpRecoGrammar > m_grammar
 the grammar COM object
CComPtr< ISpPhoneConverter > m_phnCvt
 the phone converter object. Converts PHONEID into strings
CComPtr< ISpStream > m_audioStream
 the audio source object
WAVEFORMATEX * m_pWaveFmt
 wave format of the audio we are processing
phoneme_estimator * m_pPhnEstimator
 the phoneme estimator used to heuristically spread phoneme timings across the word. Can be NULL.
std::wstring m_err
 error description.
std::wstring m_strAudioFile
 audio file path. Needed for printing .anno file
std::vector< alignment_result > m_results
 results container
bool m_bDone
 subclasses will set this when the lipsync is done.


The documentation for this class was generated from the following files:

Copyright (C) 2002-2005 Annosoft LLC. All Rights Reserved.
Visit us at www.annosoft.com