sapi_lipsync Class Reference

#include <sapi_lipsync.h>

Detailed Description

Base class for SAPI lipsync.

This class owns all the SAPI objects and manages the audio stream.

Subclasses override the callback method and handle SAPI grammar initialization. Both differ depending on whether text-based or textless lipsync is used.

The lipsync results are stored in sapi_lipsync::m_results. The raw results (before sapi_lipsync::finalize_phoneme_alignment is called) contain the recognized words, their timings, and their phonemes, but do not include timing for each phoneme within the word.

sapi_lipsync::finalize_phoneme_alignment estimates the per-phoneme timings and also inserts silence markers (entries with empty orthography and the phoneme label 'x').

Applications can either take the raw results and generate their own timings, or use the default phoneme_estimator to get the out-of-the-box behavior.
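As an illustration of what generating timings from raw results could look like, here is a minimal sketch that splits each word's interval evenly across its phonemes and inserts silence markers into the gaps between words. The types PhonemeTiming and WordAlignment and the functions spread_evenly and insert_silences are hypothetical stand-ins; the real alignment_result layout and the heuristic used by phoneme_estimator are not shown on this page and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one phoneme entry (not the real alignment_result).
struct PhonemeTiming {
    std::wstring label;
    long start_ms;
    long end_ms;
};

// Hypothetical stand-in for one recognized word with its phonemes.
struct WordAlignment {
    std::wstring orthography;  // empty for a silence marker
    long start_ms;
    long end_ms;
    std::vector<PhonemeTiming> phonemes;
};

// Assumed heuristic: divide the word's interval evenly among its phonemes.
void spread_evenly(WordAlignment &word) {
    const std::size_t n = word.phonemes.size();
    if (n == 0) {
        return;
    }
    const long span = word.end_ms - word.start_ms;
    const long count = static_cast<long>(n);
    for (std::size_t i = 0; i < n; ++i) {
        const long idx = static_cast<long>(i);
        word.phonemes[i].start_ms = word.start_ms + span * idx / count;
        word.phonemes[i].end_ms = word.start_ms + span * (idx + 1) / count;
    }
}

// Fill every gap before a word with a silence entry: empty orthography
// and a single phoneme labeled 'x'. Input is assumed sorted by start time.
std::vector<WordAlignment> insert_silences(const std::vector<WordAlignment> &words) {
    std::vector<WordAlignment> out;
    long cursor = 0;  // end time of the last emitted entry
    for (const WordAlignment &w : words) {
        if (w.start_ms > cursor) {
            WordAlignment sil;
            sil.orthography = L"";
            sil.start_ms = cursor;
            sil.end_ms = w.start_ms;
            sil.phonemes.push_back(PhonemeTiming{L"x", cursor, w.start_ms});
            out.push_back(sil);
        }
        out.push_back(w);
        if (w.end_ms > cursor) {
            cursor = w.end_ms;
        }
    }
    return out;
}
```

An even split is the simplest possible choice; an application with per-phoneme duration statistics could replace spread_evenly with a weighted split without changing insert_silences.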

See run_sapi_textbased_lipsync for text-based example code.

See run_sapi_textless_lipsync for textless example code.

Definition at line 122 of file sapi_lipsync.h.

Public Member Functions

sapi_lipsync ()

default constructor

sapi_lipsync (phoneme_estimator *pEstimator)

constructor with estimator object

virtual void close ()

destroy objects

bool initializeObjects ()

this method performs common initialization of SAPI objects

bool loadAudio (const std::wstring &audioFile)

this method loads the audio file into the ISpStream

const std::wstring & getErrorString ()

retrieve the error string

virtual bool isDone ()

returns true if the subclass considers the asynchronous lipsync finished

std::vector< alignment_result > & get_phoneme_alignment ()

this method returns the current best phoneme alignment

virtual void finalize_phoneme_alignment ()

finalizes the phoneme alignment: estimates per-phoneme timings and inserts silence markers

virtual void print_results (std::ostream &os)

prints the current best results to the specified stream

virtual void callback ()=0

pure virtual function implemented by subclasses

long sapi_time_to_milli (ULONGLONG ts)

converts a SAPI timestamp into a millisecond time
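A minimal sketch of such a conversion, assuming the timestamp counts 100-nanosecond ticks (the unit SAPI generally uses for stream times). The name ticks_to_milli and the truncating division are assumptions; the actual method may round differently.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one tick is 100 ns, so 10,000 ticks make one millisecond.
constexpr std::uint64_t kTicksPerMillisecond = 10000;

// Hypothetical helper; truncates toward zero.
long ticks_to_milli(std::uint64_t ts) {
    return static_cast<long>(ts / kTicksPerMillisecond);
}
```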

long bytes_to_milli (DWORD dwBytes)

converts audio bytes into milliseconds using the sample rate of the audio stream
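For uncompressed PCM audio, a conversion like this can be sketched from two WAVEFORMATEX fields: the sample rate (nSamplesPerSec) and the bytes per sample frame (nBlockAlign). The WaveFormat struct below is a hypothetical, portable subset of WAVEFORMATEX, and pcm_bytes_to_milli is an illustrative name; the real method reads m_pWaveFmt and may compute the value differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of WAVEFORMATEX, so the sketch compiles without Windows headers.
struct WaveFormat {
    std::uint32_t samples_per_sec;  // corresponds to nSamplesPerSec
    std::uint16_t block_align;      // corresponds to nBlockAlign (bytes per sample frame)
};

// Convert a byte count into milliseconds of PCM audio.
long pcm_bytes_to_milli(std::uint32_t bytes, const WaveFormat &fmt) {
    const std::uint64_t bytes_per_sec =
        static_cast<std::uint64_t>(fmt.samples_per_sec) * fmt.block_align;
    if (bytes_per_sec == 0) {
        return 0;  // guard against an uninitialized format
    }
    // Widen to 64 bits before multiplying to avoid overflow on long files.
    return static_cast<long>(static_cast<std::uint64_t>(bytes) * 1000 / bytes_per_sec);
}
```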

Static Protected Member Functions

static void _stdcall sapi_callback (WPARAM wParam, LPARAM lParam)

static method used to receive notifications from SAPI
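A static callback of this shape is commonly used as a trampoline: the object pointer is passed as one of the two user parameters when the callback is registered, and the static function casts it back and forwards to the virtual callback(). The sketch below shows that pattern with portable stand-in types. The names lipsync_base, counting_lipsync, notify_trampoline, wparam_t and lparam_t are hypothetical, and carrying the pointer in lParam is an assumption about the real implementation.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-ins for the Windows WPARAM and LPARAM types.
using wparam_t = std::uintptr_t;
using lparam_t = std::intptr_t;

// Hypothetical base class mirroring the callback()/isDone() part of the interface.
class lipsync_base {
public:
    virtual ~lipsync_base() = default;
    virtual void callback() = 0;
    bool isDone() const { return m_bDone; }

    // Trampoline: recover the object from lParam and dispatch virtually.
    static void notify_trampoline(wparam_t /*wParam*/, lparam_t lParam) {
        auto *self = reinterpret_cast<lipsync_base *>(lParam);
        if (self != nullptr) {
            self->callback();
        }
    }

protected:
    bool m_bDone = false;
};

// Hypothetical subclass: counts notifications and finishes after three.
class counting_lipsync : public lipsync_base {
public:
    void callback() override {
        ++m_events;
        if (m_events >= 3) {
            m_bDone = true;
        }
    }
    int events() const { return m_events; }

private:
    int m_events = 0;
};
```

A caller would register the trampoline with the object pointer as the user parameter and then poll isDone() until the subclass reports completion.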

Protected Attributes

CComPtr< ISpRecognizer > m_recog

the recognizer COM object.

CComPtr< ISpRecoContext > m_recogCntxt

the recognizer context COM object

CComPtr< ISpRecoGrammar > m_grammar

the grammar COM object

CComPtr< ISpPhoneConverter > m_phnCvt

the phone converter object. Converts PHONEID into strings

CComPtr< ISpStream > m_audioStream

the audio source object

WAVEFORMATEX * m_pWaveFmt

wave format of the audio we are processing

phoneme_estimator * m_pPhnEstimator

the phoneme estimator used to heuristically spread phoneme timings across the word. Can be NULL.

std::wstring m_err

error description.

std::wstring m_strAudioFile

audio file path. Needed for printing the .anno file

std::vector< alignment_result > m_results

results container

bool m_bDone

subclasses will set this when the lipsync is done.

The documentation for this class was generated from the following files:

sapi_lipsync/sapi_lipsync.h
sapi_lipsync/sapi_lipsync.cpp