sapi_textbased_lipsync Class Reference

#include <sapi_lipsync.h>

Inherits sapi_lipsync.

Detailed Description

This class uses SAPI to align a text buffer with an audio file.

It relies on a few tricks, and the results are not always complete.

We create a SAPI grammar rule representing the input string. The idea is to use (abuse?) the Command & Control Speech engine of SAPI to match the input string.

This is very different from the textless lipsync, which uses the continuous speech recognizer (hence the two subclasses of sapi_lipsync).

One problem: unless the audio file is very short, the engine always fails to fully "recognize" the text. To work around this, we listen for hypotheses from the SAPI object and store the one that covers the most text. In other words, we accept a hypothesis and/or the final answer depending on which one yields the most information.
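The "keep whichever result covers the most text" rule can be sketched as follows. This is a minimal illustration only, not the library's code: the class best_result_tracker and its members are hypothetical names, and it assumes the comparison is on raw string length, as the m_strResults description suggests.

```cpp
#include <string>

// Hypothetical stand-in for the bookkeeping done around m_strResults.
// Each hypothesis or final result is offered as raw text; a candidate
// is accepted only if it is strictly longer than the best one so far.
class best_result_tracker {
public:
    // Returns true if the candidate was accepted (and replaced the best).
    bool offer(const std::wstring& candidate) {
        if (candidate.size() <= m_best.size())
            return false;          // not more text than we already have: skip
        m_best = candidate;        // longer: accept and remember it
        return true;
    }

    const std::wstring& best() const { return m_best; }

private:
    std::wstring m_best;           // longest text seen so far
};
```

In this sketch, a caller would regenerate phonemes only when offer() returns true, so a short final answer never overwrites a longer hypothesis.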

See also:
run_sapi_textbased_lipsync for example code
Todo:
We noticed that the Command and Control system often "stops short". It might be interesting to align the rest of the text with the remaining audio, repeating until the whole input is consumed.
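The text side of that idea might look like the sketch below. It is not part of the library: the helpers count_words, skip_words and remaining_text are hypothetical, and the sketch assumes the recognized text corresponds word for word to a prefix of the input. Locating the matching position in the audio is not covered.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>

// Hypothetical helper: number of whitespace-separated words in s.
std::size_t count_words(const std::wstring& s) {
    std::size_t n = 0;
    bool in_word = false;
    for (wchar_t c : s) {
        if (std::iswspace(c)) {
            in_word = false;
        } else if (!in_word) {
            in_word = true;
            ++n;
        }
    }
    return n;
}

// Hypothetical helper: input with its first `words` words removed.
std::wstring skip_words(const std::wstring& input, std::size_t words) {
    std::size_t i = 0;
    while (i < input.size() && std::iswspace(input[i])) ++i;      // leading whitespace
    for (std::size_t w = 0; w < words && i < input.size(); ++w) {
        while (i < input.size() && !std::iswspace(input[i])) ++i; // the word itself
        while (i < input.size() && std::iswspace(input[i])) ++i;  // whitespace after it
    }
    return input.substr(i);
}

// Text still to be aligned, assuming `recognized` covers a prefix of `input`.
std::wstring remaining_text(const std::wstring& input,
                            const std::wstring& recognized) {
    return skip_words(input, count_words(recognized));
}
```

A follow-up pass could then build a new grammar rule from the returned text and run it against the unmatched tail of the audio.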

Definition at line 248 of file sapi_lipsync.h.

Public Member Functions

 sapi_textbased_lipsync ()
 constructor
 sapi_textbased_lipsync (phoneme_estimator *pEstimator)
 constructor with estimator object
virtual ~sapi_textbased_lipsync ()
 destructor
virtual bool lipsync (const std::wstring &strAudioFile, const std::wstring &strText)
 start the asynchronous lipsync process given a text file and an audio file
virtual void callback ()
 notifications from the sapi engine.
virtual void print_results (std::ostream &os)
 override of sapi_lipsync::print_results

Static Public Member Functions

static std::wstring preprocess_text (const std::wstring &in)
static bool is_dirty_char (wchar_t in)

Protected Attributes

std::wstring m_strResults
 the raw text string results. This is used to decide whether or not to accept a hypothesis. If the result generates a longer string than m_strResults, we accept the hypothesis and generate phonemes; if not, we skip it.
std::wstring m_strInputText
 input text, needed for printing the anno file


The documentation for this class was generated from the following files:

Copyright (C) 2002-2005 Annosoft LLC. All Rights Reserved.
Visit us at www.annosoft.com