
bool sapi_textbased_lipsync::lipsync(const std::wstring & strAudioFile,
                                     const std::wstring & strText) [virtual]
 

Start the asynchronous lipsync process given a text transcript and an audio file.

The lipsync process runs asynchronously. The application will need to poll for completion or decide when it's time to bail.
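The poll-or-bail pattern described above can be sketched as a small helper. This is only an illustration, not code from sapi_lipsync.cpp: `wait_for_completion` and its parameters are hypothetical names, and the completion check is passed in as a predicate because this section does not show which accessor the lipsync class exposes for that purpose.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of the lipsync classes): polls `is_done`
// until it returns true or until `timeout` has elapsed.
// Returns true if the work finished, false if the caller should bail.
bool wait_for_completion(const std::function<bool()>& is_done,
                         std::chrono::milliseconds timeout,
                         std::chrono::milliseconds poll_interval =
                             std::chrono::milliseconds(50))
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!is_done())
    {
        if (std::chrono::steady_clock::now() >= deadline)
            return false;                        // time to bail
        std::this_thread::sleep_for(poll_interval);
    }
    return true;
}
```

An application would call lipsync() first and, if it returns true, pass a lambda wrapping the object's own completion check to a helper like this one. The timeout is the "decide when it's time to bail" part and is a choice for the application to make.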

The program first initializes the SAPI objects and then loads the audio file.

We use (and abuse) the command and control grammar. We create a top-level grammar rule and then add the source text as a lexical transition. I believe that internally the command and control grammar creates a transition for each word. At least that's what the hypotheses seem to show.

We disable speech processing until we have loaded the grammar.

We also preprocess the text, removing punctuation and other "dirty" characters that seem to negatively impact performance.
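As a rough illustration of this kind of cleanup, here is a minimal sketch. It is not the actual preprocess_text implementation, which is not shown on this page. The function name `strip_punctuation` and the exact rules (keep letters, digits and apostrophes; turn everything else into a single space) are assumptions made for the example.

```cpp
#include <cwctype>
#include <string>

// Hypothetical stand-in for the transcript cleanup step; the real
// preprocess_text may apply different rules.
// Keeps letters, digits and apostrophes, and collapses every run of
// other characters (punctuation, whitespace) into one space.
// Note: std::iswalnum depends on the current C locale, so non-ASCII
// letters may be treated as "dirty" unless a suitable locale is set.
std::wstring strip_punctuation(const std::wstring& text)
{
    std::wstring out;
    out.reserve(text.size());
    bool pending_space = false;
    for (wchar_t ch : text)
    {
        if (std::iswalnum(ch) || ch == L'\'')
        {
            // emit one separator between words, never at the start
            if (pending_space && !out.empty())
                out += L' ';
            out += ch;
            pending_space = false;
        }
        else
        {
            pending_space = true;
        }
    }
    return out;
}
```

With these rules, the input `Hello, world!  It's "fine".` becomes `Hello world It's fine`, which is the kind of space-separated word sequence that is then handed to the grammar as a single lexical transition.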

See also:
run_sapi_textbased_lipsync for an example
Parameters:
strAudioFile - [in] name of the audio file
strText - [in] text transcript (not a file but the contents)
Return values:
true - if the lipsync started successfully.
false - if the lipsync failed. Call sapi_lipsync::getErrorString for a detailed description of the error.
Creates a new top-level rule. We don't want to use dictation here, so we assign our own grammar.

Definition at line 441 of file sapi_lipsync.cpp.

{
    HRESULT hr = E_FAIL;
    try
    {
        m_strInputText = strText;
        if (!this->initializeObjects())
            throw (HRESULT(E_FAIL));

        if (!this->loadAudio(strAudioFile))
            throw (HRESULT(E_FAIL));

        // initialize the grammar: create the top-level rule that will
        // hold the transcript
        SPSTATEHANDLE hLipsyncRule;
        hr = this->m_grammar->GetRule(L"TextLipsync", NULL,
                            SPRAF_TopLevel | SPRAF_Active, TRUE,
                            &hLipsyncRule);

        if (hr != S_OK)
        {
            m_err = L"Failed to create grammar rule for text based lipsync";
            throw (hr);
        }

        // prepare text for text based lipsync. Tokenize out formatting, punctuation
        std::wstring strIn = preprocess_text(strText);
        // create the phrase inside the rule: the whole transcript is added
        // as one lexical transition, with words separated by spaces
        hr = m_grammar->AddWordTransition(hLipsyncRule, NULL, strIn.c_str(),
            L" ", SPWT_LEXICAL, 1, NULL);

        if (hr != S_OK)
        {
            m_err = L"Failed to create lipsync rule for specified text transcription";
            throw (hr);
        }

        // finalize the grammar
        hr = m_grammar->Commit(0);
        if (hr != S_OK)
        {
            m_err = L"Failed to commit lipsync text rule for specified text transcription.";
            throw (hr);
        }

        // turn the grammar on
        hr = m_grammar->SetGrammarState(SPGS_ENABLED);
        if (hr != S_OK)
        {
            m_err = L"Error: Failed to enable the grammar.";
            throw (hr);
        }
        // start up recognition (the return value is not checked here)
        m_recog->SetRecoState(SPRST_ACTIVE);
        // enable the rule (the return value is not checked here)
        m_grammar->SetRuleState(NULL, NULL, SPRS_ACTIVE);

        // now we should be running!

    }
    catch (HRESULT _hr)
    {
        // every failure path above lands here with the failing HRESULT
        hr = _hr;
    }
    return (hr == S_OK);
}


Copyright (C) 2002-2005 Annosoft LLC. All Rights Reserved.
Visit us at www.annosoft.com