Information on converting phonemes to visemes (visual display)
I should preface this page by stating that I am not an artist. I offer a
"plausible" mapping from phonemes to visemes based on information produced by
others and on some years of experience in the area of lipsync.
Although smooth 3D lipsync requires more than a simple phoneme-to-viseme
mapping, a basic mapping is critical to begin the work. These "basic" mouth
positions can then be twisted and contoured to satisfy co-articulations,
allophones, and intensity information in the sound.
With the basic mapping in hand, a series of transformations can be performed
that takes the recognition output and produces visemes.
The phoneme-to-viseme transformation used in the realtime demo is simplistic on
the scale of things. It starts with the raw 'all on' visemes shown below. It
then simply picks the longest phoneme within a 50 millisecond window and morphs
between the viseme representation for the target phoneme and the silence
viseme. The morph value is a function of the energy of the audio signal as
well as other ad-hoc rules.
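The window selection and energy-driven morph described above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the `Segment` type, the function names, and the linear energy ramp are all assumptions, "longest phoneme" is interpreted here as the phoneme that overlaps the window for the most time, and the ad-hoc rules are not modeled.

```python
from dataclasses import dataclass

SILENCE = "x"  # silence label from the basic phoneset


@dataclass
class Segment:
    """One recognized phoneme with its time span (hypothetical structure)."""
    phoneme: str
    start_ms: float
    end_ms: float


def dominant_phoneme(segments, window_start_ms, window_ms=50.0):
    """Return the phoneme that overlaps the window for the longest time."""
    window_end_ms = window_start_ms + window_ms
    coverage = {}
    for seg in segments:
        overlap = min(seg.end_ms, window_end_ms) - max(seg.start_ms, window_start_ms)
        if overlap > 0:
            coverage[seg.phoneme] = coverage.get(seg.phoneme, 0.0) + overlap
    if not coverage:
        return SILENCE  # nothing recognized in this window
    return max(coverage, key=coverage.get)


def morph_weight(energy, floor=0.0, ceiling=1.0):
    """Map audio energy to a 0..1 morph weight (linear ramp is an assumption)."""
    if ceiling <= floor:
        raise ValueError("ceiling must be greater than floor")
    t = (energy - floor) / (ceiling - floor)
    return min(1.0, max(0.0, t))


def frame_target(segments, window_start_ms, energy):
    """Return (phoneme, weight): 0.0 is the silence viseme, 1.0 the full viseme."""
    phoneme = dominant_phoneme(segments, window_start_ms)
    if phoneme == SILENCE:
        return SILENCE, 0.0
    return phoneme, morph_weight(energy)
```

A renderer would call `frame_target` once per window and blend the silence viseme toward the target viseme by the returned weight. Energy is assumed to be normalized to the range 0 to 1 before the call.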
The default phoneme to viseme transformation used in the Lipsync Tool is more complex. It uses an approach based on articulation theory which is implemented
in the SDK and available to SDK customers.
For bitmapped graphics, where interpolation is not possible, it is a good idea
either to create mouth visemes that are less exaggerated than the mouths
specified below, or to assign two graphics to each voiced viseme, a hi and a
lo. With {hi, lo} tuples and the intensity information returned from the SDK,
better mouth contouring can be achieved. Also, for bitmapped graphics, it is
advisable to choose larger display frames (83 milliseconds or so) and to pick
the phoneme that covers the most time within a given frame, or window of time.
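As a rough sketch of the bitmapped approach, the code below quantizes a phoneme timeline into fixed frames and then chooses between a hi and a lo graphic. Everything named here is hypothetical: segments are plain `(label, start_ms, end_ms)` tuples, intensity is assumed to be a number between 0 and 1 rather than whatever form the SDK actually returns, and the viseme names and the 0.5 threshold are placeholders.

```python
import math


def frame_phonemes(segments, frame_ms=83.0):
    """Pick, for each fixed-length frame, the phoneme covering the most time.

    segments is a list of (label, start_ms, end_ms) tuples.
    """
    if not segments:
        return []
    total_ms = max(end for _, _, end in segments)
    frames = []
    for i in range(math.ceil(total_ms / frame_ms)):
        frame_start, frame_end = i * frame_ms, (i + 1) * frame_ms
        coverage = {}
        for label, start, end in segments:
            overlap = min(end, frame_end) - max(start, frame_start)
            if overlap > 0:
                coverage[label] = coverage.get(label, 0.0) + overlap
        # Fall back to silence ("x") when nothing overlaps the frame.
        frames.append(max(coverage, key=coverage.get) if coverage else "x")
    return frames


def pick_bitmap(viseme, intensity, has_levels=True, threshold=0.5):
    """Return a bitmap name, choosing the hi or lo variant by intensity.

    Visemes without a {hi, lo} pair (has_levels=False) use a single graphic.
    """
    if not has_levels:
        return viseme
    return f"{viseme}_hi" if intensity >= threshold else f"{viseme}_lo"
```

For example, `frame_phonemes([("h", 0, 40), ("IY", 40, 200), ("x", 200, 260)])` yields one label per 83 millisecond frame. The short `h` is absorbed into the first frame because `IY` covers more of it.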
On with the basic mappings:
To start, we need the list of 40 phonemes currently used by the Annosoft, LLC
lipsync recognizer:
Annosoft LLC, Basic Phoneset
| label | word | example transcription |
| --- | --- | --- |
| x | silence | |
| IY | eat | IY t |
| IH | it | IH t |
| EH | Ed | EH d |
| AE | at | AE t |
| AH | hut | h AH t |
| UW | two | t UW |
| UH | hood | h UH d |
| AA | odd | AA d |
| AO | ought | AO t |
| EY | ate | EY t |
| AY | hide | h AY d |
| OY | toy | t OY |
| AW | cow | k AW |
| OW | oat | OW t |
| l | lee | l IY |
| r | read | r IY d |
| y | yield | y IY l d |
| w | we | w IY |
| ER | hurt | h ER t |
| m | me | m IY |
| n | knee | n IY |
| NG | ping | p IH NG |
| CH | cheese | CH IY z |
| j | gee | j IY |
| DH | thee | DH IY |
| b | be | b IY |
| d | dee | d IY |
| g | green | g r IY n |
| p | pee | p IY |
| t | tea | t IY |
| k | key | k IY |
| z | zee | z IY |
| ZH | seizure | s IY ZH ER |
| v | vee | v IY |
| f | fee | f IY |
| TH | theta | TH EY t AH |
| s | sea | s IY |
| SH | she | SH IY |
| h | he | h IY |
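When working with this phoneset in code, it can help to keep the labels in one place and to check transcriptions against them. The sketch below is only an illustration: the names `PHONESET` and `parse_transcription` are hypothetical, and treating the labels as case-sensitive exactly as printed in the table is an assumption about how the recognizer reports them.

```python
# The 40 labels of the basic phoneset, including "x" for silence.
PHONESET = frozenset(
    "x IY IH EH AE AH UW UH AA AO EY AY OY AW OW "
    "l r y w ER m n NG CH j DH b d g p t k z ZH v f TH s SH h".split()
)


def parse_transcription(text):
    """Split a space-separated transcription and reject unknown labels."""
    labels = text.split()
    unknown = [label for label in labels if label not in PHONESET]
    if unknown:
        raise ValueError(f"labels not in the basic phoneset: {unknown}")
    return labels
```

For example, `parse_transcription("g r IY n")` returns the four labels for "green", while a misspelled or wrongly cased label raises a `ValueError`.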
Set 1 (10 mouths)
