Monday, May 25, 2015

Dynamically creating dictionaries for Julius with eSpeak

While porting Saera to the Jolla, I wanted to add a feature that is quite useful in a voice control program: the ability to play a song by name. The problem is that song names tend to use many words that are not in the VoxForge dictionaries, and far too many to write into a grammar by hand. But a piece of software I am already using creates pronunciations for arbitrary words: eSpeak! It turns out that when given the -x flag, eSpeak outputs the phonemes it generates. For example:
$ espeak -x america
 a#m'ErIk@
$
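The same call can be made from a program. The following is a minimal sketch in Python, assuming espeak is on the PATH; the function names are made up for illustration, and the -q flag (which asks eSpeak not to speak the word aloud) is an addition that the command above does not use.

```python
import subprocess


def clean_espeak_output(raw):
    """Strip the leading space and trailing newline around espeak's output."""
    return raw.strip()


def espeak_phonemes(text):
    """Return the eSpeak phoneme string for `text` (hypothetical helper).

    -x prints the phoneme mnemonics, -q suppresses audio playback.
    """
    result = subprocess.run(
        ["espeak", "-q", "-x", text],
        capture_output=True,  # needs Python 3.7 or newer
        text=True,
        check=True,
    )
    return clean_espeak_output(result.stdout)
```

Calling espeak_phonemes("america") should then return the same phoneme string that the shell session above prints.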
But there is a problem here: those are eSpeak phonemes, not Julius phonemes. Julius would define the pronunciation of "america" as:
ax m eh r ax k ax
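I couldn't find a straightforward mapping from one to the other, so I created one. Longer eSpeak sequences are listed before the shorter ones they contain, and the stress and pause markers in the last rows have no Julius phoneme, so they map to nothing: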

eSpeak Julius
O@r-' er
aI@ ay ax
aU@ aw er
O@r ao r
nkI ng k iy
tS ch
dZ jh
@L ax l
@2 ax
@5 uw
@r er
3: er
3r er
aa ae
a# ax
A: aa
A@ aa r
e@ eh r
I2 ix
i: iy
i@ ih
u: uw
U@ uh r
O: ao
O@ ao r
o@ ao r
aI ay
eI ey
OI oy
aU aw
oU ow
tt t
b b
d d
f f
g g
h hh
k k
l l
m m
n n
N ng
p p
r r
s s
S sh
Z zh
t t
T th
D dh
v v
j y
w w
z z
@ ax
3 er
a ae
E eh
I ih
i iy
0 aa
V ah
U uh
'
,
;
:
-
!
_
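One way to apply the table is a greedy left-to-right scan that always tries the longer eSpeak sequences first. The sketch below is an assumption about how this could be done, not necessarily how Saera does it; the names ESPEAK_TO_JULIUS and to_julius are invented here, and characters missing from the table are silently skipped.

```python
# Pairs copied from the table above, in the same order, so that longer
# eSpeak sequences are tried before the shorter ones they contain.
ESPEAK_TO_JULIUS = [
    ("O@r-'", "er"), ("aI@", "ay ax"), ("aU@", "aw er"), ("O@r", "ao r"),
    ("nkI", "ng k iy"), ("tS", "ch"), ("dZ", "jh"), ("@L", "ax l"),
    ("@2", "ax"), ("@5", "uw"), ("@r", "er"), ("3:", "er"), ("3r", "er"),
    ("aa", "ae"), ("a#", "ax"), ("A:", "aa"), ("A@", "aa r"), ("e@", "eh r"),
    ("I2", "ix"), ("i:", "iy"), ("i@", "ih"), ("u:", "uw"), ("U@", "uh r"),
    ("O:", "ao"), ("O@", "ao r"), ("o@", "ao r"), ("aI", "ay"), ("eI", "ey"),
    ("OI", "oy"), ("aU", "aw"), ("oU", "ow"), ("tt", "t"),
    ("b", "b"), ("d", "d"), ("f", "f"), ("g", "g"), ("h", "hh"), ("k", "k"),
    ("l", "l"), ("m", "m"), ("n", "n"), ("N", "ng"), ("p", "p"), ("r", "r"),
    ("s", "s"), ("S", "sh"), ("Z", "zh"), ("t", "t"), ("T", "th"),
    ("D", "dh"), ("v", "v"), ("j", "y"), ("w", "w"), ("z", "z"),
    ("@", "ax"), ("3", "er"), ("a", "ae"), ("E", "eh"), ("I", "ih"),
    ("i", "iy"), ("0", "aa"), ("V", "ah"), ("U", "uh"),
    # Stress and pause markers have no Julius phoneme and are dropped.
    ("'", ""), (",", ""), (";", ""), (":", ""), ("-", ""), ("!", ""),
    ("_", ""),
]


def to_julius(espeak_phonemes):
    """Convert an eSpeak phoneme string to space-separated Julius phonemes."""
    out = []
    pos = 0
    while pos < len(espeak_phonemes):
        for source, target in ESPEAK_TO_JULIUS:
            if espeak_phonemes.startswith(source, pos):
                if target:
                    out.append(target)
                pos += len(source)
                break
        else:
            # Not in the table (for example a space): skip one character.
            pos += 1
    return " ".join(out)


print(to_julius("a#m'ErIk@"))  # ax m eh r ih k ax
```

Note that the result has ih where the dictionary pronunciation shown earlier has ax, so a generated entry will not always match the hand-written one exactly.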



Now the only thing left was to create a Julius grammar out of the generated phonemes.
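As a rough sketch of what that step can look like: a Julius grammar is a pair of files, a .grammar file with the sentence rules and a .voca file listing the words of each category with their phonemes, which are then compiled with Julius's mkdfa.pl script. The Python below only builds the text of the two files. The category name SONG, the function name voca_text, the upper-casing of titles, and the use of underscores for spaces are all assumptions made for illustration, as is the "sil" silence model for the sentence boundaries.

```python
# Sentence rule: silence, one song title, silence. The category name
# SONG is an arbitrary choice and must match the one used in the .voca.
GRAMMAR_TEXT = "S : NS_B SONG NS_E\n"


def voca_text(entries, category="SONG"):
    """Build .voca contents from (title, julius_phonemes) pairs."""
    lines = [
        "% NS_B", "<s> sil",   # sentence start, assumed silence model
        "% NS_E", "</s> sil",  # sentence end
        "% " + category,
    ]
    for title, phonemes in entries:
        # The first field is a single token, so spaces cannot stay.
        word = title.upper().replace(" ", "_")
        lines.append(word + " " + phonemes)
    return "\n".join(lines) + "\n"


print(voca_text([("America", "ax m eh r ih k ax")]))
```

Writing GRAMMAR_TEXT to songs.grammar and the returned string to songs.voca (hypothetical file names) would give mkdfa.pl the two inputs it expects.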