Programmer's Reference Manual

Plum Voice Platform v. 3.0

© 2008 Plum Group, Inc. All rights reserved.

4. TTS Speech Engine Characteristics

4.1 Voice Tag Attributes

<gender>:

AT&T Natural Voices, Cepstral Engine, and RealSpeak Engine:

This attribute is supported by these speech engines.

<age>:

AT&T Natural Voices:

This attribute is not supported.

Cepstral Engine:

This attribute looks for an exact match, instead of looking for the closest match. For example, <voice age="10"> will only select a ten-year-old voice, or fall back to the default voice if one is not found.

RealSpeak Engine:

This attribute is not supported.

<name>:

If you have an onsite system, please contact your sales account manager to find out which of these voices are installed on your server.

The following names are supported by their respective engines:

For US production hosting:

AT&T Natural Voices:

Language | Male TTS Voice Name | Female TTS Voice Name
German (de_de) | reiner | klara
British English (en_uk) | charles | anjali, audrey
American English (en_us) | mel, mike, ray, rich | claire, crystal, julia, lauren
Spanish (es_us) | alberto | rosa
French (fr_fr) | alain | juliette

If no name is specified, mike is the default voice for AT&T Natural Voices.

Cepstral Engine (case-sensitive):

Language | Male TTS Voice Name | Female TTS Voice Name
German (de_de) | Matthias | Katrin
British English (en_uk) | Lawrence | Millie
American English (en_us) | David, William | Diane
Spanish (es_us) | Miguel | Marta
French (fr_fr) | Jean-Pierre | Isabelle
Italian (it_it) | (none) | Vittoria

If no name is specified, Diane is the default voice for the Cepstral Engine.

RealSpeak Engine (case-sensitive):

Language | Male TTS Voice Name | Female TTS Voice Name
American English (en-US) | Tom | Jill, Samantha
Canadian French (fr-CA) | Felix | Julie
Mexican Spanish (es-MX) | Javier | Paulina
British English (en-GB) | Daniel | Emily
Australian English (en-AU) | Lee | Karen
Portuguese (pt-PT) | (none) | Madalena
Brazilian Portuguese (pt-BR) | (none) | Raquel

If no name is specified, Jill is the default voice for the RealSpeak Engine.

Please contact your account manager if you want any of the following RealSpeak voices:

Language | Male TTS Voice Name | Female TTS Voice Name
Danish (da-DK) | (none) | Nanna
German (de-DE) | Yannick | Steffi
Indian English (en-IN) | (none) | Sangeeta
Spanish (es-ES) | Diego | Isabel, Monica
Basque (eu-ES) | (none) | Arantxa
French (fr-FR) | Sebastien | Virginie
Italian (it-IT) | (none) | Silvia
Japanese (ja-JP) | (none) | Kyoko
Korean (ko-KR) | (none) | Narae
Korean (kr-KR) | (none) | Narae
Belgian Dutch (nl-BE) | (none) | Ellen
Dutch (nl-NL) | (none) | Claire
Norwegian (no-NO) | (none) | Nora
Polish (pl-PL) | (none) | Agata
Russian (ru-RU) | (none) | Katerina
Swedish (sv-SE) | (none) | Ingrid
Mandarin Chinese (zh-CN) | (none) | Mei-Ling
Hong Kong Cantonese (zh-HK) | (none) | Sin-ji

For the RealSpeak Engine, this attribute MUST be used together with the corresponding xml:lang attribute if the language is not en-US (American English). For example, to hear the Mexican Spanish voice "Javier", use the following:

<speak xml:lang="es-MX"><voice name="Javier" gender="male">
¿Le gustan los huevos?
</voice></speak>

NOTE: For hosting, we currently offer only American English, Spanish, and French-Canadian speech recognition. If you are interested in any other speech recognition languages, please contact your sales representative.

For UK production hosting:

AT&T Natural Voices:

Language | Male TTS Voice Name | Female TTS Voice Name
German (de_de) | reiner | klara
British English (en_uk) | charles | audrey
American English (en_us) | mike, ray | crystal, lauren
French (fr_fr) | (none) | juliette

If no name is specified, charles is the default voice for AT&T Natural Voices.

Cepstral Engine (case-sensitive):

Language | Male TTS Voice Name | Female TTS Voice Name
German (de_de) | Matthias | Katrin
British English (en_uk) | Lawrence | Millie
American English (en_us) | David, William | Diane
Spanish (es_us) | Miguel | Marta
French (fr_fr) | Jean-Pierre | Isabelle
Italian (it_it) | (none) | Vittoria

If no name is specified, Millie is the default voice for the Cepstral Engine.

RealSpeak Engine (case-sensitive):

Language | Male TTS Voice Name | Female TTS Voice Name
Belgian Dutch (nl-BE) | (none) | Ellen
British English (en-GB) | Daniel | Emily
Dutch (nl-NL) | (none) | Claire
French (fr-FR) | Sebastien | Virginie
German (de-DE) | Yannick | Steffi
Spanish (es-ES) | Diego | Isabel

If no name is specified, Emily is the default voice for the RealSpeak Engine.

Please contact your account manager if you want any of the following RealSpeak voices:

Language | Male TTS Voice Name | Female TTS Voice Name
Danish (da-DK) | (none) | Nanna
Australian English (en-AU) | Lee | Karen
Indian English (en-IN) | (none) | Sangeeta
American English (en-US) | Tom | Jennifer, Jill, Samantha
Spanish (es-ES) | (none) | Monica
Mexican Spanish (es-MX) | Javier | Paulina
Basque (eu-ES) | (none) | Arantxa
Canadian French (fr-CA) | Felix | Julie
Italian (it-IT) | Paolo | Silvia
Japanese (ja-JP) | (none) | Kyoko
Korean (ko-KR) | (none) | Narae
Korean (kr-KR) | (none) | Narae
Norwegian (no-NO) | (none) | Nora
Polish (pl-PL) | (none) | Agata
Brazilian Portuguese (pt-BR) | (none) | Raquel
Portuguese (pt-PT) | (none) | Madalena
Russian (ru-RU) | (none) | Katerina
Swedish (sv-SE) | (none) | Ingrid
Mandarin Chinese (zh-CN) | (none) | Mei-Ling
Hong Kong Cantonese (zh-HK) | (none) | Sin-ji

For the RealSpeak Engine, this attribute MUST be used together with the corresponding xml:lang attribute if the language is not en-US (American English). For example, to hear the Spanish voice "Diego", use the following:

<speak xml:lang="es-ES"><voice name="Diego" gender="male">
¿Le gustan los huevos?
</voice></speak>

NOTE: For hosting, we currently offer only American English and British English speech recognition. If you are interested in any other speech recognition languages, please contact your sales representative.

<xml:lang>:

If you have an onsite system, please contact your sales account manager to find out which of these languages are installed on your server.

The following languages are supported by their respective engines:

For US production hosting:

AT&T Natural Voices:

de_de (German)
en_uk (British English)
en_us (American English)
es_us (Spanish)
fr_fr (French)

Cepstral Engine:

en_us (American English)

RealSpeak Engine:

en-US (American English)
es-MX (Mexican Spanish)
fr-CA (Canadian French)

Please contact your account manager if you want any of the following RealSpeak languages:

da-DK (Danish)
de-CH (Swiss German)
de-DE (German)
en-AU (Australian English)
en-GB (British English)
en-IN (Indian English)
es-ES (Spanish)
eu-ES (Basque)
fr-BE (Belgian French)
fr-CH (Swiss French)
fr-FR (French)
it-CH (Swiss Italian)
it-IT (Italian)
ja-JP (Japanese)
ko-KR (Korean)
kr-KR (Korean)
nl-BE (Belgian Dutch)
nl-NL (Dutch)
no-NO (Norwegian)
pl-PL (Polish)
pt-BR (Brazilian Portuguese)
pt-PT (Portuguese)
ru-RU (Russian)
sv-SE (Swedish)
zh-CN (Mandarin Chinese)
zh-HK (Hong Kong Cantonese)

For UK production hosting:

AT&T Natural Voices:

de_de (German)
en_uk (British English)
en_us (American English)
es_us (Spanish)
fr_fr (French)

Cepstral Engine:

en_us (American English)

RealSpeak Engine:

de-DE (German)
en-GB (British English)
fr-FR (French)
es-ES (Spanish)
nl-BE (Belgian Dutch)
nl-NL (Dutch)

Please contact your account manager if you want any of the following RealSpeak languages:

da-DK (Danish)
de-CH (Swiss German)
en-AU (Australian English)
en-IN (Indian English)
en-US (American English)
es-MX (Mexican Spanish)
eu-ES (Basque)
fr-BE (Belgian French)
fr-CA (Canadian French)
fr-CH (Swiss French)
it-CH (Swiss Italian)
it-IT (Italian)
ja-JP (Japanese)
ko-KR (Korean)
kr-KR (Korean)
no-NO (Norwegian)
pl-PL (Polish)
pt-BR (Brazilian Portuguese)
pt-PT (Portuguese)
ru-RU (Russian)
sv-SE (Swedish)
zh-CN (Mandarin Chinese)
zh-HK (Hong Kong Cantonese)

Note that the RealSpeak Engine uses a different syntax for the xml:lang attribute: a hyphen and an uppercase region code instead of an underscore and a lowercase region code. For example, use <voice xml:lang="fr-FR"> to hear a French speaker with the RealSpeak Engine. For the AT&T Natural Voices Engine and Cepstral Engine, use <voice xml:lang="en_us"> to hear an American speaker.

4.2 Voice Child Tags

An "x" indicates that the Child Tag is supported by the speech engine, and a "-" indicates that it is not. An asterisk (*) means that there are notes below explaining the differences between the speech engines.

Child Tag | AT&T Natural Voices | Cepstral Engine | RealSpeak Engine
<break>* | x | x | x
<emphasis> | - | - | -
<enumerate> | - | - | -
<mark> | - | - | -
<paragraph>* | x | x | x
<phoneme>* | x | x | -
<prosody>* | x | x | x
<say-as>* | x | x | x
<sentence>* | x | x | x
<speak> | x | x | x
<sub> | x | x | x
<value> | x | x | x

<break>:

AT&T Natural Voices:

The break element works correctly when the voice is an en_us (American English) voice or when the language is set to en_us (American English). For the other languages (de_de (German), fr_fr (French), en_uk (British English), es_us (Spanish)), the "size" attribute does not work.

Cepstral Engine:

The "size" attribute of the break element does not work for the Cepstral Engine.

RealSpeak Engine:

The break element works fine for the RealSpeak Engine.
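As an illustration of the notes above, the following minimal sketch uses the "size" attribute with an American English AT&T voice, the case where it is reported to work. The value "medium" and the "time" attribute on the second break are taken from the SSML specification and are assumptions here; this section does not say whether "time" is honored for voices where "size" fails, so test it before relying on it.

```xml
<speak>
  <voice name="mike">
    Please hold.
    <!-- "size" is reported to work for en_us AT&T voices; "medium" is an assumed value -->
    <break size="medium"/>
    Your call is important to us.
    <!-- Assumption: "time" comes from the SSML spec and is not confirmed by this section -->
    <break time="500ms"/>
    Thank you for waiting.
  </voice>
</speak>
```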

<paragraph>:

Cepstral Engine:

The "xml:lang" attribute does not work with the paragraph element.

<phoneme>:

AT&T Natural Voices and Cepstral Engine:

The phoneme element works fine using the Phoneme Sets shown below.
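A minimal sketch of a phoneme override, using the "Bob" transcription from the AT&T Natural Voices US English set. It assumes the standard SSML "ph" attribute carries the transcription and that no "alphabet" attribute is required; neither detail is confirmed by this section.

```xml
<speak>
  <voice name="mike">
    <!-- Assumption: "ph" holds the engine-specific transcription (AT&T US English set) -->
    My name is <phoneme ph="b aa b 1">Bob</phoneme>.
  </voice>
</speak>
```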

RealSpeak Engine:

This element is not supported.

Phoneme Set for AT&T Natural Voices:

US English:

Phoneme Example Transcription
aa Bob b aa b 1
ae bat b ae t 1
ah but b ah t 1
ao bought b ao t 1
aw down d aw n 1
ax about ax 0 b aw t 1
ay bite b ay t 1
b bet b eh t 1
ch church ch er ch 1
d dig d ih g
dh that dh ae t 1
dx butter b ah 1 dx er 0
eh bet b eh t 1
em Chatham ch ae 1 dx em 0
en satin s ae 1 q en 0
er bird b er d 1
ey bait b ey t 1
f fog f ao g 1
g got g aa t 1
hh hot h aa t 1
ih bit b ih t 1
iy beat b iy t 1
jh jump jh ah m p 1
k cat k ae t 1
l lot l aa t 1
m Mom m aa m 1
n nod n aa d 1
ng sing s ih ng 1
ow boat b ow t 1
oy boy b oy 1
p pot p aa t 1
q button b ah 1 q en 0
r rat r ae t 1
s sit s ih t 1
sh shut sh ah t 1
t top t aa p 1
th thick th ih k 1
uh book b uh k
uw boot b uw t 1
v vat v ae t 1
w won w ah n 1
y you y uw 1
z zoo z uw 1
zh measure m eh 1 zh er

0 Unstressed
1 Primary stress
2 Secondary stress
& Word boundary

UK English:

Phoneme Example Transcription
p point p OI n t 1
b big bIg1
t team t i: m 1
d dare de@1
k case k eI s 1
g good gUd1
dZ ginger dZ I n 1 dZ @ 0
tS check tS e k 1
f fool f u: l 1
v vest vest1
D this DIs1
T thick TIk1
s sell sel1
z zeal z i: l 1
S shoot S u: t 1
Z measure me1Z@0
h house h aU s 1
m main m eI n 1
n name n eI m 1
N sing sIN1
l life l aI f 1
@l bottle b Q 1 t @l 0
r right r aI t 1
j yes jes1
w wood wUd1
i: beat b i: t 1
I bit bIt1
eI bait b eI t 1
e bet bet1
A: father f A: 1 D @ 0
{ bat b{t1
@U boat b @U t 1
O: bought b O: t 1
Q boss bQs1
u: boot b u: t 1
U book bUk1
V but bVt1
3: bird b 3: d 1
aU bout b aU t 1
OI boy b OI 1
aI bite b aI t 1
@ scallop sk{1l@p0
I believe b I 0 l i: v 1

0 Unstressed
1 Primary stress
2 Secondary stress
& Word boundary

Phoneme Set for Cepstral Engine:

US English:

Phoneme Example Transcription
aa father f aa1 dh er0
ae cat k ae1 t
ah about ah0 b aw1 t
ao bought b ao1 t
aw cow k aw1
ay buy b ay1
b book b uh1 k
ch catch k eh1 ch
d bad b ae1 d
dh then dh eh1 n
eh get g eh1 t
er earth er1 th
ey ate ey1 t
f fat f ae1 t
g good g uh1 d
h hello h eh0 l ow1
i sheep sh i1 p
ih ship sh ih1 p
j yes j eh0 s
jh digit d ih1 jh ih0 t
k camera k ae1 m r ah0
l late l ey1 t
m man m ae1 n
n new n uw1
ng bang b ae1 ng
ow float f l ow1 t
oy boy b oy1
p camper k ae1 m p er0
r car k aa1 r
s sit s ih1 t
sh ship sh ih1 p
t tap t ae1 p
th thin th ih1 n
uh full f uh1 l
uw moon m uw1 n
v have h ae1 v
w water w ao1 t er0
z zero z i0 r ow0
zh vision v ih1 zh ah0 n

0 Unstressed
1 Primary stress
2 Secondary stress
& Word boundary

UK English:

Phoneme Example Transcription
t tap t ae1 p
p pat p ae1 t
b book b uh1 k
d done d ah1 n
k camera k ae1 m r ah0
g good g uh1 d
ch chart ch a1 t
jh jack jh ae1 k
f fat f ae1 t
v various v e@1 r i0 ih0 s
th thin th ih1 n
dh then dh eh1 n
s sit s ih1 t
z zero z i1 r ow0
sh clash k l ae1 sh
zh vision v ih1 zh ah0 n
h hello h eh1 l ow0
m man m ae1 n
n new n j uw1
ng sitting s ih1 t ih0 ng
r reason r i1 z ah0 n
l late l ey1 t
w water w ao1 t er0
j yellow j eh1 l ow0
i sheep sh i1 p
ih image ih1 m ih0 jh
eh end eh1 n d
ae bank b ae1 ng k
er earth er1 th
ah about ah1 b aw0 t
a father f a1 dh er0
oa on oa1 n
ao bought b ao1 t
uh could k uh1 d
uw moon m uw1 n
ay buy b ay1
aw cow k aw1
oy oyster oy1 s t er0
ow float f l ow1 t
ey bacon b ey1 k ah0 n
e@ fairly f e@1 l i0
i@ weary w i@1 r i0

0 Unstressed
1 Primary stress
2 Secondary stress
& Word boundary

<prosody>:

AT&T Natural Voices:

The prosody element works for this engine. You can specify a preset rate ("fast", "medium", "slow", or "default"), but this is not recommended because the presets make the voice either too slow or too fast. The "rate" attribute can also be set to a numeric value such as "100.0" or "50.0"; a normal voice rate is around "150.0". These values are not in accordance with the SSML spec, where rates are specified relative to 1. You can also adjust the voice rate with percentages: "+50%" makes the voice rate 50% faster and "-50%" makes it 50% slower.
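The rate forms described above can be sketched as follows for the AT&T voice "mike". The spoken text is illustrative only.

```xml
<speak>
  <voice name="mike">
    <!-- Numeric rate: around 150.0 is described as a normal rate for this engine -->
    <prosody rate="150.0">This sentence uses a numeric rate.</prosody>
    <!-- Percentage rates are relative to the current rate -->
    <prosody rate="+50%">This sentence is spoken fifty percent faster.</prosody>
    <prosody rate="-50%">This sentence is spoken fifty percent slower.</prosody>
  </voice>
</speak>
```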

Cepstral Engine:

The prosody element works for the Cepstral Engine. The "pitch" attribute is supported only by the Cepstral Engine.

RealSpeak Engine:

When using a RealSpeak TTS voice, the speaking rate does not revert to normal after the <prosody> tag has been used. To restore it, you must use the <prosody> tag again with the "volume" attribute set to "100.0" and the "rate" attribute set to "default".
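A minimal sketch of the reset described above, using the RealSpeak voice "Jill". The first rate value ("+50%") is illustrative and assumed to be accepted by this engine. Wrapping the following text in the resetting <prosody> tag is also an assumption about how the reset is applied.

```xml
<speak>
  <voice name="Jill">
    <!-- Illustrative rate change; the value is an assumption for RealSpeak -->
    <prosody rate="+50%">This part is spoken quickly.</prosody>
    <!-- Reset: without this, the changed rate carries over to the text that follows -->
    <prosody volume="100.0" rate="default">This part returns to the normal speed.</prosody>
  </voice>
</speak>
```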

<say-as>:

The table below shows the <say-as> tag types and the speech engines that support them. An "x" marks that the <say-as> tag is supported by the speech engine.

Say-as Tag Types AT&T Natural Voices Cepstral Engine RealSpeak Engine
acronym* x x
address x x x
number x x x
number:cardinal x x x
number:ordinal x x x
number:digits x x
number:decimal x x x
number:fraction x x x
number:telephone x x x
date x x x
date:dmy* x x x
date:mdy* x x x
date:ymd* x x
date:ym* x x
date:my* x x x
date:md* x x x
date:dm* x x x
date:y* x x x
date:m x x x
date:d x x
date:day x
digits x
duration x
duration:h x
duration:hm x
duration:m x
duration:ms x
duration:s x
measure* x x x
name x
net:email x x x
net:uri x x x
time* x x x
time:h x x x
time:hm x x x
time:hms x x
spell x
telephone* x x x
currency x x x

acronym: The acronym tag type works in the US, but does not work in the UK. If you are using AT&T Natural Voices in the UK and want to spell out words or read back digits, separate the characters with commas inside the string, for example "a, c, r, o, n, y, m" or "1, 2, 3, 4, 5".
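The two approaches can be sketched side by side. The "type" attribute name on <say-as> is an assumption based on the "Say-as Tag Types" table, and the sample strings are illustrative. On a real system only the voice element that matches the hosting environment would be used.

```xml
<speak>
  <!-- US hosting: the acronym type spells out the word (attribute name "type" is assumed) -->
  <voice name="mike">
    Your code is <say-as type="acronym">IBM</say-as>.
  </voice>
  <!-- UK hosting workaround: commas force letter-by-letter and digit-by-digit reading -->
  <voice name="charles">
    Your code is I, B, M. Your PIN is 1, 2, 3, 4, 5.
  </voice>
</speak>
```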

date:mdy: The preferred format of this tag is "month abbreviation day, year". For example, to return "December 25, 2001", you would type "Dec 25, 2001". You can also use the "month/day/year" format such as "12/25/01" for the US, but this format will not work in the UK.

date:dmy: The preferred format of this tag is "day month abbreviation, year". For example, to return "December 25, 2001", you would type "25 Dec, 2001".

date:ymd: The preferred format for this tag is "year, month abbreviation day". For example, to return "December 25, 2001", you would type "2001, Dec 25".

date:my: The format of this tag should be "month abbreviation, year". For example, to return "December, 2001", you would type "Dec, 2001".

date:md: The preferred format for this tag is "month abbreviation day". For example, to return "December 25", you would type "Dec 25". You can also use the "month/day" format such as "12/25" for the US, but this format will not work in the UK.

date:dm: The preferred format for this tag is "day month abbreviation". For example, to return "December 25", you would type "25 Dec".

date:ym: The preferred format for this tag is "year/month". For example, to return "December 2001", you would type "2001/12".
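The preferred date formats above can be sketched as follows, using three date types that the table marks as supported by all engines. The "type" attribute name is an assumption based on the "Say-as Tag Types" table.

```xml
<speak>
  <voice name="mike">
    <!-- date:mdy, preferred form: month abbreviation day, year -->
    <say-as type="date:mdy">Dec 25, 2001</say-as>
    <!-- date:dmy, preferred form: day month abbreviation, year -->
    <say-as type="date:dmy">25 Dec, 2001</say-as>
    <!-- date:md, preferred form: month abbreviation day -->
    <say-as type="date:md">Dec 25</say-as>
  </voice>
</speak>
```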

date:y: The date:y tag type works fine in the US, but does not work in the UK.

measure: For AT&T Natural Voices, you can use a format such as 5'4" or 5m (5 meters). For Cepstral, the preferred format is one such as 5m. For RealSpeak, the preferred format is one such as 5'4".

time: The time tag type works fine in the US, but does not work in the UK.

telephone: The telephone tag type works fine in the US, but does not work in the UK.

The format for telephone numbers is: 123-456-7890

The format for telephone extensions is: 123-456-7890 ext1234

NOTE: For extensions, AT&T Natural Voices and RealSpeak will say the number back correctly. In the example above, AT&T Natural Voices and RealSpeak will say, "one two three four five six seven eight nine zero, extension one two three four." However, Cepstral will say, "one two three four five six seven eight nine zero, extension twelve thirty-four." To account for this, you can insert commas between the digits after the extension: 123-456-7890 ext1,2,3,4.
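A minimal sketch of the Cepstral workaround, using the default US Cepstral voice "Diane". The "type" attribute name is an assumption based on the "Say-as Tag Types" table.

```xml
<speak>
  <voice name="Diane">
    <!-- Commas after "ext" make Cepstral read the extension digit by digit -->
    <say-as type="telephone">123-456-7890 ext1,2,3,4</say-as>
  </voice>
</speak>
```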

<sentence>:

Cepstral Engine:

The xml:lang attribute does not work with the sentence element.