Step 3.1: Set Up the AWS Command Line Interface (AWS CLI)
Follow the steps to download and configure the AWS CLI.
Important
You don't need the AWS CLI to perform the steps in this exercise. However, some of the exercises in this guide use the AWS CLI. You can skip this step and go to Step 3.2: Getting Started Exercise Using the AWS CLI, and then set up the AWS CLI later when you need it.
To set up the AWS CLI
1. Download and configure the AWS CLI. For instructions, see the following topics in the AWS Command Line Interface User Guide:
• Getting Set Up with the AWS Command Line Interface
• Configuring the AWS Command Line Interface
2. Add a named profile for the administrator user in the AWS CLI config file. You use this profile when running the AWS CLI commands. For more information about named profiles, see Named Profiles in the AWS Command Line Interface User Guide.
[profile adminuser]
aws_access_key_id = adminuser access key ID
aws_secret_access_key = adminuser secret access key
region = aws-region
For a list of available AWS Regions and those supported by Amazon Polly, see Regions and Endpoints in the Amazon Web Services General Reference.
Note
If you're using the Region supported by Amazon Polly that you specified when you configured the AWS CLI, omit the following line from the AWS CLI code examples.
--region aws-region
3. Verify the setup by typing the following help command at the command prompt.
aws help
A list of valid AWS commands should appear in the AWS CLI window.
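To double-check the named profile from step 2 without running the AWS CLI, the following minimal sketch parses a file in the format shown above with Python's standard configparser module. It assumes the default AWS CLI location (~/.aws/config) and the [profile adminuser] section name; read_profile_region is a hypothetical helper, and it deliberately returns only the Region so that secret keys are never printed.

```python
"""Sketch: read the Region from a named profile in an AWS CLI-style config file."""
import configparser
import os


def read_profile_region(config_text, profile="adminuser"):
    # interpolation=None so that '%' characters in values are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # In the AWS CLI config file, named profiles use a "[profile <name>]" header.
    section = "profile " + profile
    if not parser.has_section(section):
        raise KeyError("no [" + section + "] section found")
    # Return only the Region; the access keys are intentionally not exposed.
    return parser.get(section, "region", fallback=None)


if __name__ == "__main__":
    path = os.path.expanduser("~/.aws/config")  # assumed default location
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            print(read_profile_region(f.read()))
```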
To enable Amazon Polly in the AWS CLI (optional)
If you have previously downloaded and configured the AWS CLI, Amazon Polly might not be available unless you reconfigure the AWS CLI. This procedure checks to see if this is necessary and provides instructions if Amazon Polly is not automatically available.
1. Verify the availability of Amazon Polly by typing the following help command at the AWS CLI command prompt.
aws polly help
If a description of Amazon Polly and a list of valid commands appears in the AWS CLI window, Amazon Polly is available in the AWS CLI and can be used immediately. In this case, you can skip the rest of this procedure. If this is not displayed, continue with Step 2.
2. Use one of the two following options to enable Amazon Polly:
a. Uninstall and reinstall the AWS CLI.
For instructions, see Installing the AWS Command Line Interface in the AWS Command Line Interface User Guide.
or
b. Download the file service-2.json.
At the command prompt, run the following command.
aws configure add-model --service-model file://service-2.json --service-name polly
3. Reverify the availability of Amazon Polly.
aws polly help
The description of Amazon Polly should be visible.
Next Step
Step 3.2: Getting Started Exercise Using the AWS CLI
Step 3.2: Getting Started Exercise Using the AWS CLI
Now you can test the speech synthesis offered by Amazon Polly. In this exercise, you call the SynthesizeSpeech operation by passing in sample text. You can save the resulting audio as a file and verify its content.
1. Run the synthesize-speech AWS CLI command to synthesize sample text to an audio file (hello.mp3).
The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^), and use full quotation marks (") around the input text with single quotes (') for interior tags.
aws polly synthesize-speech \
    --output-format mp3 \
    --voice-id Joanna \
    --text 'Hello, my name is Joanna. I learned about the W3C on 10/3 of last year.' \
    hello.mp3
In the call to synthesize-speech, you provide sample text for the synthesis, the voice to use (by providing a voice ID, explained in the following step 3), and the output format. The command saves the resulting audio to the hello.mp3 file.
In addition to the MP3 file, the operation sends the following output to the console.
{
    "ContentType": "audio/mpeg",
    "RequestCharacters": "71"
}
2. Play the resulting hello.mp3 file to verify the synthesized speech.
3. Get the list of available voices by using the DescribeVoices operation. Run the following describe-voices AWS CLI command.
aws polly describe-voices
In response, Amazon Polly returns the list of all available voices. For each voice, the response provides the following metadata: voice ID, language code, language name, and the gender of the voice. The following is a sample response.
{
    "Voices": [
        {
            "Gender": "Female",
            "Name": "Salli",
            "LanguageName": "US English",
            "Id": "Salli",
            "LanguageCode": "en-US"
        },
        {
            "Gender": "Female",
            "Name": "Joanna",
            "LanguageName": "US English",
            "Id": "Joanna",
            "LanguageCode": "en-US"
        }
    ]
}
Optionally, you can specify the language code to find the available voices for a specific language. Amazon Polly supports dozens of voices. The following example lists all the voices for Brazilian Portuguese.
aws polly describe-voices \
    --language-code pt-BR
For a list of language codes, see Languages Supported by Amazon Polly. These language codes are W3C language identification tags (an ISO 639 code for the language name, followed by an ISO 3166 country code), for example, en-US (US English), en-GB (British English), and es-ES (Spanish).
You can also use the help option in the AWS CLI to get the list of language codes:
aws polly describe-voices help
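If you prefer to script this exercise, the two CLI calls map onto the boto3 Polly client methods synthesize_speech and describe_voices. The following minimal sketch assumes the client is created elsewhere (for example, boto3.Session(profile_name="adminuser").client("polly")) and passed in, so nothing here contacts AWS on its own; synthesize_to_file and voice_ids are hypothetical helper names, and the voice filter runs locally on a response shaped like the sample above.

```python
"""Sketch: boto3 equivalents of the synthesize-speech and describe-voices calls above."""
from contextlib import closing

SAMPLE_TEXT = "Hello, my name is Joanna. I learned about the W3C on 10/3 of last year."


def synthesize_to_file(client, text, path, voice_id="Joanna", output_format="mp3"):
    # Same parameters as --text, --voice-id, and --output-format in the CLI example.
    response = client.synthesize_speech(Text=text, VoiceId=voice_id, OutputFormat=output_format)
    # Close the audio stream when done, as the Python example in this guide also does.
    with closing(response["AudioStream"]) as stream, open(path, "wb") as out:
        out.write(stream.read())
    return response["ContentType"]


def voice_ids(voices_response, language_code=None):
    # Local filter over a describe_voices response; the service can also
    # filter for you with describe_voices(LanguageCode="pt-BR").
    return [
        voice["Id"]
        for voice in voices_response["Voices"]
        if language_code is None or voice["LanguageCode"] == language_code
    ]


SAMPLE_VOICES = {
    "Voices": [
        {"Gender": "Female", "Name": "Salli", "LanguageName": "US English",
         "Id": "Salli", "LanguageCode": "en-US"},
        {"Gender": "Female", "Name": "Joanna", "LanguageName": "US English",
         "Id": "Joanna", "LanguageCode": "en-US"},
    ]
}

if __name__ == "__main__":
    print(voice_ids(SAMPLE_VOICES, "en-US"))  # ['Salli', 'Joanna']
    print(len(SAMPLE_TEXT))  # 71, the same as RequestCharacters in the sample output
```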
Python Examples
This guide provides additional examples, some of which are Python code examples that use the AWS SDK for Python (Boto) to make API calls to Amazon Polly. We recommend that you set up Python and test the example code provided in the following section. For additional examples, see Example Applications.
Set Up Python and Test an Example (SDK)
To test the Python example code, you need the AWS SDK for Python (Boto3). For instructions, see AWS SDK for Python (Boto3).
To test the example Python code
The following Python code example performs the following actions:
• Uses the AWS SDK for Python (Boto) to send a SynthesizeSpeech request to Amazon Polly (by providing simple text as input).
• Accesses the resulting audio stream in the response and saves the audio to a file (speech.mp3) in your system's temporary directory.
• Plays the audio file with the default audio player for your local system.
Save the code to a file (example.py) and run it.
"""Getting Started Example for Python 2.7+/3.3+"""
from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError
from contextlib import closing
import os
import sys
import subprocess
from tempfile import gettempdir

# Create a client using the credentials and region defined in the adminuser
# named profile (read from ~/.aws/config or ~/.aws/credentials).
session = Session(profile_name="adminuser")
polly = session.client("polly")

try:
    # Request speech synthesis
    response = polly.synthesize_speech(Text="Hello world!", OutputFormat="mp3",
                                       VoiceId="Joanna")
except (BotoCoreError, ClientError) as error:
    # The service returned an error, exit gracefully
    print(error)
    sys.exit(-1)

# Access the audio stream from the response
if "AudioStream" in response:
    # Note: Closing the stream is important because the service throttles on the
    # number of parallel connections. Here we are using contextlib.closing to
    # ensure the close method of the stream object will be called automatically
    # at the end of the with statement's scope.
    with closing(response["AudioStream"]) as stream:
        output = os.path.join(gettempdir(), "speech.mp3")

        try:
            # Open a file for writing the output as a binary stream
            with open(output, "wb") as file:
                file.write(stream.read())
        except IOError as error:
            # Could not write to file, exit gracefully
            print(error)
            sys.exit(-1)
else:
    # The response didn't contain audio data, exit gracefully
    print("Could not stream audio")
    sys.exit(-1)

# Play the audio using the platform's default player
if sys.platform == "win32":
    os.startfile(output)
else:
    # The following works on macOS and Linux. (Darwin = mac, xdg-open = linux).
    opener = "open" if sys.platform == "darwin" else "xdg-open"
    subprocess.call([opener, output])
For additional examples, including an example application, see Example Applications.
Voices in Amazon Polly
Amazon Polly provides a number of different voices for you to use. To hear example voices, see the Amazon Polly product overview. To hear a specific voice speak a sample that you provide, you can use the Amazon Polly console. For instructions, see Listening to the Voices.
Available Voices
Amazon Polly provides a variety of different voices in multiple languages for synthesizing speech from text.
Language             Name/ID   Gender   Neural Voice   Standard Voice
Arabic (arb)         Zeina     Female   No             Yes
Hindi (hi-IN)        Aditi*    Female   No             Yes
Korean (ko-KR)       Seoyeon   Female   Yes            Yes
Norwegian (nb-NO)    Liv       Female   No             Yes
Romanian (ro-RO)     Carmen    Female   No             Yes
Swedish (sv-SE)      Astrid    Female   No             Yes
Turkish (tr-TR)      Filiz     Female   No             Yes
Welsh (cy-GB)        Gwyneth   Female   No             Yes
* This voice is bilingual and can speak both English and Hindi. For more information, see Bilingual Voices.
** These voices can be used with Newscaster speaking styles when used with the Neural format. For more information, see NTTS Newscaster Speaking Style.
To learn more about Amazon Polly Brand Voices, see Brand Voice.
Bilingual Voices
Amazon Polly has two ways of producing bilingual voices:
• Accented bilingual voices
• Fully bilingual voices
Accented bilingual voices
Accented bilingual voices can be created using any Amazon Polly voice, but only when using SSML tags.
Normally, all words in the input text are spoken in the default language of the voice you're using.
For example, if you're using the voice of Joanna (who speaks US English), Amazon Polly speaks the following in the Joanna voice without a French accent:
<speak>
Why didn't she just say, 'Je ne parle pas français?'
</speak>
In this case, the words Je ne parle pas français are spoken as they would be if they were English.
However, if you use the Joanna voice with the <lang> tag, Amazon Polly speaks the sentence in the Joanna voice in American-accented French:
<speak>
Why didn't she just say, <lang xml:lang="fr-FR">'Je ne parle pas français?'</lang>.
</speak>
Because Joanna is not a native French voice, pronunciation is based on her native language, US English.
For instance, although perfect French pronunciation features a uvular trill /R/ in the word français, Joanna's US English voice pronounces this phoneme as the corresponding sound /r/.
If you use the voice of Giorgio, who speaks Italian, with the following text, Amazon Polly speaks the sentence in Giorgio's voice with an Italian pronunciation:
<speak>
Mi piace Bruce Springsteen.
</speak>
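The <lang> markup shown earlier in this section can also be generated programmatically. The following minimal sketch uses Python's xml.sax.saxutils to escape the text and quote the attribute value; lang_ssml is a hypothetical helper, and the commented-out call assumes a boto3 Polly client. Because the result is SSML, the request must set TextType="ssml".

```python
"""Sketch: build SSML that speaks one phrase with the <lang> tag."""
from xml.sax.saxutils import escape, quoteattr


def lang_ssml(before, phrase, lang, after=""):
    # escape() protects &, <, and > in the text; quoteattr() quotes the attribute value.
    return (
        "<speak>"
        + escape(before)
        + "<lang xml:lang=" + quoteattr(lang) + ">" + escape(phrase) + "</lang>"
        + escape(after)
        + "</speak>"
    )


if __name__ == "__main__":
    ssml = lang_ssml("Why didn't she just say, ", "'Je ne parle pas français?'", "fr-FR", ".")
    print(ssml)
    # With a real client (assumed: boto3.Session(...).client("polly")):
    # client.synthesize_speech(Text=ssml, TextType="ssml", VoiceId="Joanna", OutputFormat="mp3")
```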
Fully bilingual voices
A fully bilingual voice like Aditi (Indian English and Hindi) can speak two languages fluently. This gives you the ability to use words and phrases from both languages in a single text using the same voice.
Currently, Aditi is the only fully bilingual voice available.
Using a Bilingual Voice (Aditi)
Aditi speaks both Indian English (en-IN) and Hindi (hi-IN) fluently. You can synthesize speech in both English and Hindi, and the voice can switch between the two languages even within the same sentence.
Hindi can be used in two different forms:
• Devanagari: "उसने कहाँ, खेल तोह अब शुरू होगा"
• Romanagari (using the Latin alphabet): "Usne kahan, khel toh ab shuru hoga"
Additionally, it's possible to mix English and Hindi of either or both forms within a single sentence:
• Devanagari + English: "This is the song कभी कभी अदिति"
• Romanagari + English: "This is the song from the movie Jaane Tu Ya Jaane Na."
• Devanagari + Romanagari + English: "This is the song कभी कभी अदिति from the movie Jaane Tu Ya Jaane Na."
Because Aditi is a bilingual voice, text in all of these cases will be read correctly, as Amazon Polly can differentiate between the languages and scripts.
Amazon Polly also supports numbers, dates, times, and currency expansion in both English (Arabic numerals) and Hindi (Devanagari numerals). By default, Arabic numerals are read in Indian English. To make Amazon Polly read them in Hindi, you must use the hi-IN language code parameter.
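As a minimal sketch of the language code parameter, the following builds SynthesizeSpeech keyword arguments for Aditi and adds LanguageCode="hi-IN" only when Arabic numerals should be read in Hindi. aditi_request is a hypothetical helper; LanguageCode is an optional SynthesizeSpeech parameter, and the dictionary is meant to be passed to a boto3 Polly client as synthesize_speech(**request).

```python
"""Sketch: choose whether Aditi reads Arabic numerals in Indian English or Hindi."""


def aditi_request(text, numbers_in_hindi=False, output_format="mp3"):
    request = {"Text": text, "VoiceId": "Aditi", "OutputFormat": output_format}
    if numbers_in_hindi:
        # Without LanguageCode, Arabic numerals are read in Indian English.
        request["LanguageCode"] = "hi-IN"
    return request


if __name__ == "__main__":
    print(aditi_request("Usne kahan, khel 5 baje shuru hoga", numbers_in_hindi=True))
    # With a real client (assumed): client.synthesize_speech(**request)
```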
Listening to the Voices
You can use the Amazon Polly console to hear a sample from any of the voices available in Amazon Polly.
To listen to a voice in Amazon Polly
1. Sign in to the AWS Management Console and open the Amazon Polly console at https://console.aws.amazon.com/polly/.
2. Choose the Text-to-Speech tab.
3. For Engine, choose Standard or Neural.
4. Choose a language and a Region, then choose a voice.
5. Enter text for the voice to speak or use the default phrase, and then choose Listen.
You can choose any of the languages offered by Amazon Polly, and the console displays the voices available for that language. In most cases, there is at least one male and one female voice, often more than one of each. A few languages have only a single voice. For a complete list, see Voices in Amazon Polly.
Note
The inventory of voices and languages is continually updated to include additional choices. To suggest a new language or voice, feel free to provide feedback on this page. Unfortunately, we are not able to comment on plans for specific new languages before they are released.
Each voice is created using native language speakers, so there are variations from voice to voice, even within the same language. When selecting a voice for your project, you should test each of the possible voices with a passage of text to see which best suits your needs.
Voice Speed
Because of the natural variation between voices, each available voice will speak the text at slightly different speeds. For instance, with US English voices, Ivy and Joanna are slightly faster than Matthew when saying "Mary had a little lamb," and considerably faster than Joey.
You can find out how long it takes your voice to say the selected text by using speech marks. For more information on using speech marks in Amazon Polly, see Using Speech Marks.
To see approximately how long it takes to speak a text passage
1. Open the AWS CLI.
2. Run the following command, filling in the placeholders as needed.
aws polly synthesize-speech \
    --language-code [optional language code, if needed] \
    --output-format json \
    --voice-id [name of desired voice] \
    --text '[desired text]' \
    --speech-mark-types='["viseme"]' \
    LengthOfText.txt
3. Open LengthOfText.txt.
If the text were "Mary had a little lamb," the last few lines returned by Amazon Polly would be:
{"time":882,"type":"viseme","value":"t"}
{"time":964,"type":"viseme","value":"a"}
{"time":1082,"type":"viseme","value":"p"}
The last viseme, essentially the sound for the final letters in "lamb," starts 1082 milliseconds after the beginning of the speech. While this is not exactly the length of the audio, it's close and can serve as a basis for comparison between voices.
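Comparing voices this way is easy to automate. The following minimal sketch reads viseme speech marks (one JSON object per line, as in LengthOfText.txt) and returns the start time of the last mark in milliseconds; last_mark_ms is a hypothetical helper, and like the manual method it slightly underestimates the full audio length.

```python
"""Sketch: estimate speech length from viseme speech marks."""
import json


def last_mark_ms(lines):
    # Each non-empty line of the speech-mark output is a standalone JSON object.
    marks = [json.loads(line) for line in lines if line.strip()]
    if not marks:
        return None
    # The latest start time belongs to the final mark, which approximates the length.
    return max(mark["time"] for mark in marks)


SAMPLE = [
    '{"time":882,"type":"viseme","value":"t"}',
    '{"time":964,"type":"viseme","value":"a"}',
    '{"time":1082,"type":"viseme","value":"p"}',
]

if __name__ == "__main__":
    print(last_mark_ms(SAMPLE))  # 1082
    # For a real file:
    # with open("LengthOfText.txt", encoding="utf-8") as f: print(last_mark_ms(f))
```

Running the same text through two voices and comparing the two numbers gives the relative speed described above.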
Changing Your Voice Speed
For certain applications, you might prefer that the voice you like be slowed down or sped up. If the speed of the voice is a concern, Amazon Polly lets you modify it by using SSML tags.
For example:
Your organization is making an application that reads books to immigrant audiences. The audience speaks English, but their fluency is limited. In this case, you might consider slowing the rate of speech to give your audience a little more time for comprehension while the application is speaking.
Amazon Polly helps you slow down the rate of speech using the SSML <prosody> tag, as in:
<speak>
In some cases, it might help your audience to <prosody rate="85%">slow the speaking rate slightly to aid in comprehension.</prosody>
</speak>
or
<speak>
In some cases, it might help your audience to <prosody rate="slow">slow the speaking rate slightly to aid in comprehension.</prosody>
</speak>
Two speed options are available to you when using SSML with Amazon Polly:
• Preset speeds: x-slow, slow, medium, fast, and x-fast. In these cases, the speed of each option is approximate, depending on your preferred voice. The medium option is the normal speed of the voice.
• n% of speech rate: any percentage of the speech rate, between 20% and 200% can be used. In these cases, you can choose exactly the speed you want. However, the actual speed of the voice is approximate, depending on the voice you've chosen. 100% is considered to be the normal speed of the voice.
Because the speed of each option is approximate and depends on the voice you choose, we recommend that you test your selected voice at various speeds to see what exactly meets your needs.
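If you generate SSML in code, it can help to validate the rate before sending the request. The following minimal sketch wraps text in a <prosody> tag and accepts either one of the preset speeds or an integer percentage in the 20% to 200% range described above; prosody_ssml is a hypothetical helper, and the resulting string must be sent with TextType="ssml".

```python
"""Sketch: wrap text in an SSML <prosody> tag with a validated rate."""
from xml.sax.saxutils import escape

PRESET_RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}


def prosody_ssml(text, rate):
    # rate is either a preset name or an integer percentage between 20 and 200.
    if isinstance(rate, int):
        if not 20 <= rate <= 200:
            raise ValueError("percentage rate must be between 20 and 200")
        rate_value = "%d%%" % rate
    elif rate in PRESET_RATES:
        rate_value = rate
    else:
        raise ValueError("unknown rate: %r" % (rate,))
    return '<speak><prosody rate="%s">%s</prosody></speak>' % (rate_value, escape(text))


if __name__ == "__main__":
    print(prosody_ssml("slow the speaking rate slightly to aid in comprehension.", 85))
```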
For more information on using the prosody tag to best effect, see Controlling Volume, Speaking Rate, and Pitch.
Languages Supported by Amazon Polly
The following languages are supported by Amazon Polly and can be used to synthesize speech. Each language is listed with its language code. These language codes are W3C language identification tags (ISO 639-3 for the language name and ISO 3166 for the country code).
For in-depth tables showing the phonemes and visemes associated with each language, choose the link on each language in the table below.
Language                   Language Code
Arabic                     arb
Chinese, Mandarin          cmn-CN
Danish                     da-DK
Dutch                      nl-NL
English, Australian        en-AU
English, British           en-GB
English, Indian            en-IN
English, New Zealand       en-NZ
English, South African     en-ZA
English, US                en-US
English, Welsh             en-GB-WLS
French                     fr-FR
French, Canadian           fr-CA
German                     de-DE
Hindi                      hi-IN
Icelandic                  is-IS
Italian                    it-IT
Japanese                   ja-JP
Korean                     ko-KR
Norwegian                  nb-NO
Polish                     pl-PL
Portuguese, Brazilian      pt-BR
Portuguese, European       pt-PT
Romanian                   ro-RO
Russian                    ru-RU
Spanish, European          es-ES
Spanish, Mexican           es-MX
Spanish, US                es-US
Swedish                    sv-SE
Turkish                    tr-TR
Welsh                      cy-GB
For more information, see Phoneme and Viseme Tables for Supported Languages.
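Because each code is a language part followed by an optional country part, a simple split is often enough to group voices by language. The following minimal sketch is a plain hyphen split, not a full BCP 47 parser; split_language_code is a hypothetical helper, and it treats any further subtag (such as WLS in en-GB-WLS) as an extra part.

```python
"""Sketch: split an Amazon Polly language code into its parts."""


def split_language_code(code):
    # "arb" -> ("arb", None, None), "en-US" -> ("en", "US", None),
    # "en-GB-WLS" -> ("en", "GB", "WLS").
    parts = code.split("-")
    language = parts[0]
    region = parts[1] if len(parts) > 1 else None
    extra = "-".join(parts[2:]) or None
    return language, region, extra


if __name__ == "__main__":
    for code in ("arb", "cmn-CN", "en-GB-WLS"):
        print(code, split_language_code(code))
```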
Phoneme and Viseme Tables for Supported Languages
The following tables list the phonemes for the languages supported by Amazon Polly, along with examples and the corresponding visemes.
Topics
• Arabic (arb)
• Chinese, Mandarin (cmn-CN)
• Danish (da-DK)
• Dutch (nl-NL)
• English, Australian (en-AU)
• English, American (en-US)
• English, British (en-GB)
• English, Indian (en-IN)
• English, New Zealand (en-NZ)
• English, South African (en-ZA)
• English, Welsh (en-GB-WLS)
• French (fr-FR)
• French, Canadian (fr-CA)
• German (de-DE)
• Hindi (hi-IN)
• Icelandic (is-IS)
• Italian (it-IT)
• Japanese (ja-JP)
• Korean (ko-KR)
• Norwegian (nb-NO)
• Polish (pl-PL)
• Portuguese (pt-PT)
• Portuguese, Brazilian (pt-BR)
• Romanian (ro-RO)
• Russian (ru-RU)
• Spanish (es-ES)
• Spanish, Mexican (es-MX)
• Spanish, US (es-US)
• Swedish (sv-SE)
• Turkish (tr-TR)
• Welsh (cy-GB)
Arabic (arb)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the Arabic voice of Zeina that is supported by Amazon Polly.
Phoneme/Viseme Table
IPA    X-SAMPA   Description                                  Example    Viseme
Consonants
ʔ      ?         glottal stop                                 أَنا
ʕ      ?\        voiced pharyngeal fricative                  عُمَر      k
b      b         voiced bilabial plosive                      بَلَد      p
d      d         voiced alveolar plosive                      داري       t
dˤ     d_?\      emphatic voiced alveolar plosive             ضَوء       t
d͡ʒ     dZ        voiced postalveolar affricate                جَميل      S
ð      D         voiced dental fricative                      ذلِكَ      T
ðˤ     D_?\      emphatic voiced dental fricative             ظَلام      T
f      f         voiceless labiodental fricative              فَصل       f
ɡ      g         voiced velar plosive                         إنجلترا    k
ɣ      G         voiced velar fricative                       غَرب       k
h      h         voiceless glottal fricative                  هذا        k
j      j         palatal approximant                          يَمشي      i
k      k         voiceless velar plosive                      كَلب       k
l      l         alveolar lateral approximant                 لاقى       t
lˠ     l_G       emphatic alveolar lateral approximant        الله       t
m      m         bilabial nasal                               ماذا       p
n      n         alveolar nasal                               نور        t
p      p         voiceless bilabial plosive                   حَبس       p
q      q         voiceless uvular plosive                     قَريب      k
r      r         alveolar trill                               رَمل       r
s      s         voiceless alveolar fricative                 سُؤال      s
sˤ     s_?\      emphatic voiceless alveolar fricative        صاحِب      s
ʃ      S         voiceless postalveolar fricative             شُكر       S
t      t         voiceless alveolar plosive                   تَمر       t
tˤ     t_?\      emphatic voiceless alveolar plosive          طالِب      t
θ      T         voiceless dental fricative                   ثَلاث      T
v      v         voiced labiodental fricative                 فيتامين    f
w      w         labio-velar approximant                      وَلَد      u
x      x         voiceless velar fricative                    خَوْف      k
ħ      X\        voiceless pharyngeal fricative               حَوْلَ      k
z      z         voiced alveolar fricative                    زُهور      s
Vowels
a      a         open front unrounded vowel                   بَرد       a
aː     a:        long open front unrounded vowel              دار        a
ɑˤ     A_?\      emphatic open back unrounded vowel           طَبل       a
ɑˤː    A_?\:     emphatic long open back unrounded vowel      ظالِم      a
u      u         close back rounded vowel                     شُرب       u
uː     u:        long close back rounded vowel                سور        u
uˤ     u_?\      emphatic close back rounded vowel            بُدّ       u
uˤː    u_?\:     emphatic long close back rounded vowel       طول        u
i      i         close front unrounded vowel                  بِنت       i
iː     i:        long close front unrounded vowel             حَزين      i
iˤ     i_?\      emphatic close front unrounded vowel         ضِدّ       i
iˤː    i_?\:     emphatic long close front unrounded vowel    ماضي       i
e      e         close-mid front unrounded vowel              ماركت      e
eː     e:        long close-mid front unrounded vowel         موديل      e
ɔ      O         open-mid back rounded vowel                  تكنولوجي   O