Getting Started Exercise

Step 3.1: Set Up the AWS Command Line Interface (AWS CLI)

Follow the steps to download and configure the AWS CLI.

Important

You don't need the AWS CLI to perform the steps in this exercise. However, some of the exercises in this guide use the AWS CLI. You can skip this step and go to Step 3.2: Getting Started Exercise Using the AWS CLI (p. 8), and then set up the AWS CLI later when you need it.

To set up the AWS CLI

1. Download and configure the AWS CLI. For instructions, see the following topics in the AWS Command Line Interface User Guide:

• Getting Set Up with the AWS Command Line Interface

• Configuring the AWS Command Line Interface

2. Add a named profile for the administrator user in the AWS CLI config file. You use this profile when running the AWS CLI commands. For more information about named profiles, see Named Profiles in the AWS Command Line Interface User Guide.

[profile adminuser]
aws_access_key_id = adminuser access key ID
aws_secret_access_key = adminuser secret access key
region = aws-region

For a list of available AWS Regions and those supported by Amazon Polly, see Regions and Endpoints in the Amazon Web Services General Reference.

Note

If you're using the Region supported by Amazon Polly that you specified when you configured the AWS CLI, omit the following line from the AWS CLI code examples.

--region aws-region

3. Verify the setup by typing the following help command at the command prompt.

aws help

A list of valid AWS commands should appear in the AWS CLI window.
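The named profile from step 2 is an INI-style section, so its layout can be checked with Python's standard configparser module. The following is a minimal sketch, not part of the AWS CLI: the helper name read_profile and every value in SAMPLE_CONFIG (including the Region) are placeholders chosen for illustration.

```python
import configparser

# Sample contents of an AWS CLI config file. Every value is a placeholder.
SAMPLE_CONFIG = """\
[profile adminuser]
aws_access_key_id = EXAMPLE_ACCESS_KEY_ID
aws_secret_access_key = EXAMPLE_SECRET_ACCESS_KEY
region = us-east-1
"""


def read_profile(config_text, profile_name):
    """Return the settings of one named profile as a plain dict (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Named profiles in the config file use the section header "profile <name>".
    section = "profile " + profile_name
    if not parser.has_section(section):
        raise KeyError("no profile named %r" % profile_name)
    return dict(parser[section])


settings = read_profile(SAMPLE_CONFIG, "adminuser")
print(sorted(settings))    # the three keys defined in the profile
print(settings["region"])  # the placeholder Region
```

A missing region key in the returned dict is a quick hint that the --region option will be needed on each command.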

To enable Amazon Polly in the AWS CLI (optional)

If you have previously downloaded and configured the AWS CLI, Amazon Polly might not be available unless you reconfigure the AWS CLI. This procedure checks to see if this is necessary and provides instructions if Amazon Polly is not automatically available.

1. Verify the availability of Amazon Polly by typing the following help command at the AWS CLI command prompt.

aws polly help

If a description of Amazon Polly and a list of valid commands appear in the AWS CLI window, Amazon Polly is available in the AWS CLI and can be used immediately. In this case, you can skip the rest of this procedure. If they do not appear, continue with Step 2.

2. Use one of the two following options to enable Amazon Polly:

a. Uninstall and reinstall the AWS CLI.

For instructions, see Installing the AWS Command Line Interface in the AWS Command Line Interface User Guide.

b. Download the file service-2.json.

At the command prompt, run the following command.

aws configure add-model --service-model file://service-2.json --service-name polly

3. Reverify the availability of Amazon Polly.

aws polly help

The description of Amazon Polly should be visible.

Next Step

Step 3.2: Getting Started Exercise Using the AWS CLI (p. 8)

Step 3.2: Getting Started Exercise Using the AWS CLI

Now you can test the speech synthesis offered by Amazon Polly. In this exercise, you call the SynthesizeSpeech operation by passing in sample text. You can save the resulting audio as a file and verify its content.

The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full quotation marks (") around the input text with single quotes (') for interior tags.

aws polly synthesize-speech \
    --output-format mp3 \
    --voice-id Joanna \
    --text 'Hello, my name is Joanna. I learned about the W3C on 10/3 of last year.' \
    hello.mp3

In the call to synthesize-speech, you provide sample text for the synthesis, the voice to use (by providing a voice ID, explained in the following step 3), and the output format. The command saves the resulting audio to the hello.mp3 file.

In addition to the MP3 file, the operation sends the following output to the console.

{
    "ContentType": "audio/mpeg",
    "RequestCharacters": "71"
}

2. Play the resulting hello.mp3 file to verify the synthesized speech.

3. Get the list of available voices by using the DescribeVoices operation. Run the following describe-voices AWS CLI command.

aws polly describe-voices

In response, Amazon Polly returns the list of all available voices. For each voice, the response provides the following metadata: voice ID, language code, language name, and the gender of the voice. The following is a sample response.

{
    "Voices": [
        {
            "Gender": "Female",
            "Name": "Salli",
            "LanguageName": "US English",
            "Id": "Salli",
            "LanguageCode": "en-US"
        },
        {
            "Gender": "Female",
            "Name": "Joanna",
            "LanguageName": "US English",
            "Id": "Joanna",
            "LanguageCode": "en-US"
        }
    ]
}

Optionally, you can specify the language code to find the available voices for a specific language. Amazon Polly supports dozens of voices. The following example lists all the voices for Brazilian Portuguese.

aws polly describe-voices \
    --language-code pt-BR

For a list of language codes, see Languages Supported by Amazon Polly (p. 18). These language codes are W3C language identification tags (ISO 639 code for the language name-ISO 3166 country code), for example en-US (US English), en-GB (British English), and es-ES (Spanish).

You can also use the help option in the AWS CLI to get the list of language codes:

aws polly describe-voices help
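The describe-voices response can also be filtered locally once it has been saved or captured. The sketch below parses the two-voice sample response shown earlier in this step and selects voice IDs by language code. The helper name voice_ids_for_language is hypothetical, and the data is the sample from this guide rather than a live service response.

```python
import json

# The sample describe-voices response from this step, as JSON text.
SAMPLE_RESPONSE = json.loads("""
{
    "Voices": [
        {"Gender": "Female", "Name": "Salli", "LanguageName": "US English",
         "Id": "Salli", "LanguageCode": "en-US"},
        {"Gender": "Female", "Name": "Joanna", "LanguageName": "US English",
         "Id": "Joanna", "LanguageCode": "en-US"}
    ]
}
""")


def voice_ids_for_language(response, language_code):
    """Return the Id of every voice whose LanguageCode matches (hypothetical helper)."""
    return [
        voice["Id"]
        for voice in response["Voices"]
        if voice["LanguageCode"] == language_code
    ]


print(voice_ids_for_language(SAMPLE_RESPONSE, "en-US"))  # ['Salli', 'Joanna']
print(voice_ids_for_language(SAMPLE_RESPONSE, "pt-BR"))  # [] (not in the sample)
```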

Python Examples

This guide provides additional examples, some of which are Python code examples that use AWS SDK for Python (Boto) to make API calls to Amazon Polly. We recommend that you set up Python and test the example code provided in the following section. For additional examples, see Example Applications (p. 161).

Set Up Python and Test an Example (SDK)

To test the Python example code, you need the AWS SDK for Python (Boto). For instructions, see AWS SDK for Python (Boto3).

To test the example Python code

The following Python code example performs the following actions:

• Uses the AWS SDK for Python (Boto) to send a SynthesizeSpeech request to Amazon Polly (by providing simple text as input).

• Accesses the resulting audio stream in the response and saves the audio to a file (speech.mp3) on your local disk.

• Plays the audio file with the default audio player for your local system.

Save the code to a file (example.py) and run it.

"""Getting Started Example for Python 2.7+/3.3+"""
from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError
from contextlib import closing
import os
import sys
import subprocess
from tempfile import gettempdir

# Create a client using the credentials and region defined in the [adminuser]
# section of the AWS credentials file (~/.aws/credentials).
session = Session(profile_name="adminuser")
polly = session.client("polly")

try:
    # Request speech synthesis
    response = polly.synthesize_speech(Text="Hello world!", OutputFormat="mp3",
                                       VoiceId="Joanna")
except (BotoCoreError, ClientError) as error:
    # The service returned an error, exit gracefully
    print(error)
    sys.exit(-1)

# Access the audio stream from the response
if "AudioStream" in response:
    # Note: Closing the stream is important because the service throttles on the
    # number of parallel connections. Here we are using contextlib.closing to
    # ensure the close method of the stream object will be called automatically
    # at the end of the with statement's scope.
    with closing(response["AudioStream"]) as stream:
        output = os.path.join(gettempdir(), "speech.mp3")
        try:
            # Open a file for writing the output as a binary stream
            with open(output, "wb") as file:
                file.write(stream.read())
        except IOError as error:
            # Could not write to file, exit gracefully
            print(error)
            sys.exit(-1)
else:
    # The response didn't contain audio data, exit gracefully
    print("Could not stream audio")
    sys.exit(-1)

# Play the audio using the platform's default player
if sys.platform == "win32":
    os.startfile(output)
else:
    # The following works on macOS and Linux. (Darwin = mac, xdg-open = linux).
    opener = "open" if sys.platform == "darwin" else "xdg-open"
    subprocess.call([opener, output])

For additional examples including an example application, see Example Applications (p. 161).

Voices in Amazon Polly

Amazon Polly provides a number of different voices for you to use. To hear example voices, see the Amazon Polly product overview. To hear a specific voice speak a sample that you provide, you can use the Amazon Polly console. For instructions, see Listening to the Voices (p. 16).

Available Voices

Amazon Polly provides a variety of different voices in multiple languages for synthesizing speech from text.

Language             Name/ID    Gender    Neural Voice    Standard Voice
Arabic (arb)         Zeina      Female    No              Yes
Chinese,
Hindi (hi-IN)        Aditi*     Female    No              Yes
Icelandic
Korean (ko-KR)       Seoyeon    Female    Yes             Yes
Norwegian (nb-NO)    Liv        Female    No              Yes
Romanian (ro-RO)     Carmen     Female    No              Yes
Russian
Swedish (sv-SE)      Astrid     Female    No              Yes
Turkish (tr-TR)      Filiz      Female    No              Yes
Welsh (cy-GB)        Gwyneth    Female    No              Yes

* This voice is bilingual and can speak both English and Hindi. For more information, see Bilingual Voices (p. 15).

** These voices can be used with Newscaster speaking styles when used with the Neural format. For more information, see NTTS Newscaster Speaking Style (p. 97).

customers. To learn more about Amazon Polly Brand Voices, please see Brand Voice.

Bilingual Voices

Amazon Polly has two ways of producing bilingual voices:

• Accented bilingual voices (p. 15)

• Fully bilingual voices (p. 15)

Accented bilingual voices

Accented bilingual voices can be created using any Amazon Polly voice, but only when using SSML tags.

Normally, all words in the input text are spoken in the default language of the voice you're using.

For example, if you're using the voice of Joanna (who speaks US English), Amazon Polly speaks the following in the Joanna voice without a French accent:

<speak>

Why didn't she just say, 'Je ne parle pas français?'

</speak>

In this case, the words Je ne parle pas français are spoken as they would be if they were English.

However, if you use the Joanna voice with the <lang> tag, Amazon Polly speaks the sentence in the Joanna voice in American-accented French:

<speak>

Why didn't she just say, <lang xml:lang="fr-FR">'Je ne parle pas français?'</lang>.

</speak>

Because Joanna is not a native French voice, pronunciation is based on her native language, US English.

For instance, although perfect French pronunciation features a uvular trill /R/ in the word français, Joanna's US English voice pronounces this phoneme as the corresponding sound /r/.
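When the text is assembled in code, the <lang> wrapper can be built with a small helper. This is a minimal sketch: the function names lang_span and speak are hypothetical, and escaping is delegated to the standard xml.sax.saxutils module so that characters such as & and < in the input do not break the SSML.

```python
from xml.sax.saxutils import escape, quoteattr


def lang_span(text, language_code):
    """Wrap text in an SSML <lang> tag for the given language code (hypothetical helper)."""
    return "<lang xml:lang=%s>%s</lang>" % (quoteattr(language_code), escape(text))


def speak(*parts):
    """Join already-escaped SSML fragments inside a <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    escape("Why didn't she just say, "),
    lang_span("'Je ne parle pas français?'", "fr-FR"),
    ".",
)
print(ssml)
```

The resulting string is SSML rather than plain text, so it would be submitted with the text type set to ssml (the TextType parameter in the SDK, or --text-type in the AWS CLI).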

If you use the voice of Giorgio, who speaks Italian, with the following text, Amazon Polly speaks the sentence in Giorgio's voice with an Italian pronunciation:

<speak>

Mi piace Bruce Springsteen.

</speak>

Fully bilingual voices

A fully bilingual voice like Aditi (Indian English and Hindi) can speak two languages fluently. This gives you the ability to use words and phrases from both languages in a single text using the same voice.

Currently, Aditi is the only fully bilingual voice available.

Using a Bilingual Voice (Aditi)

Aditi speaks both Indian English (en-IN) and Hindi (hi-IN) fluently. You can synthesize speech in both English and Hindi, and the voice can switch between the two languages even within the same sentence.

Hindi can be used in two different forms:

• Devanagari: "उसने कहाँ, खेल तोह अब शुरू होगा"

• Romanagari (using the Latin alphabet): "Usne kahan, khel toh ab shuru hoga"

Additionally, it's possible to mix English and Hindi of either or both forms within a single sentence:

• Devanagari + English: "This is the song कभी कभी अदिति"

• Romanagari + English: "This is the song from the movie Jaane Tu Ya Jaane Na."

• Devanagari + Romanagari + English: "This is the song कभी कभी अदिति from the movie Jaane Tu Ya Jaane Na."

Because Aditi is a bilingual voice, text in all of these cases will be read correctly, as Amazon Polly can differentiate between the languages and scripts.

Amazon Polly also supports numbers, dates, times, and currency expansion in both English (Arabic numerals) and Hindi (Devanagari numerals). By default, Arabic numerals are read in Indian English. To make Amazon Polly read them in Hindi, you must use the hi-IN language code parameter.
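The effect of the language code on numerals can be expressed as a difference in request parameters. The sketch below only builds the parameter dictionaries and makes no service call. The helper name aditi_request and the sample sentence are hypothetical, while the parameter names (Text, OutputFormat, VoiceId, LanguageCode) are those of the SynthesizeSpeech operation as exposed by the AWS SDK for Python.

```python
def aditi_request(text, numerals_in_hindi=False, output_format="mp3"):
    """Build SynthesizeSpeech parameters for the Aditi voice (hypothetical helper)."""
    params = {
        "Text": text,
        "OutputFormat": output_format,
        "VoiceId": "Aditi",
    }
    if numerals_in_hindi:
        # Without this parameter, Arabic numerals are read in Indian English.
        params["LanguageCode"] = "hi-IN"
    return params


default_params = aditi_request("The train leaves at 10:30.")
hindi_params = aditi_request("The train leaves at 10:30.", numerals_in_hindi=True)
print(default_params)
print(hindi_params)

# With a configured client, the call would look like:
#     polly.synthesize_speech(**hindi_params)
```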

Listening to the Voices

You can use the Amazon Polly console to hear a sample from any of the voices available in Amazon Polly.

To listen to a voice in Amazon Polly

1. Sign in to the AWS Management Console and open the Amazon Polly console at https://console.aws.amazon.com/polly/.

2. Choose the Text-to-Speech tab.

3. For Engine, choose Standard or Neural.

4. Choose a language and a Region, then choose a voice.

5. Enter text for the voice to speak or use the default phrase, and then choose Listen.

You can choose any of the languages offered by Amazon Polly and the console will display the voices available for that language. In most cases, there will be at least one male and one female voice, often more than one of each. A few only have a single voice. For a complete list, see Voices in Amazon Polly (p. 12).

Note

The inventory of voices and the number of languages included is continually being updated to include additional choices. To suggest a new language or voice, feel free to provide feedback on this page. Unfortunately, we are not able to comment on plans for specific new languages before they are released.

Each voice is created using native language speakers, so there are variations from voice to voice, even within the same language. When selecting a voice for your project, you should test each of the possible voices with a passage of text to see which best suits your needs.

Voice Speed

Because of the natural variation between voices, each available voice will speak the text at slightly different speeds. For instance, with US English voices, Ivy and Joanna are slightly faster than Matthew when saying "Mary had a little lamb," and considerably faster than Joey.

You can find how long it takes for your voice to say the selected text by using speech marks. For more information on using speech marks in Amazon Polly, see Using Speech Marks (p. 101).

To see approximately how long it takes to speak a text passage

1. Open the AWS CLI.

2. Run the following code, filling in as needed.

aws polly synthesize-speech \
    --language-code optional language code if needed \
    --output-format json \
    --voice-id [name of desired voice] \
    --text '[desired text]' \
    --speech-mark-types='["viseme"]' \
    LengthOfText.txt

3. Open LengthOfText.txt.

If the text were "Mary had a little lamb," the last few lines returned by Amazon Polly would be:

{"time":882,"type":"viseme","value":"t"}

{"time":964,"type":"viseme","value":"a"}

{"time":1082,"type":"viseme","value":"p"}

The last viseme, essentially the sound for the final letters in "lamb," starts 1082 milliseconds after the beginning of the speech. While this is not exactly the length of the audio, it's close and can serve as the basis for comparison between voices.
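Reading the final timestamp by eye works for one file, but comparing several voices is easier with a short script. The sketch below assumes the output contains one JSON object per line, as in the excerpt above. The helper name last_mark_time_ms is hypothetical, and the sample data is limited to the three lines quoted in this section.

```python
import json

# The last three speech-mark lines quoted above for "Mary had a little lamb".
SAMPLE_MARKS = """\
{"time":882,"type":"viseme","value":"t"}
{"time":964,"type":"viseme","value":"a"}
{"time":1082,"type":"viseme","value":"p"}
"""


def last_mark_time_ms(text):
    """Return the largest "time" value among JSON-lines speech marks (hypothetical helper)."""
    marks = [json.loads(line) for line in text.splitlines() if line.strip()]
    if not marks:
        raise ValueError("no speech marks found")
    return max(mark["time"] for mark in marks)


print(last_mark_time_ms(SAMPLE_MARKS))  # 1082
```

To use it on real output, the contents of LengthOfText.txt could be read into a string and passed to the same function once per voice.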

Changing Your Voice Speed

For certain applications, you may prefer that the voice you like be slowed down or sped up. If the speed of the voice is a concern, Amazon Polly provides the ability to modify it using SSML tags.

For example:

Your organization is making an application that reads books to immigrant audiences. The audience speaks English, but their fluency is limited. In this case, you might consider slowing the rate of speech to give your audience a little more time for comprehension while the application is speaking.

Amazon Polly helps you slow down the rate of speech using the SSML <prosody> tag, as in:

<speak>

In some cases, it might help your audience to <prosody rate="85%">slow the speaking rate slightly to aid in comprehension.</prosody>

</speak>

<speak>

In some cases, it might help your audience to <prosody rate="slow">slow the speaking rate slightly to aid in comprehension.</prosody>

</speak>

Two speed options are available to you when using SSML with Amazon Polly:

• Preset speeds: x-slow, slow, medium, fast, and x-fast. In these cases, the speed of each option is approximate, depending on your preferred voice. The medium option is the normal speed of the voice.

• n% of speech rate: any percentage of the speech rate between 20% and 200% can be used. In these cases, you can choose exactly the speed you want. However, the actual speed of the voice is approximate, depending on the voice you've chosen. 100% is considered to be the normal speed of the voice.

Because the speed of each option is approximate and depends on the voice you choose, we recommend that you test your selected voice at various speeds to see what exactly meets your needs.

For more information on using the prosody tag to best effect, see Controlling Volume, Speaking Rate, and Pitch (p. 118).
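The two kinds of rate value described in this section can be validated before the SSML is sent. The following is a minimal sketch: the helper name prosody_rate is hypothetical, and the accepted values are taken only from this section (the five preset names and percentages from 20 to 200).

```python
from xml.sax.saxutils import escape

# Preset names listed in this section.
PRESET_RATES = ("x-slow", "slow", "medium", "fast", "x-fast")


def prosody_rate(text, rate):
    """Wrap text in <prosody rate="..."> using a preset name or an integer percentage."""
    if isinstance(rate, str):
        if rate not in PRESET_RATES:
            raise ValueError("unknown preset rate: %r" % rate)
        value = rate
    else:
        if not 20 <= rate <= 200:
            raise ValueError("percentage must be between 20 and 200")
        value = "%d%%" % rate
    return '<prosody rate="%s">%s</prosody>' % (value, escape(text))


sentence = "slow the speaking rate slightly to aid in comprehension."
ssml_percent = "<speak>In some cases, it might help your audience to %s</speak>" % (
    prosody_rate(sentence, 85)
)
ssml_preset = "<speak>In some cases, it might help your audience to %s</speak>" % (
    prosody_rate(sentence, "slow")
)
print(ssml_percent)
print(ssml_preset)
```

Because the actual speed is approximate for every voice, a helper like this only guards against out-of-range input; listening tests are still needed to pick the final value.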

Languages Supported by Amazon Polly

The following languages are supported by Amazon Polly and can be used to synthesize speech. With each language is the language code. These language codes are W3C language identification tags (ISO 639-3 for the language name and ISO 3166 for the country code).

For in-depth tables showing the phonemes and visemes associated with each language, choose the link on each language in the table below.

Language Language Code

Arabic (p. 20) arb

Chinese, Mandarin (p. 23) cmn-CN

Danish (p. 25) da-DK

Dutch (p. 28) nl-NL

English, Australian (p. 30) en-AU

English, British (p. 35) en-GB

English, Indian (p. 37) en-IN

English, New Zealand (p. 40) en-NZ

English, South African (p. 44) en-ZA

English, US (p. 33) en-US

English, Welsh (p. 46) en-GB-WLS

French (p. 49) fr-FR

French, Canadian (p. 51) fr-CA

German (p. 53) de-DE

Hindi (p. 56) hi-IN

Icelandic (p. 58) is-IS

Italian (p. 61) it-IT

Japanese (p. 63) ja-JP

Korean (p. 65) ko-KR

Norwegian (p. 66) nb-NO

Polish (p. 69) pl-PL

Portuguese, Brazilian (p. 73) pt-BR

Portuguese, European (p. 71) pt-PT

Romanian (p. 75) ro-RO

Russian (p. 77) ru-RU

Spanish, European (p. 79) es-ES

Spanish, Mexican (p. 81) es-MX

Spanish, US (p. 83) es-US

Swedish (p. 85) sv-SE

Turkish (p. 87) tr-TR

Welsh (p. 90) cy-GB

For more information, see Phoneme and Viseme Tables for Supported Languages (p. 19).

Phoneme and Viseme Tables for Supported Languages

The following tables list the phonemes for the languages supported by Amazon Polly, along with examples and the corresponding visemes.

Topics

• Arabic (arb) (p. 20)

• Chinese, Mandarin (cmn-CN) (p. 23)

• Danish (da-DK) (p. 25)

• Dutch (nl-NL) (p. 28)

• English, Australian (en-AU) (p. 30)

• English, American (en-US) (p. 33)

• English, British (en-GB) (p. 35)

• English, Indian (en-IN) (p. 37)

• English, New Zealand (en-NZ) (p. 40)

• English, South African (en-ZA) (p. 44)

• English, Welsh (en-GB-WLS) (p. 46)

• French (fr-FR) (p. 49)

• French, Canadian (fr-CA) (p. 51)

• German (de-DE) (p. 53)

• Hindi (hi-IN) (p. 56)

• Icelandic (is-IS) (p. 58)

• Italian (it-IT) (p. 61)

• Japanese (ja-JP) (p. 63)

• Korean (ko-KR) (p. 65)

• Norwegian (nb-NO) (p. 66)

• Polish (pl-PL) (p. 69)

• Portuguese (pt-PT) (p. 71)

• Portuguese, Brazilian (pt-BR) (p. 73)

• Romanian (ro-RO) (p. 75)

• Russian (ru-RU) (p. 77)

• Spanish (es-ES) (p. 79)

• Spanish, Mexican (es-MX) (p. 81)

• Spanish, US (es-US) (p. 83)

• Swedish (sv-SE) (p. 85)

• Turkish (tr-TR) (p. 87)

• Welsh (cy-GB) (p. 90)

Arabic (arb)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the Arabic voice of Zeina that is supported by Amazon Polly.

Phoneme/Viseme Table

IPA    X-SAMPA    Description                                   Example     Viseme

Consonants
ʔ      ?          glottal stop                                  أَنا
ʕ      ?\         voiced pharyngeal fricative                   عُمَر        k
b      b          voiced bilabial plosive                       بَلَد        p
d      d          voiced alveolar plosive                       داري        t
dˤ     d_?\       emphatic voiced alveolar plosive              ضَوء        t
d͡ʒ     dZ         voiced postalveolar affricate                 جَميل       S
ð      D          voiced dental fricative                       ذلِكَ        T
ðˤ     D_?\       emphatic voiced dental fricative              ظَلام       T
f      f          voiceless labiodental fricative               فَصل        f
ɡ      g          voiced velar plosive                          إنجلترا     k
ɣ      G          voiced velar fricative                        غَرب        k
h      h          voiceless glottal fricative                   هذا         k
j      j          palatal approximant                           يَمشي       i
k      k          voiceless velar plosive                       كَلب        k
l      l          alveolar lateral approximant                  لاقى        t
lˠ     l_G        emphatic alveolar lateral approximant         الله        t
m      m          bilabial nasal                                ماذا        p
n      n          alveolar nasal                                نور         t
p      p          voiceless bilabial plosive                    حَبس        p
q      q          voiceless uvular plosive                      قَريب       k
r      r          alveolar trill                                رَمل        r
s      s          voiceless alveolar fricative                  سُؤال       s
sˤ     s_?\       emphatic voiceless alveolar fricative         صاحِب       s
ʃ      S          voiceless postalveolar fricative              شُكر        S
t      t          voiceless alveolar plosive                    تَمر        t
tˤ     t_?\       emphatic voiceless alveolar plosive           طالِب       t
θ      T          voiceless dental fricative                    ثَلاث       T
v      v          voiced labiodental fricative                  فيتامين     f
w      w          labio-velar approximant                       وَلَد        u
x      x          voiceless velar fricative                     خَوْف        k
ħ      X\         voiceless pharyngeal fricative                حَوْلَ        k
z      z          voiced alveolar fricative                     زُهور       s

Vowels
a      a          open front unrounded vowel                    بَرد        a
aː     a:         long open front unrounded vowel               دار         a
ɑˤ     A_?\       emphatic open back unrounded vowel            طَبل        a
ɑˤː    A_?\:      emphatic long open back unrounded vowel       ظالِم       a
u      u          close back rounded vowel                      شُرب        u
uː     u:         long close back rounded vowel                 سور         u
uˤ     u_?\       emphatic close back rounded vowel             بُدّ         u
uˤː    u_?\:      emphatic long close back rounded vowel        طول         u
i      i          close front unrounded vowel                   بِنت        i
iː     i:         long close front unrounded vowel              حَزين       i
iˤ     i_?\       emphatic close front unrounded vowel          ضِدّ         i
iˤː    i_?\:      emphatic long close front unrounded vowel     ماضي        i
e      e          close-mid front unrounded vowel               ماركت       e
eː     e:         long close-mid front unrounded vowel          موديل       e
ɔ      O          open-mid back rounded vowel                   تكنولوجي    O
