Amazon Polly


Developer Guide


Amazon Polly: Developer Guide

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.


What Is Amazon Polly? ... 1

Are You a First-time User of Amazon Polly? ... 2

How It Works ... 3

What's Next? ... 3

Getting Started ... 4

Step 1: Set Up an Account & User ... 4

Step 1.1: Sign up for AWS ... 4

Step 1.2: Create an IAM User ... 5

Next Step ... 5

Step 2: Getting Started (Console) ... 5

Exercise 1: Synthesizing Speech Quick Start (Console) ... 6

Exercise 2: Synthesizing Speech with Plain Text Input (Console) ... 6

Next Step ... 7

Step 3: Getting Started (AWS CLI) ... 7

Step 3.1: Set Up the AWS CLI ... 7

Step 3.2: Getting Started Exercise ... 8

Python Examples ... 10

Set Up Python and Test an Example (SDK) ... 10

Voices in Amazon Polly ... 12

Available Voices ... 12

Bilingual Voices ... 15

Accented bilingual voices ... 15

Fully bilingual voices ... 15

Listening to Voices ... 16

Voice Speed ... 16

Changing Your Voice Speed ... 17

Languages Supported by Amazon Polly ... 18

Phoneme and Viseme Tables for Supported Languages ... 19

Neural TTS ... 93

Feature and Region Compatibility ... 93

The Voice Engine ... 94

Choosing the Voice Engine (Console) ... 94

Choosing the Voice Engine (CLI) ... 95

Neural Voices ... 96

NTTS Newscaster Speaking Style ... 97

Speech Marks ... 100

Speech Mark Types ... 100

Visemes and Amazon Polly ... 100

Using Speech Marks ... 101

Requesting Speech Marks ... 101

Speech Mark Output ... 102

Speech Mark Examples ... 103

Requesting Speech Marks (Console) ... 104

Using SSML ... 106

Reserved Characters ... 106

Using SSML in the Console ... 108

Using SSML in the AWS CLI ... 109

Using SSML With the Synthesize-Speech Command ... 109

Synthesizing an SSML-enhanced Document ... 110

Using SSML for Common Amazon Polly Tasks ... 111

Supported SSML Tags ... 113

Identifying SSML-Enhanced Text ... 114

Adding a Pause ... 114

Emphasizing Words ... 115

Specifying Another Language for Specific Words ... 116

Placing a Custom Tag in Your Text ... 117

Adding a Pause Between Paragraphs ... 117

Using Phonetic Pronunciation ... 117

Controlling Volume, Speaking Rate, and Pitch ... 118

Setting a Maximum Duration for Synthesized Speech ... 120

Adding a Pause Between Sentences ... 122

Controlling How Special Types of Words Are Spoken ... 123

Pronouncing Acronyms and Abbreviations ... 125

Improving Pronunciation by Specifying Parts of Speech ... 126

Adding the Sound of Breathing ... 127

Newscaster speaking style ... 129

Adding Dynamic Range Compression ... 130

Speaking Softly ... 131

Controlling Timbre ... 132

Whispering ... 133

Managing Lexicons ... 134

Applying Multiple Lexicons ... 134

Managing Lexicons Using the Console ... 135

Uploading Lexicons Using the Console ... 135

Applying Lexicons Using the Console (Synthesize Speech) ... 136

Filtering the Lexicon List Using the Console ... 137

Downloading Lexicons Using the Console ... 137

Deleting a Lexicon Using the Console ... 138

Managing Lexicons Using the AWS CLI ... 138

PutLexicon ... 138

GetLexicon ... 143

ListLexicons ... 143

DeleteLexicon ... 144

Creating Long Audio Files ... 145

Setting Up the IAM Policy for Asynchronous Synthesis ... 145

Creating Long Audio Files (Console) ... 146

Creating Long Audio Files (CLI) ... 147

Code and Application Examples ... 150

Sample Code ... 150

Java Samples ... 150

Python Samples ... 156

Example Applications ... 161

Python Example ... 161

Java Example ... 171

iOS Example ... 175

Android Example ... 177

Amazon Polly for Windows (SAPI) ... 179

Installing and Configuring Amazon Polly for Windows (SAPI) ... 179

Create an IAM User for the AWS Client ... 179

Install the AWS CLI for Windows ... 180

Create a Profile for the AWS Client ... 180

Install the Amazon Polly for Windows Plugin ... 180

Using Amazon Polly in Applications ... 181

AWS for WordPress Plugin ... 184

Installation Prerequisites ... 184

Creating an AWS Account ... 184

Creating an IAM User ... 185

Creating a WordPress Website ... 186

Installing and Configuring the Plugin ... 186

Configuring the Plugin ... 187

Customizing WordPress ... 187

Use Audio Only and Word Only tags in your content ... 188

Adding Translated Text to Your Post ... 189

Amazon Pollycast ... 189

Positioning the Player ... 190

Storing the Audio Files ... 190

Quotas ... 192

Supported Regions ... 192

Throttling ... 192

Pronunciation Lexicons ... 192

SynthesizeSpeech API Operation ... 193

SpeechSynthesisTask API Operations ... 193

Speech Synthesis Markup Language (SSML) ... 193

Security ... 194

Data Protection ... 194

Encryption at Rest ... 195

Encryption in Transit ... 195

Internetwork Traffic Privacy ... 195

Identity and Access Management ... 195

Audience ... 195

Authenticating With Identities ... 196

Managing Access Using Policies ... 198

How Amazon Polly Works with IAM ... 199

Identity-Based Policy Examples ... 201

Amazon Polly API Permissions Reference ... 205

Logging and Monitoring ... 206

Compliance Validation ... 206

Resilience ... 207

Infrastructure Security ... 207

Security Best Practices ... 207

Logging Amazon Polly API Calls with AWS CloudTrail ... 209

Amazon Polly Information in CloudTrail ... 209

Example: Amazon Polly Log File Entries ... 210

CloudWatch Integration ... 212

Getting CloudWatch Metrics (Console) ... 212

Getting CloudWatch Metrics (CLI) ... 212

Amazon Polly Metrics ... 213

Dimensions for Amazon Polly Metrics ... 214

API Reference ... 215

Actions ... 215

DeleteLexicon ... 216

DescribeVoices ... 218

GetLexicon ... 221

GetSpeechSynthesisTask ... 223

ListLexicons ... 225

ListSpeechSynthesisTasks ... 227

PutLexicon ... 229

StartSpeechSynthesisTask ... 231

SynthesizeSpeech ... 237

Data Types ... 241

Lexicon ... 242

LexiconAttributes ... 243

LexiconDescription ... 245

SynthesisTask ... 246

Voice ... 249

Document History ... 251


AWS glossary ... 255


What Is Amazon Polly?

Amazon Polly is a cloud service that converts text into lifelike speech. You can use Amazon Polly to develop applications that increase engagement and accessibility. Amazon Polly supports multiple languages and includes a variety of lifelike voices, so you can build speech-enabled applications that work in multiple locations and use the ideal voice for your customers. With Amazon Polly, you only pay for the text you synthesize. You can also cache and replay Amazon Polly’s generated speech at no additional cost.

Amazon Polly also includes a number of Neural Text-to-Speech (NTTS) voices, which use a machine learning approach to deliver a marked improvement in speech quality and the most natural, human-like text-to-speech voices that Amazon Polly offers. Neural TTS technology also supports a Newscaster speaking style that is tailored to news narration use cases.

Common use cases for Amazon Polly include, but are not limited to, mobile applications such as newsreaders, games, eLearning platforms, accessibility applications for visually impaired people, and the rapidly growing segment of Internet of Things (IoT).

Amazon Polly is certified for use with regulated workloads for HIPAA (the Health Insurance Portability and Accountability Act of 1996) and the Payment Card Industry Data Security Standard (PCI DSS).

Some of the benefits of using Amazon Polly include:

• High quality – Amazon Polly offers both new neural TTS and best-in-class standard TTS technology to synthesize natural speech with high pronunciation accuracy (including abbreviations, acronym expansions, date/time interpretations, and homograph disambiguation).

• Low latency – Amazon Polly returns responses quickly, which makes it a viable option for low-latency use cases such as dialog systems.

• Support for a large portfolio of languages and voices – Amazon Polly supports dozens of voices across many languages, offering male and female voice options for most languages. Neural TTS currently supports three British English voices and eight US English voices. This number will continue to increase as we bring more neural voices online. US English voices Matthew and Joanna can also use the Neural Newscaster speaking style, similar to what you might hear from a professional news anchor.

• Cost-effective – Amazon Polly's pay-per-use model means there are no setup costs. You can start small and scale up as your application grows.

• Cloud-based solution – On-device TTS solutions require significant computing resources, notably CPU power, RAM, and disk space. These can result in higher development costs and higher power consumption on devices such as tablets and smartphones. In contrast, TTS conversion done in the AWS Cloud dramatically reduces local resource requirements. This enables support of all the available languages and voices at the best possible quality. Moreover, speech improvements are instantly available to all end users and do not require additional updates for devices.

Are You a First-time User of Amazon Polly?

If you are a first-time user of Amazon Polly, we recommend that you read the following sections in the listed order:

1. How Amazon Polly Works (p. 3) – This section introduces various Amazon Polly inputs and options that you can work with in order to create an end-to-end experience.

2. Getting Started with Amazon Polly (p. 4) – In this section, you set up your account and test Amazon Polly speech synthesis.

3. Example Applications (p. 161) – This section provides additional examples that you can use to explore Amazon Polly.

How Amazon Polly Works

Amazon Polly converts input text into lifelike speech. You call one of the speech synthesis methods, provide the text that you want to synthesize, choose one of the Neural Text-to-Speech (NTTS) or Standard Text-to-Speech (TTS) voices, and specify an audio output format. Amazon Polly then synthesizes the provided text into a high-quality speech audio stream.

• Input text – Provide the text that you want to synthesize, and Amazon Polly returns an audio stream.

You can provide the input as plain text or in Speech Synthesis Markup Language (SSML) format. With SSML you can control various aspects of speech, such as pronunciation, volume, pitch, and speech rate.

For more information, see Generating Speech from SSML Documents (p. 106).

• Available voices – Amazon Polly provides a portfolio of languages and a variety of voices, including a bilingual voice (for both English and Hindi). For most languages you can choose from several voices, both male and female. When launching a speech synthesis task, you specify the voice ID, and then Amazon Polly uses this voice to convert the text to speech. Amazon Polly is not a translation service; the synthesized speech is in the same language as the text. However, if the text is in a different language than the one designated for the voice, numbers represented as digits (for example, 53, not fifty-three) are synthesized in the language of the voice and not the text. For more information, see Voices in Amazon Polly.

• Output format – Amazon Polly can deliver the synthesized speech in multiple formats. You can select the audio format that suits your needs. For example, you might request the speech in the MP3 or Ogg Vorbis format for consumption by web and mobile applications. Or, you might request the PCM output format for consumption by AWS IoT devices and telephony solutions.
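As a minimal sketch of how these three choices (input text, voice, and output format) might map onto a single request, the following Python helper assembles the keyword arguments for a SynthesizeSpeech call. The helper name build_synthesize_request and its defaults are hypothetical and are not part of this guide; the parameter names (Text, TextType, VoiceId, OutputFormat, Engine) are those used by the SynthesizeSpeech operation in the AWS SDK for Python. The sketch only builds the parameters and does not contact AWS.

```python
def build_synthesize_request(text, voice_id="Joanna", output_format="mp3",
                             engine="standard", is_ssml=False):
    """Return keyword arguments for a SynthesizeSpeech call (hypothetical helper)."""
    # Audio formats named in the text above: MP3, Ogg Vorbis, and PCM.
    audio_formats = {"mp3", "ogg_vorbis", "pcm"}
    if output_format not in audio_formats:
        raise ValueError("unsupported output format: " + output_format)
    if engine not in {"standard", "neural"}:
        raise ValueError("unsupported engine: " + engine)
    return {
        "Text": text,
        # Input is either plain text or an SSML document.
        "TextType": "ssml" if is_ssml else "text",
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "Engine": engine,
    }


params = build_synthesize_request("Hello world!", output_format="ogg_vorbis")
print(params)
# With configured credentials, the request could then be sent with:
#   boto3.Session().client("polly").synthesize_speech(**params)
```

Keeping the parameter assembly separate from the network call makes it easy to validate choices such as the output format before any request is made.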

What's Next?

If you are new to Amazon Polly, we recommend that you read the following topics in order:

• Getting Started with Amazon Polly (p. 4)

• Example Applications (p. 161)

• Quotas in Amazon Polly (p. 192)


Getting Started with Amazon Polly

Amazon Polly provides simple API operations that you can easily integrate with your existing applications. For a list of supported operations, see Actions (p. 215). You can use either of the following options:

• AWS SDKs – When using the SDKs, your requests to Amazon Polly are automatically signed and authenticated using the credentials you provide. This is the recommended choice for building your applications.

• AWS CLI – You can use the AWS CLI to access any Amazon Polly functionality without having to write any code.

The following sections describe how to get set up and provide an introductory exercise.

Topics

• Step 1: Set Up an AWS Account and Create a User (p. 4)

• Step 2: Getting Started (Console) (p. 5)

• Step 3: Getting Started (AWS CLI) (p. 7)

• Python Examples (p. 10)

Step 1: Set Up an AWS Account and Create a User

Before you use Amazon Polly for the first time, complete the following tasks:

1. Step 1.1: Sign up for AWS (p. 4)

2. Step 1.2: Create an IAM User (p. 5)

Step 1.1: Sign up for AWS

When you sign up for Amazon Web Services (AWS), your AWS account is automatically signed up for all services in AWS, including Amazon Polly. You are charged only for the services that you use.

With Amazon Polly, you pay only for the resources you use. If you are a new AWS customer, you can get started with Amazon Polly for free. For more information, see AWS Free Usage Tier.

If you already have an AWS account, skip to the next step. If you don't have an AWS account, perform the steps in the following procedure to create one.

To create an AWS account

1. Open https://portal.aws.amazon.com/billing/signup.

2. Follow the online instructions.

Part of the sign-up procedure involves receiving a phone call and entering a verification code on the phone keypad.

Note your AWS account ID because you'll need it for the next step.

Step 1.2: Create an IAM User

Services in AWS, such as Amazon Polly, require that you provide credentials when you access them so that the service can determine whether you have permissions to access the resources owned by that service. The console requires your password. You can create access keys for your AWS account to access the AWS CLI or API. However, we don't recommend that you access AWS using the credentials for your AWS account. Instead, we recommend that you use AWS Identity and Access Management (IAM). Create an IAM user, add the user to an IAM group with administrative permissions, and then grant administrative permissions to the IAM user that you created. You can then access AWS using a special URL and that IAM user's credentials.

If you signed up for AWS, but you haven't created an IAM user for yourself, you can create one using the IAM console.

The exercises in this guide assume that you have a user (adminuser) with administrator privileges.

Follow the procedure to create adminuser in your account.

To create an administrator user and sign in to the console

1. Create an administrator user called adminuser in your AWS account. For instructions, see Creating Your First IAM User and Administrators Group in the IAM User Guide.

2. A user can sign in to the AWS Management Console using a special URL. For more information, see How Users Sign In to Your Account in the IAM User Guide.

Important

The Getting Started exercises use the adminuser credentials. For added security, when building and testing production applications, we recommend that you create a service-specific administrator user who has permissions for only the Amazon Polly actions. For an example policy that grants Amazon Polly specific permissions, see Example 1: Allow All Amazon Polly Actions (p. 203).
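To illustrate what a service-specific permission set could look like, the following Python sketch builds an identity-based policy document that allows only a couple of Amazon Polly actions. This is not the example policy from this guide: the function name polly_only_policy and the choice of actions are assumptions made for illustration, and the wildcard resource is a simplification that you would likely tighten in practice.

```python
import json


def polly_only_policy(actions=("polly:SynthesizeSpeech", "polly:DescribeVoices")):
    """Build a policy document limited to the given Amazon Polly actions.

    Hypothetical helper: the default action list is an illustrative assumption.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
            }
        ],
    }


policy_json = json.dumps(polly_only_policy(), indent=2)
print(policy_json)
```

The printed JSON could be pasted into the IAM console's policy editor and attached to the service-specific user.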

For more information about IAM, see the following:

• AWS Identity and Access Management (IAM)

• Getting started

• IAM User Guide

Next Step

Step 2: Getting Started (Console) (p. 5)

Step 2: Getting Started (Console)

The Amazon Polly console is the easiest way to get started testing and using Amazon Polly's speech synthesizing. The Amazon Polly console supports synthesizing speech from either plain text or SSML input.

Topics

• Exercise 1: Synthesizing Speech Quick Start (Console) (p. 6)

• Exercise 2: Synthesizing Speech with Plain Text Input (Console) (p. 6)

• Next Step (p. 7)


Exercise 1: Synthesizing Speech Quick Start (Console)

The Quick Start walks you through the fastest way to test the Amazon Polly speech synthesis for speech quality. When you select the Text-to-Speech tab, the text field for entering your text is pre-loaded with example text so you can quickly try out Amazon Polly.

To quickly test Amazon Polly (Console)

1. Sign in to the AWS Management Console and open the Amazon Polly console at https://console.aws.amazon.com/polly/.

2. Choose the Text-to-Speech tab.

3. Turn off SSML.

4. Under Engine, choose Standard or Neural.

5. Choose a language and AWS Region, then choose a voice. If you choose Neural for Engine, only the languages and voices that support NTTS are available. All Standard voices are disabled.

6. Choose Listen.

For more in-depth testing, see the following topics:

• Exercise 2: Synthesizing Speech with Plain Text Input (Console) (p. 6)

• Using SSML (Console) (p. 108)

• Applying Lexicons Using the Console (Synthesize Speech) (p. 136)

Exercise 2: Synthesizing Speech with Plain Text Input (Console)

The following procedure synthesizes speech using plain text input. Note how "W3C" and the date "10/3" (October 3rd) are synthesized.

To synthesize speech using plain text input (console)

1. After logging on to the Amazon Polly console, choose Try Amazon Polly, and then choose the Text-to-Speech tab.

2. Turn off SSML.

3. Type or paste this text into the input box.

He was caught up in the game.

In the middle of the 10/3/2014 W3C meeting he shouted, "Score!" quite loudly.

4. For Engine, choose Standard or Neural.

5. Choose a language and AWS Region, then choose a voice. If you choose Neural for Engine, only the languages and voices that support NTTS are available. All Standard voices are disabled.

6. To listen to the speech immediately, choose Listen.

7. To save the speech to a file, do one of the following:

a. Choose Download.

b. To change to a different file format, expand Additional settings, turn on Speech file format settings, choose the file format that you want, and then choose Download.

For more in-depth examples, see the following topics:


• Using SSML (Console) (p. 108)

Next Step

Step 3: Getting Started (AWS CLI) (p. 7)

Step 3: Getting Started (AWS CLI)

Using the AWS Command Line Interface (AWS CLI), you can perform almost all of the operations that you can perform in the Amazon Polly console. You can't listen to synthesized speech using the AWS CLI. Instead, you must save it to a file and then open the file in an application that can play it.

Topics

• Step 3.1: Set Up the AWS Command Line Interface (AWS CLI) (p. 7)

• Step 3.2: Getting Started Exercise Using the AWS CLI (p. 8)

Step 3.1: Set Up the AWS Command Line Interface (AWS CLI)

Follow these steps to download and configure the AWS CLI.

Important

You don't need the AWS CLI to perform the steps in this exercise. However, some of the exercises in this guide use the AWS CLI. You can skip this step and go to Step 3.2: Getting Started Exercise Using the AWS CLI (p. 8), and then set up the AWS CLI later when you need it.

To set up the AWS CLI

1. Download and configure the AWS CLI. For instructions, see the following topics in the AWS Command Line Interface User Guide:

• Getting Set Up with the AWS Command Line Interface

• Configuring the AWS Command Line Interface

2. Add a named profile for the administrator user in the AWS CLI config file. You use this profile when running the AWS CLI commands. For more information about named profiles, see Named Profiles in the AWS Command Line Interface User Guide.

[profile adminuser]
aws_access_key_id = adminuser access key ID
aws_secret_access_key = adminuser secret access key
region = aws-region

For a list of available AWS Regions and those supported by Amazon Polly, see Regions and Endpoints in the Amazon Web Services General Reference.

Note

If you're using the Region supported by Amazon Polly that you specified when you configured the AWS CLI, omit the following line from the AWS CLI code examples.

--region aws-region

3. Verify the setup by typing the following help command at the command prompt.

aws help

A list of valid AWS commands should appear in the AWS CLI window.
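As a quick sanity check of the named profile described in step 2, the following Python sketch parses config-file text and confirms that the adminuser profile defines a Region. It assumes the INI-style layout shown above; the sample key values and the us-east-1 Region are placeholders, and read_profile is a hypothetical helper, not part of the AWS CLI.

```python
import configparser

# Placeholder content in the same layout as the AWS CLI config file.
SAMPLE_CONFIG = """
[profile adminuser]
aws_access_key_id = EXAMPLEKEYID
aws_secret_access_key = EXAMPLESECRETKEY
region = us-east-1
"""


def read_profile(config_text, profile_name):
    """Return the settings of a named profile as a dict (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Named profiles in the config file use a "profile <name>" section header.
    section = "profile " + profile_name
    if not parser.has_section(section):
        raise KeyError("profile not found: " + profile_name)
    return dict(parser[section])


profile = read_profile(SAMPLE_CONFIG, "adminuser")
print(profile["region"])
```

To check a real setup, you could pass the contents of your own config file to read_profile instead of the sample text.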

To enable Amazon Polly in the AWS CLI (optional)

If you have previously downloaded and configured the AWS CLI, Amazon Polly might not be available unless you reconfigure the AWS CLI. This procedure checks to see if this is necessary and provides instructions if Amazon Polly is not automatically available.

1. Verify the availability of Amazon Polly by typing the following help command at the AWS CLI command prompt.

aws polly help

If a description of Amazon Polly and a list of valid commands appears in the AWS CLI window, Amazon Polly is available in the AWS CLI and can be used immediately. In this case, you can skip the rest of this procedure. If this is not displayed, continue with Step 2.

2. Use one of the two following options to enable Amazon Polly:

a. Uninstall and reinstall the AWS CLI.

For instructions, see Installing the AWS Command Line Interface in the AWS Command Line Interface User Guide.

or

b. Download the file service-2.json.

At the command prompt, run the following command.

aws configure add-model --service-model file://service-2.json --service-name polly

3. Reverify the availability of Amazon Polly.

aws polly help

The description of Amazon Polly should be visible.

Next Step

Step 3.2: Getting Started Exercise Using the AWS CLI (p. 8)

Step 3.2: Getting Started Exercise Using the AWS CLI

Now you can test the speech synthesis offered by Amazon Polly. In this exercise, you call the SynthesizeSpeech operation by passing in sample text. You can save the resulting audio as a file and verify its content.

1. Run the synthesize-speech AWS CLI command to synthesize the sample text to an audio file (hello.mp3).

The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full quotation marks (") around the input text with single quotes (') for interior tags.

aws polly synthesize-speech \
    --output-format mp3 \
    --voice-id Joanna \
    --text 'Hello, my name is Joanna. I learned about the W3C on 10/3 of last year.' \
    hello.mp3

In the call to synthesize-speech, you provide sample text for the synthesis, the voice to use (by providing a voice ID, explained in the following step 3), and the output format. The command saves the resulting audio to the hello.mp3 file.

In addition to the MP3 file, the operation sends the following output to the console.

{
    "ContentType": "audio/mpeg",
    "RequestCharacters": "71"
}

2. Play the resulting hello.mp3 file to verify the synthesized speech.

3. Get the list of available voices by using the DescribeVoices operation. Run the following describe-voices AWS CLI command.

aws polly describe-voices

In response, Amazon Polly returns the list of all available voices. For each voice, the response provides the following metadata: voice ID, language code, language name, and the gender of the voice. The following is a sample response.

{
    "Voices": [
        {
            "Gender": "Female",
            "Name": "Salli",
            "LanguageName": "US English",
            "Id": "Salli",
            "LanguageCode": "en-US"
        },
        {
            "Gender": "Female",
            "Name": "Joanna",
            "LanguageName": "US English",
            "Id": "Joanna",
            "LanguageCode": "en-US"
        }
    ]
}

Optionally, you can specify the language code to find the available voices for a specific language.

Amazon Polly supports dozens of voices. The following example lists all the voices for Brazilian Portuguese.

aws polly describe-voices \
    --language-code pt-BR

For a list of language codes, see Languages Supported by Amazon Polly (p. 18). These language codes are W3C language identification tags (ISO 639 code for the language name-ISO 3166 country code). Examples include en-US (US English), en-GB (British English), and es-ES (Spanish).

You can also use the help option in the AWS CLI to get the list of language codes:

aws polly describe-voices help
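The same language filter can also be applied on the client side. As a rough sketch, the following Python code loads a response shaped like the describe-voices sample above and returns the voice IDs for one language code. The function name voice_ids_for_language is hypothetical, and the sample data is limited to the two voices shown in this exercise.

```python
import json

# Response shaped like the describe-voices sample output in this exercise.
SAMPLE_RESPONSE = json.loads("""
{
    "Voices": [
        {"Gender": "Female", "Name": "Salli", "LanguageName": "US English",
         "Id": "Salli", "LanguageCode": "en-US"},
        {"Gender": "Female", "Name": "Joanna", "LanguageName": "US English",
         "Id": "Joanna", "LanguageCode": "en-US"}
    ]
}
""")


def voice_ids_for_language(response, language_code):
    """Return the IDs of all voices in the response that match a language code."""
    return [voice["Id"] for voice in response["Voices"]
            if voice["LanguageCode"] == language_code]


print(voice_ids_for_language(SAMPLE_RESPONSE, "en-US"))
print(voice_ids_for_language(SAMPLE_RESPONSE, "pt-BR"))
```

With the AWS SDK for Python, a similar filter can be requested from the service itself by passing LanguageCode to describe_voices, which avoids downloading the full list.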

Python Examples

This guide provides additional examples, some of which are Python code examples that use AWS SDK for Python (Boto) to make API calls to Amazon Polly. We recommend that you set up Python and test the example code provided in the following section. For additional examples, see Example Applications (p. 161).

Set Up Python and Test an Example (SDK)

To test the Python example code, you need the AWS SDK for Python (Boto). For instructions, see AWS SDK for Python (Boto3).

To test the example Python code

The following Python code example performs the following actions:

• Uses the AWS SDK for Python (Boto) to send a SynthesizeSpeech request to Amazon Polly (by providing simple text as input).

• Accesses the resulting audio stream in the response and saves the audio to a file (speech.mp3) on your local disk.

• Plays the audio file with the default audio player for your local system.

Save the code to a file (example.py) and run it.

"""Getting Started Example for Python 2.7+/3.3+"""
from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError
from contextlib import closing
import os
import sys
import subprocess
from tempfile import gettempdir

# Create a client using the credentials and region defined in the [adminuser]
# section of the AWS credentials file (~/.aws/credentials).
session = Session(profile_name="adminuser")
polly = session.client("polly")

try:
    # Request speech synthesis
    response = polly.synthesize_speech(Text="Hello world!", OutputFormat="mp3",
                                       VoiceId="Joanna")
except (BotoCoreError, ClientError) as error:
    # The service returned an error, exit gracefully
    print(error)
    sys.exit(-1)

# Access the audio stream from the response
if "AudioStream" in response:
    # Note: Closing the stream is important because the service throttles on the
    # number of parallel connections. Here we are using contextlib.closing to
    # ensure the close method of the stream object will be called automatically
    # at the end of the with statement's scope.
    with closing(response["AudioStream"]) as stream:
        output = os.path.join(gettempdir(), "speech.mp3")

        try:
            # Open a file for writing the output as a binary stream
            with open(output, "wb") as file:
                file.write(stream.read())
        except IOError as error:
            # Could not write to file, exit gracefully
            print(error)
            sys.exit(-1)

else:
    # The response didn't contain audio data, exit gracefully
    print("Could not stream audio")
    sys.exit(-1)

# Play the audio using the platform's default player
if sys.platform == "win32":
    os.startfile(output)
else:
    # The following works on macOS and Linux. (Darwin = mac, xdg-open = linux).
    opener = "open" if sys.platform == "darwin" else "xdg-open"
    subprocess.call([opener, output])
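The example above reads the whole audio stream into memory with a single read() call. For longer audio, one possible variation is to copy the stream in fixed-size chunks. The following sketch shows that idea with an in-memory stand-in for the audio stream so that it runs without AWS credentials. The save_stream helper and the 4096-byte chunk size are assumptions for illustration, not part of the guide's example.

```python
import io


def save_stream(stream, destination, chunk_size=4096):
    """Copy a readable binary stream to a writable one in chunks.

    Returns the number of bytes written. Hypothetical helper for illustration.
    """
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # An empty read means the stream is exhausted.
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# Stand-in for response["AudioStream"]; any object with read(size) works.
fake_audio = io.BytesIO(b"\x00" * 10000)
sink = io.BytesIO()
written = save_stream(fake_audio, sink)
print(written)
```

In the full example, the stream argument would be the AudioStream from the response and the destination would be the file opened in "wb" mode.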

For additional examples including an example application, see Example Applications (p. 161).


Voices in Amazon Polly

Amazon Polly provides a number of different voices for you to use. To hear example voices, see the Amazon Polly product overview. To hear a specific voice speak a sample that you provide, you can use the Amazon Polly console. For instructions, see Listening to the Voices (p. 16).

Available Voices

Amazon Polly provides a variety of different voices in multiple languages for synthesizing speech from text.

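Language | Name/ID | Gender | Neural Voice | Standard Voice

Arabic (arb) | Zeina | Female | No | Yes
Chinese, Mandarin (cmn-CN) | Zhiyu | Female | No | Yes
Danish (da-DK) | Naja | Female | No | Yes
Danish (da-DK) | Mads | Male | No | Yes
Dutch (nl-NL) | Lotte | Female | No | Yes
Dutch (nl-NL) | Ruben | Male | No | Yes
English (Australian) (en-AU) | Nicole | Female | No | Yes
English (Australian) (en-AU) | Olivia | Female | Yes | No
English (Australian) (en-AU) | Russell | Male | No | Yes
English (British) (en-GB) | Amy** | Female | Yes | Yes
English (British) (en-GB) | Emma | Female | Yes | Yes
English (British) (en-GB) | Brian | Male | Yes | Yes
English (Indian) (en-IN) | Aditi* | Female | No | Yes
English (Indian) (en-IN) | Raveena | Female | No | Yes
English (New Zealand) (en-NZ) | Aria | Female | Yes | No
English (South African) (en-ZA) | Ayanda | Female | Yes | No
English (US) (en-US) | Ivy | Female (child) | Yes | Yes
English (US) (en-US) | Joanna** | Female | Yes | Yes
English (US) (en-US) | Kendra | Female | Yes | Yes
English (US) (en-US) | Kimberly | Female | Yes | Yes
English (US) (en-US) | Salli | Female | Yes | Yes
English (US) (en-US) | Joey | Male | Yes | Yes
English (US) (en-US) | Justin | Male (child) | Yes | Yes
English (US) (en-US) | Kevin | Male (child) | Yes | No
English (US) (en-US) | Matthew** | Male | Yes | Yes
English (Welsh) (en-GB-WLS) | Geraint | Male | No | Yes
French (fr-FR) | Céline/Celine | Female | No | Yes
French (fr-FR) | Léa | Female | Yes | Yes
French (fr-FR) | Mathieu | Male | No | Yes
French (Canadian) (fr-CA) | Chantal | Female | No | Yes
French (Canadian) (fr-CA) | Gabrielle | Female | Yes | No
German (de-DE) | Marlene | Female | No | Yes
German (de-DE) | Vicki | Female | Yes | Yes
German (de-DE) | Hans | Male | No | Yes
Hindi (hi-IN) | Aditi* | Female | No | Yes
Icelandic (is-IS) | Dóra/Dora | Female | No | Yes
Icelandic (is-IS) | Karl | Male | No | Yes
Italian (it-IT) | Carla | Female | No | Yes
Italian (it-IT) | Bianca | Female | Yes | Yes
Italian (it-IT) | Giorgio | Male | No | Yes
Japanese (ja-JP) | Mizuki | Female | No | Yes
Japanese (ja-JP) | Takumi | Male | Yes | Yes
Korean (ko-KR) | Seoyeon | Female | Yes | Yes
Norwegian (nb-NO) | Liv | Female | No | Yes
Polish (pl-PL) | Ewa | Female | No | Yes
Polish (pl-PL) | Maja | Female | No | Yes
Polish (pl-PL) | Jacek | Male | No | Yes
Polish (pl-PL) | Jan | Male | No | Yes
Portuguese (Brazilian) (pt-BR) | Camila | Female | Yes | Yes
Portuguese (Brazilian) (pt-BR) | Vitória/Vitoria | Female | No | Yes
Portuguese (Brazilian) (pt-BR) | Ricardo | Male | No | Yes
Portuguese (European) (pt-PT) | Inês/Ines | Female | No | Yes
Portuguese (European) (pt-PT) | Cristiano | Male | No | Yes
Romanian (ro-RO) | Carmen | Female | No | Yes
Russian (ru-RU) | Tatyana | Female | No | Yes
Russian (ru-RU) | Maxim | Male | No | Yes
Spanish (European) (es-ES) | Conchita | Female | No | Yes
Spanish (European) (es-ES) | Lucia | Female | Yes | Yes
Spanish (European) (es-ES) | Enrique | Male | No | Yes
Spanish (Mexican) (es-MX) | Mia | Female | No | Yes
Spanish (US) (es-US) | Lupe** | Female | Yes | Yes
Spanish (US) (es-US) | Penélope/Penelope | Female | No | Yes
Spanish (US) (es-US) | Miguel | Male | No | Yes
Swedish (sv-SE) | Astrid | Female | No | Yes
Turkish (tr-TR) | Filiz | Female | No | Yes
Welsh (cy-GB) | Gwyneth | Female | No | Yes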

* This voice is bilingual and can speak both English and Hindi. For more information, see Bilingual Voices (p. 15).

** These voices can be used with Newscaster speaking styles when used with the Neural format. For more information, see NTTS Newscaster Speaking Style (p. 97).
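Because not every voice is available with both engines, an application may need to pick an engine per voice. The following Python sketch encodes the Neural and Standard columns for four voices from the table above and falls back to whichever engine the voice supports. The VOICE_ENGINES mapping and the pick_engine helper are hypothetical names, and only these four rows are included.

```python
# (neural supported, standard supported) for a few voices from the table above.
VOICE_ENGINES = {
    "Matthew": (True, True),
    "Kevin": (True, False),
    "Olivia": (True, False),
    "Geraint": (False, True),
}


def pick_engine(voice_id, prefer_neural=True):
    """Return "neural" or "standard" for a voice, honoring the preference when possible."""
    neural, standard = VOICE_ENGINES[voice_id]
    if prefer_neural and neural:
        return "neural"
    if standard:
        return "standard"
    # The voice has no standard variant, so neural is the only option.
    return "neural"


for name in ("Matthew", "Kevin", "Geraint"):
    print(name, pick_engine(name))
```

The returned value matches the strings accepted by the Engine parameter of SynthesizeSpeech, so it could be passed through directly.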

To learn more about Amazon Polly Brand Voices, see Brand Voice.

Bilingual Voices

Amazon Polly has two ways of producing bilingual voices:

• Accented bilingual voices (p. 15)

• Fully bilingual voices (p. 15)

Accented bilingual voices

Accented bilingual voices can be created using any Amazon Polly voice, but only when using SSML tags.

Normally, all words in the input text are spoken in the default language of the voice you're using.

For example, if you're using the voice of Joanna (who speaks US English), Amazon Polly speaks the following in the Joanna voice without a French accent:

<speak>

Why didn't she just say, 'Je ne parle pas français?'

</speak>

In this case, the words Je ne parle pas français are spoken as they would be if they were English.

However, if you use the Joanna voice with the <lang> tag, Amazon Polly speaks the sentence in the Joanna voice in American-accented French:

<speak>

Why didn't she just say, <lang xml:lang="fr-FR">'Je ne parle pas français?'</lang>.

</speak>

Because Joanna is not a native French voice, pronunciation is based on her native language, US English.

For instance, although perfect French pronunciation features a uvular trill /R/ in the word français, Joanna's US English voice pronounces this phoneme as the corresponding sound /r/.

If you use the voice of Giorgio, who speaks Italian, with the following text, Amazon Polly speaks the sentence in Giorgio's voice with an Italian pronunciation:

<speak>

Mi piace Bruce Springsteen.

</speak>
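SSML such as the <lang> example above can also be assembled programmatically. The following Python sketch wraps a phrase in a <lang> tag and escapes reserved XML characters so that the result stays well-formed. The helper names lang_span and speak are hypothetical; the tag and attribute names are the ones shown in the examples above.

```python
from xml.sax.saxutils import escape, quoteattr


def lang_span(text, language_code):
    """Wrap text in a <lang> tag for the given language code."""
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return "<lang xml:lang=%s>%s</lang>" % (quoteattr(language_code), escape(text))


def speak(*parts):
    """Join already-escaped SSML fragments inside a <speak> element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    escape("Why didn't she just say, "),
    lang_span("'Je ne parle pas français?'", "fr-FR"),
)
print(ssml)
```

The resulting string would be sent with the text type set to SSML, for example TextType="ssml" in the AWS SDK for Python.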

Fully bilingual voices

A fully bilingual voice like Aditi (Indian English and Hindi) can speak two languages fluently. This gives you the ability to use words and phrases from both languages in a single text using the same voice.

Currently, Aditi is the only fully bilingual voice available.

Using a Bilingual Voice (Aditi)

Aditi speaks both Indian English (en-IN) and Hindi (hi-IN) ﬂuently. You can synthesize speech in both English and Hindi, and the voice can switch between the two languages even within the same sentence.

Hindi can be used in two diﬀerent forms:

• Devanagari: "उसने कहाँ, खेल तोह अब शुरू होगा"

• Romanagari (using the Latin alphabet): "Usne kahan, khel toh ab shuru hoga"

Additionally, it's possible to mix English and Hindi of either or both forms within a single sentence:

• Devanagari + English: "This is the song कभी कभी अदिति"

• Romanagari + English: "This is the song from the movie Jaane Tu Ya Jaane Na."

• Devanagari + Romanagari + English: "This is the song कभी कभी अदिति from the movie Jaane Tu Ya Jaane Na."

Because Aditi is a bilingual voice, text in all of these cases will be read correctly, as Amazon Polly can diﬀerentiate between the languages and scripts.

Amazon Polly also supports numbers, dates, times, and currency expansion in both English (Arabic numerals) and Hindi (Devanagari numerals). By default, Arabic numerals are read in Indian English. To make Amazon Polly read them in Hindi, you must use the hi-IN language code parameter.
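As a rough illustration of the language code parameter, the sketch below builds the request parameters for Aditi in Python. It assumes the AWS SDK for Python (boto3) and its synthesize_speech operation. The helper names aditi_request and synthesize are hypothetical, and the actual call needs configured AWS credentials.

```python
def aditi_request(text, read_numerals_in_hindi=False, output_format="mp3"):
    """Build synthesize_speech parameters for Aditi (hypothetical helper)."""
    params = {"Text": text, "VoiceId": "Aditi", "OutputFormat": output_format}
    if read_numerals_in_hindi:
        # Without this parameter, Arabic numerals are read in Indian English.
        params["LanguageCode"] = "hi-IN"
    return params


def synthesize(params):
    """Send the request; assumes boto3 is installed and credentials are set."""
    import boto3  # imported lazily so the builder above works without it

    polly = boto3.client("polly")
    return polly.synthesize_speech(**params)["AudioStream"].read()


print(aditi_request("Usne 25 rupaye diye", read_numerals_in_hindi=True))
```

Keeping the parameter builder separate from the network call makes it easy to check which requests carry the hi-IN code before any audio is synthesized.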

Listening to the Voices

You can use the Amazon Polly console to hear a sample from any of the voices available in Amazon Polly.

To listen to a voice in Amazon Polly

1. Sign in to the AWS Management Console and open the Amazon Polly console at https://console.aws.amazon.com/polly/.

2. Choose the Text-to-Speech tab.

3. For Engine, choose Standard or Neural.

4. Choose a language and a Region, then choose a voice.

5. Enter text for the voice to speak or use the default phrase, and then choose Listen.

You can choose any of the languages offered by Amazon Polly, and the console displays the voices available for that language. In most cases, there is at least one male and one female voice, and often more than one of each. A few languages have only a single voice. For a complete list, see Voices in Amazon Polly (p. 12).

Note

The inventory of voices and the number of languages included is continually being updated to include additional choices. To suggest a new language or voice, feel free to provide feedback on this page. Unfortunately, we are not able to comment on plans for specific new languages before they are released.

Each voice is created using native language speakers, so there are variations from voice to voice, even within the same language. When selecting a voice for your project, you should test each of the possible voices with a passage of text to see which best suits your needs.

Voice Speed

Because of the natural variation between voices, each available voice will speak the text at slightly diﬀerent speeds. For instance, with US English voices, Ivy and Joanna are slightly faster than Matthew when saying "Mary had a little lamb," and considerably faster than Joey.

You can find how long it takes for your voice to say the selected text by using speech marks. For more information on using speech marks in Amazon Polly, see Using Speech Marks (p. 101).

To see approximately how long it takes to speak a text passage

1. Open the AWS CLI.

2. Run the following code, filling in as needed:

aws polly synthesize-speech \
    --language-code optional language code if needed \
    --output-format json \
    --voice-id [name of desired voice] \
    --text '[desired text]' \
    --speech-mark-types='["viseme"]' \
    LengthOfText.txt

3. Open LengthOfText.txt.

If the text were "Mary had a little lamb," the last few lines returned by Amazon Polly would be:

{"time":882,"type":"viseme","value":"t"}

{"time":964,"type":"viseme","value":"a"}

{"time":1082,"type":"viseme","value":"p"}

The last viseme, essentially the sound for the final letters in "lamb," starts 1082 milliseconds after the beginning of the speech. While this is not exactly the length of the audio, it is close and can serve as a basis for comparison between voices.
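Reading the last speech mark by hand becomes tedious when you compare many voices. The following Python sketch shows one way to extract that time from the line-delimited JSON output. The function name last_mark_time_ms is hypothetical, and the sample data is the three lines shown above.

```python
import json


def last_mark_time_ms(speech_marks_text):
    """Return the largest "time" value (in ms) among the speech marks."""
    # Each non-empty line of the output is one JSON object.
    marks = [
        json.loads(line)
        for line in speech_marks_text.splitlines()
        if line.strip()
    ]
    if not marks:
        raise ValueError("no speech marks found")
    return max(mark["time"] for mark in marks)


sample = """
{"time":882,"type":"viseme","value":"t"}
{"time":964,"type":"viseme","value":"a"}
{"time":1082,"type":"viseme","value":"p"}
"""
print(last_mark_time_ms(sample))  # 1082
```

To use it with the CLI steps above, read the contents of LengthOfText.txt and pass them to the function. The result is the start time of the last viseme, so it slightly underestimates the full audio length.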

Changing Your Voice Speed

For certain applications, you might prefer that the voice you like be slowed down or sped up. If the speed of the voice is a concern, Amazon Polly lets you modify it using SSML tags.

For example:

Your organization is making an application that reads books to immigrant audiences. The audience speaks English, but their ﬂuency is limited. In this case, you might consider slowing the rate of speech to give your audience a little more time for comprehension while the application is speaking.

Amazon Polly helps you slow down the rate of speech using the SSML <prosody> tag, as in:

<speak>

In some cases, it might help your audience to <prosody rate="85%">slow the speaking rate slightly to aid in comprehension.</prosody>

</speak>

or

<speak>

In some cases, it might help your audience to <prosody rate="slow">slow the speaking rate slightly to aid in comprehension.</prosody>

</speak>

Two speed options are available to you when using SSML with Amazon Polly:

• Preset speeds: x-slow, slow, medium, fast, and x-fast. In these cases, the speed of each option is approximate, depending on your preferred voice. The medium option is the normal speed of the voice.

• n% of speech rate: any percentage of the speech rate between 20% and 200% can be used. In these cases, you can choose exactly the speed you want. However, the actual speed of the voice is approximate, depending on the voice you've chosen. 100% is considered to be the normal speed of the voice.

Because the speed of each option is approximate and depends on the voice you choose, we recommend that you test your selected voice at various speeds to see what exactly meets your needs.
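The two kinds of rate value described above can be checked before a request is sent. The following Python sketch is an illustration only: the helper name prosody_rate is hypothetical, and the 20 to 200 range and the preset names are taken from the list above.

```python
from xml.sax.saxutils import escape

PRESET_RATES = ("x-slow", "slow", "medium", "fast", "x-fast")


def prosody_rate(text, rate):
    """Wrap text in <prosody rate="...">; rate is a preset name or a percentage."""
    if isinstance(rate, str):
        if rate not in PRESET_RATES:
            raise ValueError(f"unknown preset rate: {rate!r}")
        value = rate
    else:
        if not 20 <= rate <= 200:
            raise ValueError("percentage must be between 20 and 200")
        value = f"{rate}%"
    return f'<prosody rate="{value}">{escape(text)}</prosody>'


print(prosody_rate("slow the speaking rate slightly", 85))
print(prosody_rate("slow the speaking rate slightly", "slow"))
```

Generating several variants this way (for example 80, 85, and 90) makes it easier to test a selected voice at different speeds, as recommended above.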

For more information on using the prosody tag to best effect, see Controlling Volume, Speaking Rate, and Pitch (p. 118).

Languages Supported by Amazon Polly

The following languages are supported by Amazon Polly and can be used to synthesize speech. Each language is listed with its language code. These language codes are W3C language identification tags (an ISO 639 code for the language name and an ISO 3166 code for the country).

For in-depth tables showing the phonemes and visemes associated with each language, choose the link on each language in the table below.

Language Language Code

Arabic (p. 20) arb

Chinese, Mandarin (p. 23) cmn-CN

Danish (p. 25) da-DK

Dutch (p. 28) nl-NL

English, Australian (p. 30) en-AU

English, British (p. 35) en-GB

English, Indian (p. 37) en-IN

English, New Zealand (p. 40) en-NZ

English, South African (p. 44) en-ZA

English, US (p. 33) en-US

English, Welsh (p. 46) en-GB-WLS

French (p. 49) fr-FR

French, Canadian (p. 51) fr-CA

Hindi (p. 56) hi-IN

German (p. 53) de-DE

Icelandic (p. 58) is-IS

Italian (p. 61) it-IT

Japanese (p. 63) ja-JP

Korean (p. 65) ko-KR

Norwegian (p. 66) nb-NO

Polish (p. 69) pl-PL

Portuguese, Brazilian (p. 73) pt-BR

Portuguese, European (p. 71) pt-PT

Romanian (p. 75) ro-RO

Russian (p. 77) ru-RU

Spanish, European (p. 79) es-ES

Spanish, Mexican (p. 81) es-MX

Spanish, US (p. 83) es-US

Swedish (p. 85) sv-SE

Turkish (p. 87) tr-TR

Welsh (p. 90) cy-GB

For more information, see Phoneme and Viseme Tables for Supported Languages (p. 19).

Phoneme and Viseme Tables for Supported Languages

The following tables list the phonemes for the languages supported by Amazon Polly, along with examples and the corresponding visemes.

Topics

• Arabic (arb) (p. 20)

• Chinese, Mandarin (cmn-CN) (p. 23)

• Danish (da-DK) (p. 25)

• Dutch (nl-NL) (p. 28)

• English, Australian (en-AU) (p. 30)

• English, American (en-US) (p. 33)

• English, British (en-GB) (p. 35)

• English, Indian (en-IN) (p. 37)

• English, New Zealand (en-NZ) (p. 40)

• English, South African (en-ZA) (p. 44)

• English, Welsh (en-GB-WLS) (p. 46)

• French (fr-FR) (p. 49)

• French, Canadian (fr-CA) (p. 51)

• German (de-DE) (p. 53)

• Hindi (hi-IN) (p. 56)

• Icelandic (is-IS) (p. 58)

• Italian (it-IT) (p. 61)

• Japanese (ja-JP) (p. 63)

• Korean (ko-KR) (p. 65)

• Norwegian (nb-NO) (p. 66)

• Polish (pl-PL) (p. 69)

• Portuguese (pt-PT) (p. 71)

• Portuguese, Brazilian (pt-BR) (p. 73)

• Romanian (ro-RO) (p. 75)

• Russian (ru-RU) (p. 77)

• Spanish (es-ES) (p. 79)

• Spanish, Mexican (es-MX) (p. 81)

• Spanish, US (es-US) (p. 83)

• Swedish (sv-SE) (p. 85)

• Turkish (tr-TR) (p. 87)

• Welsh (cy-GB) (p. 90)

Arabic (arb)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the Arabic voice of Zeina that is supported by Amazon Polly.

Phoneme/Viseme Table

IPA X-SAMPA Description Example Viseme

Consonants

ʔ ? glottal stop أَنا
ʕ ?\ voiced pharyngeal fricative عُمَر k
b b voiced bilabial plosive بَلَد p
d d voiced alveolar plosive داري t
dˤ d_?\ emphatic voiced alveolar plosive ضَوء t
d͡ʒ dZ voiced postalveolar affricate جَميل S
ð D voiced dental fricative ذلِكَ T
ðˤ D_?\ emphatic voiced dental fricative ظَلام T
f f voiceless labiodental fricative فَصل f
ɡ g voiced velar plosive إنجلترا k
ɣ G voiced velar fricative غَرب k
h h voiceless glottal fricative هذا k
j j palatal approximant يَمشي i
k k voiceless velar plosive كَلب k
l l alveolar lateral approximant لاقى t
lˠ l_G emphatic alveolar lateral approximant الله t
m m bilabial nasal ماذا p
n n alveolar nasal نور t
p p voiceless bilabial plosive حَبس p
q q voiceless uvular plosive قَريب k
r r alveolar trill رَمل r
s s voiceless alveolar fricative سُؤال s
sˤ s_?\ emphatic voiceless alveolar fricative صاحِب s
ʃ S voiceless postalveolar fricative شُكر S
t t voiceless alveolar plosive تَمر t
tˤ t_?\ emphatic voiceless alveolar plosive طالِب t
θ T voiceless dental fricative ثَلاث T
v v voiced labiodental fricative فيتامين f
w w labio-velar approximant وَلَد u
x x voiceless velar fricative خَوْف k
ħ X\ voiceless pharyngeal fricative حَوْلَ k
z z voiced alveolar fricative زُهور s

Vowels

a a open front unrounded vowel بَرد a
aː a: long open front unrounded vowel دار a
ɑˤ A_?\ emphatic open back unrounded vowel طَبل a
ɑˤː A_?\: emphatic long open back unrounded vowel ظالِم a
u u close back rounded vowel شُرب u
u: u: long close back rounded vowel سور u
uˤ u_?\ emphatic close back rounded vowel بُدّ u
uˤː u_?\: emphatic long close back rounded vowel طول u
i i close front unrounded vowel بِنت i
iː i: long close front unrounded vowel حَزين i
iˤ i_?\ emphatic close front unrounded vowel ضِدّ i
iˤː i_?\: emphatic long close front unrounded vowel ماضي i
e e close-mid front unrounded vowel ماركت e
eː e: long close-mid front unrounded vowel موديل e
ɔ O open-mid back rounded vowel تكنولوجي O
ɔː O: long open-mid back rounded vowel تليفزيون O

Chinese, Mandarin (cmn-CN)

The following table lists the Pinyin and International Phonetic Alphabet (IPA) phonemes for the Mandarin Chinese voice that is supported by Amazon Polly. Pinyin is the international standard for Standard Chinese romanization. IPA and X-SAMPA are not commonly used but are available for English support. The IPA and X-SAMPA symbols in the table are for reference only and should not be used for Chinese transcription. Pinyin examples and the corresponding visemes are also shown.

To make Amazon Polly use phonetic pronunciation with Pinyin, use the phoneme alphabet="x-amazon-phonetic standard used" tag.

The following examples show this with each standard.

Pinyin:

<speak>

## <phoneme alphabet="x-amazon-pinyin" ph="bo2">#</phoneme>#

## <phoneme alphabet="x-amazon-pinyin" ph="bao2">#</phoneme>#

</speak>

IPA:

<speak>

## <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>#

## <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>#

</speak>

X-SAMPA:

<speak>

## <phoneme alphabet='x-sampa' ph='pI"kA:n'>pecan</phoneme>#

## <phoneme alphabet='x-sampa' ph='"pi.k{n'>pecan</phoneme>#

</speak>
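The three examples above differ only in the alphabet and ph attributes, so they can be generated by one helper. The following Python sketch is hypothetical (the function name phoneme is not part of any SDK). It relies on quoteattr from the standard library, which switches to single quotes when the value contains a double quote, as the X-SAMPA stress marker does.

```python
from xml.sax.saxutils import escape, quoteattr

# Alphabets that appear in the examples above.
ALPHABETS = ("ipa", "x-sampa", "x-amazon-pinyin")


def phoneme(text, ph, alphabet="ipa"):
    """Build an SSML <phoneme> tag for text with the given pronunciation."""
    if alphabet not in ALPHABETS:
        raise ValueError(f"unsupported alphabet: {alphabet!r}")
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(text)}</phoneme>"
    )


print(phoneme("pecan", 'pI"kA:n', alphabet="x-sampa"))
print(phoneme("pecan", "ˈpi.kæn"))
```

The helper does not check that ph is a valid transcription in the chosen alphabet. It only guarantees that the attribute quoting is well formed.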

Note

Amazon Polly accepts Mandarin Chinese input encoded in UTF-8 only. The GB 18030 encoding standard is not currently supported by Amazon Polly.

Pinyin IPA X-SAMPA Description Pinyin Example Viseme

Consonants

f f f voiceless labiodental fricative 发, fa1 f
h h h voiceless glottal fricative 和, he2 k
g k k voiceless velar plosive 古, gu3 k
k kʰ k_h aspirated voiceless velar plosive 苦, ku3 k
l l l alveolar lateral approximant 拉, la1 t
m m m bilabial nasal 骂, ma4 p
n n n alveolar nasal 那, na4 t
ng ŋ N velar nasal 正, zheng4 k
b p p voiceless bilabial plosive 爸, ba4 p
p pʰ p_h aspirated voiceless bilabial plosive 怕, pa4 p
s s s voiceless alveolar fricative 四, si4 s
x ɕ s\ voiceless alveolo-palatal fricative 西, xi1 J
sh ʂ s` voiceless retroflex fricative 是, shi4 S
d t t voiceless alveolar plosive 打, da3 t
t tʰ t_h aspirated voiceless alveolar plosive 他, ta1 t
zh ʈ͡ʂ t`s` voiceless retroflex affricate 之, zhi1 S
ch ʈ͡ʂʰ t`s`_h aspirated voiceless retroflex affricate 吃, chi1 S
z t͡s ts voiceless alveolar affricate 字, zi4 s
j t͡ɕ ts\ voiceless alveolo-palatal affricate 鸡, ji1 J
q t͡ɕʰ ts\_h aspirated voiceless alveolo-palatal affricate 七, qi1 J
c t͡sʰ ts_h aspirated voiceless alveolar affricate 次, ci4 s
w w w labio-velar approximant 我, wo3 u
r ʐ z` voiced retroflex fricative 日, ri4 S

"er" and "r" colored syllables

er ɚ @` r-colored mid central vowel 二, er4 @
-r r-colored syllable 馅儿, xianr4 @

Vowels

e ɤ 7 close-mid back unrounded vowel 恶, e4 e
e ə @ mid central vowel 恩, en1 @
a a a open front unrounded vowel 安, an1 a
ai aɪ aI diphthong 爱, ai4 a
ao aʊ aU diphthong 奥, ao4 a
ei eɪ e diphthong 诶, ei4 e
e ɛ E open-mid front unrounded vowel 姐, jie3 E
i i i close front unrounded vowel 鸡, ji1 i
ou oʊ oU diphthong 欧, ou1 o
o ɔ O open-mid back rounded vowel 哦, o4 o
u u u close back rounded vowel 主, zhu3 u
yu y y close front rounded vowel 于, yu2 u

Tone marks and Additional Symbols

1 high level tone 淤, yu1
2 rising tone 鱼, yu2
3 low (falling-rising) tone 语, yu3
4 falling tone 育, yu4
0 neutral tone 的, de0
- . . syllable boundary 语音, yu3-yin1
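The tone digits and the syllable boundary in the table above can be read mechanically. The following Python sketch splits a Pinyin ph value such as yu3-yin1 into syllables and names each tone. The function name parse_pinyin and the strict pattern (lowercase letters followed by one tone digit) are assumptions made for illustration. Amazon Polly may accept input that this sketch rejects.

```python
import re

# Tone digits as listed in the table above.
TONES = {
    "1": "high level",
    "2": "rising",
    "3": "low (falling-rising)",
    "4": "falling",
    "0": "neutral",
}
_SYLLABLE = re.compile(r"([a-z]+)([0-4])")


def parse_pinyin(ph):
    """Split a ph string on "-" and return (letters, tone name) pairs."""
    result = []
    for syllable in ph.split("-"):
        match = _SYLLABLE.fullmatch(syllable)
        if match is None:
            raise ValueError(f"not a numbered Pinyin syllable: {syllable!r}")
        letters, tone = match.groups()
        result.append((letters, TONES[tone]))
    return result


print(parse_pinyin("yu3-yin1"))
```

A check like this can catch a missing tone digit before the text is sent for synthesis.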

Danish (da-DK)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the Danish voices that are supported by Amazon Polly.

Consonants

b b voiced bilabial

plosive bat p

d d voiced alveolar

plosive da t

ð D voiced dental

fricative mad, thriller T

fricative fat f

g g voiced velar plosive gat k

fricative hat k

j j palatal approximant jo i

k k voiceless velar

plosive kat k

approximant ladt t

m m bilabial nasal mat p

n n alveolar nasal nay t

ŋ N velar nasal lang k

plosive pande p

r r alveolar trill thriller, story r

ʁ R voiced uvular

fricative rat k

fricative sat s

plosive tal t

fricative vat f

w w labial-velar

approximant hav, weekend u

Vowels

ø 2 close-mid front

rounded vowel øst o

ø: 2: long close-mid front

rounded vowel øse o

ɐ 6 near-open central

vowel mor a

œ 9 open-mid front

rounded vowel skøn, grønt O

œ: 9: long open-mid front

rounded vowel høne, gøre O

ə @ mid central vowel ane @

æː {: long near-open front

unrounded vowel male a

a a open front

unrounded vowel man a

æ { near-open front

unrounded vowel adresse a

ɑ A open back

unrounded vowel lak, tak a

ɑ: A: long open back

unrounded vowel rase a

e e close-mid front

unrounded vowel midt e

e: e: long close-mid front

unrounded vowel mele e

ɛ E open-mid front

unrounded vowel mæt E

ɛ: E: long open-mid front

unrounded vowel mæle E

i i close front

unrounded vowel mit i

i: i: long close front

unrounded vowel mile i

o o close-mid back

rounded vowel foto o

o: o: long close-mid back

rounded vowel mole o

ɔ O open-mid back

rounded vowel mund O

ɔ: O: long open-mid back

rounded vowel måle O

ɒː Q: long open back

rounded vowel morse O

vowel lusk u

u: u: long close back

rounded vowel mule u

ʌ V open-mid back

unrounded kører E

y y close front rounded

vowel yt u

y: y: long close front

rounded vowel hyle u

Additional Symbols

ˈ " primary stress Alabama

ˌ % secondary stress Alabama

. . syllable boundary A.la.ba.ma

Dutch (nl-NL)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the Dutch voices that are supported by Amazon Polly.

Consonants

b b voiced bilabial

plosive bak p

d d voiced alveolar

plosive dak t

aﬀricate manager S

fricative fel f

g g voiced velar plosive goal k

ɣ G voiced velar fricative hoed k

ɦ h\ voiced glottal

fricative hand k

j j palatal approximant ja i

k k voiceless velar

plosive kap k

approximant land t

m m bilabial nasal met p

n n alveolar nasal net t

ŋ N velar nasal bang k

plosive pak p

r r alveolar trill rand r

fricative sein s

ʃ S voiceless

postalveolar fricative show S

plosive tak t

fricative vel f

ʋ v\ labiodental

approximant wit f

x x voiceless velar

fricative toch k

z z voiced alveolar

fricative ziin s

ʒ Z voiced postalveolar

fricative bagage S

Vowels

øː 2: long close-mid front

rounded vowel neus o

œy 9y diphthong buit O

ə @ mid central vowel de @

a: a: long open front

unrounded vowel baad a

ɑ: A open back

unrounded vowel bad a

e: e: long close-mid front

unrounded vowel beet e

ɜː 3: long open-mid

central unrounded vowel

barrière E

ɛ E open-mid front

unrounded vowel bed E

ɛi Ei diphthong beet E

i i close front

unrounded vowel vier i

ɪ I near-close near-front

unrounded vowel pit i

o: o: long close-mid back

rounded vowel boot o

ɔ O open-mid back

rounded vowel pot O

vowel hoed u

ʌu Vu diphthong fout E

yː y: long close front

rounded vowel fuut u

ʏ Y near-close near-front

rounded vowel hut u

English, Australian (en-AU)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the Australian English voices that are supported by Amazon Polly.

Consonants

b b voiced bilabial

plosive bed p

d d voiced alveolar

plosive dig t

aﬀricate jump S

ð D voiced dental

fricative then T

fricative five f

g g voiced velar plosive game k

fricative house k

j j palatal approximant yes i

k k voiceless velar

plosive cat k

approximant lay t

l̩ l= syllabic alveolar

lateral approximant battle t

m m bilabial nasal mouse p

m̩ m= syllabic bilabial nasal anthem p

n n alveolar nasal nap t

n̩ n= syllabic alveolar

nasal button t

ŋ N velar nasal thing k

plosive pin p

ɹ r\ alveolar approximant red r

fricative seem s

ʃ S voiceless

postalveolar fricative ship S

plosive task t

t͡ʃ tS voiceless

postalveolar aﬀricate chart S

θ T voiceless dental

fricative thin T

fricative vest f

w w labial-velar

approximant west u

z z voiced alveolar

fricative zero s

fricative vision S

Vowels

ə @ mid central vowel arena @

əʊ @U diphthong goat @

æ { near open-front

unrounded vowel trap a

aɪ aI diphthong price a

aʊ aU diphthong mouth a

ɑː A: long open-back

unrounded vowel father a

eɪ eI diphthong face e

ɜː 3: long open mid-

nurse E

ɛ E open mid-front

unrounded vowel dress E

ɛə E@ diphthong square E

i: i long close front

unrounded vowel ﬂeece i

unrounded vowel kit i

ɪə I@ diphthong near i

ɔː O: long open-mid back

rounded vowel thought O

ɔɪ OI Diphthong choice O

ɒ Q open back rounded

vowel lot O

u: u: long close-back

rounded vowel goose u

ʊ U near-close near-back

rounded vowel foot u

ʊə U@ diphthong cure u

ʌ V Open-mid-back

unrounded vowel strut E

English, American (en-US)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the American English voices that are supported by Amazon Polly.

Consonants

b b voiced bilabial

plosive bed p

d d voiced alveolar

plosive dig t

ð D voiced dental

ɡ g voiced velar plosive game k

k k voiceless velar

plosive cat k

plosive speak p

ʃ S voiceless

plosive trap t

t͡ʃ tS voiceless

θ T voiceless dental

w w labial-velar

z z voiced alveolar

fricative vision S

Vowels

ə @ mid-central vowel arena @

ɚ @` mid-central r-

colored vowel reader @

ɑ A long open-back

ɝ 3` open mid-central

unrounded r-colored vowel

nurse E

ɛ E open mid-front

i i long close front

oʊ oU diphthong goat o

ɔ O long open mid-back

ɔɪ OI diphthong choice O

u u long close-back

ʌ V open-mid-back

English, British (en-GB)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the British English voices that are supported by Amazon Polly.

Consonants

b b voiced bilabial

plosive bed p

d d voiced alveolar

plosive dig t

ð D voiced dental

k k voiceless velar

plosive cat k

nasal button t

plosive pin p

ʃ S voiceless

plosive task t

t͡ʃ tS voiceless

w w labial-velar

z z voiced alveolar

fricative vision S

Vowels

ɜː 3: long open mid- central unrounded vowel

nurse E

ɛ E open mid-front

ɔː O: long open-mid back

vowel lot O

u: u: long close-back

ʊə U@ diphthong cure u

ʌ V Open-mid-back

English, Indian (en-IN)

The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the Indian English voice supported by Amazon Polly.

For additional phonemes used in conjunction with Indian English, see Hindi (hi-IN) (p. 56).

Consonants

b b voiced bilabial

plosive bed p

d d voiced alveolar

plosive dig t

ð D voiced dental

k k voiceless velar

plosive cat k

nasal nap t

plosive pin p

ʃ S voiceless

plosive task t

t͡ʃ tS voiceless

w w labial-velar

z z voiced alveolar

fricative vision S

Vowels

ɜː 3: long open mid-

nurse E

ɛ E open mid-front

ɔː O: long open-mid back

vowel lot O