Developer Guide
Amazon Polly: Developer Guide
Copyright © Amazon Web Services, Inc. and/or its affiliates. All rights reserved.
Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.
What Is Amazon Polly? ... 1
Are You a First-time User of Amazon Polly? ... 2
How It Works ... 3
What's Next? ... 3
Getting Started ... 4
Step 1: Set Up an Account & User ... 4
Step 1.1: Sign up for AWS ... 4
Step 1.2: Create an IAM User ... 5
Next Step ... 5
Step 2: Getting Started (Console) ... 5
Exercise 1: Synthesizing Speech Quick Start (Console) ... 6
Exercise 2: Synthesizing Speech with Plain Text Input (Console) ... 6
Next Step ... 7
Step 3: Getting Started (AWS CLI) ... 7
Step 3.1: Set Up the AWS CLI ... 7
Step 3.2: Getting Started Exercise ... 8
Python Examples ... 10
Set Up Python and Test an Example (SDK) ... 10
Voices in Amazon Polly ... 12
Available Voices ... 12
Bilingual Voices ... 15
Accented bilingual voices ... 15
Fully bilingual voices ... 15
Listening to Voices ... 16
Voice Speed ... 16
Changing Your Voice Speed ... 17
Languages Supported by Amazon Polly ... 18
Phoneme and Viseme Tables for Supported Languages ... 19
Neural TTS ... 93
Feature and Region Compatibility ... 93
The Voice Engine ... 94
Choosing the Voice Engine (Console) ... 94
Choosing the Voice Engine (CLI) ... 95
Neural Voices ... 96
NTTS Newscaster Speaking Style ... 97
Speech Marks ... 100
Speech Mark Types ... 100
Visemes and Amazon Polly ... 100
Using Speech Marks ... 101
Requesting Speech Marks ... 101
Speech Mark Output ... 102
Speech Mark Examples ... 103
Requesting Speech Marks (Console) ... 104
Using SSML ... 106
Reserved Characters ... 106
Using SSML in the Console ... 108
Using SSML in the AWS CLI ... 109
Using SSML With the Synthesize-Speech Command ... 109
Synthesizing an SSML-enhanced Document ... 110
Using SSML for Common Amazon Polly Tasks ... 111
Supported SSML Tags ... 113
Identifying SSML-Enhanced Text ... 114
Adding a Pause ... 114
Emphasizing Words ... 115
Specifying Another Language for Specific Words ... 116
Placing a Custom Tag in Your Text ... 117
Adding a Pause Between Paragraphs ... 117
Using Phonetic Pronunciation ... 117
Controlling Volume, Speaking Rate, and Pitch ... 118
Setting a Maximum Duration for Synthesized Speech ... 120
Adding a Pause Between Sentences ... 122
Controlling How Special Types of Words Are Spoken ... 123
Pronouncing Acronyms and Abbreviations ... 125
Improving Pronunciation by Specifying Parts of Speech ... 126
Adding the Sound of Breathing ... 127
Newscaster speaking style ... 129
Adding Dynamic Range Compression ... 130
Speaking Softly ... 131
Controlling Timbre ... 132
Whispering ... 133
Managing Lexicons ... 134
Applying Multiple Lexicons ... 134
Managing Lexicons Using the Console ... 135
Uploading Lexicons Using the Console ... 135
Applying Lexicons Using the Console (Synthesize Speech) ... 136
Filtering the Lexicon List Using the Console ... 137
Downloading Lexicons Using the Console ... 137
Deleting a Lexicon Using the Console ... 138
Managing Lexicons Using the AWS CLI ... 138
PutLexicon ... 138
GetLexicon ... 143
ListLexicons ... 143
DeleteLexicon ... 144
Creating Long Audio Files ... 145
Setting Up the IAM Policy for Asynchronous Synthesis ... 145
Creating Long Audio Files (Console) ... 146
Creating Long Audio Files (CLI) ... 147
Code and Application Examples ... 150
Sample Code ... 150
Java Samples ... 150
Python Samples ... 156
Example Applications ... 161
Python Example ... 161
Java Example ... 171
iOS Example ... 175
Android Example ... 177
Amazon Polly for Windows (SAPI) ... 179
Installing and Configuring Amazon Polly for Windows (SAPI) ... 179
Create an IAM User for the AWS Client ... 179
Install the AWS CLI for Windows ... 180
Create a Profile for the AWS Client ... 180
Install the Amazon Polly for Windows Plugin ... 180
Using Amazon Polly in Applications ... 181
AWS for WordPress Plugin ... 184
Installation Prerequisites ... 184
Creating an AWS Account ... 184
Creating an IAM User ... 185
Creating a WordPress Website ... 186
Installing and Configuring the Plugin ... 186
Configuring the Plugin ... 187
Customizing WordPress ... 187
Use Audio Only and Word Only tags in your content ... 188
Adding Translated Text to Your Post ... 189
Amazon Pollycast ... 189
Positioning the Player ... 190
Storing the Audio Files ... 190
Quotas ... 192
Supported Regions ... 192
Throttling ... 192
Pronunciation Lexicons ... 192
SynthesizeSpeech API Operation ... 193
SpeechSynthesisTask API Operations ... 193
Speech Synthesis Markup Language (SSML) ... 193
Security ... 194
Data Protection ... 194
Encryption at Rest ... 195
Encryption in Transit ... 195
Internetwork Traffic Privacy ... 195
Identity and Access Management ... 195
Audience ... 195
Authenticating With Identities ... 196
Managing Access Using Policies ... 198
How Amazon Polly Works with IAM ... 199
Identity-Based Policy Examples ... 201
Amazon Polly API Permissions Reference ... 205
Logging and Monitoring ... 206
Compliance Validation ... 206
Resilience ... 207
Infrastructure Security ... 207
Security Best Practices ... 207
Logging Amazon Polly API Calls with AWS CloudTrail ... 209
Amazon Polly Information in CloudTrail ... 209
Example: Amazon Polly Log File Entries ... 210
CloudWatch Integration ... 212
Getting CloudWatch Metrics (Console) ... 212
Getting CloudWatch Metrics (CLI) ... 212
Amazon Polly Metrics ... 213
Dimensions for Amazon Polly Metrics ... 214
API Reference ... 215
Actions ... 215
DeleteLexicon ... 216
DescribeVoices ... 218
GetLexicon ... 221
GetSpeechSynthesisTask ... 223
ListLexicons ... 225
ListSpeechSynthesisTasks ... 227
PutLexicon ... 229
StartSpeechSynthesisTask ... 231
SynthesizeSpeech ... 237
Data Types ... 241
Lexicon ... 242
LexiconAttributes ... 243
LexiconDescription ... 245
SynthesisTask ... 246
Voice ... 249
Document History ... 251
AWS glossary ... 255
What Is Amazon Polly?
Amazon Polly is a cloud service that converts text into lifelike speech. You can use Amazon Polly to develop applications that increase engagement and accessibility. Amazon Polly supports multiple languages and includes a variety of lifelike voices, so you can build speech-enabled applications that work in multiple locations and use the ideal voice for your customers. With Amazon Polly, you only pay for the text you synthesize. You can also cache and replay Amazon Polly’s generated speech at no additional cost.
Additionally, Amazon Polly includes a number of Neural Text-to-Speech (NTTS) voices, delivering ground-breaking improvements in speech quality through a new machine learning approach, thereby offering to customers the most natural and human-like text-to-speech voices possible. Neural TTS technology also supports a Newscaster speaking style that is tailored to news narration use cases.
Common use cases for Amazon Polly include, but are not limited to, mobile applications such as newsreaders, games, eLearning platforms, accessibility applications for visually impaired people, and the rapidly growing segment of Internet of Things (IoT).
Amazon Polly is certified for use with regulated workloads for HIPAA (the Health Insurance Portability and Accountability Act of 1996), and Payment Card Industry Data Security Standard (PCI DSS).
Some of the benefits of using Amazon Polly include:
• High quality – Amazon Polly offers both new neural TTS and best-in-class standard TTS technology to synthesize natural-sounding speech with high pronunciation accuracy (including abbreviations, acronym expansions, date/time interpretations, and homograph disambiguation).
• Low latency – Amazon Polly ensures fast responses, which make it a viable option for low-latency use cases such as dialog systems.
• Support for a large portfolio of languages and voices – Amazon Polly supports dozens of voices across many languages, offering male and female voice options for most languages. Neural TTS currently supports three British English voices and eight US English voices. This number will continue to increase as we bring more neural voices online. US English voices Matthew and Joanna can also use the Neural Newscaster speaking style, similar to what you might hear from a professional news anchor.
• Cost-effective – Amazon Polly's pay-per-use model means there are no setup costs. You can start small and scale up as your application grows.
• Cloud-based solution – On-device TTS solutions require significant computing resources, notably CPU power, RAM, and disk space. These can result in higher development costs and higher power consumption on devices such as tablets, smart phones, and so on. In contrast, TTS conversion done in the AWS Cloud dramatically reduces local resource requirements. This enables support of all the available languages and voices at the best possible quality. Moreover, speech improvements are instantly available to all end-users and do not require additional updates for devices.
Are You a First-time User of Amazon Polly?
If you are a first-time user of Amazon Polly, we recommend that you read the following sections in the listed order:
1. How Amazon Polly Works (p. 3) – This section introduces various Amazon Polly inputs and options that you can work with in order to create an end-to-end experience.
2. Getting Started with Amazon Polly (p. 4) – In this section, you set up your account and test Amazon Polly speech synthesis.
3. Example Applications (p. 161) – This section provides additional examples that you can use to explore Amazon Polly.
How Amazon Polly Works
Amazon Polly converts input text into lifelike speech. You call one of the speech synthesis methods, provide the text that you want to synthesize, choose one of the Neural Text-to-Speech (NTTS) or Standard Text-to-Speech (TTS) voices, and specify an audio output format. Amazon Polly then synthesizes the provided text into a high-quality speech audio stream.
• Input text – Provide the text that you want to synthesize, and Amazon Polly returns an audio stream.
You can provide the input as plain text or in Speech Synthesis Markup Language (SSML) format. With SSML you can control various aspects of speech, such as pronunciation, volume, pitch, and speech rate.
For more information, see Generating Speech from SSML Documents (p. 106).
• Available voices – Amazon Polly provides a portfolio of languages and a variety of voices, including a bilingual voice (for both English and Hindi). For most languages you can choose from several voices, both male and female. When launching a speech synthesis task, you specify the voice ID, and then Amazon Polly uses this voice to convert the text to speech. Amazon Polly is not a translation service; the synthesized speech is in the same language as the text. However, if the text is in a different language than the one designated for the voice, numbers represented as digits (for example, 53, not fifty-three) are synthesized in the language of the voice and not the text. For more information, see Voices in Amazon Polly.
• Output format – Amazon Polly can deliver the synthesized speech in multiple formats. You can select the audio format that suits your needs. For example, you might request the speech in the MP3 or Ogg Vorbis format for consumption by web and mobile applications. Or, you might request the PCM output format for consumption by AWS IoT devices and telephony solutions.
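The three choices described above (input text, voice, and output format) correspond to parameters of the SynthesizeSpeech operation. The following is a minimal sketch rather than an official helper: build_synthesis_request is a hypothetical function name, and the allowed-value sets are assumptions based on the formats and engines named in this guide. It only assembles the parameter dictionary and makes no call to AWS.

```python
"""Sketch: assemble SynthesizeSpeech parameters (no AWS call is made)."""

# Assumed value sets, taken from the formats and engines this guide mentions.
OUTPUT_FORMATS = {"mp3", "ogg_vorbis", "pcm"}
ENGINES = {"standard", "neural"}


def build_synthesis_request(text, voice_id, output_format="mp3",
                            engine="standard", ssml=False):
    """Return a dict of SynthesizeSpeech parameters (hypothetical helper)."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError("unsupported output format: %s" % output_format)
    if engine not in ENGINES:
        raise ValueError("unknown engine: %s" % engine)
    return {
        "Text": text,
        # TextType tells the service whether Text is plain text or SSML.
        "TextType": "ssml" if ssml else "text",
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "Engine": engine,
    }


params = build_synthesis_request("Hello world!", "Joanna")
print(params)
# With the AWS SDK for Python, the dict could be passed as keyword arguments:
#   response = polly.synthesize_speech(**params)
```

Validating the format and engine locally is a design choice for this sketch; the service performs its own validation and remains the source of truth for accepted values.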
What's Next?
If you are new to Amazon Polly, we recommend that you read the following topics in order:
• Getting Started with Amazon Polly (p. 4)
• Example Applications (p. 161)
• Quotas in Amazon Polly (p. 192)
Getting Started with Amazon Polly
Amazon Polly provides simple API operations that you can easily integrate with your existing applications. For a list of supported operations, see Actions (p. 215). You can use either of the following options:
• AWS SDKs – When using the SDKs, your requests to Amazon Polly are automatically signed and authenticated using the credentials you provide. This is the recommended choice for building your applications.
• AWS CLI – You can use the AWS CLI to access any Amazon Polly functionality without having to write any code.
The following sections describe how to get set up and provide an introductory exercise.
Topics
• Step 1: Set Up an AWS Account and Create a User (p. 4)
• Step 2: Getting Started (Console) (p. 5)
• Step 3: Getting Started (AWS CLI) (p. 7)
• Python Examples (p. 10)
Step 1: Set Up an AWS Account and Create a User
Before you use Amazon Polly for the first time, complete the following tasks:
1. Step 1.1: Sign up for AWS (p. 4)
2. Step 1.2: Create an IAM User (p. 5)
Step 1.1: Sign up for AWS
When you sign up for Amazon Web Services (AWS), your AWS account is automatically signed up for all services in AWS, including Amazon Polly. You are charged only for the services that you use.
With Amazon Polly, you pay only for the resources you use. If you are a new AWS customer, you can get started with Amazon Polly for free. For more information, see AWS Free Usage Tier.
If you already have an AWS account, skip to the next step. If you don't have an AWS account, perform the steps in the following procedure to create one.
To create an AWS account
1. Open https://portal.aws.amazon.com/billing/signup.
2. Follow the online instructions.
Part of the sign-up procedure involves receiving a phone call and entering a verification code on the phone keypad.
Note your AWS account ID because you'll need it for the next step.
Step 1.2: Create an IAM User
Services in AWS, such as Amazon Polly, require that you provide credentials when you access them so that the service can determine whether you have permissions to access the resources owned by that service. The console requires your password. You can create access keys for your AWS account to access the AWS CLI or API. However, we don't recommend that you access AWS using the credentials for your AWS account. Instead, we recommend that you use AWS Identity and Access Management (IAM). Create an IAM user, add the user to an IAM group with administrative permissions, and then grant administrative permissions to the IAM user that you created. You can then access AWS using a special URL and that IAM user's credentials.
If you signed up for AWS, but you haven't created an IAM user for yourself, you can create one using the IAM console.
The exercises in this guide assume that you have a user (adminuser) with administrator privileges.
Follow the procedure to create adminuser in your account.
To create an administrator user and sign in to the console
1. Create an administrator user called adminuser in your AWS account. For instructions, see Creating Your First IAM User and Administrators Group in the IAM User Guide.
2. A user can sign in to the AWS Management Console using a special URL. For more information, see How Users Sign In to Your Account in the IAM User Guide.
Important
The Getting Started exercises use the adminuser credentials. For added security, when building and testing production applications, we recommend that you create a service-specific administrator user who has permissions for only the Amazon Polly actions. For an example policy that grants Amazon Polly specific permissions, see Example 1: Allow All Amazon Polly Actions (p. 203).
For more information about IAM, see the following:
• AWS Identity and Access Management (IAM)
• Getting started
• IAM User Guide
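As a rough illustration of what a service-specific policy might look like, the following sketch builds an identity-based policy document that allows only Amazon Polly actions. The polly:* action wildcard and the "*" resource are assumptions about what such a policy typically contains; the example policy referenced in the Important note above is the authoritative version.

```python
import json

# Assumption: "polly:*" matches every Amazon Polly action, and "*" applies
# the statement to all resources. Narrow both for production use.
polly_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["polly:*"],
            "Resource": "*",
        }
    ],
}

# Serialize the policy so it can be pasted into the IAM console or a file.
policy_json = json.dumps(polly_only_policy, indent=2)
print(policy_json)
```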
Next Step
Step 2: Getting Started (Console) (p. 5)
Step 2: Getting Started (Console)
The Amazon Polly console is the easiest way to get started testing and using Amazon Polly's speech synthesizing. The Amazon Polly console supports synthesizing speech from either plain text or SSML input.
Topics
• Exercise 1: Synthesizing Speech Quick Start (Console) (p. 6)
• Exercise 2: Synthesizing Speech with Plain Text Input (Console) (p. 6)
• Next Step (p. 7)
Exercise 1: Synthesizing Speech Quick Start (Console)
The Quick Start walks you through the fastest way to test the Amazon Polly speech synthesis for speech quality. When you select the Text-to-Speech tab, the text field for entering your text is pre-loaded with example text so you can quickly try out Amazon Polly.
To quickly test Amazon Polly (Console)
1. Sign in to the AWS Management Console and open the Amazon Polly console at https://console.aws.amazon.com/polly/.
2. Choose the Text-to-Speech tab.
3. Turn off SSML.
4. Under Engine, choose Standard or Neural.
5. Choose a language and AWS Region, then choose a voice. If you choose Neural for Engine, only the languages and voices that support NTTS are available. All Standard voices are disabled.
6. Choose Listen.
For more in-depth testing, see the following topics:
• Exercise 2: Synthesizing Speech with Plain Text Input (Console) (p. 6)
• Using SSML (Console) (p. 108)
• Applying Lexicons Using the Console (Synthesize Speech) (p. 136)
Exercise 2: Synthesizing Speech with Plain Text Input (Console)
The following procedure synthesizes speech using plain text input. Note how "W3C" and the date "10/3" (October 3rd) are synthesized.
To synthesize speech using plain text input (console)
1. After logging on to the Amazon Polly console, choose Try Amazon Polly, and then choose the Text-to-Speech tab.
2. Turn off SSML.
3. Type or paste this text into the input box.
He was caught up in the game.
In the middle of the 10/3/2014 W3C meeting he shouted, "Score!" quite loudly.
4. For Engine, choose Standard or Neural.
5. Choose a language and AWS Region, then choose a voice. If you choose Neural for Engine, only the languages and voices that support NTTS are available. All Standard voices are disabled.
6. To listen to the speech immediately, choose Listen.
7. To save the speech to a file, do one of the following:
a. Choose Download.
b. To change to a different file format, expand Additional settings, turn on Speech file format settings, choose the file format that you want, and then choose Download.
For more in-depth examples, see the following topics:
• Using SSML (Console) (p. 108)
Next Step
Step 3: Getting Started (AWS CLI) (p. 7)
Step 3: Getting Started (AWS CLI)
You can use the AWS Command Line Interface (AWS CLI) to perform almost all of the operations that are available in the Amazon Polly console. You can't listen to synthesized speech using the AWS CLI. Instead, you must save it to a file and then open the file in an application that can play it.
Topics
• Step 3.1: Set Up the AWS Command Line Interface (AWS CLI) (p. 7)
• Step 3.2: Getting Started Exercise Using the AWS CLI (p. 8)
Step 3.1: Set Up the AWS Command Line Interface (AWS CLI)
Follow the steps to download and configure the AWS CLI.
Important
You don't need the AWS CLI to perform the steps in this exercise. However, some of the exercises in this guide use the AWS CLI. You can skip this step and go to Step 3.2: Getting Started Exercise Using the AWS CLI (p. 8), and then set up the AWS CLI later when you need it.
To set up the AWS CLI
1. Download and configure the AWS CLI. For instructions, see the following topics in the AWS Command Line Interface User Guide:
• Getting Set Up with the AWS Command Line Interface
• Configuring the AWS Command Line Interface
2. Add a named profile for the administrator user in the AWS CLI config file. You use this profile when running the AWS CLI commands. For more information about named profiles, see Named Profiles in the AWS Command Line Interface User Guide.
[profile adminuser]
aws_access_key_id = adminuser access key ID
aws_secret_access_key = adminuser secret access key
region = aws-region
For a list of available AWS Regions and those supported by Amazon Polly, see Regions and Endpoints in the Amazon Web Services General Reference.
Note
If you're using the Region supported by Amazon Polly that you specified when you configured the AWS CLI, omit the following line from the AWS CLI code examples.
--region aws-region
3. Verify the setup by typing the following help command at the command prompt.
aws help
A list of valid AWS commands should appear in the AWS CLI window.
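To see how a named profile section is laid out, the following sketch parses config text with Python's standard configparser module and looks up the Region for a profile. The sample values are placeholders, us-east-1 is only an assumed example Region, and profile_region is a hypothetical helper name. The AWS CLI and SDKs do their own profile resolution, so this is for illustration only.

```python
import configparser

# Sample config text mirroring the profile shown above; the key values are
# placeholders, not real credentials, and the region is an assumed example.
SAMPLE_CONFIG = """
[profile adminuser]
aws_access_key_id = EXAMPLE_ACCESS_KEY_ID
aws_secret_access_key = EXAMPLE_SECRET_ACCESS_KEY
region = us-east-1
"""


def profile_region(config_text, profile_name):
    """Return the region configured for a named profile, or None."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # In the CLI config file, named profiles use a "profile <name>" section.
    section = "profile %s" % profile_name
    if not parser.has_section(section):
        return None
    return parser.get(section, "region", fallback=None)


print(profile_region(SAMPLE_CONFIG, "adminuser"))
```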
To enable Amazon Polly in the AWS CLI (optional)
If you have previously downloaded and configured the AWS CLI, Amazon Polly might not be available unless you reconfigure the AWS CLI. This procedure checks to see if this is necessary and provides instructions if Amazon Polly is not automatically available.
1. Verify the availability of Amazon Polly by typing the following help command at the AWS CLI command prompt.
aws polly help
If a description of Amazon Polly and a list of valid commands appears in the AWS CLI window, Amazon Polly is available in the AWS CLI and can be used immediately. In this case, you can skip the rest of this procedure. If this is not displayed, continue with Step 2.
2. Use one of the two following options to enable Amazon Polly:
a. Uninstall and reinstall the AWS CLI.
For instructions, see Installing the AWS Command Line Interface in the AWS Command Line Interface User Guide.
or
b. Download the file service-2.json.
At the command prompt, run the following command.
aws configure add-model --service-model file://service-2.json --service-name polly
3. Reverify the availability of Amazon Polly.
aws polly help
The description of Amazon Polly should be visible.
Next Step
Step 3.2: Getting Started Exercise Using the AWS CLI (p. 8)
Step 3.2: Getting Started Exercise Using the AWS CLI
Now you can test the speech synthesis offered by Amazon Polly. In this exercise, you call the SynthesizeSpeech operation by passing in sample text. You can save the resulting audio as a file and verify its content.
The following AWS CLI example is formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^) and use full quotation marks (") around the input text with single quotes (') for interior tags.
aws polly synthesize-speech \
    --output-format mp3 \
    --voice-id Joanna \
    --text 'Hello, my name is Joanna. I learned about the W3C on 10/3 of last year.' \
    hello.mp3
In the call to synthesize-speech, you provide sample text for the synthesis, the voice to use (by providing a voice ID, explained in the following step 3), and the output format. The command saves the resulting audio to the hello.mp3 file.
In addition to the MP3 file, the operation sends the following output to the console.
{
    "ContentType": "audio/mpeg",
    "RequestCharacters": "71"
}
2. Play the resulting hello.mp3 file to verify the synthesized speech.
3. Get the list of available voices by using the DescribeVoices operation. Run the following describe-voices AWS CLI command.
aws polly describe-voices
In response, Amazon Polly returns the list of all available voices. For each voice, the response provides the following metadata: voice ID, language code, language name, and the gender of the voice. The following is a sample response.
{
    "Voices": [
        {
            "Gender": "Female",
            "Name": "Salli",
            "LanguageName": "US English",
            "Id": "Salli",
            "LanguageCode": "en-US"
        },
        {
            "Gender": "Female",
            "Name": "Joanna",
            "LanguageName": "US English",
            "Id": "Joanna",
            "LanguageCode": "en-US"
        }
    ]
}
Optionally, you can specify the language code to find the available voices for a specific language.
Amazon Polly supports dozens of voices. The following example lists all the voices for Brazilian Portuguese.
aws polly describe-voices \
    --language-code pt-BR
For a list of language codes, see Languages Supported by Amazon Polly (p. 18). These language codes are W3C language identification tags (ISO 639 code for the language name-ISO 3166 country code). Examples include en-US (US English), en-GB (British English), and es-ES (Spanish).
You can also use the help option in the AWS CLI to get the list of language codes:
aws polly describe-voices help
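Once a DescribeVoices response has been saved, the same language filtering can also be done on the client side. The sketch below works on an abbreviated copy of the sample response shown earlier and makes no call to AWS; voice_ids_for and split_language_code are hypothetical helper names, and splitting on the first hyphen is a simplification of the language-region format described above.

```python
import json

# Abbreviated DescribeVoices-style response, copied from the sample above.
SAMPLE_RESPONSE = """
{
  "Voices": [
    {"Gender": "Female", "Name": "Salli", "LanguageName": "US English",
     "Id": "Salli", "LanguageCode": "en-US"},
    {"Gender": "Female", "Name": "Joanna", "LanguageName": "US English",
     "Id": "Joanna", "LanguageCode": "en-US"}
  ]
}
"""


def split_language_code(code):
    """Split 'en-US' into ('en', 'US'); codes without a region give None."""
    language, _, region = code.partition("-")
    return language, (region or None)


def voice_ids_for(response, language_code):
    """Return the IDs of voices whose LanguageCode matches exactly."""
    return [v["Id"] for v in response["Voices"]
            if v["LanguageCode"] == language_code]


voices = json.loads(SAMPLE_RESPONSE)
print(voice_ids_for(voices, "en-US"))
print(split_language_code("pt-BR"))
```

Codes without a region part, such as arb, come back with None as the second element.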
Python Examples
This guide provides additional examples, some of which are Python code examples that use AWS SDK for Python (Boto) to make API calls to Amazon Polly. We recommend that you set up Python and test the example code provided in the following section. For additional examples, see Example Applications (p. 161).
Set Up Python and Test an Example (SDK)
To test the Python example code, you need the AWS SDK for Python (Boto). For instructions, see AWS SDK for Python (Boto3).
To test the example Python code
The following Python code example performs the following actions:
• Uses the AWS SDK for Python (Boto) to send a SynthesizeSpeech request to Amazon Polly (by providing simple text as input).
• Accesses the resulting audio stream in the response and saves the audio to a file (speech.mp3) on your local disk.
• Plays the audio file with the default audio player for your local system.
Save the code to a file (example.py) and run it.
"""Getting Started Example for Python 2.7+/3.3+"""
from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError
from contextlib import closing
import os
import sys
import subprocess
from tempfile import gettempdir

# Create a client using the credentials and region defined in the [adminuser]
# section of the AWS credentials file (~/.aws/credentials).
session = Session(profile_name="adminuser")
polly = session.client("polly")

try:
    # Request speech synthesis
    response = polly.synthesize_speech(Text="Hello world!", OutputFormat="mp3", VoiceId="Joanna")
except (BotoCoreError, ClientError) as error:
    # The service returned an error, exit gracefully
    print(error)
    sys.exit(-1)

# Access the audio stream from the response
if "AudioStream" in response:
    # Note: Closing the stream is important because the service throttles on the
    # number of parallel connections. Here we are using contextlib.closing to
    # ensure the close method of the stream object will be called automatically
    # at the end of the with statement's scope.
    with closing(response["AudioStream"]) as stream:
        output = os.path.join(gettempdir(), "speech.mp3")
        try:
            # Open a file for writing the output as a binary stream
            with open(output, "wb") as file:
                file.write(stream.read())
        except IOError as error:
            # Could not write to file, exit gracefully
            print(error)
            sys.exit(-1)
else:
    # The response didn't contain audio data, exit gracefully
    print("Could not stream audio")
    sys.exit(-1)

# Play the audio using the platform's default player
if sys.platform == "win32":
    os.startfile(output)
else:
    # The following works on macOS and Linux. (Darwin = mac, xdg-open = linux).
    opener = "open" if sys.platform == "darwin" else "xdg-open"
    subprocess.call([opener, output])
For additional examples including an example application, see Example Applications (p. 161).
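The example above reads the whole audio stream into memory with a single read() call. For longer audio, a chunked copy keeps memory use bounded. The following sketch assumes that the audio stream behaves like a file object whose read method accepts a size argument. It uses an in-memory io.BytesIO object as a stand-in for the service response, and save_stream is a hypothetical helper name.

```python
import io
import os
from tempfile import gettempdir


def save_stream(stream, path, chunk_size=4096):
    """Copy a file-like stream to path in chunks; return bytes written."""
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                # An empty read means the stream is exhausted.
                break
            out.write(chunk)
            written += len(chunk)
    return written


# Stand-in for response["AudioStream"]; real audio bytes would come from Polly.
fake_audio = io.BytesIO(b"\x00" * 10000)
target = os.path.join(gettempdir(), "speech_chunked.bin")
print(save_stream(fake_audio, target))
```

As in the example above, the real stream should still be closed after use, for instance by wrapping it in contextlib.closing.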
Voices in Amazon Polly
Amazon Polly provides a number of different voices for you to use. To hear example voices, see the Amazon Polly product overview. To hear a specific voice speak a sample that you provide, you can use the Amazon Polly console. For instructions, see Listening to the Voices (p. 16).
Available Voices
Amazon Polly provides a variety of different voices in multiple languages for synthesizing speech from text.
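Language | Name/ID | Gender | Neural Voice | Standard Voice
Arabic (arb) | Zeina | Female | No | Yes
Chinese, Mandarin (cmn-CN) | Zhiyu | Female | No | Yes
Danish (da-DK) | Naja | Female | No | Yes
Danish (da-DK) | Mads | Male | No | Yes
Dutch (nl-NL) | Lotte | Female | No | Yes
Dutch (nl-NL) | Ruben | Male | No | Yes
English (Australian) (en-AU) | Nicole | Female | No | Yes
English (Australian) (en-AU) | Olivia | Female | Yes | No
English (Australian) (en-AU) | Russell | Male | No | Yes
English (British) (en-GB) | Amy** | Female | Yes | Yes
English (British) (en-GB) | Emma | Female | Yes | Yes
English (British) (en-GB) | Brian | Male | Yes | Yes
English (Indian) (en-IN) | Aditi* | Female | No | Yes
English (Indian) (en-IN) | Raveena | Female | No | Yes
English (New Zealand) (en-NZ) | Aria | Female | Yes | No
English (South African) (en-ZA) | Ayanda | Female | Yes | No
English (US) (en-US) | Ivy | Female (child) | Yes | Yes
English (US) (en-US) | Joanna** | Female | Yes | Yes
English (US) (en-US) | Kendra | Female | Yes | Yes
English (US) (en-US) | Kimberly | Female | Yes | Yes
English (US) (en-US) | Salli | Female | Yes | Yes
English (US) (en-US) | Joey | Male | Yes | Yes
English (US) (en-US) | Justin | Male (child) | Yes | Yes
English (US) (en-US) | Kevin | Male (child) | Yes | No
English (US) (en-US) | Matthew** | Male | Yes | Yes
English (Welsh) (en-GB-WLS) | Geraint | Male | No | Yes
French (fr-FR) | Céline/Celine | Female | No | Yes
French (fr-FR) | Léa | Female | Yes | Yes
French (fr-FR) | Mathieu | Male | No | Yes
French (Canadian) (fr-CA) | Chantal | Female | No | Yes
French (Canadian) (fr-CA) | Gabrielle | Female | Yes | No
German (de-DE) | Marlene | Female | No | Yes
German (de-DE) | Vicki | Female | Yes | Yes
German (de-DE) | Hans | Male | No | Yes
Hindi (hi-IN) | Aditi* | Female | No | Yes
Icelandic (is-IS) | Dóra/Dora | Female | No | Yes
Icelandic (is-IS) | Karl | Male | No | Yes
Italian (it-IT) | Carla | Female | No | Yes
Italian (it-IT) | Bianca | Female | Yes | Yes
Italian (it-IT) | Giorgio | Male | No | Yes
Japanese (ja-JP) | Mizuki | Female | No | Yes
Japanese (ja-JP) | Takumi | Male | Yes | Yes
Korean (ko-KR) | Seoyeon | Female | Yes | Yes
Norwegian (nb-NO) | Liv | Female | No | Yes
Polish (pl-PL) | Ewa | Female | No | Yes
Polish (pl-PL) | Maja | Female | No | Yes
Polish (pl-PL) | Jacek | Male | No | Yes
Polish (pl-PL) | Jan | Male | No | Yes
Portuguese (Brazilian) (pt-BR) | Camila | Female | Yes | Yes
Portuguese (Brazilian) (pt-BR) | Vitória/Vitoria | Female | No | Yes
Portuguese (Brazilian) (pt-BR) | Ricardo | Male | No | Yes
Portuguese (European) (pt-PT) | Inês/Ines | Female | No | Yes
Portuguese (European) (pt-PT) | Cristiano | Male | No | Yes
Romanian (ro-RO) | Carmen | Female | No | Yes
Russian (ru-RU) | Tatyana | Female | No | Yes
Russian (ru-RU) | Maxim | Male | No | Yes
Spanish (European) (es-ES) | Conchita | Female | No | Yes
Spanish (European) (es-ES) | Lucia | Female | Yes | Yes
Spanish (European) (es-ES) | Enrique | Male | No | Yes
Spanish (Mexican) (es-MX) | Mia | Female | No | Yes
Spanish (US) (es-US) | Lupe** | Female | Yes | Yes
Spanish (US) (es-US) | Penélope/Penelope | Female | No | Yes
Spanish (US) (es-US) | Miguel | Male | No | Yes
Swedish (sv-SE) | Astrid | Female | No | Yes
Turkish (tr-TR) | Filiz | Female | No | Yes
Welsh (cy-GB) | Gwyneth | Female | No | Yes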
* This voice is bilingual and can speak both English and Hindi. For more information, see Bilingual Voices (p. 15).
** These voices can be used with Newscaster speaking styles when used with the Neural format. For more information, see NTTS Newscaster Speaking Style (p. 97).
To learn more about Amazon Polly Brand Voices, see Brand Voice.
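When an application needs to choose a voice programmatically, the table above can be treated as a small lookup structure. The following sketch copies only a handful of rows from the table; the Voice tuple and the voices_for helper are hypothetical names for illustration, and for a live list you would call DescribeVoices instead.

```python
from collections import namedtuple

Voice = namedtuple("Voice", "language_code name gender neural standard")

# A few rows copied from the table above; extend as needed.
VOICES = [
    Voice("en-US", "Joanna", "Female", True, True),
    Voice("en-US", "Kevin", "Male (child)", True, False),
    Voice("en-AU", "Nicole", "Female", False, True),
    Voice("en-AU", "Olivia", "Female", True, False),
    Voice("hi-IN", "Aditi", "Female", False, True),
]


def voices_for(language_code, engine):
    """Return voice names for a language that support the given engine."""
    if engine not in ("neural", "standard"):
        raise ValueError("engine must be 'neural' or 'standard'")
    return [v.name for v in VOICES
            if v.language_code == language_code and getattr(v, engine)]


print(voices_for("en-US", "standard"))
print(voices_for("en-AU", "neural"))
```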
Bilingual Voices
Amazon Polly has two ways of producing bilingual voices:
• Accented bilingual voices (p. 15)
• Fully bilingual voices (p. 15)
Accented bilingual voices
Accented bilingual voices can be created using any Amazon Polly voice, but only when using SSML tags.
Normally, all words in the input text are spoken in the default language of the voice you're using.
For example, if you're using the voice of Joanna (who speaks US English), Amazon Polly speaks the following in the Joanna voice without a French accent:
<speak>
Why didn't she just say, 'Je ne parle pas français?'
</speak>
In this case, the words Je ne parle pas français are spoken as they would be if they were English.
However, if you use the Joanna voice with the <lang> tag, Amazon Polly speaks the sentence in the Joanna voice in American-accented French:
<speak>
Why didn't she just say, <lang xml:lang="fr-FR">'Je ne parle pas français?'</lang>.
</speak>
Because Joanna is not a native French voice, pronunciation is based on her native language, US English.
For instance, although perfect French pronunciation features a uvular trill /R/ in the word français, Joanna's US English voice pronounces this phoneme as the corresponding sound /r/.
If you use the voice of Giorgio, who speaks Italian, with the following text, Amazon Polly speaks the sentence in Giorgio's voice with an Italian pronunciation:
<speak>
Mi piace Bruce Springsteen.
</speak>
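When SSML like the examples above is generated from application data, it helps to escape reserved XML characters before wrapping a phrase in a lang tag. The following is a minimal sketch using only the Python standard library; lang_span and speak are hypothetical helper names, not part of any Amazon Polly SDK.

```python
from xml.sax.saxutils import escape, quoteattr


def lang_span(text, language_code):
    """Wrap text in an SSML <lang> element, escaping reserved characters."""
    return "<lang xml:lang=%s>%s</lang>" % (quoteattr(language_code),
                                            escape(text))


def speak(*parts):
    """Join already-escaped SSML fragments inside a <speak> root element."""
    return "<speak>%s</speak>" % "".join(parts)


ssml = speak(escape("Why didn't she just say, "),
             lang_span("'Je ne parle pas français?'", "fr-FR"))
print(ssml)
```

The resulting string could then be sent as the input text with the text type set to SSML.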
Fully bilingual voices
A fully bilingual voice like Aditi (Indian English and Hindi) can speak two languages fluently. This gives you the ability to use words and phrases from both languages in a single text using the same voice.
Currently, Aditi is the only fully bilingual voice available.
Using a Bilingual Voice (Aditi)
Aditi speaks both Indian English (en-IN) and Hindi (hi-IN) fluently. You can synthesize speech in both English and Hindi, and the voice can switch between the two languages even within the same sentence.
Hindi can be used in two different forms:
• Devanagari: "उसने कहाँ, खेल तोह अब शुरू होगा"
• Romanagari (using the Latin alphabet): "Usne kahan, khel toh ab shuru hoga"
Additionally, it's possible to mix English and Hindi of either or both forms within a single sentence:
• Devanagari + English: "This is the song कभी कभी अदिति"
• Romanagari + English: "This is the song from the movie Jaane Tu Ya Jaane Na."
• Devanagari + Romanagari + English: "This is the song कभी कभी अदिति from the movie Jaane Tu Ya Jaane Na."
Because Aditi is a bilingual voice, text in all of these cases will be read correctly, as Amazon Polly can differentiate between the languages and scripts.
Amazon Polly also supports numbers, dates, times, and currency expansion in both English (Arabic numerals) and Hindi (Devanagari numerals). By default, Arabic numerals are read in Indian English. To make Amazon Polly read them in Hindi, you must use the hi-IN language code parameter.
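The language code parameter described above can be sketched in Python as follows. The helper name aditi_request is hypothetical; the keys Text, VoiceId, OutputFormat, and LanguageCode are the SynthesizeSpeech parameter names as exposed by the AWS SDK for Python (Boto3). The snippet only builds the request and makes no call to AWS.

```python
from typing import Optional


def aditi_request(text: str, language_code: Optional[str] = None) -> dict:
    """Build SynthesizeSpeech parameters for the bilingual Aditi voice.

    Hypothetical helper. When `language_code` is omitted, the service
    default applies (Indian English for Arabic numerals, per this guide).
    """
    params = {"Text": text, "VoiceId": "Aditi", "OutputFormat": "mp3"}
    if language_code is not None:
        if language_code not in ("en-IN", "hi-IN"):
            raise ValueError("Aditi speaks only en-IN and hi-IN")
        params["LanguageCode"] = language_code
    return params


default_request = aditi_request("The total is 250 rupees.")
hindi_request = aditi_request("The total is 250 rupees.", "hi-IN")
print(hindi_request)

# With Boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   response = boto3.client("polly").synthesize_speech(**hindi_request)
```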
Listening to the Voices
You can use the Amazon Polly console to hear a sample from any of the voices available in Amazon Polly.
To listen to a voice in Amazon Polly
1. Sign in to the AWS Management Console and open the Amazon Polly console at https://console.aws.amazon.com/polly/.
2. Choose the Text-to-Speech tab.
3. For Engine, choose Standard or Neural.
4. Choose a language and a Region, then choose a voice.
5. Enter text for the voice to speak or use the default phrase, and then choose Listen.
You can choose any of the languages offered by Amazon Polly and the console will display the voices available for that language. In most cases, there will be at least one male and one female voice, often more than one of each. A few languages have only a single voice. For a complete list, see Voices in Amazon Polly (p. 12).
Note: The inventory of voices and the number of languages included is continually being updated to include additional choices. To suggest a new language or voice, feel free to provide feedback on this page. Unfortunately, we are not able to comment on plans for specific new languages before they are released.
Each voice is created using native language speakers, so there are variations from voice to voice, even within the same language. When selecting a voice for your project, you should test each of the possible voices with a passage of text to see which best suits your needs.
Voice Speed
Because of the natural variation between voices, each available voice will speak the text at slightly different speeds. For instance, with US English voices, Ivy and Joanna are slightly faster than Matthew when saying "Mary had a little lamb," and considerably faster than Joey.
You can find how long it takes for your voice to say the selected text using speech marks. For more information on using speech marks in Amazon Polly, see Using Speech Marks (p. 101).
To see approximately how long it takes to speak a text passage
1. Open the AWS CLI.
2. Run the following code, filling in as needed:
aws polly synthesize-speech \
--language-code optional language code if needed \
--output-format json \
--voice-id [name of desired voice] \
--text '[desired text]' \
--speech-mark-types='["viseme"]' \
LengthOfText.txt
3. Open LengthOfText.txt.
If the text were "Mary had a little lamb," the last few lines returned by Amazon Polly would be:
{"time":882,"type":"viseme","value":"t"}
{"time":964,"type":"viseme","value":"a"}
{"time":1082,"type":"viseme","value":"p"}
The last viseme, essentially the sound for the final letters in "lamb," starts 1082 milliseconds after the beginning of the speech. While this is not exactly the length of the audio, it's close and can serve as the basis for comparison between voices.
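As a rough sketch of reading that value programmatically, the following Python function parses speech mark output (one JSON object per line, as shown above) and returns the largest time value. The function name last_mark_time_ms is hypothetical, and the sample string simply repeats the three lines from this example.

```python
import json


def last_mark_time_ms(speech_marks: str) -> int:
    """Return the start time, in milliseconds, of the latest speech mark.

    Hypothetical helper. Expects one JSON object per line, each with a
    numeric "time" field, as in the viseme output shown in this section.
    """
    times = [json.loads(line)["time"] for line in speech_marks.splitlines() if line.strip()]
    if not times:
        raise ValueError("no speech marks found")
    return max(times)


sample = """{"time":882,"type":"viseme","value":"t"}
{"time":964,"type":"viseme","value":"a"}
{"time":1082,"type":"viseme","value":"p"}"""

print(last_mark_time_ms(sample))  # 1082
```

Running the same text through several voices and comparing the returned values gives an approximate comparison of their speaking speeds.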
Changing Your Voice Speed
For certain applications, you might prefer that the voice you've chosen speak more slowly or more quickly. If the speed of the voice is a concern, Amazon Polly provides the ability to modify it using SSML tags.
For example:
Your organization is making an application that reads books to immigrant audiences. The audience speaks English, but their fluency is limited. In this case, you might consider slowing the rate of speech to give your audience a little more time for comprehension while the application is speaking.
Amazon Polly helps you slow down the rate of speech using the SSML <prosody> tag, as in:
<speak>
In some cases, it might help your audience to <prosody rate="85%">slow the speaking rate slightly to aid in comprehension.</prosody>
</speak>
or
<speak>
In some cases, it might help your audience to <prosody rate="slow">slow the speaking rate slightly to aid in comprehension.</prosody>
</speak>
Two speed options are available to you when using SSML with Amazon Polly:
• Preset speeds: x-slow, slow, medium, fast, and x-fast. In these cases, the speed of each option is approximate, depending on your preferred voice. The medium option is the normal speed of the voice.
• n% of speech rate: any percentage of the speech rate between 20% and 200% can be used. In these cases, you can choose exactly the speed you want. However, the actual speed of the voice is approximate, depending on the voice you've chosen. 100% is considered to be the normal speed of the voice.
Because the speed of each option is approximate and depends on the voice you choose, we recommend that you test your selected voice at various speeds to see what exactly meets your needs.
For more information on using the prosody tag to best effect, see Controlling Volume, Speaking Rate, and Pitch (p. 118).
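The two speed options can be sketched as a small Python helper that accepts either a preset name or a whole-number percentage and checks it against the 20% to 200% range stated above. The name prosody_rate and the constant PRESET_RATES are hypothetical; only the <prosody rate="..."> markup and the allowed values come from this guide.

```python
from xml.sax.saxutils import escape

# Preset speeds listed in this section.
PRESET_RATES = ("x-slow", "slow", "medium", "fast", "x-fast")


def prosody_rate(text: str, rate) -> str:
    """Wrap `text` in a <prosody> tag with a preset or percentage rate.

    Hypothetical helper: `rate` is a preset name (str) or a whole-number
    percentage (int) between 20 and 200 inclusive.
    """
    if isinstance(rate, int) and not isinstance(rate, bool):
        if not 20 <= rate <= 200:
            raise ValueError("percentage rate must be between 20 and 200")
        value = f"{rate}%"
    elif rate in PRESET_RATES:
        value = rate
    else:
        raise ValueError(f"unsupported rate: {rate!r}")
    return f'<prosody rate="{value}">{escape(text)}</prosody>'


slowed = prosody_rate("slow the speaking rate slightly to aid in comprehension.", 85)
print(f"<speak>In some cases, it might help your audience to {slowed}</speak>")
```

Because the actual speed is approximate and voice dependent, a helper like this only guards against out-of-range values; it does not replace listening tests.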
Languages Supported by Amazon Polly
The following languages are supported by Amazon Polly and can be used to synthesize speech. Each language is listed with its language code. These language codes are W3C language identification tags (an ISO 639 code for the language name and an ISO 3166 code for the country).
For in-depth tables showing the phonemes and visemes associated with each language, choose the link on each language in the table below.
Language Language Code
Arabic (p. 20) arb
Chinese, Mandarin (p. 23) cmn-CN
Danish (p. 25) da-DK
Dutch (p. 28) nl-NL
English, Australian (p. 30) en-AU
English, British (p. 35) en-GB
English, Indian (p. 37) en-IN
English, New Zealand (p. 40) en-NZ
English, South African (p. 44) en-ZA
English, US (p. 33) en-US
English, Welsh (p. 46) en-GB-WLS
French (p. 49) fr-FR
French, Canadian (p. 51) fr-CA
German (p. 53) de-DE
Hindi (p. 56) hi-IN
Icelandic (p. 58) is-IS
Italian (p. 61) it-IT
Japanese (p. 63) ja-JP
Korean (p. 65) ko-KR
Norwegian (p. 66) nb-NO
Polish (p. 69) pl-PL
Portuguese, Brazilian (p. 73) pt-BR
Portuguese, European (p. 71) pt-PT
Romanian (p. 75) ro-RO
Russian (p. 77) ru-RU
Spanish, European (p. 79) es-ES
Spanish, Mexican (p. 81) es-MX
Spanish, US (p. 83) es-US
Swedish (p. 85) sv-SE
Turkish (p. 87) tr-TR
Welsh (p. 90) cy-GB
For more information, see Phoneme and Viseme Tables for Supported Languages (p. 19).
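To illustrate how the codes in the table are composed, the following Python sketch splits a code on hyphens. The function name split_language_code and the labels "region" and "variant" are assumptions made for this illustration; Amazon Polly itself treats each code as a single opaque value.

```python
def split_language_code(code: str) -> dict:
    """Split a language code such as 'en-GB-WLS' into its hyphen-separated parts.

    Hypothetical helper: 'language' is the first part, 'region' the second
    (if present), and 'variant' the third (if present).
    """
    parts = code.split("-")
    return {
        "language": parts[0],
        "region": parts[1] if len(parts) > 1 else None,
        "variant": parts[2] if len(parts) > 2 else None,
    }


for code in ("arb", "cmn-CN", "en-GB-WLS"):
    print(code, split_language_code(code))
```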
Phoneme and Viseme Tables for Supported Languages
The following tables list the phonemes for the languages supported by Amazon Polly, along with examples and the corresponding visemes.
Topics
• Arabic (arb) (p. 20)
• Chinese, Mandarin (cmn-CN) (p. 23)
• Danish (da-DK) (p. 25)
• Dutch (nl-NL) (p. 28)
• English, Australian (en-AU) (p. 30)
• English, American (en-US) (p. 33)
• English, British (en-GB) (p. 35)
• English, Indian (en-IN) (p. 37)
• English, New Zealand (en-NZ) (p. 40)
• English, South African (en-ZA) (p. 44)
• English, Welsh (en-GB-WLS) (p. 46)
• French (fr-FR) (p. 49)
• French, Canadian (fr-CA) (p. 51)
• German (de-DE) (p. 53)
• Hindi (hi-IN) (p. 56)
• Icelandic (is-IS) (p. 58)
• Italian (it-IT) (p. 61)
• Japanese (ja-JP) (p. 63)
• Korean (ko-KR) (p. 65)
• Norwegian (nb-NO) (p. 66)
• Polish (pl-PL) (p. 69)
• Portuguese (pt-PT) (p. 71)
• Portuguese, Brazilian (pt-BR) (p. 73)
• Romanian (ro-RO) (p. 75)
• Russian (ru-RU) (p. 77)
• Spanish (es-ES) (p. 79)
• Spanish, Mexican (es-MX) (p. 81)
• Spanish, US (es-US) (p. 83)
• Swedish (sv-SE) (p. 85)
• Turkish (tr-TR) (p. 87)
• Welsh (cy-GB) (p. 90)
Arabic (arb)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the Arabic voice of Zeina that is supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
ʔ ? glottal stop أَنا
ʕ ?\ voiced pharyngeal fricative عُمَر k
b b voiced bilabial plosive بَلَد p
d d voiced alveolar plosive داري t
dˤ d_?\ emphatic voiced alveolar plosive ضَوء t
d͡ʒ dZ voiced postalveolar affricate جَميل S
ð D voiced dental fricative ذلِكَ T
ðˤ D_?\ emphatic voiced dental fricative ظَلام T
f f voiceless labiodental fricative فَصل f
ɡ g voiced velar plosive إنجلترا k
ɣ G voiced velar fricative غَرب k
h h voiceless glottal fricative هذا k
j j palatal approximant يَمشي i
k k voiceless velar plosive كَلب k
l l alveolar lateral approximant لاقى t
lˠ l_G emphatic alveolar lateral approximant الله t
m m bilabial nasal ماذا p
n n alveolar nasal نور t
p p voiceless bilabial plosive حَبس p
q q voiceless uvular plosive قَريب k
r r alveolar trill رَمل r
s s voiceless alveolar fricative سُؤال s
sˤ s_?\ emphatic voiceless alveolar fricative صاحِب s
ʃ S voiceless postalveolar fricative شُكر S
t t voiceless alveolar plosive تَمر t
tˤ t_?\ emphatic voiceless alveolar plosive طالِب t
θ T voiceless dental fricative ثَلاث T
v v voiced labiodental fricative فيتامين f
w w labio-velar approximant وَلَد u
x x voiceless velar fricative خَوْف k
ħ X\ voiceless pharyngeal fricative حَوْلَ k
z z voiced alveolar fricative زُهور s
Vowels
a a open front unrounded vowel بَرد a
aː a: long open front unrounded vowel دار a
ɑˤ A_?\ emphatic open back unrounded vowel طَبل a
ɑˤː A_?\: emphatic long open back unrounded vowel ظالِم a
u u close back rounded vowel شُرب u
uː u: long close back rounded vowel سور u
uˤ u_?\ emphatic close back rounded vowel بُدّ u
uˤː u_?\: emphatic long close back rounded vowel طول u
i i close front unrounded vowel بِنت i
iː i: long close front unrounded vowel حَزين i
iˤ i_?\ emphatic close front unrounded vowel ضِدّ i
iˤː i_?\: emphatic long close front unrounded vowel ماضي i
e e close-mid front unrounded vowel ماركت e
eː e: long close-mid front unrounded vowel موديل e
ɔ O open-mid back rounded vowel تكنولوجي O
ɔː O: long open-mid back rounded vowel تلفيزيون O
Chinese, Mandarin (cmn-CN)
The following table lists the Pinyin and International Phonetic Alphabet (IPA) phonemes for the Mandarin Chinese voice that is supported by Amazon Polly. Pinyin is the international standard for Standard Chinese romanization. IPA and X-SAMPA are not commonly used but are available for English support. The IPA and X-SAMPA symbols in the table are for reference only and should not be used for Chinese transcription. Pinyin examples and the corresponding visemes are also shown.
To make Amazon Polly use phonetic pronunciation with Pinyin, use the phoneme alphabet="x-amazon-phonetic standard used" tag.
The following examples show this with each standard.
Pinyin:
<speak>
## <phoneme alphabet="x-amazon-pinyin" ph="bo2">#</phoneme>#
## <phoneme alphabet="x-amazon-pinyin" ph="bao2">#</phoneme>#
</speak>
IPA:
<speak>
## <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>#
## <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>#
</speak>
X-SAMPA:
<speak>
## <phoneme alphabet='x-sampa' ph='pI"kA:n'>pecan</phoneme>#
## <phoneme alphabet='x-sampa' ph='"pi.k{n'>pecan</phoneme>#
</speak>
Note: Amazon Polly accepts Mandarin Chinese input encoded in UTF-8 only. The GB 18030 encoding standard is not currently supported by Amazon Polly.
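As a minimal sketch of generating such tags, the following Python helper builds a <phoneme> element for one of the three alphabets shown above. The name phoneme_tag and the ALPHABETS tuple are hypothetical, and the tuple lists only the alphabets that appear in this section. Because X-SAMPA uses the double quotation mark as its primary stress symbol, the attribute value is quoted with quoteattr, which switches to single quotation marks when needed.

```python
from xml.sax.saxutils import escape, quoteattr

# Alphabets used in the examples in this section (not an exhaustive list).
ALPHABETS = ("ipa", "x-sampa", "x-amazon-pinyin")


def phoneme_tag(alphabet: str, ph: str, text: str) -> str:
    """Return a <phoneme> element that pronounces `text` as `ph`.

    Hypothetical helper: attribute values are quoted safely so that a
    stress mark such as the X-SAMPA double quote does not break the markup.
    """
    if alphabet not in ALPHABETS:
        raise ValueError(f"unsupported alphabet: {alphabet!r}")
    return f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>{escape(text)}</phoneme>"


print(phoneme_tag("x-sampa", 'pI"kA:n', "pecan"))
print(phoneme_tag("ipa", "ˈpi.kæn", "pecan"))
```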
Phoneme/Viseme Table
Pinyin IPA X-SAMPA Description Pinyin Example Viseme
Consonants
f f f voiceless labiodental fricative 发, fa1 f
h h h voiceless glottal fricative 和, he2 k
g k k voiceless velar plosive 古, gu3 k
k kʰ k_h aspirated voiceless velar plosive 苦, ku3 k
l l l alveolar lateral approximant 拉, la1 t
m m m bilabial nasal 骂, ma4 p
n n n alveolar nasal 那, na4 t
ng ŋ N velar nasal 正, zheng4 k
b p p voiceless bilabial plosive 爸, ba4 p
p pʰ p_h aspirated voiceless bilabial plosive 怕, pa4 p
s s s voiceless alveolar fricative 四, si4 s
x ɕ s\ voiceless alveolo-palatal fricative 西, xi1 J
sh ʂ s` voiceless retroflex fricative 是, shi4 S
d t t voiceless alveolar plosive 打, da3 t
t tʰ t_h aspirated voiceless alveolar plosive 他, ta1 t
zh ʈ͡ʂ t`s` voiceless retroflex affricate 之, zhi1 S
ch ʈ͡ʂʰ t`s`_h aspirated voiceless retroflex affricate 吃, chi1 S
z t͡s ts voiceless alveolar affricate 字, zi4 s
j t͡ɕ ts\ voiceless alveolo-palatal affricate 鸡, ji1 J
q t͡ɕʰ ts\_h aspirated voiceless alveolo-palatal affricate 七, qi1 J
c t͡sʰ ts_h aspirated voiceless alveolar affricate 次, ci4 s
w w w labio-velar approximant 我, wo3 u
r ʐ z` voiced retroflex fricative 日, ri4 S
"er" and "r" colored syllables
er ɚ @` r-colored mid central vowel 二, er4 @
-r r-colored syllable 馅儿, xianr4 @
Vowels
e ɤ 7 close-mid back unrounded vowel 恶, e4 e
e ə @ mid central vowel 恩, en1 @
a a a open front unrounded vowel 安, an1 a
ai aɪ aI diphthong 爱, ai4 a
ao aʊ aU diphthong 奥, ao4 a
ei eɪ eI diphthong 诶, ei4 e
e ɛ E open-mid front unrounded vowel 姐, jie3 E
i i i close front unrounded vowel 鸡, ji1 i
ou oʊ oU diphthong 欧, ou1 o
o ɔ O open-mid back rounded vowel 哦, o4 o
u u u close back rounded vowel 主, zhu3 u
yu y y close front rounded vowel 于, yu2 u
Tone marks and Additional Symbols
1 high level tone 淤, yu1
2 rising tone 鱼, yu2
3 low (falling-rising) tone 语, yu3
4 falling tone 育, yu4
0 neutral tone 的, de0
- . . syllable boundary 语音 yu3-yin1
Danish (da-DK)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the Danish voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial plosive bat p
d d voiced alveolar plosive da t
ð D voiced dental fricative mad, thriller T
f f voiceless labiodental fricative fat f
g g voiced velar plosive gat k
h h voiceless glottal fricative hat k
j j palatal approximant jo i
k k voiceless velar plosive kat k
l l alveolar lateral approximant ladt t
m m bilabial nasal mat p
n n alveolar nasal nay t
ŋ N velar nasal lang k
p p voiceless bilabial plosive pande p
r r alveolar trill thriller, story r
ʁ R voiced uvular fricative rat k
s s voiceless alveolar fricative sat s
t t voiceless alveolar plosive tal t
v v voiced labiodental fricative vat f
w w labial-velar approximant hav, weekend u
Vowels
ø 2 close-mid front rounded vowel øst o
ø: 2: long close-mid front rounded vowel øse o
ɐ 6 near-open central vowel mor a
œ 9 open-mid front rounded vowel skøn, grønt O
œ: 9: long open-mid front rounded vowel høne, gøre O
ə @ mid central vowel ane @
æː {: long near-open front unrounded vowel male a
a a open front unrounded vowel man a
æ { near-open front unrounded vowel adresse a
ɑ A open back unrounded vowel lak, tak a
ɑ: A: long open back unrounded vowel rase a
e e close-mid front unrounded vowel midt e
e: e: long close-mid front unrounded vowel mele e
ɛ E open-mid front unrounded vowel mæt E
ɛ: E: long open-mid front unrounded vowel mæle E
i i close front unrounded vowel mit i
i: i: long close front unrounded vowel mile i
o o close-mid back rounded vowel foto o
o: o: long close-mid back rounded vowel mole o
ɔ O open-mid back rounded vowel mund O
ɔ: O: long open-mid back rounded vowel måle O
ɒː Q: long open back rounded vowel morse O
u u close back rounded vowel lusk u
u: u: long close back rounded vowel mule u
ʌ V open-mid back unrounded vowel kører E
y y close front rounded vowel yt u
y: y: long close front rounded vowel hyle u
Additional Symbols
ˈ " primary stress Alabama
ˌ % secondary stress Alabama
. . syllable boundary A.la.ba.ma
Dutch (nl-NL)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the Dutch voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial plosive bak p
d d voiced alveolar plosive dak t
d͡ʒ dZ voiced postalveolar affricate manager S
f f voiceless labiodental fricative fel f
g g voiced velar plosive goal k
ɣ G voiced velar fricative hoed k
ɦ h\ voiced glottal fricative hand k
j j palatal approximant ja i
k k voiceless velar plosive kap k
l l alveolar lateral approximant land t
m m bilabial nasal met p
n n alveolar nasal net t
ŋ N velar nasal bang k
p p voiceless bilabial plosive pak p
r r alveolar trill rand r
s s voiceless alveolar fricative sein s
ʃ S voiceless postalveolar fricative show S
t t voiceless alveolar plosive tak t
v v voiced labiodental fricative vel f
ʋ v\ labiodental approximant wit f
x x voiceless velar fricative toch k
z z voiced alveolar fricative ziin s
ʒ Z voiced postalveolar fricative bagage S
Vowels
øː 2: long close-mid front rounded vowel neus o
œy 9y diphthong buit O
ə @ mid central vowel de @
a: a: long open front unrounded vowel baad a
ɑ A open back unrounded vowel bad a
e: e: long close-mid front unrounded vowel beet e
ɜː 3: long open-mid central unrounded vowel barrière E
ɛ E open-mid front unrounded vowel bed E
ɛi Ei diphthong beet E
i i close front unrounded vowel vier i
ɪ I near-close near-front unrounded vowel pit i
o: o: long close-mid back rounded vowel boot o
ɔ O open-mid back rounded vowel pot O
u u close back rounded vowel hoed u
ʌu Vu diphthong fout E
yː y: long close front rounded vowel fuut u
ʏ Y near-close near-front rounded vowel hut u
Additional Symbols
ˈ " primary stress Alabama
ˌ % secondary stress Alabama
. . syllable boundary A.la.ba.ma
English, Australian (en-AU)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the Australian English voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial plosive bed p
d d voiced alveolar plosive dig t
d͡ʒ dZ voiced postalveolar affricate jump S
ð D voiced dental fricative then T
f f voiceless labiodental fricative five f
g g voiced velar plosive game k
h h voiceless glottal fricative house k
j j palatal approximant yes i
k k voiceless velar plosive cat k
l l alveolar lateral approximant lay t
l̩ l= syllabic alveolar lateral approximant battle t
m m bilabial nasal mouse p
m̩ m= syllabic bilabial nasal anthem p
n n alveolar nasal nap t
n̩ n= syllabic alveolar nasal button t
ŋ N velar nasal thing k
p p voiceless bilabial plosive pin p
ɹ r\ alveolar approximant red r
s s voiceless alveolar fricative seem s
ʃ S voiceless postalveolar fricative ship S
t t voiceless alveolar plosive task t
t͡ʃ tS voiceless postalveolar affricate chart S
θ T voiceless dental fricative thin T
v v voiced labiodental fricative vest f
w w labial-velar approximant west u
z z voiced alveolar fricative zero s
ʒ Z voiced postalveolar fricative vision S
Vowels
ə @ mid central vowel arena @
əʊ @U diphthong goat @
æ { near-open front unrounded vowel trap a
aɪ aI diphthong price a
aʊ aU diphthong mouth a
ɑː A: long open back unrounded vowel father a
eɪ eI diphthong face e
ɜː 3: long open-mid central unrounded vowel nurse E
ɛ E open-mid front unrounded vowel dress E
ɛə E@ diphthong square E
i: i: long close front unrounded vowel fleece i
ɪ I near-close near-front unrounded vowel kit i
ɪə I@ diphthong near i
ɔː O: long open-mid back rounded vowel thought O
ɔɪ OI diphthong choice O
ɒ Q open back rounded vowel lot O
u: u: long close back rounded vowel goose u
ʊ U near-close near-back rounded vowel foot u
ʊə U@ diphthong cure u
ʌ V open-mid back unrounded vowel strut E
Additional Symbols
ˈ " primary stress Alabama
ˌ % secondary stress Alabama
. . syllable boundary A.la.ba.ma
English, American (en-US)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the American English voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial plosive bed p
d d voiced alveolar plosive dig t
d͡ʒ dZ voiced postalveolar affricate jump S
ð D voiced dental fricative then T
f f voiceless labiodental fricative five f
ɡ g voiced velar plosive game k
h h voiceless glottal fricative house k
j j palatal approximant yes i
k k voiceless velar plosive cat k
l l alveolar lateral approximant lay t
m m bilabial nasal mouse p
n n alveolar nasal nap t
ŋ N velar nasal thing k
p p voiceless bilabial plosive speak p
ɹ r\ alveolar approximant red r
s s voiceless alveolar fricative seem s
ʃ S voiceless postalveolar fricative ship S
t t voiceless alveolar plosive trap t
t͡ʃ tS voiceless postalveolar affricate chart S
θ T voiceless dental fricative thin T
v v voiced labiodental fricative vest f
w w labial-velar approximant west u
z z voiced alveolar fricative zero s
ʒ Z voiced postalveolar fricative vision S
Vowels
ə @ mid central vowel arena @
ɚ @` mid central r-colored vowel reader @
æ { near-open front unrounded vowel trap a
aɪ aI diphthong price a
aʊ aU diphthong mouth a
ɑ A long open back unrounded vowel father a
eɪ eI diphthong face e
ɝ 3` open-mid central unrounded r-colored vowel nurse E
ɛ E open-mid front unrounded vowel dress E
i i long close front unrounded vowel fleece i
ɪ I near-close near-front unrounded vowel kit i
oʊ oU diphthong goat o
ɔ O long open-mid back rounded vowel thought O
ɔɪ OI diphthong choice O
u u long close back rounded vowel goose u
ʊ U near-close near-back rounded vowel foot u
ʌ V open-mid back unrounded vowel strut E
Additional Symbols
ˈ " primary stress Alabama
ˌ % secondary stress Alabama
. . syllable boundary A.la.ba.ma
English, British (en-GB)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the British English voices that are supported by Amazon Polly.
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial plosive bed p
d d voiced alveolar plosive dig t
d͡ʒ dZ voiced postalveolar affricate jump S
ð D voiced dental fricative then T
f f voiceless labiodental fricative five f
g g voiced velar plosive game k
h h voiceless glottal fricative house k
j j palatal approximant yes i
k k voiceless velar plosive cat k
l l alveolar lateral approximant lay t
l̩ l= syllabic alveolar lateral approximant battle t
m m bilabial nasal mouse p
m̩ m= syllabic bilabial nasal anthem p
n n alveolar nasal nap t
n̩ n= syllabic alveolar nasal button t
ŋ N velar nasal thing k
p p voiceless bilabial plosive pin p
ɹ r\ alveolar approximant red r
s s voiceless alveolar fricative seem s
ʃ S voiceless postalveolar fricative ship S
t t voiceless alveolar plosive task t
t͡ʃ tS voiceless postalveolar affricate chart S
θ T voiceless dental fricative thin T
v v voiced labiodental fricative vest f
w w labial-velar approximant west u
z z voiced alveolar fricative zero s
ʒ Z voiced postalveolar fricative vision S
Vowels
ə @ mid central vowel arena @
əʊ @U diphthong goat @
æ { near-open front unrounded vowel trap a
aɪ aI diphthong price a
aʊ aU diphthong mouth a
ɑː A: long open back unrounded vowel father a
eɪ eI diphthong face e
ɜː 3: long open-mid central unrounded vowel nurse E
ɛ E open-mid front unrounded vowel dress E
ɛə E@ diphthong square E
i: i: long close front unrounded vowel fleece i
ɪ I near-close near-front unrounded vowel kit i
ɪə I@ diphthong near i
ɔː O: long open-mid back rounded vowel thought O
ɔɪ OI diphthong choice O
ɒ Q open back rounded vowel lot O
u: u: long close back rounded vowel goose u
ʊ U near-close near-back rounded vowel foot u
ʊə U@ diphthong cure u
ʌ V open-mid back unrounded vowel strut E
Additional Symbols
ˈ " primary stress Alabama
ˌ % secondary stress Alabama
. . syllable boundary A.la.ba.ma
English, Indian (en-IN)
The following table lists the International Phonetic Alphabet (IPA) phonemes, the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA) symbols, and the corresponding visemes for the Indian English voice supported by Amazon Polly.
For additional phonemes used in conjunction with Indian English, see Hindi (hi-IN) (p. 56).
Phoneme/Viseme Table
IPA X-SAMPA Description Example Viseme
Consonants
b b voiced bilabial plosive bed p
d d voiced alveolar plosive dig t
d͡ʒ dZ voiced postalveolar affricate jump S
ð D voiced dental fricative then T
f f voiceless labiodental fricative five f
g g voiced velar plosive game k
h h voiceless glottal fricative house k
j j palatal approximant yes i
k k voiceless velar plosive cat k
l l alveolar lateral approximant lay t
l̩ l= syllabic alveolar lateral approximant battle t
m m bilabial nasal mouse p
m̩ m= syllabic bilabial nasal anthem p
n n alveolar nasal nap t
n̩ n= syllabic alveolar nasal button t
ŋ N velar nasal thing k
p p voiceless bilabial plosive pin p
ɹ r\ alveolar approximant red r
s s voiceless alveolar fricative seem s
ʃ S voiceless postalveolar fricative ship S
t t voiceless alveolar plosive task t
t͡ʃ tS voiceless postalveolar affricate chart S
θ T voiceless dental fricative thin T
v v voiced labiodental fricative vest f
w w labial-velar approximant west u
z z voiced alveolar fricative zero s
ʒ Z voiced postalveolar fricative vision S
Vowels
ə @ mid central vowel arena @
əʊ @U diphthong goat @
æ { near-open front unrounded vowel trap a
aɪ aI diphthong price a
aʊ aU diphthong mouth a
ɑː A: long open back unrounded vowel father a
eɪ eI diphthong face e
ɜː 3: long open-mid central unrounded vowel nurse E
ɛ E open-mid front unrounded vowel dress E
ɛə E@ diphthong square E
i: i: long close front unrounded vowel fleece i
ɪ I near-close near-front unrounded vowel kit i
ɪə I@ diphthong near i
ɔː O: long open-mid back rounded vowel thought O
ɔɪ OI diphthong choice O
ɒ Q open back rounded vowel lot O