API
To create a vocabulary filter (API)
• For the CreateVocabularyFilter (p. 322) API, specify the following:
a. A name for your vocabulary filter that is unique in your AWS account, for the VocabularyFilterName parameter.
b. The language code for the language of your source audio, in the LanguageCode parameter.
c. The words for your vocabulary filter, using one of the following options:
• Specify the Amazon Simple Storage Service (Amazon S3) location of the text file for the VocabularyFilterFileUri parameter using this format: s3://DOC-EXAMPLE-BUCKET1/vocabulary-filter-example.txt.
• Enter the words as an array of strings in the Words parameter, for example ["word", "banana", "potato", "chair"].
To see all of the vocabulary filters that you've created, use the ListVocabularyFilters (p. 389) API. You can then use that information with the GetVocabularyFilter (p. 362) API to retrieve the download URI for your vocabulary filter and learn more about that filter.
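As a rough illustration of the API options above, the following Python sketch assembles the CreateVocabularyFilter parameters as a dictionary and checks that exactly one source of words is given. The helper name `build_create_vocabulary_filter_request` is hypothetical, and the sketch makes no AWS call.

```python
from typing import Optional, Sequence


def build_create_vocabulary_filter_request(
    name: str,
    language_code: str,
    words: Optional[Sequence[str]] = None,
    file_uri: Optional[str] = None,
) -> dict:
    """Assemble CreateVocabularyFilter parameters (hypothetical helper, no AWS call)."""
    if not name:
        raise ValueError("VocabularyFilterName must not be empty")
    # The words come either from a text file in Amazon S3 or from a list; require exactly one.
    if (words is None) == (file_uri is None):
        raise ValueError("specify exactly one of words or file_uri")
    if words is not None and len(words) == 0:
        raise ValueError("words must contain at least one entry")

    request = {"VocabularyFilterName": name, "LanguageCode": language_code}
    if file_uri is not None:
        # The text file must be stored in Amazon S3, for example s3://DOC-EXAMPLE-BUCKET1/...
        if not file_uri.startswith("s3://"):
            raise ValueError("VocabularyFilterFileUri must be an s3:// URI")
        request["VocabularyFilterFileUri"] = file_uri
    else:
        request["Words"] = list(words)
    return request


if __name__ == "__main__":
    print(build_create_vocabulary_filter_request(
        "your-filter-name", "en-US", words=["word", "banana", "potato", "chair"]))
```

Assuming the AWS SDK for Python (Boto3) is installed and credentials are configured, you could pass the result as keyword arguments, for example `boto3.client("transcribe").create_vocabulary_filter(**request)`.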
AWS CLI
The following is an example AWS Command Line Interface (AWS CLI) request to create a vocabulary filter with a text file stored in an Amazon S3 bucket. The commands are followed by the response elements in JSON format.
aws transcribe create-vocabulary-filter \
    --vocabulary-filter-name your-filter-name \
    --language-code en-US \
    --vocabulary-filter-file-uri s3://DOC-EXAMPLE-BUCKET1/vocabulary-filter-example.txt
{
    "VocabularyFilterName": "your-filter-name",
    "LanguageCode": "en-US"
}
Next step
Step 3: Filtering transcriptions (p. 100)
Step 3: Filtering transcriptions
You can filter unwanted words from both batch and streaming transcriptions. When you start a batch transcription job or a real-time stream, specify the vocabulary filter that you want to use and the vocabulary filter method. The method determines how the filtered words appear in your transcription results. The following three methods are available for both batch and real-time streaming transcriptions:
• To replace the words caught by your vocabulary filter with three asterisks (***), use the mask method. Use this method to hide unwanted words from your audience while still indicating that they were spoken.
• To remove the words from your transcripts, use the remove method. With this method, your audience won't know that unwanted words were spoken.
• To keep unwanted words in your transcription results with a tag indicating that they are listed in your vocabulary filter, use the tag method. You can then remove the tagged words from some transcripts and leave them in others, generating transcripts for multiple audiences from a single transcription job or stream.
For information about streaming transcriptions, see Transcribing streaming audio (p. 110). For information about batch transcriptions, see How Amazon Transcribe works (p. 5).
Topics
• Filtering batch transcriptions (p. 101)
• Filtering streaming transcriptions (p. 105)
Filtering batch transcriptions
Use a vocabulary filter to filter unwanted words from a batch transcription job with either the Amazon Transcribe console or the StartTranscriptionJob (p. 403) API.
The following syntax shows the StartTranscriptionJob request parameters and their data types.
{
    "ContentRedaction": {
        "RedactionOutput": "string",
        "RedactionType": "string"
    },
    "JobExecutionSettings": {
        "AllowDeferredExecution": boolean,
        "DataAccessRoleArn": "string"
    },
    "LanguageCode": "string",
    "Media": {
        "MediaFileUri": "string"
    },
    "MediaFormat": "string",
    "MediaSampleRateHertz": number,
    "OutputBucketName": "string",
    "OutputEncryptionKMSKeyId": "string",
    "Settings": {
        "ChannelIdentification": boolean,
        "MaxAlternatives": number,
        "MaxSpeakerLabels": number,
        "ShowAlternatives": boolean,
        "ShowSpeakerLabels": boolean,
        "VocabularyFilterMethod": "string",
        "VocabularyFilterName": "string",
        "VocabularyName": "string"
    },
    "TranscriptionJobName": "string"
}
Console
To use the console to start a batch transcription job with vocabulary filtering, you must have created a vocabulary filter, as described in Step 2: Creating a vocabulary filter (p. 99).
To filter unwanted words in a transcription job (console)
1. Sign in to the Amazon Transcribe console.
2. In the navigation pane, choose Transcription jobs.
3. Choose Create job.
4. For Name, specify a name that is unique within your AWS account for your batch transcription job.
5. For Language, choose the language spoken in your media file.
6. Specify the location of your audio or video file in Amazon S3 in one of the following ways:
• For Input file location on S3 under Input data, enter the Amazon S3 URI of the media file that you want to transcribe.
• Choose Browse S3 under Input data, then browse to the media file and choose it.
7. Choose Next.
8. Enable Vocabulary filtering under Content removal.
9. Choose your vocabulary filter and vocabulary filtering method under Filter selection.
10. Choose Create.
API
To filter a batch transcription (API)
• For the StartTranscriptionJob (p. 403) API, specify the following:
a. For TranscriptionJobName, specify a name unique to your AWS account.
b. For LanguageCode, specify the language code that corresponds to the language spoken in your media file and the language of your vocabulary filter.
c. For the MediaFileUri parameter of the Media object, specify the Amazon S3 location of the audio file or video file that you want to transcribe.
d. For the VocabularyFilterName parameter, specify the name of your vocabulary filter.
e. For the VocabularyFilterMethod parameter, choose one of the following options:
• To mask filtered words by replacing them with three asterisks (***), specify mask. Filtering the word "lazy" from the sentence "The quick brown fox jumps over the lazy dog." with the mask method shows "The quick brown fox jumps over the *** dog." in the transcription.
• To remove the filtered words from the transcript, specify remove. Filtering the word "lazy" from the sentence "The quick brown fox jumps over the lazy dog." with the remove method shows "The quick brown fox jumps over the dog." in the transcript.
• To mark words that match your vocabulary filter in the transcript, specify tag. Use this method to produce transcripts that are tailored to different audiences. This enables you to:
• Present the transcription results to one audience without removing any of the marked words that match the vocabulary filter.
• Remove every word that matched the vocabulary filter, and present those results to a different audience.
• Remove some of the words that matched the vocabulary filter, and present those results to another audience.
You can generate multiple transcriptions for many audiences from the same transcription output. For more information, see Tailoring transcripts to different audiences with tagging (p. 103).
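A minimal Python sketch of the StartTranscriptionJob parameters from this procedure follows. VocabularyFilterName and VocabularyFilterMethod go inside the Settings object, as in the parameter syntax at the start of this section. The helper `build_start_transcription_job_request` is hypothetical, covers only the parameters listed in the procedure, and makes no AWS call.

```python
# Methods accepted by VocabularyFilterMethod, as listed in this procedure.
FILTER_METHODS = {"mask", "remove", "tag"}


def build_start_transcription_job_request(
    job_name: str,
    language_code: str,
    media_file_uri: str,
    filter_name: str,
    filter_method: str,
) -> dict:
    """Assemble StartTranscriptionJob parameters with vocabulary filtering (hypothetical helper)."""
    if filter_method not in FILTER_METHODS:
        raise ValueError(f"VocabularyFilterMethod must be one of {sorted(FILTER_METHODS)}")
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        # The media file location is nested in the Media object.
        "Media": {"MediaFileUri": media_file_uri},
        # The filter name and method are nested in the Settings object.
        "Settings": {
            "VocabularyFilterName": filter_name,
            "VocabularyFilterMethod": filter_method,
        },
    }
```

Assuming Boto3 is installed and configured, the dictionary could be passed as `boto3.client("transcribe").start_transcription_job(**request)`. Other optional parameters from the syntax above, such as OutputBucketName, are left out of this sketch.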
Tailoring transcripts to different audiences with tagging
You can generate multiple transcriptions tailored to different audiences from a single audio file. For the StartTranscriptionJob (p. 403) API, use the tag method to mark the words in the transcription that match the words in your vocabulary filter. You can present the results of the transcription job to the audience that can see the complete transcription, including the words listed in your vocabulary filter. You can then copy your transcription results, remove the words tagged by your vocabulary filter, and show those results to the audience that shouldn't see the unwanted words.
With tagging, you aren't limited to generating transcriptions for two different audiences. You can generate multiple transcriptions for many audiences from the same audio file. You can choose to remove some words caught by the vocabulary filter in one transcript and leave them in other transcripts.
For example, if "lazy" were in the vocabulary filter, the sentence "The quick brown fox jumps over the lazy dog." would be unchanged in the transcription results. Instead of being masked in, or removed from, the transcription, the value for the VocabularyFilterMatch parameter would be true for "lazy."
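To illustrate how tagged output can serve several audiences, the following Python sketch rebuilds a transcript from a list of items that each carry a vocabulary filter match flag. The `Item` class and `render` function are hypothetical, and mapping the fields of a real transcription output file onto `Item` is left out because this section doesn't show the full output format. For each audience, you choose whether tagged words are kept, masked with ***, or removed, and optionally which tagged words to hide.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Item:
    content: str
    is_punctuation: bool = False
    filter_match: bool = False  # corresponds to VocabularyFilterMatch


def render(items: Iterable[Item], mode: str = "keep", hide: Optional[Set[str]] = None) -> str:
    """Build one audience's transcript from tagged items.

    mode: "keep" leaves tagged words in place, "mask" replaces them with ***,
    and "remove" drops them. If hide is given, only tagged words whose lowercase
    content is in hide are masked or removed; other tagged words are kept.
    """
    if mode not in {"keep", "mask", "remove"}:
        raise ValueError(f"unknown mode: {mode}")
    out = []
    for item in items:
        text = item.content
        hidden = item.filter_match and (hide is None or text.lower() in hide)
        if hidden and mode == "remove":
            continue
        if hidden and mode == "mask":
            text = "***"
        if item.is_punctuation and out:
            out[-1] += text  # attach punctuation to the preceding word
        else:
            out.append(text)
    return " ".join(out)
```

Because every audience is rendered from the same tagged items, one transcription job is enough: for example, `render(items)` produces the complete transcript and `render(items, mode="remove")` produces the filtered one.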
To enable tagging in your batch transcription job, see Filtering batch transcriptions (p. 101).
The word "bloody" is tagged in the following truncated transcription output.
{
"transcript":"...recording bloody well set up this stupid bloody meeting anyway..."
Filtering streaming transcriptions
Use a vocabulary filter to filter unwanted words in real-time streams with either the Amazon Transcribe console or the StartStreamTranscription (p. 436) API.
The following syntax shows the parameters and their data types.
{
    "LanguageCode": "enum",
    "MediaSampleRateHertz": "integer",
    "MediaEncoding": "enum",
    "VocabularyName": "string",
    "SessionId": "string",
    "AudioStream": "eventstream",
    "VocabularyFilterName": "string",
    "VocabularyFilterMethod": "enum"
}
To filter a streaming transcription (API)
• For the StartStreamTranscription (p. 436) API, specify the following:
a. The language code of your audio in the LanguageCode field.
b. The sample rate of your audio in the MediaSampleRateHertz field.
c. The name of your vocabulary filter in the VocabularyFilterName field.
d. The filtering method in the VocabularyFilterMethod parameter:
• To mask the filtered words by replacing them with three asterisks (***), specify mask. Filtering the word "lazy" from the sentence "The quick brown fox jumps over the lazy dog." with the mask method shows "The quick brown fox jumps over the *** dog." in the transcription.
• To remove the words from the transcript, specify remove. Filtering the word "lazy" from the sentence "The quick brown fox jumps over the lazy dog." with the remove method shows "The quick brown fox jumps over the dog." in the transcription.
• To tag words that match the vocabulary filter, specify tag. This enables you to mark the words matching the vocabulary filter without masking or removing them.
To use the same stream to create one transcript with the content filtered and one transcript that is unfiltered, use the tagging method. For more information, see Tailoring transcripts to different audiences with tagging (p. 106).
To filter a streaming transcription (console)
1. Sign in to the Amazon Transcribe console.
2. In the navigation pane, choose Real-time transcription.
3. In Language, choose the language of your real-time stream.
4. Choose the Additional settings tab and choose your vocabulary filter and vocabulary filtering method.
5. Choose Start streaming to begin your stream with vocabulary filtering enabled.
Tailoring transcripts to different audiences with tagging
You can use a single stream to generate a transcription that doesn't show unwanted words and one that does. For the StartStreamTranscription (p. 436) API, use the tag method to mark the words in the transcription that match the words in your vocabulary filter. You can present the results of the real-time stream to the audience that can see the complete transcription, including the words listed in your vocabulary filter. You can then copy your transcription results, remove the words tagged by your vocabulary filter, and show those results to the audience that shouldn't see the unwanted words.
With tagging, you aren't limited to generating transcriptions for two different audiences. You can generate multiple transcriptions for many audiences from the same stream. You can choose to remove some words caught by the vocabulary filter in one transcript and leave them in other transcripts.
To enable tagging in a real-time transcription
• For the StartStreamTranscription (p. 436) API, specify the following:
a. For VocabularyFilterName, the name of your vocabulary filter.
b. For VocabularyFilterMethod, specify tag.
For example, if "bloody" were in the vocabulary filter, the phrase "recording bloody well set up this stupid bloody meeting anyway..." would be unchanged in the transcription results. Instead of being masked in, or removed from, the transcription, the value for the VocabularyFilterMatch parameter would be true for "bloody."
The following example JSON output shows this.
{
"jobName": "your-transcription-job-name", "accountId": "account-id",
"results": {
"status": "COMPLETED"
}
Transcribing streaming audio
Use Amazon Transcribe streaming transcription to send an audio stream and receive a stream of text in real time. You can use this text stream to add real-time speech-to-text capability to your applications.
Streaming transcription takes a stream of your audio data and transcribes it in real time. Streaming uses HTTP/2 or WebSocket streams so that the results of the transcription are returned to your application while you send more audio to Amazon Transcribe. Use streaming transcription when you want to make the results of live audio transcription available immediately, or when you have an audio file that you want to process as it is transcribed.
To see which languages support streaming audio transcription, see Supported languages and language-specific features (p. 3).
For a streaming transcription using:
• HTTP/2 – Use the StartStreamTranscription (p. 436) API to start a stream. When you use this API, the client handles retrying the connection when there are transient problems on the network.
• WebSocket – Create your own client.
• Amazon Transcribe console – Speak directly into a computer microphone. You can use the console as a preview to see how your transcription results are returned to your application when using the API.
Note: Streaming transcription is not supported in all languages. See Supported languages and language-specific features (p. 3) for details.
You can use Amazon Transcribe streaming transcription for a variety of purposes. For example:
• Streaming transcriptions can generate real-time subtitles for live broadcast media.
• Lawyers can make real-time annotations on top of streaming transcriptions during depositions.
• Video game chat can be transcribed in real time so that hosts can moderate content or run real-time analysis.
• Streaming transcriptions can provide assistance to the hearing impaired.
The transcription is returned to your application as a stream of transcription events.
Amazon Transcribe breaks up the incoming audio stream based on natural speech segments, such as a change in speaker or a pause in the audio. The transcription is returned progressively to your application, with each response containing more transcribed speech until the entire segment is transcribed.
In the following example, each line is a partial result transcription output of an audio segment that is being streamed.
The
The ad.
The and
The Amazon.
The Amazon is
The Amazon is the
The Amazon is the law.
The Amazon is the lar.
The Amazon is the large
The Amazon is the largest
The Amazon is the largest ray
The Amazon is the largest rain
The Amazon is the largest rain for
The Amazon is the largest rainforest.
The Amazon is the largest rainforest on the planet
Each Result object in the response contains a field called IsPartial. The value indicates whether the response is a partial response that contains the transcription results so far, or whether it is a complete transcription of the audio segment.
Each Result object also contains the start and end time of the transcribed audio segment from the stream. You can use these values to, for example, synchronize the transcription with the video.
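The following Python sketch keeps only completed segments (IsPartial is false) together with their start and end times, which is the information you would need to, for example, place subtitles on a video. It assumes each result is a dictionary with the IsPartial, StartTime, and EndTime fields described above and an Alternatives list whose first entry holds the Transcript text, matching the structure of the examples in this section. The function name `final_segments` is hypothetical.

```python
from typing import Iterable, List, Tuple


def final_segments(results: Iterable[dict]) -> List[Tuple[float, float, str]]:
    """Return (start, end, text) for each completed audio segment, in arrival order."""
    segments = []
    for result in results:
        # Partial results are superseded by later results for the same segment.
        if result.get("IsPartial", True):
            continue
        alternatives = result.get("Alternatives") or []
        if not alternatives:
            continue
        segments.append(
            (result["StartTime"], result["EndTime"], alternatives[0]["Transcript"])
        )
    return segments
```

Treating a missing IsPartial flag as partial is a conservative design choice in this sketch: a result is only used once the service has marked its segment as complete.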
The following example is a partial transcription response.
{ "TranscriptResultStream": { "TranscriptEvent": {
],
The following example shows the transcription results for a fully transcribed speech segment.
{
"StartTime": 3.95,
"VocabularyFilterMatch": false },
{
"Content": ".", "EndTime": 7.92, "StartTime": 7.92, "Type": "punctuation",
"VocabularyFilterMatch": false }
],
"Transcript": "The Amazon is the largest rainforest on the planet covering over seven million kilometers."
} ],
"EndTime": 7.92, "IsPartial": false,
"ResultId": "a37114fa-5288-438d-afdf-833cfd1ea55c", "StartTime": 2.33
} ] } }
Each word, phrase, or punctuation mark in the transcription output is an item. Each word or phrase has a confidence score. The confidence score is a value between 0 and 1 that indicates how confident Amazon Transcribe is that it correctly transcribed the item. A confidence score with a larger value indicates that Amazon Transcribe is more confident that it transcribed the item correctly.
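To act on confidence scores, for example to send uncertain words for human review, you could compare each item against a threshold. The following Python sketch is illustrative only: the default threshold of 0.5 is an arbitrary choice, the `Confidence` key is an assumption about the item format, and items without a confidence score (such as punctuation) are skipped. The function name is hypothetical.

```python
from typing import Iterable, List


def low_confidence_items(items: Iterable[dict], threshold: float = 0.5) -> List[str]:
    """Return the content of items whose confidence score is below threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    flagged = []
    for item in items:
        confidence = item.get("Confidence")  # assumed key; absent on punctuation here
        if confidence is not None and confidence < threshold:
            flagged.append(item["Content"])
    return flagged
```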
Topics
• Event stream encoding (p. 114)
• Using Amazon Transcribe streaming with WebSockets (p. 116)
• Using Amazon Transcribe streaming With HTTP/2 (p. 122)
• Partial result stabilization (p. 135)
Event stream encoding
Event stream encoding provides bidirectional communication using messages between a client and a server. Data frames sent to the Amazon Transcribe streaming service are encoded in this format. The response from Amazon Transcribe also uses this encoding.
Each message consists of two sections: the prelude and the data. The prelude consists of:
1. The total byte length of the message
2. The combined byte length of all of the headers
The data section consists of:
1. The headers
2. A payload
Each section ends with a 4-byte big-endian integer CRC checksum. The message CRC covers both the prelude section and the data section. Amazon Transcribe uses CRC32 (often referred to as GZIP CRC32) to calculate both CRCs. For more information about CRC32, see GZIP file format specification version 4.3.
Total message overhead, including the prelude and both checksums, is 16 bytes.
The following diagram shows the components that make up a message and a header. There are multiple headers per message.
Each message contains the following components:
• Prelude: Always a fixed size of 8 bytes, two fields of 4 bytes each.
• First 4 bytes: The total byte-length. This is the big-endian integer byte-length of the entire message, including the 4-byte length field itself.
• Second 4 bytes: The headers byte-length. This is the big-endian integer byte-length of the headers portion of the message, excluding the headers length field itself.
• Prelude CRC: The 4-byte CRC checksum for the prelude portion of the message, excluding the CRC itself. The prelude has a separate CRC from the message CRC to ensure that Amazon Transcribe can detect corrupted byte-length information immediately without causing errors such as buffer overruns.
• Headers: Metadata annotating the message, such as the message type, content type, and so on.
Messages have multiple headers. Headers are key-value pairs where the key is a UTF-8 string. Headers can appear in any order in the headers portion of the message and any given header can appear only once. For the required header types, see the following sections.
• Payload: The audio content to be transcribed.
• Message CRC: The 4-byte CRC checksum from the start of the message to the start of the checksum. That is, everything in the message except the CRC itself.
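The message layout above can be checked with a short Python sketch that uses only the standard library; `zlib.crc32` computes the same CRC32 that GZIP uses. The function names are hypothetical, and the headers are passed in as already-encoded bytes, so this sketch covers only the prelude, the two CRCs, and the payload.

```python
import struct
import zlib
from typing import Tuple

PRELUDE_LEN = 8  # total byte-length (4 bytes) + headers byte-length (4 bytes)
CRC_LEN = 4
OVERHEAD = PRELUDE_LEN + 2 * CRC_LEN  # 16 bytes: prelude, prelude CRC, message CRC


def encode_message(headers: bytes, payload: bytes) -> bytes:
    """Frame already-encoded header bytes and a payload as one event stream message."""
    total_len = OVERHEAD + len(headers) + len(payload)
    prelude = struct.pack(">II", total_len, len(headers))  # two big-endian integers
    prelude_crc = struct.pack(">I", zlib.crc32(prelude))
    body = prelude + prelude_crc + headers + payload
    # The message CRC covers everything before it, including the prelude CRC.
    return body + struct.pack(">I", zlib.crc32(body))


def parse_message(data: bytes) -> Tuple[bytes, bytes]:
    """Validate both CRCs and return (header bytes, payload)."""
    if len(data) < OVERHEAD:
        raise ValueError("message shorter than the 16-byte minimum")
    prelude = data[:PRELUDE_LEN]
    (prelude_crc,) = struct.unpack(">I", data[PRELUDE_LEN:PRELUDE_LEN + CRC_LEN])
    # Check the prelude CRC before trusting the length fields.
    if zlib.crc32(prelude) != prelude_crc:
        raise ValueError("prelude CRC mismatch")
    total_len, headers_len = struct.unpack(">II", prelude)
    if total_len != len(data):
        raise ValueError("total byte-length does not match message size")
    if headers_len > total_len - OVERHEAD:
        raise ValueError("headers byte-length exceeds message size")
    (message_crc,) = struct.unpack(">I", data[-CRC_LEN:])
    if zlib.crc32(data[:-CRC_LEN]) != message_crc:
        raise ValueError("message CRC mismatch")
    start = PRELUDE_LEN + CRC_LEN
    return data[start:start + headers_len], data[start + headers_len:-CRC_LEN]
```

A message with no headers and a 3-byte payload is 19 bytes long: the 16 bytes of overhead plus the payload.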
Each header contains the following components. There are multiple headers per frame.
• Header name byte-length: The byte-length of the header name.
• Header name: The name of the header indicating the header type. For valid values, see the following frame descriptions.
• Header value type: An enumeration indicating the type of the header value.
The following shows the possible values for the header value type and what they indicate.
• 0 – TRUE
• 1 – FALSE
• 2 – BYTE
• 3 – SHORT
• 4 – INTEGER
• 5 – LONG
• 6 – BYTE ARRAY
• 7 – STRING
• 8 – TIMESTAMP
• 9 – UUID
• Value string byte length: The byte-length of the header value string.
• Header value: The value of the header string. Valid values for this field depend on the type of header. For valid values, see the following frame descriptions.
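The header layout above can be sketched in Python for string-valued headers (value type 7). The field widths used here, a 1-byte header name length, a 1-byte value type, and a 2-byte big-endian value length, are not stated in this section; they are taken from the AWS event stream format and should be treated as an assumption. The function names are hypothetical.

```python
import struct
from typing import Dict

STRING_TYPE = 7  # header value type for STRING, from the enumeration above


def encode_headers(headers: Dict[str, str]) -> bytes:
    """Encode string-valued headers (assumed widths: 1-byte name length,
    1-byte value type, 2-byte big-endian value length)."""
    out = bytearray()
    # A dict can hold each name only once, matching the rule that a header appears once.
    for name, value in headers.items():
        name_bytes = name.encode("utf-8")
        value_bytes = value.encode("utf-8")
        if len(name_bytes) > 255 or len(value_bytes) > 65535:
            raise ValueError("header name or value too long")
        out += struct.pack(">B", len(name_bytes)) + name_bytes
        out += struct.pack(">BH", STRING_TYPE, len(value_bytes)) + value_bytes
    return bytes(out)


def decode_headers(data: bytes) -> Dict[str, str]:
    """Decode headers produced by encode_headers (STRING values only)."""
    headers = {}
    pos = 0
    while pos < len(data):
        name_len = data[pos]
        pos += 1
        name = data[pos:pos + name_len].decode("utf-8")
        pos += name_len
        value_type = data[pos]
        pos += 1
        if value_type != STRING_TYPE:
            raise ValueError(f"unsupported header value type {value_type}")
        (value_len,) = struct.unpack(">H", data[pos:pos + 2])
        pos += 2
        headers[name] = data[pos:pos + value_len].decode("utf-8")
        pos += value_len
    return headers
```

The encoded bytes form the headers portion of a message; the prelude, CRCs, and payload are framed around them as described earlier in this section. For example, an audio event might carry `:message-type` set to `event`, `:event-type` set to `AudioEvent`, and `:content-type` set to `application/octet-stream`; these specific header names are an assumption here, since the required headers are described in later sections.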
Using Amazon Transcribe streaming with WebSockets
When you use the WebSocket protocol to stream audio, Amazon Transcribe transcribes the stream in real time. You encode the audio with event stream encoding, and Amazon Transcribe responds with a JSON structure that is also encoded using event stream encoding. For more information, see Event stream encoding (p. 114). You can use the information in this section to create applications using the WebSocket library of your choice.
Topics
• Adding a policy for WebSocket requests to your IAM role (p. 116)
• Creating a pre-signed URL (p. 116)
• Handling the WebSocket upgrade response (p. 120)
• Making a WebSocket streaming request (p. 121)
• Handling a WebSocket streaming response (p. 121)
• Handling WebSocket streaming errors (p. 122)
Adding a policy for WebSocket requests to your IAM role
To use the WebSocket protocol to call Amazon Transcribe, you need to attach the following policy to the AWS Identity and Access Management (IAM) role that makes the request. See Adding IAM policies for more information on how to do this.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "transcribestreaming",
            "Effect": "Allow",
            "Action": "transcribe:StartStreamTranscriptionWebSocket",
            "Resource": "*"
        }
    ]
}
Creating a pre-signed URL
Construct a URL for your WebSocket request that contains the information needed to set up communication between your application and Amazon Transcribe. WebSocket streaming uses the Amazon Signature Version 4 process for signing requests. Signing the request helps to verify the identity