Filtering transcriptions - Amazon Transcribe

API

To create a vocabulary ﬁlter (API)

• For the CreateVocabularyFilter (p. 322) API, specify the following:

a. A name for your vocabulary ﬁlter that is unique in your AWS account for the VocabularyFilterName parameter

b. The language code for the language of your source audio in the LanguageCode parameter

c. The words for your vocabulary filter using one of the following options:

• Specify the Amazon Simple Storage Service (Amazon S3) location of the text ﬁle for the VocabularyFilterFileUri parameter using this format: s3://DOC-EXAMPLE-BUCKET1/vocabulary-filter-example.txt.

• Enter the words as an array of strings in the Words parameter, for example ["word", "banana", "potato", "chair"].

To see all of the vocabulary ﬁlters that you've created, use the ListVocabularyFilters (p. 389) API. You can then use that information with the GetVocabularyFilter (p. 362) API to retrieve the download URI for your vocabulary ﬁlter and learn more about that ﬁlter.
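The parameter choices above can be sketched in Python. This is a minimal sketch, not an official client: build_vocabulary_filter_request is a hypothetical helper that only assembles the parameters named in this procedure and assumes that exactly one word source (Words or VocabularyFilterFileUri) is supplied. Passing the result to a client such as boto3 is shown only as a comment and is not exercised here.

```python
from typing import Optional, Sequence


def build_vocabulary_filter_request(
    name: str,
    language_code: str,
    words: Optional[Sequence[str]] = None,
    file_uri: Optional[str] = None,
) -> dict:
    """Assemble CreateVocabularyFilter parameters (hypothetical helper)."""
    # The words come from either an inline list or an S3 text file, not both.
    if (words is None) == (file_uri is None):
        raise ValueError("Specify exactly one of words or file_uri")
    request = {"VocabularyFilterName": name, "LanguageCode": language_code}
    if words is not None:
        request["Words"] = list(words)
    else:
        if not file_uri.startswith("s3://"):
            raise ValueError("file_uri must be an Amazon S3 URI (s3://...)")
        request["VocabularyFilterFileUri"] = file_uri
    return request


request = build_vocabulary_filter_request(
    "your-filter-name", "en-US", words=["word", "banana", "potato", "chair"]
)
# With boto3 installed and credentials configured, the call would be roughly:
#   boto3.client("transcribe").create_vocabulary_filter(**request)
```

Keeping the parameter assembly separate from the network call makes the validation easy to check without AWS credentials.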

AWS CLI

The following is an example AWS Command Line Interface (AWS CLI) request to create a vocabulary ﬁlter with a text ﬁle stored in an Amazon S3 bucket. The commands are followed by the response elements in JSON format.

aws transcribe create-vocabulary-filter \
    --vocabulary-filter-name your-filter-name \
    --language-code en-US \
    --vocabulary-filter-file-uri s3://DOC-EXAMPLE-BUCKET1/vocabulary-filter-example.txt

{
    "VocabularyFilterName": "your-filter-name",
    "LanguageCode": "en-US"
}

Next step

Step 3: Filtering transcriptions (p. 100)

Step 3: Filtering transcriptions

You can ﬁlter unwanted words from both batch and streaming transcriptions. When you create a real-time stream or batch transcription job, specify the vocabulary ﬁlter that you want to use and the vocabulary ﬁlter method. The method speciﬁes how the words are ﬁltered from your transcription results. There are two vocabulary ﬁlter methods available for batch transcription and three methods available for real-time streaming.

You can use the following ﬁlter methods for both batch and real-time streaming transcriptions:

• To replace the words caught by your vocabulary ﬁlter with three asterisks, ***, use the mask method.

Use this method to hide unwanted words from your audience, but indicate that they were spoken.


• To remove the words from your transcripts, use the remove method. With this method, your audience won't know that unwanted words were spoken.

In real-time streaming transcriptions only, you can use the tag method to keep unwanted words in your transcription results with a tag that indicates that they were listed in your vocabulary ﬁlter. You can then manually remove the words from some transcripts and leave them in others to generate transcripts for multiple audiences from a single stream.
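To make the difference between the methods concrete, the following sketch imitates them locally on a plain sentence. It is an illustration only: apply_filter is a hypothetical function, and its simple whitespace-and-punctuation matching is an assumption, not how Amazon Transcribe matches words internally.

```python
import string

METHODS = {"mask", "remove", "tag"}


def apply_filter(sentence: str, filter_words: set, method: str):
    """Imitate the mask, remove, and tag methods on a plain sentence."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    result = []
    for token in sentence.split():
        core = token.strip(string.punctuation)  # compare without punctuation
        matched = bool(core) and core.lower() in filter_words
        if method == "tag":
            # Keep every word and record whether it matched the filter.
            result.append((token, matched))
        elif not matched:
            result.append(token)
        elif method == "mask":
            result.append(token.replace(core, "***"))
        # For "remove", a matched token is skipped entirely.
    return result if method == "tag" else " ".join(result)


sentence = "The quick brown fox jumps over the lazy dog."
masked = apply_filter(sentence, {"lazy"}, "mask")
removed = apply_filter(sentence, {"lazy"}, "remove")
tagged = apply_filter(sentence, {"lazy"}, "tag")
```

Here masked keeps a visible placeholder, removed drops the word, and tagged keeps the full sentence as (word, matched) pairs so that the decision can be made later for each audience.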

For information about streaming transcriptions, see Transcribing streaming audio (p. 110). For information about batch transcriptions, see How Amazon Transcribe works (p. 5).

Topics

• Filtering batch transcriptions (p. 101)

• Filtering streaming transcriptions (p. 105)

Filtering batch transcriptions

Use a vocabulary ﬁlter to ﬁlter unwanted words from a batch transcription job with either the Amazon Transcribe console or the StartTranscriptionJob (p. 403) API.

The following code shows the parameters and data types.

{
    "ContentRedaction": {
        "RedactionOutput": "string",
        "RedactionType": "string"
    },
    "JobExecutionSettings": {
        "AllowDeferredExecution": boolean,
        "DataAccessRoleArn": "string"
    },
    "LanguageCode": "string",
    "Media": {
        "MediaFileUri": "string"
    },
    "MediaFormat": "string",
    "MediaSampleRateHertz": number,
    "OutputBucketName": "string",
    "OutputEncryptionKMSKeyId": "string",
    "Settings": {
        "ChannelIdentification": boolean,
        "MaxAlternatives": number,
        "MaxSpeakerLabels": number,
        "ShowAlternatives": boolean,
        "ShowSpeakerLabels": boolean,
        "VocabularyFilterMethod": "string",
        "VocabularyFilterName": "string",
        "VocabularyName": "string"
    },
    "TranscriptionJobName": "string"
}

Console

To use the console to start a batch transcription job with vocabulary ﬁltering, you must have created a vocabulary ﬁlter, as described in Step 2: Creating a vocabulary ﬁlter (p. 99).


To filter unwanted words in a transcription job (console)

1. Sign in to the Amazon Transcribe console.

2. In the navigation pane, choose Transcription jobs.

3. Choose Create job.

4. For Name, specify a name that is unique within your AWS account for your batch transcription job.

5. For Language, choose the language that will be spoken in your transcription job.

6. Specify the location of your audio ﬁle or video ﬁle in Amazon S3:

• For Input ﬁle location on S3 under Input data, specify the Amazon S3 URI that identiﬁes the media ﬁle that you will transcribe.

• Choose Browse S3 under Input data to browse for the media ﬁle and choose it.

7. Choose Next.

8. Enable Vocabulary ﬁltering under Content removal.

9. Choose your vocabulary ﬁlter and vocabulary ﬁltering method under Filter selection.

10. Choose Create.

API

To ﬁlter a batch transcription (API)

• For the StartTranscriptionJob (p. 403) API, specify the following:

a. For TranscriptionJobName, specify a name unique to your AWS account.

b. For LanguageCode, specify the language code that corresponds to the language spoken in your media ﬁle and the language of your vocabulary ﬁlter.

c. For the MediaFileUri parameter of the Media object, specify the Amazon S3 location of the audio ﬁle or video ﬁle that you want to transcribe.

d. For the VocabularyFilterName parameter, specify the name of your vocabulary ﬁlter.

e. For the VocabularyFilterMethod parameter, choose one of the following options:

• To mask ﬁltered words by replacing them with three asterisks ***, specify mask. Filtering the word "lazy" from the sentence: "The quick brown fox jumps over the lazy dog." with the mask method shows "The quick brown fox jumps over the *** dog." in the transcription.

• To remove the filtered words from the transcript, specify remove. Filtering the word "lazy" from the sentence "The quick brown fox jumps over the lazy dog." with the remove method shows "The quick brown fox jumps over the dog." in the transcript.

• To mark words that match your vocabulary ﬁlter in the transcript, specify tag. Use this to produce transcripts that are tailored to diﬀerent audiences. This enables you to:

• Present the transcription results to one audience, without removing any of the marked words that match the vocabulary ﬁlter.

• Remove any word that matched the vocabulary ﬁlter, and present those results to a diﬀerent audience.

• Remove some words that matched the vocabulary filter, and present those results to another audience.

You can generate multiple transcriptions for many audiences from the same transcription output. For more information, see Tailoring transcripts to different audiences with tagging (p. 103).
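The request parameters listed in this procedure can be assembled as in the following sketch. build_transcription_job_request is a hypothetical helper, and the media file name in the example is a placeholder; the nesting of the filter settings under Settings and of MediaFileUri under Media follows the request syntax shown earlier in this section.

```python
VALID_FILTER_METHODS = {"mask", "remove", "tag"}


def build_transcription_job_request(job_name, language_code, media_uri,
                                    filter_name, filter_method):
    """Assemble StartTranscriptionJob parameters with vocabulary filtering."""
    if filter_method not in VALID_FILTER_METHODS:
        raise ValueError(
            f"filter_method must be one of {sorted(VALID_FILTER_METHODS)}"
        )
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "Media": {"MediaFileUri": media_uri},
        # The filter name and method belong to the Settings object.
        "Settings": {
            "VocabularyFilterName": filter_name,
            "VocabularyFilterMethod": filter_method,
        },
    }


job_request = build_transcription_job_request(
    "your-transcription-job-name",
    "en-US",
    "s3://DOC-EXAMPLE-BUCKET1/your-media-file.wav",  # placeholder media file
    "your-filter-name",
    "mask",
)
# With boto3 installed and credentials configured, the call would be roughly:
#   boto3.client("transcribe").start_transcription_job(**job_request)
```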


Tailoring transcripts to diﬀerent audiences with tagging

You can generate multiple transcriptions tailored to diﬀerent audiences from a single audio ﬁle. For the StartTranscriptionJob (p. 403) API, use the tag method to mark the words in the transcription that match the words in your vocabulary ﬁlter. You can present the results of the transcription job to the audience that can see the complete transcription, including the words listed in your vocabulary ﬁlter. You can then copy your transcription results, remove the words tagged by your vocabulary ﬁlter, and show those results to the audience that shouldn't see the unwanted words.

With tagging, you aren't limited to generating transcriptions for two diﬀerent audiences. You can generate multiple transcriptions for many audiences from the same audio ﬁle. You can choose to remove some words caught by the vocabulary ﬁlter in one transcript and leave them in other transcripts.

For example, if "lazy" were in the vocabulary ﬁlter, the sentence "The quick brown fox jumps over the lazy dog." would be unchanged in the transcription results. Instead of being masked in, or removed from, the transcription, the value for the VocabularyFilterMatch parameter would be true for "lazy."

To enable tagging in your batch transcription job, see Filtering batch transcriptions (p. 101).

The word "bloody" is tagged in the following truncated transcription output.

{

"transcript":"...recording bloody well set up this stupid bloody meeting anyway..."


Filtering streaming transcriptions

Use a vocabulary ﬁlter to ﬁlter unwanted words in real-time streams with either the Amazon Transcribe console or the StartStreamTranscription (p. 436) API.

The following syntax shows the parameters and their data types.

{
    "LanguageCode": "enum",
    "MediaSampleRateHertz": "integer",
    "MediaEncoding": "enum",
    "VocabularyName": "string",
    "SessionId": "string",
    "AudioStream": "eventstream",
    "VocabularyFilterName": "string",
    "VocabularyFilterMethod": "enum"
}

To ﬁlter a streaming transcription (API)

• For the StartStreamTranscription (p. 436) API, specify the following:

a. The language code of your audio in the LanguageCode ﬁeld.

b. The sample rate of your audio in the MediaSampleRateHertz field.

c. The name of your vocabulary ﬁlter in the VocabularyFilterName ﬁeld.

d. The ﬁltering method in the VocabularyFilterMethod parameter:


• To mask the filtered words by replacing them with three asterisks (***), specify mask. Filtering the word "lazy" from the sentence "The quick brown fox jumps over the lazy dog." with the mask method shows "The quick brown fox jumps over the *** dog." in the transcription.

• To remove the words from the transcript, specify remove. Filtering the word "lazy" from the sentence "The quick brown fox jumps over the lazy dog." with the remove method shows "The quick brown fox jumps over the dog." in the transcription.

• To tag words that match the vocabulary filter, specify tag. This enables you to mark the words matching the vocabulary filter without masking or removing them.

To use the same stream to create one transcript with the content ﬁltered and one transcript that is unﬁltered, use the tagging method. For information, see Tailoring transcripts to diﬀerent audiences with tagging (p. 106).

To filter a streaming transcription (console)

1. Sign in to the Amazon Transcribe console.

2. In the navigation pane, choose Real-time transcription.

3. In Language, choose the language of your real-time stream.

4. Choose the Additional settings tab and choose your vocabulary ﬁlter and vocabulary ﬁltering method.

5. Choose Start streaming to begin your stream with vocabulary ﬁltering enabled.

Tailoring transcripts to diﬀerent audiences with tagging

You can use a single stream to generate a transcription that doesn't show unwanted words and one that does. For the StartStreamTranscription (p. 436) API, use the tag method to mark the words in the transcription that match the words in your vocabulary ﬁlter. You can present the results of the real-time stream to the audience that can see the complete transcription, including the words listed in your vocabulary ﬁlter. You can then copy your transcription results, remove the words tagged by your vocabulary ﬁlter, and show those results to the audience that shouldn't see the unwanted words.

With tagging, you aren't limited to generating transcriptions for two diﬀerent audiences. You can generate multiple transcriptions for many audiences from the same stream. You can choose to remove some words caught by the vocabulary ﬁlter in one transcript and leave them in other transcripts.
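As a rough sketch of this workflow, the following code builds several audience-specific transcripts from one list of tagged items. render_transcript is a hypothetical helper. The Content, Type, and VocabularyFilterMatch fields follow the streaming output shown in this guide, while the "pronunciation" type value and the sample items (including "stupid" as a second filtered word) are assumptions made for illustration.

```python
def render_transcript(items, hide=frozenset()):
    """Join tagged items into text, dropping tagged words listed in hide."""
    text = ""
    for item in items:
        content = item["Content"]
        if item["Type"] == "punctuation":
            text += content  # punctuation attaches to the previous word
            continue
        if item["VocabularyFilterMatch"] and content.lower() in hide:
            continue  # this audience should not see the word
        text += (" " if text else "") + content
    return text


items = [
    {"Content": "this", "Type": "pronunciation", "VocabularyFilterMatch": False},
    {"Content": "stupid", "Type": "pronunciation", "VocabularyFilterMatch": True},
    {"Content": "bloody", "Type": "pronunciation", "VocabularyFilterMatch": True},
    {"Content": "meeting", "Type": "pronunciation", "VocabularyFilterMatch": False},
    {"Content": ".", "Type": "punctuation", "VocabularyFilterMatch": False},
]

full = render_transcript(items)                               # nothing removed
strict = render_transcript(items, hide={"stupid", "bloody"})  # all tagged words removed
partial = render_transcript(items, hide={"bloody"})           # only some removed
```

Because a word is dropped only when it is both tagged and listed in hide, the same tagged output can serve any number of audiences.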

To enable tagging in a real-time transcription

• For the StartStreamTranscription (p. 436) API, specify the following:

a. For VocabularyFilterName, specify the name of your vocabulary filter.

b. For VocabularyFilterMethod, specify tag.

For example, if "bloody" were in the vocabulary ﬁlter, the phrase "recording bloody well set up this stupid bloody meeting anyway..." would be unchanged in the transcription results. Instead of being masked in, or removed from, the transcription, the value for the VocabularyFilterMatch parameter would be true for "bloody."

The following example JSON output shows this.

{


"jobName": "your-transcription-job-name",
"accountId": "account-id",

"results": {


"status": "COMPLETED"

}

Transcribing streaming audio

Use Amazon Transcribe streaming transcription to send an audio stream and receive a stream of text in real time. You can use this text stream to add real-time speech-to-text capability to your applications.

Streaming transcription takes a stream of your audio data and transcribes it in real time. Streaming uses HTTP/2 or WebSocket streams so that the results of the transcription are returned to your application while you send more audio to Amazon Transcribe. Use streaming transcription when you want to make the results of live audio transcription available immediately, or when you have an audio ﬁle that you want to process as it is transcribed.

To see which languages support streaming audio transcription, see Supported languages and language-speciﬁc features (p. 3).

For a streaming transcription using:

• HTTP/2 – Use the StartStreamTranscription (p. 436) API to start a stream. When you use this API, the client handles retrying the connection when there are transient problems on the network.

• WebSocket – Create your own client.

• Amazon Transcribe console – Speak directly into a computer microphone. You can use the console as a preview to see how your transcription results are returned to your application when using the API.

Note: Streaming transcription is not supported in all languages. See Supported languages and language-specific features (p. 3) for details.

You can use Amazon Transcribe streaming transcription for a variety of purposes. For example:

• Streaming transcriptions can generate real-time subtitles for live broadcast media.

• Lawyers can make real-time annotations on top of streaming transcriptions during depositions.

• Video game chat can be transcribed in real time so that hosts can moderate content or run real-time analysis.

• Streaming transcriptions can provide assistance to the hearing impaired.

The transcription is returned to your application in a stream of transcription events.

Amazon Transcribe breaks up the incoming audio stream based on natural speech segments, such as a change in speaker or a pause in the audio. The transcription is returned progressively to your application, with each response containing more transcribed speech until the entire segment is transcribed.

In the following example, each line is a partial result transcription output of an audio segment that is being streamed.

The
The ad.
The and
The Amazon.
The Amazon is
The Amazon is the
The Amazon is the law.
The Amazon is the lar.
The Amazon is the large
The Amazon is the largest
The Amazon is the largest ray
The Amazon is the largest rain
The Amazon is the largest rain for
The Amazon is the largest rainforest.
The Amazon is the largest rainforest on the planet

Each Result object in the response contains a ﬁeld called IsPartial. The value indicates whether the response is a partial response that contains the transcription results so far, or whether it is a complete transcription of the audio segment.

Each Result object also contains the start and end time of the transcribed audio segment from the stream. You can use these values to, for example, synchronize the transcription with the video.
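A minimal sketch of how an application might use these fields: it skips partial results and keeps the timing of each completed segment. final_segments is a hypothetical helper; the Alternatives list holding the Transcript text is assumed from the Amazon Transcribe streaming response shape, and the intermediate end times in the sample data are invented.

```python
def final_segments(results):
    """Yield (start, end, text) for results that are no longer partial."""
    for result in results:
        if result["IsPartial"]:
            continue  # a later response will supersede this text
        text = result["Alternatives"][0]["Transcript"]
        yield result["StartTime"], result["EndTime"], text


results = [
    {"IsPartial": True, "StartTime": 2.33, "EndTime": 3.1,
     "Alternatives": [{"Transcript": "The Amazon is"}]},
    {"IsPartial": True, "StartTime": 2.33, "EndTime": 5.0,
     "Alternatives": [{"Transcript": "The Amazon is the largest rain"}]},
    {"IsPartial": False, "StartTime": 2.33, "EndTime": 7.92,
     "Alternatives": [
         {"Transcript": "The Amazon is the largest rainforest on the planet"}
     ]},
]
segments = list(final_segments(results))
```

An application that shows live captions would instead display each partial result as it arrives and replace it when the next one comes in.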

The following example is a partial transcription response.

{ "TranscriptResultStream": { "TranscriptEvent": {

The following example shows the transcription results for a fully transcribed speech segment.

{

"StartTime": 3.95,


"VocabularyFilterMatch": false },

{

"Content": ".", "EndTime": 7.92, "StartTime": 7.92, "Type": "punctuation",

"VocabularyFilterMatch": false }

"Transcript": "The Amazon is the largest rainforest on the planet covering over seven million kilometers."

} ],

"EndTime": 7.92, "IsPartial": false,

"ResultId": "a37114fa-5288-438d-afdf-833cfd1ea55c", "StartTime": 2.33

} ] } }

Each word, phrase, or punctuation mark in the transcription output is an item. Each word or phrase has a conﬁdence score. The conﬁdence score is a value between 0 and 1 that indicates how conﬁdent Amazon Transcribe is that it correctly transcribed the item. A conﬁdence score with a larger value indicates that Amazon Transcribe is more conﬁdent that it transcribed the item correctly.
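One way to use the confidence score is to flag words for human review, as in this sketch. low_confidence_words is a hypothetical helper, the Confidence field name is assumed from the streaming item shape, and the 0.8 threshold and the sample scores are arbitrary values chosen for illustration.

```python
def low_confidence_words(items, threshold=0.8):
    """Return the content of items scored below the threshold."""
    return [
        item["Content"]
        for item in items
        # Items without a score (punctuation in this sketch) are skipped.
        if "Confidence" in item and item["Confidence"] < threshold
    ]


items = [
    {"Content": "largest", "Type": "pronunciation", "Confidence": 0.98},
    {"Content": "rainforest", "Type": "pronunciation", "Confidence": 0.62},
    {"Content": ".", "Type": "punctuation"},
]
flagged = low_confidence_words(items)
```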

Topics

• Event stream encoding (p. 114)

• Using Amazon Transcribe streaming with WebSockets (p. 116)

• Using Amazon Transcribe streaming with HTTP/2 (p. 122)

• Partial result stabilization (p. 135)

Event stream encoding

Event stream encoding provides bidirectional communication using messages between a client and a server. Data frames sent to the Amazon Transcribe streaming service are encoded in this format. The response from Amazon Transcribe also uses this encoding.

Each message consists of two sections: the prelude and the data. The prelude consists of:

1. The total byte length of the message

2. The combined byte length of all of the headers

The data section consists of:

1. The headers

2. A payload

Each section ends with a 4-byte big-endian integer CRC checksum. The message CRC checksum is for both the prelude section and the data section. Amazon Transcribe uses CRC32 (often referred to as GZIP CRC32) to calculate both CRCs. For more information about CRC32, see GZIP ﬁle format speciﬁcation version 4.3.

Total message overhead, including the prelude and both checksums, is 16 bytes.


The following diagram shows the components that make up a message and a header. There are multiple headers per message.

Each message contains the following components:

• Prelude: Always a ﬁxed size of 8 bytes, two ﬁelds of 4 bytes each.

• First 4 bytes: The total byte-length. This is the big-endian integer byte-length of the entire message, including the 4-byte length ﬁeld itself.

• Second 4 bytes: The headers byte-length. This is the big-endian integer byte-length of the headers portion of the message, excluding the headers length ﬁeld itself.

• Prelude CRC: The 4-byte CRC checksum for the prelude portion of the message, excluding the CRC itself. The prelude has a separate CRC from the message CRC to ensure that Amazon Transcribe can detect corrupted byte-length information immediately without causing errors such as buﬀer overruns.

• Headers: Metadata annotating the message, such as the message type, content type, and so on.

Messages have multiple headers. Headers are key-value pairs where the key is a UTF-8 string. Headers can appear in any order in the headers portion of the message and any given header can appear only once. For the required header types, see the following sections.

• Payload: The audio content to be transcribed.

• Message CRC: The 4-byte CRC checksum from the start of the message to the start of the checksum. That is, everything in the message except the CRC itself.
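The message layout described above can be sketched with Python's standard struct and zlib modules. This is an illustrative sketch under the stated layout, not a reference implementation: frame_message and check_message are hypothetical names, and the headers are taken as already-encoded bytes.

```python
import struct
import zlib


def frame_message(headers: bytes, payload: bytes) -> bytes:
    """Wrap already-encoded headers and a payload in the prelude and CRCs."""
    total_length = 16 + len(headers) + len(payload)  # 16 bytes of overhead
    prelude = struct.pack(">II", total_length, len(headers))
    prelude_crc = struct.pack(">I", zlib.crc32(prelude))
    body = prelude + prelude_crc + headers + payload
    # The message CRC covers everything before it, including the prelude CRC.
    return body + struct.pack(">I", zlib.crc32(body))


def check_message(message: bytes) -> bytes:
    """Verify the length field and both CRCs, then return the payload."""
    total_length, headers_length = struct.unpack(">II", message[:8])
    if total_length != len(message):
        raise ValueError("length mismatch")
    if struct.unpack(">I", message[8:12])[0] != zlib.crc32(message[:8]):
        raise ValueError("prelude CRC mismatch")
    if struct.unpack(">I", message[-4:])[0] != zlib.crc32(message[:-4]):
        raise ValueError("message CRC mismatch")
    return message[12 + headers_length:-4]


message = frame_message(b"", b"\x00\x01\x02\x03")
payload = check_message(message)
```

Checking the prelude CRC before trusting the length fields mirrors the reason given above for keeping the prelude checksum separate from the message checksum.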

Each header contains the following components. There are multiple headers per frame.

• Header name byte-length: The byte-length of the header name.

• Header name: The name of the header indicating the header type. For valid values, see the following frame descriptions.

• Header value type: An enumeration indicating the type of the header value.

The following shows the possible values for the header value type and what they indicate.

• 0 – TRUE

• 1 – FALSE

• 2 – BYTE

• 3 – SHORT

• 4 – INTEGER

• 5 – LONG

• 6 – BYTE ARRAY

• 7 – STRING

• 8 – TIMESTAMP

• 9 – UUID


• Value string byte length: The byte-length of the header value string.

• Header value: The value of the header string. Valid values for this field depend on the type of header. For valid values, see the following frame descriptions.
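A single string-valued header could be encoded as in the following sketch. The field widths are assumptions, because the diagram is not reproduced in this text: a 1-byte name length, a 1-byte value type, and a 2-byte big-endian value length, as in the general AWS event stream format. The header name and value used in the example are likewise illustrative, and both function names are hypothetical.

```python
import struct

STRING_TYPE = 7  # value type 7 is STRING in the list above


def encode_string_header(name: str, value: str) -> bytes:
    """Encode one header whose value is a UTF-8 string."""
    name_bytes = name.encode("utf-8")
    value_bytes = value.encode("utf-8")
    return (
        struct.pack(">B", len(name_bytes))     # header name byte-length
        + name_bytes                           # header name
        + struct.pack(">B", STRING_TYPE)       # header value type
        + struct.pack(">H", len(value_bytes))  # value string byte length
        + value_bytes                          # header value
    )


def decode_string_header(data: bytes):
    """Decode a header produced by encode_string_header."""
    name_length = data[0]
    name = data[1:1 + name_length].decode("utf-8")
    if data[1 + name_length] != STRING_TYPE:
        raise ValueError("only string headers are handled in this sketch")
    (value_length,) = struct.unpack_from(">H", data, 2 + name_length)
    start = 4 + name_length
    return name, data[start:start + value_length].decode("utf-8")


header = encode_string_header(":message-type", "event")
decoded = decode_string_header(header)
```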

Using Amazon Transcribe streaming with WebSockets

When you use the WebSocket protocol to stream audio, Amazon Transcribe transcribes the stream in real time. You encode the audio with event stream encoding, and Amazon Transcribe responds with a JSON structure that is also encoded using event stream encoding. For more information, see Event stream encoding (p. 114). You can use the information in this section to create applications using the WebSocket library of your choice.

Topics

• Adding a policy for WebSocket requests to your IAM role (p. 116)

• Creating a pre-signed URL (p. 116)

• Handling the WebSocket upgrade response (p. 120)

• Making a WebSocket streaming request (p. 121)

• Handling a WebSocket streaming response (p. 121)

• Handling WebSocket streaming errors (p. 122)

Adding a policy for WebSocket requests to your IAM role

To use the WebSocket protocol to call Amazon Transcribe, you need to attach the following policy to the AWS Identity and Access Management (IAM) role that makes the request. See Adding IAM policies for more information on how to do this.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "transcribestreaming",
            "Effect": "Allow",
            "Action": "transcribe:StartStreamTranscriptionWebSocket",
            "Resource": "*"
        }
    ]
}

Creating a pre-signed URL

Construct a URL for your WebSocket request that contains the information needed to set up communication between your application and Amazon Transcribe. WebSocket streaming uses the Amazon Signature Version 4 process for signing requests. Signing the request helps to verify the identity
