Recording File Recognition - Python SDK_Speech Interaction Service SIS_SDK Reference

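Prerequisites

● Ensure that the Java environment has been set up as described in Configuring the Java Environment.

● Ensure that the audio file to be recognized exists and has been uploaded to OBS or to a server reachable from the public network (it must be accessible through a domain name). For sample audio, see the downloaded SDK package. If the audio is stored in OBS, ensure that the service has been authorized to access OBS; see Configuring OBS Service.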

Initializing the Client

Initialize AsrCustomizationClient. Its parameters are AuthInfo and SisConfig.

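Table 5-15 AuthInfo

Parameter  Required  Type    Description
ak         Yes       String  The user's AK. See AK/SK Authentication.
sk         Yes       String  The user's SK. See AK/SK Authentication.
region     Yes       String  Region, for example cn-north-4. See Endpoints.
projectId  Yes       String  Project ID, which corresponds one-to-one with the region. See Obtaining a Project ID.
endpoint   No        String  Endpoint. See Regions and Endpoints. The default is usually sufficient.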

Table 5-16 SisConfig

Parameter          Required  Type     Description
connectionTimeout  No        Integer  Connection timeout in ms. Default: 10000.
readTimeout        No        Integer  Read timeout in ms. Default: 10000.

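Request Parameters

The request class is AsrCustomLongRequest. See Table 5-17 for details.

Table 5-17 AsrCustomLongRequest

Parameter     Required  Type     Description
dataUrl       Yes       String   Address where the recording file is stored:
                                 ● Huawei Cloud OBS is recommended. For authorization configuration, see OBS config...
mat           Yes       String   Audio format. For details, see the recording file recognition section of the API Reference.
property      Yes       String   Property string in the form language_sampleRate_model, for example chinese_8k_common. For details, see the recording file recognition section of the API Reference.
addPunc       No        String   Whether to add punctuation to the recognition result. Values: yes, no. Default: no.
lysisInfo     No        Boolean  Whether to include analysis information. Currently takes effect only for 8k models. If false, the channel, speaker diarization, emotion detection, and speed settings have no effect. Default: false.
diarization   No        Boolean  Whether speaker diarization is needed. When enabled, the recognition result contains the role item. Default: true.
channel       No        String   Channel information of the audio file: MONO (default), LEFT_AGENT, or RIGHT_AGENT.
emotion       No        Boolean  Whether emotion detection is needed. Default: true.
speed         No        Boolean  Whether to output speaking-rate information. Default: true.
vocabularyId  No        String   Hotword table ID; omit if not used. To create a hotword table, see the hotword table creation section of the API Reference.
needWordInfo  No        String   Whether to output word segmentation information in the recognition result. Values: "yes", "no". Default: "no".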
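The value rules in the request table can be checked locally before a job is submitted. The following is a minimal sketch, not part of the SIS SDK: the class AsrRequestCheck and all of its methods are hypothetical names introduced for illustration, and the 8k check assumes that the sample rate is always the second component of the property string.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of the SIS SDK: checks request values before submission. */
class AsrRequestCheck {

    // Channel values listed in the request table.
    private static final List<String> CHANNELS = Arrays.asList("MONO", "LEFT_AGENT", "RIGHT_AGENT");

    /** Splits a property string such as "chinese_8k_common" into language, sample rate, and model. */
    static String[] splitProperty(String property) {
        if (property == null) {
            throw new IllegalArgumentException("property is required");
        }
        // Limit 3 keeps a model name that itself contains underscores in one piece.
        String[] parts = property.split("_", 3);
        if (parts.length != 3 || parts[0].isEmpty() || parts[1].isEmpty() || parts[2].isEmpty()) {
            throw new IllegalArgumentException("expected language_sampleRate_model, got: " + property);
        }
        return parts;
    }

    /** addPunc and needWordInfo accept only the strings "yes" and "no". */
    static boolean isYesNo(String value) {
        return "yes".equals(value) || "no".equals(value);
    }

    /** channel accepts MONO, LEFT_AGENT, or RIGHT_AGENT. */
    static boolean isChannel(String value) {
        return CHANNELS.contains(value);
    }

    /** Analysis information currently applies to 8k models only (assumes "8k" is the second component). */
    static boolean analysisApplies(String property) {
        return "8k".equals(splitProperty(property)[1]);
    }

    public static void main(String[] args) {
        String[] parts = splitProperty("chinese_8k_common");
        System.out.println("language=" + parts[0] + ", sampleRate=" + parts[1] + ", model=" + parts[2]);
        System.out.println("addPunc valid: " + isYesNo("yes"));
        System.out.println("channel valid: " + isChannel("MONO"));
        System.out.println("analysis applies: " + analysisApplies("chinese_8k_common"));
    }
}
```

Checking these values on the client side only saves a round trip; the service remains the authority on which formats and property strings it accepts.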

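Response Parameters

The response class is AsrCustomLongResponse. See Table 5-18 for details.

Table 5-18 AsrCustomLongResponse

Parameter   Required  Type              Description
status      Yes       String            Status of the job:
                                        ● WAITING: waiting for recognition.
                                        ● FINISHED: recognition is complete.
                                        ● ERROR: an error occurred during recognition.
createTime  No        String            Time the job was created, for example 2018-12-04T13:10:29.310Z.
startTime   No        String            Time recognition started, for example 2018-12-04T13:10:29.310Z.
finishTime  No        String            Time recognition finished, for example 2018-12-04T13:10:29.310Z.
segments    No        Array of objects  Recognition result, as an array of sentence-level results. See Table 5-19 for the data structure.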
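The three timestamps in the response share one format, so the queueing time and the recognition time of a job can be derived from them. The following is a minimal sketch using only java.time; the class JobTiming and its methods are hypothetical names, not SDK classes, and the sample timestamps are invented for illustration.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper, not part of the SIS SDK: derives timings from the response timestamps. */
class JobTiming {

    /** Time between two timestamps in the documented format, e.g. 2018-12-04T13:10:29.310Z. */
    static Duration between(String from, String to) {
        // Instant.parse accepts ISO-8601 UTC instants, which matches the documented example.
        return Duration.between(Instant.parse(from), Instant.parse(to));
    }

    /** A job needs no further polling once its status is FINISHED or ERROR. */
    static boolean isTerminal(String status) {
        return "FINISHED".equals(status) || "ERROR".equals(status);
    }

    public static void main(String[] args) {
        // Sample values invented for illustration.
        String createTime = "2018-12-04T13:10:29.310Z";
        String startTime = "2018-12-04T13:10:30.010Z";
        String finishTime = "2018-12-04T13:10:42.510Z";
        System.out.println("queued for " + between(createTime, startTime).toMillis() + " ms");
        System.out.println("recognition took " + between(startTime, finishTime).toMillis() + " ms");
        System.out.println("WAITING is terminal: " + isTerminal("WAITING"));
    }
}
```

Because createTime, startTime, and finishTime are optional, a caller would check each for null before passing it to between.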

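Table 5-19 Segment

Parameter   Required  Type     Description
start_time  Yes       Integer  Start timestamp of the sentence, in ms.
end_time    Yes       Integer  End timestamp of the sentence, in ms.
result      Yes       Object   Recognition result when the call succeeds; the field is absent when the call fails. See Table 5-20 for the data structure.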

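Table 5-20 Result

Parameter      Required  Type             Description
text           Yes       String           Recognized text.
analysis_info  No        Object           Quality-inspection analysis result for each sentence. Returned only when need_analysis_info in the recognition configuration is not null. See Table 5-21 for the data structure.
word_info      No        Array of Object  List of word segmentation output.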

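Table 5-21 Analysisinfo

Parameter  Required  Type    Description
role       No        String  Role type. Currently only AGENT and USER are supported.
emotion    No        String  Emotion type. Currently only NORMAL and ANGRY are supported. Present when emotion is true in the recognition configuration.
speed      No        Float   Speaking rate, in characters per second. Present when speed is true in the recognition configuration.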

Table 5-22 Word_info data structure

Parameter   Required  Type     Description
start_time  No        Integer  Start time.
end_time    No        Integer  End time.
word        No        String   Segmented word.
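The segment structure described in the tables above maps naturally onto a line-per-sentence transcript. The following is a minimal sketch under stated assumptions: TranscriptSketch and its nested Segment class are hypothetical stand-ins that flatten start_time, end_time, result.text, and analysis_info.role into one object; they are not the SDK's response classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for the documented segment structure; not an SDK class. */
class TranscriptSketch {

    static final class Segment {
        final int startTimeMs;
        final int endTimeMs;
        final String text;
        final String role; // AGENT or USER; null when analysis information was not requested

        Segment(int startTimeMs, int endTimeMs, String text, String role) {
            this.startTimeMs = startTimeMs;
            this.endTimeMs = endTimeMs;
            this.text = text;
            this.role = role;
        }
    }

    /** Formats a millisecond offset as mm:ss.SSS. */
    static String clock(int ms) {
        return String.format(Locale.ROOT, "%02d:%02d.%03d", ms / 60000, (ms / 1000) % 60, ms % 1000);
    }

    /** Produces one transcript line per segment, prefixed with the role when it is present. */
    static List<String> toLines(List<Segment> segments) {
        List<String> lines = new ArrayList<>();
        for (Segment s : segments) {
            String speaker = s.role == null ? "" : s.role + ": ";
            lines.add("[" + clock(s.startTimeMs) + " - " + clock(s.endTimeMs) + "] " + speaker + s.text);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample segments invented for illustration.
        List<Segment> segments = Arrays.asList(
                new Segment(0, 1500, "hello", "AGENT"),
                new Segment(65250, 70000, "hi", null));
        for (String line : toLines(segments)) {
            System.out.println(line);
        }
    }
}
```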

Code Example

import com.huawei.sis.bean.SisConfig;

import com.huawei.sis.bean.SisConstant;

import com.huawei.sis.bean.request.AsrCustomLongRequest;

import com.huawei.sis.bean.response.AsrCustomLongResponse;

import com.huawei.sis.bean.request.AsrCustomShortRequest;

import com.huawei.sis.bean.response.AsrCustomShortResponse;

import com.huawei.sis.bean.AuthInfo;

import com.huawei.sis.client.AsrCustomizationClient;

import com.huawei.sis.exception.SisException;

import com.huawei.sis.util.IOUtils;

import com.huawei.sis.util.JsonUtils;

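/**
 * Recording file recognition demo.
 */
public class AsrCustomizationDemo {
    private static final int SLEEP_TIME = 500;
    private static final int MAX_POLLING_NUMS = 1000;

    private String ak = "";
    private String sk = "";
    private String region = "";    // Region, for example cn-north-1 or cn-north-4
    // Project ID. Log in to the management console, hover over the username in the upper-right corner,
    // choose My Credentials from the drop-down list, and view the project ID in the project list.
    // With multiple projects, expand "Region" and take the sub-project ID from the "Project ID" column.
    private String projectId = "";

    /**
     * TODO: fill in the audio format and the model property string correctly.
     * 1. The audio format must match the file.
     *    For example, if the OBS URL is xx.wav, the recording file recognition format is auto.
     *    If the audio is PCM with an 8k sample rate, set the format to pcm8k16bit.
     *    If "audio_format is invalid" is returned, the file format is not supported.
     *
     * 2. The audio sample rate must match the sample rate in the property string.
     */
    // URL of the audio file. This field is missing from the source listing and is added so that longDemo() compiles.
    private String obsUrl = "";
    private String obsAudioFormat = ""; // File format, for example auto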

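    private String obsProperty = ""; // Property string, for example chinese_8k_common

    /**
     * Sets the recording file recognition parameters. All of them have default values and may be left unset.
     *
     * @param request recording file recognition request
     */
    private void setLongParameter(AsrCustomLongRequest request) {
        // Whether to add punctuation: yes or no. Default: no.
        request.setAddPunc("yes");
        // Whether to convert spoken numbers to Arabic numerals: yes or no. Default: yes.
        request.setDigitNorm("no");
        // Channel: MONO, LEFT_AGENT, or RIGHT_AGENT. Default: MONO.
        request.setChannel("MONO");
        // Whether analysis is needed. Default: false. Only 8k audio is currently supported.
        // Speaker diarization, emotion detection, speed, and channel take effect only when this is true.
        request.setNeedAnalysis(true);
        // Whether speaker diarization is needed; if so, the result contains role. Default: true.
        request.setDirization(true);
        // Whether emotion detection is needed. Default: true.
        request.setEmotion(true);
        // Whether speed information is needed. Default: true.
        request.setSpeed(true);
        // Callback URL. When set, the transcription result is sent directly to this address.
        // The address must be reachable; IP addresses are not supported.
        // request.setCallbackUrl("");
        // Hotword table ID; omit if not used.
        // request.setVocabularyId("");
    }

    /**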

     * Defines the config. All parameters are optional; it sets timeouts and similar options.
     *
     * @return SisConfig
     */
    private SisConfig getConfig() {
        SisConfig config = new SisConfig();
        // Connection timeout. Default: 10000 ms.
        config.setConnectionTimeout(SisConstant.DEFAULT_CONNECTION_TIMEOUT);
        // Read timeout. Default: 10000 ms.
        config.setReadTimeout(SisConstant.DEFAULT_READ_TIMEOUT);
        // Proxy. Enable this only if the proxy is known to work.
        // A proxy without credentials can also be created with new ProxyHostInfo(host, port).
        // ProxyHostInfo proxy = new ProxyHostInfo(host, port, username, password);
        // config.setProxy(proxy);
        return config;
    }

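    /**
     * Recording file recognition demo.
     */
    private void longDemo() {
        try {
            // 1. Initialize AsrCustomizationClient.
            // Build authInfo from ak, sk, region, and projectId.
            AuthInfo authInfo = new AuthInfo(ak, sk, region, projectId);
            // Build the config, which mainly concerns timeouts.
            SisConfig config = getConfig();
            // Construct AsrCustomizationClient from authInfo and config.
            AsrCustomizationClient asr = new AsrCustomizationClient(authInfo, config);

            // 2. Create the request.
            AsrCustomLongRequest request = new AsrCustomLongRequest(obsUrl, obsAudioFormat, obsProperty);
            // Set the request parameters; all are optional.
            setLongParameter(request);

            // 3. Submit the job and obtain the jobId.
            String jobId = asr.submitJob(request);

            // 4. Poll the jobId until the final result is available.
            int count = 0;
            int successFlag = 0;
            AsrCustomLongResponse response = null;
            while (count < MAX_POLLING_NUMS) {
                System.out.println("Attempt " + count);
                response = asr.getAsrLongResponse(jobId);
                String status = response.getStatus();
                // The source listing is cut off inside this loop. The exit conditions, sleep, and counter
                // update are reconstructed from SLEEP_TIME, count, successFlag, and the documented status values.
                if ("FINISHED".equals(status)) {
                    successFlag = 1;
                    break;
                } else if ("ERROR".equals(status)) {
                    System.out.println("Recognition failed for jobId " + jobId);
                    return;
                }
                try {
                    Thread.sleep(SLEEP_TIME);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                count++;
            }
            if (successFlag == 0) {
                System.out.println("No result after " + count + " attempts");
                return;
            }
            System.out.println(JsonUtils.obj2Str(response, true));
        } catch (SisException e) {
            e.printStackTrace();
            System.out.println("error_code:" + e.getErrorCode() + "\nerror_msg:" + e.getErrorMsg());
        }
    }

    public static void main(String[] args) {
        AsrCustomizationDemo demo = new AsrCustomizationDemo();
        // Recording file recognition
        demo.longDemo();
    }
}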

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import java.util.List;
import lombok.Getter;
import lombok.Setter;
import lombok.ToString;

/**
 * Result of a long audio transcription query.
 */
@Getter
@Setter
@JsonIgnoreProperties(ignoreUnknown = true)
@ToString
public class QueryTranscriptionResp {
    @JsonProperty("job_id")
    private String jobId;
    @JsonProperty("status")
    private String status;
    @JsonProperty("create_time")
    private String createTime;
    @JsonProperty("start_time")
    private String startTime;
    @JsonProperty("finish_time")
    private String finishTime;
    @JsonProperty("segments")
    private List<Segment> segments;
    @JsonProperty("error_code")
    private String errorCode;
    @JsonProperty("error_msg")
    private String errorMsg;

    /**
     * Segments
     */
    @Getter
    @Setter
    @JsonIgnoreProperties(ignoreUnknown = true)
    public static class Segment {
        @JsonProperty("start_time")
        private long startTime;
        @JsonProperty("end_time")
        private long endTime;
        @JsonProperty("result")
        private Result result;
    }

    // @Getter and @Setter are added here because CallbackController calls result.getText().
    @Getter
    @Setter
    @JsonIgnoreProperties(ignoreUnknown = true)
    public static class Result {
        @JsonProperty("text")
        private String text;
        @JsonProperty("score")
        private double score;
        @JsonProperty("analysis_info")
        private AnalysisInfo analysisInfo;
        @JsonProperty("word_info")
        private List<WordInfo> wordInfo;
    }

    /**
     * AnalysisInfo
     */
    @Getter
    @Setter
    @JsonIgnoreProperties(ignoreUnknown = true)
    public static class AnalysisInfo {
        @JsonProperty("role")
        private String role;
        @JsonProperty("emotion")
        private String emotion;
        @JsonProperty("speed")
        private Double speed;
    }

    @Getter
    @Setter
    @JsonIgnoreProperties(ignoreUnknown = true)
    public static class WordInfo {
        @JsonProperty("start_time")
        private Integer startTime;
        @JsonProperty("end_time")
        private Integer endTime;
        @JsonProperty("word")
        private String word;
    }
}

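import java.util.List;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.util.StringUtils; // assumed; the source does not show which StringUtils is used
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController // assumed; the source listing shows no class-level annotation
public class CallbackController {
    @PostMapping("/v1/callback")
    public ResponseEntity<?> callback(@RequestBody QueryTranscriptionResp queryTranscriptionResp) {
        // "ERROR".equals(...) avoids a NullPointerException when status is absent.
        if (!StringUtils.isEmpty(queryTranscriptionResp.getErrorCode())
                || "ERROR".equals(queryTranscriptionResp.getStatus())) {
            System.out.println("receive error resp" + queryTranscriptionResp.toString());
            return new ResponseEntity<>("error resp", HttpStatus.BAD_REQUEST);
        }
        List<QueryTranscriptionResp.Segment> segments = queryTranscriptionResp.getSegments();
        for (QueryTranscriptionResp.Segment segment : segments) {
            QueryTranscriptionResp.Result result = segment.getResult();
            System.out.println("result: " + result.getText());
        }
        return new ResponseEntity<>("", HttpStatus.OK);
    }
}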
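When no callback URL is configured, the result has to be polled, as longDemo does with SLEEP_TIME and MAX_POLLING_NUMS. The bounded polling pattern can be isolated from the SDK calls as in the following minimal sketch. StatusPoller and poll are hypothetical names, not SDK classes, and the Supplier stands in for a call such as asr.getAsrLongResponse(jobId).getStatus().

```java
import java.util.function.Supplier;

/** Hypothetical helper, not part of the SIS SDK: bounded polling on a job status. */
class StatusPoller {

    /**
     * Calls statusSource until it returns FINISHED or ERROR, or until maxAttempts is reached.
     * Returns the last status seen, or null if maxAttempts is not positive.
     */
    static String poll(Supplier<String> statusSource, int maxAttempts, long sleepMillis) {
        String status = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            status = statusSource.get();
            if ("FINISHED".equals(status) || "ERROR".equals(status)) {
                return status;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop polling early.
                Thread.currentThread().interrupt();
                return status;
            }
        }
        return status;
    }

    public static void main(String[] args) {
        // Stand-in status source: WAITING on the first two calls, then FINISHED.
        int[] calls = {0};
        String result = poll(() -> ++calls[0] < 3 ? "WAITING" : "FINISHED", 1000, 10);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

A caller treats any return value other than FINISHED as "no result available": ERROR means recognition failed, and WAITING means the attempt budget ran out first.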
