
Table 3. Experiment of single LSTM vs. bidirectional LSTM (column headers: hidden layers, kernel size, stride, number of kernels)
S-LSTM | 512 | 32 | 5 | (16,1) | 256
B-LSTM | 512 | 2 | 32 | 5 | (16,1) | 256

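We ran the following experiments on the CRNN to select the best model. First, we compared a single (unidirectional) LSTM with a bidirectional LSTM. The table above shows that the bidirectional LSTM achieves a higher PESQ. We attribute this to the bidirectional structure, which gives every point in the output sequence complete past and future context.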
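The context argument can be made concrete with a minimal sketch of a bidirectional recurrent pass. To keep it short, the sketch replaces the LSTM cell with a scalar tanh (Elman-style) cell and uses hypothetical fixed weights. A real bidirectional LSTM uses gated cells with trained parameters for each direction, but the wiring is the same: one pass runs forward in time, one runs backward, and the two states are paired for every frame.

```python
import math

def recurrent_pass(xs, w_in, w_rec):
    """Scalar Elman-style tanh recurrence; stands in for an LSTM cell in this sketch."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def bidirectional(xs, fwd=(0.8, 0.5), bwd=(0.6, 0.4)):
    """Pair each frame's forward state (past context) with its backward state (future context)."""
    forward = recurrent_pass(xs, *fwd)
    backward = recurrent_pass(xs[::-1], *bwd)[::-1]  # run on reversed input, then re-align
    return list(zip(forward, backward))

frames = [0.1, -0.4, 0.3, 0.9]          # hypothetical per-frame features
altered = frames[:-1] + [-0.9]          # change only the last frame
out, out_alt = bidirectional(frames), bidirectional(altered)
print(out[0][0] == out_alt[0][0])  # True: the forward state at t=0 cannot see the future
print(out[0][1] == out_alt[0][1])  # False: the backward state at t=0 depends on the last frame
```

A unidirectional model keeps only the forward half, so its output at frame 0 ignores everything that follows.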

Table 4. Experiment of different kernel size (column headers: hidden layers, kernel size, stride, number of kernels)
1-conv | 512 | 2 | 32 | 1 | (16,1) | 256

Table 5. Experiment of different number of kernels (column headers: hidden layers, kernel size, stride, number of kernels)
128 | 512 | 2 | 32 | 5 | (16,1) | 128


Two-layer architecture.

Table 0-1. Experiment of different number of hidden layers (column headers: hidden layers, kernel size, stride, number of kernels)
1L | 512 | 2 | 32 | 5 | (16,1) | 256

Table 0-2. Experiment of different units of hidden layers (column headers: hidden layers, kernel size, stride, number of kernels)
512-1 | 512 | 2 | 5 | 32 | (1,16) | 256

Table 0-3 Speech enhancement system PESQ experimental result


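To assess how practical our system is, we tested it in real-world settings. In each real-world environment, five participants each tested 20 sentences. We then measured speech recognition performance after speech enhancement, using the word error rate (WER) as the metric.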

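The results show that the system improves speech recognition accuracy. Specifically, the CRNN raises perceptual quality, as measured by PESQ, by 0.83 without lowering recognition accuracy. This is notable because the speech recognition (SSR) system we used is Google's, and we could not fine-tune it in our experiments. By comparison, all the other methods also improve PESQ, but they generally reduce recognition accuracy.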

Table 0-4 Speech enhancement system WER experimental result
Method | Wrong words / Total words (A) | Wrong words / Total words (B) | Wrong words / Total words (C) | WER
SS [15] | 153/775 | 170/775 | 225/775 | 23.57%
DNN | 156/775 | 177/775 | 196/775 | 22.75%
RNN [38] | 145/775 | 153/775 | 174/775 | 20.3%
CRNN | 102/775 | 130/775 | 136/775 | 15.83%
Noisy | 116/775 | 125/775 | 144/775 | 16.56%
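The WER column matches pooling the three test sets A, B and C (775 words each). For CRNN, for example, (102 + 130 + 136) / (3 × 775) = 368 / 2325 ≈ 15.83%. The short sketch below recomputes the column from the per-set counts. It also computes the 0.73-point gap between the Noisy and CRNN rows, assuming that the Noisy row is recognition on the unenhanced input. It only checks the table's arithmetic and does not reproduce the experiment.

```python
# Wrong-word counts for test sets A, B and C (775 words each), copied from Table 0-4.
RESULTS = {
    "SS": (153, 170, 225),
    "DNN": (156, 177, 196),
    "RNN": (145, 153, 174),
    "CRNN": (102, 130, 136),
    "Noisy": (116, 125, 144),
}
WORDS_PER_SET = 775

def pooled_wer(wrong_counts, words_per_set=WORDS_PER_SET):
    """WER (%) pooled over all sets: total wrong words / total words."""
    return 100.0 * sum(wrong_counts) / (words_per_set * len(wrong_counts))

for name, counts in RESULTS.items():
    print(f"{name:6s} {pooled_wer(counts):.2f}%")

# Absolute WER reduction from the (assumed unenhanced) Noisy row to the CRNN row.
gap = pooled_wer(RESULTS["Noisy"]) - pooled_wer(RESULTS["CRNN"])
print(f"{gap:.2f} percentage points")
```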

Examples of improved speech recognition results are shown in Table 0-5 and Table 0-6.

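Table 0-5 The example of improved speech recognition results-1
Sentence type | Recognition result
Clean sentence | 今天的特餐是什麼? ("What is today's special?")
Noisy sentence | 今天的特產是什麼? (特餐 "special" misrecognized as 特產 "local specialty")
Enhanced sentence | 今天的特餐是什麼?

Table 0-6 The example of improved speech recognition results-2
Sentence type | Recognition result
Clean sentence | 飲料續杯需要收費嗎? ("Is there a charge for drink refills?")
Noisy sentence | 你要去杯需要收費嗎? (飲料續 "drink refill" misrecognized as 你要去 "you want to go")
Enhanced sentence | 續杯需要收費嗎? ("Is there a charge for refills?"; 飲料 "drink" is dropped)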
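The errors in these examples can be counted with an edit distance. The sketch below counts character-level substitutions, deletions and insertions (Levenshtein distance), a common choice for Chinese text that is not space-delimited. This is an illustrative assumption, not necessarily how the counts in Table 0-4 were produced. In the first example the noisy result has one substitution (特產 for 特餐). In the second, the noisy result has three substitutions and the enhanced result has two deletions (飲料).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions + deletions + insertions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion: reference char missing
                           cur[j - 1] + 1,           # insertion: extra hypothesis char
                           prev[j - 1] + (r != h)))  # substitution, or match at cost 0
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    """Character error rate: edit distance divided by the reference length."""
    return edit_distance(ref, hyp) / len(ref)

examples = [
    ("今天的特餐是什麼?", "今天的特產是什麼?", "今天的特餐是什麼?"),
    ("飲料續杯需要收費嗎?", "你要去杯需要收費嗎?", "續杯需要收費嗎?"),
]
for clean, noisy, enhanced in examples:
    print(edit_distance(clean, noisy), edit_distance(clean, enhanced))
```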


V. Conclusion

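In the first year of the project, we proposed a diagnostic reasoning module. It uses the TF-IDF algorithm to learn weights for the diseases and symptoms in our corpus of medical products and related diseases. Once trained, these weights indicate how important each symptom and disease is. We also proposed a formula that computes a score for each disease, indicating which disease the user is likely to have, and from these scores we obtain the most probable disease. In the medical product selection module, we search our medical product database using the information collected by the preceding module and select the most suitable product for the user. In the experiments, the diagnostic reasoning module achieved an accuracy of 86%, and the speech recognition module for noisy environments was also completed.

In the second year, the project applied deep learning models to speech enhancement algorithms. We added a recurrent neural network architecture to improve the model's ability to extract features and to model temporal dependencies. In the speech enhancement experiments, our model improved the PESQ score by 0.83, showing that it maintains a degree of noise suppression under different noise conditions. In the speech recognition experiments, our model improved speech recognition accuracy by 0.73 percentage points (WER from 16.56% to 15.83%). This shows that the model also suppresses noise in real-world settings and improves speech quality.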
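The first-year diagnostic reasoning module is described only in outline. As a minimal sketch, TF-IDF weights for symptoms per disease can be computed as below. The toy corpus, the term-frequency normalization and the scoring rule are hypothetical stand-ins, because the report does not give its scoring formula; here a disease's score is simply the sum of its weights over the reported symptoms.

```python
import math
from collections import Counter

# Hypothetical toy corpus: symptom terms mentioned in texts about each disease.
CORPUS = {
    "common_cold": ["cough", "sneezing", "sore_throat", "cough"],
    "influenza": ["fever", "cough", "muscle_ache", "fever"],
    "migraine": ["headache", "nausea", "headache"],
}

def tfidf_weights(corpus):
    """weight[d][t] = (count of t in d / length of d) * log(N / number of diseases mentioning t)."""
    n_docs = len(corpus)
    df = Counter(t for terms in corpus.values() for t in set(terms))
    return {
        disease: {t: (c / len(terms)) * math.log(n_docs / df[t])
                  for t, c in Counter(terms).items()}
        for disease, terms in corpus.items()
    }

def score(weights, reported_symptoms):
    """Hypothetical scoring rule: sum a disease's weights over the reported symptoms."""
    return {d: sum(w.get(s, 0.0) for s in reported_symptoms) for d, w in weights.items()}

w = tfidf_weights(CORPUS)
scores = score(w, ["fever", "cough"])
print(max(scores, key=scores.get))  # influenza
```

"cough" appears under two diseases, so its IDF is lower than that of "fever". This is why the rarer, more specific symptom dominates the ranking.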

VI. References

[1] S. J. Young, "Probabilistic methods in spoken-dialogue systems," Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 358, no. 1769, pp. 1389-1402, 2000.

[2] M. Tu and X. Zhang, "Speech enhancement based on Deep Neural Networks with skip connections," in Proc. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5565-5569.

[3] S. Pascual, A. Bonafonte, and J. Serrà, "SEGAN: Speech Enhancement Generative Adversarial Network," arXiv preprint arXiv:1703.09452, 2017.

[4] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.

[5] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.


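Appendix 1

Report on Attending an International Academic Conference (MOST-Funded Research Project)

Date: December 10, 2019 (ROC year 108)

Project number: MOST 107-2221-E-006 -216 -MY2
Project title: Study on Key Technologies for Friendly Multi-Party Dialogue of Artificial Intelligence Robots in Noisy Environments (2/2) (人工智慧機器人吵雜環境多人友善對話關鍵技術之研究)
Traveler: 王駿發 (Jhing-Fa Wang)
Affiliation and title: Professor, Department of Electrical Engineering, National Cheng Kung University
Conference dates: November 14 to November 17, 2019
Conference location: Changzhou, Jiangsu, China
Conference name: The 2019 IEEE International Conference on Orange Technologies (ICOT 2019) (2019 橘色科技國際會議)
Paper presented: Hybrid Layers of Chinese Machine Reading Comprehension for Delicacy Food Blog (基於美食部落格文本之混合階層中文閱讀理解)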

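I. Conference Attendance

Nov. 12, 2019 (Tue.): Traveled from Tainan to Taoyuan International Airport and flew to Wuxi.

Nov. 14 (Thu.) to Nov. 17 (Sun.), 2019: Attended The 2019 IEEE International Conference on Orange Technologies (ICOT 2019) and presented the paper "Hybrid Layers of Chinese Machine Reading Comprehension for Delicacy Food Blog". Discussed technologies and R&D trends related to the research project with experts and scholars from various countries, and collected materials related to the project.

Nov. 19, 2019 (Tue.): Flew from Shanghai in the morning and arrived at Kaohsiung International Airport, Taiwan.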


The conference covered the following topics:

➢ Health Technology
--Artificial Intelligence on Healthcare
--Biomedical Informatics
--Information Technology in Biomedicine
--Medical Imaging Processing
--Affordable and Adaptive Healthcare IOT
--Intelligent Health Instrumentation and Robotics
--Networking and Security for Health/Medical Care
--Intelligent Health Multimedia Information Processing

➢ Happiness Technology and Index
--Affective Computing for Happiness Detection
--Internet of Things for Smart Living
--Smart Manufacturing for GNH
--Healthcare Service Oriented Computing
--Industry IOT for GNH
--Natural Language Processing for Happiness
--Happiness Detection from Psychological/Physiological Bio-Signals
--System Design for Happiness Promotion

➢ Warming Care Technology
--Human-Machine Interface for Senior and Children Care
--Cloud Health and Mental Care Services
--Assistive Technology and Senior Companion Robot
--Multimedia Information Processing for Healthcare
--Big Data application on Health/Medical Care
--Care Service Oriented Computing
--Friendly and Affordable Human-Computer Interaction


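II. Reflections on the Conference

1. Discussed techniques for detecting sleep quality in the elderly, and how to build such a system, with Prof. Yanchun Zhang of Victoria University, Australia.

2. Discussed the latest trends in computer speech with Prof. Lei Xie (謝磊) of Northwestern Polytechnical University, Xi'an, mainland China. Topics included customized TTS, multi-party spoken dialogue systems, and multimodal dialogue systems that integrate images and speech.

3. Discussed orange-technology systems for ecological garbage treatment and management with Dr. Emil Kaburuan of Bina Nusantara University, Indonesia.

4. Discussed the latest developments and application trends in big data and blockchain with Prof. Jiannong Cao (曹建農) of The Hong Kong Polytechnic University.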

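III. Full text or abstract of the presented paper: see Appendix 1.

IV. Recommendations: None.

V. Materials brought back

ICOT 2019 conference materials

VI. Others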


Hybrid Layers of Chinese Machine Reading Comprehension for Delicacy Food Blog

Ta-Wen Kuan3*, Bo-Hao Su1, Yuan-Ta Hsu1, Jhing-Fa Wang1,2, Tzong-Song Wang4

1 Department of Electrical Engineering, National Cheng Kung University

2 Department of Information Engineering, Tajen University

3 School of AI, Guangdong & Taiwan, Foshan University, Guangdong

4 Graduate Institute of Culture and Creative Industries

E-mail: [email protected]

Abstract— In this paper, a Strong Attention Architecture
with Hybrid Vectors framework is proposed for a Machine Reading Comprehension (MRC) system to solve the MRC problem in Chinese effectively. The system consists of five layers: 1) a hybrid embedding layer, 2) an encoding layer, 3) a stronger attention layer, 4) an output layer, and 5) a generation layer. This work focuses on the hybrid embedding layer and the stronger attention layer. Character-embedding and word-embedding models convert the words of the article and the question into character vectors and word vectors, respectively. The stronger attention architecture then weights the words in the article that are highly relevant to the question. The Delta Reading Comprehension Dataset (DRCD) and a Tainan delicacy corpus are used for a series of experiments. In the experimental results, the proposed work achieves an Exact Match (EM) of 70.43% and an F1-score of 72.61%. Overall, it outperforms two related works, although it remains below human performance.

Index Terms — Word embedding model; Character embedding model; pointer network model; long short-term memory model; stronger attention architecture; Chinese machine reading comprehension system

I. INTRODUCTION

Reading comprehension is a fundamental human skill, learned systematically from elementary school onward by reading articles and answering questions about their content. Answering such questions involves summarizing, asserting, inferring and refining the evidence before finally expressing the answer in words. However, inferring the writer's intention from an article is challenging. This motivates us to investigate reading comprehension for answering questions from a given document.

Benchmark datasets are critical to the usability of MRC and fall broadly into four categories: 1) multiple choice, such as MCTest [1]; 2) cloze, such as CNN/Daily Mail [2]; 3) extraction, such as SQuAD [3]; and 4) abstraction, such as MS MARCO [4]. For MRC in English, Wang et al. [5] used semantics through match-LSTM and Pointer Net to predict the position of the answer in the article. Seo et al. [6] examined the interaction information between questions and paragraphs through a bidirectional attention mechanism. Xiong et al. [7] built a network model with a dynamic co-attention flow to iteratively predict the answer span.

Fig. 1. System architecture

In this study, a Strong Attention Architecture with Hybrid Vectors is proposed for Chinese MRC; the corresponding system framework is shown in Fig. 1. It addresses the challenges of Chinese MRC in three respects. First, unlike English, Chinese does not use spaces to separate words. Second, related work on Chinese MRC generally represents sentences with word vectors, which leads to out-of-vocabulary (OOV) problems and training-efficiency issues. Third, when the article is long, extracting the words related to the question becomes difficult, making it hard to find the appropriate answer.

The remainder of this paper is organized as follows. Section II introduces the proposed system, Section III presents the experimental results, and Section IV concludes the work.

II. PROPOSED SYSTEM

A. Framework Overview

Fig. 1 shows the proposed framework. It is composed of five layers: 1) a hybrid embedding layer, 2) an encoding layer, 3) a stronger attention layer, 4) an output layer, and 5) a generation layer, and it represents characters and words as vectors. The hybrid embedding layer maps each character and word to a vector space using pre-trained character and word embedding models. The encoding layer then uses the surrounding word and character vectors to produce contextual information. These first two layers are applied to both the question and the article. Next, the stronger attention layer merges the question and article vectors to generate question-aware feature vectors for each word in the article, and it uses a Long Short-Term Memory (LSTM) [8] to scan the article.

The output layer then provides an answer index for the question.

Finally, the generation layer extracts the answer from the article using the answer index.
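The output and generation layers are described only at this level. One common way to implement them with a pointer network is assumed here; the paper does not specify this decoding rule. The network predicts start and end probabilities for every article token. The decoder then picks the span (i, j) with i ≤ j that maximizes p_start(i) · p_end(j), up to a maximum length (30 is an arbitrary choice), and copies that span out of the article. The tokens and probabilities below are made up.

```python
def best_span(p_start, p_end, max_len=30):
    """Return (i, j) maximizing p_start[i] * p_end[j] with i <= j < i + max_len."""
    best, best_score = (0, 0), -1.0
    for i, ps in enumerate(p_start):
        for j in range(i, min(i + max_len, len(p_end))):
            if ps * p_end[j] > best_score:
                best, best_score = (i, j), ps * p_end[j]
    return best

def generate_answer(tokens, p_start, p_end, max_len=30):
    """Generation step: copy the selected span out of the article tokens."""
    i, j = best_span(p_start, p_end, max_len)
    return "".join(tokens[i:j + 1])

# Hypothetical article tokens and pointer-network output probabilities.
tokens = ["台南", "的", "牛肉湯", "很", "有名"]
p_start = [0.1, 0.05, 0.7, 0.1, 0.05]
p_end = [0.05, 0.05, 0.6, 0.1, 0.2]
print(generate_answer(tokens, p_start, p_end))  # 牛肉湯
```

Scoring start and end jointly, rather than taking two independent argmaxes, guarantees that the end never precedes the start.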

B. Hybrid Embedding Layer

1) Layer Overview

The flow diagram of the hybrid embedding layer is shown in Fig. 2. It consists of five parts: 1) input question and article, 2) Jieba segmentation, 3) word embedding and character embedding, 4) word vectorization and character vectorization, and 5) concatenation. The question and the article are first segmented separately with Jieba, which uses a user dictionary (User_dict) to add custom nouns for the target domain; an example is shown in Table I. The segmented text is then split into words and characters for the word-embedding and character-embedding models, whose language vector spaces are trained on a Wikipedia dataset. Word vectorization and character vectorization then map the words and characters into this vector space, so that the machine can operate on every converted word and character. Finally, the concatenation part merges the word and character vectors of the article into a hybrid vector. The question is processed with the same steps as the article.
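The concatenation step can be sketched as follows, under several assumptions. Tiny hypothetical lookup tables stand in for the 300-d word vectors and 200-d character vectors. An out-of-vocabulary word gets a zero word vector. A word's character part is the mean of its character vectors, because the paper does not say how character vectors are pooled per word. In the real pipeline, the token list would come from Jieba segmentation.

```python
WORD_DIM, CHAR_DIM = 3, 2  # toy sizes; the paper uses 300 (words) and 200 (characters)

# Hypothetical lookup tables standing in for trained word2vec / char2vec models.
WORD_VECS = {"很": [0.1, 0.2, 0.3], "好吃": [0.4, 0.5, 0.6]}
CHAR_VECS = {"台": [1.0, 0.0], "灣": [0.0, 1.0], "很": [0.5, 0.5],
             "好": [0.2, 0.4], "吃": [0.6, 0.0]}

def mean(vectors, dim):
    """Element-wise mean; a zero vector if there is nothing to average."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

def hybrid_vector(word):
    """Word vector (zeros if out of vocabulary) concatenated with the mean character vector."""
    word_part = WORD_VECS.get(word, [0.0] * WORD_DIM)
    char_part = mean([CHAR_VECS[ch] for ch in word if ch in CHAR_VECS], CHAR_DIM)
    return word_part + char_part

tokens = ["台灣", "很", "好吃"]  # e.g. the output of Jieba segmentation
hybrid = [hybrid_vector(t) for t in tokens]
print(hybrid[0])  # [0.0, 0.0, 0.0, 0.5, 0.5]: "台灣" is OOV for the word table
```

The character half still carries information for "台灣" even though the word table has no entry for it. This is the OOV mitigation that the hybrid vector is meant to provide.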

2) Embedding and Vectorization

Word embedding [9] maps words into a vector space. It gives a denser representation than one-hot encoding and can be learned with a variety of language models; example vectors are shown in Table II. Word embeddings capture hidden relationships between words. For example, vector("Spain") - vector("Madrid") is similar to vector("Italy") - vector("Rome"), reflecting the country-capital relationship. Because word embedding represents a word as a real-valued (non-binary) vector, the resulting representation is lower-dimensional and denser. Intuitively, a good word embedding places similar words close together and captures hidden semantic relationships. It can learn these from context because similar words tend to appear in similar contexts.
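The Spain/Madrid example can be checked with cosine similarity. The 2-D vectors below are hypothetical and hand-built, with one axis for "which country" and one for "is a capital". Trained embeddings are 300-d and satisfy such relations only approximately.

```python
import math

# Hypothetical hand-built vectors: axis 0 ~ "which country", axis 1 ~ "is a capital".
VECS = {
    "Spain": [1.0, 0.0], "Madrid": [1.0, 1.0],
    "Italy": [2.0, 0.0], "Rome": [2.0, 1.0],
    "France": [3.0, 0.0], "Paris": [3.0, 1.0],
}

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# vector("Spain") - vector("Madrid") and vector("Italy") - vector("Rome") point the same way.
print(cosine(sub(VECS["Spain"], VECS["Madrid"]), sub(VECS["Italy"], VECS["Rome"])))

def analogy(a, b, c):
    """Return the word closest to vector(b) - vector(a) + vector(c), excluding a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(VECS[a], VECS[b], VECS[c])]
    candidates = (w for w in VECS if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(VECS[w], target))

print(analogy("Spain", "Madrid", "Italy"))  # Rome
```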

For word2vec, the Chinese Wikipedia dump, containing about 3.7 million articles, is downloaded and converted into text format for training. The gensim implementation of word2vec (an algorithm originally released by Google) is used to train on these data, with a dimension of 300 for the word space. The Continuous Bag-of-Words (CBOW) model is chosen for its lower computational complexity, O(V), compared with O(5V) for Skip-gram, where 5 is the window size. Sentences are fed to the model with a five-word window, after which the vector of each word is obtained; an example is shown in Table II. For char2vec, the same Wikipedia files are used as training data. The char2vec model is trained with gensim using a dimension of 200 for the character space, and CBOW is again used, as in the word2vec case. Sentences are fed with a five-character window, and the vector of each character is obtained after training. An example of the trained character vectors is shown in Table III.

Fig. 2. The flow diagram of Hybrid Embedding Layer

TABLE I. SEGMENTATION WITH JIEBA
Example 1

TABLE II. THE RESULT OF WORD EMBEDDING AND VECTORIZATION (size of dimension = 300)
Word | Vector representation of word
台灣大雞排 | [-0.19855534, …, -0.0057442924]
很 | [-0.02085873, …, -0.50289285]
好吃 | [-0.1597478, …, -0.22335216]

TABLE III. THE RESULT OF CHARACTER EMBEDDING AND VECTORIZATION (size of dimension = 200)
Character | Vector representation of character
台 | [0.028282, …, 0.009892]
灣 | [-0.334899, …, 0.060722]
好 | [0.443514, …, 0.201379]
吃 | [0.402922, …, 0.354191]
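A CBOW model predicts each token from its surrounding context window. The sketch below shows only how (context, target) training pairs are formed for word-level and character-level training. The vectors themselves would be learned by a library. In gensim 4, for example, that call is `Word2Vec(sentences, vector_size=300, window=5, sg=0)`, where `sg=0` selects CBOW and `window` is the maximum distance on each side. The toy sentence reuses the Table II words, and `window=1` is chosen only to keep the output short.

```python
def cbow_pairs(tokens, window):
    """(context, target) pairs: up to `window` tokens on each side of the target."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

words = ["台灣大雞排", "很", "好吃"]   # word level, after Jieba segmentation
chars = list("台灣大雞排很好吃")        # character level, for the character vectors
for context, target in cbow_pairs(words, window=1):
    print(target, "<-", context)
print(cbow_pairs(chars, window=2)[2])  # (['台', '灣', '雞', '排'], '大')
```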


C. Stronger Attention Layer

1) Framework Overview

The stronger attention layer has two parts, question-article attention and self-attention, as shown in Fig. 3. First, question-article attention captures the words in the article that relate to the question. Second, the self-attention architecture emphasizes the weights of these words according to their relevance. This sharply separates related from unrelated words and helps the subsequent decoder find a more appropriate answer.

Fig. 3. The flow of Stronger Attention Layer

2) Question-Article Attention

Question-article attention scores the degree of relevance between the article and the question. Not every word is useful for the answer, so each article word is weighted by its relevance to the question: the higher the relevance, the higher the weight.

The encoded article and question are denoted as A and Q, respectively. The similarity between them is computed with a trainable scalar function, which gives the similarity matrix $S \in \mathbb{R}^{I \times J}$:

$$ S_{ij} = f(A_{:i}, Q_{:j}) \qquad (1) $$

where $A_{:i}$ is the i-th article word, $Q_{:j}$ is the j-th question word, and f is the trainable scalar function. The question-article attention is then calculated as

$$ C_A = \mathrm{softmax}(S^{\top})\, A^{\top} \qquad (2) $$

3) Self-Attention Layer

The attention model addresses the poor performance of MRC on long sentences. Instead of creating a single context vector from the final hidden state of the input, it creates a context vector for each input word. For example, an input sentence of N words yields N context vectors, which benefits decoding. Note that this differs from the standard encoder-decoder method shown in Fig. 4: here the probability of each target word $y_i$ is conditioned on its own context vector $c_i$, as shown in Fig. 5.

Fig. 4. The original architecture of auto-encoder

Fig. 5. The architecture of self-attention

TABLE IV. EXAMPLE OF DRCD DATASET
Type | Content

The context vector $c_i$ depends on a sequence of annotations $(h_1, \ldots, h_{T_x})$ to which an encoder maps the input sentence. Each annotation $h_i$ contains information about the whole input sequence, with a strong focus on the parts surrounding the i-th word. The context vector $c_i$ is then computed as a weighted sum of these annotations:

$$ c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j \qquad (3) $$

The weight $\alpha_{ij}$ of each annotation $h_j$ is computed as

$$ \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})} \qquad (4) $$

where

$$ e_{ij} = \beta(s_{i-1}, h_j) \qquad (5) $$

is an attention model that scores how well the inputs around position j and the output at position i match. The score is based on the RNN hidden state $s_{i-1}$ and the j-th annotation $h_j$ of the input sentence, and β is the learning function.
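Equations (1) to (5) share one pattern: score, normalize with a softmax, then take a weighted sum. A minimal pure-Python sketch of both attention steps follows. As a simplifying assumption, the trainable scalar function f in (1) and the learning function β in (5) are both replaced by a plain dot product, whereas in the paper they are trained. The vectors are toy values.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weighted_sum(weights, vectors):
    """Element-wise sum of weights[k] * vectors[k]."""
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(len(vectors[0]))]

def question_article_attention(article, question, f=dot):
    """Eqs. (1)-(2): S[i][j] = f(article word i, question word j); row j of
    softmax(S^T) weights the article vectors (the rows of A^T)."""
    S = [[f(a, q) for q in question] for a in article]
    return [weighted_sum(softmax([S[i][j] for i in range(len(article))]), article)
            for j in range(len(question))]

def context_vector(s_prev, annotations, beta=dot):
    """Eqs. (3)-(5): e_j = beta(s_prev, h_j), alpha = softmax(e), c = sum_j alpha_j h_j."""
    alpha = softmax([beta(s_prev, h) for h in annotations])
    return weighted_sum(alpha, annotations), alpha

article = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three toy article word vectors
question = [[1.0, 0.0]]                           # one toy question word vector
print(question_article_attention(article, question))
c, alpha = context_vector([0.0, 0.0], article)
print(alpha)  # every score is 0, so each annotation gets weight 1/3
```

The article words that share the question's direction (the first and third) receive the larger attention weights. That is the weighting effect described for question-article attention.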

III. EXPERIMENTAL RESULTS

TABLE V. THE EVALUATION RESULTS OF DIFFERENT MODELS ON DRCD

1) Dataset

Two datasets are used to evaluate the proposed work: the Delta Reading Comprehension Dataset (DRCD) [10] and the Tainan delicacy corpus. DRCD is used to train the proposed model and to check its accuracy. We observe that after training on the Tainan delicacy food dataset, the proposed work achieves more accurate MRC performance in the Tainan cuisine domain than with DRCD. DRCD is a machine comprehension dataset based on a set of Chinese Wikipedia articles, with more than 30,000 questions and 10,000 paragraphs. The answer to each question is always a span of the context, and a model receives credit if its answer matches one of the human-written answers. An example is shown in Table IV.
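Because every DRCD answer is a span of its context that is scored against human-written answers, records can be checked and scored mechanically. The minimal sketch below assumes a SQuAD-style JSON layout (`context`, `qas`, `answers`, `answer_start`); the record itself is invented, not taken from DRCD. It also implements the EM and character-level F1 criteria defined in the next subsection. The normalization follows the text: punctuation and the articles a/an/the are ignored, and non-Chinese words stay whole while Chinese text is split into characters.

```python
import json
import re
import string
from collections import Counter

# Invented record in an assumed SQuAD-style layout.
RECORD = json.loads("""
{"data": [{"paragraphs": [{
  "context": "台南的牛肉湯以現宰溫體牛肉聞名。",
  "qas": [{"id": "q1",
           "question": "台南的牛肉湯以什麼聞名?",
           "answers": [{"text": "現宰溫體牛肉", "answer_start": 7}]}]}]}]}
""")

# ASCII punctuation plus some common full-width Chinese punctuation.
PUNCT = set(string.punctuation) | set("，。、！？；：「」（）")

def normalize(text):
    """Drop punctuation and the English articles a/an/the."""
    text = "".join(ch for ch in text if ch not in PUNCT)
    return re.sub(r"\b(a|an|the)\b", " ", text, flags=re.IGNORECASE)

def tokenize(text):
    """One token per Chinese character; runs of ASCII letters/digits stay whole."""
    return re.findall(r"[A-Za-z0-9]+|\S", normalize(text))

def exact_match(prediction, answers):
    return any(tokenize(prediction) == tokenize(a) for a in answers)

def f1_score(prediction, answers):
    """Maximum character-level F1 over all ground-truth answers, as in (7)."""
    pred, best = tokenize(prediction), 0.0
    for a in answers:
        gold = tokenize(a)
        overlap = sum((Counter(pred) & Counter(gold)).values())
        if overlap:
            p, r = overlap / len(pred), overlap / len(gold)
            best = max(best, 2 * p * r / (p + r))
    return best

def questions(dataset):
    for article in dataset["data"]:
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                yield para["context"], qa

def check_spans(dataset):
    """Each answer must be the substring of its context starting at answer_start."""
    return all(ctx[a["answer_start"]:a["answer_start"] + len(a["text"])] == a["text"]
               for ctx, qa in questions(dataset) for a in qa["answers"])

def evaluate(dataset, predictions):
    """Corpus EM as in (6), 100% x correct / test, with F1 averaged the same way."""
    em = f1 = total = 0
    for _, qa in questions(dataset):
        golds = [a["text"] for a in qa["answers"]]
        pred = predictions.get(qa["id"], "")
        em += exact_match(pred, golds)
        f1 += f1_score(pred, golds)
        total += 1
    return 100.0 * em / total, 100.0 * f1 / total

print(check_spans(RECORD))                        # True
print(evaluate(RECORD, {"q1": "現宰溫體牛肉"}))  # (100.0, 100.0)
```

A partial answer such as "溫體牛肉" scores 0 on EM but still earns partial F1 credit (4 of 6 reference characters recovered). This is why the two criteria are reported together.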

2) Model Evaluation

Two criteria are used to examine the performance of the proposed model: Exact Match (EM) and the F1 score. Both ignore punctuation and the English articles a, an, and the. EM measures the percentage of predictions that exactly match any one of the ground-truth answers, as in (6):

$$ \text{Accuracy} = \frac{\text{correct}}{\text{test}} \times 100\% \qquad (6) $$

where correct is the number of exactly matched predictions and test is the number of test questions.

The F1 score (F1) measures character-level fuzzy matching between the prediction and the ground truth, that is, the overlap between the predicted answer and the reference answer. We take the maximum F1 over all ground-truth answers for a given question, as in (7):

$$ F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (7) $$

Note that non-Chinese words are not segmented.

3) Experimental Results

In total, 3,485 questions and 1,000 articles are collected into the dataset and evaluated with the EM and F1 criteria. Table V compares the models of two related works (ID1 and ID2) on DRCD with the proposed model (ID3) and with human participants (ID4). The model marked "SRC" [11] is fine-tuned on DRCD data only, matching the BERTserini setting proposed by Yang et al. [11]. The model marked "SRC + DS" [11] instead fine-tunes BERT on all the data grouped together.

SRC + DS outperforms the original SRC model in both EM and F1. However, the proposed model achieves the best performance of the three models, while remaining below human performance. Its EM and F1 scores improve by about 15.03% and 4.91% compared with the SRC and SRC + DS models, respectively.

IV. CONCLUSION

This paper proposed a machine reading comprehension framework for Tainan gastronomy based on hybrid vectors and a stronger attention architecture, which applies semantic understanding and question answering in Chinese. The complexity of the Chinese language leaves datasets incomplete, which affects semantic understanding. The proposed framework addresses this problem in three respects. First, the word-embedding model converts most known words into vectors, while character vectors from the character-embedding model represent unknown words, alleviating the common out-of-vocabulary problem in Chinese. Second, the stronger attention architecture is used to understand articles and questions and to find the words in the article that are highly relevant to the question. Third, a pointer network model predicts the answer to the question from the article. The DRCD dataset is used for training, and the Tainan delicacy food corpus is then used to predict appropriate answers to the questions. In the experimental results, EM reaches 70.43% and the F1-score reaches 72.61%. Overall, the proposed work outperforms the two related works, although it remains below human performance.

REFERENCES

[1] M. Richardson, C. J. Burges, and E. Renshaw, "MCTest: A challenge dataset for the open-domain machine comprehension of text," in Proc. Conf. Empirical Methods Natural Lang. Process., 2013, pp. 193-203.

[2] K. M. Hermann et al., "Teaching machines to read and comprehend," in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 1693-1701.

[3] P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang. (2016). "SQuAD: 100,000+ questions for machine comprehension of text." [Online]. Available: https://arxiv.org/abs/1606.05250.

[4] T. Nguyen, M. Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, and L. Deng. (2016). "MS MARCO: A human generated machine reading comprehension dataset." [Online]. Available: https://arxiv.org/abs/1611.09268.

[5] S. Wang and J. Jiang. (2016). "Machine comprehension using match-LSTM and answer pointer." [Online]. Available: https://arxiv.org/abs/1608.07905.

[6] M. Seo, A. Kembhavi, A. Farhadi, and H. Hajishirzi. (2016). "Bidirectional attention flow for machine comprehension." [Online]. Available: https://arxiv.org/abs/1611.01603.

[7] C. Xiong, V. Zhong, and R. Socher. (2016). "Dynamic coattention networks for question answering." [Online]. Available: https://arxiv.org/abs/1611.01604.

[8] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.

[9] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, 2013, pp. 3111-3119.
