
A Study on Semi-automatic Voice Dubbing System






A Study on Semi-automatic Voice Dubbing System (NSC 88-2213-E-011-035)


E-mail: root@guhy.ee.ntust.edu.tw

Keywords: voice dubbing, timbre translation, vocal tract, speech synthesis



Abstract

Voice dubbing, or timbre translation, refers to processing that can translate a single voice timbre into many distinct timbres. In drama works, different actors are usually dubbed with different timbres, and many dubbing persons are therefore required. To reduce the cost of dubbing the actors, it would be useful if the computer could help convert a single voice timbre into many distinct timbres. Therefore, we intend to develop a semi-automatic voice-dubbing system. It is called semi-automatic because the emotions (such as anger, sadness, and happiness) perceived from the translated voice still need to be controlled by the person who provides the original voice. In this project, a method for timbre translation is proposed. The goal is accomplished by providing independent control of fundamental frequency, vocal-tract length, voice source, and internal ratio of the vocal tract. Among the four factors, the internal ratio of the vocal tract is newly studied here. In addition, an on-line operable system is built with this method; it can be used for real-time voice dubbing. Also, according to our perception tests, it can indeed translate a single voice timbre into many distinct timbres.

For pitch detection, three peak values max1, max2, and max3 are first taken from the frame, and the clipping level is computed either as Clip = (max1 + max2 + max3) * 0.5 or as Clip = min(max1, max2, max3) * 0.6, depending on the case; the values Y[1] .. Y[45] are then compared against this clipping level.
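The clipping-level rule above can be sketched in Python. The 0.5 and 0.6 combining formulas are the ones recovered from the text; splitting the frame into three equal segments to obtain max1, max2, and max3 is an assumption, since the report's surrounding prose is not recoverable.

```python
def clip_threshold(frame, conservative=False):
    """Clipping level for pitch detection.

    The frame is split into three equal segments and max1..max3 are
    the per-segment peak magnitudes (assumed); the combining rules
    Clip = (max1+max2+max3)*0.5 and Clip = min(max1,max2,max3)*0.6
    are the formulas recovered from the report.
    """
    n = len(frame) // 3
    maxes = [max(abs(x) for x in frame[i * n:(i + 1) * n]) for i in range(3)]
    if conservative:
        return min(maxes) * 0.6
    return sum(maxes) * 0.5
```

A lower, `min`-based level keeps more candidate peaks after clipping; the `sum`-based level is stricter.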

(vocal tract)



The input signal is read into a buffer (buffer) of 500 * sampling_rate / 11,025 samples, and successive buffers overlap by 50%. A magnitude measure accumulated over each buffer is compared with the values 256,000 and 60 to decide how the buffer is to be processed. For pitch detection, the peaks among Y[1] .. Y[45] whose heights exceed the clipping level Clip are collected as candidates X[1] .. X[K]; the spacing of the selected peaks is constrained to lie between a minimum pitch period of 35 * sampling_rate / 11,025 samples and a maximum of 200 * sampling_rate / 11,025 samples, and the candidate X[i] that best satisfies these constraints is chosen.
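All of the constants recovered above scale linearly with the sampling rate from the 11,025 Hz reference. A small helper (the function name is illustrative, not from the report) collects them:

```python
def frame_params(sampling_rate):
    """Analysis parameters scaled from the 11,025 Hz reference rate,
    using the constants recovered from the report."""
    scale = sampling_rate / 11025
    frame_len = int(500 * scale)   # analysis buffer length in samples
    hop = frame_len // 2           # 50% overlap between buffers
    min_period = int(35 * scale)   # shortest allowed pitch period
    max_period = int(200 * scale)  # longest allowed pitch period
    return frame_len, hop, min_period, max_period
```

At the reference rate this gives a 500-sample buffer and a pitch-period search range of 35 to 200 samples (roughly 315 Hz down to 55 Hz).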


The detected results were compared with manually obtained ones, and the accuracy of the automatic processing reached about 95%.

The timbre of a voice is determined largely by its formant frequencies (formant frequency); to convert the timbre, the formant frequencies must therefore be shifted accordingly.


In this project, the TIPW synthesis method is adopted; with TIPW, both the duration (duration) and the formant frequencies of the synthesized signal can be controlled. The formant-frequency conversion is carried out in five steps, (Step 1) through (Step 5), in which the waveform is resampled (resampling); since resampling rescales the spectrum, it shifts the positions of the formant frequencies, and the resampling ratio determines the amount of the shift.
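Resampling a segment of speech rescales its spectrum and thus shifts the formant frequencies. A minimal linear-interpolation resampler sketches this step (illustrative code, not the report's implementation; here a ratio greater than 1 shortens the segment, which scales the formants upward):

```python
def resample_linear(samples, ratio):
    """Resample a waveform segment by linear interpolation.

    ratio > 1 produces fewer output samples (shorter segment, formants
    shifted up); ratio < 1 does the opposite.
    """
    n_out = max(1, round(len(samples) / ratio))
    out = []
    for i in range(n_out):
        pos = i * ratio                      # position in the input
        j = int(pos)
        frac = pos - j
        a = samples[min(j, len(samples) - 1)]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(a * (1 - frac) + b * frac)
    return out
```

Linear interpolation is the simplest choice; a band-limited resampler would reduce aliasing at the cost of more computation.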

Next, the pitch peak (pitch peak) within each pitch period of the signal is located, and the processing is aligned to these peaks. Let T1 and T2 denote the durations before and after the pitch peak within one period, let T = T1 + T2, and let R be a control ratio. The durations are adjusted so that the total period length is preserved (formulas reconstructed from the damaged equations):

    if T1 < T2:   T1' = T1 × R,        T2' = T − T1'
    if T1 ≥ T2:   T2' = T2 × (2 − R),  T1' = T − T2'

The two parts of the period are then resampled (resampling) to the new durations T1' and T2'.
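The duration rule recovered on this page (if T1 < T2 then T1' = T1 × R and T2' = T − T1', otherwise T2' = T2 × (2 − R) and T1' = T − T2', with T = T1 + T2) can be written as a small function; the names are illustrative:

```python
def adjust_peak_position(t1, t2, r):
    """Redistribute the durations before (t1) and after (t2) the pitch
    peak by ratio r, keeping the total period t1 + t2 unchanged."""
    total = t1 + t2
    if t1 < t2:
        t1_new = t1 * r
        t2_new = total - t1_new
    else:
        t2_new = t2 * (2 - r)
        t1_new = total - t2_new
    return t1_new, t2_new
```

With r = 1 the period is unchanged; other values skew the peak earlier or later while the period length stays fixed.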

In addition, an on-line real-time system was implemented with the proposed method, and perception tests were conducted with the translated voices.

To convert the timbre, the vocal tract is modeled by LPC analysis; the LPC coefficients are converted to the reflection coefficients of a lattice (lattice) filter, and from these the vocal-tract area function is computed section by section:

    Area_{i+1} = Area_i × (1 − K_i) / (1 + K_i)                      (1)

After the areas are modified (for example, to change the internal ratio of the vocal tract), the new reflection coefficients are recovered from the adjusted areas Area'_i:

    K'_i = (Area'_i − Area'_{i+1}) / (Area'_i + Area'_{i+1})         (2)

The modified reflection coefficients are then used in the lattice synthesis filter to generate the converted voice.

[1] H. Kuwabara and Y. Sagisaka, “Acoustic Characteristics of Speaker Individuality: Control and Conversion”, Speech Communication, Vol. 16, pp. 165-174, 1995.
[2] H. Mizuno and M. Abe, “Voice Conversion Algorithm Based on Piecewise Linear Conversion Rules of Formant Frequency and Spectrum Tilt”, Speech Communication, Vol. 16, pp. 153-164, 1995.
[3] Y. Stylianou and O. Cappe, “A System for Voice Conversion Based on Probabilistic Classification and a Harmonic plus Noise Model”, IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Vol. 1, pp. 281-284, 1998.
[4] M. Narendranath, H. A. Murthy, S. Rajendran, and B. Yegnanarayana, “Transformation of Formants for Voice Conversion Using Artificial Neural Networks”, Speech Communication, Vol. 16, pp. 207-216, 1995.
[5] N. Iwahashi and Y. Sagisaka, “Speech Spectrum Conversion Based on Speaker Interpolation and Multi-functional Representation with Weighting by Radial Basis Function Networks”, Speech Communication, Vol. 16, pp. 139-152, 1995.
[6] G. Baudoin and Y. Stylianou, “On the Transformation of the Speech Spectrum for Voice Conversion”, Int. Conf. on Spoken Language Processing, Vol. 3, pp. 1405-1408, 1996.
[7] H. Y. Gu and W. L. Shiu, “A Mandarin-syllable Signal Synthesis Method with Increased Flexibility in Duration, Tone and Timbre Control”, Proceedings of the National Science Council, Republic of China, Part A: Physical Science and Engineering, Vol. 22, No. 3, pp. 385-395, 1998.
[8] D. G. Childers, “Glottal Source Modeling for Voice Conversion”, Speech Communication, Vol. 16, pp. 127-138, 1995.
[9] P. H. Milenkovic, “Voice Source Model for Continuous Control of Pitch Period”, J. Acoust. Soc. Am., Vol. 93, No. 2, pp. 1087-1096, 1993.
[10] L. R. Rabiner, et al., “A Comparative Performance Study of Several Pitch Detection Algorithms”, IEEE Trans. Acoust., Speech, and Signal Processing, pp. 399-418, Oct. 1976.
[11] J. D. Markel and A. H. Gray, Jr., Linear Prediction of Speech, New York: Springer-Verlag, 1976.
[12] Y. Medan, E. Yair, and D. Chazan, “Super Resolution Pitch Determination of Speech Signals”, IEEE Trans. Signal Processing, pp. 40-48, Jan. 1991.

[13] …, pp. 228-234, 1995.
[14] J. F. Wang, et al., “A Hierarchical Neural Network Model Based on a C/V Segmentation Algorithm for Isolated Mandarin Speech Recognition”, IEEE Trans. Signal Processing, pp. 2141-2146, Sep. 1991.
[15] L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.
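The relations numbered (1) and (2) in the text convert between the lattice reflection coefficients and the vocal-tract area function: Area_{i+1} = Area_i (1 − K_i)/(1 + K_i), and K'_i = (Area'_i − Area'_{i+1})/(Area'_i + Area'_{i+1}). A minimal round-trip sketch (function names are illustrative):

```python
def areas_from_reflection(ks, area0=1.0):
    """Build the vocal-tract area function from reflection
    coefficients via Eq. (1): Area[i+1] = Area[i]*(1-K[i])/(1+K[i])."""
    areas = [area0]
    for k in ks:
        areas.append(areas[-1] * (1 - k) / (1 + k))
    return areas

def reflection_from_areas(areas):
    """Recover reflection coefficients from areas via Eq. (2):
    K[i] = (Area[i] - Area[i+1]) / (Area[i] + Area[i+1])."""
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]
```

The two functions are exact inverses (up to the arbitrary first-section area `area0`), which is what allows the area function to be edited and then mapped back to filter coefficients.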


