A Study on a Semi-automatic Voice Dubbing System
NSC 88-2213-E-011-035
E-mail: root@guhy.ee.ntust.edu.tw

Abstract

Voice dubbing, or timbre translation, is the process of converting a single voice timbre into many distinct timbres. In dramatic works, different actors are usually dubbed with different timbres, so many dubbing artists are required. To reduce the cost of dubbing, it would be useful if a computer could help convert a single voice timbre into many distinct timbres. We therefore set out to develop a semi-automatic voice-dubbing system. It is called semi-automatic because the emotions (e.g. anger, sadness, happiness) perceived in the translated voice must still be controlled by the person who provides the original voice. In this project, a method for timbre translation is proposed. It provides independent control of four factors: fundamental frequency, vocal-tract length, voice source, and the internal ratio of the vocal tract. Of these four factors, the internal ratio of the vocal tract is newly studied here. In addition, an on-line system has been built with this method, and it can be used for real-time voice dubbing. According to our perception tests, the system can indeed translate a single voice timbre into many distinct timbres.

Keywords: voice dubbing, timbre translation, vocal tract, speech synthesis.
A clipping level Clip is set for the values Y[1] to Y[45] from the three peak amplitudes max1, max2 and max3, using one of two rules:

Clip = (max1 + max2 + max3) * 0.5
Clip = min(max1, max2, max3) * 0.6
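A minimal sketch of the clipping step follows. It assumes that max1, max2 and max3 are the peak absolute amplitudes of the first, middle and last thirds of an analysis frame, and that Clip drives a conventional center clipper as in classic center-clipping pitch detectors. Neither assumption is stated in the text. The condition that selects between the two rules is left to the caller, and `clip_level` and `center_clip` are hypothetical names.

```python
def clip_level(frame, use_min_rule=True):
    # Assumption: max1..max3 are the peak absolute amplitudes of the
    # first, middle and last thirds of the frame (frame needs >= 3 samples).
    third = len(frame) // 3
    sections = [frame[:third], frame[third:2 * third], frame[2 * third:]]
    max1, max2, max3 = (max(abs(v) for v in s) for s in sections)
    if use_min_rule:
        return min(max1, max2, max3) * 0.6   # Clip = min(max1, max2, max3) * 0.6
    return (max1 + max2 + max3) * 0.5        # Clip = (max1 + max2 + max3) * 0.5


def center_clip(frame, clip):
    # Conventional center clipper: samples inside [-clip, clip] become 0,
    # the others are moved toward zero by clip.
    out = []
    for v in frame:
        if v > clip:
            out.append(v - clip)
        elif v < -clip:
            out.append(v + clip)
        else:
            out.append(0.0)
    return out
```

For the frame [0, 1, -2, 3, -4, 5, 6, -7, 8], the section peaks are 2, 5 and 8, so the min rule gives Clip = 1.2 and the sum rule gives 7.5.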
The analysis buffer holds 500 * sampling_rate / 11,025 samples (about 45 ms). The pitch period is searched between a shortest period of 35 * sampling_rate / 11,025 samples and a longest period of 200 * sampling_rate / 11,025 samples. At an 11,025 Hz sampling rate these bounds are 35 and 200 samples, i.e. about 3.2 ms to 18.1 ms, or fundamental frequencies of roughly 55 Hz to 315 Hz.
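A minimal sketch of a period search that uses these bounds follows. The buffer size and the period bounds come from the text. The detection criterion (normalized autocorrelation) and the rounding to the nearest sample are assumptions; reference [10] compares several pitch detectors. `period_bounds`, `buffer_length` and `estimate_period` are hypothetical names.

```python
import math


def period_bounds(sampling_rate):
    # Shortest and longest pitch period in samples, scaled from 11,025 Hz.
    scale = sampling_rate / 11025
    return round(35 * scale), round(200 * scale)


def buffer_length(sampling_rate):
    # Analysis buffer of 500 samples at 11,025 Hz, scaled to the actual rate.
    return round(500 * sampling_rate / 11025)


def estimate_period(x, sampling_rate):
    # Assumed criterion: the lag with the highest normalized autocorrelation
    # inside the allowed range. A later lag must beat the current best by a
    # small margin, so near-ties keep the shorter lag (guards against octave errors).
    lo, hi = period_bounds(sampling_rate)
    best_lag, best_score = None, 0.0
    for lag in range(lo, min(hi, len(x) - 1) + 1):
        a, b = x[:len(x) - lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        if den == 0:
            continue
        score = num / den
        if score > best_score + 1e-12:
            best_lag, best_score = lag, score
    return best_lag
```

A sine with 100 samples per cycle, repeated to fill a 500-sample buffer at 11,025 Hz, yields a period of 100 samples, i.e. about 110 Hz.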
The conversion is built on the TIPW method and changes the formant frequencies and the duration of the voice. The TIPW processing runs in five steps (Step 1 to Step 5), and the formant frequencies are shifted by resampling.
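A minimal sketch of formant shifting by resampling follows. It assumes plain linear interpolation without anti-alias filtering; the text names resampling but not the interpolation method. Reading a waveform at `factor` times its original rate compresses it in time and scales every frequency in it, including the formants, by `factor`. The sketch also assumes that, to keep the pitch unchanged, the resampled period waveforms are then placed back at the original period spacing; that placement is not shown. `resample_linear` is a hypothetical name.

```python
def resample_linear(x, factor):
    # Output has about len(x) / factor samples; output sample j is read at
    # input position j * factor by linear interpolation between neighbors.
    # factor > 1 shortens the waveform and raises its formants by `factor`.
    n_out = max(1, round(len(x) / factor))
    out = []
    for j in range(n_out):
        t = j * factor
        i = int(t)
        frac = t - i
        if i + 1 < len(x):
            out.append(x[i] * (1 - frac) + x[i + 1] * frac)
        else:
            out.append(x[-1])  # hold the last sample at the right edge
    return out
```

For example, a sine with 100 samples per cycle resampled with factor 1.25 has 80 samples per cycle, so its frequency rises by 25%.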
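Within each pitch period, the pitch peak is taken as the point of maximum amplitude. Let the pitch peak divide a period of length $T$ into a portion $T_1$ before the peak and a portion $T_2$ after it, and let $R$ be the adjustment ratio. The adjusted portions $T_1'$ and $T_2'$ are

$$\text{if } T_1 < T_2:\quad T_1' = T_1 \times R, \qquad T_2' = T - T_1'$$

$$\text{if } T_1 \ge T_2:\quad T_2' = T_2 \times (2 - R), \qquad T_1' = T - T_2'$$

In both cases $T_1' + T_2' = T$, so the period length is unchanged.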
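The period-division rule can be sketched as below. The sketch assumes that the pitch peak is the sample with the largest absolute amplitude, and that $T_1$ and $T_2$ are counted in samples before the peak and from the peak to the end of the period. `peak_split` and `adjust_split` are hypothetical names.

```python
def peak_split(period):
    # Assumption: the pitch peak is the sample with the largest |amplitude|
    # (the first one if several are equal). Returns (T1, T2) in samples.
    peak = max(range(len(period)), key=lambda n: abs(period[n]))
    return peak, len(period) - peak


def adjust_split(t1, t2, r):
    # Apply the two-case rule; T = T1 + T2 is preserved in both cases.
    t = t1 + t2
    if t1 < t2:
        t1_new = t1 * r
        t2_new = t - t1_new
    else:
        t2_new = t2 * (2 - r)
        t1_new = t - t2_new
    return t1_new, t2_new
```

With $T_1 = 30$, $T_2 = 70$ and $R = 1.2$, the peak moves later: $T_1' = 36$ and $T_2' = 64$. Because the rule always scales the shorter portion, any $R$ in $(0, 2)$ keeps both new portions positive.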
LPC analysis of the speech yields the reflection coefficients $K_1, \ldots, K_M$, which are the coefficients of a lattice filter. From them, the vocal-tract areas $\mathrm{Area}_1, \ldots, \mathrm{Area}_{M+1}$ are computed with

$$\mathrm{Area}_{i+1} = \frac{1 - K_i}{1 + K_i}\,\mathrm{Area}_i \qquad (1)$$

After the areas have been changed into $\mathrm{Area}'_1, \ldots, \mathrm{Area}'_{M+1}$, the new reflection coefficients $K'_1, \ldots, K'_M$ are obtained from

$$K'_i = \frac{\mathrm{Area}'_i - \mathrm{Area}'_{i+1}}{\mathrm{Area}'_i + \mathrm{Area}'_{i+1}} \qquad (2)$$

and are used in the lattice filter. Equation (2) is the inverse of (1): solving (1) for $K_i$ gives $K_i = (\mathrm{Area}_i - \mathrm{Area}_{i+1}) / (\mathrm{Area}_i + \mathrm{Area}_{i+1})$.

References

[1] H. Kuwabara and Y. Sagisaka, "Acoustic Characteristics of Speaker Individuality: Control and Conversion", Speech Communication, Vol. 16, pp. 165-174, 1995.
[2] H. Mizuno and M. Abe, "Voice Conversion Algorithm Based on Piecewise Linear Conversion Rules of Formant Frequency and Spectrum Tilt", Speech Communication, Vol. 16, pp. 153-164, 1995.
[3] Y. Stylianou and O. Cappe, "A System for Voice Conversion Based on Probabilistic Classification and a Harmonic plus Noise Model", IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Vol. 1, pp. 281-284, 1998.
[4] M. Narendranath, H. A. Murthy, S. Rajendran, and B. Yegnanarayana, "Transformation of Formants for Voice Conversion Using Artificial Neural Networks", Speech Communication, Vol. 16, pp. 207-216, 1995.
[5] N. Iwahashi and Y. Sagisaka, "Speech Spectrum Conversion Based on Speaker Interpolation and Multi-functional Representation with Weighting by Radial Basis Function Networks", Speech Communication, Vol. 16, pp. 139-152, 1995.
[6] G. Baudoin and Y. Stylianou, "On the Transformation of the Speech Spectrum for Voice Conversion", Int. Conf. on Spoken Language Processing, Vol. 3, pp. 1405-1408, 1996.
[7] H. Y. Gu and W. L. Shiu, "A Mandarin-syllable Signal Synthesis Method with Increased Flexibility in Duration, Tone and Timbre Control", Proceedings of the National Science Council, Republic of China, Part A: Physical Science and Engineering, Vol. 22, No. 3, pp. 385-395, 1998.
[8] D. G. Childers, "Glottal Source Modeling for Voice Conversion", Speech Communication, Vol. 16, pp. 127-138, 1995.
[9] P. H. Milenkovic, "Voice Source Model for Continuous Control of Pitch Period", J. Acoust. Soc. Am., Vol. 93, No. 2, pp. 1087-1096, 1993.
[10] L. R. Rabiner, et al., "A Comparative Performance Study of Several Pitch Detection Algorithms", IEEE Trans. Acoust., Speech, and Signal Processing, pp. 399-418, Oct. 1976.
[11] J. D. Markel and A. H. Gray, Jr., Linear Prediction of Speech, New York: Springer-Verlag, 1976.
[12] Y. Medan, E. Yair, and D. Chazan, "Super Resolution Pitch Determination of Speech Signals", IEEE Trans. Signal Processing, pp. 40-48, Jan. 1991.
[13] "…", pp. 228-234, 1995.
[14] J. F. Wang, et al., "A Hierarchical Neural Network Model Based on a C/V Segmentation Algorithm for Isolated Mandarin Speech Recognition", IEEE Trans. Signal Processing, pp. 2141-2146, Sep. 1991.
[15] L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.
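The chain from LPC analysis to modified reflection coefficients, using relations (1) and (2), can be sketched as follows. The sketch assumes the autocorrelation method with the Levinson-Durbin recursion, and a sign convention under which (1) holds for the resulting $K_i$. The text names LPC analysis and the lattice filter but not these details. `scale_section` is a hypothetical stand-in for the modification of the internal ratio of the vocal tract, whose exact rule is not given here.

```python
def levinson_reflection(r, order):
    # Levinson-Durbin recursion: autocorrelation r[0..order] -> K_1..K_order.
    # Assumed predictor convention: x_hat[n] = sum_j a_j * x[n - j].
    a = []          # predictor coefficients a_1..a_(i-1)
    err = r[0]      # prediction error energy
    ks = []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(len(a)))
        k = acc / err
        a = [a[j] - k * a[i - 2 - j] for j in range(len(a))] + [k]
        err *= 1 - k * k
        ks.append(k)
    return ks


def reflection_to_areas(ks, first_area=1.0):
    # Equation (1): Area_(i+1) = (1 - K_i) / (1 + K_i) * Area_i
    areas = [first_area]
    for k in ks:
        areas.append(areas[-1] * (1 - k) / (1 + k))
    return areas


def areas_to_reflection(areas):
    # Equation (2): K'_i = (Area'_i - Area'_(i+1)) / (Area'_i + Area'_(i+1))
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


def scale_section(areas, start, stop, factor):
    # Hypothetical modification: scale the areas with index in [start, stop).
    return [a * factor if start <= i < stop else a for i, a in enumerate(areas)]
```

Equation (2) maps any positive areas to $|K'_i| < 1$, so any modification that keeps the areas positive yields a stable lattice synthesis filter. For example, the autocorrelation [1.0, 0.5, 0.25] gives $K = [0.5, 0]$ and areas [1, 1/3, 1/3].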