Modeling Melodic Feature Dependency

Academic year: 2022

(1)

NTUMIULAB

Modeling Melodic Feature Dependency with Modularized Variational Auto-Encoder

Yu-An Wang* Yu-Kai Huang* Tzu-Chuan Lin* Shang-Yu Su Yun-Nung (Vivian) Chen

(2)

Outline

• Motivation
• Approach
• Experiments
• Conclusion

(4)

Symbolic Music Generation

Sequence Modeling

• RNN (Recurrent Neural Networks)

Generative Modeling

• VAE (Variational Auto-Encoders)
• GAN (Generative Adversarial Networks)

• VRAE = RNN + VAE

o modeling temporal dependency via recurrent units
o diverse generation via controllable codes

Idea: modeling melodic dependency of notes in terms of time, duration, and pitch in a specific order

• Contributions

✓ incorporate domain knowledge by a modularized framework
✓ integrate note-unrolling to model the dependency between melodic features
✓ achieve better performance than other generative models

(6)

MVAE: Modularized Variational Auto-Encoder

[Figure: MVAE architecture. Each note is represented as note_t = (P_t, T_t, dT_t), e.g. P_t = #F. Pipeline: Note Dictionary → Modularized Encoder (separate sequences P_0 ... P_t, T_0 ... T_t, dT_0 ... dT_t) → Fully-Connected → Variational Inference → Latent Code z → Modularized Note Unrolling Decoder (starts from <Start>; every input is paired with z) → Reverse Note Dictionary → note_0, note_1, ..., note_t, note_t+1.]

(7)

Data Representation

• Represent note events via different features

[Figure: the Note Dictionary maps each note_t to three features: Pitch, Time Difference, Duration.]
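The note dictionary idea can be sketched as one vocabulary per feature. This is a minimal illustration, not the authors' code: the `NoteDictionary` class, the raw note format, and the pairing of T with duration and dT with time difference are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical raw note event: (pitch name, duration in beats, time difference in beats).
RawNote = Tuple[str, float, float]


class NoteDictionary:
    """Maps each feature value to an integer index, with one vocabulary per feature."""

    def __init__(self, notes: List[RawNote]) -> None:
        # Build a separate sorted vocabulary for pitch (P), duration (T),
        # and time difference (dT).
        columns = list(zip(*notes))
        self.vocabs: List[List] = [sorted(set(col)) for col in columns]
        self.lookups: List[Dict] = [
            {value: idx for idx, value in enumerate(vocab)} for vocab in self.vocabs
        ]

    def encode(self, note: RawNote) -> Tuple[int, ...]:
        """note_t -> (P_t, T_t, dT_t) as integer indices."""
        return tuple(lookup[value] for lookup, value in zip(self.lookups, note))

    def decode(self, indices: Tuple[int, ...]) -> Tuple:
        """Reverse note dictionary: indices -> raw feature values."""
        return tuple(vocab[idx] for vocab, idx in zip(self.vocabs, indices))


melody = [("F#4", 1.0, 0.0), ("A4", 0.5, 1.0), ("F#4", 0.5, 0.5)]
note_dict = NoteDictionary(melody)
encoded = [note_dict.encode(n) for n in melody]
decoded = [note_dict.decode(e) for e in encoded]
```

Keeping the three vocabularies separate means each feature stays small and can later be handled by its own module.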

(8)

Variational Auto-Encoder

• Given an input, the goal is to reconstruct the input

[Figure: Note Dictionary → Encoder → Variational Inference → Latent Code → Decoder → Reverse Note Dictionary.]
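The variational inference step can be illustrated with the standard VAE recipe. The slides do not state the prior or posterior family, so the diagonal Gaussian posterior and standard normal prior below are assumptions, and the function names are hypothetical.

```python
import math
import random
from typing import List


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)
kl_at_prior = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
kl_shifted = kl_to_standard_normal([1.0], [0.0])
```

In this setup the training loss would combine a reconstruction term with the KL term, which matches the two quantities reported in the evaluation (reconstruction error and KL divergence).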

(9)

Modularized Encoder and Decoder

• Each feature is modeled by its own RNN

[Figure: notes from the Note Dictionary, note_t = (P_t, T_t, dT_t). Modularized Encoder: separate RNNs over P_0 ... P_t, T_0 ... T_t, and dT_0 ... dT_t → Fully-Connected → Variational Inference → Latent Code z → Modularized Decoder: separate RNNs whose inputs are (P_i, z), (T_i, z), and (dT_i, z) → Reverse Note Dictionary.]
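A rough sketch of "one RNN per feature" on the encoder side follows. It uses a tiny untrained Elman RNN in pure Python; the class names, layer sizes, and the choice to concatenate the final hidden states before the fully-connected layer are assumptions for illustration, not details taken from the slides.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class FeatureRNN:
    """Tiny Elman RNN over one feature's index sequence (random, untrained weights)."""

    def __init__(self, vocab_size: int, hidden: int, rng: random.Random) -> None:
        self.embed = rand_matrix(vocab_size, hidden, rng)  # one row per feature index
        self.w_hh = rand_matrix(hidden, hidden, rng)
        self.hidden = hidden

    def encode(self, indices: Sequence[int]) -> Vector:
        h = [0.0] * self.hidden
        for idx in indices:
            recurrent = matvec(self.w_hh, h)
            h = [math.tanh(e + r) for e, r in zip(self.embed[idx], recurrent)]
        return h


class ModularizedEncoder:
    """One FeatureRNN per feature, then a fully-connected layer to (mu, log_var)."""

    def __init__(self, vocab_sizes: Sequence[int], hidden: int, latent: int,
                 rng: random.Random) -> None:
        self.rnns = [FeatureRNN(v, hidden, rng) for v in vocab_sizes]
        total = hidden * len(vocab_sizes)
        self.w_mu = rand_matrix(latent, total, rng)
        self.w_log_var = rand_matrix(latent, total, rng)

    def encode(self, notes: Sequence[Tuple[int, ...]]) -> Tuple[Vector, Vector]:
        # Split the note sequence into one index stream per feature.
        streams = list(zip(*notes))
        summary: Vector = []
        for rnn, stream in zip(self.rnns, streams):
            summary.extend(rnn.encode(stream))
        return matvec(self.w_mu, summary), matvec(self.w_log_var, summary)


notes = [(1, 1, 0), (0, 0, 2), (1, 0, 1)]  # (P_t, T_t, dT_t) indices
encoder = ModularizedEncoder(vocab_sizes=(2, 2, 3), hidden=4, latent=2,
                             rng=random.Random(0))
mu, log_var = encoder.encode(notes)
```

A real model would use trained recurrent units; the point here is only the module layout.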

(10)

Modularized Note-Unrolling Decoder

• Modeling inter-feature dependency in a specific order

[Figure: three Decoder modules.]
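Note unrolling can be pictured as generating the features of a single note one after another, with each later feature seeing the earlier ones. The sketch below assumes the order time difference (dT), then duration (T), then pitch (P); the slides only say "a specific order", so this ordering is an assumption. The predictors are toy stand-ins for the per-feature decoder modules.

```python
from typing import Callable, Dict, List

# Assumed generation order within one note.
FEATURE_ORDER = ("dT", "T", "P")

Note = Dict[str, int]
# A predictor sees the previous notes, the features already chosen for the
# current note, and the latent code z, and returns an index for its feature.
Predictor = Callable[[List[Note], Note, List[float]], int]


def unroll_decode(predictors: Dict[str, Predictor], z: List[float], length: int) -> List[Note]:
    """Generate notes one feature at a time in FEATURE_ORDER."""
    notes: List[Note] = []
    for _ in range(length):
        current: Note = {}
        for name in FEATURE_ORDER:
            # Later features can condition on earlier features of the same note.
            current[name] = predictors[name](notes, current, z)
        notes.append(current)
    return notes


# Toy deterministic stand-ins (hypothetical rules, not learned models).
toy_predictors: Dict[str, Predictor] = {
    "dT": lambda history, current, z: len(history) % 2,
    "T": lambda history, current, z: current["dT"] + 1,
    "P": lambda history, current, z: (current["dT"] + current["T"] + int(z[0] > 0)) % 12,
}

generated = unroll_decode(toy_predictors, z=[0.5], length=3)
```

Without unrolling, the three features of a note would be predicted independently of each other given the past, which is the ablation compared in the experiments.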

(13)

Experimental Setup

• Data: a merged set of Nottingham, Piano-midi.de, JSB Chorales

• Q1: Is the modularized encoder better?
o Baseline: BachProp (Colombo & Gerstner, 2018)

• Q2: Is variational inference important?
o Baseline: modularized auto-encoder

• Q3: Is note-unrolling important?
o Ablation test

Florian Colombo and Wulfram Gerstner, "BachProp: Learning to compose music in multiple styles," CoRR, 2018.

(14)

Human Evaluation

• Scores on a 1-6 scale (1: machine-generated; 6: human-generated)
• 85 scores collected for each model

Model                         Reconstruction Error   KL Divergence   Human Score (μ)   Human Score (σ)
BachProp                      240.16                 -               3.51              1.61
Modularized AutoEncoder       20.79                  -               2.77              1.65
Proposed w/o note unrolling   85.88                  264.00          3.22              1.73
Proposed w/ note unrolling    73.19                  30.37           4.24              1.54
Real data                     -                      -               4.34              1.55

✓ A1: The modularized encoder is better.
✓ A2: Variational inference is necessary.
✓ A3: Note-unrolling is important.

(15)

Latent Space Analysis

• Interpolation distribution
o Smooth curve: meaningful interpolation points

• Visualization
o Distinct features for different music characteristics

→ informative latent codes
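Interpolation points between two latent codes could be produced as in the sketch below. The slides do not say how the interpolation was computed, so plain linear interpolation is an assumption here, and the `interpolate` helper is hypothetical. Each point would then be passed to the decoder, which is omitted.

```python
from typing import List


def interpolate(z_a: List[float], z_b: List[float], steps: int) -> List[List[float]]:
    """Return `steps` evenly spaced latent codes from z_a to z_b (both included)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if len(z_a) != len(z_b):
        raise ValueError("latent codes must have the same dimension")
    path = []
    for k in range(steps):
        alpha = k / (steps - 1)
        path.append([(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)])
    return path


latent_path = interpolate([0.0, 2.0], [1.0, 0.0], steps=5)
```

If statistics of the decoded melodies change smoothly along such a path, that supports the claim that the latent codes are informative.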

(17)

Conclusion

• We propose a VAE with a modularized framework to model the melodic dependency between note attributes
• The proposed note event representations bring better flexibility
• Experiments on a merged dataset with diverse music types show the superior performance of our MVAE

✓ The modularized encoder is better.
✓ Variational inference is necessary.
✓ Note-unrolling is important.
✓ The learned latent codes are informative.

(19)

Demo & Code Available @

http://mvae.miulab.tw
