Modeling Melodic Feature Dependency

(1)

NTUMIULAB

Modeling Melodic Feature Dependency with Modularized Variational Auto-Encoder

Yu-An Wang*, Yu-Kai Huang*, Tzu-Chuan Lin*, Shang-Yu Su, Yun-Nung (Vivian) Chen

(2)


Outline

• Motivation
• Approach
• Experiments
• Conclusion

(3)


Outline: Motivation

(4)


Symbolic Music Generation

Sequence Modeling

• RNN (Recurrent Neural Networks)

Generative Modeling

• VAE (Variational Auto-Encoders)

• GAN (Generative Adversarial Networks)

• VRAE = RNN + VAE

  o models temporal dependency via recurrent units
  o enables diverse generation via controllable latent codes

Idea: model the melodic dependency between the time, duration, and pitch of notes in a specific order

• Contributions

✓ incorporate domain knowledge through a modularized framework

✓ integrate note-unrolling to model the dependency between melodic features

✓ achieve better performance than other generative models

(5)


Outline: Approach

(6)


MVAE: Modularized Variational Auto-Encoder

[Figure: MVAE architecture. The Note Dictionary splits each 𝑛𝑜𝑡𝑒_𝑡 = (𝑃_𝑡, 𝑇_𝑡, 𝑑𝑇_𝑡) into the feature sequences P, T, and dT; the Modularized Encoder passes them through a Fully-Connected layer and Variational Inference to the Latent Code z; the Modularized Note Unrolling Decoder, starting from <Start>, generates 𝑃_𝑡, 𝑇_𝑡, and 𝑑𝑇_𝑡 for each note, conditioned on z and the previous hidden state; the Reverse Note Dictionary maps the outputs back to notes.]

(7)


Data Representation

• Represent each note event via different features

[Figure: the Note Dictionary maps note_t to three features: Pitch, Time Difference, and Duration]
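A minimal Python sketch of this representation, under stated assumptions: each note is taken to be a triple of pitch, time difference from the previous note, and duration, and the Note Dictionary is assumed to map each distinct value of each feature to its own integer index, with the Reverse Note Dictionary inverting that mapping. The names `Note`, `FeatureDictionary`, `build_dictionaries`, `encode_notes`, and `decode_notes` are hypothetical, and the slides do not say which of T and dT denotes which feature.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Note(NamedTuple):
    """One note event described by three features (hypothetical field names)."""
    pitch: int        # e.g. a MIDI pitch number
    time_diff: float  # onset offset from the previous note (assumed unit: beats)
    duration: float   # note length (assumed unit: beats)


class FeatureDictionary:
    """Maps each distinct value of one feature to an integer index and back."""

    def __init__(self, values: Sequence) -> None:
        # Sort the distinct values so the indexing is deterministic.
        self.index_to_value: List = sorted(set(values))
        self.value_to_index: Dict = {v: i for i, v in enumerate(self.index_to_value)}

    def encode(self, value) -> int:
        return self.value_to_index[value]

    def decode(self, index: int):
        return self.index_to_value[index]

    def __len__(self) -> int:
        return len(self.index_to_value)


def build_dictionaries(notes: Sequence[Note]) -> Tuple[FeatureDictionary, ...]:
    """Build one dictionary per feature, mirroring the modularized design."""
    return tuple(FeatureDictionary([n[k] for n in notes]) for k in range(len(Note._fields)))


def encode_notes(notes: Sequence[Note], dicts: Sequence[FeatureDictionary]) -> List[Tuple[int, ...]]:
    """Note Dictionary direction: note -> tuple of per-feature indices."""
    return [tuple(d.encode(v) for d, v in zip(dicts, note)) for note in notes]


def decode_notes(indices: Sequence[Tuple[int, ...]], dicts: Sequence[FeatureDictionary]) -> List[Note]:
    """Reverse Note Dictionary direction: tuple of per-feature indices -> note."""
    return [Note(*(d.decode(i) for d, i in zip(dicts, idx))) for idx in indices]


if __name__ == "__main__":
    melody = [Note(60, 0.0, 1.0), Note(62, 1.0, 0.5), Note(64, 0.5, 0.5), Note(60, 0.5, 2.0)]
    dicts = build_dictionaries(melody)
    encoded = encode_notes(melody, dicts)
    print(encoded)  # → [(0, 0, 1), (1, 2, 0), (2, 1, 0), (0, 1, 2)]
    assert decode_notes(encoded, dicts) == melody
```

Keeping one dictionary per feature is what lets the encoder and decoder treat each feature as a separate module.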

(8)


Variational Auto-Encoder

• Given an input note sequence, the goal is to reconstruct it

[Figure: Note Dictionary → Encoder → Latent Code → Decoder]
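In a VAE, the reconstruction goal is typically paired with a KL penalty: the encoder outputs the mean and log-variance of a Gaussian over the latent code, z is sampled with the reparameterization trick, and the loss is the reconstruction negative log-likelihood plus the KL divergence to a standard normal prior. The dependency-free sketch below shows those pieces for a diagonal Gaussian; the function names and the optional `beta` weight are assumptions, since the slides do not give the exact loss.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    In an autodiff framework this form keeps z differentiable w.r.t. mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def gaussian_kl(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def negative_elbo(recon_nll: float, mu: Sequence[float], log_var: Sequence[float],
                  beta: float = 1.0) -> float:
    """Loss to minimize: reconstruction negative log-likelihood plus (optionally weighted) KL."""
    return recon_nll + beta * gaussian_kl(mu, log_var)
```

With `beta = 1.0` this is the standard evidence lower bound; the KL term is zero only when the posterior equals the prior.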

(9)


Modularized Encoder and Decoder

• Each feature is modeled by its own RNN

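[Figure: the Note Dictionary feeds the feature sequences P, T, and dT into separate RNNs of the Modularized Encoder; their outputs pass through a Fully-Connected layer to the Latent Code z, which conditions the Modularized Decoder (hidden states h₀, h₁, h₂, ...)]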
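A toy sketch of the modularized encoder, assuming one-hot inputs per feature, a plain tanh RNN without bias terms standing in for whatever recurrent cell the model actually uses, and a fully-connected layer from the concatenated final hidden states to the mean and log-variance of z. All class names, sizes, and the random initialization are hypothetical; a real model would be trained in a deep-learning framework rather than written in pure Python.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def one_hot(index: int, size: int) -> Vector:
    v = [0.0] * size
    v[index] = 1.0
    return v


class SimpleRNN:
    """Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1}), no bias terms."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random) -> None:
        self.hidden_size = hidden_size
        self.w_x = random_matrix(hidden_size, input_size, rng)
        self.w_h = random_matrix(hidden_size, hidden_size, rng)

    def step(self, x: Sequence[float], h: Sequence[float]) -> Vector:
        return [math.tanh(a + b) for a, b in zip(matvec(self.w_x, x), matvec(self.w_h, h))]

    def final_state(self, xs: Sequence[Sequence[float]]) -> Vector:
        h = [0.0] * self.hidden_size
        for x in xs:
            h = self.step(x, h)
        return h


class ModularizedEncoder:
    """One RNN per feature; final states are concatenated, then mapped to (mu, log_var)."""

    def __init__(self, vocab_sizes: Sequence[int], hidden_size: int, latent_size: int,
                 seed: int = 0) -> None:
        rng = random.Random(seed)
        self.vocab_sizes = list(vocab_sizes)
        self.rnns = [SimpleRNN(v, hidden_size, rng) for v in self.vocab_sizes]
        concat_size = hidden_size * len(self.vocab_sizes)
        self.fc_mu = random_matrix(latent_size, concat_size, rng)
        self.fc_log_var = random_matrix(latent_size, concat_size, rng)

    def encode(self, notes: Sequence[Sequence[int]]) -> Tuple[Vector, Vector]:
        # notes[t][k] is the dictionary index of feature k of note t.
        states: Vector = []
        for k, (rnn, vocab) in enumerate(zip(self.rnns, self.vocab_sizes)):
            feature_seq = [one_hot(note[k], vocab) for note in notes]
            states.extend(rnn.final_state(feature_seq))  # each feature sees only its own RNN
        return matvec(self.fc_mu, states), matvec(self.fc_log_var, states)
```

Because each RNN reads only one feature stream, the features interact only in the fully-connected layer; the resulting mean and log-variance would then be sampled to obtain z.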

(10)


Modularized Note-Unrolling Decoder

• Models inter-feature dependency in a specific order

[Figure: three feature decoders chained in sequence, each passing its hidden state h to the next]
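A sketch of note-unrolling, assuming the three features of each note are decoded one after another in a fixed order (pitch, then time difference, then duration; the slides only say "a specific order") and that a single hidden state is threaded through the per-feature decoder modules, so each feature is conditioned on z and on every feature decoded before it. `FEATURE_ORDER`, `unroll`, `roll`, `StepFn`, and `decode` are hypothetical names, and each module is left as a pluggable step function.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed decoding order of the three note features.
FEATURE_ORDER: Tuple[str, ...] = ("pitch", "time_diff", "duration")

# Hypothetical module interface: (hidden, z, previous_output) -> (new_hidden, output_index)
StepFn = Callable[[List[float], Sequence[float], int], Tuple[List[float], int]]


def unroll(notes: Sequence[Sequence[int]]) -> List[Tuple[str, int]]:
    """Flatten notes [(p, t, d), ...] into one feature step per element, in FEATURE_ORDER."""
    return [(name, note[k]) for note in notes for k, name in enumerate(FEATURE_ORDER)]


def roll(steps: Sequence[Tuple[str, int]]) -> List[Tuple[int, ...]]:
    """Group a flat list of feature steps back into per-note tuples."""
    n = len(FEATURE_ORDER)
    if len(steps) % n != 0:
        raise ValueError("number of steps must be a multiple of the number of features")
    values = [value for _, value in steps]
    return [tuple(values[i:i + n]) for i in range(0, len(values), n)]


def decode(z: Sequence[float], num_notes: int, modules: Dict[str, StepFn],
           hidden_size: int, start_index: int = 0) -> List[Tuple[int, ...]]:
    """Generate num_notes notes, visiting the feature modules in FEATURE_ORDER for every note."""
    hidden = [0.0] * hidden_size
    previous = start_index  # plays the role of the <Start> token
    steps: List[Tuple[str, int]] = []
    for _ in range(num_notes):
        for name in FEATURE_ORDER:
            # The same hidden state is passed from one feature module to the next.
            hidden, previous = modules[name](hidden, z, previous)
            steps.append((name, previous))
    return roll(steps)
```

Because the hidden state flows across module boundaries, the duration predicted for a note can depend on the pitch and time difference just generated for that same note, which is the inter-feature dependency the slide describes.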

(11)


MVAE: Modularized Variational Auto-Encoder

[Figure: the full MVAE architecture (Note Dictionary, Modularized Encoder, Fully-Connected layer, Latent Code z, Modularized Note Unrolling Decoder), repeated as a recap]

(12)


Outline: Experiments

(13)


Experimental Setup

• Data: a merged dataset of Nottingham, Piano-midi.de, and JSB Chorales

• Q1: Is the modularized encoder better?
  o Baseline: BachProp (Colombo & Gerstner, 2018)

• Q2: Is variational inference important?
  o Baseline: modularized auto-encoder

• Q3: Is note-unrolling important?
  o Ablation test

Florian Colombo and Wulfram Gerstner, “BachProp: Learning to compose music in multiple styles,” CoRR, 2018.

(14)


Human Evaluation

• Scores on a 1-6 scale (1: machine-generated; 6: human-generated)

• 85 scores collected for each model

Model                        | Reconstruction Error | KL Divergence | Human Score μ | Human Score σ
BachProp                     | 240.16               | -             | 3.51          | 1.61
Modularized AutoEncoder      | 20.79                | -             | 2.77          | 1.65
Proposed w/o note unrolling  | 85.88                | 264.00        | 3.22          | 1.73
Proposed w/ note unrolling   | 73.19                | 30.37         | 4.24          | 1.54
Real data                    | -                    | -             | 4.34          | 1.55

✓ A1: The modularized encoder is better.

✓ A2: Variational inference is necessary.

✓ A3: Note-unrolling is important.

(15)


Latent Space Analysis

• Interpolation distribution: a smooth curve indicates meaningful interpolation points

• Visualization: distinct features for different music characteristics

→ informative latent codes
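The slides do not state how the interpolation points are produced; a common choice, assumed here, is linear interpolation between two latent codes, with each intermediate code then passed to the decoder. `interpolate` is a hypothetical helper.

```python
from typing import List, Sequence


def interpolate(z_a: Sequence[float], z_b: Sequence[float], num_points: int) -> List[List[float]]:
    """Evenly spaced points on the segment from z_a to z_b, endpoints included."""
    if len(z_a) != len(z_b):
        raise ValueError("latent codes must have the same dimension")
    if num_points < 2:
        raise ValueError("need at least two points to include both endpoints")
    path = []
    for i in range(num_points):
        alpha = i / (num_points - 1)  # goes from 0.0 to 1.0
        path.append([(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)])
    return path
```

For example, `interpolate([0.0, 0.0], [1.0, 2.0], 3)` returns `[[0.0, 0.0], [0.5, 1.0], [1.0, 2.0]]`; each point would then be decoded by the trained decoder.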

(16)


Outline: Conclusion

(17)


Conclusion

• We propose a VAE with a modularized framework to model the melodic dependency between note attributes

• The proposed note-event representation offers greater flexibility

• Experiments on a merged dataset covering diverse music types show the superior performance of our MVAE

✓ The modularized encoder is better.

✓ Variational inference is necessary.

✓ Note-unrolling is important.

✓ The learned latent codes are informative.

(18)


MiuLaber

(19)
