NTUMIULAB
Modeling Melodic Feature Dependency with Modularized Variational Auto-Encoder
Yu-An Wang* Yu-Kai Huang* Tzu-Chuan Lin* Shang-Yu Su Yun-Nung (Vivian) Chen
Outline
• Motivation
• Approach
• Experiments
• Conclusion
Symbolic Music Generation
• Sequence Modeling
  o RNN (Recurrent Neural Networks)
• Generative Modeling
  o VAE (Variational Auto-Encoders)
  o GAN (Generative Adversarial Networks)
• VRAE = RNN + VAE
  o modeling temporal dependency via recurrent units
  o diverse generation via controllable codes
• Idea: modeling melodic dependency of notes in terms of time, duration, and pitch in a specific order
• Contributions
  ✓ incorporate domain knowledge by a modularized framework
  ✓ integrate note-unrolling to model the dependency between melodic features
  ✓ achieve better performance than other generative models
MVAE: Modularized Variational Auto-Encoder
[Figure: MVAE architecture. Note Dictionary: note_t = (P_t, T_t, dT_t), with example P_t = F#. Modularized Encoder over the sequences P_0..P_t, T_0..T_t, dT_0..dT_t → Fully-Connected → Variational Inference → Latent Code z → Modularized Note Unrolling Decoder (starts from <Start>; every hidden state h is paired with z; outputs dT, T, P for note_t and note_t+1) → Reverse Note Dictionary.]
Data Representation
• Represent note events via different features
• Note Dictionary: note_t → (Pitch, Time Difference, Duration)
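The note dictionary idea can be illustrated with a small sketch. It assumes each note is a (pitch, time difference, duration) tuple and that every feature gets its own integer vocabulary. The class name `NoteDictionary`, the field layout, and the toy melody are hypothetical and are not taken from the slides.

```python
from typing import Dict, List, Tuple

# Assumed layout: (MIDI pitch, time difference in beats, duration in beats).
Note = Tuple[int, float, float]


def build_vocab(values) -> Dict[float, int]:
    """Map each distinct feature value to an integer index, in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


class NoteDictionary:
    """Hypothetical helper: one vocabulary per feature, plus the reverse lookup."""

    def __init__(self, notes: List[Note]) -> None:
        columns = list(zip(*notes))  # one column per feature
        self.vocabs = [build_vocab(col) for col in columns]
        self.inverse = [{i: v for v, i in vocab.items()} for vocab in self.vocabs]

    def encode(self, note: Note) -> Tuple[int, ...]:
        """Note Dictionary: feature values -> feature indices."""
        return tuple(vocab[x] for vocab, x in zip(self.vocabs, note))

    def decode(self, indices) -> Note:
        """Reverse Note Dictionary: feature indices -> feature values."""
        return tuple(inv[i] for inv, i in zip(self.inverse, indices))


# Toy melody with illustrative values only (MIDI pitch 66 is F#4).
melody = [(66, 0.0, 1.0), (69, 1.0, 0.5), (66, 0.5, 0.5), (71, 0.5, 2.0)]
note_dict = NoteDictionary(melody)
encoded = [note_dict.encode(n) for n in melody]
decoded = [note_dict.decode(e) for e in encoded]
```

Keeping one vocabulary per feature is what lets each feature be fed to its own module later on, instead of treating every distinct (pitch, time difference, duration) combination as a single token.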
Variational Auto-Encoder
• Given an input, the goal is to reconstruct the input
[Figure: Note Dictionary → Encoder → Variational Inference → Latent Code → Decoder → Reverse Note Dictionary]
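The slides do not spell out the training objective. As a minimal sketch, assuming the standard VAE setup (diagonal Gaussian posterior, standard normal prior), the latent code can be sampled with the reparameterization trick and the loss is a reconstruction term plus a KL term. The function names and the `kl_weight` parameter are illustrative assumptions.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def vae_loss(reconstruction_error, mu, log_var, kl_weight=1.0):
    """Assumed objective: reconstruction error plus a weighted KL term."""
    return reconstruction_error + kl_weight * kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
z = sample_latent(mu=[0.0, 1.0], log_var=[0.0, -2.0], rng=rng)
```

With `kl_weight=0` the objective reduces to a plain auto-encoder loss, which is one way to read the difference between a modularized auto-encoder and its variational counterpart.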
Modularized Encoder and Decoder
• Each feature is modeled by its own RNN
[Figure: the Note Dictionary splits note_t = (P_t, T_t, dT_t) into the sequences P_0..P_t, T_0..T_t, dT_0..dT_t. Modularized Encoder → Fully-Connected → Variational Inference → Latent Code z. The Modularized Decoder takes (P, z), (T, z), (dT, z) inputs with hidden states h_0, h_1, h_2 → Reverse Note Dictionary.]
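One way to picture "each feature is modeled by its own RNN" is the sketch below: three independent recurrent modules read the pitch, duration, and time-difference index sequences, and a fully-connected layer merges their final states into the parameters of the latent code. The plain tanh cell, the layer sizes, and the untrained random weights are all assumptions made for illustration; the slides do not specify the cell type or dimensions.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class TinyRNN:
    """Plain tanh RNN over integer feature indices (untrained, illustrative)."""

    def __init__(self, vocab_size, hidden_size, rng):
        self.embed = make_matrix(vocab_size, hidden_size, rng)  # one row per index
        self.w_hh = make_matrix(hidden_size, hidden_size, rng)
        self.hidden_size = hidden_size

    def run(self, indices):
        h = [0.0] * self.hidden_size
        for idx in indices:
            recurrent = matvec(self.w_hh, h)
            h = [math.tanh(e + r) for e, r in zip(self.embed[idx], recurrent)]
        return h  # final hidden state summarizes the sequence


class ModularEncoder:
    """One TinyRNN per feature; a fully-connected layer merges their states."""

    def __init__(self, vocab_sizes, hidden_size, latent_size, seed=0):
        rng = random.Random(seed)
        self.modules = [TinyRNN(v, hidden_size, rng) for v in vocab_sizes]
        merged_size = hidden_size * len(vocab_sizes)
        self.w_mu = make_matrix(latent_size, merged_size, rng)
        self.w_log_var = make_matrix(latent_size, merged_size, rng)

    def encode(self, feature_sequences):
        merged = []
        for module, seq in zip(self.modules, feature_sequences):
            merged.extend(module.run(seq))
        return matvec(self.w_mu, merged), matvec(self.w_log_var, merged)


# Three index sequences of equal length: pitch, duration, time difference.
sequences = [[0, 1, 0, 2], [1, 0, 0, 2], [0, 2, 1, 1]]
encoder = ModularEncoder(vocab_sizes=[3, 3, 3], hidden_size=4, latent_size=2)
mu, log_var = encoder.encode(sequences)
```

Because the modules share no weights, each one can specialize in the statistics of a single feature; only the fully-connected merge sees all three.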
Modularized Note-Unrolling Decoder
• Modeling inter-feature dependency in a specific order
[Figure: three Decoder modules chained in sequence, each passing its hidden state h to the next]
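Note unrolling can be read as flattening every note into one decoding step per feature, so that a later feature is predicted after the earlier ones and shares their hidden state. The sketch below assumes the order time difference → duration → pitch and uses stub modules in place of trained RNN cells; the order, the names `unroll`, `roll`, `decode`, and the stub behavior are illustrative assumptions.

```python
FEATURE_ORDER = ("dT", "T", "P")  # assumed order: time difference, duration, pitch


def unroll(notes):
    """Flatten notes into one (feature name, value) step per feature."""
    return [(name, note[name]) for note in notes for name in FEATURE_ORDER]


def roll(steps):
    """Inverse of unroll: regroup consecutive steps into notes."""
    n = len(FEATURE_ORDER)
    if len(steps) % n != 0:
        raise ValueError("step count must be a multiple of the feature count")
    return [dict(steps[i:i + n]) for i in range(0, len(steps), n)]


def decode(num_notes, modules, z):
    """Autoregressive loop: every step sees z, the carried hidden state,
    and the previously emitted value, whichever module produced it."""
    hidden, previous, steps = 0, "<Start>", []
    for _ in range(num_notes):
        for name in FEATURE_ORDER:
            value, hidden = modules[name](previous, hidden, z)
            steps.append((name, value))
            previous = value
    return roll(steps)


def make_stub_module(offset):
    """Stub standing in for a trained per-feature decoder cell."""
    def module(previous, hidden, z):
        hidden = hidden + 1  # stands in for an RNN state update
        return offset + hidden, hidden
    return module


stub_modules = {"dT": make_stub_module(0), "T": make_stub_module(10), "P": make_stub_module(60)}
generated = decode(num_notes=2, modules=stub_modules, z=None)

notes = [{"dT": 0.0, "T": 1.0, "P": 66}, {"dT": 1.0, "T": 0.5, "P": 69}]
steps = unroll(notes)
```

The single `hidden` value threaded through all modules is the point of the design: a decoder without unrolling would update three separate states in parallel, so the pitch module could not see the time difference and duration just chosen for the same note.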
Experimental Setup
• Data: a merged set of Nottingham, Piano-midi.de, JSB Chorales
• Q1: Is the modularized encoder better?
  o Baseline: BachProp (Colombo & Gerstner, 2018)
• Q2: Is the variational inference important?
  o Baseline: modularized auto-encoder
• Q3: Is the note unrolling important?
  o Ablation test
Florian Colombo and Wulfram Gerstner, "BachProp: Learning to compose music in multiple styles," CoRR, 2018.
Human Evaluation
• Scores on a 1-6 scale (1: machine-generated; 6: human-generated)
• Collect 85 scores for each model

Model                          Reconstruction Error   KL Divergence   Human Score (μ)   Human Score (σ)
BachProp                       240.16                 -               3.51              1.61
Modularized AutoEncoder        20.79                  -               2.77              1.65
Proposed w/o note unrolling    85.88                  264.00          3.22              1.73
Proposed w/ note unrolling     73.19                  30.37           4.24              1.54
Real data                      -                      -               4.34              1.55

✓ A1: The modularized encoder is better.
✓ A2: The variational inference is necessary.
✓ A3: The note unrolling is important.
Latent Space Analysis
• Interpolation distribution
  o Smooth curve: meaningful interpolation points
• Visualization
  o Distinct features for different music characteristics
→ informative latent codes
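The slides do not say how interpolation points were produced. A minimal sketch, assuming plain linear interpolation between two latent codes, is shown below; each intermediate code would then be passed to the decoder. The function name and the example vectors are illustrative.

```python
def interpolate(z_start, z_end, num_points):
    """Evenly spaced points on the straight line from z_start to z_end."""
    if num_points < 2:
        raise ValueError("need at least the two endpoints")
    path = []
    for k in range(num_points):
        alpha = k / (num_points - 1)  # runs from 0.0 to 1.0
        path.append([(1 - alpha) * a + alpha * b for a, b in zip(z_start, z_end)])
    return path


# Illustrative two-dimensional latent codes.
path = interpolate([0.0, 2.0], [4.0, -2.0], num_points=5)
```

If the latent space is informative, statistics of the decoded melodies should change gradually along `path` rather than jumping between unrelated outputs.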
Conclusion
• We propose a VAE with a modularized framework to model the melodic dependency between note attributes
• The proposed note event representations bring better flexibility
• Experiments on a merged dataset with diverse music types show the superior performance of our MVAE
  ✓ The modularized encoder is better.
  ✓ The variational inference is necessary.
  ✓ The note unrolling is important.
  ✓ The learned latent codes are informative.