Theory-Structured Harmonic Embeddings for Chord-Conditioned Melody Generation

By Junnuo Wang, 2025

Paper accepted to Academic Journal of Computing & Information Science (AJCIS)

Abstract

Chord-conditioned melody generation remains limited by coarse harmonic encodings that collapse extended, altered, and modal chords into a few root–quality labels, preventing models from learning nuanced tension and resolution behavior. This paper introduces a theory-structured framework that enriches harmonic conditioning and guides decoding in a modular Transformer architecture. First, a Theory-Structured Harmonic Embedding decomposes each chord into additive Root, Quality, Extension, and Tension components, yielding interpretable sub-embeddings without incurring a combinatorial chord vocabulary. Second, a Harmony-Aware Soft Constrained Decoding scheme adjusts pitch logits at inference time using music-theoretic priors on chord-tone preference, tension validity, non-chord-tone resolution, and scale adherence, controlled by a single constraint-strength parameter. Experiments on the Enhanced Wikifonia Leadsheet Dataset compare a CMT-style baseline, an EC2-VAE model, and three ablation variants. The full model significantly improves Chord Tone Ratio, Tension Correctness, and Non-Chord-Tone Resolution, while maintaining corpus-level pitch and rhythm statistics as measured by MGEval KLD and overlap area. These results demonstrate that explicit harmonic structure and theory-aware decoding jointly yield melodies that are both stylistically faithful and more music-theoretically aligned.
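The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the embedding width, the component vocabularies, the random initialisation, and the bias values (+1 for chord tones, 0 for other scale tones, -1 for everything else) are all assumptions made for illustration, and only two of the four priors (chord-tone preference and scale adherence) are modelled. The function names `chord_embedding` and `soft_constrain` are hypothetical.

```python
import math
import random

DIM = 8  # hypothetical embedding width

def make_table(names, dim, rng):
    """One small random vector per symbol (stand-in for learned weights)."""
    return {name: [rng.gauss(0.0, 0.1) for _ in range(dim)] for name in names}

rng = random.Random(0)
# Hypothetical component vocabularies; the paper's actual label sets may differ.
ROOT = make_table(range(12), DIM, rng)
QUALITY = make_table(["maj", "min", "dim", "aug", "dom"], DIM, rng)
EXTENSION = make_table(["none", "7", "9", "11", "13"], DIM, rng)
TENSION = make_table(["none", "b9", "#9", "#11", "b13"], DIM, rng)

def chord_embedding(root, quality, extension="none", tension="none"):
    """Additive decomposition: the chord vector is the sum of four sub-embeddings."""
    parts = (ROOT[root], QUALITY[quality], EXTENSION[extension], TENSION[tension])
    return [sum(vals) for vals in zip(*parts)]

def soft_constrain(logits, chord_tones, scale_tones, strength):
    """Shift 12 pitch-class logits by strength * bias (assumed bias values)."""
    adjusted = []
    for pc, logit in enumerate(logits):
        if pc in chord_tones:
            bias = 1.0    # prefer chord tones
        elif pc in scale_tones:
            bias = 0.0    # leave other scale tones untouched
        else:
            bias = -1.0   # discourage out-of-scale pitches
        adjusted.append(logit + strength * bias)
    return adjusted

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Example: C7(b9) over a flat model distribution, in a C major scale context.
c7b9 = chord_embedding(0, "dom", "7", "b9")
flat_logits = [0.0] * 12
probs = softmax(soft_constrain(flat_logits, {0, 4, 7, 10}, {0, 2, 4, 5, 7, 9, 11}, 2.0))
```

With these assumed vocabularies, the four tables hold 12 + 5 + 5 + 5 = 27 vectors, whereas a flat chord vocabulary covering the same combinations would need 12 × 5 × 5 × 5 = 1500 entries. Setting `strength` to 0 leaves the model's logits unchanged, so the constraint is soft: it reweights the distribution rather than masking pitches outright.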
