Music Emotion Volatility Prediction

A Feature-Based Analysis

Status: Research Completed

This research investigates whether the emotional volatility of music—how much listeners' emotional responses vary—can be predicted from acoustic features, and whether this prediction differs from predicting average emotional responses.

While most music emotion recognition focuses on mean responses, this study explores the often-overlooked dimension of emotional variance, revealing which songs evoke consistent emotions versus those producing highly varied listener experiences.

Presentation Video

Watch the complete research presentation explaining methodology, findings, and implications.


Research Hypothesis

H1 — Emotional Volatility is Predictable

Acoustic structure predicts the variance of listeners' emotional responses better than, or at least comparably to, their mean response.

Dataset & Approach

The study used PMEmo (Personalized Music Emotion Dataset).

Because the emotion targets are computed across listeners for each song, the dataset supports analysis of how responses to the same song differ from one listener to another, which makes it well suited to examining emotional variance.

6,373 Initial Acoustic Features
767 Songs Analyzed
4 Target Variables
227 Features for MEAN Targets
322 Features for STD Targets
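
As a rough illustration of the four targets, the sketch below assumes that mean_A, mean_V, std_A, and std_V are the per-song mean and standard deviation of listeners' arousal and valence ratings. The rating matrix, the function name `volatility_targets`, and the use of the sample standard deviation (ddof=1) are assumptions for illustration, not details taken from the study.

```python
import numpy as np

def volatility_targets(ratings):
    """Return (mean, std) per song for a listeners-by-songs rating matrix."""
    ratings = np.asarray(ratings, dtype=float)
    mean = ratings.mean(axis=0)
    # Sample standard deviation across listeners (ddof=1 is an assumption).
    std = ratings.std(axis=0, ddof=1)
    return mean, std

# Hypothetical arousal ratings: rows are listeners, columns are songs.
arousal = np.array([
    [0.2, 0.8, 0.5],
    [0.4, 0.8, 0.1],
    [0.6, 0.8, 0.9],
])
mean_A, std_A = volatility_targets(arousal)
print(mean_A, std_A)
```

In this toy matrix the second song has zero volatility (every listener agrees), while the third has the same kind of mid-range mean as the first but twice the spread.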

Methodology Pipeline

Stage 1: Unsupervised Feature Pruning

Variance Filtering: Removed 279 near-constant features with variance below 1e-6

Collinearity Reduction: Eliminated 2,254 redundant features with correlation exceeding 0.95

Result: Feature space reduced from 6,373 to 3,840 features
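
A minimal sketch of the two pruning steps, assuming a feature matrix with songs as rows and features as columns. The thresholds come from the text above; the function name `prune_features`, the rule of keeping the first feature of each correlated group, and the toy data are assumptions, since the exact procedure is not spelled out here.

```python
import numpy as np

def prune_features(X, var_threshold=1e-6, corr_threshold=0.95):
    """Return indices of columns kept after variance and collinearity filtering."""
    X = np.asarray(X, dtype=float)

    # Step 1: drop near-constant columns.
    kept = np.flatnonzero(X.var(axis=0) >= var_threshold)

    # Step 2: greedy collinearity reduction on the surviving columns.
    # A column is dropped when its absolute Pearson correlation with any
    # previously kept column exceeds the threshold (keep-first is assumed).
    corr = np.abs(np.corrcoef(X[:, kept], rowvar=False))
    selected = []
    for j in range(len(kept)):
        if not selected or corr[j, selected].max() <= corr_threshold:
            selected.append(j)
    return kept[selected]

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X_demo = np.column_stack([
    base[:, 0],                             # informative feature
    2.0 * base[:, 0] + 1e-3 * base[:, 1],   # near-duplicate of column 0
    np.full(200, 5.0),                      # constant feature
    base[:, 2],                             # independent feature
])
kept_columns = prune_features(X_demo)
print(kept_columns)
```

The counts reported above are consistent with this two-step order: 6,373 - 279 - 2,254 = 3,840.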

Stage 2: Mutual Information Ranking

Applied a Pareto-style selection: for each target, features were ranked by mutual information, and those accounting for the top 20% of total mutual information were retained.

⚠️ Critical Finding: Zero intersection among all four targets. No feature was selected for every target, which suggests that predicting the mean and predicting the variance draw on different acoustic information.
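
One way to read the selection step is sketched below: rank features by mutual information and keep the top-ranked ones until they cover 20% of the total, then intersect the selected sets. This is one interpretation of "top 20% of total mutual information", not necessarily the rule used in the study. The function name `pareto_select` and all scores are invented for illustration.

```python
def pareto_select(mi_scores, share=0.20):
    """Keep top-ranked features until they cover `share` of total MI."""
    ranked = sorted(mi_scores.items(), key=lambda kv: kv[1], reverse=True)
    threshold = share * sum(mi_scores.values())
    selected, cumulative = set(), 0.0
    for name, score in ranked:
        if cumulative >= threshold:
            break
        selected.add(name)
        cumulative += score
    return selected

# Hypothetical mutual-information scores per target (not real results).
mi_by_target = {
    "mean_A": {"f1": 0.50, "f2": 0.30, "f3": 0.10, "f4": 0.10},
    "mean_V": {"f1": 0.60, "f2": 0.05, "f3": 0.20, "f4": 0.15},
    "std_A":  {"f1": 0.02, "f2": 0.03, "f3": 0.40, "f4": 0.05},
    "std_V":  {"f1": 0.01, "f2": 0.01, "f3": 0.01, "f4": 0.07},
}
selected = {t: pareto_select(s) for t, s in mi_by_target.items()}

# Features chosen for every target at once.
common_to_all = set.intersection(*selected.values())
# Pairwise overlap can still exist when the four-way intersection is empty.
mean_overlap = selected["mean_A"] & selected["mean_V"]
print(selected, common_to_all, mean_overlap)
```

In this toy case the four-way intersection is empty even though the two mean targets share a feature, so an empty four-way intersection is a weaker statement than pairwise disjointness. Reporting the pairwise overlaps alongside it would make the comparison between mean and variance targets more direct.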

Model Training

Results: Static Feature Dataset

Target | Train R² | Validation R² | Test R² | Interpretation
mean_A | 0.797 | 0.780 | 0.707 | Excellent: mean arousal is highly predictable
mean_V | 0.551 | 0.562 | 0.493 | Moderate: mean valence is predictable but weaker
std_A | 0.234 | 0.039 | 0.220 | Low: arousal variance is difficult to predict
std_V | 0.107 | 0.029 | -0.028 | Unpredictable: valence variance not captured by features
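
When reading these scores, it helps to recall the standard definition R² = 1 - SS_res / SS_tot. A model that always predicts the mean of the evaluated targets scores exactly 0, and a model that does worse than that scores below 0. The sketch below uses made-up numbers and a hypothetical helper `r2` to show both cases.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])           # made-up targets
baseline = r2(y, np.full(4, y.mean()))       # constant prediction at the mean
reversed_fit = r2(y, y[::-1])                # predictions in the wrong order
print(baseline, reversed_fit)
```

On this reading, a test R² of -0.028 for std_V means the model did marginally worse on held-out songs than a constant prediction would have.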

Results: Dynamic Feature Dataset

Given the poor volatility prediction from static features, I tested whether dynamic temporal statistics (computed from time-varying acoustic properties) would better capture emotional variance.

Target | Test R² | Best Alpha | Best L1 Ratio
mean_A | 0.657 | 0.001 | 0.1
mean_V | 0.506 | 0.001 | 0.1
std_A | 0.211 | 0.001 | 0.9
std_V | -0.009 | 0.01 | 0.7
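
The "Best Alpha" and "Best L1 Ratio" columns read like elastic-net hyperparameters. Assuming that is the model family (the text does not name it), a validation-based search might look like the sketch below, which uses scikit-learn's `ElasticNet`. The synthetic data, the grid values, the split sizes, and the standardization step are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a feature matrix and one target (not study data).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
true_coef = np.zeros(20)
true_coef[:3] = [1.5, -2.0, 1.0]
y = X @ true_coef + rng.normal(scale=0.5, size=300)

# Train / validation / test split (60 / 20 / 20, an assumed ratio).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

# Fit the scaler on training data only to avoid leakage.
scaler = StandardScaler().fit(X_train)
X_train, X_val, X_test = [scaler.transform(a) for a in (X_train, X_val, X_test)]

# Pick the hyperparameters with the highest validation R².
best = None
for alpha in (0.001, 0.01, 0.1):
    for l1_ratio in (0.1, 0.5, 0.7, 0.9):
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10_000)
        model.fit(X_train, y_train)
        val_r2 = r2_score(y_val, model.predict(X_val))
        if best is None or val_r2 > best["val_r2"]:
            best = {"alpha": alpha, "l1_ratio": l1_ratio,
                    "val_r2": val_r2, "model": model}

# The test set is touched once, after selection.
test_r2 = r2_score(y_test, best["model"].predict(X_test))
print(best["alpha"], best["l1_ratio"], round(test_r2, 3))
```

Selecting on the validation split and reporting the test score once keeps the test R² an honest estimate of out-of-sample performance.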

Hypothesis Outcome: Not Supported

Dynamic features produced results similar to those from static features (test R² of 0.211 vs. 0.220 for std_A, and -0.009 vs. -0.028 for std_V), indicating that acoustic structure alone does not reliably predict emotional volatility, particularly for valence.

Visualization: Static Feature Results

Predicted vs. actual emotion values for static acoustic features across all four targets.

Visualization: Dynamic Feature Results

Predicted vs. actual emotion values for dynamic temporal features across all four targets.

Key Findings & Implications

1. Acoustic Determinism vs. Listener Factors

Acoustic features predict mean arousal well (test R² = 0.707) and mean valence moderately (test R² = 0.493), but they explain little of the variability in listeners' responses (test R² = 0.220 for std_A and -0.028 for std_V). This suggests that emotional volatility arises largely from listener-specific factors such as personal history, context, mood, and cultural background, rather than from acoustic structure.

2. Distinct Feature Sets for Mean vs. Variance

The feature selection step found no feature shared by all four targets. This zero-intersection result suggests that predicting the average emotional response draws on different acoustic information than predicting its variability.

3. Implications for Music Recommendation Systems

Truly personalized music recommendation systems must account for both acoustic properties and listener-specific contextual factors to understand the full spectrum of emotional response. Systems that only consider acoustic features will miss the interpersonal variance that makes music experiences uniquely personal.

This research demonstrates that some aspects of musical emotion are acoustically determined (mean responses), while others emerge from the interaction between music and individual listeners (emotional variance). Understanding this distinction is crucial for advancing both music emotion recognition and personalized listening experiences.

🎵 Future work will explore incorporating listener metadata, contextual information, and temporal dynamics to better model emotional volatility.