- Probabilistic Latent Component Analysis (PLCA)
- Nonnegative Matrix/Tensor Factorization (NMF/NTF)
- Probabilistic compositional models for audio mixtures
- Source Separation
- Audio Mixture Analysis
- Acoustic Source Localization and Tracking
- Directional Bayesian Filters
Machine learning, signal processing, and audio interact in interesting ways. I've found that the best way to understand those interactions is through visualizations, keeping notes on various models and mathematical techniques, and just coding things up. I post some of my efforts here in the hopes that they might serve others in their own (human) learning.
Fitting a mixture of Gaussians with the EM algorithm
This MATLAB function fits a mixture of Gaussians (MoG) in 2 or 3 dimensions with the Expectation-Maximization (EM) algorithm and then plays back a movie of the fitting process. This pdf shows a few snapshots from one such video (fix the view to a single page and step through them).
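For readers who want to see the E-step and M-step spelled out, here is a minimal sketch in plain Python. It is not the MATLAB function above: it works in one dimension rather than 2 or 3, skips the movie, and the names (`em_mog_1d`, `gaussian_pdf`) and the small variance floor are my own assumptions for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_mog_1d(data, means, variances, weights, n_iters=50):
    """Fit a 1-D mixture of Gaussians with EM.

    Returns the fitted means, variances, mixing weights, and the
    log-likelihood recorded at the start of each iteration.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    history = []
    for _ in range(n_iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        loglik = 0.0
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        history.append(loglik)
        # M-step: re-estimate parameters from the weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor (an assumption) to avoid collapse
    return means, variances, weights, history


# Synthetic data from two well-separated Gaussians.
random.seed(0)
data = [random.gauss(-2.0, 0.5) for _ in range(200)]
data += [random.gauss(3.0, 1.0) for _ in range(200)]
means, variances, weights, history = em_mog_1d(data, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
```

The recorded log-likelihood should never decrease from one iteration to the next, which is a handy sanity check when coding up EM.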
Green colormap for spectrograms
This colormap goes from black to green to cyan. It's sub-optimal for visualizing spectrograms, but it sure looks neat.
3D bar plot with arbitrary coloring
At times, it is helpful to visualize a matrix as a 3D bar plot. Here's an example for the low-frequency portion of a speech spectrogram.
In a moment of artistic inspiration, I decided to make something that was red, green, and blue and spun around, and this happened. Thanks to undersampling, it's fraught with visual illusions. (Call it without input arguments.)
This write-up briefly describes maximum likelihood and maximum a posteriori estimation and the Expectation-Maximization algorithm (in simplified terms) and derives the update rules for several mixture models.
This write-up goes over the basics for derivatives of the trace of a matrix-valued function. Some matrix derivatives look nasty, but they can often be broken down into smaller, easier problems.
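As a small taste of the "break it down" idea, here are a few standard trace-derivative identities (these are textbook results, not necessarily the ones worked through in the write-up). The quadratic case follows from the two linear ones by applying the product rule: differentiate with respect to each occurrence of X while holding the other fixed, then add the results.

```latex
\frac{\partial}{\partial X}\operatorname{tr}(AX) = A^\top,
\qquad
\frac{\partial}{\partial X}\operatorname{tr}(X^\top B) = B,
\qquad
\frac{\partial}{\partial X}\operatorname{tr}(AXB) = A^\top B^\top

% Product rule on tr(X^T A X): hold one X fixed at a time.
\frac{\partial}{\partial X}\operatorname{tr}(X^\top A X)
  = \underbrace{AX}_{\text{from } X^\top}
  + \underbrace{A^\top X}_{\text{from } X}
  = (A + A^\top)\,X
```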
Least-Squares Line Intersection
Triangulation is an important step in localization algorithms that combine spatial cues from multiple sensors. This write-up walks through some of the math involved when (1) sensor positions and their source direction estimates are known and (2) only source direction estimates are known.
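Case (1) can be sketched in a few lines of Python for the 2D setting. This is only an illustration under my own assumptions: each sensor defines an infinite line (not a ray) through its position along its estimated bearing, and we pick the point minimizing the sum of squared perpendicular distances to those lines. The function name `nearest_point_to_lines` is hypothetical.

```python
import math


def nearest_point_to_lines(points, angles):
    """Least-squares intersection of 2D lines.

    points: list of (x, y) sensor positions
    angles: bearing of each line in radians
    Solves (sum_i M_i) x = sum_i M_i p_i, where M_i = I - d_i d_i^T
    projects onto the direction perpendicular to line i.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), theta in zip(points, angles):
        dx, dy = math.cos(theta), math.sin(theta)
        # Entries of the symmetric projector M = I - d d^T.
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * px + m12 * py
        b2 += m12 * px + m22 * py
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel; no unique intersection")
    # Closed-form solution of the 2x2 symmetric system.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Two sensors that both point at the source located at (2, 2).
sensors = [(0.0, 0.0), (4.0, 0.0)]
bearings = [math.atan2(2.0, 2.0), math.atan2(2.0, -2.0)]
estimate = nearest_point_to_lines(sensors, bearings)
```

With noisy bearings from three or more sensors the lines no longer meet at a single point, and the same linear system returns the least-squares compromise.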
Dynamic Bayesian Networks
I cooked up this presentation for my research group while learning about the Kalman Filter and its extensions, various problems associated with the Hidden Markov Model (HMM), Factorial HMM, Switching HMM, Linear Dynamical System, etc. It's a decent overview of DBNs and exact/approximate inference and learning within a probabilistic treatment of time series data.
Directional Statistics Slides
I gave a guest lecture in one of the classes taught by my advisor, Paris Smaragdis. These slides look at the difference between Euclidean and directional feature spaces and give some examples of source separation/tracking algorithms tailored especially for the latter.
Wrapped Kalman Filter
This function implements the WKF algorithm described in the Signal Processing Letter "A Wrapped Kalman Filter for Azimuthal Speaker Tracking" (Traa, Smaragdis). It is a deterministic recursive filter that tracks a quantity evolving on the unit circle (e.g. a speaker's direction-of-arrival). The basic idea is to replace the Gaussian distributions underlying the well-known Kalman filter with Wrapped Gaussians to deal with wrapping issues at the ±π boundary.
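To make the Wrapped Gaussian concrete, here is a minimal Python sketch of its density. This is not the WKF itself, only the building block: a Gaussian on the line is "wrapped" onto the circle by summing copies shifted by multiples of 2π. Truncating the sum to a few terms (`n_wraps`) is an assumption that works well when the variance is small, and the function names are my own.

```python
import math


def wrap_to_pi(theta):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def wrapped_gaussian_pdf(theta, mean, var, n_wraps=3):
    """Wrapped Gaussian density on the circle, truncated to 2*n_wraps + 1 terms."""
    total = 0.0
    for l in range(-n_wraps, n_wraps + 1):
        # Each term is an ordinary Gaussian shifted by a multiple of 2*pi.
        diff = theta - mean + 2.0 * math.pi * l
        total += math.exp(-0.5 * diff * diff / var)
    return total / math.sqrt(2.0 * math.pi * var)


# A mean just below +pi and an observation just above -pi are close on the circle.
mean = math.pi - 0.05
theta = -math.pi + 0.05
density_wrapped = wrapped_gaussian_pdf(theta, mean, 0.04)
density_plain = wrapped_gaussian_pdf(theta, mean, 0.04, n_wraps=0)  # ordinary Gaussian
```

In this example the ordinary Gaussian treats the two angles as almost 2π apart and assigns the observation negligible density, while the wrapped version recognizes that they are only 0.1 radians apart.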
Blind Source Separation and Tracking
The contents of this .zip folder demonstrate a Blind Source Separation (BSS) algorithm described in "Multichannel Source Separation and Tracking with RANSAC and Directional Statistics" (Traa, Smaragdis) that uses random sampling, directional statistics, and EM to separate (possibly moving) speakers from a multichannel recording. It first applies a Random Sample Consensus (RANSAC) algorithm to quickly determine the Directions-of-Arrival (DOA) of multiple speakers positioned around a microphone array. Parameters specific to each frequency band are then tuned with an EM procedure designed to fit a Mixture of Wrapped Gaussians (MoWG). The posterior probabilities from EM are used to construct time-frequency (TF) masks and separate the sources. This is extended to the case of moving sources by tracking their DOAs with a Factorial Wrapped Kalman Filter (FWKF).
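The final masking step is simple enough to sketch in plain Python. This is only an illustration of soft TF masking in general, not the code in the .zip folder: it assumes the posteriors are already available as one [frequency][time] grid per source, and the function name `apply_soft_masks` and the toy numbers are hypothetical.

```python
def apply_soft_masks(mixture_tf, posteriors):
    """Scale each TF bin of the mixture by each source's posterior probability.

    mixture_tf: [freq][time] nested list of complex STFT coefficients
    posteriors: one [freq][time] nested list of probabilities per source
    Returns one masked [freq][time] grid per source.
    """
    return [
        [
            [p * x for p, x in zip(p_row, x_row)]
            for p_row, x_row in zip(post, mixture_tf)
        ]
        for post in posteriors
    ]


# Toy example: one frequency band, two time frames, two sources.
mixture = [[1 + 1j, 2 + 0j]]
posteriors = [
    [[0.75, 0.5]],  # source 0
    [[0.25, 0.5]],  # source 1
]
separated = apply_soft_masks(mixture, posteriors)
```

Because the posteriors sum to one in every TF bin, the separated estimates add back up to the original mixture.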