Introduction
Vector databases have revolutionized how we handle high-dimensional data, especially in domains like audio processing, image recognition, and natural language processing. In this blog post, I’ll walk you through my experience building and testing a Qdrant vector database implementation for audio feature extraction and similarity search.
As a music producer and synthesizer enthusiast, I was particularly interested in analyzing Serum 2 - my personal favorite wavetable synthesizer - to extract meaningful features from its wavetables and create a system for finding similar sounds. The project combines Python’s powerful audio analysis libraries (librosa) with Qdrant’s efficient vector storage to create a system that can analyze audio files, extract meaningful features, and find similar audio based on acoustic characteristics.
The Goal: Analyzing Serum 2 Wavetables
Serum 2 is a powerful wavetable synthesizer that allows users to create complex, evolving sounds by manipulating wavetables - essentially arrays of single-cycle waveforms. My aim was to:
- Extract comprehensive audio features from Serum 2 wavetables
- Store these features as vectors in Qdrant for efficient similarity search
- Build a recommendation system that could suggest similar wavetables based on acoustic characteristics
- Understand the harmonic content of different wavetable types for synthesis applications
This would enable producers to find wavetables with similar timbral characteristics, discover new sounds, and understand the harmonic relationships between different wavetable types.
What is Qdrant?
Qdrant is a vector similarity search engine that provides a production-ready service with a convenient API to store, search, and manage points (vectors) with additional payload data. Unlike traditional databases that organize data in rows and columns, vector databases are optimized for storing and querying high-dimensional vectors efficiently.
Key features of Qdrant:
- High Performance: Uses HNSW (Hierarchical Navigable Small World) indexing
- Multiple Distance Metrics: Supports Euclidean Distance, Cosine Similarity, and Dot Product
- Payload Support: Store additional metadata alongside vectors
- Production Ready: Docker deployment and cloud options available
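One practical note on metrics: because the feature vectors we build later are L2-normalized, cosine similarity is the natural choice; for unit-length vectors it reduces to a simple dot product. A quick numpy illustration:

import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])

# Cosine similarity is the dot product of the L2-normalized vectors
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
print(np.dot(a_unit, b_unit))  # 0.6 == cosine similarity of a and b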
Project Architecture
The audio analysis system consists of three main components:
- AudioAnalyzer: Extracts comprehensive audio features using librosa
- QdrantAudioDatabase: Manages vector storage and similarity search
- Main Application: Orchestrates the analysis and storage process
Let’s dive into each component:
Audio Feature Extraction
The AudioAnalyzer class is the heart of our feature extraction system. It uses librosa to extract multiple types of audio features that capture different aspects of the audio signal.
Core Feature Sets
import librosa
import numpy as np

class AudioAnalyzer:
    def __init__(self, sample_rate: int = 22050, n_mels: int = 128,
                 n_mfcc: int = 13, max_harmonics: int = 50):
        self.sample_rate = sample_rate
        self.n_mels = n_mels
        self.n_mfcc = n_mfcc
        self.max_harmonics = max_harmonics
        self._audio_cache = {}  # Cache for efficient processing
The analyzer extracts seven different feature sets:
1. Basic Features
def extract_basic_features(self, audio_path: Union[str, Path]) -> Dict[str, float]:
    y, sr = self.load_audio(audio_path)
    duration = librosa.get_duration(y=y, sr=sr)
    rms = librosa.feature.rms(y=y)[0]
    return {
        "duration": duration,
        "rms_mean": float(np.mean(rms)),
        "rms_std": float(np.std(rms)),
    }
2. Spectral Features
def extract_spectral_features(self, audio_path: Union[str, Path]) -> Dict[str, float]:
    y, sr = self.load_audio(audio_path)
    spectral_centroids = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
    spectral_rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)[0]
    spectral_bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)[0]
    return {
        "spectral_centroid_mean": float(np.mean(spectral_centroids)),
        "spectral_centroid_std": float(np.std(spectral_centroids)),
        "spectral_rolloff_mean": float(np.mean(spectral_rolloff)),
        # ... more spectral features
    }
3. MFCC Features
MFCCs (Mel-Frequency Cepstral Coefficients) are crucial for audio similarity. They capture what makes sounds perceptually distinct (like instruments or voices) while ignoring details we don’t notice (like exact pitch or phase).
That makes them perfect for tasks such as speech recognition, speaker identification, and music similarity search.
MFCC Mean – averaging across time compresses the whole clip into a fixed-length vector. This makes clips of different durations directly comparable and highlights the overall timbre rather than every frame.
MFCC Delta – measuring how MFCCs change across frames captures the dynamics of the sound: how notes attack, decay, or transition. This adds motion information on top of the static timbre.
Together, the mean and delta turn MFCCs into a feature set that describes both what a sound is (its timbre) and how it evolves over time (its dynamics).
def extract_mfcc_features(self, audio_path: Union[str, Path]) -> Dict[str, np.ndarray]:
    y, sr = self.load_audio(audio_path)
    mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=self.n_mfcc)
    mfcc_mean = np.mean(mfccs, axis=1)
    mfcc_delta = librosa.feature.delta(mfccs)
    return {
        "mfcc_mean": mfcc_mean.astype(np.float32),
        "mfcc_std": np.std(mfccs, axis=1).astype(np.float32),
        "mfcc_delta_mean": np.mean(mfcc_delta, axis=1).astype(np.float32),
        # ... more MFCC features
    }
4. Harmonic Analysis
In addition to MFCCs, another powerful way to describe audio is by looking at its fundamental frequency and harmonics. This helps us understand not just the timbre, but the musical character of a sound.
from scipy.fft import fft, fftfreq

def extract_harmonic_features(self, audio_path: Union[str, Path]) -> Dict[str, Union[float, int, str, np.ndarray]]:
    y, sr = self.load_audio(audio_path)
    # Get fundamental frequency using librosa's pitch detection
    f0, voiced_flag, voiced_probs = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7")
    )
    # Average the voiced pitch estimates into a single fundamental
    fundamental_freq = float(np.nanmean(f0)) if np.any(voiced_flag) else 0.0
    # Analyze harmonics using FFT
    fft_result = fft(y)
    freqs = fftfreq(len(y), 1 / sr)
    magnitude = np.abs(fft_result)
    harmonic_data = self._analyze_harmonics(freqs, magnitude, fundamental_freq)
    return {
        "fundamental_frequency": fundamental_freq,
        "total_harmonics": harmonic_data["total_harmonics"],
        "odd_harmonics": harmonic_data["odd_harmonics"],
        "even_harmonics": harmonic_data["even_harmonics"],
        "waveform_type": self._classify_waveform_type(harmonic_data),
        # ... more harmonic features
    }
Fundamental Frequency (f₀)
Using librosa.pyin, we estimate the pitch — the lowest frequency that defines the note being played or sung. This is essential for recognizing melodies or matching sounds at the note level.
Harmonics via FFT
Every real-world sound isn’t just a pure sine wave. Instruments and voices generate overtones — multiples of the fundamental frequency — called harmonics. By applying the Fast Fourier Transform (FFT), we can break the signal into its frequency components and measure how strong each harmonic is.
Why this matters
The balance of harmonics (odd vs. even) is what makes a flute sound different from a violin playing the same note.
Counting harmonics and comparing their strengths lets us classify waveform types (sawtooth, square, triangle, etc.) and better capture timbre.
For similarity tasks, this adds another dimension: two sounds might share MFCCs but differ in harmonic structure.
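To make the classification idea concrete, here is a minimal sketch of how odd/even harmonic counts could map to a waveform label. This is an illustrative heuristic of mine, not the project's actual _classify_waveform_type implementation:

def classify_waveform_type(harmonic_data: dict) -> str:
    """Illustrative heuristic based on odd/even harmonic balance."""
    total = harmonic_data["total_harmonics"]
    odd = harmonic_data["odd_harmonics"]
    even = harmonic_data["even_harmonics"]
    if total <= 1:
        return "sine"  # no meaningful overtones
    if even == 0:
        return "square_or_triangle"  # only odd harmonics present
    if odd > 0 and even > 0:
        return "sawtooth_or_complex"  # full harmonic series
    return "unknown"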
Feature Vector Creation
All features are combined into a normalized vector for similarity search:
def create_feature_vector(self, features: Dict[str, Union[float, int, str, np.ndarray]]) -> np.ndarray:
    vector_components = []
    for key, value in features.items():
        if key == "waveform_type":  # Skip string features
            continue
        if isinstance(value, (int, float)):
            if not np.isnan(value) and not np.isinf(value):
                vector_components.append([float(value)])
        elif isinstance(value, np.ndarray):
            flat_value = value.flatten()
            flat_value = np.where(np.isfinite(flat_value), flat_value, 0.0)
            vector_components.append(flat_value.astype(np.float32))
    # Normalize the feature vector
    feature_vector = np.concatenate(vector_components)
    norm = np.linalg.norm(feature_vector)
    if norm > 1e-8:
        feature_vector = feature_vector / norm
    return feature_vector.astype(np.float32)
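Because of that final normalization step, every stored vector has (approximately) unit L2 norm, which is exactly why cosine distance works so well here. A quick check, given any audio file on disk and the extract_all_features method used later in this post:

analyzer = AudioAnalyzer()
features = analyzer.extract_all_features("Tables/Analog/4088.wav")
vec = analyzer.create_feature_vector(features)
print(vec.shape, np.linalg.norm(vec))  # norm should be ~1.0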
Qdrant Database Integration
The QdrantAudioDatabase class provides a high-level interface for storing and retrieving audio features:
Database Initialization
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

class QdrantAudioDatabase:
    def __init__(self, host: str = "localhost", port: int = 6333,
                 collection_name: str = "audio_features",
                 analyzer: Optional[AudioAnalyzer] = None):
        self.client = QdrantClient(host=host, port=port)
        self.collection_name = collection_name
        self.analyzer = analyzer if analyzer is not None else AudioAnalyzer()
        self.vector_size = self._calculate_vector_size()
Collection Setup
def initialize_collection(self, recreate: bool = False) -> None:
    collection_exists = self.client.collection_exists(self.collection_name)
    if recreate and collection_exists:
        self.client.delete_collection(self.collection_name)
        collection_exists = False
    if not collection_exists:
        self.client.create_collection(
            collection_name=self.collection_name,
            vectors_config=VectorParams(
                size=self.vector_size,
                distance=Distance.COSINE,  # Best for normalized feature vectors
            ),
        )
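Assuming a local Qdrant instance is running on the default port, a quick sanity check after initialization might look like this:

db = QdrantAudioDatabase()
db.initialize_collection()

info = db.client.get_collection("audio_features")
print(info.status, info.points_count)  # e.g. green, 0 for a fresh collection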
Storing Audio Features
import uuid

def store_audio_features(self, audio_path: Union[str, Path],
                         metadata: Optional[Dict] = None,
                         feature_sets: Optional[List[str]] = None) -> str:
    audio_path = Path(audio_path)  # Accept both str and Path inputs
    # Extract audio features
    features = self.analyzer.extract_all_features(audio_path, feature_sets)
    feature_vector = self.analyzer.create_feature_vector(features)
    # Generate unique point ID
    point_id = str(uuid.uuid4())
    # Prepare payload with metadata
    payload = {
        "file_path": str(audio_path),
        "file_name": audio_path.name,
        "analysis_timestamp": str(np.datetime64("now")),
    }
    # Add relevant features to payload for filtering
    self._add_features_to_payload(payload, features)
    if metadata:
        payload.update(metadata)
    # Store in Qdrant
    self.client.upsert(
        collection_name=self.collection_name,
        points=[PointStruct(
            id=point_id,
            vector=feature_vector.tolist(),
            payload=payload,
        )],
    )
    return point_id
Similarity Search
def find_similar_audio(self, audio_path: Union[str, Path],
                       limit: int = 5,
                       score_threshold: Optional[float] = None) -> List[Dict]:
    # Extract features from query audio
    features = self.analyzer.extract_all_features(audio_path)
    query_vector = self.analyzer.create_feature_vector(features)
    # Search for similar vectors
    search_results = self.client.search(
        collection_name=self.collection_name,
        query_vector=query_vector.tolist(),
        limit=limit,
        with_payload=True,
        score_threshold=score_threshold,
    )
    # Format results
    results = []
    for result in search_results:
        result_data = {
            "id": result.id,
            "score": result.score,
            "file_path": result.payload.get("file_path"),
            "file_name": result.payload.get("file_name"),
        }
        result_data.update(result.payload)
        results.append(result_data)
    return results
Usage Examples
Command Line Interface
The main application provides a comprehensive CLI for processing audio files:
# Process all audio files in a directory
python main.py --directory /path/to/audio/files
# Generate sample files for testing
python main.py --generate-samples
# Process directory and show similarity results
python main.py --directory Tables/ --similarity-search
# Extract only specific features
python main.py --directory Tables/ --features basic spectral harmonic
Programmatic Usage
from audio_analyzer import AudioAnalyzer
from qdrant_database import QdrantAudioDatabase

# Initialize components
analyzer = AudioAnalyzer(sample_rate=22050, n_mels=128, n_mfcc=13)
audio_db = QdrantAudioDatabase(
    host="localhost",
    port=6333,
    collection_name="audio_features",
    analyzer=analyzer,
)

# Initialize collection
audio_db.initialize_collection(recreate=True)

# Store audio features
point_id = audio_db.store_audio_features(
    "path/to/audio.wav",
    metadata={"genre": "electronic", "instrument": "synthesizer"},
)

# Find similar audio
similar_audio = audio_db.find_similar_audio("query_audio.wav", limit=5)
for result in similar_audio:
    print(f"File: {result['file_name']}")
    print(f"Similarity Score: {result['score']:.4f}")
    print(f"Tempo: {result['tempo']:.1f} BPM")
    print(f"Waveform Type: {result['waveform_type']}")
    print(f"Fundamental Freq: {result['fundamental_frequency']:.1f} Hz")
Serum 2 Wavetable Analysis
Real Wavetable Analysis
One of the most exciting aspects of this project was analyzing actual Serum 2 wavetables. I processed wavetables from Serum’s built-in library, including the “4088” wavetable from the Analog category. Here’s what the analysis revealed:
================================================================================
SERUM WAVETABLE ANALYSIS REPORT
================================================================================
File: Tables/Analog/4088.wav
Analysis Date: 2025-08-03T21:26:46
SUMMARY STATISTICS:
Total Harmonics: 31
Odd Harmonics: 8
Even Harmonics: 23
Odd Content Ratio: 0.699
Even Content Ratio: 0.301
Fundamental Frequency: 43.07 Hz
HARMONIC ANALYSIS:
Harm   Freq (Hz)   Magnitude   Phase (rad)   Type
1 43.07 204.979 2.506 odd
3 129.20 175.082 2.593 odd
6 258.40 46.229 -3.113 even
9 387.60 33.811 -2.383 odd
14 581.40 18.030 1.817 even
18 753.66 13.658 2.877 even
22 925.93 12.952 -2.308 even
24 1055.13 9.457 -1.549 even
28 1184.33 15.732 -0.705 even
30 1270.46 4.255 -0.024 even
32 1356.59 3.945 0.479 even
34 1442.72 10.751 0.948 even
37 1593.46 5.442 -1.436 odd
40 1722.66 5.774 -0.585 even
44 1894.92 4.624 0.473 even
48 2067.19 4.127 1.543 even
52 2239.45 4.508 2.671 even
55 2368.65 3.777 -2.926 odd
58 2497.85 6.147 -2.035 even
64 2777.78 4.983 2.756 even
68 2906.98 2.904 -2.830 even
70 3036.18 2.842 -1.942 even
74 3208.45 2.415 -0.903 even
79 3402.25 2.243 -3.095 odd
83 3574.51 2.369 -1.883 odd
86 3682.18 2.334 1.947 even
89 3832.91 3.406 -0.287 odd
96 4112.84 3.127 -1.915 even
118 5103.37 2.120 -2.562 even
120 5167.97 2.475 1.184 even
126 5426.37 2.242 2.967 even
## REVERSE ENGINEERING NOTES:
• This wavetable has mixed harmonic content
• Consider using custom wavetable or complex waveform
• Low fundamental frequency - good for bass sounds
================================================================================
Key Insights from Serum 2 Analysis
The analysis of the 4088 wavetable revealed fascinating characteristics:
- Mixed Harmonic Content: The wavetable contains both odd and even harmonics, making it suitable for complex, evolving sounds
- Low Fundamental Frequency: At 43.07 Hz, this wavetable is perfect for bass sounds
- Rich Harmonic Spectrum: 31 harmonics detected, indicating a complex timbre
- Synthesis Formula: The analysis provided an exact mathematical formula for recreating the wavetable using additive synthesis
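To make that last point concrete, here is a minimal additive-synthesis sketch (my illustration, not the report's exact formula) that rebuilds a rough approximation of the wavetable from the three strongest partials in the table above:

import numpy as np
import soundfile as sf

sample_rate = 22050
duration = 1.0
t = np.linspace(0, duration, int(sample_rate * duration), False)

f0 = 43.07  # fundamental frequency from the report
# (harmonic number, magnitude, phase) for the three strongest partials above
partials = [(1, 204.979, 2.506), (3, 175.082, 2.593), (6, 46.229, -3.113)]

signal = np.zeros_like(t)
for n, mag, phase in partials:
    signal += mag * np.sin(2 * np.pi * f0 * n * t + phase)
signal /= np.max(np.abs(signal))  # normalize to [-1, 1]

sf.write("4088_approx.wav", signal, sample_rate)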
This level of analysis would be incredibly valuable for:
- Sound Design: Understanding the harmonic makeup of favorite wavetables
- Wavetable Creation: Reverse-engineering complex sounds
- Sound Matching: Finding wavetables with similar harmonic characteristics
- Educational Purposes: Learning about harmonic relationships in synthesis
Practical Applications for Serum 2 Users
The system I built could revolutionize how producers work with Serum 2:
1. Intelligent Wavetable Search
# Find wavetables similar to your favorite bass sound
similar_wavetables = audio_db.find_similar_audio(
    "Tables/Analog/4088.wav",
    limit=10,
    filter_conditions={"category": "Analog"},  # Only search within the Analog category
)
for result in similar_wavetables:
    print(f"Similar wavetable: {result['file_name']}")
    print(f"Similarity: {result['score']:.3f}")
    print(f"Fundamental: {result['fundamental_frequency']:.1f} Hz")
    print(f"Harmonic richness: {result['harmonic_richness']:.3f}")
2. Harmonic Content Analysis
The system provides detailed insights into wavetable characteristics:
- Odd vs Even Harmonics: Understanding the timbral character
- Fundamental Frequency: Determining the best pitch range
- Harmonic Richness: Measuring complexity and brightness
- Spectral Characteristics: Brightness, warmth, and bandwidth analysis
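The post's code doesn't show how harmonic_richness is computed; one plausible definition (my assumption, not the project's implementation) is the share of spectral energy carried by overtones above the fundamental:

import numpy as np

def harmonic_richness(magnitudes: np.ndarray) -> float:
    """Hypothetical metric: fraction of harmonic energy above the fundamental.

    `magnitudes` holds the FFT magnitudes at the detected harmonics, with the
    fundamental first. Returns 0.0 for a pure sine and approaches 1.0 for
    spectra dominated by overtones.
    """
    energy = magnitudes.astype(np.float64) ** 2
    total = energy.sum()
    return float(energy[1:].sum() / total) if total > 0 else 0.0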
3. Wavetable Recommendation Engine
# Find wavetables suitable for bass sounds
bass_wavetables = audio_db.filter_audio_by_metadata({
    "fundamental_frequency": {"$lt": 100},  # Low fundamental frequency
    "harmonic_richness": {"$gt": 0.5},      # Rich harmonic content
})
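Under the hood, a wrapper like this would translate those conditions into Qdrant's native filter objects. A sketch of the equivalent raw query, assuming the payload fields shown earlier and the audio_db instance from above:

from qdrant_client.models import FieldCondition, Filter, Range

bass_filter = Filter(must=[
    FieldCondition(key="fundamental_frequency", range=Range(lt=100)),
    FieldCondition(key="harmonic_richness", range=Range(gt=0.5)),
])
points, _next_offset = audio_db.client.scroll(
    collection_name="audio_features",
    scroll_filter=bass_filter,
    with_payload=True,
)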
4. Educational Tool for Synthesis
The analysis reports help producers understand:
- How different wavetables create different timbres
- The relationship between harmonic content and perceived sound
- How to choose wavetables for specific musical applications
Testing and Results
Sample Audio Generation
I created synthetic audio files to test the system:
import numpy as np
import soundfile as sf

def create_sample_audio_file(output_path: Union[str, Path],
                             waveform_type: str = "sine",
                             frequency: float = 440.0,
                             duration: float = 3.0,
                             sample_rate: int = 22050,
                             amplitude: float = 0.5) -> Path:
    output_path = Path(output_path)
    t = np.linspace(0, duration, int(sample_rate * duration), False)
    if waveform_type.lower() == "sine":
        audio_data = amplitude * np.sin(2 * np.pi * frequency * t)
    elif waveform_type.lower() == "square":
        audio_data = amplitude * np.sign(np.sin(2 * np.pi * frequency * t))
        # Add odd harmonics for more realistic sound
        for n in range(3, 20, 2):
            audio_data += (amplitude / n) * np.sin(2 * np.pi * frequency * n * t)
    # ... more waveform types
    sf.write(output_path, audio_data, sample_rate)
    return output_path
Performance Testing
The system successfully processed various audio files and demonstrated:
- Feature Extraction: Comprehensive analysis of audio characteristics
- Vector Similarity: Accurate similarity matching based on acoustic features
- Scalability: Efficient processing of multiple audio files
- Metadata Storage: Rich payload data for filtering and display
Sample Results
When testing with synthetic waveforms, the system correctly identified:
- Sine waves: Low harmonic content, clean spectral characteristics
- Square waves: Odd harmonics dominant, characteristic timbre
- Sawtooth waves: Rich harmonic content, bright spectral signature
- Triangle waves: Odd harmonics with decreasing amplitude
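Putting it all together, a minimal end-to-end test under the assumptions above (a local Qdrant instance plus the create_sample_audio_file helper and audio_db instance from earlier) might look like this:

# Generate two synthetic waveforms, index them, and query with a third
create_sample_audio_file("sine_440.wav", "sine", 440.0)
create_sample_audio_file("square_440.wav", "square", 440.0)

audio_db.initialize_collection(recreate=True)
audio_db.store_audio_features("sine_440.wav", metadata={"type": "sine"})
audio_db.store_audio_features("square_440.wav", metadata={"type": "square"})

# A 442 Hz sine query should rank the sine sample above the square one
create_sample_audio_file("query_sine.wav", "sine", 442.0)
for result in audio_db.find_similar_audio("query_sine.wav", limit=2):
    print(result["file_name"], round(result["score"], 3))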
Conclusion
Key Takeaways
- Vector databases excel at similarity search for high-dimensional audio data
- Feature engineering is crucial for meaningful similarity matching in synthesis applications
- Qdrant’s payload system enables rich metadata storage and filtering for audio libraries
- Proper normalization and error handling are essential for production audio analysis systems
- Serum 2 wavetables contain rich harmonic information that can be systematically analyzed and categorized
The system successfully demonstrates how vector databases can be applied to synthesizer analysis, opening possibilities for:
- Music production tools that suggest similar sounds
- Educational applications for learning synthesis
- Sound design workflows that leverage harmonic analysis
- Wavetable libraries with intelligent search capabilities
This project bridges the gap between music production and data science, showing how modern vector database technology can enhance creative workflows in music production.