Frequency Scales
The library supports several frequency scales, each suited to different applications.
Linear Scale
Standard FFT frequency bins, equally spaced in Hz.
Use cases:
- General audio analysis
- Scientific measurements
- Transient detection
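```python
import spectrograms as sg

# `stft` holds the STFT configuration (including `n_fft`) and `samples` is a
# mono signal sampled at 16 kHz; both are defined elsewhere.
params = sg.SpectrogramParams(stft, sample_rate=16000)
spec = sg.compute_linear_power_spectrogram(samples, params)
```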
Frequency spacing between adjacent bins: `sample_rate / n_fft` Hz (for example, 16000 / 512 = 31.25 Hz when `n_fft = 512`).
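To make the spacing concrete, the plain-Python sketch below lists the center frequency of every non-negative FFT bin. `linear_bin_frequencies` is a hypothetical helper (not part of the library), and `n_fft = 512` is an assumed FFT size.

```python
def linear_bin_frequencies(sample_rate: float, n_fft: int) -> list[float]:
    """Center frequency (Hz) of each non-negative FFT bin, k = 0 .. n_fft // 2."""
    spacing = sample_rate / n_fft  # Hz between adjacent bins
    return [k * spacing for k in range(n_fft // 2 + 1)]


freqs = linear_bin_frequencies(16000, 512)
print(len(freqs), freqs[1], freqs[-1])  # 257 31.25 8000.0
```

The last bin always lands on the Nyquist frequency (half the sample rate), which is why a real-valued signal yields `n_fft // 2 + 1` bins.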
Log Frequency Scale
Frequency bins spaced logarithmically (a constant ratio between neighboring bins), so each octave receives the same number of bins.
```python
import spectrograms as sg

# Same `stft` and `samples` as in the linear example
params = sg.SpectrogramParams(stft, sample_rate=16000)
spec = sg.compute_log_power_spectrogram(samples, params)
```
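A log-frequency axis places bins at a constant ratio rather than a constant difference. The sketch below generates such an axis in plain Python; `log_spaced_frequencies` is a hypothetical helper, and the library's own mapping may differ in how it chooses bin edges or interpolates from the linear FFT bins.

```python
def log_spaced_frequencies(f_min: float, f_max: float, n_bins: int) -> list[float]:
    """n_bins frequencies from f_min to f_max with a constant ratio between neighbors."""
    if f_min <= 0 or f_max <= f_min or n_bins < 2:
        raise ValueError("need 0 < f_min < f_max and n_bins >= 2")
    ratio = (f_max / f_min) ** (1.0 / (n_bins - 1))
    return [f_min * ratio**i for i in range(n_bins)]


# 64 bins from 50 Hz to 8 kHz; every step multiplies the frequency by the same factor
freqs = log_spaced_frequencies(50.0, 8000.0, 64)
```

Because the ratio is fixed, low frequencies get closely spaced bins and high frequencies get widely spaced ones, the opposite trade-off to the linear scale.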
Mel Scale
Perceptually motivated scale based on human pitch perception.
Use cases:
- Speech recognition
- Audio classification
- Music information retrieval
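```python
# Reuses `samples` and `params` (16 kHz) from the examples above
mel_params = sg.MelParams(
    n_mels=80,
    f_min=0.0,
    f_max=8000.0,  # Nyquist frequency for 16 kHz audio
)
mel_spec = sg.compute_mel_power_spectrogram(samples, params, mel_params)
```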
Properties:
- Approximately linear below 1000 Hz
- Logarithmic above 1000 Hz
- Equal steps in mel correspond roughly to equal steps in perceived pitch
Mel-Hertz conversion:
$$ m = 2595 \log_{10}\left(1 + \frac{f}{700}\right) $$
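The formula and its inverse translate directly into code. The sketch below also places `n_mels + 2` band edges equally spaced in mel between `f_min` and `f_max`, the usual construction for a triangular (HTK-style) mel filterbank. The helper names are hypothetical, and whether `sg.MelParams` uses exactly this formula (rather than, for example, the Slaney variant some libraries default to) is an assumption.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Hz -> mel using m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel: f = 700 * (10 ** (m / 2595) - 1)."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_mels: int, f_min: float, f_max: float) -> list[float]:
    """n_mels + 2 frequencies (Hz) equally spaced on the mel scale.

    In a triangular filterbank, filter i rises from edges[i] to a peak at
    edges[i + 1] and falls back to zero at edges[i + 2].
    """
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_mels + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_mels + 2)]


edges = mel_band_edges(80, 0.0, 8000.0)
print(round(hz_to_mel(1000.0), 1))  # 1000.0
```

Equal steps in mel become wider steps in Hz as frequency rises, so the upper mel filters cover far more Hz than the lower ones.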
ERB Scale
The Equivalent Rectangular Bandwidth (ERB) scale models the bandwidths of the ear's auditory filters.
Use cases:
- Psychoacoustic modeling
- Hearing research
- Perceptual audio coding
```python
# Reuses `samples` and `params` (16 kHz) from the examples above
erb_params = sg.ErbParams(
    n_filters=32,
    f_min=50.0,
    f_max=8000.0,
)
erb_spec = sg.compute_erb_power_spectrogram(samples, params, erb_params)
```
Properties:
- Based on critical-band theory
- Derived from measurements of auditory filter shapes, so it is often regarded as a closer model of auditory perception than mel
- Filter bandwidth increases with center frequency
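A common quantitative form of the ERB scale is Glasberg and Moore's approximation: the bandwidth of the filter centered at $f$ Hz is $24.7\,(4.37 f / 1000 + 1)$ Hz, and the ERB-rate (number of ERBs below $f$) is $21.4 \log_{10}(1 + 0.00437 f)$. The sketch below assumes `sg.ErbParams` spaces its filters uniformly on this ERB-rate scale; that assumption and the helper names are illustrative, not taken from the library.

```python
import math


def erb_bandwidth(f_hz: float) -> float:
    """Glasberg & Moore ERB (Hz) of the auditory filter centered at f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_rate(f_hz: float) -> float:
    """ERB-rate: the number of ERBs below f_hz."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(e: float) -> float:
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def erb_center_frequencies(n_filters: int, f_min: float, f_max: float) -> list[float]:
    """n_filters center frequencies equally spaced on the ERB-rate scale."""
    e_lo, e_hi = hz_to_erb_rate(f_min), hz_to_erb_rate(f_max)
    step = (e_hi - e_lo) / (n_filters - 1)
    return [erb_rate_to_hz(e_lo + i * step) for i in range(n_filters)]


centers = erb_center_frequencies(32, 50.0, 8000.0)
bandwidths = [erb_bandwidth(f) for f in centers]
print(round(erb_bandwidth(1000.0), 1))  # 132.6
```

The computed `bandwidths` grow with center frequency, matching the last property listed above.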
Constant-Q Transform
Logarithmically spaced frequencies with a constant Q factor (center frequency divided by bandwidth).
Use cases:
- Music analysis
- Pitch detection
- Harmonic analysis
```python
cqt_params = sg.CqtParams(
    bins_per_octave=12,  # one bin per semitone
    n_octaves=7,
    f_min=55.0,  # A1
)
# 22050 is the sample rate of `samples` in Hz
cqt = sg.compute_cqt(samples, 22050, cqt_params, hop_size=512)
```
Properties:
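- Bin spacing matches musical intervals (with 12 bins per octave, one bin per semitone)
- Q, the ratio of center frequency to bandwidth, is the same for all bins
- Variable time resolution: higher-frequency bins use shorter windows, so they have better time resolution (and coarser resolution in Hz)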
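The relationship between bin frequency, Q, and window length can be written down directly. In the classic constant-Q formulation, bin $k$ sits at $f_k = f_{\min} \cdot 2^{k/B}$ for $B$ bins per octave, its bandwidth equals the gap to the next bin, which gives $Q = 1 / (2^{1/B} - 1)$, and its analysis window spans $N_k = \lceil Q \cdot f_s / f_k \rceil$ samples. The sketch below is plain Python for illustration; the helper names are hypothetical, and `sg.compute_cqt` may choose bandwidths or window lengths differently.

```python
import math


def cqt_frequencies(f_min: float, bins_per_octave: int, n_octaves: int) -> list[float]:
    """Center frequencies f_k = f_min * 2 ** (k / bins_per_octave)."""
    n_bins = bins_per_octave * n_octaves
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_q(bins_per_octave: int) -> float:
    """Q = f_k / bandwidth_k when the bandwidth equals the gap to the next bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def cqt_window_lengths(freqs: list[float], q: float, sample_rate: float) -> list[int]:
    """Window length in samples per bin: N_k = ceil(Q * sample_rate / f_k)."""
    return [math.ceil(q * sample_rate / f) for f in freqs]


freqs = cqt_frequencies(55.0, 12, 7)  # A1 upwards, 84 semitone bins
q = cqt_q(12)
lengths = cqt_window_lengths(freqs, q, 22050)
print(len(freqs), freqs[12], round(q, 2))  # 84 110.0 16.82
```

`lengths` shrinks as frequency rises, which is the source of the better time resolution at high frequencies noted above.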
Scale Comparison

| Scale | Spacing | Best For | Common Uses |
|---|---|---|---|
| Linear | Equal Hz | General analysis | FFT, physics |
| Mel | Perceptual | Speech/audio ML | ASR, ML models |
| ERB | Auditory filters | Psychoacoustics | Hearing models |
| CQT | Logarithmic | Musical pitch | Music analysis |
Choosing a Scale
Linear: when uniform resolution in Hz matters more than perceptual modeling

```python
# Example: detecting specific frequencies
linear_spec = sg.compute_linear_power_spectrogram(samples, params)
```
Mel: for machine learning on speech or general audio

```python
# Example: speech recognition features
# `db_params` holds the dB-conversion settings and is defined elsewhere
mel_params = sg.MelParams(n_mels=40, f_min=80.0, f_max=8000.0)
mel_spec = sg.compute_mel_db_spectrogram(samples, params, mel_params, db_params)
```
ERB: for perceptual modeling or psychoacoustic research

```python
# Example: perceptual loudness
# f_max must not exceed the Nyquist frequency (8000 Hz for the 16 kHz `params`)
erb_params = sg.ErbParams(n_filters=32, f_min=50.0, f_max=8000.0)
erb_spec = sg.compute_erb_magnitude_spectrogram(samples, params, erb_params)
```
CQT: for musical analysis or pitch-based applications

```python
# Example: music transcription, with 3 bins per semitone for finer pitch detail
# `samples` here is sampled at 44100 Hz
cqt_params = sg.CqtParams(bins_per_octave=36, n_octaves=7, f_min=55.0)
cqt = sg.compute_cqt(samples, 44100, cqt_params, hop_size=512)
```
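Whichever scale you choose, its highest analyzed frequency must stay at or below the Nyquist frequency (half the sample rate); filters above that limit describe content the signal cannot contain. A quick check like the one below catches mistakes such as a 16 kHz upper edge on 16 kHz audio. Both helpers are hypothetical plain Python, and the library may or may not perform its own validation.

```python
def within_nyquist(f_top: float, sample_rate: float) -> bool:
    """True if f_top (Hz) is at or below the Nyquist frequency, sample_rate / 2."""
    return f_top <= sample_rate / 2.0


def cqt_top_frequency(f_min: float, bins_per_octave: int, n_octaves: int) -> float:
    """Center frequency of the highest CQT bin, k = bins_per_octave * n_octaves - 1."""
    return f_min * 2.0 ** ((bins_per_octave * n_octaves - 1) / bins_per_octave)


print(within_nyquist(8000.0, 16000))   # True: an 8 kHz upper edge fits 16 kHz audio
print(within_nyquist(16000.0, 16000))  # False: 16 kHz is above the 8 kHz Nyquist limit
print(within_nyquist(cqt_top_frequency(55.0, 36, 7), 44100))  # True: top bin is about 6.9 kHz
```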