Frequency Scales

The library supports multiple frequency scales, each suited for different applications.

Linear Scale

Standard FFT frequency bins, equally spaced in Hz.

Use cases:

General audio analysis
Scientific measurements
Transient detection

import spectrograms as sg

# Assumes `samples` (1-D audio array) and `stft` (STFT settings) are already defined
params = sg.SpectrogramParams(stft, sample_rate=16000)
spec = sg.compute_linear_power_spectrogram(samples, params)

Frequency spacing: sample_rate / n_fft Hz per bin (for example, 16000 / 512 = 31.25 Hz)
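The spacing rule can be checked with plain NumPy, independently of this library. The sketch below is illustrative only: `n_fft = 512` is an assumed FFT size, not a library default.

```python
import numpy as np

sample_rate = 16000  # Hz, matching the example above
n_fft = 512          # assumed FFT size, chosen for illustration

# Center frequency in Hz of each non-negative FFT bin
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)

# Adjacent bins are separated by sample_rate / n_fft (about 31.25 Hz here)
bin_spacing = freqs[1] - freqs[0]

# A real-input FFT yields n_fft // 2 + 1 bins, from 0 Hz up to Nyquist
n_bins = len(freqs)
nyquist = freqs[-1]

print(bin_spacing, n_bins, nyquist)
```

Doubling `n_fft` halves the bin spacing, at the cost of a longer analysis window.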

Log Frequency Scale

Frequency bins spaced logarithmically, so adjacent bins differ by a constant ratio rather than a constant number of Hz.

import spectrograms as sg

params = sg.SpectrogramParams(stft, sample_rate=16000)
spec = sg.compute_log_power_spectrogram(samples, params)
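To illustrate what logarithmic spacing means, here is a minimal NumPy sketch. The frequency range and bin count are assumptions chosen for the example, and the library may place its log-scale bins differently.

```python
import numpy as np

f_min, f_max = 50.0, 8000.0  # assumed frequency range in Hz
n_bins = 64                  # assumed number of output bins

# Geometrically spaced center frequencies between f_min and f_max
log_freqs = np.geomspace(f_min, f_max, num=n_bins)

# On a log scale the ratio between neighbours is constant,
# while the difference in Hz grows with frequency
ratios = log_freqs[1:] / log_freqs[:-1]
steps_hz = np.diff(log_freqs)

print(ratios[0], ratios[-1])      # the same ratio at both ends
print(steps_hz[0], steps_hz[-1])  # a small step at the bottom, a large one at the top
```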

Mel Scale

Perceptually motivated scale based on human pitch perception.

Use cases:

Speech recognition
Audio classification
Music information retrieval

mel_params = sg.MelParams(
    n_mels=80,
    f_min=0.0,
    f_max=8000.0
)
mel_spec = sg.compute_mel_power_spectrogram(samples, params, mel_params)

Properties:

Approximately linear below 1000 Hz
Approximately logarithmic above 1000 Hz
Models human pitch discrimination

Mel-Hertz conversion:

$$ m = 2595 \log_{10}\left(1 + \frac{f}{700}\right) $$
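The conversion above and its inverse can be sketched in a few lines of Python. The helper names `hz_to_mel` and `mel_to_hz` are hypothetical and are not part of this library's API. The band-center calculation at the end is one common way to lay out a Mel filter bank and may differ from what the library does internally.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert frequency in Hz to mels using the formula above."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel: solve the formula above for f."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Illustrative layout: n_mels band centers equally spaced on the Mel axis
n_mels, f_min, f_max = 80, 0.0, 8000.0
mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
# n_mels + 2 edge points give n_mels triangular bands; the centers are the interior points
mel_points = [mel_lo + i * (mel_hi - mel_lo) / (n_mels + 1) for i in range(n_mels + 2)]
centers_hz = [mel_to_hz(m) for m in mel_points[1:-1]]

print(hz_to_mel(1000.0))              # close to 1000 mel by construction of the scale
print(centers_hz[0], centers_hz[-1])  # lowest and highest band centers in Hz
```

Because the points are equally spaced in mels, the centers are packed densely at low frequencies and spread out at high frequencies.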

ERB Scale

The Equivalent Rectangular Bandwidth (ERB) scale models the bandwidths of the auditory filters.

Use cases:

Psychoacoustic modeling
Hearing research
Perceptual audio coding

erb_params = sg.ErbParams(
    n_filters=32,
    f_min=50.0,
    f_max=8000.0
)
erb_spec = sg.compute_erb_power_spectrogram(samples, params, erb_params)

Properties:

Based on critical band theory
Generally considered a closer match to measured auditory filter bandwidths than Mel
Filter bandwidth increases with center frequency
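This page does not state which ERB formula the library uses. As an illustration of how bandwidth grows with center frequency, the sketch below uses the widely cited Glasberg and Moore (1990) approximation. The function names are hypothetical and not part of this library's API.

```python
import math

def erb_bandwidth(f_hz: float) -> float:
    """ERB in Hz at center frequency f_hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def hz_to_erb_rate(f_hz: float) -> float:
    """Position on the ERB-rate scale (number of ERBs below f_hz)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

# Bandwidth widens steadily as the center frequency rises
for f in (100.0, 1000.0, 4000.0, 8000.0):
    print(f, round(erb_bandwidth(f), 1), round(hz_to_erb_rate(f), 2))
```

Under this approximation the filter at 1000 Hz is roughly 133 Hz wide, while the filter at 100 Hz is roughly 35 Hz wide.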

Constant-Q Transform

Logarithmically spaced frequencies with a constant Q factor.

Use cases:

Music analysis
Pitch detection
Harmonic analysis

cqt_params = sg.CqtParams(
    bins_per_octave=12,
    n_octaves=7,
    f_min=55.0  # A1
)
cqt = sg.compute_cqt(samples, 22050, cqt_params, hop_size=512)

Properties:

Frequency resolution matches musical intervals
Q (center frequency divided by bandwidth) is the same for every bin
Time resolution varies by bin: higher frequencies use shorter windows and so resolve time more finely
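These properties follow from the bin layout. The sketch below derives center frequencies, Q, and per-bin window lengths from the parameters used above, following the classic constant-Q definition in which the bandwidth equals the gap to the next bin. It is an illustration under that assumption. The library's internal windowing may differ.

```python
import math

bins_per_octave = 12
n_octaves = 7
f_min = 55.0         # A1
sample_rate = 22050  # Hz

n_bins = bins_per_octave * n_octaves

# Center frequencies rise by a fixed ratio of 2 ** (1 / bins_per_octave)
center_freqs = [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

# Q = f_k / (f_{k+1} - f_k), which is the same for every bin
q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

# Window length in samples needed to resolve each bin: N_k = Q * sample_rate / f_k
window_lengths = [math.ceil(q * sample_rate / f) for f in center_freqs]

print(n_bins, round(q, 2))
print(window_lengths[0], window_lengths[-1])  # long window at the low end, short at the high end
```

With 12 bins per octave each bin corresponds to one semitone, and the highest bin stays below the Nyquist frequency of 11025 Hz.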

Scale Comparison

Scale	Spacing	Best For	Common Uses
Linear	Equal Hz	General analysis	FFT, physics
Mel	Perceptual	Speech/audio ML	ASR, ML models
ERB	Auditory filters	Psychoacoustics	Hearing models
CQT	Logarithmic	Musical pitch	Music analysis

Choosing a Scale

Linear: When frequency resolution matters more than perceptual modeling

# Example: Detecting specific frequencies
linear_spec = sg.compute_linear_power_spectrogram(samples, params)

Mel: For machine learning with speech or general audio

# Example: Speech recognition features
mel_params = sg.MelParams(n_mels=40, f_min=80.0, f_max=8000.0)
# db_params (dB conversion settings) is assumed to be defined elsewhere
mel_spec = sg.compute_mel_db_spectrogram(samples, params, mel_params, db_params)

ERB: For perceptual modeling or psychoacoustic research

# Example: Perceptual loudness
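# f_max should not exceed the Nyquist frequency (sample_rate / 2), so
# f_max=16000.0 assumes params was built with sample_rate >= 32000
erb_params = sg.ErbParams(n_filters=32, f_min=50.0, f_max=16000.0)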
erb_spec = sg.compute_erb_magnitude_spectrogram(samples, params, erb_params)

CQT: For musical analysis or pitch-based applications

# Example: Music transcription
cqt_params = sg.CqtParams(bins_per_octave=36, n_octaves=7, f_min=55.0)
cqt = sg.compute_cqt(samples, 44100, cqt_params, hop_size=512)