Frequency Scales

The library supports multiple frequency scales, each suited for different applications.

Linear Scale

Standard FFT frequency bins, equally spaced in Hz.

Use cases:

  • General audio analysis

  • Scientific measurements

  • Transient detection

import spectrograms as sg

params = sg.SpectrogramParams(stft, sample_rate=16000)
spec = sg.compute_linear_power_spectrogram(samples, params)

Frequency spacing: sample_rate / n_fft

Log Frequency Scale

Logarithmically spaced frequencies.

import spectrograms as sg

params = sg.SpectrogramParams(stft, sample_rate=16000)
spec = sg.compute_log_power_spectrogram(samples, params)

Mel Scale

Perceptually-motivated scale based on human pitch perception.

Use cases:

  • Speech recognition

  • Audio classification

  • Music information retrieval

mel_params = sg.MelParams(
    n_mels=80,
    f_min=0.0,
    f_max=8000.0
)
mel_spec = sg.compute_mel_power_spectrogram(samples, params, mel_params)

Properties:

  • Approximately linear below 1000 Hz

  • Logarithmic above 1000 Hz

  • Models human pitch discrimination

Mel-Hertz conversion:

$$ m = 2595 log_{10}left(1 + frac{f}{700}right) $$

ERB Scale

Equivalent Rectangular Bandwidth models auditory filter bandwidths.

Use cases:

  • Psychoacoustic modeling

  • Hearing research

  • Perceptual audio coding

erb_params = sg.ErbParams(
    n_filters=32,
    f_min=50.0,
    f_max=8000.0
)
erb_spec = sg.compute_erb_power_spectrogram(samples, params, erb_params)

Properties:

  • Based on critical band theory

  • More accurate model of auditory perception than Mel

  • Filter bandwidth increases with center frequency

Constant-Q Transform

Logarithmically-spaced frequencies with constant Q factor.

Use cases:

  • Music analysis

  • Pitch detection

  • Harmonic analysis

cqt_params = sg.CqtParams(
    bins_per_octave=12,
    n_octaves=7,
    f_min=55.0  # A1
)
cqt = sg.compute_cqt(samples, 22050, cqt_params, hop_size=512)

Properties:

  • Frequency resolution matches musical intervals

  • Q = constant for all bins

  • Variable time resolution (higher frequencies = better time resolution)

Scale Comparison

Scale

Spacing

Best For

Common Uses

Linear

Equal Hz

General analysis

FFT, physics

Mel

Perceptual

Speech/audio ML

ASR, ML models

ERB

Auditory filters

Psychoacoustics

Hearing models

CQT

Logarithmic

Musical pitch

Music analysis

Choosing a Scale

Linear: When frequency resolution matters more than perceptual modeling

# Example: Detecting specific frequencies
linear_spec = sg.compute_linear_power_spectrogram(samples, params)

Mel: For machine learning with speech or general audio

# Example: Speech recognition features
mel_params = sg.MelParams(n_mels=40, f_min=80.0, f_max=8000.0)
mel_spec = sg.compute_mel_db_spectrogram(samples, params, mel_params, db_params)

ERB: For perceptual modeling or psychoacoustic research

# Example: Perceptual loudness
erb_params = sg.ErbParams(n_filters=32, f_min=50.0, f_max=16000.0)
erb_spec = sg.compute_erb_magnitude_spectrogram(samples, params, erb_params)

CQT: For musical analysis or pitch-based applications

# Example: Music transcription
cqt_params = sg.CqtParams(bins_per_octave=36, n_octaves=7, f_min=55.0)
cqt = sg.compute_cqt(samples, 44100, cqt_params, hop_size=512)