Quickstart

This guide shows you how to get started with audio and image processing.

Audio Processing

Basic Spectrogram

Compute a linear power spectrogram from a simple sine wave:

import numpy as np
import spectrograms as sg

# Generate a 440 Hz sine wave (A4 note)
sample_rate = 16000
duration = 1.0
# endpoint=False keeps the sample spacing at exactly 1 / sample_rate
t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False, dtype=np.float64)
samples = np.sin(2 * np.pi * 440 * t)

# Configure STFT parameters
stft = sg.StftParams(
    n_fft=512,
    hop_size=256,
    window=sg.WindowType.hanning,
    centre=True
)
params = sg.SpectrogramParams(stft, sample_rate=sample_rate)

# Compute spectrogram
spec = sg.compute_linear_power_spectrogram(samples, params)

# Access results
print(f"Shape: {spec.shape}")  # (n_bins, n_frames)
print(f"Frequency range: {spec.frequency_range()}")
print(f"Duration: {spec.duration()}")

Understanding the Result

The Spectrogram object contains:

  • data: 2D NumPy array with shape (n_bins, n_frames)

  • frequencies: Frequency values for each bin

  • times: Time values for each frame

  • params: The parameters used for computation
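As a rough cross-check on the frequencies and times attributes, the sketch below derives both axes with plain NumPy from the STFT settings used above. It assumes a one-sided spectrum (n_fft // 2 + 1 bins) and that frame i is centred at sample i * hop_size. Neither convention is confirmed by this guide, and n_frames is a hypothetical value chosen for illustration, so treat the library's own attributes as authoritative.

```python
import numpy as np

# Settings taken from the Basic Spectrogram example.
sample_rate = 16000
n_fft = 512
hop_size = 256

# One-sided FFT of a real signal: n_fft // 2 + 1 bins from 0 Hz up to Nyquist.
frequencies = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)

# Assumed convention: frame i is centred at sample i * hop_size.
n_frames = 5  # hypothetical frame count, for illustration only
times = np.arange(n_frames) * hop_size / sample_rate

print(f"Number of bins: {frequencies.size}")
print(f"Bin spacing (Hz): {frequencies[1] - frequencies[0]}")
print(f"Frame spacing (s): {times[1] - times[0]}")
```

Under these assumptions the bin spacing is sample_rate / n_fft = 31.25 Hz and the frame spacing is hop_size / sample_rate = 0.016 s.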

Mel Spectrogram

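For perceptually scaled analysis (common in speech and music), compute a mel spectrogram in decibels:

import spectrograms as sg

# Reuses `samples` and `params` from the Basic Spectrogram example above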

# Configure mel filterbank
mel_params = sg.MelParams(
    n_mels=80,
    f_min=0.0,
    f_max=8000.0
)

# Use decibel scale for visualization
db_params = sg.LogParams(floor_db=-80.0)

# Compute mel spectrogram in dB
mel_spec = sg.compute_mel_db_spectrogram(
    samples, params, mel_params, db_params
)
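To make the roles of n_mels, f_min, f_max and floor_db more concrete, here is a minimal NumPy sketch of the two ideas involved: spacing band edges evenly on a mel scale, and converting power to decibels with a lower clamp. The helper names (hz_to_mel, mel_band_edges, power_to_db) are hypothetical and not part of the library. The sketch assumes the HTK-style mel formula and an absolute dB floor; the library may use a different mel variant or a floor relative to the peak.

```python
import numpy as np

def hz_to_mel(f_hz):
    """HTK-style mel scale (an assumption; Slaney-style scales differ)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=np.float64) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=np.float64) / 2595.0) - 1.0)

def mel_band_edges(n_mels, f_min, f_max):
    """n_mels triangular filters need n_mels + 2 edges, evenly spaced in mel."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2)
    return mel_to_hz(mels)

def power_to_db(power, floor_db=-80.0):
    """10 * log10(power), clamped from below at floor_db (assumed absolute)."""
    tiny = np.finfo(np.float64).tiny  # avoids log10(0)
    db = 10.0 * np.log10(np.maximum(power, tiny))
    return np.maximum(db, floor_db)

edges = mel_band_edges(n_mels=80, f_min=0.0, f_max=8000.0)
print(f"Lowest band width (Hz): {edges[2] - edges[0]:.1f}")
print(f"Highest band width (Hz): {edges[-1] - edges[-3]:.1f}")
print(power_to_db(np.array([1.0, 1e-3, 0.0])))
```

The band widths grow with frequency, which is what gives the mel scale finer resolution at low frequencies than at high ones.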

Image Processing

Basic 2D FFT

Compute the 2D FFT of an image:

import numpy as np
import spectrograms as sg

# Create or load a 256x256 image
image = np.random.randn(256, 256)

# Compute 2D FFT
spectrum = sg.fft2d(image)
print(f"Spectrum shape: {spectrum.shape}")  # (256, 129)

# Compute power spectrum
power = sg.power_spectrum_2d(image)

Image Filtering

Apply spatial filters to enhance or smooth images:

import spectrograms as sg

# Reuses `image` from the Basic 2D FFT example above

# Apply Gaussian blur
kernel = sg.gaussian_kernel_2d(size=9, sigma=2.0)
blurred = sg.convolve_fft(image, kernel)

# Detect edges with high-pass filter
edges = sg.highpass_filter(image, cutoff=0.1)

# Sharpen image
sharpened = sg.sharpen_fft(image, amount=1.5)
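The idea behind FFT-based filtering can be sketched with NumPy alone: multiply the image spectrum by the kernel spectrum, then transform back. This is an illustration of the convolution theorem, not the library's implementation. The helpers gaussian_kernel and fft_convolve_same are hypothetical names, and the sketch uses circular (wrap-around) borders, which may differ from how convolve_fft treats image edges.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalised size x size Gaussian kernel (weights sum to 1)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def fft_convolve_same(image, kernel):
    """Circular convolution via the convolution theorem; output has the image's shape."""
    kr, kc = kernel.shape
    padded = np.zeros_like(image, dtype=np.float64)
    padded[:kr, :kc] = kernel
    # Move the kernel centre to index (0, 0) so the output is not translated.
    padded = np.roll(padded, (-(kr // 2), -(kc // 2)), axis=(0, 1))
    spectrum = np.fft.rfft2(image) * np.fft.rfft2(padded)
    return np.fft.irfft2(spectrum, s=image.shape)

rng = np.random.default_rng(0)
image = rng.standard_normal((256, 256))
blurred = fft_convolve_same(image, gaussian_kernel(size=9, sigma=2.0))

print(f"Input std: {image.std():.3f}, blurred std: {blurred.std():.3f}")
```

Because the kernel weights sum to 1, the blur lowers the pixel variance while leaving the mean intensity unchanged.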

Understanding Image Results

The image functions return NumPy arrays:

  • fft2d(): Complex array with shape (nrows, ncols//2 + 1)

  • power_spectrum_2d(): Real array with shape (nrows, ncols//2 + 1)

  • Filtering functions: Real array with same shape as input
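The (nrows, ncols//2 + 1) shape is what a real-input FFT produces: for real data the remaining columns are complex conjugates of the stored ones, so they carry no extra information. The NumPy sketch below illustrates this, assuming fft2d follows the same layout as np.fft.rfft2. The shapes listed above suggest that, but this guide does not confirm it, and the library's normalisation of the power spectrum may differ from the plain squared magnitude used here.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((256, 256))

full = np.fft.fft2(image)    # complex, shape (256, 256)
half = np.fft.rfft2(image)   # complex, shape (256, 129)

# The half spectrum is the first ncols // 2 + 1 columns of the full one.
n_kept = image.shape[1] // 2 + 1
matches_full = np.allclose(half, full[:, :n_kept])

# A dropped column entry equals the conjugate of a stored one:
# F[r, c] == conj(F[-r, -c]) for real input.
r, c = 5, 200
is_hermitian = np.isclose(full[r, c], np.conj(full[-r, -c]))

# Unnormalised power: squared magnitude of each coefficient (real-valued).
power = np.abs(half) ** 2

print(half.shape, power.dtype, matches_full, is_hermitian)
```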
