Quickstart
This guide shows you how to get started with audio and image processing.
Audio Processing
Basic Spectrogram
Compute a linear power spectrogram from a simple sine wave:
import numpy as np
import spectrograms as sg
# Generate a 440 Hz sine wave (A4 note)
sample_rate = 16000
duration = 1.0
t = np.linspace(0, duration, int(sample_rate * duration), dtype=np.float64)
samples = np.sin(2 * np.pi * 440 * t)
# Configure STFT parameters
stft = sg.StftParams(
n_fft=512,
hop_size=256,
window=sg.WindowType.hanning,
centre=True
)
params = sg.SpectrogramParams(stft, sample_rate=sample_rate)
# Compute spectrogram
spec = sg.compute_linear_power_spectrogram(samples, params)
# Access results
print(f"Shape: {spec.shape}") # (n_bins, n_frames)
print(f"Frequency range: {spec.frequency_range()}")
print(f"Duration: {spec.duration()}")
Understanding the Result
The Spectrogram object contains:
data: 2D NumPy array with shape(n_bins, n_frames)frequencies: Frequency values for each bintimes: Time values for each frameparams: The parameters used for computation
Mel Spectrogram
For perceptually-scaled analysis (common in speech and music):
import spectrograms as sg
# Configure mel filterbank
mel_params = sg.MelParams(
n_mels=80,
f_min=0.0,
f_max=8000.0
)
# Use decibel scale for visualization
db_params = sg.LogParams(floor_db=-80.0)
# Compute mel spectrogram in dB
mel_spec = sg.compute_mel_db_spectrogram(
samples, params, mel_params, db_params
)
Image Processing
Basic 2D FFT
Compute the 2D FFT of an image:
import numpy as np
import spectrograms as sg
# Create or load a 256x256 image
image = np.random.randn(256, 256)
# Compute 2D FFT
spectrum = sg.fft2d(image)
print(f"Spectrum shape: {spectrum.shape}") # (256, 129)
# Compute power spectrum
power = sg.power_spectrum_2d(image)
Image Filtering
Apply spatial filters to enhance or smooth images:
import spectrograms as sg
# Apply Gaussian blur
kernel = sg.gaussian_kernel_2d(size=9, sigma=2.0)
blurred = sg.convolve_fft(image, kernel)
# Detect edges with high-pass filter
edges = sg.highpass_filter(image, cutoff=0.1)
# Sharpen image
sharpened = sg.sharpen_fft(image, amount=1.5)
Understanding Image Results
2D FFT functions return NumPy arrays:
fft2d(): Complex array with shape(nrows, ncols//2 + 1)power_spectrum_2d(): Real array with shape(nrows, ncols//2 + 1)Filtering functions: Real array with same shape as input
Next Steps
Audio:
Learn about Choosing Parameters for your application
Optimize batch processing with the Batch Processing
Explore Audio Features like MFCC and chromagrams
Image:
Learn about Image Processing with 2D FFT for 2D FFT operations
See Performance and Benchmarks for optimization tips