Performance and Benchmarks
===========================

The spectrograms library is designed for high performance, with a Rust core and zero-copy Python bindings.

Benchmark Results
-----------------

Benchmarks comparing ``spectrograms`` against NumPy and SciPy implementations are available in the `PYTHON_BENCHMARK.md `_ file.

Summary
~~~~~~~

Average speedups across all parameter configurations and signal types:

.. list-table::
   :header-rows: 1
   :widths: 20 15 15 25 25

   * - Operation
     - Rust (ms)
     - NumPy (ms)
     - SciPy (ms)
     - Avg Speedup (vs NumPy / vs SciPy)
   * - Power
     - 0.126
     - 0.205
     - 0.327
     - 1.6x / 2.6x
   * - Magnitude
     - 0.140
     - 0.198
     - 0.319
     - 1.4x / 2.3x
   * - Decibels
     - 0.257
     - 0.350
     - 0.451
     - 1.4x / 1.8x
   * - Mel
     - 0.180
     - 0.630
     - 0.612
     - **3.5x / 3.4x**
   * - LogHz
     - 0.178
     - 0.547
     - 0.534
     - **3.1x / 3.0x**
   * - ERB
     - 0.601
     - 3.713
     - 3.714
     - **6.2x / 6.2x**

Key Findings
~~~~~~~~~~~~

1. **Filterbank operations** (Mel, ERB, LogHz) show the largest speedups (3-6x) due to:

   - Pre-computed filterbanks cached in plans
   - Sparse matrix operations
   - Minimal memory allocation

2. **Basic operations** (Power, Magnitude, dB) show 1.4-2.6x speedups from:

   - Rust's performance
   - Zero-copy NumPy integration
   - GIL release during computation

3. **Consistency**: Low standard deviations show reliable, predictable performance

Why spectrograms is Faster
---------------------------

The library achieves superior performance through several optimizations that are applied automatically:

Pre-computed Filterbanks
~~~~~~~~~~~~~~~~~~~~~~~~~

When using the planner API, filterbanks (Mel, ERB, LogHz) are computed once and cached:

.. code-block:: python

   planner = sg.SpectrogramPlanner()
   plan = planner.mel_db_plan(params, mel_params, db_params)
   # Filterbank computed once ↑

   for signal in signals:
       spec = plan.compute(signal)  # Reuses cached filterbank

The NumPy and SciPy reference implementations rebuild the filterbank on every call, repeating work that does not depend on the signal.

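
To make the cost of rebuilding a filterbank concrete, here is a minimal NumPy-only sketch of the caching idea. It is not the library's implementation: ``build_mel_filterbank``, ``CachedMelPlan`` and ``mel_uncached`` are hypothetical names introduced for illustration, the filters are plain HTK-style triangles, and the input is random data standing in for a power spectrogram.

.. code-block:: python

   import numpy as np


   def hz_to_mel(hz):
       """HTK-style Hz -> mel conversion (an assumption for this sketch)."""
       return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


   def mel_to_hz(mel):
       """Inverse of hz_to_mel."""
       return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


   def build_mel_filterbank(sample_rate, n_fft, n_mels):
       """Build an (n_mels, n_fft // 2 + 1) matrix of triangular filters."""
       fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
       # n_mels triangles need n_mels + 2 edge frequencies, evenly spaced in mel.
       edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2))
       lower, center, upper = edges[:-2, None], edges[1:-1, None], edges[2:, None]
       rising = (fft_freqs - lower) / (center - lower)
       falling = (upper - fft_freqs) / (upper - center)
       # Each row is a triangle: min of the two slopes, clipped at zero.
       return np.clip(np.minimum(rising, falling), 0.0, None)


   class CachedMelPlan:
       """Hypothetical stand-in for a plan: builds the filterbank once."""

       def __init__(self, sample_rate, n_fft, n_mels):
           self.filterbank = build_mel_filterbank(sample_rate, n_fft, n_mels)

       def apply(self, power_spectrogram):
           # power_spectrogram has shape (n_fft // 2 + 1, n_frames).
           return self.filterbank @ power_spectrogram


   def mel_uncached(power_spectrogram, sample_rate, n_fft, n_mels):
       """Rebuilds the filterbank on every call (the redundant work)."""
       return build_mel_filterbank(sample_rate, n_fft, n_mels) @ power_spectrogram


   rng = np.random.default_rng(0)
   power = rng.random((257, 100))  # stand-in for |STFT|^2 with n_fft=512

   plan = CachedMelPlan(sample_rate=16000, n_fft=512, n_mels=40)
   mel_cached = plan.apply(power)                    # filterbank reused
   mel_fresh = mel_uncached(power, 16000, 512, 40)   # filterbank rebuilt

Both paths produce the same output. The difference is that the cached version pays for the filterbank construction once, so the saving grows with the number of signals processed using the same parameters.
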
Sparse Matrix Operations
~~~~~~~~~~~~~~~~~~~~~~~~~

Filterbanks are stored as sparse matrices and applied using optimized sparse matrix-vector multiplication, avoiding unnecessary computations on zero elements.

Memory Efficiency
~~~~~~~~~~~~~~~~~

The Rust implementation uses:

- Pre-allocated workspace buffers
- Minimal temporary allocations
- Efficient memory layouts

GIL Release
~~~~~~~~~~~

All computation functions release Python's Global Interpreter Lock (GIL), enabling:

- Parallel processing of multiple files across threads
- Concurrent computation with other Python operations

Optimization Tips
-----------------

1. Use the Planner API
~~~~~~~~~~~~~~~~~~~~~~

Always use plans for batch processing:

.. code-block:: python

   # ❌ Slow: Creates new plan every iteration
   for signal in signals:
       spec = sg.compute_mel_db_spectrogram(signal, params, mel_params, db_params)

   # ✅ Fast: Reuses plan
   planner = sg.SpectrogramPlanner()
   plan = planner.mel_db_plan(params, mel_params, db_params)
   for signal in signals:
       spec = plan.compute(signal)

**Speedup: 1.5-3x** depending on operation type.

2. Choose Power-of-2 FFT Sizes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

FFT algorithms are optimized for power-of-2 sizes:

.. code-block:: python

   # ✅ Fast
   stft = sg.StftParams(n_fft=512, ...)   # 2^9
   stft = sg.StftParams(n_fft=1024, ...)  # 2^10
   stft = sg.StftParams(n_fft=2048, ...)  # 2^11

   # ❌ Slower
   stft = sg.StftParams(n_fft=1000, ...)  # Not power-of-2

3. Streaming for Real-Time Applications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For real-time processing, use frame-by-frame computation:

.. code-block:: python

   plan = planner.mel_db_plan(params, mel_params, db_params)

   for frame_idx in range(n_frames):
       frame_data = plan.compute_frame(signal, frame_idx)
       # Process frame immediately

This minimizes latency and memory usage.

4. Batch Processing with Parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since computation releases the GIL, process multiple files in parallel:

.. code-block:: python

   from concurrent.futures import ThreadPoolExecutor

   planner = sg.SpectrogramPlanner()
   plan = planner.mel_db_plan(params, mel_params, db_params)

   def process_file(signal):
       return plan.compute(signal)

   with ThreadPoolExecutor(max_workers=4) as executor:
       results = list(executor.map(process_file, signals))

5. Choose the Right Backend
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The library supports two FFT backends:

- **RealFFT** (default): Pure Rust, no dependencies, good performance
- **FFTW**: Requires system library, may be faster for specific sizes

If performance is critical, benchmark both backends for your specific use case.

Backend Comparison
~~~~~~~~~~~~~~~~~~

Performance depends on FFT size, system architecture, and available SIMD instructions. General guidelines:

.. list-table::
   :header-rows: 1
   :widths: 30 35 35

   * - FFT Size
     - RealFFT
     - FFTW
   * - Small (≤ 512)
     - Excellent
     - Excellent
   * - Medium (1024-2048)
     - Excellent
     - Excellent (slightly faster)
   * - Large (≥ 4096)
     - Good
     - Better

Both backends provide substantial speedups over NumPy/SciPy.

Measuring Your Performance
---------------------------

Use the included benchmark notebook to measure performance on your system:

.. code-block:: bash

   # Install development dependencies
   pip install jupyter matplotlib seaborn

   # Run benchmark notebook
   jupyter lab python/examples/notebook.ipynb

This provides detailed timings for your specific hardware and configurations.

See Also
--------

- `PYTHON_BENCHMARK.md `_ - Full benchmark results
- :doc:`planner_guide` - Efficient batch processing
- ``python/examples/fft_performance_analysis.py`` - Performance analysis example