Choosing Parameters =================== Selecting the right parameters is crucial for spectrogram quality and performance. STFT Parameters --------------- FFT Size (n_fft) ~~~~~~~~~~~~~~~~ Controls frequency resolution and time resolution trade-off: - **Larger values** (2048, 4096): Better frequency resolution, poorer time resolution - **Smaller values** (512, 256): Better time resolution, poorer frequency resolution **Recommendations:** - Speech: 512 - Music: 2048 - General audio: 1024 Hop Size ~~~~~~~~ Number of samples between successive frames: - **Smaller hop**: Better time resolution, more computation - **Larger hop**: Faster computation, coarser time resolution **Common ratios:** - ``hop_size = n_fft / 4`` (75% overlap) - standard for speech - ``hop_size = n_fft / 2`` (50% overlap) - good balance Window Function ~~~~~~~~~~~~~~~ Affects spectral leakage: - ``"hanning"``: General purpose, good sidelobe suppression - ``"hamming"``: Similar to Hanning, slightly different characteristics - ``"blackman"``: Excellent sidelobe suppression, wider main lobe - ``"kaiser=5.0"``: Adjustable (higher beta = less leakage, wider main lobe) Centering ~~~~~~~~~ When ``centre=True``, frames are centered by padding: - First frame centered at ``t=0`` - Last frame centered at end of signal - Recommended for most applications When ``False``, no padding is applied (useful for streaming). Default Configurations ---------------------- The library provides sensible defaults: Speech Processing ~~~~~~~~~~~~~~~~~ .. code-block:: python import spectrograms as sg params = sg.SpectrogramParams.speech_default(sample_rate=16000) # Uses: n_fft=512, hop_size=160, Hanning window, centre=True Music Processing ~~~~~~~~~~~~~~~~ .. code-block:: python import spectrograms as sg params = sg.SpectrogramParams.music_default(sample_rate=44100) # Uses: n_fft=2048, hop_size=512, Hanning window, centre=True Mel Scale Parameters -------------------- Number of Mel Bands ~~~~~~~~~~~~~~~~~~~ - **Speech recognition**: 40-80 bands - **Music analysis**: 80-128 bands - **General audio**: 64 bands Frequency Range ~~~~~~~~~~~~~~~ Set based on your signal content: .. code-block:: python # Full range (0 Hz to Nyquist) mel_params = sg.MelParams(n_mels=80, f_min=0.0, f_max=sample_rate/2) # Speech range (common human voice frequencies) mel_params = sg.MelParams(n_mels=40, f_min=80.0, f_max=8000.0) # Music range mel_params = sg.MelParams(n_mels=128, f_min=20.0, f_max=20000.0) Decibel Conversion ------------------ The floor parameter clips low values: .. code-block:: python # Standard for visualization db_params = sg.LogParams(floor_db=-80.0) # Higher floor for very quiet signals db_params = sg.LogParams(floor_db=-60.0) ERB Scale --------- ERB (Equivalent Rectangular Bandwidth) models human auditory perception: .. code-block:: python # Good for psychoacoustic applications erb_params = sg.ErbParams( n_filters=32, f_min=50.0, f_max=8000.0 ) Performance Considerations -------------------------- Memory Usage ~~~~~~~~~~~~ Memory scales with: - ``n_fft``: Larger FFT = more memory - Signal length / ``hop_size``: More frames = more memory Computation Time ~~~~~~~~~~~~~~~~ Factors affecting speed: 1. FFT size (larger = slower) 2. Number of frames (signal length / hop size) 3. FFT backend (FFTW is fastest) For batch processing, use the :doc:`planner_guide` to reuse FFT plans.