Why endpointing matters
Endpointing is one of the main knobs that control the tradeoff between:
- Latency (speed): how quickly you get final utterances
- Completeness: whether you avoid cutting someone off mid-thought
- Chunking quality: whether utterances align well with natural turns or sentences
How it works conceptually
During a live session, Gladia continuously analyzes the incoming audio stream and:
- Detects speech activity on each channel (voice activity detection)
- Groups speech into an “utterance” while speech is ongoing
- When it observes silence lasting at least endpointing seconds, it considers the utterance finished and closes it (finalizes it).
- The speech-to-text model then transcribes the closed utterance and returns the final result.
- If speech never pauses long enough, Gladia still has a safety mechanism that closes the utterance anyway (maximum_duration_without_endpointing, see the next section); the sketch after this list illustrates both rules.
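One way to picture this finalization logic is as a small state machine running over fixed-size VAD frames. The sketch below is purely illustrative and not Gladia's implementation: the Endpointer class, the push_frame method, and the 20 ms frame size are assumptions made for the example.

```python
from dataclasses import dataclass

FRAME_SECONDS = 0.02  # assume fixed 20 ms VAD frames for this illustration


@dataclass
class Endpointer:
    """Toy utterance finalizer: close on enough silence, or on a hard duration cap."""
    endpointing: float = 0.05                          # seconds of silence that close an utterance
    maximum_duration_without_endpointing: float = 5.0  # hard cap on utterance length (seconds)
    _utterance_seconds: float = 0.0
    _silence_seconds: float = 0.0

    def push_frame(self, is_speech: bool) -> bool:
        """Feed one VAD frame; return True when the current utterance should be finalized."""
        if self._utterance_seconds == 0.0 and not is_speech:
            return False  # no utterance in progress; ignore leading silence
        self._utterance_seconds += FRAME_SECONDS
        self._silence_seconds = 0.0 if is_speech else self._silence_seconds + FRAME_SECONDS
        if (self._silence_seconds >= self.endpointing
                or self._utterance_seconds >= self.maximum_duration_without_endpointing):
            # Utterance closed: the buffered audio would now go to the transcription model.
            self._utterance_seconds = 0.0
            self._silence_seconds = 0.0
            return True
        return False
```

With the default 0.05 s, roughly three consecutive 20 ms silence frames are enough to finalize an utterance in this toy model; raising endpointing to 0.5 s would require about 25.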
The 2 key parameters
endpointing (seconds)
Definition: the duration of silence that closes the current utterance.
- Default: 0.05
- Range: 0.01 to 10
- Smaller value = closes utterances faster, but can split sentences if the speaker hesitates briefly.
- Larger value = waits longer before finalizing, which improves segment completeness but increases latency.
maximum_duration_without_endpointing (seconds)
Definition: the maximum duration an utterance can last before it is force-closed and finalized, even if no silence of endpointing seconds has been observed.
- Default: 5
- Range: 5 to 60
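As a rough, non-authoritative sketch of where these parameters go, the snippet below assumes the live session is created with a POST to the /v2/live endpoint, authenticated with an X-Gladia-Key header, and that endpointing and maximum_duration_without_endpointing are top-level fields of the session configuration; the audio format fields are placeholders. Check the API reference for the exact request shape.

```python
import requests

GLADIA_API_KEY = "YOUR_GLADIA_API_KEY"  # placeholder; use your real key

config = {
    # Audio format fields are illustrative; adjust them to your actual stream.
    "encoding": "wav/pcm",
    "sample_rate": 16000,
    "bit_depth": 16,
    "channels": 1,
    # Endpointing tuning: finalize after 0.3 s of silence, and never let an
    # utterance run past 15 s without being finalized.
    "endpointing": 0.3,
    "maximum_duration_without_endpointing": 15,
}

response = requests.post(
    "https://api.gladia.io/v2/live",
    headers={"X-Gladia-Key": GLADIA_API_KEY, "Content-Type": "application/json"},
    json=config,
)
response.raise_for_status()
session = response.json()
print(session["url"])  # WebSocket URL to stream audio to (assumed response shape)
```

Raising endpointing (for example to 0.3 as above) trades a little latency for fewer mid-sentence splits, while maximum_duration_without_endpointing bounds how long you can go without receiving a final utterance.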