WCAG (Level AA) SC 1.2.4 Captions (Live) (w3.org)
Issue description
WCAG 1.2.4, “Captions (Live)” addresses the accessibility of live audio content, such as webcasts, online meetings, and live streams. It requires that synchronized captions are provided for all live audio content. This is crucial for people who are deaf or hard of hearing, as it allows them to follow along with the spoken information in real time. Without captions, these users would be excluded from participating in or understanding live events.
While similar to 1.2.2 (Captions for Prerecorded media), the “live” aspect adds a layer of complexity. Providing captions for live content often requires real-time captioning techniques, such as:
- Stenography: A trained stenographer types what they hear, and software translates it into text.
- Automatic Speech Recognition (ASR): AI-powered software that converts spoken words into text.
These methods vary in accuracy and cost. Whichever method is used, the goal is to provide captions that are as close to real time as possible, with minimal delay, so that people with hearing disabilities can fully engage with live online content.
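As a rough sketch of the real-time constraint: ASR engines typically emit provisional (“interim”) results followed by corrected (“final”) ones, so a caption display can show interim text immediately and replace it once the final text arrives. The function names below are illustrative, not from any particular captioning API:

```javascript
// Minimal sketch of handling ASR output for live captions.
// Interim results are shown right away to minimize caption delay;
// final results replace them with the engine's corrected text.
function createCaptionState() {
  return { finalText: "", interimText: "" };
}

function applyResult(state, result) {
  if (result.isFinal) {
    // Commit the corrected text and clear the provisional line.
    return {
      finalText: (state.finalText + " " + result.text).trim(),
      interimText: "",
    };
  }
  // Provisional text: display immediately, may still change.
  return { finalText: state.finalText, interimText: result.text };
}

function displayText(state) {
  return (state.finalText + " " + state.interimText).trim();
}
```

In a browser, the `result` objects could come from a `SpeechRecognition` event handler with `interimResults` enabled; the same pattern applies to server-side ASR streams.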
Related requirements
The following WCAG success criteria are often related to this one. They can provide additional insight into specific challenges you may be encountering.
- WCAG (Level A) SC 1.2.3 Audio Description or Media Alternative (Prerecorded)
- WCAG (Level AA) SC 1.2.5 Audio Description (Prerecorded)
- WCAG (Level AAA) SC 1.2.6 Sign Language (Prerecorded)
- WCAG (Level AAA) SC 1.2.7 Extended Audio Description (Prerecorded)
- WCAG (Level AAA) SC 1.2.8 Media Alternative (Prerecorded)
- WCAG (Level AAA) SC 1.2.9 Audio-only (Live)
Who this issue impacts
This issue primarily impacts people who are deaf or hard of hearing, who rely on captions to follow spoken information in live content in real time.
Suggestions for remediation
Remediating WCAG 1.2.4, “Captions (Live)” and making your live video content accessible requires providing synchronized captions in real time. Here’s how:
Choose a captioning method
- Human captioner (stenographer): A trained stenographer types what they hear, and specialized software translates it into text displayed on screen. This is generally the most accurate method but can be expensive.
- Automatic Speech Recognition (ASR): AI-powered software that automatically converts speech to text. This is becoming increasingly accurate and affordable, but may require human monitoring and editing for optimal quality.
Implementation options
- On-screen display: Captions are displayed directly within the video player or on a designated area of the screen.
- Separate caption window: Captions can be displayed in a separate window, which can be helpful for users who prefer to customize the size and position of the captions.
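However captions are displayed, live captions are commonly rendered “roll-up” style: text wraps to a fixed width and only the newest few lines stay on screen. A minimal, illustrative sketch (the width and line limits are arbitrary examples):

```javascript
// Wrap caption text to a fixed column width, one word at a time.
function wrapWords(text, maxCols) {
  const lines = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? current + " " + word : word;
    if (candidate.length <= maxCols) {
      current = candidate;
    } else {
      if (current) lines.push(current);
      current = word;
    }
  }
  if (current) lines.push(current);
  return lines;
}

// Keep only the most recent lines on screen, as broadcast-style
// roll-up captions do.
function rollUp(text, maxCols, maxLines) {
  return wrapWords(text, maxCols).slice(-maxLines);
}
```

For example, `rollUp("Welcome everyone to today's webinar on web accessibility", 20, 2)` keeps just the last two wrapped lines as new speech arrives.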
Captioning platforms and services
- Streaming platforms: Many live streaming platforms (e.g., YouTube, Zoom, Microsoft Teams) offer built-in or integrated captioning features, including options for manual captioning, ASR, or third-party integrations.
- Captioning service providers: Several companies specialize in providing real-time captioning services, offering human captioners or ASR solutions.
Best practices for live captions
- Accuracy: Strive for high accuracy in the captions, minimizing errors and delays.
- Synchronization: Captions should be synchronized with the audio, appearing on screen as close to real time as possible.
- Speaker identification: If there are multiple speakers, identify each speaker in the captions.
- Non-speech information: Include relevant non-speech sounds (e.g., laughter, applause, music) to enhance understanding.
- Formatting: Use clear and easy-to-read fonts, colors, and positioning.
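Where captions are delivered as WebVTT text tracks and rendered by the browser’s video player, their appearance can be adjusted with the CSS `::cue` pseudo-element. A brief example (the specific values are illustrative; note that many live captioning platforms render captions themselves, in which case their own settings apply):

```css
/* Style WebVTT caption cues rendered by the browser's video player */
video::cue {
  color: #fff;
  background-color: rgba(0, 0, 0, 0.8); /* high-contrast backdrop */
  font-family: Arial, sans-serif;       /* clear, easy-to-read font */
  font-size: 1.1em;
}
```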
Testing
- Monitor quality: If using ASR, monitor the captions in real-time to catch and correct any errors.
Example
In a live webinar, the captions might look like this:
[Speaker 1] Welcome everyone to today’s webinar on web accessibility.
[Audience member] Hello!
[Speaker 1] Thank you for joining us on this accessibility adventure!
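If these captions were delivered as a WebVTT text track, the same exchange might look like the snippet below (the cue timings are invented for illustration; `<v>` is WebVTT’s voice tag for speaker identification):

```
WEBVTT

00:00:01.000 --> 00:00:05.000
<v Speaker 1>Welcome everyone to today's webinar on web accessibility.

00:00:05.500 --> 00:00:06.500
<v Audience member>Hello!

00:00:07.000 --> 00:00:10.000
<v Speaker 1>Thank you for joining us on this accessibility adventure!
```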