Contributing audio track switching and default audio track selection to Yattee
Introduction
Yattee is a polished, privacy-oriented video player for iOS, tvOS, and macOS.
I wanted to solve a practical playback issue: on YouTube adaptive streams, a video can expose multiple audio tracks (different languages, dubbed tracks, original audio), but Yattee's player UI offered no way to switch between them.
On top of that, the default selection could pick a dubbed track in some cases, which made foreign-language videos frustrating to watch. This is a common annoyance on YouTube itself: if you watch, say, Italian videos while abroad, the audio often plays in the wrong language.
The problem
In the streams I tested, audio metadata was encoded in URL query parameters (xtags), including track content type and language.
For example, the URL could include acont=dubbed-auto:lang=en-US for a dubbed track or acont=original:lang=it for the original Italian audio.
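To make the format concrete, here is a minimal sketch of how such a value might sit inside a stream URL and how Foundation exposes it. The host, path, and itag value are invented for illustration. Only the xtags payload mirrors the example above, and the percent-encoding shown is an assumption about how the value arrives.

```swift
import Foundation

// Hypothetical stream URL: the host, path, and itag are made up.
// Only the xtags payload mirrors the example in the text.
let sampleURL = "https://example.invalid/videoplayback?itag=251&xtags=acont%3Ddubbed-auto%3Alang%3Den-US"

// URLComponents percent-decodes query item values,
// so "%3D" becomes "=" and "%3A" becomes ":".
let xtagsValue = URLComponents(string: sampleURL)?
    .queryItems?
    .first(where: { $0.name == "xtags" })?
    .value

print(xtagsValue ?? "no xtags parameter")
```

The decoded value is a colon-separated list of key=value pairs, which is the shape the parsing step has to handle.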
Before this change, that metadata was not used at all.
So the player needed:
- A normalized audio track model
- Track extraction from stream metadata
- A way to switch tracks without losing playback position
- A picker to choose tracks during playback
Parsing and modeling audio tracks
Yattee receives these stream URLs through Invidious, an open-source alternative front-end for YouTube that exposes video metadata and stream URLs through its own API. So this parsing operates on Invidious-provided metadata rather than calling YouTube directly.
In InvidiousAPI, I added parsing of xtags key-value pairs:
```swift
func extractXTags(from urlString: String) -> [String: String] {
    guard let urlComponents = URLComponents(string: urlString),
          let queryItems = urlComponents.queryItems,
          let xtagsValue = queryItems.first(where: { $0.name == "xtags" })?.value else {
        return [:]
    }
    guard let decoded = xtagsValue.removingPercentEncoding else { return [:] }

    // format: key1=value1:key2=value2
    return decoded
        .split(separator: ":")
        .reduce(into: [String: String]()) { result, pair in
            let parts = pair.split(separator: "=", maxSplits: 1).map(String.init)
            guard parts.count == 2 else { return }
            result[parts[0]] = parts[1]
        }
}
```
Then I introduced Stream.AudioTrack with:
- url
- content (for example dubbed/original)
- language
Plus helper fields:
- displayLanguage
- description
- isDubbed
That allowed sorting so original audio is preferred over dubbed tracks by default.
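A minimal sketch of what such a model and its ordering could look like. The AudioTrack type here is a hypothetical stand-in, not the actual Stream.AudioTrack from Yattee. The initializer taking an xtags dictionary, the rule that any content value starting with "dubbed" counts as a dub, and the sortedByPreference helper are all assumptions made for illustration.

```swift
import Foundation

// Hypothetical stand-in for Stream.AudioTrack; the real type in Yattee may differ.
struct AudioTrack: Equatable {
    let url: URL
    let content: String?   // e.g. "original" or "dubbed-auto", read from the acont tag
    let language: String?  // e.g. "it" or "en-US", read from the lang tag

    // Assumption: any content value starting with "dubbed" marks a dubbed track.
    // A track without an acont tag is treated as not dubbed.
    var isDubbed: Bool { content?.hasPrefix("dubbed") ?? false }

    // Human-readable language name, falling back to the raw code or "Unknown".
    var displayLanguage: String {
        guard let language = language else { return "Unknown" }
        return Locale(identifier: "en_US").localizedString(forIdentifier: language) ?? language
    }

    var description: String {
        isDubbed ? "\(displayLanguage) (dubbed)" : displayLanguage
    }

    init(url: URL, xtags: [String: String]) {
        self.url = url
        content = xtags["acont"]
        language = xtags["lang"]
    }
}

// Stable ordering: non-dubbed tracks first, incoming order kept within each group.
func sortedByPreference(_ tracks: [AudioTrack]) -> [AudioTrack] {
    tracks.filter { !$0.isDubbed } + tracks.filter { $0.isDubbed }
}

let tracks = [
    AudioTrack(url: URL(string: "https://example.invalid/a")!,
               xtags: ["acont": "dubbed-auto", "lang": "en-US"]),
    AudioTrack(url: URL(string: "https://example.invalid/b")!,
               xtags: ["acont": "original", "lang": "it"]),
]
let preferred = sortedByPreference(tracks).first
print(preferred?.description ?? "no tracks")
```

With this ordering, taking the first element as the default picks the original Italian track even though the dubbed one was listed first.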
Switching tracks in MPV
Yattee uses MPV, an open-source media player engine, as one of its playback backends.
The implementation keeps track of the audio options available for the current stream and the currently selected one.
On the UI side, I added an audio track picker in the player controls so the available tracks show up as a normal selection menu during playback.
When the user picks a different track, the backend reloads playback with that track while preserving the current timestamp, so switching language does not restart the video.
I also reset audio-track state when changing video, so stale selection does not leak between different videos.
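The flow described above can be sketched roughly as follows. This is not the actual Yattee code and it does not use the libmpv API: PlaybackBackend, AudioTrackSwitcher, and RecordingBackend are hypothetical names, and tracks are reduced to plain URLs to keep the example short.

```swift
import Foundation

// Hypothetical abstraction over a playback backend; this is not the libmpv API.
protocol PlaybackBackend: AnyObject {
    var currentTime: TimeInterval { get }
    func load(audioURL: URL, startingAt time: TimeInterval)
}

final class AudioTrackSwitcher {
    private(set) var tracks: [URL] = []
    private(set) var selectedIndex: Int?
    private let backend: PlaybackBackend

    init(backend: PlaybackBackend) { self.backend = backend }

    // Called when a new video starts: replaces the track list and selection
    // so nothing stale survives from the previous video.
    func reset(with tracks: [URL]) {
        self.tracks = tracks
        selectedIndex = tracks.isEmpty ? nil : 0
    }

    // Switches to another track, resuming from the position captured before the reload.
    // Out-of-range indices and re-selecting the current track are ignored.
    func select(_ index: Int) {
        guard tracks.indices.contains(index), index != selectedIndex else { return }
        let position = backend.currentTime
        selectedIndex = index
        backend.load(audioURL: tracks[index], startingAt: position)
    }
}

// Usage with a fake backend that records what it was asked to load.
final class RecordingBackend: PlaybackBackend {
    var currentTime: TimeInterval = 42
    private(set) var loads: [(url: URL, time: TimeInterval)] = []
    func load(audioURL: URL, startingAt time: TimeInterval) {
        loads.append((url: audioURL, time: time))
    }
}

let backend = RecordingBackend()
let switcher = AudioTrackSwitcher(backend: backend)
switcher.reset(with: [URL(string: "https://example.invalid/original")!,
                      URL(string: "https://example.invalid/dubbed")!])
switcher.select(1)
```

Capturing the position before touching the selection keeps the reload and the timestamp in one place, and the bounds check in select means a stale index from the UI cannot crash the switcher.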
At the time of this PR, one limitation remained: when Invidious Companion was enabled, different audio tracks could resolve to the same itag URL, so switching tracks did not always change the final language.
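One way to spot this situation while debugging is to check whether several track labels point at the same URL. The helper below is a hypothetical illustration, not part of the PR, and the URLs are made up.

```swift
import Foundation

// Returns every URL that is shared by more than one track label.
func sharedURLs(in tracks: [(label: String, url: URL)]) -> [URL: [String]] {
    Dictionary(grouping: tracks, by: { $0.url })
        .mapValues { group in group.map { $0.label } }
        .filter { $0.value.count > 1 }
}

let sameStream = URL(string: "https://example.invalid/itag-251")!
let otherStream = URL(string: "https://example.invalid/itag-140")!
let duplicates = sharedURLs(in: [
    (label: "it", url: sameStream),
    (label: "en-US", url: sameStream),
    (label: "fr", url: otherStream),
])
print(duplicates.count)
```

A non-empty result means that switching between the listed labels would reload the same stream, so the audible language would not change.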
Result



Conclusion
This was a small feature on paper, but it fixed a real annoyance when watching videos with multiple language tracks.
Since PR #874 (merged on June 17, 2025), this area has been improved further by follow-up changes:
- Improve MPV backend audio track handling: added safer handling for multiple and single-track streams.
- Fix array index out of bounds crash in audio track handling: hardened backend and UI state access.
- Fix audio track label showing “Original” instead of “Unknown”: improved track labeling.
- Fix Invidious companion API endpoint path: corrected companion URL routing used for audio and video stream URLs.
As of February 25, 2026, the feature has not been reverted and the core logic is still in main, including xtags parsing and dubbed-vs-original prioritization.