Linux PulseAudio: How to Stream Audio from any application to Discord
Or more generally, how to stream audio from any app to any other app, in arbitrary many-to-many combinations
Every now and then, I like to have an “Anime Watch Party” on Discord. For those not familiar, Discord is a video/voice chat program, and my friends and I will all get on a Discord call together, and I’ll stream some Anime while sharing my screen with my friends, so we all get to watch the same show/movie together.
Discord is great at streaming the visual content of my desktop, but it’s a little finicky for the audio part. On Windows, there are some complications regarding whether you share “one specific application” or share “your whole screen”. On Linux, it seems like streaming audio doesn’t work at all, and only the visual information is sent across.
There’s a solution to this if you’re using PulseAudio as your sound server. To check if you’re using PulseAudio, try running pactl --version at a terminal, or running pavucontrol to get a GUI configuration tool. If those commands work, you’re probably using PulseAudio.
PulseAudio is actually a very powerful tool for wiring how audio travels through your various applications, but we’ll start with the relatively simple scenario of streaming audio from a single app (e.g., your anime video player) and a single hardware device (e.g., your microphone) to another single app (e.g., Discord) and another single hardware device (e.g., your computer speakers). Then at the end, we’ll explore some more complex use cases.
Watch Party Solution Overview
Here are the requirements we want to satisfy:
We can hear other people on Discord (i.e., Discord’s output goes to our speakers/headphones).
We can hear the audio from our show (i.e., the video player app’s output also goes to our speakers).
The people on the Discord call can hear the show (i.e., the video player app output goes to Discord).
The people on the Discord call can hear us speaking (i.e., our microphone’s output goes to Discord).
And this is the setup we’re going to create in PulseAudio that’ll satisfy those requirements:
For each of the four requirements listed above, there’s a path following a sequence of arrows that fulfills that requirement.
To fully understand what’s going on here, there are a few PulseAudio concepts you’ll need to understand: Sources/Sinks, Monitors, Input/Output Devices, and Loopbacks.
Sources and Sinks are relatively straightforward: anything that creates a sound signal in the PulseAudio system is a source. Examples include applications that play sound and your microphone. Anything that receives a sound signal is a sink. Examples include applications that record sound and your computer speakers. Your video player typically produces a sound signal but doesn’t record one, so it would be a source. Discord both creates a sound signal (it locally plays the voices of the other people on the call with you) and records audio (it consumes the audio from your microphone so that it can send it over the Internet to the other people on the call), so Discord appears as both a source and a sink.
A Monitor is essentially something that turns a sink into a source. Your computer speakers are a sink: you might have many applications producing sound, which all go to your speaker. But let’s say you want to also record all the audio going to your speakers. In that case, you want a source that produces precisely the audio signal your speaker-sink receives. That source that you want is the “Monitor” of your speakers. The monitor is a source where the sound it produces is exactly the same as the sound the corresponding sink receives. In the above diagram, I drew dotted arrows connecting a device with its corresponding monitor.
Input devices and output devices are the things that are not applications, and that produce or consume audio signals. Input devices are sources, and output devices are sinks. For example, your microphone is an input device, but your video player is not an input device, because it’s an application. Likewise, your speakers are an output device, but Discord is not an output device because it’s an application. Note that devices are not always hardware, as it’s possible to create virtual devices, and in fact, the solution we will use for our anime watch party involves creating virtual devices.
Finally, you can think of a loopback as a cable that connects a source to a sink. Every application (i.e., every blue box in our diagram) gets an arrow for free. (Also, monitors are connected to their corresponding devices for free). For each additional arrow we want, we have to create a loopback. In the above diagram, we needed three more arrows, labeled “loop1”, “loop2”, and “loop3”.
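To make the loopback idea concrete, here’s a minimal standalone example (independent of the watch-party setup below), wrapped in a tiny shell function so it’s easy to paste and reuse. With no arguments, module-loopback connects your default source to your default sink; latency_msec is an optional tuning argument:

```shell
# With no source/sink arguments, module-loopback connects the default
# source (usually your mic) to the default sink (usually your speakers),
# so you'd hear your own voice with a little latency. The command
# prints the ID of the newly loaded module.
make_loopback() {
    pactl load-module module-loopback latency_msec=20
}
# Example: make_loopback     (then "pactl unload-module <printed ID>" to undo)
```

The number the command prints is the module ID, which we’ll come back to in the cleanup section.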
In the diagram shown earlier, we’ve drawn all applications in blue, all hardware devices in red, and all virtual devices in orange.
PulseAudio has a limitation: an application can only be connected to one device at a time. Devices, however, can be connected to multiple things simultaneously (generally via loopbacks). Unfortunately, that means we cannot directly connect our video player to both Discord and our speakers, as that would require the video player application to be connected to two things at once. In graphical terms, a blue box can only have one arrow connected to it at any one time.
It’s because of that limitation that we’ve created a couple of virtual devices. Specifically, we create two sinks, one called V1AppOutput and one called V2Mixed. PulseAudio automatically creates their corresponding monitors, which act as sources.

The purpose of V1AppOutput is to simulate multiple outgoing connections from the video player app, so that its signal can travel to both Discord and our speakers simultaneously. Similarly, the purpose of V2Mixed is to simulate multiple incoming connections to Discord, so that the people on the call can hear both our microphone and the anime’s audio simultaneously.
Watch Party Solution Implementation
To implement this solution, we will use a combination of terminal commands and interacting with a GUI.
The first step is to get the names of the sinks and sources for your devices. To do that, run pactl list sources | grep "Name: " and pactl list sinks | grep "Name: ". Here’s the output I get for those commands:
> pactl list sources | grep "Name: "
Name: alsa_input.usb-Jieli_Technology_USB_PHY_2.0-02.mono-fallback
Name: alsa_output.usb-Alpha_Imaging_Tech_HTC_Vive-02.analog-stereo.monitor
Name: alsa_input.usb-Alpha_Imaging_Tech_HTC_Vive-02.mono-fallback
Name: alsa_output.pci-0000_1f_00.1.hdmi-surround71-extra1.monitor
Name: alsa_output.usb-Blue_Microphones_Yeti_Stereo_Microphone_FST_2018_07_20_79628-00.analog-stereo.monitor
Name: alsa_input.usb-Blue_Microphones_Yeti_Stereo_Microphone_FST_2018_07_20_79628-00.analog-stereo
> pactl list sinks | grep "Name: "
Name: alsa_output.usb-Alpha_Imaging_Tech_HTC_Vive-02.analog-stereo
Name: alsa_output.pci-0000_1f_00.1.hdmi-surround71-extra1
Name: alsa_output.usb-Blue_Microphones_Yeti_Stereo_Microphone_FST_2018_07_20_79628-00.analog-stereo
Under sources, the “Jieli Technology” is my webcam, “HTC Vive” is my VR headset, “HDMI Surround71” is my surround sound speakers, and “Yeti Stereo Microphone” is my Blue Yeti microphone. The Blue Yeti is what I want to use to talk to my friends over Discord, so that’s the name I want to write down. Note that there are two entries for the Yeti under sources: “alsa_output.usb-Blue_Microphones_Yeti_Stereo_Microphone_FST_2018_07_20_79628-00.analog-stereo.monitor” and “alsa_input.usb-Blue_Microphones_Yeti_Stereo_Microphone_FST_2018_07_20_79628-00.analog-stereo”. The Blue Yeti actually also acts as a sink (the physical hardware has a headphone jack on it), and that’s what the “.monitor” entry refers to. That’s not what I want to send to my friends over Discord; I want the actual signal of me speaking into the microphone, so I want the name that doesn’t end in “.monitor”.
Under sinks, we again have my VR headset (which has headphones in it), my surround speakers, and the headphones jack in my Blue Yeti. In my case, I actually want to listen to my friends via the headphones jack on my Blue Yeti (I don’t want to listen to them via the 7.1 surround sound system as I don’t want feedback to go from those speakers back into the microphone), so I write down the sink name for the Blue Yeti again.
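Since these long ALSA names will come up again in the loopback commands, it’s convenient (though optional) to stash them in shell variables. These are my device names; substitute the ones from your own pactl output:

```shell
# MIC is the microphone source (note: NOT the ".monitor" entry);
# SPEAKERS is the sink you want to hear your friends through.
MIC="alsa_input.usb-Blue_Microphones_Yeti_Stereo_Microphone_FST_2018_07_20_79628-00.analog-stereo"
SPEAKERS="alsa_output.usb-Blue_Microphones_Yeti_Stereo_Microphone_FST_2018_07_20_79628-00.analog-stereo"
```

Later commands can then say source="$MIC" instead of spelling out the full name.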
Now we’re ready to start creating the virtual devices. Note that when you run these commands, PulseAudio will give you a number. You’ll want to write these numbers down, as they’ll be different each time you run the commands, and they’ll probably differ from my numbers. The first pair of commands you’ll want to run are “pactl load-module module-null-sink sink_name=V1 sink_properties=device.description=V1AppOutput” and “pactl load-module module-null-sink sink_name=V2 sink_properties=device.description=V2Mixed”. Here’s the output I get when I run those commands:
> pactl load-module module-null-sink sink_name=V1 sink_properties=device.description=V1AppOutput
24
> pactl load-module module-null-sink sink_name=V2 sink_properties=device.description=V2Mixed
25
This creates the two virtual devices, along with their monitors, that we need in our solution. In other words, it creates the four orange boxes in our diagram. PulseAudio also gave me the numbers 24 and 25, which I’ll write down. These are the ID numbers of the modules it just loaded. The sink_name is what we’re going to use to refer to these devices in future commands (so I like to keep it short), whereas the device.description is the name that’ll show up in the GUI we’ll use later (so I like to make it relatively descriptive).
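As an optional sanity check, you can confirm the virtual sinks exist by listing sinks in short form and looking for the sink_names we chose. A small sketch, again wrapped as a function:

```shell
# "pactl list short sinks" prints one sink per line: index, name,
# driver, sample spec, state (tab-separated). The two virtual sinks
# should show up under the names we gave them, V1 and V2.
check_virtual_sinks() {
    pactl list short sinks | awk '$2 == "V1" || $2 == "V2"'
}
# Example: check_virtual_sinks     (should print two lines)
```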
Next, we need to create the three loopbacks. Here are the commands I used and the output I got:
> pactl load-module module-loopback source=alsa_input.usb-Blue_Microphones_Yeti_Stereo_Microphone_FST_2018_07_20_79628-00.analog-stereo sink=V2
28
> pactl load-module module-loopback source=V1.monitor sink=V2
29
> pactl load-module module-loopback source=V1.monitor sink=alsa_output.usb-Blue_Microphones_Yeti_Stereo_Microphone_FST_2018_07_20_79628-00.analog-stereo
30
You’ll want to change the source in the first command and the sink in the last command to match the names of your own source and sink.
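If you find yourself doing this before every watch party, the whole terminal half of the setup can be bundled into a single sketch of a script. The function below takes your mic source name, your speaker sink name, and a path of your choosing for an ID file, and collects the module IDs there (one per line) so cleanup later is easy:

```shell
# Create both virtual sinks and all three loopbacks, appending each
# printed module ID to an ID file for later cleanup.
setup_watchparty() {
    mic="$1"; speakers="$2"; idfile="$3"
    : > "$idfile"   # start with an empty ID file
    pactl load-module module-null-sink sink_name=V1 sink_properties=device.description=V1AppOutput >> "$idfile"
    pactl load-module module-null-sink sink_name=V2 sink_properties=device.description=V2Mixed >> "$idfile"
    pactl load-module module-loopback source="$mic" sink=V2 >> "$idfile"
    pactl load-module module-loopback source=V1.monitor sink=V2 >> "$idfile"
    pactl load-module module-loopback source=V1.monitor sink="$speakers" >> "$idfile"
}
# Example (with my device names; substitute yours):
# setup_watchparty \
#     "alsa_input.usb-Blue_Microphones_Yeti_Stereo_Microphone_FST_2018_07_20_79628-00.analog-stereo" \
#     "alsa_output.usb-Blue_Microphones_Yeti_Stereo_Microphone_FST_2018_07_20_79628-00.analog-stereo" \
#     "$HOME/.watchparty-modules"
```

The ID file path is just a suggestion; put it wherever you like.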
The connections to the applications themselves (i.e., the arrows that connect to the blue boxes) are easiest to control through a GUI, so for the last step we’ll use pavucontrol. Run pavucontrol, and you should see a UI like this:
You’re currently on the “Playback” tab, as seen in the top left corner. This lists all streams (both applications and loopbacks) acting as sources for PulseAudio. The applications must be running to show up here, so if you haven’t started your video player and Discord yet, do so now to have them appear here. Discord also lazily initiates its audio engine, so you may need to actually join the Discord call to have its entry show up here. Note that for me, Discord shows up as “WEBRTC VoiceEngine”, because Discord’s voice chat is built on WebRTC.
In the bottom right corner, you can optionally click on the “Show” dropdown and change it from “All Streams” to “Application Streams”. The goal is to find the application whose audio you want to share over Discord. In my case, the application I want to share is “Moonlight”. So I click on the dropdown next to “Moonlight” and change it from “Digital Surround 7.1” (my surround sound speakers) to “V1AppOutput” (the virtual device we created earlier).
Next, while still on the Playback tab, find the entry for Discord (which, again, might show up as “WEBRTC VoiceEngine”) and point it at the speakers you want to hear your friends’ voices on (for me, that’s the Blue Yeti, since I want to hear them through the headphone jack on my microphone). The dropdown may already be set to the correct value by default.
Finally, go to the “Recording” tab, and find Discord there again. You’ll want to set this to “Monitor of V2Mixed” to match the diagram we had earlier.
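As an aside, newer pactl versions can also move application streams from the command line, in case you’d rather script this step too. This is a hedged sketch: the numeric stream IDs vary per session (get them from pactl list short sink-inputs and pactl list short source-outputs), and SPEAKERS is a placeholder for your own sink name:

```shell
# Move the three application streams to match the diagram:
#   $1 = sink-input ID of the video player
#   $2 = sink-input ID of Discord's playback stream
#   $3 = source-output ID of Discord's recording stream
move_streams() {
    player_id="$1"; discord_out_id="$2"; discord_rec_id="$3"
    pactl move-sink-input "$player_id" V1                  # video player -> V1AppOutput
    pactl move-sink-input "$discord_out_id" "$SPEAKERS"    # Discord playback -> speakers
    pactl move-source-output "$discord_rec_id" V2.monitor  # Discord records the Monitor of V2Mixed
}
# Example: move_streams 10 11 12   (IDs taken from the listings above)
```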
Preventing Discord from Messing with your Audio
There’s one last step before you’re ready to watch anime with your friends. Discord assumes that the primary content you’re streaming is human voices, so it has some noise cancellation algorithms that try to filter out anything that’s not a human voice.
In this case, you want to disable that, because you’re trying to stream the audio from the anime, which will contain non-human-voice content (such as background music, sound effects, and so on). So go into the Discord options and turn off these filters:
Basically, you want Discord to send the audio signal unaltered.
Cleaning Up After The Party
Remember those ID numbers that PulseAudio gave us after each command? Once the watch party’s over, you may want to unload the modules you created to clean up after yourself. This is optional: you could leave the modules loaded to save yourself the trouble of recreating them next time (although note that they’ll disappear after a reboot anyway). To unload the modules, use the following commands, passing in the ID numbers that PulseAudio gave you:
> pactl unload-module 24
> pactl unload-module 25
> pactl unload-module 28
> pactl unload-module 29
> pactl unload-module 30
Note that this is using the ID numbers I got. Your ID numbers will likely differ, so you’ll have to update those commands accordingly.
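Alternatively, if you’ve lost the ID numbers, you can find and unload every null-sink and loopback module in one sweep. A hedged sketch (careful: this removes ALL loopbacks and null sinks, not just the watch-party ones, so check pactl list short modules first if you use other virtual devices):

```shell
# "pactl list short modules" prints: index, module name, arguments.
# Collect the IDs of all null-sink/loopback modules and unload each.
cleanup_watchparty() {
    pactl list short modules \
        | awk '$2 == "module-null-sink" || $2 == "module-loopback" { print $1 }' \
        | while read -r id; do
              pactl unload-module "$id"
          done
}
# Example: cleanup_watchparty
```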
More Advanced Use Cases
Admittedly, this is probably a lot of complexity for something that should simply work out of the box in Discord. However, there are three reasons I think this technique is still worth learning:
Audio sharing otherwise doesn’t really work on Linux as of 2023. You’d have to boot into Windows to get a semi-satisfying audio streaming experience on Discord (or convince the Discord devs to add better Linux support; or use a different voice chat program).
This technique generalizes to other applications. I used Discord in this example, but the same technique applies to Zoom calls, Google Talk, or whatever else you use to voice chat with your friends.
This technique generalizes to very complex use cases.
For the last one, I have a relatively advanced use case for streaming on Twitch. Here are some of my requirements:
I want to have independent sources for (1) my mic, (2) Discord audio for when I’m co-streaming with a friend, (3) game audio, (4) my music player playing stream-safe copyright-free music for when the music in the game isn’t satisfactory for whatever reason.
I want to have independent sinks for (1) what I hear in my headphones, (2) what my Twitch viewers hear (via the audio sent to OBS, my Twitch streaming software), and (3) what my co-host hears over Discord.
I want to be able to mute and unmute each of the 12 possible source-to-sink combinations independently:
Sometimes I want to hear my own voice over my headphones, and sometimes I don’t, so I need to toggle the connection between “My Mic” and “My Headphones”.
My co-streamer basically never wants to hear their own voice echoed back to them via Discord, so I need the connection from “Discord Output” to “Discord Input” disabled.
Depending on who I’m co-streaming with, they may or may not want to hear the background music, so I need to toggle the connection between “My Music Player” and “Discord Input”.
Sometimes, I need to mute my mic for all sinks (so I want to toggle the connection between “My Mic” and “OBS”, and between “My Mic” and “Discord Input”). This can happen, for example, if I’m about to cough and don’t want anyone to hear it.
I need to sometimes make the conversation between me and my co-host private so that the people on Stream don’t hear it, but my co-host and I can still hear each other. So I want to toggle the connection between “My Mic” and “OBS”, and between “Discord Output” and “OBS”.
This PulseAudio technique is nice in that it scales to arbitrary complexity.