A web based video player for files with multichannel audio where the user can adjust the volume levels of individual instruments and follow along a visual presentation (notation, conductor video etc.).

View the example:


Works in every browser. No obscure technology is involved.

Purpose and Motivation

You may be the songwriter of a band or the conductor of a choir. This player exists to share your own music pre-productions with your fellow musicians so that they can practice.

They should be able to control the volume levels (the mix) of individual instruments, or tracks. They must also be able to follow the music visually. The primary intended use case is notation, in sync with the music. Or you record yourself conducting. It could also be just a few handwritten instructions. For more ideas see the next chapter.

The multi-track approach also enables you to provide "optional" tracks to the musicians, such as metronome click or spoken instructions ("Solopart in 4, 3, 2, 1 NOW!") because they can easily be switched on and off.

All this must be possible without any work on the musicians' side. Going to a website and clicking play on a video is the easiest and most straightforward solution. No downloads, no extra client software (cross-platform!), no setup time, no special and obscure JavaScript web frameworks that try to render poorly sounding MIDIs in the browser, combined with bad rendering of notation (and having the audacity to ask a fee for that "service"!)

A video with multichannel audio offers maximum flexibility and quality. You can produce whatever you want.

The web player itself is based on the standard WebAudio API, which has been available in all major browsers since at least 2015 and is considered robust, tested technology.

But complexity has to go somewhere: All work is done during the video production process, which is a bit more involved and time consuming than simply pressing the export button in a DAW.

Video ideas

So, what benefit does the video player method offer? Here are a few video ideas. Some of them follow the main idea of offering musicians a way to practice alongside a "playback" video; some are eye candy or extras for an interesting video. Maybe a special video for your most dedicated fans, with "band commentary" etc.

  • Notation, of course.
  • A video of the conductor, including vocal instructions
  • A video of a musician playing the music (think top down view of a piano), e.g. for instrument teachers
  • Splitscreen recordings of all musicians playing
  • Instructions for stage performances, such as changing stage positions, choreographies etc.
  • Slide shows of any kind. Basic written instructions on how to perform the music (if no notation available)
  • Graphics like Frequency Spectrum, Note Visualizers

What you need to do to use this

Shortcut: Use the provided example website, copy and modify.

Each video needs its own sub-website. One .html page per project.

Copy the file videomixer.js to your own webserver. You only need this once in a known location, no matter how many videos you want to show. Usually javascript is put into /js/videomixer.js

You need to load the js file in your html. At the bottom(!) of your file, just before the closing </body> tag, insert:

<script type="text/javascript" src="js/videomixer.js"></script>

Order is important. videomixer.js can only be loaded after you set up the following in the html body. Note: This work is done automatically by the supplied generator program. See the chapter below. But here is the manual way, which also explains what is going on:

    <script type="text/javascript">
        var tracknames = [
            "Drums", "Bass", "Chords", "Commentary", "Click",
        ];

        // Initial / Default volume. Same order as tracks.
        var volumeMap = [
            1.0, 1.0, 1.0, 0.0, 0.0,
        ];

        var videoAudioSampleRate = 48000;
    </script>

    <video controls>
      <source id="videofilename" src="withcommentary.mp4" type="video/mp4">
    </video>

    <div id="mixerstrips" align="center">
        <!-- filled in by javascript -->
    </div>

Explanation of the code above:

Your video file's audio channels are sorted in stereo pairs, left and right channel (by design requirement of this tool). The array tracknames gives each stereo pair a label. These will be the names of the mixer strips. We call such a pair a "track". Each mixer strip (up to 9) automatically gets assigned a number key as keyboard shortcut to quickly toggle its volume between 0 and 1 (off and on). This also works when the video is in fullscreen mode.
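The pair mapping and the number-key toggle described above can be sketched like this (a minimal illustration, not the actual videomixer.js code; both function names are made up):

```javascript
// Track i occupies channels 2*i (left) and 2*i + 1 (right).
function channelsForTrack(trackIndex) {
  return [trackIndex * 2, trackIndex * 2 + 1];
}

// A toggle that remembers the last non-zero volume, so switching a
// track back on restores the level it had before it was muted.
function makeToggle(initialVolume) {
  var volume = initialVolume;
  var lastNonZero = initialVolume > 0 ? initialVolume : 1.0;
  return function () {
    if (volume > 0) {
      lastNonZero = volume;
      volume = 0;
    } else {
      volume = lastNonZero;
    }
    return volume;
  };
}
```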

The array volumeMap has default volume values for each of the tracks, in the same order as they appear in tracknames. These are the starting values for each mixer strip. 0.0 is off, 1.0 is the volume level of the original audio/video, and you can go up to 2.0 for software amplification (may sound bad). This was primarily designed to mute certain tracks, like metronome click or vocal conductor instructions, from the start and therefore make them "optional".

You need to manually give your audio's sample rate in videoAudioSampleRate. Common values are 44100 Hz and 48000 Hz. This is not a choice: you simply need to match the value your audio/video actually has. Since you created the video yourself you most likely already know the value. If not, use any tool such as VLC, (S)MPlayer or command-line tools like ffprobe (from ffmpeg) to determine it.

The HTML5 player needs the source id videofilename. The actual filename and media type need to be filled in by you, of course. You can place the video player wherever it fits your HTML design.

Finally, the <div id="mixerstrips"> part can simply be copied. Javascript uses it as a marker for where to render the mixer strips. You can move it around in your HTML to fit your design.
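Under the hood, a mixer like this can be built with standard WebAudio nodes: split the multichannel source, rebuild each stereo pair, and give every pair its own GainNode. The sketch below is an illustration of that idea, not the actual videomixer.js code; the function name is made up, and the AudioContext is passed in as a parameter:

```javascript
// Build one gain-controlled stereo path per track. In a browser, ctx
// would be an AudioContext and sourceNode e.g. the result of
// ctx.createMediaElementSource(videoElement).
function buildMixerGraph(ctx, sourceNode, volumeMap) {
  var channelCount = volumeMap.length * 2;          // one stereo pair per track
  var splitter = ctx.createChannelSplitter(channelCount);
  sourceNode.connect(splitter);
  return volumeMap.map(function (vol, i) {
    var pair = ctx.createChannelMerger(2);          // rebuild the stereo pair
    splitter.connect(pair, 2 * i, 0);               // left channel of track i
    splitter.connect(pair, 2 * i + 1, 1);           // right channel of track i
    var gain = ctx.createGain();                    // the track's mixer strip
    gain.gain.value = vol;                          // start at the volumeMap value
    pair.connect(gain);
    gain.connect(ctx.destination);                  // all tracks sum at the output
    return gain;                                    // keep a handle for the UI
  });
}
```

Changing a strip's volume is then just a matter of setting `gains[i].gain.value`.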

Further functions and commands

As the provided example files show, there are numerous buttons and keyboard shortcuts such as "mute all". You can optionally use them in your HTML. The keyboard shortcuts are hardcoded in videomixer.js and need to be changed there. This way they also work when the video is in fullscreen mode.

Here is a list of all available javascript functions, with their keyboard [shortcut]:

[S] setAllVolumeToZero() // all tracks off. Silence.
[R] resetAllVolumeToDefault() // reset. all tracks to the values of array "volumeMap"
[H] setAllVolumeToOne() // all tracks on. Hear all.

[Space] playPause() // toggle playing the video, even if the video player is not in input focus.
[All four arrow keys] seek(SECONDS) // seek SECONDS forward or backward (negative number), even if player is not in focus.

[W] normalPlaybackSpeed() //normal playback speed.
[D] fasterPlaybackSpeed() //increase playback speed (may sound bad)
[A] slowerPlaybackSpeed() //decrease playback speed (may sound bad)

If seeking and faster playback do not work, suspect your webserver: it probably does not support HTTP range requests (byte serving), which browsers need for this kind of video streaming. In that case you will also be unable to skip around in the video's timeline with your mouse.
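One way to check this from a script: servers that support seeking answer with an `Accept-Ranges: bytes` header. A small, hypothetical helper (not part of videomixer.js) that inspects a response's headers:

```javascript
// Returns true if the given header map advertises byte-range support.
// headers is a plain object with lower-cased header names, e.g. built
// from the entries of a fetch() Response's headers.
function supportsByteRanges(headers) {
  var value = (headers["accept-ranges"] || "").toLowerCase();
  return value === "bytes";
}
```

You can get the same information on the command line with `curl -I yourvideo.mp4`.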

Video Production

The hardest part of all this is to actually create the video.

The best file format (so far) is mp4 with Opus audio (mp4 has problems with .flac audio; .mkv can do flac, but Firefox does not support .mkv; ffmpeg itself has problems with multichannel .aac and .wav).

Each instrument (or track) has two audio channels. The first two channels are track 1, the next two track 2, and so on. When producing your audio, write down your track order because you need to enter it manually into your HTML file or the generator .ini.

Generating a quick test video

To get started, let's concentrate on the audio part and produce a test video. Create a multichannel Opus audio file. For example, Ardour can export multichannel .wav, and so can jack_capture. Convert that to Opus with opusenc export.wav multi.opus.

Write down how long your music is in seconds.

Now generate a video of the same length (or a second longer; it doesn't need to be frame-accurate). The next command generates a video of 1:32 (92 s) length with a resolution of 1920x1080 and 30 frames per second.

ffmpeg -f lavfi -i testsrc=duration=92:size=1920x1080:rate=30 testsrc.mp4

Now merge your audio into the test video. This simply merges (-c copy) the two files; it does not render anything and should be done in a few moments. The overall duration of the video is the shorter of the two files (-shortest).

ffmpeg -i testsrc.mp4 -i multi.opus -map 0 -map 1 -c copy -shortest testcombined.mp4

The video is now ready to be used by the player, as described in this README.

A useful Video

This setup is quite involved. It assumes that you are familiar with the Linux command line and are able to analyze the given commands so that you can adjust the file paths and names yourself.

Nothing prevents you from using a non-linear video editor that can export multichannel audio and doing everything by hand. However, I would like a more automated approach. I'll now describe my own process, which is hardly the best one, but it works for now. This process could be made easier by scripting some steps, but it will still remain a bit of work.

The process is only so complicated and verbose because recording multichannel video via JACK is largely unexplored. Read the roadmap at the end of the file to see how that could be made easier in the future.

However, keep in mind that all this complexity has nothing to do with your musicians watching your video. You do the work so they don't have to.


This is on Linux with SimpleScreenRecorder and jack_capture. You also need sox for one step, and you have to compile an extra program from GitHub (see below). We assume that we want to record music with four tracks (8 channels).

Start programs:

jack_capture --channels 8 --jack-transport --manual-connections

jack_capture --channels 2 --jack-transport --manual-connections

Those two wait until jack transport starts rolling and will record until transport stops.


Set up, through the SimpleScreenRecorder GUI, that you want to record mp4 video with the H.264 codec. For audio choose JACK. Don't use the options "Record System Microphone" or "Record System Speakers". As the audio recording codec (yes, we need that as well) choose Vorbis at 192 kbit/s. This internal audio recording will later be deleted, but we need it to sync with the multichannel audio.

The rest of the settings are up to you, for example whether to show the mouse cursor, what portion of the screen to record, in which resolution etc. Configure a shortcut to start/stop the recording. I use Ctrl+Shift+R.

You should also use your desktop environment to set up global keyboard shortcuts to start and stop the JACK transport. Bind these two commands to keys of your choice:

bash -c 'echo play | jack_transport'

bash -c 'echo stop | jack_transport'

JACK Connections:

  • Connect your 8 individual audio channels to the jack_capture instance with 8 input ports.
  • Create a stereo downmix by connecting each of the 4 left and 4 right ports to the jack_capture instance with 2 input ports.
  • Connect the very same(!) downmix to SimpleScreenRecorder's 2 input ports as well.

The order in which you connect to the input ports determines the track order in the video player: the first two input ports are your first instrument, and so on. Write down that order as tracks, for example "voice, guitar, bass, drums".

Before we start the actual recording, a short explanation of what is going on here.

Both jack_capture instances will record different audio (multichannel vs. stereo mixdown), but they will be perfectly in sync, with frame-accurate mathematical precision, thanks to the JACK transport.

SimpleScreenRecorder's internal audio recording will contain the same audio as the stereo jack_capture, but they will not(!) be in sync, because we start the video recording manually and the audio recording at a different time.

We will later compare both stereo recordings (automatically) to figure out the offset between the video and the pure audio recordings, and then use that offset to integrate the multichannel recording into the video.

Instead of jack_capture you could use a DAW export, for example from Ardour. But then you need to make sure that the stereo export is also done by Ardour, so that the two are perfectly synced.


All three recording programs are now waiting to be started. And we can start them all with keyboard shortcuts.

Set up your recording area so that it shows what you want to record (e.g. a PDF reader, or a live view of your webcam for conducting).

Now in this order:

  • Start SSR Recording via shortcut (Ctrl+Shift+R)
  • Start Jack Transport via shortcut
  • Do whatever the video requires. If there is no interaction just wait and let it play.
  • Stop Jack Transport via shortcut
  • Stop SSR Recording via shortcut (Ctrl+Shift+R again)

Go to the SSR GUI and finalize the recording. Save the file. jack_capture will have produced two files: one with a .wav extension for the stereo version and one with a .wavex extension for the multichannel mix.

Your video file is now longer than the audio recording: it started earlier and stopped later (even if only by half a second).

Extract the video-internal audio:

ffmpeg -i ssr_capture.mp4 -map 0:a -c copy video_audio.ogg

(This assumes you previously chose Vorbis as SSR setting.)

Calculate Offset

Download this program: https://github.com/alopatindev/sync-audio-tracks and build it in place. We only need the script compute-sound-offset.sh.

Run (with your own file names and paths of course):

./compute-sound-offset.sh /home/user/jack_capture_stereo.wav /home/user/video_audio.ogg 0

The argument 0 is a time limit for the tool's analyzer, where zero means no limit. I never found any reason to actually set it. You will now receive a number, which is a time in seconds. If you used shortcuts to start your recordings this may be under 1 second; in my case it was about 0.85 s. Copy this number:


We now know how much earlier the video started than the audio recording. We need to either trim the video's beginning or add silence to the beginning of the multichannel audio. Trimming the video is easier. In the same step we delete the original video audio from SSR. Insert your offset here:

ffmpeg -ss 0.85212500000000002 -i ssr_capture.mp4 -c:v copy -an video_cut_no_audio.mp4
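The sign convention used here is worth stating explicitly (it is an assumption about your recording order, not something the tools enforce): a positive offset means the video started earlier, so we trim the video; a negative offset would mean the audio started first, in which case you would pad the audio instead. As a tiny sketch (hypothetical helper, not part of any of the tools used above):

```javascript
// Decide how to line the recordings up from the measured offset in
// seconds. Positive: video started first, trim the video's beginning.
// Negative: audio started first, prepend silence to the audio.
function syncPlan(offsetSeconds) {
  if (offsetSeconds >= 0) {
    return { action: "trim-video", seconds: offsetSeconds };
  }
  return { action: "pad-audio", seconds: -offsetSeconds };
}
```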

Optional Step: Use the stereo capture for a sync test

To check that the calculated sync is correct, we can merge our jack_capture stereo recording with the video. Then you can watch the video in any standard player (vlc, smplayer) without having to set it up for multichannel playback.

First convert our stereo capture to Ogg Vorbis (for simplicity), then merge. Then watch the video:

sox jack_capture_stereo.wav replacement_stereo_audio.ogg
ffmpeg -i video_cut_no_audio.mp4 -i replacement_stereo_audio.ogg -c copy -map 0:v:0 -map 1:a:0 video_cut_with_test_audio.mp4
smplayer video_cut_with_test_audio.mp4

Everything should look just fine; in fact it should (to the human eye and ear) be no different from the original SSR screen capture, except that the first 0.85 seconds of the video have been trimmed away.

Merge Multi Track Audio

The final step is a quick one.

Convert the multichannel jack_capture recording to Opus

opusenc jack_capture_multi.wavex multi.opus

and merge it with our video file that had its audio removed.

ffmpeg -i video_cut_no_audio.mp4 -i multi.opus -map 0 -map 1 -c copy -shortest /home/user/multichannelwebsite/mysong.mp4

This simply merges (-c copy) the two files; it does not render anything and should be done in a few moments. The overall duration of the video is the shorter of the two files (-shortest). Our video was a few moments longer because we stopped its recording last; this trims the ending to the audio duration.

The file mysong.mp4 is used by the website.

What are all these files? - Generating your own example

In this repository there is a python script called generator.py which takes an .ini file as input argument and writes an html website to standard output. You can save the output in any file you like. In the example we use index.html so we can point our test webserver directly at the directory, but this is just for development reasons.

Full command:

./generator.py example.ini > example/index.html

The file example.ini is self-explanatory and you can copy it to adapt it to your own files.

As written above, you really only need the file videomixer.js and certain sections in your HTML file. There is no important code in any .css file.

The files in the wip directory are an unmodified redistribution of the javascript library wavyjs by Chris Schalick (MIT). It is currently not in use; it is intended to provide .wav generation and download in the future. You can see some commented-out sections of code in template.html and videomixer.js to download your current mix. However, the WebAudio API side of that is not yet working.

Future Versions - Roadmap

We want the user to be able to download the current volume mix as .wav file so they can practice where they want without the video player website. This is work in progress.

Fixing some small bugs could further simplify the code, for example figuring out why the WebAudio API fails to correctly recognize the video's sample rate.

The video production process needs streamlining. My own workflow will forever be based on JACK, but it should be easier to record a video directly, without replacing the audio in post-processing. Apparently OBS Studio has a (complicated) way to offer multichannel JACK recordings, but even if that works properly, the OBS setup also has some complexity.

SimpleScreenRecorder, my screen capture tool of choice, only supports stereo audio. Making the channel count a user setting would solve everything, but SSR is tied to stereo audio so deeply that it already has comments about this in its code.

Multiple video tracks: record a video of each instrument or choir section, and let users switch between multiple "cameras" themselves.

Utilizing the subtitle and image embedding functionality of the mp4 video container format.

Multi video, subtitles and images are completely untested. They could already work, since everything is based on a standard HTML5 video player.