Creating A Wave Audio Player Using The WinMM API

Mickey Marshall

This is part two of my article trilogy concerning the use of Windows Mixer controls, playing an existing wave file through an available speaker, and recording and saving a wave file from an available audio input device such as a microphone. You might be wondering about the order in which I am presenting these tutorials. Simple! When my father taught me to play pool, we would often come up with a scenario where there were multiple shots available. “Take the easy ones first,” he would always say. (Not quite “Use the force!”, but it worked.)

Part one (using Windows Mixers) was the easiest to understand. I admit that I did struggle to implement the concept. I actually had to write an MFC C++ DLL and, though that worked well, I was not satisfied. I wanted to implement it in C#. Once I had the C# code written, it was horribly unstable. I knew what was wrong: there was a critical section in the Player thread that needed synchronizing with the feedback routines in the UI thread. Finally, I stumbled upon the answer and “Voila!”. (Boy, are we in trouble.)

Understanding And Controlling The Windows Mixer API

So, before I begin, I feel obliged to inform you that I picked up a lot of dirty words in this project. So, let me get them out now … DELEGATES, EVENTS, THREAD SAFETY, CALLBACKS, POINTERS. I'm surprised that my mom isn't trying to wash my mouth out with soap.

This app will have two threads. There is the main UI thread which all apps have, and the Player Thread which only plays the audio. Each thread will need to be aware of and safely pass information to the other thread. We will discuss this more as we proceed.

Short Answer

It works like this.

1. Open the wave file, read the header, and fill in a format structure.
2. Open the output device.
3. Define and fill a number of WAVHDR structures. Each structure must be able to play long enough to fill in another.
4. As each WAVHDR is filled, send it to the Windows Player DLL.
5. When a WAVHDR is finished playing, the DLL will try to inform you by calling a callback function. That function has the duty to fill in that WAVHDR and pass it back to the DLL.
6. When you have finished playing, close the device and you are done.

Long Answer

1. Open the audio (.wav) file and read the header. You must pull the format section from the header. (Warning: this example handles only stereo, CD-quality .wav files: 44100 samples per second, 16-bit samples, and 2 channels. If you want to play other formats, you must carefully modify the code to identify and handle the desired format.) The binary reader is now positioned at the beginning of the audio data, and the format structure is filled in and ready to use.
2. Call waveOutOpen and give it the index of the device to be opened (the index at which the device was found when we called waveOutGetDevCaps; please refer to my previous article, Understanding and Controlling the Windows Mixer API). We also pass it the format structure. Most important is the pointer to the WaveDelegate callback function. Here we use HandleWaveOut, our C# function that will refill the freshly emptied buffer and send it back to the DLL. If you only have a small bit of audio, you may use a single WAVHDR structure and you will not need this pointer.
3. The WAVHDR structure contains a pointer (.lpData) that you must point to a byte array containing the audio you want to play. Here I have created four header structures, each one pointing to a byte array of audio data. Each buffer is 44100 bytes long, which is enough to play ¼ of a second of stereo, CD-quality audio. When you have filled in the buffer, call waveOutPrepareHeader to inform the DLL that a header is coming soon.
4. As each header is filled in, call waveOutWrite and the player will immediately start playing.
5. When the DLL has finished playing a buffer, it will try to call the HandleWaveOut callback function we defined in step 2. That function reads up to a predefined number of bytes from the audio file and calls waveOutPrepareHeader and waveOutWrite, and this repeats until we reach the end of the file or you want to stop playing. At that point, as the callback function informs you that each buffer has been played, you must call waveOutUnprepareHeader for each and every WAVHDR structure that has been passed to the DLL. Important: the callback is the critical section. No other routine is allowed to even read any variable that is used here while it runs; if one does, the whole thing will crash. The critical section must be locked with a thread-safety mechanism such as “lock” or a mutex.
6. When all the audio data has been sent to the DLL, you must inform the UI thread that the player is done. When the UI receives the message that the player is done, the UI must call waveOutClose.
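The calls named in the steps above map onto winmm.dll roughly as follows. This is a minimal sketch, not the article's downloadable code: the class name WinMM, the C# struct name WaveHdr (Windows itself calls the structure WAVEHDR), and the parameter names are my own assumptions. The function names, argument order, and constant values follow the Windows headers.

```csharp
using System;
using System.Runtime.InteropServices;

// Same fields as the WAVEFORMATEX struct shown later in this article.
[StructLayout(LayoutKind.Sequential)]
public struct WAVEFORMATEX
{
    public ushort wFormatTag, nChannels;
    public uint nSamplesPerSec, nAvgBytesPerSec;
    public ushort nBlockAlign, wBitsPerSample, cbSize;
}

// Mirrors the Windows WAVEHDR structure; the field order must not change.
[StructLayout(LayoutKind.Sequential)]
public struct WaveHdr
{
    public IntPtr lpData;         // pointer to the audio buffer
    public uint dwBufferLength;   // size of that buffer in bytes
    public uint dwBytesRecorded;  // used when recording only
    public IntPtr dwUser;         // free for the caller's own use
    public uint dwFlags;          // status flags maintained by winmm
    public uint dwLoops;          // loop count, normally 0
    public IntPtr lpNext;         // reserved
    public IntPtr reserved;       // reserved
}

// Shape of the function that winmm calls back (step 2).
public delegate void WaveDelegate(IntPtr hWaveOut, uint uMsg,
    IntPtr dwInstance, IntPtr dwParam1, IntPtr dwParam2);

public static class WinMM
{
    public const uint CALLBACK_FUNCTION = 0x00030000; // dwCallback is a function
    public const uint WOM_OPEN = 0x3BB;   // device opened
    public const uint WOM_CLOSE = 0x3BC;  // device closed
    public const uint WOM_DONE = 0x3BD;   // a buffer has finished playing

    [DllImport("winmm.dll")]
    public static extern uint waveOutOpen(out IntPtr hWaveOut, uint uDeviceID,
        ref WAVEFORMATEX format, WaveDelegate callback, IntPtr dwInstance, uint dwFlags);

    // pWaveHdr must point at a header that stays at a fixed address
    // (for example unmanaged memory) until it has been unprepared.
    [DllImport("winmm.dll")]
    public static extern uint waveOutPrepareHeader(IntPtr hWaveOut, IntPtr pWaveHdr, uint cbWaveHdr);

    [DllImport("winmm.dll")]
    public static extern uint waveOutWrite(IntPtr hWaveOut, IntPtr pWaveHdr, uint cbWaveHdr);

    [DllImport("winmm.dll")]
    public static extern uint waveOutUnprepareHeader(IntPtr hWaveOut, IntPtr pWaveHdr, uint cbWaveHdr);

    [DllImport("winmm.dll")]
    public static extern uint waveOutClose(IntPtr hWaveOut);
}
```

Each function returns 0 (MMSYSERR_NOERROR) on success, so a cautious caller would check every return value before moving on to the next step.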

That's all folks!

Addendum

Figure 1 - General Flow.

The .wav audio file header is a RIFF header, which must contain:

RIFF (four bytes that represent the ASCII values of the four characters “R”, “I”, “F”, “F”)
File Size (four bytes that represent the file size – 8)
WAVE (four bytes that represent the ASCII values of the four characters “W”, “A”, “V”, “E”)
FMT (four bytes that represent the ASCII values of the four characters “f”, “m”, “t”, “ ”)
Format Size (four bytes that represent the Int32 size of the format structure that is to follow)
Format (Format Size bytes: the binary representation of the format structure that informs us about this audio file)
DATA (four bytes that represent the ASCII values of the four characters “d”, “a”, “t”, “a”)
Data Size (four bytes that represent the Int32 size of the audio data that is to follow)
AUDIO Data (Data Size bytes of audio data)

Figure 2: RIFF Structure
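The layout above can be read field by field with a BinaryReader. The following is a sketch under the same assumption the list makes, namely that the data chunk follows the format chunk directly; real files may carry extra chunks in between, which this code rejects. The names WavInfo and WavHeader are hypothetical, not taken from the example project.

```csharp
using System;
using System.IO;
using System.Text;

public sealed class WavInfo
{
    public ushort FormatTag, Channels, BlockAlign, BitsPerSample;
    public uint SamplesPerSec, AvgBytesPerSec, DataSize;
}

public static class WavHeader
{
    // Reads the header and leaves the stream positioned at the first audio byte.
    public static WavInfo Read(Stream stream)
    {
        var r = new BinaryReader(stream, Encoding.ASCII, true); // true = leave stream open
        Expect(r, "RIFF");
        r.ReadUInt32();                       // file size - 8, not needed here
        Expect(r, "WAVE");
        Expect(r, "fmt ");
        uint formatSize = r.ReadUInt32();
        if (formatSize < 16) throw new InvalidDataException("Format chunk too small");

        var info = new WavInfo();
        info.FormatTag = r.ReadUInt16();
        info.Channels = r.ReadUInt16();
        info.SamplesPerSec = r.ReadUInt32();
        info.AvgBytesPerSec = r.ReadUInt32();
        info.BlockAlign = r.ReadUInt16();
        info.BitsPerSample = r.ReadUInt16();
        r.ReadBytes((int)formatSize - 16);    // skip cbSize and any extension

        Expect(r, "data");
        info.DataSize = r.ReadUInt32();
        return info;
    }

    // Reads a four-character tag and checks it against the expected value.
    private static void Expect(BinaryReader r, string tag)
    {
        string found = Encoding.ASCII.GetString(r.ReadBytes(4));
        if (found != tag)
            throw new InvalidDataException("Expected '" + tag + "' but found '" + found + "'");
    }
}
```

Calling WavHeader.Read(File.OpenRead(path)) on a file with a 16-byte format chunk consumes the first 44 bytes, after which the stream is ready for reading audio into the playback buffers.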

public struct WAVEFORMATEX
{
    public ushort wFormatTag;
    public ushort nChannels;
    public uint nSamplesPerSec;
    public uint nAvgBytesPerSec;
    public ushort nBlockAlign;
    public ushort wBitsPerSample;
    public ushort cbSize;
}

Field Name	Bytesize	Meaning
wFormatTag	2	Format code
nChannels	2	Number of interleaved channels
nSamplesPerSec	4	Sampling rate (blocks per second)
nAvgBytesPerSec	4	Data rate
nBlockAlign	2	Data block size (bytes)
wBitsPerSample	2	Bits per sample
cbSize	2	Size of the extension (0 or 22)
wValidBitsPerSample	2	Optional: Number of valid bits
dwChannelMask	4	Optional: Speaker position mask
SubFormat	16	Optional: GUID, including the data format code

The standard format codes for waveform data are given below. Many more format codes exist for compressed data, a good fraction of which are now obsolete.

Format Code	PreProcessor Symbol	Data
0x0001	WAVE_FORMAT_PCM	PCM (CD Quality)
0x0003	WAVE_FORMAT_IEEE_FLOAT	IEEE float
0x0006	WAVE_FORMAT_ALAW	8-bit ITU-T G.711 A-law
0x0007	WAVE_FORMAT_MULAW	8-bit ITU-T G.711 µ-law
0xFFFE	WAVE_FORMAT_EXTENSIBLE	Determined by SubFormat

Examining the example code, you will find that I tested the code by loading something called “C And C Music Factory _ Let's Get Funkee.wav”. (Funny story. It was on a CD that I found in the parking lot of my local grocery store. My special skill is finding and collecting strange music. I once met a hiker in the Sierras who could find food in the wilderness. We walked by an empty campsite, never once getting closer than 100 feet. He exclaimed quietly, “There is food down there.” We hiked down and found a half carton of fresh eggs and nearly a full loaf of bread. He was a chef and I was hungry. We feasted very nicely.)

This is a hex dump of that file's header. You can see some of what I am talking about here.

Figure 3 - The hex dump of the first 64 bytes of 'Let's Get Funkee.wav'

The wave file that was created by 'ripping' the track has this format:

Field Name	Bytesize	Value	Meaning
wFormatTag	2	1	PCM (CD Quality)
nChannels	2	2	Stereo
nSamplesPerSec	4	44100	Number of samples per second
nAvgBytesPerSec	4	176400	2 bytes per channel * 2 channels * 44100 samples/second
nBlockAlign	2	4	2 bytes per channel * 2 channels
wBitsPerSample	2	16	2 bytes per sample, per channel
cbSize	2	0	No extra stuff

CD Audio format!
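The derived fields in that format can be recomputed from the three basic ones, and the same arithmetic shows why a 44100-byte buffer lasts a quarter of a second. A small sketch follows; the WaveMath class and its method names are hypothetical helpers, not part of the example project.

```csharp
using System;

public static class WaveMath
{
    // Bytes in one sample frame (one sample for every channel).
    public static int BlockAlign(int channels, int bitsPerSample)
    {
        return channels * bitsPerSample / 8;
    }

    // Bytes consumed per second of playback for uncompressed PCM.
    public static int AvgBytesPerSec(int samplesPerSec, int channels, int bitsPerSample)
    {
        return samplesPerSec * BlockAlign(channels, bitsPerSample);
    }

    // How long a buffer of the given size plays, in seconds.
    public static double BufferSeconds(int bufferBytes, int avgBytesPerSec)
    {
        return (double)bufferBytes / avgBytesPerSec;
    }
}
```

For CD audio, BlockAlign(2, 16) gives 4, AvgBytesPerSec(44100, 2, 16) gives 176400, and BufferSeconds(44100, 176400) gives 0.25. With four such buffers queued, the callback has up to three quarters of a second of audio still pending while it refills one of them.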

Cross thread control value setting

When you want to set a main UI control value from a thread other than the main thread, do not simply assign control.Value = x; bad things happen (or at least MS gets very cross(threaded) with you). Instead, you must check whether the control's InvokeRequired property is true. If it is, you must do something like this:

// Requires: using System; using System.Reflection; using System.Windows.Forms;
private void SetControlPropertyValue(Control oControl, string propName, object propValue)
{
    if (oControl.InvokeRequired)
    {
        // Called from a non-UI thread: run this same method on the thread that owns the control.
        SetControlValueCallback d = new SetControlValueCallback(SetControlPropertyValue);
        oControl.Invoke(d, new object[] { oControl, propName, propValue });
    }
    else
    {
        // Now on the UI thread: find the property by name (ignoring case) and set it.
        Type t = oControl.GetType();
        PropertyInfo[] props = t.GetProperties();
        foreach (PropertyInfo p in props)
        {
            if (string.Equals(p.Name, propName, StringComparison.OrdinalIgnoreCase))
            {
                p.SetValue(oControl, propValue, null);
                return;
            }
        }
    }
}

When you want to set a control value from another thread, call this function. The code first checks whether an invoke is required for that control (that is, whether the control was created on another thread). Then it does something weird: it wraps the same function in a delegate and hands it to the control's Invoke method. The function essentially calls itself, but this time on the control's own thread, so InvokeRequired is false. Now it goes to the else clause, where it cycles through every available property of that control. When it finds the named property, it uses magic (reflection) to set that property to the value you wanted and exits, whereupon the calling thread continues.

You can use something similar to get the property value from another thread.
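The "magic" half of that routine, looking a property up by name and setting or getting it, can be tried without a form. In this sketch the PropertyByName helper and the FakeSlider class are hypothetical stand-ins of my own; the cross-thread half is left out because Invoke needs a real WinForms control.

```csharp
using System;
using System.Reflection;

public static class PropertyByName
{
    // Public instance properties, matched without regard to case.
    private const BindingFlags Flags =
        BindingFlags.Public | BindingFlags.Instance | BindingFlags.IgnoreCase;

    // Sets the named property; returns false if it is missing or read-only.
    public static bool TrySet(object target, string propName, object value)
    {
        PropertyInfo p = target.GetType().GetProperty(propName, Flags);
        if (p == null || !p.CanWrite) return false;
        p.SetValue(target, value, null);
        return true;
    }

    // Returns the named property's value, or null if there is no such property.
    public static object Get(object target, string propName)
    {
        PropertyInfo p = target.GetType().GetProperty(propName, Flags);
        return p == null ? null : p.GetValue(target, null);
    }
}

// Stand-in for a control such as a track bar.
public class FakeSlider
{
    public int Value { get; set; }
}
```

A getter for use from another thread would wrap Get the same way the setter wraps the property loop: check InvokeRequired, and if it is true, call Invoke with a delegate that returns object.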

You must make a declaration for the cross thread control callback before you can use it. We will speak more about delegates later.

delegate void SetControlValueCallback(Control oControl, string propName, object propValue);

Raising and Handling Events

When we are playing the audio, we might want to inform the UI thread. The UI thread may or may not have a method for handling that event. You must check to see if the UI does have an event handler before you try to raise the event.
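In C#, "checking whether the UI has a handler" means testing the event for null before invoking it. Here is a minimal sketch; DemoPlayer, PositionEventArgs, and PositionChanged are hypothetical names chosen for illustration, not the ones used in the example project.

```csharp
using System;

public class PositionEventArgs : EventArgs
{
    public PositionEventArgs(long bytesPlayed) { BytesPlayed = bytesPlayed; }
    public long BytesPlayed { get; private set; }
}

public class DemoPlayer
{
    // The UI subscribes to this if it wants position updates.
    public event EventHandler<PositionEventArgs> PositionChanged;

    public void ReportPosition(long bytesPlayed)
    {
        // Copy to a local first so a handler that unsubscribes on another
        // thread cannot turn the event null between the check and the call.
        EventHandler<PositionEventArgs> handler = PositionChanged;
        if (handler != null)
        {
            handler(this, new PositionEventArgs(bytesPlayed));
        }
    }
}
```

The handler runs on whichever thread calls ReportPosition, so a UI handler still has to go through a cross-thread-safe routine before it touches a control.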

So, in this example, the player periodically learns its current position and tries to report it to the UI. The UI then has the responsibility to show the user. Winmm raises a position event (which is only visible to the calling thread), so our player thread must inform the UI thread, and the UI thread must set a control value. There are two ways to do this:

We could set a variable and have a timer (in the main UI thread) pick it up and then set a control value (not very timely). Or we could set a control value directly from the event handler (timely but a little perilous … we don't want MS to get cross(threaded) with us). The second is the proper method, as long as the control is set through the cross-thread routine shown earlier.
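For comparison, the first option needs nothing more than a variable that both threads touch under the same lock, which is also the kind of locking the callback's critical section calls for. A sketch, with SharedPosition as a hypothetical class name:

```csharp
using System;
using System.Threading;

public class SharedPosition
{
    private readonly object _gate = new object();
    private long _bytesPlayed;

    // Called by the player thread each time a buffer has been played.
    public void Add(int bytes)
    {
        lock (_gate) { _bytesPlayed += bytes; }
    }

    // Called by the UI timer; the lock guarantees a consistent value.
    public long Read()
    {
        lock (_gate) { return _bytesPlayed; }
    }
}
```

The player would call Add from its callback, and a timer Tick handler on the UI thread would call Read and update the control directly, since a WinForms timer already runs on the UI thread. The cost is the delay the article mentions: the display is only as fresh as the timer interval.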

When winmm raises its event, we must catch and handle it.

Delegates

Delegates in managed code have a black box mystique. I have used plenty of delegates but have never written one. From what I can glean, a delegate is a type-safe reference to a method, much like a function pointer. That is what lets unmanaged code such as winmm, which is doing real-time operations, hand off part of that functionality to managed code.
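A tiny example takes some of the mystique away. In this sketch BufferDoneCallback and DelegateDemo are made-up names, and the callback shape is deliberately simpler than the one winmm expects. It shows a delegate being declared, pointed at a method, called like a function, and turned into the raw function pointer that unmanaged code would receive.

```csharp
using System;
using System.Runtime.InteropServices;

// A delegate type: any method taking an int and returning void fits it.
public delegate void BufferDoneCallback(int bufferIndex);

public static class DelegateDemo
{
    public static int LastBuffer = -1;

    // The method the delegate will refer to.
    public static void OnBufferDone(int bufferIndex)
    {
        LastBuffer = bufferIndex;
    }

    // Held in a static field so the garbage collector cannot reclaim the
    // delegate while unmanaged code still holds a pointer to it.
    public static readonly BufferDoneCallback Callback = OnBufferDone;

    // The address that would be handed to native code as the callback.
    public static IntPtr AsFunctionPointer()
    {
        return Marshal.GetFunctionPointerForDelegate(Callback);
    }
}
```

Calling DelegateDemo.Callback(3) runs OnBufferDone and leaves LastBuffer at 3. When a delegate is passed as a P/Invoke argument, the runtime performs the pointer conversion itself, but keeping the delegate alive for as long as the device is open remains the caller's job.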