Hi. Audio data is usually smaller after a compression->decompression round trip. The loss is unnoticeable in most cases. The buffers should have their own notification positions.

1) on playbuffer notify: receive, decompress and refill the buffer (actually some part of it)
2) on capturebuffer notify: compress the buffer (actually some part of it) and send

Playing should be stopped when there is no data.
Capturing should be stopped in case of silence (DirectSound has a 'microphone settings' dialog for that).
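
For illustration, a rough C++ sketch of setting such notification positions (pCapBuf is an already-created capture buffer of BUFFER_BYTES bytes; the same pattern works for the playbuffer, and all the names here are made up):

HANDLE hEvents[2];
hEvents[0] = CreateEvent(NULL, FALSE, FALSE, NULL);
hEvents[1] = CreateEvent(NULL, FALSE, FALSE, NULL);

DSBPOSITIONNOTIFY notify[2];
notify[0].dwOffset     = BUFFER_BYTES / 2 - 1;  // fires when the first half is full
notify[0].hEventNotify = hEvents[0];
notify[1].dwOffset     = BUFFER_BYTES - 1;      // fires when the second half is full
notify[1].hEventNotify = hEvents[1];

IDirectSoundNotify8* pNotify = NULL;
pCapBuf->QueryInterface(IID_IDirectSoundNotify8, (void**)&pNotify);
pNotify->SetNotificationPositions(2, notify);   // set while the buffer is stopped
pNotify->Release();

// worker thread: on each event, lock/process the half that just completed
DWORD half = WaitForMultipleObjects(2, hEvents, FALSE, INFINITE) - WAIT_OBJECT_0;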
Posted on 2006-05-02 14:13:16 by ti_mo_n
We'll see what happens.. I remain unconvinced of the benefits of setting up playbuffer notifications, especially since there is no correlation between the time a packet is received and the time it should play, and particularly so if there is more than one source of data.. think about it: if two users begin sending you data, and we blindly assume that packets should slot into the playbuffer in the order received AND at the time received...

If I'm not satisfied with the output  of the decompressor, I have a 'fix' in mind.
Right now I am marking the sent UDP packets with their recording offset and their compressed size.. if I also mark them with the uncompressed size, then it becomes possible to 'stretch' the decompressed wave data to its original size by inserting a few values interpolated from the existing ones.. since the decompressed data never exceeds a couple of kilobytes, it should be possible to make this correction without causing any further artifacts.. hopefully, like you say, that won't be necessary; we'll see.
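
As a hypothetical sketch of that 'stretch' (plain linear interpolation in C++; none of these names are from my actual code):

#include <vector>

// stretch inCount decompressed samples back up to outCount (the original
// size) by linear interpolation; assumes outCount >= 2 and inCount >= 1
std::vector<short> Stretch(const short* in, int inCount, int outCount)
{
    std::vector<short> out(outCount);
    for (int i = 0; i < outCount; ++i) {
        double pos = (double)i * (inCount - 1) / (outCount - 1); // map into the input
        int    i0  = (int)pos;
        int    i1  = (i0 + 1 < inCount) ? i0 + 1 : i0;           // clamp at the end
        double f   = pos - i0;
        out[i] = (short)(in[i0] * (1.0 - f) + in[i1] * f);       // blend neighbours
    }
    return out;
}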


Posted on 2006-05-03 02:17:29 by Homer
Use DSBCAPS_GETCURRENTPOSITION2, have your soundbuffers in software, lock them only once ever (and keep the pointer in a global var).
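
In C++ that advice looks roughly like this (untested sketch; pDS and wfx are assumed to be already set up):

DSBUFFERDESC desc = {0};
desc.dwSize        = sizeof(desc);
desc.dwFlags       = DSBCAPS_GETCURRENTPOSITION2  // accurate play cursor
                   | DSBCAPS_LOCSOFTWARE;         // force the buffer into system memory
desc.dwBufferBytes = 44100 * 2 * 2;               // 1 second of 44.1kHz 16-bit stereo
desc.lpwfxFormat   = &wfx;

IDirectSoundBuffer* pBuf = NULL;
pDS->CreateSoundBuffer(&desc, &pBuf, NULL);

// lock the whole buffer once and keep the pointer in a global var;
// this only stays valid because LOCSOFTWARE buffers don't move around
void* g_pWave = NULL; DWORD g_waveBytes = 0;
pBuf->Lock(0, 0, &g_pWave, &g_waveBytes, NULL, NULL, DSBLOCK_ENTIREBUFFER);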

Preferably, use only GSM compression: 160 samples -> 33 bytes. Every Win32 PC has it by default. Set only the samplerate, for quality/size adjustment.



Maybe you should take a look at my EasySound implementation (though it's for output only, for now)
Posted on 2006-05-03 09:03:52 by Ultrano
Ultrano, I'm just screwing around with stuff I haven't yet had the chance to, and trying to put it to some reasonable purpose, for no good reason except that I can.

Why do you suggest I use GSM over other codecs?
Is there a technical reason for your proposed sample size?

Like I said, I'm just screwing around, I really have no idea what I'm trying to achieve here.
I'm interested right now in mixing multiple incoming streams, which is why I lean towards sending the 'raw recording offset' of each sent chunk of compressed audio.. it lends itself towards software mixing, but I have no idea about hardware mixing atm except that 'I can have as many secondary buffers as I like' - which seems weird.

Posted on 2006-05-03 11:16:15 by Homer
A GSM frame is always 160 samples (320 bytes) of PCM, and 33 bytes compressed. Thus, mixing out-of-order decompressed data is straightforward. The GSM codec doesn't take the samplerate into account internally when compressing and decompressing.
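
For reference, opening that codec through the ACM looks roughly like this (untested C++ sketch; note that the GSM 6.10 converter shipped with Windows packs two frames per block, i.e. 320 samples -> 65 bytes, rather than bare 160 -> 33 frames):

#include <windows.h>
#include <mmreg.h>
#include <msacm.h>

HACMSTREAM OpenGsmEncoder(DWORD sampleRate)       // only the samplerate varies
{
    WAVEFORMATEX pcm = {0};
    pcm.wFormatTag      = WAVE_FORMAT_PCM;
    pcm.nChannels       = 1;
    pcm.nSamplesPerSec  = sampleRate;
    pcm.wBitsPerSample  = 16;
    pcm.nBlockAlign     = 2;
    pcm.nAvgBytesPerSec = sampleRate * 2;

    GSM610WAVEFORMAT gsm = {0};
    gsm.wfx.wFormatTag      = WAVE_FORMAT_GSM610;
    gsm.wfx.nChannels       = 1;
    gsm.wfx.nSamplesPerSec  = sampleRate;
    gsm.wfx.nBlockAlign     = 65;                 // one block = 320 samples
    gsm.wfx.nAvgBytesPerSec = sampleRate * 65 / 320;
    gsm.wfx.cbSize          = sizeof(WORD);       // the wSamplesPerBlock field
    gsm.wSamplesPerBlock    = 320;

    HACMSTREAM hStream = NULL;
    acmStreamOpen(&hStream, NULL, &pcm, &gsm.wfx, NULL, 0, 0, 0);
    return hStream;                               // NULL if no converter was found
}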


afaik, in the sndcard market there are only 3 affordable models of audiocards that support at least one DirectSound secondary buffer in hardware (and thus have hardware mixing): Yamaha YMF724, Creative Live! (not sure about that one), and Creative Audigy1/2. Thus secondary buffers are usually done in software, in the form of:

SecBuffer struct ; (simplified version)
IsPlaying      db ?            ; nonzero while the buffer is being rendered
BytesRemaining dd ?            ; dd, not db - these sizes easily exceed 255
BytesTotal     dd ?
format         WAVEFORMATEX <>

; ...and the wavedata follows the header; its length is only known at
; runtime, so a fixed "dup" count can't express it in real MASM:
WaveData       db ?            ; first of BytesTotal bytes of wavedata
SecBuffer ends


I've never gotten into DSound as deep as driver-level, so from my experience and knowledge I can only share my assumptions for now ^^' :

Let's assume our cheapo card has a 100ms buffer at 44100Hz stereo 16-bit (17640 bytes). Every 50ms the soundcard will interrupt the cpu if it's playing (otherwise it just idles, and some cards are damn noisy while doing that). So, on this interrupt, DirectSound will render all of its active SecBuffers (pointers to them are kept in a vector-array). Rendering a SecBuffer can include decompression, pitching, panning/volume, resampling (to match the primary buffer samplerate), and adding effects (DSound8).
Current PC games don't do any mixing themselves for sfx - they decode+dump their sfx once at level-load into SecBuffers/3D, and occasionally play some of them (DSound/3D adds them to the vector-array of currently-playing sounds when the game executes lpDSBsoundYikes->Play()). During/after rendering, buffers that have finished playing completely get kicked out of that vector-array ^^ .
Then, finally, DSound sends this rendered buffer to the soundcard. If it's a bit late, we hear rave-music ^^.


Win2k & XP add an extra 20ms buffer, handled by the WDM drivers, to let apps that use DSound run simultaneously with apps that use WinMM. Win98/ME and earlier Win OSes could not do that with cheapo cards - there, if you had some longer system sound playing (a la emptying the recycle bin), you could not start any DSound app at the same time, and vice versa. But you get worse latency ^^'


Sound capture, on the other hand, is always with exclusive access, iirc.


The 'raw recording offset' will always have to be adjusted (add eax,inputadjust) when received, and then clipped (and eax,OutputBufSize-1). Then mix + clip (by sample volume) into the output buffer, then "inc eax", "and eax,OutputBufSize-1" <- repeated for the number of PCM samples received (and decoded).
This way you can mix-in as many sources as you want (each of them having its own "inputadjust dd ?"), as long as, just after sending some samples to the soundcard, you clear their values ("invoke memcpy,pSndcardBuf,myMixBuf,numBytes" + "invoke memset,myMixBuf,0,numBytes").
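
The same idea as an untested C++ sketch (all names made up):

const unsigned OUTBUF_SIZE = 1 << 18;         // samples; must be a power of two
short g_mixBuf[OUTBUF_SIZE];                  // cleared after each send to the card

// mix one decoded chunk from one source into the shared output buffer
void MixChunk(const short* pcm, unsigned numSamples,
              unsigned rawOffset, unsigned inputAdjust)
{
    unsigned pos = (rawOffset + inputAdjust) & (OUTBUF_SIZE - 1); // adjust + clip
    for (unsigned i = 0; i < numSamples; ++i) {
        int sum = g_mixBuf[pos] + pcm[i];     // mix
        if (sum >  32767) sum =  32767;       // clip by sample volume
        if (sum < -32768) sum = -32768;
        g_mixBuf[pos] = (short)sum;
        pos = (pos + 1) & (OUTBUF_SIZE - 1);  // inc + wrap-around
    }
}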

Posted on 2006-05-03 12:45:19 by Ultrano
Attached is a clean example of how to capture sound with no da*n notifications and obfuscations. It's still in cpp, though; I'll convert it later to pure asm, with the audio polling moved to another thread in the process (as it should be).

Those notification examples... the MSDN team tends to make horror+gore stories instead of reusable examples... hmm, maybe they're being nice, since the word "reusable" could make some people shudder, thinking about licenses ^^. Oh well, anyway, an EasySoundCapture library is coming up next, usable in cooperation with EasySound(output), so stay tuned :).
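
The core of such a polling capture is a loop of roughly this shape (C++ sketch; pCapBuf is a started capture buffer of CAPBUF_BYTES bytes, and ProcessCaptured stands for whatever you do with the data):

DWORD readCursor = 0;
for (;;) {
    Sleep(20);                                      // poll every ~20ms
    DWORD capPos, readPos;
    pCapBuf->GetCurrentPosition(&capPos, &readPos); // readPos = safe-to-read edge
    DWORD avail = (readPos - readCursor + CAPBUF_BYTES) % CAPBUF_BYTES;
    if (avail == 0) continue;

    void *p1, *p2; DWORD n1, n2;
    if (pCapBuf->Lock(readCursor, avail, &p1, &n1, &p2, &n2, 0) == DS_OK) {
        ProcessCaptured(p1, n1);                    // compress + send, etc.
        if (p2) ProcessCaptured(p2, n2);            // wrapped part, if any
        pCapBuf->Unlock(p1, n1, p2, n2);
        readCursor = (readCursor + avail) % CAPBUF_BYTES;
    }
}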
Posted on 2006-05-03 14:16:53 by Ultrano

Yeah, the notifications are only waking a thread that services the capture-send/receive-play code.. your demo version is nice and simple indeed :P
I guess I could drive my code in a loop like that and get rid of the notifications too, but I'd still want to have it running in a separate thread.
I have a thread with a loop that is calling WaitForMultipleObjects on 16 event handles, with the capturebuffer's hardware cursor triggering my event handles.. I know it's overkill.

Well anyway, I've got everything working in loopback mode using TrueSpeech now, but I have one final problem (which I kinda expected and mentioned previously): I'm receiving "sandwich packets" and not dealing with them correctly.
I guess if I were using the GSM codec this problem would be easier to deal with, because the payload size would always be known, but the problem would certainly still exist.
I can "walk" the received subpackets in a "sandwich packet" because I added a 6-byte header containing the payload size (among other things)..

Tomorrow I'll implement code to handle "sandwich packets" and then I'll post the source as it stands, and then I'll add support for GSM as a bonus, so you can choose your own poison :)

Posted on 2006-05-03 15:11:49 by Homer
Here're the EasySound + EasySoundCapture libs + src + examples. Running in separate threads, of course :)

Though, you'll need to add an extra proc in EasySound to write out-of-order data in the output buffer ^^" . Right now I can't remember exactly how to do it ^^' .

The ESCapture needs no modification for your app, I think. I added floating-point support, since you'd best put in an audio compressor (an emulation of an analog circuit) to auto-adjust the microphone gain. Well, and since I need the audio data in FP for my stuff ^^
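
By 'audio compressor' I mean something of this shape: an envelope follower driving the gain (untested C++ sketch; the constants are placeholders, not the ones in ESCapture):

// auto-adjust mic gain: track the signal envelope, scale towards a target level
void AutoGain(float* samples, int count)
{
    static float env = 0.0f;
    const float target  = 0.25f;       // desired peak level
    const float attack  = 0.01f;       // how fast the envelope rises
    const float release = 0.0005f;     // how slowly it falls back
    for (int i = 0; i < count; ++i) {
        float mag = samples[i] < 0 ? -samples[i] : samples[i];
        env += (mag > env ? attack : release) * (mag - env);
        float gain = (env > 0.001f) ? target / env : 1.0f;
        if (gain > 8.0f) gain = 8.0f;  // cap the boost so silence isn't amplified into hiss
        samples[i] *= gain;
    }
}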

I haven't tested ES + ESCap in co-op yet, neither have I fully tested ESCap on its own, sandman's calling ...
Posted on 2006-05-03 17:29:39 by Ultrano
OK, updated the zip one last time.
This version represents what SHOULD be the complete, working 'debug build' of the demo.
I "should" be able to set the host to localhost and hear myself speaking.
However, I just jumped on my other machine that happens to have a microphone headset and tried it, and I hear machinegun junk :(

How it works: sent packets of compressed audio are marked with their payload size (i.e. the size of the compressed data), the original uncompressed size, and the offset at which they were recorded in the one-second capture buffer. Received packets are decompressed and written to the playback buffer at the offset they were originally recorded at. So, all things being equal, we should hear a small delay in the playback when we perform the loopback test: the difference between the current hardware playbuffer cursor position and the offset the data was recorded at/written to.
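
That expected delay can be read straight off the cursors; a small C++ sketch (names made up):

DWORD playPos, writePos;
pPlayBuf->GetCurrentPosition(&playPos, &writePos);
// bytes between where the card is playing now and where this packet was written
DWORD delayBytes = (recordOffset - playPos + PLAYBUF_BYTES) % PLAYBUF_BYTES;
double delaySec  = (double)delayBytes / BYTES_PER_SEC;   // the audible loopback delay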

I know I don't have to send the original uncompressed size in my packets, I do so only because it is interesting to compare it with the decompressed size, once we've received and decompressed the data..

The capturebuffer is currently not "started and stopped", but the playback buffer is: when packets start arriving we start playing, and when we run out of data we stop playing. Really, we should make sure the hardware playbuffer cursor has moved beyond the end of the last data we wrote to the playbuffer before we stop, but anyway, that has nothing to do with the fact that my application sucks.


Posted on 2006-05-04 20:49:00 by Homer
You just forgot to adjust the offset of writing to the output, I think. I'd first try to make it work with compression disabled.
Posted on 2006-05-04 23:03:09 by Ultrano

Yeah, I don't adjust the playback offset.. should I be? I guess I didn't think that hard about what happens on the playback side.
Let's see.. recording begins at some arbitrary offset X.. I don't start playback until packets are received, so I have a chance to set the play cursor, which I don't do... packets received are fed into the playbuffer beginning at offset X.... ah!! What if I just set the play cursor to offset X after I have written my first data and before I start playing? Does this have the same effect as adjusting the playback offset to suit the cursor? On the other hand, why should it matter what offset? Like, if I write to the playbuffer at an offset that is "one half of the buffer away from the play cursor at the time of writing", and the playbuffer is one second long, shouldn't it play a half second of junk, then start playing my audio?

Posted on 2006-05-05 06:02:56 by Homer
To determine the InputOffset:

Run our input and output soundbuffers continuously; never pause them.
Let's assume just a 1:1 conversation between John and Sally, and we'll only look at Sally sending audio to John.
And let's assume our (John's) output buffer is 262144 samples long (2^18).


We're now looking at John's side only.


We receive the first audio packet from Sally, with the following data:
SallysOffset = 1245255 ; total number of samples captured from Sally's sndcard since the proggie started capturing (never stops capturing)
VoiceSize = 700 ; length of current chunk, in number of samples (in PCM form, decompressed)
VoiceCompressedSize = 90 ; in bytes; doesn't matter to us now.
compressedData[]


we decompress the audio chunk, to get a

pcmDecompressedData[]


So, now, and only now, John's receiving proggie computes a variable, called:
ReceivedAudioOffsetAdjuster dd 0 ;

We get John's output buffer's position. Let's say it's 55542. This is the position up to which we can safely put some audio data (in the output buffer).

;-------[ this is computed only once, on receiving the first packet ]-- -------[
mov eax,OutBufPosition ; this has the 55542 value
add eax,22050 ; Let's set 0.5 second of latency (if we're @ 44100Hz samplerate)
sub eax,SallysOffset
mov ReceivedAudioOffsetAdjuster,eax
;------------------------------------------------------------------------------------------/



Right after that, and on each new received audio packet, we do this:

mov eax,SallysOffset
add eax,ReceivedAudioOffsetAdjuster
and eax,(1 SHL 18) - 1
; now EAX has the index of the SndOutBuffer, from which we must start
; writing samples to the soundcard.
; Don't forget to wrap-around on each new written sample

;-------[ write to output buffer ]--------[
mov esi,pcmDecompressedData
mov edi,SndOutBuffer
xor ecx,ecx
.while ecx<VoiceSize
    mov dx,[esi+ecx*2]     ; read the next decompressed 16-bit sample
    mov [edi+eax*2],dx     ; write it at the adjusted output position
    inc eax                ; advance the output sample index...
    and eax,(1 SHL 18) - 1 ; ...with wrap-around
    inc ecx
.endw

; all done !
;----------------------------------------------/

Posted on 2006-05-05 07:10:34 by Ultrano

On the other hand, why should it matter what offset? Like, if I write to the playbuffer at an offset that is "one half of the buffer away from the play cursor at the time of writing", and the playbuffer is one second long, shouldn't it play a half second of junk, then start playing my audio?

If you remove the "ReceivedAudioOffsetAdjuster" and the packet's "SallysOffset", and always write half a second ahead, things will work right only if you receive (+ process) packets at absolutely the perfect time (+/- 10 microseconds), and always in perfect order over UDP. Which is impossible outside of lab conditions.
Posted on 2006-05-05 07:35:37 by Ultrano