I need a library (with source) that can handle a sound stream and modify it (change a female voice to a male one, for example :grin: ). Is there any info on the WWW about this topic?
Posted on 2002-05-18 05:36:15 by Maestro
Micro$oft has an SDK that you can download (I think it's 200 MB or more, I don't remember now) that is only about speech and things like that, so your programs can recognize words from the microphone or read text aloud. Search the M$ site for "speech". I don't know much about this subject, but I know that you need some engine. Doing stuff like that is not easy. Anyway, hope it helps ;)
Posted on 2002-05-18 07:45:49 by coder
It's called M$ Agent.

I spent some time supporting it for ASM last year. I have OOP downloads for it on my site, but I don't think it's your answer.

It's only a text-to-voice engine (and recognition).

You want to get into DSP (digital signal processing) and modify the frequency components of a signal. You're on the right track: by MODULATING the signal (multiplying it by a well-planned sinusoidal signal, which is a convolution in the frequency domain) and then applying a low-pass filter, you can remodulate the voice data. Since the 'stream' is sampled voice information, and hence DISCRETIZED, you will need to learn and understand Z-transforms to do this on a CPU....

(( All good google 'tips' here ))

I don't have any libs to give you on this though, I just know the 'communications theory' behind such signals.

However, if you search the HEAP under MAVERICK's name, you will find a thread with DSP in it. He mentioned a while ago he wrote a DSP tutorial... you might want to start here.

Good Luck.
Posted on 2002-05-18 14:02:53 by NaN
Hmmm maybe this will be my excuse to jump into this game again ;)

I enjoy this stuff, and you get rusty at it if you don't practice it :grin:

No promises on when I will finish, but I think I will see if I can produce something in ASM.... ;)

(( Now where is my discrete book again ))

Posted on 2002-05-18 14:11:36 by NaN

However, if you search the HEAP under MAVERICK's name, you will find a thread with DSP in it. He mentioned a while ago he wrote a DSP tutorial... you might want to start here.

I didn't have time to update it with pitch scaling algorithms (time domain based or frequency domain (FT) based), but it's pretty simple stuff anyway. No time to write a tutorial on it right now, but we can go into detail here.

Posted on 2002-05-18 16:01:27 by Maverick
Maverick, i just fixed up your link (searched the heap), but it appears the page you linked has gone missing?

Posted on 2002-05-18 22:54:53 by NaN
OK, because I'm a moron, I linked to your first reply... (which wasn't the tut :rolleyes: )

So here is Maverick's tut, as he first intended:


I like your source for the IIR building blocks. They are written well! I'm going to convert them into an ASM lib if you don't mind? But I think I will axe the case-type structure and make them separate 'modules' for linking from the lib, such that you don't *have* to include the source for a notch filter if you don't use it ;)

Hope you dont mind?
Posted on 2002-05-19 02:06:59 by NaN
There is a good dsp book in pdf format here :
It's very clear, with lot of examples.
Posted on 2002-05-19 02:21:00 by Dr. Manhattan
I skimmed through the later chapters, and I have to say, it's quite complete!

I'm very impressed with it, especially since it's free..

It's nicely ripped to disk now ;)

Thanks for the source!
Posted on 2002-05-19 03:46:10 by NaN

I like your source for the IIR building blocks. They are written well! I'm going to convert them into an ASM lib if you don't mind? But I think I will axe the case-type structure and make them separate 'modules' for linking from the lib, such that you don't *have* to include the source for a notch filter if you don't use it ;)

Hope you dont mind?
No, and it's good the way you're reorganizing it ;) (one of the things I like about my programming language is that only the code that is really used ends up in the final "EXE" :) ). Doing that in C is a pain in the arse. :)
Posted on 2002-05-19 04:56:37 by Maverick
This is the best message board I have ever seen! I couldn't manage to get a clear answer at others, but here I did!
Thanks to Maverick, NaN, Dr. Manhattan and others.
What I wanted is to make a simple program that can digitally process a human voice and change its vibration from a woman's to a man's (for example). That would be useful for phones, voice chats and for children who want to fool their teachers :grin: :grin: :grin: . What do you think of this idea?
One more thing: as I understand it, I need to use streaming audio. What is it and where can I read about it? I am a complete newbie to sounds/voices so, if there is one, give me a URL for newbies.
Posted on 2002-05-19 12:11:29 by Maestro
I know that some amplitudes can make people frightened of the music, or even give them a strong feeling of fear. What do I need to do to add this possibility?
Posted on 2002-05-19 13:24:49 by Maestro
Keep in mind that because of the way the vocal tract is made, you will never fool anybody's ears with any of these effects. It will still sound artificial.
Posted on 2002-05-19 16:04:14 by Maverick
Well, I successfully remodulated the Microsoft "ding.wav", which has its tone centered at 790 Hz.

I modulated it with a 700 Hz cos signal, which produced a 90 Hz "remodulated" copy and a 1490 Hz byproduct signal. I then low-pass filtered it with a 100 Hz corner to get the remodulated ring! ;)
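
A minimal Python sketch of that multiplication step, assuming a pure 790 Hz cosine as a stand-in for ding.wav (the real file is not reproduced here, and `tone_amplitude` is an illustrative helper, not part of the original M-files). It only measures where the energy lands after modulation:

```python
import math

FS = 22050   # sample rate of the wave file, as stated in the thread
N = 2205     # 0.1 s, so 90, 790 and 1490 Hz all complete whole cycles

def tone_amplitude(samples, freq, fs=FS):
    """Amplitude of the sinusoid at `freq`, measured with a single-bin DFT."""
    re = sum(s * math.cos(2 * math.pi * freq * n / fs) for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * n / fs) for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

# Assumption: a unit-amplitude 790 Hz cosine stands in for the "ding".
ding = [math.cos(2 * math.pi * 790 * n / FS) for n in range(N)]
carrier = [math.cos(2 * math.pi * 700 * n / FS) for n in range(N)]

# Modulation is a sample-by-sample multiplication in the time domain.
product = [d * c for d, c in zip(ding, carrier)]

for f in (90, 790, 1490):
    print(f, "Hz amplitude:", round(tone_amplitude(product, f), 3))
```

Under these assumptions the energy should split evenly between 90 Hz and 1490 Hz, with nothing left at 790 Hz, which is what the low-pass filter then has to sort out.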

Now all I have to do is code it up in MASM ;)

I did it with MATLAB and some hand-made M-files. It's like writing pseudocode that actually does work ;)

If there are MATLAB users here, I can post the M-files for you...

Posted on 2002-05-22 01:24:57 by NaN
There is a real problem with doing the 'voice modulation' that you're looking for.

It's called the "Nyquist Frequency" (da da da dummmmm) ;)

The problem is, when you remodulate information, it moves the 'source frequency area' and also makes a copy of it at another frequency area. This is due to those trig identities no one ever remembered ;)

cos(a)*cos(b) = 1/2[ cos(a+b) + cos(a-b)]

If cos(a) == remodulating carrier, and VoiceData=cos(b)

Then you get two voice Data products centered at a+b, and a-b, in the frequency spectrum.

Then you need to filter out one to leave the other as the desired, remodulated voice information. In this case a low-pass filter will filter out the a+b voice copy, leaving only the a-b voice data.
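
The filtering step can be sketched as follows. This is only an illustration under stated assumptions: the signal is the 790 Hz times 700 Hz cosine product from the ding experiment, and the filter is a generic second-order low-pass biquad (bilinear-transform form with Q = 1/sqrt(2)) with a 100 Hz corner, not the filter from the original M-files or from Maverick's IIR source. `lowpass_biquad` and `tone_amplitude` are hypothetical helper names.

```python
import math

FS = 22050
N = 4410   # 0.2 s; only the second half is analysed, after the filter settles

def tone_amplitude(samples, freq, fs=FS):
    """Amplitude of the sinusoid at `freq`, measured with a single-bin DFT."""
    re = sum(s * math.cos(2 * math.pi * freq * n / fs) for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * n / fs) for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

def lowpass_biquad(samples, cutoff, fs=FS, q=1 / math.sqrt(2)):
    """Second-order low-pass IIR filter with its -3 dB corner at `cutoff` Hz."""
    w0 = 2 * math.pi * cutoff / fs
    alpha = math.sin(w0) / (2 * q)
    cosw = math.cos(w0)
    a0 = 1 + alpha
    b0 = (1 - cosw) / 2 / a0
    b1 = (1 - cosw) / a0
    b2 = b0
    a1 = -2 * cosw / a0
    a2 = (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x   # shift the input history
        y2, y1 = y1, y   # shift the output history
        out.append(y)
    return out

# cos(a)*cos(b): contains the a-b copy (90 Hz) and the a+b copy (1490 Hz).
product = [math.cos(2 * math.pi * 790 * n / FS) * math.cos(2 * math.pi * 700 * n / FS)
           for n in range(N)]
filtered = lowpass_biquad(product, cutoff=100)
settled = filtered[N // 2:]   # skip the start-up transient

print("90 Hz  :", round(tone_amplitude(settled, 90), 4))
print("1490 Hz:", round(tone_amplitude(settled, 1490), 4))
```

With a corner this close to 90 Hz the kept copy is attenuated somewhat as well; a slightly higher corner would preserve more of it while still rejecting the 1490 Hz copy.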

This is OK in analog systems, like the "hello sydney" voice box used in Scream, because we remodulate up the frequency spectrum and cut out only the lower end, and we're finished!

The problem with Nyquist Frequency:
Nyquist frequency is defined as SampleFrequency / 2.

For the wave file I'm playing with, it was sampled 22050 times a second, or Fs = 22050 Hz. Thus the Nyquist frequency is only 11025 Hz.

What does this mean? Any frequency beyond this is aliased as a lower frequency (your ear actually hears a lower frequency). This is a 'reality' cap on the highest achievable frequency, due to the sampling rate. The same effect is seen with car tires on highways, apparently moving backwards slowly. They're obviously moving quite fast, but your eyes can only sample so much at once, and anything beyond your eyes' Nyquist rate is aliased as lower frequencies.
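
A small sketch of that folding, assuming the 22050 Hz sample rate mentioned above and an arbitrary 12000 Hz test tone (the tone and the helper name `alias_frequency` are illustrative choices, not from the thread):

```python
import math

def alias_frequency(freq, fs):
    """Frequency actually represented when a `freq` Hz tone is sampled at `fs` Hz."""
    folded = freq % fs
    return fs - folded if folded > fs / 2 else folded

FS = 22050
too_high = 12000                        # above the 11025 Hz Nyquist frequency
heard = alias_frequency(too_high, FS)   # where the tone ends up

# Sampling both tones gives the same numbers, so they cannot be told apart.
above = [math.cos(2 * math.pi * too_high * n / FS) for n in range(64)]
alias = [math.cos(2 * math.pi * heard * n / FS) for n in range(64)]

print("aliased to", heard, "Hz")
print("largest sample difference:", max(abs(a - b) for a, b in zip(above, alias)))
```

The sample difference should be at floating-point noise level, which is the point: once sampled, the higher tone is indistinguishable from its lower alias.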

So back to our modulation experiment. This cap means I cannot remodulate and have one of the pairs beyond this point! It is quite low, and leaves little room to work with. As well, the bandwidth of the voice data must be at most 1/4*Fs wide, since you need "working space" in the discretized frequency spectrum to move the sound information around (modulate).

I managed to do all this without data corruption or loss on the Microsoft "ding.wav" because it has a bandwidth of about 100 Hz (+/- 50 Hz from 790 Hz). So in a spectrum "space" of 11025 Hz, there is a lot of play here! But if your voice's vocal area is, say, 6000 Hz wide (which it can be), there is no room to move. The only option is corruption of voice data.... (( I've not empirically tested this far yet, to see what it sounds like. But I expect it would be interesting to say the least ;) ))
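
A rough way to check the "room to move" argument with numbers. Everything here is an assumption made for illustration: the helper names, the 600 Hz carrier, and the 100 to 6100 Hz voice band (one possible 6000 Hz wide band). Besides the Nyquist cap, the sketch also checks that the two copies do not overlap each other and that the difference copy does not fold through 0 Hz, since either would make them impossible to separate with a filter.

```python
def modulation_products(band_lo, band_hi, carrier):
    """Bands produced by multiplying a band-limited signal by cos(carrier)."""
    folds = band_lo < carrier < band_hi   # difference copy wraps through 0 Hz
    diffs = sorted((abs(band_lo - carrier), abs(band_hi - carrier)))
    diff_band = (0.0 if folds else diffs[0], diffs[1])
    sum_band = (band_lo + carrier, band_hi + carrier)
    return diff_band, sum_band, folds

def has_room(band_lo, band_hi, carrier, fs):
    """True if both copies fit below Nyquist and can be separated by a filter."""
    diff_band, sum_band, folds = modulation_products(band_lo, band_hi, carrier)
    below_nyquist = sum_band[1] <= fs / 2
    separable = diff_band[1] < sum_band[0]
    return below_nyquist and separable and not folds

FS = 22050
# The ding: about 100 Hz wide around 790 Hz, shifted with a 600 Hz carrier.
print("ding :", modulation_products(740, 840, 600), has_room(740, 840, 600, FS))
# A hypothetical 6000 Hz wide voice band with the same carrier.
print("voice:", modulation_products(100, 6100, 600), has_room(100, 6100, 600, FS))
```

For the narrow ding the two copies sit far apart, while for the wide band they overlap, so no low-pass corner can keep one and drop the other.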

Anywho, here is the remodulated wave file. I made this one to be 190 Hz (shifted the ding back 600 Hz).
Posted on 2002-05-22 02:25:57 by NaN
NaN: what you described is frequency shifting, i.e. one-sided modulation. And it sounds absolutely weird and unnatural on voice. As you know, this effect adds a certain frequency to all those of a signal.
What he asked for was frequency scaling, which multiplies all frequency components of a signal by a certain value.

That sounds much more realistic, but if one wants to be "pro", it still sounds artificial anyway because you scale also the formants, which is not realistic.
Posted on 2002-05-22 03:43:05 by Maverick
Are you sure about this?

What I described *is* modulation. And you're right, I'm using it to shift a frequency block around in the spectrum.

But I'm having a hard time even visualizing what you're proposing. You mean like how you scale a matrix? 1*2, 2*2, 3*2, 4*2, etc.

where 1,2,3,4 are frequency components (or bins of an FFT?)

I can't see how this would retain the 'word' quality of what is being said??

Posted on 2002-05-22 03:52:46 by NaN
Yes, what you described is one-sided modulation, which is more complex to perform than the usual double-sided modulation (or "ring modulation") and involves the use of a precise quadrature oscillator, a complex (sin/cos) modulator, and a Hilbert transformer (90 degrees phase shifter).
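
A minimal sketch of those three pieces working together, under assumptions chosen for illustration: the input is a pure 790 Hz cosine, the shift is 600 Hz downwards (the ding numbers from earlier in the thread), and the Hilbert transformer is approximated by a 201-tap Hamming-windowed FIR filter. The helper names are hypothetical.

```python
import math

FS = 22050
HALF = 100   # the Hilbert FIR has 2*HALF + 1 taps
N = 2205     # analysed length: 0.1 s

def tone_amplitude(samples, freq, fs=FS):
    """Amplitude of the sinusoid at `freq`, measured with a single-bin DFT."""
    re = sum(s * math.cos(2 * math.pi * freq * n / fs) for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * n / fs) for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

def hilbert_taps(half):
    """Hamming-windowed FIR approximation of a 90-degree phase shifter."""
    taps = []
    for k in range(-half, half + 1):
        ideal = 2 / (math.pi * k) if k % 2 else 0.0   # zero for even k, including k = 0
        window = 0.54 + 0.46 * math.cos(math.pi * k / half)
        taps.append(ideal * window)
    return taps

def shift_down(samples, shift_hz, fs=FS, half=HALF):
    """One-sided modulation: every component moves down by `shift_hz`."""
    taps = hilbert_taps(half)
    out = []
    for n in range(half, len(samples) - half):
        # Quadrature (90-degree shifted) copy of the input around sample n.
        quad = sum(taps[k + half] * samples[n - k] for k in range(-half, half + 1))
        phase = 2 * math.pi * shift_hz * n / fs       # quadrature oscillator phase
        out.append(samples[n] * math.cos(phase) + quad * math.sin(phase))
    return out

tone = [math.cos(2 * math.pi * 790 * n / FS) for n in range(N + 2 * HALF)]
shifted = shift_down(tone, 600)

print("190 Hz :", round(tone_amplitude(shifted, 190), 3))    # wanted copy
print("1390 Hz:", round(tone_amplitude(shifted, 1390), 3))   # suppressed copy
```

Unlike plain multiplication by a cosine, almost all of the energy should land on the 190 Hz side, so no low-pass filter is needed to remove a second copy.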

What the poster asked, though, was to transform female voice to male, etc..

You can properly do it only via frequency scaling, not frequency shifting.. because the latter is inharmonic.

Frequency scaling can be done via FFT (as you wrote), or in the time domain, or using a hybrid, better, method (to preserve transients, which the FFT fails to do well).
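
The crudest time-domain form of frequency scaling is plain resampling: read the samples faster or slower. The sketch below assumes linear interpolation and a synthetic two-harmonic test tone, and it is not one of the better methods alluded to above, because it also changes the duration by the same factor. `scale_pitch_by_resampling` is a hypothetical name.

```python
import math

FS = 22050

def tone_amplitude(samples, freq, fs=FS):
    """Amplitude of the sinusoid at `freq`, measured with a single-bin DFT."""
    re = sum(s * math.cos(2 * math.pi * freq * n / fs) for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * n / fs) for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

def scale_pitch_by_resampling(samples, factor):
    """Read the input `factor` times faster, using linear interpolation.

    Every frequency is multiplied by `factor`; the duration is divided by it.
    """
    count = int((len(samples) - 1) / factor)
    out = []
    for n in range(count):
        pos = n * factor
        i = int(pos)
        frac = pos - i
        out.append((1 - frac) * samples[i] + frac * samples[i + 1])
    return out

# Assumption: a 100 Hz fundamental with one overtone at 200 Hz, 0.3 s long.
voice = [math.sin(2 * math.pi * 100 * n / FS) + 0.5 * math.sin(2 * math.pi * 200 * n / FS)
         for n in range(6615)]
higher = scale_pitch_by_resampling(voice, 1.5)
window = higher[:2205]   # 0.1 s of the result

for f in (100, 150, 300):
    print(f, "Hz:", round(tone_amplitude(window, f), 3))
print("length:", len(voice), "->", len(higher))
```

The harmonic relationship survives (150 Hz and 300 Hz are still in a 1:2 ratio), which is why scaling sounds more natural than shifting, but the output is shorter than the input.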

The whole point here is that frequency shifting produces totally inharmonic results.

Imagine you have a fundamental of 100 Hz, and the overtones 200 Hz, 300 Hz and 400 Hz.
If you scale * 2, you get 200 Hz, 400 Hz, 600 Hz and 800 Hz. It's all right.. the only (negligible, unless you want "pro" results) problem that remains is that also the formants of the voice got scaled, and that is not really realistic.

But if you shift the frequency, using the one-sided modulator, of say 10 Hz, you'll get 110 Hz, 210 Hz, 310 Hz, 410 Hz which is inharmonic, and sounds robotic, alien like.
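
The arithmetic in this example can be checked in a few lines. The `is_harmonic` helper is an illustrative definition (every component is a whole multiple of the lowest one), not a standard function:

```python
def is_harmonic(freqs):
    """True if every frequency is a whole multiple of the lowest one."""
    fundamental = min(freqs)
    return all(f % fundamental == 0 for f in freqs)

harmonics = [100, 200, 300, 400]        # fundamental plus overtones, in Hz

scaled = [f * 2 for f in harmonics]     # frequency scaling by 2
shifted = [f + 10 for f in harmonics]   # frequency shifting by 10 Hz

print(scaled, is_harmonic(scaled))      # [200, 400, 600, 800] True
print(shifted, is_harmonic(shifted))    # [110, 210, 310, 410] False
```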

Female<->Male is done either via an operation :grin: or via frequency scaling, not shifting.
Posted on 2002-05-22 04:21:16 by Maverick
So, I need your final thoughts: is it possible or not? Is it possible to transform a male voice to a female one and vice versa?
Posted on 2002-05-22 07:10:05 by Maestro
It has been answered. You should read with attention the posts you get from others to help you.
Posted on 2002-05-22 07:56:27 by Maverick