If you scale * 2, you get 200 Hz, 400 Hz, 600 Hz and 800 Hz. It's all right.. the only (negligible, unless you want "pro" results) problem that remains is that the formants of the voice also get scaled, and that is not really realistic.


You say it's "alright", and I assume that's about the sound quality of the result, i.e. still being able to understand what is being said in the source wave.

But I still have a hard time believing this "skewed" version is more accurate than the unaltered voice pattern simply placed (shifted) at a higher frequency range?
[b]


|
|      /||*\
|________________________________________ (f)
0        ^
       200 Hz center (Whazzaaap) - Male
       50 Hz BW

|                      ^
|                      |
|________________________________________ (f)
0
                 1 kHz Carrier Freq (modulator)


|
|                 /*||\      /||*\
|________________________________________________ (f)
0                   ^          ^
                  Lower       Upper
                Side Band   Side Band
             (800 Hz center) (1200 Hz center)
                50 Hz BW      50 Hz BW

| ................................\
|                 /*||\            \
|___________________________________\________ (f)
0
              Low Pass Filter


|                                    ^
|                                    |
|_____________________________________________ (f)
0
                         1.4 kHz Carrier Freq
                            (demodulator)


|
|      /||*\                         /*||\
|_____________________________________________ (f)
0        ^                             ^
       Lower                         Upper
     Side Band                     Side Band
  (600 Hz center)               (2.2 kHz center)
      50 Hz BW                     50 Hz BW

| .................\
|      /||*\        \
|____________________\____________________ (f)
0
      Low Pass Filter

|
|      /||*\
|________________________________________ (f)
0        ^
       600 Hz center (Whazzaaap) - Female
       50 Hz BW[/b]


As I'm understanding you, you're suggesting that skewing (stretching) the vocal frequencies will make a more believable 'female' voice???
[b]You're proposing for me to skew (times 3):

-----------------------------


|
|      /||*\
|________________________________________ (f)
0        ^
       200 Hz center (Whazzaaap) - Male
       50 Hz BW

|
|                  /  |  |  *  \
|________________________________________ (f)
0                        ^
                  600 Hz center (female ~ esque)
                  150 Hz BW (???????)[/b]


Here the vocal areas are further apart, and not relative anymore, having 3 times the bandwidth. How is it that this will sound better?? Furthermore, how would you even do it without directly piddling with FFT and IFFT? How would you go back in frequency, i.e. compress from 600 to 200 again?

I will admit I'm a novice in these areas of DSP; my studies are applied to RF modulation. But I'm having a hard time seeing a working solution here..

PS: I realize it's not in any way practical to have a male voice bandwidth of 50 Hz; it's just for simplicity's sake ;)
PPS: I'm really enjoying this conversation ;)
:alright:
NaN
Posted on 2002-05-22 13:23:30 by NaN
NaN: trust me, I'm into this stuff very, very deeply. What you described above would work perfectly if we had just one sine wave. In this specific (and simplistic) case, in fact, frequency shifting (i.e. "one-sided modulation") and frequency scaling (i.e. "pitch shifting") are perfectly equivalent.

But the voice is not made of just one sine wave, at all. Not only is it made of many harmonics but, worse still, that's true almost only for vowels. Consonant sounds (e.g. "ssssssssshhhhh") are even noisy. So we're far, far away from the single sine-wave case.

In this case frequency shifting and frequency scaling will have a dramatically different effect. I produced 3 sounds to describe this better, because as the ancients said, a sound is worth more than 1000 words :grin: ;)

in SampleA.MP3 you can hear a simple sine-wave sweep, and then the original, synthetic male voice.

in SampleB.MP3 I applied frequency shifting (i.e. one-sided modulation).
For completeness of information, I also produced SampleB2.MP3, which features the simpler ring modulation (i.e. double-sided modulation).

in SampleC.MP3 I applied frequency scaling (i.e. pitch shifting).
Notice the distinction between frequency and pitch. The former is linear (hence we need to scale it), while the latter is logarithmic, so a shift in one is equivalent (ignoring psychoacoustics) to a scale in the other.
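Concretely, with a 100/200/300/400 Hz set of harmonics like the earlier example: scaling by 2 gives 200/400/600/800 Hz, which is still a harmonic series; shifting everything up by, say, 500 Hz gives 600/700/800/900 Hz, which is not a harmonic series at all anymore.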

As you can hear the difference is dramatic.

If we don't demand "pro" results (such as Government-level work, which I do), frequency scaling is OK++ for male-to-female voice and vice versa (or in between :grin: ).

But if we have to do serious work, then not even frequency scaling is sufficient, as I mentioned in my earlier posts.

Why?



Because the human voice is controlled both by the vocal cords and by the vocal tract shape, which acts as a dynamic filter. A truly realistic method should take into account the physiology of the vocal tract you're trying to change; otherwise you also scale the formants, which is not realistic. As I said above, though, one should be concerned about this only for "pro" work.



For normal, amateur stuff, frequency scaling is all that is needed. FFT, time-domain or hybrid.
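The very crudest illustration of frequency scaling (not how any serious pitch shifter is written) is to simply resample the buffer: every component gets multiplied by the factor, but so does the playback speed, which is exactly the duration problem the FFT / time-domain / hybrid methods are there to fix. A toy Matlab sketch, with names of my own:

% scalefreq.m -- toy only: scales all frequencies by Scale, but also
% shortens/lengthens the clip, because it just re-reads the buffer at a
% different rate (linear interpolation between samples).
function y = scalefreq( x, Scale )
x = x(:).';
n = 1 : Scale : length(x) - 1;        % fractional read positions
i = floor(n);
f = n - i;
y = (1-f) .* x(i) + f .* x(i+1);      % linear interpolation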

http://www.infinito.it/utenti/bizzetti/dsprt/faq.html

never released publicly though.. I was pissed off by some silly users on the program's mailing list who wanted the program to look like Barbie, at the expense of performance and functionality. Thank God I can make a living even without making such silly compromises.. so for now I've kept the program for myself and a few close friends.
Posted on 2002-05-22 17:04:02 by Maverick
*LoL*

Ok, ok, you've proven your point ;) I like the sample itself.. pretty funny. As well, the SampleB you're recommending me against sounds a lot like the Imperial droids communicating in The Empire Strikes Back :grin:

So, how do you propose I "frequency shift" then? I really have no clue, unless it's doing an FFT, manually shifting the output bins, and then performing an IFFT.

And in this case I'm back to square one, looking for info on the net.... :rolleyes:

:alright:
NaN
Posted on 2002-05-23 00:19:24 by NaN
Have a quadrature oscillator (i.e. two outputs 90 degrees out of phase, i.e. the good old sin and cos) tuned to a frequency equal to the amount of Hz you want to shift by.

Have a 90 degree phase shifter (e.g. a Hilbert transformer, usually done with a FIR), and phase shift the original signal by 90 degrees with it. Take, separately, also the original, unaffected signal, delayed by the same amount of delay that the phase shifter unavoidably introduces in its channel (half the FIR size, in samples).

Now you have your original signal shifted by 90 degrees, and delayed by some amount of time depending on the size of your FIR filter. You also have your original signal delayed by the same amount of time. Modulate the former with the sin part of the oscillator, and the latter with the cos part.

Shake well.

Then, finally, subtract these two signals.

You can add salt and pepper now, to your tastes.

You get your one-side modulated signal now. :)

Serve chilled. ;)
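In Matlab terms, a bare-bones sketch of the whole recipe, for one mono buffer, might look like this (the function name, tap count and Hamming window are just arbitrary choices for the example, not anything from an existing toolbox):

% freqshift.m -- rough sketch of the recipe above: Hilbert FIR + quadrature osc.
function y = freqshift( x, SampleRate, ShiftHz, Ntaps )
x = x(:).';                                    % work on a mono row vector
if mod(Ntaps,2) == 0, Ntaps = Ntaps + 1; end   % odd length -> integer group delay

% Windowed ideal Hilbert transformer (the 90 degree phase shifter)
n = (0:Ntaps-1) - (Ntaps-1)/2;
h = zeros(1,Ntaps);
k = ( mod(n,2) ~= 0 );                         % ideal response is zero on even taps
h(k) = 2 ./ ( pi * n(k) );
w = 0.54 - 0.46*cos( 2*pi*(0:Ntaps-1)/(Ntaps-1) );   % Hamming window
h = h .* w;

d  = (Ntaps-1)/2;                              % group delay of that FIR, in samples
xq = filter( h, 1, x );                        % original, 90 degrees shifted (+ delay d)
xi = [ zeros(1,d) x(1:end-d) ];                % original, delayed by the same d samples

% Quadrature oscillator at the amount of shift you want
t = (0:length(x)-1) / SampleRate;
y = xi .* cos(2*pi*ShiftHz*t) - xq .* sin(2*pi*ShiftHz*t);

Something like y = freqshift( x, 22050, 1000, 255 ); would then move everything up by roughly 1 kHz.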
Posted on 2002-05-23 00:40:38 by Maverick
I actually remember a lot of that from TV SSB stuff.

I follow you completely, but I don't see this "stretching". As well, the quad gen frequency would be the 'calculated' frequency needed to 'get' the desired result, right? After all, you are still modulating with two impulses in the frequency spectrum, right? (just 90 deg out of phase with each other).

Let me see if I get you right here, without deeper study:

[ cos(f) * delayed real, modulating ] and,
[ sin(f) * 90deg shift, modulating ]

Each modulation cuts the amplitude by 1/2.

The sine also contributes +/-j through its oscillation (being 90 deg phase shifted), and the signal is already 90 degrees shifted (+j), so you get -1 and +1 through the oscillation of the latter modulation.

When subtracting: 1/2 - (-1)(1/2) == 1 for lower side band, and 1/2 - (+1)(1/2) == 0 for upper side band. (( Wow I actually remember this crap :grin: )) The upper sideband is completely filtered by virtue of phase differences.

Which comes back to my first statement:
the modulating freq of the quad generator (f) must be the center freq of the 'source' bandwidth (fs) plus the desired resulting frequency (fsb), SO: f = fsb + fs. Right? Not f = fsb, as you're suggesting?? Or am I making an error somewhere (extremely possible ;) )

And this is still modulation, so I would expect the bandwidth of the output SSB signal to stay the same as that of the source signal?

I analysed your mp3's and I *did* see the bandwidth get stretched, which convinced me of your arguments.



Thanks for your help so far!
:alright:
NaN
Posted on 2002-05-23 03:18:52 by NaN
Hi again NOTaNUMBER ;)
The PAL/NTSC color carrier stuff uses a quadrature oscillator, true, but does no frequency shifting. It modulates with the quadrature oscillator to perform a kind of discrete Fourier transform. Very cool stuff considering it's analogue electronics.. it even synchronizes the circuit using a burst signal.

The frequency of the oscillator is equivalent to how much you want to shift the frequency of your signal. For example, if you run the quadrature oscillator at 1000 Hz, you will produce a frequency shift of +1000 Hz. By intuition, if your quadrature oscillator is at 0 Hz, you will get back your original signal (ideally) completely unaffected.

You have to multiply the final signal by 2.0, btw; I forgot to say that in my last post.

The result is ~perfect.. if you remember to pre-filter the high frequencies where they would produce aliasing once shifted. Most of the time this is not really necessary (e.g. in voice, where very high frequencies tend to be absent already).

even tho you choose to make the SampleB that i was defending mono, and not sterio like *your* sampleC (*lol*)
Sorry, that was completely unintentional.. and it doesn't affect any result anyway, because what was unintentionally saved as stereo is, in reality, mono (both channels are the same).

So.. ;)
Posted on 2002-05-23 04:49:07 by Maverick
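A quick numeric check of that claim, using the hypothetical freqshift sketch from a couple of posts back and one second of a 200 Hz test tone:

Fs = 22050;
t  = (0:Fs-1) / Fs;                     % one second of samples
x  = sin( 2*pi*200*t );                 % 200 Hz test tone
y  = freqshift( x, Fs, 1000, 255 );     % quadrature oscillator at 1000 Hz
f  = (0:length(y)-1) * Fs / length(y);
Y  = abs( fft(y) );
plot( f(1:2000), Y(1:2000) )            % the single peak now sits near 1200 Hz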
Hi NaN:
Hmm.. argh.. just checked it and the two channels of SampleC aren't identical. But it was unintentional and anyway they sound the same (right channel is filtered).

Please accept my apologies for that.. I was in too much of a hurry.

But as I wrote the sound quality is unaffected by that error.
You can check/verify it by yourself.
Posted on 2002-05-24 07:45:43 by Maverick
Ya, I noticed this too.. no big deal here...

They are quite similar in pattern anyways. ;)

I was only teasing a bit with the above wisecrack... Had I thought it mattered, I would have based an argument around it. I'm still trying to figure out *why* the quadrature modulation is not like normal modulation, in the sense that it ADDS, or shifts, the frequency and somehow widens the bandwidth. To me this doesn't make sense....

My mind keeps saying it must be the phase changes (all those j's), but rationality catches up and says they all get cancelled into +1 or -1 anyways! I don't get it in this respect. However, I do believe it *will* move the signal up in the spectrum.

I was also thinking: you can't move 'down' directly, but you can move beyond the Nyquist, which will in effect move it down as a result of aliasing. Since this method is non-selective, it should cleanly and effectively act like the 'ROR' asm command, but on the frequency spectrum.

On a side note, I dug up an interesting FFT-shifting algorithm that is a fraction of the work we are talking about. Still looking through it.. perhaps today I will get it coded up to check out...
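Roughly this kind of bookkeeping, I gather, for a single mono buffer (my own guess; a proper version would window the blocks and overlap-add them, otherwise it clicks at the block edges):

% fftshiftbins.m -- naive FFT bin shift: rotate the positive bins upward,
% rebuild the negative half by symmetry, IFFT. Assumes 0 <= ShiftHz < SampleRate/2.
function y = fftshiftbins( x, SampleRate, ShiftHz )
x = x(:).';
N = length(x);
X = fft(x);
k = round( ShiftHz * N / SampleRate );      % shift expressed in whole bins
half = floor(N/2);
P = X(1:half+1);                            % keep only DC..Nyquist
P = [ zeros(1,k) P(1:end-k) ];              % rotate the positive bins upward
Y = zeros(1,N);
Y(1:half+1) = P;
Y(N:-1:half+2) = conj( P(2:N-half) );       % mirror into the negative frequencies
y = real( ifft(Y) );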

:NaN:
Posted on 2002-05-24 07:53:47 by NaN
Woow, a great thread.

I am going to go deeper into signal processing after my current exams.

I have a little Matlab knowledge and a little signal processing knowledge.

I would be glad, NaN, if you sent me your m-files, to learn from them.

Thanks a lot NaN for your offer.
Posted on 2002-05-24 12:51:34 by Sa6ry
Sure, here you go, but they are basic.

As well, there is probably an FFT plotting command, but I grew tired of looking for it, so I made my own.

I use matlab 6, and use the 'wavread', 'wavwrite' and 'wavplay' commands (which are built in) along with the following homemade m-files:

x = makeWave( Freq(hz), SampleRate, # of points)

x = LPFilter( Dataset, SampleRate, CornerFreq(hz) )

x = HPFilter( Dataset, SampleRate, CornerFreq(hz) )

PlotFFT( Dataset, SampleRate, LowFreqLimit, HighFreqLimit)

For the last one, I built in limits so you can start looking at just a portion of the FFT (since a lot of the range is typically null). As well, it will display stereo signals if passed an n*2 matrix; if not, n*1 will display only one FFT.

ie) Plot left channel

PlotFFT( Data(:,1), 22050, 0, 0 ) <<-- no limits defaults to full spectrum

ie) Plot right channel

PlotFFT( Data(:,2), 22050, 0, 0)

ie) Plot Both

PlotFFT( Data, 22050, 0,0)

etc.

The rest I did by hand without m-files. Hope you can make use of 'em... ;)

:NaN:
Posted on 2002-05-24 15:40:46 by NaN
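For reference, a rough idea of what a helper with PlotFFT's signature involves (a guessed sketch only, not NaN's actual m-file):

function PlotFFT( Dataset, SampleRate, LowFreqLimit, HighFreqLimit )
if HighFreqLimit <= LowFreqLimit, HighFreqLimit = SampleRate/2; end   % 0,0 -> full spectrum
N = size(Dataset,1);
f = (0:N-1) * SampleRate / N;                 % frequency of each FFT bin
keep = ( f >= LowFreqLimit & f <= HighFreqLimit );
for ch = 1:size(Dataset,2)                    % one plot per channel (mono or stereo)
    X = abs( fft( Dataset(:,ch) ) );
    subplot( size(Dataset,2), 1, ch );
    plot( f(keep), X(keep) );
    xlabel('Hz'); ylabel('|X(f)|');
end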
Thanks a loooot NaN.

This is very kind of you.

I will check them after the exams.


Thanks again NaN and Maverick for your valuable information.
Posted on 2002-05-24 16:48:56 by Sa6ry
Maverick,
I've just read your sfftm.html, and it was quite interesting, but I found one section that is incorrect. It is the one titled:

"How to get linear phase from any IIR filter (only if working on buffers though, not sample-by-sample):"

There you describe applying the same filter to two copies of the same waveform, but with reversed timeline for one of them, and then adding the results together, to achieve phase linear filtering with the frequency response of the original filter.

That is incorrect: the phase will indeed be linear, but the gain will not match that of the intended filter, as the addition will have different effects depending on the amount of phase shift.

If we assume a 2nd order filter then the phase shift of each copy will be 90 degrees at the center frequency, and the relative shift between the two copies will then be 180 degrees. So the addition of the two signals will give a zero result for that frequency, and you will have a notch filter, regardless of what filtering you wanted...

IMO the correct method is the one you hinted at earlier in the text, using 'allpass' constant gain filters whose phase shifts cancel out those caused by the gain-modifying filters. That is the only method I know of that can achieve phase linear IIR filters.
Posted on 2002-05-24 20:34:13 by RAdlanor
NaN and Maverick,
sorry to butt in, but I think you've been talking 'past' each other for a couple of messages now.

If I'm right then:

NaN has really been asking for a method to do a 'stretch', as that is what would give understandable speech (like SampleC). And I think he also wants to discuss the more complex methods needed to achieve a naturalistic voice with raised pitch.

But Maverick has been replying with how to do a one-sided shift,
which would produce lousy speech (like SampleB).

Whatever, such mistakes can sometimes be a good thing, as they lead the discussion to subtopics that might otherwise be forgotten.
Posted on 2002-05-24 21:27:07 by RAdlanor
It's nice to draw a crowd ;)

You're correct, at least about my assumptions... I'm very curious about this 'stretch' method Maverick suggests would be better (and hopefully simpler).

I myself have a good amount of theory thanks to university, but, also thanks to university, I have little practical experience applying it (ohh the paradox of student life ;) )

So the reason I'm so eager now is 'cause I can apply this stuff to things that *interest* me, like DSP'ing wave files.

So, RAdlanor, Sa6ry, and anyone else, please feel free to "butt in" if you have something to say ;)

:alright:
NaN
Posted on 2002-05-24 22:32:27 by NaN
It's a very interesting subject. I have a basic question: in a wav file, what is the unit of a data sample?
Posted on 2002-05-25 00:49:28 by Dr. Manhattan
It depends on the number of bits used to represent the "sound wave", as well as whether you're in stereo or mono.

1 bit would be a square wave: either the speaker is fully out, or fully in. The end.

But 8 bits is funny ~ 'cause standards change after 8 bits. 8 bits or less is considered UNSIGNED. This means 0 is fully in, and 255 is fully out. The zero or neutral speaker position is then 128.

Once you hit 9 or more bits, the MSBit is the sign bit, and everything turns into 2's complement.
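So converting, say, an 8-bit unsigned buffer to 16-bit signed data is just a recenter-and-rescale (a toy Matlab line, my own example):

x16 = ( double(x8) - 128 ) * 256;    % 0..255  ->  -32768..32512, centered on 0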

So I hope you see, the unit of sampled data depends on how "clearly" you want to represent a sound. A sound itself is analog, and every value has infinite precision. But in the land of the computer, everything has FINITE limits, so a 1-bit sine wave would sound pretty bad, but 16-bit would sound pretty decent, providing 32K unit steps in each direction to control a speaker and represent a sound wave better.

So 16 bits can make you sound pretty smart, but if you don't take notes in class, you won't know what you're talking about anyways ;)

What I'm getting at here with my attempt at humor is that resolution is good, but it's not the defining factor. There is a yin-yang relationship between bits per sample and the sample rate.

If you have the ability to record and represent sound information with a high degree of accuracy, but only take 4 samples a second, then you can only *at best* justifiably record and recreate a 2 Hz sound signal (and it wouldn't sound all that great anyways)!

Analog signals (sound waves) have infinite precision, and there is also an infinite number of values to record every second. A computer, again, is limited by the reality of its finite limits.

So the sample rate is how many equidistant "samples" you choose to jot down every second from the analog sound. If you could write an infinite amount, with infinite bits, you would have *PERFECTLY* stored a sound. But memory is costly, so we try to keep the sample rate justifiably low.

CD quality is said to be 44100 Hz (IIRC). This means there are 44 thousand 'measurements' of the sound information every second that get saved to file. The catch is, it's not just this: if I record 44 thousand 1-bit entries (for 1 second's worth), it will sound like crap! There is no resolution between each 'measurement'. Sure, the file size will be about 5 KB to store, but you're not saving anything worth keeping!

If you up the bits/sample to 8 bits (1 byte), you get a decent reproduction of the sound! But you also have 44 KB of data for the same 1 second, and the same number of samples!

So why not just take fewer samples with more bits/sample?

The reason 44 kHz is said to be "CD QUALITY" is that at this sample rate you can justifiably reproduce ALL FREQUENCIES OF SOUND UP TO 22 kHz. This is coined the "NYQUIST FREQUENCY": the frequency beyond which anything higher will begin to sound lower again (the car tire effect I talked about earlier).

The reason for this is simple to understand. Say I have a sine wave, one cycle, of arbitrary amplitude. For now we don't care about the 'frequency', just that you get one drawn up/down cycle.

Then you say: I can at best sample no more than 4 times in this cycle. You'd end up collecting data about this sine wave at 90', 180', 270', and 360'. The data, if plotted, would look kinda like a sine wave, showing the pos, null, neg, null points. You'd have to use your imagination to 'fill in the gaps' but you'd be able to see it. Computers have no imagination, so it would look like sharp steps, and quite blocky, but still kinda like a sine wave; at least it's got up/down, with a null point.

Now say you could only get two samples in one period. Then you get less data. You'd end up with 180' and 360'. Both NULLs. This is silence. But if you're slow and started late, you'd get, say, 190' and 370'. You're beyond one cycle, so this can also be seen as 190' and 10'. This is the NYQUIST POINT. With a minimum of two samples, you can always recreate a frequency. Here you have data points alternating between 10' and 190' at the frequency of the source sine wave, so its frequency is recreated. But it will look nothing like a sine wave, and more like a 1-bit sine wave ~ even though we were only playing with the sample rate to begin with! Thus Nyquist is defined to be at HALF the sample rate (so you can get at least two samples in one cycle!).

Lower frequencies of sound will then have longer periods (cycle times). Thus you can get MORE samples from each cycle, and recreate it better. As you sample lower frequency sounds at 44 kHz, you get better quality reproduction, because there are fewer "use your imagination" areas (for the computer to ignore ;) ), and more specific amplitudes at specific points in time.

The yin-yang comes together now: how much precision you want for each specific amplitude, versus how high (how far up the frequency spectrum) you can save and reproduce the sound.

Higher frequencies needed will cost more samples/sec. As well, better quality will require more bits/sample. Finding a happy "mid-point" between quality and disk space is sometimes hard to justify!

This goes one step further with STEREO. Everything doubles! You sample both left and right, at the sample rate, with the said number of bits! And thus the disk space also doubles!
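(To put numbers on it: CD-quality stereo is 44100 samples/sec * 2 bytes/sample * 2 channels = 176,400 bytes every second, or roughly 10 MB per minute.)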

You asked for the unit of a data sample. In stereo, a "unit" is coined a frame and is meant to encompass both the left and right sample data. This is because they are interleaved when written to disk, FRAMES at a time (L R L R L R .... etc).

I will leave it to you to think about ALIASING (trying to sample a frequency higher than the Nyquist). My above sine wave thought experiment will yield the answers. Just remember to draw more than 1 period of source data (say 4 or so), and that you THEN only sample once per period (x degrees from the last sample, as defined by the source freq and your sample rate ;) )
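Or, if you'd rather see it than imagine it, a quick Matlab check with toy numbers of my own:

Fs = 8000;                        % pretend sample rate (Nyquist = 4 kHz)
t  = (0:Fs-1) / Fs;               % one second of samples
x  = sin( 2*pi*5000*t );          % a 5 kHz tone, i.e. above the Nyquist
X  = abs( fft(x) );
f  = 0:Fs-1;                      % 1 Hz per bin with these numbers
plot( f(1:4000), X(1:4000) )      % the peak shows up at 3 kHz, not 5 kHz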

I guess your simple question really opened up a can of worms here... hope it was informative...
:alright:
NaN
Posted on 2002-05-25 04:12:45 by NaN
RAdlanor: Yes, I fully agree with what you write. Actually I never tested that hint, because I always work, be it realtime or not, on a sample-by-sample basis anyway.. but the hint came from a very experienced engineer (or so he was considered in comp.dsp, or considered himself). At the time I didn't verify it, but I assumed it was right because of this. That was many years ago, and I had more faith in others than in myself in this field; I was a beginner.

Come to think of it now, you're perfectly right. In fact an impulse will be the same once you reverse it. Quite basic. ;)
Maybe he was wrong, or maybe I didn't get how he meant to do it (I was quite a beginner when I collected that hint, more than 3 years ago, and since it worked only on whole buffers I then ignored it completely, but kept it because it could have been useful for others.. and I preferred to write SFFTM *while* learning myself, to get into the psychology of the learner perfectly). When I have some time I should check DejaNews, to see if I can find the original post again. In fact, thinking about it today, the method would work if you reverse the results of the 2nd filtering operation.. but only if you just want to build an impulse response (e.g. to obtain linear phase FIR coefficients, given a set of IIR filters).
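For the record, the standard whole-buffer way to get zero phase out of an IIR (whether or not that's what the original hint meant) is to run the filter forwards and then backwards over the buffer -- essentially what Matlab's filtfilt does, minus its edge handling -- at the cost of applying the magnitude response twice:

function y = zerophase( b, a, x )
y = filter( b, a, x(:) );     % forward pass
y = flipud( y );              % reverse time
y = filter( b, a, y );        % backward pass cancels the phase of the first
y = flipud( y );              % reverse back; magnitude response is now |H(f)|^2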

Thanks again for the report.

PS: in reply to your other post, I've been concentrating on the "no-no" of frequency shifting (given this application) just because I know NaN knows how to do (FFT-based, at least) frequency scaling, and he was convinced that the right way would be frequency shifting (which an FFT can do too, but the best way is the one-sided modulation anyway, so I talked about that). For frequency scaling there are likewise both FFT-based and time-domain methods, but if one has to learn only one, then the FFT one IMHO is better, at least in quality of results.. that is what my post implied.

Anyway, I have no time to get deep into this now, but you can take a look at this page, for example: http://www.dspdimension.com/
Posted on 2002-05-25 04:29:30 by Maverick
*cough* *cough*
inverse, time domain Fourier transform
( FFT is just an application)

Please ignore the crazy man in the
corner, he means well.
Posted on 2002-05-25 05:00:50 by bdjames
Thanks a lot NaN, it was very informative indeed. But by unit I meant a physical unit, like the decibel or the volt for example. What is the meaning of a sample with a value of 35? Is it 35 dB?
Posted on 2002-05-25 06:31:25 by Dr. Manhattan
They are 'speaker position' units. They don't really have a physical unit value; they can only be ratioed against the highest possible value.
(assuming 16 bit sample data)

SampleData : MaxSampleData --> 35 : (2^15 - 1)

Things like dB's come in when you piddle with the volume APIs, or with the ANALOG volume control on your speakers.

You can get a volume increase, however, by ADDING/SUBTRACTING xx dB and modifying the entire data set.

+5 dB == a gain of 10^(5/20) == 1.778

So multiply every data value by 1.778 and round to the nearest integer, and you've increased the volume by 5 dB. Keep in mind though, you may saturate at +/- (2^15 - 1), in which case you should clamp rather than let the value 'roll over' from the +5 dB gain.
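In Matlab terms, for 16-bit data that's just (x being the sample vector):

gain = 10^(5/20);                       % +5 dB  ==  a gain factor of ~1.778
y = round( gain * double(x) );
y = max( min(y, 32767), -32768 );       % clamp to the 16-bit range so nothing rolls over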

:NaN:
Posted on 2002-05-26 10:15:58 by NaN