hi all,

I've written a random generator function which uses 4 seeds and is partially based on the idea of Park Miller.

I would like to have your expert opinions / suggestions on how to make it even more random.

Speed isn't really an issue, randomness is.

Here's the code



pseed PROC s1:DWORD,s2:DWORD,s3:DWORD,s4:DWORD

.data
seed1 dd 0AAAABBBBh
seed2 dd 0CCCCDDDDh
seed3 dd 0EEEEFFFFh
seed4 dd 11112222h

.code
    mov eax,s1          ;if s1 = 0 then use default value
    .if eax!=0
        mov seed1,eax
    .endif
    mov eax,s2          ;if s2 = 0 then use default value
    .if eax!=0
        mov seed2,eax
    .endif
    mov eax,s3          ;if s3 = 0 then use default value
    .if eax!=0
        mov seed3,eax
    .endif
    mov eax,s4          ;if s4 = 0 then use default value
    .if eax!=0
        mov seed4,eax
    .endif
    ret
pseed ENDP

prand PROC base:DWORD
    ;seed1 = AAAABBBB
    ;seed2 = CCCCDDDD
    ;seed3 = EEEEFFFF
    ;seed4 = 11112222
    mov eax,seed1       ;AAAABBBB
    mov ebx,seed2       ;CCCCDDDD
    mov ecx,seed3       ;EEEEFFFF
    mov edx,seed4       ;11112222

    ;start shifting
    xchg ax,bx          ;eax = AAAADDDD, ebx = CCCCBBBB
    xchg cx,dx          ;ecx = EEEE2222, edx = 1111FFFF
    xchg al,cl          ;eax = AAAADD22, ecx = EEEE22DD
    xchg bl,dl          ;ebx = CCCCBBFF, edx = 1111FFBB

    push eax            ;AAAADD22
    push ecx            ;EEEE22DD
    shl eax,8           ;AADD2200
    shr ecx,24          ;000000EE
    add eax,ecx         ;AADD22EE
    mov seed1,eax       ;s1 = AADD22EE
    pop ecx             ;EEEE22DD
    pop eax             ;AAAADD22
    push ecx            ;EEEE22DD
    shr eax,24          ;000000AA
    push edx            ;1111FFBB
    shl edx,8           ;11FFBB00
    add edx,eax         ;11FFBBAA
    mov seed2,edx       ;s2 = 11FFBBAA
    pop edx             ;1111FFBB
    shr edx,24          ;00000011
    push ebx            ;CCCCBBFF
    shl ebx,8           ;CCBBFF00
    add ebx,edx         ;CCBBFF11
    mov seed3,ebx       ;s3 = CCBBFF11
    pop ebx             ;CCCCBBFF
    shr ebx,24          ;000000CC
    pop ecx             ;EEEE22DD
    shl ecx,8           ;EE22DD00
    add ecx,ebx         ;EE22DDCC
    mov seed4,ecx       ;s4 = EE22DDCC

    ;start calculating
    mov eax,seed1
    mov ecx,16587
    xor edx,edx
    div ecx             ;AADD22EE / 16587: quotient in eax, remainder in edx
    mov ebx,seed2       ;11FFBBAA
    xchg ebx,eax
    sub eax,ebx         ;11FFBBAA - quotient
    mov ecx,edx
    xor edx,edx
    mul ecx
    mov seed2,eax       ;seed2 = (s2 - s1/16587) * (s1 mod 16587)

    mov ecx,29753
    xor edx,edx
    div ecx             ;s2 / 29753
    mov ebx,seed3       ;CCBBFF11
    xchg ebx,eax
    sub eax,ebx         ;CCBBFF11 - quotient
    mov ecx,edx
    xor edx,edx
    mul ecx
    mov seed3,eax       ;seed3 = (s3 - s2/29753) * (s2 mod 29753)

    mov ecx,39744
    xor edx,edx
    div ecx             ;s3 / 39744
    mov ebx,seed4       ;EE22DDCC
    xchg ebx,eax
    sub eax,ebx         ;EE22DDCC - quotient
    mov ecx,edx
    xor edx,edx
    mul ecx
    mov seed4,eax       ;seed4 = (s4 - s3/39744) * (s3 mod 39744)

    mov ecx,59721
    xor edx,edx
    div ecx             ;s4 / 59721
    mov ebx,seed1       ;AADD22EE
    xchg ebx,eax
    sub eax,ebx         ;AADD22EE - quotient
    mov ecx,edx
    xor edx,edx
    mul ecx
    mov seed1,eax       ;seed1 = (s1 - s4/59721) * (s4 mod 59721)

    ;use the last (low) byte of each new seed
    shl eax,24          ;low byte of new seed1 into byte 3
    mov ebx,seed2
    shl ebx,24
    shr ebx,8           ;low byte of new seed2 into byte 2
    add eax,ebx
    mov ebx,seed3
    shl ebx,24
    shr ebx,16          ;low byte of new seed3 into byte 1
    add eax,ebx
    mov ebx,seed4
    add al,bl           ;low byte of new seed4 into byte 0
    mov ebx,seed1
    xor eax,ebx
    xor edx,edx
    div base            ;base must not be 0
    mov eax,edx         ;return value in [0, base)
    ret
prand ENDP

Thanks in advance!

Mark Vogels
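For anyone who wants to poke at the algorithm outside MASM, here is a rough Python model of prand (my sketch, not Mark's code: 32-bit register wrap-around is emulated with a mask, and the seed3 update includes the add of the carried byte that the ;CCBBFF11 comment implies):

```python
M = 0xFFFFFFFF  # mask emulating 32-bit registers

def shuffle(s1, s2, s3, s4):
    """Model of the xchg/shl/shr byte shuffle at the top of prand."""
    eax, ebx, ecx, edx = s1, s2, s3, s4
    # xchg ax,bx / xchg cx,dx: swap the low 16 bits
    eax, ebx = (eax & 0xFFFF0000) | (ebx & 0xFFFF), (ebx & 0xFFFF0000) | (eax & 0xFFFF)
    ecx, edx = (ecx & 0xFFFF0000) | (edx & 0xFFFF), (edx & 0xFFFF0000) | (ecx & 0xFFFF)
    # xchg al,cl / xchg bl,dl: swap the low 8 bits
    eax, ecx = (eax & 0xFFFFFF00) | (ecx & 0xFF), (ecx & 0xFFFFFF00) | (eax & 0xFF)
    ebx, edx = (ebx & 0xFFFFFF00) | (edx & 0xFF), (edx & 0xFFFFFF00) | (ebx & 0xFF)
    # shift each value left a byte and pull in the top byte of a partner
    return (((eax << 8) & M) + (ecx >> 24),
            ((edx << 8) & M) + (eax >> 24),
            ((ebx << 8) & M) + (edx >> 24),
            ((ecx << 8) & M) + (ebx >> 24))

def prand(seeds, base):
    """One call of prand: returns (value in [0, base), new seed list)."""
    s = list(shuffle(*seeds))
    for i, d in enumerate((16587, 29753, 39744, 59721)):
        q, r = divmod(s[i], d)
        s[(i + 1) % 4] = ((s[(i + 1) % 4] - q) * r) & M  # (next - quotient) * remainder
    # assemble the low byte of each new seed, then xor with the new seed1
    out = ((s[0] & 0xFF) << 24) | ((s[1] & 0xFF) << 16) | ((s[2] & 0xFF) << 8) | (s[3] & 0xFF)
    out ^= s[0]
    return out % base, s

value, state = prand([0xAAAABBBB, 0xCCCCDDDD, 0xEEEEFFFF, 0x11112222], 256)
```

With the default seeds, `shuffle` reproduces the byte patterns traced in the comments: AADD22EE, 11FFBBAA, CCBBFF11, EE22DDCC.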

Test your random generator with ENT

http://www.fourmilab.ch/random/

Make a small program that writes RANDOM dwords (generated by your procedure) to a file. Then use ENT to test the randomness of the file.
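As a sketch of that test harness (Python rather than asm, and with Python's own generator standing in for prand, purely as a placeholder):

```python
import random
import struct

def write_random_dwords(path, n_dwords, rng=lambda: random.getrandbits(32)):
    # rng is a stand-in: plug in your own generator returning 32-bit values
    with open(path, "wb") as f:
        for _ in range(n_dwords):
            f.write(struct.pack("<I", rng() & 0xFFFFFFFF))

write_random_dwords("sample.bin", 1024)  # 4 KB file; then run: ent sample.bin
```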

At first glance your algorithm looks like it could work, but it's a little long (if speed is one of your concerns); if not, then it's fine.
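As a point of reference, the actual Park-Miller "minimal standard" generator the original post mentions is a single multiply-and-reduce, seed = seed * 16807 mod (2^31 - 1), and its published self-check makes an implementation easy to verify:

```python
def park_miller(seed):
    # Park & Miller's "minimal standard" generator (CACM, 1988)
    return (seed * 16807) % 2147483647   # modulus is the prime 2^31 - 1

# the authors' self-check: starting from 1, the 10000th value is 1043618065
seed = 1
for _ in range(10000):
    seed = park_miller(seed)
```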


Speed isn't a concern for me at the moment, since it will only be used to generate a 1-4 KB file. This is fast enough on a normal computer.

Can you explain to me which values I should preferably get using ENT?

Changing the static numbers changes the result a lot, so which ones should I use?

here's one result

Entropy = 7.909002 bits per byte.
Optimum compression would reduce the size of this 4096 byte file by 1 percent.
Chi square distribution for 4096 samples is 514.38, and randomly would exceed this value 0.01 percent of the times.
Arithmetic mean value of data bytes is 127.5481 (127.5 = random).
Monte Carlo value for Pi is 3.061583578 (error 2.55 percent).
Serial correlation coefficient is -0.002773 (totally uncorrelated = 0.0).

here's another one:

Entropy = 7.894121 bits per byte.
Optimum compression would reduce the size of this 4096 byte file by 1 percent.
Chi square distribution for 4096 samples is 605.00, and randomly would exceed this value 0.01 percent of the times.
Arithmetic mean value of data bytes is 128.1001 (127.5 = random).
Monte Carlo value for Pi is 3.143695015 (error 0.07 percent).
Serial correlation coefficient is -0.018626 (totally uncorrelated = 0.0).

The first one is close to the random mean (127.5481); the second one is close to Pi.

Which is better, and why?

Thanks



Good question :) - I don't know whether the arithmetic mean should preferably be equal to 127.5, or just be larger than that value.

From the docs, the more random your data is, the closer the Monte Carlo value is to Pi.
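ENT's Monte Carlo test takes successive 6-byte groups from the file as 24-bit (x, y) points in a square and counts how many land inside the inscribed quarter circle; the hit ratio estimates Pi/4. A sketch of the idea, using Python's generator as the byte source rather than prand:

```python
import random

def monte_carlo_pi(data):
    # successive 6-byte groups become 24-bit (x, y) coordinates in the unit square
    r = 2.0 ** 24 - 1
    inside = total = 0
    for i in range(0, len(data) - 5, 6):
        x = int.from_bytes(data[i:i + 3], "big") / r
        y = int.from_bytes(data[i + 3:i + 6], "big") / r
        total += 1
        if x * x + y * y <= 1.0:   # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / total    # quarter-circle area ratio is Pi/4

random.seed(1)
estimate = monte_carlo_pi(bytes(random.getrandbits(8) for _ in range(600000)))
```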

I've read that too, but I also saw this:

This quantity measures the extent to which each byte in the file depends upon the previous byte. For random sequences, this value (which can be positive or negative) will, of course, be close to zero. A non-random byte stream such as a C program will yield a serial correlation coefficient on the order of 0.5. Wildly predictable data such as uncompressed bitmaps will exhibit serial correlation coefficients approaching 1.

I've changed some values, which as a result gave -0.000714 here, which would mean it is pretty close to random as well.

At the same time, the other figures, like Pi and the mean, moved further away from their optimal values.

So which one would be better?
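For what it's worth, the statistic ENT is describing there is straightforward to compute yourself; this is a sketch of the usual lag-1 (circular) serial correlation coefficient over a byte sequence:

```python
def serial_correlation(data):
    # lag-1 circular correlation: the last value is paired with the first
    n = len(data)
    s1 = sum(data)                                   # sum of x_i
    s2 = s1 * s1                                     # (sum of x_i)^2
    s3 = sum(x * x for x in data)                    # sum of x_i^2
    t1 = sum(data[i] * data[(i + 1) % n] for i in range(n))
    return (n * t1 - s2) / (n * s3 - s2)             # undefined for constant data

print(serial_correlation([1, 2, 3, 4]))  # -0.2
```

A perfectly alternating sequence like 0, 255, 0, 255, ... scores -1.0, which is why "wildly predictable" data sits at the extremes and random data near zero.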


You should run it with a larger sample - 4096 bytes isn't really sufficient. A file size on the order of a meg or bigger will give more accurate results.

Ossa


I understand, but then I still don't know what kind of output is wanted.

Entropy = 7.620056 bits per byte.
Optimum compression would reduce the size of this 1048576 byte file by 4 percent.
Chi square distribution for 1048576 samples is 727702.68, and randomly would exceed this value 0.01 percent of the times.
Arithmetic mean value of data bytes is 126.2168 (127.5 = random).
Monte Carlo value for Pi is 3.180989002 (error 1.25 percent).
Serial correlation coefficient is -0.000042 (totally uncorrelated = 0.0).

Changed some values; how about this?

The serial correlation coefficient looks nice...



You mean that you don't know what values from ENT are the ideal values for a perfect RNG?

Well here's what a perfect output would be:

Entropy = 8.000000 bits per character.
Optimum compression would reduce the size of this xxx character file by 0 percent.
Chi square distribution for xxx samples is yyy.yy, and randomly would exceed this value zz.z percent of the times. <not so sure about this one>
Arithmetic mean value of data bytes is 127.5 (127.5 = random).
Monte Carlo value for Pi is 3.14159265 (error 0.00 percent).
Serial correlation coefficient is 0.000000 (totally uncorrelated = 0.0).

As a note, the serial correlation coefficient will actually be between (mean - 2*stddev) and (mean + 2*stddev) 95% of the time for excellent samples where:

mean = -1 / (n - 1)

stddev^2 = n^2 / ((n - 1)^2 * (n - 2))

(from Knuth's Art of Computer Programming: Section 3.3.2 Empirical Tests: Algorithm K - Serial Correlation Coefficient)
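Plugging a concrete sample size into those formulas shows how tight the band is; a quick sketch for a 1,048,576-byte file like the one above (my arithmetic, using the formulas as quoted):

```python
import math

def scc_band(n):
    # Knuth, TAOCP vol. 2, 3.3.2: 95% band is mean +/- 2 * stddev
    mean = -1.0 / (n - 1)
    stddev = math.sqrt(n * n / ((n - 1) ** 2 * (n - 2)))
    return mean - 2 * stddev, mean + 2 * stddev

lo, hi = scc_band(1048576)
# roughly (-0.00195, +0.00195) for a 1 MB sample, so a coefficient
# like 0.000029 sits comfortably inside the band
```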

But you are never going to get exactly that. I guess you want to know how good is "good enough" and I can't really say without a context... for encryption, study the links I gave over in the other forum about cryptographic random number generators - some of the "minimum acceptable" values for various standards are covered there.

Ossa

I'm going to check on the expected value for the Chi-Squared test, but the value produced by your algorithm is alarming me.

For the Chi-Squared test, read the ENT page:


If the percentage is greater than 99% or less than 1%, the sequence is almost certainly not random. If the percentage is between 99% and 95% or between 1% and 5%, the sequence is suspect. Percentages between 90% and 95% and 5% and 10% indicate the sequence is "almost suspect".
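The chi-square statistic itself is easy to compute from the byte histogram; the hard part is turning it into the tail percentage, which ENT does for you. A sketch, assuming 256 equally likely byte values (so 255 degrees of freedom, with an expected statistic around 255 for truly random data):

```python
def chi_square(data):
    # 256 bins; for uniform random bytes each bin expects len(data)/256 hits
    expected = len(data) / 256.0
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

flat = bytes(range(256)) * 16      # 4096 bytes, every value exactly 16 times
print(chi_square(flat))            # 0.0  (perfectly flat histogram)
```

Statistics like 514 or 605 for 4096 samples are far above the ~255 expected, which is why ENT reports the suspicious 0.01 percent.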

Your value is most certainly less than 1%. From what I read there and from what I'm reading in Knuth's tome, your algorithm fails the test for randomness. In short: for cryptographic use, this RNG is useless.

Ossa

One cause for the bad chi distribution could be how you're creating the file.

Make sure you get rid of the last divide, or make sure you're passing 0xFFFFFFFF to your function as the max value parameter.

Another cause for bad chi could be your choice of seed values. Try to randomize them a bit (if you are using the default values).

My 2 cents: use more ADD and XOR instructions and fewer MUL and DIV. And maybe replace the SHifts with ROtates.
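On the ROL suggestion: a shift throws bits away off one end, while a rotate feeds them back in at the other, so no state bits are lost. In MASM it's just `rol eax,8`; modeled in Python to show the difference:

```python
def rol32(x, k):
    # 32-bit rotate left: bits leaving the top re-enter at the bottom
    k &= 31
    return ((x << k) | (x >> (32 - k))) & 0xFFFFFFFF

def shl32(x, k):
    # 32-bit shift left: the top k bits are simply lost
    return (x << k) & 0xFFFFFFFF

print(hex(rol32(0xAABBCCDD, 8)))  # 0xbbccddaa - the AA byte survives
print(hex(shl32(0xAABBCCDD, 8)))  # 0xbbccdd00 - the AA byte is gone
```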


Thanks guys, this is useful info.

I will post some new results later this day.


Ok, here it goes:

Entropy = 7.999851 bits per byte.
Optimum compression would reduce the size of this 1048576 byte file by 0 percent.
Chi square distribution for 1048576 samples is 216.70, and randomly would exceed this value 95.00 percent of the times.
Arithmetic mean value of data bytes is 127.5783 (127.5 = random).
Monte Carlo value for Pi is 3.143795562 (error 0.07 percent).
Serial correlation coefficient is 0.000029 (totally uncorrelated = 0.0).

Using attached code.

This should be sufficient for encryption, or am I wrong?

This is strange:

the bigger the generated file gets, the better the entropy, Pi and correlation figures get:

Entropy = 7.999999 bits per byte.
Optimum compression would reduce the size of this 167772160 byte file by 0 percent.
Chi square distribution for 167772160 samples is 275.12, and randomly would exceed this value 25.00 percent of the times.
Arithmetic mean value of data bytes is 127.4810 (127.5 = random).
Monte Carlo value for Pi is 3.142472151 (error 0.03 percent).
Serial correlation coefficient is 0.000026 (totally uncorrelated = 0.0).

for a 160meg file...

Using the -b option on the 160 meg file:

Entropy = 1.000000 bits per bit.
Optimum compression would reduce the size of this 1342177280 bit file by 0 percent.
Chi square distribution for 1342177280 samples is 11.56, and randomly would exceed this value 0.01 percent of the times.
Arithmetic mean value of data bits is 0.5000 (0.5 = random).
Monte Carlo value for Pi is 3.142472151 (error 0.03 percent).
Serial correlation coefficient is -0.000021 (totally uncorrelated = 0.0).



the bigger the generated file gets, the better entropy, pi & correlation get

That is not strange, it is a documented fact.

Paul

Well, you would expect that in a 30-byte file you have less chance of finding the same byte strings than in a 300,000-byte file.

That's why I find this strange.


But the more random the file is, if you're looking for a decent-sized bit string, there'll be a lot more unequal strings than in a smaller file. Basically, even though there is more of a chance of that particular sequence being created in the large file, the large gap between occurrences is enough to indicate more random properties than a small file with only a few of the possible sequences in it. If a small file has 2 occurrences of the same sequence in, say, 512k, as opposed to a 5MB file with 2 occurrences, the randomness of the smaller file is actually lower than that of the large file.

I'm having trouble explaining this properly, because I've got no formal education (self or otherwise) on this subject; I just understand the principle in use here.
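One way to make that intuition concrete: in an ideal random file, the expected number of pairs of matching k-byte windows grows roughly with the square of the file size, so a couple of repeats in a big file is exactly what randomness predicts. A back-of-the-envelope sketch (overlapping windows are treated as independent, which is only an approximation):

```python
def expected_matching_pairs(n, k):
    # windows of k bytes in an n-byte file; each pair matches with prob 256^-k
    windows = n - k + 1
    return windows * (windows - 1) / 2 / 256 ** k

print(expected_matching_pairs(512, 4))        # ~3e-05: a repeat would be surprising
print(expected_matching_pairs(5 * 2**20, 4))  # ~3200: repeats are expected
```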


I understand what you mean ;)

But the program also checks if it's possible to compress the file.

If you had more string sequences that were equal, then it would be easier to compress the file.

But I don't think it would really matter in this case.

The biggest problem I have is finding 4 pseudo-random seeds to use with the function.

The ones i used now are:

1. GetTickCount

2. GlobalMemoryStatus

3. GetCurrentThreadId & GetCurrentProcessId

4. GetDiskFreeSpace

I'm not happy using these seeds, but I don't know of any other values that could be used as input to my function.

Any ideas?


You could combine some of them, or use a hash function on some of those values and use the resulting hash (which should be nice and random).
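A sketch of that approach: hash all the noisy system values together and slice the digest into four dwords, so every seed depends on every source (SHA-256 is just one convenient choice here, and the `tick`/`pid` values below are stand-ins for the GetTickCount-style API results):

```python
import hashlib
import os
import struct
import time

def make_seeds(*sources):
    # mix every entropy source into one digest; 256 bits yields 4 dwords with room to spare
    h = hashlib.sha256()
    for value in sources:
        h.update(struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF))
    return struct.unpack("<4I", h.digest()[:16])

# stand-ins for GetTickCount, GlobalMemoryStatus, thread/process IDs, GetDiskFreeSpace
tick, pid = time.monotonic_ns(), os.getpid()
seeds = make_seeds(tick, pid, id(object()), 123456789)
```

Because all four sources go into one hash, the input space stays as large as the combined entropy of the sources, rather than being limited to a single dword.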

I've been thinking of using a hash function, but I think it would be useless security-wise.

Example:

I have one random function with 4 dword seeds as input.

If I have a hash function that makes a 4-dword hash out of 1 dword, then I actually only have 1 dword's worth of possible different seeds, defeating the strength of the 4 seeds.
