Hi guys, I'v written an implementation of the SHA1 algorithim. However I've run into a difficulty and I can't check if I'm right or wrong in the implementation.
I need the Hash code produced to be compatible with another program.
When I hash the files in that program it gives
1) CZN3GPMKNMABB3W7QFAGS575PWPHX4OZ
2) KHUAUMQQTLDLBAHAG5BYZYLW7BZGFISM
3) ICMEAS7FYBVGE62QTQXX5L22BFEHYY6O
However my program gives in hex
1) DBDB582D9589952274477A856493C97354DBAA68
2) AF0E23F658855207693C86A4C23FD5E7564747A8
3) C0A59BF1EA023C059664168EC022B6E7245E9CD5
Clealy they are using a different method of converting the 5 dword (20 byte) hash code into ascii. I simply used hex2dw.
The question I want to ask is can anyone explain what method they seem to be using? Their hash string is 32 characters in lenght always (I've looked at more than three files). It seems to be made up of letters A-Z and however I've only ever noticed digits 2-7.
That gives a total 32 distinct characters, which would be 5 bits. This also fits in with the size 5 dwords : 160 bits divided by 5 gives 32; the number of characters.
So I tried the following code to create that type of string
And I get the following strings.
1) 36WRVWUSJMFFC25I2LBJWJGZTDVWNVCN
2) PVDGC3DLFUUOQURHGEJF47U2HXVOUDBV
3) AOJXZYL5CAPKALSMWQDBMRYWHHJ4FOW2
Now I can't see any similarity between them any the programs output, nor can I spot any pattern off hand.
So I have to ask for help, could any suggest another possible method they may be using to convert the 160 bit hash code to ascii. Or could someone, perhaps maybe implement the C code provided in the technical file in the zip as this generates hex outputs which I could compare with my own code.
Thanks guys, this really has me stumped.
I need the Hash code produced to be compatible with another program.
When I hash the files in that program it gives
1) CZN3GPMKNMABB3W7QFAGS575PWPHX4OZ
2) KHUAUMQQTLDLBAHAG5BYZYLW7BZGFISM
3) ICMEAS7FYBVGE62QTQXX5L22BFEHYY6O
However my program gives in hex
1) DBDB582D9589952274477A856493C97354DBAA68
2) AF0E23F658855207693C86A4C23FD5E7564747A8
3) C0A59BF1EA023C059664168EC022B6E7245E9CD5
Clealy they are using a different method of converting the 5 dword (20 byte) hash code into ascii. I simply used hex2dw.
The question I want to ask is can anyone explain what method they seem to be using? Their hash string is 32 characters in lenght always (I've looked at more than three files). It seems to be made up of letters A-Z and however I've only ever noticed digits 2-7.
That gives a total 32 distinct characters, which would be 5 bits. This also fits in with the size 5 dwords : 160 bits divided by 5 gives 32; the number of characters.
So I tried the following code to create that type of string
.data
HS db 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567'
.data?
tBuf db 128 dup (?)
.code
mov edi,-20
lea esi,tBuf
xor ecx,ecx
@@: mov eax,H[20][edi]
shr eax,cl
and eax,011111b
mov bl,HS[eax]
mov [esi],bl
inc esi
add ecx,5
cmp ecx,16
jb @B
sub ecx,16
add edi,2
jnz @B
And I get the following strings.
1) 36WRVWUSJMFFC25I2LBJWJGZTDVWNVCN
2) PVDGC3DLFUUOQURHGEJF47U2HXVOUDBV
3) AOJXZYL5CAPKALSMWQDBMRYWHHJ4FOW2
Now I can't see any similarity between them any the programs output, nor can I spot any pattern off hand.
So I have to ask for help, could any suggest another possible method they may be using to convert the 160 bit hash code to ascii. Or could someone, perhaps maybe implement the C code provided in the technical file in the zip as this generates hex outputs which I could compare with my own code.
Thanks guys, this really has me stumped.
Hmm, this is bad. I've been looking over the algo and I've spotted some mistakes. This begs the question how many other might there be which would be causeing my results to differ from other programs. :confused:
Basically I was using the lenght of the file in bytes, but the algo calls for bits. Another thing of note is the fact that the C code padded with an extra byte, however the algo calls for an extra bit. I figured that most people will go for the extra byte implementation as its easier in HLLs, I hope that was the right choice.
Heres the version with the length sorted, but it still produces different results.
Basically I was using the lenght of the file in bytes, but the algo calls for bits. Another thing of note is the fact that the C code padded with an extra byte, however the algo calls for an extra bit. I figured that most people will go for the extra byte implementation as its easier in HLLs, I hope that was the right choice.
Heres the version with the length sorted, but it still produces different results.
eoin,
what is the program? if it isnt open source (neither have a open source equivalent) its easier to reverse engennering it a bit, and discover how he do the bin2ascii conversion, than trying to guess the algo by the result.
ancev
what is the program? if it isnt open source (neither have a open source equivalent) its easier to reverse engennering it a bit, and discover how he do the bin2ascii conversion, than trying to guess the algo by the result.
ancev
Here is some PowerBASIC source for SHA-1. Hope it helps!
#COMPILE EXE
#REGISTER NONE
#DIM ALL
#INCLUDE "Win32Api.Inc"
$FileNm = "sha.bas" ' <------ Change
TYPE mSHA
H0 AS LONG
H1 AS LONG
H2 AS LONG
H3 AS LONG
H4 AS LONG
END TYPE
FUNCTION CalcSHA (Str AS STRING, SHA AS mSHA) AS LONG
#REGISTER NONE
DIM lStr AS LONG, nq AS LONG, n AS LONG, adrW AS LONG, adrWW AS LONG
DIM H0 AS LONG, H1 AS LONG, H2 AS LONG, H3 AS LONG, H4 AS LONG, W(0 : 79) AS LONG
DIM A AS LONG, B AS LONG, C AS LONG, D AS LONG, E AS LONG, TEMP AS LONG
lStr = LEN(Str)
nq = FIX((lStr + 8) / 64) + 1
n = 16 * nq
REDIM WW(0 TO n - 1) AS LONG
WW(n - 1) = lStr * 8
adrW = VARPTR(W(0))
adrWW = VARPTR(WW(0))
A = STRPTR(Str)
! PUSH EDI
! PUSH ESI
! MOV EDI, adrWW
! MOV ESI, A
! MOV ECX, lStr
! REP MOVSB
! MOV CL, &H80
! MOV [EDI], CL
! MOV EDI, adrWW
! MOV ECX, 2
CalcSHA_Lbl1:
! MOV AX, [EDI]
! MOV DX, [EDI + 2]
! MOV [EDI], DH
! MOV [EDI + 1], DL
! MOV [EDI + 2], AH
! MOV [EDI + 3], AL
! ADD EDI, 4
! INC ECX
! CMP ECX, n
! JNE CalcSHA_Lbl1
! MOV H0, &H67452301&
! MOV H1, &HEFCDAB89&
! MOV H2, &H98BADCFE&
! MOV H3, &H10325476&
! MOV H4, &HC3D2E1F0&
CalcSHA_Lbl2:
! MOV EDI, adrW
! MOV ESI, adrWW
! MOV ECX, 64
! REP MOVSB
! MOV adrWW, ESI
! MOV ECX, 0
CalcSHA_Lbl3:
! MOV ESI, ECX
! ADD ESI, ESI
! ADD ESI, ESI
! ADD ESI, adrW
! MOV EAX, [ESI + 52]
! XOR EAX, [ESI + 32]
! XOR EAX, [ESI + 8]
! XOR EAX, [ESI]
! MOV EDX, EAX
! SHL EAX, 1
! SHR EDX, 31
! OR EAX, EDX
! MOV [ESI + 64], EAX
! INC ECX
! CMP ECX, 64
! JNE CalcSHA_Lbl3
! MOV EAX, H0
! MOV A, EAX
! MOV EAX, H1
! MOV B, EAX
! MOV EAX, H2
! MOV C, EAX
! MOV EAX, H3
! MOV D, EAX
! MOV EAX, H4
! MOV E, EAX
! MOV EDI, 0
CalcSHA_Lbl4:
! CMP EDI, 19
! JA CalcSHA_Lbl5
! MOV ECX, B
! AND ECX, C
! MOV EAX, B
! NOT EAX
! AND EAX, D
! OR ECX, EAX
! ADD ECX, &H5A827999&
! JMP CalcSHA_Lbl8
CalcSHA_Lbl5:
! CMP EDI, 39
! JA CalcSHA_Lbl6
! MOV ECX, B
! XOR ECX, C
! XOR ECX, D
! ADD ECX, &H6ED9EBA1&
! JMP CalcSHA_Lbl8
CalcSHA_Lbl6:
! CMP EDI, 59
! JA CalcSHA_Lbl7
! MOV EAX, B
! AND EAX, C
! MOV ECX, B
! AND ECX, D
! MOV EDX, C
! AND EDX, D
! OR ECX, EAX
! OR ECX, EDX
! ADD ECX, &H8F1BBCDC&
! JMP CalcSHA_Lbl8
CalcSHA_Lbl7:
! MOV ECX, B
! XOR ECX, C
! XOR ECX, D
! ADD ECX, &HCA62C1D6&
CalcSHA_Lbl8:
! MOV EAX, A
! MOV EDX, EAX
! SHL EAX, 5
! SHR EDX, 27
! OR EAX, EDX
! ADD EAX, E
! ADD ECX, EAX
! MOV ESI, EDI
! ADD ESI, ESI
! ADD ESI, ESI
! ADD ESI, adrW
! MOV ESI, [ESI]
! MOV TEMP, ESI
! ADD Temp, ECX
! MOV EAX, D
! MOV E, EAX
! MOV EAX, C
! MOV D, EAX
! MOV EAX, B
! MOV EDX, EAX
! SHL EAX, 30
! SHR EDX, 2
! OR EAX, EDX
! MOV C, EAX
! MOV EAX, A
! MOV B, EAX
! MOV EAX, TEMP
! MOV A, EAX
! INC EDI
! CMP EDI, 80
! JNE CalcSHA_Lbl4
! MOV EAX, A
! ADD H0, EAX
! MOV EAX, B
! ADD H1, EAX
! MOV EAX, C
! ADD H2, EAX
! MOV EAX, D
! ADD H3, EAX
! MOV EAX, E
! ADD H4, EAX
! SUB nq, 1
! JNE CalcSHA_Lbl2
! POP ESI
! POP EDI
SHA.H0 = H0: SHA.H1 = H1: SHA.H2 = H2: SHA.H3 = H3: SHA.H4 = H4
END FUNCTION
FUNCTION tSHA (SHA AS mSHA) AS STRING
FUNCTION = HEX$(SHA.H0, 8) + " " + HEX$(SHA.H1, 8) + " " + HEX$(SHA.H2, 8) + " " + _
HEX$(SHA.H3, 8) + " " + HEX$(SHA.H4, 8)
END FUNCTION
FUNCTION PBMAIN
DIM SHA AS mSHA
CalcSHA "abc", SHA
MSGBOX tSHa(SHA)
END FUNCTION
ancev, the program in question is Gnucleus, a Gnutella client and as it happens it is open source. But I can't get the source code. According to the site I need WinCVS to download the latest version of the source but the program keeps crashing whenever I try run it. I must look into it more.
bAZik, thanks for that code, I'll study it to see where I'm going wrong.
A note regarding the hash string, I've been reading information on some gnutella forum and spotted BASE 32 mentioned so that probably is then method used to display the hash in ascii. Of course the specific order A-Z then 2-7 may not be right.
bAZik, thanks for that code, I'll study it to see where I'm going wrong.
A note regarding the hash string, I've been reading information on some gnutella forum and spotted BASE 32 mentioned so that probably is then method used to display the hash in ascii. Of course the specific order A-Z then 2-7 may not be right.
E?in,
Attached is the only .cpp with the word hash in it from the latest CVS version of Gnucleus. I hope it help's If it is missing anything or you need other files than send me a PM and I will email the source to you. Its 600k + and is over the boards limit.
Attached is the only .cpp with the word hash in it from the latest CVS version of Gnucleus. I hope it help's If it is missing anything or you need other files than send me a PM and I will email the source to you. Its 600k + and is over the boards limit.
E?in, browse the CVS and download anything you want:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/gnucleus
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/gnucleus
I can feel my blood beginning to boil, its time to take a break.
But I'll share something, the list of things the author of the C code seems to have done wrong.
First theres the fact that he assumes that a message will be a multiple of 8. Thats a fair assumptions in this world of bytes rather than bits, but why does he then use the number of bits for the length. If you start with bytes, stick with them.
Next for some reason he decided to bswap every single dword in the message. Nowhere in the algo description does it say the order is important, in fact since it should be read as a bit string you'd assume order is entirely unimportant. Moreoever his code bswap's the dword containing the additional padded byte, the end result being that the pad no longer remains at the end of the message. Clearly this must wrong.
Both these are repeated in the PowerBasic version so it seems this incorrect implementation (and hugely inefficient due to unnecessary bswaps) has been accepted as standard. :mad:
Finally the C version also decides to bswap the MessageDigest when its finished, however the PowerBasic version doesn't. So by my reckoning these two implementation of the same algo would yield different result. :rolleyes:
emonk, thanks for the file. Unfortunatly it contains information on word hashing, not file hashing. Perhaps if you checked for the word SHA1 the right file might turn up. I think my e-mail may have a 1/2 mb limit.
EDIT: Just spotted bitRAKEs link, thanks. emonk thanks fo the offer but I can get the file myself now. :alright:
But I'll share something, the list of things the author of the C code seems to have done wrong.
First theres the fact that he assumes that a message will be a multiple of 8. Thats a fair assumptions in this world of bytes rather than bits, but why does he then use the number of bits for the length. If you start with bytes, stick with them.
Next for some reason he decided to bswap every single dword in the message. Nowhere in the algo description does it say the order is important, in fact since it should be read as a bit string you'd assume order is entirely unimportant. Moreoever his code bswap's the dword containing the additional padded byte, the end result being that the pad no longer remains at the end of the message. Clearly this must wrong.
Both these are repeated in the PowerBasic version so it seems this incorrect implementation (and hugely inefficient due to unnecessary bswaps) has been accepted as standard. :mad:
Finally the C version also decides to bswap the MessageDigest when its finished, however the PowerBasic version doesn't. So by my reckoning these two implementation of the same algo would yield different result. :rolleyes:
emonk, thanks for the file. Unfortunatly it contains information on word hashing, not file hashing. Perhaps if you checked for the word SHA1 the right file might turn up. I think my e-mail may have a 1/2 mb limit.
EDIT: Just spotted bitRAKEs link, thanks. emonk thanks fo the offer but I can get the file myself now. :alright:
Here is a compiled version of the PowerBASIC SHA code. It takes the commandline parameters and calculates the hash from it.
Good info source:
http://csrc.nist.gov/cryptval/shs.html
Good info source:
http://csrc.nist.gov/cryptval/shs.html
P.S.: Here is my validation. :)
Next for some reason he decided to bswap every single dword in the message. Nowhere in the algo description does it say the order is important, in fact since it should be read as a bit string you'd assume order is entirely unimportant.
I'm not sure about SHA1, but SHA256 uses big endian convention (i.e. the left-most bit is in the most significant bit position), so you really have to bswap every dword.
bAZik, thank you very much, its working! :alright:
Tola, you're right of course. I still say it doesn't stress order in the algo description but if other implementations all generate the same hash code then I suppose I can't say that they're wrong.
It annoys me however that the gnutella crowd blindly implemented the algo. They wouldn't need their hash codes to match SHA1, just that they would be consistent throughout the network. They should have optimised.
bitRAKE, thanks again for that link I found the necessary file so I should be able to reproduce those BASE32 ascii strings.
I'll post optimised code soon, but here's a little piece which I'm sure mmx could optimise, I just don't know how. It involves bswaping 16 dwords.
Thanks guys, ye were a great help. :stupid:
Tola, you're right of course. I still say it doesn't stress order in the algo description but if other implementations all generate the same hash code then I suppose I can't say that they're wrong.
It annoys me however that the gnutella crowd blindly implemented the algo. They wouldn't need their hash codes to match SHA1, just that they would be consistent throughout the network. They should have optimised.
bitRAKE, thanks again for that link I found the necessary file so I should be able to reproduce those BASE32 ascii strings.
I'll post optimised code soon, but here's a little piece which I'm sure mmx could optimise, I just don't know how. It involves bswaping 16 dwords.
mov ecx,-16
sLp: mov eax,W[16*4][ecx*4]
bswap eax
mov W[16*4][ecx*4],eax
inc ecx
jnz sLp
Thanks guys, ye were a great help. :stupid:
:) In my current lazy state, only this comes to mind:
mov ecx,-8
sLp: mov eax,W[16*4][ecx*8]
mov edx,W[16*4][ecx*8][4]
bswap eax
bswap edx
mov W[16*4][ecx*8],eax
mov W[16*4][ecx*8][4],edx
inc ecx
jnz sLp
Yeah, I tried playing around with it on paper, but the pack and unpack instructions don't seem to be up to it, and I'd rather not use the shuffle instructions. Too limiting...
So I guess I'll uses that one. Here's the final code if anyones interested.
Posted on 2002-06-05 20:37:20 by Eóin
So I guess I'll uses that one. Here's the final code if anyones interested.
Posted on 2002-06-05 20:37:20 by Eóin
BTW, you really have to love the speed improvement you get for merely rewriting a C algo in Asm. ;)
E?in,
nice Tool that Hash Calculator, eh? :)
But I see 14 Algo's missing in your program... :grin:
Would you mind if I use your Hash code in my Email program for validating the Messages? I already added GPG support, but not everyone wants to use this. So I would like to check it internally with SHA :) For now I use a PowerBASIC .dll with the above code... didnt have the time to covert it to ASM as your now did.
Thanks again for sharing!
nice Tool that Hash Calculator, eh? :)
But I see 14 Algo's missing in your program... :grin:
Would you mind if I use your Hash code in my Email program for validating the Messages? I already added GPG support, but not everyone wants to use this. So I would like to check it internally with SHA :) For now I use a PowerBASIC .dll with the above code... didnt have the time to covert it to ASM as your now did.
Thanks again for sharing!
I posted a ASM version of the RIPEMD-160 algo in this forum about a month ago.
When I get round to it might see if it will benifit from a MMX version but I have this feeling it wont because of the structure of the code.
When I get round to it might see if it will benifit from a MMX version but I have this feeling it wont because of the structure of the code.
BTW, you really have to love the speed improvement you get for merely rewriting a C algo in Asm. ;)
Hehe, you are right...
But it's better I calculate a hash of every msg when it get added to the inbox. Hashing my whole Inbox is a bit slow....
BTW, don't ask me again for releasing my Mail client. I'll do it, when I don't find any bugs for a week or so :)
By all means bAZik, use the code however you wish.
As for actualy usage, hashing can be expensive, but you need only do it once then store the hashs for the various messages in a data file somewhere. After that you'd only need to hash message as they arrive.
I'd recommend calling the hash code in its own thread with a low priority, this means the job stills gets done, but you won't see any real effect on system performance. By that I mean GUI reaction time, mp3 players, etc.
I don't for sure if this is right but I always use the following code to achieve that.
As for actualy usage, hashing can be expensive, but you need only do it once then store the hashs for the various messages in a data file somewhere. After that you'd only need to hash message as they arrive.
I'd recommend calling the hash code in its own thread with a low priority, this means the job stills gets done, but you won't see any real effect on system performance. By that I mean GUI reaction time, mp3 players, etc.
I don't for sure if this is right but I always use the following code to achieve that.
invoke CreateThread,0,0,offset ThreadProc,0,NORMAL_PRIORITY_CLASS,addr tdd
mov ebx,eax
invoke SetThreadPriority,eax,THREAD_PRIORITY_LOWEST
invoke CloseHandle,ebx