Hi guys, I'v written an implementation of the SHA1 algorithim. However I've run into a difficulty and I can't check if I'm right or wrong in the implementation.

I need the Hash code produced to be compatible with another program.

When I hash the files in that program it gives
1) CZN3GPMKNMABB3W7QFAGS575PWPHX4OZ
2) KHUAUMQQTLDLBAHAG5BYZYLW7BZGFISM
3) ICMEAS7FYBVGE62QTQXX5L22BFEHYY6O

However my program gives in hex
1) DBDB582D9589952274477A856493C97354DBAA68
2) AF0E23F658855207693C86A4C23FD5E7564747A8
3) C0A59BF1EA023C059664168EC022B6E7245E9CD5


Clealy they are using a different method of converting the 5 dword (20 byte) hash code into ascii. I simply used hex2dw.

The question I want to ask is can anyone explain what method they seem to be using? Their hash string is 32 characters in lenght always (I've looked at more than three files). It seems to be made up of letters A-Z and however I've only ever noticed digits 2-7.

That gives a total 32 distinct characters, which would be 5 bits. This also fits in with the size 5 dwords : 160 bits divided by 5 gives 32; the number of characters.

So I tried the following code to create that type of string

.data

HS db 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567'

.data?
tBuf db 128 dup (?)

.code
mov edi,-20
lea esi,tBuf
xor ecx,ecx
@@: mov eax,H[20][edi]
shr eax,cl
and eax,011111b
mov bl,HS[eax]
mov [esi],bl
inc esi

add ecx,5
cmp ecx,16
jb @B

sub ecx,16
add edi,2
jnz @B


And I get the following strings.

1) 36WRVWUSJMFFC25I2LBJWJGZTDVWNVCN
2) PVDGC3DLFUUOQURHGEJF47U2HXVOUDBV
3) AOJXZYL5CAPKALSMWQDBMRYWHHJ4FOW2

Now I can't see any similarity between them any the programs output, nor can I spot any pattern off hand.

So I have to ask for help, could any suggest another possible method they may be using to convert the 160 bit hash code to ascii. Or could someone, perhaps maybe implement the C code provided in the technical file in the zip as this generates hex outputs which I could compare with my own code.

Thanks guys, this really has me stumped.
Posted on 2002-06-05 10:05:20 by Eóin
Hmm, this is bad. I've been looking over the algo and I've spotted some mistakes. This begs the question how many other might there be which would be causeing my results to differ from other programs. :confused:

Basically I was using the lenght of the file in bytes, but the algo calls for bits. Another thing of note is the fact that the C code padded with an extra byte, however the algo calls for an extra bit. I figured that most people will go for the extra byte implementation as its easier in HLLs, I hope that was the right choice.

Heres the version with the length sorted, but it still produces different results.
Posted on 2002-06-05 11:51:49 by Eóin
eoin,

what is the program? if it isnt open source (neither have a open source equivalent) its easier to reverse engennering it a bit, and discover how he do the bin2ascii conversion, than trying to guess the algo by the result.

ancev
Posted on 2002-06-05 13:28:13 by ancev
Here is some PowerBASIC source for SHA-1. Hope it helps!



#COMPILE EXE
#REGISTER NONE
#DIM ALL
#INCLUDE "Win32Api.Inc"

$FileNm = "sha.bas" ' <------ Change

TYPE mSHA
H0 AS LONG
H1 AS LONG
H2 AS LONG
H3 AS LONG
H4 AS LONG
END TYPE

FUNCTION CalcSHA (Str AS STRING, SHA AS mSHA) AS LONG
#REGISTER NONE

DIM lStr AS LONG, nq AS LONG, n AS LONG, adrW AS LONG, adrWW AS LONG
DIM H0 AS LONG, H1 AS LONG, H2 AS LONG, H3 AS LONG, H4 AS LONG, W(0 : 79) AS LONG
DIM A AS LONG, B AS LONG, C AS LONG, D AS LONG, E AS LONG, TEMP AS LONG

lStr = LEN(Str)
nq = FIX((lStr + 8) / 64) + 1
n = 16 * nq

REDIM WW(0 TO n - 1) AS LONG

WW(n - 1) = lStr * 8
adrW = VARPTR(W(0))
adrWW = VARPTR(WW(0))
A = STRPTR(Str)

! PUSH EDI
! PUSH ESI

! MOV EDI, adrWW
! MOV ESI, A
! MOV ECX, lStr
! REP MOVSB

! MOV CL, &H80
! MOV [EDI], CL

! MOV EDI, adrWW
! MOV ECX, 2
CalcSHA_Lbl1:
! MOV AX, [EDI]
! MOV DX, [EDI + 2]
! MOV [EDI], DH
! MOV [EDI + 1], DL
! MOV [EDI + 2], AH
! MOV [EDI + 3], AL
! ADD EDI, 4
! INC ECX
! CMP ECX, n
! JNE CalcSHA_Lbl1

! MOV H0, &H67452301&
! MOV H1, &HEFCDAB89&
! MOV H2, &H98BADCFE&
! MOV H3, &H10325476&
! MOV H4, &HC3D2E1F0&

CalcSHA_Lbl2:

! MOV EDI, adrW
! MOV ESI, adrWW
! MOV ECX, 64
! REP MOVSB
! MOV adrWW, ESI

! MOV ECX, 0
CalcSHA_Lbl3:
! MOV ESI, ECX
! ADD ESI, ESI
! ADD ESI, ESI
! ADD ESI, adrW

! MOV EAX, [ESI + 52]
! XOR EAX, [ESI + 32]
! XOR EAX, [ESI + 8]
! XOR EAX, [ESI]

! MOV EDX, EAX
! SHL EAX, 1
! SHR EDX, 31
! OR EAX, EDX
! MOV [ESI + 64], EAX

! INC ECX
! CMP ECX, 64
! JNE CalcSHA_Lbl3

! MOV EAX, H0
! MOV A, EAX
! MOV EAX, H1
! MOV B, EAX
! MOV EAX, H2
! MOV C, EAX
! MOV EAX, H3
! MOV D, EAX
! MOV EAX, H4
! MOV E, EAX

! MOV EDI, 0
CalcSHA_Lbl4:
! CMP EDI, 19
! JA CalcSHA_Lbl5

! MOV ECX, B
! AND ECX, C
! MOV EAX, B
! NOT EAX
! AND EAX, D
! OR ECX, EAX
! ADD ECX, &H5A827999&
! JMP CalcSHA_Lbl8

CalcSHA_Lbl5:
! CMP EDI, 39
! JA CalcSHA_Lbl6

! MOV ECX, B
! XOR ECX, C
! XOR ECX, D
! ADD ECX, &H6ED9EBA1&
! JMP CalcSHA_Lbl8

CalcSHA_Lbl6:
! CMP EDI, 59
! JA CalcSHA_Lbl7

! MOV EAX, B
! AND EAX, C
! MOV ECX, B
! AND ECX, D
! MOV EDX, C
! AND EDX, D
! OR ECX, EAX
! OR ECX, EDX
! ADD ECX, &H8F1BBCDC&
! JMP CalcSHA_Lbl8

CalcSHA_Lbl7:
! MOV ECX, B
! XOR ECX, C
! XOR ECX, D
! ADD ECX, &HCA62C1D6&

CalcSHA_Lbl8:
! MOV EAX, A
! MOV EDX, EAX
! SHL EAX, 5
! SHR EDX, 27
! OR EAX, EDX
! ADD EAX, E
! ADD ECX, EAX

! MOV ESI, EDI
! ADD ESI, ESI
! ADD ESI, ESI
! ADD ESI, adrW
! MOV ESI, [ESI]
! MOV TEMP, ESI

! ADD Temp, ECX
! MOV EAX, D
! MOV E, EAX
! MOV EAX, C
! MOV D, EAX
! MOV EAX, B
! MOV EDX, EAX
! SHL EAX, 30
! SHR EDX, 2
! OR EAX, EDX
! MOV C, EAX
! MOV EAX, A
! MOV B, EAX
! MOV EAX, TEMP
! MOV A, EAX

! INC EDI
! CMP EDI, 80
! JNE CalcSHA_Lbl4

! MOV EAX, A
! ADD H0, EAX
! MOV EAX, B
! ADD H1, EAX
! MOV EAX, C
! ADD H2, EAX
! MOV EAX, D
! ADD H3, EAX
! MOV EAX, E
! ADD H4, EAX

! SUB nq, 1
! JNE CalcSHA_Lbl2

! POP ESI
! POP EDI

SHA.H0 = H0: SHA.H1 = H1: SHA.H2 = H2: SHA.H3 = H3: SHA.H4 = H4

END FUNCTION

FUNCTION tSHA (SHA AS mSHA) AS STRING
FUNCTION = HEX$(SHA.H0, 8) + " " + HEX$(SHA.H1, 8) + " " + HEX$(SHA.H2, 8) + " " + _
HEX$(SHA.H3, 8) + " " + HEX$(SHA.H4, 8)
END FUNCTION


FUNCTION PBMAIN

DIM SHA AS mSHA

CalcSHA "abc", SHA
MSGBOX tSHa(SHA)


END FUNCTION
Posted on 2002-06-05 13:38:59 by bazik
ancev, the program in question is Gnucleus, a Gnutella client and as it happens it is open source. But I can't get the source code. According to the site I need WinCVS to download the latest version of the source but the program keeps crashing whenever I try run it. I must look into it more.

bAZik, thanks for that code, I'll study it to see where I'm going wrong.

A note regarding the hash string, I've been reading information on some gnutella forum and spotted BASE 32 mentioned so that probably is then method used to display the hash in ascii. Of course the specific order A-Z then 2-7 may not be right.
Posted on 2002-06-05 14:07:25 by Eóin
E?in,
Attached is the only .cpp with the word hash in it from the latest CVS version of Gnucleus. I hope it help's If it is missing anything or you need other files than send me a PM and I will email the source to you. Its 600k + and is over the boards limit.
Posted on 2002-06-05 14:41:42 by emonk
E?in, browse the CVS and download anything you want:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/gnucleus
Posted on 2002-06-05 15:17:25 by bitRAKE
I can feel my blood beginning to boil, its time to take a break.

But I'll share something, the list of things the author of the C code seems to have done wrong.

First theres the fact that he assumes that a message will be a multiple of 8. Thats a fair assumptions in this world of bytes rather than bits, but why does he then use the number of bits for the length. If you start with bytes, stick with them.

Next for some reason he decided to bswap every single dword in the message. Nowhere in the algo description does it say the order is important, in fact since it should be read as a bit string you'd assume order is entirely unimportant. Moreoever his code bswap's the dword containing the additional padded byte, the end result being that the pad no longer remains at the end of the message. Clearly this must wrong.

Both these are repeated in the PowerBasic version so it seems this incorrect implementation (and hugely inefficient due to unnecessary bswaps) has been accepted as standard. :mad:

Finally the C version also decides to bswap the MessageDigest when its finished, however the PowerBasic version doesn't. So by my reckoning these two implementation of the same algo would yield different result. :rolleyes:

emonk, thanks for the file. Unfortunatly it contains information on word hashing, not file hashing. Perhaps if you checked for the word SHA1 the right file might turn up. I think my e-mail may have a 1/2 mb limit.

EDIT: Just spotted bitRAKEs link, thanks. emonk thanks fo the offer but I can get the file myself now. :alright:
Posted on 2002-06-05 15:46:24 by Eóin
Here is a compiled version of the PowerBASIC SHA code. It takes the commandline parameters and calculates the hash from it.

Good info source:
http://csrc.nist.gov/cryptval/shs.html
Posted on 2002-06-05 16:26:17 by bazik
P.S.: Here is my validation. :)
Posted on 2002-06-05 16:28:20 by bazik

Next for some reason he decided to bswap every single dword in the message. Nowhere in the algo description does it say the order is important, in fact since it should be read as a bit string you'd assume order is entirely unimportant.


I'm not sure about SHA1, but SHA256 uses big endian convention (i.e. the left-most bit is in the most significant bit position), so you really have to bswap every dword.
Posted on 2002-06-05 18:51:12 by Tola
bAZik, thank you very much, its working! :alright:

Tola, you're right of course. I still say it doesn't stress order in the algo description but if other implementations all generate the same hash code then I suppose I can't say that they're wrong.

It annoys me however that the gnutella crowd blindly implemented the algo. They wouldn't need their hash codes to match SHA1, just that they would be consistent throughout the network. They should have optimised.

bitRAKE, thanks again for that link I found the necessary file so I should be able to reproduce those BASE32 ascii strings.

I'll post optimised code soon, but here's a little piece which I'm sure mmx could optimise, I just don't know how. It involves bswaping 16 dwords.
mov ecx,-16

sLp: mov eax,W[16*4][ecx*4]
bswap eax
mov W[16*4][ecx*4],eax
inc ecx
jnz sLp


Thanks guys, ye were a great help. :stupid:
Posted on 2002-06-05 19:43:10 by Eóin
:) In my current lazy state, only this comes to mind:
mov ecx,-8

sLp: mov eax,W[16*4][ecx*8]
mov edx,W[16*4][ecx*8][4]
bswap eax
bswap edx
mov W[16*4][ecx*8],eax
mov W[16*4][ecx*8][4],edx
inc ecx
jnz sLp
Posted on 2002-06-05 20:08:15 by bitRAKE
Yeah, I tried playing around with it on paper, but the pack and unpack instructions don't seem to be up to it, and I'd rather not use the shuffle instructions. Too limiting...

So I guess I'll uses that one. Here's the final code if anyones interested.

Posted on 2002-06-05 20:37:20 by Eóin
BTW, you really have to love the speed improvement you get for merely rewriting a C algo in Asm. ;)
Posted on 2002-06-05 20:38:55 by Eóin
E?in,
nice Tool that Hash Calculator, eh? :)
But I see 14 Algo's missing in your program... :grin:

Would you mind if I use your Hash code in my Email program for validating the Messages? I already added GPG support, but not everyone wants to use this. So I would like to check it internally with SHA :) For now I use a PowerBASIC .dll with the above code... didnt have the time to covert it to ASM as your now did.

Thanks again for sharing!
Posted on 2002-06-06 00:59:27 by bazik
I posted a ASM version of the RIPEMD-160 algo in this forum about a month ago.

When I get round to it might see if it will benifit from a MMX version but I have this feeling it wont because of the structure of the code.
Posted on 2002-06-06 01:57:55 by huh

BTW, you really have to love the speed improvement you get for merely rewriting a C algo in Asm. ;)


Hehe, you are right...
But it's better I calculate a hash of every msg when it get added to the inbox. Hashing my whole Inbox is a bit slow....

BTW, don't ask me again for releasing my Mail client. I'll do it, when I don't find any bugs for a week or so :)
Posted on 2002-06-06 03:20:08 by bazik
By all means bAZik, use the code however you wish.

As for actualy usage, hashing can be expensive, but you need only do it once then store the hashs for the various messages in a data file somewhere. After that you'd only need to hash message as they arrive.

I'd recommend calling the hash code in its own thread with a low priority, this means the job stills gets done, but you won't see any real effect on system performance. By that I mean GUI reaction time, mp3 players, etc.

I don't for sure if this is right but I always use the following code to achieve that.
invoke CreateThread,0,0,offset ThreadProc,0,NORMAL_PRIORITY_CLASS,addr tdd

mov ebx,eax
invoke SetThreadPriority,eax,THREAD_PRIORITY_LOWEST
invoke CloseHandle,ebx
Posted on 2002-06-06 09:12:37 by Eóin