A few years ago I wrote a password cracker for NT and I needed a multithreaded des key generator and encryption routine that squeezed every machine cycle possible out of the algorithm –especially the key generator. The attached files are my des implementation written from scratch from the RFC in masm.  I believe they are the fastest implementation of des that I can find to compare against. Mine generates a key and encrypts a 64 bit message in about 1100 machine cycles.  Most other implementations I have tested take 20k to 70k.

I spent a few years trying to sell it but no takers. (Delusions of grander I suppose)

John
Attachments:
Posted on 2005-11-01 17:18:00 by JohnW
Nice! Utilizes the max speed of newer cpus - no branches, cmov* instruction :)
Posted on 2005-11-01 22:59:43 by Ultrano
How does it compare against http://www.openwall.com/john/ ? :)
Posted on 2005-11-02 09:29:53 by f0dder
It's been 2 years since I tested but if I remember correctly I only picked up 10 clocks with cmov on Intel and lost 400 clocks on AMD. Both macro's are in the source and to test just comment out the 386 version and uncomment the 486 version

I haven't compared it against other implementation for a long time but I would be surprised if any were faster. It's not the hand coded asm that makes it so much faster, it's the way I rearranged the substitution table to accommodate the x86 instruction set.

The key gen table is also hard coded in the x86 instruction set. The ken gen routine is the most important piece for brute force lanman password cracking.


Posted on 2005-11-02 09:49:18 by JohnW
JohnW, it doesn't work, (or something wrong with my test code?)

.data?
align 16
SubK db 16*8 dup(?)
.data
align 16
IF 1
; this won't work
Key        db 043h,029h,07fh,0adh,038h,0e3h,073h,0feh
Plaintext  db 076h,025h,014h,0b8h,029h,0bfh,048h,06ah
Ciphertext db 0eah,067h,06bh,02ch,0b7h,0dbh,02bh,07ah
ELSE
; this works
Key        db 001h,023h,045h,067h,089h,0ABh,0CDh,0EFh
Plaintext  db 011h,011h,011h,011h,011h,011h,011h,011h
Ciphertext db 017h,066h,08Dh,0FCh,072h,092h,053h,02Dh
ENDIF
.code
lea esi,Key
lodsd
xchg eax,edx
lodsd
bswap eax
bswap edx
lea edi,SubK
call DesKeyEncryptASM
lea esi,Plaintext
lodsd
xchg eax,edx
lodsd
bswap eax
bswap edx
lea edi,SubK
call DesCoreASM
mov ebx,eax
mov ecx,edx
lea esi,Ciphertext
lodsd
xchg eax,edx
lodsd
bswap eax
bswap edx
sub ecx,edx
sub ebx,eax


...so i've coded one myself
here is my implementation, it's keyset is very similar to your but different.
the enc routine is branchless.

Posted on 2005-11-08 14:25:51 by drizz
The keygen space has to be zeroed out each time. I used the stack and had a wrapper proc do the housekeeping. I have wrapper procs for all the different variations of des and triple des

For example here's the ECB des wrapper:

.CODE
ALIGN 4
ECBDesCore: push ebx
push esi
push edi
push ebp
mov esi,;Message
ALIGN 4
@@: mov eax,
mov edx,
mov edi,;KeyTable
call DesCoreASM
mov esi,;Message
mov ,eax
mov ,edx
add esi,8
mov ,esi;Message
mov ecx,dword ptr
dec ecx
mov dword ptr ,ecx
jnz @B

pop ebp
pop edi
pop esi
pop ebx
ret
ALIGN 4
DesKeysDecrypt PROC C USES ebx esi edi ebp Key:DWORD,KeyTable:DWORD
mov edi,KeyTable
mov esi,Key
xor eax,eax
KeyTableOffset=0
REPEAT 32
mov ,eax
KeyTableOffset=KeyTableOffset+4
ENDM
mov eax,
mov edx,
call DesKeyDecryptASM
ret
DesKeysDecrypt ENDP

ALIGN 4
DesKeysEncrypt PROC C USES ebx esi edi ebp Key:DWORD,KeyTable:DWORD
mov edi,KeyTable
mov esi,Key
xor eax,eax
KeyTableOffset=0
REPEAT 32
mov ,eax
KeyTableOffset=KeyTableOffset+4
ENDM
mov eax,
mov edx,
call DesKeyEncryptASM
ret
DesKeysEncrypt ENDP

Posted on 2005-11-08 18:02:59 by JohnW
Your des implementation looks great. This weekend I take a close look.



Posted on 2005-11-08 18:15:06 by JohnW
The keygen space has to be zeroed out each time.

if my test code is executed at entrypoint, then .data? buffer will be zero!
(uninitialized virtual memory of the .data PE file section will be zeroed)
and the
sub ecx,edx
sub ebx,eax
is checked with a debugger (olly) to see if they are 0
"IF 1" changed to "IF 0" generates different code (If true / if false)

i'm just asking if it passes the "vectors" i've put in the test code...

p.s.: your wrapper procs are a bit (byte) wrong unless you are using them on big endian machine!?!??
Posted on 2005-11-09 07:47:15 by drizz
Des does a lot of bit twiddling and swapping dword values. I tried to eliminate any unnecessary either by rearranging the tables of reversing the qword lord order but it has nothing to do with little ended architecture.

That is why you see things like:
Mov eax,
Mov edx,

Instead of the normal
Mov edx,
Mov eax,

The following is a test code you can use to verify the routines





                    .386
                    .MODEL FLAT,STDCALL
                    OPTION CASEMAP:NONE
EXTERN DesCoreASM:PROC
EXTERN DesKeyEncryptASM:PROC

.DATA
DesValidation dq 0133457799BBCDFF1h,00123456789ABCDEFh,085E813540F0AB405h
                    dq 00000000000000000h,00000000000000000h,08CA64DE9C1B123A7h
                    dq 0FFFFFFFFFFFFFFFFh,0FFFFFFFFFFFFFFFFh,07359B2163E4EDC58h
                    dq 00123456789ABCDEFh,01111111111111111h,017668DFC7292532Dh
                    dq 00000000000000000h,00000000000000000h,08CA64DE9C1B123A7h
                    dq 004B915BA43FEB5B6h,042FD443059577FA2h,0AF37FB421F8C4095h
                    dq 00101010101010101h,00123456789ABCDEFh,0617B3A0CE8F07100h
                    dq 00000000000000000h,0FFFFFFFFFFFFFFFFh,0355550B2150E2451h
                    dq 0FFFFFFFFFFFFFFFFh,00000000000000000h,0CAAAAF4DEAF1DBAEh
                    dq 00123456789ABCDEFh,00000000000000000h,0D5D44FF720683D0Dh

.CODE
ALIGN 4

Start: lea esi,DesValidation
        mov ecx,10
;Creat space for key pairs
        sub esp,256
TestLoop:
;Zero out the key space
        xor eax,eax
        KeyTableOffset=0
REPEAT 64
        mov ,eax
        KeyTableOffset=KeyTableOffset+4
ENDM
;Load the key in EDX:EAX
        mov edx,
        mov eax,
        mov edi,esp
        call DesKeyEncryptASM
;Key table updated with 16 48 bit key pairs
;Load EDX:EAX with message
        mov edx,
        mov eax,
;EDI point to ket pair table
        mov edi,esp
;Encrypt
        push esi
        push ecx
        call DesCoreASM
        pop ecx
        pop esi
        sub edx,
        sbb eax,
; jnz @F
        add esi,24
        dec ecx
        jz TestEnd
        jmp TestLoop
;Results Invalid
@@:  add esp,256
          ret
TestEnd:
;Results Valid
        add esp,256
        ret

END Start
Posted on 2005-11-09 16:09:30 by JohnW
John, please test this line
DesValidation       dq 043297FAD38E373FEh,0762514B829BF486Ah,0EA676B2CB7DB2B7Ah




the data and the key block should be interpreted and outputed like this (edx:eax) :
#define READ_64BIT_DATA(data, edx, eax) \
edx = (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3]; \
eax = (data[4] << 24) | (data[5] << 16) | (data[6] << 8) | data[7];

#define WRITE_64BIT_DATA(data, edx, eax) \
data[0] = (edx >> 24) &0xff; \
data[1] = (edx >> 16) &0xff; \
data[2] = (edx >> 8) &0xff;  \
data[3] = edx &0xff;        \
data[4] = (eax >> 24) &0xff; \
data[5] = (eax >> 16) &0xff; \
data[6] = (eax >> 8) &0xff;  \
data[7] = eax &0xff;

Posted on 2005-11-10 10:51:32 by drizz
drizz,
The source document I used to create the des routine.
http://www.aci.net/kalliste/des.htm
Running the example from this link:
Key (0133457799BBCDFF1h) Message (00123456789ABCDEFh) cipher text (085E813540F0AB405h)
Generate  the following correct key schedule
K0  001b02ef00fc7072
K1  0079aed900dbc9e5
K2  0055fc8a0042cf99
K3  0072add600db351d
K4  007cec0700eb53a8
K5  0063a53e00507b2f
K6  00ec84b700f618bc
K7  00f78a3a00c13bfb
K8  00e0dbeb00ede781
K9  00b1f34700ba464f
K10 00215fd300ded386
K11 007571f5009467e9
K12 0097c5d100faba41
K13 005f43b700f2e73a
K14 00bf918d003d3f0a
K15 00cb3d8b000e17f5

When I run your key schedule generator it appears to generate two extra key values and the keys don’t match the linked example

K0  2b162f203133130e
K1  06370e3b301c0723
K2  0f1821193d363717
K3  25261b2625263f11
K4  22350c2a2e2d1b3b
K5  172d0f1f23180731
K6  313c3d30301f2b13
K7  0d211e23372b223d
K8  3f2a2c14242d2f38
K9  070714162a1f361f
K10 33161b340f18312e
K11 2138123d1b330c3b
K12 2e3c391737011e22
K13 2c1a363f233c0116
K14 232f33051b2c0b27
K15 3b092c372a313807
K16 163a380f123b113e
K17 1c3c0b1a0d0b091d

I can't explain why this value doesn't work.
DesValidation       dq 043297FAD38E373FEh,0762514B829BF486Ah,0EA676B2CB7DB2B7Ah
I hope it’s a bad value or byte order problem because if it a bug in my algorithm I'm screwed.


Posted on 2005-11-10 12:47:43 by JohnW

I can't explain why this value doesn't work.
DesValidation       dq 043297FAD38E373FEh,0762514B829BF486Ah,0EA676B2CB7DB2B7Ah
I hope it’s a bad value or byte order problem because if it a bug in my algorithm I'm screwed.

http://csrc.nist.gov/publications/nistpubs/
publication 800-17.pdf, page 136

or take descert.dat from Crypto++ 5.21
http://cvs.sourceforge.net/viewcvs.py/cryptopp/src/descert.dat

i'm sorry, i had no attention to upset you...
Posted on 2005-11-10 15:02:49 by drizz
drizz,
Can you post some code to test your routine?
The key schedule on mine is bad on the following bits 01010101010101010h and I lost the original program I used to generate the schedule.
If can can compare your to mine it might make easier to find

Yes you upset me ! -you made me debug a program I haven't looked at in three years :-)


Posted on 2005-11-11 09:51:29 by JohnW
Hi, here it is, it contains a bit updated source, and testing program.

the top archive is deleted, this zip contains the newer files.
Attachments:
Posted on 2005-11-11 18:51:58 by drizz