Say I have two pointers to two strings, and I wish to do a search in string 1 for string 2. Trivial for exact match, but is there some nice technique for doing it case insensative besides a grunt work check if in range for A-Z? Note either strings may hold any legal character (nubmer, punctuation, control, etc), so just ignoring the 6th bit is not acceptable (one rairly wants '[' to exactly match '{' ).
Posted on 2001-03-22 23:48:00 by Ernie
It's quite simple: bring them to the same case and then compare them. It can be done dinamically in one loop or divided in two parts to convert first and then compare. Using appropriate way is havilly dependable on your conditions Do you care of both or one of values, how long the strings suppous to be ect. ect. I'm pro asm+database programmer and in my job there a lot of formation and copmaring tasks, after having been looking quite a while for proc major strategy I began developing teq. using tables and copmlex index addressing - for my job in lots of cases it became the way to get shortest and fastest result. If you use big table just for one proc it might seem quit memory consuming, but when you write a set of procs, and almost all of them you use in fairly big project then size of tables is almost nothing, you just put them in global section making them available for any proc, and writing procs passing them as parameter address of table. For example you can use the same table for UCASE conversions caseunsencetive search and comparing ect(upto 100 procs!). Create table of uninterraptive array and replace value of members UCASE with lowcase or in opposit. Then you don't need to check each time if it casesomething. Most universal whay with this will something like. edi pointer of one string esi pointer to another some where before the loop xor eax,eax xor edx,edx @@: in loop mov al, mov dl, mov al,tbl mov dl,tbl inc edi cmp al,dl jne notequ inc esi dec ecx jne @B Please note: it takes ~ 5-7 clocks for byte (iteration) and rep scasb 12once and + 4 byte for each byte comparasion So it will be fastest whay you never get using any other teq without table. In addition a lot of M32LIB procs useless in other not English speaking countries just for it use ULCASE proc with just first halve of ASCII table and other halve is national relative, but if you replace those procs with ones using tables they can be used there programmers needs just make their country seciffic table and pass pointer of them to the procs. The Svin.
Posted on 2001-03-23 01:31:00 by The Svin
Ernie, I know how it can be done but I don't consider that it is optimal code, case sensitive search is reasonably straight forward and what can be done easily enough is to convert the text to one case and do the comparison that way. It means an extra pass on the information being searched but case comversion is genuinely fast so it may not be a problem. I have seen a Boyer Moore written by Jeremy Collake that may do the job but I don't know if it has the precision you require. Also I think from memory, Jeremy's algos only worked in absolute addresses, not zero based offsets in the string to be searched. Regards, hutch@pbq.com.au
Posted on 2001-03-23 04:37:00 by hutch--


invoke strcmpi,addr string1,addr string2

why reinvent the wheel? umbongo
Posted on 2001-03-23 05:27:00 by umbongo
======================================== invoke strcmpi,addr string1,addr string2 ======================================== why reinvent the wheel? Not a bad point actually, it only matters if its speed critical but I know Ernie is a genuine speed freak. :) Regards, hutch@pbq.com.au
Posted on 2001-03-23 06:04:00 by hutch--
please be careful with manipulating single byte strings, and think to the german or arabic people, which will get problems with there special characters. Windows function work (mostly :)) proper.
Posted on 2001-03-23 07:03:00 by beaster
Sure, I like speed, I find it annoying to watch my system thrash for 30 secs just to read the HELP file. In this case, it's a recursive "find in files" text search. The app target is W9x ASCII. I wouldn't bother trying an international app without full unicode support, in which case I'd probably be stuck with the text API's for comparison. But in my case it's straight ascii comparing bytes. I've already done a few macros to do inline text cat and copy for output format (the case sensative case is already in the can). Since I'm checking every single character in the file, simple is better. The longest single task will be this search, and I was looking for something tight and light. Umm, yeah I know about lstrcmpi. It's works. I just want something faster. ----------------------------- "I saw weird stuff in that place last night. Weird, strange, sick, twisted, eerie, godless, evil stuff. And I want in."
Posted on 2001-03-23 09:44:00 by Ernie
Sure, I like speed, I find it annoying to watch my system thrash for 30 secs just to read the HELP file.
You want my system at work, they have a policy of running anti-virus software on every single file. I mean everything, .hlp .html .tmp - the lot, it takes 2 minutes to open the VC++ help file!!!!!! If you want to optimise anything call norton and work for them...... Umbongo This message was edited by umbongo, on 3/23/2001 1:05:00 PM
Posted on 2001-03-23 12:04:00 by umbongo
I have never used any virus protection software in the last twenty years :) Only lost two hardrives: first one wasn't backed-up - spent a month getting the source code off of it. :( I don't think I'll ever use virus protection software - like I don't get a flu shot every year. I do use condoms though :) - err, mostly :) Excute me while I knock on wood :) bitRAKE
Posted on 2001-03-23 15:50:00 by bitRAKE
Ernie, This library module is more or less hot off the press but it is testing up OK so far and you will find it will rip the titz of API calls. Now the down side is that its case sensitive search and it has no error checking built into it so if you pass an incorrect parameter, it will crash. In particular, specifying the starting position to search from at a position past the source length minus the search string length will cause a page fault. In the last service pack, the 2 case conversion modules were made a lot faster so converting a buffer to uppercase then searching it would probably be very fast as the case conversion would get it into cache. Regards, hutch@pbq.com.au

; #########################################################################

    .386
    .model flat, stdcall
    option casemap :none   ; case sensitive

    .code

; #########################################################################

bSearch proc StartPos:DWORD,lpString:DWORD,lnStrng:DWORD,
                            lpSubStr:DWORD,lnSubSt:DWORD

    push ebx
    push esi
    push edi

    mov esi, lpSubStr
    mov ah,          ; 1st byte in AH

    mov esi, lpString     ; source in ESI
    add esi, StartPos
    mov edx, lnStrng      ; len in EDX
    lea edx,   ; add esi to edx + 1
    sub edx, lnSubSt      ; subtract substr len from source len

; ---------------------- Main Loop --------------------------
  bsSt:
    mov al, 
    inc esi
    cmp al, ah            ; cmp byte to start character
    je sbLoop             ; jump to subloop if equal
    cmp esi, edx          ; test exit condition
    jne bsSt              ; jump back if not equal

    mov eax, -1           ; -1 = no match
    jmp bsOut
; -----------------------------------------------------------
  sbLoop:
    push esi              ; source offset is in ESI
    dec esi               ; start comparison from correct position
    mov edi, lpSubStr     ; substring offset in EDI
    mov ecx, lnSubSt      ; substr len in ECX
; ---------------------- Sub Loop ---------------------------
  cmSt:
    mov al,          ; read source byte
    mov bl,          ; read search byte
    inc esi
    inc edi
    cmp al, bl            ; test byte
    jne cmOut             ; exit subloop if not matching
    dec ecx               ; decrement counter
    jnz cmSt              ; return to compare next byte if ECX not zero
; -----------------------------------------------------------
;   if ECX = 0 fall through to following instruction
; ---------------------- If Match ---------------------------
    pop esi               ; restore esi
    dec esi               ; correct to get true position
    sub esi, lpString     ; sub start address from esi
    mov eax, esi          ; copy to eax as return value
    jmp bsOut
; ------------------------ Else -----------------------------
  cmOut:
    pop esi               ; restore ESI
    jmp bsSt              ; return to start

  bsOut:

    pop edi
    pop esi
    pop ebx

    ret

bSearch endp

; #########################################################################

    end
This message was edited by hutch--, on 3/23/2001 5:44:32 PM
Posted on 2001-03-23 16:43:00 by hutch--
There is no point to convert all before the hand. Look, try, clock it and think step by step about different scenarious in different procs... It's very hard for me explain code in Engilsh, a lot more diffical than write it. You must excuse it, please.

.data
;If English 128 for Russian and other lang - 256 bytes tables
uctbl	db 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
	db 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
	db 32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47
	db 48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63
	db 64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79
	db 80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95
	db 96,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79
	db 80,81,82,83,84,85,86,87,88,89,90,123,124,125,126,127

.code

.......
	invoke Strlen,addr Str1
	invoke CUStrcmp,addr Str1, addr Str2,eax,addr uctbl
	or eax,eax ; if Str1 and Str2 are equal caseuns.?
......
	invoke Ucase,addr OtherStr,addr uctbl  ;you may use the table for a lot of procs passing addr
......
	invoke CUInstr,addr OneMoreStr,addr ToSearchStr,index,addr uctbl

CUStrcmp proc lpstr1:DWORD,lpstr2:DWORD,lenth:DWORD,tbladrr:DWORD
push edi
push ebx
push esi
	xor eax,eax
	xor edx,edx
	mov ecx,lenth
	mov ebx,tbladdr
	mov esi,lpstr1
	mov edi,lpstr2
@@:	mov al,
	mov dl,
	mov al,
	mov dl,
	inc esi
	inc edi
	cmp al,dl
	jne	@F
	dec ecx
	jne @B
	mov eax,1  ;TRUE - equal
	jmp done
@@:	xor eax,eax
done:	pop esi
	pop ebx
	pop edi
	ret

CUStrcmp endp

Ucase proc lpszStr:DWORD,tbladdr:DWORD
	xor eax,eax
	mov edx,lpszStr
	mov ecx,tbladdr
@@:	mov al,
	mov al,
	mov ,al
	inc edx
	or al,al	;if end of asciiz string
	jne @B
	ret
Ucase  endp
InStrCU
CUInStr proc ... Steve can write himself or I can if anybody wants :) The Svin (added code tags -- ernie) This message was edited by Ernie, on 3/24/2001 2:30:40 AM
Posted on 2001-03-24 01:05:00 by The Svin
Alex, I have just learnt something useful when using tables although in a slightly different context, it is very important to have the table aligned even when you are making BYTE reads from it. I worked on an algorithm with a friend of mine that uses XLATB and got a wide range of speeds depending on the alignment of the 256 byte table, even though XLATB reads in BYTE size. Regards, hutch@pbq.com.au
Posted on 2001-03-24 15:49:00 by hutch--
Hmmm... I didn't get it :) How array of bytes can be aligned? For example if you alinged array tbl by 4 then tbl[0] tbl[4] tbl .... would be alinged but tbl[1],[2],[3],[5],[6].... will be missaligned. The same with even - only one address of two can be even. You can't aling addresses of all bytes in byte array be 2 or 4. Did I miss something?
Posted on 2001-03-25 11:34:00 by The Svin
That point about tables needing to be aligned - it is quite true. The only reason I can think is, even though the values themselves are read as bytes - the hardware doesn't! For each memory access, a value is read using the full bus width (ie a DWORD), regardless of whether this is to used as a byte, word, or dword once the dword is read - then it is masked and shifted accordingly if needed to fit the instruction So, the first byte read, in an aligned table will still need to be masked and shifted, but once this is done - the next byte to be read is already in the pipeline and has already been read so this is quick (and so for the next two bytes after that) But if the table is misaligned then this 'cached-bytes' effect is partly lost, since the loss on the first byte isn't made up by the next three Although, I'm not sure how much impact this will cause for byte tables, since only the first (hardware)dword is misaligned. But for word and dword tables it can be very bad, since usually two dword fetches will be required just to fetch a single (soft)dword (...which would be like getting a chache-miss on every second read - ouch!)
Posted on 2001-03-27 06:29:00 by Tedd
As far as I know, reads are done at full bus width, and only selected bytes are used (via the Byte Enables). The same is true of writes in reverse, but the rest of the data is discarded. Because the data is read anyway, it is usually cached, as there is a good cost/performance ratio, and it is also usually the case that there will be subsequent memory accesses (such as tables). Mirno
Posted on 2001-03-27 06:54:00 by Mirno
The Intel cache is pretty strange to me. Instead of a single cache, it's actually a bunch of little caches, each four levels deep (has this changed in the latest chips?) So it's extremely sensitive to access patterns. In the 486, the bad number was 2048. If you accessed data that was a multiple of 2K apart, you quickly flushed out the mini-cache. The 486 was worse than the Pentium because code and data bytes were mixed together in the L1 mini-caches. A loop with two levels of subroutine calls spaced 2K apart (three code sections), add in a couple of data accesses 2K apart from each other and the code -- and you get a lot of cache misses. Even worse is the 4K page size used for VM (virtual memory). If you have lots of RAM, this won't be a problem. But if you don't, you can have a similar scenario (with the bad number 4096) and get thrashing. With the L1 cache, each cache line in the 486 is 16 bytes (it's 32 in the Pentiums), so the 2K figure is pretty close to reality. With the VM pages, it's better to state reality -- it's how many 4K pages you need (vs. how many you can hold in RAM) that determines whether you thrash or not. Another thing to keep in mind about byte writes -- if the hardware does not support byte selection during writes, at least one whole dword will need to be read in before writing. If that dword is cached, a whole 32-byte cache line in a Pentium must be filled (meaning eight dwords must be read!) I'm not really sure of the exact Pentium behavior.
Posted on 2001-03-27 15:49:00 by tank
To Ted: Hmmm... It makes sence... Though not clear enogh for me yet. :) There always is something strange and unexplained when testing Intel procs with an oscilograph. To Ernie: In case you're going to use tables in search in files you need then 256 byte table with second part (index127-255) filled with the same numbers as indexes values. The Svin.
Posted on 2001-03-28 04:09:00 by The Svin