I searched a number of articles in the A&S section (many of the old article links seem to lead nowhere), but did not find a concrete hint, so I'll ask (probably) again:

I need to scan a large data block (MPEG movie data - up to several GBytes) for a special DWORD header, e.g. 0x00000100. I simply load chunks of the file and loop over each byte like this:
for (loop = 0; loop + 4 <= len; loop++) {
  if (*(dword *) pbMemory == 0x00000100)   // unaligned 32-bit read, stops before the last 3 bytes
    do_something(..) ;

  pbMemory ++ ;
}
So theoretically I check each byte up to four times. Do some advanced algorithms help here? I think checking single bytes will not speed things up, because of the 32-bit nature of the CPU.
Posted on 2005-12-05 07:48:24 by beaster
SSE is the way. Each of the 8 XMM registers holds 4 dwords, so with SSE you can work on 8 * 4 (32) dwords (128 bytes) at a time (the integer compares on XMM registers need SSE2). You can mix SSE with MMX (it's allowed): the 8 MMX registers add another 16 dwords, to finally get 48 DWORDs (192 bytes) at a time.
So the algo would be:

1. Load a few KB.
2. Order an asynchronous load of the next few KB, or do nothing if end of file has been reached.
3. Scan the already loaded KB (while the next ones are being loaded). If found, cancel the load and exit.
4. Go to 2.

of course you need 2 buffers: one is being scanned while the other one is being loaded and vice versa.
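
A rough sketch of the chunk loop with two alternating buffers, in C. It is not the asynchronous version: a blocking fread stands in for the async load, and the comment marks where the next read would be issued. What it does show is one detail the steps above leave out: a 4-byte header can straddle two chunks, so the last 3 bytes of each chunk are carried over in front of the next one. The names scan_stream, CHUNK and CARRY are made up for this example, and the dword is compared in native byte order, the same as the `*(dword *)` test in the question.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CHUNK 4096  /* hypothetical chunk size */
#define CARRY 3     /* a 4-byte header can straddle a boundary by at most 3 bytes */

/* Returns the file offset of the first dword equal to `needle`
   (native byte order), or -1 if there is none. */
static int64_t scan_stream(FILE *f, uint32_t needle)
{
    uint8_t bufs[2][CARRY + CHUNK];
    int cur = 0;
    size_t carry = 0;   /* bytes kept from the previous chunk */
    int64_t base = 0;   /* file offset of the first byte being scanned */

    for (;;) {
        uint8_t *start = bufs[cur] + CARRY - carry;
        size_t got = fread(bufs[cur] + CARRY, 1, CHUNK, f);
        size_t avail = carry + got;

        /* A real implementation would start the asynchronous read
           into bufs[cur ^ 1] here and scan while it runs. */
        for (size_t i = 0; i + 4 <= avail; i++) {
            uint32_t v;
            memcpy(&v, start + i, 4);   /* safe unaligned read */
            if (v == needle)
                return base + (int64_t)i;
        }
        if (got < CHUNK)
            return -1;                  /* end of file (or read error) */

        /* Carry the last 3 bytes over so a straddling header is not missed. */
        memcpy(bufs[cur ^ 1], start + avail - CARRY, CARRY);
        base += (int64_t)(avail - CARRY);
        carry = CARRY;
        cur ^= 1;
    }
}
```

The inner loop is the plain byte-by-byte scan; any faster scanner can be dropped in there as long as it sees the carried bytes too.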

SSE has great instructions for such problems: one compares values and sets each dword to 0xFFFFFFFF if equal or 0x0 if not equal. The second one extracts 1 bit from each dword and stores it in a GP register. Then you do BSF to find whether there are any '1's. If there is a '1', then you've found what you wanted. Unfortunately I don't remember the mnemonics. Look through Intel's manual.
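
For illustration, here is a minimal sketch of that compare-and-extract idea using C intrinsics, assuming an x86 target with SSE2. _mm_cmpeq_epi32 compiles to PCMPEQD and _mm_movemask_ps to MOVMSKPS. The function name find_dword_sse2 is made up for this example. Since the header can sit at any byte offset and not only on a 4-byte boundary, the sketch does four unaligned loads shifted by 0..3 bytes per pass; a plain loop stands in for BSF to keep it portable.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the byte offset of the first position whose 4 bytes, read as a
   native-order dword, equal `needle`; returns len if there is no match. */
static size_t find_dword_sse2(const uint8_t *buf, size_t len, uint32_t needle)
{
    const __m128i pat = _mm_set1_epi32((int)needle);
    size_t i = 0;

    /* Each pass tests 16 start positions and reads buf[i .. i+18]. */
    while (i + 19 <= len) {
        unsigned mask = 0;  /* bit p set => match at buf + i + p */
        for (int s = 0; s < 4; s++) {
            __m128i v  = _mm_loadu_si128((const __m128i *)(buf + i + s));
            __m128i eq = _mm_cmpeq_epi32(v, pat);                          /* PCMPEQD  */
            unsigned m = (unsigned)_mm_movemask_ps(_mm_castsi128_ps(eq));  /* MOVMSKPS */
            for (int k = 0; k < 4; k++)
                if (m & (1u << k))
                    mask |= 1u << (4 * k + s);
        }
        if (mask) {
            size_t p = 0;   /* lowest set bit; BSF would do this in one step */
            while (!(mask & 1u)) { mask >>= 1; p++; }
            return i + p;
        }
        i += 16;
    }

    /* Scalar tail for the last few bytes. */
    for (; i + 4 <= len; i++) {
        uint32_t v;
        memcpy(&v, buf + i, 4);
        if (v == needle)
            return i;
    }
    return len;
}
```

Whether this beats the simple loop in practice depends on the CPU and on how fast the disk delivers data, so it is worth measuring before committing to it.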
Posted on 2005-12-05 07:59:55 by ti_mo_n