Heya. I'm probably going to work on some backup-like software before long, and I need to ensure data integrity. Currently I'm pondering which algorithm to use. CRC32 is pretty easy to fool: you can forge a checksum by changing four bytes. It's not that I need cryptographic security; I'm just wondering what the chance would be of getting a matching CRC32 on corrupted data through a normal fault.
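A rough way to put a number on that question, assuming (and it is only a modelling assumption) that a fault scrambles the data badly enough that the CRC of the damaged file behaves like a random 32-bit value:

```latex
% Assumption: the corrupted data is effectively random, so a k-bit
% check value matches the stored one by pure chance.
P_{\text{miss}}(k) \approx 2^{-k}, \qquad
P_{\text{miss}}(32) \approx 2^{-32} = \frac{1}{4\,294\,967\,296} \approx 2.33 \times 10^{-10}

% Expected undetected failures among N independently corrupted files:
E[\text{misses}] \approx N \cdot 2^{-32}, \qquad
N = 10^{6} \;\Rightarrow\; E[\text{misses}] \approx 2.3 \times 10^{-4}
```

For small, contiguous damage the picture is better than this model suggests: a CRC with a degree-32 generator polynomial (and a nonzero constant term, as CRC32 has) detects every single burst error of 32 bits or fewer outright.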

And what about Adler32? Is it any better?
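For comparison, here is a minimal Adler-32 sketch following the definition in RFC 1950: two running sums, both taken modulo 65521. The function name and the `value` parameter (for continuing a running checksum across chunks) are my own choices, picked to mirror how `zlib.adler32` is called:

```python
MOD_ADLER = 65521  # largest prime below 2**16, per RFC 1950


def adler32(data: bytes, value: int = 1) -> int:
    """Adler-32 of `data`; pass a previous result as `value` to continue it."""
    a = value & 0xFFFF          # low half: 1 + sum of all bytes
    b = (value >> 16) & 0xFFFF  # high half: sum of the successive `a` values
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    return (b << 16) | a


print(hex(adler32(b"Wikipedia")))  # 0x11e60398
```

Real implementations such as zlib's defer the modulo across blocks of up to 5552 bytes, which is a large part of why Adler-32 is so cheap; the per-byte `%` here is only for clarity.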

I need to process files fast, probably on somewhat slow hardware, so MD5 or SHA would probably be too slow (not to mention overkill?).
Posted on 2004-08-24 19:03:19 by f0dder
CRC64?

That will certainly lessen the odds.
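Under the random-corruption model, going from 32 to 64 check bits drops the chance of an accidental match from about 2^-32 to about 2^-64 (roughly 5.4 × 10^-20). A table-driven sketch, assuming the CRC-64/XZ parameters (the reflected ECMA-182 polynomial with all-ones init and final XOR, as used by the xz format); other CRC-64 variants exist and give different values:

```python
POLY64_REFLECTED = 0xC96C5795D7870F42  # ECMA-182 polynomial, bit-reversed
MASK64 = 0xFFFFFFFFFFFFFFFF


def _make_crc64_table() -> list[int]:
    """Precompute the CRC-64 remainder for every possible byte value."""
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ POLY64_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC64_TABLE = _make_crc64_table()


def crc64(data: bytes, crc: int = 0) -> int:
    """CRC-64/XZ of `data`; pass a previous result as `crc` to continue it."""
    crc ^= MASK64  # all-ones init (also undoes the final XOR of a prior call)
    for byte in data:
        crc = CRC64_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ MASK64
```

The cost per byte is one table lookup, one shift and two XORs, the same as table-driven CRC32, just on 64-bit values (which matters on 32-bit CPUs).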
Posted on 2004-08-24 20:14:06 by iblis
AFAIK, unless someone is deliberately trying to break CRC32, it works very well. It is slightly more reliable than Adler32, but Adler32 is much faster.

Perhaps if you stored both the CRC32 and Adler32 checksums, you'd get some extra 'safeness' without sacrificing much speed. On my computer (P4 1.5 GHz), both algorithms are held back by hard disk speed anyway.
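Storing both does not require reading the file twice. A minimal sketch using the running-value argument that Python's `zlib.crc32` and `zlib.adler32` accept; the function name `file_checksums` and the 1 MiB chunk size are arbitrary assumptions:

```python
import zlib


def file_checksums(path: str, chunk_size: int = 1 << 20) -> tuple[int, int]:
    """Compute (CRC32, Adler32) of a file in a single streaming pass."""
    crc = 0    # documented starting value for zlib.crc32
    adler = 1  # documented starting value for zlib.adler32
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
            adler = zlib.adler32(chunk, adler)
    return crc, adler
```

Each chunk is read from disk once and fed to both checksums while it is still in memory, so if the disk is the bottleneck the second checksum costs very little extra.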
Posted on 2004-08-26 09:57:14 by stormix
That might be a good idea, stormix. The two algorithms look pretty different in nature, so if a change slips past one, it's probably caught by the other. Adler32 seems fast, and CRC32 is pretty fast when used with a lookup table... so that's definitely one possibility.
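The "with a table" trick precomputes the CRC contribution of all 256 byte values, so the inner loop does one lookup per byte instead of eight shift/XOR steps. A sketch for the common IEEE CRC32 (reflected polynomial 0xEDB88320, all-ones init and final XOR, the same variant `zlib.crc32` computes); the names are illustrative:

```python
CRC32_POLY_REFLECTED = 0xEDB88320  # IEEE 802.3 polynomial, bit-reversed


def make_crc32_table() -> list[int]:
    """Precompute the CRC32 remainder for every possible byte value."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):  # the bitwise algorithm, run once per entry
            c = (c >> 1) ^ CRC32_POLY_REFLECTED if c & 1 else c >> 1
        table.append(c)
    return table


CRC32_TABLE = make_crc32_table()


def crc32(data: bytes, crc: int = 0) -> int:
    """Table-driven CRC32; pass a previous result as `crc` to continue it."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = CRC32_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


print(hex(crc32(b"123456789")))  # 0xcbf43926, the standard check value
```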

But I'm open to other brilliant suggestions too :)
Posted on 2004-08-26 10:23:16 by f0dder
f0dder,

If you think nobody will intentionally tamper with the data, CRC32 is OK.

But if there's any reason somebody would want to trick the system, forget about it. It's easily reversed.
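To see why it's "easily reversed": for inputs of equal length, CRC32 is affine over GF(2), so the effect of flipping each of the 32 bits in any four-byte window can be computed and a small linear system solved to hit any target CRC. A sketch of that four-byte forgery; `forge_crc32` and its `pos` parameter are hypothetical names for illustration, and `zlib.crc32` supplies the checksum:

```python
import zlib


def forge_crc32(data: bytes, pos: int, target: int) -> bytes:
    """Hypothetical helper: alter bytes pos..pos+3 so crc32(result) == target."""
    n = len(data)
    if not 0 <= pos <= n - 4:
        raise ValueError("need four bytes starting at pos")
    # Equal-length identity: crc32(m ^ e) == crc32(m) ^ crc32(e) ^ crc32(zeros).
    zero_crc = zlib.crc32(bytes(n))
    pivots = {}  # leading bit -> (CRC effect, mask of window bits producing it)
    for i in range(32):
        e = bytearray(n)
        e[pos + i // 8] = 1 << (i % 8)  # flip a single window bit
        vec, mask = zlib.crc32(e) ^ zero_crc, 1 << i
        for bit in range(31, -1, -1):  # Gaussian elimination over GF(2)
            if not (vec >> bit) & 1:
                continue
            if bit in pivots:
                pv, pm = pivots[bit]
                vec, mask = vec ^ pv, mask ^ pm
            else:
                pivots[bit] = (vec, mask)
                break
    # Pick the window bits whose combined effect turns crc32(data) into target.
    wanted, flips = zlib.crc32(data) ^ target, 0
    for bit in range(31, -1, -1):
        if (wanted >> bit) & 1:
            if bit not in pivots:
                raise ValueError("window does not span all CRC values")
            pv, pm = pivots[bit]
            wanted, flips = wanted ^ pv, flips ^ pm
    out = bytearray(data)
    for i in range(32):
        if (flips >> i) & 1:
            out[pos + i // 8] ^= 1 << (i % 8)
    return bytes(out)
```

Four consecutive bytes always suffice for CRC32, because the 32 single-bit effects from such a window are linearly independent, so the solve never fails. A keyed or cryptographic hash has no such structure to exploit.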

ancev
Posted on 2004-08-27 16:31:18 by ancev
There shouldn't be anybody intentionally trying to corrupt the data; it's for some backup stuff, not program security (and yep, I know CRC32 is easy to fool intentionally). But I guess doing both CRC32 and ADLER32 should be okay: on the target CPUs they should basically be "free" operations, since the disk rather than the CPU will be the bottleneck...
Posted on 2004-08-29 12:48:50 by f0dder
I use Iblis' 128-bit MD5 module in my p2p file-sharing project, and it's plenty fast enough. I hash the entire file content AND each "piece" sent; in fact, ALL the p2p packets contain an MD5 hash. BitTorrent clients use SHA-1 and still get very fast transfers; it even seems some routers can't keep up with the transfer speeds. I think you should benchmark.
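The "whole file AND each piece" scheme can be sketched with Python's `hashlib.md5`. This is not Iblis' module or BitTorrent's actual format (BitTorrent uses SHA-1 per piece); the function name `md5_pieces` and the 256 KiB piece size are assumptions:

```python
import hashlib


def md5_pieces(data: bytes, piece_size: int = 256 * 1024) -> tuple[str, list[str]]:
    """Return (MD5 of the whole input, list of per-piece MD5 digests)."""
    whole = hashlib.md5()
    pieces = []
    for i in range(0, len(data), piece_size):
        piece = data[i:i + piece_size]
        whole.update(piece)  # whole-file digest built from the same pass
        pieces.append(hashlib.md5(piece).hexdigest())
    return whole.hexdigest(), pieces
```

Per-piece digests let a receiver (or a backup restore) pinpoint and re-fetch just the damaged piece, while the whole-file digest confirms the reassembled result.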
Posted on 2004-08-29 22:02:51 by Homer
Homer, there's "some difference" between internet and hard disk speed, at least around here :) (the fastest line I've used myself could handle 3-4 MB/s, while my own hard drive can reach... dunno, but probably ten times as much).


But yes, I'll certainly benchmark before I choose an algorithm. It'll be at least a couple of weeks before I start the project anyway; I have some school work to take care of first.
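A starting point for that benchmark might look like the sketch below. It times in-memory throughput of `zlib.crc32`, `zlib.adler32`, `hashlib.md5` and `hashlib.sha1` on random data; the function name, buffer size and best-of-`repeat` timing are assumptions, and disk reads are deliberately left out so the numbers can be compared against the drive's own throughput:

```python
import hashlib
import os
import time
import zlib


def benchmark(size: int = 16 * 1024 * 1024, repeat: int = 3) -> dict[str, float]:
    """Return in-memory throughput in MB/s (10**6 bytes) for each candidate."""
    data = os.urandom(size)
    candidates = {
        "crc32": lambda d: zlib.crc32(d),
        "adler32": lambda d: zlib.adler32(d),
        "md5": lambda d: hashlib.md5(d).digest(),
        "sha1": lambda d: hashlib.sha1(d).digest(),
    }
    results = {}
    for name, fn in candidates.items():
        best = float("inf")
        for _ in range(repeat):  # keep the fastest run to reduce noise
            start = time.perf_counter()
            fn(data)
            best = min(best, time.perf_counter() - start)
        results[name] = size / max(best, 1e-9) / 1e6  # guard against a 0 timing
    return results


for name, mbps in benchmark().items():
    print(f"{name:8s} {mbps:10.1f} MB/s")
```

If every candidate comes out well above what the disk delivers, the choice can be made on error-detection strength rather than speed.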

And I'm still open to other suggestions, somebody might have some funky ideas ^_^
Posted on 2004-08-30 00:53:57 by f0dder