This is my attempt at solving this problem in the best way possible (fast and small, no drawbacks).

(The problem: allow multiple threads to read from a shared object simultaneously, while letting one thread write to it exclusively. While a thread holds read-write access to the object, no other thread can read from or write to it. This is mostly applicable to internet servers.)

The object is only 4 bytes in size, supports up to 65535 simultaneously running readers, adapts its behaviour to its environment (number of processors and type of OS), and is completely safe.
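
To make the problem statement concrete, here is a minimal C11 sketch of a reader-writer lock that fits in one 32-bit word. The bit layout is an assumption for illustration only (the attached asm lib may pack its 4 bytes differently): the low 16 bits count active readers, which is where a 65535-reader limit comes from, and bit 31 marks an active writer. All `rw_*` names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bits 0..15 = active readers, bit 31 = writer active. */
#define RW_READER_MASK 0x0000FFFFu
#define RW_WRITER      0x80000000u

typedef struct { _Atomic uint32_t state; } rwlock32;

static void rw_init(rwlock32 *l) { atomic_init(&l->state, 0); }

/* Adds a reader unless a writer is active or the 16-bit count is full. */
static bool rw_try_read(rwlock32 *l) {
    uint32_t s = atomic_load_explicit(&l->state, memory_order_relaxed);
    do {
        if ((s & RW_WRITER) || (s & RW_READER_MASK) == RW_READER_MASK)
            return false;
    } while (!atomic_compare_exchange_weak_explicit(
                 &l->state, &s, s + 1,
                 memory_order_acquire, memory_order_relaxed));
    return true;
}

static void rw_read_unlock(rwlock32 *l) {
    uint32_t prev = atomic_fetch_sub_explicit(&l->state, 1, memory_order_release);
    assert((prev & RW_READER_MASK) != 0);   /* caller must hold a read lock */
    (void)prev;
}

/* A writer needs the whole word to be 0: no readers and no other writer. */
static bool rw_try_write(rwlock32 *l) {
    uint32_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, RW_WRITER,
        memory_order_acquire, memory_order_relaxed);
}

static void rw_write_unlock(rwlock32 *l) {
    assert(atomic_load_explicit(&l->state, memory_order_relaxed) == RW_WRITER);
    atomic_store_explicit(&l->state, 0, memory_order_release);
}

/* Blocking variants: retry, giving up the CPU between attempts. */
static void rw_read_lock(rwlock32 *l)  { while (!rw_try_read(l))  sched_yield(); }
static void rw_write_lock(rwlock32 *l) { while (!rw_try_write(l)) sched_yield(); }
```

Design note: in this simple sketch a steady stream of readers can starve a waiting writer, because `rw_try_write` only succeeds once the reader count drops to zero; a "writer waiting" bit is one common remedy.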

Each of the lib's procs expects ECX to contain the address of the 4-byte RWLock object. The only modified register in each of the procs is EAX.

Since the 4-byte object contains only numbers (and no Windows objects), you can even use it for inter-process synchronization by placing it in memory shared via CreateFileMapping.
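
The reason this works is that the lock is plain data: any process that maps the same memory can operate on it with atomic instructions. The sketch below is a POSIX analogue, not the Windows code: it swaps in `mmap(MAP_SHARED | MAP_ANONYMOUS)` and `fork()` for CreateFileMapping, and for brevity uses an exclusive-only lock word rather than the full reader-writer word. `shared_block`, `shm_lock`, `shm_unlock` and `run_processes` are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* The lock word and the data it guards live together in shared memory. */
typedef struct {
    _Atomic uint32_t lock;   /* 0 = free, 1 = held (exclusive only, for brevity) */
    long counter;
} shared_block;

static void shm_lock(shared_block *b) {
    while (atomic_exchange_explicit(&b->lock, 1, memory_order_acquire) != 0)
        sched_yield();
}

static void shm_unlock(shared_block *b) {
    atomic_store_explicit(&b->lock, 0, memory_order_release);
}

/* Forks `procs` child processes; each adds 1 to the shared counter `iters`
   times under the lock. Returns the final counter, or -1 if mmap fails. */
static long run_processes(int procs, int iters) {
    assert(procs > 0 && iters >= 0);
    shared_block *b = mmap(NULL, sizeof *b, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (b == MAP_FAILED)
        return -1;
    atomic_init(&b->lock, 0);
    b->counter = 0;
    for (int p = 0; p < procs; p++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                       /* fork failed: count will come up short */
        if (pid == 0) {                  /* child: hammer the shared counter */
            for (int i = 0; i < iters; i++) {
                shm_lock(b);
                b->counter++;
                shm_unlock(b);
            }
            _exit(0);
        }
    }
    while (wait(NULL) > 0) { }           /* reap every child */
    long result = b->counter;
    munmap(b, sizeof *b);
    return result;
}
```

If the lock were not shared correctly between the processes, increments would be lost and the final count would fall short of `procs * iters`.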

Attached is the latest version, bug-free.
Posted on 2007-02-25 06:27:06 by Ultrano
Just to note that I added two nifty procs: RWLock_UpgradeToWriter() and RWLock_DowngradeToReader() .

I initially thought of adding an option to disable writing, but that would have reduced the maximum number of threads from 65535 to 255. Then I noticed that writing is effectively disabled anyway while our thread is already a reader :) .
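
Both points (upgrading/downgrading, and why a thread that is already a reader cannot take the write lock the normal way) can be sketched on the same hypothetical 32-bit layout: readers in bits 0..15, writer in bit 31. This is an illustration, not the lib's actual RWLock_UpgradeToWriter/RWLock_DowngradeToReader code; in this sketch the upgrade only succeeds when the caller is the sole reader, since two readers that both wait to upgrade would deadlock. The `rw_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Same hypothetical layout: bits 0..15 = readers, bit 31 = writer. */
#define RW_READER_MASK 0x0000FFFFu
#define RW_WRITER      0x80000000u

typedef struct { _Atomic uint32_t state; } rwlock32;

static bool rw_try_read(rwlock32 *l) {
    uint32_t s = atomic_load_explicit(&l->state, memory_order_relaxed);
    do {
        if ((s & RW_WRITER) || (s & RW_READER_MASK) == RW_READER_MASK)
            return false;
    } while (!atomic_compare_exchange_weak_explicit(
                 &l->state, &s, s + 1, memory_order_acquire, memory_order_relaxed));
    return true;
}

static void rw_read_unlock(rwlock32 *l) {
    atomic_fetch_sub_explicit(&l->state, 1, memory_order_release);
}

/* Needs an all-zero word, so it fails while any reader, including the
   calling thread itself, is active. */
static bool rw_try_write(rwlock32 *l) {
    uint32_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, RW_WRITER, memory_order_acquire, memory_order_relaxed);
}

/* Caller is a reader. Atomically turns "exactly 1 reader" into "writer".
   Fails instead of waiting: two readers waiting to upgrade would deadlock. */
static bool rw_try_upgrade(rwlock32 *l) {
    uint32_t expected = 1;
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, RW_WRITER, memory_order_acquire, memory_order_relaxed);
}

/* Caller is the writer and becomes the single reader. A release store is
   enough because readers never modify the word while the writer bit is set. */
static void rw_downgrade(rwlock32 *l) {
    assert(atomic_load_explicit(&l->state, memory_order_relaxed) == RW_WRITER);
    atomic_store_explicit(&l->state, 1, memory_order_release);
}
```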

Each of the 8 procs (except for _Init/_Free) takes around 50 cycles (since xchg takes 19 cycles). Does anyone think I could keep the lib's perfect stability if, in the macro @UnlockByte, I replace the "xchg" with a simple "mov byte ptr,0"? (That would save 18 cycles per proc.)
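
On the @UnlockByte question: on x86 an ordinary store already has release semantics (earlier loads and stores are not reordered after it), and only the owner ever clears the lock byte, so a plain mov is generally considered sufficient for unlocking. The locked xchg is only needed on the acquire side, where the test-and-set must be atomic. The C11 sketch below (hypothetical names) expresses exactly that split; on x86, GCC and Clang compile the release store to a plain mov.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* A one-byte spinlock: 0 = free, 1 = held. */
typedef struct { atomic_uchar b; } bytelock;

static bool byte_try_lock(bytelock *l) {
    /* Exchange (xchg on x86) tests and sets the byte in one atomic step. */
    return atomic_exchange_explicit(&l->b, 1, memory_order_acquire) == 0;
}

static void byte_lock(bytelock *l) {
    while (!byte_try_lock(l)) {
        /* Spin on plain loads until the byte looks free, then retry the xchg. */
        while (atomic_load_explicit(&l->b, memory_order_relaxed) != 0) { }
    }
}

/* Only the owner writes here, so no read-modify-write is needed:
   a release store is enough (a plain mov on x86). */
static void byte_unlock(bytelock *l) {
    assert(atomic_load_explicit(&l->b, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->b, 0, memory_order_release);
}
```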
Posted on 2007-02-25 08:20:53 by Ultrano
Are you making a web server? If so, I can't wait to check it out! We just can't stand having only Apache and Lighty anymore.

Now really, to the questions... You said that if you set something to a writing state, the number of threads would drop to 255. So how does UpgradeToWriter work, then? Just out of curiosity, since I don't even know what your prog is all about. 8)

Now about macros: I've always heard there is no difference. I tend to believe, however, that automatic tasks can logically be slower.
Posted on 2007-02-27 19:04:07 by codename
This is only really usable if the locks are held for a very short time though, isn't it? Otherwise you'll be burning a lot of CPU cycles... but for very short locks, this seems interesting :)
Posted on 2007-02-27 19:36:48 by f0dder
Whoops, I forgot to update the code here ^^. I fixed it (covered all cases), added more procs, and wrote an article for CodeProject.

Attached is the same file as the one uploaded on CodeProject.

f0dder: I doubt it'll burn too many cycles. On a single CPU it will quickly switch to the writer thread and stay there for more than 30 ms. On a multicore CPU, we'll only be looping if the writer is on another CPU and there are no other ready threads for the current CPU. This could be handled somehow when "SwitchToThread" returns "false" on a blocked reader, but that would increase latency.
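
The waiting strategy described here (spin briefly while the writer may be about to finish on another core, otherwise hand the CPU to someone else) can be sketched as a bounded spin followed by yielding. This assumes POSIX: `sched_yield()` stands in, roughly, for SwitchToThread()/Sleep(0), with the caveat that it does not report whether another thread was actually ready, so the "SwitchToThread returned false" case cannot be detected here. `SPIN_LIMIT` and `wait_writer_gone` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define RW_WRITER  0x80000000u   /* hypothetical writer bit, as in the layout above */
#define SPIN_LIMIT 64u           /* hypothetical tuning value */

/* Waits until the writer bit is clear. Spins briefly first (cheap if the
   writer is about to finish on another core), then yields on every further
   attempt so a writer on the same core gets to run.
   Returns the number of yields performed (useful when tuning SPIN_LIMIT). */
static unsigned wait_writer_gone(_Atomic uint32_t *state) {
    unsigned spins = 0, yields = 0;
    while (atomic_load_explicit(state, memory_order_acquire) & RW_WRITER) {
        if (spins < SPIN_LIMIT) {
            spins++;                 /* busy-wait a little */
        } else {
            sched_yield();           /* give the CPU away */
            yields++;
        }
    }
    assert(spins <= SPIN_LIMIT);
    return yields;
}
```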
Moving a thread from the Active to the Wait state seemed slower when I ran some benchmarks. I can't come up with a valid way to measure the time it takes to move a thread from Active to Wait state; I think I'll need to learn even more about Windows. My attempts at benchmarking WaitForSingleObject returned inexplicably long times. Someone who already has a ReadersWriterLock implemented with WaitForSingleObject told me it's actually slow, so maybe all the benchmark results I've seen with Events are correct.
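
One way to get steadier numbers for the cost of putting a thread to sleep and waking it is to time many hand-offs between two threads and average them, instead of timing a single wait. The sketch below swaps in POSIX threads and a condition variable for WaitForSingleObject and Events (an assumption; figures on Windows will differ), and `avg_roundtrip_ns` is a hypothetical helper. Each round trip includes two block/wake cycles.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

typedef struct {
    pthread_mutex_t m;
    pthread_cond_t cv;
    int turn;      /* 0 = main thread's move, 1 = partner's move */
    int rounds;
} pingpong;

/* Partner thread: waits for the token, hands it straight back. */
static void *partner(void *arg) {
    pingpong *p = arg;
    pthread_mutex_lock(&p->m);
    for (int i = 0; i < p->rounds; i++) {
        while (p->turn != 1)
            pthread_cond_wait(&p->cv, &p->m);
        p->turn = 0;
        pthread_cond_signal(&p->cv);
    }
    pthread_mutex_unlock(&p->m);
    return NULL;
}

/* Average time of one round trip in nanoseconds, or -1.0 if `rounds`
   is not positive or the partner thread cannot be created. */
static double avg_roundtrip_ns(int rounds) {
    pingpong p = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, rounds };
    pthread_t t;
    if (rounds <= 0 || pthread_create(&t, NULL, partner, &p) != 0)
        return -1.0;
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    pthread_mutex_lock(&p.m);
    for (int i = 0; i < rounds; i++) {
        p.turn = 1;                       /* pass the token */
        pthread_cond_signal(&p.cv);
        while (p.turn != 0)               /* sleep until it comes back */
            pthread_cond_wait(&p.cv, &p.m);
    }
    pthread_mutex_unlock(&p.m);
    clock_gettime(CLOCK_MONOTONIC, &end);
    pthread_join(t, NULL);
    double ns = (double)(end.tv_sec - start.tv_sec) * 1e9
              + (double)(end.tv_nsec - start.tv_nsec);
    assert(ns >= 0.0);                    /* CLOCK_MONOTONIC never goes back */
    return ns / rounds;
}
```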
Wasn't there legally available source code of the Windows kernel (in a DDK), or is my memory playing tricks on me?
(The attachment has been moved to the first post in this thread.)
Posted on 2007-02-28 02:27:27 by Ultrano
Legally available source code would be for universities with a license. Otherwise, the closest you'll get is Windows Internals (the XP version of Inside Windows 2000), and WinDbg with .pdb files from Microsoft (amazingly enough, those are freely available, and they give a LOT more detail about the kernel as well as usermode components).

Yeah, Wait* is slower than what you can do with usermode code: the calls incur ring-transition overhead, and the scheduler has to wake the thread up, which again means ring overhead, etc. So yeah, it has overhead, which is why CRITICAL_SECTION does some spinlooping before waiting on an event. If you need longer waits, though, it's very efficient, because your thread gets removed from the ready list.
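
The spin-before-waiting idea can be sketched with POSIX primitives: try the cheap user-mode path a bounded number of times, then fall back to a blocking lock. This is an illustration using `pthread_mutex_trylock`/`pthread_mutex_lock`, not how CRITICAL_SECTION is implemented; `spin_count` is a hypothetical knob, comparable in spirit to the count passed to InitializeCriticalSectionAndSpinCount.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t m;
    unsigned spin_count;   /* hypothetical: attempts before blocking */
} spin_then_block;

/* Returns true if the lock was taken while spinning, false if we blocked. */
static bool stb_lock(spin_then_block *l) {
    for (unsigned i = 0; i < l->spin_count; i++)
        if (pthread_mutex_trylock(&l->m) == 0)
            return true;                 /* fast path: no kernel wait */
    int rc = pthread_mutex_lock(&l->m);  /* slow path: may sleep until woken */
    assert(rc == 0);
    (void)rc;
    return false;
}

static void stb_unlock(spin_then_block *l) {
    int rc = pthread_mutex_unlock(&l->m);
    assert(rc == 0);
    (void)rc;
}
```

Short critical sections are usually released during the spin, so the kernel is never involved; long ones end up sleeping instead of burning CPU.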

Anyway, thanks for your work - reader/writer locking is much more nifty than "everybody blocks" :)

PS: "But Sleep(0) ignores threads with higher priority than the current thread's priority" - don't you mean lower priority?
Posted on 2007-02-28 08:04:32 by f0dder
Quote from MSDN: "to any other thread of equal priority that is ready to run".
So if the only ready threads have slightly higher priority than the current one, this thread will use up its whole time slice, causing higher latency.
Posted on 2007-02-28 08:31:52 by Ultrano

I was pretty sure it would relinquish control to higher-priority threads as well. Oh well, flawed memory, sorry.
Posted on 2007-02-28 09:10:23 by f0dder
My university has contracts with MS, and I've personally gone through the several thousand CDs/DVDs they've provided so far (MSDN Ultimate and more), but I didn't see anything like that. The tech support MS provides us is also a bit on the paranoid side when it comes to licensing. And I'd have to ask through one of my lecturers, with whom no one (including me) is on good terms ^^". I read the InsideWin2k book just recently; the insight it gave was part of the reason I made this RWLock lib and a ReWire-like audio-IPC lib of mine. So now I'll have to check these .pdbs and find and read more articles on the subject. Thanks for the pointer :)

Btw, I simply needed this RWLock lib for 2-3 threaded apps, and only optimized it to squeeze out as much performance as possible, in case I or someone else ever needs it :)
Posted on 2007-02-28 09:41:09 by Ultrano
Thanks for the library. I was looking for something like that a couple of months ago. I'll certainly archive it somewhere on my HD and check it out as soon as I get some spare time.
Posted on 2007-02-28 10:57:07 by JimmyClif
Yeah, your solution is obviously pretty good if you have very short lock times and are almost exclusively running user-mode code :)

As for the kernel source thing, it's only part of the source, and I think it's a special license, not some MSDN* thing.
Posted on 2007-02-28 18:01:50 by f0dder
Fixed a couple of bugs and updated the attachment. Both were caused by last-minute design changes >_<
Posted on 2007-03-06 02:30:48 by Ultrano
One minor thing: that .html file links to two .asp files which aren't included in the zip archive - you should either include them, or give a full URL to them :)

PS: cool that you've put it on The Code Project, should help get you some publicity :thumbsup:
Posted on 2007-03-06 03:45:16 by f0dder
The attached article is the exact HTML posted on the site. There I got 2,000 views in just a day or two... and the obvious bugs were only detected after a whole 10 days XD
Posted on 2007-03-06 13:50:00 by Ultrano
Well, it would still be nice if you made absolute URLs for the .asp files - that way one can browse to them even when viewing locally.
Posted on 2007-03-06 17:23:37 by f0dder
I guess it's worth linking to the CodeProject page for this :)
Posted on 2007-04-15 10:52:16 by f0dder