Hi there,
I want to synchronize several threads that should be able to read/write
to some global data. Therefore I'm currently using the following code:


.data
access_var dd -1d

.code
;...
invoke RequestAccess
;access the global data
dec access_var

RequestAccess proc
@@:
inc access_var
jz @GotAccess
dec access_var
invoke rand, 5d
invoke Sleep, eax
jmp @b
@GotAccess:
ret
RequestAccess endp


Since a single instruction is always executed completely before a thread switch (I assume),
the "inc access_var" instruction sets the zero flag in only one thread at a time.
If another thread calls RequestAccess while the first one is still reading/writing
the protected memory, it loops, sleeping a random number of milliseconds each time,
until the first thread gives up its access by doing "dec access_var".

I tested it with multiple threads, but my question is what happens on a DUAL-CORE system?
Can anyone tell me?

Regards,
Dominik



Posted on 2005-11-26 04:45:35 by Dom
initialization: InitializeCriticalSectionAndSpinCount()

ThreadProc code:

1) EnterCriticalSection()
2) do the work
3) LeaveCriticalSection()

deinitialization: DeleteCriticalSection()

much shorter, much better and guaranteed to work even on 78 CPUs :)
Posted on 2005-11-26 05:18:29 by ti_mo_n
Dom, the code is unreliable even on single-core systems. Guess what will happen if a thread switch occurs right after the "inc access_var" instruction.

Just follow ti_mo_n's suggestion...
Posted on 2005-11-26 05:43:26 by arafel
Wow did you try it on a 78 cpu computer?  ;)
Posted on 2005-11-26 06:52:30 by roticv
I don't know why exactly but you have to add a lock prefix to be safe with multiple CPUs. arafel, it works even if there is a thread switch (all registers are saved including the flags)

ti_mo_n is right. It's better to use EnterCriticalSection(). In fact the function starts with an inc to try to acquire the lock.
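
To make the point about atomic read-modify-write concrete, here is a minimal sketch of Dom's inc/dec protocol. It is written in portable C11 instead of MASM, purely so it can be compiled and run anywhere; that swap is my assumption, not something from the original code. atomic_fetch_add and atomic_fetch_sub play the role of a locked inc/dec, sched_yield stands in for Sleep, and the names request_access, release_access, worker and run_workers are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* -1 means "free", like access_var in the original post. */
static atomic_int access_var = -1;
static long shared_counter = 0;          /* the protected global data */

void request_access(void)
{
    for (;;) {
        /* Atomic increment that also returns the previous value. */
        if (atomic_fetch_add(&access_var, 1) == -1)
            return;                      /* -1 -> 0: we own the data */
        atomic_fetch_sub(&access_var, 1);/* failed: undo our increment */
        sched_yield();                   /* stand-in for Sleep */
    }
}

void release_access(void)
{
    atomic_fetch_sub(&access_var, 1);    /* 0 -> -1: free again */
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        request_access();
        shared_counter++;                /* plain, non-atomic update */
        release_access();
    }
    return NULL;
}

/* Hypothetical driver: runs 4 workers and returns the final counter. */
long run_workers(void)
{
    pthread_t t[4];
    shared_counter = 0;
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

With the atomic operations in place, run_workers() returns 40000 (4 threads x 10000 increments). If the increment and decrement of access_var were ordinary non-atomic operations, two cores could both see the "free" value and updates to shared_counter could be lost.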
Posted on 2005-11-26 07:20:15 by Dr. Manhattan
You can modify your code to use xadd or xchg, too.

RequestAccess proc pLock:PTR DWORD
@@:
mov ecx,pLock
mov eax,1
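lock xadd dword ptr [ecx],eax ; atomically: eax = old value, [ecx] = old value + 1
inc eax ; old value was -1 (free) -> eax becomes 0
jz @GotAccess
lock dec dword ptr [ecx] ; failed: undo our increment before retrying
invoke Sleep,0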
jmp @B
@GotAccess:
ret
RequestAccess endp



LockVar proc pLock:PTR BYTE
@@:
mov al,1
mov ecx,pLock
xchg byte ptr [ecx],al ; xchg with a memory operand is always locked
or al,al
jz @GotAccess
invoke Sleep,0
jmp @B
@GotAccess:
ret
LockVar endp
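
For comparison, here is a rough C11 sketch of the same xchg idea. It is an illustration under my own assumptions, not a translation anyone in this thread has tested: byte_lock, try_lock_var, lock_var and unlock_var are hypothetical names, and sched_yield replaces Sleep,0. On x86, compilers typically turn atomic_exchange into an xchg with a memory operand.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = free, 1 = taken, matching the byte used by LockVar. */
typedef atomic_uchar byte_lock;

/* One attempt: swap in 1 and look at what was there before. */
bool try_lock_var(byte_lock *lock)
{
    return atomic_exchange(lock, 1) == 0;
}

/* Keep trying, giving up the time slice between attempts. */
void lock_var(byte_lock *lock)
{
    while (!try_lock_var(lock))
        sched_yield();
}

/* Release: a plain atomic store of 0 is enough. */
void unlock_var(byte_lock *lock)
{
    assert(atomic_load(lock) == 1);  /* unlocking a free lock is a bug */
    atomic_store(lock, 0);
}
```

Note that, unlike the inc/dec variant, a failed attempt needs no undo step: swapping 1 into a byte that already holds 1 changes nothing.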


Critical sections are good, since they also check which thread is requesting access to them, thus

some proc
invoke EnterCriticalSection,addr csect
invoke EnterCriticalSection,addr csect
ret
some endp

will run perfectly. Only that a critical_section takes 24 bytes ^^', instead of 1 or 4.
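
One caveat on the snippet above: a recursive lock has to be released once per enter, so each EnterCriticalSection needs a matching LeaveCriticalSection. A small sketch of that owner-tracking behaviour follows. It uses a POSIX recursive mutex as a stand-in for CRITICAL_SECTION (my substitution, so the example can run outside Windows), and init_recursive and nested_enter are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Create a mutex that remembers its owner and nesting depth. */
int init_recursive(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Enter twice from the same thread, then leave twice. Returns 0 on success. */
int nested_enter(pthread_mutex_t *m)
{
    if (pthread_mutex_lock(m) != 0)
        return -1;
    if (pthread_mutex_lock(m) != 0)   /* same owner: no self-deadlock */
        return -2;
    int rc = pthread_mutex_unlock(m); /* depth 2 -> 1, still owned */
    assert(rc == 0);
    rc = pthread_mutex_unlock(m);     /* depth 1 -> 0, now free */
    assert(rc == 0);
    (void)rc;
    return 0;
}
```

A plain 1-byte or 4-byte lock like the ones above has no owner field, so the second attempt from the same thread would spin forever.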


On PCs with 2 or more cpus, it's good to use a "spin lock loop" - before going to "Sleep,0". EnterCriticalSection has it implemented: it tries to lock the critical_section ~4000 times, before calling Sleep(). On single-cpu systems, the "spin count" value is 0 (set by Windows automatically).
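
A minimal sketch of such a spin-then-yield loop, again in portable C11 rather than MASM. This is not how EnterCriticalSection is actually implemented; it only shows the shape of the idea. SPIN_COUNT reuses the ~4000 figure mentioned above, sched_yield stands in for Sleep,0, and all function names are made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

enum { SPIN_COUNT = 4000 };              /* tune per workload */

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static long total = 0;                   /* data protected by lock_flag */

void spin_then_yield_lock(void)
{
    for (;;) {
        /* Phase 1: retry a bounded number of times without yielding. */
        for (int i = 0; i < SPIN_COUNT; i++) {
            if (!atomic_flag_test_and_set(&lock_flag))
                return;                  /* flag was clear: lock is ours */
        }
        /* Phase 2: give the rest of the time slice away, then retry. */
        sched_yield();
    }
}

void spin_unlock(void)
{
    atomic_flag_clear(&lock_flag);
}

static void *spinner(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        spin_then_yield_lock();
        total++;
        spin_unlock();
    }
    return NULL;
}

/* Hypothetical driver: 4 threads, returns the final total. */
long run_spinners(void)
{
    pthread_t t[4];
    total = 0;
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&t[i], NULL, spinner, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return total;
}
```

Spinning only pays off when the lock holder can run on another CPU at the same time, which is why a spin count of 0 makes sense on a single-CPU machine.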

The "lock" prefix:
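This instruction is a prefix that causes the CPU to assert the bus lock
signal during the execution of the next instruction. It is used to keep
two processors from updating the same data location at the same time.
The CPU always asserts lock during an XCHG with a memory operand. On
current x86 CPUs the prefix is only valid on read-modify-write
instructions with a memory destination (INC, DEC, ADD, SUB, XADD,
CMPXCHG, XCHG and the like); putting it on MOV, IN or OUT raises an
invalid-opcode exception.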
Posted on 2005-11-26 07:28:51 by Ultrano
roticv wrote: "Wow did you try it on a 78 cpu computer?  ;)"

:P

Ultrano wrote: "will run perfectly. Only that a critical_section takes 24 bytes ^^', instead of 1 or 4."

20 bytes of RAM are not very expensive nowadays ^^" ;)
Posted on 2005-11-26 07:54:59 by ti_mo_n
Also, it's worth mentioning that (unless my memory fails me), critical sections don't just sleep after spinning - they do a blocking wait, which takes no CPU time.
Posted on 2005-11-27 13:38:38 by f0dder
Hmm isn't Sleep(0) better on single-cpu systems?  Sleep(0) immediately switches to another thread (of the same priority) - thus we'll more likely lock the section sooner (because the thread that locked it will be executed sooner, and thus the object will be unlocked sooner). Actually SwitchToThread() might be better, but it's not implemented in Win98-like OSes.

Wasn't blocking-waiting actually about "no cpu electrical power", but still taking cycles?
Posted on 2005-11-27 15:16:57 by Ultrano
I guess it depends on how you look at it. Sleep(0) won't give lower-priority threads a chance, and will still give a (somewhat artificial) CPU usage of 100%. Sleep(1) (or some other small amount) is better. This would still end up wasting cycles though, especially if the thread is going to block for a larger amount of time. You can reduce the amount of cycles wasted by increasing the Sleep() amount, but that gives a higher latency.

On the other hand, blocking on an object really does take 0% CPU time while the thread is blocking - it's removed from the scheduler's ready-list, and thus isn't even considered for execution. Only when the object is triggered will the scheduler spend time on the thread - by iterating through the object's "waiting for trigger" list and re-activating the threads (unless they're also waiting on other condition(s)).

The method to choose, of course, depends on how long you're likely to be waiting for the object to trigger. If you almost never have to wait for the object, CRITICAL_SECTION is good because of its spin-then-wait strategy. If you'll almost always have to wait, doing WaitForSingleObject right away could be better. And if you're doing some kernel-level driver stuff and must have as low latency as possible (at the expense of burning cycles), a spinlock without blocking waits can be appropriate.
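
To illustrate the blocking-wait side of this, here is a small sketch of a thread that waits without polling. It uses a POSIX mutex plus condition variable as a stand-in for a Win32 event and WaitForSingleObject; that substitution is mine, so the example can run outside Windows. event_t, event_wait, event_set and run_wait_demo are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    int             signaled;
} event_t;

static event_t ev = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 };

/* Block until the event is signaled. No Sleep loop, no spinning. */
void event_wait(event_t *e)
{
    pthread_mutex_lock(&e->mtx);
    while (!e->signaled)                        /* guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->mtx);   /* thread is descheduled here */
    pthread_mutex_unlock(&e->mtx);
}

/* Signal the event and wake every waiter. */
void event_set(event_t *e)
{
    pthread_mutex_lock(&e->mtx);
    e->signaled = 1;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->mtx);
}

static void *waiter(void *arg)
{
    event_wait(arg);
    return arg;
}

/* Hypothetical driver: returns 1 once the waiter has been released. */
int run_wait_demo(void)
{
    pthread_t t;
    void *ret = NULL;
    int rc = pthread_create(&t, NULL, waiter, &ev);
    assert(rc == 0);
    (void)rc;
    event_set(&ev);
    pthread_join(t, &ret);
    return ret == (void *)&ev;
}
```

The signaled flag makes the result independent of timing: if event_set runs before the waiter reaches event_wait, the waiter sees the flag and returns immediately instead of blocking.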

But Sleep(0) is the devil :)
Posted on 2005-11-27 15:35:51 by f0dder
Btw, some useful info about thread sync:
http://www.iseran.com/Win32/CodeForSpeed/multithreading.html
Posted on 2005-11-27 16:39:26 by Ultrano
Thanks for your comments...
So critical sections might be the way to go, although I assume
my code with a lock prefix would be OK as well.
Dom
Posted on 2005-11-28 14:57:35 by Dom

Dr. Manhattan wrote: "arafel, it works even if there is a thread switch (all registers are saved including the flags)"


Heh, I know about the flags register being saved on a task switch...
At first I thought that access_var could be erroneously modified by another thread if a switch happened before the jz. Anyway, I was wrong.  :oops:
Posted on 2005-11-28 17:20:31 by arafel