Hi there,
I want to synchronize several threads that should be able to read/write
to some global data. Therefore I'm currently using the following code:
.data
access_var dd -1d          ; -1 = free, 0 = in use
.code
;...
invoke RequestAccess
;access the global data
dec access_var             ; release the access again
RequestAccess proc
@@:
    inc access_var         ; -1 -> 0 sets the zero flag
    jz @GotAccess
    dec access_var         ; someone else holds it: undo
    invoke rand, 5d
    invoke Sleep, eax      ; wait some random milliseconds
    jmp @B
@GotAccess:
    ret
RequestAccess endp
As a single instruction is (I assume) always executed completely, even with multiple threads, the "inc access_var" instruction sets the zero flag in only one thread at a time. If another thread calls RequestAccess while the first one is still reading/writing the protected memory, it keeps looping, sleeping a few random milliseconds each time, until the first thread releases its access by doing "dec access_var".
I tested it with multiple threads, but my question is what happens on a DUAL-CORE system?
Can anyone tell me?
Regards,
Dominik
initialization: InitializeCriticalSectionAndSpinCount()
ThreadProc code:
1) EnterCriticalSection()
2) do the work
3) LeaveCriticalSection()
deinitialization: DeleteCriticalSection()
much shorter, much better and guaranteed to work even on 78 CPUs :)
Dom, the code is unreliable even on single-core systems. Guess what will happen if a thread switch occurs right after the "inc access_var" instruction.
Just follow ti_mo_n's suggestion...
Wow did you try it on a 78 cpu computer? ;)
I don't know exactly why, but you have to add a lock prefix to be safe with multiple CPUs.
arafel, it works even if there is a thread switch (all registers are saved, including the flags).
ti_mo_n is right. It's better to use EnterCriticalSection(). In fact the function starts with an inc to try to acquire the lock.
You can modify your code to use xadd or xchg, too.
Critical sections are good, since they also check which thread is requesting access to them, so
some proc
invoke EnterCriticalSection,addr csect
invoke EnterCriticalSection,addr csect
ret
some endp
will run perfectly (the owning thread may re-enter without deadlocking). The only drawback is that a critical_section takes 24 bytes ^^', instead of 1 or 4.
On PCs with 2 or more CPUs, it's good to use a "spin lock loop" before going to Sleep(0). EnterCriticalSection has one built in: it tries to lock the critical_section ~4000 times before calling Sleep(). On single-CPU systems the spin count is 0 (set by Windows automatically).
The "lock" prefix:
This is an instruction prefix that makes the CPU assert its bus-lock
signal for the duration of the next instruction, so that two
processors cannot update the same memory location at the same time.
The CPU always asserts the lock automatically during an XCHG with a
memory operand. On x86 the prefix is only valid on read-modify-write
instructions such as ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, DEC, INC,
NEG, NOT, OR, SBB, SUB, XADD, XCHG and XOR.
RequestAccess proc pLock:PTR DWORD
@@:
    mov ecx,pLock
    mov eax,1
    lock xadd dword ptr [ecx],eax  ; eax = old value, [ecx] += 1
    inc eax                        ; old value was -1 -> zero flag set
    jz @GotAccess
    lock dec dword ptr [ecx]       ; failed: undo our increment
    invoke Sleep,0
    jmp @B
@GotAccess:
    ret
RequestAccess endp
LockVar proc pLock:PTR BYTE
@@:
    mov al,1
    mov ecx,pLock
    xchg byte ptr [ecx],al  ; swap 1 in; xchg with memory is always locked
    or al,al                ; old value 0 -> lock was free, we own it now
    jz @GotAccess
    invoke Sleep,0
    jmp @B
@GotAccess:
    ret
LockVar endp
Wow did you try it on a 78 cpu computer? ;)
:P
will run perfectly. Only that a critical_section takes 24 bytes ^^', instead of 1 or 4.
20 bytes of RAM are not very expensive nowadays ^^" ;)
Also, it's worth mentioning that (unless my memory fails me), critical sections don't just sleep after spinning - they do a blocking wait, which takes no CPU time.
Hmm isn't Sleep(0) better on single-cpu systems? Sleep(0) immediately switches to another thread (of the same priority) - thus we'll more likely lock the section sooner (because the thread that locked it will be executed sooner, and thus the object will be unlocked sooner). Actually SwitchToThread() might be better, but it's not implemented in Win98-like OSes.
Wasn't blocking-waiting actually about "no cpu electrical power", but still taking cycles?
I guess it depends on how you look at it. Sleep(0) won't give lower-priority threads a chance, and will still give a (somewhat artificial) CPU usage of 100%. Sleep(1) (or some other small amount) is better. This would still end up wasting cycles though, especially if the thread is going to block for a larger amount of time. You can reduce the amount of cycles wasted by increasing the Sleep() amount, but that gives a higher latency.
On the other hand, blocking on an object really does take 0% CPU time while the thread is blocking - it's removed from the scheduler's ready-list, and thus isn't even considered for execution. Only when the object is triggered will the scheduler spend time on the thread - by iterating through the object's "waiting for trigger" list and re-activating the threads (unless they're also waiting on other condition(s)).
The method to choose, of course, depends on how long you're likely to be waiting for the object to trigger. If you almost never have to wait for the object, CRITICAL_SECTION is good because of its spin-then-wait strategy. If you'll almost always have to wait, doing WaitForSingleObject right away could be better. And if you're doing some kernel-level driver stuff and must have as low latency as possible (at the expense of burning cycles), a spinlock without blocking waits can be appropriate.
But Sleep(0) is the devil :)
Btw, some useful info about thread sync:
http://www.iseran.com/Win32/CodeForSpeed/multithreading.html
Thanks for your comments...
So critical sections might be the way to go - unless
my source with a lock prefix would be OK anyway, I assume.
Dom
arafel, it works even if there is a thread switch (all registers are saved including the flags)
Heh, I know about the flag register being saved on a task switch...
At first I thought that access_var could be erroneously modified by another thread if a switch occurred before the jz. Anyway, I was wrong. :oops: