hi all

just a stupid question .. I can't understand why this set of functions is useful. Incrementing a 32-bit value should be just a single atomic operation (inc var)? So if I increment it, no other thread should be able to change it...
Posted on 2004-02-23 01:32:50 by Bit7
When using a C compiler, these functions will be generated as intrinsics - i.e., atomic instructions. They are probably also implemented as real exports in case intrinsics are turned off, or for debug builds...
Posted on 2004-02-23 02:05:19 by f0dder
They may be necessary in languages where inline assembly is not supported and adding 1 to a variable may involve more than one instruction (there must be such compilers).
Posted on 2004-02-23 04:19:59 by C.Z.
Perhaps Visual Basic? :P - I think that even the p-code stuff would support this, though. I still think the routines are mainly there for completeness - if it's declared in the API, whether meant to be intrinsic or not, they'd better have a symbol for it in some DLL to keep idiots from bitching & moaning
Posted on 2004-02-23 08:44:24 by f0dder
Keep in mind that Windows was designed as a portable operating system, to be used on RISC processors that don't have an INC instruction.

Also, many UNIX-like OSes have an atomic INC function you can call (actually, a whole host of atomic operations) and having such functions available makes porting code to Windows easier.
Cheers,
Randy Hyde
P.S., of course, if you want a *true* atomic INC instruction, don't forget to put the LOCK prefix on it. Multiprocessor systems have taken a *big* jump in popularity with the new hyperthreading technology.
Posted on 2004-02-23 09:35:59 by rhyde
In case anybody is interested... the following C/C++ code:


#include <windows.h>

void test(void)
{
    volatile LONG aa, bb;

    aa = 10;
    bb = InterlockedIncrement(&aa);
}


Generates the following unoptimized code, even with the /Ox ("max optimizations") switch:


lea eax, DWORD PTR _aa$[ebp]
push eax
call DWORD PTR __imp__InterlockedIncrement@4
mov DWORD PTR _bb$[ebp], eax


To make the VS.NET compiler generate intrinsics, I had to do the following - and that's even though the /Ox compiler switch was used, which should generally enable intrinsics.


extern "C" LONG __cdecl _InterlockedIncrement(LONG volatile *Addend);
#pragma intrinsic (_InterlockedIncrement)
#define InterlockedIncrement _InterlockedIncrement


With this, the following code was generated:


lea eax, DWORD PTR _aa$[ebp]
mov ecx, 1
lock xadd DWORD PTR [eax], ecx
inc ecx
mov DWORD PTR _bb$[ebp], ecx
Posted on 2004-02-23 10:10:35 by f0dder
very interesting argument...
thanks all, thanks f0dder :)
uhm.. but I still can't really understand a few things...
Probably stupid questions, but...
1) why does the compiler use lock add.. and not lock inc?
2) why is the lock only on that instruction?
3) could another thread modify the ecx value before the lock xadd executes??
4) if a processor doesn't have inc, it can use add var, 1... right?
I'd really like to understand these mysterious things... to me an "inc value" should always be atomic for a C compiler...
Posted on 2004-02-24 14:03:52 by Bit7
1) xadd, not add... xadd swaps dst and src before storing the addition result in the destination. It does this even if the return value of InterlockedIncrement isn't used, so perhaps there are some SMP (multi-CPU) issues... or it's just one of those places where you could write more efficient code by hand.

2) because that's the only instruction that touches data. It's rather silly to use Interlocked* for LOCAL variables as they're always local to a single thread, btw... you'd only use Interlocked* to access global data that is accessed from multiple threads.

3) nope, the CPU registers are part of the OS thread context... so they are saved/restored per-thread. In an SMP system, each CPU of course also has its own registers.

4) "inc variable" or "add variable, 1" is atomic in the sense that threads can't be switched "in the middle of an instruction". However, there are lots of issues when you want to do safe SMP code - and I must admit I'm not really familiar enough with this. Luckily, I've only had to protect larger data structures where you have to use stuff like critical sections anyway.
Posted on 2004-02-24 14:26:07 by f0dder
thanks f0dder, this is a great little lesson for me :)

so, if a compiler could know that only a single processor will be used with that application, it could maybe produce more efficient code :)

The API help says:
The function prevents more than one thread from using the same variable simultaneously.
So if I've understood correctly, this can only be true on an SMP machine.
Posted on 2004-02-25 01:47:08 by Bit7
If you're doing multithreaded programming, do it properly - this means using Interlocked* (or the lock prefix when programming in assembly directly) when accessing global variables. No reason not to write "proper code", unless you're on some embedded system with very limited resources. And, well, an embedded x86 system capable of threading probably doesn't qualify as "very limited" in this sense :).

Remember, this only applies to global data, not local stuff on the stack. And it only applies to data that multiple threads are accessing... so it's not like you're going to have to litter your code with lock and other weird stuff all over. I also think the number of dword-sized globals that need synchronized access will generally be pretty limited, so you might not ever have to deal with this. *DO* remember to use critical sections or other means to protect global structs, though - uniprocessor systems can have context switches in the middle of manipulating a struct; only single-data operations are atomic (and on SMP systems, multiple CPUs could be accessing the same data at once).

Oh, and remember that SMP isn't exclusively multi-CPU machines - P4s with hyperthreading (which are starting to become common even in supermarket computers) qualify as SMP...
Posted on 2004-02-25 09:58:31 by f0dder
infinite thanks f0dder, now everything is clear. So HTT in the P4 is now a very good reason for Interlocked*.

thx B7
Posted on 2004-02-26 01:14:09 by Bit7
Well, in assembly code you might as well use the LOCK prefix and instructions like XADD instead. In high-level code I'd use Interlocked* when speed isn't important (still with the intrinsics, though) - and resort to assembly blocks (either inline or external asm) for speed-critical stuff. Oh, and I'd go over the Intel manuals again before dealing with it :P
Posted on 2004-02-26 08:38:59 by f0dder