Author Topic: InterlockedIncrement  (Read 2199 times)


Offline Bit7

  • Regular Member
  • ***
  • Posts: 571
    • http://bit7.cjb.net
InterlockedIncrement
« on: 2004-02-23 06:32:50 »
hi all

Just a stupid question: I can't understand why this set of functions is useful. Incrementing a 32-bit value should be just a single atomic operation (inc var), shouldn't it? So if I increment it, no other thread should be able to change it...

Offline f0dder

  • Community Staff
  • ASM Fanatic
  • *****
  • Posts: 7788
  • Front Line Assembly
    • http://f0dder.reteam.org
InterlockedIncrement
« Reply #1 on: 2004-02-23 07:05:19 »
When using a C compiler, these functions can be expanded as intrinsics - i.e. inlined atomic instructions instead of calls. The exported versions are probably implemented in case intrinsics are turned off, or for debug builds...
- carpe noctem

Offline C.Z.

  • Code Warrior
  • **
  • Posts: 147
InterlockedIncrement
« Reply #2 on: 2004-02-23 09:19:59 »
They may be necessary in languages where inline assembly is not supported and where adding 1 to a variable may involve more than one instruction (there have to be such compilers).

Offline f0dder

InterlockedIncrement
« Reply #3 on: 2004-02-23 13:44:24 »
Perhaps Visual Basic? :P - I think that even the pcode stuff would support this, though. I still think the routines are mainly there for completeness - if it's declared in the API, whether meant to be intrinsic or not, they'd better have a symbol for it in some DLL to keep idiots from bitching & moaning
- carpe noctem

Offline rhyde

  • Regular Member
  • ***
  • Posts: 564
InterlockedIncrement
« Reply #4 on: 2004-02-23 14:35:59 »
Keep in mind that Windows was designed as a portable operating system, to be used on RISC processors that don't have an INC instruction.

Also, many UNIX-like OSes have an atomic INC function you can call (actually, a whole host of atomic operations) and having such functions available makes porting code to Windows easier.
Cheers,
Randy Hyde
P.S., of course, if you want a *true* atomic INC instruction, don't forget to put the LOCK prefix on it. Multiprocessor systems have taken a *big* jump in popularity with the new hyperthreading technology.

Offline f0dder

Intrinsics/Visual C++
« Reply #5 on: 2004-02-23 15:10:35 »
In case anybody is interested... the following C/C++ code:
Code:

#include <windows.h>   /* LONG, InterlockedIncrement */

void test(void)
{
    volatile LONG aa, bb;

    aa = 10;
    bb = InterlockedIncrement(&aa);   /* aa becomes 11; bb receives 11 */
}


Generates the following code, which still calls the imported API function, even with the /Ox ("max optimizations") switch:
Code:

lea eax, DWORD PTR _aa$[ebp]
push eax
call DWORD PTR __imp__InterlockedIncrement@4
mov DWORD PTR _bb$[ebp], eax


To make the VS.NET compiler generate intrinsics, I had to do the following - and that's even though the /Ox compiler switch was used, which should generally use intrinsics.
Code:

extern "C" LONG  __cdecl _InterlockedIncrement(LONG volatile *Addend);
#pragma intrinsic (_InterlockedIncrement)
#define InterlockedIncrement _InterlockedIncrement


With this, the following code was generated:
Code:

lea eax, DWORD PTR _aa$[ebp]
mov ecx, 1
lock xadd DWORD PTR [eax], ecx
inc ecx
mov DWORD PTR _bb$[ebp], ecx
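
For anyone without the Windows headers at hand, here is a rough portable sketch of what the xadd/inc pair above computes. It assumes a C++11 compiler; `interlocked_increment_sketch` and `demo_increment_from_ten` are made-up names for illustration, and `std::atomic<long>` stands in for the `LONG volatile *` that the real function takes. On x86, compilers typically turn `fetch_add` into a `lock xadd`, but that is up to the compiler.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for InterlockedIncrement, built on C++11 atomics.
// fetch_add returns the value held BEFORE the addition (just as XADD leaves
// the old value in the source register), so 1 is added to report the new one.
long interlocked_increment_sketch(std::atomic<long>& addend)
{
    long old_value = addend.fetch_add(1);  // atomic read-modify-write
    return old_value + 1;                  // corresponds to "inc ecx"
}

// Mirrors test() from the post: aa = 10; bb = InterlockedIncrement(&aa);
long demo_increment_from_ten()
{
    std::atomic<long> aa(10);
    long bb = interlocked_increment_sketch(aa);
    assert(aa.load() == 11);  // the variable itself was incremented
    return bb;                // the new value, 11
}
```

Calling `demo_increment_from_ten()` returns 11, the same value the generated code stores in `bb`.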
- carpe noctem

Offline Bit7

InterlockedIncrement
« Reply #6 on: 2004-02-24 19:03:52 »
Very interesting topic...
Thanks all, thanks f0dder :)
Uhm.. but there are still a few things I can't really understand....
Probably stupid questions, but...
1) Why does the compiler use lock add and not lock inc?
2) Why is the lock only on that instruction?
3) Could another thread modify the ecx value before the lock xadd executes?
4) If a processor doesn't have inc, it can use add var,1 ... right?
I'd really like to understand these mysterious things... to me an "inc value" should always be atomic for a C compiler...

Offline f0dder

InterlockedIncrement
« Reply #7 on: 2004-02-24 19:26:07 »
1) xadd, not add... xadd exchanges dst and src and then stores the sum in the destination, so the source register ends up holding the old value. The compiler emits it even if the return value of InterlockedIncrement isn't used, so perhaps there are some SMP (multi-CPU) issues... or it's just one of those places where you could write more efficient code by hand.
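
To make the xadd behaviour concrete, here is a small single-threaded model of it, replaying the compiler output from the earlier post with plain variables. This is only a sketch: `xadd_model` and `replay_generated_code` are invented names, flags are ignored, and nothing here is atomic the way the real locked instruction is.

```cpp
#include <cassert>

// Single-threaded model of "xadd dst, src", ignoring flags:
// dst receives dst + src, and src receives the ORIGINAL dst.
// The real instruction does this in one step, and with LOCK it is also
// atomic with respect to other processors. This model is neither.
void xadd_model(long& dst, long& src)
{
    long original_dst = dst;
    dst = original_dst + src;
    src = original_dst;
}

// Replays the generated code, with plain variables standing in for the
// memory operand ([eax]) and the ecx register.
long replay_generated_code(long& aa)
{
    long ecx = 1;          // mov ecx, 1
    xadd_model(aa, ecx);   // lock xadd [eax], ecx  (ecx now holds old aa)
    ecx += 1;              // inc ecx               (old aa + 1 = new aa)
    assert(ecx == aa);     // register and memory agree on the new value
    return ecx;            // mov bb, ecx
}
```

Starting from `aa = 10`, the replay leaves 11 in `aa` and returns 11, which is why the extra `inc ecx` is needed: xadd hands back the old value, while InterlockedIncrement is specified to return the new one.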

2) because that's the only instruction that touches data. It's rather silly to use Interlocked* for LOCAL variables as they're always local to a single thread, btw... you'd only use Interlocked* to access global data that is accessed from multiple threads.

3) nope, the CPU registers are a part of the OS Thread Context... so they are saved/restored per-thread. In an SMP system, each CPU of course also has its own registers.

4) "inc variable" or "add variable, 1" is atomic in the sense that threads can't be switched "in the middle of an instruction". However, on SMP an unlocked read-modify-write can still interleave with another CPU touching the same memory, which is what the LOCK prefix prevents. There are lots of other issues when you want to write safe SMP code - and I must admit I'm not really familiar enough with this. Luckily, I've only had to protect larger data structures where you have to use stuff like critical sections anyway.
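
The SMP problem is easier to see when one bad interleaving is written out by hand. The sketch below is a deterministic, single-threaded replay (no real threads involved) of two CPUs executing an unlocked increment on the same variable, next to the same two increments done as atomic read-modify-writes. The function names are invented for illustration and C++11 `std::atomic` is assumed as a stand-in for a locked instruction.

```cpp
#include <atomic>
#include <cassert>

// Replay of the interleaving that can occur when two CPUs execute an
// unlocked "inc [counter]" at the same time. Each CPU's inc is modelled
// as its three internal steps: read, add, write.
long replay_unlocked_inc_on_two_cpus()
{
    long counter = 0;

    long cpu0 = counter;   // CPU 0 reads 0
    long cpu1 = counter;   // CPU 1 reads 0 before CPU 0 has written back
    cpu0 += 1;             // both add 1 to their private copy
    cpu1 += 1;
    assert(cpu0 == 1 && cpu1 == 1);  // neither CPU saw the other's update
    counter = cpu0;        // CPU 0 writes 1
    counter = cpu1;        // CPU 1 writes 1 as well: one increment is lost

    return counter;        // 1, although two increments ran
}

// With a locked read-modify-write the read and the write cannot be split,
// so the second increment always sees the result of the first.
long replay_locked_inc_on_two_cpus()
{
    std::atomic<long> counter(0);
    counter.fetch_add(1);  // "CPU 0": like lock inc / lock xadd
    counter.fetch_add(1);  // "CPU 1"
    return counter.load(); // 2
}
```

The unlocked replay ends at 1 and the locked one at 2. On a uniprocessor the first sequence cannot happen within a single inc instruction, which is why the issue only shows up on SMP (or when the increment is compiled to separate load, add and store instructions).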
- carpe noctem

Offline Bit7

InterlockedIncrement
« Reply #8 on: 2004-02-25 06:47:08 »
Thanks f0dder, this is a great little lesson for me :)

So, if a compiler could know that only a single processor will be used with that application, it could maybe produce more efficient code :)

The API help says:
The function prevents more than one thread from using the same variable simultaneously.
So if I've understood correctly, this only matters on an SMP machine.

Offline f0dder

InterlockedIncrement
« Reply #9 on: 2004-02-25 14:58:31 »
If you're doing multithreaded programming, do it properly - this means using Interlocked* (or the lock prefix when programming directly) when accessing global variables. No reason not to do "proper code", unless you're on some embedded system with very limited resources. And, well, an embedded x86 system capable of threading probably doesn't qualify as "very limited" in this sense :).

Remember, this only applies to global data, not local stuff on the stack. And it only applies to data that multiple threads are accessing... so it's not like you're going to have to litter your code with lock and other weird stuff all over. I also think the number of dword-sized globals that need synchronized access will generally be pretty limited, so you might not have to deal with this ever. *DO* remember to use Critical Sections or other means to protect global structs, though: uniprocessor systems can have context switches in the middle of manipulating a struct, since only single-data operations are atomic (and on SMP systems, multiple CPUs could be accessing the same data).
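
As a rough illustration of the struct case, here is a portable sketch that uses C++11 `std::mutex` as a stand-in for a Win32 critical section (with the real API you would put EnterCriticalSection/LeaveCriticalSection around the same two lines). The `Stats` struct, its invariant and all function names are made up for the example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical global struct whose two fields must always change together.
struct Stats {
    long count = 0;
    long sum   = 0;   // invariant: sum == 2 * count
};

Stats      g_stats;       // shared between threads
std::mutex g_stats_lock;  // stand-in for a Win32 CRITICAL_SECTION

void record_sample()
{
    // Hold the lock for the whole multi-field update, so no other thread
    // can observe count already bumped but sum not yet updated.
    std::lock_guard<std::mutex> guard(g_stats_lock);
    g_stats.count += 1;
    g_stats.sum   += 2;
}

Stats run_stats_demo(int threads, int samples_per_thread)
{
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([samples_per_thread] {
            for (int i = 0; i < samples_per_thread; ++i)
                record_sample();
        });
    for (auto& th : pool)
        th.join();

    std::lock_guard<std::mutex> guard(g_stats_lock);
    assert(g_stats.sum == 2 * g_stats.count);  // both fields stayed in step
    return g_stats;
}
```

An Interlocked* call on each field separately would not be enough here: each field would be updated atomically, but another thread could still see the struct between the two updates.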

Oh, and remember that SMP isn't exclusively multi-CPU machines - P4s with hyperthreading (which are starting to become common even in supermarket computers) count as SMP...
- carpe noctem

Offline Bit7

InterlockedIncrement
« Reply #10 on: 2004-02-26 06:14:09 »
Infinite thanks f0dder, now everything is clear. So HTT in the P4 is now a very good reason for Interlocked*.

thx B7

Offline f0dder

InterlockedIncrement
« Reply #11 on: 2004-02-26 13:38:59 »
Well, in assembly code you might as well use the LOCK prefix and instructions like XADD instead. In high-level code I'd use Interlocked* when speed isn't important (still with the intrinsics, though) - and resort to assembly blocks (either inline or external asm) for speed-critical stuff. Oh, and I'd go over the Intel manuals again before dealing with it :P
- carpe noctem