In case anybody is interested... the following C/C++ code:
void test(void)
{
    volatile LONG aa, bb;
    aa = 10;
    bb = InterlockedIncrement(&aa);
}
generates the following unoptimized code (an indirect call to the imported function), even with the /Ox ("max optimizations") switch:
lea eax, DWORD PTR _aa$[ebp]
push eax
call DWORD PTR __imp__InterlockedIncrement@4
mov DWORD PTR _bb$[ebp], eax
To get the VS.NET compiler to emit the intrinsic instead, I had to add the following declarations. This was necessary even with the /Ox compiler switch, which should generally enable intrinsics:
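// Declare the intrinsic form (note the leading underscore).
extern "C" LONG __cdecl _InterlockedIncrement(LONG volatile *Addend);
// Ask the compiler to expand it inline instead of emitting a call.
#pragma intrinsic (_InterlockedIncrement)
// Redirect existing InterlockedIncrement calls to the intrinsic.
#define InterlockedIncrement _InterlockedIncrement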
With this, the following code was generated:
lea eax, DWORD PTR _aa$[ebp]
mov ecx, 1
lock xadd DWORD PTR [eax], ecx
inc ecx
mov DWORD PTR _bb$[ebp], ecx
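For anyone wondering about the extra "inc ecx" after the "lock xadd": xadd leaves the old value of the variable in the register, while InterlockedIncrement returns the new value, so one more increment is needed. Here is a minimal portable sketch of the same idea using std::atomic instead of the Windows API. The names increment_and_fetch and test_portable are my own (hypothetical, not part of any library), and I am assuming a C++11 compiler. On x86, compilers commonly lower fetch_add to a lock xadd when the result is used, but that is not guaranteed.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper, not a Windows API: returns the value *after* the
// increment, which is what InterlockedIncrement returns.
long increment_and_fetch(std::atomic<long>& addend)
{
    // fetch_add returns the value held *before* the addition, just as
    // xadd leaves the old value in the source register.
    long previous = addend.fetch_add(1);

    // Corresponds to the "inc ecx" in the listing above.
    return previous + 1;
}

// Mirrors test() from the post, but returns bb so it can be inspected.
long test_portable()
{
    std::atomic<long> aa(10);
    long bb = increment_and_fetch(aa);
    assert(aa.load() == 11);  // the shared variable itself was incremented
    return bb;
}
```

Calling test_portable() should give back the incremented value, the same thing bb receives in the original test().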