Hi, I'm going to start writing a Common Lisp compiler soon (yes, I'm one of those heathens, please no language war ;), and I need to tag objects (i.e., find a way to differentiate pointers to e.g. ints from those to e.g. functions). The canonical way of doing this is to use the least significant bits of the pointer. I'm wondering if using another dword (right next to it) altogether, like some Haskell compilers, wouldn't be more efficient. I know that shifting is very slow on the P4, but is it slower than a load from a cached address on P3 and up?

In pseudo-c, the two ways would be something like:

Tagged pointer (simplifying a bit here)



int tag;

tag = ((int) ptr & 3); /* 2 bits tag */
switch (tag) {
case 0:
/*do stuff with a function pointer*/
break;
case 1:
ptr = (void*) ((int) ptr - 1); /* zero out LSB */
/* do stuff with cons pointer */
break;
case 2:
ptr = (void*) ((int) ptr - 2);
/* do stuff with general pointer */
break;
case 3:
ptr = (int) ptr >> 2; /* with this scheme, ints must be < 2^30-1, but that could be extended to 2^31-1 */
/* do stuff with the int */
break;
}


Using another dword altogether as a tag would be more like:



int tag;

tag = * ((int*) ptr); /* EDIT. Was * ((int*) ptr + 1); Minor detail :) */

switch (tag) {
case 0:
/*do stuff with a function pointer*/
break;
case 1:
/* do stuff with cons pointer */
break;
case 2:
/* do stuff with general pointer */
break;
case 3:
/* do stuff with the int */
break;
}


Using another dword also brings subtler differences, since for some functions, I could use a function pointer as the tag, allowing me to do a single dispatch ? la C++ instead of using a case or, worse, a series of conditional jumps. However, I'm not sure this is really important.

A researcher found that a mix of both approaches (small tag in the LSBs for often-used types, and full word for others) was better, but that was approximately 10 years ago, and x86 has changed a lot since then.

TIA!

PS, I hope this isn't OT, since I like the board's content, and I intend to try and make my compiler, or, at least, the structure of its output, efficient, so I'll probably have a lot of questions :)
Posted on 2004-12-01 21:46:35 by pkhuong
Short answer: memory is slow and code is fast.

What about 64-bit pointers tomorrow?

Are you doing this in x86?


mov ecx, [_ptr]
mov eax, ecx
and ecx, 4-1
and eax, 0-4
jmp

Dispatch DWORD \
OFFSET PTR_Function,
OFFSET PTR_Cons,
OFFSET PTR_General,
OFFSET PTR_Int30

This is two instructions more than storing another dword. :)

mov ecx, [_ptr]
mov eax, [_ptr][4]
jmp

The two extra instructions cost almost nothing in reality, but I think it is important to look at the bigger picture before making a choice.
Posted on 2004-12-01 23:27:49 by bitRAKE
Thanks.

Well, my primary target is x86 (ia32 for now), and I won't have access to an x86-64 box for a while. The only reason I am considering the 2 dwords scheme is that the 2nd dword will be at an address that would have been cached anyway, so the extra latency caused by the load from main memory isn't as bad. The 2-bits tag, on the other hand, requires me to have another tag for general values, and to shift the Int30 (it also restricts the range for ints, but that's not too important).

I guess the question is whether the additional branch for the general case is outweighed by having one less load.
Posted on 2004-12-01 23:57:06 by pkhuong
I guess the question is whether the additional branch for the general case is outweighed by having one less load.
Only as the number of items approaches infinity. :) Another approach would be to pool the items based on type, but using pointers is so much easier. IMHO, the extra dword provides a great deal of flexiblity and an array type should be used for large numbers of items.
Posted on 2004-12-02 01:29:40 by bitRAKE