Hi, after several days of struggling, I've got a simple doubly linked list going. I know there's been several threads about linked list, but I still would feel very happy if somebody would be kind enough to give some comments or advice, if having time or feeling very kind :)

Cause I'm very uncertain of several things in the code, and how I solved stuff :eek:
The list only accepts entries with values sized DWORD, and it's GetNext, GetPrevious functions, returns pointers to entry's values.
This is short description of how the list works ~~~

The list's 'Create' function allocates a struct with some variables and structs necessary for operation:

pCurrent DWORD ? ; points to current entry
pHeadDummy DWORD ? ; points to a dummy entry always first, always empty
pTailDummy DWORD ? ; points to a dummy entry always last, always empty
pRemove DWORD ? ; if non-NULL points to a entry to be deleted due next call to list
nrEntries DWORD ? ; entries in list
hHeap DWORD ? ; handle to list's heap

At lists startup, two empty entries are created, headDummy and tailDummy, first and last in list, pointing to eachother. All entries pushed into
the list will be between these two. I thought this solved some problems which came up regarding functions GetNext and GetPrevious entries from list.

hHeap is handle to heap created for the list at Creation. I'm not at all sure about the heapfunctions, this is first time I used them. I initialize the heap size to 4096 and HEAP_NO_SERIALIZE. It seems to work ok, it seem to work growing automatically beyond that. Pls feedback on this with heapmemory in the program if anybody can help. (I noticed a big speed boost when I implemented heapallocs instead of globalallocs btw)

pRemove is a pointer handling removal of entires. It is always NULL unless user wanted to remove an entry, if user calls the Remove-function, the function sets this pointer to current Entry position, and makes the entries surrounding current entry point to each other. However the current is not Freed from the heap until a next call to a list-function which shifts position in the list. This is because the current position should stay at same point until moved - if erasing immediately, I have to shift current position forward or backward in the list, and this would be a problem cause I dont know in which direction the user is scanning the list.

The 'Create' function returns a pointer to this struct to the creator of the list, and the program has to pass this pointer to the list everytime it calls any list-function, thus there can be several different lists at one time.

The entries struct are like so:

val DWORD ? ; value
pPrevious DWORD ? ; pointer to previous entry
pNext DWORD ? ; pointer to next entry

the functions for the list are not many, they are:

List_Create PROTO
List_GetPrevious PROTO :DWORD
List_Remove PROTO :DWORD
List_GetNrEntries PROTO :DWORD
List_Destroy PROTO :DWORD

+ one internal to the list, List_CheckIfRemove, which is called from List_Start, List_End, List_GetNext, List_GetPrevious, to check if there
is a request from last call to list, to remove an entry after moving in position in list.


attach a zipped source files, there is included a tiny program to test the list too, just adds some 1000 entries, then removes some, and displays the resulting entries.
Posted on 2003-04-07 12:24:52 by david
iirc HEAP_NO_SERIALIZE makes it unsafe to use the heap multithreaded - if you aren't employing other synchronization, and you intend for this module to be reusable, you probably shouldn't add that flag. As for the speed bost compared to Global/LocalAlloc, try adding HEAP_ZERO_MEMORY flag to all allocations and see if it doesn't end up around the same speed (I still advise you to use Heap* functions though). Also, you might consider whether you want to create a new heap, or just GetProcessHeap() - or perhaps specify the heap at setup time. (main advantage of using a separate heap is you can nullify all allocations at once with HeapDestroy - this is a bad workaround for leaking code though ;-)). Just a couple friendly advices :)

Oh, you might also want to consider some more "advanced" techniques, like allocating blocks of link nodes, instead of chaining each and every one. This will improve locality and reduce heap fragmentation, at the expense of requiring some more logic in insert+delete routines (and some overhead when doing this) - depends on the needs I suppose.
Posted on 2003-04-07 12:33:22 by f0dder
Thanks Fodder,
I replaced the HEAP_NO_SERIALIZE with what you suggested. It's similar speed after altering.
I'll have to think about the advanced techniques you mention about allocating blocks of memory and stuff,
cant figure that out at the moment, :eek:
You're right, :grin: at the moment I HeapFree() every entry from the heap, and in the 'Destroy' function I just free the whole Heap like you mention.

Uploading the updated version
Posted on 2003-04-07 13:24:57 by david
HEAP_NO_SERIALIZE shouldn't have much of a speed hit (the thread synchronization primitives of win32 seem "okay"), but the presence of it might make the code multithreading unsafe.

I'm a bit surprised you say speed didn't change - did you add HEAP_ZERO_MEMORY to all HeapAllocs?

indeed you did :). Btw, you may want to make a "MYHEAPFLAGS" equ so you can easily change stuff arond.
Posted on 2003-04-07 13:29:31 by f0dder
Here is the results, almost similar, shouldnt they be? :)

cycle counts for 5 go's of a testprogram - HEAP_NO_SERIALIZE


cycle counts for 5 go's of a testprogram - HEAP_ZERO_MEMORY


(testprogram is similar to the one in attached file, only in C)
Posted on 2003-04-07 13:41:48 by david
hrm, the HEAP_ZERO_MEMORY probably wont matter too much for such small allocs anyway. but try timing it with some "substantial" block allocations (couple megs).
Posted on 2003-04-07 13:48:11 by f0dder
heh...yea! :confused: .. ... when I try for more massive amounts of data the one version returns negative result of cycles, must be cause of I'm using the c-type __i64, and it's signed :(, I'll get back to you with this when I rewritten the timing thing, I'll try tomorrow! :grin:
Posted on 2003-04-07 15:08:31 by david
just use a "massive" loop, and GetTickCount() :). And don't remember to Sleep(0) (give up remaning timeslice) just before timing. And if you want to overkill, boost hread+process priority to realtime (be VERY careful with that though ;-)).
Posted on 2003-04-07 15:12:16 by f0dder
I dont know exactly what your test routine looks like. But from experience, I've found i had to program two tests in series, both with a large amount of retries to get an average time.

The first test is a "debouncing" test simply to waste time, while the windows OS finishes its new process alocation (since you just started the exe). Remeber this is a multithreaded OS so it still does parallelism even when you think its not ;).

The second test was my actualy trust worthy test... and all others in sucession. I would often get something like:

300 cycles <<-- "Debounce" this
200 cycles <<-- as it has OS overhead involved as well.
30 cycles
35 cycles
32 cycles
33 cycles
28 cycles
etc. etc. etc
Posted on 2003-04-07 16:23:06 by NaN
I'm way too tired to code now, but here's what it looks like

#include <stdio.h>
#include <iostream.h>
#include "asmList.h"
#include "windows.h"

void* hList;

__int64 GetTimeStamp()
DWORD hi, lo;
mov hi, edx
mov lo, eax
return (__int64)((hi<<32) | lo);

int _cdecl main()

__int64 start=GetTimeStamp();

for(long i=0;i<3000000;i++) // add approximately 1 mb of entries (edit:==about 3mb)
List_AddEntry(hList, i);

// Traverse the linked list and
// remove all entries whose value cant by 2 without remainder

long* val;
if((*val % 2)!=0)List_Remove(hList);


__int64 end=GetTimeStamp()-start;

printf("total cycles %i64\n\n", end);


return 0;

as always, thanks for your replies, I really appreciate it!!!!!!!!
Posted on 2003-04-07 17:00:39 by david
Here is an example of what im getting at:

.model flat,stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\masm32.inc
include \masm32\include\dmacros.inc
include \masm32\include\_macros_.inc
include \masm32\include\kernel32.inc
include \masm32\include\user32.inc
include \masm32\include\debug.inc

includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib
includelib \masm32\lib\masm32.lib
includelib \masm32\lib\debug.lib

TimeTest_ON macro
db 0fh,31h
push edx ; push high
push eax ; push low

TimeTest_OFF macro
db 0fh,31h
pop ebx ; pop low
pop ecx ; pop hight
sub eax,ebx ; diff low
sbb edx,ecx ; diff high

push esi
mov esi, 0
.while esi < 10

mov ecx,100h

;Test code here!
mov eax, 100000
dec eax
jnz @B
;Test code here!

dec ecx
jne test2
;PrintHex eax, "ANS"

shr eax,8
PrintDec eax, "AVERAGE CLOCKS"

inc esi
pop esi


call ExitProcess
end start

ANd the results:
eax = 301466, AVERAGE CLOCKS (D:\masm32\JProject\ClockTst\fpu.asm, 69)

eax = 205035, AVERAGE CLOCKS (D:\masm32\JProject\ClockTst\fpu.asm, 69)
eax = 219046, AVERAGE CLOCKS (D:\masm32\JProject\ClockTst\fpu.asm, 69)
eax = 210533, AVERAGE CLOCKS (D:\masm32\JProject\ClockTst\fpu.asm, 69)
eax = 203551, AVERAGE CLOCKS (D:\masm32\JProject\ClockTst\fpu.asm, 69)
eax = 285120, AVERAGE CLOCKS (D:\masm32\JProject\ClockTst\fpu.asm, 69)
eax = 209213, AVERAGE CLOCKS (D:\masm32\JProject\ClockTst\fpu.asm, 69)
eax = 208310, AVERAGE CLOCKS (D:\masm32\JProject\ClockTst\fpu.asm, 69)
eax = 208919, AVERAGE CLOCKS (D:\masm32\JProject\ClockTst\fpu.asm, 69)
eax = 204437, AVERAGE CLOCKS (D:\masm32\JProject\ClockTst\fpu.asm, 69)

See how the first = 150% longer than the rest. The process start up can not be trusted.

Posted on 2003-04-07 17:24:53 by NaN
NaN, why the "db" form instead of rdtsc?

Do Sleep(0) before your actual test, to give up your remainder timeslice (and get a new, fresh timeslice just before the test).

Pretouch any code pages used in the test to make sure they're paged in (this is one of the few occasions where EXE packing has a benefit, as it will make sure all pages are paged in).

If you time algorithms that use dynamically allocated memory (that is, time the algorithms, not the allocation overheap), be sure to specify the HEAP_ZERO_MEMORY flag, as this also ensures pages are both reserved *AND* committed.

Setting thread+process priority to realtime will give more correct results, but if your process loops forever you risk taking the rest of the system down with you (even on NT - but of course it requires privileges to raise process priority on NT).
Posted on 2003-04-08 01:46:56 by f0dder
Thanks for the tips, I wrote a new speed-test, I did something completely wrong before, totaly incorrect results.
This is new my version:

.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\user32.inc
include \masm32\include\kernel32.inc
include asmList.inc
includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib


NRTESTS equ 5 ; five test-runs


bigNoHi dd 0
bigNoLo dd 6

szAppName db "speedTest",0
szResult db "test nr:%lu: Cycles:%I64d",13,10,0
szError db "Error creating list",0
szFileName db "SpeedTest-Results.txt",0
log db 1024 dup (0)


buffer db 256 dup (?)
timeStartHi dd ?
timeStartLo dd ?
iter dd ?
hFile dd ?



xor eax, eax
mov iter, eax

mov timeStartHi, edx
mov timeStartLo, eax

; --- test code begins ---

invoke Sleep, 0


; --- test code ends ---

sub eax, timeStartLo ; total time for the test=timeEnd-timeStart
sbb edx, timeStartHi

; ---- print the result and do more tests till done -----

invoke wsprintf, ADDR buffer, ADDR szResult, iter, eax, edx ; qword has to be pushed lodword, hidword order I think
invoke lstrcat, ADDR log, ADDR buffer ; add the result to textlog
invoke MessageBox, NULL, ADDR buffer, ADDR szAppName, MB_OK

inc iter
cmp iter, NRTESTS ; do NRTESTS number of tests
jnz _doTest

; all done, save textlog to disk and quit.

mov hFile, eax
invoke lstrlen, ADDR log
invoke WriteFile, hFile, ADDR log, eax, ADDR iter, NULL
invoke CloseHandle, hFile

invoke ExitProcess, 0

invoke MessageBox, NULL, szAppName, szError, MB_OK or MB_ICONERROR
invoke ExitProcess, 0

end start

The result for the test became a little weird, that's why I wonder if something is still weird in this test code, look:

test nr:0: Cycles:196293503
test nr:1: Cycles:1152612220
test nr:2: Cycles:1962245699
test nr:3: Cycles:2708017288
test nr:4: Cycles:3438762689

test nr:0: Cycles:292842613
test nr:1: Cycles:1889110993
test nr:2: Cycles:3431464033
test nr:3: Cycles:4331828929
test nr:4: Cycles:5226363817

(the testcode is push 500000*12bytes of data, and then scans through the whole list.)

First thing, HEAP_NO_SERIALIZE seems faster, which it shouldn't be.
Then there's the test 0 in the results which is always one digit smaller everytime, everytime I run. Weird.

attached these tests too
Posted on 2003-04-08 08:29:08 by david
Sleep(0) should be done _before_ rdtsc :)

oh, and the format specifier used in the EXEs must be wrong ^_^
Posted on 2003-04-08 08:36:33 by f0dder
:grin: stupid me!

Oh, the format, you mean the comments of the amount of data pushed in the list? It's DWORDs pushed, but I just meant that the memory allocated totally, if one adds the structs-the liftentries with next/previous-pointers to each DWORD pushed in the list, my mistake.

edit: aahh, found what's wrong, I never read a new start time with rdtsc when doing the more tests in the loop, I must be idiot. now it works! :)
Posted on 2003-04-08 16:21:14 by david
nah, I mean the msgbox popping up with "l64d" or similar :) (ie, wrong format string used for wsprintf or whatever)
Posted on 2003-04-08 16:22:23 by f0dder
yezzzzz....:tongue: It should be %I64d .. I fixed it now.. thanks.

Hmm...I'm adding Append and Prepend functions, to push value into front of list, and push value into back of list,
but I'm not really sure what's normally considered back and front of a linked list? :confused:

I have my head dummy entry, and tail dummy entry, and think of them as to the left and right.
If I APPEND stuff into the list, it goes into the leftmost spot, after head dummy.
If I PREPEND stuff into the list, it goes into the rightmost spot, preceeding the tail-dummy.
I see the left side as front, and right side as back.

List after appending values 1, 2 and 3, in that order, into the front

[ headdummy ] [ 3 ] [ 2 ] [ 1 ] [ tailDummy ]


If I would prepend values 1, 2 and 3, in that order, into the back, list would be

[ headdummy ] [ 1 ] [ 2 ] [ 3 ] [ tailDummy ]


Is front and back normally considered the other way around, or is this right?
Posted on 2003-04-09 22:46:43 by david
Prepend means to put in front.

For example, many C compilers prepend an underscore ("_") to a function name before passing it on to the linker. So what you define as Sleep will be given to the linker as _Sleep.

An appendix of a book is placed in the back.

By convention, the "leftmost" item is the "first" item. This corresponds with LISP notation.
Posted on 2003-04-09 23:16:33 by tenkey
Ah, thanks tenkey, my idea of start and end of list is right,
but I my understanding of words append and prepend is wrong.
Not too bad, I only have to change the names of my append & prepend functions! :)

I try a List_GetPos function now, which returns a pointer to the value in the list at a specified position.
It is super-slow method. But I can't think of better one, how could one else do???

Like this:
If list has 10 entries, and variable 'nrEntriesInList' is 10,
the variable 'wantedPos' is the position of the entry whose value function should return, and the positions in list are specified as 0-9:

(pseudo-pseudo code: )

if ( wantedPos >= nrEntriesInList ) return NULL
if (wantedPos < nrEntriesInList/2 ) step forward from headDummy (wantedPos+1) times and return the value;
step backward from tailDummy (nrEntriesInList-wantedPos) times and return the value.

This way I dont have to step forward from start 10 times if I want position 9 of 10 total for instance...but it still so slow,
what if list has super-many entries? would take forever to find entry 10000 of 20000 total entries. :(
Posted on 2003-04-10 01:24:23 by david
that's one of the bad properties about regular linked lists - it's not particularly fast to get an item from an index number. Again, linking "blocks" of nodes would make this faster, and again at the expense that arbitrary insertion/deletion would be slowed a lot.

Btw, that _is_ a C=64 in your avatar, isn't it?
Posted on 2003-04-10 02:18:13 by f0dder