hutch--, how much seriousness do you expect from a house jumping up and down with crossed eyes? :D

...I leave the seriousness to others...

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=tfEqc.115093%24Ik.9534698%40attbi_s53
The only reason we continue to invest in process improvements to the exclusion of all else is that this is a cheap way to increase performance that does not involve any new thinking. If process/technology improvements were to stop tomorrow, we would have our hands full for the foreseeable future with parallelism speedups. In fact, the speed increases we get from process improvements are probably retarding parallel processing research quite a bit.
Posted on 2004-05-26 11:17:30 by bitRAKE
It's only when the eyes start rolling that I start to get worried.
Posted on 2004-05-27 00:30:45 by hutch--
Well, smoke's not coming out of the chimney, yet. :)
Posted on 2004-05-27 08:17:37 by bitRAKE
...
Posted on 2004-05-27 18:09:15 by iblis
Hehe...cool! :D
Posted on 2004-05-27 18:17:18 by bitRAKE
You act like this is the first time you've seen it.
Posted on 2004-05-27 18:55:24 by iblis
No, it is just nice to put the picture to the words.

Chumbawamba, "Do you suffer from long term memory loss? I can't remember..."
Posted on 2004-05-27 19:26:36 by bitRAKE
To go off topic here, or perhaps back on topic, I don't understand the negativity towards parallel algorithms. OK, there are some algorithms, binary search for example (just me thinking here, I could be wrong), which I can't see benefiting from a parallel architecture. But there are many, many which would. In fact (again me thinking here, so there are perhaps cases I'm missing), given multiple processors with access to the same memory, I'd almost say that there isn't any O(n*m) algorithm, m being any polynomial of form f(n), which couldn't be processed in parallel. One of the most complicated data structures I know of is a back propagation NN and, to blow my own trumpet for a moment, I've written such a net which can be trained across a network, never mind multiple processors sharing the same RAM.
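For example, something as plain as summing the rows of a matrix is an O(n*m) job where every row is independent, so each processor can take its own slice of rows with no synchronization at all. A quick Python sketch, just to illustrate the idea (the function name and worker count are mine):

```python
from concurrent.futures import ThreadPoolExecutor

def row_sums_parallel(matrix, workers=4):
    # Each worker sums an independent slice of rows; no locking or
    # communication is needed, which is what makes O(n*m) work like
    # this so easy to spread over multiple processors.
    def chunk_sums(rows):
        return [sum(row) for row in rows]

    n = len(matrix)
    step = max(1, n // workers)
    chunks = [matrix[i:i + step] for i in range(0, n, step)]
    result = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(chunk_sums, chunks):
            result.extend(partial)
    return result
```

Binary search is the odd one out precisely because each probe depends on the result of the previous one, so there is nothing independent to hand out.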

Personally I say bring on multi-processor systems. Yeah, sure they're a bit harder to program for, but it's so much more fun. And if I'm getting more bang for my buck then I couldn't be happier.
Posted on 2004-05-28 19:01:51 by Eóin
Personally I say: bring on the ps3.0+ GPUs.
Many parallel algorithms can be implemented on these new videocards, and they run way faster than your Opterons can ever dream of. In fact, I am actually implementing a GPU-assisted calculation at the moment. The CAD-program needs to make an estimate of the area of the connection between two or more holes. One way to do this would be to put a grid inside the hole, and do cell-classification: for each cell, you check whether it is inside all holes or not, then you accumulate the area of all cells that are inside, and you have an estimation of the area. But if you look at it slightly differently, you can think of the grid as a pixelbuffer, and you count all pixels that are inside both holes. So this is what I do. I render the holes with CSG so only the parts that are inside all holes remain, then I count all pixels. A 3d card can render pixels much faster than a CPU would be able to classify them, I suppose.
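On the CPU, the cell-classification version would look something like this (a simplified Python sketch with circular holes; the names and the grid resolution are my own assumptions, the real thing does it on the GPU with CSG as described):

```python
def intersection_area(holes, xmin, xmax, ymin, ymax, n=200):
    """Estimate the area of the region inside ALL holes by
    classifying the centre of each cell of an n x n grid --
    the software equivalent of counting rendered pixels.
    Each hole is a circle given as (cx, cy, r)."""
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    inside = 0
    for i in range(n):
        x = xmin + (i + 0.5) * dx
        for j in range(n):
            y = ymin + (j + 0.5) * dy
            # A cell counts only if its centre lies inside every hole.
            if all((x - cx) ** 2 + (y - cy) ** 2 <= r * r
                   for cx, cy, r in holes):
                inside += 1
    return inside * dx * dy
```

Every cell is classified independently of every other cell, which is exactly why a GPU, doing thousands of pixels in parallel, wins here.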

Also, I am not against parallel systems, but as I said before, adding more CPUs to a system is only the second-best thing. Increasing clockspeed/efficiency gives more gain. To put it simply: a 3 GHz P4 is about twice as fast as a 1.5 GHz P4 in everything, but two 1.5 GHz P4s are only twice as fast if you have a 100% parallel algo which has no additional overhead (synchronization or whatnot). In the average case you will not gain as much.
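That is basically Amdahl's law: if p is the fraction of the work that parallelizes and n is the number of CPUs, the best overall speedup is 1 / ((1 - p) + p / n). A one-liner to play with (the naming is mine):

```python
def speedup(p, n):
    # Amdahl's law: p = parallel fraction of the work, n = number
    # of CPUs. The serial part (1 - p) never shrinks, no matter
    # how many CPUs you add.
    return 1.0 / ((1.0 - p) + p / n)
```

With p = 1.0, two CPUs give exactly 2x; with a more realistic p = 0.8 they give only about 1.67x, and even infinitely many CPUs would cap out at 5x. Doubling the clock, by contrast, also speeds up the serial part.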

And my point is: as long as CPU manufacturers continue to build faster CPUs, you can always stick a few of those in a single system if you like. But once they start pushing towards multi-CPU systems, or even multi-core CPUs, I suppose that means they can no longer build faster CPUs, and this is their backup plan.
Posted on 2004-05-29 03:04:39 by Scali
I basically agree with the view that where you need vertical performance, single processor grunt is more useful to you, but there are a couple of ways around the problem: handle larger data sizes for a given clock frequency and the throughput gets larger and thus the argument for 64 bit hardware, or use parallel processors where the os and other related overhead is handled by one while the other does the fun stuff.

One of the things that pisses me off doing timings on my PIV under win2k is that with processor intensive algo testing on very large data I have never been able to use more than about 55% of the processor capacity from the processor usage chart in win2k. What I would like to be able to do is dump all of the OS overhead on one processor and do my timings on another, so I would not get anything like the fluctuation levels from OS task switching and similar. Multitasking is a pain in performance terms, especially when you have turned off as much stuff as you can and it still gets in the way.

Delegating as many processor tasks as possible to a GPU makes sense as it reduces the load on the main CPU, but OS task switching does interfere with some graphics based operations. I shoved in the current low cost 128 meg video card on the current box, but from time to time I see a pause when running DVD movies where the video card easily has the fill rate for a 21 inch screen running 1280 x 1024, and the problem is task switching affecting disk performance loading the data stream from a VOB file.
Posted on 2004-05-29 05:22:40 by hutch--
handle larger data sizes for a given clock frequency and the throughput gets larger and thus the argument for 64 bit hardware


Except that this is mostly irrelevant, since x86 has had 64 bit data buses since the Pentium (allowing two 32 bit loads in parallel, or one 64 bit load via FPU or MMX). And I rarely need 64 bit integers anyway. Other types of 64 bit or even 128 bit data can be handled by FPU/MMX/SSE/SSE2, so no need for x86-64 there.

where the os and other related overhead is handled by one while the other does the fun stuff.


In the case of a single program (with one processor-intensive thread) running on a single CPU, the OS overhead is negligible anyway. It's a total waste to put a second CPU in there.

One of the things that pisses me off doing timings on my PIV under win2k is that with processor intensive algo testing on very large data I have never been able to use more than about 55% of the processor capacity from the processor usage chart in win2k.


That is because Win2k doesn't support Hyperthreading, and wrongly treats it like a dual-CPU system. Take a look here and see the difference between how 2k and XP handle such situations: http://www.asmcommunity.net/board/index.php?topic=18353

from time to time I see a pause when running DVD movies where the video card easily has the fill rate for a 21 inch screen running 1280 x 1024, and the problem is task switching affecting disk performance loading the data stream from a VOB file.


Well, I am of the opinion that you should run as many processor-intensive tasks as you have processors. I don't expect my PC to be able to play a movie without problems if I am compiling a program. And if I do want this to happen, I will simply raise the priority on the movie player and/or decrease the priority on the compiler. Basically I then promote the movie player to the only processor-intensive task, and let the compiler eat the remaining cycles. Problem solved.
Oh, and defragmenting your disk may also help :)
Posted on 2004-05-29 05:41:42 by Scali

from time to time I see a pause when running DVD movies where the video card easily has the fill rate for a 21 inch screen running 1280 x 1024, and the problem is task switching affecting disk performance loading the data stream from a VOB file.

Have you remembered to turn on DMA access for your CD/DVD drive?
Posted on 2004-05-29 11:22:42 by f0dder
Interestingly enough, the problem is more obvious when running a VOB file from disk. The serial ATA disks don't use DMA and the interface for it is controlled by the Intel board. The pair of IBM HDDs I added to the box do use DMA, and I think the DVD and CD writer also do, but the most obvious comparison is that my older PIV runs the same VOB files with no delay at all, because it runs win98se, which has less OS interference, and it only uses a 3 year old 32 meg video card.

It seems to be a factor of later OS design to use more processor time, or alternatively finer gradations of task switching, and this appears to interfere with the DirectX 9.? I have installed on the box. I solved a video/audio synchronisation problem by setting the DVD software to direct sound.

The only thing that stops me from using the older PIV with win98se is the monitor on it is not that good for watching video but win98se certainly performs better with directx and direct sound.
Posted on 2004-05-29 21:03:42 by hutch--
Sounds odd, hutch. I have no problem with WinXP on my XP1800+ with a 32 mb GF2, or with my laptop, also WinXP, Celeron 1.6 and onboard Radeon IGP340M.

They both play video and games and other multimedia stuff perfectly. I haven't used Win98SE in ages, because it just doesn't work. You have to reboot it multiple times a day when you're using it intensively.

I have used NT4 and Win2k before that, and they also worked very smoothly. In fact, NT4 was a lot faster with DirectDraw than Win98SE was, at the time. At least on my Matrox card. Perhaps you have some dodgy drivers?
Posted on 2004-05-30 04:40:00 by Scali
Oh, by the way... I found this last night: http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/TW04079_WINHEC2004.ppt

They describe the future of graphics there, for Windows. And one of the points they make is that they want to target "non-graphical problem domains".
Posted on 2004-05-30 04:42:11 by Scali
Funny reading this back after all these years.
Obviously I have adopted 64-bit... not because I think it's fantastic: it hasn't really made a dent in the market yet, neither in terms of performance nor in terms of adoption by end-users and developers alike. Most applications are still 32-bit only.
But I wouldn't be using assembly if I wasn't looking for maximum performance from the hardware... and if you have a 64-bit mode, you may as well use it, even if it only gets you about 10% extra performance on average.

I still feel the same about multicore as well. I bought a dual-core machine, but I don't really feel the need to go to 4 cores and beyond. Most software doesn't really gain a lot of performance from more than 2 cores anyway. I've also adopted multithreaded optimizations in some of my code, but just like 64-bit, the results generally aren't that spectacular. It seems that with Core2 and Core i7, the largest performance leap still came from improvements in the core itself, not from getting 4 cores or reintroducing HT.

So my views haven't really changed. I'm still not impressed with 64-bit and multicore/multi-cpu solutions.
Posted on 2009-12-08 10:31:20 by Scali

What I wonder though... Where is AMD going next? Prior to Athlon, they never had a decent CPU. The Athlon was basically a clone of the Pentium Pro core, built with technology that they bought from the Alpha division. Now they extended this to 64 bit... But what is next? I haven't heard a thing about a next-gen architecture from them. Perhaps they have no idea, because they'd have to design a CPU from scratch again, and this time there is no example from Intel, and no technology to buy?


Clairvoyance!
Posted on 2012-04-02 14:14:53 by Scali