scientica, it's patronizing and non-info from hutch's side... and HE talks about ego massaging? :grin:
Posted on 2003-11-30 08:03:48 by f0dder


I doubt that the future is based on spheres with ugly shading, running at low framerates without any kind of filtering whatsoever.
Posted on 2003-11-30 08:10:29 by Bruce-li
Interesting list, but the only factor that seems to be relevant is the processor count:

1. 5120 NEC processors
2. 8192 HP processors
3. 2200 Apple G5 processors
4. 2500 P4 Xeons.
etc ....

I guess if you strapped a million x86 processors together you would hold the world record until someone strapped together a few more. Apart from being a slow site, it did not tell you much about what the systems were used for, so I don't know if graphics performance is what you are trying to demonstrate with reference to this list.

Now if you bother to do the divisions, divide the winner at 5120 processors by the 416 used in the SGI box and see what the comparison is like. Unfortunately there is not a lot of data about how these systems are used, and the gigaflops count does not help much.

Here is an independent test from another university.
http://www.cs.virginia.edu/stream/top20/Bandwidth.html


and besides, as I said before, it doesn't use x86 or DirectX, so it is not relevant to the discussion at hand...

What an interesting criterion: if it's not x86 or DirectX, it's not relevant? When you are trying to make a point about DirectX apart from its native platform and OS, I wonder how you manage such an exclusion. It would seem that, by the criterion you exercise, only x86 hardware and DirectX figure in the far wider world of graphics manipulation.

Now the history of DirectX is well known, waffle and speculation apart. Games of the vintage of the old DOOM series ran in 32 bit in DOS without Windows at all. To address this platform support problem, Microsoft developed the DirectX system to allow video memory access for games programming; there is no argument that it was developed on x86 hardware for the 32-bit versions of Windows.

Now what you are assuming is that, because Microsoft have a desire to port various parts of their OS and system to other platforms, you have a one-to-one correlation between its performance on its native hardware and on the next host hardware of a different system. Behind this assumption is what appears to be some ignorance of the differences involved in writing code for different platforms. It may suit a particular ideology to plug the idea of multi-platform ports, but it assumes portable libraries on different hardware, which are really hard to get off the ground.

UNIX C is one of the few properly portable languages because the OS varies little from box to box with different hardware. This is fine in UNIX, but try it with an OS base like Windows, which is highly dependent on hardware-specific code, and you will find the problems.

As far as the existing limits of the x86 architecture go, try this discussion, which is related to the 64-bit approach that AMD have in mind. It may dawn on you that the 4 gig theoretical address range is not enough for the type of application that bigger hardware can already manage, and I suggest this is why much of the clustering is using 64-bit Itanium2 processors.

http://www.devx.com/amd/Article/16101

Now if you bother to read it, it will give you some idea of why 64-bit registers will be very useful in terms of data transfer, number crunching and native large-number QWORD operations. As I am sure you are aware of what memory addressing range is about, the shift to 64-bit hardware is simply a performance issue.

Now, from your previous responses to the lack of technical data and argument: at bottom, 64 is BIGGER than 32, so it handles data in BIGGER slices, so for any given frequency it does things faster. To model this on existing x86 hardware, DWORD reads and writes are BIGGER than WORD reads and writes, so for the same processor frequency it moves more data faster.

Now translate this to your own hand-coded assembler blitters in 64 bit and, instead of piddling around with MMX, you can write directly to the screen in 64-bit chunks without having to use the MMX/FP registers at all.
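
(To make the idea concrete, here is a minimal C++ sketch rather than hand-coded assembler; the buffers and function names are hypothetical, and it only illustrates moving data in 64-bit chunks with plain integer registers versus 32-bit ones.)

#include <cstdint>
#include <cstddef>

// Copy a scanline to a (hypothetical) linear destination 64 bits at a time,
// using plain integer registers: no MMX/FP involved.
void blit64(uint64_t* dst, const uint64_t* src, size_t bytes)
{
    size_t n = bytes / 8;               // number of 64-bit chunks
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];                // one 64-bit load plus one 64-bit store
}

// The 32-bit equivalent needs twice as many loads and stores for the same data:
void blit32(uint32_t* dst, const uint32_t* src, size_t bytes)
{
    size_t n = bytes / 4;
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}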

Now your potted history of DirectX versus OpenGL is an interesting fantasy, because OpenGL has been an "Open" system for many years that does not depend on proprietary hardware and operating systems. The scale of market resistance to Microsoft extending their leverage anywhere else is very extensive, and this means that DirectX for Linux as a widely supported extension is a long way from possible at the moment. Then again, with the documented limitations in an x86 processor at the moment, do other hardware manufacturers need to emulate x86 with DirectX?

Regards,
Posted on 2003-11-30 08:23:25 by hutch--
f0dder,


scientica, it's patronizing and non-info from hutch's side... and HE talks about ego massaging?


I am pleased to see that your analytical skills are trying to improve from their current state of atrophy.

Regards,

scientica,

It's comic relief from working on libraries and macros all day. Nothing like watching someone make a fool of themselves. :alright:
Posted on 2003-11-30 08:30:42 by hutch--
it did not tell you much about what the systems were used for, so I don't know if graphics performance is what you are trying to demonstrate with reference to this list.


All systems just ran a standard benchmark to determine the number of gigaflops they could deliver.
Now, if gigaflops aren't important to computer graphics (think stochastic raytracing), then I don't know what is. (your memory bandwidth benchmarks aren't, I can tell you that much).

Now if you bother to do the divisions, divide the winner at 5120 processors by the 416 used in the SGI box and see what the comparison is like.


That's not the point. First of all, it doesn't work that way. Supercomputer performance doesn't scale linearly with the number of CPUs you add. Secondly, even if it did, you'd still not find the SGI box impressive, observe:

Rmax ratio: 35860/1793 = 20
Rpeak ratio: 40960/2163 = 18.9
CPU ratio: 5120/416 = 12.3

So we see that the NEC Earth Simulator actually has proportionally BETTER performance than the SGI, and not just a little either!
Thirdly, since neither the SGI nor the NEC Earth simulator use x86 processors, I don't see the relevance of these figures.
Your point was that x86 was useless for supercomputers, and as you can see, there are plenty of supercomputers outclassing the best Altix, the very supercomputer you brought up.
While they may need more CPUs than the Altix, I don't think that's the point really...
Let's pick that best x86 system (nr 4 in the list) and do the maths again:

Rmax ratio: 9819/1973 = 5.0
Rpeak ratio: 15300/2163 = 7.0
CPU ratio: 2500/416 = 6.0

Now we see that the Rmax ratio is only slightly lower than the CPU ratio, which is no surprise, since we already know that the scaling is not linear in practice. The Rpeak ratio is still higher than the CPU ratio, however.
All in all, you could say that x86-based supercomputers scale quite well, compared to the Altix.
So what exactly was your point about x86? You said it wasn't suitable for large-scale projects? These numbers indicate something else...
What were the reasons for you saying that anyway? The TECHNICAL reasons I mean.

What an interesting criterion: if it's not x86 or DirectX, it's not relevant? When you are trying to make a point about DirectX apart from its native platform and OS, I wonder how you manage such an exclusion. It would seem that, by the criterion you exercise, only x86 hardware and DirectX figure in the far wider world of graphics manipulation.


Don't twist my words. We were discussing how x86/DirectX were technically(?) not fit for anything but PCs, were we not?
Then, you presented an SGI supercomputer, which used neither x86 nor DirectX... So how does this tell anything about x86 or DirectX then? Since the SGI's performance depends on neither. Hence it is not relevant to the discussion at hand.

Now what you are assuming is that, because Microsoft have a desire to port various parts of their OS and system to other platforms, you have a one-to-one correlation between its performance on its native hardware and on the next host hardware of a different system.


I never assumed such a thing. You stated the opposite however. You stated that since x86/DirectX aren't used for anything but PCs, they can't be used for anything but PCs, am I right?
So, I asked you to give TECHNICAL reasons why they can't be used. Where do the technical limits lie?
And all you can come up with, is some unimpressive SGI system that is not relevant to this discussion at all.

Behind this assumption is what appears to be some ignorance of the differences involved in writing code for different platforms. It may suit a particular ideology to plug the idea of multi-platform ports, but it assumes portable libraries on different hardware, which are really hard to get off the ground.


1) Can you mention these differences to write code for different platforms then? You claim they exist, but constantly neglect to actually list any. This sounds almost like a religion.
2) There is a difference between portable code, and implementing an API on multiple platforms. There is no reason why the DirectX sourcecode should be portable, if it is to be implemented on more than one platform. It can also be (partly if not completely) rewritten in an optimal way for the other platform, and still be functionally equivalent.
On top of that, most of the performance comes not from the DirectX HAL itself, but from the underlying driver, which is of course hardware-specific anyway.
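
To make that distinction concrete, here is a tiny sketch (hypothetical file and function names): one public header, two completely different implementations, each free to use platform-specific or hand-optimized code, yet every caller sees the same API.

// blit_api.h: the public interface, identical on every platform (hypothetical example)
#pragma once
#include <cstdint>
#include <cstddef>
void FastClear(uint32_t* surface, size_t pixels, uint32_t colour);

// blit_win32.cpp: one implementation, free to use platform/CPU specific tricks
#include "blit_api.h"
void FastClear(uint32_t* surface, size_t pixels, uint32_t colour)
{
    // could drop to intrinsics or inline asm here; callers never know
    for (size_t i = 0; i < pixels; ++i) surface[i] = colour;
}

// blit_other.cpp: a second platform's implementation, written independently
#include "blit_api.h"
void FastClear(uint32_t* surface, size_t pixels, uint32_t colour)
{
    // entirely different code is fine, as long as the observable behaviour matches
    while (pixels--) *surface++ = colour;
}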

UNIX C is one of the few properly portable languages because the OS varies little from box to box with different hardware. This is fine in UNIX, but try it with an OS base like Windows, which is highly dependent on hardware-specific code, and you will find the problems.


Nice try, but not relevant, see above. Besides, Windows code is very portable: there's Windows for Alpha, MIPS, PowerPC, Itanium2, AMD64 and regular 32-bit x86, and in most cases a simple recompile of the sourcecode will do the trick, assuming that the sourcecode itself is written properly, of course.
This is actually easier than porting between different flavours of *nix.

As far as the existing limits of the x86 architecture go, try this discussion, which is related to the 64-bit approach that AMD have in mind. It may dawn on you that the 4 gig theoretical address range is not enough for the type of application that bigger hardware can already manage, and I suggest this is why much of the clustering is using 64-bit Itanium2 processors.

http://www.devx.com/amd/Article/16101

Now if you bother to read it, it will give you some idea of why 64-bit registers will be very useful in terms of data transfer, number crunching and native large-number QWORD operations. As I am sure you are aware of what memory addressing range is about, the shift to 64-bit hardware is simply a performance issue.

Now, from your previous responses to the lack of technical data and argument: at bottom, 64 is BIGGER than 32, so it handles data in BIGGER slices, so for any given frequency it does things faster. To model this on existing x86 hardware, DWORD reads and writes are BIGGER than WORD reads and writes, so for the same processor frequency it moves more data faster.

Now translate this to your own hand-coded assembler blitters in 64 bit and, instead of piddling around with MMX, you can write directly to the screen in 64-bit chunks without having to use the MMX/FP registers at all.


Again, completely irrelevant... First of all, we already HAVE 64-bit x86 CPUs, so there's no reason to preach the advantages of 64 bit over 32 bit. Secondly, what does it matter if you write to the screen with MMX, FP registers or regular integer registers, as long as you shovel 64 bits at a time?
Thirdly, why would you use software blitters and renderers when you have hardware accelerators and/or are doing distributed offline rendering?

Now your potted history of DirectX versus OpenGL is an interesting fantasy, because OpenGL has been an "Open" system for many years that does not depend on proprietary hardware and operating systems. The scale of market resistance to Microsoft extending their leverage anywhere else is very extensive, and this means that DirectX for Linux as a widely supported extension is a long way from possible at the moment. Then again, with the documented limitations in an x86 processor at the moment, do other hardware manufacturers need to emulate x86 with DirectX?


Again, irrelevant. Wrong as well, since there are implementations of DirectX on Itanium2 workstations (Windows XP for IA64) and the G5 (MacSoft), as I have mentioned before. Neither uses an x86 emulator. Which makes sense, since I cannot think of any reason why one would have to emulate an x86 in order to implement DirectX on a non-x86 system. And you have been unable to name one so far as well...

So I'll repeat myself AGAIN:

Okay, let's ask again: What technical problems does x86 have, that make it not suitable for anything larger than PCs (except that it does quite well in the supercomputer arena?), and why would DirectX not suit larger systems, and why would OpenGL? Technical arguments please, this time.

Nothing like watching someone make a fool of themselves.


Yea, but what's your excuse? :)
Posted on 2003-11-30 09:03:31 by Bruce-li
The big advantage that DirectX has over OpenGL is that OpenGL is committee-run.
The ARB takes so long to decide anything that it is almost useless. OGL 2.0, which at the time of proposal was much more powerful than DX8 (and still is, though less so; now only marginally more powerful than DX9), is still in the development stage. The ARB takes so long to decide whether or not to ratify any standard that IHVs create their own extensions. The idea of hardware-specific extensions flies in the face of the concept of an API, but they must do it because the ARB is so slow.

DirectX has been behind OpenGL in terms of finesse; the rules in OpenGL are solid, and have been around for years. There are neat tricks like re-using old data without re-sending it, and the rules state how this should be handled (purely in terms of output, so the driver may cache old data and resend, or the hardware may have state retention). These tricks are then used in applications to get a speed-up. DirectX doesn't have such things, and typically the behaviour is undefined.
Now these things prove that GL is a mature API, with a huge graphics legacy behind it.
DirectX, on the other hand, gets all the fancy new stuff put into it as it arrives fresh from the IHVs.
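
One concrete instance of that kind of reuse is an OpenGL display list, sketched roughly below (context setup and error handling omitted): the geometry is recorded once and later frames just replay it, leaving the driver free to cache it wherever it likes.

#include <GL/gl.h>

GLuint listId;

void buildGeometryOnce()
{
    listId = glGenLists(1);
    glNewList(listId, GL_COMPILE);      // record the commands, don't draw yet
    glBegin(GL_TRIANGLES);
    glVertex3f(-1.f, -1.f, 0.f);
    glVertex3f( 1.f, -1.f, 0.f);
    glVertex3f( 0.f,  1.f, 0.f);
    glEnd();
    glEndList();
}

void drawEveryFrame()
{
    glCallList(listId);                 // replay; the application does not re-send the data
}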

There has also been a marked shift in Microsoft's view of DX for "professional" applications; versions from 7 or 8 onwards have included more features aimed at the professional space. The results can be seen in the likes of 3DSMax, Pro/E, and Maya, all now supporting DX renderers alongside their OGL options. In fact all the hardware rendering on Maya must be done via a DX8 (or possibly 9, not sure on that) interface. This has mainly been allowed to happen by the ineffectiveness of the GL ARB in keeping up. I'm pretty sure the ISVs would have liked to be GL-only, but the fact was GL lagged behind because it couldn't ratify new GL versions, while DX was being dragged up by Microsoft (a monopoly does have advantages: it doesn't have to argue with anyone).

Mirno
Posted on 2003-11-30 09:14:16 by Mirno
I'd be interested in the cluster cost, too... the purchase cost of the systems, and the running cost in electricity... at first glance it might seem neater to have fewer but more grunty CPUs - but aesthetics aside, is it cost-beneficial?

Now, all the rather irrelevant clustering and supercomputer stuff aside, could Grand Master All-Knowledgable Hutch please enlighten us mortals as to why GL is a better choice than DX? It's true that GL is available on more platforms, but there's no reason whatsoever (apart, perhaps, from legal reasons) that you couldn't do a DX implementation on other hardware than x86 (Bruce already mentioned G5 and Itanium2).

Oh, and remember that GL and DX are used for realtime-ish stuff - not movie-style offline rendering like what Pixar are doing. While this kind of rendering and the supercomputers involved are interesting, it is not really relevant to the discussion of GL vs. DX.

If you look at the APIs themselves, it should be clear that DX offers more control. Tell me how to change screen resolution with GL, how to query the hardware capabilities, hell, even something as simple as the amount of texture memory and the supported texture formats. You'll find that you have to use OS-specific code (screen resolution) and graphics card vendor-specific GL extensions (to do anything remotely interesting, realtime hardware 3D rendering related anyway). OS- and vendor-specific... gee, sounds wonderfully portable, eh?
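
A rough sketch of what I mean, Win32/D3D9 flavoured, with error checking omitted and a GL context assumed to already be current for the glGetString call:

#include <windows.h>
#include <GL/gl.h>
#include <d3d9.h>
#include <cstring>

void glSide()
{
    // Changing the display mode is not a GL call at all; it's OS-specific (Win32 here)
    DEVMODE dm = {};
    dm.dmSize       = sizeof(dm);
    dm.dmPelsWidth  = 1024;
    dm.dmPelsHeight = 768;
    dm.dmBitsPerPel = 32;
    dm.dmFields     = DM_PELSWIDTH | DM_PELSHEIGHT | DM_BITSPERPEL;
    ChangeDisplaySettings(&dm, CDS_FULLSCREEN);

    // Capabilities come back as a vendor-dependent extension string you parse yourself
    const char* ext = reinterpret_cast<const char*>(glGetString(GL_EXTENSIONS));
    bool s3tc = ext && strstr(ext, "GL_EXT_texture_compression_s3tc");
    (void)s3tc;
}

void d3dSide()
{
    // Mode enumeration and capability queries are part of the API itself
    IDirect3D9* d3d = Direct3DCreate9(D3D_SDK_VERSION);
    D3DCAPS9 caps;
    d3d->GetDeviceCaps(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, &caps);
    UINT modeCount = d3d->GetAdapterModeCount(D3DADAPTER_DEFAULT, D3DFMT_X8R8G8B8);
    (void)modeCount;
    d3d->Release();
}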
Posted on 2003-11-30 09:19:35 by f0dder
DirectX has been behind OpenGL in terms of finesse; the rules in OpenGL are solid, and have been around for years. There are neat tricks like re-using old data without re-sending it, and the rules state how this should be handled (purely in terms of output, so the driver may cache old data and resend, or the hardware may have state retention). These tricks are then used in applications to get a speed-up. DirectX doesn't have such things, and typically the behaviour is undefined.


Not true at all. In OpenGL the behaviour is often undefined, since in most cases you feed data from main memory to the driver (unless you are using some specific extensions).
It is undefined when and how the driver uploads this data to the hardware, and there can be large differences in behaviour and therefore performance between different vendors.

Direct3D on the other hand, works with 3 pools: SYSTEM, MANAGED and DEFAULT.
If you create a resource in the system-pool, it will be in system memory, by definition.
If you create it in the default-pool, it will be created in video memory, if this is supported by the device, or the driver can choose to manage it itself in system memory, or at worst, let Direct3D handle it in the system-pool.
If you create a resource in the managed-pool, this means that the resource will be managed by Direct3D: it will have an internal copy in system memory, and Direct3D will manage the creation and destruction of resources in video memory by itself... The difference with OpenGL being of course that the API manages it, not the driver, so it behaves the same for any vendor.
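
For reference, the three pools look roughly like this in code (a D3D9-style sketch, with error checks left out):

#include <d3d9.h>

void createTextures(IDirect3DDevice9* dev)
{
    IDirect3DTexture9 *sysTex, *defTex, *manTex;

    // SYSTEMMEM: lives in system memory by definition (useful for CPU access / uploads)
    dev->CreateTexture(256, 256, 1, 0, D3DFMT_A8R8G8B8,
                       D3DPOOL_SYSTEMMEM, &sysTex, NULL);

    // DEFAULT: placed where the device prefers, normally video (or AGP) memory
    dev->CreateTexture(256, 256, 1, 0, D3DFMT_A8R8G8B8,
                       D3DPOOL_DEFAULT, &defTex, NULL);

    // MANAGED: D3D keeps a system-memory copy and handles the video-memory
    // copy itself, so the behaviour is the same regardless of vendor
    dev->CreateTexture(256, 256, 1, 0, D3DFMT_A8R8G8B8,
                       D3DPOOL_MANAGED, &manTex, NULL);
}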

Now these things prove that GL is a mature API, with a huge graphics legacy behind it.


Legacy, yes, but 'current' hardware (the programmable kind) is supported very badly by OpenGL so far. In fact, the entire ps1.x generation is simply not supported at all. Maturity means nothing in this field.
Posted on 2003-11-30 09:25:04 by Bruce-li
On a technicality, neither of you is right in terms of writing to the screen on the PC.
The AGP bus is only a 32-bit interface, so whether you have a 32-bit or a 64-bit CPU, pushing it across the AGP bus will still take a single AGP write of burst length 2 to write 64 bits. Of course this will happen in one "clock" because of the effects of strobing (AGP 2x, 4x, and 8x).

In fact it is the AGP bus which makes graphics cards for the PC/Mac so slow in comparison to other dedicated hardware. The pin cost stops us from moving on to huge busses which give the real bandwidth the cards could use. PCI Express is still only a 32 bit interface, it just clocks higher (much higher, and the packety-ness should help reduce re-sending of corrupted data).

Mirno
Posted on 2003-11-30 09:35:29 by Mirno
Mirno, out of curiosity, how important is the AGP bus speed when doing some of the more fancy hardware 3D rendering? Isn't it things like shader speed and card speed (core, memory) that are the most important, as you ought to keep as much data as possible static on the card (T&L and all)... or am I completely mistaken here? :)
Posted on 2003-11-30 09:47:09 by f0dder
On a technicality, neither of you is right in terms of writing to the screen on the PC.
The AGP bus is only a 32-bit interface, so whether you have a 32-bit or a 64-bit CPU, pushing it across the AGP bus will still take a single AGP write of burst length 2 to write 64 bits. Of course this will happen in one "clock" because of the effects of strobing (AGP 2x, 4x, and 8x).


This is true, but that doesn't really matter... With write-combining and burst-transfers, you'll never transfer just 64 bits at a time to the card anyway... it will go in chunks of e.g. 64 bytes.
However, if you do some testing, you will notice a difference between 32 bit or 64 bit writes, when you are doing burst-transfers. Both the AGP-queuing and the loop on the CPU-side are more efficient (you only need to use 1 instruction per 64 bits instead of 2, so you have a spare slot for another operation).
So I'm willing to give hutch-- that part of the argument... The relevance of this during hardware-accelerated rendering is another matter of course :)
Oh, and for raytracing and other software-rendering techniques in use today, I don't think that the blt-speed is really the bottleneck either :)
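
If anyone wants to repeat that kind of test, a very rough harness could look like the sketch below. Note the caveats in the comments: it times stores to an ordinary heap buffer, whereas the effect described above really concerns write-combined framebuffer/AGP memory, so treat any numbers as indicative only.

#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>

int main()
{
    const size_t bytes = 64u * 1024 * 1024;
    std::vector<uint8_t> buf(bytes);

    // 32-bit store loop (volatile keeps the compiler from vectorizing or eliding the stores)
    auto t0 = std::chrono::steady_clock::now();
    volatile uint32_t* p32 = reinterpret_cast<uint32_t*>(buf.data());
    for (size_t i = 0; i < bytes / 4; ++i) p32[i] = 0xDEADBEEF;
    auto t1 = std::chrono::steady_clock::now();

    // 64-bit store loop: half the number of store instructions for the same data
    volatile uint64_t* p64 = reinterpret_cast<uint64_t*>(buf.data());
    for (size_t i = 0; i < bytes / 8; ++i) p64[i] = 0xDEADBEEFDEADBEEFull;
    auto t2 = std::chrono::steady_clock::now();

    std::printf("32-bit stores: %lld us\n64-bit stores: %lld us\n",
        (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count(),
        (long long)std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count());
}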
Posted on 2003-11-30 09:49:31 by Bruce-li
Warning, this posting will not fit your theory of relevance.

The criterion of relevance seems to be endlessly elastic; shame others don't see it that way. It is being applied here to what the future may bring, in the absence of current hardware. If you subscribe to the AMD mailing list, you may be able to keep up with their developments.

Linus Torvalds.
He goes on to write, "As far as I know, the _only_ things Itanium 2 does better on is (a) FP kernels, partly due to a huge cache and (b) big databases, entirely because the P4 is crippled with lots of memory". That crippling with lots of memory is due to what many people describe as a major kludge in the Pentium architecture called Page Address Extensions (PAE). According to Torvalds, "the only real major failure of the x86 is the PAE crud".
http://www.theinquirer.net/?article=7966

Now I am sure you can count, so what do you see as the virtue of the 4 gig memory addressing range of current x86 processors? Will it be when they develop into native 256-bit processors? Intel have yet to announce a 64-bit version of the x86 architecture, even though the rumour mill says they have one in the works.

So what are the boundaries of current hardware?

1. 32 bit is HALF the size of 64 bit, but that does not matter; it does not fit your argument.

2. A memory addressing range of 4 gig is small alongside a 64-bit address range, but that does not matter, as it does not fit your argument either.

3. I will take Mirno's word on the data transfer rate on AGP because I am out of date in that area, but this does not matter, as it does not fit your argument either.

4. Silicon technology does show signs of age, and there appears to be a limit on the technology of track spacing, which in turn limits the absolute frequency of the hardware. Effectively winding up the wick is not without its boundaries, and this technology is being pushed at the moment, but this does not matter, as it does not fit your argument either.


1) Can you mention these differences to write code for different platforms then? You claim they exist, but constantly neglect to actually list any. This sounds almost like a religion.

Well, I can help you with reference to Intel's PIV manuals for the instruction set but you will have to look up the others yourself. The differences ARE different instruction sets on different hardware unless you are naive enough to assume that they are all the same. You are making the mistake here of assuming that the "magic" portable libraries are available on all boxes with no difference in how they work. Great stuff if you don't code in asm, you don't have to look at the difference. Be careful though, this may not fit your theory either. :tongue:

Now as far as your humorous comments on 64-bit code in current x86 hardware go, why do you think there ARE 64-bit MMX registers as well as the 128-bit XMM ones? Even you should be able to guess this one: it's because the native 32-bit registers won't hold them and historical fudges like using EAX:EDX pairs are slow. The whole purpose of changing x86 hardware to 64 bit is to address a number of problems I have mentioned: native 64-bit code and memory address range. Now this is probably not technical enough for you and it will not fit your current theory either.

What you have kept assuming here is that if you wait long enough, x86 will turn into something that it's not.

f0dder,

could Grand Master All-Knowledgable Hutch please enlighten us mortals as to why GL is a better choice than DX?

On what, an IBM System/360? Perhaps you could elucidate your vision of what hardware it should run on before you ask the question. :tongue:

Regards,
Posted on 2003-11-30 10:09:23 by hutch--

On what, an IBM System/360? Perhaps you could elucidate your vision of what hardware it should run on before you ask the question.

Any system with a video card that does 3D hardware acceleration. As far as I can see, there's nothing that ties the DX API to x86 hardware - just like GL isn't tied to proprietary SGI hardware. The main system hardware is actually rather irrelevant; it's the graphics accelerator it's all about - I don't assume you think GL is being used for movie-style rendering? ^_^
Posted on 2003-11-30 10:19:14 by f0dder
Now I am sure you can count, so what do you see as the virtue of the 4 gig memory addressing range of current x86 processors? Will it be when they develop into native 256-bit processors? Intel have yet to announce a 64-bit version of the x86 architecture, even though the rumour mill says they have one in the works.


That's nice and all, but AMD is already selling a variety of 64 bit x86 CPUs (do Athlon64 or Opteron ring a bell?), so all your 32-bit related stuff does not apply to x86, since x86 is 64 bit (and this also defeats the PAE crud obviously).
You'll have to come up with something better.

4. Silicon technology does show signs of age, and there appears to be a limit on the technology of track spacing, which in turn limits the absolute frequency of the hardware. Effectively winding up the wick is not without its boundaries, and this technology is being pushed at the moment, but this does not matter, as it does not fit your argument either.


This is a general problem, not related to x86 alone. All current CPUs are made of silicon, so I don't see how this can be a disadvantage of x86 alone.

The differences ARE different instruction sets on different hardware unless you are naive enough to assume that they are all the same.


Erm yes, I believe the distinctive characteristic of a CPU family is indeed its instruction set. I thought this was common knowledge anyway; I don't see why you need to point that out.
So yes, there are differences, as you said before...
But here comes the important question: How do these differences translate in practical advantages or disadvantages, or even limitations, as you spoke of? THAT is the whole point, you don't give any TECHNICAL arguments. All you're saying is something like this:
"Car A is red. Car B is blue. So car A is faster than car B".
This is also known as a non-sequitur argument.

You are making the mistake here of assuming that the "magic" portable libraries are available on all boxes with no difference in how they work. Great stuff if you don't code in asm, you don't have to look at the difference. Be careful though, this may not fit your theory either.


Re-read my last post about how libraries implemented on different platforms don't have to be written in a portable way themselves. This may actually come as a shock to you, but one of the most common libraries on all platforms is the C-library, and it works pretty much the same on all these platforms... YET it is in many cases written, at least partly, in hand-optimized assembly.
And another thing, even if you DO use assembly, you don't have to look at the differences inside these libs.

Now as far as your humorous comments on 64-bit code in current x86 hardware go, why do you think there ARE 64-bit MMX registers as well as the 128-bit XMM ones? Even you should be able to guess this one: it's because the native 32-bit registers won't hold them and historical fudges like using EAX:EDX pairs are slow. The whole purpose of changing x86 hardware to 64 bit is to address a number of problems I have mentioned: native 64-bit code and memory address range. Now this is probably not technical enough for you and it will not fit your current theory either.


I think you are confused... You cannot actually manipulate 64 bit integers with MMX or XMM registers. MMX supports packed bytes, words and dwords only. SSE supports packed floats and doubles only. No 64 bit integers like with EDX:EAX.
I also fail to see how this is relevant... You want to use a part of the x86 instruction set as proof of how the x86 instruction set lacks such an instruction set, or what?
Besides, you seem to overstate the importance of 64 bit. While it certainly has its uses, it's not the determining factor in most computer graphics-related scenarios.
For one, you don't use 64 bit integers anywhere in the rendering process itself. You use floating point mostly.
And secondly, the 64 bit addressing only matters if you actually need to use a dataset that large, locally.
In distributed rendering systems, such as the Pixar one, this is not relevant either; you just distribute the data cleverly.
And lastly, as said before, since there already ARE 64 bit x86 CPUs, the whole 64 vs 32 bit point is irrelevant to the x86-discussion. If you want 64 bit x86, you can simply go out and buy it, today.

What you have kept assuming here is that if you wait long enough, x86 will turn into something that it's not.


That's not an assumption, that is a historical fact. Just look at how x86 started out, and where it is now. In fact, this is a general rule, that applies to more than x86 alone. Computers in general turned into something that they originally weren't, in many ways.

Okay, let's ask again: What technical problems does x86 have, that make it not suitable for anything larger than PCs (except that it does quite well in the supercomputer arena?), and why would DirectX not suit larger systems, and why would OpenGL? Technical arguments please, this time.
Posted on 2003-11-30 10:33:38 by Bruce-li
f0dder, the AGP bus is a real bottleneck when dealing with large data sets.
When dealing with games, which tend to have less geometry, and textures of varying quality (so you can choose low quality for low-memory cards), the bus rate doesn't come into it (other than how many milliseconds you wait before the game starts).

Although I've seen a demo with 1Gb of textures. Ouch, that makes cards crawl :)

Mirno
Posted on 2003-11-30 11:53:42 by Mirno
Ah, so the AGP bus speed mainly matters when dealing with insane amounts of data... when would this happen? As you mentioned, games tend to try to balance around what current cards can do, and offer lower-quality textures etc. Perhaps in engineering? But there, wouldn't the dataset be limited to geometry? Most screenshots I've seen of engineering software seemed to use non-textured polygons. I've actually been wondering about this for a while, since for games (which stress the rest of the hardware rendering) there seems to be very little difference between AGP 4x and 8x.

No doubt that PCI express will be a great thing, though... even if it might not make much difference for game performance, there's plenty of devices on the current PCI bus struggling for bandwidth. The hotplug capability etc sounds pretty nice, too.
Posted on 2003-11-30 12:06:49 by f0dder
I agree with f0dder... If you have large amounts of geometry, you need to render in a streaming fashion, and then the performance is not just dictated by the AGP bus, but of course also by the graphics hardware itself.
As long as you can upload more dynamic geometry than the card can handle, the AGP bus is not a bottleneck.
Also, since you generally want to visualize only a part of a large set of geometry, this should not be a problem.
You'll want to cull stuff on the CPU first, and/or generate low-res meshes for realtime interaction, and refine the meshes when there's no interaction.
Again, this is more a problem of the graphics hardware's rendering speed than the AGP bus speed itself.

The same goes for textures... You can never visualize such a large set of textures, so you could do some pre-processing and only upload the relevant mipmaps for each texture for the present view... Or again, use lowres textures during interaction, then refine when there's no interaction.
These are very common practices actually.
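
A sketch of that kind of strategy, with hypothetical names and only the selection logic shown:

#include <cstddef>

struct Mesh { /* vertex/index data would live here */ };

struct Model
{
    Mesh lowRes;    // cheap proxy used while the user is interacting
    Mesh highRes;   // full-detail mesh, refined in when the view is at rest
};

// Hypothetical per-frame selection: cull first, then pick a level of detail
// based on whether the camera/user is currently interacting.
const Mesh* selectMesh(const Model& m, bool userInteracting, bool visibleAfterCulling)
{
    if (!visibleAfterCulling)
        return nullptr;                  // culled on the CPU, never sent to the card
    return userInteracting ? &m.lowRes   // keep the frame rate up during interaction
                           : &m.highRes; // refine when the view settles
}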
Posted on 2003-11-30 12:41:08 by Bruce-li

I doubt that the future is based on spheres with ugly shading, running at low framerates without any kind of filtering whatsoever.

It's now. And it's under development.
Let's not compare the software of one man with $300+ nVidia hardware cards.
Are GeForce cards able to do such REALTIME "ugly shading"?
NO. They can't do raytracing. They just accelerate Wolf 3D technology + some advantages + some... + ...
Let's try to compare the power of the CPU and the GPU.
And remember Moore's law.
What about GPU evolution? :grin:

I remember the first 3dfx game, Turok (of course a revolution, on a P200 MMX).
Nowadays, imho, games are not so much better than that was.

Well, when we get a 4x A64 3GHz+ CPU, will flexible software rendering be worse than flat triangles rendered by a SLOW ~500MHz GeForce GPU?

What did I want to say?
I remember the SoundBlaster for $200.
How much does a sound card cost now?
And where on the M/B is it located? :grin:
~$0.50 for connectors and an AC97 codec.
Because if we are going to use $400 speakers, we'll buy a $200 sound card. Just for the quality of sound. No more.

People who will code for 64-bit CPUs could kill modern gfx cards.
But on the other side, GPU vendors have marketing departments...
Posted on 2003-11-30 15:11:59 by S.T.A.S.
Modern graphics cards can do prettier graphics than that simple raytracing stuff you posted. It doesn't even take a top-of-the-line model to do that. I think it'll take quite some time before realtime raytracing can compete with GPUs...

And hell, there's prettier realtime raytracing stuff around.
Posted on 2003-11-30 15:16:37 by f0dder
It's now. And it's under development.
Let's not compare the software of one man with $300+ nVidia hardware cards.
Are GeForce cards able to do such REALTIME "ugly shading"?
NO. They can't do raytracing. They just accelerate Wolf 3D technology + some advantages + some... + ...
Let's try to compare the power of the CPU and the GPU.
And remember Moore's law.
What about GPU evolution?


I don't think we will go to raytracing anytime soon.
Triangle rasterizing simply has too many advantages to just give it up.
It's highly parallelizable, and it can be performed by simple, cheap, and very fast dedicated hardware.
It also allows for very sophisticated texture filtering methods and other forms of antialiasing.
Raytracing is just inherently slow, especially when it comes to triangle meshes, which are the most common models, since triangles are the easiest primitive to model with. Spheres are pretty much useless for modeling anything other than... spheres... (I find that rtrt game rather funny, actually, building everything from spheres... not much realism there :)
"Oh no, help! I'm being attacked by a bunch of spheres!!!! What do I do? Oh I know, I'll shoot some spheres at it, with my sphere-gun!"

"Quick, hide behind the sphere!"
-"AAAAGH WHICH ONE?!")

Currently we use raytracing for preprocessing of the data, generating lightmaps, occlusion maps, normalmaps and such.
While I have no doubt that eventually, the shaders will become sophisticated enough to allow raytracing, I don't think this will mean that we will simply abandon the triangle rasterization approach.
Think of it like the current situation... We have pixelshaders, several versions even. But this does not mean that we use the latest version of shaders everywhere. We use the simplest way we can, to get what we want, because it is fastest.
Likewise, raytracing will probably be applied only where it matters.
One day, hardware might be fast enough to go 100% raytracing, but we're a long way off yet, and triangle-rasterizers can already create stunning results, with excellent speed and quality, approaching raytracing, with reflections, refractions, per-pixel lighting and shadows.

PS: current hardware actually CAN do raytracing... There's a paper on how to implement a simple raytracer on an ATi Radeon 8500, and several people have done some slightly more advanced raytracers on Radeon 9500+ hardware.
(for the ongoing OGL vs D3D discussion: the R8500 raytracer cannot be implemented in OpenGL without using specific ATi extensions for the 8500+, and will not work on any SGI system, or other brand. And until recently, the 9500+ tracer could not be implemented either, but now OGL has the ARB fragment program, which is a ps2.0 clone, so they 'fixed' that... Again not supported on SGI systems though).

PPS: Wolfenstein 3D didn't rasterize triangles, it was a raycaster.
Posted on 2003-11-30 15:22:36 by Bruce-li