Well, the other night I was browsing the RISC, CISC, VLIW and EPIC pages of Wikipedia, and I read again about the Harvard architecture, which I quite like, and I learnt about the von Neumann bottleneck. Hardware architecture is such a wonderful subject, almost philosophical sometimes... :D

I understood that current mainstream processors are quite similar and general-purpose, and in some way already designed for ease of programming and abstraction: the model where code and data are both just data, with programs as abstract, independent, preemptive threads, an execution stack for return addresses, arguments and local storage, plus a pool of allocatable memory called the heap, all of this on top of a protected virtual memory system and controlled by an operating system. The Unix model, somehow.
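
To make that model concrete, here's a tiny C sketch of the stack/heap split it gives you (just an illustration, nothing exotic, all names made up):

    #include <stdio.h>
    #include <stdlib.h>

    /* Locals and the return address live on the execution stack;
       malloc() hands out memory from the heap. */
    int sum_squares(int n)
    {
        int acc = 0;                         /* stack: local storage       */
        int *tmp = malloc(n * sizeof *tmp);  /* heap: allocated at runtime */
        if (tmp == NULL)
            return -1;
        for (int i = 0; i < n; i++) {
            tmp[i] = i * i;
            acc += tmp[i];
        }
        free(tmp);                           /* heap memory must be given back       */
        return acc;                          /* return address popped from the stack */
    }

    int main(void)
    {
        printf("%d\n", sum_squares(10));     /* prints 285 */
        return 0;
    }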

And I discovered somewhere the concept of configware (http://en.wikipedia.org/wiki/Reconfigurable_computing), and I spent some hours reading about these strange designs and all the promises they hold, and it slapped me in the face quite hard. I remember when I had a huge crush on hardware raytracing, like five years ago, when I saw some research project on it, and I was bugging everyone around me :) about the future and all, when really nobody cared. Now I see the hype is becoming mainstream, Intel has hired the very guys from that very team in Germany and is preparing to conquer the world with parallel "software" tracers on a hardware accelerator, and I can say "I told you so"... :D

Well, I felt quite the same about this, but I feel it has the potential to revolutionize both hardware and software... computer science itself... the revenge of the people who are close to the hardware, in some way :D... Don't code UML, models and classes... THINK of your current task at the hardware level, microcode it, without objects or loops... with gates, then feed it data and maybe some instructions, and let it roll!
Then change it again... It seems to me this calls into question the way systems work today... but then again I might be wrong, since I'm far from having understood much about these systems. It definitely sounds interesting, though.
It is a generalisation of the concept of a processor... so many new things to discover... That could be an answer to madprgmr's rant about everything being too perfect to feel any challenge or fun anymore :)

And this: http://xputers.informatik.uni-kl.de/staff/hartenstein/lot/HartensteinSantaFe04.ppt#1813,55,Diapositive%2055 is f***ing hilarious... :)
Posted on 2008-02-15 21:11:20 by HeLLoWorld
Completely raytracing all graphics in games is not viable, and never will be the optimal approach. The basic reason is ... that von Neumann syndrome. Quite well seen, imho, in ATi's HD2900 card: 320 parallel processors that barely beat the performance of 32.
And the other reason is the constantly increasing expectations of gamers for better graphics. I was also all gung-ho about raytracing, deciding something like "I'll skip studying these damn rasterizers and wait for that raytracing card to come out". But with shaders, multiple passes and smart drawing algorithms, you get the best of rasterizing and, only in the places where it's necessary, the most beautiful graphics raytracing can give (by approximating the reflected/refracted scene via a texture or a cube map). Here's a complete article on the subject:
http://beyond3d.com/content/articles/94/

The von Neumann syndrome is kinda tackled by the Cell BE CPU with its 8 synergistic processing elements.

Configware can be a nice addition, but it'll be too inconvenient, and thus financially risky, for Intel and AMD to add to x86 CPUs, even as just a small unit like the FPU and SSE. But the FPU and SSE were obviously necessary, while we can live without configware (there won't be many coders who can tackle it, imho).
Posted on 2008-02-16 06:38:48 by Ultrano
:sad:
Posted on 2008-02-16 10:47:35 by HeLLoWorld
:D
Posted on 2008-02-16 10:48:06 by HeLLoWorld
http://beyond3d.com/content/articles/94/

A great part of this I already knew. Research concerning dynamic scenes may have things to offer. For linear transformations of scene objects I think it's OK (you can merge the BSPs), but I reckon a fully dynamic scene (which is what is desirable) is a problem.

Cone or pyramid tracing can solve the problem of aliasing; I've read papers on this. Of course it's more expensive, but there should be many shortcuts... It's true, though, that tracing an exponentially increasing number of secondary rays is bad for performance (zero cache coherence after one or two bounces).
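
Just to put a number on that blow-up, a little C sketch (the branching factor and depth are made-up illustration values, not from any particular tracer):

    #include <stdio.h>

    /* Rough count of rays spawned per pixel when every hit spawns `branch`
       secondary rays (say one reflection + one refraction), up to `depth`
       bounces: 1 + b + b^2 + ... + b^depth. */
    static unsigned long rays_per_pixel(unsigned branch, unsigned depth)
    {
        unsigned long total = 0, level = 1;
        for (unsigned d = 0; d <= depth; d++) {
            total += level;
            level *= branch;
        }
        return total;
    }

    int main(void)
    {
        for (unsigned depth = 0; depth <= 6; depth++)
            printf("branch=2, depth=%u -> %lu rays\n",
                   depth, rays_per_pixel(2, depth));
        /* each extra bounce roughly doubles the work, and those rays scatter
           all over the scene, so cache coherence dies fast */
        return 0;
    }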

These proven results suggest that traditional Whitted ray tracing has relatively low lighting and image quality, and requires largely static scenes compared to what we are used to already

This is blatantly dishonest.

one, the "low lighting quality" is just a way of saying "it doesnt solve the equations perfectly", while its still orders of magnitude better than rasterization alone, which _absolutely_ doesnt take into account the environment of a given poly (radiosity etc). And, environment textures, while they work well, are just not the right way of doing things (zoom, you see texels, dezoom, you waste computing and must filter ; dynamically compute, you still have the problem of mipmaps etc. pixel based approch is the right way).

two, the "low image quality" due to aliasing can be solved by volume rays. Saying this is like saying rasterizing cant properly render textures cause you have to filter them if you zoom out. there are algos that accumulate the "amount" of color that a ray must take wshen it passes very close to some edge of an object.

three:"requires largely static scenes": this is incredible. open your eyes! rasterizer engines are growing in complexity just because of this. rasterizers today just can't do without bounding boxes, BSP, PVS, portals, and LOD, (not to mention precomputed scene lighting). they try to avoid to transform and trace everything not in front of the camera, and everything occluded. And, they try to trace front to back.

The key factor is: are we in the case where polys << pixels (the 90s), polys ~= pixels (the 2000s), or polys >> pixels (the future, maybe, without LOD)?

As an unrelated side note, I think the next step will be to get rid of textures as we know them. One could already store DCT coefficients instead and decode on the fly without needing to filter, although that would be expensive. But sampling reality and storing it in a big array is quite inelegant and space-hungry; I think the right way is to make everything procedural.
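
In that procedural spirit, even something as dumb as this already gives a resolution-independent pattern (a toy C sketch, nothing like what a real engine would ship):

    #include <math.h>
    #include <stdio.h>

    /* Toy procedural texture: a colour computed directly from (u, v),
       valid at any scale - no stored texels, no mipmaps, no filtering.
       The pattern itself (stripes warped by a sine) is arbitrary. */
    static void wood_like(float u, float v, float out[3])
    {
        float rings = sinf(20.0f * u + 4.0f * sinf(8.0f * v));
        float t = 0.5f + 0.5f * rings;        /* map [-1,1] to [0,1] */
        out[0] = 0.45f + 0.35f * t;           /* brownish gradient   */
        out[1] = 0.30f + 0.20f * t;
        out[2] = 0.15f + 0.10f * t;
    }

    int main(void)
    {
        float c[3];
        wood_like(0.25f, 0.75f, c);           /* sample anywhere, at any zoom */
        printf("%.3f %.3f %.3f\n", c[0], c[1], c[2]);
        return 0;
    }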
Posted on 2008-02-16 11:37:57 by HeLLoWorld
But I wanted a damn talk about reconfigurable machines! :mad:  :D
Posted on 2008-02-16 11:42:31 by HeLLoWorld
OK, I'll hush about the 3D stuff.

As far as I understood, the hardware should be FPGA-like, only with SRAM-controlled linking between cells, using interconnections between the gates instead of a single data bus, and chaining gates just like we chain asm instructions. Where there's a comparison+jump, you simply do routing, like in microelectronics. All this looks like using an FPGA that doesn't age when it's reprogrammed, and relying heavily on SMC.
It keeps only the necessary data in local D flip-flops, starts ALU execution immediately, and the next operation starts right after the ALU completes. Also, you could be doing other stuff in parallel.
So basically they only get rid of instruction decoding. And jumps could be done in two ways: 1) pipelining streamed data (very strict timing, no loops/recursion); 2) using the whole pipeline for one chunk of data, where only one row of ALUs of the pipeline works at a time, but you get loops/recursion.

It doesn't sound like it universally beats CPUs at everything. To me it sounds like a good way to make customizable supercomputers for solving problems where random access to memory is rare or nonexistent. Stream in, stream out, reconfigure, stream in, stream out - that's it. It's awesome for some things, where you can massively parallelize and then send computed results directly to other ALUs, skipping cache memory, do more calculations, and finally stream out. Then, for solving another part of the problem, reconfigure the gates and send another stream in... Sounds like only DSP and N-dimensional math problems can need configware.
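
In software terms the pattern would look something like this (only a plain-C analogy of what the fabric would do in space; the stage functions are invented for the example):

    #include <stdio.h>
    #include <stddef.h>

    /* Software analogy of "stream in -> fixed pipeline -> stream out": the three
       stage functions stand in for rows of configured ALUs wired back to back.
       On the fabric they would all work at once on successive samples; here we
       just chain them per element. */
    static float stage_scale(float x) { return 0.5f * x; }
    static float stage_bias (float x) { return x + 1.0f; }
    static float stage_clamp(float x) { return x > 4.0f ? 4.0f : x; }

    static void run_stream(const float *in, float *out, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = stage_clamp(stage_bias(stage_scale(in[i])));
        /* "reconfigure" would mean rewiring the stages, then streaming
           the next block of data through the new pipeline */
    }

    int main(void)
    {
        float in[4] = { 1.0f, 2.0f, 8.0f, 16.0f }, out[4];
        run_stream(in, out, 4);
        for (int i = 0; i < 4; i++)
            printf("%.1f ", out[i]);       /* 1.5 2.0 4.0 4.0 */
        printf("\n");
        return 0;
    }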
Posted on 2008-02-16 15:40:06 by Ultrano
OK, I'll hush about the 3D stuff.

Mmh, that makes me sound dull :D. In fact I don't mind, I wouldn't have replied otherwise; I know I don't hold the Truth(tm), I just like arguing to see what others have to say. I also know that, often, things have a good reason to be the way they are :)


So basically they only get rid of instruction decoding

It's not unlikely that this speeds things up by several orders of magnitude, no? As they say, "programming in space instead of in time".
And maybe loops aren't that needed when you have this flexibility... I don't know. And recursion is overrated anyway :D

Sounds like only DSP and N-dimensional math problems can need configware

Mmh, I don't know... maybe the gates will be reconfigurable in real time, maybe there are more possibilities...
Posted on 2008-02-16 17:41:59 by HeLLoWorld
Instruction decoding is quite a non-problem, especially on the Athlon 64 and later. And quite a non-problem anywhere, when the code fits in the L1 i-cache.

But anyway, I think there are enough unknowns that it's impossible to judge whether CPUs or configware would be superior:
1) How many picoseconds does a modern ALU on modern silicon take for each common bitwise/integer/FP operation?
2) Can the ALU signal operation completion early (using mod-N comparison of the output, iirc), or does it use delay lines?
3) How many picoseconds are usually wasted in CPUs on the wiring around the ALU?

Maybe he shouldn't underestimate the CPUs' out-of-order execution.
There's also a conflict in his design, between the units being able to change the address of the data stream dynamically at no cost, and doing nested loops.
Btw, if I read correctly, he did state that the gates are not SMC during runtime, but didn't say that there's also a circuit to do that specifically.
Maybe it's because I'm too tired now, but I have the feeling this design is full of holes and isn't suitable for much (nothing that couldn't anyway be done in an FPGA without missing any useful opportunities). It seems to have been thrown out by its initial designers, and from what I saw online the design is mostly in a vaporware state even now. Or did I miss the URLs of a commercially designed, or at least prototyped (even if only homebrew), IC? The compiler is also nowhere to be found, and neither are any examples. Just syntax info, "how to use the compiler", and a bunch of funny-looking pages from the same author. It smells like the concerns I've been expressing are genuine. Or, if there's a prototype, a working schematic, a working compiler/assembler, or a finalized non-self-conflicting design spec (that also doesn't surpass even me by orders of magnitude in smirking at things), please post a link :).
Posted on 2008-02-16 20:30:05 by Ultrano
Mmh, no, I don't know of any more links... and the pages are quite old and things don't seem to have progressed... I guess you're skilled enough to judge... I just found the concept very cool.
Posted on 2008-02-16 21:08:02 by HeLLoWorld