I am a bit confused. I have written .drv files for video, keyboard, BIOS code and disk interfaces, and when they are bundled with an OS I can just as easily load my .drv, call init, and then call the internal functions to display a window (really a rectangle with text in it) or load picture files directly through IDE or SCSI. It seems that the OS actually does nothing, because if I use the standard interface to Kernel32 or whatever, the calls just get rerouted to my code in the .drv or .sys file with a little HAL mixed in, and I have written a HAL too. As a result, between the time I make a call to draw and where it ends up, the code in the middle does nothing except force me to put the operands in one order, which is then rearranged into a new order for the call to the .drv. I understand that the advantage of MS is in establishing a standard for software, but that has gone by the wayside, like IBM, who created standards and then lost control. It seems the old standards have become ingrained and a new standard is needed.
Posted on 2007-06-22 19:05:27 by genomist
This resequencing you resent imho makes for a stable system, designed to run on all PC hardware. If you look at it from a higher level, there's much more to Windows as an OS than a simple HAL. It is there to give end-users a kick-start at using computers for a wide spectrum of tasks, and to let developers easily widen that spectrum. It is also there to enable hardware manufacturers to add new functionality to the base system via PCI cards or external peripherals - functionality that software developers can then use easily via the already-present API and third-party extensions.

Why do you think the Win32 standard has gone by the wayside? True, there are parasitic/obsolete APIs that are still supported even nowadays - APIs and ideas spawned by the PC hardware and its limitations at the time the Win standard was being planned. Thus there are many different ways to access some hardware (e.g. soundcards: via winmm, dsound, ASIO, dshow; likewise for networking). But thanks to those many ways of accessing such resources, a developer can easily decide which interface he needs - by knowing the pluses and minuses of each - and make robust software quickly.

Despite the inconsistent naming conventions and some seemingly bad interface ideas, Win32 is the most convenient and complete standard - it supports legacy software and doesn't limit next-generation software/hardware. Win2k finally managed to add stability to that solid standard. That's why w2k is the definitive choice for professionals, and winXP is the definitive choice for regular end-users (thanks to its graphics, skins and easy interface). A new standard? No, thanks - no-one (except device-driver developers) will spend 2x the time developing the same software to run on both the established Win32 standard and anything new.
Posted on 2007-06-23 01:57:11 by Ultrano
You have a valid point that a standard is required and that, however it is established, it profits everyone to cooperate.
I was not implying that the interface to existing software be changed, but that new standards for new hardware be established before people waste effort trying to establish conflicting standards where none presently exist. The standard is adequate for what it does but is not flexible or extensible enough to encompass new technologies.
If I call a function which calls another function with the same arguments, the intervening code is not needed - no matter how many times it moves the data around - if it doesn't change the data itself in some significant way.
The data at the mouse .drv is not changed in any significant way before it is used and interpreted by the program, and the visual or other response is not changed in any significant way before it reaches video.drv or ide.sys or any other hardware interface.
I have in the past short-circuited window calls to improve benchmarks for products, and it does not change the user or software interface. When the PC was first created, the BIOS video call for writing a dot was very slow and ineffective; as a result, people coded around it. Would you force people to use the IBM BIOS? To improve our video card benchmarks, I rewrote the video call and patched out Int 10h. I simply cut out the waste.
The same situation exists now. Neither IBM then nor MS now has any profit motive to deliver good code. As a result, they become obsolete if they do not compete.
The gain is that I can compute faster and as a result, I can do things that are not feasible to do otherwise.
Do you imagine that if I sell a game that delivers 10000 vertices in Windows(r) with the OS bypassed, and delivers 5000 otherwise, anyone would choose the latter? It loads under the OS but doesn't use it. No one ever knows the difference except a few hardcore gnomes like me.


Posted on 2007-06-24 00:53:58 by genomist
genomist,

I believe you've been looking around inside of Windows and have missed how it works. Yes,
the operating system gives you access to routines, many of which call functions that are
almost (or at times exactly) the same, and which then either issue an Int 2Eh or perform
some generic task. But the reason this is done is because of the level at which the routines
are being executed. The Windows API gives you access to a large collection of user-mode
routines (for example CreateFileW); these routines can be accessed by any user on the system.
When these routines are called, they in turn call routines at the native system level (for
example NtCreateFile/ZwCreateFile). These routines, exported by NTDLL, are the ones that
invoke the kernel trap handler, Int 2Eh, which lands in NTOSKRNL (on NT-based systems).

Because of this, the system can effectively separate the user from being able to directly
access the hardware unless the system administrator installs a driver and/or service to
allow them to communicate with it. It adds a layer of stability in an otherwise hostile
environment. Of course you can get faster graphics if you write your own driver to bypass
the OS and interface with the hardware directly yourself. But then you sacrifice portability
and restrict your customers to only the devices that your driver supports. At the same
time as blocking the user from the hardware, this layering keeps developers from having
to deal with too many hardware specifics (unless they get into driver development themselves).

Truth is, if you want all that extra speed of direct hardware access, that's what the WDM/DDK
is for. If not, use what Microsoft has given you access to.

Now, as for Int 2Eh (just a note of warning): if you decide to inspect NTOSKRNL.EXE for the
Windows IDT in an attempt to speed up your code, be forewarned. Microsoft has good reason
for using names rather than function numbers. Because of the massive number of developers
at Microsoft working on the Windows OS, routines get moved around as they are updated,
removed, added, etc. This causes the handlers' numbers to change, not just between versions
but sometimes between builds. So just because the routine selected by EAX for Int 2Eh does
one thing on one Windows XP Home SP1 install doesn't mean it'll do the same on all Windows
XP Home SP1 installs. When dealing with Microsoft, I don't ever suggest using the Int 2Eh
service directly; it's really bad practice.
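
If you absolutely must reach a native routine, resolve it by name at run time instead of burning a service number into your code. A rough sketch (untested; NtClose is just an example of an exported native routine):

#include <windows.h>
#include <stdio.h>

// Matches the documented prototype of the native NtClose routine.
typedef LONG (NTAPI *PFN_NTCLOSE)(HANDLE);

int main(void)
{
    HMODULE hNtdll = GetModuleHandleW(L"ntdll.dll");
    if (!hNtdll)
        return 1;

    // The exported name stays stable across builds; the Int 2Eh number behind it does not.
    PFN_NTCLOSE pNtClose = (PFN_NTCLOSE)GetProcAddress(hNtdll, "NtClose");
    if (!pNtClose)
        return 1;

    HANDLE hEvent = CreateEventW(NULL, FALSE, FALSE, NULL);
    LONG status = pNtClose(hEvent);
    printf("NtClose returned 0x%08lX\n", (unsigned long)status);
    return 0;
}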

For your future reference:
Kernel system calls: EAX in the range 0x00000000-0x00000FFF
Win32 system calls: EAX in the range 0x00001000-0xFFFFFFFF

This design is Microsoft's "standard". Their approach is to make things easy to use and
as portable as possible (hardware-wise) with as little hassle for the user as possible. It's
illogical to think that they would redesign their system for optimal hardware access while
giving up the ease of use and stability of the design they are currently using. When it
really comes down to it, what does the user really want? To have twice the number of
polygons/triangles drawn on the screen, but only on a select collection of supported video
cards? Or to have half the number - still looking okay (since most games have to be dumbed
down to 25-30 fps anyway), just not quite as good - but with the guarantee that it will work
no matter what video card you decide to go with?

:lol:. o O (hmm, sounds like a familiar argument)

So what happens to your game/program when you're writing your graphics using Int 10h to get
twice the number of polygons/triangles and your competitor is using DirectX 9? Then Joe
Customer comes along with an ATI Radeon Xtreme 1100
Posted on 2007-06-24 14:40:04 by Synfire
The various layers in the NT system don't really slow things down that much anyway - only if you do something daft and, say, use functions like GDI SetPixel(). No matter how fast & optimized & bare-to-the-metal your SetPixel() is, I could write graphics routines in C++, Pascal or probably even Python that outperform it, as long as I have framebuffer access.
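
Something like this is what I mean - a quick, untested sketch that uses a DIB section as the "framebuffer" (the 256x256 size and the gradient fill are just placeholders):

#include <windows.h>

int main(void)
{
    const int W = 256, H = 256;

    BITMAPINFO bmi = {0};
    bmi.bmiHeader.biSize        = sizeof(bmi.bmiHeader);
    bmi.bmiHeader.biWidth       = W;
    bmi.bmiHeader.biHeight      = -H;            // negative height = top-down rows
    bmi.bmiHeader.biPlanes      = 1;
    bmi.bmiHeader.biBitCount    = 32;
    bmi.bmiHeader.biCompression = BI_RGB;

    void* bits = NULL;
    HBITMAP hbm = CreateDIBSection(NULL, &bmi, DIB_RGB_COLORS, &bits, NULL, 0);
    if (!hbm || !bits)
        return 1;

    // "SetPixel" by hand: one 32-bit store per pixel, no per-pixel API call.
    DWORD* fb = (DWORD*)bits;
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            fb[y * W + x] = (DWORD)((x << 16) | (y << 8));   // simple red/green gradient

    DeleteObject(hbm);
    return 0;
}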

Even using the *W functions (which skip the ANSI->Unicode->ANSI conversions) instead of the *A functions doesn't buy you much, even on old hardware.

I'd like to see you double the number of delivered vertices per second, considering games bump into hardware limits rather than software limits. And what's with the Int 10h reference? Make up your mind whether you're talking about Windows or DOS+BIOS, please.

I'm not saying people shouldn't focus on writing good code, or that they should totally disregard any amount of overhead. But I'm tired of people who get nitty-gritty about things that are hardly measurable even on several-years-old equipment, and are willing to sacrifice stability/portability/readability for just about zero real gain.
Posted on 2007-06-24 18:14:45 by f0dder
Also, consider that on any recent (NT-based) Windows system, DOS-style apps that use Int 10h and such to access BIOS routines are doing so through a virtual environment. The idea that you would get an enormous increase in speed is very unlikely, since Windows would be loading the virtual DOS machine (basically an emulator) and then, after translating/executing your code, would do the drawing like normal through the WinAPI. I didn't even think about that when I made my post. It kind of negates the whole argument of using Int 10h on Windows (as well as the speed-increase claims). Maybe on DOS, but not on Windows.
Posted on 2007-06-24 19:36:15 by Synfire
Synfire & all:
The short of my argument:
Multiple CPU + single threaded software = bad + slow + fast to code
Multiple CPU + multithreaded software = good + fast + slow to code
Multiple CPU + bypassed single threaded software + multithreaded replacement = good + fast + already done

The only reason I mentioned Int 10h is because I am old and I worked on the design of the first PCs. That refers to a time when the only available interface was VGA and games like the original Sierra adventures were popular. I will try to use more current examples in my arguments. Nobody that I know of uses Int 10h in Windows anymore. It has got to be a drag, switching context to V86 and back to write a single memory byte - eek!
; entry = (0B8000h / 0A0000h / whatever the current standard uses) + 7 (present etc.)
Mov  Eax, Cr0         ; reload CR0 to flush after changing the mapping
Mov  Cr0, Eax
Mov  [0], character   ; as many times as you like - about 100 times faster than Int 10h
Writes are usually rubber-banded and reads stink for performance.
Actually the buffer is at D800.0000h or so, depending on the card; it can be looked up in PCI config space.
I do this all the time, so don't tell me it can't be done.
I suppose you would have to be at ring 0 for that :)

Thus, when I play Half-Life: Source(r), Half-Life 2(r) and Lost Coast:
Half-Life: Source seems like a cartoon and does not give me a sense of being in the game anymore, because I play the newer games. It is the number of textured vertices, the shadows, and the complexity that make it somewhat more real.
A more current example might be the fact that multiple CPUs are becoming more common, and with this there is an advantage to being able to have code that is threaded differently.
I think that a game has started where AMD and Intel will compete on how many CPUs can be utilized in a single package, and it could break Morgan's rule.
How exactly could Wine or Windows or Linux or Mac OS use this, when the operating system has evolved into a state where parallel action is not considered?
As far as acceleration and hardware limits go, that is confusing to me too.
Example:
Video Hardware A is operating at speed X
Video Hardware B is operating at speed 2X
Which one is bumping?
It depends on the ratio and speed of the code executed on the main CPU plus the ratio and speed of the code executed on the VPU.
The formula is:
X/IPS(CPU) + Y/IPS(VPU) = total time per screen
The percentage of time spent executing system tasks is increasing relative to the VPU's execution time, because VPUs are advancing more rapidly than the box they are in.
Suppose that some goofy gnome can implement a 3D interface that runs at 2x the current speed. Does it not make sense that X/1 + Y/2 = 1.5, X/2 + Y/2 = 1, and X/1 + Y/1 = 2, if X and Y are equally balanced computing loads (X = Y = 1)?
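Just to put numbers on it, a throwaway sketch (the unit workloads and speed factors are made up for illustration):

#include <stdio.h>

// Total frame time = cpu_work / cpu_speed + vpu_work / vpu_speed
double frame_time(double cpu_work, double vpu_work, double cpu_speed, double vpu_speed)
{
    return cpu_work / cpu_speed + vpu_work / vpu_speed;
}

int main(void)
{
    const double X = 1.0, Y = 1.0;                                    // equally balanced loads
    printf("today             : %.2f\n", frame_time(X, Y, 1.0, 1.0)); // 2.00
    printf("video side 2x only: %.2f\n", frame_time(X, Y, 1.0, 2.0)); // 1.50
    printf("both sides 2x     : %.2f\n", frame_time(X, Y, 2.0, 2.0)); // 1.00
    return 0;
}
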
So long as video speed*load and system speed*load are matched, it makes no great difference whether the code is able to use the cache, multiple CPUs, avoid stalls, avoid flushes, etc., as it is not holding things back. Are we talking about the new systems that I test with, or currently available retail?
Code reusability is only good so long as THAT code is for something that is used in the future.
You may praise Betamax, but I plan to keep my video on DVD.
Microsoft will not suffer; they will simply announce that they have been working on parallel software for 10 years, that they are way ahead, and that Vista+8 will have it built in. Whether it actually does is not as important as the fact that people will buy it because it uses a buzzword. If people bought computers because they were technically correct, it would be bizarre, because not everybody can be familiar with the technology that they consume. People can't buy what isn't available, and I don't have 200 billion $ to convince people to buy only stuff I recommend :)
I go to the doctor and he tells me that I need LASIK; as far as I am concerned, I don't know sh** about it, but if it is the only game in town, it is what is available, and it works - what the heck?
I would rather have a bionic eye that can zoom and see in ultraviolet and infrared, but it isn't available, apparently.
I think MS and others are waiting for Intel or AMD or somebody to come up with a CPU that accelerates existing code with multiple CPUs, and perhaps that is what will happen instead; however, I can't see how, and I design hardware. If I could do that, that is what I would do, because that is what would sell. There is just no way to pre-perform an operation when it is dependent on another operation that is not yet completed. The coding practice has to be different.
I once told someone that the IBM PS/2 was going to shoot craps, and a CEO told me he was betting everything he had that it wouldn't. It was the common principle that you follow IBM because they have been right for 27 years. I think that he can be reached in the van he lives in down by the river that flows past the country club he used to be a member of.
I think that Google has more power than any other company in the world.
Posted on 2007-06-26 00:26:01 by genomist
"bypassed single threaded software" - making up your own terms? never heard of that before, anyway. Yes, multithreaded software can be hard to write, especially if you want to parallelize algorithms, but it's the only way to take full advantage of SMP.

You don't switch from/to pmode for V86 code.

Framebuffer location is dependent on the card, and afaik you'll need card-specific drivers. Yes, you can look up the ranges that the card would like to have mapped via PCI config calls, but does that tell you what the memory ranges are to be mapped for? My current card has 4 mapped memory ranges, three I/O ranges, and an IRQ.
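
For reference, reading the BARs through the legacy 0CF8h/0CFCh config mechanism looks roughly like this - a ring-0-only, untested sketch (the MSVC __indword/__outdword port intrinsics are an assumption on my part, and the bus/device/function numbers come from whatever enumeration you did earlier). Note that it hands you six raw ranges and zero clue which one is the framebuffer:

#include <intrin.h>

// Read one 32-bit PCI config register via the legacy CF8h/CFCh mechanism.
// Ring 0 only - from user mode these port accesses will simply fault.
static unsigned long PciConfigRead32(int bus, int dev, int fn, int reg)
{
    unsigned long addr = 0x80000000UL | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFC);
    __outdword(0xCF8, addr);
    return __indword(0xCFC);
}

void ReadBars(int bus, int dev, int fn, unsigned long bars[6])
{
    // The six Base Address Registers live at config offsets 10h..24h;
    // bit 0 of each BAR only tells you I/O (1) vs memory (0) - nothing more.
    for (int i = 0; i < 6; ++i)
        bars[i] = PciConfigRead32(bus, dev, fn, 0x10 + i * 4);
}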

No reason to go that low anyway, since DirectX exposes the framebuffer. For modern stuff you do textures and 3D anyway, so having framebuffer access isn't too useful for that. Both DX and GL expose what you need.

"Morgan's rule" -> "Moore's Law", perhaps?


I think MS and others are waiting for Intel or AMD or somebody to come up with a CPU that accelerates existing code with multiple CPUs, and perhaps that is what will happen instead; however, I can't see how, and I design hardware.

I doubt that's going to happen, really. It's possible to design programming languages that better facilitate threading, but it's not easy to suddenly parallelize existing binary code, at least not for x86. Some things are inherently easier to parallelize than others, like GPU shaders...


There is just no way to pre-perform an operation when it is dependent on another operation that is not yet completed.

Speculative/out-of-order execution, which was implemented many years ago on x86. Extending that to auto-thread code is a completely different issue, though.

Anyway, I fail to see a point or even a bit of coherency in your posts :)
Posted on 2007-06-26 05:51:03 by f0dder
As far as gamedev is concerned, Microsoft can't be at fault at all imho. I think gamedevs are at fault, with their bloated C++ code. DirectX is good enough, and OpenGL also seems to be well developed. I don't know how good the ATi and nVidia drivers are, but with today's gamedev'ers relying way too much on the C++ compiler (VC++2k5), its STL, and constant cpu/gpu improvements, I can understand the code bloat we see in every game.

The bottleneck on the number of drawn vertices in games has been solved by vertex buffers; only the gpu's fillrate matters there, imho. That is, provided that driver developers and ATi/nV use the obvious optimization of two extra FIFO queues (placed in memory that is accessible from ring 3), an interrupt for feeding the two smaller FIFOs, and a specialized state-object for feedback from the gpu (all of this in order to remove ring3->ring0 transitions and get max throughput).
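
For reference, filling and drawing from a vertex buffer in D3D9 looks roughly like this - an untested sketch that assumes an already-created IDirect3DDevice9* (the vertex format is made up):

#include <d3d9.h>      // link with d3d9.lib
#include <string.h>

struct Vertex { float x, y, z; D3DCOLOR color; };
#define VERTEX_FVF (D3DFVF_XYZ | D3DFVF_DIFFUSE)

// Create a write-only vertex buffer and copy the vertices in once;
// after that you draw from it every frame without touching it again.
IDirect3DVertexBuffer9* CreateAndFill(IDirect3DDevice9* dev, const Vertex* src, UINT count)
{
    IDirect3DVertexBuffer9* vb = NULL;
    if (FAILED(dev->CreateVertexBuffer(count * sizeof(Vertex), D3DUSAGE_WRITEONLY,
                                       VERTEX_FVF, D3DPOOL_DEFAULT, &vb, NULL)))
        return NULL;

    void* p = NULL;
    if (SUCCEEDED(vb->Lock(0, 0, &p, 0))) {
        memcpy(p, src, count * sizeof(Vertex));
        vb->Unlock();
    }
    return vb;
}

void DrawBuffer(IDirect3DDevice9* dev, IDirect3DVertexBuffer9* vb, UINT triCount)
{
    dev->SetStreamSource(0, vb, 0, sizeof(Vertex));
    dev->SetFVF(VERTEX_FVF);
    dev->DrawPrimitive(D3DPT_TRIANGLELIST, 0, triCount);
}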


To improve the performance of games on the PC, game-engine devs should simply learn almost everything about the hardware - cpu+mobo+gpu+ram - and, most importantly, be fluent in x86 asm, in order to write better engines and use unthought-of optimization tricks instead of their current fantasizing about how computers work lol. Like doing float-comparison with simple integer instructions and removing branches with cmovXX, for instance. Their "SSE optimized" routines are usually plain C++ functions compiled with the SSE switch of VS2k5 >_<.
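
The float-comparison trick, roughly (untested sketch; it assumes neither input is a NaN, and the sign fix-up shown is just one common way to handle negatives - the ternary typically compiles to a cmov):

#include <string.h>

// Reinterpret the float's bits as a signed 32-bit integer.
static int FloatBits(float f)
{
    int i;
    memcpy(&i, &f, sizeof(i));
    return i;
}

// Remap the bits so that plain signed integer ordering matches the float ordering.
// Non-negative floats already compare correctly as integers; negative floats need
// their magnitude bits flipped. (-0.0 ends up ordered just below +0.0.)
static int OrderedBits(float f)
{
    int i = FloatBits(f);
    return (i < 0) ? (i ^ 0x7FFFFFFF) : i;
}

// a < b done entirely with integer instructions, no FPU compare.
int FloatLess(float a, float b)
{
    return OrderedBits(a) < OrderedBits(b);
}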

"bypassed single threaded software" o_O? Only a few things can be made to work in parallel. Forcing just about any single-threaded app to run on 2 cpus at once will crash it immediately. Win2k and later do juggle every app (thread) around cpus, but this is absolutely different.
Posted on 2007-06-26 06:28:54 by Ultrano

(...) but with nowadays gamedev'ers relying way too much on the C++ compiler (VC++2k5), its STL, and constant cpu/gpu improvements, I can understand the code-bloat we see in every game. (...)

Well, my knowledge will never match that of you boys, but... Ultrano, I do not really think the STL or C++ are such bad tools at all. And since we are talking about simplicity, I think it would not be wrong to say that they do their job.

Just for the sake of this conversation, I would say veteran programmers are to blame. They made the STL, they made the compilers, and they are the ones who recommend that people use the tools they provided. The veteran community (usually) gives out the tools but leaves the production programmers alone to figure out their properties (I said usually) without proper documentation. When this happens, it usually ends up in bad implementations (and I must admit I am in this group), which are then copied over and over again by other newcomers to the programming scene. Then these badly implemented projects result in frameworks which are even worse from a best-practice point of view, making a vicious cycle. All in the name of K.I.S.S.

All in all, I would say everything is part of evolution. But on the other hand, asm would not be able to solve all the problems either. So we must live with bad implementations and easier coding, if I may say so.
Posted on 2007-06-26 09:07:42 by codename
My remark on C++ and the STL is that most coders have become too spoiled, even where performance matters. I've discussed exactly this matter with handheld/PDA gamedev'ers, and all of them said "hell, I don't care about the extra performance. C++ does the job, so there's nothing more I can get with optimization". But I can list things that they would otherwise get: more (happy) customers, more gfx, fewer limits on gameplay. Heck, just a week of optimizing parts of the engine in asm, and one gets a fast reusable base for the next games, plus a broader range of features/gameplay elements to add/consider in subsequent games. (In the company I work for, we proved this to ourselves with several games and apps.)

I've already had enough of such developers, with their inexcusable behavior/mindset. Everywhere I see software that literally tortures every customer. Be it ridiculous limitations or slow response times, this is becoming more infuriating every year. And the worst part is that the customer never expresses any disappointment about it, so the whole S&M act goes unpunished. At least, customers do provide "thank you" feedback with words like "wow, this game/app loaded/did_the_job 60 times faster than any other, plus added much more graphics/features" :D

When I come back to Win32 from the handheld world, I am always happy that Microsoft did a good job there. The only thing missing for me is... a SwitchToThread() with a parameter :).

Btw, imho another part of the reason gamedev'ers have become so sloppy is the mindset of "well, tomorrow there'll be the R700 and the GF9, and even faster cpus". Meanwhile, developers of console games know the limits right from the start. That's why "Black" now beats just about any current PC FPS, on hardware from 6 years ago ;). PS2 dev'ers go down to the metal, and monitoring the FIFO queues' load, the stalls on the 3 paths to the PS2 gpu, the cache contents and so on is the basics of optimization there. Light-years away from the PC dev'ers' habit of constantly thinking "hmm, will the next hardware fix this problem automatically for me?".

P.S. codename, the creators of HLLs aren't to blame at all. It's the coders who got so enchanted with HLLs - so much that they avoid learning the hardware like the plague - who are to blame. You can't go blaming the creator of a simple tool if someone misuses it. HLLs are extremely convenient, and relatively few things need optimization, while the rest has to be coded without fantasizing about how the cpu works. Yet coders nowadays blindly avoid asm, and instead have a blurry fantasy about it all, derived from looking at what operations/syntax different HLLs have. Like "the cpu allocates memory automatically for me, and then calls the constructor, and I've trained it that this operator+= will call that function, but I should first make it convert between those two different things, HWND and DWORD". :lol:
Posted on 2007-06-26 15:43:38 by Ultrano
Oh, forgive me if I was not very clear. I agree with you when you say performance is important. But, well, that is why I said 'veteran' programmers, not just HLL coders. What I mean is that HLL authors will develop a new high-level compiler because it is supposed to be high level. In their documentation they hardly cover any topics like "performance". When there is dedicated documentation on performance issues for high-level languages, it is usually not very well written - probably because binaries from high-level languages were not made to be especially fast, but especially easier or faster to code.

While compiler coders write their compilers to increase productivity, other groups of veteran programmers will not care about performance at all in their projects. While the worlds of systems and driver programming are pretty interesting, game developers usually just want the job done. I bet a game developer who works in asm would probably have the optimization ideology, but there are people who code games even in Python or Visual Basic, so those probably just want the job done, I believe. Anyway, this last group of programmers usually gets its skills from veteran programmers who don't care about it at all, and who write articles and frameworks which produce worse code that is, most times, easier to bake.

I wish everybody would write better code too, myself included. However, I still need lots of reading and training for that. I believe the end user never complained about it because the end user probably does not have the required experience with computers. I think the guys at Microsoft (since the topic is about Windows) are aware of it, but on the other hand I wish everybody could make better systems etc. too.
Posted on 2007-06-26 18:22:42 by codename

Like doing float-comparison with simple integer instructions

Hopefully still with some rounding-error compensation? And not just something along the lines of if (*((int*) &float1) == *((int*) &float2)) (i.e., something that ends up as an integer CMP, and possibly (if not already register-allocated) a load-to-register).

I personally wouldn't use the STL for games, since you lose a certain amount of control and only get generic guarantees about execution speed. Sure, you can profile your bottlenecks, but if you port to another compiler, or just build with a different version of your existing compiler, you can get pretty different timings, memory usage, etc.

I'd still say the STL is pretty decent (though not perfect!), and most implementations in this day and age are pretty good - for generic stuff that isn't too sensitive, I have no problem using it. After all, handling 50-100k database entries in STL containers wasn't a problem on an old PMMX-200. PDAs might be a different breed, though; too bad I don't have experience with those systems. I have a (romantic? :)) idea that they'd be interesting to code for, since you might really be able to feel the difference between C and assembly code there (slower CPUs, less mature compilers).

Black looks pretty cool, especially for a PS2 game, but a bit bland imho by today's standards - it lacks high-quality textures and shaders (obviously because of PS2 limits). A couple of games that, imho, show off the PS2's capabilities better would be ICO and Shadow of the Colossus.
Posted on 2007-06-26 18:45:38 by f0dder
Just a note on the "if (*((int*) &float1) == *((int*) &float2))": the equality test has the same problem with the regular FPU comparison. Anyway, in games equality is almost never used, just >, <, <= and >=; and results never need to be max-precise.
Posted on 2007-06-27 04:41:22 by Ultrano
Yeah, the equality test needs to be done by subtracting and checking if the difference is "small enough". And yes, for games you can lose a lot of precision - not good if you're writing engineering software though ;)
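Something like this, roughly (untested; the tolerance is arbitrary, and whether you want an absolute or a relative test depends on your data):

#include <math.h>

// "Close enough" equality: subtract and check the difference against a tolerance.
// The relative term scales with the magnitude of the inputs; the absolute term
// keeps it sane when both values are near zero.
int NearlyEqual(float a, float b, float eps)
{
    float diff   = fabsf(a - b);
    float larger = fabsf(a) > fabsf(b) ? fabsf(a) : fabsf(b);
    return diff <= eps * larger || diff <= eps;
}

// Usage: if (NearlyEqual(dot, 0.0f, 1e-6f)) { /* treat the vectors as perpendicular */ }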
Posted on 2007-06-28 03:50:30 by f0dder