I think the flame war started long before I got here, you just decided to issue blanket statements about people who don't agree with you being idiots. Well, the fact is that the context that the question was asked in has nothing to do with cache issues at all, any instruction cache would be lost when the program branched to another procedure or API call so you are wrong, there is not a cache concern in dealing with the issue presented. I am very sorry that you take this as badly as you do but the fact is that he asked a question dealing with inline strings and calling functions. There is nothing in there about cache hits and the topic is completely unrelated.
Posted on 2004-02-17 05:27:31 by donkey
you just decided to issue blanket statements about people who don't agree with you being idiots.


That's not at all what I said. Again, re-read until you understand what I said.
I merely said that people would agree with me because technically, what I say, makes sense. Read the Intel optimization manual's notes about cache usage, and you too will see that code and data cache are separate.

Well, the fact is that the context that the question was asked in has nothing to do with cache issues at all, any instruction cache would be lost when the program branched to another procedure or API call so you are wrong, there is not a cache concern in dealing with the issue presented.


That is not at all a given, modern-day code caches are very large... 64 KB on Athlon, and 12K µops on P4. It is entirely possible that both an API function and your own calling routine can co-exist in cache. Besides, even though code-cache may be lost, that is still not a reason to ignore data-cache issues.

I am very sorry that you take this as badly as you do but the fact is that he asked a question dealing with inline strings and calling functions. There is nothing in there about cache hits and the topic is completely unrelated.


He asked about C-like strings in MASM. Depending on how you interpret it, he may have actually wanted to have them behave like in C, which is not using the code-section obviously. Since an alternative that was NOT like C was presented, I think it's very much on-topic to at least discuss the differences between this non-C variation and C-like macros. How else will he know what to choose?
Unless of course you insist that the non-C macro was also off-topic, but I am not the one who brought that up.
Posted on 2004-02-17 06:08:13 by Henk-Jan

Unless of course you insist that the non-C macro was also off-topic, but I am not the one who brought that up.


Are you just dense? He was the one who brought up the "non-C macro" in the original post. Maybe it is you who should reread this thread.

From the real topic here
Is there a macro for MASM/MASM32 that lets you pass plain strings like this to a function


So I hardly think saying "Yes there is a macro in the MASM32 library that does this" is off topic. If you want to write your own library go ahead, who cares. But if you want to answer a specific question about a specific library and go off on a tangent that has nothing at all to do with the topic at hand then you will be ridiculed. And no, an API call will not be within 64K of the calling function, but if you ever find one let me know.
Posted on 2004-02-17 07:01:19 by donkey
Note also that he said:
Just like in C

Needless to say that hutch's alternative does not qualify entirely.
It doesn't work JUST like in C. Which was my point, and you didn't seem to get it. So who are you calling dense here?

And no, an API call will not be within 64K of the calling function, but if you ever find one let me know.


Don't tell me that you have never heard of the term "set-associative", and still think that all 64k of the cached data has to be adjacent. I guess you don't know what a cacheline is either? Ah, it's all clear to me now... You argue so endlessly because you have not understood anything of what I said in the first place. Please, read those Intel manuals.
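For anyone following along, here is a minimal sketch of what set-associativity means for the 64K argument. The geometry used in the note below (64 KB, 2-way, 64-byte lines) and the two addresses are assumptions picked for illustration, not measurements, and real hardware may index on physical rather than virtual addresses. All it shows is that the set a line lands in depends on a few low address bits, so code that lives megabytes apart can sit in the cache at the same time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cache geometry, for illustration only. */
struct cache_geom {
    uint32_t size_bytes;  /* total capacity in bytes */
    uint32_t line_bytes;  /* size of one cache line in bytes */
    uint32_t ways;        /* associativity: lines per set */
};

/* Number of sets = capacity / (line size * associativity). */
uint32_t cache_num_sets(const struct cache_geom *g)
{
    assert(g->line_bytes != 0 && g->ways != 0);
    return g->size_bytes / (g->line_bytes * g->ways);
}

/* The set an address maps to: its line number modulo the number of sets.
   Addresses that are far apart can map to different sets, and even when
   they map to the same set, up to 'ways' lines can live there at once. */
uint32_t cache_set_index(uint32_t addr, const struct cache_geom *g)
{
    return (addr / g->line_bytes) % cache_num_sets(g);
}
```

With the assumed geometry `{ 64 * 1024, 64, 2 }` there are 512 sets; an address such as 0x00401000 maps to set 64 and 0x77D42000 maps to set 128, so the two lines do not even compete for the same set.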
Posted on 2004-02-17 07:06:43 by Henk-Jan
Originally posted by Henk-Jan
Note also that he said:
Just like in C

Needless to say that hutch's alternative does not qualify entirely.
It doesn't work JUST like in C. Which was my point, and you didn't seem to get it. So who are you calling dense here?


Well, I think (and I guess everyone else but you does), that the OP meant "just like in C" in terms of source code (for convenience), not in terms of what is going on beneath (which pipeline it goes through, etc).
Posted on 2004-02-17 07:14:35 by Morris
So who are you calling dense here?


Well, it's not me that's for sure. I'm not the one worried about how many clocks I can save calling MessageBoxA. I have argued points like this with Hutch before, amiably and with give and take on both sides, for example my objection to the Switch/Case macro that is more C crap than I can handle and horribly inefficient, but this argument is just stupid.
Posted on 2004-02-17 07:17:42 by donkey
I'm not the one worried about how many clocks I can save calling MessageBoxA.


Yes you are. I never brought up messageboxes at all, you did however.

for example my objection to the Switch/Case macro that is more C crap than I can handle and horribly inefficient


Well yes, we discussed switch/case before, and came to the conclusion that it is almost impossible to write a MASM macro that is as efficient as VC++, let alone as convenient to use.

but this argument is just stupid.


You think it is stupid to argue about technical issues considering asm code and its performance on an asm forum? What do you want to discuss then? The birds and the bees?
Posted on 2004-02-17 07:23:01 by Henk-Jan
Ah, MessageBoxA was an example, obviously you are not capable of seeing that. Would you rather that I provide a full list of all the Windows API functions that use pointers to strings, there are quite a few but obviously you are not capable of understanding that a representative example is just that. Maybe I should be more specific so you can understand in the future, or I'll just type slower. :rolleyes:

You keep bringing up cache issues, f0dder was right that there were other perhaps better ways but decided to be insulting about it because he dislikes Hutch. It is his great failing that he chooses to answer a question by insulting the work of others. He is a very good coder from what I have seen of his work but he acts like a little kid sometimes and comes across like someone who is so inadequate that he must deride others to advance his opinion. In the same way you seem to have chosen to explain your point, by calling everyone who disagrees with you an idiot. Well, if you want to do that why not pick an issue that would actually make a difference then everyone can get pissed off at you for a good reason.

I no longer use MASM and inline strings in GoAsm are properly handled, when I did use MASM I did not use macros at all, preferring hand coded applications and procedures. That did not give me the right to insult others or attack anybody else's work as you, f0dder and Scali seem to think is your right. If you don't like the way something is done in Hutch's MASM32 then write your own MASM32 library and push that one instead. You will find many many threads here in which I have said things like "don't use that function it is terrible in this context" and was not insulting about it and did not call anyone who disagreed an idiot.

Your argument is stupid and pointless, there is no advantage to worrying about the instruction cache before a call to the API period.
Posted on 2004-02-17 07:58:45 by donkey
Ah, MessageBoxA was an example, obviously you are not capable of seeing that.


But I also gave some examples of API functions that might not be slow, or did you miss that part as well?

It is his great failing that he chooses to answer a question by insulting the work of others.


He 'insults' work that he considers bad, and not recommendable to anyone. Still a lot better than insulting someone directly, without any good reason at all.

In the same way you seem to have chosen to explain your point, by calling everyone who disagrees with you an idiot.


I haven't done that, and I suggest you stop bringing that up, especially since you seem to be the only one that misunderstood what I said, and repeatedly refuses to re-read the statement to understand what it really meant.

That did not give me the right to insult others or attack anybody else's work as you, f0dder and Scali seem to think is your right.


As far as I can tell f0dder only made the 'junk' remark, which I would barely classify as an insult... and the other insults came from hutch. As far as I know I have not insulted anyone (at least not directly and/or on purpose), and I did not 'insult' anyone else's work, but merely gave some technical reasons (which are perfectly verifiable with the Intel manuals) why one would prefer certain macros in certain situations.

and was not insulting about it and did not call anyone who disagreed an idiot.


Neither was I.

Your argument is stupid and pointless, there is no advantage to worrying about the instruction cache before a call to the API period.


Correction, YOU find the argument stupid and pointless, but I'm sure that some people are interested in technical issues here.
Also, my statement was not about instruction-cache, but about data-cache, so at least get your facts straight (then again, you have not exactly given me the impression that you understand caching anyway... are you sure the argument is pointless? If you would open up to it, perhaps you can pick up a thing here and there).
Lastly, if you think that all API calls or string handling are not performance-critical by default, you are making a huge mistake.

So either get your facts straight, or just stop talking about things you don't understand.
Posted on 2004-02-17 08:27:04 by Henk-Jan
Originally posted by Henk-Jan
Originally posted by Donkey
In the same way you seem to have chosen to explain your point, by calling everyone who disagrees with you an idiot.


I haven't done that, and I suggest you stop bringing that up, especially since you seem to be the only one that misunderstood what I said, and repeatedly refuses to re-read the statement to understand what it really meant.


Yes you did say that

Originally posted by Henk-Jan
For the rest, any asm programmer that deliberately screws up cache-performance is an idiot, that's why all decent asm programmers must agree with me.


I don't think I am the only one who understood full well what you meant and are now trying to obscure:

Originally posted by Morris

...Who are you to call me an idiot?...


He 'insults' work that he considers bad, and not recommendable to anyone. Still a lot better than insulting someone directly, without any good reason at all.


He was being childish, insulting is never right when he was not provoked. If you will remember his insult came as a first response, not as a reply to anything anyone else had said. That is immature and uncalled for, and I think that he did mention Hutch directly in his post. He wanted to provoke a flame war and he got one.

Correction, YOU find the argument stupid and pointless, but I'm sure that some people are interested in technical issues here.


No, everybody who has commented on it has only referred to the argument as stupid and pointless or completely off-topic. I have not seen a single post that expressed any interest in the folly of worrying about cache misses for API calls. If I missed a post that demonstrated somebody's interest in a pointless topic I apologize, but from what I can tell it is only you who find this useless subject interesting and have chosen to hijack an otherwise mundane thread to demonstrate the fact that you can argue endlessly about something that makes no difference at all. And as an added bonus call anyone who disagrees an idiot.

For example some other random comments from the thread:
Originally posted by Bogdan
I also do not care for the cache unless it is inside one of my inner loops. Showing some message in a Messagebox on screen hardly qualifies for my "inner loop" definition. For the rest of my asm code I am quite happy to use things that are simpler and much easier to understand -- like some HLL constructs


Originally posted by bluffer
isnt there a crusade or some section in this forum apart from main isnt it hiro

why dont you shift this hoopla over there so that all of them who want to fight it can fight it out without annoying the onlookers
Posted on 2004-02-17 08:53:37 by donkey
I don't think I am the only one who understood full well what you meant and are now trying to obscure


Okay, so maybe there are two. Who cares?
I think I know better what I meant by what I said, since I was the one actually saying it.

No, everybody who has commented on it has only referred to the argument as stupid and pointless or completely off-topic.


Were they referring to me, or to the others, such as yourself, who continually try to argue beside the point?
I cannot help that, I only mentioned technical facts. What about all the people that may have found it interesting, but HAVEN'T posted?

from what I can tell it is only you who find this useless subject interesting and have chosen to hijack an otherwise mundane thread to demonstrate the fact that you can argue endlessly about something that makes no difference at all. And as an added bonus call anyone who disagrees an idiot.


First of all, you have proven time and time again that you do not understand the caching system. If anyone should find it interesting it should be you. Asm programmers not interested in cache... that's asm programmers not interested in how their CPU works... which is sort of a contradiction in terms, is it not? You really don't want to defend that statement, do you?
Secondly, drop the idiot-nonsense already. What do you think? If you say I called people who disagree with me an idiot often enough, that people will believe that?
Well in that case:
Donkey, stop calling everyone on this forum an idiot.
Donkey, stop calling everyone on this forum an idiot.
Donkey, stop calling everyone on this forum an idiot.
Donkey, stop calling everyone on this forum an idiot.
Donkey, stop calling everyone on this forum an idiot.
...
Donkey, stop calling everyone on this forum an idiot.

You get the idea. And you dare to call other people childish? Sheesh. Grow up, please. And spend your time in a more useful way, like studying how cache works.
Posted on 2004-02-17 09:11:23 by Henk-Jan
Were they referring to me, or to the others, such as yourself, who continually try to argue beside the point?

I believe Bogdan mentioned his opinion of cache concerns with API functions directly, so draw your own conclusion on that one.

First of all, you have proven time and time again that you do not understand the caching system. If anyone should find it interesting it should be you. Asm programmers not interested in cache... that's asm programmers not interested in how their CPU works...

Thank you but I understand enough to know that it is pointless to worry about the instruction cache before an API call.

You get the idea. And you dare to call other people childish? Sheesh. Grow up, please. And spend your time in a more useful way, like studying how cache works.

What did I do to provoke you, I only got involved in this thread's flame war when you decided to define what a good coder was and called everyone who disagreed an idiot. Maybe you should go back to more useful pursuits instead of thinking of new and original ways to insult people then trying to lie about your intentions later.
Posted on 2004-02-17 09:24:55 by donkey
You call people an idiot all the time, I should know because I'm frequently located at the receiving end.

Now, if you touch that 'report this post to a moderator' link 1 more time or post any following post that does not involve the topic at hand but instead focuses on one of the posters then you're out of here.

My patience has worn out. Reply at your own peril.
Posted on 2004-02-17 09:27:15 by Hiroshimator
You call people an idiot all the time, I should know because I'm frequently located at the receiving end.


Off-the-record perhaps. But I haven't spoken to you in ages. What do you know about me anyway?

or post any following post that does not involve the topic at hand


Excuse me, but I think my analysis of cache behaviour other than in C with a certain macro was quite on-topic. It's not my fault that people start claiming all kinds of things that I never said.
Someone put them up to this. And you don't do anything about it, other than perhaps banning me again. The most talented, experienced, knowledgeable and helpful person on this forum, if he were given a chance. Why don't you tell liars like hutch, or people that don't even know that cache is set-associative, like donkey, to just shut up, and let people who provide valuable info, provide their valuable info?
Go ahead and ban me, I don't care. It's your loss.
Especially donkey's, because now he'll never find out how cache works, and how he can use it to his advantage. Hutch is beyond help anyway, he is so demented that he forgot that he can be and often is wrong.
Posted on 2004-02-17 09:38:22 by Henk-Jan
or people that don't even know that cache is set-associative, like donkey, to just shut up, and let people who provide valuable info, provide their valuable info?


Please don't assume to explain what I know or don't know, that is how this all started. And please don't use me as an excuse for your personal crusade. If your information was in the least bit valuable I would have gladly shut up, but it was more a tirade of insults than anything else. If you will notice before you decided to start calling people idiots I was very civil about the whole thing.
Posted on 2004-02-17 09:54:14 by donkey
Christ...

Okay, first things first - the macro I objected to was szText, not SADD - my bad. :stupid: . That macro, however, is junk, and I would have said that no matter who wrote it. Even if you want to put data in the code section, it's bad.



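szText MACRO Name, Text:VARARG
    LOCAL lbl
    jmp lbl             ; jump over the string, which sits inline in the code section
    Name db Text,0      ; zero-terminated string, emitted right between instructions
  lbl:
ENDM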


Fortunately, the SADD macro in more recent versions of the masm32 package does 'the right thing' - well, almost. Instead of getting to the data section by ".data" and then switching "back" again with ".code", it would be better to implement it (or rather, the 'literal' macro) more like the CTEXT macro I posted - that way, you can use SADD and friends outside the code section, where it can be useful for things like building string pointer tables easily (think indexed error messages etc).
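As a rough C picture of the string pointer table idea (all names and messages here are made up for illustration): the strings live in a data section and the table just holds pointers to them, which is the kind of thing a CTEXT-style macro usable outside the code section would let you build in MASM.

```c
#include <stddef.h>

/* Hypothetical indexed error messages: a read-only table of pointers
   to read-only strings, none of it placed in the code section. */
static const char *const error_messages[] = {
    "no error",
    "file not found",
    "access denied",
};

/* Look up a message by index, with a fallback for out-of-range codes. */
const char *error_message(size_t code)
{
    size_t count = sizeof error_messages / sizeof error_messages[0];
    return code < count ? error_messages[code] : "unknown error";
}
```

Here `error_message(1)` yields "file not found", and any index past the end of the table falls back to "unknown error".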

As for the rest of the stuff in this thread - boy what a bunch of crap, from all sides. It's silly to say you have to write optimal code, and for most uses of a ctext/sadd type macro, speed won't matter. That's still no excuse to use a clearly suboptimal macro like szText, the extra bytes for the jmp would be much better used to put it in the .data section and do "align 4". Scali, you need to remember that a whole lot of people here program in assembly not to squeeze the last drop of performance from the CPU, but because they like assembly or don't know any other languages. While we probably agree that this is somewhat silly, I think it's just fine, and there's no reason to shove HLLs down people's throats.

While I don't really fancy mixing code and data in the same section, I think it's an okay thing to do this (for read-only data anyway), where performance isn't critical. Donkey's example of self-contained procs is a good example of where this can be useful (note, though, that a SEGMENT+ENDS statement could be added inside or at least right after the proc, still keeping the data and code split up nicely in memory). What I object to, however, is when people continuously mix code and data, using JMPs to skip data. szText and PowerBasic EXEs are good examples of this junk. At least learn to place your data after a RET or before a procedure entrypoint.

Btw,

In a portable executable file there is no mechanical separation between data and code and the sections are defined by the read / write / execute attributes only. You can copy binary data into GlobalAlloc() memory and run it from there.

- if you want to stay compatible with future processors, you should refrain from doing this. On AMD64 in PAE mode, per-page 'X' bit has *finally* been added to the IA-32 architecture, allowing for nonexecutable heap and stack. Time to use VirtualAlloc with the correct flags for your JITters. Besides, per-page X bit can be implemented on most existing x86 with some clever tricks, look at The Owl + friend's PAX project (and wkrakers NT4+ port and Theo/OpenBSD's "I did not steal from PAX" implementation). But sure, on around 99% of the existing win32 platforms, that kind of code will run just fine.

As for the whole cache thing... dunno whether scali is right about code cached data not going into the data cache, but I guess he's right - it would be nice if you could specify the intel manual, section and page where it's mentioned, though. And sure it won't matter with your casual API call (not that API calls are the only place you can use invoke and ctext/sadd), and most likely not in 99% of the cases where you'd use a thing like ctext/sadd... that doesn't mean you shouldn't keep it in mind under other circumstances, though. (Oh, and do keep in mind that caches are set-associative and there's speculative reads/writes going on.)

Oh and btw Donkey, I don't rely on HeapAlloc for everything; I promote it instead of Global/LocalAlloc (since those are deprecated), and I raise an eyebrow when people use VirtualAlloc in bad ways (like, allocating small buffers). Your fancy example in a previous thread seemed like pretty good example of when to use VirtualAlloc, since you needed a pretty custom memory management system.

If the whole SGI yadda yadda blah must be brought in, remember the HP sv7 - it uses standard Xeons for the rendering nodes, along with standard NVIDIA Quadro FX - standard PC hardware. And well, surprise surprise, the Onyx4 which hutch himself promoted, uses ATi video chips (the same stuff used on normal desktop "taiwanese terrors") - http://www.xbitlabs.com/news/video/display/20030715001721.html . No, obviously a single/standard PC will not compete with a high-class expensive 'big iron', but PC hardware can.

Also, it would be interesting to find some good information about the new NUMA (I think it's called) style PC hardware, whose purpose is to reduce some of the bus/memory bandwidth bottlenecks of SMP systems, by "grouping" CPUs and memory. Hey, if it has a nice performance/price scale, who cares if it's junky old IA-32.

As for the rest... pages of useless yadda-yadda. Yes, that means you too, scali. And obviously, this post will either be ignored or things will be taken from it out of context and twisted beyond belief.
Posted on 2004-02-17 09:56:01 by f0dder
Please don't assume to explain what I know or don't know, that is how this all started.


Your statement about API functions not being within 64k of your own code made it painfully obvious. I did not assume anything, you handed it right to me.

If your information was in the least bit valuable I would have gladly shut up, but it was more a tirade of insults than anything else.


It would be, if you understood what it is about, perhaps?
Posted on 2004-02-17 10:14:29 by Henk-Jan
you need to remember that a whole lot of people here program in assembly not to squeeze the last drop of performance from the CPU, but because they like assembly or don't know any other languages. While we probably agree that this is somewhat silly, I think it's just fine, and there's no reason to shove HLLs down people's throats.


No, but we don't have to accept it when such people try to decide for everyone what is important or not, do we?

dunno whether scali is right about code cached data not going into the data cache, but I guess he's right - it would be nice if you could specify the intel manual, section and page where it's mentioned, though.


Of course he's right. Especially on P4 it should be painfully obvious that L1 cache is not shared, since the code-cache is of an entirely different type. For older CPUs the same rules apply though.
See http://www.intel.com/design/pentium4/manuals/248966.htm, page 1-18: "The first level cache (nearest to the execution core) contains separate caches for instructions and data."
page 2-5: "Avoid mixing code and data"
to name the two most important ones.
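To make the "avoid mixing code and data" advice a bit more concrete, the sketch below checks whether two addresses share a cache line. The helper name is made up, and the 64-byte line size and the addresses in the note are assumptions for illustration only. A string placed directly behind a jmp, as szText does, will usually sit in the same line as the surrounding instructions, so that one line is wanted both as code and as data, which is the situation the manual advises against.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when both addresses fall inside the same cache line,
   0 otherwise. line_bytes must be a non-zero power of two. */
int same_cache_line(uintptr_t a, uintptr_t b, uintptr_t line_bytes)
{
    assert(line_bytes != 0 && (line_bytes & (line_bytes - 1)) == 0);
    uintptr_t mask = ~(line_bytes - 1);  /* clears the offset within a line */
    return (a & mask) == (b & mask);
}
```

Assuming 64-byte lines, a jmp at 0x00401000 and a string starting at 0x00401005 share a line, while the same string placed in a separate data section at, say, 0x00403000 does not.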

Hey, if it has a nice performance/price scale, who cares if it's junky old IA-32.


I do, I do!!! :)
Posted on 2004-02-17 10:28:00 by Henk-Jan
Hi f0dder,

Thank you for bringing some sanity back to this topic, I agree with everything you said. Sorry about the HeapAlloc thing, it just seems that way from the posts I see and that we are both involved in answering.

My only objection was to the personalized insults that were being thrown around without provocation and the lumping of everyone who disagrees with a coding strategy into a single category. Scali seems to assume that because I say that worrying about the instruction cache for an API call is pointless I didn't read the Intel manuals and must be an idiot. His method of explaining something is to throw out insults and arrogance then when somebody has the temerity to say it doesn't make a difference in the context of the question, he makes up his own question and calls them an idiot. Then when you take exception to being called an idiot he lies about his intention in the remark. A thoroughly distasteful person who has nothing to offer that I am interested in and it seems that no-one else is interested in his opinion either.

(note, though, that a SEGMENT+ENDS statement could be added inside or at least right after the proc, still keeping the data and code split up nicely in memory)


Inline strings in GoAsm do not need the same treatment as MASM, GoAsm has this functionality built in and does not introduce the problem in the first place.
Posted on 2004-02-17 10:48:46 by donkey
because I say that worrying about the instruction cache for an API call is pointless I didn't read the Intel manuals and must be an idiot.


I thought I said the issue was your self-inflicted 64kb cache statement (I never even mentioned instruction cache in the first place, it has nothing to do with the issue I explained, you are the only one bringing that up. Are you trying to argue with yourself? I surely won't argue this point, I never made it). That pretty much proves that you didn't read the manuals, or at least, did not understand them fully. And it's not that which would make you an idiot, but the fact that you would deny what is in the Intel manuals. As if you know it better than the guys that build the CPUs themselves?
Besides, you are the annoying distasteful blablah-whatever person here, by dragging personal issues along endlessly.
At least I have actually added valuable technical facts to this discussion. What have you added again, other than noise and slander?
Posted on 2004-02-17 11:05:43 by Henk-Jan