ddddddoh!!!!!!
and if I f_ckin do WANT to produce it? nasm rules (although nasm sux for not having ORG)

Why would you ever want a non-optimal encoding of your instructions? :)
Anyway, you can still handcode the opcode in MASM that way, clumsy perhaps, but it's possible :)


Isn't the OPTION NOSIGNEXTEND intended for this purpose (at least for AND, OR, XOR)?

Syntax: OPTION NOSIGNEXTEND

Description:

The NOSIGNEXTEND option disables the generation of the sign-extended form (83h) of the AND, OR, and XOR instructions. The NEC V25 and V35 microprocessors do not support this opcode.
-o-

BTW, both V25 and V35 support this opcode.
http://www.sandpile.org/post/msgs/20004012.htm
Posted on 2003-12-12 12:44:14 by MazeGen
Ah interesting, thanks... Didn't know that MASM had an option for it... Then again, I never needed it ;)
But this should make HeLLoWorld happy :)
Posted on 2003-12-12 12:46:57 by Bruce-li
Yeah, I thought there had to be such an option. :)
Posted on 2003-12-12 12:50:27 by QvasiModo
perhaps there are also per-instruction ways of doing this? Experiment with using "typecasts" on immediates?
Posted on 2003-12-12 12:51:38 by f0dder
BruceLi:

Well it's an old trick... For example, if you want to load -1 into eax, I believe that or eax, -1 is the shortest possible way (this is what compilers have been doing lately anyway).
This is because there are multiple forms for encoding immediate operands.
Some instructions can store eg -1 as a byte (0xFF), and it is expanded to -1 as a dword (0xFFFFFFFF) by the CPU before it is fed to the execution unit. This means the code size in memory is still small.
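
(A minimal sketch of the sign-extension step described above, in Python rather than asm so it can be run anywhere. The function names are made up for illustration; the byte sequences are the standard encodings of `or eax, -1` with an 8-bit immediate and `mov eax, -1` with a 32-bit immediate.)

```python
def sign_extend_imm8(byte: int) -> int:
    """Widen an 8-bit immediate to 32 bits, as the 83h opcode forms do."""
    assert 0 <= byte <= 0xFF
    # If bit 7 is set, the upper 24 bits are filled with ones.
    return (byte | 0xFFFFFF00) if (byte & 0x80) else byte


def fits_imm8(value: int) -> bool:
    """True if a signed 32-bit immediate can use the short sign-extended form."""
    return -128 <= value <= 127


# Raw encodings, for a size comparison only.
OR_EAX_M1_IMM8 = bytes([0x83, 0xC8, 0xFF])               # or  eax, -1
MOV_EAX_M1_IMM32 = bytes([0xB8, 0xFF, 0xFF, 0xFF, 0xFF])  # mov eax, -1

if __name__ == "__main__":
    print(hex(sign_extend_imm8(0xFF)))
    print(len(OR_EAX_M1_IMM8), "bytes vs", len(MOV_EAX_M1_IMM32), "bytes")
```

The stored byte stays 0xFF in memory; only the value fed to the ALU is widened, which is why the short form saves code size.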


I read this... I re-read this... hum, it's complicated... I re-re-read this... okay, I understood. Thank you!
How obfuscated, but after all, that's what optimisation tricks are about...

I must confess that I see absolutely no good reason for using two's complement notation to represent signed numbers. I thought about it, and I don't like this damn obfuscated way of seeing things.
I like simple things, and this just sux. In my perfect world there would be a sign bit (where? didn't manage to decide yet :) ) and that's all. Of course you've got to build two logic-gate-based operators, add and sub, and check the 4 possible combinations of sign bits to decide what to do accordingly. In my mind that's still immensely better: I conceptualize signed numbers as points on an infinite one-dimensional line with a center, not as a circle that wraps around :). But I never heard of a machine built on this; maybe ppl at the beginning took the first thing someone with a theoretical mind thought of, and then everyone used it.

If I had been born earlier and had imposed my views of what the computer world should look like (god preserved us from this, you surely think :grin: ), neg would be simpler, tests would be simpler, and you f_ckin wouldn't need to negate a negative number before multiplying it. (Hell, numbers are numbers (of something): I conceptualize 3*10 as 3 times 10, and 3*-10 as 3 times -10, and -3*10 as... 10*-3 (okay, I cheated :) ) and... -3*-10 as... hum... okay, I surrender :) )

tell me what you think of it PLEASE! (or did i miss something again? :grin: )


Why would you ever want a non-optimal encoding of your instructions?
Anyway, you can still handcode the opcode in MASM that way, clumsy perhaps, but it's possible


Well, it's just that I don't like things happening behind my back, but you're right, I probably have to admit others know better than me what should happen behind my back... :) But I've got the right to produce slow code!

I could imagine situations where I would have a processor executing two programs and switching between them every opcode (or two processors executing them on the same clock, or...), and these two programs would be the same except that one source has eax:=-1 and the other eax:=0, and I would wonder why at some point the two EIPs differ, and I wouldn't want this... and I would spend the night searching for where it comes from, and f_ckin MASM has swallowed 3 bytes... What? You don't care? :grin:



ps: and YES, there would be two zeros in my perfect world: +0 and -0. ;) new metaphysical food for the mind to think of! :)
Posted on 2003-12-12 12:54:19 by HeLLoWorld
Perhaps you can use PUSHCONTEXT/POPCONTEXT or such to temporarily change the mode of opcode-generation...
Posted on 2003-12-12 12:54:46 by Bruce-li

perhaps there are also per-instruction ways of doing this? Experiment with using "typecasts" on immediates?


Interesting:

and eax, BYTE PTR -1

and eax, -1

0040102C |. 25 FFFFFFFF AND EAX,FFFFFFFF
00401031 |. 83E0 FF AND EAX,FFFFFFFF
Posted on 2003-12-12 13:01:42 by donkey

I must confess that I see absolutely no good reason for using two's complement notation to represent signed numbers. I thought about it, and I don't like this damn obfuscated way of seeing things.
I like simple things, and this just sux. In my perfect world there would be a sign bit (where? didn't manage to decide yet :) ) and that's all. Of course you've got to build two logic-gate-based operators, add and sub, and check the 4 possible combinations of sign bits to decide what to do accordingly. In my mind that's still immensely better: I conceptualize signed numbers as points on an infinite one-dimensional line with a center, not as a circle that wraps around :). But I never heard of a machine built on this; maybe ppl at the beginning took the first thing someone with a theoretical mind thought of, and then everyone used it.

No, I guess they just went for the option that would let them make a simpler chip (rather than simpler ASM programming :) ).

If I had been born earlier and had imposed my views of what the computer world should look like (god preserved us from this, you surely think :grin: ), neg would be simpler, tests would be simpler, and you f_ckin wouldn't need to negate a negative number before multiplying it. (Hell, numbers are numbers (of something): I conceptualize 3*10 as 3 times 10, and 3*-10 as 3 times -10, and -3*10 as... 10*-3 (okay, I cheated :) ) and... -3*-10 as... hum... okay, I surrender :) )

tell me what you think of it PLEASE! (or did i miss something again? :grin: )

Mhm, what do you mean "to negate a negative number before multiplying it"? You have IMUL. (Or maybe I misunderstood you?)

ps: and YES, there would be two zeros in my perfect world: +0 and -0. ;) new metaphysical food for the mind to think of! :)

You'll love FPU programming then :grin:

EDIT:
@Donkey: Very clever... :alright:
Posted on 2003-12-12 13:07:01 by QvasiModo
And only dumb ppl will say "with two's complement, my son, you can add negative numbers the same way you did with the others! Isn't that a wonderful thing?" No it's not, get lost. What's hard about doing a physical add and a physical sub? Btw, sign extension would be simpler too, and surely a looot of things I didn't think of.

Oh, and I know mathematicians define sub(n) as add(-n), where -n is the opposite in the group/ring/body (these are terms from Évariste Galois's theory; curious to know what they are in English? (groupe/anneau/corps) ), but I have the right to disagree :) Btw, I forgot how they define -3*something... must be somewhere in my school books.

(these were the thoughts of a paranoiac madman who had a hard time grasping two's complement :grin: )
Posted on 2003-12-12 13:07:25 by HeLLoWorld

And only dumb ppl will say "with two's complement, my son, you can add negative numbers the same way you did with the others! Isn't that a wonderful thing?" No it's not, get lost. What's hard about doing a physical add and a physical sub? Btw, sign extension would be simpler too, and surely a looot of things I didn't think of.

Oh, and I know mathematicians define sub(n) as add(-n), where -n is the opposite in the group/ring/body (these are terms from Évariste Galois's theory; curious to know what they are in English? (groupe/anneau/corps) ), but I have the right to disagree :) Btw, I forgot how they define -3*something... must be somewhere in my school books.

(these were the thoughts of a paranoiac madman who had a hard time grasping two's complement :grin: )

LOL :grin:

Yeah, but seriously you'd need a more complicated math unit to handle 2 addition and subtraction algorithms instead of just one. But you have a right to complain alright! :grin:

BTW, I think -3*X = Remainder of (n-3) * X / n (or something like that)
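
(A quick check of that remainder formula, sketched in Python under the assumption that n is the modulus of the register, here 2^32 for a 32-bit register. The names are illustrative only.)

```python
N = 2 ** 32  # modulus of a 32-bit register (assumed for this sketch)


def mul_neg3(x: int) -> int:
    """Compute -3 * x as a 32-bit result using only non-negative numbers."""
    return ((N - 3) * x) % N


def as_signed(bits: int) -> int:
    """Read a 32-bit pattern back as a signed two's complement value."""
    return bits - N if bits >= N // 2 else bits


if __name__ == "__main__":
    print(hex(mul_neg3(5)), as_signed(mul_neg3(5)))
```

N - 3 is the bit pattern of -3, so one unsigned multiply followed by truncation gives the signed answer with no separate sign handling.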
Posted on 2003-12-12 13:11:24 by QvasiModo
HeLLoWorld, if you were born early enough, you could have gotten the machine you are looking for.

IBM had a series of computers (704, 709, ...) that used signed-magnitude for fixed-point numbers, and the circuitry was definitely more complex. Interestingly, the simplest design was to convert negative numbers to two's complement form before addition. After addition, negative results had to be converted from two's complement form. It hasn't been totally abandoned, though. Look at floating-point numbers: they use a sign + positive value representation.

To a hardware designer, two's complement is great because it is simpler circuitry for addition.
All it does is represent every number with n value bits as 2^(n+1) + V, where V is the number value (positive or negative), truncated to n+1 bits (the n value bits plus the sign bit).
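
(A minimal sketch of that rule in Python. Here `width` is the full word size including the sign bit, and the function names are invented for the example.)

```python
def encode(value: int, width: int) -> int:
    """Two's complement bit pattern of value in a width-bit word."""
    assert -(2 ** (width - 1)) <= value < 2 ** (width - 1)
    # Add 2^width, then truncate to width bits.
    return (2 ** width + value) % (2 ** width)


def decode(bits: int, width: int) -> int:
    """Signed value of a width-bit two's complement pattern."""
    # A set top bit means the pattern stands for bits - 2^width.
    return bits - 2 ** width if bits >> (width - 1) else bits


if __name__ == "__main__":
    for v in (5, -1, -128):
        print(v, format(encode(v, 8), "08b"))
```

Positive values come out unchanged because the added 2^width is truncated away; negative values land in the upper half of the range.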

It is definitely harder for a software person to read.

There were also a lot of machines that used one's complement. Negation was very easy - just flip all the bits. There was this end-around carry you had to do for addition, though.
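
(For the curious, a small Python sketch of one's complement negation and the end-around carry, assuming 8-bit words. The helper names are made up.)

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def ones_neg(x: int) -> int:
    """Negate by flipping every bit."""
    return x ^ MASK


def ones_add(a: int, b: int) -> int:
    """Add two one's complement words with end-around carry."""
    s = a + b
    if s > MASK:
        # The carry out of the top bit is added back in at the bottom.
        s = (s & MASK) + 1
    return s


if __name__ == "__main__":
    print(ones_add(5, ones_neg(3)))               # 5 + (-3)
    print(format(ones_add(3, ones_neg(3)), "08b"))  # 3 + (-3)
```

Note that 3 + (-3) comes out as all ones, which is the "negative zero" of one's complement.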

You want readability? How about a decimal machine? There were machines that had a fixed number of digits, and there were machines that had a variable number of digits. We had several decimal representations: BCD, excess-3, hey! we even had bi-quinary.
Posted on 2003-12-12 17:09:17 by tenkey
HeLLoWorld,

What I originally posted for Alex was a simple example to demonstrate the differences between older processors and a PIV. Using back to back shifts is a rather clunky way to do the rounding down operation but for a demonstration of how shifts perform differently on a PIV to earlier hardware, it worked fine.

What was being demonstrated was replacing left shifts with 2 adds which, according to Intel documentation, run a lot faster than a left shift. Intel publish a reasonable number of instruction comparisons to demonstrate the difference between the PIV and earlier Intel hardware; another is the preferred usage of,


test al, al
over
cmp al, 0

A number of arithmetic integer instructions run at twice the speed which gives you a preferred instruction set while other instructions run slower, LEA is an example here and it is unfortunate as it was a very fast instruction on a PIII down to 486.

For people who need to write code that performs properly on a PIV yet runs OK on earlier hardware, the optimisation manual is very useful here and usually benchmarking backs up the published preferences for instruction scheduling.

Something I was playing with recently was a test piece to perform memory copy using only integer instructions (non mmx or xmm) while avoiding the REP MOVSD/B instructions as they rely on a special case circuitry and don't perform well on strings under 64 bytes.

I wrote some test code something like the following.


@@:
mov eax, [esi+ecx]
mov [edi+ecx], eax
add ecx, 4
add edx, 1
jnz @B

It uses a similar byte-sized version to handle the last few bytes if the length is not a multiple of 4.

In this form, it was about 5 times faster on short strings, < 32 bytes and was still over twice as fast on strings of 120 bytes.

I then did the smartarse mods to remove one of the ADD instructions by adding the length to the source and destination indexes and negating the count and tried the following loop.


@@:
mov eax, [esi+ecx]
mov [edi+ecx], eax
add ecx, 4
jnz @B

It ran slower than the REP MOVSD code and was about 5 times slower than with the two adds. Using PUSH / POP for the data transfer was a lot slower again.

On a PIII or earlier it would not matter but on a PIV it does and the benchmarking shows it. It seems the pairing of the two adds worked a lot better than having one instruction shorter in the loop.
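
(A rough Python model of the one-ADD loop's indexing, only to show how the negated count works. It assumes the length is a non-zero multiple of 4, says nothing about timing, and uses variable names that mirror the registers purely for illustration.)

```python
def copy_negative_index(src: bytes, length: int) -> bytes:
    """Copy length bytes, 4 at a time, using a counter that runs up to zero."""
    assert length > 0 and length % 4 == 0
    dst = bytearray(length)
    esi = length   # source base moved to one past the end
    edi = length   # destination base moved the same way
    ecx = -length  # negated count
    while True:
        # mov eax, [esi+ecx] / mov [edi+ecx], eax
        dst[edi + ecx:edi + ecx + 4] = src[esi + ecx:esi + ecx + 4]
        ecx += 4   # add ecx, 4 sets the zero flag on the last pass
        if ecx == 0:  # jnz @B falls through
            break
    return bytes(dst)


if __name__ == "__main__":
    print(copy_negative_index(b"abcdefgh", 8))
```

The single ADD serves as both the index update and the loop test, which is what removes the second ADD from the loop.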

Regards,
http://www.asmcommunity.net/board/cryptmail.php?tauntspiders=in.your.face@nomail.for.you&id=2f46ed9f24413347f14439b64bdc03fd
Posted on 2003-12-12 20:25:25 by hutch--
Hutch:

Something I was playing with recently was a test piece to perform memory copy using only integer instructions (non mmx or xmm) while avoiding the REP MOVSD/B instructions as they rely on a special case circuitry and don't perform well on strings under 64 bytes.

You mean it's not the same hardware that is used with rep movsd and with plain movsd or mov mem,reg?
I also read that rep movsd could copy 64 bits at a time under certain conditions... Does it use the MMX 64-bit data bus? There are non-MMX Pentiums...

Also, I've read that in the early days of demomaking it was several times faster to completely unroll loops.
How would this perform now:

mov ecx,
mov edx, _end - ( ecx * NumberOfBytesOfTheCodeInsideBrackets{} ) ; you've got to compute that
jmp

_begin:
{
mov eax, [esi]
mov [edi], eax
add esi,4
add edi,4
}repeated many times

_end:

So you jump into the middle of the code to have it executed ecx times and copy ecx dwords...
But I've seen that now the main bottleneck is feeding the processor from memory, so fetching opcodes from an unrolled loop may be worse, even though there are no jumps that would empty the pipeline (or that don't empty it because of branch prediction?). Don't know.

But as you must do 2 adds, why not add to edi and esi like I did, so you don't have to use [esi+ecx]? Isn't [esi+ecx] more complicated to encode? Anyway, just [esi] is simpler imho...


A number of arithmetic integer instructions run at twice the speed which gives you a preferred instruction set while other instructions run slower

Interesting... It is somewhat complicated to optimize while processor internals change... many things to know; for each generation there are new quirks and subtleties to deal with :)
Posted on 2003-12-13 20:49:14 by HeLLoWorld
QvasiModo:

No, I guess they just went for the option that would let them make a simpler chip (rather than simpler ASM programming ).


That's not what I meant. I didn't mean that you would have to code a sub yourself (I'm not that mad); you would code it the same way as now, but the hardware would do add and sub differently. There would be 2 different circuits for add and sub, both dealing with positive numbers, and sub(a,b) would require a>=b. My numbers contain an absolute value on n bits and a sign on 1 bit. When you encounter an add or a sub, you check the sign bits of the operands, decide which circuit to use, and update the sign bit.
Well, maybe it's not very clear, but I think it's far more natural.
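
(One possible reading of that scheme, sketched in Python. Numbers are (sign, magnitude) pairs with sign 0 for plus and 1 for minus; overflow of the n-bit magnitude is ignored, and all names are hypothetical.)

```python
def sm_add(a: tuple, b: tuple) -> tuple:
    """Add two sign-magnitude numbers using unsigned add and sub only."""
    (sign_a, mag_a), (sign_b, mag_b) = a, b
    if sign_a == sign_b:
        # Same signs: use the magnitude adder and keep the sign.
        return (sign_a, mag_a + mag_b)
    # Different signs: use the magnitude subtractor, larger operand first,
    # and take the sign of the larger magnitude.
    if mag_a >= mag_b:
        return (sign_a, mag_a - mag_b)
    return (sign_b, mag_b - mag_a)


def sm_sub(a: tuple, b: tuple) -> tuple:
    """Subtract by flipping the sign bit of b and adding."""
    sign_b, mag_b = b
    return sm_add(a, (sign_b ^ 1, mag_b))


if __name__ == "__main__":
    print(sm_add((0, 5), (1, 3)))  # +5 + -3
    print(sm_add((1, 3), (0, 3)))  # -3 + +3
```

The second call returns a zero magnitude with the sign bit set, so this sketch really does have both a +0 and a -0.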


Mhm, what do you mean "to negate a negative number before multiplying it"? You have IMUL. (Or maybe I misunderstood you?)

Not in your code, but the hardware must do it, no? And then check all 4 cases of the 2 numbers (++ -- +- -+) to know the sign of the result? (Maybe I'm wrong here.) And if it turns out that the result must be <0, then it must two's-complement the number it just computed? It sux, doesn't it?


IBM had a series of computers (704, 709, ...) that used signed-magnitude for fixed-point numbers, and the circuitry was definitely more complex. Interestingly, the simplest design was to convert negative numbers to two's complement form before addition. After addition, negative results had to be converted from two's complement form. It hasn't been totally abandoned, though. Look at floating-point numbers: they use a sign + positive value representation.

I can hardly believe that!
I once drew the logic schematic of my system but I'm not sure I can find it again; maybe I will draw it again and post it if someone is interested (and maybe also if nobody cares :) )
Posted on 2003-12-13 21:11:35 by HeLLoWorld
HeLLoWorld,

With all of the normal integer code I have played with on a PIV, I have yet to find a good example where unrolling made the code any faster while on a PIII or earlier you saved on the loop code and often it was significant when clocked.

The thing to particularly look out for are different size reads and writes with the same register as the stalls are very bad. You have the normal xor reg, reg or sub reg, reg to clear the register and it seems to work OK but if you have a spare register its better to use it rather than reuse the same one with different sizes.

The special case with MOVSD and STOSD is always related to using them with the REP prefix; without it they are really slow, so if you are stuck with integer code rather than XMM for block moves or stores, incremented pointers seem to have the advantage.

If you can use the XMM registers you can get very fast block copy. LINGO posted a software pretouch version of a block copy about 6 months ago that was the fastest I have seen and it was an unrolled XMM loop.

I agree that memory speed has been a performance wall for some time now and it compresses results between poor code and fast code but there is a company in Japan that has recently started to produce very fast memory in quantity so this restriction may not be with the later hardware for much longer.

With jumps, there are a few basic rules to stick to, a fall through is not a problem and using conditional jumps (Jxx) performs better if you take the normal prediction of jumping backwards rather than forward.

If there is a preference, set up a jump for the most common options so that its prediction works most of the time. What tends to happen is that if a jump is pointing the wrong way the first time, after enough iterations the prediction will get it right, but if you can, it's better to get the direction right.

Regards,
Posted on 2003-12-14 08:39:34 by hutch--


I think the point of the shifts/adds came across anyway, no need to spam more examples, it's trivial stuff anyway.
And in case you didn't know yet, hutch-- and I go back a long way, I got banned in the past for disagreeing with hutch--... But he can't ban me now, because he got stripped of his administrator rights (for obvious reasons).
So he can just flame me now, and I can handle him, he'll just make himself look like an idiot :)


Let me burst your bubble, you got banned each and every time for displaying condescending, arrogant attitudes towards others. Currently, you seem well on your way towards a new ban, since if this is your way to 'handle' things then I can only thank hutch-- for not stepping into this discussion again while he certainly would have had a valid reason to.

Hutch-- (or anybody else here) is not an idiot, stupid or whatever other pseudonym you come up with and doesn't deserve to be addressed this way on the forum. This and other discussions could do well without the use of these outbursts, since name-calling is neither appreciated nor allowed according to the rules (you like to quote yourself and hence should be familiar with). Whether or not a person's examples are right or wrong does not justify said behaviour or response. Attack the argument, not the person.

I suggest (and hope) that, if your goal sincerely is to stay as a productive member of this forum, you switch tactics from the personal attacks (that includes your current use of the signature) towards constructive, cooperative, non-ad hominem posts so that we all can appreciate the intellectual benefits of your presence.


I don't feel that I should use more words on this subject, the path that can be chosen can be either constructive or obstructive towards your enjoyment of this forum, the choice is entirely up to you. We'd love to keep welcoming you as a member but not at the cost of others.


-H-
Posted on 2003-12-14 09:36:52 by Hiroshimator
displaying condescending, arrogant attitudes towards others.


You mean like hutch is constantly doing to me, f0dder, and who knows who else?

Whether or not a person's examples are right or wrong does not justify said behaviour or response. Attack the argument, not the person.


If you hadn't deleted most of this thread, my comment would be in a slightly different light, I'd say.

(that includes your current use of the signature)


My signature just contains exact quotes, not altered in any way, nothing else. I've seen many people using quotes in their signatures. What exactly is wrong with mine? (I haven't had any complaints from the people that I quoted, by the way).
Oh and there are two nice ad-hominems from hutch-- quoted in that very signature... Wasn't this against the rules? Hummm.

your enjoyment of this forum


As I said before, I don't try to provoke flamewars, and I can't say I enjoy being called names and words being put in my mouth etc, when all I do is offer a better piece of code for the problem at hand, without any personal attacks or whatever.
I am not the problem, hutch is, which should be obvious by now. But if you continue to protect hutch while he breaks the forum rules, and blame me for starting flamewars when all I do is try to offer some advice, and threaten me with a ban, then I don't think I even want to be here.
Posted on 2003-12-14 11:02:33 by Bruce-li
Yet another thread that has been poisoned and is no longer worth reading. What a bunch of petulant little babies. :mad: Why not go stomp your feet somewhere else and let the rest of us actually exchange some useful information.

petulant
ADJECTIVE: 1. Unreasonably irritable or ill-tempered; peevish. 2. Contemptuous in speech or behavior.
Posted on 2003-12-14 11:35:57 by donkey
Scali,

Here is a challenge to you from me personally. I have known you for a long time and we have argued about many things but I have always had regard for your programming skills in the areas where you have expertise.

I have stuck my neck out a number of times for you because of this respect I have for your coding skills but there is nothing I can do when you blow your stack at anyone who disagrees with you.

Rather than trying to control the whole programming area in your own image, digest that there are large differences between people, programming styles, techniques and personal preferences.

Rather than fight difference, I challenge you to encourage it to create new ideas, rather than hammering people who don't see things your way, I challenge you to help find different ways to do things.

Starting a fight with someone my age is like taking on a tank with a featherduster. I have a job to do and I will do it as I see fit, just like most other people who float around here and while there are enough people who are more than capable of starting WW3, as usual they exercise a gentleman's agreement where they don't bother as it's more productive all round for people to work together.

Finally I challenge you to keep your cool, deal with people as people and take your place as a respected member of the programming community who can help others with your knowledge. :alright:

Regards,
Posted on 2003-12-14 19:45:50 by hutch--
but I have always had regard for your programming skills in the areas where you have expertise.


I never noticed anything of that... I do recall a "You don't have what it takes to code BM in asm" though.

but there is nothing I can do when you blow your stack at anyone who disagrees with you.


That's cute, but it seems like you were the one 'blowing your stack' in this particular thread, because I gave an alternative routine for the one you posted, nothing more, nothing less. You started to imply that I don't agree with Intel and things like that.
As for the previous OGL/DX/supercomputer discussion, I think I've remained calm throughout, while you made personal attacks at me and f0dder repeatedly. So please take your own advice to heart as well, it helps.

Starting a fight with someone my age is like taking on a tank with a featherduster.


I didn't want to start a fight, in both cases I wanted to point out some alternatives to what you were saying. You somehow take this personally and make it into a fight. That is not my fault.

Finally I challenge you to keep your cool, deal with people as people and take your place as a respected member of the programming community who can help others with your knowledge.


Even if it means that I disagree with what you say?
Posted on 2003-12-15 04:51:40 by Bruce-li