Without using a look-up table, is there an efficient way to calculate the quotient and remainder of a signed division by 4 when the dividend is between -210 and 9? Is there also a fast way to calculate x^(4/3) for x between 0 and 8207?

For your 1st question:

```
mov eax,dividend
cdq              ;sign-extend EAX into EDX for signed division
mov ecx,4
idiv ecx         ;-> eax = quotient, edx = remainder
```

For your 2nd question, it all depends on your interpretation of "fast". If it means the least typing and the fewest lines of code, you could use some of the functions in the Fpu.lib provided with the MASM32 package (also available from another thread on this forum). The code could be as simple as:

```
invoke FpuDiv,4,3,0,SRC1_DIMM or SRC2_DIMM or DEST_FPU
invoke FpuXexpY,ADDR x,0,0,SRC1_DMEM or SRC2_FPU or DEST_FPU
invoke FpuFLtoA,0,10,ADDR xbuffer,SRC1_FPU or SRC2_DIMM
```

In the above code, the 1st line computes the value of the 4/3 fraction. The 2nd line takes the "x" value from a dword memory variable labeled "x" and computes x^(4/3). The 3rd line returns the result as a null-terminated string with 10 decimal digits in a memory string variable labeled "xbuffer".
Raymond

Dr. Manhattan,

See http://www.df.lth.se/~john_e/gems/gem002c.html to learn how to find the quick quotient. The quick remainder can then be found by subtracting the quotient, left-shifted by 2, from the dividend. There is always the signed division instruction IDIV, but that is slow on most machines. There will not be a quick cube root; you will need the assistance of the FPU, or you can use an iterative scheme like you would by hand. General exponentiation routines have been written; you can search this site for "exponentiation" to find them. Ratch


Is there also a fast way to calculate x^(4/3) for x between 0 and 8207?

Newton's method is a quadratically converging algorithm, but there are higher-order techniques such as Halley's method. The iteration

```
     2x^3 + 4r
x' = --------- x
     4x^3 + 2r
```

is cubically convergent to the cube root of r (its generalization to nth roots is due to Lambert in 1770). And there is no end to higher-order iterative methods such as this (cf. I. Kiss, "A generalization of Newton's approximation procedure", _Z. angew. Math. Mech._ 34, 68-69, 1954), but of course the price you pay is more computation per iteration.

Applying Newton's method to f(y) = y^3 - x produces a quadratically-convergent iteration:

y(n+1) = y(n) - (y(n) - x/y(n)^2) / 3.0

This method takes 2 adds, 2 multiplies, and 1 divide per iteration.

Applying it to f(y) = y^2 - x/y produces a cubically-convergent iteration:

y(n+1) = 0.5 * (y(n) + 1.5*x / (y(n)^2 + 0.5*x/y(n)))

If you form 0.5*x and 1.5*x = x + (0.5*x) outside the iteration loop, this method takes 2 adds, 2 multiplies, and 2 divides per iteration.


See **chorus**' thread on cube root.

I thought it could be done with a magic number or something like that, but it's more complicated. Thanks for your answers.

for the first one:

```
; assume dividend is in eax
mov edx,eax
and edx,3
.if eax & 80000000h
    neg edx
.endif
sar eax,2       ;edx is the remainder
```

:)

Ultrano,

The above code won't work. Try it with a -7 and see. It returns the remainder as -1 instead of the correct -3. That is because your code masks out the bottom 2 bits of a negative number, and that is incorrect. Also, doing a SAR on a negative number rounds it toward negative infinity instead of toward zero. That's why the quotient computes as -2 instead of -1. Most people prefer to round toward zero. Ratch


haha!

i said it earlier, two's complement sux badly :)

it wouldn't be such a headache and there would be no problems if everyone used sign bit + amplitude notation for signed numbers. i'm sure there are thousands of examples where code would be simpler, and almost no case where it would in fact be harder.

one of these days i think i will draw the schematics of the simple add/sub ALU again and post it.

imo it's not more complicated; after all, currently, to do a sub you must do a neg. doh!

bye


```
.if eax & 80000000h
    mov edx,eax
    dec edx
    and edx,3
    not edx
.else
    mov edx,eax
    and edx,3
.endif
sar eax,2
```

```
; low 4 bits   value   NOT(value)
  1111          -1       0
  1110          -2       1
  1101          -3       2
  1100          -4       3
  1011          -5       4
  1010          -6       5
  1001          -7       6
  1000          -8       7
  0111          -9       8
  0110         -10       9

; -7  % 4 = -3    1001
; -8  % 4 =  0    1010
; -9  % 4 = -1
; -10 % 4 = -2
```

neg is done by

```
ADD reg, 11111111    ; all-ones = -1
NOT reg
```

simplest

Ultrano,

The above code still rounds toward negative infinity and gives the wrong value for remainders of negative numbers. The following code does everything correctly with no conditional checks or jumps. How do I know that? I tested it. Ratch


```
MOV  EAX,NUM
PUSH EAX
CDQ                  ;EDX = 0 for positive NUM, -1 for negative
AND  EDX,3           ;bias = 3 only when NUM is negative
ADD  EAX,EDX
SAR  EAX,2           ;EAX = NUM/4 = quotient, rounded toward zero
POP  EDX
LEA  ECX,[4*EAX]
SUB  EDX,ECX         ;EDX = NUM%4 = remainder
```

HeLLoWorld,

i said it earlier, two-complement sux badly

I find your statement...er intriguing. Where did you make such a statement earlier?

it wouldnt be such a headache and there would be no problems if everyone used sign bit + amplitude notation for relative numbers. i m sure there are thousands of examples where code would be simpler. And almost no case where it would in fact be harder.

Take a look at this. http://www.trotek.ec-lyon.fr/~muller/cours/numeration/abs.html.en . Is that what you want to deal with? It takes extra hardware and extra conditional testing to do Signed Magnitude Arithmetic (SMA). It might be easier for you to understand SMA, but it costs extra to build and is harder to code. If it were such a good thing, everybody would be doing it. Now, the same twos complement addition and subtraction instructions can be used for either signed or unsigned numbers. The results are the same in both cases, one just interprets them differently. The automatic rollover for twos complement is very useful too. Ratch


Ratch, thanks for correcting me. :alright:

yes ratch, i had never seen info on this, but it's something i figured out myself at some point.

conditional testing can be done without needing sequential machines if done well; there are not so many cases, and a few logical gates do it. yes, there is one adder and one subber, and the subber always computes a-b where a>b. so what?

and no, i don't think it's more complicated.

take a look at my previous posts here, you may find them interesting or foolish.

http://www.asmcommunity.net/board/showthread.php?threadid=16363&perpage=15&highlight=born%20earlier&pagenumber=2

if i draw the whole circuitry i'll post it someday (but probably it already exists somewhere)

anyhow i just CANT believe "the simplest way to do it would be to convert to 2's complement before the add and back to sign-amplitude afterwards"

and then code becomes so nice ... imho


HeLLoWorld,

You are going over ground that has been thoroughly plowed by many people for a long time now. As a matter of fact, since the dawn of the computer age. As I said before, if SMA were such a good thing, everybody would be using it. Ratch

...conditional testing can be done without needing sequential machines if well done

I don't understand what you mean by that. A computer is by nature a sequential machine. In order to follow the right sequence, it needs to do sequential conditional testing. Please explain what you mean.

there are not so many cases and a few logical gates do it

Are we talking about hardware or software? Please elucidate.

yes there is one adder and one subber, and the subber always subs a-b where a>b. so what? and no i dont think its more complicated.

So for SMA you need a comparison operation to direct the flow of the operation to either the adder or the subtracter, plus possibly deal with two representations of zero. With twos complement, addition is straightforward, and subtraction likewise: it is simply negating a number and adding it. The same adder can be used for both addition and subtraction. I can't help you if you can't see that it is easier with twos complement.

take a look at my previous posts here, you may find them interesting or foolish.

I find them irrelevant. You are talking about code alignment there, which is different from which number representation is best, which is what we are discussing here.

"the simplest way to do it would be to convert to 2 complement before add and back to sign-ampl then"

Who are you quoting? Not me.

first, please don't take any offence; i'm often joking when i say things shouldn't be the way they are, but i like to think about how they could be, nonetheless.

I don't understand what you mean by that. A computer is by nature a sequential machine. In order to follow the right sequence, it needs to do sequential conditional testing. Please explain what you mean.

i'm talking of the fact that it would, or would not, NECESSARILY take more than one clock to do an add/sub.

in other words, can your circuitry give the result just by putting the input on some wires and waiting for the signal to propagate through the gates, or do you need a clock to do several ops sequentially?

the difference between combinatory logic and sequential logic...

of course a computer is sequential, but an adder is not, and a multiplier is.

hope i made it clear; if not, correct me?

Are we talking about hardware or software? Please elucidate.

i'm talking about how numbers could be represented in ALUs, and thus, in programs.

i imagine an ALU that is designed to deal with numbers, stored in registers, with the sign-amplitude convention, then performs operations on them, and the result is also in the sign-amplitude convention, so your program, too, when it checks things or uses algos, uses this convention, which is, for my little brain at least, FAR more intuitive.

it's just a matter of point of view.

So for SMA you need a comparison operation to direct the flow of the operation to either the adder or the subtracter, plus possibly deal with two representations of zero. With twos complement, addition is straightforward, and subtraction likewise: it is simply negating a number and adding it. The same adder can be used for both addition and subtraction. I can't help you if you can't see that it is easier with twos complement.

i know all of this.

comparison can be done simultaneously; you just wait till it's over.

2 zeros... so what??? no overhead at all.

yes, with 2's complement you add everything the same, so what? you still need a neg to do a sub, so...

the same adder can be used... so what? a few logical gates among some billions...

i just thought of something btw: with "my" system (what an arrogant jerk! :), you can do a cmp without doing a sub+test. I find this wonderful! thanks for making me realize it!

I find them irrelevant. You are talking about code alignment there, which is different from which number representation is best, which is what we are discussing here.

sorry to tell you that, but you stopped reading after the first post you saw.

this thread is long and full of junk, but later on that very page i was explaining what we're talking about.

you still can go and read it if you want. you may find it interesting or foolish. you will better understand what i say, i hope.

to be clear, i'm not blind; i think there wouldn't be much, if anything, to gain by using this convention. 2's complement works, but i still think it's a clumsy, obfuscated way of seeing things.

(there would be no gain in making segment descriptors in the GDT have all bits of the "base" field stored together, but still, it would be nicer; however, thousands of processors load them successfully into the segment descriptor cache every day, many times a second, at each task switch, you see...)

don't forget, errors like those rounding problems that showed up in Ultrano's code could one day make a space shuttle explode (useless provocation, okay, but you get the idea :)

bye ratch


i failed to quote you properly, be sure to read what i wrote plz :)

i was quoting tenkey:

HeLLoWorld, if you were born early enough, you could have gotten the machine you are looking for.

IBM had a series of computers (704, 709, ...) that used signed-magnitude for fixed-point numbers, and the circuitry was definitely more complex. Interestingly, the simplest design was to convert negative numbers to two's complement form before addition. After addition, negative results had to be converted from two's complement form. It hasn't been totally abandoned, though. Look at floating point numbers, it uses sign + positive value representation.

btw:

You are going over ground that has been thoroughly plowed by many people for a long time now. As a matter of fact, since the dawn of the computer age

what kind of... hummmm... proof is this? :grin:


what kind of... hummmm... proof is this

in fact the best one. This sometimes makes me regret I wasn't born earlier, but still there's hope :)

HeLLoWorld,

first, please dont take any offence, ....

I'm not, I just didn't remember saying what you quoted.

i'm talking of the fact that it would, or would not, NECESSARILY take more than one clock to do an add/sub.

in other words, can your circuitry give the result just by putting the input on some wires and waiting for the signal to propagate through the gates, or do you need a clock to do several ops sequentially?

I am not a hardware designer. I have no circuits. However, a lot of people much smarter than either of us have collectively spent more time than both of us will ever live on making a fast adder. Unless you can show me you have the talents of Seymour Cray, I consider that aspect of this discussion closed.

i'm talking about how numbers could be represented in ALUs, and thus, in programs.

i imagine an ALU that is designed to deal with numbers, stored in registers, with the sign-amplitude convention, then performs operations on them, and the result is also in the sign-amplitude convention, so your program, too, when it checks things or uses algos, uses this convention, which is, for my little brain at least, FAR more intuitive.

it's just a matter of point of view.

I read the words, but I cannot compute what you are talking about, or its relevance. I don't worry about how my ALU represents numbers internally or whether it's wired up the best way possible. Nothing I can do about it. I program 'em. I don't fix 'em.

sorry to tell you that, but you stopped reading after the first post you saw.

this thread is long and full of junk, but later on that very page i was explaining what we're talking about.

You are right. I didn't, you did. You said then what you are saying now. The same arguments now apply to those pronouncements back then.

to be clear, i'm not blind; i think there wouldn't be much, if anything, to gain by using this convention. 2's complement works, but i still think it's a clumsy, obfuscated way of seeing things.

It's more of a method. The hardware has no trouble understanding it even though humans might. In fact, it was probably invented and used because it was easy to implement in hardware.

(there would be no gain in making segment descriptors in the GDT have all bits of the "base" field stored together, but still, it would be nicer; however, thousands of processors load them successfully into the segment descriptor cache every day, many times a second, at each task switch, you see...)

What has that to do with this discussion?

don't forget, errors like those rounding problems that showed up in Ultrano's code could one day make a space shuttle explode (useless provocation, okay, but you get the idea :)

The round-off was not an error. It was probably not the selection he intended. The FPU gives you 4 different rounding methods, of which rounding toward minus infinity is one. You can round any way you want in twos complement if you program it that way, so the shuttle is safe. Ratch