Hi, I've been here before but it's a looong time ago, and at that time I started a project called ML(Math Language) it was supposed to make it easier to type formulas etc... in assembly language, but it never really worked, well some weeks ago I had an idea of making my own private language(With compiler) and then expanding it as needed etc... well this might sound crazy but I actually succeded in making the math to assembly convert, well I havn't written the actual code, but it breaks down the math formulas, based upon signs(sign significance: Multiplication over Addition etc...) and it places Brackets first. It works, it is not completly optimized to short up memory use, and to minimize calculations, but it does a few dummy corrections, so everything should run, and an expert assembler can easy see the use of it, and probably can easly adjust it's errors to make it 100% optimized and save time using it etc... but right now it takes a formula like "A=10^2+10", and breaks it down to:

where the Temp_B is a temporarly variable called B, this is something the function creates itself based on the ascii codes etc.... As you see, if you try a harder formula like "A=10^2+10-12*62/1^(51*3)" it outputs:

I've included the program in the attachment.

Anyway the problem is now converting these Calculations(NOT MATHEMATICAL EXPRESSIONS) into assembly, and I want to minimize code space(not actual binary space though :wink:), and so I've come up with the following macros:

Now I am positive that this is either not optimized or not working at all, and that is why I've posted it here, because the macros I need are like "A = B + C", "A = B - C", "A = B * C" "A = B / C" etc... So I was wondering if anybody could help me out...

Thanks, and please commet the attached program.

- Julian

```
```

Temp_B=10^2

A=Temp_B+10

where the Temp_B is a temporarly variable called B, this is something the function creates itself based on the ascii codes etc.... As you see, if you try a harder formula like "A=10^2+10-12*62/1^(51*3)" it outputs:

```
```

Temp_B=51*3

Temp_C=10^2

Temp_D=1^Temp_B

Temp_E=12*62

Temp_F=Temp_E/Temp_D

Temp_G=Temp_C+10

A=Temp_G-Temp_F

I've included the program in the attachment.

Anyway the problem is now converting these Calculations(NOT MATHEMATICAL EXPRESSIONS) into assembly, and I want to minimize code space(not actual binary space though :wink:), and so I've come up with the following macros:

```
```

Addition macro SrcF:DWORD, SrcS:DWORD

XOR EAX, EAX

ADD EAX, SrcF

ADD EAX, SrcS

ret

Addition endm

Subtraction macro SrcF:DWORD, SrcS:DWORD

XOR EAX, EAX

ADD EAX, SrcF

SUB EAX, SrcS

ret

Subtraction endm

Multiplication macro SrcF:DWORD, SrcS:DWORD

XOR EAX, EAX

ADD EAX, SrcF

MUL EAX, SrcS

ret

Multiplication endm

Division macro SrcF:DWORD, SrcS:DWORD

XOR EAX, EAX

ADD EAX, SrcF

DIV EAX, SrcS

ret

Division endm

Now I am positive that this is either not optimized or not working at all, and that is why I've posted it here, because the macros I need are like "A = B + C", "A = B - C", "A = B * C" "A = B / C" etc... So I was wondering if anybody could help me out...

Thanks, and please commet the attached program.

- Julian

This sounds like a fun project. :) i'm sorry i cant help you but i alwasy wanted to do some sort of a math language or lib. But my knowledge about assembler is poor(as my english) . :lol:

You can replace "clear and add" sequence with "load"

Replace

with

Also, you need to set EDX before dividing.

If you want negative numbers use

If you want only positive numbers use

If you're not using CALL to run the macro code, remove the RET instructions.

Replace

```
xor eax,eax
```

add eax,srcF

with

```
mov eax,srcF
```

Also, you need to set EDX before dividing.

If you want negative numbers use

```
cdq
```

idiv srcS

If you want only positive numbers use

```
xor edx,edx
```

div srcS

If you're not using CALL to run the macro code, remove the RET instructions.

Also why not expand it to FPU and SSE, afterall most of the time when dealing with equations you'll want floating point numbers :) .

Matlab is really cool - student copy is $99. :) I'm not trying to pull you away from your coding, but I think it is important to see what is out there. You might want to borrow features for your own work. Matlab actually compiles the code internally to use optimized matrix libraries - impressive to say the least! The interface has some incredible ideas as well, but the language itself is strange = high learning curve. Thinking about problems from a matrix perspective is interesting as well.

Octave if a free matlab compatible program. Its not as good as matlab but I use it when working on my college matlab projects at home. Its worth checking out before deciding to part with money.

Oh and the new ms powertoy calculator is really cool too :)

Oh and the new ms powertoy calculator is really cool too :)

Hi, I'd love to add FPU, but I have no idea as of how to implement it, I tried last time and failed... anyway here is new code:

Notice that I've included signed and unsigned :), as for optimizing. Though while these functions are great, I want my program code to be as small as possible, so the more macros,and the less comands in each, the better.

In FPU do I just do:

?.

Thanks for your help.

- Julian

```
```

Addition macro Dst:DWORD, SrcF:DWORD, SrcS:DWORD

MOV EAX, SrcF

ADD EAX, SrcS

MOV Dst, EAX

Addition endm

Subtraction macro Dst:DWORD, SrcF:DWORD, SrcS:DWORD

MOV EAX, SrcF

SUB EAX, SrcS

MOV Dst, EAX

Subtraction endm

Multiplication macro Dst:DWORD, SrcF:DWORD, SrcS:DWORD

MOV EAX, SrcF

MUL EAX, SrcS

MOV Dst, EAX

Multiplication endm

IMultiplication macro Dst:DWORD, SrcF:DWORD, SrcS:DWORD

MOV EAX, SrcF

IMUL EAX, SrcS

MOV Dst, EAX

IMultiplication endm

Division macro Dst:DWORD, SrcF:DWORD, SrcS:DWORD

MOV EAX, SrcF

CDQ

DIV srcS

MOV Dst, EAX

Division endm

IDivision macro Dst:DWORD, SrcF:DWORD, SrcS:DWORD

MOV EAX, SrcF

CDQ

IDIV srcS

MOV Dst, EAX

IDivision endm

Notice that I've included signed and unsigned :), as for optimizing. Though while these functions are great, I want my program code to be as small as possible, so the more macros,and the less comands in each, the better.

In FPU do I just do:

```
```

FPUAddition macro Dst:REAL10, SrcF:REAL10, SrcS:REAL10

FpuAdd ScrF, ScrS, Dst

FPUAddition endm

?.

Thanks for your help.

- Julian

FPU is a bit messy casue of the stack, it can be dome easilly enough but to do it optimally would require keeping intermediate results on the stack and keeping track of them, not very nice :o .

An example of the easy way would be to have macros like

Where all operands are memory ones.

SSE on the other hand can be done with registers like in the integer MACROs.

An example of the easy way would be to have macros like

```
FPUAddition macro Dst:REAL10, SrcF:REAL10, SrcS:REAL10
```

fld SrcF

fadd SrcS

fstp Dst

FPUAddition endm

Where all operands are memory ones.

SSE on the other hand can be done with registers like in the integer MACROs.

So this will work?

- Julian

```
```

FPUAddition macro Dst:REAL10, SrcF:REAL10, SrcS:REAL10

FLD SrcF

FADD SrcS

FSTP Dst

FPUAddition endm

FPUSubtraction macro Dst:REAL10, SrcF:REAL10, SrcS:REAL10

FLD SrcF

FSUB SrcS

FSTP Dst

FPUSubtraction endm

FPUMultiplication macro Dst:REAL10, SrcF:REAL10, SrcS:REAL10

FLD SrcF

FMUL SrcS

FSTP Dst

FPUMultiplication endm

FPUDivision macro Dst:REAL10, SrcF:REAL10, SrcS:REAL10

FLD SrcF

FDIV SrcS

FSTP Dst

FPUDivision endm

- Julian

It certainly should.

Ok, here is new version, please commet any weakness, since I will probably be remaking it a couple of times to get 100% performance 8)

- Julian

- Julian

Its nice alright, though if you want to be producing optimal code then then I think you'll need to move away from macros and output asm code.

You should get some info about optimizing multiplications and such thihgs, for example, if you the operation is: a = eax*2

I guess the output will be:

shr eax, 2

mov , eax

also how lea is used, but if for example: a = eax+ 3^2+2

will be first "reduced" to a= eax+11

The point is try do the less operations posible, mean that you will have constant and not constant values in a expression.

A constant value that only depend in constants values can be calculated like the second example, and a expression that depend of not constant values like the first, pheraphs can be optimized.

Then I see three diferent point that you should care about the expresion:

1) Check what constanst can be obtained at first and calculate them, you see a easy way for do this, with the things that you have done for now in your application? ;)

2) When you obtain this "reduced" expression, check what expressions can be handled with lea, shifts and pheraphs others (what others?)?

3) handle the rest of the expression in the normal way, that mean doing the rest of the operations without optimization, because cant be optimized more, if is well done the anterior step.

Also for handle fpu and mmx, pheraphs, after you can do the simple thing, can you add some flags to be set in your programm, like a selection choise, and let the user for this options.

About your programm, check that the output you are moving eax to eax, when pheraphs that can be valid for optimization????, is a nop instruction

Then my sugestion, understand the macros for multiply, add, ... , in fact understand how this operations take place, example, how many is:

101*11?, simple, follow this rules:

0*1 = 0

1*1= 1

0+0=0

0+1=1

1+1= 10 (carry out)

When you multiply:

12

*32

-----

024

+36?

------

384

Is the same for binary digits

101

*11

------

0101

+1010 ; extra zero, like when multiply decimal numbers

-------

1111

If you do that with bytes, you will see for what the result is the double size than the operands, and for you can have signed and unsigned, following the two complement http://folk.uio.no/botnen/intel/vt/reference/ see cdq and mul.

Also the things about extend to mmx and others, pheraphs is best that depend on what are the variables implicated?, mean that you can enter in your programm declaration for the types, or that you can parse a asm file, read, and watch for the expresions that you whant your programm handle them:

option casemap:none

.....

result dd 0

......

mov eax, ebx

push ecx

= (eax*3)^ebx*sin(ecx-eax)

....

And output the same code, with this line commented, and the iutput of your programm writed down in assembly instructions... ;), is a hard work, I guess. But I see that you will do it.

That is what I sugest.

I guess the output will be:

shr eax, 2

mov , eax

also how lea is used, but if for example: a = eax+ 3^2+2

will be first "reduced" to a= eax+11

The point is try do the less operations posible, mean that you will have constant and not constant values in a expression.

A constant value that only depend in constants values can be calculated like the second example, and a expression that depend of not constant values like the first, pheraphs can be optimized.

Then I see three diferent point that you should care about the expresion:

1) Check what constanst can be obtained at first and calculate them, you see a easy way for do this, with the things that you have done for now in your application? ;)

2) When you obtain this "reduced" expression, check what expressions can be handled with lea, shifts and pheraphs others (what others?)?

3) handle the rest of the expression in the normal way, that mean doing the rest of the operations without optimization, because cant be optimized more, if is well done the anterior step.

Also for handle fpu and mmx, pheraphs, after you can do the simple thing, can you add some flags to be set in your programm, like a selection choise, and let the user for this options.

About your programm, check that the output you are moving eax to eax, when pheraphs that can be valid for optimization????, is a nop instruction

*iirc*, you can keep the track of the used registers and variables, also pheraphs for what registers should be keeped.Then my sugestion, understand the macros for multiply, add, ... , in fact understand how this operations take place, example, how many is:

101*11?, simple, follow this rules:

0*1 = 0

1*1= 1

0+0=0

0+1=1

1+1= 10 (carry out)

When you multiply:

12

*32

-----

024

+36?

------

384

Is the same for binary digits

101

*11

------

0101

+1010 ; extra zero, like when multiply decimal numbers

-------

1111

If you do that with bytes, you will see for what the result is the double size than the operands, and for you can have signed and unsigned, following the two complement http://folk.uio.no/botnen/intel/vt/reference/ see cdq and mul.

Also the things about extend to mmx and others, pheraphs is best that depend on what are the variables implicated?, mean that you can enter in your programm declaration for the types, or that you can parse a asm file, read, and watch for the expresions that you whant your programm handle them:

option casemap:none

.....

result dd 0

......

mov eax, ebx

push ecx

= (eax*3)^ebx*sin(ecx-eax)

....

And output the same code, with this line commented, and the iutput of your programm writed down in assembly instructions... ;), is a hard work, I guess. But I see that you will do it.

That is what I sugest.

Thanks for your suggestions. I'll try doing as many as possible, but about the last one(how to write it in assembly), well it won't be totally like that, it will be like a GUI interface, where you can drag & drop windows, menus, buttons, labels etc... and then when you click one you get to the procedure in assembly, and of course then there are non-windows procedures like "Start" and "Main", and variables will be created differently... I havn't figured out the total design, but it's purpose is to save coding time, and it needs a GUI designer, so much I know.

- Julian

- Julian

Just a question, if I had the choice bettwen DWORD variables and stack memory(Push/Pop), which would be the best to use? Also are there any limitations on the stack memory beside from your computers memory?

- Julian

- Julian

Long time no see ;)

Registers are the fastest, local (stack) and static (memory) variables are much slower. Local variables are slightly faster than static ones.

As for the FPU, check this out:

http://en.wikipedia.org/wiki/Reverse_Polish_notation

With the RPN you can convert a formula like this:

((x + (y * z) / w

into this:

x y z * w /

which makes no sense to a human but it's perfect for a computer, since it translates directly into instructions for stack-based machines (like the Intel floating point unit):

There's also another advantage in using floats instead of ints: rounding. Suppose you have this formula:

1 / 2 * 2

The result should be 1. But with integer opcodes you round down all the intermediate results, so you get:

1 / 2 * 2 =

since 0.5 is rounded to 0. Floats are slower but the accuracy is higher -- intermediate results are calculated with decimals, so the above calculation would work.

Hope that helps :)

Registers are the fastest, local (stack) and static (memory) variables are much slower. Local variables are slightly faster than static ones.

As for the FPU, check this out:

http://en.wikipedia.org/wiki/Reverse_Polish_notation

With the RPN you can convert a formula like this:

((x + (y * z) / w

into this:

x y z * w /

which makes no sense to a human but it's perfect for a computer, since it translates directly into instructions for stack-based machines (like the Intel floating point unit):

```
```

fld x ; x

fld y ; y

fld z ; z

fmulp ; *

fld w ; w

fdivp ; /

fstp r ; r could be the result

There's also another advantage in using floats instead of ints: rounding. Suppose you have this formula:

1 / 2 * 2

The result should be 1. But with integer opcodes you round down all the intermediate results, so you get:

1 / 2 * 2 =

**0*** 2 =**0**since 0.5 is rounded to 0. Floats are slower but the accuracy is higher -- intermediate results are calculated with decimals, so the above calculation would work.

Hope that helps :)

How would you do fast assembly calculations? What's the fastest and most stabile way to do it? I plan on integrating it all, where it tries to use EAX as much, then it will use whatever else is fastest etc... I know this is hard work, but I'd like to gain a performance which is faster then VB's, more stable, and of course only slightly less optimized then the popular C++ compilers. Any ideas are truely appreciated. Ohh, and thanks for the Reversed Polish notation, I'll look at it.

- Julian

- Julian

If you want to come close to what C++ compilers are doing, then you will need to study existing compiler optimizations. It is an ongoing research area because "there's no such thing as the fastest code" (M. Abrams).

A simple algorithm for improving expression code is the Sethi-Ullman register allocator. You construct a data structure known as the

To get better results, you will need to hold several instructions in memory before outputting them to the target file. You will need to perform redundancy elimination on a block of code containing several instructions.

One simple optimization is

More powerful register allocation and redundancy elimination techniques require you to be comfortable with graph theory, set theory, and complex data structures (linked lists, trees, networks).

A simple algorithm for improving expression code is the Sethi-Ullman register allocator. You construct a data structure known as the

**expression tree**and use that to select registers before or while creating executable code. It apparently creates the best register allocation for a single expression.To get better results, you will need to hold several instructions in memory before outputting them to the target file. You will need to perform redundancy elimination on a block of code containing several instructions.

One simple optimization is

**peephole**optimization. It basically is a set of*ad hoc*optimizations which reduce common instruction sequences with more optimal sequences. You can remove redundant "register loads" within the framework of peephole optimization.More powerful register allocation and redundancy elimination techniques require you to be comfortable with graph theory, set theory, and complex data structures (linked lists, trees, networks).

Newest version, based on a new principle and way of thought. It is based on that all expressions can be broken down to a tree of calculations, my previous version did this, but without spacing and without reusing memory. Ok, tell me what you think, and please test it's limit, by the way it onyl supports "+" and "-".

- Julian

- Julian

If any of you know any readings or alike(about Parsing or making your own compiler 8)), then please don't hesitate to tell me. Currently I am going to read:

http://tao.doc.wustl.edu/misc/

Maybe I'll implement more types of math parsers, then when you use the compiler you can chose the one which suits you(Speed, Memory, Readability etc...). Suggestions and comments are appreciated. Thanks again for the community help.

- Julian

http://tao.doc.wustl.edu/misc/

Maybe I'll implement more types of math parsers, then when you use the compiler you can chose the one which suits you(Speed, Memory, Readability etc...). Suggestions and comments are appreciated. Thanks again for the community help.

- Julian