hi!
I'm looking for info on the JVM, Java bytecode, optimization, etc.
So far I've found some "introductory" stuff, but I haven't searched a lot.
Do any of you have some good links to start from, or some reference papers that are still "readable"?
I think not many people are involved in this yet, hand-programming Java asm and stuff...
It is an interesting machine (a stack, plus some regs, plus constants, but it operates only on the stack...). Of course having done x86 asm can only help, but there are far fewer "tricks" to apply and more reordering etc. Less fun I think, but maybe that's because I'm too used to real GP regs and true low-level messing...
thank you!
There are far fewer "tricks" like lea or sbb etc...
But I begin to suspect there are huge tricks possible with this stack system... there are instructions for inserting values two or three places from the top, for example (the dup_x1 / dup_x2 family)...
Thing is, you don't know what to optimize for. The smallest number of bytecodes?
You don't know how it will perform on different JVMs...
For example, I suspect replacing one mul with a few shifts/adds will slow things down...
And would you prefer the regs or the stack, etc...
d'uh.
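To make the mul versus shifts/adds question concrete, here is a minimal sketch at the Java source level. The class and method names are made up for illustration. It only checks that the two forms compute the same value (they do for every int, since both wrap modulo 2^32); it says nothing about which one is faster on a given JVM.

```java
// Hypothetical example class: two ways of computing x * 10.
final class MulVsShift {
    // Single multiply; javac typically emits an imul for this.
    static int timesTenMul(int x) {
        return x * 10;
    }

    // Same value as 8x + 2x, using two shifts and an add.
    static int timesTenShift(int x) {
        return (x << 3) + (x << 1);
    }

    public static void main(String[] args) {
        // Compare both forms over a range of inputs, negatives included.
        for (int x = -1000; x <= 1000; x++) {
            if (timesTenMul(x) != timesTenShift(x)) {
                throw new AssertionError("mismatch at " + x);
            }
        }
        System.out.println("both forms agree");
    }
}
```

Disassembling the class with javap -c shows how many more instructions the shift/add form takes, which is the trade-off being guessed at here.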
I dunno if hand-writing JVM bytecode will bring you much other than headaches :)
You might want to take a look at http://board.flatassembler.net/topic.php?t=5502
As for optimizations, there is no such thing as an optimization guide, afaik, for three reasons: 1) Very few people (and by few I mean that the list pretty much ends with the JVM developers and the guy who made Jasmin :D ) are using JAS. 2) There isn't much to optimize. 3) See the first sentence in f0dder's post :)
If you want size optimizations then simply try different instructions and see which produces the smallest bytecode.
Same for speed optimizations. A benchmarking wrapper for a function might help to determine which instruction sequence is faster.
Sorry, don't know any useful links, but you might want to google for Jasmin, JAS, java+assembler, etc.
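A benchmarking wrapper along those lines might look like the sketch below. Everything in it is an assumption for illustration: the class name, the timeNanos helper, and the two candidates (the mul versus shift/add question raised earlier in the thread). It uses lambdas, so it needs Java 8 or later and would have to be rewritten with plain interfaces for J2ME. Timings from a loop like this are only a rough indication and will vary between JVMs.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical timing harness for comparing two candidate implementations.
final class BenchWrapper {
    // Applies fn to 0..iterations-1 and returns the elapsed time in nanoseconds.
    // The checksum is stored in sink[0] so a JIT cannot discard the work.
    static long timeNanos(IntUnaryOperator fn, int iterations, int[] sink) {
        int checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            checksum += fn.applyAsInt(i);
        }
        long elapsed = System.nanoTime() - start;
        sink[0] = checksum;
        return elapsed;
    }

    public static void main(String[] args) {
        IntUnaryOperator mul = x -> x * 10;
        IntUnaryOperator shift = x -> (x << 3) + (x << 1);
        int[] sink = new int[1];
        int n = 5_000_000;
        // Warm-up passes so that a JIT, if present, has compiled both candidates.
        for (int i = 0; i < 5; i++) {
            timeNanos(mul, n, sink);
            timeNanos(shift, n, sink);
        }
        System.out.println("mul:   " + timeNanos(mul, n, sink) + " ns");
        System.out.println("shift: " + timeNanos(shift, n, sink) + " ns");
    }
}
```

To compare hand-written bytecode rather than Java source, the same wrapper could call methods from a class assembled with Jasmin or JAS instead of the two lambdas.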
You will want to look at BCEL (the Byte Code Engineering Library).
You can also find the details of the bytecode in Sun's own specs, most likely the Java Virtual Machine Specification. Check the java.sun.com site.
"2) There isn't much to optimize."
I'm not so sure about that.
This stack system is awkward to me, and maybe for the compilers too.
Plus, you've got the stack PLUS the regs... I don't know which is faster, duplicating the top of the stack (or burying it two values lower) or pushing a reg... but you can only do calculations on the top of the stack.
The compiled code I've seen seems to use the regs like you would on general x86-like systems, i.e. doing an operation, storing the result in a reg, pushing regs, doing more ops, etc., so not really using the stack. Seems like a lot of fuss to me. Maybe you could save a lot of these stores if you cared. (Again, maybe it depends on the JVM implementation, maybe there's no point if it's JITed, etc... but who can tell.)
Maybe things will change, maybe what I've seen wasn't a good example... but it could be interesting.
Oh, I'm no particular Java fan, it's just that I use it here where I do my internship... debugging J2ME games on mobile phones... often the configs have no floating point nor any math lib etc., and it's still somehow interesting.
It would be hard to make correct optimizations, since you'll need to make sure that they actually give positive results on different JVMs and different JITs (if applicable).
The only optimizations that could be done, imo, are the usual HLL optimizations: algebraic simplification, constant folding/propagation, etc. It's not like you need to worry about instruction parallelization, fetch blocks and stuff like that :) That's what I meant by 'There isn't much to optimize.'
I remember when I was fooling around with JAS, the resulting bytecode was much smaller than its Java equivalent. I never did a speed benchmark though.
It would be interesting to see how some algo written in low-level Java performs against its older brother.
By the way,
the JVM is a stack-based machine. There is no such thing as registers :) Except for the program counter and three stack pointer registers (or two stack pointers... I don't remember for sure), which you can't use for anything other than their intended purpose.
Okay, I meant local variables instead of registers,
like iload_1, iload_2 etc.
"It would be hard to make correct optimizations, since you'll need to make sure that they actually give positive results on different JVMs and different JITs (if applicable). The only optimizations that could be done, imo, are the usual HLL optimizations: algebraic simplification, constant folding/propagation, etc. It's not like you need to worry about instruction parallelization, fetch blocks and stuff like that :) That's what I meant by 'There isn't much to optimize.'"
As I said.
But: hard and hazardous, yet maybe worth it. After all, there are some "general rules of thumb" on x86 even though each processor is different.
"I remember when I was fooling around with JAS, the resulting bytecode was much smaller than its Java equivalent. I never did a speed benchmark though."
Thank you! Thank you! :D That's what I was looking for.
Was it on a GHz beast with a complex JVM?
Bytecode size or bytecode count seems to be the only decent way of estimating speed without testing...
Still... more weight for a mul than an add? And what about jumps? d'uh...
However: maybe on a tiny mobile phone with no JIT and a dull interpreter, no floats and all, it WOULD be worth it, and bytecode count is a good indication of speed.
Again, maybe not... I don't claim anything. If I do some tests I'll share them if I have the time!
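As a rough illustration of counting bytecodes and bytes without running anything, here is a small sketch. The class name and the sizeInBytes helper are made up, and the table only covers the handful of instructions used here (their encoded lengths are the standard ones: one byte for the short forms, an extra byte for bipush and indexed iload, two extra bytes for branch offsets). It compares x * 10 as a single mul against a shift/add version, and gives no weighting at all to how costly each instruction is to execute.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts instructions and encoded bytes of a bytecode sequence.
final class BytecodeSize {
    // Encoded length in bytes (opcode plus operands) for a few JVM instructions.
    private static final Map<String, Integer> LENGTH = new HashMap<>();
    static {
        String[] oneByte = {
            "iload_1", "iload_2", "iload_3", "iconst_0", "iconst_1",
            "iconst_3", "imul", "iadd", "ishl", "dup"
        };
        for (String op : oneByte) {
            LENGTH.put(op, 1);
        }
        LENGTH.put("bipush", 2);    // opcode + 1-byte immediate
        LENGTH.put("iload", 2);     // opcode + 1-byte local index
        LENGTH.put("ifeq", 3);      // opcode + 2-byte branch offset
        LENGTH.put("if_icmpeq", 3); // opcode + 2-byte branch offset
    }

    // Sums the encoded lengths; each line is "mnemonic [operand]".
    static int sizeInBytes(String... lines) {
        int total = 0;
        for (String line : lines) {
            String mnemonic = line.trim().split("\\s+")[0];
            Integer len = LENGTH.get(mnemonic);
            if (len == null) {
                throw new IllegalArgumentException("unknown instruction: " + mnemonic);
            }
            total += len;
        }
        return total;
    }

    public static void main(String[] args) {
        String[] mul = {"iload_1", "bipush 10", "imul"};
        String[] shiftAdd = {"iload_1", "iconst_3", "ishl", "iload_1", "iconst_1", "ishl", "iadd"};
        System.out.println("mul:       " + mul.length + " instructions, " + sizeInBytes(mul) + " bytes");
        System.out.println("shift/add: " + shiftAdd.length + " instructions, " + sizeInBytes(shiftAdd) + " bytes");
    }
}
```

On a plain interpreter the instruction count is probably the more telling of the two numbers, since each instruction costs one dispatch, but that is a guess until someone measures it.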
For example, I've seen this twice in the compiled (and obfuscated, I think) code:
(branch if local var is zero)
iconst_0
iload_2
if_icmpeq label
when it could be
iload_2
ifeq label
Or am I missing something?
Is ifeq a newer instruction, or does it differ somehow?
And also things like
iload_3
iload_3
when it could be
iload_3
dup
ifeq only tests whether a value is zero, while if_icmpeq can compare any two values. Other than that I see no difference.
In the example you gave, the second version indeed looks like the better choice.
For the second example I am not sure. It might be that dup is actually slower since it needs to pop and push twice, while the first version only does two pushes. Size-wise it makes no difference here, since iload_3 and dup are both one byte; dup would only be shorter for locals above index 3, where iload needs an index byte.
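The two rewrites discussed here can be sketched as a tiny peephole pass over Jasmin-style instruction lines. This is only an illustration: the Peephole class and its optimize method are made up, and it assumes one instruction per list entry with labels on their own lines (so a branch target in the middle of a pattern stops the match). Whether the dup rewrite actually helps is exactly the open question above; the sketch just shows the mechanics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical peephole pass over textual JVM instructions.
final class Peephole {
    // Applies two local rewrites and returns the new instruction list.
    static List<String> optimize(List<String> code) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < code.size()) {
            String a = code.get(i);
            String b = i + 1 < code.size() ? code.get(i + 1) : "";
            String c = i + 2 < code.size() ? code.get(i + 2) : "";
            if (a.equals("iconst_0") && b.startsWith("iload") && c.startsWith("if_icmpeq ")) {
                // Comparing 0 with x for equality is the same test as "x is zero".
                out.add(b);
                out.add("ifeq " + c.substring("if_icmpeq ".length()));
                i += 3;
            } else if (a.startsWith("iload") && a.equals(b)) {
                // Loading the same local twice becomes one load plus dup.
                out.add(a);
                out.add("dup");
                i += 2;
            } else {
                out.add(a);
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList(
            "iconst_0", "iload_2", "if_icmpeq label", "iload_3", "iload_3", "imul");
        System.out.println(optimize(before));
    }
}
```

Feeding the output back through an assembler and a timing loop on the target JVM would be the only way to tell whether either rewrite pays off there.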
Don't forget about the HotSpot compiler as well, which will compile the relevant parts on the fly.
Its more aggressive server compiler is selected via the -server option when running a Java app.
It would be interesting to compare your optimisations of various code segments against the unoptimized code running on a -server VM.
If you optimise it in a way that the JVM cannot easily optimise further, you may make things worse :)