Greets,
The current Intel/AMD docs don't seem to specify the clock cycles of the opcodes anymore (unless I'm missing something). I'm considering purchasing VTune just so I can (hopefully) get that information.
What I want to do is write a source scanner that will scan my MASM/TASM/FASM code and, depending on the processor I choose, show me the cycles for each line. Furthermore, I have a goal, if it's possible, of making an add-in for RadASM that adds real-time profiling to the code editor, so I can see my clock cycles and hopefully U/V pairs as well, and then clock loops and other sections.
I know it's a daunting task. Right now I can only clock as high as the 486; beyond that I can't seem to find any information on the clock cycles of an opcode.
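For illustration only, here is a minimal sketch of the scanner idea described above, written in Python. The names `CYCLE_TABLE`, `annotate` and `hypothetical_cpu` are made up for this example, and the cycle counts are placeholders rather than real timings; they would have to be filled in from the vendor manual for each target processor. It also assumes a plain "label: mnemonic operands ; comment" line layout, so a `;` inside a string literal would be misread as a comment.

```python
import re

# Placeholder cycle counts, NOT real timings: fill these in from the
# vendor manual for the processor you are targeting.
CYCLE_TABLE = {
    "hypothetical_cpu": {"mov": 1, "add": 1, "imul": 10},
}

# Optional "label:" prefix, then the mnemonic as the first word on the line.
LINE_RE = re.compile(r"^\s*(?:[A-Za-z_@$?][\w@$?]*:)?\s*([A-Za-z]\w*)")

def annotate(source, cpu):
    """Return (line, cycles) pairs; cycles is None when the mnemonic is unknown."""
    table = CYCLE_TABLE[cpu]
    result = []
    for line in source.splitlines():
        code = line.split(";", 1)[0]          # strip assembler comments
        match = LINE_RE.match(code)
        mnemonic = match.group(1).lower() if match else None
        result.append((line, table.get(mnemonic)))
    return result

def format_listing(source, cpu):
    """Render each source line with its cycle count (or '?') in a left column."""
    rows = []
    for line, cycles in annotate(source, cpu):
        column = "?" if cycles is None else str(cycles)
        rows.append(f"{column:>3} | {line}")
    return "\n".join(rows)
```

Calling `format_listing(open("prog.asm").read(), "hypothetical_cpu")` (with `prog.asm` standing in for any source file) would give a listing with one count per line. A fixed per-line number like this only makes sense for in-order processors where the manuals publish such counts.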
Thanks,
_Shawn
I have a PDF from AMD on the Athlon that is marvellous: it has absolutely everything you need to know to start optimizing, and you'll be pleasantly surprised by the optimization techniques. I renamed the file, so I can't tell you how to google it. I found it on the AMD site; it's 1,670,053 bytes, and the title of the PDF is "AMD Athlon Processor x86 Code Optimization Guide". The last few chapters are complete tables of the instructions' timings, which pipeline each one takes, which instructions it pairs with, and so on. It's really great.
The idea of clock cycles is kind of redundant now.
Modern processors don't execute the x86 instructions directly; they convert them to RISC-like micro-ops (although AMD calls them macro-ops). These can be executed out of order if there aren't dependencies, which helps absorb pipeline delays and memory latency. One instruction could therefore hold up execution for hundreds of clocks, or for just one if the processor can find other instructions to work on while waiting for it to finish.
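To make the dependency point concrete, here is a toy model in Python (a sketch, not a description of any real core). It assumes unlimited execution units and lets every instruction start as soon as its inputs are ready. The function name `finish_times` and all latencies are invented for the example; the 100-cycle "load" just stands in for a cache miss.

```python
def finish_times(instrs):
    """instrs: list of (name, latency, [names it depends on]) in program order.

    Toy model: unlimited execution units, an instruction starts as soon as
    all of its inputs are ready. Dependencies must appear earlier in the list.
    Returns {name: cycle at which it completes}.
    """
    done = {}
    for name, latency, deps in instrs:
        start = max((done[d] for d in deps), default=0)
        done[name] = start + latency
    return done

# Latencies below are made-up illustrative numbers, not real CPU timings.
dependent = [("load", 100, []), ("add", 1, ["load"]), ("mul", 3, ["add"])]
independent = [("load", 100, []), ("add", 1, []), ("mul", 3, [])]

print(max(finish_times(dependent).values()))    # 104: everything waits on the load
print(max(finish_times(independent).values()))  # 100: add and mul hide behind the load
```

The same three instructions cost 104 or 100 cycles in total depending only on what they depend on, which is why a single fixed cycle count per opcode says little on an out-of-order processor.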
That is why MASM doesn't provide a timings listing for the .686 directive: it would be almost a hindrance when you re-order your code.
Things like pairing of instructions are in the past. That was due simply to the architecture having two pipelines; now the division of the pipeline is done much higher up (the pipelines share the same beginning section and divide at the instruction decoders and execution units).
If you want to write a performance tool, you need intimate knowledge of the pipeline and execution engines. This is why only Intel and AMD do this (with VTune and CodeAnalyst respectively): they aren't going to tell ANYONE how their processors run.
Mirno
Yeah, I go with Mirno: on today's heavily pipelined CPUs, instructions take an unpredictable amount of time, so your numbers would be estimates rather than exact counts. But I think you should look into Ultrano's suggestion. Hey Ultrano, if you can, could you send me the file? isaacb AT rogers DOT com (Sorry this is encoded for spam scanners)
Just type your email normally and it'll be converted to an image. Great feature; check my sig for a working example.
Here's the thread about this feature: Hiro, this would be nice...