Greets,

I've brought up the idea of an assembly language interpreter before and it got no welcome, largely due to a misunderstanding. I intend to create a debugging tool, not an interpreter as such, though technically it is one. I have a few questions.

I am planning right now a special virtual machine that interprets x86 assembly. The byte code will actually be the Intel/AMD opcodes -- as a base. Then, add-ins can be created to add a MASM/TASM/SPASM/FASM/NASM/Etc. layer on top of that.

The goal is simple: to be able to modify the source code during run-time, and interactively view the happenings of your source code. True, a debugger can let you do that, kind of. But what debugger will let you add, remove, or modify assembly source code during run-time? None that I know of.

What's the best approach? The way I'm doing it now is with a list where each node holds the information for one line of code. The VM iterates through the nodes and executes each instruction or branches accordingly. This way it can detect a breakpoint, break, and, while in break mode, let you modify the source. Because it's a list, you can add or remove lines easily, and the VM can adjust branches and memory locations without a "virtual recompile". There will be some overhead for each node, but that's a given.
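
A minimal sketch of the list approach in C++, assuming a made-up four-instruction set rather than real x86 decoding. The names `Node`, `ListVm`, and `Op` are hypothetical. Branches are resolved by label at run time, which is what lets lines be inserted while the VM sits on a breakpoint.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <string>

// Hypothetical mini instruction set; the real tool would decode x86.
enum class Op { MovImm, AddImm, Jnz, Halt };

struct Node {
    std::string label;   // optional label on this source line
    Op op;
    std::int32_t imm;    // immediate operand (MovImm/AddImm only)
    std::string target;  // branch target label (Jnz only)
    bool breakpoint;     // stop before executing this line
};

struct ListVm {
    std::list<Node> code;
    std::list<Node>::iterator ip;
    std::int32_t acc = 0;  // one register, standing in for EAX
    bool zero = false;     // zero flag

    // Labels are looked up at run time, so inserting or deleting nodes
    // never leaves a branch pointing at a stale position.
    std::list<Node>::iterator find(const std::string& label) {
        for (auto it = code.begin(); it != code.end(); ++it)
            if (it->label == label) return it;
        return code.end();
    }

    // Runs until Halt (returns false) or a breakpoint (returns true).
    // Pass resume=true to step past the breakpoint the VM is stopped on.
    bool run(bool resume = false) {
        while (ip != code.end()) {
            if (ip->breakpoint && !resume) return true;
            resume = false;
            switch (ip->op) {
                case Op::MovImm: acc = ip->imm; ++ip; break;
                case Op::AddImm: acc += ip->imm; zero = (acc == 0); ++ip; break;
                case Op::Jnz:
                    if (zero) {
                        ++ip;
                    } else {
                        ip = find(ip->target);
                        assert(ip != code.end() && "unknown label");
                    }
                    break;
                case Op::Halt: return false;
            }
        }
        return false;
    }
};
```

While `run()` has returned true, the editor can call `code.insert` or `code.erase` on any node other than the current one; `std::list` keeps `ip` valid across those edits. The linear label search is the per-node overhead mentioned above, traded for simplicity.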

Or, as I've wondered, should I create a virtual memory space and place the opcodes into it contiguously, as in a real executable binary, and keep track of the breakpoints in a separate internal list? If a line is added to the code, I would then write a jump pointing to the added (or modified) code, or skipping over the removed code, and somehow account for the changed memory locations and offsets. I think this would be tough, but it is the most desirable.
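
For the contiguous layout, the "create a jump" step could look roughly like this. It is a sketch that assumes a flat code buffer addressed by offset; `patch_jmp` is a hypothetical helper name. It writes a near `jmp rel32` (opcode E9), whose displacement is measured from the end of the 5-byte jump instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Overwrites 5 bytes at offset `from` with `jmp rel32` (opcode E9) that
// lands on offset `to`. Both offsets refer to the same flat code buffer.
void patch_jmp(std::vector<std::uint8_t>& code, std::size_t from, std::size_t to) {
    assert(from + 5 <= code.size());
    // The displacement is relative to the instruction after the jump.
    const std::int64_t delta =
        static_cast<std::int64_t>(to) - static_cast<std::int64_t>(from + 5);
    const std::uint32_t rel = static_cast<std::uint32_t>(delta);
    code[from] = 0xE9;
    for (int i = 0; i < 4; ++i)  // x86 stores the displacement little-endian
        code[from + 1 + i] = static_cast<std::uint8_t>(rel >> (8 * i));
}
```

The catch is that the 5 overwritten bytes may cover more than one original instruction, so whatever they displaced has to be copied to the new location first, and any branch that targeted the middle of those bytes has to be fixed up as well. That bookkeeping is the tough part.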

The second question: what about callbacks? I can't seem to work out how to handle them. If the code is being interpreted and the VM encounters a callback, how does it handle it effectively? I've thought about creating a "stub" procedure that is a real executable entry point, which tells the VM to pass control on to the virtual callback function. Perhaps that's how it has to be done. Any ideas?
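
The stub idea can be sketched like this in C++. Everything here is an assumption for illustration: `Vm`, `run_virtual`, and the globals are hypothetical, and a `std::function` stands in for interpreted code. `std::qsort` plays the role of the outside code that wants a real function pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <vector>

// Stand-in for the interpreter: a table of "virtual procedures". In the real
// tool each entry would be interpreted code rather than a std::function.
struct Vm {
    std::vector<std::function<int(int, int)>> procs;
    int calls = 0;  // how many times native code called back into the VM

    int run_virtual(std::size_t entry, int a, int b) {
        ++calls;
        return procs.at(entry)(a, b);
    }
};

Vm* g_vm = nullptr;               // the VM the stub forwards to
std::size_t g_compare_entry = 0;  // which virtual procedure is the callback

// A real, natively compiled entry point that can be handed to outside code.
// It unpacks the native arguments and passes control to the virtual procedure.
int compare_stub(const void* lhs, const void* rhs) {
    assert(g_vm != nullptr);
    return g_vm->run_virtual(g_compare_entry,
                             *static_cast<const int*>(lhs),
                             *static_cast<const int*>(rhs));
}
```

Usage would be along the lines of `std::qsort(data, n, sizeof(int), compare_stub)` after pointing `g_vm` at the VM. One global stub only covers one callback at a time; supporting many would need one stub per callback, or stubs generated at run time that carry their own entry index.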

Next, macros. I would have to expand the macros during one of the passes, but allowing them to be modified during runtime may not be feasible. Do you step through the macro code, or the expanded code? Does it jump into the actual macro definition during runtime, so that you modify it there and every other location where the macro is used is updated with it? Or do I modify it where it occurs, and that simultaneously updates all other occurrences of the macro, including the definition?

And final question I have: since I'd like to be able to allow people to use libs, even if there is no source code... is a lib file already compiled and ready to go? So if I memory map a lib and find the function entry points, can I make a call into that memory address and have executable code? Or does the linker do something to it during the linking process?

I'd eventually also like to allow you to debug a DLL. For example, say you create a RadASM Add-In DLL that you'd like to debug. I'm thinking about creating a proxy DLL that notifies the VM when a function is called, telling the VM to start "virtually executing" the source code for the DLL. The end result: you're using RadASM (or some other host), you hit a breakpoint in the DLL source, it goes into break mode, and you can debug or modify the source and correct the issue or not, but everything still works. Borland, Metrowerks, and Microsoft have similar technology.

I've run the idea by people before, and they have made it clear that they would rather use a debugger and just compile. So be it, but I like being able to modify source code during runtime and interact with it as I please without recompiles. That's just me.

What are your thoughts and ideas about how this might be achieved? I'm not trying to recreate the x86 CPU. I only want to be able to interactively debug and modify source during run time (something we Visual Basic programmers are spoiled by). I know there's much to account for and consider, but I think I can make it work. Perhaps with some restrictions, but for the most part, if I can run Hutch's Boyer Moore and The Svin's algos through it and make changes interactively, that would be cool.

I'll try to make some other info available, such as clock cycles, possible prediction stalls, and so on... but one step at a time. I'll be programming in C/C++ using VC++ 7 (or Metrowerks Codewarrior For Windows 8.0).


Thanks,
Shawn
Posted on 2002-05-27 03:38:00 by _Shawn
_Shawn,
I believe that the kind of virtual machine you describe would require so many and so extensive non-standard system interfaces, that test results achieved with it would not be representative of what you would get with the same code assembled to normal executable files. That is the only 'gripe' I have with the idea, but that one is significant.

On the other hand, what is the real motivation for having such a VM ? As you hinted, and I agree, the main motivation is to be able to work interactively with assembly code, much as one can do with an interpretive HLL.

That goal is not quite reached by normal debuggers, but I think it would be better to improve them in their abilities to handle runtime symbolical modifications, than to build a VM of the kind you describe. And with a debugger perfectly normal system interfaces can be used, so test results would be more valid.

As to one of your questions, concerning linkers, I'm afraid your idea of simply loading libs in without linking would not work. Though I haven't yet studied the ones for Windows executables, all linkers I know of do modify the code from the linked files in producing the executables. And further modifications are then made when an executable is loaded for running. This is necessary to resolve their cross references, as well as to relate them to runtime locations. Such linking would have to be emulated by the VM you want.
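
To give an idea of the kind of fixing-up meant here, below is a simplified C++ sketch. The `Reloc` record and `apply_relocs` are hypothetical and far simpler than any real object format. It patches two common kinds of 32-bit reference, absolute and relative to the next instruction, once the load address and the symbol addresses are known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class RelocKind { Abs32, Rel32 };

// Hypothetical relocation record: "the 4 bytes at `offset` refer to a symbol".
struct Reloc {
    std::size_t offset;         // where in the image the 32-bit field sits
    RelocKind kind;
    std::uint32_t symbol_addr;  // resolved run-time address of the symbol
};

std::uint32_t read32(const std::vector<std::uint8_t>& m, std::size_t at) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(m[at + i]) << (8 * i);
    return v;
}

void write32(std::vector<std::uint8_t>& m, std::size_t at, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) m[at + i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Patches every recorded field, assuming image[0] is loaded at `load_base`.
// Whatever the field already holds is treated as an addend.
void apply_relocs(std::vector<std::uint8_t>& image, std::uint32_t load_base,
                  const std::vector<Reloc>& relocs) {
    for (const Reloc& r : relocs) {
        assert(r.offset + 4 <= image.size());
        std::uint32_t value = read32(image, r.offset);
        if (r.kind == RelocKind::Abs32) {
            value += r.symbol_addr;  // e.g. the address of a data item
        } else {
            // e.g. a call: distance from the end of the field to the target
            const std::uint32_t next =
                load_base + static_cast<std::uint32_t>(r.offset) + 4;
            value += r.symbol_addr - next;
        }
        write32(image, r.offset, value);
    }
}
```

Until something performs this step, the call and data references inside a lib's object code are placeholders, which is why jumping straight into a memory-mapped lib would not work.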

If you still want to go ahead with it, I can only wish you luck, but I really think you would do better investing your time in other projects.
Posted on 2002-05-27 04:31:22 by RAdlanor
Is this run time assembler something like you are talking about?

http://europa.spaceports.com/~schueler/asm/
Posted on 2002-05-27 10:24:11 by alpha
"Is this run time assembler something like you are talking about?"

Not quite... I'm not trying to make a scripting language out of this. I just want to be able to change source code while it's being executed. The only way I know of is to make it interpreted. The other thing I could do, if a line of code is modified, is rewrite that entire procedure into a different area of memory and then begin execution there.

Thanks,
Shawn
Posted on 2002-05-27 15:49:44 by _Shawn
What you are trying to achieve sounds to me a helluva lot like an emulator. Now before you go forming a lynch-squad, just hear me out.
Your program is going to store the asm source in some sort of logical database, whether it be truly tree-based, or whether it is a type of multilinked-list of some other flavour, the point is that YOUR program will be EMULATING the source program by providing a safe runtime environment in which it is possible, just like a tracing debugger, to STEP through the source program, and interrupt its program-flow at any time... right?
I think you should consider coding it using the same methodology as used in a runtime tracing debugger, ie use INT 3 and the cpu in step mode, but use HLL ideology in your approach to the actual application.
My concept would involve a realtime assembler which could extract full statements by parsing the source intelligently. It would assemble a single MASM statement into one or more opcodes of the instruction-set named at the top of the source, and then call the interrupt to throw the cpu into STEP mode.
Like any debugger, the application would keep the state of the cpu, but this one would have THREE basic modes of operation:
Mode 1 would be Find Problem mode.
It would STEP right through the program until it identified an illegal condition, and then tell us about it.
Mode 2 would be Single Instruction mode.
It would STEP through the OpCodes generated by a single MASM statement as defined above.
Mode 3 would be our regular lowlevel OpCode stepper, as in any machinecode debugger.

I will get banged on the knuckles by a moderator if I suggest any way to ensure your application interprets a MASM instruction to produce the same OpCodes as ml.exe
So I am not going to.
I'm not sure what the copyright standing of ml.exe is.
I think it has a few bugs anyway, as every now and then I write some source which I am SO sure is bug-free but won't compile.
I end up recoding that section using different source but the same logic and it compiles. Go Figure.
I think your application could be invaluable as an aid in optimizing code, but I think you might need to be careful with copyright breach over syntax etc - (is that why compilers all have their own syntax for anything that's higher than a cpu instruction?)
Anyway, I see where you're going, and the best of luck with it.
Posted on 2002-05-28 01:47:43 by Homer
Is it possible for an application to "watch" a particular register, such as EIP, so I can emit opcodes (remember, my bytecodes will be same as intel x86) in a contiguous space of allocated memory and if something begins executing a proc I can intercept it and act accordingly?


Would it be desirable to read each opcode from the source code and emulate the functionality thereby, or, to "precompile" into the bytecode and read it there?


Thanks,
Shawn
Posted on 2002-05-28 04:15:15 by _Shawn
I'm not sure how feasible any type of direct execution would be in
a project like this. You have to keep a pretty complete CPU context,
and things get pretty messy when you start doing calls to external
code (ie, the windows API). I especially find the idea of threads and
callbacks frustrating (though callbacks can probably be solved by
proxies - you still have to find some way to identify callback functions,
perhaps by keeping a full list of APIs that take callback parameters).

Direct Execution would require you to put opcodes in memory just like
an assembler+linker would have done it. As I see it, this would make
it harder keeping per-instruction / per-source-line information. Also,
when modifying source lines, you'd need to re-build the entire thing,
and be able to track the old source "eip" position into the new machine
code position - this could also prove pretty messy.

Your project seems to me like a mix of an assembler+linker, debugger,
and machine emulator (like vmware or bochs). Each part is pretty
complicated by itself, but a mix? Eeeeeeek :).

You cannot 'watch' registers. P4 has some new interesting debugging
stuff, but I haven't really looked into it - I do know, though, that
it can generate breakpoint on branch, which could be useful to (try to)
process branches to foreign code, which should be nice if you go for
a direct execution model. I guess you can see single-step (int1, not int3)
tracing as watching EIP, as you will have a breakpoint per cpu instruction.
Setting the trap flag: "pushfd / or dword ptr [esp], 100h / popfd".
Posted on 2002-05-28 05:28:07 by f0dder
I have a ton of notes on this exact thing - a kind of out-of-order or non-linear assembler with all the features of a debugger. I want to be able to execute code fragments, and display huge amounts of info on code as it is typed. All this requires interface definitions for external code (APIs, libraries, etc.). It's not a small project by any means. I've typed lots of my ideas into messages on this board, so I won't repeat all of that here.
Posted on 2002-05-28 07:21:28 by bitRAKE
I didn't think I'd get much support on this. I don't need any. I have a prototype somewhat working right now in C#, basically emulating each instruction from a linked list. I will optimize it in C++ later to behave differently. My only problem is that there are 527 individual opcodes on the P4 processor, not counting AMD's (which I haven't looked at yet). Within that are countless variations. Even typing the opcodes into an enum was a pain. It'll be a while before a fully functional emulator is available, but I'm doing well so far with what I've got.

This will be able to double as an assembly scripting language, as well. That's by design.

Someone earlier pointed me to a website that had source to a dynamic runtime-assembler... I forgot which thread.

What I will do, in the optimized form, is create a stream (rather than a list) of opcodes (like an actual binary) and emulate each instruction similar to how the CPU would execute it. I've wondered whether I should pad each instruction with a NOP, or an INT3 if there's a breakpoint.
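
As an alternative to padding, here is a sketch of the trick debuggers commonly use: save the first byte of the instruction and overwrite it with INT3 (opcode CC), so the stream needs no spare room. The `Breakpoints` class is a hypothetical name, and the stream is assumed to live in a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

class Breakpoints {
public:
    explicit Breakpoints(std::vector<std::uint8_t>& code) : code_(code) {}

    // Plants INT3 at `offset`, which must be the first byte of an instruction.
    bool set(std::size_t offset) {
        if (offset >= code_.size() || saved_.count(offset) != 0) return false;
        saved_[offset] = code_[offset];
        code_[offset] = 0xCC;
        return true;
    }

    // Removes the breakpoint and restores the original opcode byte.
    bool clear(std::size_t offset) {
        auto it = saved_.find(offset);
        if (it == saved_.end()) return false;
        code_[offset] = it->second;
        saved_.erase(it);
        return true;
    }

    // The byte the emulator should decode at `offset`, breakpoint or not.
    std::uint8_t original(std::size_t offset) const {
        assert(offset < code_.size());
        auto it = saved_.find(offset);
        return it != saved_.end() ? it->second : code_[offset];
    }

private:
    std::vector<std::uint8_t>& code_;
    std::map<std::size_t, std::uint8_t> saved_;
};
```

When the emulator fetches CC at an offset that is in the table, it enters break mode; to resume, it decodes `original(offset)` instead of the planted byte. Offsets keep their unpadded values, so the stream stays byte-for-byte comparable with what an assembler would emit.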

If source code is changed during runtime, then using the run-time assembler methodology I can just "re-assemble" that procedure (or whatever other area was modified) into another memory location, and execution continues at the next instruction in the re-assembled code. That way, it has the effect of run-time editing. I'm working on a proof of that, and if I get it working, I'll post the first beta for people to use. People may not like having to install the .NET framework... that's up to them. But the final product won't use the .NET runtime. Only the prototype.
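
The "continue at the next instruction" step needs some way to carry the stopped position over to the re-assembled copy. One possible sketch, assuming each source line has a stable id that survives edits and that the assembler records where each line's code starts (`LineMap`, `line_at`, and `remap_ip` are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Stable source line id -> offset of the first opcode byte for that line.
using LineMap = std::map<int, std::size_t>;

// Finds the line whose code starts at `ip`. Returns false if `ip` is not at
// the start of a line, e.g. it points into the middle of an instruction.
bool line_at(const LineMap& map, std::size_t ip, int& line_id) {
    for (const auto& entry : map) {
        if (entry.second == ip) {
            line_id = entry.first;
            return true;
        }
    }
    return false;
}

// Translates a stopped position in the old code into the re-assembled code.
// Fails if the line about to execute no longer exists after the edit.
bool remap_ip(const LineMap& old_map, const LineMap& new_map,
              std::size_t old_ip, std::size_t& new_ip) {
    int line_id = 0;
    if (!line_at(old_map, old_ip, line_id)) return false;
    auto it = new_map.find(line_id);
    if (it == new_map.end()) return false;
    new_ip = it->second;
    return true;
}
```

Return addresses already on the stack that point into the old copy are not handled here; the old copy would either have to stay in memory until those calls return, or the stack would need the same translation.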


Thanks,
_Shawn
Posted on 2002-05-31 02:42:57 by _Shawn