A few months ago I decided to try something that, as I later found out, would come close to being my worst nightmare.

There are lots of Java obfuscators, so I thought: why not make one for native code using asm?
My approach is to obfuscate C and C++ code by using MS VC++ 6.0's /FA switch. The compiler
outputs an asm listing for each source file, and using masm I reassemble the asm listing into new object files.
The obfuscator I wrote goes in between the compiler and the assembler and processes the asm listing.
So it goes like this:
cl.exe -> (myfile.asm) -> obfuscator -> (new file with same filename) -> ml.exe
With masm I assemble with the following switches: /c /coff /Zm /Cx
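
A minimal sketch of how the chain above could be driven from a script. The name obfuscator.exe is hypothetical (the post does not name the tool), and passing /c to cl.exe, as well as expecting the listing to land in the current directory under the source file's base name, are assumptions rather than details from the post.

```python
import subprocess
from pathlib import Path

# Switches for ml.exe exactly as quoted in the post.
ML_FLAGS = ["/c", "/coff", "/Zm", "/Cx"]


def build_steps(source):
    """Return the three command lines of the cl -> obfuscator -> ml chain."""
    # Assumption: cl.exe writes <basename>.asm into the current directory.
    listing = Path(source).stem + ".asm"
    return [
        ["cl.exe", "/c", "/FA", source],   # compile and emit the asm listing
        ["obfuscator.exe", listing],       # hypothetical tool, rewrites in place
        ["ml.exe", *ML_FLAGS, listing],    # reassemble into a new object file
    ]


def run_pipeline(source):
    """Run each step in order and stop at the first failing tool."""
    for step in build_steps(source):
        subprocess.run(step, check=True)
```

Keeping the command construction separate from the execution makes it possible to inspect or log the exact command lines without having the Microsoft tools installed.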

All this works in most cases. However, in some cases CL doesn't produce code that ML can eat, and this is where the problems start.

The reason is that masm doesn't support COMDAT. After a lot of research I found a guide on COMDATs (http://www.launcherasm.com/technical/comdats.html) which describes this problem, but
even though I wrote my own COMDAT-enabling code I didn't manage to satisfy the linker.

In most cases the problem shows up as errors like these:

libc.lib(crt0msg.obj) : error LNK2005: "`string'" (??_C@_02JJJH@?6?6?$AA@) already defined in myfile.obj
myfile.exe : fatal error LNK1169: one or more multiply defined symbols found

The problematic code in question is CL's own generated code which looks like this:

; COMDAT ??_C@_02JJJH@?6?6?$AA@
_DATA SEGMENT
??_C@_02JJJH@?6?6?$AA@ DB 0aH, 0aH, 00H ; `string'
_DATA ENDS

This is nothing more than two newlines, but it is enough to cause lots of problems.
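
Since ML only sees the "; COMDAT" line as a comment, the affected symbols can at least be located by scanning the listing for that marker. A minimal sketch, assuming the marker always has the form shown above; the function name is made up for illustration.

```python
import re

# Matches the comment that cl.exe places in front of a COMDAT definition,
# e.g. "; COMDAT ??_C@_02JJJH@?6?6?$AA@", and captures the symbol name.
COMDAT_MARK = re.compile(r"^;[ \t]*COMDAT[ \t]+(\S+)", re.MULTILINE)


def find_comdat_symbols(listing):
    """Return the symbols flagged as COMDAT, in order of appearance."""
    return COMDAT_MARK.findall(listing)
```

Every symbol returned here is one that ML will emit as an ordinary definition, which is what the linker then reports as multiply defined.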

There are other problems which I haven't mentioned yet, such as keywords which masm doesn't like, or special cases like VC++ exception handling, which uses the FS segment and makes masm error out with lots of errors for no apparent reason. All of this can be solved with a bit of effort, though.

Another problem is the redefinition of segments inside each other. This happens especially with MFC code and is another clear incompatibility between CL and ML.

I've probably spent much more time on this than most people would. The reason I haven't given up yet is that I know it can be done. I have been given an object file (processed with their obfuscator) from another company that did it. By studying their file I discovered a bit of what they did.
They used a Microsoft compiler, that's for sure. I am pretty sure they wrote the original source in C,
or at least in an HLL and not asm. Their object file consists of .text, .data and .bss sections. COMDAT was not enabled for any of the sections. According to the COMDAT-enabling guide there should be a section for each COMDAT-enabled symbol, so it seems they have no COMDAT symbols at all.
So if this is going to work, it must be possible to rewrite the assembler file so that it doesn't depend on the same symbol names across multiple files.
If I renamed all symbols I would get problems with known symbols, such as procedures shared across several files and libc functions.
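
One way to avoid renaming everything is to touch only the compiler-generated string symbols (the ones starting with ??_C@ that carry a "; COMDAT" marker) and give them a name that is unique per file. This is a sketch of that idea only: whether it satisfies the linker is untested, it gives up the folding of identical strings, and nothing says this is what the other company did. The function name and the __obfstr_ naming scheme are made up for illustration.

```python
import re

COMDAT_MARK = re.compile(r"^;[ \t]*COMDAT[ \t]+(\S+)", re.MULTILINE)
# Characters that can appear in the symbol names seen in the listing.
SYMBOL = re.compile(r"[A-Za-z0-9_?@$]+")


def localize_string_symbols(listing, file_tag):
    """Rename COMDAT string symbols so each file gets its own names.

    file_tag is assumed to be identifier-safe and unique per source file.
    Procedures and libc names are left untouched.
    """
    marked = dict.fromkeys(COMDAT_MARK.findall(listing))  # unique, ordered
    strings = [name for name in marked if name.startswith("??_C@")]
    renames = {
        old: f"__obfstr_{file_tag}_{index}"
        for index, old in enumerate(strings)
    }
    # Replace whole symbol tokens only, so no other name is altered by a
    # partial match.
    return SYMBOL.sub(
        lambda m: renames.get(m.group(0), m.group(0)), listing
    )
```

Because the replacement works on whole tokens, the PUBLIC declaration, the definition and every reference to the symbol are renamed consistently.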

I don't expect any real answer, because I've studied this more than most would have bothered to.
If you have some clues or real-life experience with either reassembling using masm or with COMDAT, I'd like to get a comment though :)

What can I say? I'm trying to fix Microsoft's own bugs, no? :eek:

// CyberHeg
Posted on 2003-05-28 07:37:15 by CyberHeg
Hi,

I am facing the same problem.

Have you come up with any solutions after so many years?


Regards,
boycoder
Posted on 2011-01-11 20:58:37 by boycoder
I don't see the problem really.
It seems this all stems from the assumption that the assembly listing that cl.exe outputs would be compatible with MASM. Apparently it is not, and I don't see any reason why it should be (unlike for example gcc, Microsoft's compiler is not just a frontend, which pipes its output through a separate assembler executable, but generates the code directly). The assembly output is merely for diagnostic purposes.
If you can generate the assembly listing with cl.exe, then apparently you have the sourcecode, and you can use cl.exe to generate the binary directly, no need for MASM.
If you don't have the sourcecode, then how did you get an assembly listing? Whoever provided you with the assembly listing can also provide you with the sourcecode or the binary.
I think that's more of an ethical issue than any kind of practical problem, let alone a 'bug', caused by Microsoft, which you need to 'fix'.
Posted on 2011-01-12 04:52:25 by Scali
For the purposes of obfuscating C/C++ sourcecode, it would seem prudent to take advantage of the compiler's macro support.
Basically your obfuscator program is then a 'pretty printer' which takes the subject sourcecode and mangles it, emitting an output sourcecode consisting of macro declarations that appear to be garbage, and a handful of macro definitions which are the key that the compiler needs to demangle it at buildtime.
The trick then is to determine suitable substrings for macro-replacement based on the input source!
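
A minimal sketch of such a 'pretty printer', assuming the simplest possible choice of substrings: every identifier and keyword becomes its own macro. The function name and the Q0, Q1, ... aliases are made up for illustration, the aliases are assumed not to clash with names already in the source, and preprocessor lines, literals and comments are passed through untouched.

```python
import re

TOKEN = re.compile(r"""
    ^[ \t]*\#[^\n]*          # preprocessor directive line: keep as-is
  | "(?:\\.|[^"\\\n])*"      # string literal: keep
  | '(?:\\.|[^'\\\n])*'      # character literal: keep
  | //[^\n]*                 # line comment: keep
  | /\*.*?\*/                # block comment: keep
  | \.?\d[\w.]*              # numeric literal such as 0x1F or 1.5f: keep
  | (?P<word>[A-Za-z_]\w*)   # identifier or keyword: swap for a macro
""", re.VERBOSE | re.MULTILINE | re.DOTALL)


def mangle(source, prefix="Q"):
    """Return the macro 'key' followed by the mangled C/C++ source."""
    table = {}  # original word -> macro alias, in order of first use

    def swap(match):
        word = match.group("word")
        if word is None:           # literal, comment or directive
            return match.group(0)
        if word not in table:
            table[word] = f"{prefix}{len(table)}"
        return table[word]

    body = TOKEN.sub(swap, source)
    key = "".join(
        f"#define {alias} {word}\n" for word, alias in table.items()
    )
    return key + body
```

The emitted #define lines are the key the compiler needs at build time; choosing larger or less obvious substrings than single words is where the real work would be.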
Posted on 2011-01-12 06:31:19 by Homer