Assembling and disassembling

The disassembler (Disasm()) is the core of OllyDbg and one of its most complex parts. It not only converts binary code to human-readable mnemonics, but also highlights and comments commands and arguments, creates a structured hexadecimal dump, determines the types of operands and calculates their values, and even predicts the results of some frequently used integer commands.

Of course, there is a tradeoff between the amount of information and the execution time.
The OllyDbg disassembler is relatively fast: it takes 0.2 to 0.3 microseconds on a 3-GHz Athlon to determine the command length and extract the basic information. If you request mnemonics, dump and comments, it may take more than 1 microsecond to produce results. There is a special stripped-down version, Cmdinfo(), that is even faster, typically 0.15 microseconds or less.

Two routines, Disassembleforward() and Disassembleback(), allow you to walk executable code forward and back by a specified number of commands. Whereas the first is rather straightforward, walking code back requires either preliminary code analysis or a lot of heuristics. Therefore Disassembleback() may sporadically produce false results.

The Assembler shares its code table with Disasm(). This means that any disassembled command can be assembled back (possibly to a different binary code, see below). Assembleallforms() supports two code generation styles: precise (the command can be executed) and imprecise (used by search routines). Many 80x86 operations have more than one possible binary encoding. For example, MOV EAX,[DWORD EBX] has sixteen different forms:

   8B03            - the simplest form
   8B43 00         - form without SIB with 1-byte zero displacement
   8B83 00000000   - form without SIB with 4-byte displacement
   8B0423          - form with SIB byte without scaled index
   8B0463          - same
   8B04A3          - same
   8B04E3          - same
   8B4423 00       - SIB byte, 1-byte displacement, no index
   8B4463 00       - same
   8B44A3 00       - same
   8B44E3 00       - same
   8B8423 00000000 - SIB byte, 4-byte displacement, no index
   8B8463 00000000 - same
   8B84A3 00000000 - same
   8B84E3 00000000 - same
   8B041D 00000000 - SIB byte, 4-byte displacement, scale 1, no base

The Assembler allows you to create all these encodings. In the imprecise mode, you can even specify MOV ANY,ANY, and the Assembler will return the list of models that match any possible MOV command. If all you need is a working command, Assemble() will automatically select the shortest form.
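
To see why the sixteen forms listed above are equivalent, it helps to decode their ModRM and SIB bytes by hand. The following is a minimal sketch that does not use the OllyDbg API: the type memform and the functions decode_mov_load() and effective_address() are hypothetical names introduced only for this illustration, and the decoder handles nothing but opcode 8B with a memory operand in 32-bit mode, without prefixes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of the OllyDbg API.
   Register numbers: 0=EAX 1=ECX 2=EDX 3=EBX 4=ESP 5=EBP 6=ESI 7=EDI. */
typedef struct {
    int base;       /* base register number, -1 if none */
    int index;      /* index register number, -1 if none */
    int scale;      /* 1, 2, 4 or 8 (forced to 1 when there is no index) */
    int32_t disp;   /* displacement, sign-extended to 32 bits */
    size_t length;  /* total command length in bytes */
} memform;

/* Decode MOV r32,[mem] (opcode 8B). Returns 0 on success, -1 otherwise. */
int decode_mov_load(const uint8_t *code, size_t size, memform *out) {
    assert(code != NULL && out != NULL);
    if (size < 2 || code[0] != 0x8B) return -1;
    int mod = code[1] >> 6, rm = code[1] & 7;
    if (mod == 3) return -1;                /* register operand, no memory */
    size_t pos = 2;
    size_t dispsize = (mod == 1) ? 1 : (mod == 2) ? 4 : 0;
    out->base = rm; out->index = -1; out->scale = 1; out->disp = 0;
    if (rm == 4) {                          /* SIB byte follows ModRM */
        if (size < 3) return -1;
        uint8_t sib = code[pos++];
        int idx = (sib >> 3) & 7;
        out->base = sib & 7;
        if (idx != 4) {                     /* index 100b means "no index" */
            out->index = idx;
            out->scale = 1 << (sib >> 6);
        }
        if (mod == 0 && out->base == 5) {   /* no base, 4-byte displacement */
            out->base = -1; dispsize = 4;
        }
    } else if (mod == 0 && rm == 5) {       /* plain [disp32] */
        out->base = -1; dispsize = 4;
    }
    if (size < pos + dispsize) return -1;
    if (dispsize == 1) {
        out->disp = (code[pos] & 0x80) ? (int32_t)code[pos] - 256
                                       : (int32_t)code[pos];
    } else if (dispsize == 4) {
        uint32_t d = (uint32_t)code[pos] | ((uint32_t)code[pos + 1] << 8) |
                     ((uint32_t)code[pos + 2] << 16) |
                     ((uint32_t)code[pos + 3] << 24);
        out->disp = (d & 0x80000000u)
            ? (int32_t)(d - 0x80000000u) - 2147483647 - 1 : (int32_t)d;
    }
    out->length = pos + dispsize;
    return 0;
}

/* Address that the decoded operand refers to, for given register values. */
uint32_t effective_address(const memform *m, const uint32_t regs[8]) {
    uint32_t ea = (uint32_t)m->disp;
    if (m->base >= 0) ea += regs[m->base];
    if (m->index >= 0) ea += regs[m->index] * (uint32_t)m->scale;
    return ea;
}
```

Under these assumptions, every one of the sixteen byte sequences decodes to an operand whose effective address equals the contents of EBX. Only the command length (2 to 7 bytes) and the way EBX is encoded (as base, or as index with scale 1 in the last form) differ.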

Ndisasm() is a disassembler for the Common Intermediate Language (.NET) commands. It is as yet very rudimentary.


Imprecise commands

When searching for a command or a sequence of commands, you can specify imprecise Assembler patterns that match many different instructions. For example, MOV EAX,ANY will match MOV EAX,ECX; MOV EAX,12345; MOV EAX,[FS:0] and many other commands.

Imprecise patterns use the following keywords:

Keyword   Matches
R8        Any 8-bit register (AL, BL, CL, DL, AH, BH, CH, DH)
R16       Any 16-bit register (AX, BX, CX, DX, SP, BP, SI, DI)
R32       Any 32-bit register (EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI)
SEG       Any segment register (ES, CS, SS, DS, FS, GS)
FPUREG    Any FPU register (ST0..ST7)
MMXREG    Any MMX register (MM0..MM7)
SSEREG    Any SSE register (XMM0..XMM7)
CRREG     Any control register (CR0..CR7)
DRREG     Any debug register (DR0..DR7)
CONST     Any constant
ANY       Any register, constant or memory operand
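
The keyword table can be read as a set of operand classes. The sketch below shows one possible way to test a textual operand against a keyword. It is an illustration only and does not reflect how OllyDbg implements the search internally: keyword_matches() and its helpers are hypothetical names, only a subset of the keywords is covered, and constants are assumed to be hexadecimal numbers that start with a decimal digit (as in 0E0 or 12345).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name is one of the n strings in list. */
static int in_list(const char *name, const char *const *list, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, list[i]) == 0) return 1;
    return 0;
}

/* Assumption: a constant is a run of hex digits starting with 0..9. */
static int is_const(const char *s) {
    if (!isdigit((unsigned char)s[0])) return 0;
    for (; *s != '\0'; s++)
        if (!isxdigit((unsigned char)*s)) return 0;
    return 1;
}

/* Hypothetical helper: does operand belong to the class named by keyword?
   Covers R8, R16, R32, SEG, CONST and ANY; other keywords return 0. */
int keyword_matches(const char *keyword, const char *operand) {
    static const char *const r8[] =
        {"AL", "BL", "CL", "DL", "AH", "BH", "CH", "DH"};
    static const char *const r16[] =
        {"AX", "BX", "CX", "DX", "SP", "BP", "SI", "DI"};
    static const char *const r32[] =
        {"EAX", "EBX", "ECX", "EDX", "ESP", "EBP", "ESI", "EDI"};
    static const char *const seg[] = {"ES", "CS", "SS", "DS", "FS", "GS"};
    assert(keyword != NULL && operand != NULL);
    if (strcmp(keyword, "R8") == 0) return in_list(operand, r8, 8);
    if (strcmp(keyword, "R16") == 0) return in_list(operand, r16, 8);
    if (strcmp(keyword, "R32") == 0) return in_list(operand, r32, 8);
    if (strcmp(keyword, "SEG") == 0) return in_list(operand, seg, 6);
    if (strcmp(keyword, "CONST") == 0) return is_const(operand);
    /* Simplification: ANY accepts every non-empty operand text. */
    if (strcmp(keyword, "ANY") == 0) return operand[0] != '\0';
    return 0;
}
```

For example, keyword_matches("R32", "ESI") succeeds, whereas keyword_matches("R8", "EAX") fails because EAX is not an 8-bit register.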

You can freely combine these keywords in memory addresses, like in the following examples:

Memory address   Matches
[CONST]          Any fixed memory location, like [400000]
[R32]            Memory locations with address residing in register, like [ESI]
[R32+1000]       Sum of any 32-bit register and constant 1000, like [EBP+1000]
[R32+CONST]      Sum of any 32-bit register and any offset, like [EAX-4] or [EBP+8]
[ANY]            Any memory operand, like [ESI] or [FS:EAX*8+ESI+1234]

When you search for a sequence of commands, it is often important to express how the commands in the sequence interact. Suppose that you are looking for all comparisons of two memory operands. The 80x86 has no such instruction (except CMPS, but it is slow and requires lengthy preparations). Therefore the compiler will generate the following code:

  MOV EAX,[location 1]
  CMP EAX,[location 2]

However, the compiler may choose ECX instead of EAX, or any other register. To take all such cases into account, OllyDbg has special dependent registers:

Register     Meaning
RA, RB       All instances of 32-bit register RA in the command or sequence
             must reference the same register; the same for RB; but RA and RB
             must be different
R8A, R8B     Same as above, but R8A and R8B are 8-bit registers
R16A, R16B   Same as above, but R16A and R16B are 16-bit registers
R32A, R32B   Same as RA, RB

For example, a search for XOR RA,RA will find all commands that use XOR to zero a 32-bit register, whereas XOR RA,RB will exclude such cases. By the way, the correct sequence for the example mentioned above is

  MOV RA,[CONST]
  CMP RA,[CONST]
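
The rule for RA and RB amounts to a small binding problem: the first occurrence of RA fixes a register, every later occurrence must name the same one, and RB may never be bound to the register already held by RA. A minimal sketch of this idea follows. The type binding and the function bind_operand() are hypothetical and serve only as illustration; they are not the OllyDbg search code, and only the 32-bit names RA and RB are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical binding state: registers currently bound to RA and RB,
   or NULL while still unbound. */
typedef struct {
    const char *ra;
    const char *rb;
} binding;

/* Returns 1 if s names one of the eight 32-bit general registers. */
static int is_r32(const char *s) {
    static const char *const r32[] =
        {"EAX", "EBX", "ECX", "EDX", "ESP", "EBP", "ESI", "EDI"};
    for (size_t i = 0; i < 8; i++)
        if (strcmp(s, r32[i]) == 0) return 1;
    return 0;
}

/* Match one pattern operand against an actual operand, updating bindings.
   Returns 1 on match, 0 on mismatch. Operands other than RA and RB are
   compared literally. */
int bind_operand(binding *b, const char *pattern, const char *actual) {
    const char **slot;
    const char *other;
    assert(b != NULL && pattern != NULL && actual != NULL);
    if (strcmp(pattern, "RA") == 0) {
        slot = &b->ra; other = b->rb;
    } else if (strcmp(pattern, "RB") == 0) {
        slot = &b->rb; other = b->ra;
    } else {
        return strcmp(pattern, actual) == 0;
    }
    if (!is_r32(actual)) return 0;
    if (*slot != NULL)                  /* already bound: must be the same */
        return strcmp(*slot, actual) == 0;
    if (other != NULL && strcmp(other, actual) == 0)
        return 0;                       /* RA and RB must be different */
    *slot = actual;
    return 1;
}
```

With a fresh binding, matching RA against ECX in the MOV and then RA against ECX in the CMP succeeds, whereas RA against ECX followed by RA against EAX fails. The same state must be carried across all commands of the sequence.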

There are also several imprecise commands:

Command   Matches
JCC       Any conditional jump (JB, JNE, JAE...)
SETCC     Any conditional set byte command (SETB, SETNE...)
CMOVCC    Any conditional move command (CMOVB, CMOVNE...)
FCMOVCC   Any conditional floating-point move (FCMOVB, FCMOVE...)
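
On the binary level, each of these four families occupies a contiguous opcode range, which is what makes a single imprecise name practical. The sketch below classifies a command by its first bytes. It is a simplified illustration with hypothetical names (family, classify_conditional()): prefixes are ignored, and whether OllyDbg's JCC also covers JCXZ, JECXZ or the LOOP commands is not stated here, so they are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this illustration. */
typedef enum {
    FAM_NONE, FAM_JCC, FAM_SETCC, FAM_CMOVCC, FAM_FCMOVCC
} family;

/* Classify a command by opcode. Prefixes are not handled. */
family classify_conditional(const uint8_t *code, size_t size) {
    assert(code != NULL);
    if (size < 1) return FAM_NONE;
    if (code[0] >= 0x70 && code[0] <= 0x7F)
        return FAM_JCC;                     /* short Jcc: 70..7F */
    if (size < 2) return FAM_NONE;
    if (code[0] == 0x0F) {
        uint8_t op = code[1];
        if (op >= 0x80 && op <= 0x8F) return FAM_JCC;     /* near Jcc */
        if (op >= 0x90 && op <= 0x9F) return FAM_SETCC;   /* SETcc */
        if (op >= 0x40 && op <= 0x4F) return FAM_CMOVCC;  /* CMOVcc */
        return FAM_NONE;
    }
    /* FCMOVcc: DA C0..DF (B, E, BE, U) and DB C0..DF (NB, NE, NBE, NU). */
    if ((code[0] == 0xDA || code[0] == 0xDB) &&
        code[1] >= 0xC0 && code[1] <= 0xDF)
        return FAM_FCMOVCC;
    return FAM_NONE;
}
```

For instance, the bytes 75 10 (a short JNE) and 0F 84 followed by a 4-byte offset (a near JE) both fall into the JCC family.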

Examples:

Pattern               Found commands
MOV R32,ANY           MOV EBX,EAX
                      MOV EAX,ECX
                      MOV EAX,[DWORD 4591DB]
                      MOV EDX,[DWORD EBP+8]
                      MOV EDX,[DWORD EAX*4+EDX]
                      MOV EAX,004011BC
ADD R8,CONST          ADD AL,30
                      ADD CL,0E0
                      ADD DL,7
XOR ANY,ANY           XOR EAX,EAX
                      XOR AX,SI
                      XOR AL,01
                      XOR ESI,00000088
                      XOR [DWORD EBX+4],00000002
                      XOR ECX,[DWORD EBP-12C]
MOV EAX,[ESI+CONST]   MOV EAX,[DWORD ESI+0A0]
                      MOV EAX,[DWORD ESI+18]
                      MOV EAX,[DWORD ESI-30]

Note that in the last line [DWORD ESI-30] is equivalent to [DWORD ESI+0FFFFFFD0].
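
The equivalence follows from 32-bit two's complement arithmetic: the displacement stored in the command is the unsigned value 0FFFFFFD0, which is the same bit pattern as the signed offset -30. The following sketch formats a raw displacement both ways. The functions are hypothetical helpers written for this illustration (they are not OllyDbg API calls), and the leading zero before a letter digit simply imitates the notation used in the examples above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write value as uppercase hex, prefixed with 0 if it starts with a letter. */
static void hex_with_leading_zero(char *buf, size_t n, uint32_t value) {
    char digits[9];                     /* up to 8 hex digits plus NUL */
    snprintf(digits, sizeof digits, "%lX", (unsigned long)value);
    snprintf(buf, n, "%s%s",
             isalpha((unsigned char)digits[0]) ? "0" : "", digits);
}

/* Format [DWORD reg+offset] from the raw 32-bit displacement. If as_signed
   is nonzero, displacements with the top bit set are shown as negative. */
void format_offset(char *buf, size_t n, const char *reg,
                   uint32_t raw, int as_signed) {
    char hex[12];
    assert(buf != NULL && n > 0 && reg != NULL);
    if (as_signed && (raw & 0x80000000u)) {
        /* Two's complement negation: 0 - 0FFFFFFD0 = 30 modulo 2^32. */
        hex_with_leading_zero(hex, sizeof hex, (uint32_t)(0u - raw));
        snprintf(buf, n, "[DWORD %s-%s]", reg, hex);
    } else {
        hex_with_leading_zero(hex, sizeof hex, raw);
        snprintf(buf, n, "[DWORD %s+%s]", reg, hex);
    }
}
```

Called with the raw value 0xFFFFFFD0 and register name "ESI", the signed variant yields [DWORD ESI-30] and the unsigned variant yields [DWORD ESI+0FFFFFFD0]. Both describe the same address, because adding 0FFFFFFD0 to ESI wraps around modulo 2^32.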


API functions:

ulong Disasm(uchar *cmd,ulong cmdsize,ulong cmdip,uchar *cmddec,t_disasm *cmdda,int cmdmode,t_reg *cmdreg,t_predict *cmdpredict);
ulong Cmdinfo(uchar *cmd,ulong cmdsize,ulong cmdip,t_cmdinfo *ci,int cmdmode,t_reg *cmdreg);
ulong Disassembleforward(uchar *copy,ulong base,ulong size,ulong ip,ulong n,uchar *decode);
ulong Disassembleback(uchar *copy,ulong base,ulong size,ulong ip,ulong n,uchar *decode);
int Checkcondition(int code,ulong flags);
ulong Setcondition(int code,ulong flags);
int Byteregtodwordreg(int bytereg);

int Assembleallforms(wchar *src,ulong ip,t_asmmod *model,int maxmodel,int mode,wchar *errtxt);
ulong Assemble(wchar *src,ulong ip,uchar *buf,ulong nbuf,int mode,wchar *errtxt);

ulong Ndisasm(uchar *cmd,ulong size,ulong ip,t_netasm *da,int mode,t_module *pmod);


See also:

Dumpforward(), Dumpback()