Reversing Obfuscated Python Applications
Breaking the dropbox client on windows
Author: ExtremeCoders © 2014
E-mail: extremecoders@mail.com
According to Wikipedia, “Dropbox is a file hosting service operated by Dropbox, Inc., headquartered in San
Francisco, California, that offers cloud storage, file synchronization, personal cloud, and client software.
Dropbox allows users to create a special folder on each of their computers, which Dropbox then
synchronizes so that it appears to be the same folder (with the same contents) regardless of which
computer is used to view it. Files placed in this folder also are accessible through a website and mobile
phone applications”
Dropbox provides client software for Microsoft Windows, Mac OS X, Linux, Android, iOS, BlackBerry OS
and web browsers, as well as unofficial ports to Symbian, Windows Phone, and MeeGo.
The Dropbox client software is written in python so that a single codebase can be deployed to a wide
variety of platforms and architectures. Another benefit of using python is ease of coding and reduced
time for testing and deployment. However, python poses a problem of its own: it is relatively easy to
reverse & decompile compared to native applications. In the case of a closed source application like
dropbox this is a serious issue, and something must be done to prevent users from getting access to the
source code.
In this regard, the dropbox client on Windows is shipped as an .exe file. The executable is generated
using py2exe, which serves two purposes. Firstly, it is a lot easier for the end user to install than a
bunch of .pyc files. Secondly, and most importantly, it keeps overenthusiastic users from peeking into
the source.
So good luck and bon voyage on this reversing journey.
Introduction
After installing Dropbox (which installs silently without any user intervention) we can navigate to the
installation folder to find the main dropbox binary. Since we already know it is written in python and has
been packaged by py2exe, we will not waste any time running it through a PE detection tool like PEiD.
Python code in the file system can reside in two forms. It may be either in plaintext .py or compiled .pyc
form. In the latter case, the file is not directly readable by a text editor but can be run through a python
bytecode decompiler in order to obtain the plaintext source code. In the case of dropbox, these .pyc files
are packaged inside the executable. So our first step is to unpack the executable to get hold of the .pyc
files.
Unpacking the executable
Now let’s discuss a bit about py2exe and its innards. It is a tool which packages python scripts into a
windows executable along with an embedded python interpreter (actually I am oversimplifying here).
This executable can then be run on a windows system as a standalone file without the necessity of
installing python. All the necessary scripts required for the software to run are packaged within it.
Now the .pyc files necessary for the application to run are packaged as a zip archive. This archive is just
concatenated at the end of the py2exe loader stub namely run_w.exe or run.exe for windows and console
applications respectively. The python interpreter on the other hand is packaged as a resource.
At runtime the executable fires up the embedded python interpreter, which is a .dll on Windows. The
DLL is loaded entirely from memory, and since Windows provides no API for loading a PE image from
memory, py2exe ships its own PE loader. We can look into the file MemoryModule.c within the py2exe
source code for the details.
So for unpacking we have to do two things – first, grab the zip archive containing the .pyc and then
extract the python dll embedded as a resource within the executable. For the second objective, we can
use any decent resource editor. Here I am using PE Explorer.
Note that besides the resource PYTHON27.DLL, there is another one named PYTHONSCRIPT. This
contains a set of start-up scripts (strictly speaking they are not scripts, since they are compiled and not
directly readable in a text editor) which are run before the application is initialized. Their purpose is to
set up import hooks that let python load the pyc files from within the executable. Normally, python can
only load .pyc files from the file system or from a normal zip archive. Since the pyc files are packaged in
an archive concatenated to the PE, they need special treatment, i.e. import hooks. Once the hooks have
been set up, whenever python wants to import a module the hooks are called; they bypass the regular
import mechanism and load the pyc files from the executable. In short, they act like a proxy. The
advantage is that the files do not need to reside on the file system in order to be loaded. They can be
anywhere!
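To make the idea of an import hook concrete, here is a minimal sketch of one. It is not py2exe's hook: it is written for Python 3's importlib (the client discussed here runs Python 2.7), and the class name InMemoryImporter and the module name hidden_mod are made up for illustration. It serves a module from a dictionary held in memory, so nothing has to exist on disk.

```python
import importlib.abc
import importlib.util
import sys


class InMemoryImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules from a dict of {module name: source text}."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        # Claim only the modules we hold; defer everything else
        # to the regular import machinery by returning None.
        if fullname in self.sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None  # let Python create the default module object

    def exec_module(self, module):
        # Compile the in-memory source and run it in the module namespace.
        code = compile(self.sources[module.__name__], "<memory>", "exec")
        exec(code, module.__dict__)


# "hidden_mod" is a hypothetical module that exists nowhere on disk.
sys.meta_path.insert(0, InMemoryImporter({"hidden_mod": "ANSWER = 6 * 7\n"}))
import hidden_mod
```

A real hook would decrypt or unzip the module data inside exec_module instead of compiling plain source, but the proxy role is the same.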
Okay, now the first objective. For extracting the zip archive, we load the file into exeinfo pe and dump
the overlay. If we want to automate some of the steps, we can use the tool py2exe dumper.
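As a rough illustration of why dumping the overlay works: a zip archive is located through records at the end of the file, so an archive appended to another file can still be opened. The sketch below uses Python 3's zipfile module on a fabricated stand-in for the executable (the stub bytes and the member contents are made up). It is not the py2exe dumper tool, only a demonstration of the principle.

```python
import io
import zipfile


def build_fake_exe(stub, members):
    """Return stub bytes followed by a zip archive holding `members`."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return stub + archive.getvalue()


def read_appended_zip(exe_bytes):
    """Open the zip appended to `exe_bytes` and return {name: contents}."""
    with zipfile.ZipFile(io.BytesIO(exe_bytes)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


# A 512-byte fake loader stub followed by a one-member archive.
fake_exe = build_fake_exe(b"MZ" + b"\x00" * 510,
                          {"authenticate.pyc": b"\x07\xf3\r\n"})
recovered = read_appended_zip(fake_exe)
```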
Inspecting the pyc files
Once we have extracted the embedded zip archive from the executable, we can see that it contains .pyc
files. Opening any such file in a hex editor reveals that the file is encrypted. There are no readable
strings at all, whereas a normal pyc file generally has some.
Furthermore, pyc files begin with a 4 byte magic number followed by a 4 byte timestamp. In this case, the
file begins with the value 07 F3 0D 0A, which is different from the normal python 2.7 magic value of
03 F3 0D 0A. Obviously, we cannot decompile the file as is. We need to decrypt it first; otherwise no
decompiler will show any interest in reading it.
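The header check described above can be sketched in a few lines. This is only an illustration, written for Python 3; the function names and the sample byte strings are made up, while the two magic values are the ones quoted in the text.

```python
import struct

PY27_MAGIC = b"\x03\xf3\r\n"  # standard python 2.7 magic: 03 F3 0D 0A


def parse_pyc_header(data):
    """Split the 8-byte Python 2 pyc header into (magic, timestamp)."""
    if len(data) < 8:
        raise ValueError("a pyc header needs at least 8 bytes")
    magic = data[:4]
    (timestamp,) = struct.unpack("<I", data[4:8])  # little-endian 32-bit
    return magic, timestamp


def is_standard_py27(data):
    """True if the file starts with the stock python 2.7 magic number."""
    return parse_pyc_header(data)[0] == PY27_MAGIC


# Fabricated sample files: only the first 8 bytes matter here.
dropbox_like = b"\x07\xf3\r\n" + struct.pack("<I", 0) + b"..."
standard = PY27_MAGIC + struct.pack("<I", 1400000000) + b"..."
```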
Figuring the decryption process
Right now we are in the dark about the encryption algorithm used and how to decrypt it. The dropbox
application must load these modules, which means internally it must decrypt them before it can work.
So if we can grab them from memory after it has decrypted itself, we can get the decryption for free.
Let’s see if this method is feasible.
We will be coding a program in C which will embed the dropbox python interpreter (python27.dll) we
obtained previously. The purpose of the program will be to run any python script in the context of
dropbox. In order to compile and link we will use the header files and import libraries provided with
the standard python 2.7 distribution. Let’s name the output file as embedder.exe
#include "Python.h"
#include "marshal.h"
#include <windows.h>

int main(int argc, char *argv[])
{
    // The script to run will be provided as an argument
    if (argc < 2)
    {
        printf("No script specified to run\n");
        return 1;
    }

    HANDLE hand = CreateFileA(argv[1], GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    // CreateFileA signals failure with INVALID_HANDLE_VALUE, not NULL
    if (hand == INVALID_HANDLE_VALUE)
    {
        printf("Failed to open file %s\n", argv[1]);
        return 1;
    }

    char pathname[MAX_PATH];
    GetModuleFileNameA(NULL, pathname, sizeof(pathname));
    char *c = strrchr(pathname, '\\');
    if (c)
        *c = 0;                 // Keep only the directory of the executable

    Py_NoSiteFlag = 1;          // We do not need to load site modules
    Py_SetPythonHome(pathname); // Setting the path to the python libraries
    Py_Initialize();

    // Allocating a zero-filled buffer to hold the file's contents
    DWORD size = GetFileSize(hand, NULL);
    void *buf = calloc(size + 1, 1);

    // Reading the file into the buffer
    DWORD dummy;
    ReadFile(hand, buf, size, &dummy, NULL);
    CloseHandle(hand);

    PyGILState_STATE gstate = PyGILState_Ensure();
    PyRun_SimpleString((char*) buf); // Running the python script located in the buffer
    PyGILState_Release(gstate);

    Py_Finalize();
    free(buf);
    return 0;
}
Using the above program, we can run any python script in the context of the dropbox python
interpreter. Now we will write a python script which will load those encrypted pyc files using the
marshal module. Hopefully, that will decrypt it too. After that we will dump it back to disk.
import marshal

infile = open('authenticate.pyc', 'rb')   # The encrypted pyc file
infile.seek(8)                            # Skip the header: magic value & timestamp, 4 bytes each
code_obj = marshal.load(infile)           # Unmarshal the file
outfile = open('decrypted.pyc', 'wb')     # The output file
outfile.write('\x03\xf3\x0d\x0a\x00\x00\x00\x00')  # Write the header
marshal.dump(code_obj, outfile)           # Dump back to file
outfile.close()
infile.close()
So let’s run the script using the C program embedding python. We need to pass the name of the script
file as an argument. However the results are not encouraging. It does generate a file decrypted.pyc,
but on examining its contents we see this.
The file is basically empty. Dumping to disk has failed. It has written only the hex byte 4E in addition to
the header. We need to analyse the dropbox python interpreter to see why this has happened.
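The stray byte is a useful clue. 0x4E is the ASCII letter N, which is the one-byte encoding the marshal format uses for None. One plausible reading (an inference on our part, not something the dump itself proves) is that the patched interpreter wrote None in place of the code object. The check below, run on an ordinary Python 3 interpreter, shows what marshalling None produces; the helper name is made up.

```python
import marshal

# Marshalling None yields a single type-code byte.
none_bytes = marshal.dumps(None)


def looks_like_empty_dump(payload):
    """True if the data following the 8-byte pyc header is just a marshalled None."""
    return payload == none_bytes
```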
PyPy to the rescue
CPython provides a function PyMarshal_WriteObjectToFile, which dumps a code object to a file on disk.
Internally, this calls another function w_object which does the actual work. In case of dropbox, w_object
has been patched to disable marshalling of code objects to disk. So to get around the limitation we need
an alternative. We need some code, preferably in python, which will do the marshalling for us.
Luckily there is an implementation of python called PyPy which is written in python itself. We can
leverage PyPy’s marshalling code to dump the code objects in our case. The marshalling code is in the
file _marshal.py which can be obtained from PyPy source. Let’s save the file as dropdump.py.
<<< Code from _marshal.py should be copied here as is >>>

import marshal                            # Import the built-in marshal module

infile = open('authenticate.pyc', 'rb')
infile.seek(8)
code_obj = marshal.load(infile)
outfile = open('decrypted.pyc', 'wb')
outfile.write('\x03\xf3\x0d\x0a\x00\x00\x00\x00')
dump(code_obj, outfile)                   # Use PyPy’s marshalling code
outfile.close()
infile.close()
However, on running it we are greeted with an ungrateful error message about a missing co_code.
The mystery of the missing co_code
According to the python documentation “co_code is a string representing the sequence of bytecode
instructions”. Every python code-object has an array which contains bytecode which will be executed.
This array is called co_code. This should always be present or otherwise code-objects cannot exist. That
means the dropbox python interpreter is hiding that from us. As we cannot access co_code from the
python layer we need to delve deeper and try to access that from within the native or the assembly
layer. We need the services of a debugger. We will use old & faithful Ollydbg, but before that let’s see the
structure of a PyCodeObject.
typedef struct {
PyObject_HEAD
int co_argcount;
int co_nlocals;
int co_stacksize;
int co_flags;
PyObject *co_code; //This is the missing member
PyObject *co_consts;
PyObject *co_names;
PyObject *co_varnames;
PyObject *co_freevars;
PyObject *co_cellvars;
PyObject *co_filename;
PyObject *co_name;
int co_firstlineno;
PyObject *co_lnotab;
void *co_zombieframe;
PyObject *co_weakreflist;
} PyCodeObject;
The first member in the structure is PyObject_HEAD. It is defined as
/* PyObject_HEAD defines the initial segment of every PyObject. */
#define PyObject_HEAD                   \
    _PyObject_HEAD_EXTRA                \
    Py_ssize_t ob_refcnt;               \
    struct _typeobject *ob_type;
For release builds of python _PyObject_HEAD_EXTRA is empty. So on a 32-bit build PyObject_HEAD
essentially contains two members of 4 bytes each. The first member, ob_refcnt, holds the reference count
of this object & the second, ob_type, points to a PyTypeObject variable representing the type of this
object. For example, if it points to PyCode_Type, then this object is a code object. Just by looking at the
second member of a PyObject in a debugger, we can tell its type. This tiny bit of information will prove
very useful in our endeavour.
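The same trick can be tried from python itself with ctypes, as a small sketch. It assumes a standard (non-debug, non-free-threaded) CPython 3 build, where id() returns the object's address and ob_type directly follows ob_refcnt. The helper name type_pointer is made up, and the offset is computed from the running interpreter rather than hard-coded, since it is 4 on a 32-bit build and 8 on a 64-bit one.

```python
import ctypes


def type_pointer(obj):
    """Read the ob_type field straight from the object's memory (CPython only)."""
    # ob_type is the second member of the object header, right after ob_refcnt.
    offset = ctypes.sizeof(ctypes.c_ssize_t)
    return ctypes.c_void_p.from_address(id(obj) + offset).value


code_obj = compile("1 + 1", "<demo>", "eval")
```

Comparing the value read from memory with the address of the type object tells us what kind of object we are looking at, which is exactly what we will do by eye in the debugger.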
Thus in a standard 32-bit python distribution co_code is located at an offset of 24 bytes from the start of
PyCodeObject. This co_code is a pointer to a PyObject. It’s in fact a pointer to a PyStringObject (which is
also a PyObject) as mentioned in the documentation. The structure of a PyStringObject is as follows.
typedef struct {
PyObject_VAR_HEAD
long ob_shash;
int ob_sstate;
char ob_sval[1];//This array contains the string , its length is in PyObject_VAR_HEAD
} PyStringObject;
PyObject_VAR_HEAD is defined as
#define PyObject_VAR_HEAD               \
    PyObject_HEAD                       \
    Py_ssize_t ob_size; /* Number of items in variable part */
Within a PyStringObject, ob_sval is located at an offset of 20 bytes from the start. We will use these
offsets while working in the debugger. These offset values are obtained from a standard python
distribution. It is possible that dropbox has changed the structure layout so as to hinder reversing. We
will see that shortly.
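The two offsets can be double-checked without a compiler by mirroring the structures with ctypes. This is only a sketch: it models a 32-bit release build by using 4-byte integers for every pointer and for Py_ssize_t, and the class names PyCodeObject32 and PyStringObject32 are made up.

```python
import ctypes

PTR = ctypes.c_uint32    # stand-in for a 32-bit pointer
SSIZE = ctypes.c_int32   # Py_ssize_t on a 32-bit build
INT = ctypes.c_int32


class PyCodeObject32(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", SSIZE), ("ob_type", PTR),           # PyObject_HEAD
        ("co_argcount", INT), ("co_nlocals", INT),
        ("co_stacksize", INT), ("co_flags", INT),
        ("co_code", PTR), ("co_consts", PTR), ("co_names", PTR),
        ("co_varnames", PTR), ("co_freevars", PTR), ("co_cellvars", PTR),
        ("co_filename", PTR), ("co_name", PTR),
        ("co_firstlineno", INT), ("co_lnotab", PTR),
        ("co_zombieframe", PTR), ("co_weakreflist", PTR),
    ]


class PyStringObject32(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", SSIZE), ("ob_type", PTR), ("ob_size", SSIZE),  # PyObject_VAR_HEAD
        ("ob_shash", INT),      # long is 4 bytes on 32-bit Windows
        ("ob_sstate", INT),
        ("ob_sval", ctypes.c_char * 1),
    ]


co_code_offset = PyCodeObject32.co_code.offset
ob_sval_offset = PyStringObject32.ob_sval.offset
```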
Exploring the structure of code object
To find out whether dropbox has kept the structure layout intact or has modified it, we will code a small
C program. It will generate a code object using the PyCode_New CPython function. We will run the
program in Ollydbg and inspect the returned value. We will check whether the offsets agree
with what we obtained earlier. If they are different, we will need to find the actual offsets.
#include "Python.h"
#include "marshal.h"
#include <windows.h>

int main(int argc, char *argv[])
{
    Py_NoSiteFlag = 1;
    char filename[MAX_PATH];
    GetModuleFileNameA(NULL, filename, sizeof(filename));
    char *c = strrchr(filename, '\\');
    *c = 0;
    Py_SetPythonHome(filename);
    Py_Initialize();

    // This marker string can be used to find out the position of co_code within the code object
    PyObject *codestring = PyString_FromString("Marker String");
    PyObject *tuple = PyTuple_New(0);
    PyObject *string = PyString_FromString("");
    PyCodeObject *codeObject = PyCode_New(0, 0, 0, 0, codestring,
                                          tuple, tuple, tuple, tuple, tuple,
                                          string, string, 0, string);
    Py_Finalize();
    return 0;
}
So debug the program in Ollydbg and set a breakpoint after the call to PyCode_New. The returned value
in eax will be the pointer to a PyCodeObject which we will be inspecting for anomalies.
The returned value in this case is 0x0094E268. We will follow the value in dump.
The returned object is indeed a PyCodeObject as evident from the second member. Now we need to
verify if the value at an offset of 24 is indeed a pointer to co_code. So we follow the value at 0x0094E280
in dump to reach here. Remember that co_code is actually a PyStringObject.
Now this is a PyTupleObject. We were expecting to find a PyStringObject here. This means dropbox has
fiddled with the layout of PyCodeObject as we suspected earlier. So we need to find the actual offset of
co_code. From this juncture we can take two paths. We can either follow each member of PyCodeObject
in dump to see which is a PyStringObject containing our marker string or we can write a small C
program to do the job. During my reversing session, I took the first path, but now in this tutorial I will
demonstrate the second one and then will verify the result in the debugger. We will be modifying the
previous program a little to find out the offset of co_code.
#include "Python.h"
#include "marshal.h"
#include <windows.h>

int main(int argc, char *argv[])
{
    Py_NoSiteFlag = 1;
    char filename[MAX_PATH];
    GetModuleFileNameA(NULL, filename, sizeof(filename));
    char *c = strrchr(filename, '\\');
    *c = 0;
    Py_SetPythonHome(filename);
    Py_Initialize();

    PyObject *codestring = PyString_FromString("Marker String");
    PyObject *tuple = PyTuple_New(0);
    PyObject *string = PyString_FromString("");
    PyCodeObject *codeObject = PyCode_New(0, 0, 0, 0, codestring,
                                          tuple, tuple, tuple, tuple, tuple,
                                          string, string, 0, string);

    // Scan the object one 4-byte slot at a time for the pointer to our marker string
    char *ptr;
    for (ptr = (char*)codeObject; ptr < (char*)codeObject + sizeof(PyCodeObject); ptr += 4)
        if (*((PyObject**)ptr) == codestring)
            printf("co_code found at offset %d\n", (int)(ptr - (char*)codeObject));

    Py_Finalize();
    return 0;
}
Running the program gives the following output.
To verify the results we will use the debugger. We will be following the value at an offset of 56 (i.e.
0x00953820) from the start of PyCodeObject.
It’s indeed a PyStringObject as it should be, and further if we change the display to hex we will see our
marker string.
So dropbox has modified the structure of PyCodeObject. Now co_code is located at an offset of 56 instead
of 24.
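The scanning idea used by the C program can be replayed on plain bytes, which makes it easy to experiment with. In this sketch the "object" is a fabricated 72-byte blob and the marker address is a made-up value; only the word-by-word search mirrors what the C loop does.

```python
import struct


def find_pointer_offsets(blob, pointer, word_size=4):
    """Return every word-aligned offset in `blob` whose 32-bit value equals `pointer`."""
    hits = []
    for offset in range(0, len(blob) - word_size + 1, word_size):
        (value,) = struct.unpack_from("<I", blob, offset)  # little-endian word
        if value == pointer:
            hits.append(offset)
    return hits


MARKER_ADDR = 0x00ABCDEF      # hypothetical address of the marker string object
words = [0x11111111] * 18     # 18 filler words = 72 bytes
words[14] = MARKER_ADDR       # slot 14 sits at byte offset 14 * 4 = 56
fake_object = struct.pack("<18I", *words)
```

A pointer that happens to appear in more than one slot would show up as several hits, which is why the C program passes the marker string only as the code argument and reuses other objects for the remaining members.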
Getting access to co_code
Okay, co_code is located at an offset of 56 but we cannot access it in the python layer. We need access to
it so that we can marshal the code object to disk, and for this purpose we will code a C extension. The
program will contain a function which when fed a PyCodeObject will return the co_code.
#include <Python.h>

static PyObject* getCode(PyObject* self, PyObject* args)
{
    PyObject* code = NULL;
    PyObject* co_code = NULL;
    if (!PyArg_ParseTuple(args, "O", &code))
        return NULL;
    // co_code is located at an offset of 56 within the code object
    _asm
    {
        mov eax, code
        mov eax, dword ptr [eax + 56]
        mov co_code, eax
    }
    Py_XINCREF(co_code); // Increase the reference count
    return co_code;
}

static PyMethodDef extension_methods[] = {
    {"getCode", getCode, METH_VARARGS, "Get Code Object"},
    {NULL, NULL, 0, NULL}};

PyMODINIT_FUNC initdropextension()
{
    // The name of the extension module as seen from python.
    // This should be the same as the name of the extension file.
    // In this case the file will be named dropextension.pyd
    Py_InitModule("dropextension", extension_methods);
}
We need to modify dropdump.py so that it uses our extension for accessing co_code. We will only modify
the function dump_code. Rest will remain same.
import dropextension                       # Import the extension

def dump_code(self, x):
    self._write(TYPE_CODE)
    self.w_long(x.co_argcount)
    self.w_long(x.co_nlocals)
    self.w_long(x.co_stacksize)
    self.w_long(x.co_flags)
    self.dump(dropextension.getCode(x))    # Use our extension to access co_code
    self.dump(x.co_consts)
    self.dump(x.co_names)
    self.dump(x.co_varnames)
    self.dump(x.co_freevars)
    self.dump(x.co_cellvars)
    self.dump(x.co_filename)
    self.dump(x.co_name)
    self.w_long(x.co_firstlineno)
    self.dump(x.co_lnotab)
Is that enough?
Let’s now try to marshal the code using the newly coded tools. This time there are no error messages.
We can find the newly created file decrypted.pyc. Let’s open it in a hex editor.
This time the file is not empty. If we scroll down a bit we can find some readable strings. This means the
file has been decrypted and marshalling has succeeded. All that is left is to decompile the file.
We will be using Easy Python Decompiler. Let’s try to decompile decrypted.pyc.
Tough luck! Failure once again. This time it says Invalid pyc/pyo file. Decompilers do fail at times, but in
this case the file really is invalid. We will use pychrysanthemum, a tool for inspecting pyc files, to
verify the results.
From the summary tab everything looks normal. Let’s have a look at the Disassembly tab.
It definitely shows some disassembled code, but if we observe carefully we will find that all of it is junk
code. We find many STOP_CODE instructions. However, the python documentation says STOP_CODE
“Indicates end-of-code to the compiler, not used by the interpreter”. So we should not find this opcode
in a compiled python file. If we scroll down further in the disassembly, we will see other instances strongly
suggesting that the code is junk.
In addition to the STOP_CODE there are several other opcodes which it could not disassemble. That
means our task is not over yet. We need to convert this junk code into something comprehensible to a
disassembler & a decompiler.
The use of opcode remapping
Opcode remapping is a technique in which the opcode definitions of a virtual machine are changed. In the
case of python it means that the opcodes have different meanings than in a standard python distribution.
Thus if the opcode 23 initially meant BINARY_ADD, it may now mean POP_TOP. To be able to decompile
the file successfully we need to obtain the new opcode mapping. Using that we can definitely decompile
the pyc file. In the case of dropbox we are confident that it uses this trick, as all other facets of the pyc
file are perfectly normal. Only the bytecode instructions cannot be deciphered.
Now ponder for a moment. Suppose that we compile a python script in this modified python interpreter
and compare it with the output generated by compiling the same code in a standard python 2.7
interpreter. There will be some differences, and they will be due to the fact that the modified
interpreter uses a different set of opcodes. The rest should be the same. So using this we should be able
to find out which opcodes were remapped and to what new values.
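The comparison idea can be tried on a toy example first. The sketch below is written for Python 3 and works on raw bytes laid out in the python 2.7 bytecode format, where opcodes numbered 90 and above are followed by a two-byte argument. The function name derive_map is made up, and the remapped stream is hand-built sample data. Unlike a plain byte-by-byte diff, it steps from instruction to instruction so that argument bytes are never mistaken for opcodes.

```python
HAVE_ARGUMENT = 90  # python 2.7: opcodes >= 90 carry a 2-byte argument


def derive_map(standard, remapped):
    """Walk two equal-length 2.7 bytecode strings and pair up their opcodes."""
    if len(standard) != len(remapped):
        raise ValueError("streams differ in length")
    mapping = {}
    i = 0
    while i < len(standard):
        old, new = standard[i], remapped[i]
        # setdefault records the first pairing and exposes any contradiction.
        if mapping.setdefault(old, new) != new:
            raise ValueError("opcode %d maps to two different values" % old)
        # Instruction length is decided by the standard opcode numbering.
        i += 3 if old >= HAVE_ARGUMENT else 1
    return mapping


# LOAD_CONST 0; LOAD_CONST 1; BINARY_ADD; RETURN_VALUE in standard numbering
standard = bytes([100, 0, 0, 100, 1, 0, 23, 83])
# The same four instructions with sample remapped opcode numbers
remapped = bytes([82, 0, 0, 82, 1, 0, 52, 66])
toy_map = derive_map(standard, remapped)
```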
Generating the file set
So for now we need to generate two sets of pyc files. One compiled with dropbox python and the other
with standard python 2.7. We need several python script files such that by compiling them we can generate
almost all opcodes used by python (python 2.7 has 118 opcodes). To ease our search we can use the py
files provided in a standard distribution (there are more than a thousand files). That should hopefully
generate the majority of opcodes, if not all. In fact we will see later that even after using more than a
thousand files there are about 5 opcodes left. We can ignore them for the simple reason that if we cannot
find any use of those opcodes even after comparing more than a thousand files, they are
probably not used normally. An example of such an opcode is EXTENDED_ARG.
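To get a feel for how much of the instruction set a given pile of source files exercises, the opcodes in use can be collected with the dis module. The sketch below targets Python 3, whose opcode set differs from python 2.7, so it illustrates the coverage idea rather than reproducing the numbers above. The function name opcodes_used and the sample source are made up.

```python
import dis


def opcodes_used(code):
    """Collect opcode names used by `code` and by every nested code object."""
    names = set()
    pending = [code]
    visited = 0
    while pending:
        current = pending.pop()
        visited += 1
        names.update(ins.opname for ins in dis.get_instructions(current))
        # Functions and classes live in co_consts as nested code objects.
        pending.extend(c for c in current.co_consts if hasattr(c, "co_code"))
    return names, visited


sample = "def f(a, b):\n    return a + b\nprint(f(1, 2))\n"
used, code_objects = opcodes_used(compile(sample, "<sample>", "exec"))
never_seen = set(dis.opmap) - used   # opcodes this sample did not exercise
```

Feeding every file of the standard library through such a function and taking the union of the results shows which opcodes are still uncovered.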
We will use the same code developed earlier in embedder.exe & dropextension.pyd. For generating the
first set of files, we will be linking against the standard python dll. For the second, we need to link with
the dropbox python dll along with the C extension. The extension will only be needed in the second case
as there is no access to co_code. However in both cases the code of embedder need not be changed.
For generating the first set of reference files, we will prepare a python script. The name of the script file
will be passed as an argument to embedder as we have been doing earlier. The purpose of this script will
be to load a py file from disk, compile it to generate a code object, and then marshal it back to disk using
PyPy’s marshalling code. We will use PyPy’s marshalling code instead of the built-in for consistency.
<<< code from _marshal.py should be copied here as is >>>

import os

basedir = os.getcwd()
py_files = os.path.join(basedir, 'py_files')      # The source py files will be located here
out_files = os.path.join(basedir, 'org_opcodes')  # The output files will go here

for f in os.listdir(py_files):
    txt = open(os.path.join(py_files, f)).read()
    of = open(os.path.join(out_files, f + '.org'), 'wb')
    cobj = compile(txt, '', 'exec')               # Compile the code
    dump(cobj, of)                                # Marshal using PyPy’s marshalling code
    of.close()
Running the above script using embedder generates the first set of reference files.
For generating the second set of reference files we will use the modified code of _marshal.py as used in
dropdump.py. The rest of the code will be as follows.
import os

basedir = os.getcwd()
py_files = os.path.join(basedir, 'py_files')       # The source py files will be located here
out_files = os.path.join(basedir, 'drop_opcodes')  # The output files will go here

for f in os.listdir(py_files):
    txt = open(os.path.join(py_files, f)).read()
    of = open(os.path.join(out_files, f + '.drop'), 'wb')
    cobj = compile(txt, '', 'exec')                # Compile the code
    dump(cobj, of)                                 # Marshal using the modified PyPy marshalling code
    of.close()
So the above two code snippets are similar with the difference being the output directory & the
extension of the output file. Running the above using dropbox python dll and embedder yields the
second set of reference files.
Finding the opcode mapping
We have the two set of reference files generated from the same source. The difference between these
two sets of files should reveal the opcode mapping. We need to code a tool which will find the
differences. For simplicity we will be coding the tool in python although we could use any other
language here.
import os, marshal

opcodes = dict()  # Dictionary to store the opcodes

def compare(ocode, ncode):
    orgCodeStr, dropCodeStr = bytearray(ocode.co_code), bytearray(ncode.co_code)
    # Make sure we are comparing strings of same length
    if len(orgCodeStr) == len(dropCodeStr):
        # Compare the code strings bytes
        for o, n in zip(orgCodeStr, dropCodeStr):
            if o != n:
                if o not in opcodes.keys():
                    opcodes[o] = n
                else:
                    if opcodes[o] != n:
                        print 'Two remapped opcodes for a single opcode, the files are out of sync, skipping'
                        break
    else:
        print 'Code Strings not of same length, skipping...'

    # Recursive scanning for more code objects
    for oconst, nconst in zip(ocode.co_consts, ncode.co_consts):
        # both should have co_code
        if hasattr(oconst, 'co_code') and hasattr(nconst, 'co_code'):
            compare(oconst, nconst)

def main():
    org_files = os.path.join(os.getcwd(), 'org_opcodes')
    new_files = os.path.join(os.getcwd(), 'drop_opcodes')

    for f in os.listdir(org_files):
        # Open the files
        of = open(os.path.join(org_files, f), 'rb')
        nf = open(os.path.join(new_files, f[0:-4] + '.drop'), 'rb')
        # unmarshal & compare opcodes
        compare(marshal.load(of), marshal.load(nf))
        of.close()
        nf.close()
    print opcodes

if __name__ == '__main__':
    main()
Running the script gives us the following opcode map.
{1: 15, 2: 59, 3: 60, 4: 13, 5: 49, 10: 48, 11: 54, 12: 38, 13: 25, 15: 34,
19: 28, 20: 36, 21: 12, 22: 41, 23: 52, 24: 55, 25: 4, 26: 43, 27: 5, 28: 32,
29: 30, 30: 16, 31: 17, 32: 18, 33: 19, 40: 61, 41: 62, 42: 63, 43: 64, 50: 44,
51: 45, 52: 46, 53: 47, 54: 70, 55: 6, 56: 29, 57: 8, 58: 27, 59: 3, 60: 31,
61: 69, 62: 7, 63: 22, 64: 50, 65: 21, 66: 2, 67: 57, 68: 39, 71: 9, 72: 14,
73: 33, 74: 35, 75: 11, 76: 58, 77: 24, 78: 23, 79: 10, 80: 40, 81: 37, 82: 51,
83: 66, 84: 56, 85: 65, 86: 26, 87: 1, 88: 67, 89: 42, 90: 105, 91: 104, 92: 103,
93: 91, 94: 83, 95: 94, 96: 97, 97: 115, 98: 108, 99: 114, 100: 82, 101: 89, 102: 90,
103: 117, 104: 118, 105: 88, 106: 96, 107: 111, 108: 98, 109: 99, 110: 119, 111: 120,
112: 122, 114: 123, 115: 121, 116: 80, 119: 106, 120: 84, 121: 116, 122: 85, 124: 102,
125: 92, 126: 81, 130: 101, 131: 112, 132: 86, 133: 87, 134: 95, 135: 107, 136: 109,
137: 110, 140: 133, 141: 134, 142: 135, 143: 136, 146: 141, 147: 142}
If we observe carefully, we will find that the following opcodes are missing from the map: 0, 9, 70, 113, 145.
Their meanings are 0 -> STOP_CODE, 9 -> NOP, 70 -> PRINT_EXPR, 113 -> JUMP_ABSOLUTE,
145 -> EXTENDED_ARG.
The opcode 113 can be generated by compiling the following snippet.
def foo():
if bar1:
if bar2:
print ''
By following the comparison method devised earlier we can see that opcode 113 is left unchanged. Now
among the remaining four opcodes, STOP_CODE & NOP are never generated in compiled bytecode, so we
can ignore them. PRINT_EXPR is generated when the interpreter is running in interactive mode, so we
can ignore it too. The EXTENDED_ARG opcode is generated whenever the argument of a bytecode
instruction is too big to fit in two bytes, for example when an instruction has to refer to an index or a
jump target above 65,535. This is also a rare situation, so we can ignore it too.
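To make the two-byte limit concrete, the sketch below encodes a single argument-taking instruction the way python 2.7 bytecode lays it out: a little-endian 16-bit argument after the opcode, preceded by an EXTENDED_ARG instruction carrying the upper 16 bits when the value does not fit. The encode and decode helpers are made up for illustration and run on Python 3.

```python
EXTENDED_ARG = 145  # python 2.7 opcode number, as listed above
LOAD_CONST = 100


def encode(opcode, arg):
    """Encode one argument-taking 2.7 instruction as a list of byte values."""
    if not 0 <= arg <= 0xFFFFFFFF:
        raise ValueError("argument out of range")
    out = []
    if arg > 0xFFFF:
        high = arg >> 16
        out += [EXTENDED_ARG, high & 0xFF, high >> 8]  # upper 16 bits come first
    low = arg & 0xFFFF
    out += [opcode, low & 0xFF, low >> 8]
    return out


def decode(stream):
    """Recover (opcode, arg) from the byte values produced by encode()."""
    ext = 0
    if stream[0] == EXTENDED_ARG:
        ext = stream[1] | (stream[2] << 8)
        stream = stream[3:]
    return stream[0], (ext << 16) | stream[1] | (stream[2] << 8)
```

For example, encode(LOAD_CONST, 70000) needs the prefix because 70000 is 0x11170: the prefix carries 1 and the instruction itself carries 0x1170.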
Now we have recovered the opcode map. We have found which opcodes were changed
and to what new values. We need to incorporate this opcode map while marshalling the code object to
disk, i.e. before dumping we will scan co_code and change the remapped opcodes back to their original
values, so that it can be disassembled & decompiled.
Opcode unmapping
To incorporate the newly found opcode map, we will reuse the code of dropdump.py. The code will be
modified as follows. We can name the file as unmapper.py.
import dropextension, marshal

# Reverse of the recovered map: remapped (dropbox) opcode -> original opcode.
# 30 must map back to 29, since the recovered map contains 29: 30.
# 20 and 53 do not occur as remapped values, so they serve as placeholders for NOP (9) and PRINT_EXPR (70).
remap = {0: 0, 113: 113, 145: 145, 20: 9, 53: 70, 15: 1, 59: 2, 60: 3, 13: 4, 49: 5,
    48: 10, 54: 11, 38: 12, 25: 13, 34: 15, 28: 19, 36: 20, 12: 21, 41: 22, 52: 23,
    55: 24, 4: 25, 43: 26, 5: 27, 32: 28, 30: 29, 16: 30, 17: 31, 18: 32, 19: 33,
    61: 40, 62: 41, 63: 42, 64: 43, 44: 50, 45: 51, 46: 52, 47: 53, 70: 54, 6: 55,
    29: 56, 8: 57, 27: 58, 3: 59, 31: 60, 69: 61, 7: 62, 22: 63, 50: 64, 21: 65,
    2: 66, 57: 67, 39: 68, 9: 71, 14: 72, 33: 73, 35: 74, 11: 75, 58: 76, 24: 77,
    23: 78, 10: 79, 40: 80, 37: 81, 51: 82, 66: 83, 56: 84, 65: 85, 26: 86, 1: 87,
    67: 88, 42: 89, 105: 90, 104: 91, 103: 92, 91: 93, 83: 94, 94: 95, 97: 96, 115: 97,
    108: 98, 114: 99, 82: 100, 89: 101, 90: 102, 117: 103, 118: 104, 88: 105, 96: 106,
    111: 107, 98: 108, 99: 109, 119: 110, 120: 111, 122: 112, 123: 114, 121: 115,
    80: 116, 106: 119, 84: 120, 116: 121, 85: 122, 102: 124, 92: 125, 81: 126,
    101: 130, 112: 131, 86: 132, 87: 133, 95: 134, 107: 135, 109: 136, 110: 137,
    133: 140, 134: 141, 135: 142, 136: 143, 141: 146, 142: 147}

def dump_code(self, x):
    self._write(TYPE_CODE)
    self.w_long(x.co_argcount)
    self.w_long(x.co_nlocals)
    self.w_long(x.co_stacksize)
    self.w_long(x.co_flags)
    code = bytearray(dropextension.getCode(x))
    c = 0
    while c < len(code):
        n = remap[code[c]]         # Using the opcode map
        code[c] = n
        c += 1 if n < 90 else 3    # Opcodes greater than 89 take a 2 byte parameter
    self.dump(str(code))
    self.dump(x.co_consts)
    self.dump(x.co_names)
    self.dump(x.co_varnames)
    self.dump(x.co_freevars)
    self.dump(x.co_cellvars)
    self.dump(x.co_filename)
    self.dump(x.co_name)
    self.w_long(x.co_firstlineno)
    self.dump(x.co_lnotab)

inf = open('authenticate.pyc', 'rb')   # Load the file we wish to decompile
inf.seek(8)                            # Skip 8 byte header
code = marshal.load(inf)               # Unmarshal using built in module
inf.close()
outf = open('decrypted.pyc', 'wb')
outf.write('\x03\xf3\x0d\x0a\x00\x00\x00\x00')
dump(code, outf)
outf.close()
Note that each key value pair in the remap dictionary is reversed. This is due to the fact that now we
want to change the new opcode back to the original one. Also note that we have chosen arbitrary values
for opcodes 0, 9, 70, 145. We also need to make sure that these chosen arbitrary values do not clash
with an existing value. We need to run the script in a similar way using embedder. After that it should
hopefully be curtains down.
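Reversing the map and checking for clashes need not be done by hand. The sketch below (Python 3, helper names made up) inverts a small sample of entries copied from the recovered map, refuses to continue if two original opcodes share one remapped value, and lists the values that are still free to be used as placeholders.

```python
def invert_map(forward):
    """Reverse {original: remapped} into {remapped: original}, rejecting collisions."""
    inverse = {}
    for original, remapped in forward.items():
        if remapped in inverse:
            raise ValueError("remapped opcode %d has two originals" % remapped)
        inverse[remapped] = original
    return inverse


def free_values(inverse, limit):
    """Opcode numbers below `limit` that no known remapped opcode occupies."""
    return [n for n in range(limit) if n not in inverse]


# A few entries copied from the recovered map; the full map is handled the same way.
forward_sample = {1: 15, 23: 52, 29: 30, 83: 66, 100: 82}
inverse_sample = invert_map(forward_sample)
```

Any placeholder chosen for an opcode that was never observed should come from the list returned by free_values applied to the full inverse map. With this sample, for instance, 30 is already taken by original opcode 29.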
The final results
Running the script, we get the following results. No errors. No messages.
A file decrypted.pyc is also created. Now it is time to open it in pychrysanthemum before decompiling.
This time there are no STOP_CODE or partially disassembled opcodes. The coast looks clear. We can
safely proceed to decompiling. Let’s feed the file to Easy Python Decompiler.
Decompiling completed without errors. Now time to check out the code and bask in the light of glory
and success.
def finish_dropbox_boot(self, ret, freshly_linked, wiz_ret, dropbox_folder):
    self.dropbox_app.is_freshly_linked = freshly_linked
    if self.dropbox_app.mbox.is_secondary:
        try:
            self.dropbox_app.mbox.complete_link(self.dropbox_app.config.get('email'),
                ret.get('userdisplayname'), ret.get('uid'), self.dropbox_app.mbox.dual_link)
        except AttributeError:
            self.dropbox_app.mbox.callbacks.other_client_exiting()
    if freshly_linked:
        clobber_symlink_at(self.dropbox_app.sync_engine.fs, dropbox_folder)
        self.dropbox_app.safe_makedirs(dropbox_folder, 448, False)
        TRACE('Freshly linked!')
        try:
            if arch.constants.platform == 'win':
                TRACE('Trying to create a shortcut on the Desktop.')
                _folder_name = self.dropbox_app.get_dropbox_folder_name()
                arch.util.add_shortcut_to_desktop(_folder_name,
                    self.dropbox_app.mbox.is_secondary)
                if self.dropbox_app.mbox.is_primary and _folder_name != arch.constants.default_dropbox_folder_name:
                    TRACE('Attempting to remove shortcut named Dropbox since our folder name is %r', _folder_name)
                    arch.util.remove_shortcut_from_desktop(arch.constants.default_dropbox_folder_name)
        except Exception:
            unhandled_exc_handler()
The code above is a snippet of the decompiled output. We now have access to the full source code of
dropbox. We can reverse engineer it, look for vulnerabilities, or even write an open source dropbox
client. The possibilities are many. With this we come to the end of this tutorial. Hope you liked it and
thanks for reading. Some additional information is presented in the addendum.
This is extremecoders signing off, Ciao!
Addendum
This supplement is provided to discuss some other features of the protection. We will see what steps
can be taken to further increase the protection.
Exploring the differences
The de facto tool for binary comparison is BinDiff from Zynamics (now owned by Google), but because of its
price tag we will use freely available tools instead. Patchdiff2 is a great free alternative; it requires IDA to
run. With it we will see what other changes dropbox has made to the standard python interpreter.
The flow graph on the left is from the standard python dll and the one on the right is from the dropbox
python dll. Here we are comparing PyRun_FileExFlags, the function CPython uses to execute a
script read from a file. In dropbox it has been patched to do nothing. We got around this
limitation by using PyRun_SimpleString, which was left unpatched. However, we had to read the contents of
the file ourselves and pass the source code to the function as a string. Other functions such as
PyRun_SimpleFileExFlags and PyRun_AnyFileExFlags have been patched in the same way.
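The workaround of reading the file ourselves and handing its source to the string-based API can be sketched from Python with ctypes. This is an illustration on a stock CPython 3 interpreter, not the embedder used in this tutorial. It calls PyRun_SimpleStringFlags with NULL flags, which is what PyRun_SimpleString does internally; the helper name run_file_via_simple_string and the variable injected_marker are made up for the example.

```python
import ctypes
import os
import sys
import tempfile

def run_file_via_simple_string(path):
    """Read a script ourselves and pass its source to the interpreter as a string."""
    with open(path, "rb") as fh:
        source = fh.read()
    # PyRun_SimpleString(s) is PyRun_SimpleStringFlags(s, NULL); code runs in __main__.
    run = ctypes.pythonapi.PyRun_SimpleStringFlags
    run.argtypes = [ctypes.c_char_p, ctypes.c_void_p]
    run.restype = ctypes.c_int
    return run(source, None)  # 0 on success, -1 if an exception was raised

# Demo: write a throwaway script, run it through the string API, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write("injected_marker = 6 * 7\n")
status = run_file_via_simple_string(tmp.name)
os.remove(tmp.name)
marker = sys.modules["__main__"].injected_marker
```

No file handle is ever passed to the interpreter, so the patched file-based entry points are never touched.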
In the case of PyRun_AnyFileExFlags we see that a block on the right is missing: dropbox has removed the
call to PyRun_InteractiveLoopFlags. This function reads and executes statements from a file
associated with an interactive device until EOF is reached. It is the function executed when we run
python in a console window or terminal. The next image shows that PyRun_InteractiveOneFlags has also
been patched to do nothing. With it patched, we cannot start an interactive python console
(unless we take drastic measures such as implementing our own function).
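As a rough idea of what "implementing our own function" could look like, the sketch below rebuilds a tiny read-and-execute loop in pure Python on top of the standard code module. It assumes that compile and exec still work inside the target interpreter, and it reads its statements from a list instead of a console so that it stays self-contained. The function name run_statements is hypothetical.

```python
import code

def run_statements(lines, namespace=None):
    """Feed lines to an interpreter one at a time, the way a console would."""
    namespace = {} if namespace is None else namespace
    interp = code.InteractiveInterpreter(namespace)
    pending = []
    for line in lines:
        pending.append(line)
        # runsource returns True while the statement is still incomplete
        more = interp.runsource("\n".join(pending), "<repl>")
        if not more:
            pending = []
    return namespace

ns = run_statements([
    "total = 0",
    "for i in range(4):",
    "    total += i",
    "",                      # a blank line closes the compound statement
    "answer = total * 7",
])
```

Replacing the list with lines read from standard input would turn this into a basic interactive prompt.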
Refining the protection
Here we discuss what more could have been done to strengthen the protection. Firstly, dropbox
uses py2exe on windows. The source code of py2exe is available and its license allows
modification. Dropbox Inc. could have modified the source to build a custom version with
additional protections such as debugger checks and encryption. The disadvantage is that the
protection would need to be rewritten for other platforms like Linux and Mac OS X, as py2exe is only for
windows.
Coming to the CPython part, we used PyRun_SimpleString to inject our code. This function could have been
patched too, as dropbox does not need it; leaving it in is a hole in the armour. We used the built-in
marshal module to load the encrypted code objects, and they were decrypted immediately after loading.
This could be avoided. By modifying PyEval_EvalFrameEx, a PyCodeObject could be
decrypted only when it is about to be executed. A new flag in co_flags could indicate which code objects
were already decrypted in a previous run, so that only code objects without the flag set are decrypted.
An even better refinement of the encryption logic would be to decrypt on
execution and re-encrypt after execution. With this second scheme we never reach a state in
which all code objects are decrypted, which defeats a memory dump.
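The decrypt-on-execution and re-encrypt-afterwards idea can be modelled in a few lines of Python. This is a toy sketch only: a real implementation would live in C inside PyEval_EvalFrameEx, and the single-byte XOR below is a hypothetical stand-in for a real cipher, not the algorithm dropbox uses. The class name GuardedCode and the constant KEY are invented for the example.

```python
import marshal

KEY = 0x5A  # hypothetical single-byte key, a toy stand-in for a real cipher

def xor_bytes(data, key=KEY):
    """Toy symmetric transform: applying it twice restores the input."""
    return bytes(b ^ key for b in data)

class GuardedCode:
    """Keeps a code object encrypted at rest and decrypts it only while it runs."""

    def __init__(self, source, name):
        plain = marshal.dumps(compile(source, name, "exec"))
        self._blob = xor_bytes(plain)   # only the encrypted form is stored
        self.decrypted_now = False

    def run(self, namespace):
        self.decrypted_now = True
        code = marshal.loads(xor_bytes(self._blob))  # decrypt just in time
        try:
            exec(code, namespace)
        finally:
            del code                    # drop the plaintext again after execution
            self.decrypted_now = False
        return namespace

guarded = GuardedCode("x = 6 * 7", "<guarded>")
result = guarded.run({})
```

At any moment only the code object that is currently executing exists in decrypted form, which is the property that makes a full memory dump less useful.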
To generate the file set needed for opcode remapping, we used the built-in function compile to
compile source code to bytecode. Dropbox does not need this function either, as it already ships compiled
pyc files in the zip file. Further, patching compile would raise an exception whenever we tried to inject
code, as python internally always compiles plain text source code to bytecode before execution.
Lastly, the opcode mapping could have been improved by using multiple opcode
maps. That is, we create a new flag in co_flags, say OPCODE_MAP_FLAG. If the flag is set
to 1, opcode 66 may mean INPLACE_DIVIDE; if the flag is set to 2, the same opcode may mean
POP_BLOCK. The flag would be checked in PyEval_EvalFrameEx before execution to know which set of
opcodes the currently executing code object uses. Combined with the encryption protection, this would
make dropbox much tougher to crack. The only drawback of increasing the protection is that
it may degrade run time performance, but optimization is always possible.
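The multiple opcode map scheme can be illustrated with a small lookup. The sketch below is purely hypothetical: the entry for opcode 66 follows the example in the text, while the entry for opcode 67 and the names OPCODE_MAPS and decode are invented here to make the tables complete enough to run.

```python
# Per-flag opcode tables; a real interpreter would select one inside its eval loop.
OPCODE_MAPS = {
    1: {66: "INPLACE_DIVIDE", 67: "POP_BLOCK"},   # 67 is an invented entry
    2: {66: "POP_BLOCK", 67: "INPLACE_DIVIDE"},   # same numbers, different meaning
}

def decode(raw_opcodes, opcode_map_flag):
    """Translate raw opcode numbers using the table chosen by the code object's flag."""
    table = OPCODE_MAPS[opcode_map_flag]
    return [table[op] for op in raw_opcodes]

with_flag_1 = decode([66, 67], 1)
with_flag_2 = decode([66, 67], 2)
```

An attacker who recovers one mapping would still misread every code object that carries the other flag value.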
Links & References
Dropbox :: https://www.dropbox.com/
Python :: https://www.python.org/
Py2exe :: http://www.py2exe.org/
PE Explorer :: http://www.heaventools.com/overview.htm
Exeinfo PE :: http://exeinfo.atwebpages.com/
010 Editor :: http://www.sweetscape.com/010editor/
PyPy :: http://pypy.org/
Ollydbg :: http://www.ollydbg.de/version2.html
Easy Python Decompiler :: http://sourceforge.net/projects/easypythondecompiler/
Py2Exe Dumper :: http://sourceforge.net/projects/py2exedumper/
Pychrysanthemum :: https://github.com/monkeycz/pychrysanthemum
Patchdiff2 :: https://code.google.com/p/patchdiff2/
Security analysis of dropbox :: https://www.usenix.org/conference/woot13/workshop-program/presentation/kholia
Dropbox reversing tutorial :: http://progdupeu.pl/tutoriels/280/dropbox-a-des-fuites/