Testing with other decompiler tools

rocky
2018-04-12 19:57:53 -04:00
parent da57e2d416
commit fa6408d53b
4 changed files with 63 additions and 22 deletions


@@ -52,8 +52,16 @@ You get the idea. This code pulls all of these forks together and
 *moves forward*. There is some serious refactoring and cleanup in this
 code base over those old forks.

-This project has the most complete support for Python 3.3 and above
-and the best all-around Python support.
+This project demonstrably does the best job of decompiling Python across
+all Python versions. And even where another project only provides
+decompilation for a subset of Python versions, we generally do
+demonstrably better for those as well.
+
+How can we tell? By taking the set of Python bytecode that comes
+distributed with that version of Python, decompiling it, and seeing how
+many files decompile properly; among those that decompile, we then make
+sure the programs are syntactically correct, and in cases where the
+program can be checked with a provided test case, we do that.

 We are serious about testing, and use automated processes to find
 bugs. In the issue trackers for other decompilers, you will find a
@@ -136,26 +144,26 @@ All of the Python decompilers that I have looked at have problems
 decompiling Python's control flow. In some cases we can detect an
 erroneous decompilation and report that.

-*Verification* is the process of decompiling bytecode, compiling with
-a Python for that bytecode version, and then comparing the bytecode
-produced by the decompiled/compiled program. Some allowance is made
-for inessential differences. But other semantically equivalent
-differences are not caught. For example ``1 and 0`` is decompiled to
-the equivalent ``0``; remnants of the first true evaluation (1) is
-lost when Python compiles this. When Python next compiles ``0`` the
-resulting code is simpler.
+In older versions of Python it was possible to verify bytecode by
+decompiling bytecode, and then compiling using the Python interpreter
+for that bytecode version. Having done this, the bytecode produced
+could be compared with the original bytecode. However, as Python's code
+generation got better, this is no longer feasible.

-*Weak Verification*
-on the other hand doesn't check bytecode for equivalence but does
-check to see if the resulting decompiled source is a valid Python
-program by running the Python interpreter. Because the Python language
-has changed so much, for best results you should use the same Python
-Version in checking as used in the bytecode.
+There is a kind of *weak verification* that we use that doesn't check
+bytecode for equivalence but does check to see if the resulting
+decompiled source is a valid Python program by running the Python
+interpreter. Because the Python language has changed so much, for best
+results you should use the same Python version in checking as was used
+in creating the bytecode.

-Finally, we have automated running the standard Python tests after
-first compiling and decompiling the test program. Results here are a
-bit weak (if not better than most other Python decompilers). But over
-time this will probably get better.
+There is, however, an interesting class of readily available programs
+that give stronger verification: those programs that, when run, check
+some computation, or even better, check themselves.
+
+Python already has a set of programs like this: the test suite
+for the standard library that comes with Python. We have some
+code in `test/stdlib` to facilitate this kind of checking.

 Python support is strongest in Python 2 for 2.7 and drops off as you
 get further away from that. Support is also probably pretty good for
@@ -203,7 +211,7 @@ There is lots to do, so please dig in and help.
 See Also
 --------

-* https://github.com/zrax/pycdc : supports all versions of Python and is written in C++. Support for later Python 3 versions is a bit lacking though.
+* https://github.com/zrax/pycdc : supports all versions of Python and is written in C++. Support for Python 3 is a bit lacking though.
 * https://code.google.com/archive/p/unpyc3/ : supports Python 3.2 only. The above projects use a different decompiling technique than what is used here.
 * https://github.com/figment/unpyc3/ : fork of above, but supports Python 3.3 only. Includes some fixes like supporting function annotations
 * The HISTORY_ file.


@@ -1,4 +1,5 @@
 #!/bin/bash
+# Use pycdc to run our test/bytecode* test suite
 bs=${BASH_SOURCE[0]}
 testdir=$(dirname $bs)/../test
 fulldir=$(readlink -f $testdir)


@@ -0,0 +1,32 @@
+#!/bin/bash
+# Use uncompyle2 to run our test/bytecode_2.7* test suite
+bs=${BASH_SOURCE[0]}
+topdir=$(dirname "$bs")/..
+(cd "$topdir" && pyenv local 2.7.14)
+testdir=$topdir/test
+fulldir=$(readlink -f "$testdir")
+cd "$fulldir"
+for bytecode in bytecode_2.7/*.pyc ; do
+    echo "$bytecode"
+    uncompyle2 "$bytecode" > /dev/null
+    echo ================ "$bytecode" rc: $? ==============
+done
+tmpdir=/tmp/test-2.7
+( cd bytecode_2.7_run &&
+      mkdir -p "$tmpdir"
+  for bytecode in *.pyc ; do
+      shortname=$(basename "$bytecode" .pyc)
+      echo "$bytecode"
+      py_file=${tmpdir}/${shortname}.py
+      typeset -i rc=0
+      uncompyle2 "$bytecode" > "$py_file"
+      rc=$?
+      if (( rc == 0 )); then
+          python "$py_file"
+          rc=$?
+      fi
+      echo ================ "$bytecode" rc: $rc ==============
+  done
+)


@@ -42,7 +42,7 @@ def customize_for_version3(self, version):
 # * class_name - the name of the class
 # * subclass_info - the parameters to the class e.g.
 # class Foo(bar, baz)
-# -----------
+# ----------
 # * subclass_code - the code for the subclass body
 subclass_info = None
 if node == 'classdefdeco2':