From fa6408d53b1b3fc0dbdbb659dc40f24590658b38 Mon Sep 17 00:00:00 2001
From: rocky
Date: Thu, 12 Apr 2018 19:57:53 -0400
Subject: [PATCH] Testing with other decompiler tools

---
 README.rst                         | 50 +++++++++++++++++-------------
 admin-tools/pycdc-runtests.sh      |  1 +
 admin-tools/uncompyle2-runtests.sh | 32 +++++++++++++++++++
 uncompyle6/semantics/customize3.py |  2 +-
 4 files changed, 63 insertions(+), 22 deletions(-)
 create mode 100755 admin-tools/uncompyle2-runtests.sh

diff --git a/README.rst b/README.rst
index e658f6b2..50a198ed 100644
--- a/README.rst
+++ b/README.rst
@@ -52,8 +52,16 @@ You get the idea.
 This code pulls all of these forks together and *moves forward*. There
 is some serious refactoring and cleanup in this code base over those old
 forks.
-This project has the most complete support for Python 3.3 and above
-and the best all-around Python support.
+This project demonstrably does best at decompiling Python across all
+Python versions. And even where another project only provides
+decompilation for a subset of Python versions, we generally do
+demonstrably better for those as well.
+
+How can we tell? By taking the Python bytecode that comes
+distributed with each version of Python, decompiling it, and seeing how
+many files decompile properly; for those that do, we check that the
+programs are syntactically correct and, where a program can
+be checked with a provided test case, we do that.
 
 We are serious about testing, and use automated processes to find
 bugs. In the issue trackers for other decompilers, you will find a
@@ -136,26 +144,26 @@ All of the Python decompilers that I have looked at have problems
 decompiling Python's control flow. In some cases we can detect an
 erroneous decompilation and report that.
 
-*Verification* is the process of decompiling bytecode, compiling with
-a Python for that bytecode version, and then comparing the bytecode
-produced by the decompiled/compiled program. Some allowance is made
-for inessential differences. But other semantically equivalent
-differences are not caught. For example ``1 and 0`` is decompiled to
-the equivalent ``0``; remnants of the first true evaluation (1) is
-lost when Python compiles this. When Python next compiles ``0`` the
-resulting code is simpler.
+In older versions of Python it was possible to verify bytecode by
+decompiling bytecode, and then compiling using the Python interpreter
+for that bytecode version. Having done this, the bytecode produced
+could be compared with the original bytecode. However, as Python's code
+generation got better, this is no longer feasible.
 
-*Weak Verification*
-on the other hand doesn't check bytecode for equivalence but does
-check to see if the resulting decompiled source is a valid Python
-program by running the Python interpreter. Because the Python language
-has changed so much, for best results you should use the same Python
-Version in checking as used in the bytecode.
+There is a kind of *weak verification* that we use that doesn't check
+bytecode for equivalence but does check to see if the resulting
+decompiled source is a valid Python program by running the Python
+interpreter. Because the Python language has changed so much, for best
+results you should use the same Python version in checking as was used
+in creating the bytecode.
 
-Finally, we have automated running the standard Python tests after
-first compiling and decompiling the test program. Results here are a
-bit weak (if not better than most other Python decompilers). But over
-time this will probably get better.
+There is, however, an interesting class of programs that is readily
+available and gives stronger verification: those programs that, when
+run, check some computation or, even better, check themselves.
+
+And Python already has a set of programs like this: the test suite
+for the standard library that comes with Python. We have some
+code in `test/stdlib` to facilitate this kind of checking.
 
 Python support is strongest in Python 2 for 2.7 and drops off as you
 get further away from that. Support is also probably pretty good for
@@ -203,7 +211,7 @@ There is lots to do, so please dig in and help.
 See Also
 --------
 
-* https://github.com/zrax/pycdc : supports all versions of Python and is written in C++. Support for later Python 3 versions is a bit lacking though.
+* https://github.com/zrax/pycdc : supports all versions of Python and is written in C++. Support for Python 3 is a bit lacking though.
 * https://code.google.com/archive/p/unpyc3/ : supports Python 3.2 only. The above projects use a different decompiling technique than what is used here.
 * https://github.com/figment/unpyc3/ : fork of above, but supports Python 3.3 only. Includes some fixes like supporting function annotations
 * The HISTORY_ file.
diff --git a/admin-tools/pycdc-runtests.sh b/admin-tools/pycdc-runtests.sh
index 1ee8a4e0..15e3402c 100755
--- a/admin-tools/pycdc-runtests.sh
+++ b/admin-tools/pycdc-runtests.sh
@@ -1,4 +1,5 @@
 #!/bin/bash
+# Use pycdc to run our test/bytecode* test suite
 bs=${BASH_SOURCE[0]}
 testdir=$(dirname $bs)/../test
 fulldir=$(readlink -f $testdir)
diff --git a/admin-tools/uncompyle2-runtests.sh b/admin-tools/uncompyle2-runtests.sh
new file mode 100755
index 00000000..3111c67e
--- /dev/null
+++ b/admin-tools/uncompyle2-runtests.sh
@@ -0,0 +1,32 @@
+#!/bin/bash
+# Use uncompyle2 to run our test/bytecode_2.7* test suite
+bs=${BASH_SOURCE[0]}
+topdir=$(dirname $bs)/..
+(cd $topdir && pyenv local 2.7.14)
+testdir=$topdir/test
+fulldir=$(readlink -f $testdir)
+cd $fulldir
+
+for bytecode in bytecode_2.7/*.pyc ; do
+    echo $bytecode
+    uncompyle2 $bytecode > /dev/null
+    echo ================ $bytecode rc: $? ==============
+done
+
+tmpdir=/tmp/test-2.7
+( cd bytecode_2.7_run &&
+    mkdir $tmpdir || true
+    for bytecode in *.pyc ; do
+	shortname=$(basename $bytecode .pyc)
+	echo $bytecode
+	py_file=${tmpdir}/${shortname}.py
+	typeset -i rc=0
+	uncompyle2 $bytecode > $py_file
+	rc=$?
+	if (( rc == 0 )); then
+	    python $py_file
+	    rc=$?
+	fi
+	echo ================ $bytecode rc: $rc ==============
+    done
+)
diff --git a/uncompyle6/semantics/customize3.py b/uncompyle6/semantics/customize3.py
index 14995452..1aa106ba 100644
--- a/uncompyle6/semantics/customize3.py
+++ b/uncompyle6/semantics/customize3.py
@@ -42,7 +42,7 @@ def customize_for_version3(self, version):
         # * class_name - the name of the class
         # * subclass_info - the parameters to the class  e.g.
         #      class Foo(bar, baz)
-        #               -----------
+        #               ----------
         # * subclass_code - the code for the subclass body
         subclass_info = None
         if node == 'classdefdeco2':