Update documentation

rocky
2018-06-12 13:13:48 -04:00
parent c01eb554ed
commit 7e172b63d1

@@ -29,11 +29,11 @@ CPython bytecode decompilers is the ability to deparse just
 *fragments* of source code and give source-code information around a
 given bytecode offset.

-I use the tree fragments to deparse fragments of code inside my
-trepan_ debuggers_. For that, bytecode offsets are recorded and
-associated with fragments of the source code. This purpose, although
-compatible with the original intention, is yet a little bit different.
-See this_ for more information.
+I use the tree fragments to deparse fragments of code *at run time*
+inside my trepan_ debuggers_. For that, bytecode offsets are recorded
+and associated with fragments of the source code. This purpose,
+although compatible with the original intention, is yet a little bit
+different. See this_ for more information.

 Python fragment deparsing given an instruction offset is useful in
 showing stack traces and can be encorporated into any program that
@@ -58,7 +58,7 @@ provides decompilation for subset of Python versions, we generally do
 demonstrably better for those as well.

 How can we tell? By taking Python bytecode that comes distributed with
-that version of Python and decompiling these. Among htose that
+that version of Python and decompiling these. Among those that
 successfully decompile, we can then make sure the resulting programs
 are syntactically correct by running the Python interpreter for that
 bytecode version. Finally, in cases where the program has a test for
@@ -151,12 +151,12 @@ for that bytecode version. Having done this the bytecode produced
 could be compared with the original bytecode. However as Python's code
 generation got better, this is no longer feasible.

-There is a kind of *weak verification* that we use that doesn't check
-bytecode for equivalence but does check to see if the resulting
-decompiled source is a valid Python program by running the Python
-interpreter. Because the Python language has changed so much, for best
-results you should use the same Python version in checking as was used
-in creating the bytecode.
+There verification that we use that doesn't check bytecode for
+equivalence but does check to see if the resulting decompiled source
+is a valid Python program by running the Python interpreter. Because
+the Python language has changed so much, for best results you should
+use the same Python version in checking as was used in creating the
+bytecode.

 There are however an interesting class of these programs that is
 readily available give stronger verification: those programs that
@@ -174,19 +174,21 @@ that era was minimal)

 There is some work to do on the lower end Python versions which is
 more difficult for us to handle since we don't have a Python
-interpreter for versions 1.5, 1.6, and 2.0.
+interpreter for versions 1.6, and 2.0.

 In the Python 3 series, Python support is is strongest around 3.4 or
 3.3 and drops off as you move further away from those versions. Python
-3.6 changes things drastically by using word codes rather than byte
-codes. As a result, the jump offset field in a jump instruction
-argument has been reduced. This makes the `EXTENDED_ARG` instructions
-are now more prevalent in jump instruction; previously they had been
-rare. Perhaps to compensate for the additional `EXTENDED_ARG`
-instructions, additional jump optimization has been added. So in sum
-handling control flow by ad hoc means as is currently done is worse.
+3.0 is weird in that it in some ways resembles 2.6 more than it does
+3.1 or 2.7. Python 3.6 changes things drastically by using word codes
+rather than byte codes. As a result, the jump offset field in a jump
+instruction argument has been reduced. This makes the `EXTENDED_ARG`
+instructions are now more prevalent in jump instruction; previously
+they had been rare. Perhaps to compensate for the additional
+`EXTENDED_ARG` instructions, additional jump optimization has been
+added. So in sum handling control flow by ad hoc means as is currently
+done is worse.

-Also, between Python 3.5, 3.6 and 3.7 there have been major changes to the
+Between Python 3.5, 3.6 and 3.7 there have been major changes to the
 `MAKE_FUNCTION` and `CALL_FUNCTION` instructions.

 Currently not all Python magic numbers are supported. Specifically in
@@ -212,11 +214,12 @@ There is lots to do, so please dig in and help.
 See Also
 --------

-* https://github.com/zrax/pycdc : supports all versions of Python and is written in C++. Support for Python 3 is a bit lacking though.
-* https://code.google.com/archive/p/unpyc3/ : supports Python 3.2 only. The above projects use a different decompiling technique than what is used here.
-* https://github.com/figment/unpyc3/ : fork of above, but supports Python 3.3 only. Includes some fixes like supporting function annotations
-* The HISTORY_ file.
+* https://github.com/zrax/pycdc : purports to support all versions of Python. It is written in C++ and is most accurate for Python versions around 2.7 and 3.3 when the code was more actively developed. Accuracy for more recent versions of Python 3 and early versions of Python are especially lacking. See its `issue tracker <https://github.com/zrax/pycdc/issues>`_ for details. Currently lightly maintained.
+* https://code.google.com/archive/p/unpyc3/ : supports Python 3.2 only. The above projects use a different decompiling technique than what is used here. Currently unmaintained.
+* https://github.com/figment/unpyc3/ : fork of above, but supports Python 3.3 only. Includes some fixes like supporting function annotations. Currently unmaintained.
+* https://github.com/wibiti/uncompyle2 : supports Python 2.7 only, but does that fairly well. Because of it specificity it can sometimes to better than uncompyle6 which we can't do withouth breaking other 2.7 cases. Currently lightly maintained. See its issue `tracker <https://github.com/wibiti/uncompyle2/issues>`_ for more details
 * `How to report a bug <https://github.com/rocky/python-uncompyle6/blob/master/HOW-TO-REPORT-A-BUG.md>`_
+* The HISTORY_ file.
 * https://github.com/rocky/python-xdis : Cross Python version disassembler
 * https://github.com/rocky/python-xasm : Cross Python version assembler
 * https://github.com/rocky/python-uncompyle6/wiki : Wiki Documents which describe the code and aspects of it in more detail
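
The verification paragraph touched by the -151 hunk describes checking that decompiled source is a valid Python program by running the Python interpreter, without comparing bytecode. A minimal sketch of that idea follows, assuming the decompiled text is already available as a string. The function name `weakly_verify` and its return shape are hypothetical and are not taken from the project's code. Because `compile()` uses the interpreter running the script, it should be run under the same Python version that created the bytecode, as the text advises.

```python
def weakly_verify(source_text, filename="<decompiled>"):
    """Check only that source_text is syntactically valid Python.

    Hypothetical helper for illustration. It does not compare
    bytecode; it just asks the running interpreter to compile the text.
    Returns (True, None) on success or (False, exception) on failure.
    """
    try:
        # "exec" mode compiles a whole module's worth of statements.
        compile(source_text, filename, "exec")
    except (SyntaxError, ValueError) as exc:
        # SyntaxError covers invalid syntax; ValueError can be raised by
        # some Python versions when the source contains null bytes.
        return False, exc
    return True, None


if __name__ == "__main__":
    ok, err = weakly_verify("def f(x):\n    return x + 1\n")
    print("valid" if ok else "invalid: %s" % err)
```

A passing result shows only that the text parses and compiles. It says nothing about whether the program behaves like the original, which is why the text goes on to describe stronger verification.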