Merge branch 'master' into python-2.4

This commit is contained in:
rocky
2019-11-18 18:15:51 -05:00
23 changed files with 224 additions and 103 deletions

View File

@@ -15,7 +15,7 @@ Introduction
*uncompyle6* translates Python bytecode back into equivalent Python
source code. It accepts bytecodes from Python version 1.0 to version
3.8, spanning over 24 years of Python releases. We include Dropbox's
Python 2.5 bytecode and some PyPy bytecode.
Python 2.5 bytecode and some PyPy bytecodes.
Why this?
---------
@@ -46,14 +46,15 @@ not exist and there is just bytecode. Again, my debuggers make use of
this.
There were (and still are) a number of decompyle, uncompyle,
uncompyle2, uncompyle3 forks around. Almost all of them come basically
from the same code base, and (almost?) all of them are no longer
actively maintained. One was really good at decompiling Python 1.5-2.3
or so, another really good at Python 2.7, but that only. Another
handles Python 3.2 only; another patched that and handled only 3.3.
You get the idea. This code pulls all of these forks together and
*moves forward*. There is some serious refactoring and cleanup in this
code base over those old forks.
uncompyle2, uncompyle3 forks around. Many of them come basically from
the same code base, and (almost?) all of them are no longer actively
maintained. One was really good at decompiling Python 1.5-2.3, another
really good at Python 2.7, but that only. Another handles Python 3.2
only; another patched that and handled only 3.3. You get the
idea. This code pulls all of these forks together and *moves
forward*. There is some serious refactoring and cleanup in this code
base over those old forks. Even more experimental refactoring is going
on in decompile3_.
This demonstrably does the best in decompiling Python across all
Python versions. And even when there is another project that only
@@ -75,11 +76,11 @@ fixed in the other decompilers.
Requirements
------------
The code here can be run on Python versions 2.6 or later, PyPy 3-2.4,
or PyPy-5.0.1. Python versions 2.4-2.7 are supported in the
python-2.4 branch. The bytecode files it can read have been tested on
Python bytecodes from versions 1.4, 2.1-2.7, and 3.0-3.8 and the
above-mentioned PyPy versions.
The code here can be run on Python versions 2.6 or later, PyPy 3-2.4
and later. Python versions 2.4-2.7 are supported in the python-2.4
branch. The bytecode files it can read have been tested on Python
bytecodes from versions 1.4, 2.1-2.7, and 3.0-3.8 and later PyPy
versions.
Installation
------------
@@ -185,15 +186,21 @@ they had been rare. Perhaps to compensate for the additional
added. So in sum handling control flow by ad hoc means as is currently
done is worse.
Between Python 3.5, 3.6 and 3.7 there have been major changes to the
:code:`MAKE_FUNCTION` and :code:`CALL_FUNCTION` instructions.
Between Python 3.5, 3.6, 3.7 there have been major changes to the
:code:`MAKE_FUNCTION` and :code:`CALL_FUNCTION` instructions. Python
Python 3.8 removes :code:`SETUP_LOOP`, :code:`SETUP_EXCEPT`,
:code:`BREAK_LOOP`, and :code:`CONTINUE_LOOP`, instructions which may
make control-flow detection harder, lacking the more sophisticated
control-flow analysis that is planned. We'll see.
Currently not all Python magic numbers are supported. Specifically in
some versions of Python, notably Python 3.6, the magic number has
changes several times within a version.
**We support only released versions, not candidate versions.** Note however
that the magic of a released version is usually the same as the *last* candidate version prior to release.
**We support only released versions, not candidate versions.** Note
however that the magic of a released version is usually the same as
the *last* candidate version prior to release.
There are also customized Python interpreters, notably Dropbox,
which use their own magic and encrypt bytcode. With the exception of
@@ -215,7 +222,8 @@ There is lots to do, so please dig in and help.
See Also
--------
* https://github.com/zrax/pycdc : purports to support all versions of Python. It is written in C++ and is most accurate for Python versions around 2.7 and 3.3 when the code was more actively developed. Accuracy for more recent versions of Python 3 and early versions of Python are especially lacking. See its `issue tracker <https://github.com/zrax/pycdc/issues>`_ for details. Currently lightly maintained.
* https://github.com/zrax/pycdc : aims to support all versions of Python, but doesn't currently. It is written in C++ and is most accurate for Python versions around 2.7 and 3.3 when the code was more actively developed. Accuracy for more recent versions of Python 3 and early versions of Python are especially lacking. See its `issue tracker <https://github.com/zrax/pycdc/issues>`_ for details. Currently lightly maintained.
* https://github.com/rocky/python-decompile3 : Much smaller and more modern code, focusing on 3.7+. Changes in that will get migrated back ehre.
* https://code.google.com/archive/p/unpyc3/ : supports Python 3.2 only. The above projects use a different decompiling technique than what is used here. Currently unmaintained.
* https://github.com/figment/unpyc3/ : fork of above, but supports Python 3.3 only. Includes some fixes like supporting function annotations. Currently unmaintained.
* https://github.com/wibiti/uncompyle2 : supports Python 2.7 only, but does that fairly well. There are situations where :code:`uncompyle6` results are incorrect while :code:`uncompyle2` results are not, but more often uncompyle6 is correct when uncompyle2 is not. Because :code:`uncompyle6` adheres to accuracy over idiomatic Python, :code:`uncompyle2` can produce more natural-looking code when it is correct. Currently :code:`uncompyle2` is lightly maintained. See its issue `tracker <https://github.com/wibiti/uncompyle2/issues>`_ for more details
@@ -232,6 +240,7 @@ See Also
.. _debuggers: https://pypi.python.org/pypi/trepan3k
.. _remake: https://bashdb.sf.net/remake
.. _pycdc: https://github.com/zrax/pycdc
.. _decompile3: https://github.com/rocky/python-decompile3
.. _this: https://github.com/rocky/python-uncompyle6/wiki/Deparsing-technology-and-its-use-in-exact-location-reporting
.. |buildstatus| image:: https://travis-ci.org/rocky/python-uncompyle6.svg :target: https://travis-ci.org/rocky/python-uncompyle6
.. |packagestatus| image:: https://repology.org/badge/vertical-allrepos/python:uncompyle6.svg :target: https://repology.org/project/python:uncompyle6/versions

View File

@@ -61,8 +61,6 @@
$ . ./admin-tools/make-dist-newer.sh
$ twine check dist/uncompyle6-$VERSION*
Goto https://github.com/rocky/python-uncompyle6/releases
# Upload single package and look at Rst Formating
$ twine check dist/uncompyle6-${VERSION}*
@@ -72,6 +70,8 @@ Goto https://github.com/rocky/python-uncompyle6/releases
$ twine upload dist/uncompyle6-${VERSION}*
Goto https://github.com/rocky/python-uncompyle6/releases
# Push tags:
$ git push --tags

View File

@@ -32,6 +32,7 @@ setup(
install_requires = install_requires,
license = license,
long_description = long_description,
long_description_content_type = "text/x-rst",
name = modname,
packages = find_packages(),
py_modules = py_modules,

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@@ -14,4 +14,4 @@ for y in (1, 2, 10):
expected = 3
result.append(expected)
assert result == [10, 2, 3]
assert result == [3, 2, 3]

View File

@@ -1,4 +1,4 @@
# Tests BINARY_TRUE_DIVIDE and INPLACE_TRUE_DIVIDE
from __future__ import division # needed on 2.6 and 2.7
from __future__ import division # needed on 2.2 .. 2.7
x = len(__file__) / 1
x /= 1

View File

@@ -0,0 +1,53 @@
# Covers a large number of operators
#
# This code is RUNNABLE!
from __future__ import division # needed on 2.2 .. 2.7
import sys
PYTHON_VERSION = sys.version_info[0] + (sys.version_info[1] / 10.0)
# some floats (from 01_float.py)
x = 1e300
assert 0.0 == x * 0
assert x * 1e300 == float("inf")
assert str(float("inf") * 0.0) == "nan"
assert str(float("-inf") * 0.0) == "nan"
assert -1e300 * 1e300 == float("-inf")
# Complex (adapted from 02_complex.py)
y = 5j
assert y ** 2 == -25
y **= 3
assert y == (-0-125j)
# Tests BINARY_TRUE_DIVIDE and INPLACE_TRUE_DIVIDE (from 02_try_divide.py)
x = 2
assert 4 / x == 2
if PYTHON_VERSION >= 2.2:
x = 5
assert x / 2 == 2.5
x = 3
x /= 2
assert x == 1.5
x = 2
assert 4 // x == 2
x = 7
x //= 2
assert x == 3
x = 3
assert x % 2 == 1
x %= 2
assert x == 1
assert x << 2 == 4
x <<= 3
assert x == 8
assert x >> 1 == 4
x >>= 1
assert x == 4

View File

@@ -133,21 +133,24 @@ case $PYVERSION in
[test_capi.py]=1
[test_curses.py]=1 # Possibly fails on its own but not detected
[test test_cmd_line.py]=1 # Takes too long, maybe hangs, or looking for interactive input?
[test_dis.py]=1 # We change line numbers - duh!
[test_doctest.py]=1 # Fails on its own
[test_exceptions.py]=1
[test_format.py]=1 # control flow. uncompyle2 does not have problems here
[test_frozen.py]=1 # try vs try/else control flow. uncompyle2 does not have problems here
[test_generators.py]=1 # control flow. uncompyle2 has problem here too
[test_grammar.py]=1 # Too many stmts. Handle large stmts
[test_grp.py]=1 # test takes to long, works interactively though
[test_hashlib.py]=1 # Investiage
[test_io.py]=1 # Test takes too long to run
[test_ioctl.py]=1 # Test takes too long to run
[test_itertools.py]=1 # Fix erroneous reduction to "conditional_true".
# See test/simple_source/bug27+/05_not_unconditional.py
[test_long.py]=1
[test_long_future.py]=1
[test_math.py]=1
[test_memoryio.py]=1 # FIX
[test_multiprocessing.py]=1 # On uncompyle2, taks 24 secs
[test_modulefinder.py]=1 # FIX
[test_multiprocessing.py]=1 # On uncompyle2, takes 24 secs
[test_pep352.py]=1 # ?
[test_posix.py]=1 # Bug in try-else detection inside test_initgroups()
# Deal with when we have better flow-control detection

View File

@@ -214,13 +214,17 @@ class Python27Parser(Python2Parser):
super(Python27Parser, self).customize_grammar_rules(tokens, customize)
self.check_reduce['and'] = 'AST'
# self.check_reduce['or'] = 'AST'
self.check_reduce['raise_stmt1'] = 'AST'
self.check_reduce['list_if_not'] = 'AST'
self.check_reduce['list_if'] = 'AST'
self.check_reduce['if_expr_true'] = 'tokens'
self.check_reduce['whilestmt'] = 'tokens'
self.check_reduce["and"] = "AST"
self.check_reduce["conditional"] = "AST"
# self.check_reduce["or"] = "AST"
self.check_reduce["raise_stmt1"] = "AST"
self.check_reduce["iflaststmtl"] = "AST"
self.check_reduce["list_if_not"] = "AST"
self.check_reduce["list_if"] = "AST"
self.check_reduce["comp_if"] = "AST"
self.check_reduce["if_expr_true"] = "tokens"
self.check_reduce["whilestmt"] = "tokens"
return
def reduce_is_invalid(self, rule, ast, tokens, first, last):
@@ -231,54 +235,99 @@ class Python27Parser(Python2Parser):
if invalid:
return invalid
if rule == ('and', ('expr', 'jmp_false', 'expr', '\\e_come_from_opt')):
# If the instruction after the instructions formin "and" is an "YIELD_VALUE"
if rule == ("and", ("expr", "jmp_false", "expr", "\\e_come_from_opt")):
# If the instruction after the instructions forming the "and" is an "YIELD_VALUE"
# then this is probably an "if" inside a comprehension.
if tokens[last] == 'YIELD_VALUE':
if tokens[last] == "YIELD_VALUE":
# Note: We might also consider testing last+1 being "POP_TOP"
return True
# Test that jump_false jump somewhere beyond the end of the "and"
# it might not be exactly the end of the "and" because this and can
# be a part of a larger condition. Oddly in 2.7 there doesn't seem to be
# an optimization where the "and" jump_false is back to a loop.
jmp_false = ast[1]
if jmp_false[0] == "POP_JUMP_IF_FALSE":
while (first < last and isinstance(tokens[last].offset, str)):
last -= 1
if jmp_false[0].attr < tokens[last].offset:
return True
# Test that jmp_false jumps to the end of "and"
# or that it jumps to the same place as the end of "and"
jmp_false = ast[1][0]
jmp_target = jmp_false.offset + jmp_false.attr + 3
return not (jmp_target == tokens[last].offset or
tokens[last].pattr == jmp_false.pattr)
elif rule[0] == ('raise_stmt1'):
return ast[0] == 'expr' and ast[0][0] == 'or'
elif rule[0] in ('assert', 'assert2'):
elif rule == ("comp_if", ("expr", "jmp_false", "comp_iter")):
jmp_false = ast[1]
if jmp_false[0] == "POP_JUMP_IF_FALSE":
return tokens[first].offset < jmp_false[0].attr < tokens[last].offset
pass
elif (rule[0], rule[1][0:5]) == (
"conditional",
("expr", "jmp_false", "expr", "JUMP_ABSOLUTE", "expr")):
jmp_false = ast[1]
if jmp_false[0] == "POP_JUMP_IF_FALSE":
else_instr = ast[4].first_child()
if jmp_false[0].attr != else_instr.offset:
return True
end_offset = ast[3].attr
return end_offset < tokens[last].offset
pass
elif rule[0] == ("raise_stmt1"):
return ast[0] == "expr" and ast[0][0] == "or"
elif rule[0] in ("assert", "assert2"):
jump_inst = ast[1][0]
jump_target = jump_inst.attr
return not (last >= len(tokens)
or jump_target == tokens[last].offset
or jump_target == next_offset(ast[-1].op, ast[-1].opc, ast[-1].offset))
elif rule == ('list_if_not', ('expr', 'jmp_true', 'list_iter')):
elif rule == ("iflaststmtl", ("testexpr", "c_stmts")):
testexpr = ast[0]
if testexpr[0] == "testfalse":
testfalse = testexpr[0]
if testfalse[1] == "jmp_false":
jmp_false = testfalse[1]
if last == len(tokens):
last -= 1
while (isinstance(tokens[first].offset, str) and first < last):
first += 1
if first == last:
return True
while (first < last and isinstance(tokens[last].offset, str)):
last -= 1
return tokens[first].offset < jmp_false[0].attr < tokens[last].offset
pass
pass
pass
elif rule == ("list_if_not", ("expr", "jmp_true", "list_iter")):
jump_inst = ast[1][0]
jump_offset = jump_inst.attr
return jump_offset > jump_inst.offset and jump_offset < tokens[last].offset
elif rule == ('list_if', ('expr', 'jmp_false', 'list_iter')):
elif rule == ("list_if", ("expr", "jmp_false", "list_iter")):
jump_inst = ast[1][0]
jump_offset = jump_inst.attr
return jump_offset > jump_inst.offset and jump_offset < tokens[last].offset
elif rule == ('or', ('expr', 'jmp_true', 'expr', '\\e_come_from_opt')):
# Test that jmp_true doesn't jump inside the middle the "or"
elif rule == ("or", ("expr", "jmp_true", "expr", "\\e_come_from_opt")):
# Test that jmp_true doesn"t jump inside the middle the "or"
# or that it jumps to the same place as the end of "and"
jmp_true = ast[1][0]
jmp_target = jmp_true.offset + jmp_true.attr + 3
return not (jmp_target == tokens[last].offset or
tokens[last].pattr == jmp_true.pattr)
elif (rule[0] == 'whilestmt' and
elif (rule[0] == "whilestmt" and
rule[1][0:-2] ==
('SETUP_LOOP', 'testexpr', 'l_stmts_opt',
'JUMP_BACK', 'JUMP_BACK')):
("SETUP_LOOP", "testexpr", "l_stmts_opt",
"JUMP_BACK", "JUMP_BACK")):
# Make sure that the jump backs all go to the same place
i = last-1
while (tokens[i] != 'JUMP_BACK'):
while (tokens[i] != "JUMP_BACK"):
i -= 1
return tokens[i].attr != tokens[i-1].attr
elif rule[0] == 'if_expr_true':
return (first) > 0 and tokens[first-1] == 'POP_JUMP_IF_FALSE'
elif rule[0] == "if_expr_true":
return (first) > 0 and tokens[first-1] == "POP_JUMP_IF_FALSE"
return False
@@ -286,7 +335,7 @@ class Python27Parser(Python2Parser):
class Python27ParserSingle(Python27Parser, PythonParserSingle):
pass
if __name__ == '__main__':
if __name__ == "__main__":
# Check grammar
p = Python27Parser()
p.check_grammar()
@@ -302,9 +351,9 @@ if __name__ == '__main__':
""".split()))
remain_tokens = set(tokens) - opcode_set
import re
remain_tokens = set([re.sub(r'_\d+$', '', t)
remain_tokens = set([re.sub(r"_\d+$", "", t)
for t in remain_tokens])
remain_tokens = set([re.sub('_CONT$', '', t)
remain_tokens = set([re.sub("_CONT$", "", t)
for t in remain_tokens])
remain_tokens = set(remain_tokens) - opcode_set
print(remain_tokens)

View File

@@ -67,7 +67,8 @@ class Scanner3(Scanner):
# Create opcode classification sets
# Note: super initilization above initializes self.opc
# Ops that start SETUP_ ... We will COME_FROM with these names
# For ops that start SETUP_ ... we will add COME_FROM with these names
# at the their targets.
# Some blocks and END_ statements. And they can start
# a new statement
if self.version < 3.8:

View File

@@ -1633,6 +1633,57 @@ class SourceWalker(GenericASTTraversal, object):
self.write(")")
def kv_map(self, kv_node, sep, line_number, indent):
first_time = True
for kv in kv_node:
assert kv in ('kv', 'kv2', 'kv3')
# kv ::= DUP_TOP expr ROT_TWO expr STORE_SUBSCR
# kv2 ::= DUP_TOP expr expr ROT_THREE STORE_SUBSCR
# kv3 ::= expr expr STORE_MAP
# FIXME: DRY this and the above
if kv == 'kv':
self.write(sep)
name = self.traverse(kv[-2], indent='')
if first_time:
line_number = self.indent_if_source_nl(line_number, indent)
first_time = False
pass
line_number = self.line_number
self.write(name, ': ')
value = self.traverse(kv[1], indent=self.indent+(len(name)+2)*' ')
elif kv == 'kv2':
self.write(sep)
name = self.traverse(kv[1], indent='')
if first_time:
line_number = self.indent_if_source_nl(line_number, indent)
first_time = False
pass
line_number = self.line_number
self.write(name, ': ')
value = self.traverse(kv[-3], indent=self.indent+(len(name)+2)*' ')
elif kv == 'kv3':
self.write(sep)
name = self.traverse(kv[-2], indent='')
if first_time:
line_number = self.indent_if_source_nl(line_number, indent)
first_time = False
pass
line_number = self.line_number
self.write(name, ': ')
line_number = self.line_number
value = self.traverse(kv[0], indent=self.indent+(len(name)+2)*' ')
pass
self.write(value)
sep = ", "
if line_number != self.line_number:
sep += "\n" + self.indent + " "
line_number = self.line_number
pass
pass
def n_dict(self, node):
"""
prettyprint a dict
@@ -1753,14 +1804,15 @@ class SourceWalker(GenericASTTraversal, object):
pass
else:
# Python 2 style kvlist. Find beginning of kvlist.
indent = self.indent + " "
line_number = self.line_number
if node[0].kind.startswith("BUILD_MAP"):
if len(node) > 1 and node[1].kind in ("kvlist", "kvlist_n"):
kv_node = node[1]
else:
kv_node = node[1:]
self.kv_map(kv_node, sep, line_number, indent)
else:
indent = self.indent + " "
line_number = self.line_number
sep = ''
opname = node[-1].kind
if self.is_pypy and self.version >= 3.5:
@@ -1798,54 +1850,7 @@ class SourceWalker(GenericASTTraversal, object):
pass
elif opname.startswith('kvlist'):
kv_node = node[-1]
first_time = True
for kv in kv_node:
assert kv in ('kv', 'kv2', 'kv3')
# kv ::= DUP_TOP expr ROT_TWO expr STORE_SUBSCR
# kv2 ::= DUP_TOP expr expr ROT_THREE STORE_SUBSCR
# kv3 ::= expr expr STORE_MAP
# FIXME: DRY this and the above
if kv == 'kv':
self.write(sep)
name = self.traverse(kv[-2], indent='')
if first_time:
line_number = self.indent_if_source_nl(line_number, indent)
first_time = False
pass
line_number = self.line_number
self.write(name, ': ')
value = self.traverse(kv[1], indent=self.indent+(len(name)+2)*' ')
elif kv == 'kv2':
self.write(sep)
name = self.traverse(kv[1], indent='')
if first_time:
line_number = self.indent_if_source_nl(line_number, indent)
first_time = False
pass
line_number = self.line_number
self.write(name, ': ')
value = self.traverse(kv[-3], indent=self.indent+(len(name)+2)*' ')
elif kv == 'kv3':
self.write(sep)
name = self.traverse(kv[-2], indent='')
if first_time:
line_number = self.indent_if_source_nl(line_number, indent)
first_time = False
pass
line_number = self.line_number
self.write(name, ': ')
line_number = self.line_number
value = self.traverse(kv[0], indent=self.indent+(len(name)+2)*' ')
pass
self.write(value)
sep = ", "
if line_number != self.line_number:
sep += "\n" + self.indent + " "
line_number = self.line_number
pass
pass
self.kv_map(node[-1], sep, line_number, indent)
pass
if sep.startswith(",\n"):