Bug in for loop with try. Add more of 2.7's COME_FROM statements.

spark.py: add tracing reduce rules. main: reduce cutsines.
Start history
rocky
2015-12-21 21:08:08 -05:00
parent 6b0bb124ea
commit 6a49cd2c69
12 changed files with 225 additions and 27 deletions

HISTORY.md (new file)

@@ -0,0 +1,109 @@
This project has a history of over 17 years, spanning back to Python 1.5.
There have been a number of people who have worked on this. I am awed
by the amount of work, the number of people who have contributed to this,
and the cleverness in the code.
The below is an annotated history from my reading of the sources cited.
In 1998, John Aycock first wrote a grammar parser in Python,
eventually called SPARK, that was usable inside a Python program. This
code was described in the
[7th International Python Conference](http://legacy.python.org/workshops/1998-11/proceedings/papers/aycock-little/aycock-little.html). That
paper doesn't talk about decompilation, nor did John have that in mind
at that time. It does mention that a full parser for Python (rather
than the simple languages in the paper) was being considered.
[This](http://pages.cpsc.ucalgary.ca/~aycock/spark/content.html#contributors)
contains a list of people acknowledged in developing SPARK. What's
amazing about this code is that it is reasonably fast and has survived
up to Python 3 with relatively little change. This work was done in
conjunction with his Ph.D. thesis, which was finished around 2001. While
working on his thesis, John realized SPARK could be used to deparse
Python bytecode. In the fall of 1999, he started writing the Python
program "decompyle" to do this.
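SPARK's hallmark convention, which carries into this project's parser, is that grammar rules live in the docstrings of `p_*` methods and are collected by reflection. A minimal sketch of the idea (the class and helper names below are invented for illustration, not SPARK's real API):

```python
# Illustrative sketch of SPARK's docstring-grammar convention; the
# class and helper names here are made up, not SPARK's real API.

class TinyGrammar:
    def p_expr_add(self, args):
        ''' expr ::= expr ADD term '''

    def p_expr_term(self, args):
        ''' expr ::= term '''

def collect_rules(obj):
    """Gather (lhs, rhs) rules from the docstrings of p_* methods."""
    rules = []
    for name in sorted(dir(obj)):
        if not name.startswith('p_'):
            continue
        doc = getattr(obj, name).__doc__ or ''
        for line in doc.splitlines():
            if '::=' in line:
                lhs, rhs = line.split('::=', 1)
                rules.append((lhs.strip(), tuple(rhs.split())))
    return rules

print(collect_rules(TinyGrammar()))
# [('expr', ('expr', 'ADD', 'term')), ('expr', ('term',))]
```

Keeping the grammar next to the semantic action that handles it is what let the parser survive so many Python versions with little change.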
This code introduced another clever idea: table-driven semantic
routines using format specifiers.
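The scheme can be sketched roughly as follows; real decompyle tables are far richer (precedence, several specifier kinds), and the table entries and node layout here are invented for illustration:

```python
# Toy table-driven semantics: each node kind maps to a format string
# plus the order in which children fill its %c slots.  This layout is
# illustrative only, not decompyle's actual table format.

TABLE = {
    'assign': ('%c = %c', (0, 1)),   # designator, expression
    'add':    ('%c + %c', (0, 1)),
}

def render(node):
    # Leaves are plain token strings; interior nodes are (kind, *children).
    if isinstance(node, str):
        return node
    fmt, child_order = TABLE[node[0]]
    children = node[1:]
    # Swap each %c slot for the rendering of the corresponding child.
    return fmt.replace('%c', '%s') % tuple(
        render(children[i]) for i in child_order)

print(render(('assign', 'x', ('add', 'a', 'b'))))  # x = a + b
```

Adding a construct then mostly means adding a table entry rather than writing a new tree-walking routine.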
The last mention of a release of SPARK from John is around 2002.
In the fall of 2000, Hartmut Goebel
[took over maintaining the code](https://groups.google.com/forum/#!searchin/comp.lang.python/hartmut$20goebel/comp.lang.python/35s3mp4-nuY/UZALti6ujnQJ). The
first subsequent public release announcement that I can find is
["decompyle - A byte-code-decompiler version 2.2 beta 1"](https://mail.python.org/pipermail/python-announce-list/2002-February/001272.html).
From the CHANGES file found in
[the tarball for that release](http://old-releases.ubuntu.com/ubuntu/pool/universe/d/decompyle2.2/decompyle2.2_2.2beta1.orig.tar.gz),
it appears that Hartmut did most of the work to get this code to
accept the full Python language. He added precedence to the table
specifiers, support for multiple versions of Python, and
pretty-printing of docstrings, lists, and hashes. He also wrote
extensive tests and routines for the testing and verification of
decompiled bytecode.
decompyle2.2 was packaged for Debian (sarge) by
[Ben Burton around 2002](https://packages.qa.debian.org/d/decompyle.html). As
it worked only on Python 2.2 long after Python 2.3 and 2.4 were in
widespread use, it was removed.
[Crazy Compilers](http://www.crazy-compilers.com/decompyle/) offers a
byte-code decompiler service for versions of Python up to 2.6. As
someone who has worked on compilers, I know it is tough to make a
living working on compilers. (For example, based on
[John Aycock's recent papers](http://pages.cpsc.ucalgary.ca/~aycock/),
it doesn't look like he's done anything compiler-wise since SPARK.) So
I hope people will use the crazy-compilers service. I wish him the
success that this good work deserves.
Next we get to
["uncompyle" and PyPI](https://pypi.python.org/pypi/uncompyle/1.1) and
the era of git repositories. In contrast to decompyle, this now runs
only on Python 2.7 although it accepts bytecode back to Python
2.5. Thomas Grainger is the package owner of this, although Hartmut is
listed as the author.
The project exists not only on
[github](https://github.com/gstarnberger/uncompyle) but also on
[bitbucket](https://bitbucket.org/gstarnberger/uncompyle) where the
git history goes back to 2009. Somewhere in there the name was changed
from "decompyle" to "uncompyle".
The name Thomas Grainger isn't found in (m)any of the commits in the
several years of active development. Guenther Starnberger, Keknehv,
hamled, and Eike Siewertsen are the principal committers here.
This project, uncompyle6, however, owes its existence to uncompyle2 by
Myst herie (Mysterie), whose first commit seems to go back to 2012; it
is also based on Hartmut's code. I chose it because it seemed to have
been the most actively worked on most recently.
Over the many years, code styles and Python features have
changed. However brilliant the code was and still is, it hasn't really
had a single public active maintainer. And there have been many forks
of the code.
That the code has been in need of an overhaul was recognized by
Hartmut himself a decade and a half ago:
[decompyle/uncompile__init__.py](https://github.com/gstarnberger/uncompyle/blob/master/uncompyle/__init__.py#L25-L26)
NB. This is not a masterpiece of software, but became more like a hack.
Probably a complete rewrite would be sensefull. hG/2000-12-27
One of the attempts to modernize it and make it available for Python 3
is [the one by Anton Vorobyov (DarkFenX)](https://github.com/DarkFenX/uncompyle3). I've
followed some of the ideas there in this project.
Lastly, I should mention [unpyc](https://code.google.com/p/unpyc3/)
and most especially [pycdc](https://github.com/zrax/pycdc), largely by
Michael Hansen and Darryl Pogue. If they supported getting source-code
fragments and I could call it from Python, I'd probably ditch this and
use that. From what I've seen, the code runs blindingly fast and spans
all versions of Python.
Tests for the project have been, or are being, culled from all of the
projects mentioned.
NB. If you find mistakes, want corrections, or want your name added (or removed),
please contact me.

Binary file not shown.



@@ -1,8 +1,11 @@
Files in this directory contain very simple constructs that work
across all versions of Python.
Their simnplicity is to try to make it easier to debug grammar
and AST walking routines.
Their simplicity is to try to make it easier to debug scanner, grammar
and semantic-action routines.
We also try to make the code here runnable by Python and when run should
not produce an error.
The numbers in the filenames are to assist running the programs from
the simplest to more complex. For example, many tests have assignment


@@ -0,0 +1,5 @@
for i in (1,2):
try:
x = 1
except ValueError:
y = 2


@@ -1,5 +1,8 @@
# Tests:
# forstmt ::= SETUP_LOOP expr _for designator
# for_block POP_BLOCK COME_FROM
for a in b:
c = d
for a in [1]:
c = 2
for a in range(2):
c = 2


@@ -117,7 +117,7 @@ def main(in_base, out_base, files, codes, outfile=None,
outstream = _get_outstream(outfile)
# print(outfile, file=sys.stderr)
# try to decomyple the input file
# Try to uncompile the input file
try:
uncompyle_file(infile, outstream, showasm, showast)
tot_files += 1
@@ -136,8 +136,8 @@ def main(in_base, out_base, files, codes, outfile=None,
outstream.close()
os.rename(outfile, outfile + '_failed')
else:
sys.stderr.write("\n# Can't uncompyle %s\n" % infile)
else: # uncompyle successfull
sys.stderr.write("\n# Can't uncompile %s\n" % infile)
else: # uncompile successful
if outfile:
outstream.close()
if do_verify:
@@ -145,7 +145,7 @@ def main(in_base, out_base, files, codes, outfile=None,
msg = verify.compare_code_with_srcfile(infile, outfile)
if not outfile:
if not msg:
print('\n# okay decompyling %s' % infile)
print('\n# okay decompiling %s' % infile)
okay_files += 1
else:
print('\n# %s\n\t%s' % (infile, msg))
@@ -158,7 +158,7 @@ def main(in_base, out_base, files, codes, outfile=None,
else:
okay_files += 1
if not outfile:
mess = '\n# okay decompyling'
mess = '\n# okay decompiling'
# mem_usage = __memUsage()
print(mess, infile)
if outfile:


@@ -43,7 +43,13 @@ def jabs_op(name, op):
hasjabs.append(op)
def updateGlobal():
# JUMP_OPs are used in verification
# JUMP_OPs are used in verification and in the scanner in resolving forward/backward
# jumps
globals().update({'PJIF': opmap['POP_JUMP_IF_FALSE']})
globals().update({'PJIT': opmap['POP_JUMP_IF_TRUE']})
globals().update({'JA': opmap['JUMP_ABSOLUTE']})
globals().update({'JF': opmap['JUMP_FORWARD']})
globals().update(dict([(k.replace('+','_'),v) for (k,v) in opmap.items()]))
globals().update({'JUMP_OPs': map(lambda op: opname[op], hasjrel + hasjabs)})
# Instruction opcodes for compiled code


@@ -44,8 +44,8 @@ class _State:
self.T, self.complete, self.items = [], [], items
self.stateno = stateno
# DEFAULT_DEBUG = {'rules': True, 'transition': False}
DEFAULT_DEBUG = {'rules': False, 'transition': False}
# DEFAULT_DEBUG = {'rules': True, 'transition': True, 'reduce' : True}
DEFAULT_DEBUG = {'rules': False, 'transition': False, 'reduce': False}
class GenericParser:
'''
An Earley parser, as per J. Earley, "An Efficient Context-Free
@@ -450,6 +450,8 @@ class GenericParser:
for rule in self.states[state].complete:
lhs, rhs = rule
if self.debug['reduce']:
print("%s ::= %s" % (lhs, ' '.join(rhs)))
for pitem in sets[parent]:
pstate, pparent = pitem
k = self.goto(pstate, lhs)


@@ -29,7 +29,6 @@ globals().update(dis.opmap)
from uncompyle6.opcodes.opcode_34 import *
import uncompyle6.scanner as scan
@@ -60,21 +59,22 @@ class Scanner34(scan.Scanner):
bytecode = dis.Bytecode(co)
# self.lines contains (block,addrLastInstr)
# if classname:
# classname = '_' + classname.lstrip('_') + '__'
if classname:
classname = '_' + classname.lstrip('_') + '__'
# def unmangle(name):
# if name.startswith(classname) and name[-2:] != '__':
# return name[len(classname) - 2:]
# return name
def unmangle(name):
if name.startswith(classname) and name[-2:] != '__':
return name[len(classname) - 2:]
return name
# free = [ unmangle(name) for name in (co.co_cellvars + co.co_freevars) ]
# names = [ unmangle(name) for name in co.co_names ]
# varnames = [ unmangle(name) for name in co.co_varnames ]
# else:
else:
# free = co.co_cellvars + co.co_freevars
# names = co.co_names
# varnames = co.co_varnames
pass
# Scan for assertions. Later we will
# turn 'LOAD_GLOBAL' to 'LOAD_ASSERT' for those
@@ -439,6 +439,33 @@ class Scanner34(scan.Scanner):
target += offset + 3
return target
def next_except_jump(self, start):
"""
Return the next jump that was generated by an except SomeException:
construct in a try...except...else clause or None if not found.
"""
if self.code[start] == DUP_TOP:
except_match = self.first_instr(start, len(self.code), POP_JUMP_IF_FALSE)
if except_match:
jmp = self.prev_op[self.get_target(except_match)]
self.ignore_if.add(except_match)
self.not_continue.add(jmp)
return jmp
count_END_FINALLY = 0
count_SETUP_ = 0
for i in self.op_range(start, len(self.code)):
op = self.code[i]
if op == END_FINALLY:
if count_END_FINALLY == count_SETUP_:
assert self.code[self.prev_op[i]] in (JUMP_ABSOLUTE, JUMP_FORWARD, RETURN_VALUE)
self.not_continue.add(self.prev_op[i])
return self.prev_op[i]
count_END_FINALLY += 1
elif op in (SETUP_EXCEPT, SETUP_WITH, SETUP_FINALLY):
count_SETUP_ += 1
def detect_structure(self, offset):
"""
Detect structures and their boundaries to fix optimized jumps
@@ -459,8 +486,51 @@ class Scanner34(scan.Scanner):
start = curent_start
end = curent_end
parent = struct
pass
if op in (POP_JUMP_IF_FALSE, POP_JUMP_IF_TRUE):
if op == SETUP_EXCEPT:
start = offset + 3
target = self.get_target(offset)
end = self.restrict_to_parent(target, parent)
if target != end:
self.fixed_jumps[offset] = end
# print target, end, parent
# Add the try block
self.structs.append({'type': 'try',
'start': start,
'end': end-4})
# Now isolate the except and else blocks
end_else = start_else = self.get_target(self.prev_op[end])
# Add the except blocks
i = end
while self.code[i] != END_FINALLY:
jmp = self.next_except_jump(i)
if self.code[jmp] == RETURN_VALUE:
self.structs.append({'type': 'except',
'start': i,
'end': jmp+1})
i = jmp + 1
else:
if self.get_target(jmp) != start_else:
end_else = self.get_target(jmp)
if self.code[jmp] == JUMP_FORWARD:
self.fixed_jumps[jmp] = -1
self.structs.append({'type': 'except',
'start': i,
'end': jmp})
i = jmp + 3
# Add the try-else block
if end_else != start_else:
r_end_else = self.restrict_to_parent(end_else, parent)
self.structs.append({'type': 'try-else',
'start': i+1,
'end': r_end_else})
self.fixed_jumps[i] = r_end_else
else:
self.fixed_jumps[i] = i+1
elif op in (POP_JUMP_IF_FALSE, POP_JUMP_IF_TRUE):
start = offset + self.op_size(op)
target = self.get_target(offset)
rtarget = self.restrict_to_parent(target, parent)