diff --git a/HISTORY.md b/HISTORY.md
new file mode 100644
index 00000000..46465fd4
--- /dev/null
+++ b/HISTORY.md
@@ -0,0 +1,109 @@
+This project has a history of over 17 years, spanning back to Python 1.5.
+
+There have been a number of people who have worked on this. I am awed
+by the amount of work, the number of people who have contributed to this,
+and the cleverness in the code.
+
+Below is an annotated history from my reading of the sources cited.
+
+In 1998, John Aycock first wrote a grammar parser in Python,
+eventually called SPARK, that was usable inside a Python program. This
+code was described in the
+[7th International Python Conference](http://legacy.python.org/workshops/1998-11/proceedings/papers/aycock-little/aycock-little.html). That
+paper doesn't talk about decompilation, nor did John have that in mind
+at that time. It does mention that a full parser for Python (rather
+than the simple languages in the paper) was being considered.
+
+[This](http://pages.cpsc.ucalgary.ca/~aycock/spark/content.html#contributors)
+contains a list of people acknowledged in developing SPARK. What's amazing
+about this code is that it is reasonably fast and has survived up to
+Python 3 with relatively little change. This work was done in
+conjunction with his Ph.D. thesis, which was finished around 2001. In
+working on his thesis, John realized SPARK could be used to deparse
+Python bytecode. In the fall of 1999, he started writing the Python
+program, "decompyle", to do this.
+
+This code introduced another clever idea: table-driven semantic
+routines that use format specifiers.
+
+The last mention of a release of SPARK from John is around 2002.
+
+In the fall of 2000, Hartmut Goebel
+[took over maintaining the code](https://groups.google.com/forum/#!searchin/comp.lang.python/hartmut$20goebel/comp.lang.python/35s3mp4-nuY/UZALti6ujnQJ). The
+first subsequent public release announcement that I can find is
+["decompyle - A byte-code-decompiler version 2.2 beta 1"](https://mail.python.org/pipermail/python-announce-list/2002-February/001272.html).
+
+From the CHANGES file found in
+[the tarball for that release](http://old-releases.ubuntu.com/ubuntu/pool/universe/d/decompyle2.2/decompyle2.2_2.2beta1.orig.tar.gz),
+it appears that Hartmut did most of the work to get this code to
+accept the full Python language. He added precedence to the table
+specifiers, support for multiple versions of Python, and the
+pretty-printing of docstrings, lists, and hashes. He also wrote
+extensive tests and routines for the testing and verification of
+decompiled bytecode.
+
+decompyle 2.2 was packaged for Debian (sarge) by
+[Ben Burton around 2002](https://packages.qa.debian.org/d/decompyle.html). Since
+it worked only on Python 2.2, it was removed once Python 2.3 and 2.4
+came into widespread use.
+
+[Crazy Compilers](http://www.crazy-compilers.com/decompyle/) offers a
+byte-code decompiler service for versions of Python up to 2.6. As
+someone who has worked on compilers, I know it is tough to make a
+living by working on compilers. (For example, based on
+[John Aycock's recent papers](http://pages.cpsc.ucalgary.ca/~aycock/),
+it doesn't look like he's done anything compiler-wise since SPARK.) So
+I hope people will use the crazy-compilers service. I wish them the
+success that this good work deserves.
+
+Next we get to
+["uncompyle" and PyPI](https://pypi.python.org/pypi/uncompyle/1.1) and
+the era of git repositories. In contrast to decompyle, this now runs
+only on Python 2.7, although it accepts bytecode back to Python
+2.5. Thomas Grainger is the package owner, although Hartmut is
+listed as the author.
+
+The project exists not only on
+[github](https://github.com/gstarnberger/uncompyle) but also on
+[bitbucket](https://bitbucket.org/gstarnberger/uncompyle), where the
+git history goes back to 2009. Somewhere in there the name was changed
+from "decompyle" to "uncompyle".
+
+The name Thomas Grainger isn't found in (m)any of the commits in the
+several years of active development. Guenther Starnberger, Keknehv,
+hamled, and Eike Siewertsen are the principal committers here.
+
+This project, uncompyle6, however, owes its existence to uncompyle2 by
+Mysterie, whose first commit seems to go back to 2012; it is also
+based on Hartmut's code. I chose it because it seemed to have been
+worked on most actively and most recently.
+
+Over the many years, code styles and Python features have
+changed. However brilliant the code was and still is, it hasn't really
+had a single public active maintainer. And there have been many forks
+of the code.
+
+That it has been in need of an overhaul was recognized by Hartmut
+a decade and a half ago:
+
+[decompyle/uncompile__init__.py](https://github.com/gstarnberger/uncompyle/blob/master/uncompyle/__init__.py#L25-L26)
+
+    NB. This is not a masterpiece of software, but became more like a hack.
+    Probably a complete rewrite would be sensefull. hG/2000-12-27
+
+One of the attempts to modernize it and make it available for Python 3
+is [the one by Anton Vorobyov (DarkFenX)](https://github.com/DarkFenX/uncompyle3). I've
+followed some of the ideas there in this project.
+
+Lastly, I should mention [unpyc3](https://code.google.com/p/unpyc3/)
+and most especially [pycdc](https://github.com/zrax/pycdc), largely by
+Michael Hansen and Darryl Pogue. If it supported getting source-code
+fragments and could be called from Python, I'd probably ditch this and
+use that. From what I've seen, the code runs blindingly fast and spans
+all versions of Python.
+
+Tests for the project have been, or are being, culled from all of the
+projects mentioned.
+
+NB. If you find mistakes, want corrections, or want your name added (or removed),
+please contact me.
diff --git a/test/bytecode_2.7/forelse.pyc b/test/bytecode_2.7/forelse.pyc
deleted file mode 100644
index 504fa9d4..00000000
Binary files a/test/bytecode_2.7/forelse.pyc and /dev/null differ
diff --git a/test/bytecode_3.4/10_for.pyc b/test/bytecode_3.4/10_for.pyc
new file mode 100644
index 00000000..26c63996
Binary files /dev/null and b/test/bytecode_3.4/10_for.pyc differ
diff --git a/test/bytecode_3.4/20_try_except.pyc b/test/bytecode_3.4/20_try_except.pyc
new file mode 100644
index 00000000..c7663671
Binary files /dev/null and b/test/bytecode_3.4/20_try_except.pyc differ
diff --git a/test/bytecode_3.4/for.pyc b/test/bytecode_3.4/for.pyc
deleted file mode 100644
index bf5baeb8..00000000
Binary files a/test/bytecode_3.4/for.pyc and /dev/null differ
diff --git a/test/simple_source/README b/test/simple_source/README
index c4d834fa..40bce7c4 100644
--- a/test/simple_source/README
+++ b/test/simple_source/README
@@ -1,8 +1,11 @@
 Files in this directory contain very simnple constructs that work across
 all versions of Python.
 
-Their simnplicity is to try to make it easier to debug grammar
-and AST walking routines.
+Their simplicity is to try to make it easier to debug the scanner, grammar,
+and semantic-action routines.
+
+We also try to make the code here runnable by Python; when run, it should
+not produce an error.
 
 The numbers in the filenames are to assist running the programs from
 the simplest to more complex. For example, many tests have assignment
diff --git a/test/simple_source/exception/20_try_except.py b/test/simple_source/exception/20_try_except.py
new file mode 100644
index 00000000..70b8e052
--- /dev/null
+++ b/test/simple_source/exception/20_try_except.py
@@ -0,0 +1,5 @@
+for i in (1,2):
+    try:
+        x = 1
+    except ValueError:
+        y = 2
diff --git a/test/simple_source/looping/for.py b/test/simple_source/looping/10_for.py
similarity index 65%
rename from test/simple_source/looping/for.py
rename to test/simple_source/looping/10_for.py
index a1ec00e3..10a56089 100644
--- a/test/simple_source/looping/for.py
+++ b/test/simple_source/looping/10_for.py
@@ -1,5 +1,8 @@
 # Tests:
 #   forstmt ::= SETUP_LOOP expr _for designator
 #               for_block POP_BLOCK COME_FROM
-for a in b:
-    c = d
+for a in [1]:
+    c = 2
+
+for a in range(2):
+    c = 2
diff --git a/uncompyle6/main.py b/uncompyle6/main.py
index 151db24b..bceb17ee 100644
--- a/uncompyle6/main.py
+++ b/uncompyle6/main.py
@@ -117,7 +117,7 @@ def main(in_base, out_base, files, codes, outfile=None,
             outstream = _get_outstream(outfile)
         # print(outfile, file=sys.stderr)
 
-        # try to decomyple the input file
+        # Try to uncompile the input file
         try:
             uncompyle_file(infile, outstream, showasm, showast)
             tot_files += 1
@@ -136,8 +136,8 @@ def main(in_base, out_base, files, codes, outfile=None,
                 outstream.close()
                 os.rename(outfile, outfile + '_failed')
             else:
-                sys.stderr.write("\n# Can't uncompyle %s\n" % infile)
-        else: # uncompyle successfull
+                sys.stderr.write("\n# Can't uncompile %s\n" % infile)
+        else:  # uncompile successful
             if outfile:
                 outstream.close()
             if do_verify:
@@ -145,7 +145,7 @@ def main(in_base, out_base, files, codes, outfile=None,
                 msg = verify.compare_code_with_srcfile(infile, outfile)
                 if not outfile:
                     if not msg:
-                        print('\n# okay decompyling %s' % infile)
+                        print('\n# okay decompiling %s' % infile)
                         okay_files += 1
                     else:
                         print('\n# %s\n\t%s', infile, msg)
@@ -158,7 +158,7 @@ def main(in_base, out_base, files, codes, outfile=None,
         else:
             okay_files += 1
             if not outfile:
-                mess = '\n# okay decompyling'
+                mess = '\n# okay decompiling'
                 # mem_usage = __memUsage()
                 print(mess, infile)
     if outfile:
diff --git a/uncompyle6/opcodes/opcode_34.py b/uncompyle6/opcodes/opcode_34.py
index 928b8fb0..be57b5fa 100644
--- a/uncompyle6/opcodes/opcode_34.py
+++ b/uncompyle6/opcodes/opcode_34.py
@@ -43,7 +43,13 @@ def jabs_op(name, op):
     hasjabs.append(op)
 
 def updateGlobal():
-    # JUMP_OPs are used in verification
+    # JUMP_OPs are used in verification and in the scanner for resolving
+    # forward/backward jumps
+    globals().update({'PJIF': opmap['POP_JUMP_IF_FALSE']})
+    globals().update({'PJIT': opmap['POP_JUMP_IF_TRUE']})
+    globals().update({'JA': opmap['JUMP_ABSOLUTE']})
+    globals().update({'JF': opmap['JUMP_FORWARD']})
+    globals().update(dict([(k.replace('+','_'),v) for (k,v) in opmap.items()]))
     globals().update({'JUMP_OPs': map(lambda op: opname[op], hasjrel + hasjabs)})
 
 # Instruction opcodes for compiled code
diff --git a/uncompyle6/parsers/spark.py b/uncompyle6/parsers/spark.py
index ac772ef2..5f32dcea 100644
--- a/uncompyle6/parsers/spark.py
+++ b/uncompyle6/parsers/spark.py
@@ -44,8 +44,8 @@ class _State:
         self.T, self.complete, self.items = [], [], items
         self.stateno = stateno
 
-# DEFAULT_DEBUG = {'rules': True, 'transition': False}
-DEFAULT_DEBUG = {'rules': False, 'transition': False}
+# DEFAULT_DEBUG = {'rules': True, 'transition': True, 'reduce': True}
+DEFAULT_DEBUG = {'rules': False, 'transition': False, 'reduce': False}
 
 class GenericParser:
     '''
     An Earley parser, as per J. Earley, "An Efficient Context-Free
@@ -450,6 +450,8 @@ class GenericParser:
 
             for rule in self.states[state].complete:
                 lhs, rhs = rule
+                if self.debug['reduce']:
+                    print("%s ::= %s" % (lhs, ' '.join(rhs)))
                 for pitem in sets[parent]:
                     pstate, pparent = pitem
                     k = self.goto(pstate, lhs)
diff --git a/uncompyle6/scanners/scanner34.py b/uncompyle6/scanners/scanner34.py
index a51c0c12..bc909649 100644
--- a/uncompyle6/scanners/scanner34.py
+++ b/uncompyle6/scanners/scanner34.py
@@ -29,7 +29,6 @@ globals().update(dis.opmap)
 
 from uncompyle6.opcodes.opcode_34 import *
 
-
 import uncompyle6.scanner as scan
 
 
@@ -60,21 +59,22 @@ class Scanner34(scan.Scanner):
         bytecode = dis.Bytecode(co)
 
         # self.lines contains (block,addrLastInstr)
-        # if classname:
-        #     classname = '_' + classname.lstrip('_') + '__'
+        if classname:
+            classname = '_' + classname.lstrip('_') + '__'
 
-        # def unmangle(name):
-        #     if name.startswith(classname) and name[-2:] != '__':
-        #         return name[len(classname) - 2:]
-        #     return name
+            def unmangle(name):
+                if name.startswith(classname) and name[-2:] != '__':
+                    return name[len(classname) - 2:]
+                return name
 
-        # free = [ unmangle(name) for name in (co.co_cellvars + co.co_freevars) ]
-        # names = [ unmangle(name) for name in co.co_names ]
-        # varnames = [ unmangle(name) for name in co.co_varnames ]
-        # else:
-        #     free = co.co_cellvars + co.co_freevars
-        #     names = co.co_names
-        #     varnames = co.co_varnames
+            # free = [ unmangle(name) for name in (co.co_cellvars + co.co_freevars) ]
+            # names = [ unmangle(name) for name in co.co_names ]
+            # varnames = [ unmangle(name) for name in co.co_varnames ]
+        else:
+            # free = co.co_cellvars + co.co_freevars
+            # names = co.co_names
+            # varnames = co.co_varnames
+            pass
 
         # Scan for assertions. Later we will
         # turn 'LOAD_GLOBAL' to 'LOAD_ASSERT' for those
@@ -439,6 +439,33 @@ class Scanner34(scan.Scanner):
             target += offset + 3
         return target
 
+    def next_except_jump(self, start):
+        """
+        Return the next jump that was generated by an except SomeException:
+        construct in a try...except...else clause, or None if not found.
+        """
+
+        if self.code[start] == DUP_TOP:
+            except_match = self.first_instr(start, len(self.code), POP_JUMP_IF_FALSE)
+            if except_match:
+                jmp = self.prev_op[self.get_target(except_match)]
+                self.ignore_if.add(except_match)
+                self.not_continue.add(jmp)
+                return jmp
+
+        count_END_FINALLY = 0
+        count_SETUP_ = 0
+        for i in self.op_range(start, len(self.code)):
+            op = self.code[i]
+            if op == END_FINALLY:
+                if count_END_FINALLY == count_SETUP_:
+                    assert self.code[self.prev_op[i]] in (JUMP_ABSOLUTE, JUMP_FORWARD, RETURN_VALUE)
+                    self.not_continue.add(self.prev_op[i])
+                    return self.prev_op[i]
+                count_END_FINALLY += 1
+            elif op in (SETUP_EXCEPT, SETUP_WITH, SETUP_FINALLY):
+                count_SETUP_ += 1
+
     def detect_structure(self, offset):
         """
         Detect structures and their boundaries to fix optimizied jumps
@@ -459,8 +486,51 @@ class Scanner34(scan.Scanner):
                 start = curent_start
                 end = curent_end
                 parent = struct
+                pass
 
-        if op in (POP_JUMP_IF_FALSE, POP_JUMP_IF_TRUE):
+        if op == SETUP_EXCEPT:
+            start = offset + 3
+            target = self.get_target(offset)
+            end = self.restrict_to_parent(target, parent)
+            if target != end:
+                self.fixed_jumps[offset] = end
+            # print target, end, parent
+            # Add the try block
+            self.structs.append({'type': 'try',
+                                 'start': start,
+                                 'end': end-4})
+            # Now isolate the except and else blocks
+            end_else = start_else = self.get_target(self.prev_op[end])
+
+            # Add the except blocks
+            i = end
+            while self.code[i] != END_FINALLY:
+                jmp = self.next_except_jump(i)
+                if self.code[jmp] == RETURN_VALUE:
+                    self.structs.append({'type': 'except',
+                                         'start': i,
+                                         'end': jmp+1})
+                    i = jmp + 1
+                else:
+                    if self.get_target(jmp) != start_else:
+                        end_else = self.get_target(jmp)
+                    if self.code[jmp] == JUMP_FORWARD:
+                        self.fixed_jumps[jmp] = -1
+                    self.structs.append({'type': 'except',
+                                         'start': i,
+                                         'end': jmp})
+                    i = jmp + 3
+
+            # Add the try-else block
+            if end_else != start_else:
+                r_end_else = self.restrict_to_parent(end_else, parent)
+                self.structs.append({'type': 'try-else',
+                                     'start': i+1,
+                                     'end': r_end_else})
+                self.fixed_jumps[i] = r_end_else
+            else:
+                self.fixed_jumps[i] = i+1
+        elif op in (POP_JUMP_IF_FALSE, POP_JUMP_IF_TRUE):
             start = offset + self.op_size(op)
             target = self.get_target(offset)
             rtarget = self.restrict_to_parent(target, parent)