Bug in for loop with try. Add more of 2.7's COME_FROM statements.

spark.py: add tracing reduce rules. main: reduce cutsines.
Start history
rocky
2015-12-21 21:08:08 -05:00
parent 6b0bb124ea
commit 6a49cd2c69
12 changed files with 225 additions and 27 deletions

HISTORY.md (new file)

@@ -0,0 +1,109 @@
This project has a history of over 17 years, spanning back to Python 1.5.
There have been a number of people who have worked on this. I am awed
by the amount of work, the number of people who have contributed to this,
and the cleverness in the code.
The below is an annotated history from my reading of the sources cited.
In 1998, John Aycock first wrote a grammar parser in Python,
eventually called SPARK, that was usable inside a Python program. This
code was described in the
[7th International Python Conference](http://legacy.python.org/workshops/1998-11/proceedings/papers/aycock-little/aycock-little.html). That
paper doesn't talk about decompilation, nor did John have that in mind
at that time. It does mention that a full parser for Python (rather
than the simple languages in the paper) was being considered.
[This](http://pages.cpsc.ucalgary.ca/~aycock/spark/content.html#contributors)
contains a list of people acknowledged in developing SPARK. What's
amazing about this code is that it is reasonably fast and has survived
up to Python 3 with relatively little change. This work was done in
conjunction with his Ph.D. thesis, which was finished around 2001. While
working on his thesis, John realized SPARK could be used to deparse
Python bytecode. In the fall of 1999, he started writing the Python
program "decompyle" to do this.
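SPARK's hallmark convention, which carries into this project's parser, is that grammar rules live in the docstrings of `p_*` methods and are collected by reflection. A minimal sketch of the idea (the class and helper names below are invented for illustration, not SPARK's real API):

```python
# Illustrative sketch of SPARK's docstring-grammar convention; the
# class and helper names here are made up, not SPARK's real API.

class TinyGrammar:
    def p_expr_add(self, args):
        ''' expr ::= expr ADD term '''

    def p_expr_term(self, args):
        ''' expr ::= term '''

def collect_rules(obj):
    """Gather (lhs, rhs) rules from the docstrings of p_* methods."""
    rules = []
    for name in sorted(dir(obj)):
        if not name.startswith('p_'):
            continue
        doc = getattr(obj, name).__doc__ or ''
        for line in doc.splitlines():
            if '::=' in line:
                lhs, rhs = line.split('::=', 1)
                rules.append((lhs.strip(), tuple(rhs.split())))
    return rules

print(collect_rules(TinyGrammar()))
# [('expr', ('expr', 'ADD', 'term')), ('expr', ('term',))]
```

Keeping the grammar next to the semantic action that handles it is what let the parser survive so many Python versions with little change.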
This code introduced another clever idea: table-driven semantic
routines using format specifiers.
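The scheme can be sketched roughly as follows; real decompyle tables are far richer (precedence, several specifier kinds), and the table entries and node layout here are invented for illustration:

```python
# Toy table-driven semantics: each node kind maps to a format string
# plus the order in which children fill its %c slots.  This layout is
# illustrative only, not decompyle's actual table format.

TABLE = {
    'assign': ('%c = %c', (0, 1)),   # designator, expression
    'add':    ('%c + %c', (0, 1)),
}

def render(node):
    # Leaves are plain token strings; interior nodes are (kind, *children).
    if isinstance(node, str):
        return node
    fmt, child_order = TABLE[node[0]]
    children = node[1:]
    # Swap each %c slot for the rendering of the corresponding child.
    return fmt.replace('%c', '%s') % tuple(
        render(children[i]) for i in child_order)

print(render(('assign', 'x', ('add', 'a', 'b'))))  # x = a + b
```

Adding a construct then mostly means adding a table entry rather than writing a new tree-walking routine.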
The last mention of a release of SPARK from John is around 2002.
In the fall of 2000, Hartmut Goebel
[took over maintaining the code](https://groups.google.com/forum/#!searchin/comp.lang.python/hartmut$20goebel/comp.lang.python/35s3mp4-nuY/UZALti6ujnQJ). The
first subsequent public release announcement that I can find is
["decompyle - A byte-code-decompiler version 2.2 beta 1"](https://mail.python.org/pipermail/python-announce-list/2002-February/001272.html).
From the CHANGES file found in
[the tarball for that release](http://old-releases.ubuntu.com/ubuntu/pool/universe/d/decompyle2.2/decompyle2.2_2.2beta1.orig.tar.gz),
it appears that Hartmut did most of the work to get this code to
accept the full Python language. He added precedence to the table
specifiers, support for multiple versions of Python, and
pretty-printing of docstrings, lists, and hashes. He also wrote
extensive tests and routines for the testing and verification of
decompiled bytecode.
decompyle2.2 was packaged for Debian (sarge) by
[Ben Burton around 2002](https://packages.qa.debian.org/d/decompyle.html). As
it worked only on Python 2.2 long after Python 2.3 and 2.4 were in
widespread use, it was removed.
[Crazy Compilers](http://www.crazy-compilers.com/decompyle/) offers a
byte-code decompiler service for versions of Python up to 2.6. As
someone who has worked on compilers, I know it is tough to make a
living working on compilers. (For example, based on
[John Aycock's recent papers](http://pages.cpsc.ucalgary.ca/~aycock/),
it doesn't look like he's done anything compiler-wise since SPARK.) So
I hope people will use the crazy-compilers service. I wish him the
success that this good work deserves.
Next we get to
["uncompyle" and PyPI](https://pypi.python.org/pypi/uncompyle/1.1) and
the era of git repositories. In contrast to decompyle, this now runs
only on Python 2.7 although it accepts bytecode back to Python
2.5. Thomas Grainger is the package owner of this, although Hartmut is
listed as the author.
The project exists not only on
[github](https://github.com/gstarnberger/uncompyle) but also on
[bitbucket](https://bitbucket.org/gstarnberger/uncompyle) where the
git history goes back to 2009. Somewhere in there the name was changed
from "decompyle" to "uncompyle".
The name Thomas Grainger isn't found in (m)any of the commits in the
several years of active development. Guenther Starnberger, Keknehv,
hamled, and Eike Siewertsen are the principal committers here.
This project, uncompyle6, however, owes its existence to uncompyle2 by
Myst herie (Mysterie), whose first commit seems to go back to 2012; it
is also based on Hartmut's code. I chose it because it seemed to have
been the most actively worked on most recently.
Over the many years, code styles and Python features have
changed. However brilliant the code was and still is, it hasn't really
had a single public active maintainer. And there have been many forks
of the code.
That the code has been in need of an overhaul was recognized by
Hartmut himself a decade and a half ago:
[decompyle/uncompile__init__.py](https://github.com/gstarnberger/uncompyle/blob/master/uncompyle/__init__.py#L25-L26)
NB. This is not a masterpiece of software, but became more like a hack.
Probably a complete rewrite would be sensefull. hG/2000-12-27
One of the attempts to modernize it and make it available for Python 3
is [the one by Anton Vorobyov (DarkFenX)](https://github.com/DarkFenX/uncompyle3). I've
followed some of the ideas there in this project.
Lastly, I should mention [unpyc](https://code.google.com/p/unpyc3/)
and most especially [pycdc](https://github.com/zrax/pycdc), largely by
Michael Hansen and Darryl Pogue. If they supported getting source-code
fragments and I could call it from Python, I'd probably ditch this and
use that. From what I've seen, the code runs blindingly fast and spans
all versions of Python.
Tests for the project have been, or are being, culled from all of the
projects mentioned.
NB. If you find mistakes, want corrections, or want your name added (or removed),
please contact me.

Binary file not shown.



@@ -1,8 +1,11 @@
Files in this directory contain very simple constructs that work
across all versions of Python.
Their simnplicity is to try to make it easier to debug grammar
and AST walking routines.
Their simplicity is to try to make it easier to debug scanner, grammar
and semantic-action routines.
We also try to make the code here runnable by Python and when run should
not produce an error.
The numbers in the filenames are to assist running the programs from
the simplest to more complex. For example, many tests have assignment


@@ -0,0 +1,5 @@
for i in (1,2):
try:
x = 1
except ValueError:
y = 2


@@ -1,5 +1,8 @@
# Tests:
# forstmt ::= SETUP_LOOP expr _for designator
# for_block POP_BLOCK COME_FROM
for a in b:
c = d
for a in [1]:
c = 2
for a in range(2):
c = 2


@@ -117,7 +117,7 @@ def main(in_base, out_base, files, codes, outfile=None,
outstream = _get_outstream(outfile)
# print(outfile, file=sys.stderr)
# try to decomyple the input file
# Try to uncompile the input file
try:
uncompyle_file(infile, outstream, showasm, showast)
tot_files += 1
@@ -136,8 +136,8 @@ def main(in_base, out_base, files, codes, outfile=None,
outstream.close()
os.rename(outfile, outfile + '_failed')
else:
sys.stderr.write("\n# Can't uncompyle %s\n" % infile)
else: # uncompyle successfull
sys.stderr.write("\n# Can't uncompile %s\n" % infile)
else: # uncompile successful
if outfile:
outstream.close()
if do_verify:
@@ -145,7 +145,7 @@ def main(in_base, out_base, files, codes, outfile=None,
msg = verify.compare_code_with_srcfile(infile, outfile)
if not outfile:
if not msg:
print('\n# okay decompyling %s' % infile)
print('\n# okay decompiling %s' % infile)
okay_files += 1
else:
print('\n# %s\n\t%s' % (infile, msg))
@@ -158,7 +158,7 @@ def main(in_base, out_base, files, codes, outfile=None,
else:
okay_files += 1
if not outfile:
mess = '\n# okay decompyling'
mess = '\n# okay decompiling'
# mem_usage = __memUsage()
print(mess, infile)
if outfile:


@@ -43,7 +43,13 @@ def jabs_op(name, op):
hasjabs.append(op)
def updateGlobal():
# JUMP_OPs are used in verification
# JUMP_OPs are used in verification and in the scanner in resolving forward/backward
# jumps
globals().update({'PJIF': opmap['POP_JUMP_IF_FALSE']})
globals().update({'PJIT': opmap['POP_JUMP_IF_TRUE']})
globals().update({'JA': opmap['JUMP_ABSOLUTE']})
globals().update({'JF': opmap['JUMP_FORWARD']})
globals().update(dict([(k.replace('+','_'),v) for (k,v) in opmap.items()]))
globals().update({'JUMP_OPs': map(lambda op: opname[op], hasjrel + hasjabs)})
# Instruction opcodes for compiled code


@@ -44,8 +44,8 @@ class _State:
self.T, self.complete, self.items = [], [], items
self.stateno = stateno
# DEFAULT_DEBUG = {'rules': True, 'transition': False}
DEFAULT_DEBUG = {'rules': False, 'transition': False}
# DEFAULT_DEBUG = {'rules': True, 'transition': True, 'reduce' : True}
DEFAULT_DEBUG = {'rules': False, 'transition': False, 'reduce': False}
class GenericParser:
'''
An Earley parser, as per J. Earley, "An Efficient Context-Free
@@ -450,6 +450,8 @@ class GenericParser:
for rule in self.states[state].complete:
lhs, rhs = rule
if self.debug['reduce']:
print("%s ::= %s" % (lhs, ' '.join(rhs)))
for pitem in sets[parent]:
pstate, pparent = pitem
k = self.goto(pstate, lhs)


@@ -29,7 +29,6 @@ globals().update(dis.opmap)
from uncompyle6.opcodes.opcode_34 import *
import uncompyle6.scanner as scan
@@ -60,21 +59,22 @@ class Scanner34(scan.Scanner):
bytecode = dis.Bytecode(co)
# self.lines contains (block,addrLastInstr)
# if classname:
# classname = '_' + classname.lstrip('_') + '__'
if classname:
classname = '_' + classname.lstrip('_') + '__'
# def unmangle(name):
# if name.startswith(classname) and name[-2:] != '__':
# return name[len(classname) - 2:]
# return name
def unmangle(name):
if name.startswith(classname) and name[-2:] != '__':
return name[len(classname) - 2:]
return name
# free = [ unmangle(name) for name in (co.co_cellvars + co.co_freevars) ]
# names = [ unmangle(name) for name in co.co_names ]
# varnames = [ unmangle(name) for name in co.co_varnames ]
# else:
else:
# free = co.co_cellvars + co.co_freevars
# names = co.co_names
# varnames = co.co_varnames
pass
# Scan for assertions. Later we will
# turn 'LOAD_GLOBAL' to 'LOAD_ASSERT' for those
@@ -439,6 +439,33 @@ class Scanner34(scan.Scanner):
target += offset + 3
return target
def next_except_jump(self, start):
"""
Return the next jump that was generated by an except SomeException:
construct in a try...except...else clause or None if not found.
"""
if self.code[start] == DUP_TOP:
except_match = self.first_instr(start, len(self.code), POP_JUMP_IF_FALSE)
if except_match:
jmp = self.prev_op[self.get_target(except_match)]
self.ignore_if.add(except_match)
self.not_continue.add(jmp)
return jmp
count_END_FINALLY = 0
count_SETUP_ = 0
for i in self.op_range(start, len(self.code)):
op = self.code[i]
if op == END_FINALLY:
if count_END_FINALLY == count_SETUP_:
assert self.code[self.prev_op[i]] in (JUMP_ABSOLUTE, JUMP_FORWARD, RETURN_VALUE)
self.not_continue.add(self.prev_op[i])
return self.prev_op[i]
count_END_FINALLY += 1
elif op in (SETUP_EXCEPT, SETUP_WITH, SETUP_FINALLY):
count_SETUP_ += 1
def detect_structure(self, offset):
"""
Detect structures and their boundaries to fix optimized jumps
@@ -459,8 +486,51 @@ class Scanner34(scan.Scanner):
start = curent_start
end = curent_end
parent = struct
pass
if op in (POP_JUMP_IF_FALSE, POP_JUMP_IF_TRUE):
if op == SETUP_EXCEPT:
start = offset + 3
target = self.get_target(offset)
end = self.restrict_to_parent(target, parent)
if target != end:
self.fixed_jumps[offset] = end
# print target, end, parent
# Add the try block
self.structs.append({'type': 'try',
'start': start,
'end': end-4})
# Now isolate the except and else blocks
end_else = start_else = self.get_target(self.prev_op[end])
# Add the except blocks
i = end
while self.code[i] != END_FINALLY:
jmp = self.next_except_jump(i)
if self.code[jmp] == RETURN_VALUE:
self.structs.append({'type': 'except',
'start': i,
'end': jmp+1})
i = jmp + 1
else:
if self.get_target(jmp) != start_else:
end_else = self.get_target(jmp)
if self.code[jmp] == JUMP_FORWARD:
self.fixed_jumps[jmp] = -1
self.structs.append({'type': 'except',
'start': i,
'end': jmp})
i = jmp + 3
# Add the try-else block
if end_else != start_else:
r_end_else = self.restrict_to_parent(end_else, parent)
self.structs.append({'type': 'try-else',
'start': i+1,
'end': r_end_else})
self.fixed_jumps[i] = r_end_else
else:
self.fixed_jumps[i] = i+1
elif op in (POP_JUMP_IF_FALSE, POP_JUMP_IF_TRUE):
start = offset + self.op_size(op)
target = self.get_target(offset)
rtarget = self.restrict_to_parent(target, parent)