README.rst: note addition of pydisassemble

Remove duplicate disassembly printing from scanners and put common code in caller(s). Show source-code line numbers in disassembly output and fix alignment of byte offsets. disas.py: workaround Python 2/3 different layouts before we get to bytecodes in a code object.
2025-08-03 00:45:53 +08:00 · 2015-12-15 01:42:07 -05:00
parent 5e5da104c5
commit 34ecd54e2c
11 changed files with 131 additions and 98 deletions
--- a/2
+++ b/2
@@ -2,7 +2,7 @@ Metadata-Version: 2.0
 Name: uncompyle6
 Version: 2.0.1
 Summary: Python byte-code to source-code converter
-Home-page: http://github.com/rocky/uncompyle6
+Home-page: http://github.com/rocky/python-uncompyle6
 Author: Rocky
 Author-email: rb@dustyfeet.com
 License: GPLv3
--- a/README.rst
+++ b/README.rst
@@ -1,8 +1,7 @@
 uncompyle6
 ==========
-A CPython 2.x and possibly 3.x byte-code disassembler and
+A Python 2.x and possibly 3.x byte-code decompiler.
 adecompiler.
 This is written in Python 2.7 but is Python3 compatible.
@@ -10,46 +9,34 @@ This is written in Python 2.7 but is Python3 compatible.
 Introduction
 ------------
-'uncompyle6' converts Python byte-code back into equivalent Python
+_uncompyle6_ converts Python byte-code back into equivalent Python
 source code. It accepts byte-codes from Python version 2.5 to 2.7.
-It runs on Python 2.7 and, with a little more work, on Python 3 as well.
+It runs on Python 2.7 and with a little more work Python 3.
 The generated source is fairly readable: docstrings, lists, tuples and
 hashes are somewhat pretty-printed.
-'uncompyle6' is based on John Aycock's generic small languages
+_uncompyle6_ is based on John Aycock's generic small languages
-compiler 'spark' (http://pages.cpsc.ucalgary.ca/~aycock/spark/) and his
+compiler 'spark' (http://www.csr.uvic.ca/~aycock/python/) and his
 prior work on a tool called 'decompyle'. This was improved by Hartmut Goebel
-http://www.crazy-compilers.com
+`http://www.crazy-compilers.com/`_
-In order to the decompile a program, we need to be able to disassemble
+# Additional note (3 July 2004):
 it first. And this process may be useful in of itself. So we provide a
 utility for just that piece as well.
-'pydisassemble' gives a CPython disassembly of Python byte-code. How
+This software is no longer available from the original website.
-is this different than what Python already provides via the "dis"
+However http://www.crazy-compilers.com/decompyle/ provides a
-module?  Here, we can cross disassemble bytecodes from different
+decompilation service.
 versions of CPython than the version of CPython that is doing the
 disassembly.
-'pydisassemble works on the same versions as 'uncompyle6' and handles the
+# Additional note (5 June 2012):
 same sets of CPython bytecode versions.
 *Note from 3 July 2004:*
 This software was original available from http://www.crazy-compilers.com;
 http://www.crazy-compilers.com/decompyle/ provides a decompilation service.
 *Note (5 June 2012):*
 The decompilation of python bytecode 2.5 & 2.6 is based on the work of
 Eloi Vanderbeken. bytecode is translated to a pseudo 2.7 python bytecode
 and then decompiled.
-*Note (12 Dec 2016):*
+# Additional note (12 Dec 2016):
-This project will be used to deparse fragments of code inside my
+I will be using this to deparse fragments of code inside my trepan_
-trepan_ debuggers_. For that, I need to record text fragements for all
+debuggers_. For that, I need to record text fragements for all
 byte-code offsets (of interest). This purpose although largely
 compatible with the original intention is yet a little bit different.
@@ -80,8 +67,6 @@ Installation
 This uses setup.py, so it follows the standard Python routine:
 ::
    python setup.py install # may need sudo
    # or if you have pyenv:
    python setup.py develop
@@ -103,18 +88,15 @@ Usage
 Run
 ::
     ./scripts/uncompyle6 -h
 for usage help
 Known Bugs/Restrictions
 -----------------------
-Support for Python 3 bytecode and syntax is lacking.
+Support Python 3 bytecode and syntax is lacking.
 .. _trepan: https://pypi.python.org/pypi/trepan
 .. _debuggers: https://pypi.python.org/pypi/trepan3k
--- a/bin/pydissassemble
+++ b/bin/pydissassemble
@@ -38,11 +38,11 @@ def disassemble_code(version, co, out=None):
    assert isinstance(co, types.CodeType)
    # store final output stream for case of error
-    __real_out = out or sys.stdout
+    real_out = out or sys.stdout
-    print('# Python %s' % version, file=__real_out)
+    print('# Python %s' % version, file=real_out)
    if co.co_filename:
        print('# Embedded file name: %s' % co.co_filename,
-              file=__real_out)
+              file=real_out)
    # Pick up appropriate scanner
    if version == 2.7:
@@ -63,6 +63,11 @@ def disassemble_code(version, co, out=None):
    scanner.setShowAsm(True, out)
    tokens, customize = scanner.disassemble(co)
    for t in tokens:
        print(t, file=real_out)
    print(file=out)
 def disassemble_file(filename, outstream=None, showasm=False, showast=False):
    """
--- a/uncompyle6/init.py
+++ b/uncompyle6/init.py
@@ -81,11 +81,11 @@ def load_module(filename):
        # print version
        fp.read(4) # timestamp
        magic_int = magics.magic2int(magic)
        if version == PYTHON_VERSION:
            magic_int = magics.magic2int(magic)
            # Note: a higher magic number necessarily mean a later
-            # release.  At Pyton 3.0 the magic number decreased
+            # release.  At Python 3.0 the magic number decreased
            # significantly. Hence the range below. Also note
            # inclusion of the size info, occurred within a
            # Python magor/minor release. Hence the test on the
@@ -95,7 +95,7 @@ def load_module(filename):
            bytecode = fp.read()
            co = marshal.loads(bytecode)
        else:
-            co = disas.load(fp)
+            co = disas.load(fp, magic_int)
        pass
    return version, co
@@ -108,11 +108,11 @@ def uncompyle(version, co, out=None, showasm=False, showast=False):
    assert isinstance(co, types.CodeType)
    # store final output stream for case of error
-    __real_out = out or sys.stdout
+    real_out = out or sys.stdout
-    print('# Python %s' % version, file=__real_out)
+    print('# Python %s' % version, file=real_out)
    if co.co_filename:
        print('# Embedded file name: %s' % co.co_filename,
-              file=__real_out)
+              file=real_out)
    # Pick up appropriate scanner
    if version == 2.7:
@@ -133,12 +133,17 @@ def uncompyle(version, co, out=None, showasm=False, showast=False):
    scanner.setShowAsm(showasm, out)
    tokens, customize = scanner.disassemble(co)
    if showasm:
        for t in tokens:
            print(t, file=real_out)
        print(file=out)
    #  Build AST from disassembly.
    walk = walker.Walker(out, scanner, showast=showast)
    try:
        ast = walk.build_ast(tokens, customize)
    except walker.ParserError as e :  # parser failed, dump disassembly
-        print(e, file=__real_out)
+        print(e, file=real_out)
        raise
    del tokens # save memory
--- a/uncompyle6/disas.py
+++ b/uncompyle6/disas.py
@@ -36,7 +36,7 @@ def marshalLoad(fp):
    internStrings = []
    return load(fp)
-def load(fp):
+def load(fp, magic_int):
    """
    marshal.load() written in Python. When the Python bytecode magic loaded is the
    same magic for the running Python interpreter, we can simply use the
@@ -51,27 +51,34 @@ def load(fp):
    if marshalType == 'c':
        Code = types.CodeType
        # FIXME If 'i' is deprecated, what would we use?
        co_argcount = unpack('i', fp.read(4))[0]
        co_nlocals = unpack('i', fp.read(4))[0]
        co_stacksize = unpack('i', fp.read(4))[0]
        co_flags = unpack('i', fp.read(4))[0]
-        co_code = load(fp)
+        # FIXME: somewhere between Python 2.7 and python 3.2 there's
-        co_consts = load(fp)
+        # another 4 bytes before we get to the bytecode. What's going on?
-        co_names = load(fp)
+        # Again, because magic ints decreased between python 2.7 and 3.0 we need
-        co_varnames = load(fp)
+        # a range here.
-        co_freevars = load(fp)
+        if 3000 < magic_int < 20121:
-        co_cellvars = load(fp)
+            fp.read(4)
-        co_filename = load(fp)
+        co_code = load(fp, magic_int)
-        co_name = load(fp)
+        co_consts = load(fp, magic_int)
        co_names = load(fp, magic_int)
        co_varnames = load(fp, magic_int)
        co_freevars = load(fp, magic_int)
        co_cellvars = load(fp, magic_int)
        co_filename = load(fp, magic_int)
        co_name = load(fp, magic_int)
        co_firstlineno = unpack('i', fp.read(4))[0]
-        co_lnotab = load(fp)
+        co_lnotab = load(fp, magic_int)
        # The Python3 code object is different than Python2's which
        # we are reading if we get here.
        # Also various parameters which were strings are now
            # bytes (which is probably more logical).
        if PYTHON3:
            if PYTHON_MAGIC_INT > 3020:
-                # In later Python3 versions, there is a
+                # In later Python3 magic_ints, there is a
                # kwonlyargcount parameter which we set to 0.
                return Code(co_argcount, 0, co_nlocals, co_stacksize, co_flags,
                            bytes(co_code, encoding='utf-8'),
@@ -152,7 +159,7 @@ def load(fp):
        tuplesize = unpack('i', fp.read(4))[0]
        ret = tuple()
        while tuplesize > 0:
-            ret += load(fp),
+            ret += load(fp, magic_int),
            tuplesize -= 1
        return ret
    elif marshalType == '[':
--- a/uncompyle6/scanner.py
+++ b/uncompyle6/scanner.py
@@ -21,7 +21,7 @@ if (sys.version_info > (3, 0)):
 else:
    L65536 = long(65536)
-from uncompyle6.opcodes import opcode_25, opcode_26, opcode_27, opcode_34
+from uncompyle6.opcodes import opcode_25, opcode_26, opcode_27, opcode_32, opcode_34
 class Token:
@@ -31,7 +31,7 @@ class Token:
    A byte-code token is equivalent to the contents of one line
    as output by dis.dis().
    '''
-    def __init__(self, type_, attr=None, pattr=None, offset=-1, linestart=False):
+    def __init__(self, type_, attr=None, pattr=None, offset=-1, linestart=None):
        self.type = intern(type_)
        self.attr = attr
        self.pattr = pattr
@@ -51,9 +51,9 @@ class Token:
    def __str__(self):
        pattr = self.pattr
        if self.linestart:
-            return '\n%s\t%-17s %r' % (self.offset, self.type, pattr)
+            return '\n%4d  %6s\t%-17s %r' % (self.linestart, self.offset, self.type, pattr)
        else:
-            return '%s\t%-17s %r' % (self.offset, self.type, pattr)
+            return '      %6s\t%-17s %r' % (self.offset, self.type, pattr)
    def __hash__(self):
        return hash(self.type)
--- a/uncompyle6/scanners/scanner25.py
+++ b/uncompyle6/scanners/scanner25.py
@@ -35,8 +35,14 @@ class Scanner25(scan.Scanner):
            if self.code[i] in (RETURN_VALUE, END_FINALLY):
                n = i + 1
        self.code = array('B', co.co_code[:n])
-        # linestarts contains bloc code adresse (addr,block)
+
        # linestarts is a tuple of (offset, line number.
        # Turn that in a has that we can index
        self.linestarts = list(dis.findlinestarts(co))
        linestartoffsets = {}
        for offset, lineno in self.linestarts:
            linestartoffsets[offset] = lineno
        self.prev = [0]
        # class and names
@@ -72,7 +78,13 @@ class Scanner25(scan.Scanner):
        linestarts = self.linestarts
        self.lines = []
        linetuple = namedtuple('linetuple', ['l_no', 'next'])
-        linestartoffsets = {a for (a, _) in linestarts}
+
        # linestarts is a tuple of (offset, line number).
        # Turn that in a has that we can index
        linestartoffsets = {}
        for offset, lineno in linestarts:
            linestartoffsets[offset] = lineno
        (prev_start_byte, prev_line_no) = linestarts[0]
        for (start_byte, line_no) in linestarts[1:]:
            while j < start_byte:
@@ -202,16 +214,16 @@ class Scanner25(scan.Scanner):
                if offset in self.return_end_ifs:
                    op_name = 'RETURN_END_IF'
-            if offset not in replace:
+            if offset in linestartoffsets:
-                rv.append(Token(op_name, oparg, pattr, offset, linestart = offset in linestartoffsets))
+                linestart = linestartoffsets[offset]
            else:
-                rv.append(Token(replace[offset], oparg, pattr, offset, linestart = offset in linestartoffsets))
+                linestart = None
            if offset not in replace:
                rv.append(Token(op_name, oparg, pattr, offset, linestart))
            else:
                rv.append(Token(replace[offset], oparg, pattr, offset, linestart))
        if self.showasm:
            out = self.out # shortcut
            for t in rv:
                print >>out, t
            print >>out
        return rv, customize
    def getOpcodeToDel(self, i):
--- a/uncompyle6/scanners/scanner26.py
+++ b/uncompyle6/scanners/scanner26.py
@@ -13,7 +13,7 @@ from operator import itemgetter
 from uncompyle6.opcodes.opcode_26 import *
 import dis
-import scanner as scan
+import uncompyle6.scanner as scan
 class Scanner26(scan.Scanner):
    def __init__(self):
@@ -71,7 +71,13 @@ class Scanner26(scan.Scanner):
        linestarts = self.linestarts
        self.lines = []
        linetuple = namedtuple('linetuple', ['l_no', 'next'])
-        linestartoffsets = {a for (a, _) in linestarts}
+
        # linestarts is a tuple of (offset, line number).
        # Turn that in a has that we can index
        linestartoffsets = {}
        for offset, lineno in linestarts:
            linestartoffsets[offset] = lineno
        (prev_start_byte, prev_line_no) = linestarts[0]
        for (start_byte, line_no) in linestarts[1:]:
            while j < start_byte:
@@ -202,16 +208,16 @@ class Scanner26(scan.Scanner):
                if offset in self.return_end_ifs:
                    op_name = 'RETURN_END_IF'
-            if offset not in replace:
+            if offset in linestartoffsets:
-                rv.append(Token(op_name, oparg, pattr, offset, linestart = offset in linestartoffsets))
+                linestart = linestartoffsets[offset]
            else:
-                rv.append(Token(replace[offset], oparg, pattr, offset, linestart = offset in linestartoffsets))
+                linestart = None
            if offset not in replace:
                rv.append(Token(op_name, oparg, pattr, offset, linestart))
            else:
                rv.append(Token(replace[offset], oparg, pattr, offset, linestart))
        if self.showasm:
            out = self.out # shortcut
            for t in rv:
                print >>out, t
            print >>out
        return rv, customize
    def getOpcodeToDel(self, i):
--- a/uncompyle6/scanners/scanner27.py
+++ b/uncompyle6/scanners/scanner27.py
@@ -46,10 +46,16 @@ class Scanner27(scan.Scanner):
        self.lines = []
        linetuple = namedtuple('linetuple', ['l_no', 'next'])
        j = 0
-        # linestarts contains bloc code adresse (addr,block)
+
        # linestarts is a tuple of (offset, line number).
        # Turn that in a has that we can index
        linestarts = list(dis.findlinestarts(co))
-        linestartoffsets = {a for (a, _) in linestarts}
+        linestartoffsets = {}
        for offset, lineno in linestarts:
            linestartoffsets[offset] = lineno
        (prev_start_byte, prev_line_no) = linestarts[0]
        for (start_byte, line_no) in linestarts[1:]:
            while j < start_byte:
@@ -190,16 +196,16 @@ class Scanner27(scan.Scanner):
                if offset in self.return_end_ifs:
                    op_name = 'RETURN_END_IF'
-            if offset not in replace:
+            if offset in linestartoffsets:
-                rv.append(Token(op_name, oparg, pattr, offset, linestart = offset in linestartoffsets))
+                linestart = linestartoffsets[offset]
            else:
-                rv.append(Token(replace[offset], oparg, pattr, offset, linestart = offset in linestartoffsets))
+                linestart = None
            if offset not in replace:
                rv.append(Token(op_name, oparg, pattr, offset, linestart))
            else:
                rv.append(Token(replace[offset], oparg, pattr, offset, linestart))
        if self.showasm:
            out = self.out # shortcut
            for t in rv:
                print(t, file=out)
            print(file=out)
        return rv, customize
    def op_size(self, op):
--- a/uncompyle6/scanners/scanner32.py
+++ b/uncompyle6/scanners/scanner32.py
@@ -11,7 +11,7 @@ from __future__ import print_function
 import dis, marshal
 from collections import namedtuple
-from uncompyle6.scanner import Token
+from uncompyle6.scanner import Token, L65536
 # Get all the opcodes into globals
@@ -20,7 +20,7 @@ from uncompyle6.opcodes.opcode_27 import *
 import uncompyle6.scanner as scan
-class Scanner34(scan.Scanner):
+class Scanner32(scan.Scanner):
    def __init__(self):
        self.Token = scan.Scanner.__init__(self, 3.2) # check
@@ -62,14 +62,19 @@ class Scanner34(scan.Scanner):
            # w/o touching arguments
            current_token = Token(dis.opname[op])
            current_token.offset = offset
-            current_token.linestart = True if offset in self.linestarts else False
+
            if offset in self.linestarts:
                current_token.linestart = self.linestarts[offset]
            else:
                current_token.linestart = None
            if op >= dis.HAVE_ARGUMENT:
                # Calculate op's argument value based on its argument and
                # preceding extended argument, if any
                oparg = code[offset+1] + code[offset+2]*256 + extended_arg
                extended_arg = 0
                if op == dis.EXTENDED_ARG:
-                    extended_arg = oparg*65536
+                    extended_arg = oparg * L65536
                # Fill token's attr/pattr fields
                current_token.attr = oparg
@@ -88,6 +93,7 @@ class Scanner34(scan.Scanner):
                        free = co.co_cellvars + co.co_freevars
                    current_token.pattr = free[oparg]
            tokens.append(current_token)
        return tokens, customize
    def build_lines_data(self, code_obj):
--- a/uncompyle6/scanners/scanner34.py
+++ b/uncompyle6/scanners/scanner34.py
@@ -11,8 +11,7 @@ from __future__ import print_function
 import dis, marshal
 from collections import namedtuple
-from uncompyle6.scanner import Token
+from uncompyle6.scanner import Token, L65536
 # Get all the opcodes into globals
 globals().update(dis.opmap)
@@ -62,14 +61,19 @@ class Scanner34(scan.Scanner):
            # w/o touching arguments
            current_token = Token(dis.opname[op])
            current_token.offset = offset
-            current_token.linestart = True if offset in self.linestarts else False
+
            if offset in self.linestarts:
                current_token.linestart = self.linestarts[offset]
            else:
                current_token.linestart = None
            if op >= dis.HAVE_ARGUMENT:
                # Calculate op's argument value based on its argument and
                # preceding extended argument, if any
                oparg = code[offset+1] + code[offset+2]*256 + extended_arg
                extended_arg = 0
                if op == dis.EXTENDED_ARG:
-                    extended_arg = oparg*65536
+                    extended_arg = oparg * L65536
                # Fill token's attr/pattr fields
                current_token.attr = oparg