README.rst: note addition of pydisassemble

Remove duplicate disassembly printing from scanners and
put common code in caller(s). Show source-code line numbers in disassembly output
and fix alignment of byte offsets.
disas.py: workaround Python 2/3 different layouts before we get to
bytecodes in a code object.
This commit is contained in:
rocky
2015-12-15 01:42:07 -05:00
parent 5e5da104c5
commit 34ecd54e2c
11 changed files with 131 additions and 98 deletions

View File

@@ -2,7 +2,7 @@ Metadata-Version: 2.0
Name: uncompyle6 Name: uncompyle6
Version: 2.0.1 Version: 2.0.1
Summary: Python byte-code to source-code converter Summary: Python byte-code to source-code converter
Home-page: http://github.com/rocky/uncompyle6 Home-page: http://github.com/rocky/python-uncompyle6
Author: Rocky Author: Rocky
Author-email: rb@dustyfeet.com Author-email: rb@dustyfeet.com
License: GPLv3 License: GPLv3

View File

@@ -1,8 +1,7 @@
uncompyle6 uncompyle6
========== ==========
A CPython 2.x and possibly 3.x byte-code disassembler and A Python 2.x and possibly 3.x byte-code decompiler.
adecompiler.
This is written in Python 2.7 but is Python3 compatible. This is written in Python 2.7 but is Python3 compatible.
@@ -10,46 +9,34 @@ This is written in Python 2.7 but is Python3 compatible.
Introduction Introduction
------------ ------------
'uncompyle6' converts Python byte-code back into equivalent Python _uncompyle6_ converts Python byte-code back into equivalent Python
source code. It accepts byte-codes from Python version 2.5 to 2.7. source code. It accepts byte-codes from Python version 2.5 to 2.7.
It runs on Python 2.7 and, with a little more work, on Python 3 as well. It runs on Python 2.7 and with a little more work Python 3.
The generated source is fairly readable: docstrings, lists, tuples and The generated source is fairly readable: docstrings, lists, tuples and
hashes are somewhat pretty-printed. hashes are somewhat pretty-printed.
'uncompyle6' is based on John Aycock's generic small languages _uncompyle6_ is based on John Aycock's generic small languages
compiler 'spark' (http://pages.cpsc.ucalgary.ca/~aycock/spark/) and his compiler 'spark' (http://www.csr.uvic.ca/~aycock/python/) and his
prior work on a tool called 'decompyle'. This was improved by Hartmut Goebel prior work on a tool called 'decompyle'. This was improved by Hartmut Goebel
http://www.crazy-compilers.com `http://www.crazy-compilers.com/`_
In order to the decompile a program, we need to be able to disassemble # Additional note (3 July 2004):
it first. And this process may be useful in of itself. So we provide a
utility for just that piece as well.
'pydisassemble' gives a CPython disassembly of Python byte-code. How This software is no longer available from the original website.
is this different than what Python already provides via the "dis" However http://www.crazy-compilers.com/decompyle/ provides a
module? Here, we can cross disassemble bytecodes from different decompilation service.
versions of CPython than the version of CPython that is doing the
disassembly.
'pydisassemble works on the same versions as 'uncompyle6' and handles the # Additional note (5 June 2012):
same sets of CPython bytecode versions.
*Note from 3 July 2004:*
This software was original available from http://www.crazy-compilers.com;
http://www.crazy-compilers.com/decompyle/ provides a decompilation service.
*Note (5 June 2012):*
The decompilation of python bytecode 2.5 & 2.6 is based on the work of The decompilation of python bytecode 2.5 & 2.6 is based on the work of
Eloi Vanderbeken. bytecode is translated to a pseudo 2.7 python bytecode Eloi Vanderbeken. bytecode is translated to a pseudo 2.7 python bytecode
and then decompiled. and then decompiled.
*Note (12 Dec 2016):* # Additional note (12 Dec 2016):
This project will be used to deparse fragments of code inside my I will be using this to deparse fragments of code inside my trepan_
trepan_ debuggers_. For that, I need to record text fragements for all debuggers_. For that, I need to record text fragements for all
byte-code offsets (of interest). This purpose although largely byte-code offsets (of interest). This purpose although largely
compatible with the original intention is yet a little bit different. compatible with the original intention is yet a little bit different.
@@ -80,8 +67,6 @@ Installation
This uses setup.py, so it follows the standard Python routine: This uses setup.py, so it follows the standard Python routine:
::
python setup.py install # may need sudo python setup.py install # may need sudo
# or if you have pyenv: # or if you have pyenv:
python setup.py develop python setup.py develop
@@ -103,18 +88,15 @@ Usage
Run Run
::
./scripts/uncompyle6 -h ./scripts/uncompyle6 -h
for usage help for usage help
Known Bugs/Restrictions Known Bugs/Restrictions
----------------------- -----------------------
Support for Python 3 bytecode and syntax is lacking. Support Python 3 bytecode and syntax is lacking.
.. _trepan: https://pypi.python.org/pypi/trepan .. _trepan: https://pypi.python.org/pypi/trepan
.. _debuggers: https://pypi.python.org/pypi/trepan3k .. _debuggers: https://pypi.python.org/pypi/trepan3k

View File

@@ -38,11 +38,11 @@ def disassemble_code(version, co, out=None):
assert isinstance(co, types.CodeType) assert isinstance(co, types.CodeType)
# store final output stream for case of error # store final output stream for case of error
__real_out = out or sys.stdout real_out = out or sys.stdout
print('# Python %s' % version, file=__real_out) print('# Python %s' % version, file=real_out)
if co.co_filename: if co.co_filename:
print('# Embedded file name: %s' % co.co_filename, print('# Embedded file name: %s' % co.co_filename,
file=__real_out) file=real_out)
# Pick up appropriate scanner # Pick up appropriate scanner
if version == 2.7: if version == 2.7:
@@ -63,6 +63,11 @@ def disassemble_code(version, co, out=None):
scanner.setShowAsm(True, out) scanner.setShowAsm(True, out)
tokens, customize = scanner.disassemble(co) tokens, customize = scanner.disassemble(co)
for t in tokens:
print(t, file=real_out)
print(file=out)
def disassemble_file(filename, outstream=None, showasm=False, showast=False): def disassemble_file(filename, outstream=None, showasm=False, showast=False):
""" """

View File

@@ -81,11 +81,11 @@ def load_module(filename):
# print version # print version
fp.read(4) # timestamp fp.read(4) # timestamp
magic_int = magics.magic2int(magic)
if version == PYTHON_VERSION: if version == PYTHON_VERSION:
magic_int = magics.magic2int(magic)
# Note: a higher magic number necessarily mean a later # Note: a higher magic number necessarily mean a later
# release. At Pyton 3.0 the magic number decreased # release. At Python 3.0 the magic number decreased
# significantly. Hence the range below. Also note # significantly. Hence the range below. Also note
# inclusion of the size info, occurred within a # inclusion of the size info, occurred within a
# Python magor/minor release. Hence the test on the # Python magor/minor release. Hence the test on the
@@ -95,7 +95,7 @@ def load_module(filename):
bytecode = fp.read() bytecode = fp.read()
co = marshal.loads(bytecode) co = marshal.loads(bytecode)
else: else:
co = disas.load(fp) co = disas.load(fp, magic_int)
pass pass
return version, co return version, co
@@ -108,11 +108,11 @@ def uncompyle(version, co, out=None, showasm=False, showast=False):
assert isinstance(co, types.CodeType) assert isinstance(co, types.CodeType)
# store final output stream for case of error # store final output stream for case of error
__real_out = out or sys.stdout real_out = out or sys.stdout
print('# Python %s' % version, file=__real_out) print('# Python %s' % version, file=real_out)
if co.co_filename: if co.co_filename:
print('# Embedded file name: %s' % co.co_filename, print('# Embedded file name: %s' % co.co_filename,
file=__real_out) file=real_out)
# Pick up appropriate scanner # Pick up appropriate scanner
if version == 2.7: if version == 2.7:
@@ -133,12 +133,17 @@ def uncompyle(version, co, out=None, showasm=False, showast=False):
scanner.setShowAsm(showasm, out) scanner.setShowAsm(showasm, out)
tokens, customize = scanner.disassemble(co) tokens, customize = scanner.disassemble(co)
if showasm:
for t in tokens:
print(t, file=real_out)
print(file=out)
# Build AST from disassembly. # Build AST from disassembly.
walk = walker.Walker(out, scanner, showast=showast) walk = walker.Walker(out, scanner, showast=showast)
try: try:
ast = walk.build_ast(tokens, customize) ast = walk.build_ast(tokens, customize)
except walker.ParserError as e : # parser failed, dump disassembly except walker.ParserError as e : # parser failed, dump disassembly
print(e, file=__real_out) print(e, file=real_out)
raise raise
del tokens # save memory del tokens # save memory

View File

@@ -36,7 +36,7 @@ def marshalLoad(fp):
internStrings = [] internStrings = []
return load(fp) return load(fp)
def load(fp): def load(fp, magic_int):
""" """
marshal.load() written in Python. When the Python bytecode magic loaded is the marshal.load() written in Python. When the Python bytecode magic loaded is the
same magic for the running Python interpreter, we can simply use the same magic for the running Python interpreter, we can simply use the
@@ -51,27 +51,34 @@ def load(fp):
if marshalType == 'c': if marshalType == 'c':
Code = types.CodeType Code = types.CodeType
# FIXME If 'i' is deprecated, what would we use?
co_argcount = unpack('i', fp.read(4))[0] co_argcount = unpack('i', fp.read(4))[0]
co_nlocals = unpack('i', fp.read(4))[0] co_nlocals = unpack('i', fp.read(4))[0]
co_stacksize = unpack('i', fp.read(4))[0] co_stacksize = unpack('i', fp.read(4))[0]
co_flags = unpack('i', fp.read(4))[0] co_flags = unpack('i', fp.read(4))[0]
co_code = load(fp) # FIXME: somewhere between Python 2.7 and python 3.2 there's
co_consts = load(fp) # another 4 bytes before we get to the bytecode. What's going on?
co_names = load(fp) # Again, because magic ints decreased between python 2.7 and 3.0 we need
co_varnames = load(fp) # a range here.
co_freevars = load(fp) if 3000 < magic_int < 20121:
co_cellvars = load(fp) fp.read(4)
co_filename = load(fp) co_code = load(fp, magic_int)
co_name = load(fp) co_consts = load(fp, magic_int)
co_names = load(fp, magic_int)
co_varnames = load(fp, magic_int)
co_freevars = load(fp, magic_int)
co_cellvars = load(fp, magic_int)
co_filename = load(fp, magic_int)
co_name = load(fp, magic_int)
co_firstlineno = unpack('i', fp.read(4))[0] co_firstlineno = unpack('i', fp.read(4))[0]
co_lnotab = load(fp) co_lnotab = load(fp, magic_int)
# The Python3 code object is different than Python2's which # The Python3 code object is different than Python2's which
# we are reading if we get here. # we are reading if we get here.
# Also various parameters which were strings are now # Also various parameters which were strings are now
# bytes (which is probably more logical). # bytes (which is probably more logical).
if PYTHON3: if PYTHON3:
if PYTHON_MAGIC_INT > 3020: if PYTHON_MAGIC_INT > 3020:
# In later Python3 versions, there is a # In later Python3 magic_ints, there is a
# kwonlyargcount parameter which we set to 0. # kwonlyargcount parameter which we set to 0.
return Code(co_argcount, 0, co_nlocals, co_stacksize, co_flags, return Code(co_argcount, 0, co_nlocals, co_stacksize, co_flags,
bytes(co_code, encoding='utf-8'), bytes(co_code, encoding='utf-8'),
@@ -152,7 +159,7 @@ def load(fp):
tuplesize = unpack('i', fp.read(4))[0] tuplesize = unpack('i', fp.read(4))[0]
ret = tuple() ret = tuple()
while tuplesize > 0: while tuplesize > 0:
ret += load(fp), ret += load(fp, magic_int),
tuplesize -= 1 tuplesize -= 1
return ret return ret
elif marshalType == '[': elif marshalType == '[':

View File

@@ -21,7 +21,7 @@ if (sys.version_info > (3, 0)):
else: else:
L65536 = long(65536) L65536 = long(65536)
from uncompyle6.opcodes import opcode_25, opcode_26, opcode_27, opcode_34 from uncompyle6.opcodes import opcode_25, opcode_26, opcode_27, opcode_32, opcode_34
class Token: class Token:
@@ -31,7 +31,7 @@ class Token:
A byte-code token is equivalent to the contents of one line A byte-code token is equivalent to the contents of one line
as output by dis.dis(). as output by dis.dis().
''' '''
def __init__(self, type_, attr=None, pattr=None, offset=-1, linestart=False): def __init__(self, type_, attr=None, pattr=None, offset=-1, linestart=None):
self.type = intern(type_) self.type = intern(type_)
self.attr = attr self.attr = attr
self.pattr = pattr self.pattr = pattr
@@ -51,9 +51,9 @@ class Token:
def __str__(self): def __str__(self):
pattr = self.pattr pattr = self.pattr
if self.linestart: if self.linestart:
return '\n%s\t%-17s %r' % (self.offset, self.type, pattr) return '\n%4d %6s\t%-17s %r' % (self.linestart, self.offset, self.type, pattr)
else: else:
return '%s\t%-17s %r' % (self.offset, self.type, pattr) return ' %6s\t%-17s %r' % (self.offset, self.type, pattr)
def __hash__(self): def __hash__(self):
return hash(self.type) return hash(self.type)

View File

@@ -35,8 +35,14 @@ class Scanner25(scan.Scanner):
if self.code[i] in (RETURN_VALUE, END_FINALLY): if self.code[i] in (RETURN_VALUE, END_FINALLY):
n = i + 1 n = i + 1
self.code = array('B', co.co_code[:n]) self.code = array('B', co.co_code[:n])
# linestarts contains bloc code adresse (addr,block)
# linestarts is a tuple of (offset, line number.
# Turn that in a has that we can index
self.linestarts = list(dis.findlinestarts(co)) self.linestarts = list(dis.findlinestarts(co))
linestartoffsets = {}
for offset, lineno in self.linestarts:
linestartoffsets[offset] = lineno
self.prev = [0] self.prev = [0]
# class and names # class and names
@@ -72,7 +78,13 @@ class Scanner25(scan.Scanner):
linestarts = self.linestarts linestarts = self.linestarts
self.lines = [] self.lines = []
linetuple = namedtuple('linetuple', ['l_no', 'next']) linetuple = namedtuple('linetuple', ['l_no', 'next'])
linestartoffsets = {a for (a, _) in linestarts}
# linestarts is a tuple of (offset, line number).
# Turn that in a has that we can index
linestartoffsets = {}
for offset, lineno in linestarts:
linestartoffsets[offset] = lineno
(prev_start_byte, prev_line_no) = linestarts[0] (prev_start_byte, prev_line_no) = linestarts[0]
for (start_byte, line_no) in linestarts[1:]: for (start_byte, line_no) in linestarts[1:]:
while j < start_byte: while j < start_byte:
@@ -202,16 +214,16 @@ class Scanner25(scan.Scanner):
if offset in self.return_end_ifs: if offset in self.return_end_ifs:
op_name = 'RETURN_END_IF' op_name = 'RETURN_END_IF'
if offset not in replace: if offset in linestartoffsets:
rv.append(Token(op_name, oparg, pattr, offset, linestart = offset in linestartoffsets)) linestart = linestartoffsets[offset]
else: else:
rv.append(Token(replace[offset], oparg, pattr, offset, linestart = offset in linestartoffsets)) linestart = None
if offset not in replace:
rv.append(Token(op_name, oparg, pattr, offset, linestart))
else:
rv.append(Token(replace[offset], oparg, pattr, offset, linestart))
if self.showasm:
out = self.out # shortcut
for t in rv:
print >>out, t
print >>out
return rv, customize return rv, customize
def getOpcodeToDel(self, i): def getOpcodeToDel(self, i):

View File

@@ -13,7 +13,7 @@ from operator import itemgetter
from uncompyle6.opcodes.opcode_26 import * from uncompyle6.opcodes.opcode_26 import *
import dis import dis
import scanner as scan import uncompyle6.scanner as scan
class Scanner26(scan.Scanner): class Scanner26(scan.Scanner):
def __init__(self): def __init__(self):
@@ -71,7 +71,13 @@ class Scanner26(scan.Scanner):
linestarts = self.linestarts linestarts = self.linestarts
self.lines = [] self.lines = []
linetuple = namedtuple('linetuple', ['l_no', 'next']) linetuple = namedtuple('linetuple', ['l_no', 'next'])
linestartoffsets = {a for (a, _) in linestarts}
# linestarts is a tuple of (offset, line number).
# Turn that in a has that we can index
linestartoffsets = {}
for offset, lineno in linestarts:
linestartoffsets[offset] = lineno
(prev_start_byte, prev_line_no) = linestarts[0] (prev_start_byte, prev_line_no) = linestarts[0]
for (start_byte, line_no) in linestarts[1:]: for (start_byte, line_no) in linestarts[1:]:
while j < start_byte: while j < start_byte:
@@ -202,16 +208,16 @@ class Scanner26(scan.Scanner):
if offset in self.return_end_ifs: if offset in self.return_end_ifs:
op_name = 'RETURN_END_IF' op_name = 'RETURN_END_IF'
if offset not in replace: if offset in linestartoffsets:
rv.append(Token(op_name, oparg, pattr, offset, linestart = offset in linestartoffsets)) linestart = linestartoffsets[offset]
else: else:
rv.append(Token(replace[offset], oparg, pattr, offset, linestart = offset in linestartoffsets)) linestart = None
if offset not in replace:
rv.append(Token(op_name, oparg, pattr, offset, linestart))
else:
rv.append(Token(replace[offset], oparg, pattr, offset, linestart))
if self.showasm:
out = self.out # shortcut
for t in rv:
print >>out, t
print >>out
return rv, customize return rv, customize
def getOpcodeToDel(self, i): def getOpcodeToDel(self, i):

View File

@@ -46,10 +46,16 @@ class Scanner27(scan.Scanner):
self.lines = [] self.lines = []
linetuple = namedtuple('linetuple', ['l_no', 'next']) linetuple = namedtuple('linetuple', ['l_no', 'next'])
j = 0 j = 0
# linestarts contains bloc code adresse (addr,block)
# linestarts is a tuple of (offset, line number).
# Turn that in a has that we can index
linestarts = list(dis.findlinestarts(co)) linestarts = list(dis.findlinestarts(co))
linestartoffsets = {a for (a, _) in linestarts} linestartoffsets = {}
for offset, lineno in linestarts:
linestartoffsets[offset] = lineno
(prev_start_byte, prev_line_no) = linestarts[0] (prev_start_byte, prev_line_no) = linestarts[0]
for (start_byte, line_no) in linestarts[1:]: for (start_byte, line_no) in linestarts[1:]:
while j < start_byte: while j < start_byte:
@@ -190,16 +196,16 @@ class Scanner27(scan.Scanner):
if offset in self.return_end_ifs: if offset in self.return_end_ifs:
op_name = 'RETURN_END_IF' op_name = 'RETURN_END_IF'
if offset not in replace: if offset in linestartoffsets:
rv.append(Token(op_name, oparg, pattr, offset, linestart = offset in linestartoffsets)) linestart = linestartoffsets[offset]
else: else:
rv.append(Token(replace[offset], oparg, pattr, offset, linestart = offset in linestartoffsets)) linestart = None
if offset not in replace:
rv.append(Token(op_name, oparg, pattr, offset, linestart))
else:
rv.append(Token(replace[offset], oparg, pattr, offset, linestart))
if self.showasm:
out = self.out # shortcut
for t in rv:
print(t, file=out)
print(file=out)
return rv, customize return rv, customize
def op_size(self, op): def op_size(self, op):

View File

@@ -11,7 +11,7 @@ from __future__ import print_function
import dis, marshal import dis, marshal
from collections import namedtuple from collections import namedtuple
from uncompyle6.scanner import Token from uncompyle6.scanner import Token, L65536
# Get all the opcodes into globals # Get all the opcodes into globals
@@ -20,7 +20,7 @@ from uncompyle6.opcodes.opcode_27 import *
import uncompyle6.scanner as scan import uncompyle6.scanner as scan
class Scanner34(scan.Scanner): class Scanner32(scan.Scanner):
def __init__(self): def __init__(self):
self.Token = scan.Scanner.__init__(self, 3.2) # check self.Token = scan.Scanner.__init__(self, 3.2) # check
@@ -62,14 +62,19 @@ class Scanner34(scan.Scanner):
# w/o touching arguments # w/o touching arguments
current_token = Token(dis.opname[op]) current_token = Token(dis.opname[op])
current_token.offset = offset current_token.offset = offset
current_token.linestart = True if offset in self.linestarts else False
if offset in self.linestarts:
current_token.linestart = self.linestarts[offset]
else:
current_token.linestart = None
if op >= dis.HAVE_ARGUMENT: if op >= dis.HAVE_ARGUMENT:
# Calculate op's argument value based on its argument and # Calculate op's argument value based on its argument and
# preceding extended argument, if any # preceding extended argument, if any
oparg = code[offset+1] + code[offset+2]*256 + extended_arg oparg = code[offset+1] + code[offset+2]*256 + extended_arg
extended_arg = 0 extended_arg = 0
if op == dis.EXTENDED_ARG: if op == dis.EXTENDED_ARG:
extended_arg = oparg*65536 extended_arg = oparg * L65536
# Fill token's attr/pattr fields # Fill token's attr/pattr fields
current_token.attr = oparg current_token.attr = oparg
@@ -88,6 +93,7 @@ class Scanner34(scan.Scanner):
free = co.co_cellvars + co.co_freevars free = co.co_cellvars + co.co_freevars
current_token.pattr = free[oparg] current_token.pattr = free[oparg]
tokens.append(current_token) tokens.append(current_token)
return tokens, customize return tokens, customize
def build_lines_data(self, code_obj): def build_lines_data(self, code_obj):

View File

@@ -11,8 +11,7 @@ from __future__ import print_function
import dis, marshal import dis, marshal
from collections import namedtuple from collections import namedtuple
from uncompyle6.scanner import Token from uncompyle6.scanner import Token, L65536
# Get all the opcodes into globals # Get all the opcodes into globals
globals().update(dis.opmap) globals().update(dis.opmap)
@@ -62,14 +61,19 @@ class Scanner34(scan.Scanner):
# w/o touching arguments # w/o touching arguments
current_token = Token(dis.opname[op]) current_token = Token(dis.opname[op])
current_token.offset = offset current_token.offset = offset
current_token.linestart = True if offset in self.linestarts else False
if offset in self.linestarts:
current_token.linestart = self.linestarts[offset]
else:
current_token.linestart = None
if op >= dis.HAVE_ARGUMENT: if op >= dis.HAVE_ARGUMENT:
# Calculate op's argument value based on its argument and # Calculate op's argument value based on its argument and
# preceding extended argument, if any # preceding extended argument, if any
oparg = code[offset+1] + code[offset+2]*256 + extended_arg oparg = code[offset+1] + code[offset+2]*256 + extended_arg
extended_arg = 0 extended_arg = 0
if op == dis.EXTENDED_ARG: if op == dis.EXTENDED_ARG:
extended_arg = oparg*65536 extended_arg = oparg * L65536
# Fill token's attr/pattr fields # Fill token's attr/pattr fields
current_token.attr = oparg current_token.attr = oparg