===============================
Porting python-dbus to Python 3
===============================

This is an experimental port to Python 3.x where x >= 2.  There are lots of
great sources for porting C extensions to Python 3, including:

 * http://python3porting.com/toc.html
 * http://docs.python.org/howto/cporting.html
 * http://docs.python.org/py3k/c-api/index.html

I also consulted an early take on this port by John Palmieri and David Malcolm
in the context of Fedora:

 * https://bugs.freedesktop.org/show_bug.cgi?id=26420

although I have made some different choices.  The patches in that tracker
issue also don't cover porting the Python bits (e.g. the test suite), nor the
pygtk -> pygi porting, both of which I've also attempted to do in this branch.

This document outlines my notes and strategies for doing this port.  Please
feel free to contact me with any bugs, issues, disagreements, suggestions,
kudos, and curses.

Barry Warsaw
barry@python.org
2011-11-11


User visible changes
====================

You've got some dbus-python code that works great in Python 2.  This branch
should generally allow your existing Python 2 code to continue to work
unchanged.  There are a few changes you'll notice in Python 2 though:

 - The minimum supported Python 2 version is 2.6.
 - All object reprs are unicodes.  This change was made because it greatly
   simplifies the implementation and cross-compatibility with Python 3.
 - Some exception strings have changed.
 - `MethodCallMessage` and `SignalMessage` objects have better reprs now.

What do you need to do to port that to Python 3?  Here are the user visible
changes you should be aware of, relative to Python 2.  Python 3.2 is the
minimal required version:

 - `ByteArray` objects must be initialized with bytes objects, not unicodes.
   Use `b''` literals in the constructor.  This also works in Python 2, where
   bytes objects are aliases for 8-bit strings.
 - `Byte` objects must be initialized with either a length-1 bytes object
   (again, use `b''` literals to be compatible with either Python 2 or 3)
   or an integer.
 - byte signatures (i.e. `y` type codes) must be passed either a length-1
   bytes object or an integer. unicodes (str in Python 3) are not allowed.
 - `ByteArray` is now a subclass of `bytes`, where in Python 2 it is a
   subclass of `str`.
 - `dbus.UTF8String` is gone; use `dbus.String` instead.  Also, `utf8_string`
   arguments are no longer allowed.
 - All longs are now ints, since Python 3 has only a single int type.  This
   also means that the class hierarchy for the dbus numeric types has changed
   (all derive from int in Python 3).
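
As a rough illustration of the byte rules above, here is a pure-Python sketch
of the checks described for `Byte` objects and `y` arguments.  `coerce_byte`
is a hypothetical helper written for this note and is not part of dbus-python;
the real checks live in the C extension and may differ in detail (for example
in the exception types raised).  It assumes Python 3.

```python
def coerce_byte(value):
    """Return an int in range(256) from a length-1 bytes object or an int.

    Hypothetical helper: it only models the rules listed above.
    """
    if isinstance(value, bytes):
        if len(value) != 1:
            raise ValueError('expected a bytes object of length 1')
        # ord() on a length-1 bytes object gives its integer value.
        return ord(value)
    if isinstance(value, int):
        if not 0 <= value <= 255:
            raise ValueError('byte value out of range')
        return value
    # Anything else, including unicode text (str in Python 3), is rejected.
    raise TypeError('expected bytes or int, got %s' % type(value).__name__)


print(coerce_byte(b'a'))   # 97
print(coerce_byte(97))     # 97
try:
    coerce_byte('a')       # unicode text is not allowed
except TypeError as error:
    print(error)           # expected bytes or int, got str
```

Writing the argument as a `b''` literal keeps the calling code identical in
Python 2 and Python 3.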


Bytes vs. Strings
=================

All strings in dbus are defined as UTF-8:

http://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-signatures

However, the dbus C API accepts `char*` arguments, which must be
NUL-terminated UTF-8 strings containing no other NUL bytes.

This page describes the mapping between Python types and dbus types:

    http://dbus.freedesktop.org/doc/dbus-python/doc/tutorial.html#basic-types

Notice that it maps dbus `string` (`'s'`) to `dbus.String` (unicode) or
`dbus.UTF8String` (str).  Also notice that there is no direct dbus equivalent
of Python's bytes type (although dbus does have byte arrays), so I am mapping
dbus strings to unicodes in all cases, and getting rid of `dbus.UTF8String` in
Python 3.  I've also added a `dbus._BytesBase` type which is unused in Python
2, but which forms the base class for `dbus.ByteArray` in Python 3.  This is
an implementation detail and not part of the public API.

In Python 3, object paths (`'o'` or `dbus.ObjectPath`), signatures (`'g'` or
`dbus.Signature`), bus names, interfaces, and methods are all strings.  A
previous aborted effort was made to use bytes for these, which at first blush
may make some sense, but on deeper consideration does not.  This approach
also tended to impose too many changes on user code, and caused lots of
difficult-to-track-down problems.

In Python 3, all such objects are subclasses of `str` (i.e. `unicode`).

(As an example, dbus-python's callback dispatching pretty much assumes all
these things are strings.  When they are bytes, the fact that `'foo' != b'foo'`
causes dispatch matching to fail in difficult-to-debug ways.  Even bus names
are not immune, since the code does things like `bus_name[:1] == ':'`, which
fails in multiple ways when `bus_name` is a bytes object.  For sanity purposes,
these are all unicode strings now, and we just eat the complexity at the C
level.)
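
The failure modes described in this aside can be reproduced in plain Python 3
without dbus-python.  The `handlers` dictionary below is a hypothetical
stand-in for a dispatch table and is not taken from the dbus-python sources.

```python
# Handlers registered under str keys, as dispatching code would expect.
handlers = {'foo': lambda: 'handled foo'}   # hypothetical registry

text_name = 'foo'
bytes_name = b'foo'

print(text_name in handlers)     # True
print(bytes_name in handlers)    # False: b'foo' never equals 'foo'

bus_name = b':1.42'
print(bus_name[:1] == ':')       # False: bytes compared with str
print(bus_name[0])               # 58: indexing bytes yields an int

# Decoding once at the boundary restores the expected behaviour.
decoded = bus_name.decode('utf-8')
print(decoded[:1] == ':')        # True
```

Neither comparison raises an exception by default; they are simply false,
which is why such mismatches are hard to track down.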

I am using `#include <bytesobject.h>`, which exposes the PyBytes API to Python
2.6 and 2.7, and I have converted all internal PyString calls to PyBytes
calls.  Where this is inappropriate, we'll use PyUnicode calls explicitly.
E.g. all repr() implementations now return unicodes.  Most of these changes
shouldn't be noticed, even in existing Python 2 code.

Generally, I've left the descriptions and docstrings saying "str" instead of
"unicode" since there's no distinction in Python 3.

APIs which previously returned PyStrings will usually return PyUnicodes, not
PyBytes.


Ints vs. Longs
==============

Python 3 only has PyLong types; PyInts are gone.  For that reason, I've
switched all PyInt calls to use PyLong in both Python 2 and Python 3.  Python
3.0 had a nice `<intobject.h>` header that aliased PyInt to PyLong, but that's
gone as of Python 3.1, and the minimal required Python 3 version is 3.2.

In the above page mapping basic types, you'll notice that the Python int type
is mapped to 32-bit signed integers ('i') and the Python long type is mapped
to 64-bit signed integers ('x').  Python 3 doesn't have this distinction, so
ints map to 'i' even though ints can be larger in Python 3.  Use the
dbus-specific integer types if you must have more exact mappings.
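
To make the range issue concrete, the sketch below checks whether a Python 3
int fits the signed type codes mentioned above.  `SIGNED_RANGES` and `fits`
are hypothetical names used only for illustration; in real code you would
wrap the value in one of the dbus-specific integer types (such as
`dbus.Int64`) instead of checking ranges by hand.

```python
# Value ranges for the signed D-Bus integer type codes discussed above.
SIGNED_RANGES = {
    'i': (-2**31, 2**31 - 1),   # 32-bit signed
    'x': (-2**63, 2**63 - 1),   # 64-bit signed
}


def fits(value, code):
    """Return True if value is representable with the given type code."""
    low, high = SIGNED_RANGES[code]
    return low <= value <= high


big = 2**40               # a perfectly ordinary int in Python 3
print(fits(big, 'i'))     # False: too large for a 32-bit signed integer
print(fits(big, 'x'))     # True
```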

APIs which accepted ints in Python 2 will still do so, but they'll also now
accept longs.  These APIs obviously only accept longs in Python 3.

Long literals in Python code are an interesting thing to have to port.  Don't
use them if you want your code to work in both Python versions.

`dbus._IntBase` is removed in Python 3; you only have `dbus._LongBase`, which
inherits from a Python 3 int (i.e. a PyLong).  Again, this is an
implementation detail that users should never care about.


Macros
======

In types-internal.h, I define `PY3K` when `PY_MAJOR_VERSION` >= 3, so you'll
see ifdefs on the former symbol within the C code.

Python 3 really could use a PY_REFCNT() wrapper for ob_refcnt access.


PyCapsule vs. PyCObject
=======================

`_dbus_bindings._C_API` is an attribute exposed to Python in the module.  In
Python 2, this is a PyCObject, but these do not exist in Python >= 3.2, so it
is replaced with a PyCapsule for Python 3.  However, since PyCapsules were
only introduced in Python 2.7, and I want to support Python 2.6, PyCObjects
are still used when this module is compiled for Python 2.


Python level compatibility
==========================

`from dbus import _is_py3` gives you a flag to check if you must do something
different in Python 3.  In general I use this flag to support both versions in
one set of sources, which seems better than trying to use 2to3.  It's not part
of the dbus-python public API, so you must not use it in third-party projects.
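
Third-party code that needs a similar switch can compute its own flag.  The
following is a minimal sketch; `IS_PY3`, `text_type`, and `is_text` are
hypothetical names, not dbus-python API.

```python
import sys

# Index access works on both the plain tuple of Python 2.6 and the named
# tuple of later versions.
IS_PY3 = sys.version_info[0] >= 3

if IS_PY3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (only evaluated on Python 2)


def is_text(value):
    """Return True for unicode text in either Python version."""
    return isinstance(value, text_type)


print(is_text(u'org.freedesktop.DBus'))   # True
```

The `u''` prefix is accepted by Python 2 and by Python 3.3 or later; Python
3.2 rejects it, so use a plain literal there.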


Miscellaneous
=============

The PyDoc_STRVAR() documentation is probably out of date.  Once the API
choices have been green-lighted upstream, I'll make a pass through the code to
update it.  This might be tricky wherever Python 2 and Python 3 differ.

There were a few places where I noticed what might be considered bugs,
unchecked exception conditions, or possible reference count leaks.  In these
cases, I've just fixed what I can and hopefully haven't made the situation
worse.

`dbus_py_variant_level_get()` did not check possible error conditions, nor did
its callers.  When `dbus_py_variant_level_get()` encounters an error, it now
returns -1, and callers check this.

As much as possible, I've refrained from general code cleanups (e.g. 80
columns), unless it just bugged me too much or I touched the code for reasons
related to the port.  I've also tried to stick to existing C code style,
e.g. through the use of pervasive `Py_CLEAR()` calls, comparison against NULL
usually with `!foo`, and such.  As Bart Simpson might write on his classroom
blackboard::

    This is not a rewrite
    This is not a rewrite
    This is not a rewrite
    This is not a rewrite
    ...

and so on.  Well, mostly ;).

I think I fixed a reference leak in `DBusPyServer_set_auth_mechanisms()`.
`PySequence_Fast()` returns a new reference, which wasn't getting decref'd in
any return path.

A few other things I ran into while porting:

 - Instantiation of metaclasses uses different, incompatible syntax in Python
   2 and 3.  You have to use direct calling of the metaclass to work across
   versions, i.e. `Interface = InterfaceType('Interface', (object,), {})`
 - `iteritems()` and friends are gone.  I dropped the "iter" prefixes.
 - `xrange()` is gone.  I changed the calls to use `range()`.
 - `isSequenceType()` is gone in Python 3, so I use a different idiom there.
 - The iterator method is `__next__()` in Python 3 but `next()` in Python 2.
 - `PyUnicode_FromFormat()` `%V` flag is a clever hack!
 - `sys.version_info` is a tuple in Python 2.6, not a namedtuple.  i.e. there
   is no `sys.version_info.major`
 - `PyArg_Parse()`: No 'y' code in Python 2; in Python 3, no equivalent of 'z'
   for bytes objects.
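
Several of the Python-level idioms in this list can be shown together.  This
is a sketch only: `RegistryType`, `Countdown`, and the `signals` dictionary
are hypothetical examples rather than code from this branch.  It is written
to run unchanged on Python 2.6+ and Python 3.

```python
class RegistryType(type):
    """Hypothetical metaclass that records the classes it creates."""
    created = []

    def __init__(cls, name, bases, namespace):
        super(RegistryType, cls).__init__(name, bases, namespace)
        RegistryType.created.append(name)


# Calling the metaclass directly avoids both the Python 2 `__metaclass__`
# attribute and the Python 3 `metaclass=` keyword.
Interface = RegistryType('Interface', (object,), {})


class Countdown(object):
    """Iterator usable with both the Python 2 and Python 3 protocols."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):            # Python 3 name
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    next = __next__                # Python 2 name


signals = {'NameOwnerChanged': 3, 'NameLost': 1}
# items() and range() exist in both versions (they build lists in Python 2).
arg_counts = sorted(signals.items())
squares = [n * n for n in range(3)]

print(type(Interface) is RegistryType)   # True
print(list(Countdown(3)))                # [3, 2, 1]
```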


Open issues
===========

Here are a few things that still need to be done, or for which there may be
open questions:

 - Update all C extension docstrings for accuracy.