<page xmlns="http://projectmallard.org/1.0/" xmlns:its="http://www.w3.org/2005/11/its" xmlns:e="http://projectmallard.org/experimental/" type="guide" style="task" id="strings.py" xml:lang="gl">
<info>
  <title type="text">Strings (Python)</title>
  <link type="guide" xref="beginner.py#theory"/>
  <link type="next" xref="label.py"/>
  <revision version="0.1" date="2012-06-16" status="draft"/>

  <desc>An explanation of how to deal with strings in Python and GTK+.</desc>
  <credit type="author copyright">
    <name>Sebastian Pölsterl</name>
    <email its:translate="no">sebp@k-d-w.org</email>
    <years>2011</years>
  </credit>
  <credit type="editor">
    <name>Marta Maria Casetti</name>
    <email its:translate="no">mmcasetti@gmail.com</email>
    <years>2012</years>
  </credit>

    <mal:credit xmlns:mal="http://projectmallard.org/1.0/" type="translator copyright">
      <mal:name>Fran Dieguez</mal:name>
      <mal:email>frandieguez@gnome.org</mal:email>
      <mal:years>2012-2013.</mal:years>
    </mal:credit>
  </info>

<title>Strings</title>
<links type="section"/>
<note style="warning">

GNOME strongly encourages the use of Python 3 for writing applications!

</note>
<section id="python-2">
<title>Strings in Python 2</title>

Python 2 comes with two different types that can represent strings: str and unicode. Instances of unicode hold Unicode strings, whereas instances of str are byte representations (the encoded string). Under the hood, Python 2 stores a Unicode string as a sequence of either 16-bit or 32-bit integers, depending on how the interpreter was compiled.

<code>
<![CDATA[
>>> unicode_string = u"Fu\u00dfb\u00e4lle"
>>> print unicode_string]]>
Fußbälle
</code>
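
The 16-bit versus 32-bit distinction only matters for characters outside the Basic Multilingual Plane. The sketch below illustrates the storage cost, not Python 2 internals: it is written in Python 3 syntax so that it runs on a current interpreter, and code_units is a hypothetical helper rather than a standard function.

```python
def code_units(text, width_bits):
    """Return how many 16-bit or 32-bit units `text` occupies (hypothetical helper)."""
    codec = {16: "utf-16-le", 32: "utf-32-le"}[width_bits]
    # The "-le" codecs emit no byte order mark, so the byte count is exact.
    return len(text.encode(codec)) // (width_bits // 8)


bmp_text = "Fu\u00dfb\u00e4lle"   # every character fits into one 16-bit unit
astral_text = "\U0001F600"        # a single character outside the BMP

print(code_units(bmp_text, 16), code_units(bmp_text, 32))
print(code_units(astral_text, 16), code_units(astral_text, 32))
```

With 16-bit units the character outside the BMP needs a surrogate pair, which is why string lengths could differ between Python 2 builds.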

Unicode strings can be converted to 8-bit strings with unicode.encode(). Python’s 8-bit strings have a str.decode() method that interprets the string using the given encoding (that is, it is the inverse of unicode.encode()):

<code>
<![CDATA[
>>> type(unicode_string)
<type 'unicode'>
>>> unicode_string.encode("utf-8")
'Fu\xc3\x9fb\xc3\xa4lle'
>>> utf8_string = unicode_string.encode("utf-8")
>>> type(utf8_string)
<type 'str'>
>>> unicode_string == utf8_string.decode("utf-8")
True]]>
</code>

Unfortunately, Python 2.x lets you mix unicode and str as long as the 8-bit string contains only 7-bit (ASCII) bytes, but raises <sys>UnicodeDecodeError</sys> as soon as it contains non-ASCII values.
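
When unicode and str are mixed, Python 2 decodes the 8-bit string implicitly with the ASCII codec. The sketch below makes that hidden step explicit. It is written in Python 3 syntax so that it runs on a current interpreter, with bytes standing in for the Python 2 str type; the variable names are illustrative only.

```python
text = "Fu\u00dfb\u00e4lle"
ascii_bytes = b"rund"                # contains only 7-bit bytes
utf8_bytes = text.encode("utf-8")    # contains bytes above 0x7f

# Pure ASCII data decodes cleanly, so mixing appears to work.
combined = text + " " + ascii_bytes.decode("ascii")
print(combined)

# Non-ASCII data fails in the very same decoding step.
decode_failed = False
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as error:
    decode_failed = True
    print("decode failed:", error.reason)
```

Because the failure depends on the data rather than on the code, such bugs tend to surface only once non-English text reaches the program.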

</section>
<section id="python-3">
<title>Strings in Python 3</title>

Since Python 3.0, all strings are stored as Unicode in instances of the str type. Encoded strings, on the other hand, are represented as binary data in instances of the bytes type. Conceptually, str refers to text, whereas bytes refers to data. Use encode() to go from str to bytes, and decode() to go from bytes to str.


In addition, it is no longer possible to mix Unicode strings with encoded strings, because doing so raises a TypeError:

<code>
<![CDATA[
>>> text = "Fu\u00dfb\u00e4lle"
>>> data = b" sind rund"
>>> text + data
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't convert 'bytes' object to str implicitly
>>> text + data.decode("utf-8")
'Fußbälle sind rund'
>>> text.encode("utf-8") + data
b'Fu\xc3\x9fb\xc3\xa4lle sind rund']]>
</code>
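
The text versus data distinction also explains why the encoding has to be known when decoding. The following minimal sketch, using the same example word, shows that a round trip only restores the text when the matching codec is used; the choice of latin-1 as the mismatched codec is an assumption made for illustration.

```python
text = "Fu\u00dfb\u00e4lle"
data = text.encode("utf-8")

# str counts characters, bytes counts encoded bytes.
print(len(text), len(data))

# Decoding with the codec that produced the bytes restores the text.
round_trip = data.decode("utf-8")

# Decoding with a different codec succeeds here but yields different text.
wrong = data.decode("latin-1")

print(round_trip == text, wrong == text)
```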
</section>
<section id="gtk">
<title>Unicode in GTK+</title>

GTK+ uses UTF-8 encoded strings for all text. In Python 2, this means that a method that returns a string always gives you an instance of the str type. The same applies to methods that expect one or more strings as parameters: they must be UTF-8 encoded. However, for convenience, PyGObject automatically converts any unicode instance to str when it is supplied as an argument:

<code>
<![CDATA[
>>> from gi.repository import Gtk
>>> label = Gtk.Label()
>>> unicode_string = u"Fu\u00dfb\u00e4lle"
>>> label.set_text(unicode_string)
>>> txt = label.get_text()
>>> type(txt)
<type 'str'>]]>
</code>

Furthermore:

<code>
<![CDATA[
>>> txt == unicode_string]]>
</code>

would return False, with the warning __main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal (Gtk.Label.get_text() will always return a str instance; therefore, txt and unicode_string are not equal).


This is especially important if you want to internationalize your program using <link href="http://docs.python.org/library/gettext.html">gettext</link>. You have to make sure that gettext will return UTF-8 encoded 8-bit strings for all languages.


In general, it is recommended not to use unicode objects in GTK+ applications at all and to use only UTF-8 encoded str objects, since GTK+ does not fully integrate with unicode objects.


String encoding is more consistent in Python 3.x because PyGObject will automatically encode/decode to/from UTF-8 if you pass a string to a method or a method returns a string. Strings, or text, will always be represented as instances of str only.
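
The automatic conversion can be pictured as an encode step on the way into the library and a decode step on the way out. The sketch below only imitates that idea so that it runs without GTK+. FakeLabel, to_c_string and from_c_string are hypothetical names and do not reflect how PyGObject is actually implemented.

```python
def to_c_string(text):
    """Encode text as UTF-8 bytes, as a UTF-8 based C API expects (hypothetical)."""
    if not isinstance(text, str):
        raise TypeError("expected str, got " + type(text).__name__)
    return text.encode("utf-8")


def from_c_string(data):
    """Decode UTF-8 bytes handed back by the C side (hypothetical)."""
    return data.decode("utf-8")


class FakeLabel:
    """Stand-in for a widget that keeps UTF-8 bytes internally; not Gtk.Label."""

    def __init__(self):
        self._raw = b""

    def set_text(self, text):
        self._raw = to_c_string(text)

    def get_text(self):
        return from_c_string(self._raw)


label = FakeLabel()
label.set_text("Fu\u00dfb\u00e4lle")
txt = label.get_text()
print(type(txt).__name__, txt == "Fu\u00dfb\u00e4lle")
```

Application code on either side of the boundary only ever sees str, which is what makes the Python 3 behaviour consistent.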

</section>
<section id="references">
<title>References</title>

<link href="http://python-gtk-3-tutorial.readthedocs.org/en/latest/unicode.html">How To Deal With Strings - The Python GTK+ 3 Tutorial</link>

</section>
</page>