    >>> from lxml.html import document_fromstring, fragment_fromstring, tostring

lxml.html has two parsers, one for HTML, one for XHTML:

    >>> from lxml.html import HTMLParser, XHTMLParser
    >>> html = "<html><body><p>Hi!</p></body></html>"

    >>> root = document_fromstring(html, parser=HTMLParser())
    >>> print(root.tag)
    html

    >>> root = document_fromstring(html, parser=XHTMLParser())
    >>> print(root.tag)
    html
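
Both parsers give the same root tag for this well-formed input, so the difference is easier to see with markup that is valid HTML but not well-formed XML. The sketch below assumes XHTMLParser keeps the strict, non-recovering defaults of the XML parser it builds on; the `loose` string and the variable names are made up for illustration.

```python
from lxml import etree
from lxml.html import HTMLParser, XHTMLParser, document_fromstring

# Hypothetical input: <br> is never closed, which HTML allows but XML does not.
loose = "<html><body><p>Hi!<br>there</p></body></html>"

# The HTML parser accepts the unclosed <br> as an empty element.
root = document_fromstring(loose, parser=HTMLParser())
html_tags = [el.tag for el in root.iter()]

# The XHTML parser is an XML parser, so it is expected to reject the input
# (assumption: no recover=True was passed to it).
try:
    document_fromstring(loose, parser=XHTMLParser())
    xhtml_accepted = True
except etree.XMLSyntaxError:
    xhtml_accepted = False
```

In practice this suggests picking XHTMLParser only when the input is known to be well-formed XML, and HTMLParser otherwise.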

There are two functions for converting between HTML and XHTML. Both
modify the tree in place and return nothing:

    >>> from lxml.html import xhtml_to_html, html_to_xhtml

    >>> doc = document_fromstring(html, parser=HTMLParser())
    >>> tostring(doc)
    b'<html><body><p>Hi!</p></body></html>'

    >>> html_to_xhtml(doc)
    >>> tostring(doc)
    b'<html:html xmlns:html="http://www.w3.org/1999/xhtml"><html:body><html:p>Hi!</html:p></html:body></html:html>'

    >>> xhtml_to_html(doc)
    >>> tostring(doc)
    b'<html xmlns:html="http://www.w3.org/1999/xhtml"><body><p>Hi!</p></body></html>'
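
The serialized output above hides what the conversion does to the element names, and it shows that the `xmlns:html` declaration is left behind on the root after converting back. The following is a minimal sketch of both points; `XHTML_NS` and the other variable names are illustrative, and it assumes `etree.cleanup_namespaces` is an acceptable way to drop the unused declaration.

```python
from lxml import etree
from lxml.html import (
    HTMLParser, document_fromstring, html_to_xhtml, tostring, xhtml_to_html,
)

# Namespace URI copied from the serialized output shown above.
XHTML_NS = "http://www.w3.org/1999/xhtml"

doc = document_fromstring(
    "<html><body><p>Hi!</p></body></html>", parser=HTMLParser())

# html_to_xhtml moves every tag into the XHTML namespace, in place.
html_to_xhtml(doc)
tag_as_xhtml = doc.tag  # Clark notation: "{namespace}localname"

# xhtml_to_html strips the namespace from the tags again, in place.
xhtml_to_html(doc)
tag_as_html = doc.tag

# Assumption: the leftover xmlns:html declaration is unused at this point,
# so cleanup_namespaces can remove it before serializing.
etree.cleanup_namespaces(doc)
cleaned = tostring(doc)
```

The round trip restores the plain tag names, but the namespace declaration only disappears after the explicit cleanup step.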