>>> from lxml.html import document_fromstring, fragment_fromstring, tostring
lxml.html has two parsers, one for HTML, one for XHTML:
>>> from lxml.html import HTMLParser, XHTMLParser
>>> html = "<html><body><p>Hi!</p></body></html>"
>>> root = document_fromstring(html, parser=HTMLParser())
>>> print(root.tag)
html
>>> root = document_fromstring(html, parser=XHTMLParser())
>>> print(root.tag)
html
There are two functions for converting between HTML and XHTML:
>>> from lxml.html import xhtml_to_html, html_to_xhtml
>>> doc = document_fromstring(html, parser=HTMLParser())
>>> tostring(doc)
b'<html><body><p>Hi!</p></body></html>'
>>> html_to_xhtml(doc)
>>> tostring(doc)
b'<html:html xmlns:html="http://www.w3.org/1999/xhtml"><html:body><html:p>Hi!</html:p></html:body></html:html>'
>>> xhtml_to_html(doc)
>>> tostring(doc)
b'<html xmlns:html="http://www.w3.org/1999/xhtml"><body><p>Hi!</p></body></html>'