Blame TODO

Packit Service a31ea6
124907 HTML parse buffer problem when parsing large in-memory docs
124110 DTD validation && wrong namespace
123564 xmllint --html --format

           TODO for the XML parser and stuff:
	   ==================================

      $Id$

    this tends to be outdated :-\ ...

DOCS:
=====

- use case of using XInclude to load, for example, a description.
  order document + product base -(XSLT)-> quote with XIncludes
                                                   |
  HTML output with description of parts <---(XSLT)--

TODO:
=====
- XInclude at the SAX level (libSRVG)
- fix the C code prototype to bring back doc/libxml-undocumented.txt
  to a reasonable level
- Computation of base when HTTP redirect occurs, might affect HTTP
  interfaces.
- Computation of base in XInclude. Relativization of URIs.
- listing all attributes in a node.
- Better checking of external parsed entities TAG 1234
- Go through the errata and do the cleanup.
  http://www.w3.org/XML/xml-19980210-errata ... started ...
- jamesh suggestion: SAX like functions to save a document i.e. call a
  function to open a new element with given attributes, write character
  data, close last element, etc
  + inverted SAX, initial patch in April 2002 archives.
- htmlParseDoc has parameter encoding which is not used.
  Function htmlCreateDocParserCtxt ignores it.
- fix realloc() usage.
- Tighten the UTF-8 conformance (Martin Duerst):
  http://www.w3.org/2001/06/utf-8-test/.
  The bad files are in http://www.w3.org/2001/06/utf-8-wrong/.
- xml:id normalized value
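Note on the UTF-8 conformance item above: a minimal sketch of what a
strict check could look like. It assumes "strict" means rejecting overlong
forms, UTF-16 surrogates, code points above U+10FFFF and truncated
sequences. utf8_is_strict() is a hypothetical helper written for this
note, not an existing libxml2 function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libxml2.
 * Returns 1 if buf[0..len) is strictly valid UTF-8, 0 otherwise. */
static int utf8_is_strict(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;   /* decoded code point */
        size_t need;        /* number of continuation bytes expected */
        size_t k;

        if (c < 0x80) {                     /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3; cp = c & 0x07;
        } else {
            /* stray continuation byte, overlong lead 0xC0/0xC1,
             * or a lead byte above 0xF4 */
            return 0;
        }

        if (len - i <= need)                /* sequence is truncated */
            return 0;
        for (k = 1; k <= need; k++) {
            unsigned char t = buf[i + k];
            if ((t & 0xC0) != 0x80)         /* not a continuation byte */
                return 0;
            cp = (cp << 6) | (t & 0x3F);
        }
        if (need == 2 && cp < 0x800)        /* overlong 3-byte form */
            return 0;
        if (need == 3 && cp < 0x10000)      /* overlong 4-byte form */
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF)   /* UTF-16 surrogate */
            return 0;
        if (cp > 0x10FFFF)                  /* beyond Unicode range */
            return 0;
        i += need + 1;
    }
    return 1;
}
```

The files from the utf-8-wrong directory referenced above would be the
natural regression input for such a check.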

TODO:
=====

- move all string manipulation functions (xmlStrdup, xmlStrlen, etc.) to
  global.c. Bjorn noted that the following files depend on parser.o solely
  because of these string functions: entities.o, global.o, hash.o, tree.o,
  xmlIO.o, and xpath.o.

- Optimization of tag strings allocation ?

- maintain coherency of namespace when doing cut'n paste operations
  => the functions are coded, but need testing

- function to rebuild the ID table
- functions to rebuild the DTD hash tables (after DTD changes).


EXTENSIONS:
===========

- Tools to produce man pages from the SGML docs.

- Add XPointer recognition/API

- Add XLink recognition/API
  => started adding an xlink.[ch] with a unified API for XML and HTML.
     it's crap :-(

- Implement XSchemas
  => Really needs to be done <grin/>
  - datatypes are complete, but structure support is very limited.

- extend the shell with:
   - edit
   - load/save
   - mv (yum, yum, but it's harder because directories are ordered in
     our case, mvup and mvdown would be required)


Done:
=====

- Add HTML validation using the XHTML DTD
  - problem: do we want to keep and maintain the code for handling
    DTD/System ID cache directly in libxml ?
  => not really done that way, but there are new APIs to check elements
     or attributes. Otherwise XHTML validation directly ...

- XML Schemas datatypes except Base64 and BinHex

- Relax NG validation

- XmlTextReader streaming API + validation

- Add a DTD cache prefilled with xhtml DTDs and entities and a program to
  manage them -> like the /usr/bin/install-catalog from SGML
  right place seems $datadir/xmldtds
  Maybe this is better left to user apps
  => use a catalog instead, and xhtml1-dtd package

- Add output to XHTML
  => XML serializer automatically recognizes the DTD and applies the
     specific rules.

- Fix output of <tst val="x
y"/>

- compliance to XML-Namespace checking, see section 6 of
  http://www.w3.org/TR/REC-xml-names/

- Correct standalone checking/emitting (hard)
  2.9 Standalone Document Declaration

- Implement OASIS XML Catalog support
  http://www.oasis-open.org/committees/entity/

- Get OASIS testsuite to a more friendly result, check all the results
  once stable. The check-xml-test-suite.py script does this

- Implement XSLT
  => libxslt

- Finish XPath
  => attributes addressing troubles
  => defaulted attributes handling
  => namespace axis ?
  done as XSLT got debugged

- bug reported by Michael Meallin on validation problems
  => Actually means I need to add support (and warn) for non-deterministic
     content model.
- Handle undefined namespaces in entity contents better ... at least
  issue a warning
- DOM needs
  int xmlPruneProp(xmlNodePtr node, xmlAttrPtr attr);
  => done, it's actually xmlRemoveProp, xmlUnsetProp and xmlUnsetNsProp

- HTML: handling of Script and style data elements, need special code in
  the parser and saving functions (handling of < > " ' ...):
  http://www.w3.org/TR/html4/types.html#type-script
  Attributes are not a problem since entities are accepted.
- DOM needs
  xmlAttrPtr xmlNewDocProp(xmlDocPtr doc, const xmlChar *name, const xmlChar *value)
- problem when parsing hrefs with & with the HTML parser (IRC ac)
- If the internal encoding is not UTF8 saving to a given encoding doesn't
  work => fix to force UTF8 encoding ...
  done, added documentation too
- Add an ASCII I/O encoder (asciiToUTF8 and UTF8Toascii)
- Issue warning when using non-absolute namespace URIs.
- the html parser should add <head> and <body> if they don't exist
  started, not finished.
  Done, the automatic closing is added and 3 testcases were inserted
- Command to force the parser to stop parsing and ignore the rest of the file.
  xmlStopParser() should allow this, mostly untested
- support for HTML empty attributes like
- plugged iconv() in for support of a large set of encodings.
- xmlSwitchToEncoding() rewrite done
- URI checking (no fragments) rfc2396.txt
- Added a clean mechanism for overloading or adding input methods:
  xmlRegisterInputCallbacks()
- dynamically adapt the alloc entry point to use g_alloc()/g_free()
  if the programmer wants it:
    - use xmlMemSetup() to reset the routines used.
- Check attribute normalization especially xmlGetProp()
- Validity checking problems for NOTATIONS attributes
- Validity checking problems for ENTITY ENTITIES attributes
- Parsing of a well balanced chunk xmlParseBalancedChunkMemory()
- URI module: validation, base, etc ... see uri.[ch]
- turn tester into a generic program xmllint installed with libxml
- extend validity checks to go through entities content instead of
  just labelling them PCDATA
- Save DTDs using the children list instead of dumping the tables,
  order is preserved as well as comments and PIs
- Wrote a notice of changes required to go from 1.x to 2.x
- make sure that all SAX callbacks are disabled if a WF error is detected
- checking/handling of newline normalization
  http://localhost/www.xml.com/axml/target.html#sec-line-ends
- correct checking of '&' '%' on entities content.
- checking of PE/Nesting on entities declaration
- checking/handling of xml:space
   - checking done.
   - handling done, not well tested
- Language identification code, productions [33] to [38]
  => done, the check has been added and reports WFness errors
- Conditional sections in DTDs [61] to [65]
  => should this crap be really implemented ???
  => Yep OASIS testsuite uses them
- Allow parsed entities defined in the internal subset to override
  the ones defined in the external subset (DTD customization).
  => This means that the entity content should be computed only at
     use time, i.e. keep the orig string only at parse time and expand
     only when referenced from the external subset :-(
     Needed for complete use of most DTDs from Eve Maler
- Add regression tests for all WFC errors
  => did some in test/WFC
  => added OASIS testsuite routines
     http://xmlsoft.org/conf/result.html
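Note on the newline normalization item above: a minimal sketch of the
end-of-line rule from the XML 1.0 spec (CR LF and any CR not followed by
LF both become a single LF). normalize_newlines() is a hypothetical helper
written for this note; it is not how the libxml2 parser is structured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libxml2.
 * Rewrites buf[0..len) in place and returns the new length.
 * The output is never longer than the input, so no allocation is needed. */
static size_t normalize_newlines(char *buf, size_t len)
{
    size_t in = 0, out = 0;

    assert(buf != NULL || len == 0);
    while (in < len) {
        if (buf[in] == '\r') {
            buf[out++] = '\n';          /* CR or CR LF becomes one LF */
            in++;
            if (in < len && buf[in] == '\n')
                in++;                   /* swallow the LF of a CR LF pair */
        } else {
            buf[out++] = buf[in++];     /* everything else is copied as is */
        }
    }
    return out;
}
```

When input arrives in pieces, a CR at the very end of one piece and an LF
at the start of the next would need one extra bit of state to be treated
as a single line end; the sketch only handles a complete buffer.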

- I18N: http://wap.trondheim.com/vaer/index.phtml is not XML but is accepted
  by the XML parser, UTF-8 should be checked when there is no "encoding"
  declared !
- Support for UTF-8 and UTF-16 encoding
  => added some conversion routines provided by Martin Duerst
     patched them, got fixes from @@@
     I plan to keep everything internally as UTF-8 (or ISO-Latin-X)
     this is slightly more costly but more compact, and recent processors'
     efficiency is cache related. The key for good performance is keeping
     the data set small, so will I.
  => the new progressive reading routines call the detection code,
     which is enabled; tested the ISO->UTF-8 stuff
- External entities loading:
   - allow override by client code
   - make sure it is called for all external entities referenced
  Done, client code should use xmlSetExternalEntityLoader() to set
  the default loading routine. It will be called each time an external
  entity resolution is triggered.
- maintain ID coherency when removing/changing attributes
  The function used to deallocate attributes now checks for it being an
  ID and removes it from the table.
- push mode parsing i.e. non-blocking state based parser
  done, both for XML and HTML parsers. Use xmlCreatePushParserCtxt()
  and xmlParseChunk() and html counterparts.
  The tester program now has a --push option to select that parser
  front-end. Duplicated tests to use both and check results are similar.
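Note on the push mode item above: a toy illustration of what "state based"
means here. This is NOT libxml2 code and does not use
xmlCreatePushParserCtxt() or xmlParseChunk(); all toy_* names are
hypothetical. The point is only that every bit of parser state lives in a
context structure, so input can be cut at any byte and parsing resumes
correctly on the next call. The toy just counts start tags and deliberately
ignores comments, CDATA sections and attribute values.

```c
#include <assert.h>
#include <stddef.h>

/* Toy parser states: in character data, or just after a '<'. */
typedef enum { TOY_TEXT, TOY_AFTER_LT } toy_state;

/* Toy context: everything needed to resume between chunks lives here,
 * nothing is kept on the C stack. */
typedef struct {
    toy_state state;
    unsigned long start_tags;   /* start tags seen so far */
} toy_push_ctxt;

static void toy_push_init(toy_push_ctxt *ctxt)
{
    assert(ctxt != NULL);
    ctxt->state = TOY_TEXT;
    ctxt->start_tags = 0;
}

/* Feed one chunk. May be called any number of times; the result does
 * not depend on where the input was split. */
static void toy_push_chunk(toy_push_ctxt *ctxt, const char *chunk, size_t size)
{
    size_t i;

    assert(ctxt != NULL && (chunk != NULL || size == 0));
    for (i = 0; i < size; i++) {
        char c = chunk[i];

        if (ctxt->state == TOY_AFTER_LT) {
            /* "</", "<!" and "<?" do not open an element */
            if (c != '/' && c != '!' && c != '?')
                ctxt->start_tags++;
            ctxt->state = TOY_TEXT;
        } else if (c == '<') {
            ctxt->state = TOY_AFTER_LT;
        }
    }
}
```

Feeding "<a><b/></a><?pi?>" in one call, or split as "<a><", "b/></a" and
"><?pi?>", gives the same count of two start tags, which is the property
the --push regression runs mentioned above check for the real parsers.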

- Most of XPath, still see some troubles and occasional memleaks.
- an XML shell, allowing to traverse/manipulate an XML document with
  a shell like interface, and using XPath for the naming syntax
  - use of readline and history added when available
  - the shell interface has been cleanly separated and moved to debugXML.c
- HTML parser, should be fairly stable now
- API to search the lang of an attribute
- Collect IDs at parsing and maintain a table.
   PBM: maintain the table coherency
   PBM: how to detect ID types in absence of DTD !
- Use it for XPath ID support
- Add validity checking
  Should be finished now !
- Add regression tests with entity substitutions

- External Parsed entities, either XML or external Subset [78] and [79]
  parsing the xmllang DTD now works, so it should be sufficient for
  most cases !

- progressive reading. The entity support is a first step toward
  abstraction of an input stream. A large part of the context is still
  located on the stack, moving to a state machine and putting everything
  in the parsing context should provide an adequate solution.
  => Rather than progressive parsing, give more power to the SAX-like
     interface. Currently the DOM-like representation is built but
     => it should be possible to define that only as a set of SAX callbacks
	and remove the tree creation from the parser code.
	DONE

- DOM support, instead of using a proprietary in memory
  format for the document representation, the parser should
  call a DOM API to actually build the resulting document.
  Then the parser becomes independent of the in-memory
  representation of the document. Even better using RPC's
  the parser can actually build the document in another
  program.
  => Work started, now the internal representation is by default
     very near a direct DOM implementation. The DOM glue is implemented
     as a separate module. See the GNOME gdome module.

- C++ support : John Ehresman <jehresma@dsg.harvard.edu>
- Updated code to follow more recent specs, added compatibility flag
- Better error handling, use a dedicated, overridable error
  handling function.
- Support for CDATA.
- Keep track of line numbers for better error reporting.
- Support for PI (SAX one).
- Support for Comments (bad, should be in ASAP, they are parsed
  but not stored), should be configurable.
- Improve the support of entities on save (+SAX).