124907 HTML parse buffer problem when parsing large in-memory docs
124110 DTD validation && wrong namespace
123564 xmllint --html --format

           TODO for the XML parser and stuff:
	   ==================================

      $Id$

    this tends to be outdated :-\ ...

DOCS:
=====

- use case of using XInclude to load for example a description.
  order document + product base -(XSLT)-> quote with XIncludes
                                                   |
  HTML output with description of parts <---(XSLT)--

TODO:
=====
- XInclude at the SAX level (libSRVG)
- fix the C code prototype to bring back doc/libxml-undocumented.txt
  to a reasonable level
- Computation of base when HTTP redirect occurs, might affect HTTP
  interfaces.
- Computation of base in XInclude. Relativization of URIs.
- listing all attributes in a node.
- Better checking of external parsed entities TAG 1234
- Go through the errata and do the cleanup.
  http://www.w3.org/XML/xml-19980210-errata ... started ...
- jamesh suggestion: SAX like functions to save a document i.e. call a
  function to open a new element with given attributes, write character
  data, close last element, etc
  + inverted SAX, initial patch in April 2002 archives.
- htmlParseDoc has parameter encoding which is not used.
  Function htmlCreateDocParserCtxt ignores it.
- fix realloc() usage.
- Tighten the UTF8 conformance (Martin Duerst):
  http://www.w3.org/2001/06/utf-8-test/.
  The bad files are in http://www.w3.org/2001/06/utf-8-wrong/.
- xml:id normalized value
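
For the UTF8 conformance item in the list above, here is a minimal sketch of
what a stricter check could look like. strict_utf8_check() is a hypothetical
helper, not an existing libxml2 function, and the rules it enforces (no
overlong forms, no surrogates, nothing above U+10FFFF) are an assumption about
what "tighter" should mean here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of libxml2: returns 1 if buf[0..len) is
 * strictly well-formed UTF-8, 0 otherwise.  Rejects overlong encodings,
 * UTF-16 surrogates (U+D800..U+DFFF) and values above U+10FFFF.
 */
int strict_utf8_check(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest value legal for this sequence length */
        size_t extra;       /* number of continuation bytes expected */
        size_t k;

        if (c < 0x80) {                 /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                   /* stray continuation or 0xF8..0xFF */
        }

        if (len - i <= extra)
            return 0;                   /* sequence truncated by end of buffer */

        for (k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

        if (cp < min)
            return 0;                   /* overlong form */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogate */
        if (cp > 0x10FFFF)
            return 0;                   /* beyond the Unicode range */

        i += extra + 1;
    }
    return 1;
}
```

The files under the utf-8-wrong/ URL above would be natural inputs for such a
check; this sketch has not been run against them.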

TODO:
=====

- move all string manipulation functions (xmlStrdup, xmlStrlen, etc.) to
  global.c. Bjorn noted that the following files depend on parser.o solely
  because of these string functions: entities.o, global.o, hash.o, tree.o,
  xmlIO.o, and xpath.o.

- Optimization of tag strings allocation ?

- maintain coherency of namespace when doing cut'n paste operations
  => the functions are coded, but need testing

- function to rebuild the ID table
- functions to rebuild the DTD hash tables (after DTD changes).


EXTENSIONS:
===========

- Tools to produce man pages from the SGML docs.

- Add XPointer recognition/API

- Add XLink recognition/API
  => started adding an xlink.[ch] with a unified API for XML and HTML.
     it's crap :-(

- Implement XSchemas
  => Really needs to be done <grin/>
  - datatypes are complete, but structure support is very limited.

- extend the shell with:
   - edit
   - load/save
   - mv (yum, yum, but it's harder because directories are ordered in
     our case, mvup and mvdown would be required)
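
The mvup/mvdown idea for the shell can be sketched on a plain doubly linked
sibling list. Everything below is hypothetical: struct node is a stand-in that
only borrows the usual parent/children/last/prev/next layout, and node_mvup()
and node_mvdown() are illustrative names, not existing shell commands or
library functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tree node; only the links matter here. */
struct node {
    struct node *parent;    /* owning node, may be NULL */
    struct node *children;  /* first child */
    struct node *last;      /* last child */
    struct node *prev;      /* previous sibling */
    struct node *next;      /* next sibling */
};

/*
 * Swap n with its previous sibling, keeping the parent's first/last
 * child pointers coherent.  Returns 0 on success, -1 if n is NULL or
 * already the first sibling.
 */
int node_mvup(struct node *n)
{
    struct node *p, *before, *after;

    if (n == NULL || n->prev == NULL)
        return -1;

    p = n->prev;
    before = p->prev;
    after = n->next;

    /* resulting order: before, n, p, after */
    n->prev = before;
    n->next = p;
    p->prev = n;
    p->next = after;

    if (before != NULL)
        before->next = n;
    else if (n->parent != NULL)
        n->parent->children = n;

    if (after != NULL)
        after->prev = p;
    else if (n->parent != NULL)
        n->parent->last = p;

    return 0;
}

/* Moving n down is the same as moving its next sibling up. */
int node_mvdown(struct node *n)
{
    if (n == NULL || n->next == NULL)
        return -1;
    return node_mvup(n->next);
}
```

Because the order of siblings is significant, a plain mv to another parent
would still need a way to say where the node lands, which is what the two
relative moves provide.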


Done:
=====

- Add HTML validation using the XHTML DTD
  - problem: do we want to keep and maintain the code for handling
    DTD/System ID cache directly in libxml ?
  => not really done that way, but there are new APIs to check elements
     or attributes. Otherwise XHTML validation directly ...

- XML Schemas datatypes except Base64 and BinHex

- Relax NG validation

- XmlTextReader streaming API + validation

- Add a DTD cache prefilled with xhtml DTDs and entities and a program to
  manage them -> like the /usr/bin/install-catalog from SGML
  right place seems $datadir/xmldtds
  Maybe this is better left to user apps
  => use a catalog instead, and xhtml1-dtd package

- Add output to XHTML
  => XML serializer automatically recognizes the DTD and applies the specific
     rules.

- Fix output of <tst val="x&#xA;y"/>

- compliance to XML-Namespace checking, see section 6 of
  http://www.w3.org/TR/REC-xml-names/

- Correct standalone checking/emitting (hard)
  2.9 Standalone Document Declaration

- Implement OASIS XML Catalog support
  http://www.oasis-open.org/committees/entity/

- Get OASIS testsuite to a more friendly result, check all the results
  once stable. The check-xml-test-suite.py script does this

- Implement XSLT
  => libxslt

- Finish XPath
  => attributes addressing troubles
  => defaulted attributes handling
  => namespace axis ?
  done as XSLT got debugged

- bug reported by Michael Meallin on validation problems
  => Actually means I need to add support (and warn) for non-deterministic
     content model.
- Handle undefined namespaces in entity contents better ... at least
  issue a warning
- DOM needs
  int xmlPruneProp(xmlNodePtr node, xmlAttrPtr attr);
  => done, it's actually xmlRemoveProp, xmlUnsetProp, xmlUnsetNsProp

- HTML: handling of Script and style data elements, need special code in
  the parser and saving functions (handling of < > " ' ...):
  http://www.w3.org/TR/html4/types.html#type-script
  Attributes are not a problem since entities are accepted.
- DOM needs
  xmlAttrPtr xmlNewDocProp(xmlDocPtr doc, const xmlChar *name, const xmlChar *value)
- problem when parsing hrefs with & with the HTML parser (IRC ac)
- If the internal encoding is not UTF8 saving to a given encoding doesn't
  work => fix to force UTF8 encoding ...
  done, added documentation too
- Add an ASCII I/O encoder (asciiToUTF8 and UTF8Toascii)
- Issue warning when using non-absolute namespaces URI.
- the html parser should add <head> and <body> if they don't exist
  started, not finished.
  Done, the automatic closing is added and 3 testcases were inserted
- Command to force the parser to stop parsing and ignore the rest of the file.
  xmlStopParser() should allow this, mostly untested
- support for HTML empty attributes like
- plugged iconv() in for support of a large set of encodings.
- xmlSwitchToEncoding() rewrite done
- URI checks (no fragments) rfc2396.txt
- Added a clean mechanism for overloading or adding input methods:
  xmlRegisterInputCallbacks()
- dynamically adapt the alloc entry point to use g_alloc()/g_free()
  if the programmer wants it:
    - use xmlMemSetup() to reset the routines used.
- Check attribute normalization especially xmlGetProp()
- Validity checking problems for NOTATIONS attributes
- Validity checking problems for ENTITY ENTITIES attributes
- Parsing of a well balanced chunk xmlParseBalancedChunkMemory()
- URI module: validation, base, etc ... see uri.[ch]
- turn tester into a generic program xmllint installed with libxml
- extend validity checks to go through entities content instead of
  just labelling them PCDATA
- Save DTDs using the children list instead of dumping the tables,
  order is preserved as well as comments and PIs
- Wrote a notice of changes required to go from 1.x to 2.x
- make sure that all SAX callbacks are disabled if a WF error is detected
- checking/handling of newline normalization
  http://localhost/www.xml.com/axml/target.html#sec-line-ends
- correct checking of '&' '%' on entities content.
- checking of PE/Nesting on entities declaration
- checking/handling of xml:space
   - checking done.
   - handling done, not well tested
- Language identification code, productions [33] to [38]
  => done, the check has been added and reports WFness errors
- Conditional sections in DTDs [61] to [65]
  => should this crap be really implemented ???
  => Yep OASIS testsuite uses them
- Allow parsed entities defined in the internal subset to override
  the ones defined in the external subset (DTD customization).
  => This means that the entity content should be computed only at
     use time, i.e. keep the orig string only at parse time and expand
     only when referenced from the external subset :-(
     Needed for complete use of most DTDs from Eve Maler
- Add regression tests for all WFC errors
  => did some in test/WFC
  => added OASIS testsuite routines
     http://xmlsoft.org/conf/result.html
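
For the newline normalization item in the list above, the rule from the XML
spec (a CR LF pair, or a CR not followed by LF, becomes a single LF) can be
sketched as an in-place pass over a buffer. normalize_line_ends() is a
hypothetical name used for illustration only, and this is not how the parser
itself is claimed to do it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: rewrite buf[0..len) in place so that every
 * "\r\n" pair and every lone "\r" becomes "\n".  Returns the new
 * length, which is never larger than len.
 */
size_t normalize_line_ends(char *buf, size_t len)
{
    size_t in = 0, out = 0;

    assert(buf != NULL || len == 0);

    while (in < len) {
        char c = buf[in++];

        if (c == '\r') {
            /* swallow the LF of a CR LF pair so only one LF is emitted */
            if (in < len && buf[in] == '\n')
                in++;
            c = '\n';
        }
        buf[out++] = c;
    }
    return out;
}
```

A CR that is the very last byte of one input block and whose LF arrives in the
next block is not handled here; a progressive reader would have to remember
that pending CR between calls.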

- I18N: http://wap.trondheim.com/vaer/index.phtml is not XML and accepted
  by the XML parser, UTF-8 should be checked when there is no "encoding"
  declared !
- Support for UTF-8 and UTF-16 encoding
  => added some conversion routines provided by Martin Durst
     patched them, got fixes from @@@
     I plan to keep everything internally as UTF-8 (or ISO-Latin-X)
     this is slightly more costly but more compact, and recent processors'
     efficiency is cache related. The key to good performance is keeping
     the data set small, so will I.
  => the new progressive reading routines call the detection code
     is enabled, tested the ISO->UTF-8 stuff
- External entities loading:
   - allow override by client code
   - make sure it is called for all external entities referenced
  Done, client code should use xmlSetExternalEntityLoader() to set
  the default loading routine. It will be called each time an external
  entity resolution is triggered.
- maintain ID coherency when removing/changing attributes
  The function used to deallocate attributes now checks for it being an
  ID and removes it from the table.
- push mode parsing i.e. non-blocking state based parser
  done, both for XML and HTML parsers. Use xmlCreatePushParserCtxt()
  and xmlParseChunk() and html counterparts.
  The tester program now has a --push option to select that parser
  front-end. Duplicated tests to use both and check results are similar.
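
As an illustration of the ISO->UTF-8 step mentioned in the encoding item
above, here is a minimal sketch of an ISO-8859-1 to UTF-8 conversion.
latin1_to_utf8() is a hypothetical function with its own signature; it is not
one of the conversion routines referred to in the list.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert inlen bytes of ISO-8859-1 into UTF-8.
 * Returns the number of bytes written to out, or (size_t)-1 if the
 * output buffer of outlen bytes is too small.
 */
size_t latin1_to_utf8(const unsigned char *in, size_t inlen,
                      unsigned char *out, size_t outlen)
{
    size_t i, o = 0;

    assert(in != NULL || inlen == 0);
    assert(out != NULL || outlen == 0);

    for (i = 0; i < inlen; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* ASCII maps to itself */
            if (outlen - o < 1)
                return (size_t)-1;
            out[o++] = c;
        } else {
            /* U+0080..U+00FF always take exactly two bytes in UTF-8 */
            if (outlen - o < 2)
                return (size_t)-1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}
```

The worst case doubles the input size, which is the cost side of the
"slightly more costly but more compact" trade-off when the data is mostly
ASCII.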

- Most of XPath, still see some troubles and occasional memleaks.
- an XML shell, allowing to traverse/manipulate an XML document with
  a shell like interface, and using XPath for the naming syntax
  - use of readline and history added when available
  - the shell interface has been cleanly separated and moved to debugXML.c
- HTML parser, should be fairly stable now
- API to search the lang of an attribute
- Collect IDs at parsing and maintain a table.
   PBM: maintain the table coherency
   PBM: how to detect ID types in absence of DTD !
- Use it for XPath ID support
- Add validity checking
  Should be finished now !
- Add regression tests with entity substitutions

- External Parsed entities, either XML or external Subset [78] and [79]
  parsing the xmllang DTD now works, so it should be sufficient for
  most cases !

- progressive reading. The entity support is a first step toward
  abstraction of an input stream. A large part of the context is still
  located on the stack, moving to a state machine and putting everything
  in the parsing context should provide an adequate solution.
  => Rather than progressive parsing, give more power to the SAX-like
     interface. Currently the DOM-like representation is built but
     => it should be possible to define that only as a set of SAX callbacks
	and remove the tree creation from the parser code.
	DONE
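
The "state machine plus everything in the parsing context" idea from the
progressive reading item can be sketched with a toy scanner. All names here
(scan_ctxt, scan_init, scan_chunk) are hypothetical, and the scanner only
counts start tags; it ignores comments, CDATA sections and '>' inside quoted
attribute values, so it is an illustration of the technique and not a parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Where the scanner is between two input bytes. */
enum scan_state {
    SCAN_TEXT,  /* outside any markup */
    SCAN_LT,    /* just saw '<', next byte decides the tag kind */
    SCAN_TAG    /* inside a tag, waiting for '>' */
};

/* All state lives here, nothing is kept on the C stack between calls. */
struct scan_ctxt {
    enum scan_state state;
    unsigned long start_tags;
};

void scan_init(struct scan_ctxt *ctxt)
{
    assert(ctxt != NULL);
    ctxt->state = SCAN_TEXT;
    ctxt->start_tags = 0;
}

/* Feed one block of input; blocks may be cut at any byte boundary. */
void scan_chunk(struct scan_ctxt *ctxt, const char *chunk, size_t len)
{
    size_t i;

    assert(ctxt != NULL);
    assert(chunk != NULL || len == 0);

    for (i = 0; i < len; i++) {
        char c = chunk[i];

        switch (ctxt->state) {
        case SCAN_TEXT:
            if (c == '<')
                ctxt->state = SCAN_LT;
            break;
        case SCAN_LT:
            /* a name start character means this is a start tag */
            if (isalpha((unsigned char)c) || c == '_' || c == ':')
                ctxt->start_tags++;
            ctxt->state = (c == '>') ? SCAN_TEXT : SCAN_TAG;
            break;
        case SCAN_TAG:
            if (c == '>')
                ctxt->state = SCAN_TEXT;
            break;
        }
    }
}
```

Since the caller owns the context, it can return to its own event loop between
calls, which is the property the push mode relies on.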

- DOM support, instead of using a proprietary in-memory
  format for the document representation, the parser should
  call a DOM API to actually build the resulting document.
  Then the parser becomes independent of the in-memory
  representation of the document. Even better using RPCs
  the parser can actually build the document in another
  program.
  => Work started, now the internal representation is by default
     very near a direct DOM implementation. The DOM glue is implemented
     as a separate module. See the GNOME gdome module.

- C++ support : John Ehresman <jehresma@dsg.harvard.edu>
- Updated code to follow more recent specs, added compatibility flag
- Better error handling, use a dedicated, overridable error
  handling function.
- Support for CDATA.
- Keep track of line numbers for better error reporting.
- Support for PI (SAX one).
- Support for Comments (bad, should be in ASAP, they are parsed
  but not stored), should be configurable.
- Improve the support of entities on save (+SAX).