Blame result/intsubset2.xml.sax2

Packit 423ecb
SAX.setDocumentLocator()
SAX.startDocument()
SAX.internalSubset(kanjidic2, , )
SAX.comment( Version 1.3
	This is the DTD of the XML-format kanji file combining information from
	the KANJIDIC and KANJD212 files. It is intended to be largely self-
	documenting, with each field being accompanied by an explanatory
	comment.

	The file covers the following kanji:
	(a) the 6,355 kanji from JIS X 0208;
	(b) the 5,801 kanji from JIS X 0212;
	(c) the 3,625 kanji from JIS X 0213 as follows:
		(i) the 2,741 kanji which are also in JIS X 0212 have
		JIS X 0213 code-points (kuten) added to the existing entry;
		(ii) the 884 "new" kanji have new entries.

	At the end of the explanation for a number of fields there is a tag
	with the format [N]. This indicates the leading letter(s) of the
	equivalent field in the KANJIDIC and KANJD212 files.

	The KANJIDIC documentation should also be read for additional 
	information about the information in the file.
	)
SAX.elementDecl(kanjidic2, 4, ...)
SAX.elementDecl(header, 4, ...)
SAX.comment(
	The single header element will contain identification information
	about the version of the file 
	)
SAX.elementDecl(file_version, 3, ...)
SAX.comment(
	This field denotes the version of kanjidic2 structure, as more
	than one version may exist.
	)
SAX.elementDecl(database_version, 3, ...)
SAX.comment(
	The version of the file, in the format YYYY-NN, where NN will be
	a number starting with 01 for the first version released in a
	calendar year, then increasing for each version in that year.
	)
SAX.elementDecl(date_of_creation, 3, ...)
SAX.comment(
	The date the file was created in international format (YYYY-MM-DD).
	)
SAX.elementDecl(character, 4, ...)
SAX.elementDecl(literal, 3, ...)
SAX.comment(
	The character itself in UTF8 coding.
	)
SAX.elementDecl(codepoint, 4, ...)
SAX.comment( 
	The codepoint element states the code of the character in the various
	character set standards.
	)
SAX.elementDecl(cp_value, 3, ...)
SAX.comment( 
	The cp_value contains the codepoint of the character in a particular
	standard. The standard will be identified in the cp_type attribute.
	)
SAX.attributeDecl(cp_value, cp_type, 1, 2, NULL, ...)
SAX.comment( 
	The cp_type attribute states the coding standard applying to the
	element. The values assigned so far are:
		jis208 - JIS X 0208-1997 - kuten coding (nn-nn)
		jis212 - JIS X 0212-1990 - kuten coding (nn-nn)
		jis213 - JIS X 0213-2000 - kuten coding (p-nn-nn)
		ucs - Unicode 4.0 - hex coding (4 or 5 hexadecimal digits)
	)
SAX.elementDecl(radical, 4, ...)
SAX.elementDecl(rad_value, 3, ...)
SAX.comment( 
	The radical number, in the range 1 to 214. The particular
	classification type is stated in the rad_type attribute.
	)
SAX.attributeDecl(rad_value, rad_type, 1, 2, NULL, ...)
SAX.comment( 
	The rad_type attribute states the type of radical classification.
		classical - as recorded in the KangXi Zidian.
		nelson - as used in the Nelson "Modern Japanese-English 
		Character Dictionary" (i.e. the Classic, not the New Nelson).
		This will only be used where Nelson reclassified the kanji.
	)
SAX.elementDecl(misc, 4, ...)
SAX.elementDecl(grade, 3, ...)
SAX.comment( 
	The Jouyou Kanji grade level. 1 through 6 indicate the grade in which
	the kanji is taught in Japanese schools. 8 indicates it is one of the
	remaining Jouyou Kanji to be learned in junior high school, and 9 
	indicates it is a Jinmeiyou (for use in names) kanji. [G]
	)
SAX.elementDecl(stroke_count, 3, ...)
SAX.comment( 
	The stroke count of the kanji, including the radical. If more than 
	one, the first is considered the accepted count, while subsequent ones 
	are common miscounts. (See Appendix E. of the KANJIDIC documentation
	for some of the rules applied when counting strokes in some of the 
	radicals.) [S]
	)
SAX.elementDecl(variant, 3, ...)
SAX.comment( 
	A cross-reference code to another kanji, usually regarded as a variant.
	The type of cross-reference is given in the var_type attribute.
	)
SAX.attributeDecl(variant, var_type, 1, 2, NULL, ...)
SAX.comment( 
	The var_type attribute indicates the type of variant code. The current
	values are: 
		jis208 - in JIS X 0208 - kuten coding
		jis212 - in JIS X 0212 - kuten coding
		jis213 - in JIS X 0213 - kuten coding
		deroo - De Roo number - numeric
		njecd - Halpern NJECD index number - numeric
		s_h - The Kanji Dictionary (Spahn & Hadamitzky) - descriptor
		nelson - "Classic" Nelson - numeric
		oneill - Japanese Names (O'Neill) - numeric
	)
SAX.elementDecl(freq, 3, ...)
SAX.comment( 
	A frequency-of-use ranking. The 2,500 most-used characters have a 
	ranking; those characters that lack this field are not ranked. The 
	frequency is a number from 1 to 2,500 that expresses the relative 
	frequency of occurrence of a character in modern Japanese. This is
	based on a survey in newspapers, so it is biassed towards kanji
	used in newspaper articles. The discrimination between the less
	frequently used kanji is not strong.
	)
SAX.elementDecl(rad_name, 3, ...)
SAX.comment( 
	When the kanji is itself a radical and has a name, this element
	contains the name (in hiragana.) [T2]
	)
SAX.elementDecl(dic_number, 4, ...)
SAX.comment( 
	This element contains the index numbers and similar unstructured
	information such as page numbers in a number of published dictionaries,
	and instructional books on kanji.
	)
SAX.elementDecl(dic_ref, 3, ...)
SAX.comment( 
	Each dic_ref contains an index number. The particular dictionary,
	etc. is defined by the dr_type attribute.
	)
SAX.attributeDecl(dic_ref, dr_type, 1, 2, NULL, ...)
SAX.comment( 
	The dr_type defines the dictionary or reference book, etc. to which
	dic_ref element applies. The initial allocation is:
	  nelson_c - "Modern Reader's Japanese-English Character Dictionary",  
	  	edited by Andrew Nelson (now published as the "Classic" 
	  	Nelson).
	  nelson_n - "The New Nelson Japanese-English Character Dictionary", 
	  	edited by John Haig.
	  halpern_njecd - "New Japanese-English Character Dictionary", 
	  	edited by Jack Halpern.
	  halpern_kkld - "Kanji Learners Dictionary" (Kodansha) edited by 
	  	Jack Halpern.
	  heisig - "Remembering The  Kanji"  by  James Heisig.
	  gakken - "A  New Dictionary of Kanji Usage" (Gakken)
	  oneill_names - "Japanese Names", by P.G. O'Neill. 
	  oneill_kk - "Essential Kanji" by P.G. O'Neill.
	  moro - "Daikanwajiten" compiled by Morohashi. For some kanji two
	  	additional attributes are used: m_vol:  the volume of the
	  	dictionary in which the kanji is found, and m_page: the page
	  	number in the volume.
	  henshall - "A Guide To Remembering Japanese Characters" by
	  	Kenneth G.  Henshall.
	  sh_kk - "Kanji and Kana" by Spahn and Hadamitzky.
	  sakade - "A Guide To Reading and Writing Japanese" edited by
	  	Florence Sakade.
	  henshall3 - "A Guide To Reading and Writing Japanese" 3rd
		edition, edited by Henshall, Seeley and De Groot.
	  tutt_cards - Tuttle Kanji Cards, compiled by Alexander Kask.
	  crowley - "The Kanji Way to Japanese Language Power" by
	  	Dale Crowley.
	  kanji_in_context - "Kanji in Context" by Nishiguchi and Kono.
	  busy_people - "Japanese For Busy People" vols I-III, published
		by the AJLT. The codes are the volume.chapter.
	  kodansha_compact - the "Kodansha Compact Kanji Guide".
	)
SAX.attributeDecl(dic_ref, m_vol, 1, 3, NULL, ...)
SAX.comment( 
	See above under "moro".
	)
SAX.attributeDecl(dic_ref, m_page, 1, 3, NULL, ...)
SAX.comment( 
	See above under "moro".
	)
SAX.elementDecl(query_code, 4, ...)
SAX.comment( 
	These codes contain information relating to the glyph, and can be used
	for finding a required kanji. The type of code is defined by the
	qc_type attribute.
	)
SAX.elementDecl(q_code, 3, ...)
SAX.comment(
	The q_code contains the actual query-code value, according to the
	qc_type attribute.
	)
SAX.attributeDecl(q_code, qc_type, 1, 2, NULL, ...)
SAX.comment( 
	The q_code attribute defines the type of query code. The current values
	are:
	  skip -  Halpern's SKIP (System  of  Kanji  Indexing  by  Patterns) 
	  	code. The  format is n-nn-nn.  See the KANJIDIC  documentation 
	  	for  a description of the code and restrictions on  the 
	  	commercial  use  of this data. [P]

	  sh_desc - the descriptor codes for The Kanji Dictionary (Tuttle 
	  	1996) by Spahn and Hadamitzky. They are in the form nxnn.n,  
	  	e.g.  3k11.2, where the  kanji has 3 strokes in the 
	  	identifying radical, it is radical "k" in the SH 
	  	classification system, there are 11 other strokes, and it is 
	  	the 2nd kanji in the 3k11 sequence. (I am very grateful to 
	  	Mark Spahn for providing the list of these descriptor codes 
	  	for the kanji in this file.) [I]
	  four_corner - the "Four Corner" code for the kanji. This is a code 
	  	invented by Wang Chen in 1928. See the KANJIDIC documentation 
	  	for  an overview of  the Four Corner System. [Q]

	  deroo - the codes developed by the late Father Joseph De Roo, and 
	  	published in  his book "2001 Kanji" (Bojinsha). Fr De Roo 
	  	gave his permission for these codes to be included. [DR]
	  misclass - a possible misclassification of the kanji according
		to one of the code types. (See the "Z" codes in the KANJIDIC
		documentation for more details.)
	  
	)
SAX.elementDecl(reading_meaning, 4, ...)
SAX.comment( 
	The readings for the kanji in several languages, and the meanings, also
	in several languages. The readings and meanings are grouped to enable
	the handling of the situation where the meaning is differentiated by 
	reading. [T1]
	)
SAX.elementDecl(nanori, 3, ...)
SAX.comment( 
	Japanese readings that are now only associated with names.
	)
SAX.elementDecl(rmgroup, 4, ...)
SAX.elementDecl(reading, 3, ...)
SAX.comment( 
	The reading element contains the reading or pronunciation
	of the kanji.
	)
SAX.attributeDecl(reading, r_type, 1, 2, NULL, ...)
SAX.comment( 
	The r_type attribute defines the type of reading in the reading
	element. The current values are:
	  pinyin - the modern PinYin romanization of the Chinese reading 
	  	of the kanji. The tones are represented by a concluding 
	  	digit. [Y]
	  korean_r - the romanized form of the Korean reading(s) of the 
	  	kanji.  The readings are in the (Republic of Korea) Ministry 
	  	of Education style of romanization. [W]
	  korean_h - the Korean reading(s) of the kanji in hangul.
	  ja_on - the "on" Japanese reading of the kanji, in katakana. A
	  	second attribute r_status, if present, will indicate with
	  	a value of "jy" whether the reading is approved for a
	  	"Jouyou kanji".
	  ja_kun - the "kun" Japanese reading of the kanji, in hiragana. 
	  	Where relevant the okurigana is also included separated by a 
	  	".". Readings associated with prefixes and suffixes are 
	  	marked with a "-". A second attribute r_status, if present, 
	  	will indicate with a value of "jy" whether the reading is 
	  	approved for a "Jouyou kanji".
	)
SAX.attributeDecl(reading, r_status, 1, 3, NULL, ...)
SAX.comment( 
	See under ja_on and ja_kun above.
	)
SAX.elementDecl(meaning, 3, ...)
SAX.comment( 
	The meaning associated with the kanji.
	)
SAX.attributeDecl(meaning, m_lang, 1, 3, NULL, ...)
SAX.comment( 
	The m_lang attribute defines the target language of the meaning. It 
	will be coded using the two-letter language code from the ISO 639 
	standard. When absent, the value "en" (i.e. English) is implied. [{}]
	)
SAX.externalSubset(kanjidic2, , )
SAX.startElementNs(kanjidic2, NULL, NULL, 0, 0, 0)
SAX.characters(
, 1)
SAX.endElementNs(kanjidic2, NULL, NULL)
SAX.endDocument()