Definitions

The parser uses the constants defined below to describe node types, attribute quoting, and node metadata.

Node Types

The type of a node is determined during parsing and is represented by one of the constants listed below.

Type               Description
HDOM_TYPE_ELEMENT  Start tag (e.g. <html>)
HDOM_TYPE_COMMENT  HTML comment (e.g. <!-- Hello, World! -->)
HDOM_TYPE_TEXT     Plain text (e.g. Hello, World!)
HDOM_TYPE_ENDTAG   End tag (e.g. </html>)
HDOM_TYPE_ROOT     Root element. The DOM always contains exactly one root element.
HDOM_TYPE_UNKNOWN  Unknown type (e.g. CDATA, DOCTYPE)

Example

<!DOCTYPE html><html><!-- Hello, World! --></html>Hello, World!

Note: HDOM_TYPE_ROOT always exists regardless of the actual document structure.

HTML                    Node Type
(none)                  HDOM_TYPE_ROOT
<!DOCTYPE html>         HDOM_TYPE_UNKNOWN
<html>                  HDOM_TYPE_ELEMENT
<!-- Hello, World! -->  HDOM_TYPE_COMMENT
</html>                 HDOM_TYPE_ENDTAG
Hello, World!           HDOM_TYPE_TEXT
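
The mapping in the table above can be reproduced with a small tokenizer. The following is a minimal sketch in Python for illustration only; it is not the parser's own code. The names NodeType, TOKEN, classify and node_types are hypothetical, and the rule that any tag starting with "<!" or "<?" is unknown is an assumption based on the table.

```python
import re
from enum import Enum


class NodeType(Enum):
    # Values mirror the constant names above; the enum itself is hypothetical.
    ELEMENT = "HDOM_TYPE_ELEMENT"
    COMMENT = "HDOM_TYPE_COMMENT"
    TEXT = "HDOM_TYPE_TEXT"
    ENDTAG = "HDOM_TYPE_ENDTAG"
    ROOT = "HDOM_TYPE_ROOT"
    UNKNOWN = "HDOM_TYPE_UNKNOWN"


# A token is a comment, any other tag, or a run of text without "<".
TOKEN = re.compile(r"<!--.*?-->|<[^>]*>|[^<]+", re.DOTALL)


def classify(token):
    """Return the node type for a single token."""
    if token.startswith("<!--"):
        return NodeType.COMMENT
    if token.startswith("</"):
        return NodeType.ENDTAG
    if token.startswith("<!") or token.startswith("<?"):
        # Assumption: DOCTYPE, CDATA and similar constructs are unknown.
        return NodeType.UNKNOWN
    if token.startswith("<"):
        return NodeType.ELEMENT
    return NodeType.TEXT


def node_types(html):
    """Return (token, type) pairs, starting with the implicit root."""
    nodes = [("", NodeType.ROOT)]
    nodes.extend((token, classify(token)) for token in TOKEN.findall(html))
    return nodes


if __name__ == "__main__":
    sample = "<!DOCTYPE html><html><!-- Hello, World! --></html>Hello, World!"
    for token, node_type in node_types(sample):
        print(repr(token), node_type.value)
```

The root entry is added unconditionally, which reflects the note that HDOM_TYPE_ROOT exists regardless of the document structure.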

Quote Types

The quote type identifies how an attribute value is quoted.

Type               Description
HDOM_QUOTE_DOUBLE  Double quotes ("")
HDOM_QUOTE_SINGLE  Single quotes ('')
HDOM_QUOTE_NO      Not quoted (flag)

Note: Attributes with no values (flags) are stored as HDOM_QUOTE_NO.

Example

<p class="paragraph" id='info1' hidden>Hello, World!</p>

Attribute          Quote Type
class="paragraph"  HDOM_QUOTE_DOUBLE
id='info1'         HDOM_QUOTE_SINGLE
hidden             HDOM_QUOTE_NO
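
How a quote type could be derived from a start tag is sketched below in Python, for illustration only; it is not the parser's own code. The names ATTR and quote_types are hypothetical, the input is assumed to be a single start tag, and treating unquoted values the same as flags is an assumption.

```python
import re

# One attribute: a name, optionally followed by "=" and a value.
ATTR = re.compile(r"""([^\s=/>"']+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s>]+))?""")


def quote_types(start_tag):
    """Map each attribute name in a single start tag to a quote type name."""
    tag_name = re.match(r"<\s*[^\s/>]+", start_tag)
    if tag_name is None:
        return {}
    # Everything after the tag name, without the closing bracket.
    attributes = start_tag[tag_name.end():].rstrip(">")
    result = {}
    for match in ATTR.finditer(attributes):
        name, value = match.group(1), match.group(2)
        if value is not None and value.startswith('"'):
            result[name] = "HDOM_QUOTE_DOUBLE"
        elif value is not None and value.startswith("'"):
            result[name] = "HDOM_QUOTE_SINGLE"
        else:
            # Flags and (by assumption) unquoted values.
            result[name] = "HDOM_QUOTE_NO"
    return result


if __name__ == "__main__":
    print(quote_types("<p class=\"paragraph\" id='info1' hidden>"))
```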

Node Info Types

Each node stores additional information (metadata) that is identified by the constants below.

Type                Description
HDOM_INFO_BEGIN     Cursor position for the start tag of a node.
HDOM_INFO_END       Cursor position for the end tag of a node. A value of zero indicates a node with no end tag (missing closing tag).
HDOM_INFO_QUOTE     Quote type for attribute values. The value must be one of the Quote Types.
HDOM_INFO_SPACE     Array of whitespace around attributes (see Attribute Whitespace).
HDOM_INFO_TEXT      Non-HTML text in tags (e.g. comments, doctype).
HDOM_INFO_INNER     Inner text of a node.
HDOM_INFO_OUTER     Outer text of a node.
HDOM_INFO_ENDSPACE  Whitespace at the end of a tag before the closing bracket.

Attribute Whitespace

Whitespace around attributes is stored in the form of an array with three elements:

Element  Description
0        Whitespace before the attribute name.
1        Whitespace between the attribute name and the equals sign.
2        Whitespace between the equals sign and the attribute value.

Example

<p class="paragraph" id = 'info1'hidden>Hello, World!</p>

Note: Browsers do not display whitespace before attribute names. It is, however, stored as part of the attribute.

Attribute          Whitespace
class="paragraph"  [0] => ' ', [1] => '', [2] => ''
id = 'info1'       [0] => ' ', [1] => ' ', [2] => ' '
hidden             [0] => '', [1] => '', [2] => ''
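
The three-element whitespace array can be illustrated with a short Python sketch. It is for illustration only and is not the parser's own code; the names ATTR and attribute_whitespace are hypothetical, and the input is assumed to be a single start tag.

```python
import re

# Groups: (1) whitespace before the name, (2) the name,
# (3) whitespace before "=", (4) whitespace after "=", (5) the value.
ATTR = re.compile(
    r"""(\s*)([^\s=/>"']+)(?:(\s*)=(\s*)("[^"]*"|'[^']*'|[^\s>]+))?"""
)


def attribute_whitespace(start_tag):
    """Map each attribute name to its [before, before "=", after "="] whitespace."""
    tag_name = re.match(r"<\s*[^\s/>]+", start_tag)
    if tag_name is None:
        return {}
    # Everything after the tag name, without the closing bracket.
    attributes = start_tag[tag_name.end():].rstrip(">")
    result = {}
    for match in ATTR.finditer(attributes):
        before, name, pre_equals, post_equals, _value = match.groups()
        # Flags have no "=" part, so groups 3 and 4 are None; store ''.
        result[name] = [before, pre_equals or "", post_equals or ""]
    return result


if __name__ == "__main__":
    tag = "<p class=\"paragraph\" id = 'info1'hidden>"
    for name, spaces in attribute_whitespace(tag).items():
        print(name, spaces)
```

Because hidden directly follows the closing quote of id, its first element is an empty string, which matches the last row of the table.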