Definitions
The definitions below are an essential part of the parser.
Node Types
The type of a node is determined during parsing and represented by one of the elements in the list below.
Type | Description |
---|---|
HDOM_TYPE_ELEMENT |
Start tag (i.e. <html> ) |
HDOM_TYPE_COMMENT |
HTML comment (i.e. <!-- Hello, World! --> ) |
HDOM_TYPE_TEXT |
Plain text (i.e. Hello, World! ) |
HDOM_TYPE_ENDTAG |
End tag (i.e. </html> ) |
HDOM_TYPE_ROOT |
Root element. There can always only be one root element in the DOM. |
HDOM_TYPE_UNKNOWN |
Unknown type (i.e. CDATA, DOCTYPE, etc...) |
Example
<!DOCTYPE html><html><!-- Hello, World! --></html>Hello, World!
Note: HDOM_TYPE_ROOT
always exists regardless of the actual document structure.
HTML | Node Type |
---|---|
HDOM_TYPE_ROOT |
|
<!DOCTYPE html> |
HDOM_TYPE_UNKNOWN |
<html> |
HDOM_TYPE_ELEMENT |
<!-- Hello, World! --> |
HDOM_TYPE_COMMENT |
</html> |
HDOM_TYPE_ENDTAG |
Hello, World! |
HDOM_TYPE_TEXT |
Quote Types
Identifies the quoting type on attribute values.
Type | Description |
---|---|
HDOM_QUOTE_DOUBLE |
Double quotes ("" ) |
HDOM_QUOTE_SINGLE |
Single quotes ('' ) |
HDOM_QUOTE_NO |
Not quoted (flag) |
Note: Attributes with no values (flags) are stored as HDOM_QUOTE_NO
.
Example
<p class="paragraph" id='info1' hidden>Hello, World!</p>
Attribute | Description |
---|---|
class="paragraph" |
HDOM_QUOTE_DOUBLE |
id='info1' |
HDOM_QUOTE_SINGLE |
hidden |
HDOM_QUOTE_NO |
Node Info Types
Each node stores additional information (metadata) that is identified by the elements below.
Type | Description |
---|---|
HDOM_INFO_BEGIN |
Cursor position for the start tag of a node. |
HDOM_INFO_END |
Cursor position for the end tag of a node. A value of zero indicates a node with no end tag (missing closing tag). |
HDOM_INFO_QUOTE |
Quote type for attribute values. The value must be an element of Quote Type. |
HDOM_INFO_SPACE |
Array of whitespace around attributes (see Attribute Whitespace). |
HDOM_INFO_TEXT |
Non-HTML text in tags (i.e. comments, doctype, etc...). |
HDOM_INFO_INNER |
Inner text of a node. |
HDOM_INFO_OUTER |
Outer text of a node. |
HDOM_INFO_ENDSPACE |
Whitespace at the end of a tag before the closing bracket. |
Attribute Whitespace
Whitespace around attributes is stored in the form of an array with three elements:
Element | Description |
---|---|
0 |
Whitespace before the attribute name. |
1 |
Whitespace between attribute name and the equal sign. |
2 |
Whitespace between the equal sign and the attribute value |
Example
<p class="paragraph" id = 'info1'hidden>Hello, World!</p>
Note: Whitespace before attribute names is not displayed in the browser. It is, however, part of the attributes.
Attribute | Description |
---|---|
class="paragraph" |
[0] => ' ', [1] => '', [2] => '' |
id = 'info1' |
[0] => ' ', [1] => ' ', [2] => ' ' |
hidden |
[0] => '', [1] => '', [2] => '' |