Parsing documents
The parser accepts documents as URLs, files, and strings. The document
must be readable and must not exceed MAX_FILE_SIZE.
Name | Description |
---|---|
str_get_html( string $content ) : object | Creates a DOM object from a string. |
file_get_html( string $filename ) : object | Creates a DOM object from a file or URL. |
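
A minimal sketch of the two helper functions follows. It assumes the library file is available as `simple_html_dom.php` next to the script; that path, the sample markup, and the example URL are illustrative assumptions. The check for `false` reflects how common 1.x releases of the library behave when input is empty or too large, so treat it as an assumption for other versions.

```php
<?php
// Assumption: the library file sits next to this script under this name.
require_once 'simple_html_dom.php';

// Parse markup that is already held in a string.
$html = str_get_html('<div id="main"><a href="/a">First</a><a href="/b">Second</a></div>');

// Assumed behaviour (1.x releases): false is returned when the input is
// empty or larger than MAX_FILE_SIZE, so check before using the result.
if ($html === false) {
    exit("Could not parse the document\n");
}

$links = $html->find('a');      // array of element objects
echo count($links), "\n";
echo $links[0]->href, "\n";

// file_get_html() takes a path or URL instead of the markup itself, e.g.:
// $page = file_get_html('http://www.example.com/');
```

Checking the return value before calling `find()` avoids a fatal error on a non-object when the document is rejected.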
DOM methods & properties
Name | Description |
---|---|
__construct( [string $filename] ) : void | Constructor. If $filename is set, the contents are loaded automatically, whether given as text or as a file/URL. |
plaintext : string | Returns the plain-text contents extracted from the HTML. |
clear() : void | Cleans up memory. |
load( string $content ) : void | Loads contents from a string. |
save( [string $filename] ) : string | Dumps the internal DOM tree back into a string. If $filename is set, the resulting string is also saved to that file. |
load_file( string $filename ) : void | Loads contents from a file or a URL. |
set_callback( string $function_name ) : void | Sets a callback function. |
find( string $selector [, int $index] ) : mixed | Finds elements by CSS selector. Returns the Nth element object if $index is set; otherwise returns an array of objects. |
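
The object-oriented calls in the table above can be sketched as follows. The class name `simple_html_dom` is the one used by the 1.x releases and is an assumption here, as are the include path and the sample markup.

```php
<?php
// Assumption: the library file sits next to this script under this name.
require_once 'simple_html_dom.php';

// Assumption: the DOM class is called simple_html_dom (1.x releases).
$dom = new simple_html_dom();
$dom->load('<ul><li>one</li><li>two</li></ul>');

$items  = $dom->find('li');      // no index: array of element objects
$second = $dom->find('li', 1);   // index given (zero-based): one element

echo count($items), "\n";
echo $second->innertext, "\n";

$markup = $dom->save();          // dump the tree back into a string
echo $markup, "\n";

// Release the tree once it is no longer needed.
$dom->clear();
unset($dom);
```

Calling `clear()` when a document is finished is mainly useful in long-running scripts that parse many documents in a loop.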
Element methods & properties
Name | Description |
---|---|
[attribute] : string | Reads or writes the element's attribute value. |
tag : string | Reads or writes the tag name of the element. |
outertext : string | Reads or writes the outer HTML text of the element. |
innertext : string | Reads or writes the inner HTML text of the element. |
plaintext : string | Reads or writes the plain text of the element. |
find( string $selector [, int $index] ) : mixed | Finds children by CSS selector. Returns the Nth element object if $index is set; otherwise returns an array of objects. |
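
As a hedged illustration of reading and writing these properties, the sketch below edits one attribute and one element body and then dumps the tree. The include path, the sample markup, and the replacement values are assumptions made for the example.

```php
<?php
// Assumption: the library file sits next to this script under this name.
require_once 'simple_html_dom.php';

$html = str_get_html('<div class="box"><p>Hello <b>world</b></p></div>');
$div  = $html->find('div', 0);

// Reading: attributes and the special properties use the same syntax.
echo $div->class, "\n";          // attribute value
echo $div->tag, "\n";            // tag name
echo $div->innertext, "\n";      // markup inside the element
echo $div->outertext, "\n";      // markup including the element itself

// Writing: assign to the same properties.
$div->class = 'panel';
$bold = $div->find('b', 0);      // search only below $div
$bold->innertext = 'there';

// The edits are visible when the tree is dumped.
$result = $html->save();
echo $result, "\n";
```

Because attributes are exposed as plain properties, an attribute named like one of the special properties (`tag`, `innertext`, and so on) cannot be reached this way.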
DOM traversing
Name | Description |
---|---|
$e->children( [int $index] ) : mixed | Returns the Nth child object if $index is set; otherwise returns an array of children. |
$e->parent() : element | Returns the parent of the element. |
$e->first_child() : element | Returns the first child of the element, or null if not found. |
$e->last_child() : element | Returns the last child of the element, or null if not found. |
$e->next_sibling() : element | Returns the next sibling of the element, or null if not found. |
$e->prev_sibling() : element | Returns the previous sibling of the element, or null if not found. |
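
The traversal calls can be sketched on a small list. The include path and the sample markup are assumptions; the markup deliberately contains no whitespace between tags to keep the example simple.

```php
<?php
// Assumption: the library file sits next to this script under this name.
require_once 'simple_html_dom.php';

$html = str_get_html('<ul id="menu"><li>a</li><li>b</li><li>c</li></ul>');
$list = $html->find('#menu', 0);

$first  = $list->first_child();
$last   = $list->last_child();
$middle = $first->next_sibling();

echo count($list->children()), "\n";       // all children as an array
echo $list->children(1)->innertext, "\n";  // one child by zero-based index
echo $middle->parent()->tag, "\n";         // back up to the list

// Walking past either end of the sibling list yields null.
var_dump($first->prev_sibling() === null);
var_dump($last->next_sibling() === null);
```

Since the sibling and child calls can return null, it is worth checking the result before chaining further calls on it.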
Camel naming conventions
Method | Mapping |
---|---|
$e->getAllAttributes() | $e->attr |
$e->getAttribute( $name ) | $e->attribute |
$e->setAttribute( $name, $value ) | $e->attribute = $value |
$e->hasAttribute( $name ) | isset($e->attribute) |
$e->removeAttribute ( $name ) | $e->attribute = null |
$e->getElementById ( $id ) | $e->find ( "#$id", 0 ) |
$e->getElementsById ( $id [,$index] ) | $e->find ( "#$id" [, int $index] ) |
$e->getElementByTagName ( $name ) | $e->find ( $name, 0 ) |
$e->getElementsByTagName ( $name [, $index] ) | $e->find ( $name [, int $index] ) |
$e->parentNode () | $e->parent () |
$e->childNodes ( [$index] ) | $e->children ( [int $index] ) |
$e->firstChild () | $e->first_child () |
$e->lastChild () | $e->last_child () |
$e->nextSibling () | $e->next_sibling () |
$e->previousSibling () | $e->prev_sibling () |
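
To make the mapping concrete, the sketch below calls a few camel-case methods next to their property or snake-case counterparts. The include path and the sample markup are assumptions, and the note about removed attributes describes the 1.x releases.

```php
<?php
// Assumption: the library file sits next to this script under this name.
require_once 'simple_html_dom.php';

$html = str_get_html('<div id="box" title="old"><span>x</span></div>');
$box  = $html->find('#box', 0);

// Reading an attribute: method form and property form agree.
var_dump($box->getAttribute('title') === $box->title);
var_dump($box->hasAttribute('title') === isset($box->title));

// Writing an attribute: same effect as $box->title = 'new';
$box->setAttribute('title', 'new');
echo $box->title, "\n";

// Removing an attribute: same effect as $box->title = null;
// In 1.x releases a removed attribute is skipped when the tag is dumped.
$box->removeAttribute('title');
echo $box->outertext, "\n";

// Lookup and traversal aliases.
$span = $box->getElementByTagName('span');     // like find('span', 0)
var_dump($span->parentNode()->id === 'box');   // like parent()
var_dump($box->firstChild()->tag === $box->first_child()->tag);
```

The camel-case names are thin aliases, so either style can be used consistently within a code base.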