Quick Start
The sample code below demonstrates the fundamental features of PHP Simple HTML DOM Parser.
Read plain text from HTML document
echo file_get_html('https://www.google.com/')->plaintext;
Loads the specified HTML document into memory, parses it and returns the plain text. Note that file_get_html supports local files as well as remote files!
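As a minimal sketch of the local-file case, the snippet below writes a small HTML file to the system temp directory and loads it with file_get_html. It assumes the library is available as simple_html_dom.php next to the script (adjust the path to your setup), and the file name quickstart-sample.html is made up for illustration. The check for false reflects how the 1.x releases report a failed load and may not apply to other versions.

```php
<?php
// Assumption: the library file sits next to this script; adjust the path as needed.
require_once __DIR__ . '/simple_html_dom.php';

// Hypothetical sample file, created only for this illustration.
$path = sys_get_temp_dir() . '/quickstart-sample.html';
file_put_contents($path, '<ul><li>Hello, World!</li></ul>');

// A local path is passed in exactly the same way as a URL.
$html = file_get_html($path);

if ($html === false) {
    // Loading can fail, for example when the file is empty or too large.
    echo 'Could not load ' . $path;
} else {
    echo $html->plaintext;
}

// Remove the sample file again.
unlink($path);
```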
Read plain text from HTML string
echo str_get_html('<ul><li>Hello, World!</li></ul>')->plaintext;
Parses the provided HTML string and returns the plain text. Note that the parser handles partial documents as well as full documents.
Read specific elements from HTML document
$html = file_get_html('https://www.google.com/');
foreach($html->find('img') as $element)
    echo $element->src . '<br>';
foreach($html->find('a') as $element)
    echo $element->href . '<br>';
Loads the specified document into memory and returns a list of image sources as well as anchor links. Note that find supports CSS selectors to find elements in the DOM.
Modify HTML documents
$doc = '<div id="hello">Hello, </div><div id="world">World!</div>';
$html = str_get_html($doc);
$html->find('div', 1)->class = 'bar';
$html->find('div[id=hello]', 0)->innertext = 'foo';
echo $html; // <div id="hello">foo</div><div id="world" class="bar">World!</div>
Parses the provided HTML string and replaces elements in the DOM before returning the updated HTML string. In this example, the class for the second div element is set to bar and the inner text for the first div element to foo.
Note that find supports a second parameter to return a single element from the array of matches.
Note that attributes can be accessed directly by means of magic methods (->class and ->innertext in the example above).
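When find is called with an index and nothing matches, it returns null rather than an element, so reading a property on the result would fail. The sketch below guards against that case. It assumes the library is available as simple_html_dom.php next to the script; the variable names are illustrative only.

```php
<?php
// Assumption: the library file sits next to this script; adjust the path as needed.
require_once __DIR__ . '/simple_html_dom.php';

$html = str_get_html('<div id="hello">Hello, </div><div id="world">World!</div>');

// Index 0 is the first match; a negative index counts from the end.
$first = $html->find('div', 0);
$last = $html->find('div', -1);

// No <span> exists in this document, so the result is null.
$missing = $html->find('span', 0);

if ($first !== null) {
    $first->class = 'bar';
}

if ($missing === null) {
    echo "No <span> element found\n";
}

echo $last->id . "\n";
echo $html;
```

Checking for null before touching ->class or ->innertext keeps a changed page layout from turning into a fatal error.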
Collect information from Slashdot
$html = file_get_html('https://slashdot.org/');
$articles = $html->find('article[data-fhtype="story"]');
foreach($articles as $article) {
    $item['title'] = $article->find('.story-title', 0)->plaintext;
    $item['intro'] = $article->find('.p', 0)->plaintext;
    $item['details'] = $article->find('.details', 0)->plaintext;
    $items[] = $item;
}
print_r($items);
Collects information from Slashdot for further processing.
Note that the combination of CSS selectors and magic methods makes parsing HTML documents a simple task that is easy to understand.