
Commit 3b68bc5

Define an option so loadHTML does not drop whitespace
Certain builds of PHP seem to drop specific whitespace during the HTML parsing step. There is no apparent reason for this, and the behaviour has been seen in PHP versions ranging all the way from 5.6 to 7.3.

The behaviour appears to be sidestepped by passing any supported parsing option to the loadHTML method. LIBXML_NOWARNING was chosen because it seemed likely to have the least impact overall.

For a PHP test that surfaces the behaviour, as well as a test of the effect of the various constants, please see: https://gist.github.com/Zegnat/a94489e9b7d5501193e724e336bc6052

Huge thanks to everyone in #indieweb-dev who went on this journey with me! Especially @cweiske and @Lewiscowles1986 for all the extra testing, and @gRegorLove for getting the ball rolling with parsing options.
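To show the workaround in isolation, here is a minimal sketch that calls DOMDocument::loadHTML with a non-zero options argument, the same way the patched code does. It assumes the DOM and libxml extensions are loaded. parseHtmlWithOption is a hypothetical helper, not part of php-mf2, and it skips php-mf2's own unicodeToHtmlEntities() preprocessing. The sample markup only illustrates the call. It is not guaranteed to trigger the whitespace bug on an affected build; the linked gist holds the actual reproduction.

```php
<?php
// Hypothetical helper (not part of php-mf2): mirrors the patched call in
// Parser::__construct(), minus unicodeToHtmlEntities() preprocessing.
// No scalar type declarations, so it also runs on PHP 5.6.
function parseHtmlWithOption($html)
{
    $doc = new DOMDocument();
    // Collect libxml parse errors internally instead of emitting PHP warnings,
    // remembering the previous setting so it can be restored afterwards.
    $previous = libxml_use_internal_errors(true);
    // The workaround: pass a supported, non-zero parsing option.
    $doc->loadHTML($html, LIBXML_NOWARNING);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

// Usage: the spaces around and between the inline elements should survive parsing.
$html = '<p>a <b>b</b> <i>c</i></p>';
$paragraph = parseHtmlWithOption($html)->getElementsByTagName('p')->item(0);
echo $paragraph->textContent, PHP_EOL;
```

LIBXML_NOWARNING is used here only to match the patch. If more than one option is needed, libxml constants can be combined with bitwise OR, e.g. `LIBXML_NOWARNING | LIBXML_NOERROR`.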
1 parent 784b6a6 commit 3b68bc5


2 files changed, 3 lines added, 3 lines removed (+3 -3)


Mf2/Parser.php

Lines changed: 1 addition & 1 deletion
@@ -362,7 +362,7 @@ public function __construct($input, $url = null, $jsonMode = false) {
 			$doc = $doc->loadHTML($input);
 		} else {
 			$doc = new DOMDocument();
-			@$doc->loadHTML(unicodeToHtmlEntities($input));
+			@$doc->loadHTML(unicodeToHtmlEntities($input), \LIBXML_NOWARNING);
 		}
 	} elseif (is_a($input, 'DOMDocument')) {
 		$doc = clone $input;

tests/Mf2/ParserTest.php

Lines changed: 2 additions & 2 deletions
@@ -817,9 +817,9 @@ public function testNotMutatingPassedInDOM() {
 	// Use same parsing as Parser::__construct(), twice to have a comparison object.
 	libxml_use_internal_errors(true);
 	$refDoc = new \DOMDocument();
-	@$refDoc->loadHTML(Mf2\unicodeToHtmlEntities($input));
+	@$refDoc->loadHTML(Mf2\unicodeToHtmlEntities($input), \LIBXML_NOWARNING);
 	$inputDoc = new \DOMDocument();
-	@$inputDoc->loadHTML(Mf2\unicodeToHtmlEntities($input));
+	@$inputDoc->loadHTML(Mf2\unicodeToHtmlEntities($input), \LIBXML_NOWARNING);
 
 	// For completion sake, test PHP itself.
 	$this->assertEquals($refDoc, $inputDoc, 'PHP could not create identical DOMDocument instances.');
