clubmate.fi

A good[ish] website

Web development blog, loads of UI and JavaScript topics

Lazyload inserted images in WordPress posts (without regex)

Filed under: Media— Tagged with: lazyload, base64, php

A look at a simple DOM parsing task in PHP, something that’s more commonly done in JavaScript.

Here are the basic mechanics behind JavaScript-based lazyloading: most lazyload solutions use a data-src attribute instead of the normal src attribute. When the browser sees an image with data-src, it does nothing with it, because the attribute has no built-in meaning. When the image needs to be displayed (when it enters the viewport), the lazyload script takes the value of data-src and puts it into src, which makes the browser load and show the image.

So, we want the normal images inserted into a post body to look like this:

<img data-src="image.jpg" src="placeholder-image.gif" />

The "Multilingual Plane" and the regex dilemma

You might think regex. But as the most up-voted StackOverflow answer says, it’s a bad idea to parse HTML with regex:

[...] Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide. The <center> cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes. [...]
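To make the warning a bit more concrete, here's a minimal sketch of how quickly a naive pattern breaks. The pattern and the sample tags are made up for illustration; the point is only that equally valid markup can slip past a regex that looked fine on the first example.

```php
<?php
// A naive pattern (hypothetical) that assumes src is the first attribute
// and that its value is wrapped in double quotes.
$pattern = '/<img src="([^"]+)"/';

$samples = [
    '<img src="a.jpg" class="size-full" />', // matches
    '<img class="size-full" src="b.jpg" />', // src is not first: missed
    "<img src='c.jpg' />",                   // single quotes: missed
];

$matched = [];
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match and 0 when nothing matches
    $matched[] = preg_match($pattern, $sample) === 1;
}

var_dump($matched);
```

Only the first tag matches, even though all three are perfectly ordinary image tags. Every fix makes the pattern longer, which is roughly where the weeping starts.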

What then?

Some sort of DOM parser. Here's another great SO thread on just that; there are plenty of options. This article looks into using the Simple HTML DOM Parser. It’s much like jQuery, but for PHP (a weird clash of worlds).

Simple HTML DOM Parser

How to edit DOM elements:

// First, include the library in your template
require_once('assets/simple_html_dom.php');

// Create the DOM object from a string; a file works just as well (file_get_html)
$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');

// Find the second div in the DOM (the index is zero-based) and give it the class 'bar'
// With an index as the second argument, find() returns a single element, not an array
$html->find('div', 1)->class = 'bar';

// Find the first div with an id of 'hello'
// and change its text to 'foo'
$html->find('div[id=hello]', 0)->innertext = 'foo';

echo $html;

This outputs:

<div id="hello">foo</div>
<div id="world" class="bar">World</div>

If you leave out the second parameter of find(), as in $html->find('div'), it returns an array of all the matching DOM objects. You can loop through the array with foreach:

$html = str_get_html($content);
// Grab all the img tags and loop through them
foreach ($html->find('img') as $element) {
  // Add 'lazy' class to the image
  $element->class = 'lazy ' . $element->class;
}
echo $html;
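One small wrinkle in the loop above: plain concatenation leaves a trailing space when an image has no class, and adds a second 'lazy' if the filter happens to run twice. Here's a rough sketch of a helper that avoids both. The name cm_prepend_class is hypothetical, not part of Simple HTML DOM Parser or WordPress.

```php
<?php
// Hypothetical helper: prepend a class name to a class attribute value
// without duplicating it or leaving stray whitespace behind.
function cm_prepend_class(string $existing, string $new): string {
    // Split on whitespace and drop empty pieces. This is a regex on a short
    // attribute value, not on HTML, so no unholy children are harmed.
    $classes = preg_split('/\s+/', trim($existing), -1, PREG_SPLIT_NO_EMPTY);

    if (!in_array($new, $classes, true)) {
        array_unshift($classes, $new);
    }

    return implode(' ', $classes);
}

echo cm_prepend_class('size-full', 'lazy'), PHP_EOL;      // lazy size-full
echo cm_prepend_class('', 'lazy'), PHP_EOL;               // lazy
echo cm_prepend_class('lazy size-full', 'lazy'), PHP_EOL; // lazy size-full
```

Inside the loop it would be used as $element->class = cm_prepend_class((string) $element->class, 'lazy');, where the cast covers the case of an image with no class attribute at all.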

The lazyload example

Here’s the aforementioned lazyload trick. Note that this is very small-scale parsing, so the "unholy child" probably wouldn’t weep the blood of virgins if regex were used here.

A normal image might look something like this:

<img src="http://clubmate.fi/uploads/image.jpg" class="size-full" />

We'd like it to look like this:

<img
  src="data:image/gif;base64,R0lGODlhAQABAIAAAMLCwgAAACH5BAAAAAAALAAAAAABAAEAAAICRAEAOw=="
  data-src="http://clubmate.fi/uploads/image.jpg"
  class="lazy size-full"
/>

Let’s look at the constituent parts of the element:

src: the placeholder image. We're using a 1px GIF here, Base64 encoded into a data URI to save one HTTP request. Check here for more 1px data URIs.
data-src: the path to the actual image.
class: the lazy class, good to have but by no means mandatory.
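If you're curious what's inside that placeholder, the sketch below decodes the data URI and reads the GIF header by hand. It assumes only the standard GIF layout (a 6-byte signature followed by the width and height as little-endian 16-bit integers); the variable names are arbitrary.

```php
<?php
// Placeholder data URI copied from the example markup above.
$placeholder = 'data:image/gif;base64,R0lGODlhAQABAIAAAMLCwgAAACH5BAAAAAAALAAAAAABAAEAAAICRAEAOw==';

// Split "data:<mime>;base64,<payload>" at the first comma.
[$meta, $payload] = explode(',', $placeholder, 2);

// Strict mode makes base64_decode() return false on invalid characters.
$bytes = base64_decode($payload, true);

// Bytes 0-5 are the signature, bytes 6-9 hold the width and height.
$signature = substr($bytes, 0, 6);
$size = unpack('vwidth/vheight', substr($bytes, 6, 4));

printf(
    "%s, %dx%d px, %d bytes\n",
    $signature,
    $size['width'],
    $size['height'],
    strlen($bytes)
);
```

It should report a GIF89a image of 1x1 pixels weighing a few dozen bytes, which is why inlining it beats a separate HTTP request.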

Here's a function that takes the post content and works its modifying magic on it:

function cm_add_image_placeholders($content) {
  $html = str_get_html($content, '', '', '', false);
  // str_get_html() returns false when it can't parse the input (e.g. empty content)
  if (!$html) {
    return $content;
  }
  $placeholder = 'data:image/gif;base64,R0lGODlhAQABAIAAAMLCwgAAACH5BAAAAAAALAAAAAABAAEAAAICRAEAOw==';

  foreach ($html->find('img') as $element) {
    // Element class, prepend lazy to it (trim covers images without a class)
    $element->class = trim('lazy ' . $element->class);
    // `data-src` attribute, note the bracket syntax because of the hyphen
    $element->{'data-src'} = $element->src;
    // Placeholder image to the src
    $element->src = $placeholder;
  }

  // Cast to string, since the_content filters are expected to return a string
  return (string) $html;
}
// This WP-specific filter applies the changes to a post;
// the callback name has to match the function name above
add_filter('the_content', 'cm_add_image_placeholders', 99);

Notice the call str_get_html($content, '', '', '', false); the fifth parameter is set to false because the parser would otherwise strip the line breaks out.

See the Simple HTML DOM Parser docs for more examples.

Conclusions

There are also native PHP methods for traversing and parsing the DOM, and they look surprisingly like JavaScript:

// Sample XML, made up just for illustration
$xml = '<books><book>PHP Basics</book><book>DOM Tricks</book></books>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$books = $dom->getElementsByTagName('book');

foreach ($books as $book) {
  echo $book->nodeValue, PHP_EOL;
}
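For comparison, the lazyload transformation from earlier could be sketched with the native DOMDocument too, with no extra library. Treat this as a rough sketch under a few assumptions: the function name cm_lazyload_images_native is hypothetical, the content is assumed to be UTF-8, and the wrapper div is only there so that a fragment with several top-level nodes survives the round trip. DOMDocument also normalizes markup (for example, it drops the self-closing slash), so the output won't be byte-for-byte identical to the input.

```php
<?php
// Hypothetical native take on the lazyload filter. Requires PHP's DOM extension.
function cm_lazyload_images_native(string $content): string {
    if (trim($content) === '') {
        return $content;
    }

    $placeholder = 'data:image/gif;base64,R0lGODlhAQABAIAAAMLCwgAAACH5BAAAAAAALAAAAAABAAEAAAICRAEAOw==';

    $dom = new DOMDocument();
    // Collect parser warnings (e.g. about HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration is a common trick to make loadHTML() treat the input
    // as UTF-8; the wrapper div keeps the fragment's top-level nodes together.
    $dom->loadHTML('<?xml encoding="utf-8"?><div>' . $content . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first div in document order.
    $root = $dom->getElementsByTagName('div')->item(0);
    if ($root === null) {
        return $content;
    }

    foreach ($root->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Skip images without a src, or ones that were already processed.
        if ($src === '' || $img->hasAttribute('data-src')) {
            continue;
        }
        $img->setAttribute('data-src', $src);
        $img->setAttribute('src', $placeholder);
        $img->setAttribute('class', trim('lazy ' . $img->getAttribute('class')));
    }

    // Serialize only the wrapper's children, leaving out html/body/div.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo cm_lazyload_images_native(
    '<p>Hi <img src="http://clubmate.fi/uploads/image.jpg" class="size-full" /></p>'
);
```

In WordPress it would be hooked up the same way as before, with add_filter('the_content', 'cm_lazyload_images_native', 99);. Whether this beats Simple HTML DOM Parser is mostly a matter of taste: the native API is more verbose, but it's one dependency less.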
