Test Driven Development of PDFs With PHPUnit and PDFBox

On several of the sites I manage, we use wkhtmltopdf to generate PDF files. One downside of this process is that the generated PDF is hard to test. To overcome this, we use PDFBox to extract the contents of the PDF, which lets us write assertions against them and develop with TDD.

Install the Necessary Software

If you don’t already have it installed you’ll need to install Java.

Next, you’ll need to download PDFBox and place it somewhere on your system. The file is “only” 14 MB, so I include it in my project’s repo so that everyone working on the project uses the same version.
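
Before writing any tests, it can help to confirm that both pieces are in place. The sketch below is one way to do that; the function name pdfBoxAvailable() is made up for this example, and it assumes java is on the PATH.

```php
<?php
// Hypothetical helper: true when the PDFBox jar exists and a Java runtime can be started.
function pdfBoxAvailable(string $jarPath): bool
{
    // The jar has to exist at the path you configured.
    if (!is_file($jarPath)) {
        return false;
    }

    // "java -version" exits with 0 when a Java runtime is found on the PATH.
    exec('java -version 2>&1', $output, $exitCode);

    return $exitCode === 0;
}
```

In a test’s setUp() you could call $this->markTestSkipped() when this returns false, so a missing Java install shows up as a skipped test rather than a confusing failure.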

Helper Function

The PHPUnit documentation has you extend the TestCase class. I always create a project-specific TestCase class that extends PHPUnit’s TestCase so I can add functions I use over and over again (for example, you might need to initialize the same type of class without dependency injection in many tests). To this class we’re going to add a helper function that uses PDFBox to extract the contents of a PDF as HTML.

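public function getHtmlFromPdf($filename)
{
    $txt = tempnam(sys_get_temp_dir(), "getHtmlFromPdf");
    $path = "/path/to/pdfbox-app-2.0.3.jar";  // replace this with your own path

    // Escape each argument so paths with spaces or shell characters are safe.
    $cmd = "java -jar " . escapeshellarg($path) . " ExtractText -encoding UTF-8 -html "
        . escapeshellarg($filename) . " " . escapeshellarg($txt) . " 2>&1";
    exec($cmd, $output, $exitCode);

    // Read the extracted HTML, then clean up the temporary file.
    $streamOut = file_get_contents($txt);
    unlink($txt);

    // Fail loudly rather than return an empty string when PDFBox could not run.
    if ($exitCode !== 0) {
        throw new \RuntimeException("PDFBox failed: " . implode("\n", $output));
    }
    return $streamOut;
}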

Example

<?php
use Example\Tests\TestCase;
use Example\Invoice;

class InvoiceTest extends TestCase
{
    public function testCanGeneratePdfWithInvoiceTotal()
    {
        $invoice = new Invoice();
        $invoice->setTotal(1.5);
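        // Assumes generatePdf() returns the path of the file it wrote; adjust to your own API.
        $filename = $invoice->generatePdf();

        $body = $this->getHtmlFromPdf($filename);
        // On PHPUnit 9 and later, use assertMatchesRegularExpression() instead.
        $this->assertRegExp('/1\.50/', $body, "Should have the invoice total listed");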
    }
}
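
The test above sets the total to 1.5 but looks for 1.50, which presumes the invoice renders amounts with two decimals. If that holds for your templates, the pattern can be derived from the value instead of typed by hand. This is only a sketch: amountPattern() is a hypothetical helper, and the two-decimal format is an assumption about how the invoice is rendered.

```php
<?php
// Hypothetical helper: builds a regex that matches an amount rendered with two decimals.
function amountPattern(float $amount): string
{
    // Format with exactly two decimals and no thousands separator, e.g. 1.5 becomes "1.50".
    $formatted = number_format($amount, 2, '.', '');

    // preg_quote() escapes the dot; the lookarounds reject longer numbers such as 11.50 or 1.505.
    return '/(?<![\d.])' . preg_quote($formatted, '/') . '(?!\d)/';
}
```

The result can be passed straight to the regex assertion in place of the literal pattern.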

You’ll still need to do a visual inspection of a generated PDF, but this will make sure the important parts are included.
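
Because the helper returns HTML, the extracted body may contain tags and HTML entities that get in the way of simple text checks. One option is to reduce it to plain text before asserting. The function below is a minimal sketch with a made-up name, and the sample markup in the usage note is invented rather than real PDFBox output.

```php
<?php
// Hypothetical helper: reduces extracted HTML to a single line of plain text.
function htmlToPlainText(string $html): string
{
    // Replace tags with spaces so text from neighbouring elements does not run together.
    $text = preg_replace('/<[^>]+>/', ' ', $html) ?? $html;

    // Turn entities such as &amp; or &#8364; back into the characters they stand for.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace (including newlines) into single spaces.
    $collapsed = preg_replace('/\s+/u', ' ', $text) ?? $text;

    return trim($collapsed);
}
```

For example, htmlToPlainText("<p>Invoice   total:</p>\n<p>&#8364;1.50 &amp; tax</p>") gives "Invoice total: €1.50 & tax", which is easier to match against than the raw markup.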