Visual PDF Comparison – Validating PDF Content by Image Comparison

Hello everybody, in this article we will learn how to validate PDF content by image comparison. In other words, we will automate a visual PDF comparison.

There are test cases where you need to test a PDF reporting service. Those kinds of test cases are among the hardest ones because you need to deal with MIME objects. Reading and validating a PDF file is not easy. The great Apache PDFBox library helps here: it lets you extract the content of a PDF as a String.

But the string representation of a PDF file is a little different from what we expect. So looking for a specific short string is OK, but looking for a long string is not as easy as it seems.
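One likely reason long strings are hard to match (an assumption here, since the article does not say which differences it ran into) is that extracted text tends to contain line breaks and extra spaces wherever the page layout wraps. A minimal sketch of a workaround is to collapse whitespace on both sides before searching. The class `ExtractedText` and its methods are hypothetical helpers, not part of PDFBox, and the sample text is made up.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: collapses whitespace so a phrase
// can be found even when the extracted text breaks it across lines.
class ExtractedText {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Replaces every run of whitespace (spaces, tabs, line breaks) with one space.
    static String normalize(String text) {
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    // Checks for the phrase after normalizing both the text and the phrase.
    static boolean containsPhrase(String extracted, String phrase) {
        return normalize(extracted).contains(normalize(phrase));
    }

    public static void main(String[] args) {
        // Made-up sample of extracted text with a line break in the middle of a phrase.
        String extracted = "Total amount due:\n  1,250.00 EUR\r\nInvoice date: 01.02.2019";
        System.out.println(containsPhrase(extracted, "Total amount due: 1,250.00 EUR"));
    }
}
```

This only helps with whitespace differences. It does nothing for layout, fonts, or images, which is where the visual comparison described next comes in.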

I had a lot of test cases like this. First of all, we decided to use static data (for example, dates are always between 1st February and 15th February) for those kinds of tests, so we know what data to expect. Then we decided to convert the PDF file into a single image and make a visual comparison.

The dependencies we are using are below:

compile group: 'org.apache.pdfbox', name: 'pdfbox', version: '2.0.11'
compile group: 'org.apache.pdfbox', name: 'jbig2-imageio', version: '3.0.0'

Let’s see how this works…

Downloading File

First, you need to download your file. Here’s a small snippet to help you download a file from a given URL.

    import java.io.BufferedInputStream;
    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.net.MalformedURLException;
    import java.net.URL;

    public static void saveUrl(final String filename, final String urlString) {
        // try-with-resources closes both streams, even when an exception is thrown
        try (BufferedInputStream in = new BufferedInputStream(new URL(urlString).openStream());
             FileOutputStream fout = new FileOutputStream(filename)) {
            final byte[] data = new byte[1024];
            int count;
            // Copy the stream to the file in 1 KB chunks until the end is reached
            while ((count = in.read(data, 0, data.length)) != -1) {
                fout.write(data, 0, count);
            }
        } catch (FileNotFoundException e) {
            System.out.println("FileNotFoundException: " + e.getMessage());
        } catch (MalformedURLException e) {
            System.out.println("MalformedURLException: " + e.getMessage());
        } catch (IOException e) {
            System.out.println("IOException: " + e.getMessage());
        }
    }
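In a test you usually want a failed download to fail the test instead of printing a message and carrying on with a missing file. The following is a minimal alternative sketch under that assumption, using only JDK classes. The class name `PdfDownloader` and the method `download` are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration: downloads a URL to a file and lets
// IOException propagate so the calling test fails when the download fails.
class PdfDownloader {

    // Copies the resource at urlString into targetFile, overwriting any
    // existing file, and returns the number of bytes written.
    static long download(String urlString, Path targetFile) throws IOException {
        try (InputStream in = URI.create(urlString).toURL().openStream()) {
            return Files.copy(in, targetFile, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

A call could look like `PdfDownloader.download(reportUrl, Paths.get("report.pdf"))`, where `reportUrl` stands for whatever URL your reporting service exposes.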

Convert PDF into Images

Then you need to convert that PDF file into images. I get the page count of the PDF file and create a temporary image for every page in it.

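        import java.awt.image.BufferedImage;
        import java.io.File;
        import java.io.IOException;
        import java.util.ArrayList;
        import java.util.List;
        import javax.imageio.ImageIO;
        import org.apache.pdfbox.pdmodel.PDDocument;
        import org.apache.pdfbox.rendering.ImageType;
        import org.apache.pdfbox.rendering.PDFRenderer;

        List<File> fileList = new ArrayList<>();
        // try-with-resources closes the document even if rendering fails
        try (PDDocument document = PDDocument.load(file)) {
            PDFRenderer pdfRenderer = new PDFRenderer(document);
            // Create one temporary image per page of the PDF document
            for (int page = 0; page < document.getNumberOfPages(); ++page) {
                BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
                File imageFile = new File("TEMP_IMAGES-" + page + ".png");
                ImageIO.write(bim, "png", imageFile);
                fileList.add(imageFile);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }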
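Rendering at 300 DPI produces large images, which matters once several pages are merged. As a rough worked example: PDF page sizes are expressed in points (1/72 inch), so the pixel size is approximately `points * dpi / 72`. The sketch below uses a US Letter page (612 x 792 points) as an assumed example. `RenderSize` is a hypothetical helper, and the rounding PDFBox applies internally may differ by a pixel.

```java
// Hypothetical helper for illustration: estimates the raster size of a page
// rendered at a given DPI. Not part of PDFBox.
class RenderSize {
    // PDF page sizes are expressed in points; one point is 1/72 inch.
    static final double POINTS_PER_INCH = 72.0;

    // Converts a length in points to pixels at the given resolution.
    static int toPixels(double points, int dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    // Rough in-memory size of the uncompressed raster, assuming 4 bytes per
    // pixel, as with the int-packed BufferedImage types.
    static long approxBytes(double widthPt, double heightPt, int dpi) {
        return 4L * toPixels(widthPt, dpi) * toPixels(heightPt, dpi);
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points.
        int width = toPixels(612, 300);
        int height = toPixels(792, 300);
        long megabytes = approxBytes(612, 792, 300) / (1024 * 1024);
        System.out.println(width + " x " + height + " px, about " + megabytes + " MB per page in memory");
    }
}
```

If memory becomes a problem with long documents, a lower DPI is an option, at the cost of a less sensitive comparison.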

Merge Images in One Single Image

Dealing with multiple images requires additional work in the next steps, so I decided to merge them into one single file. I’ll append one image below another. You can do it differently, for example by placing the images next to each other. That’s up to you.

First, we load all temporary images into a BufferedImage array.

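        // Load all created images into a BufferedImage array sized to the page count
        BufferedImage[] input = new BufferedImage[fileList.size()];
        for (int i = 0; i < fileList.size(); i++) {
            try {
                input[i] = ImageIO.read(fileList.get(i));
            } catch (IOException x) {
                // Report the failure instead of silently leaving a null entry
                x.printStackTrace();
            }
        }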

Secondly, I create an output image. As you can see, I set the height of the output image by multiplying the height of one temporary image by the page count of the PDF, so I am able to append every image one after another. This assumes that all pages have the same size.

In case you want to place those images next to each other, you should multiply the width value instead.

 // Create the output image.
        BufferedImage output = new BufferedImage(
                input[0].getWidth(),
                input[0].getHeight() * fileList.size(),
                BufferedImage.TYPE_INT_ARGB);

Then, we draw each image onto the output image. As you can see, I add my images one below another by increasing the y value. In case you place images next to each other, you need to increase the x value instead.

int x = 0;
int y = 0;
Graphics g = output.getGraphics();
for (int i = 0; i < input.length; i++) {
    g.drawImage(input[i], x, y, null);
    y += input[0].getHeight();
}
// Release the graphics context once drawing is done
g.dispose();

Finally, you save the output image to disk.

File mergedFile = new File("FINALE_IMAGE.png");
try {
    ImageIO.write(output, "PNG", mergedFile);
} catch (IOException ex) {
    // Report the failure instead of silently ignoring it
    ex.printStackTrace();
}

Now you have a single image built from a PDF file.
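The merge steps above assume that every page has the same size as the first one. If your reports mix page sizes, a variation is to sum the actual heights and use the widest page as the output width. Here is a minimal sketch of that idea using only JDK classes. `PageStitcher` and `stackVertically` are hypothetical names, and filling the unused area with white is a choice made for this example.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

// Hypothetical helper for illustration: stacks page images top to bottom,
// also when the pages have different sizes.
class PageStitcher {

    static BufferedImage stackVertically(List<BufferedImage> pages) {
        if (pages.isEmpty()) {
            throw new IllegalArgumentException("no pages to merge");
        }
        // Output width is the widest page; output height is the sum of all heights.
        int width = 0;
        int height = 0;
        for (BufferedImage page : pages) {
            width = Math.max(width, page.getWidth());
            height += page.getHeight();
        }
        BufferedImage output = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = output.createGraphics();
        try {
            // White background for the area narrower pages do not cover.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            int y = 0;
            for (BufferedImage page : pages) {
                g.drawImage(page, 0, y, null);
                y += page.getHeight(); // advance by this page's own height
            }
        } finally {
            g.dispose();
        }
        return output;
    }
}
```

The result can be written to disk with `ImageIO.write(output, "PNG", mergedFile)` exactly as in the step above.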

After that, you should run image verification with ImageMagick. We have an ImageMagick example used in a Selenium test on this link. You can follow the same practice used on that page, although it requires some changes, of course. In case you need help with it, just ping us in the comments.
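If you want a quick check before wiring in ImageMagick, a plain-Java pixel comparison can serve as a stand-in. To be clear, this is not the ImageMagick approach referred to above, just a simple substitute sketched under the assumption that both images were rendered at the same DPI. `PixelDiff`, its methods, and the `maxDiffRatio` tolerance parameter are hypothetical names for this example.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration: a naive pixel-by-pixel comparison,
// used here as a stand-in for an ImageMagick-based check.
class PixelDiff {

    // Counts the pixels whose ARGB value differs between the two images.
    static long countDifferentPixels(BufferedImage expected, BufferedImage actual) {
        if (expected.getWidth() != actual.getWidth() || expected.getHeight() != actual.getHeight()) {
            throw new IllegalArgumentException("images have different dimensions");
        }
        long different = 0;
        for (int y = 0; y < expected.getHeight(); y++) {
            for (int x = 0; x < expected.getWidth(); x++) {
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                    different++;
                }
            }
        }
        return different;
    }

    // Returns true when the share of differing pixels is within maxDiffRatio
    // (0.0 means the images must be identical). Different sizes never match.
    static boolean matches(BufferedImage expected, BufferedImage actual, double maxDiffRatio) {
        if (expected.getWidth() != actual.getWidth() || expected.getHeight() != actual.getHeight()) {
            return false;
        }
        long total = (long) expected.getWidth() * expected.getHeight();
        return countDifferentPixels(expected, actual) <= maxDiffRatio * total;
    }
}
```

In a test, you would load a stored baseline and the freshly merged image with `ImageIO.read(...)` and assert on `PixelDiff.matches(baseline, merged, 0.0)`. An exact comparison like this is sensitive to rendering differences such as font anti-aliasing, which is one reason to allow a small tolerance or to use a dedicated tool.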

Here’s the link for a sample project: https://github.com/swtestacademy/pdfAutomation

Happy Image Testing with PDF files.

