Find Broken Links Using Selenium and Java Streams (Fast and Effective)

In this article, we will walk through an example that finds broken links on a webpage using Selenium and Java Streams. It is a fast and effective approach compared with many other solutions on the web. Let’s get started!

Find Broken Links Solution Tech Stack

In this example, we will use Selenium 4 and Java Streams. I installed JDK 17 on my machine to get the latest features of Java.

We will use Selenium to open a website, use the Selenium WebDriver API to find the elements with the tag name “a”, and then read the “href” attribute of those elements, which holds the links.

After this step, we will filter out the null and empty links, and then apply any further filtering our requirements call for. Then, to remove duplicate links, we will use the distinct() method.

Finally, after these filtering operations, we will hit the remaining URLs with the getResponseCode() method of our mini HTTPConnectionUtil class to get the status codes. If every status code is in our accepted status code list, the test passes; otherwise, it fails.

Let’s start coding. First, we will write our mini HTTP Connection Utility class, which returns the HTTP response code for a given address. You can change the timeout duration based on your requirements.

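import java.net.HttpURLConnection;
import java.net.URL;
import lombok.SneakyThrows;

public class HTTPConnectionUtil {
    // @SneakyThrows (Lombok) rethrows checked exceptions such as IOException without declaring them.
    @SneakyThrows
    public static int getResponseCode(String address) {
        HttpURLConnection httpURLConnection = (HttpURLConnection) new URL(address).openConnection();
        httpURLConnection.setConnectTimeout(8000); // Connect timeout in milliseconds.
        return httpURLConnection.getResponseCode();
    }
}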
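
Because getResponseCode() rethrows every exception through @SneakyThrows, a link whose host cannot be reached would abort the stream instead of being reported as broken. As a rough sketch, assuming you would rather count such links as failures, a variant could return a sentinel value. The class name SafeHttpConnectionUtil, the method name, and the -1 sentinel are hypothetical and not part of the original project. The HEAD request is also an assumption; some servers answer HEAD differently than GET.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical variant of the utility class; not part of the original project.
final class SafeHttpConnectionUtil {
    // Sentinel (an assumption) returned when no HTTP status code could be obtained.
    static final int UNREACHABLE = -1;

    static int getResponseCodeOrUnreachable(String address) {
        HttpURLConnection connection = null;
        try {
            URLConnection raw = URI.create(address).toURL().openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return UNREACHABLE; // Not an http(s) address, e.g. file: or mailto:
            }
            connection = (HttpURLConnection) raw;
            connection.setRequestMethod("HEAD"); // Ask for headers only, no body.
            connection.setConnectTimeout(8000);  // Milliseconds.
            connection.setReadTimeout(8000);     // Milliseconds.
            return connection.getResponseCode();
        } catch (IOException | IllegalArgumentException e) {
            // Malformed address, unknown host, timeout, refused connection, ...
            return UNREACHABLE;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
```

Since -1 is never in the accepted status code list, a link that returns the sentinel is treated like any other not OK link by the predicate used later in the article.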

Now, I will show three different implementations, and all of them do the same operation.

In the example below, we have acceptedStatusCodeList, which defines that the 200, 301, 302, and 403 status codes are OK for us. You can change the accepted status codes based on your requirements. For further reading on HTTP status codes, please refer here.

We also have a Predicate function that tells us whether the status code of a link is OK or not OK.
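
To see how the predicate and its negate() counterpart behave without opening a browser or sending requests, here is a minimal sketch. The class name and the STUBBED_CODES map are hypothetical stand-ins for real HTTP calls; only the accepted codes come from the article.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical demo class; the stubbed map replaces HTTPConnectionUtil.getResponseCode().
final class StatusPredicateSketch {
    // Accepted status codes, the same values the article uses.
    static final List<Integer> ACCEPTED = List.of(200, 301, 302, 403);

    // Canned responses (an assumption for illustration only).
    static final Map<String, Integer> STUBBED_CODES = Map.of(
        "https://example.com/ok", 200,
        "https://example.com/moved", 301,
        "https://example.com/missing", 404);

    // True when the (stubbed) status code of the link is in the accepted list.
    static final Predicate<String> IS_STATUS_CODE_OK =
        link -> ACCEPTED.contains(STUBBED_CODES.getOrDefault(link, -1));

    public static void main(String[] args) {
        System.out.println(IS_STATUS_CODE_OK.test("https://example.com/ok"));               // true
        System.out.println(IS_STATUS_CODE_OK.negate().test("https://example.com/missing")); // true
    }
}
```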

And in our stream pipeline we do the following:

Find the elements that have the “a” tag.
By using parallel(), we run the following operations in parallel, which speeds up the stream pipeline.
Then, from those elements, we get the value of the “href” attribute. (These are our URLs / links.)
Next, we filter the links: non-null, non-empty, and not containing “javascript” or “*&”. You can add as much filtering as you need. (This filter is only an example; you can omit it and rely on the next filter, or customize it for your own case.)
Then, we keep only the links that start with HTTP or HTTPS.
With the distinct() method, we remove duplicate links.
Finally, by using the Predicate function isStatusCodeOk.negate(), which means “status code is not OK”, we keep only the not OK links.
With the peek() function, we print the not OK links.
With the count() method, we count the not OK links.
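
The filtering steps above can be tried on plain strings before wiring them to Selenium. The following is a small sketch under that assumption: the class and method names are hypothetical, and the input list stands in for the “href” values a page would return.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical helper that applies the article's filters to a plain list of href values.
final class LinkFilterSketch {
    static List<String> checkableLinks(List<String> hrefs) {
        return hrefs.stream()
            .filter(Objects::nonNull)                                              // Drop null links.
            .filter(link -> !link.isEmpty())                                       // Drop empty links.
            .filter(link -> !link.contains("javascript") && !link.contains("*&")) // Drop unwanted patterns.
            .filter(link -> link.startsWith("http://") || link.startsWith("https://")) // Keep HTTP(S) only.
            .distinct()                                                            // Remove duplicates.
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
            "https://example.com/a", null, "", "javascript:void(0)",
            "mailto:someone@example.com", "https://example.com/a", "http://example.com/b");
        // Prints [https://example.com/a, http://example.com/b]
        System.out.println(checkableLinks(hrefs));
    }
}
```

Note that a check such as startsWith("http") on its own already matches “https” links too, so spelling out both schemes with “://” is only a slightly stricter choice.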

The stream pipeline gives us the count of not OK links, and in the final line we assert that the count is not bigger than zero. If it is bigger than zero, our test fails; otherwise, it passes.

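import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.MethodOrderer;
import org.junit.jupiter.api.Order;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.TestInfo;
import org.junit.jupiter.api.TestMethodOrder;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

@TestMethodOrder(MethodOrderer.OrderAnnotation.class)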
public class BrokenLinks {
    private WebDriver driver;
    List<Integer> acceptedStatusCodeList = new ArrayList<>();
    Predicate<String> isStatusCodeOk = link -> acceptedStatusCodeList.contains(HTTPConnectionUtil.getResponseCode(link));

    @BeforeEach
    public void setup(TestInfo testInfo) {
        System.out.println("Test name: " + testInfo.getDisplayName());
        driver = new ChromeDriver();
        driver.get("https://www.swtestacademy.com");

        // Define the accepted status codes.
        Collections.addAll(acceptedStatusCodeList, 200, 301, 302, 403);
    }

    @AfterEach
    public void tearDown() {
        System.out.println("");
        driver.quit();
    }

    @Test
    @Order(1)
    public void swTestAcademyHomePageBrokenLinksTest1() {
        long count = driver.findElements(By.tagName("a"))
            .stream()
            .parallel()
            .map(element -> element.getAttribute("href"))
            .filter(Objects::nonNull) // Drop null links.
            .filter(link -> !link.isEmpty()) // Drop empty links.
            .filter(link -> !link.contains("javascript") && !link.contains("*&")) // Drop other unwanted link patterns.
            .filter(link -> link.startsWith("http://") || link.startsWith("https://")) // Keep only HTTP and HTTPS links.
            .distinct() // Remove duplicate links.
            .filter(isStatusCodeOk.negate()) // Keep only the links whose status code is not accepted.
            // Note: peek() sends a second request for each failed link just to print its status code.
            .peek(link -> System.out.println("Failed Link: " + link + " Response Code: " + HTTPConnectionUtil.getResponseCode(link)))
            .count();

        System.out.println("Count: " + count);
        Assertions.assertFalse(count > 0);
    }
}
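
In the test above, each failed link is requested twice: once by the predicate and once more inside peek() to print the status code. One possible way to request every link only once is to pair each link with its status code and filter on the pair. This is a sketch, not the article's implementation: the class name, the method name, and the statusLookup parameter are hypothetical, and the lookup is passed in so that it can be HTTPConnectionUtil::getResponseCode in a real test or a stub when experimenting.

```java
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;
import java.util.stream.Collectors;

// Hypothetical helper: looks up each distinct link once and keeps the not OK ones.
final class SingleRequestSketch {
    // Accepted status codes, the same values the article uses.
    static final List<Integer> ACCEPTED = List.of(200, 301, 302, 403);

    static Map<String, Integer> findBrokenLinks(List<String> links, ToIntFunction<String> statusLookup) {
        return links.parallelStream()
            .distinct()                                                     // One lookup per distinct link.
            .map(link -> Map.entry(link, statusLookup.applyAsInt(link)))    // Pair link with its status code.
            .filter(entry -> !ACCEPTED.contains(entry.getValue()))          // Keep only the not OK links.
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```

The returned map holds each broken link together with the status code that was actually observed, so it can be printed or asserted on without sending further requests.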

Second Way to Find Broken Links with AnyMatch() Method

In the second way, we use the anyMatch() terminal operation to find broken links. If the status code of any link is not in the accepted status codes, the test fails; otherwise, the test passes.

@Test
public void swTestAcademyHomePageBrokenLinksTest2() {
    boolean result = driver.findElements(By.tagName("a"))
        .stream()
        .parallel()
        .map(element -> element.getAttribute("href"))
        .filter(Objects::nonNull) // Drop null links.
        .filter(link -> !link.isEmpty()) // Drop empty links.
        .filter(link -> !link.contains("javascript") && !link.contains("*&")) // Drop other unwanted link patterns.
        .filter(link -> link.startsWith("http://") || link.startsWith("https://")) // Keep only HTTP and HTTPS links.
        .distinct() // Remove duplicate links.
        // Note: peek() sends its own request for each link it prints, in addition to the one made by anyMatch().
        .peek(link -> System.out.println("Link: " + link + " Response Code: " + HTTPConnectionUtil.getResponseCode(link)))
        .anyMatch(isStatusCodeOk.negate());

    Assertions.assertFalse(result);
}
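
One property of anyMatch() worth keeping in mind is that it is a short-circuiting operation: it can stop as soon as one element matches, so this approach tells you that at least one link is broken, not how many. The sketch below illustrates this on a sequential stream with a stubbed check. The class name, method name, and the counter parameter are hypothetical and exist only for the demonstration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical demo of anyMatch() short-circuiting; no HTTP requests are sent.
final class AnyMatchSketch {
    // Returns true if any link is not OK and records how many links were actually checked.
    static boolean hasBrokenLink(List<String> links, Predicate<String> isOk, AtomicInteger checks) {
        return links.stream() // Sequential here, so the evaluation order is deterministic.
            .anyMatch(link -> {
                checks.incrementAndGet();
                return !isOk.test(link);
            });
    }

    public static void main(String[] args) {
        AtomicInteger checks = new AtomicInteger();
        List<String> links = List.of("a-ok", "b-broken", "c-ok", "d-broken");
        boolean result = hasBrokenLink(links, link -> link.endsWith("-ok"), checks);
        System.out.println(result);       // true
        System.out.println(checks.get()); // 2, the stream stopped at the first broken link
    }
}
```

With parallel(), the number of links checked before the stream stops can vary from run to run, but the boolean result stays the same.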

Third Way to Find Broken Links by Checking List Size

In the last example, we collect the not OK links into a list and check whether its size is bigger than zero. If the size of the list is bigger than zero, the test fails; otherwise, it passes.

@Test
public void swTestAcademyHomePageBrokenLinksTest3() {
    List<String> brokenLinkList = driver.findElements(By.tagName("a"))
        .stream()
        .parallel()
        .map(element -> element.getAttribute("href"))
        .filter(Objects::nonNull) // Drop null links.
        .filter(link -> !link.isEmpty()) // Drop empty links.
        .filter(link -> !link.contains("javascript") && !link.contains("*&")) // Drop other unwanted link patterns.
        .filter(link -> link.startsWith("http://") || link.startsWith("https://")) // Keep only HTTP and HTTPS links.
        .distinct() // Remove duplicate links.
        .filter(isStatusCodeOk.negate()) // Keep only the links whose status code is not accepted.
        .collect(Collectors.toList());

    // The list of broken links is used as the failure message.
    Assertions.assertTrue(brokenLinkList.isEmpty(), brokenLinkList.toString());
}
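
Since the collected list is used as the assertion message, it can help to format it so that a long list of broken links stays readable in the test report. Here is one possible sketch; the class name, the method name, and the message layout are hypothetical choices, not something from the original project.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical formatter for the assertion message of the third approach.
final class BrokenLinkMessageSketch {
    // Builds a header line with the count, followed by one sorted, indented link per line.
    static String failureMessage(List<String> brokenLinks) {
        String header = "Broken links (" + brokenLinks.size() + "):";
        return brokenLinks.stream()
            .sorted()
            .collect(Collectors.joining("\n  ", header + "\n  ", ""));
    }

    public static void main(String[] args) {
        System.out.println(failureMessage(List.of("https://example.com/b", "https://example.com/a")));
    }
}
```

In the test, this could be passed lazily, for example as Assertions.assertTrue(brokenLinkList.isEmpty(), () -> BrokenLinkMessageSketch.failureMessage(brokenLinkList)), so the message is only built when the assertion fails.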

GitHub Project

How to find broken links with selenium and java streams

In this article, I explained how to find broken links with Selenium and Java Streams in an effective and fast way. I hope you enjoyed reading it.

You can also read an alternative solution written by Canberk Akduygu: Find Broken URLs using Selenium with Multiple Threads. Check it here!

See you in another article,
Onur Baskirt

Onur Baskirt is a Software Engineering Leader with international experience in world-class companies. Now, he is a Software Engineering Lead at Emirates Airlines in Dubai.
