Find Broken URLs using Selenium with Multiple Threads

In this tutorial, I will show you how to find broken URLs on a website using Selenium. Let’s get started!

I came across a customer who had problems with the URLs on their web pages. With so many deployments, links sometimes break or redirects stop working. When that happens, the monitoring teams warn them immediately, which is annoying :)

They wanted to know how to check whether every URL returns an HTTP 200 response.

They already use Selenium WebDriver to test the website, so our approach was: “why not use our driver to fetch every href attribute on the page and make a GET call to each one?”

Here’s how we do it!

Step 1: We open a web page and fetch all elements that have an href attribute.

Step 2: There are more than 500 links on those pages, so checking them one by one takes a very long time. Basic multi-threading works well here, so we create a pool of 30 threads to fetch the URLs.

Step 3: Each thread uses a basic HttpClient to make a GET request to its URL.

Step 4: If the HTTP response status is anything other than 200, we add the URL to a list so that we can report it after the test execution.

Step 5: After all the links have been checked, we look at our error list. If the list is not empty, we fail the test with an assertion.
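To make steps 2 to 5 easier to follow on their own, here is a minimal sketch that uses only the JDK. It assumes the hrefs have already been collected as plain strings, and it replaces the real GET call with a pluggable lookup function. The names LinkCheckSketch, findBroken and statusLookup are made up for this illustration, and the one-minute wait is an arbitrary bound.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.ToIntFunction;

public class LinkCheckSketch {

    // Runs statusLookup for every URL on a fixed-size pool and returns the URLs
    // whose status is not 200. statusLookup is a stand-in for the real GET call.
    public static List<String> findBroken(List<String> urls,
                                          ToIntFunction<String> statusLookup,
                                          int threads) {
        // Thread-safe collection: many workers may add to it at the same time.
        ConcurrentLinkedQueue<String> broken = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String url : urls) {
            pool.execute(() -> {
                int status;
                try {
                    status = statusLookup.applyAsInt(url);
                } catch (RuntimeException e) {
                    status = -1; // a lookup that blows up counts as broken
                }
                if (status != 200) {
                    broken.add(url);
                }
            });
        }
        pool.shutdown();
        try {
            // Block until the workers finish; one minute is an arbitrary bound.
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("link check did not finish in time");
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for link check", e);
        }
        // Workers finish in no particular order, so sort for a stable report.
        List<String> result = new ArrayList<>(broken);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("/home", "/about", "/old-page", "/missing");
        // Fake lookup for the demo: pretend two of the pages are broken.
        ToIntFunction<String> fake = url ->
                url.equals("/old-page") ? 301 : url.equals("/missing") ? 404 : 200;
        System.out.println("Broken: " + findBroken(urls, fake, 4));
    }
}
```

In a real test you would pass a function that performs the GET request instead of the fake lookup, and assert that the returned list is empty.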

Here’s a simple but useful code snippet that you can use too.

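import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.junit.Assert; // assumption: JUnit 4; swap in the TestNG imports if that is your runner
import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class App
{
   // Worker threads add to this list concurrently, so it has to be synchronized.
   public static List<String> errorList = Collections.synchronizedList(new ArrayList<String>());
   ExecutorService executor;
   ChromeDriver driver;

   @Test
   public void URLCheckTest() throws InterruptedException
   {
       int MYTHREADS = 30;
       executor = Executors.newFixedThreadPool(MYTHREADS);
       System.setProperty("webdriver.chrome.driver", "chromedrivermac");
       driver = new ChromeDriver();
       try {
           driver.manage().window().maximize();
           driver.navigate().to("http://www.yourwebsite.com");
           List<WebElement> list = driver.findElements(By.xpath(".//a[@href!='']"));

           // WebDriver is not thread-safe, so read every href on this thread
           // and hand only the plain String to the workers.
           for (WebElement element : list) {
               String href = element.getAttribute("href");
               executor.execute(new MyRunnable(href));
           }
       } finally {
           // The workers no longer need the browser, so close it right away.
           driver.quit();
       }
       executor.shutdown();

       // Block until the workers finish instead of spinning in an empty loop.
       // Ten minutes is an arbitrary upper bound; adjust it to your site.
       boolean finished = executor.awaitTermination(10, TimeUnit.MINUTES);
       Assert.assertTrue(finished);

       for (String link : errorList) {
           System.out.println("Url = " + link);
       }
       Assert.assertTrue(errorList.isEmpty());
   }

   private static void sendGet(String href) {
       // Skip mailto:, tel:, javascript: and similar links that cannot be fetched over HTTP.
       if (!href.startsWith("http")) {
           return;
       }
       HttpClient client = new DefaultHttpClient();
       try {
           HttpResponse response = client.execute(new HttpGet(href));
           if (response.getStatusLine().getStatusCode() != 200) {
               errorList.add(href);
           }
       } catch (IOException | IllegalArgumentException e) {
           // A request that fails outright (timeout, DNS error, malformed URL) also counts as broken.
           errorList.add(href);
       } finally {
           // Release the connections held by this client.
           client.getConnectionManager().shutdown();
       }
   }

   public static class MyRunnable implements Runnable {
       private final String href;

       MyRunnable(String href) {
           this.href = href;
       }

       @Override
       public void run() {
           sendGet(href);
       }
   }

}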
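
A side note on the HTTP call: DefaultHttpClient has been deprecated in Apache HttpClient since version 4.3. If you are on Java 11 or newer, the GET request could instead be sketched with the JDK's built-in java.net.http.HttpClient, roughly as below. This is only an illustration, not part of the customer's test. The class name StatusFetcher, the helper names and the timeout values are my own assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Locale;

public class StatusFetcher {

    // One client reused for all requests; the timeout values are assumptions.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .followRedirects(HttpClient.Redirect.NORMAL) // follow 3xx, except HTTPS to HTTP
            .build();

    // Only http(s) links can be fetched; mailto:, tel:, javascript: are skipped.
    public static boolean isCheckable(String href) {
        if (href == null) {
            return false;
        }
        String lower = href.toLowerCase(Locale.ROOT);
        return lower.startsWith("http://") || lower.startsWith("https://");
    }

    // Returns the HTTP status code, or -1 when the request fails outright.
    public static int fetchStatus(String url) {
        try {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .timeout(Duration.ofSeconds(10))
                    .GET()
                    .build();
            // The body is not needed for a status check, so discard it.
            return CLIENT.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
        } catch (IOException | IllegalArgumentException e) {
            return -1; // network failure or malformed URL
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }
}
```

A worker would then treat a URL as broken when isCheckable(href) is true and fetchStatus(href) returns anything other than 200.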

In this article, we learned how to find broken URLs by using Selenium and Java with multithreading. I hope you enjoyed reading it.

You can also check how to find broken links on a website using Selenium and Java Streams in this article.

Thanks for reading,
Canberk
