How to Test PDF Files Using Selenium Automation?

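PDF documents are compact and comparatively secure files. Almost every business uses PDFs for its documents, because a PDF keeps its formatting no matter which tool is used to open it. Unsurprisingly, invoices, official documents, contracts, boarding passes, bank statements and the like usually come in PDF format.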

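Even as developers, we run into situations where we need to verify a PDF file or use it to locate certain parts of its data. If you have plenty of time to spare, you can do this manually; otherwise you can automate it. Handling such files with automation may seem tricky, but it is not. With the help of a PDF library, Selenium test automation can test PDF files quite easily.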

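In this blog post, we will explore the tricky topic of testing PDF files with Selenium and present different solutions for handling PDF documents with automation.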

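Why is testing PDF files important?

Today, the PDF format is commonly used to generate official letters, documents, contracts and other important files, mainly because a PDF is much harder to edit than a Word document. Storing confidential information as PDF is therefore considered a good security practice.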

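Such high-security documents must always contain accurate details, and the information they present must be verified. PDF documents are generated to be read by humans rather than parsed by machines. Validating a document manually may be easy, but it is time-consuming.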

What happens when the verification has to be automated?

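This is one of the complexities automation testers face, and it is where testing PDF files with Selenium comes in. Let me give you a practical example where testing a PDF document becomes an essential design requirement.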

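In a banking system, when we request a statement for a specific period, it is downloaded as a PDF. The file contains the user's basic information and the transactions within that period.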

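If this design is not verified with high accuracy before going live, end users will find discrepancies in their account statements. The person responsible for testing this requirement must therefore make sure that every detail printed in the statement exactly matches the customer's information and transactions.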

I hope this shows why testing PDF files with Selenium matters. Let's start this tutorial by looking at the different operations you can perform when testing PDFs with Selenium.

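How to handle PDFs in Selenium WebDriver?

To handle PDF documents in Selenium test automation, we can use a Java library called PDFBox. Apache PDFBox is an open-source library for working with PDF documents. We can use it to verify the text in a document, extract specific portions of text or images, and more. Selenium itself does not parse PDF content; PDFBox does that part. To use it, add the Maven dependency to your pom.xml file or add it as an external jar.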

To add it as a Maven dependency:

Navigate to https://mvnrepository.com/artifact/org.apache.pdfbox

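Select a version and place it in your pom.xml file. The code in this tutorial uses the PDFBox 2.x API (for example PDDocument.load, which PDFBox 3.x replaced with Loader.loadPDF), so pick a 2.0.x release such as the one below or adapt the code. The Maven dependency looks like this: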

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.20</version>
</dependency>

To add it as an external jar:

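Download the jar file from https://repo1.maven.org/maven2/org/apache/pdfbox/pdfbox/2.0.20/. PDFBox also depends on the fontbox and commons-logging jars, which must be added the same way.
Go to your project, select Configure Build Path, and add the external jar files.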

After adding the dependency or jar to the project, we are ready to start coding.
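If you are unsure whether the dependency or jar was actually picked up, one quick check is to try loading a PDFBox class by name. The helper below is a minimal sketch that uses only the JDK's Class.forName; the class name ClasspathCheck is hypothetical, and org.apache.pdfbox.pdmodel.PDDocument is simply the PDFBox class used later in this tutorial.

```java
// Hypothetical helper (not part of PDFBox): reports whether a class can be
// loaded from the current classpath, e.g. to confirm the PDFBox jar was added.
final class ClasspathCheck {

    private ClasspathCheck() {
    }

    // Returns true if the fully qualified class name resolves on the classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // missing class, or a class whose own dependencies are missing
            return false;
        }
    }
}
```

For example, `ClasspathCheck.isOnClasspath("org.apache.pdfbox.pdmodel.PDDocument")` should return true once PDFBox is on the build path.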

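Verifying the content of a PDF

In the next step of this tutorial, we will see how to verify the content of a PDF. To check whether a specific text is present in a PDF document, we use PDFTextStripper. In PDFBox 2.x, which this tutorial uses, it is imported from org.apache.pdfbox.text.PDFTextStripper (in PDFBox 1.x it lived in org.apache.pdfbox.util.PDFTextStripper).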

Here is the code that opens the PDF with Selenium and verifies its content.

package Automation;

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.Assert;
import org.testng.annotations.AfterTest;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;

public class PdfHandling {

    WebDriver driver = null;

    @BeforeTest
    public void setUp() {
        //specify the location of the driver (replace with the path on your machine)
        System.setProperty("webdriver.chrome.driver", "C:\\Users\\Shalini\\Downloads\\Driver\\chromedriver.exe");

        //instantiate the driver
        driver = new ChromeDriver();
    }

    @Test
    public void verifyContentInPDf() throws IOException {
        //specify the url of the pdf file
        String url = "http://www.pdf995.com/samples/pdf.pdf";
        driver.get(url);
        //let IOException propagate so TestNG reports a failure instead of a silent pass
        String pdfContent = readPdfContent(url);
        Assert.assertTrue(pdfContent.contains("The Pdf995 Suite offers the following features"));
    }

    @AfterTest
    public void tearDown() {
        if (driver != null) {
            driver.quit();
        }
    }

    public static String readPdfContent(String url) throws IOException {
        URL pdfUrl = new URL(url);
        //try-with-resources closes the stream and the document even if parsing fails
        try (InputStream in = new BufferedInputStream(pdfUrl.openStream());
             PDDocument doc = PDDocument.load(in)) {
            int numberOfPages = getPageCount(doc);
            System.out.println("The total number of pages " + numberOfPages);
            return new PDFTextStripper().getText(doc);
        }
    }

    public static int getPageCount(PDDocument doc) {
        //get the total number of pages in the pdf document
        return doc.getNumberOfPages();
    }
}

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd">
<suite name="PDF Handling">
  <test name="Verify Pdf content">
      <classes>
      <class name="Automation.PdfHandling"/>
      </classes>
  </test> 
 </suite>

To run the test, right-click the class and select Run As -> TestNG Test.

The output console shows the default test report, indicating the passed and failed cases.
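One practical wrinkle: PDFTextStripper.getText() inserts line breaks where the PDF wraps a line, so a phrase that wraps in the document may not match a plain contains() check. The sketch below, with hypothetical names, normalizes whitespace before comparing and reports every expected value that is missing, which suits cases like the bank statement example where many fields must match at once. It uses only the JDK and is an assumption about how you might structure such checks, not part of PDFBox or TestNG.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers (not part of PDFBox or TestNG) for asserting on text
// returned by PDFTextStripper.getText(), where wrapped lines contain line breaks.
final class PdfTextAssertions {

    private PdfTextAssertions() {
    }

    // Collapses every run of whitespace (spaces, tabs, line breaks) into one space.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Returns the expected phrases that do NOT appear in the extracted text,
    // after normalizing whitespace on both sides. An empty list means all matched.
    static List<String> missingPhrases(String extractedText, List<String> expected) {
        String haystack = normalize(extractedText);
        List<String> missing = new ArrayList<>();
        for (String phrase : expected) {
            if (!haystack.contains(normalize(phrase))) {
                missing.add(phrase);
            }
        }
        return missing;
    }
}
```

In a TestNG test, this could be used as `Assert.assertTrue(PdfTextAssertions.missingPhrases(pdfContent, expected).isEmpty(), "Missing: " + PdfTextAssertions.missingPhrases(pdfContent, expected))`, so a failure names every absent value instead of only the first.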

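Downloading a PDF file

Sometimes we need to download a PDF file before we can test it with Selenium. To download a PDF file from a web page, we need a locator that identifies the download link. We also need to disable the pop-up that asks where to save the downloaded file.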

Here is the code you can use to download the PDF before testing it.

package Automation;

import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.testng.annotations.AfterTest;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;

public class DownloadPdf {

    WebDriver driver = null;
    //folder that receives the downloaded files (Chrome expects an absolute path)
    String downloadDir = System.getProperty("user.dir") + File.separator + "downloads";

    @BeforeTest
    public void setUp() {
        System.setProperty("webdriver.chrome.driver", "C:\\Users\\Shalini\\Downloads\\Driver\\chromedriver.exe");

        ChromeOptions options = new ChromeOptions();
        Map<String, Object> prefs = new HashMap<String, Object>();
        //do not ask where to save the file
        prefs.put("download.prompt_for_download", false);
        //save downloads to a known folder so the test can find the file afterwards
        prefs.put("download.default_directory", downloadDir);
        //download PDFs instead of opening them in Chrome's built-in viewer
        prefs.put("plugins.always_open_pdf_externally", true);
        options.setExperimentalOption("prefs", prefs);
        driver = new ChromeDriver(options);
    }

    @Test
    public void downloadPdf() {
        driver.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);
        driver.manage().window().maximize();
        driver.get("https://www.learningcontainer.com/sample-pdf-files-for-testing");
        //locator to click the pdf download link
        driver.findElement(By.xpath("//*[@id=\"bfd-single-download-810\"]/div/div[2]/a/p[1]/strong")).click();
    }

    @AfterTest
    public void tearDown() {
        if (driver != null) {
            driver.quit();
        }
    }
}
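Clicking the link only starts the download, and the test above ends right after the click. Before parsing the file, it helps to wait until the download has finished. The sketch below is a hypothetical JDK-only helper, not a Selenium API: it polls the download folder for a completed .pdf file (Chrome writes in-progress downloads with a .crdownload suffix, which the "*.pdf" glob skips) and then checks that the file starts with the "%PDF-" header.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Optional;

// Hypothetical helper (not a Selenium API) for checking a finished PDF download.
final class DownloadWaiter {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private DownloadWaiter() {
    }

    // Polls dir until a non-empty *.pdf file appears or timeoutMillis elapses.
    // Partial downloads such as "report.pdf.crdownload" do not match the glob.
    static Optional<Path> waitForPdf(Path dir, long timeoutMillis, long pollMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.pdf")) {
                for (Path file : files) {
                    // skip zero-byte placeholders
                    if (Files.size(file) > 0) {
                        return Optional.of(file);
                    }
                }
            }
            if (System.currentTimeMillis() >= deadline) {
                return Optional.empty();
            }
            Thread.sleep(pollMillis);
        }
    }

    // Returns true if the file begins with the "%PDF-" header that PDF files start with.
    static boolean hasPdfHeader(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[PDF_MAGIC.length];
            int read = in.readNBytes(head, 0, head.length);
            return read == PDF_MAGIC.length && Arrays.equals(head, PDF_MAGIC);
        }
    }
}
```

Called at the end of downloadPdf() with `Paths.get(downloadDir)` and a timeout of, say, 30 seconds, this turns "the click happened" into "a PDF actually arrived", and the returned path can then be loaded with PDFBox.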


Setting the start page of a PDF document

Verifying a small PDF file is easy. But how will you handle larger files? The solution is simple: set the start page of the PDF and then carry on with the verification.

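If you look at the sample PDF linked in this article, it has 5 pages and the introduction starts on page 2. If you set the start page to 2 in the code and print the text, you will see the content printed from page 2 onwards. As mentioned above, for a large file you can set the start of the document, extract the content, and verify only that part.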

Below is simple code that sets the start page of the document.

package Automation;

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.Assert;
import org.testng.annotations.AfterTest;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;

public class ExtractContent {

    WebDriver driver = null;

    @BeforeTest
    public void setUp() {
        //specify the location of the driver
        System.setProperty("webdriver.chrome.driver", "C:\\Users\\Shalini\\Downloads\\Driver\\chromedriver.exe");

        //instantiate the driver
        driver = new ChromeDriver();
    }

    @Test
    public void verifyContentInPDf() throws IOException {
        //specify the url of the pdf file
        String url = "http://www.pdf995.com/samples/pdf.pdf";
        driver.get(url);
        String pdfContent = readPdfContent(url);
        System.out.println(pdfContent);
        Assert.assertTrue(pdfContent.contains("Introduction"));
    }

    @AfterTest
    public void tearDown() {
        if (driver != null) {
            driver.quit();
        }
    }

    public static String readPdfContent(String url) throws IOException {
        URL pdfUrl = new URL(url);
        try (InputStream in = new BufferedInputStream(pdfUrl.openStream());
             PDDocument doc = PDDocument.load(in)) {
            PDFTextStripper pdfStrip = new PDFTextStripper();
            //start extracting from page 2 (pages are numbered from 1)
            pdfStrip.setStartPage(2);
            return pdfStrip.getText(doc);
        }
    }

    public static int getPageCount(PDDocument doc) {
        //get the total number of pages in the pdf document
        return doc.getNumberOfPages();
    }
}


The console displays the content starting from the second page.

As discussed earlier in this tutorial, when a file is large you can set the start page of the document, extract the content, and then proceed with the verification.

But what if you need to print only the content of one specific page?

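If you only set the start page and print the content, everything from that page to the end of the document is printed. For a large file, that is not a good option. Instead, we can also set the end page of the document!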

Doesn't this make testing PDF files with Selenium easier?

To print the content from page 2 to page 3, set the following options in the code:

pdfStrip.setStartPage(2);
pdfStrip.setEndPage(3);

To print the entire content of a single page, use the same page number for both the start page and the end page:

pdfStrip.setStartPage(2);
pdfStrip.setEndPage(2);
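Combining the two setters, a large document can also be processed in fixed-size batches of pages rather than all at once. The sketch below is a hypothetical JDK-only helper (PageRanges is not a PDFBox class) that computes 1-based, inclusive page ranges; each range would then be passed to setStartPage and setEndPage as shown in the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a PDFBox API): splits a document of pageCount pages
// into 1-based, inclusive {start, end} ranges of at most batchSize pages.
//
// Assumed usage with PDFBox 2.x:
//   for (int[] r : PageRanges.batches(doc.getNumberOfPages(), 2)) {
//       pdfStrip.setStartPage(r[0]);
//       pdfStrip.setEndPage(r[1]);
//       String chunk = pdfStrip.getText(doc);
//       // verify chunk here
//   }
final class PageRanges {

    private PageRanges() {
    }

    static List<int[]> batches(int pageCount, int batchSize) {
        if (pageCount < 0 || batchSize < 1) {
            throw new IllegalArgumentException("pageCount must be >= 0 and batchSize >= 1");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 1; start <= pageCount; start += batchSize) {
            // the last batch is cut off at the final page
            int end = Math.min(start + batchSize - 1, pageCount);
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }
}
```

For the 5-page sample PDF and a batch size of 2, this yields the ranges 1-2, 3-4 and 5-5, so the last range is a single page with equal start and end pages.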

In the next part of this tutorial, we will look at PDF testing with Selenium Grid on a cloud-based platform.

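PDF testing on the LambdaTest Selenium Grid

All the PDF operations performed above can also be run on an online Selenium Grid. The LambdaTest grid offers a convenient option for running tests automatically in the cloud. We can test across multiple environments and browsers, which helps us determine how a web page behaves. Note that only the browser runs remotely: PDFBox still downloads and parses the PDF on the machine that executes the test code.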

Now let's see how to perform the same PDF operations on the LambdaTest grid.

To run the tests on the LambdaTest grid, we need to create an account. You can sign up for free here.

After logging in, you are given a username and an access key, which you can view by clicking the key icon.

Replace the username and access key in the code below with your own.

package Automation;

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;
import org.testng.Assert;
import org.testng.annotations.AfterTest;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;

public class PdfHandlingInGrid {

    String username = "Your username";      //Enter your username
    String accesskey = "Your access Key";   //Enter your access key
    RemoteWebDriver driver = null;
    String gridURL = "@hub.lambdatest.com/wd/hub";

    @BeforeTest
    public void setUp() throws MalformedURLException {
        DesiredCapabilities capabilities = new DesiredCapabilities();
        capabilities.setCapability("browserName", "chrome");      //To specify the browser
        capabilities.setCapability("version", "70.0");            //To specify the browser version
        capabilities.setCapability("platform", "win10");          //To specify the OS
        capabilities.setCapability("build", "PdfTestLambdaTest"); //To identify the test
        capabilities.setCapability("name", "PDFHandling");
        capabilities.setCapability("network", true);             //To enable network logs
        capabilities.setCapability("visual", true);              //To enable step-by-step screenshots
        capabilities.setCapability("video", true);               //To enable video recording
        capabilities.setCapability("console", true);             //To capture console logs
        //let a bad URL or a failed session fail the setup instead of leaving driver null
        driver = new RemoteWebDriver(new URL("https://" + username + ":" + accesskey + gridURL), capabilities);
    }

    @Test
    public void pdfHandle() throws IOException {
        String url = "http://www.pdf995.com/samples/pdf.pdf";
        //the remote browser opens the PDF; PDFBox below downloads and parses it locally
        driver.get(url);
        String pdfContent = readPdfContent(url);
        System.out.println(pdfContent);
        Assert.assertTrue(pdfContent.contains("Introduction"));
    }

    @AfterTest
    public void tearDown() {
        if (driver != null) {
            driver.quit();
        }
    }

    public static String readPdfContent(String url) throws IOException {
        URL pdfUrl = new URL(url);
        try (InputStream in = new BufferedInputStream(pdfUrl.openStream());
             PDDocument doc = PDDocument.load(in)) {
            PDFTextStripper pdfStrip = new PDFTextStripper();
            //extract only page 2: start page and end page are the same
            pdfStrip.setStartPage(2);
            pdfStrip.setEndPage(2);
            return pdfStrip.getText(doc);
        }
    }

    public static int getPageCount(PDDocument doc) {
        return doc.getNumberOfPages();
    }
}

TestNG.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd">
<suite name="PDF Handling">
  <test name="Verify Pdf content">
      <classes>
      <class name="Automation.PdfHandlingInGrid"/>
      </classes>
  </test> 
 </suite>


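The console output shows only the content of the second page of the PDF document, because the start page and the end page are set to the same page.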
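Hard-coding the username and access key works for a demo, but it puts credentials into source control. One alternative, sketched below under stated assumptions, is to read them from environment variables and build the same hub URL the class above uses. The helper class GridUrl and the variable names LT_USERNAME and LT_ACCESS_KEY are hypothetical; use whatever names your CI system defines.

```java
import java.util.Map;

// Hypothetical helper (not a LambdaTest or Selenium API): builds the hub URL for
// RemoteWebDriver from credentials kept outside the source code.
// LT_USERNAME and LT_ACCESS_KEY are assumed variable names.
final class GridUrl {

    static final String HUB = "@hub.lambdatest.com/wd/hub";

    private GridUrl() {
    }

    // Takes the environment as a Map so it can be passed System.getenv() or a test map.
    static String fromEnvironment(Map<String, String> env) {
        String user = env.get("LT_USERNAME");
        String key = env.get("LT_ACCESS_KEY");
        if (user == null || user.isEmpty() || key == null || key.isEmpty()) {
            throw new IllegalStateException("LT_USERNAME and LT_ACCESS_KEY must be set");
        }
        return "https://" + user + ":" + key + HUB;
    }
}
```

In setUp(), the driver could then be created with `new RemoteWebDriver(new URL(GridUrl.fromEnvironment(System.getenv())), capabilities)`. Failing fast with a clear message is easier to diagnose than an authentication error from the grid.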

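How to view the tests in the LambdaTest dashboard?

The next major step is to view and verify the test results. After the test case runs successfully, navigate to the LambdaTest dashboard page. This page shows a brief summary of the tests that were run.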

To get details about each test, navigate to the "Automation" tab.

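Tests run on the LambdaTest grid are grouped under the build name provided in the source code. In the code, we set the build name to PdfTestLambdaTest, which helps us find the test in the dashboard: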

capabilities.setCapability("build", "PdfTestLambdaTest"); //To identify the test

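LambdaTest also provides various filters to identify the tests that ran. Tests can be filtered by execution date, build name and build status. Clicking a build takes us to a detailed page that lists all the tests run in that build.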

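The page lists the browser, its version and the test status. Tests running on the grid are recorded, and the video recording makes it easy to trace and fix any failure during execution. This takes testing PDF files with Selenium to the next level.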


Wrapping Up!

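So far, I have explained why PDF testing with Selenium is needed. This post covered using Apache PDFBox, PDFTextStripper and TestNG assertions. From extracting content from specific pages to verifying it, you can do all of this on LambdaTest.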

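Handling and verifying PDFs in Selenium test automation can be tricky. I hope you now have a good understanding of testing PDF files with Selenium. If you have faced any other challenges while handling PDF files, please share your experience below. We would love to hear your feedback. Be sure to share this article with your colleagues, as it may help them.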

Original link: http://codingdict.com
