Apache MetaModel Library for CSV File Manipulations

I think you are all familiar with CSV (Comma-Separated Values) files. We use them in every part of the software development lifecycle. Here’s a library that you can use while reading a CSV file. It allows you to configure a CSV reader with many options, then query the file content, select data by specified columns, and so on.

It’s like SQL for CSV files.

The library belongs to the Apache Software Foundation and is called Apache MetaModel (http://metamodel.apache.org/). It supports not only CSV data sources but many other data sources as well. In this tutorial, however, we’ll simply show how to use the library with CSV files. Let’s see how it works…

You need to integrate the library into your project by using Gradle or Maven.

Gradle

// https://mvnrepository.com/artifact/org.apache.metamodel/MetaModel-full
compile group: 'org.apache.metamodel', name: 'MetaModel-full', version: '5.3.3'

Maven

<!-- https://mvnrepository.com/artifact/org.apache.metamodel/MetaModel-full -->
<dependency>
    <groupId>org.apache.metamodel</groupId>
    <artifactId>MetaModel-full</artifactId>
    <version>5.3.3</version>
</dependency>

Example CSV File Content

Our CSV file contains a header row and four data rows. It holds information about our test users.

name|lastname|gender|age|city
canberk|akduygu|m|33|amsterdam
onur|baskirt|m|37|dubai
gulce|akduygu|m||amsterdam
ege|aksoz|m|29|hiversum

Configuring CSV Reader

First, we create a CsvConfiguration object. There are several constructors. We use the constructor below to specify the column-name row number, the file encoding, and the separator, quote, and escape characters.

CsvConfiguration conf = new CsvConfiguration(1, "UTF-8", '|', '^', '\\');

Then we create a DataContext object from our CSV file and the CsvConfiguration object. Using the code below, you load the CSV file into a Table object.

File csvFile = new File("users.csv");
DataContext csvContext = DataContextFactory.createCsvDataContext(csvFile, conf);
Schema schema = csvContext.getDefaultSchema();
List<Table> tables = schema.getTables();
Table table = tables.get(0);
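
To check that the file was parsed as expected, you can also inspect the table’s columns. This is a small sketch that continues from the snippet above and assumes the same `conf` object; `getColumnNames()` is the Table method that lists the header columns.

```java
import java.io.File;
import org.apache.metamodel.DataContext;
import org.apache.metamodel.DataContextFactory;
import org.apache.metamodel.csv.CsvConfiguration;
import org.apache.metamodel.schema.Table;

public class InspectColumns {
    public static void main(String[] args) {
        // Same configuration as above: header on row 1, UTF-8, '|' separator.
        CsvConfiguration conf = new CsvConfiguration(1, "UTF-8", '|', '^', '\\');
        File csvFile = new File("users.csv");
        DataContext csvContext = DataContextFactory.createCsvDataContext(csvFile, conf);
        Table table = csvContext.getDefaultSchema().getTables().get(0);

        // Print every column name found in the header row.
        for (String columnName : table.getColumnNames()) {
            System.out.println(columnName);
        }
    }
}
```

With the example file above, this lists the five columns: name, lastname, gender, age, and city.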

Extracting the DataSet from the Table

DataSet dataSet = csvContext.query()
                .from(table)
                .selectAll()
                .execute();

This is the basic approach. You extract the whole data into a DataSet.

Querying the Table

DataSet dataSet = csvContext.query()
                .from(table)
                .selectAll()
                .where("age").ne("")
                .execute();

We added a where clause to our query. This command fetches data according to the “age” column: it returns only the rows where age is not equal to an empty string. It will return three rows, as the user named Gulce has no age info. You can chain as many where methods as you want, just like in SQL queries.

There are many other query methods, such as:

eq = equals

gt = greater than

lt = less than
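
As a small sketch of the eq operator (assuming the same csvContext and table objects created earlier), filtering by an exact value looks like this:

```java
import java.util.List;
import org.apache.metamodel.data.DataSet;
import org.apache.metamodel.data.Row;

// Fetch only the users whose city is "amsterdam", using eq (equals).
// Assumes csvContext and table were created as shown above.
DataSet amsterdamUsers = csvContext.query()
        .from(table)
        .selectAll()
        .where("city").eq("amsterdam")
        .execute();

List<Row> rows = amsterdamUsers.toRows();
```

With the example file above, this matches the two rows for Canberk and Gulce.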

Reading the Row Data

After fetching the rows, you need to read the column values. This is the easiest of all operations.

List<Row> rows = dataSet.toRows();
for (Row r : rows) {
     // Column indexes follow the file's column order:
     // 0 = name, 1 = lastname, 2 = gender, 3 = age, 4 = city.
     String name = r.getValues()[0].toString();
     String lastname = r.getValues()[1].toString();
     String age = r.getValues()[3].toString();
     // ...
     // ...
}

We have a data-driven API test framework, and this library is at its core, as all of our data lives in many CSV files. I hope it will be useful for your projects too.
