


How to Read a CSV File Using a Groovy Script

Groovy is an almost perfect complement to Java, providing a compact, highly expressive, and compatible scripting environment for my use. Of course, Groovy isn't totally perfect; as with any programming language, its design is based on a series of trade-offs that need to be understood in order to produce quality results. But for me, Groovy's advantages far outweigh its disadvantages, making it an indispensable part of my data analysis toolkit. In a series of articles, I'll explain how and why.

In the late 1990s, I found myself becoming increasingly interested in the Java programming language, using it more and more for stuff that was too complicated for AWK and for most of what I used to do in C. By 2005, the low cost and great functionality of Linux had convinced me to ditch my beloved but crumbling Sun workstation. And for my kind of work, AWK, sort(1), paste(1), and join(1) had serious competition in the Linux environment, first from Perl, then from Python. The syntax of Perl has never been to my taste, but I found Python intriguing because of its readability, its "batteries included" philosophy, and its level of integration with all sorts of other stuff, such as delimited text files, spreadsheets, databases, and graphing. Except for one thing: it didn't give me what felt like clean and transparent access to the whole Java universe that was becoming increasingly fundamental to my workflow.

And then I "discovered" Groovy.

Delimited text files

In my AWK-centric data analysis universe, working with data generally means working with delimited text files. This came about through a combination of two factors. The first was that Unix text file processing facilities generally recognized that data was often encountered in delimited text files, that is, text files whose lines, delimited by newline characters, were separated into fields, delimited by a field separator character (for example, a TAB, or another unusual character, such as the vertical bar). The second was that tools such as spreadsheets tended to provide an "export" facility that produced comma-separated value text files, whose first line was by convention the names of each field, and whose remaining lines consisted of fields of data separated by commas (or, in countries that used the comma as a decimal point, by semicolons).
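
A minimal Groovy sketch of that idea (not code from the original article): the first line supplies the field names, and every line is split on a single separator character. The helper name splitDelimited and the sample data are illustrative assumptions.

```groovy
import java.util.regex.Pattern

// Hypothetical helper: split one line on a literal separator character.
// Pattern.quote stops characters such as '|' from being read as regex syntax,
// and the -1 limit keeps trailing empty fields instead of dropping them.
List<String> splitDelimited(String line, String separator) {
    line.split(Pattern.quote(separator), -1).toList()
}

// Sample data: a header line of field names, then TAB-separated records
String tabData = "Country Code\tRegion\nAGO\tSub-Saharan Africa\nARG\tLatin America & Caribbean"
List<String> lines = tabData.readLines()
List<String> header = splitDelimited(lines[0], "\t")
List<List<String>> records = lines.drop(1).collect { splitDelimited(it, "\t") }

println header        // [Country Code, Region]
println records[1]    // [ARG, Latin America & Caribbean]

// The same helper works with a vertical-bar separator, keeping empty fields
println splitDelimited("AGO|Angola||", "|")   // [AGO, Angola, , ]
```

Pattern.quote matters for the vertical bar in particular: passed straight to String.split, "|" is a regular-expression alternation that matches the empty string, so the line would be split between every character.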

AWK is pretty good at dealing with delimited text files, unless the field or line delimiters also show up within the data. Moreover, AWK is really geared toward writing stanzas of code that react to the data presented, and is not nearly so attractive when the data presentation is complex (for instance, hierarchical). Nor does AWK really provide any good way to read from, or write to, a relational database, or a spreadsheet, or a binary format such as dBase, without passing through an intermediate delimited-text format.
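
To see the first problem concretely, here is a small Groovy sketch (an illustration, not from the original article) of what happens when a naive split meets a quoted field. The sample records are made up in the style of the data used later in this article.

```groovy
// A sample record: the second field is quoted because it contains a comma
String line = 'AGO,"Based on IMF data, national accounts",Angola'

// Naive splitting on every comma, roughly what FS="," does in AWK
String[] naive = line.split(",", -1)
println naive.length   // 4 pieces for a record that has only 3 fields
println naive[1]       // the opening quote plus only half of the quoted field

// A quoted field may also contain a line break, so reading line by line
// cuts this single (made-up) record in two
String multiLine = 'ARG,"first part,\r\nsecond part",Argentina'
println multiLine.readLines().size()   // 2
```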

This is where a more complete programming language, such as Python or Groovy, starts to get interesting. But before getting to those kinds of direct integration examples, I'm going back to delimited text. Let's write some code! But first, let's get some data! But wait, we'd better install Groovy first.

Getting Groovy

The best way to find out how to install Groovy is to go to the installation instructions at groovy-lang.org. I prefer to use SDKMAN for this purpose (documented midway down the installation instructions), but you can also install the version in your repositories. Note that Java is a prerequisite. These days, I use Java 8. Again, you can install the version in your repositories.
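
A quick way to confirm that both Groovy and Java are visible is a tiny script such as the following (running groovy --version at the command line gives similar information). It's a sketch; the file name you save it under is up to you, and the output depends entirely on your installation.

```groovy
// Report the Groovy version and the version of the Java runtime it runs on
String groovyVersion = GroovySystem.version
String javaVersion = System.getProperty("java.version")

println "Groovy ${groovyVersion} on Java ${javaVersion}"
```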

Getting data

Now that you have Groovy, use your browser to visit the open world population data from the World Bank site. On the right, you'll see a Download button. Get the data in CSV format; it comes zipped in a directory called API_SP.POP.TOTL_DS2_en_csv_v2. Unzip this directory into a good place on your system. Then open a terminal window and cd into that directory.

Finally, some code!

Here is a simple Groovy script to read one of the CSV files you downloaded and print it to your terminal window:

          

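// ex01.groovy: print every line of the country metadata file
String mdCountryCSV = "Metadata_Country_API_SP.POP.TOTL_DS2_en_csv_v2.csv"

new File(mdCountryCSV).withReader { reader ->   // opens the file; closes the Reader when done
    reader.eachLine { line ->                    // runs once for each line of text
        println line
    }
}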

This script gives a good overview of what Groovy provides for Java programmers.

First, the String mdCountryCSV = .... This is "just like Java": we are declaring a String variable that is initialized to a String literal. Oh yes, Groovy allows us to drop line-ending semicolons in most cases.

Next, new File(mdCountryCSV).withReader { reader ->, which is closed by a } four lines later. The new File() part is also just like Java; however, Groovy enhances a lot of java.lang.*, java.io.*, java.util.*, and other parts of the standard Java libraries. In this case, Groovy enhances the File class with a method called withReader. This method accepts a closure as an argument, which here we express as the block of code { reader -> ... }. The reader -> part declares the closure's single parameter, reader.

What does this Groovy novelty accomplish? Functionally, withReader creates a Reader instance for the file, calls the closure code with that instance assigned to the variable reader, and finally closes the Reader, releasing its resources even if the closure throws an exception (the exception itself still propagates to the caller). More generally, closures let the Groovy developer pass anonymous blocks of code as arguments to other method calls. Moreover, the surrounding context is available inside the closure without any special hocus-pocus.
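
For readers coming from Java, the following sketch shows roughly what withReader saves us from writing: an explicit try/finally around the Reader. It creates its own temporary file so that it runs anywhere, and the equivalence is approximate rather than a copy of Groovy's actual implementation.

```groovy
// A small throwaway file so the sketch runs on its own
File tmp = File.createTempFile("sample", ".csv")
tmp.deleteOnExit()
tmp.text = "a,b\n1,2\n"

// Roughly what withReader does: open a Reader, pass it to the code,
// and close it in a finally block, even if the code throws
List<String> viaTryFinally = []
Reader reader = tmp.newReader()
try {
    reader.eachLine { line -> viaTryFinally << line }
} finally {
    reader.close()
}

// The same work using withReader and a closure
List<String> viaWithReader = []
tmp.withReader { r ->
    r.eachLine { line -> viaWithReader << line }
}

println(viaTryFinally == viaWithReader)   // true
println viaWithReader                     // [a,b, 1,2]
```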

Next, reader.eachLine { line ->, which is closed by a } two lines later. Again, we are seeing a Groovy-enhanced Reader method, eachLine, being called with a closure as an argument, which here we express with { line -> ... }. Here the Reader instance calls the closure once for each line read from the file.
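
eachLine has a further convenience worth knowing about: if the closure declares two parameters, Groovy passes the line number as the second one, counting from 1. A minimal sketch, again with a throwaway file rather than the World Bank data:

```groovy
File tmp = File.createTempFile("lines", ".txt")
tmp.deleteOnExit()
tmp.text = "first\nsecond\nthird\n"

// A two-parameter closure receives the line and its line number
List<String> numbered = []
tmp.withReader { reader ->
    reader.eachLine { line, lineNumber ->
        numbered << "${lineNumber}: ${line}".toString()
    }
}

numbered.each { println it }   // 1: first, 2: second, 3: third (one per line)
```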

Finally, println line simply prints the line read by the Reader instance. Here Groovy shows us that it's OK to omit parentheses around arguments to method calls, and also that println is available everywhere as a shorthand for System.out.println, with no import or qualifying object needed.

Save this code block as ex01.groovy in the same directory as the data and execute it from the terminal command line with:

groovy ex01.groovy

What do you run into?

At this point, it's worth noting that Groovy also quietly did away with the imports and public class definition that need to appear in a Java program that might carry out the same task.

Dealing with fields

So far, our Groovy script has dealt with line delimiters, but has yet to split the lines into fields. A quick examination of the file Metadata_Country_API_SP.POP.TOTL_DS2_en_csv_v2.csv will show that it is the most complex kind of CSV file: it uses commas as the field separator and quotes fields that can contain field or line separators.

Look at the third line, for Angola; in the fourth field, the phrase "Based on IMF data, national accounts" appears. And in the ninth line, for Argentina, not only do commas appear in the same field but also carriage-return/line-feed pairs. Finally, on line 199, that field contains a double-quote character, which is shown as two successive double quotes: a "quoted quote," which is not to be confused with two successive double quotes as the only content of a field, implying an empty field. Ugly!
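
Before reaching for a library, it helps to see what handling those three cases involves. The following is a deliberately minimal, hypothetical parser (parseCsv is not part of Groovy or opencsv) that treats separators and line breaks inside quotes as data and turns a doubled quote into a literal one. It assumes well-formed input and does no error reporting, so treat it as an illustration of the rules rather than a substitute for opencsv.

```groovy
// Hypothetical helper: split CSV text into records (lists of field strings).
// Assumes well-formed input; unbalanced quotes are not reported as errors.
List<List<String>> parseCsv(String text, String sep = ',') {
    List<List<String>> records = []
    List<String> fields = []
    StringBuilder field = new StringBuilder()
    boolean inQuotes = false
    int i = 0
    while (i < text.length()) {
        String c = text[i]
        String next = i + 1 < text.length() ? text[i + 1] : ''
        if (inQuotes) {
            if (c == '"' && next == '"') {
                field.append('"')        // doubled quote inside quotes: a literal quote
                i++
            } else if (c == '"') {
                inQuotes = false         // closing quote
            } else {
                field.append(c)          // separators and line breaks are data here
            }
        } else if (c == '"') {
            inQuotes = true              // opening quote
        } else if (c == sep) {
            fields << field.toString()   // end of field
            field.setLength(0)
        } else if (c == '\r' || c == '\n') {
            if (c == '\r' && next == '\n') {
                i++                      // treat CR LF as one line break
            }
            fields << field.toString()   // end of field and of record
            records << fields
            fields = []
            field.setLength(0)
        } else {
            field.append(c)
        }
        i++
    }
    if (field.length() > 0 || fields) {  // last record had no trailing line break
        fields << field.toString()
        records << fields
    }
    records
}

// Made-up sample covering the three cases described above
String sample = 'AGO,"Based on IMF data, national accounts"\r\n' +
        'ARG,"Line one,\r\nline two"\r\n' +
        'XYZ,"A ""quoted"" word",""\r\n'
parseCsv(sample).each { println it }
```

Every branch above is a rule a real CSV reader has to get right, and a production parser must also cope with malformed input that this sketch ignores, which is a good argument for using a library instead.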

In AWK, dealing with this kind of messy stuff is less than pleasant; however, in Groovy, we can make use of a fine Java library called opencsv. Download the .jar file from SourceForge. Put that .jar file in Groovy's default lookup path: in your home directory, in the .groovy/lib subdirectory.

At this point, the first program can become field-aware:

          

import com.opencsv.CSVReader

String mdCountryCSV = "Metadata_Country_API_SP.POP.TOTL_DS2_en_csv_v2.csv"

new File(mdCountryCSV).withReader { reader ->
    CSVReader csvReader = new CSVReader(reader)   // wraps the Reader and handles quoting
    csvReader.each { fields ->                    // one String[] per CSV record
        println fields
    }
}

Save this as ex02.groovy in the same directory and run it with the groovy command.

What's new here?

First, we have to import the CSVReader class. Then we create a CSVReader instance from the reader handed to us by withReader. Finally, we print the fields yielded by the CSVReader instance. Here, we use the each method that Groovy puts on every object, together with the parsing that opencsv provides, to process the records in the file. csvReader.each { fields -> gives us each record split into fields, that is, an array of Strings. We can refer to the first field as fields[0], the second as fields[1], and so on.

Given that the first line of this kind of CSV file provides the field names, we can adapt the above code to let us refer to the fields by name, as follows:

          

import com.opencsv.CSVReader

String mdCountryCSV = "Metadata_Country_API_SP.POP.TOTL_DS2_en_csv_v2.csv"

new File(mdCountryCSV).withReader { reader ->
    CSVReader csvReader = new CSVReader(reader)
    String[] csvFieldNames = csvReader.readNext()   // first record holds the field names
    HashMap fieldValuesByName = new HashMap()
    csvReader.each { fieldValuesByNumber ->          // each remaining record as a String[]
        csvFieldNames.eachWithIndex { name, number ->
            fieldValuesByName[name] = fieldValuesByNumber[number]
        }
        println "fieldValuesByName[\"Country Code\"] = " +
            fieldValuesByName["Country Code"] +
            " fieldValuesByName[\"IncomeGroup\"] = " +
            fieldValuesByName["IncomeGroup"]
    }
}

Save this as ex03.groovy in the same directory and run it.

In the above code, we call readNext on the CSVReader instance right away to get the first record, and save the field names in a String array. Then, every time we read a record, we call eachWithIndex on the field names to copy the values from the array of fields delivered by csvReader.each() into a map where the key is the field name and the value comes from the corresponding field in the record.
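
A more compact variation, sketched here without opencsv so that it runs on its own: the header array and records below are made-up stand-ins for what readNext() and csvReader.each would deliver (the country codes and income groups are placeholders, not World Bank data). Instead of filling one shared HashMap with eachWithIndex, it builds a fresh map per record with transpose() and collectEntries. One difference to be aware of: transpose() stops at the shorter list, so a record with fewer fields than the header simply yields a map without the missing keys.

```groovy
// Placeholder stand-ins for csvReader.readNext() and the remaining records
String[] csvFieldNames = ["Country Code", "Region", "IncomeGroup"] as String[]
List<String[]> rows = [
    ["AAA", "Region One", "Low income"] as String[],
    ["BBB", "Region Two", "High income"] as String[]
]

// transpose() pairs each name with the value at the same position, and
// collectEntries turns those [name, value] pairs into a new map per record
List<Map<String, String>> recordsByName = rows.collect { String[] fieldValuesByNumber ->
    [csvFieldNames.toList(), fieldValuesByNumber.toList()].transpose().collectEntries { it }
}

recordsByName.each { fieldValuesByName ->
    println "${fieldValuesByName['Country Code']}: ${fieldValuesByName['IncomeGroup']}"
}
```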

The println statement shows us accessing field values by name, for example, fieldValuesByName["Country Code"].

Where to next?

That's probably enough Groovy to get started. Here are good references to enhance the experience:

  • The Groovy language site contains examples and reference documentation
  • Tim Roes' nice tutorials on Groovy for Java developers
  • Mr. Haki's (Hubert Klein Ikkink) excellent series of Groovy posts

Quite a number of Groovy books with good introductions to the language are also available.

The next installment in this Groovy series will take the themes already introduced further: making the last example groovier, reading multiple CSV files, linking their data, and writing out a composite/summary.


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Source: https://opensource.com/life/16/10/getting-groovy-data

Posted by: shellenbargerjuplage.blogspot.com
