SHARE

How to work more efficiently with log files using Linux csvkit

All IT pros and incident handlers have to deal almost daily with log files from various sources. Learn to work more quickly and efficiently to get the best out of CSV files with csvkit on Linux.

Written By

Cedric Pernet

Dec 1, 2021

We may earn from vendors via affiliate links or sponsorships. This might affect product placement on our site, but not the content of our reviews. See our Terms of Use for details.

Must-read developer coverage

CSV files are often imported into Excel or LibreOffice software before being used and analyzed. It is very convenient and comfortable, as long as the files are not too big. But some log files might contain billions of lines, which makes it impossible to import it into spreadsheets. Or you might need to analyze files remotely on headless servers without any ability to use a graphical user interface.

SEE: Hiring Kit: JavaScript Developer (TechRepublic Premium)

Luckily, an easy solution is available on all Linux operating systems: the csvkit software.

How to install csvkit
How to work on a CSV file
How to figure out the columns of the file
How to output selected columns
How to change the column order
How to sort the data with csvsort
How to combine csvcut and csvsort
How to get a nicer output
How to get statistics with csvstat

How to install csvkit

With the tool being available in the standard repositories, it is extremely easy to install. In this article, we’ll use an Ubuntu-based operating system.

Let’s issue the installation in a command-line shell by executing:

sudo apt install csvkit

That’s it. The system now installs the tool and all the necessary dependencies.

How to work on a CSV file

To illustrate our point, we’ll work on a CSV file from SimpleMaps.com containing a list of cities and information about them: country, longitude, latitude, population and more.

The first line of the CSV file shows the different column names, as is often the case with CSV files. We can see it with the “head” command, which by default shows the first 10 lines of a file (Figure A).

Figure A

How to figure out the columns of the file

Now let’s start using csvcut from the command-line, one of the tools embedded in the csvkit. Launching the next command will automatically show the named columns and the indices (Figure B):

csvcut -n

Figure B

Using csvcut to list the columns from the file.

We might then use either the indices or the column names to address it.

How to output selected columns

One of the most common operations when dealing with CSV files consists of selecting just a few columns, or reorganizing columns.

To output just a few columns, let’s once again use the csvcut command with the -c option. Both command lines work, to show how to use both the indices or the column name. In our example, we’ll once again use the head command with a pipe, just to show the first lines of the results (Figure C).

csvcut -c 1,5,10

csvcut -c city,country,population

Figure C

Should we want line numbers added to the output, option -l comes to rescue and adds a new column named line_number to our output (Figure D).

Figure D

Adding a line number to the output results.

Output can of course be redirected to a new file. To do this, we redirect the output to a file by using the > character. From our previous example:

csvcut -l -c city,country,population worldcities.csv > newfile.csv

How to change the column order

Using csvcut we can also create an output that reorders the columns. All we need is to specify the columns, and the tool will display it accordingly (Figure E).

Figure E

How to sort the data with csvsort

It is possible to sort data using the csvsort command. Similar to csvcut, csvsort allows the use of option -n to list columns, and -c to use either the column index or the column name.

By default, csvsort works in ascending mode, but it is possible to use the -r option to sort in descending mode.

Let’s sort our file by country name, in descending order (Figure F):

csvsort -r -c country worldwities.csv

Figure F

Results sorted by country name in descending order.

It is possible to sort multiple columns: All you need is to use them with the -c option (Figure G). The next line will sort our data in descending mode by country and by population:

csvsort -r -c country,population worldcities.csv

Figure G

How to combine csvcut and csvsort

Csvsort is powerful but it always outputs all the columns. By combining csvcut and csvsort, we can achieve any kind of outputting or sorting.

As an example, let’s extract only the city name, country name, latitude, longitude, and sort those columns by latitude (Figure H).

csvcut -c city,country,lat worldcities.csv | csvsort -c lat

Figure H

How to get a nicer output

Should you want a nicer output, command csvlook allows you to render the CSV output in a Markdown-compatible, fixed-width format.

From our previous example, we just pipe the csvlook command to the end of our line (Figure I):

csvcut -c city,country,lat worldcities.csv | csvsort -c lat | csvlook

Figure I

How to get statistics with csvstat

The csvstat command allows you to get different statistics on the CSV file.

Run without arguments except the filename, it provides detailed statistics for each column. It is also possible to use the -c option to output selected columns (Figure J).

csvstat -c country

Figure J

It is possible to tune the output of the command by using different options.

To extract the unique values of the country column, we may use the –unique option (Figure K).

Figure K

The number of unique countries using csvstat.

For a list of all options of csvstat, please type the following command:

csvstat -h

Csvkit contains several different command-line tools that allow IT specialists and people who need to work on large CSV files to do it easily in the command-line. The ability to combine these tools, especially csvcut and csvsort, makes it very powerful and should suit all needs of professionals.

Additionally, it is also possible to use csvkit for converting XLS and JSON files to CSV before analyzing or using them with the command-line tools.

Disclosure: I work for Trend Micro, but the views expressed in this article are mine.

Cedric Pernet

Cedric Pernet is a senior threat expert with a strong focus on cybercrime and cyberespionage. He currently works at Trend Micro. Prior to that position, he worked for several Computer Emergency Response Teams (CERTs) where he did threat intelligence investigations, incident response, and computer forensics. He was also a Law Enforcement Officer working on Cybercrime in France. He is the author of a paperbook in French language on cyberespionage and an influential person in the cybersecurity community.