csvkit awesomeness

On one of my recent consulting gigs, I was working with a large multinational client to develop software tools and reports to help continually monitor and improve their Lync videoconferencing installation.

As part of this, I’d have to sort through hundreds of thousands of lines of collected data logs containing lots of statistics about all the daily activity on the system. This was truly a “big data” exercise.

One set of tools that I found to be incredibly useful in these exercises is called csvkit. csvkit provides a set of command line utilities to quickly and easily manipulate comma-separated value files.

Among the various operations, these tools provide, some of the nicest are:

  1. Converting Excel or JSON data to CSV:

    # Excel to CSV
    in2csv data.xls > data.csv
    # JSON to CSV`
    in2csv data.json > data.csv
  2. Selecting a subset of columns (columns 1, 3, and 5 in this case):

    csvcut -c 1,3,5 data.csv > new.csv
  3. Filtering rows based on a match filter:

    csvgrep -c FirstName -r "Bill" data.csv > new.csv
  4. Import data into and exporting data from a SQL database:

    # import into a sqlite database
    csvsql --db sqlite:///mydatabase.db --insert data.csv
    # export from a sqlite database
    sql2csv --db sqlite:///mydatabase.db --query "select * from data"

If you need to create data analytic workflows, I strongly recommend having csvkit in your toolbox.

Install csvkit by using the Python package manager, pip:

pip install csvkit

Copyright © 2016: All rights reserved by Bill Heyman