On one of my recent consulting gigs, I was working with a large multinational client to develop software tools and reports to help continually monitor and improve their Lync videoconferencing installation.
As part of this, I’d have to sort through hundreds of thousands of lines of collected data logs containing lots of statistics about all the daily activity on the system. This was truly a “big data” exercise.
One set of tools that I found to be incredibly useful in these exercises is called csvkit. csvkit provides a set of command line utilities to quickly and easily manipulate comma-separated value files.
Among the various operations, these tools provide, some of the nicest are:
Converting Excel or JSON data to CSV:
# Excel to CSV in2csv data.xls > data.csv # JSON to CSV` in2csv data.json > data.csv
Selecting a subset of columns (columns 1, 3, and 5 in this case):
csvcut -c 1,3,5 data.csv > new.csv
Filtering rows based on a match filter:
csvgrep -c FirstName -r "Bill" data.csv > new.csv
Import data into and exporting data from a SQL database:
# import into a sqlite database csvsql --db sqlite:///mydatabase.db --insert data.csv # export from a sqlite database sql2csv --db sqlite:///mydatabase.db --query "select * from data"
If you need to create data analytic workflows, I strongly recommend having csvkit in your toolbox.
Install csvkit by using the Python package manager, pip:
pip install csvkit