Data hand tools – O’Reilly Radar

A Shebang, also Hashbang or Sharp bang. This i...
Image via Wikipedia

Whenever you need to work with data, don’t overlook the Unix “hand tools.” Sure, everything I’ve done here could be done with Excel or some other fancy tool like R or Mathematica. Those tools are all great, but if your data is living in the cloud, using these tools is possible, but painful. Yes, we have remote desktops, but remote desktops across the Internet, even with modern high-speed networking, are far from comfortable. Your problem may be too large to use the hand tools for final analysis, but they’re great for initial explorations. Once you get used to working on the Unix command line, you’ll find that it’s often faster than the alternatives. And the more you use these tools, the more fluent you’ll become.

via Data hand tools – O’Reilly Radar.

This is a great remedial refresher on the Unix commandline and for me kind of reinforces an idea I’ve had that when it comes to computing We Live Like Kings. What? How is that possible, well think about what you are trying to accomplish and finding the least complicated quickest way to that point is a dying art. More often one is forced to follow or highly encouraged to set out on a journey with very well defined protocols/rituals included. You must use the APIs, the tools, the methods as specified by your group. Things falling outside that orthodoxy are frowned upon no matter what the speed and accuracy of the result. So doing it quick and dirty using some Shell scripting and utilities is going to be embarrassing for those unfamiliar with those same tools.

My experience doing this involved a very low end attempt to split Web access logs into nice neat bits that began an ended on certain dates. I used grep, split, and a bunch of binaries I borrowed for doing log analysis and formatting the output into a web report. Overall it didn’t take much time, and required very little downloading, uploading,uncompressing,etc. It was all commandline based with all the output dumped to a directory on the same machine. I probably spent 20 minutes every Sunday running these by hand (as I’m not a cronjob master much less an atjob master). And none of the work I did was mission critical other than being a barometer of how much use the websites were getting from the users. I realize now I could have had the whole works automated with variables setup in the shell script to accommodate running on different days of the week, time changes, etc. But editing the scripts by hand in vi editor only made me quicker and more proficient in vi (which I still gravitate towards using even now).

And as low end as my needs were and how little experience I had initially using these tools, I am grateful for the time I spent doing it. I feel so much more comfortable knowing I can figure out how to do these tasks on my own, pipe outputs into inputs for other utilities and get useful results. I think I understand it though I’m not a programmer, and couldn’t really leverage higher level things like data structures to get work done, no. I’m a brute force kind of guy and given how fast the CPUs are running, a few ugly, inefficient recursions isn’t going to kill me or my reputation. So here’s to Mike Loukides article and how much it reminds me of what I like about Unix.




, , ,


%d bloggers like this: