Wednesday, October 24, 2012

Data Analytics with PowerShell Part I

Below is some cruft I am using to analyze PDC data for candidates in my local WA elections. It is part of a larger project to use PowerShell 3.0 as a data analytics solution. I find working with data in PowerShell 3.0 somewhat tricky and sometimes limited, but no less straightforward than SQL or R. Although PowerShell can be a little frustrating to work with, I rather like being able to craft my own 'data analytic' solution from the console, and I found myself close to the data while working in PS 3.0. Certain techniques produced surprisingly rapid and illuminating results. Take the query below. After importing a candidate's data from CSV into a variable ('$NM'), a single line of code excludes all WA state contributions, then uses 'group-object' to list each out-of-state contributor with state and amount, sorted in descending order by count and then by name.


$NM | Where State -ne WA | group -property Contributor,State,Amount -noelement | sort -desc Count,Name | ft -auto -wrap

Count Name
----- ----
    1 WEBB LISA, MT, 100
    1 VASKAS JANET, PA, 100
    1 VASKAS ALAN, PA, 100
    1 TURNER ZACHARY M, CO, 900
    1 TERESA JUDITH, WV, 50
    1 SOTO JLEANA, CA, 100
    1 MCCLENDON SUSAN, GA, 500
    1 ATU, DC, 900
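
For readers less used to the aliases, the same pipeline can be spelled out with full cmdlet names. This is only a sketch, not the original script: the inline CSV stands in for the real PDC export, its three out-of-state rows are copied from the output above, and the 'EXAMPLE DONOR' row is a made-up WA placeholder so the filter has something to exclude. The column names (Contributor, State, Amount) are taken from the query itself.

```powershell
# Stand-in for the real PDC export (assumption: the real file has at least
# these three columns). The WA row is a made-up placeholder.
$csv = @'
Contributor,State,Amount
WEBB LISA,MT,100
VASKAS JANET,PA,100
ATU,DC,900
EXAMPLE DONOR,WA,25
'@

# With a file on disk this would be Import-Csv instead of ConvertFrom-Csv.
$NM = $csv | ConvertFrom-Csv

$outOfState = $NM |
    Where-Object { $_.State -ne 'WA' } |                           # drop in-state rows
    Group-Object -Property Contributor, State, Amount -NoElement | # one group per distinct triple
    Sort-Object -Property Count, Name -Descending                  # both keys descending

$outOfState | Format-Table -AutoSize -Wrap
```

Keeping the grouped result in a variable ('$outOfState' here, a hypothetical name) before formatting leaves the objects available for further filtering, since Format-Table output is meant only for display.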