Thursday, June 24, 2010

Stand Response Distribution recap


Since it is being discussed today, here is a repost of the map showing response areas of Stand.

Wednesday, May 26, 2010



A few weeks back, the Twitter Census datasets were released by Infochimps. There are several datasets in the collection that is comprised of a scrape of Twitter's 40 million users..

I downloaded the Twitter Users by Location dataset to explore. I first unzipped the file, added a .csv file extension to it and opened it in Excel to see what the data looked like. Basically a long column of entries from the location field on the Twitter users's profile. There are 3.6 million rows in this dataset, so Excel wasn't quite capable of doing the work. I switched to a terminal screen and used cat to look around. While letting cat stream the data up the screen, I saw two large blocks of clean coordinates. One block were users who used their iPhone to put coordinates in the location field in Twitter, the other block must have been another phone type(BB?) doing the same as they both had consistent characters prefixing the geocoding. I used the prefix to match and extract those lines into a separate file, loaded them into a mysql table, ran some delete commands to remove invalid coordinates, and ended up with over 500,000 points to map. You know the rest, I connected to the db with Tableau and watched the map render. Some of the maps are in a Picasa Album along with some more abstract images from the map.


To do these, I just set the map layer washout to 100% and started zooming in to different areas. The image above is Atlanta, Ga. I liked what I saw but wanted to add more color to them. The database this is pulling from is just a table with 3 columns- a unique id, latitude and longitude, so there was nothing there to use as a dimension to apply color to. So, I created some new columns and used the rand() function to populate the first column with random numbers between 1 and 5, and the next column 1-10. These random numbers were then used as the color in Tableau. Below are a couple of results - the first is the eastern US with 5 colors and no other changes, the second one is Atlanta with 10 colors, open circles for the markers and transparency increased. More of these are in the Picasa album and more will be put there.







Again, these images represent a set of 500,000 locations extracted from a larger set of 3.6 million. I think for mapping purposes, the remaining 3 million would just show more of the same, but would certainly fill in some blanks. (For instance, no coordinates were in there for North Korea, but a text search revealed a couple of dozen hits for North Korea.) However, for the abstract images, I think more would be better. I am working on a few simple searches and then some regular expressions to sift through the rest and pull out things that can be mapped. These include addresses and other coordinate sets. The big challenge will be trying to map the ones that just have a city name, for instance Atlanta has around 10,ooo points now from coordinates, but a search for Atlanta reveals at least 10,000 more by name. My plan is to take those returns and use the rand() function, or something else, to randomly generate coordinates within the area of interest and see what happens. Hopefully a purely aesthetic cartography.

Monday, May 17, 2010

Poetry Concat()

In the last post, I talked about reshaping the Stand categories to change it from wide to long data. The next thing I did was to concatenate, or reassemble, the responses. The four questions were divided into up to five sections for categorization. In order to easily browse the responses, I put them back together. I used Excel to simply merge the columns and loaded the resulting file in to mysql. To keep the sections neatly separated, I could have inserted a column between them with a special character or string that would mark the change or be replaced by a space or some other delimiter. I didn't. The result is much more interesting as evidenced from a few selections below. The csv file of the concatenated responses can be downloaded and combined with the one for the categories. Two more tables and you will have the basic database I use for ad hoc queries. Woo-hoo!

On to the poetry concat()....

see above healthier population see #2
walkable downtown mountains
outdoors affordable
downtown weather
future potential lifestyle values
see above no rich tards running it no snow see #2
aquarium fiber optic lines
downtown river bridges boats grass
see #1 plus direct air flight to NOLA see #2
drugs that would take care of crime
recycle not litter carpool
visitor - not sure
clean more well restored buildings
see above see #2
clean drug and gang free
i just did
? plant stuff
historic clean safe religious
traffic traffic traffic traffic traffic
education taxes litter pollution
crime education apathy
see above see #2

Ok, that is just a few combinations of words that I liked and found in a few minutes of scanning the first few hundred responses, found a theme, ran some queries and went with it. Many many more and better ones are in there. More gems to come.

Note: If the main download link doesn't work, try the one just above it to the left.



Monday, May 10, 2010

Stand cat reshape

The first thing I did when I downloaded the Stand results was to load the csv file into Tableau. It was able to do some nice basic mapping, but when it came to graphing and mapping categories it wasn't happy. The categories are assigned to the responses in pieces.( A brief data dictionary might be in order.) To the best of my knowledge, the 4 Stand questions were broken into 5 possible responses each. This was derived from the fact that the paper survey, which most of the results came in on, had 5 lines under each question. So some people filled it out as bullet points, 1-2-3-4-5, while others used the lines for a narrative response. Either way, each question was broken into 5 parts and each part could receive 3 category designations. Four questions, 5 parts, 3 categories gives us the potential of having 60 category designations per question/respondent. This is wide data. Many columns per row. Whether due to the way Tableau works or my ignorance of it, I couldn't get the wide data to work when trying to do category analysis per response record or per category designation. I needed to take the wide data and make it long data, or one row per respondent per category number. Excel was the savior in this. I downloaded an extension for Excel that did just that and reshaped the data to create one row for each designation. Five minutes later I had a spreadsheet with case numbers and categories that their responses fit into. Luckily, not all of the responses were 5 lines with 3 categories, otherwise it would have gone beyond Excel's limits. It turned out 273,000+ rows. I used this sheet for one of my four final database tables for looking at Stand data. I will outline and share the others in upcoming posts. While I do have 2 primary dbs of Stand data, I mostly use the Stand web interface for quick queries and preliminary scanning.

Note: My intention was to put up an interactive visual Stand exploration tool on this blog. It might still happen as I imagined, however the free Tableau Public tool has a row limit which the category sheet above more than surpasses. I hope to still create a map interface that then displays results from the web interface via its API.

Wednesday, May 5, 2010

2010 Shooting Map Update


This map updates the previous one, so covers and additional 3 weeks. I will post the same data by zip code later. If interested, you can then look at Stand results under the Crime subcategories and see how it meshes with these maps. An example is the subcategory for Violence.

Friday, April 30, 2010

Its Arbor Day somewhere.


I have been, slowly, working on placing 3D trees that are in the city's urban forest/Take Root project in Google SketchUp for eventual inclusion in Google Earth. This is part of the Chattanooga 3D project. While there are scripts that can be used to mass import objects to SketchUp, I have been placing them, some 1700 trees, by hand. The initial placement on the block tiles moves pretty quickly, but some of the placement points are on top of cars, in the middle of the street, or on a building - so some fine tuning is required. The KMZ file that I am using can be downloaded for your browsing pleasure. (I finally setup a Google Doc share to hold the files and datasets that I keep saying I am going to post.) Once you download the file, just open it in Google Earth. Another piece I have been working on is a file that has the species list as well. This is where it can get fun. I was recently in San Francisco and New York, both cities are releasing data to the public, and both cities have tree loving developers that built tree apps, SF Trees and TreesNY(Near You) respectively. Both functionally and aesthetically, I prefer TreesNY, but I had much more time to play with it than I did SF Trees.

Now, to real trees. I hear that VW has been planting a lot of trees around town. Hopefully all of those are being documented by plant date, geocode, and species and that they will get added to the map. If you don't get rained out, show some arboreal love this weekend. I have some small trees that I am going to try to get in the ground with the kids. If you can't plant a real tree, I would bet volunteer virtual tree planters would be welcomed to the Chattanooga 3D project!

Oh, and here are the Chattanooga Stand results for the tree category. Go and explore.




Monday, April 26, 2010

Back in CHA

Nice to be back in Chattanooga after being gone for 2 weeks! Came back in time to do Day of Service in East Chattanooga on Saturday and to get strawberries at Chattanooga Market. My trip started off in San Francisco for higher ed tech conference. Luckily there was time for food and some opengov. Open Gov West had a follow up meeting to their conference in Seattle while I was there. It was an incredible group of people from the bay area, Seattle, Canada...oh, and Tennessee. Some of the attendees were from the city and county of San Francisco and are behind DataSF, other groups represented included Code for America, and Knowledge as Power. Big ideas and wonderful conversations that I was honored to be a part of.

Next up was 5 days without wireless and almost without cell signal in and around Stillwater, NY. After that, I rode with some friends down to NYC. On the way we dropped one person off at Gate Hill Co-Op. Her house is feet from the old farm house where John Cage, David Tudor, and 7 others lived in the early to mid 50s. (Cage later built a house up the hill. ) Then on to the city. I rented a bike and rode from midtown up to the Cloisters. Almost everyday, I put the Trees Near You app to work identifying trees.

In my downtime in two of the best open data cities, I took time to work on some ChattaData too. I took the csv of the Stand data and reshaped it to use for category browsing in Tableau. I also broke it up and put it in a simple mysql database to run ad hoc queries against. I will post both sometime soon once I get back into the swing of things.

Followers