Welcome to Vizual Statistix! My name is Seth Kadish. I live in Portland, OR, where I work as a scientist. To learn more about me, visit my LinkedIn profile, and send an invitation to connect.
This blog is a product of my passion for data visualization. The data shown here are sourced from other websites, but all statistical operations on these data and the resulting graphics are original.
If you would like to use one of my graphics on your website or in a publication, please email me. I also take requests and am available for freelance work. Contact me if you have a suggestion for a graphic or need support on a project.
I was recently in Berkeley, CA, and saw a car with a receipt on its dashboard showing it had paid for a couple hours of parking. There was also a note that said, “Please stop giving me parking tickets. I clearly pay for parking.” It had an arrow pointing to a massive stack of receipts for parking. Under the windshield wiper was a parking ticket. Either the parking attendants really don’t like that car, or the owner had been illegally plugging the meter.
As it turns out, the city you live in makes a big difference when it comes to getting parking tickets. I used to live in Providence, where you cannot park on the street at night. If a friend were visiting and you didn’t have an extra parking space in your driveway, you could call the city and get the license plate put on a special list so the car wouldn’t be ticketed overnight. But parking enforcement was aggressive: leave your car in a spot for 2 hours and 5 minutes, and you could expect a ticket! I’m glad I only owned a bicycle.
To generate this time series comparison, I compiled data from four cities. Each data set represented at least one year of tickets, with N-values between 740k and 6.4M, so the trends are robust. The data were given at one-minute intervals, which I then binned to five-minute intervals to reduce the noise. Of these cities, Milwaukee’s parking ticket timing is certainly the odd one out; about 50% of the citations are made before 6 AM each day. This suggests it has a strict parking law that applies at night, between 2 AM and 6 AM. New York City has a strange pattern in the morning, from 7 AM to noon; tickets are much more frequent near the beginning of the hour and half hour. Then, in the afternoon, the pattern stops. Perhaps a New Yorker can explain this…
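For readers who want to try the binning step described above, here is a minimal sketch. It assumes the tickets have already been tallied per minute of the day; the function name and the sample counts are made up for illustration and are not the data or code used for the graphic.

```python
from collections import Counter


def bin_minutes(minute_counts, bin_width=5):
    """Sum per-minute ticket counts into bins of `bin_width` minutes.

    minute_counts maps minute-of-day (0..1439) to a ticket count.
    Returns a dict mapping each bin's starting minute to its total.
    """
    binned = Counter()
    for minute, count in minute_counts.items():
        # Integer division snaps each minute to the start of its bin
        bin_start = (minute // bin_width) * bin_width
        binned[bin_start] += count
    return dict(sorted(binned.items()))


# Hypothetical counts, for illustration only
sample = {0: 3, 1: 1, 4: 2, 5: 7, 9: 1, 10: 4}
print(bin_minutes(sample))  # {0: 6, 5: 8, 10: 4}
```

Summing within each bin keeps the total ticket count unchanged while smoothing out minute-to-minute jitter.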
The U.S. Green Building Council publishes a directory of all LEED-certified public projects in the country. The data mapped here do not include residential projects. Not surprisingly, some states are much more focused on sustainable construction practices than others. On a per capita basis, VT, CO, OR, WA, and MD lead the way, with more than 100 LEED-certified projects per 1M state population. The six states below 25 projects per 1M state population are LA, WV, OK, MS, AL, and KY. Is the divide by political preference a coincidence? I’ll let you decide…
Since the 1856 US presidential election, the Democratic and Republican parties have dominated the popular vote, effectively creating a two-party system. During that time, only three elections (1856, 1860, and 1912) have seen third parties earn more than 20% of the vote. The 1992 election came close, with Ross Perot, running as an independent, earning 18.9% of the vote.
Canadian federal elections have witnessed the opposite trend – from 1867 to 1988, conservative and liberal parties earned more than two-thirds of the popular vote, but have not reached that level in the seven elections since 1993. For Canadian political parties, conservative refers to the following: Conservative Party of Canada, Progressive Conservative Party, Liberal-Conservative Party, Unionist Coalition, National Liberal and Conservative Party, National Government Party and Conservative-Labour. Liberal parties include: Liberal Party of Canada and Anti-Confederates.
These maps show the geographic distribution of class A, B, and C IP addresses. The left-hand maps show point locations, while those on the right display kernel densities. Because points can be co-located, some high-density locations only appear on the kernel density maps.
There are some interesting differences between the maps, particularly in Brazil, Australia, and parts of Western Europe. There is also a bright yellow spot on the kernel density maps just northwest of Lake Baikal in Russia. At that location, tens of thousands of IP addresses are all located at a very specific latitude/longitude.
Data source: http://dev.maxmind.com/geoip/geoip2/geolite2/
Between the 2010 and 2011 NFL seasons, in an attempt to reduce injuries and protect the safety of the players, the league moved the placement of kickoffs from the 30-yd line to the 35-yd line. It was thought that advancing the ball would increase touchbacks and reduce high-speed, head-on collisions between players, which are frequent during kickoff returns.
At least in terms of increasing the percentage of touchbacks, the intended effect of the rule change has been realized. During this past off-season, owners voted down a proposal to move the line another five yards to the 40. This almost certainly would have raised the percentage of touchbacks even further.
In an effort to focus only on where the ball ended up after the return, I have excluded kickoffs that were fumbled and recovered by the kicking team. The field position measured is based on the first play by the receiving team after the kickoff. As such, penalties assessed during the kickoff are accounted for, but kickoffs received at the end of a half are not included. There were approximately 2500 kickoffs included per season, so each multi-season graph represents more than 7000 kickoff returns.
The U.S. Department of Defense consumes a lot of energy: close to 80% of the total used by the entire federal government. The primary sink is aviation. Almost 48% of the energy consumed by the government is in the form of jet fuel, and 98.6% of that jet fuel is used for defense.
Crayola has 120 crayons in its largest box of standard colors. These graphs show the distribution of the interpreted RGB values for those colors. The four scatter plots show the exact same data, just from different angles. It turns out, Crayola has a bias toward high red values, as shown by the violin plot. These include colors like orange, yellow, pink, and, of course, red. This is strange, given that Crayola performed a color census survey (not sure how scientific it was) and found that most of America’s favorite crayon colors are in the blue family. But hey, kids don’t really know what they want, do they?
For those heading to Las Vegas hoping to win vast amounts of money, use this graphic as a friendly reminder that things probably won’t work out in your favor. That said, if you want to lose a smaller percentage of your money, the scatter plot suggests you should stick with the data points hovering near the x-axis.
Your best bet is the $100 slot machines, where the casinos take only 3.6% of your money! Of course, you’ll need a hefty stack of Benjamins if you want to play for more than a couple minutes. And if you are concerned about the magnitude of money lost rather than the percentage, you may want to move over to the penny slots, where you’ll still lose only 11.8%.
If you’re a sports fan, betting on baseball will give you slightly better odds than basketball and football. You’re much less likely to lose your money betting on those sports than on racing.
If you like heading to the tables, bingo is the game-o. The house only takes in 8.8% of the wagers there, followed by blackjack (11.1%). Stay away from 3-card poker, where gamblers lose an average of 32.5% of their money.
The weekly average values in this graphic were derived from one year of data for all non-restricted locations in the Las Vegas Strip area.
If you are a baseball fan, you know that the Boston Red Sox won the World Series last year. They were also one of only two teams to get more singles, doubles, triples, and home runs than the league average. Perhaps a few sabermetricians already know the other team? It was the Colorado Rockies, who finished with a 74-88 record despite having the third most hits in the league. The Detroit Tigers, who led the league in hits, finished with only 23 triples, which was below the league average of approximately 26.
These graphics show the total hits and distribution of hits for each team, relative to the league average. While the total hits graphic better reflects the overall strengths and weaknesses of each team’s batting, I decided to generate the distribution graphic because it removes the bias of how well a team played. I have a friend who is a huge Houston Astros fan, and I figured that it would be nice for her to be able to say, “Sure, my team tied for the second fewest hits in the league last year, but in those rare occurrences when they got a hit, it was 0.3% more likely to be a home run than the league average!” Hmmm…that still doesn’t make them sound very impressive.
Happy return of the MLB season, baseball fans. May the low-paid A’s crush the wealthy Tigers, and all other foes who stand in their path.
I always thought it was strange that dimes are the smallest coin in the US. As it turns out, there is a somewhat reasonable explanation for this – nickels used to be smaller than dimes, but they were too small to handle. To remedy this, the US Mint changed the metallic content of nickels, removing the expensive silver and adding cheap nickel, and then made it bigger than the dime.
As it turns out, though, the US isn’t the only country to have its five-cent pieces larger than its ten-cent pieces. Of the 20 countries I included for this graphic, 14 have coins with denominations of both five and ten. In eight of the 14, the five is bigger than the ten. This is by no means a statistically robust result – a paired t-test of the diameters failed to reject the null hypothesis of no difference – but it would seem more logical if the ten-cent piece had a significantly larger diameter.
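A paired t-test like the one mentioned above can be sketched with nothing but the Python standard library. The diameters below are invented placeholders, not the values from the graphic, and the function name is hypothetical. The sketch only computes the t statistic and degrees of freedom; turning those into a p-value would take a t-distribution table or a statistics package.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical diameters in mm, one pair per country (illustration only)
five_unit = [21.0, 19.0, 20.0, 18.0]
ten_unit = [18.0, 20.0, 19.0, 19.0]

t, dof = paired_t_statistic(five_unit, ten_unit)
print(f"t = {t:.3f} with {dof} degrees of freedom")
```

A t value close to zero, as in this toy example, means the paired differences are small relative to their spread, so the test cannot distinguish the two sets of diameters.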
For this graphic, I used the following currencies: AUD, CAD, CHF, DKK, EUR, GBP, HKD, INR, JPY, KRW, MXN, NOK, NZD, PLN, RUB, SEK, SGD, TRY, USD, and ZAR. I always selected the dimensions of the most recently minted standard editions of a given denomination. Some countries have coins worth more than 500, which were not included.
I would like to congratulate Turkey for being the only country of the 20 included to use denominations below 100 and have the relationship between its coins’ values and diameters described by a monotonic function (i.e., the bigger the coin, the more it’s worth). Norway and Sweden also pass the logic test, but they don’t use any coins with values less than 100 (anymore).
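The "logic test" above boils down to checking whether diameter strictly increases with face value. Here is a rough sketch of that check; the function name is hypothetical, the first coin set is invented, and the US diameters are approximate published values included only as an example of a set that fails.

```python
def diameter_increases_with_value(coins):
    """coins is a list of (face_value, diameter_mm) pairs.

    Returns True when every more valuable coin is also strictly wider.
    """
    ordered = sorted(coins)  # sort by face value
    diameters = [diameter for _, diameter in ordered]
    return all(a < b for a, b in zip(diameters, diameters[1:]))


# Invented coin set that passes the test (illustration only)
tidy = [(1, 16.0), (5, 18.0), (10, 20.0), (25, 22.0)]

# Approximate US diameters in mm: the dime is smaller than the cent and nickel
usd = [(1, 19.05), (5, 21.21), (10, 17.91), (25, 24.26)]

print(diameter_increases_with_value(tidy))  # True
print(diameter_increases_with_value(usd))   # False
```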
Data sources: All data were collected from the websites of the respective national mints when available. When not available, data were sourced from Wikipedia.
A lot of people were interested in comparing the US counties (see original post) to European metropolitan areas. I’ve graphed the results for four here. The method is unchanged, but the data source is different. Note that these cities are notably less gridded than the largest US cities.
Unlike Emperor Kuzco, I was actually born with an innate sense of direction. If you’re like me, and you use the Sun to navigate, you probably appreciate cities with gridded street plans that are oriented in the cardinal directions. If you know that your destination is due west, even if you hit a dead end or two, you’ll be able to get there. However, not all urban planners settled on such a simple layout for road networks. For some developers, topography or water may have gotten in the way. Others may not have appreciated the efficiency of the grid. This visualization assesses those road networks by comparing the relative degree to which they are gridded.
To generate the graphic, I first calculated the azimuth of every road in ten counties (plus one parish and D.C.). I tried to choose consolidated city-counties to keep the focus on urban centers, but for larger counties, I opted not to clip the shapefile to the city boundary. All calculations were made in a sinusoidal map projection using the central longitude of the area of interest. I then graphed the angles on rose diagrams (wind roses) using bins of 5° to show relative distributions for each area. The plots were scaled such that the maximum bar height was the same on each rose. To ensure rotational symmetry in the plots, each azimuth was counted twice: once using the original value and once using the opposite direction (e.g., 35° and 215°). As such, all streets, regardless of one-way or two-way traffic, were considered to be pointing in both directions.
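The binning steps above can be sketched in a few lines of Python. This is not the code used for the graphic: it assumes the road segments are already in projected planar coordinates (x pointing east, y pointing north), it counts each segment once regardless of its length, and the function names and sample segments are hypothetical.

```python
import math


def azimuth_deg(x1, y1, x2, y2):
    """Bearing of a segment in degrees, clockwise from north, in [0, 360]."""
    return math.degrees(math.atan2(x2 - x1, y2 - y1)) % 360.0


def rose_counts(segments, bin_width=5):
    """Count segment azimuths in bins of `bin_width` degrees.

    Each segment is counted twice, once per direction, so the
    resulting rose diagram is rotationally symmetric.
    """
    n_bins = 360 // bin_width
    counts = [0] * n_bins
    for x1, y1, x2, y2 in segments:
        az = azimuth_deg(x1, y1, x2, y2)
        for angle in (az, (az + 180.0) % 360.0):
            # The modulo guards against an angle that rounds to exactly 360
            counts[int(angle // bin_width) % n_bins] += 1
    return counts


def scale_to_max(counts):
    """Rescale so the tallest bar has height 1.0."""
    peak = max(counts)
    return [c / peak for c in counts] if peak else list(counts)


# Hypothetical segments: two north-south roads and one east-west road
segments = [(0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 0, 2)]
heights = scale_to_max(rose_counts(segments))
print(heights[0], heights[18], heights[36], heights[54])  # 1.0 0.5 1.0 0.5
```

Whether to weight each road by its length is a separate design choice; this sketch simply counts segments, which is an assumption on my part for the example.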
The plots reveal some stark trends. Most of the counties considered do conform to a grid pattern. This is particularly pronounced with Chicago, even though much of Cook County is suburban. Denver, Jacksonville, Houston, and Washington, D.C., also have dominant grid patterns that are oriented in the cardinal directions. While Philadelphia and New York are primarily gridded, their orientations are slightly skewed from the traditional N-E-S-W bearings. Manhattan is particularly interesting because it has a notable imbalance between the number of streets running the width of the land (WNW to ESE) and the length of the land (NNE to SSW). New Orleans and San Francisco express some grid-like forms, but have a nontrivial proportion of roads that are rotated in other directions. Downtown Boston has some gridded streets, but the suburban grids are differently aligned, dampening the expression of a single grid on the rose diagram. Finally, the minimal geographic extents of the grids in Charlotte and Honolulu are completely overwhelmed by the winding roads of the suburbs, resulting in plots that show only slight favoritism for certain street orientations.
If you want to see more detail, a full-resolution version of this graphic can be downloaded here:
Do you favor manors and mansions or studios and bungalows? Whatever your preferred housing style is, this graphic will help you look in the right place. The map displays the median number of rooms in housing units for each county. The five highest and lowest medians are listed. For reference, the median for the entire U.S. is 5.5 rooms.
The darker areas tend to have roomier housing units. These include parts of the Mid-Atlantic, the Midwest, and Central Plains. The latter may be due to large farmhouses. There are also specific hotspots in some states (e.g., Utah, Georgia, and Tennessee) that have higher medians. Of the 25 counties with medians of at least seven rooms, eight are in Virginia, four are in Maryland, and three are in Utah. No other state has more than two.
On the other end of the spectrum, smaller housing units with fewer rooms are primarily found in Alaska. There are 15 counties with medians of less than four; ten are in Alaska, two in New York, and one each in Texas, Arizona, and Hawaii.
The violin plot shows the distribution of numbers of rooms using percentages of housing units in each county with one, two, three…all the way up to nine or more rooms. The dots in the bars are the medians. So in a typical U.S. county, approximately 2.5% of housing units have only one or two rooms. Close to a quarter (22.8%) have five rooms, and just under a fifth (19.7%) have six rooms. The roomiest housing units (with at least nine rooms) account for 9.2% of homes.
“When counting the number of rooms in a home for the American Community Survey (ACS), please count rooms separated by built-in archways or walls that extend out at least 6 inches and go from floor to ceiling. Include only whole rooms used for living purposes, such as living rooms, dining rooms, kitchens, bedrooms, finished recreation rooms, family rooms, enclosed porches suitable for year-round use, etc.
DO NOT count bathrooms, kitchenettes, strip or pullman kitchens, utility rooms, foyers, halls, open porches, balconies, unfinished attics, unfinished basements, or other unfinished space used for storage.”
The dark green represents a consensus pick of the top seed, while light blue indicates that the lower seed was selected most often. The dark blue boxes are those where the choices on the three websites did not agree.
As you can see, most people go with the top seed. There are a few nines over eights and threes over twos, but nothing outrageous except…
For some reason, when I have CBSSports auto-fill the bracket with the favorite choices of users, it picks Manhattan over Louisville. Is this a real result, or an error? I don’t know, but maybe the CBS users can smell a Cinderella story in the making. Only time will tell.