Since the 1856 US presidential election, the Democratic and Republican parties have dominated the popular vote, effectively creating a two-party system. During that time, only three elections (1856, 1860, and 1912) have seen third parties earn more than 20% of the vote. The 1992 election came close, with Ross Perot, running as an Independent, earning 18.9% of the vote.

Canadian federal elections have shown the opposite trend: from 1867 to 1988, conservative and liberal parties together earned more than two-thirds of the popular vote, but they have not reached that level in the seven elections since 1993. For Canadian political parties, "conservative" refers to the following: Conservative Party of Canada, Progressive Conservative Party, Liberal-Conservative Party, Unionist Coalition, National Liberal and Conservative Party, National Government Party, and Conservative-Labour. "Liberal" parties include the Liberal Party of Canada and the Anti-Confederates.
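The party grouping above can be expressed as a lookup table and a threshold test. This is a minimal sketch, assuming results arrive as a dictionary mapping party name to percent of the popular vote for one election; that input format, the helper names, and the sample percentages are hypothetical and are not taken from the data sources.

```python
# Party families as defined in the text.
CONSERVATIVE = {
    "Conservative Party of Canada",
    "Progressive Conservative Party",
    "Liberal-Conservative Party",
    "Unionist Coalition",
    "National Liberal and Conservative Party",
    "National Government Party",
    "Conservative-Labour",
}
LIBERAL = {"Liberal Party of Canada", "Anti-Confederates"}
TWO_PARTY = CONSERVATIVE | LIBERAL


def combined_share(results):
    """Sum the popular-vote percentages of every conservative or liberal party.

    `results` maps party name -> percent of the popular vote (hypothetical format).
    """
    return sum(pct for party, pct in results.items() if party in TWO_PARTY)


def above_two_thirds(results):
    """True if the two party families together won more than two-thirds of the vote."""
    return combined_share(results) > 100.0 * 2 / 3


# Hypothetical percentages, for illustration only.
example = {"Conservative Party of Canada": 40.0, "Liberal Party of Canada": 20.0, "Other": 40.0}
print(combined_share(example), above_two_thirds(example))  # 60.0 False
```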

Data sources:

http://www.parl.gc.ca/parlinfo/compilations/electionsandridings/ResultsParty.aspx

http://www.electionalmanac.com/ea/canada-popular-vote-results/

http://uselectionatlas.org/RESULTS/

These maps show the geographic distribution of class A, B, and C IP addresses. The left-hand maps show point locations, while those on the right display kernel densities. Because points can be co-located, some high-density locations appear only on the kernel density maps.
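Why stacked points show up only on the density maps can be seen with a small kernel density estimate: each co-located point contributes a full kernel, so a thousand addresses at one coordinate plot as a single dot but produce a tall density peak. This is a sketch assuming a Gaussian kernel on planar coordinates with an arbitrary bandwidth; the kernel and bandwidth behind the original maps are not stated, and real latitude/longitude data would first need projecting.

```python
import math


def kernel_density(x, y, points, bandwidth):
    """Gaussian kernel density estimate at (x, y) for a list of (x, y) points.

    Every point adds its own kernel, so co-located points stack up.
    """
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2 * bandwidth ** 2))
    return norm * total


# Hypothetical data: 1,000 addresses geolocated to one coordinate, plus 10 scattered ones.
stacked = [(0.0, 0.0)] * 1000
scattered = [(float(i), 5.0) for i in range(10)]
pts = stacked + scattered
print(kernel_density(0.0, 0.0, pts, 1.0) > kernel_density(5.0, 5.0, pts, 1.0))  # True
```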

There are some interesting differences between the maps, particularly in Brazil, Australia, and parts of Western Europe. There is also a bright yellow spot on the kernel density maps just northwest of Lake Baikal in Russia. At that location, tens of thousands of IP addresses are all assigned to a single, very specific latitude/longitude.

Data source: http://dev.maxmind.com/geoip/geoip2/geolite2/

Between the 2010 and 2011 NFL seasons, in an attempt to reduce injuries and protect the safety of the players, the league moved the placement of kickoffs from the 30-yd line to the 35-yd line. It was thought that advancing the ball would increase touchbacks and reduce high-speed, head-on collisions between players, which are frequent during kickoff returns.

At least in terms of increasing the percentage of touchbacks, the intended effect of the rule change has been realized. During this past off-season, owners voted down a proposal to move the line another five yards to the 40. This almost certainly would have raised the percentage of touchbacks even further.

To focus only on where the ball was returned to, I excluded kickoffs that were fumbled and recovered by the kicking team. Field position is measured at the receiving team’s first play after the kickoff. As a result, penalties assessed during the kickoff are accounted for, but kickoffs received at the end of a half are not included. There were approximately 2,500 kickoffs per season, so each multi-season graph represents more than 7,000 kickoff returns.
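The filtering rules above translate into a short per-season summary. This is a sketch under stated assumptions: the `Kickoff` record layout is hypothetical (the play-by-play files use their own columns), and representing "no following play" as a missing starting yard line is an assumption about how end-of-half kickoffs drop out.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Kickoff:
    # Hypothetical record layout, not the source's actual columns.
    season: int
    recovered_by_kicking_team: bool  # fumble recovered by the kicking team
    start_yardline: Optional[int]    # yards from own goal at receiving team's first play; None if no play followed
    touchback: bool


def usable(k: Kickoff) -> bool:
    """Exclude kicking-team recoveries and kickoffs with no following play."""
    return not k.recovered_by_kicking_team and k.start_yardline is not None


def season_summary(kickoffs: List[Kickoff], season: int):
    """Touchback percentage and mean starting yard line for one season's usable kickoffs."""
    rows = [k for k in kickoffs if k.season == season and usable(k)]
    touchback_pct = 100.0 * sum(k.touchback for k in rows) / len(rows)
    return touchback_pct, mean(k.start_yardline for k in rows)
```

The function raises `ZeroDivisionError` for a season with no usable kickoffs, which is acceptable for a sketch but worth guarding in real use.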

Data source: http://www.advancednflstats.com/2010/04/play-by-play-data.html

The U.S. Department of Defense consumes a lot of energy: close to 80% of the total used by the entire federal government. The primary sink is aviation. Almost 48% of the energy consumed by the government is in the form of jet fuel, and 98.6% of that jet fuel is used for defense.
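Combining the two jet fuel figures gives defense jet fuel as a share of all federal energy use. This is simple arithmetic on the rounded percentages quoted above, so the results are approximate:

```latex
\begin{aligned}
\text{defense jet fuel} / \text{federal energy} &\approx 0.48 \times 0.986 \approx 0.473 \\
\text{defense jet fuel} / \text{DoD energy} &\approx \frac{0.473}{0.80} \approx 0.59
\end{aligned}
```

On these figures, jet fuel alone accounts for roughly 47% of federal energy use and roughly three-fifths of the Department of Defense's share.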

Data source: http://www.eia.gov/totalenergy/data/annual/showtext.cfm?t=ptb0113

Crayola has 120 crayons in its largest box of standard colors. These graphs show the distribution of the interpreted RGB values for those colors. The four scatter plots show exactly the same data, just from different angles. As the violin plot shows, Crayola has a bias toward high red values. These include colors like orange, yellow, pink, and, of course, red. This is strange, given that Crayola performed a color census survey (not sure how scientific it was) and found that most of America’s favorite crayon colors are in the blue family. But hey, kids don’t really know what they want, do they?
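A channel-by-channel summary is one simple way to check for the red bias described above. This sketch assumes each color is available as a hex code (as on the Wikipedia list); the five sample colors below are generic illustrations, not Crayola's actual values.

```python
from statistics import mean, median


def hex_to_rgb(hex_code):
    """Convert a hex color such as '#FF8800' to an (R, G, B) tuple of 0-255 ints."""
    h = hex_code.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def channel_summary(hex_codes):
    """Return {channel: (mean, median)} across a list of hex colors."""
    rgb = [hex_to_rgb(c) for c in hex_codes]
    # zip(*rgb) regroups the tuples into one sequence per channel.
    return {name: (mean(vals), median(vals)) for name, vals in zip("RGB", zip(*rgb))}


# Illustrative colors only: red, orange, yellow, pink, blue.
sample = ["#FF0000", "#FFA500", "#FFFF00", "#FFC0CB", "#0000FF"]
print(channel_summary(sample))
```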

Data source: http://www2.crayola.com/colorcensus/

http://www2.crayola.com/colorcensus/bureau/overall_view.cfm

http://en.wikipedia.org/wiki/List_of_Crayola_crayon_colors

For those heading to Las Vegas hoping to win vast amounts of money, use this graphic as a friendly reminder that things probably won’t work out in your favor. That said, if you want to lose a smaller percentage of your money, the scatter plot suggests you should stick with the data points hovering near the x-axis.

Your best bet is the $100 slot machines, where the casinos take only 3.6% of your money! Of course, you’ll need a hefty stack of Benjamins if you want to play for more than a couple of minutes. And if you are concerned about the magnitude of money lost rather than the percentage, you may want to move over to the penny slots, where you’ll still lose only 11.8%.
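The trade-off between percentage and magnitude comes down to one multiplication. This sketch treats the house percentage as the expected fraction of every dollar played that is lost; the session of 500 one-credit spins is a hypothetical illustration (real penny slots often take multiple credits per spin).

```python
def expected_loss(total_wagered, house_pct):
    """Expected dollars lost if the house keeps house_pct percent of all money played."""
    return total_wagered * house_pct / 100.0


# Hypothetical session: 500 spins at one credit per spin.
spins = 500
print(expected_loss(spins * 100.00, 3.6))  # $100 slots: about 1800 dollars
print(expected_loss(spins * 0.01, 11.8))   # penny slots: about 0.59 dollars
```

The $100 machine has the better percentage, but the penny machine costs far less in absolute terms for the same number of spins.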

If you’re a sports fan, betting on baseball will give you slightly better odds than basketball and football. You’re much less likely to lose your money betting on those sports than on racing.

If you like heading to the tables, bingo is the game-o. The house only takes in 8.8% of the wagers there, followed by blackjack (11.1%). Stay away from 3-card poker, where gamblers lose an average of 32.5% of their money.

The weekly average values in this graphic were derived from one year of data for all non-restricted locations in the Las Vegas Strip area.

Data source: http://gaming.nv.gov/index.aspx?page=149 (12-month summary of Feb 2014 pdf).

If you are a baseball fan, you know that the Boston Red Sox won the World Series last year. They were also one of only two teams to get more singles, doubles, triples, and home runs than the league average. Perhaps a few sabermetricians already know the other team? It was the Colorado Rockies, who finished with a 74-88 record despite having the third most hits in the league. The Detroit Tigers, who led the league in hits, finished with only 23 triples, which was below the league average of approximately 26.

These graphics show the total hits and distribution of hits for each team, relative to the league average. While the total hits graphic better reflects the overall strengths and weaknesses of each team’s batting, I decided to generate the distribution graphic because it removes the effect of how many hits a team got overall. I have a friend who is a huge Houston Astros fan, and I figured that it would be nice for her to be able to say, “Sure, my team tied for the second-fewest hits in the league last year, but in those rare occurrences when they got a hit, it was 0.3% more likely to be a home run than the league average!” Hmm…that still doesn’t make them sound very impressive.
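The two comparisons can be sketched as follows: raw counts minus the league-average team, and each hit type's share of a team's hits minus the league's share (in percentage points). The dictionary keys and the sample numbers are hypothetical; the original graphics were built from the Baseball-Reference league table.

```python
HIT_TYPES = ("1B", "2B", "3B", "HR")


def relative_totals(team, league_avg):
    """Team count of each hit type minus the league-average team's count."""
    return {t: team[t] - league_avg[t] for t in HIT_TYPES}


def relative_distribution(team, league_avg):
    """Percentage-point difference in each hit type's share of total hits."""
    team_total = sum(team[t] for t in HIT_TYPES)
    league_total = sum(league_avg[t] for t in HIT_TYPES)
    return {
        t: 100.0 * team[t] / team_total - 100.0 * league_avg[t] / league_total
        for t in HIT_TYPES
    }


def above_average_in_all(team, league_avg):
    """True if the team beat the league average in every hit type."""
    return all(diff > 0 for diff in relative_totals(team, league_avg).values())


# Hypothetical numbers, for illustration only.
team = {"1B": 900, "2B": 300, "3B": 30, "HR": 170}
league = {"1B": 950, "2B": 280, "3B": 26, "HR": 144}
print(relative_totals(team, league), above_average_in_all(team, league))
```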

Happy return of the MLB season, baseball fans. May the low-paid A’s crush the wealthy Tigers, and all other foes who stand in their path.

Data source: http://www.baseball-reference.com/leagues/MLB/2013.shtml

I always thought it was strange that the dime is the smallest coin in the US. As it turns out, there is a somewhat reasonable explanation for this: nickels used to be smaller than dimes, but they were too small to handle. To remedy this, the US Mint changed the metallic content of the five-cent piece, removing the expensive silver and adding cheap nickel, and then made the new coin bigger than the dime.

The US isn’t the only country whose five-cent piece is larger than its ten-cent piece, though. Of the 20 countries I included in this graphic, 14 have coins with denominations of both five and ten. In eight of those 14, the five is bigger than the ten. This is by no means a statistically robust result (a paired t-test of the diameters failed to reject the null hypothesis), but it would seem more logical if the ten-cent piece had a significantly larger diameter.
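A paired t-test works on the per-country differences between the two diameters. This minimal sketch computes only the t statistic and degrees of freedom; turning that into a p-value needs the t distribution (for example, `scipy.stats.ttest_rel(tens, fives)` returns both). The function name and the sample diameters are hypothetical. With 14 pairs (13 degrees of freedom), the two-tailed critical value at the 0.05 level is about 2.16, and a |t| below it means failing to reject the null hypothesis, not proving it.

```python
import math
from statistics import mean, stdev


def paired_t(fives, tens):
    """Paired t statistic and degrees of freedom for matched coin diameters.

    Index i is one country: fives[i] and tens[i] are its 5 and 10 coin diameters.
    A positive t means the ten tends to be larger.
    """
    diffs = [t - f for f, t in zip(fives, tens)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical diameters in mm, for illustration only.
t, df = paired_t([21.2, 20.0, 17.0], [17.9, 19.0, 18.0])
print(f"t = {t:.2f}, df = {df}")
```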

For this graphic, I used the following currencies: AUD, CAD, CHF, DKK, EUR, GBP, HKD, INR, JPY, KRW, MXN, NOK, NZD, PLN, RUB, SEK, SGD, TRY, USD, and ZAR. I always selected the dimensions of the most recently minted standard editions of a given denomination. Some countries have coins worth more than 500, which were not included.

I would like to congratulate Turkey for being the only country of the 20 included that uses denominations below 100 and has the relationship between its coins’ values and diameters described by a monotonic function (i.e., the bigger the coin, the more it’s worth). Norway and Sweden also pass the logic test, but they don’t use any coins with values less than 100 (anymore).
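The "logic test" can be written as a check that diameter strictly increases with face value. This sketch assumes one currency's coins are given as (face value, diameter in mm) pairs and treats equal diameters as a failure; the function name is hypothetical. The US example uses the US Mint's published diameters for the cent, nickel, dime, and quarter.

```python
def is_monotonic(coins):
    """True if a higher face value always means a strictly larger diameter.

    `coins` is a list of (face_value, diameter_mm) pairs for one currency.
    """
    ordered = sorted(coins)  # sort by face value
    return all(d1 < d2 for (_, d1), (_, d2) in zip(ordered, ordered[1:]))


us = [(1, 19.05), (5, 21.21), (10, 17.91), (25, 24.26)]
print(is_monotonic(us))  # False: the dime is smaller than both the cent and the nickel
```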

Data sources: All data were collected from the websites of the respective national mints when available. When not available, data were sourced from Wikipedia.

A lot of people were interested in comparing the US counties (see original post) to European metropolitan areas. I’ve graphed the results for four here. The method is unchanged, but the data source is different. Note that these cities are notably less gridded than the largest US cities.

Full-resolution version:

https://www.dropbox.com/s/6n54q5djse0dhqs/Road_Orientation_Europe.png

Data source: http://download.geofabrik.de/

I received a lot of requests to add counties to my previous post. I’ve tried to include the most common counties of interest. All methods and sources are the same.

The full-resolution image can be downloaded here: 

https://www.dropbox.com/s/ymu2h9zxwcs8in8/Road_Orientation_V2.png

Unlike Emperor Kuzco, I was actually born with an innate sense of direction. If you’re like me, and you use the Sun to navigate, you probably appreciate cities with gridded street plans that are oriented in the cardinal directions. If you know that your destination is due west, even if you hit a dead end or two, you’ll be able to get there. However, not all urban planners settled on such a simple layout for road networks. For some developers, topography or water may have gotten in the way. Others may not have appreciated the efficiency of the grid. This visualization assesses those road networks by comparing the relative degree to which they are gridded.

To generate the graphic, I first calculated the azimuth of every road in ten counties (plus one parish and D.C.). I tried to choose consolidated city-counties to keep the focus on urban centers, but for larger counties, I opted not to clip the shapefile to the city boundary. All calculations were made in a sinusoidal map projection using the central longitude of the area of interest. I then graphed the angles on rose diagrams (wind roses) using bins of 5° to show relative distributions for each area. The plots were scaled such that the maximum bar height was the same on each rose. To ensure rotational symmetry in the plots, each azimuth was counted twice: once using the original value and once using the opposite direction (e.g., 35° and 215°). As such, all streets, regardless of one-way or two-way traffic, were considered to be pointing in both directions.
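The steps above (project, measure azimuth, double each bearing, bin at 5°, scale to a common peak) can be sketched in a few functions. This is not the linked ArcGIS script: it assumes a spherical Earth with a 6,371 km radius, computes one azimuth per straight (lon, lat) segment, and measures bearings from the projection's grid north, which matches true north only along the central meridian.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical Earth is an assumption of this sketch


def sinusoidal_xy(lon_deg, lat_deg, lon0_deg):
    """Sinusoidal projection (km) centered on the meridian lon0_deg."""
    lam = math.radians(lon_deg - lon0_deg)
    phi = math.radians(lat_deg)
    return EARTH_RADIUS_KM * lam * math.cos(phi), EARTH_RADIUS_KM * phi


def segment_azimuth(start, end, lon0_deg):
    """Azimuth in degrees clockwise from grid north (0-360) of a (lon, lat) segment."""
    x1, y1 = sinusoidal_xy(*start, lon0_deg)
    x2, y2 = sinusoidal_xy(*end, lon0_deg)
    return math.degrees(math.atan2(x2 - x1, y2 - y1)) % 360.0


def rose_counts(azimuths, bin_width=5):
    """Bin each azimuth and its opposite (az + 180) so every street counts both ways."""
    counts = Counter()
    for az in azimuths:
        for a in (az, (az + 180.0) % 360.0):
            # The final % 360 guards against a float rounding up to exactly 360.
            counts[int(a // bin_width) * bin_width % 360] += 1
    return counts


def scale_to_max(counts):
    """Rescale so the tallest bar is 1.0, making roses comparable across areas."""
    peak = max(counts.values())
    return {b: n / peak for b, n in counts.items()}


print(rose_counts([35.0]))  # a 35° street lands in the 35° and 215° bins
```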

The plots reveal some stark trends. Most of the counties considered do conform to a grid pattern. This is particularly pronounced with Chicago, even though much of Cook County is suburban. Denver, Jacksonville, Houston, and Washington, D.C., also have dominant grid patterns that are oriented in the cardinal directions. While Philadelphia and New York are primarily gridded, their orientations are slightly skewed from the traditional N-E-S-W bearings. Manhattan is particularly interesting because it has a notable imbalance between the number of streets running the width of the land (WNW to ESE) and the length of the land (NNE to SSW). New Orleans and San Francisco express some grid-like forms, but have a nontrivial proportion of roads that are rotated in other directions. Downtown Boston has some gridded streets, but the suburban grids are differently aligned, dampening the expression of a single grid on the rose diagram. Finally, the minimal geographic extents of the grids in Charlotte and Honolulu are completely overwhelmed by the winding roads of the suburbs, resulting in plots that show only slight favoritism for certain street orientations.

If you want to see more detail, a full-resolution version of this graphic can be downloaded here:

https://www.dropbox.com/s/my7y24hrzvhagce/Road_Orientation.png

Data source: http://www.census.gov/cgi-bin/geo/shapefiles2013/main

Script for azimuth calculation: http://www.ian-ko.com/free/free_arcgis.htm

Do you favor manors and mansions or studios and bungalows? Whatever your preferred housing style is, this graphic will help you look in the right place. The map displays the median number of rooms in housing units for each county. The five highest and lowest medians are listed. For reference, the median for the entire U.S. is 5.5 rooms.

The darker areas tend to have roomier housing units. These include parts of the Mid-Atlantic, the Midwest, and the Central Plains. The latter may be due to large farmhouses. There are also specific hotspots in some states (e.g., Utah, Georgia, and Tennessee) that have higher medians. Of the 25 counties with medians of at least seven rooms, eight are in Virginia, four are in Maryland, and three are in Utah. No other state has more than two.

On the other end of the spectrum, smaller housing units with fewer rooms are primarily found in Alaska. There are 15 counties with medians of less than four; ten are in Alaska, two in New York, and one each in Texas, Arizona, and Hawaii.

The violin plot shows the distribution of numbers of rooms using percentages of housing units in each county with one, two, three…all the way up to nine or more rooms. The dots in the bars are the medians. So in a typical U.S. county, approximately 2.5% of housing units have only one or two rooms. Close to a quarter (22.8%) have five rooms, and just under a fifth (19.7%) have six rooms. The roomiest housing units (with at least nine rooms) account for 9.2% of homes.
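Because the counts come in whole-room categories, a median such as 5.5 rooms implies interpolation within a category. One common approach is the grouped-data median sketched below. It assumes each room count k spans the interval [k − 0.5, k + 0.5) and that the open-ended "9 or more" category has the same width; this may not match the Census Bureau's exact convention, and the function name and sample shares are hypothetical.

```python
def grouped_median(shares):
    """Interpolated median number of rooms from {rooms: percent of units}.

    Assumes count k covers [k - 0.5, k + 0.5) and interpolates linearly
    inside the category that contains the 50th percentile.
    """
    total = sum(shares.values())
    half = total / 2.0
    cumulative = 0.0
    for rooms in sorted(shares):
        share = shares[rooms]
        if share > 0 and cumulative + share >= half:
            return (rooms - 0.5) + (half - cumulative) / share
        cumulative += share
    raise ValueError("empty distribution")


# Hypothetical county: a quarter of units in each of 1-4 rooms.
print(grouped_median({1: 25.0, 2: 25.0, 3: 25.0, 4: 25.0}))  # 2.5
```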

Keep in mind that bathrooms are not counted in these data. For those interested in the definition of a room (from https://ask.census.gov/faq.php?id=5000&faqId=7433):

“When counting the number of rooms in a home for the American Community Survey (ACS), please count rooms separated by built-in archways or walls that extend out at least 6 inches and go from floor to ceiling. Include only whole rooms used for living purposes, such as living rooms, dining rooms, kitchens, bedrooms, finished recreation rooms, family rooms, enclosed porches suitable for year-round use, etc.

DO NOT count bathrooms, kitchenettes, strip or pullman kitchens, utility rooms, foyers, halls, open porches, balconies, unfinished attics, unfinished basements, or other unfinished space used for storage.”

Data source: http://factfinder2.census.gov/faces/nav/jsf/pages/index.xhtml (Table DP04)

March Madness has arrived once again! Last year, I did some fun math on the likelihood of a randomly selected bracket being perfect. This year, I’ve decided to look at what the masses think. This bracket shows the most popular selections that people have submitted to the pools at ESPN, Yahoo, and CBSSports.

The dark green represents a consensus pick of the top seed, while light blue indicates that the lower seed was selected most often. The dark blue boxes mark games where the choices on the three websites did not agree.
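The color coding amounts to a three-way comparison per game. This sketch assumes each site's most popular pick is known and reads light blue as unanimous agreement on the lower seed; the function name and the input format are hypothetical.

```python
def classify_game(picks, top_seed):
    """Bracket color for one game.

    `picks` maps site name -> team picked most often on that site;
    `top_seed` is the better-seeded team in the matchup.
    """
    chosen = set(picks.values())
    if len(chosen) > 1:
        return "dark blue"  # the sites disagree
    return "dark green" if chosen == {top_seed} else "light blue"


picks = {"ESPN": "Louisville", "Yahoo": "Louisville", "CBSSports": "Manhattan"}
print(classify_game(picks, "Louisville"))  # dark blue
```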

As you can see, most people go with the top seed. There are a few nines over eights and threes over twos, but nothing outrageous, except…

For some reason, when I have CBSSports auto-fill the bracket with the favorite choices of users, it picks Manhattan over Louisville. Is this a real result or an error? I don’t know, but maybe the CBS users can smell a Cinderella story in the making. Only time will tell.

Data sources: 

http://www.cbssports.com/collegebasketball/ncaa-tournament/brackets

http://espn.go.com/mens-college-basketball/tournament/bracket

https://tournament.fantasysports.yahoo.com/

http://www.kenstalk.com/excel/2014Bracket.xls (for template)

The surface of Mars is fascinating. I spent five years looking at it for about eight hours every weekday. Maybe it’s not interesting enough to warrant that level of attention, but for the handful of planetary geologists out there, it’s quite the amazing laboratory.

Though the slope map I’ve made for this post is an original graphic, the subject of kilometer-scale slopes and roughness has been extensively studied, and similar maps are available at varying scales. (The scale of a slope map is the distance between topographic measurements, so a kilometer-scale map shows the slope between points that are one kilometer apart.) For this map, I’ve used Mars Orbiter Laser Altimeter (MOLA) data at 64 pixels/degree, which equates to 0.926 km/pixel at the equator. If you’ve read Andy Weir’s book, The Martian, this map will help you understand some of the issues its protagonist encountered. I don’t want to spoil it, so I’m being intentionally vague.
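The 0.926 km/pixel figure follows from the planet's circumference divided by 360 degrees and then by 64 pixels per degree. This sketch assumes a spherical Mars with an equatorial radius of about 3,396.19 km and a simple cylindrical grid, in which east-west pixel spacing shrinks with the cosine of latitude; the slope function measures the angle between two adjacent elevation samples, matching the definition of scale given above. Function names are hypothetical.

```python
import math

MARS_RADIUS_KM = 3396.19  # approximate equatorial radius; a spherical Mars is an assumption


def pixel_spacing_km(pixels_per_degree, lat_deg=0.0, radius_km=MARS_RADIUS_KM):
    """(east-west, north-south) pixel spacing in km for a simple cylindrical grid."""
    north_south = 2 * math.pi * radius_km / 360.0 / pixels_per_degree
    east_west = north_south * math.cos(math.radians(lat_deg))
    return east_west, north_south


def slope_deg(elev_a_m, elev_b_m, distance_km):
    """Slope angle in degrees between two elevation samples distance_km apart."""
    rise_m = abs(elev_b_m - elev_a_m)
    return math.degrees(math.atan(rise_m / (distance_km * 1000.0)))


ew, ns = pixel_spacing_km(64)
print(round(ns, 3))  # 0.926
```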

As a side note, one of the great aspects of graduate school is that it offers the opportunity to do incredible research. Unfortunately, you may make some profound discoveries in your field, but unless you get your paper published in Science or Nature, or picked up by the media, very few people will ever read your work. I don’t mean to be too cynical, but I would wager a hefty sum that more people will see this map (which is not at all groundbreaking and took me about 20 minutes to make) than will read any of the papers I published that required years of research. But if you are ever really curious to see some crazy craters on Mars…

Data source: http://webgis.wr.usgs.gov/pigwad/down/mars_dl.htm (Thanks to the MOLA team for all their hard work measuring Mars’ topography!)

Interested in founding a startup? Your success may depend on the industry you choose. I’ve seen a lot of conflicting statistics about the likelihood of success, so I put together these graphs on the survival rates of establishments as a function of time since founding.

The first graph shows the average survival rate across all industries. Specifically, it depicts the percentage of establishments (each branch or franchise location counts as a separate establishment) that are still open one to five years after opening. The data cover ten consecutive years of startups, including all U.S. establishments that first opened their doors between 1998 and 2007. The graph indicates that, after five years, about 54% of startups are still operating.

The second graph plots the data by industry, normalized to the average across all industries. For this, I took the difference in percentage points (subtraction, not division) between the curve for each industry and the average shown in the first graph. This highlights which industries have, in the past decade, outperformed, with establishments more likely to stay in business, and which have underperformed, with more establishments closing.
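Both graphs reduce to simple arithmetic on cohort counts. This sketch assumes a cohort is given as a list of [number founded, number still open after 1 year, …, after 5 years]; that layout, the function names, and the sample counts are hypothetical (the BLS tables report survival figures in their own format). The normalization uses subtraction, as described above.

```python
def survival_rates(cohort_counts):
    """Percent of a cohort still open after each year, relative to the number founded."""
    founded = cohort_counts[0]
    return [100.0 * n / founded for n in cohort_counts[1:]]


def relative_to_average(industry_rates, average_rates):
    """Percentage-point difference (subtraction, not division) for each year."""
    return [ind - avg for ind, avg in zip(industry_rates, average_rates)]


# Hypothetical cohort of 200 establishments.
rates = survival_rates([200, 160, 140, 120, 115, 108])
print(rates)  # [80.0, 70.0, 60.0, 57.5, 54.0]
```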

Data source: http://www.bls.gov/bdm/bdmage.htm