This blog is a product of my passion for data visualization. The data shown here are sourced from other websites, but all statistical operations on these data and the resulting graphics are original.
I take requests and am available for freelance work. If you have a suggestion for a graphic or need support on a project, email me.
To learn more about me, visit my LinkedIn profile, and send an invitation to connect.
Thanks for visiting!
For this post, I’ve mapped and graphed a decade of earthquake data (2004-2013, inclusive). I think it’s interesting to visualize tectonic plate boundaries in this way because you can immediately see which boundaries are the most active. The graph provides a sense of how frequently earthquakes of each magnitude occur; we experience just under 500 earthquakes of magnitude 5.0 each year, but fewer than 20 at magnitudes of 7.0 and greater.
Using data from the Automated Wreck and Obstruction Information System, I developed a heat map of shipwrecks off the coast of the Continental US. The database is not comprehensive, but gives an idea of the relative density of shipwrecks. The highest densities occur in the Northeast, where the 50-mile search radius returned values over 600. The maximum for the scale of this kernel density map is set to 150, so any regions exceeding that value appear as red.
This is a comparison of probation vs parole rates by state (plus DC). The data used are from 2011. I expected the two variables to be strongly correlated, but they aren’t. Whether this is influenced by state laws, the behavior of the people, the attitudes of judges, or the leniency of parole boards, I don’t know, though I suspect it is a combination of all of them.
For those wondering about the difference between probation and parole, you can read a detailed description here: http://www.bjs.gov/index.cfm?ty=qa&iid=324. The most fundamental difference is that parole is a supervised release from prison, while probation is a sentence imposed by a judge that requires supervision of the individual.
What I found most amazing about these data is that 4.6% of Georgia’s population is on probation. If you rule out minors from the population pool, more than 1 in 20 adults is on probation there.
I’ve read several articles about record-setting melon prices in Japan. This got me thinking about the prices of various farm-produced commodities. It turns out that some crops can fetch a pretty penny in certain countries relative to the median global price. Here, I’ve derived the maximum ratios of the amount a farmer is paid at the first point of sale for a given item in a given country relative to the median price among all reporting countries. Only commodities where data were available for at least 20 countries were included. All raw data were reported in US dollars per metric ton.
I’ve graphed some of the maximum ratios that exceeded a value of five in the first graph. The second graph only shows maximums associated with Japan. The values are not PPP-adjusted, so either farmers are really rich in Japan, or it is an incredibly expensive country in which to live!
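For anyone who wants to reproduce the ratio calculation described above, here is a minimal sketch. The function name, the data layout, and the prices are placeholders I've made up for illustration; they are not values from the dataset.

```python
from statistics import median

# Hypothetical producer prices in US dollars per metric ton,
# laid out as {commodity: {country: price}}. Placeholder numbers only.
prices = {
    "melons": {"Japan": 3000.0, "Spain": 400.0, "Turkey": 300.0, "Brazil": 350.0},
    "wheat": {"Japan": 500.0, "France": 250.0, "Canada": 240.0},
}


def max_price_ratios(prices, min_countries=20):
    """Return {commodity: (country, ratio)} for the highest price relative to the median."""
    result = {}
    for commodity, by_country in prices.items():
        if len(by_country) < min_countries:
            continue  # skip commodities with too few reporting countries
        mid = median(by_country.values())
        country, price = max(by_country.items(), key=lambda kv: kv[1])
        result[commodity] = (country, price / mid)
    return result


# The post used a cutoff of 20 countries; 4 is used here only so the toy data passes.
ratios = max_price_ratios(prices, min_countries=4)
```

With the toy data, wheat is dropped because only three countries report it, and melons in Japan come out at eight times the median.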
The Census defines Core Based Statistical Areas as follows:
“Metropolitan and micropolitan statistical areas (metro and micro areas) are geographic entities delineated by the Office of Management and Budget for use by Federal statistical agencies in collecting, tabulating, and publishing Federal statistics. The term “Core Based Statistical Area” (CBSA) is a collective term for both metro and micro areas. A metro area contains a core urban area of 50,000 or more population, and a micro area contains an urban core of at least 10,000 (but less than 50,000) population. Each metro or micro area consists of one or more counties and includes the counties containing the core urban area, as well as any adjacent counties that have a high degree of social and economic integration (as measured by commuting to work) with the urban core.”
When releasing population counts for CBSAs, the Census provides estimates for both the number of people living in the principal city (or cities) of the CBSA, and the number of people living outside the city (cities); some metro areas have more than one principal city. Using these data, I’ve constructed a map that shows the balance between the urban and suburban population for each of these areas. Specifically, the map shows the percentage of people who live in one of the principal cities (urban setting). I have distinguished between metro and micro areas using different outline shades - the lighter shade indicates a metro area and the darker indicates a micro area.
On average, 39.0% of people living in a metro area live within the limits of the principal city. In micro areas, a slightly lower percentage (32.9%) live in the principal city, so there is a slightly higher preference for suburban living in micro areas. Of course, some would consider anyone living in one of these micro areas to be in a suburban setting! Overall, CBSAs in the middle of the country have a more urban-focused population, while coastal CBSAs have a larger proportion of suburban citizens.
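The percentage shown on the map is a simple share, which can be sketched as follows. The function name and the population figures are hypothetical and chosen only to illustrate the arithmetic.

```python
def principal_city_share(city_pop, total_pop):
    """Percentage of a CBSA's residents who live in its principal city or cities."""
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    return 100.0 * city_pop / total_pop


# Hypothetical CBSA: 120,000 residents in the principal cities, 300,000 in total.
share = principal_city_share(120_000, 300_000)
```

This covers a single area only. The averages quoted above could be either a mean of the per-area percentages or a population-weighted figure, and that choice is not spelled out here, so the sketch leaves it out.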
With the arrival of December, I thought it would be appropriate to do a post on the Arctic. It is, after all, where Mr. and Mrs. Claus (and their reindeer and elves) live. I’ve generated maps that show the spatial extent of Arctic sea ice as an annual timeseries for both March (winter) and September (summer). These two months usually represent the maximum and minimum extent, respectively, of the Arctic sea ice each year. Because the outlines overlap so much, thus preventing us from seeing the more recent delineations, I’ve also generated animated gifs, which are in the following post.
To emphasize the decline in Arctic sea ice as a function of time, I’ve graphed the areal extents for both months, and included trendlines. Assuming Santa has accepted the reality of climate change, he may need to think about relocating sooner rather than later…
Even though I’m a vegetarian, Thanksgiving is my favorite holiday. I couldn’t find Tofurky sales data for comparison, but these data are still interesting on their own. After 2004, the data were published quarterly, so the November spike disappears and it’s no longer a clear comparison. The 45-year timeseries is more than sufficient to see the steady increase, though.
The California School Immunization Law requires that children receive certain immunizations in order to attend elementary and secondary schools, as well as other care centers. However, there are exemptions to the immunization requirements. One of these is a personal belief exemption, whereby parents/guardians can skip the immunizations for their children if they claim it is contrary to their beliefs.
California keeps records of these immunizations and the reasons for any exemptions. In this post, I’ve mapped the rates (as percentages) of personal belief exemptions for kindergarten students by county. I’ve also included the cities where at least 30% of students are not vaccinated due to parents’ personal beliefs.
There is a website that evaluates counties based on a variety of health-related factors and gives them a final ranking. The data are quite interesting, so I decided to pull a few of the variables out for a visualization.
Here, I’ve mapped sexually transmitted infection (STI) rates, which are based entirely on chlamydia cases, as well as the teen birth rates. Some counties did not supply data, so they are shaded black. The top scatter plot graphs these variables against each other. You can see that there is a weak positive correlation, with some extreme outliers. The bottom scatter plot pairs teen birth rate with median household income. Not surprisingly, there is a moderate negative (inverse) correlation, with the richest counties generally having lower teen birth rates.
Edit: I’ve received some questions about the statistical significance of the differences between the home and away means. The data I was able to pull were already aggregated at the team per season, all teams per week, and per referee levels. I would need the per game data in order to calculate the 95% confidence intervals (because I need the standard deviations), but I don’t have access to these data. So in lieu of error bars, the best I can do is to run paired two-sample t-tests on the data at the aggregated levels using a hypothesized difference between the means of 0. I did run a Shapiro-Wilk test, and it gave no reason to reject normality. The t-tests show that, even at the aggregated levels (which have smaller sample sizes), the p-values are << 0.05, and the t statistics exceed the critical values for each paired set. As such, we can reject the null hypothesis (that the means are equal), and conclude that the differences between the means are real. So I can’t be certain, but I’m guessing that at the per game level, where the sample sizes would be significantly larger, the confidence intervals would show the differences to be statistically significant.
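For readers unfamiliar with the paired t-test mentioned above, here is a minimal sketch of how the t statistic is computed. It is not the exact tool I used, and the home/away numbers are placeholders rather than the real aggregated data.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(home, away):
    """Return (t, degrees_of_freedom) for a paired t-test with a hypothesized mean difference of 0."""
    if len(home) != len(away) or len(home) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [h - a for h, a in zip(home, away)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 denominator).
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Placeholder paired values (e.g. one home/away pair per team), not the post's data.
home = [5.0, 6.0, 7.0, 8.0]
away = [4.0, 4.0, 6.0, 6.0]
t, df = paired_t_statistic(home, away)
```

The resulting t is then compared with the two-tailed critical value for the given degrees of freedom (about 3.18 for 3 degrees of freedom at the 0.05 level). If the absolute value of t is larger, the null hypothesis of equal means is rejected.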
No matter who you root for, you’re bound to witness calls against your team that you think are bogus. That’s just the nature of sports. Did anyone see that penalty against Ahmad Brooks where he apparently made an illegal tackle on Drew Brees last Sunday? That was a clean hit! If the game had been played in SF, would the call still have been made? And therein lies the question behind this post.
There have been many studies about official/referee/umpire bias in sports. Some claim it exists, others say it doesn’t. For this set of graphs, I simply took the raw data from the past three complete regular seasons of each sport (four for football because there are fewer games and I wanted a robust sample size), and calculated the averages for the home/away split. It turns out that, for every statistic over which an official has direct control, the home team outperforms the away team. Coincidence? Probably not. Of course, there could be other factors at play; it’s possible that most teams just play better at home. But the fact that it occurs in all categories across all sports makes me a tad suspicious.
So the next time you see a ridiculous call made against your team, just remember it might be because they are on the road. If they’re at home, well, you’ve got no one to blame but your lousy team.
This pair of maps shows data from 2012 on property and violent crime by state. The top map highlights the total crime rates (property crimes + violent crimes per 1,000 people). The bottom map reveals the number of property crimes committed per violent crime.
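The two mapped quantities can be sketched as below. The function name and the counts are hypothetical; they only show how the numbers combine.

```python
def crime_metrics(property_crimes, violent_crimes, population):
    """Return (total crimes per 1,000 people, property crimes per violent crime)."""
    if population <= 0 or violent_crimes <= 0:
        raise ValueError("population and violent crime count must be positive")
    total_rate = 1000.0 * (property_crimes + violent_crimes) / population
    ratio = property_crimes / violent_crimes
    return total_rate, ratio


# Hypothetical state: 30,000 property crimes, 4,000 violent crimes, 1,000,000 residents.
total_rate, ratio = crime_metrics(30_000, 4_000, 1_000_000)
```

In this made-up example, the state would show 34 crimes per 1,000 people on the top map and 7.5 property crimes per violent crime on the bottom map.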
With the recent launch of MAVEN (a spacecraft that will study Mars’ upper atmosphere), I’ve been feeling a bit nostalgic for my days as a planetary scientist. There’s no insightful analysis in this post, just a clear graph of how far Earth is from the other planets.
A seemingly simple question is, “How far away are the other planets in our solar system?” The answer is, unfortunately, not so simple; it depends on when you ask the question. Generally speaking, when a planet orbiting outside Earth’s orbit is in opposition (opposite the sun), it is closest to Earth. When that planet is in conjunction (directly “behind” the sun), it is farthest from Earth. Planets with smaller orbital radii (i.e., Mercury and Venus) are said to be in inferior conjunction and superior conjunction when they are closest and farthest, respectively, from Earth. Measuring these distances is somewhat complicated by the fact that orbits have some inclination with regard to the ecliptic, and they are elliptical. Consequently, the distance between Earth and another planet in opposition/conjunction is not always the same!
So here I’ve plotted the average opposition and conjunction distances, as well as the theoretical extreme distances. These extremes are based on aphelion and perihelion measurements for each planet, so they may never have occurred precisely (and may never occur). But they are extremely good estimates, especially given the limited resolution of the graph, for how close/far we ever get to other planets.
Keep in mind that the relative sizes of the dots are not representative of the planets’ sizes – they are varied partly to help those who are colorblind distinguish between the red and the green, and to emphasize that a planet will appear smaller when farther from the Earth.
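One way to sketch the theoretical extremes described above is to combine perihelion and aphelion distances directly. This is a simplified reading that treats the orbits as coplanar and assumes the extreme points can line up. The perihelion and aphelion values are approximate figures in astronomical units, and the function name is my own.

```python
# Approximate (perihelion, aphelion) distances from the sun in astronomical units (AU).
EARTH = (0.983, 1.017)
PLANETS = {
    "Venus": (0.718, 0.728),
    "Mars": (1.381, 1.666),
}


def extreme_distances(planet, earth=EARTH):
    """Return (closest, farthest) Earth-planet distance in AU, treating orbits as coplanar."""
    p_peri, p_aph = planet
    e_peri, e_aph = earth
    if p_peri > e_aph:
        # Outer planet: closest at opposition, planet at perihelion and Earth at aphelion.
        closest = p_peri - e_aph
    else:
        # Inner planet: closest at inferior conjunction, planet at aphelion and Earth at perihelion.
        closest = e_peri - p_aph
    # Farthest when both bodies sit at aphelion on opposite sides of the sun.
    farthest = p_aph + e_aph
    return closest, farthest


mars_closest, mars_farthest = extreme_distances(PLANETS["Mars"])
venus_closest, venus_farthest = extreme_distances(PLANETS["Venus"])
```

Because inclination and the orientation of each ellipse are ignored, these numbers are bounds rather than distances that have actually been observed.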
I’m a huge fan of public transportation – I take the bus to and from work every weekday. But I also love the convenience of owning a car. Not surprisingly, so does most of the USA.
In this post, I’ve mapped vehicle ownership by county in three ways. The largest map is simply the average number of vehicles per household. I was also curious about the extremes, though, so in the smaller maps, I calculated the percentage of households that have excess vehicles (i.e., more vehicles than people) (bottom left), and the percentage that have no vehicles at all (bottom right). Note that the calculation for excess vehicles was based only on one-, two-, and three-person households due to how the Census binned the data.
Overall, there is no statistically significant relationship between a county’s population and the number of vehicles per household. Looking at the map, there appears to be a slight preference for more vehicles per household around the Northern Rockies and Great Plains, as well as around northern Virginia. The nine counties with the highest averages (>2.7) are in North Dakota, South Dakota, Montana, and Nebraska. The nine counties with the lowest averages (<0.8) are in New York and Alaska. The national average is 1.77 vehicles per household.
The vast majority of households (approximately 77%) own a vehicle, but not more than the number of people in the household. The less common situations are to have excess vehicles (13.9% of US households), or not to own any vehicle (8.9% of US households). So a household is slightly more likely to have excess cars than it is to have no cars. On both of these maps, I highlighted the extreme counties in yellow for easy identification of the outliers. These include counties where more than 50% of households have excess vehicles, and where more than 20% of households have no vehicle.
I’ve had the same ringtone for as long as I’ve had a cell phone: the first 30 seconds of “The Mango Song” by Phish. It’s a fantastic ringtone – a solid, building riff with no lyrics. When my phone rings and my older brother is around, he tells me to let it go to voicemail just so we can listen to it. But it turns out that ringtones are, more often than not, a flavor of the week (or few weeks) item.
Nielsen RingScan compiles data every week on how many times each ringtone (i.e., song formatted as a ringtone) was downloaded. Billboard then uses the data to publish a weekly top 20 ringtones chart. I’ve pulled the top-downloaded song for each week since 2007 and graphed them as a timeseries.
LMFAO’s “Sexy and I Know It” held the top spot for the longest period (25 weeks), followed by T.I.’s “Whatever You Like” (18 weeks). June through October of 2012 saw the most repetition/cycling of top ringtones, with “Call Me Maybe” (Carly Rae Jepsen), “Pontoon” (Little Big Town), and “Whistle” (Flo Rida) battling for the top spot. You can see the effect of Michael Jackson’s death in late June 2009, as “P.Y.T.” and “Thriller” top the ringtone chart in consecutive weeks in July 2009.
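Tallying weeks at the top from a weekly list of chart-toppers can be sketched like this. The song names are placeholders rather than Billboard data, and both helper functions are my own naming.

```python
from collections import Counter

# Hypothetical weekly chart-toppers in chronological order (placeholders, not Billboard data).
weekly_top = ["Song A", "Song A", "Song B", "Song A", "Song C", "Song C", "Song C"]


def weeks_at_top(weekly_top):
    """Count the total number of weeks each song spent at No. 1."""
    return Counter(weekly_top)


def longest_run(weekly_top):
    """Return (song, length) for the longest run of consecutive weeks at No. 1."""
    best_song, best_len = None, 0
    current_song, current_len = None, 0
    for song in weekly_top:
        if song == current_song:
            current_len += 1
        else:
            current_song, current_len = song, 1
        if current_len > best_len:
            best_song, best_len = current_song, current_len
    return best_song, best_len


totals = weeks_at_top(weekly_top)
run = longest_run(weekly_top)
```

The two functions answer slightly different questions. In the toy list, Song A and Song C each total three weeks, but only Song C holds the spot for three weeks in a row.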