I’ve been thinking about big data lately. Mostly I’ve been trying to articulate why it’s a big deal, which I know it is, but which isn’t often put succinctly. Recently it occurred to me that it’s such a big deal because it lets us move away from using samples to infer and toward using actuals to understand. That seems really obvious, but it wasn’t a connection I had made before (though I suspect it was obvious to everyone else).
Anyway, on the big data tip, I found this Wired piece on “long data” pretty interesting (even though I thought I was going to hate it based on the title). The gist:
By “long” data, I mean datasets that have massive historical sweep — taking you from the dawn of civilization to the present day. The kinds of datasets you see in Michael Kremer’s “Population growth and technological change: one million BC to 1990,” which provides an economic model tied to the world’s population data for a million years; or in Tertius Chandler’s Four Thousand Years of Urban Growth, which contains an exhaustive dataset of city populations over millennia. These datasets can humble us and inspire wonder, but they also hold tremendous potential for learning about ourselves.