To Know but Not Understand

David Weinberger has an interesting rumination on big data, accompanied by some remarkable illustrative photographs.

In 1963, Bernard K. Forscher of the Mayo Clinic complained in a now famous letter printed in the prestigious journal Science that scientists were generating too many facts. Titled Chaos in the Brickyard, the letter warned that the new generation of scientists was too busy churning out bricks — facts — without regard to how they go together. Brickmaking, Forscher feared, had become an end in itself. “And so it happened that the land became flooded with bricks. … It became difficult to find the proper bricks for a task because one had to hunt among so many. … It became difficult to complete a useful edifice because, as soon as the foundations were discernible, they were buried under an avalanche of random bricks.”

There are three basic reasons scientific data has increased to the point that the brickyard metaphor now looks 19th century. First, the economics of deletion have changed. We used to throw out most of the photos we took with our pathetic old film cameras because, even though they were far more expensive to create than today’s digital images, photo albums were expensive, took up space, and required us to invest considerable time in deciding which photos would make the cut. Now, it’s often less expensive to store them all on our hard drive (or at some website) than it is to weed through them.

Second, the economics of sharing have changed. The Library of Congress has tens of millions of items in storage because physics makes it hard to display and preserve, much less to share, physical objects. The Internet makes it far easier to share what’s in our digital basements. When the datasets are so large that they become unwieldy even for the Internet, innovators are spurred to invent new forms of sharing. For example, Tranche, the system behind ProteomeCommons, created its own technical protocol for sharing terabytes of data over the Net, so that a single source isn’t responsible for pumping out all the information; the process of sharing is itself shared across the network. And the new Linked Data format makes it easier than ever to package data into small chunks that can be found and reused. The ability to access and share over the Net further enhances the new economics of deletion; data that otherwise would not have been worth storing have new potential value because people can find and share them.

Third, computers have become exponentially smarter. John Wilbanks, vice president for Science at Creative Commons (formerly called Science Commons), notes that “[i]t used to take a year to map a gene. Now you can do thirty thousand on your desktop computer in a day. A $2,000 machine — a microarray — now lets you look at the human genome reacting over time.” Within days of the first human being diagnosed with the H1N1 swine flu virus, the H1 sequence of 1,699 bases had been analyzed and submitted to a global repository. The processing power available even on desktops adds yet more potential value to the data being stored and shared.

The article goes on to describe a methodology that is showing promise in biology. Can it be applied to the planetary sciences as well?


  1. I'm seeing a lot of comments from knowledgeable laypeople that reality is outstripping science. Well, that's my sloppy summary, but I believe there is limited utility in perfecting theory when reality is staring us in the face.

    I'm not sure what to do or say about this, but tradition holds sway in science as in other disciplines, and we need strong action. I don't blame people for continuing their daily routines (I do), but we need something different. As long as the acquisition of things and wealth is more important to our power elites and their vast following than adding value to our lives, we're stuck.

    This is sloppy and off topic a bit, so I apologize for that, but not for being worried sick and wanting us to move on.
