Meeting Titans of Open Data

October 24, 2017
/ Blog Open Data
/ By Ari Lamstein

The recent Association of Public Data Users (APDU) Conference gave me the opportunity to meet some people who have made tremendous contributions to the world of Open Data.

Jon Sperling

The author with Jon Sperling

One of the most popular R packages I’ve published is choroplethrZip. This package contains the US Census Bureau’s Zip Code Tabulation Area (ZCTA) map, as well as metadata and visualization functions. It literally never occurred to me to think about who created the first ZCTA map, the methodology for creating it, and so on.

It turns out that one of the people who created the ZCTA map – Jon Sperling – was at the conference. We had a fascinating conversation, and I even got to take selfie with him! You can learn more about Jon’s role in the creation of the TIGER database in his 1992 paper Development and Maintenance of the TIGER Database: Experiences in Spatial Data Sharing at the U.S. Bureau of the Census.

Jon currently works at the Department of Housing and Urban Development (HUD). You can learn more about their data here.

Nancy Potok

Nancy Potok is the Chief Statistician of the United States, and she gave a fascinating keynote. Before her talk I did not know that the country even had a Chief Statistician. Her talk taught me about the Federal Statistical System as well as the Interagency Council on Statistical Policy.

The Q&A portion of her talk was also interesting. A significant portion of the audience worked at federal agencies. I believe that it was during this session when someone said “I can’t tell how much of my time should be dedicated to supporting the data and analyses which I publish. In a very real sense, my job ends when I publish it.” This question helped me understand why it is sometimes difficult to get help understanding how to use government datasets: there simply aren’t incentives for data publishers to support users of that data.

Andrew Reamer

Sometimes at a conference you can tell that there’s a celebrity there just by how people act towards them.

In this case it seemed that everyone except me knew who Andrew Reamer was. During the Q&A portion of talks, Andrew would raise his hand and speakers would say “Hi Andrew! What’s your question?”. Or they would get a question from someone else and say “I’m actually not sure what the answer is. Maybe Andrew knows. Andrew?”

Note that Andrew didn’t actually have a speaking slot at the event. But yet he still did a lot of speaking!

During lunch I worked up the courage to go up Andrew and ask him about his work. It turns out that he is a Research Professor at George Washington University’s Institute of Public Policy. He is a well-known social scientist. As I’m quickly learning, social scientists tend to be major users of public datasets.

I previously published a quote from the Census Bureau that the American Community Survey (a major survey they run) impacts how over $400 billion is allocated. However, I was never able to get any more granularity on that. If you’re interested in the answer to that, it turns out that Andrew wrote a piece on it: Counting for Dollars 2020: The Role of the Decennial Census in the Geographic Distribution of Federal Funds.

Sarah Cohen

Sarah Cohen is a Pulitzer Prize winning journalist who is now the Assistant Editor for Computer-Assisted Reporting at the New York Times. Before Sarah’s talk I only had passing familiarity with data journalism, and had a very narrow view of what it entailed. Sarah’s keynote gave examples of different ways that public data is shaping journalism. She also discussed common problems that arise when journalists communicate with data publishers.

Key Takeaway

The 2017 APDU Conference was a great chance for me (an R trainer and consultant with a strong interest in open data) to meet people who have made major contributions to the field of public data. If you are interested in learning more about the Association of Public Data Users, I recommend visiting their website here.