R Programmers: What is your biggest problem when working with Census data?

A few weeks ago I announced my latest project: Data Science instruction at the Census Bureau.

In addition to announcing the project, I also snuck a bit of market research into the post. I asked people what types of analyses they do when working with Census data, and what packages they use to solve those problems.

Twenty-three people left comments, which have been very helpful in shaping the curriculum of the course. Thank you to everyone who left a comment!

That was such an effective way to learn about the community of R Census users that I’ve decided to do it again. If you are an R programmer who has worked with Census data, please leave a comment with an answer to this question:

What is your biggest problem when working with Census data in R?

Understanding the obstacles people face has the potential to help us design better courses.

Leave your answer as a comment below!

Comments on R Programmers: What is your biggest problem when working with Census data?

  1. Scott Davis says:

The problem I’ve been having lately is with matrix arithmetic, specifically multiplying rows of arrays. It seems to start with R treating my data frame as a list instead of a DF, which doesn’t make a lot of sense to me. For example, I might want to multiply a row of population data by a column of income distributions to calculate the share of population in each category.
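(This is a common stumbling block: a data frame *is* a list of columns, so extracting a single row with `df[1, ]` still returns a one-row data frame rather than a numeric vector, and arithmetic against it can fail or recycle in surprising ways. A minimal sketch with made-up population counts and income shares — the column names and numbers here are purely illustrative:)

```r
# Hypothetical population counts by age group (one row of a data frame)
pop <- data.frame(under18 = 1200, age18to64 = 3400, over65 = 900)

# Hypothetical income-distribution shares to multiply against
shares <- c(0.2, 0.5, 0.3)

# A single row extracted from a data frame is still a data frame,
# i.e. a list of columns -- not a numeric vector:
class(pop[1, ])   # "data.frame"

# Coerce the row to a numeric vector before doing the arithmetic:
row_vec <- as.numeric(pop[1, ])
row_vec * shares  # element-wise product: 240 1700 270
```

(Alternatively, converting the whole table with `as.matrix()` up front keeps everything numeric and makes row/column arithmetic behave as expected.)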

  2. kent37 says:

    I’m a beginner with Census data; my biggest problem is always just finding the data I need. Hopefully Census Bureau workers won’t have that problem!

  3. Matt Marzillo says:

When first getting started it was hard to get to the roll-up columns. They’re there most of the time, but often they’re parsed out into many different sub-columns (e.g., by age, gender, etc.), and I had to do some digging to find the right ones. Some sort of list of “most commonly used data elements” would’ve been super helpful.

Also, I typically wind up mapping the Census data back to my internal data by city/state, and there’s often a lot of cleanup needed to get the joins working. But that’s just part of working with data…not sure there’s anything to be done about that.

  4. Ricardo says:

    I use the Census data quite a bit, and API-type fetches have helped simplify a lot of my challenges. Anything in this direction is super helpful.

  5. My largest challenge with Census data was navigating the huge number of variable names in surveys such as the ACS. I learned that carefully reading the survey questionnaire is a much better approach than attempting to scroll through the endless list of variables. Other challenges: understanding when the more accurate CPS population data should be joined with other surveys instead of the ACS population data; including residual or error data fields in reports; and the tradeoff between using larger administrative levels to get adequate sample sizes for recent data versus using a longer span of time to overcome small sample sizes at smaller administrative levels (smaller than counties, such as PUMAs and congressional districts).
