This past September I gave the closing keynote at posit::conf; it’s now up on YouTube. Keen-eyed observers will note from the title that it’s about trustworthy data visualization. But it’s also about trust a bit more generally, and how we should think about it in a world where researchers are faking results, AIs are enthusiastically confabulating, and government is destroying data infrastructure. When you find yourself giving a talk with a little tiny microphone stuck to the side of your head, you have to ask yourself some hard questions, but the talk was partly about that.
Mamdani’s victory in the New York City mayoral election gave me the opportunity to draw a few maps, and also to learn a bit about incorporating additional spatial data into maps drawn in R. R is not specialized GIS software. ESRI’s ArcGIS is the 800-pound gorilla in this world, and QGIS is the GIMP to its Photoshop, so to speak.
Still, you can do a lot of spatial stuff in R, grounded in the sf package and its many friends. Plus you get the benefit of all the data manipulation and analysis that R is really good at. So, having gotten the precinct-level results for the election, some maps from New York City (e.g., the clipped borough boundaries map), and GTFS data from the MTA describing the structure of the subway system, I was able to draw some things. I strongly approve of the existence of GTFS, by the way. The General Transit Feed Specification is a standard format for encoding transit data, and lots of cities use it. Really handy.
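For readers who haven’t met GTFS: a feed is a ZIP archive of comma-delimited text files such as stops.txt, routes.txt, trips.txt, and stop_times.txt. Here is a rough sketch, in Python rather than R and not the code behind these maps, of pulling station locations out of stops.txt. The rows are made up and stand in for the MTA’s actual feed; the helper names `read_stations` and `platforms_by_station` are hypothetical.

```python
import csv
import io

# Hypothetical excerpt of a GTFS stops.txt file. Real feeds have many more
# rows and often more columns; these values are invented for illustration.
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station
A01,Example Station,40.7000,-73.9900,1,
A01N,Example Station,40.7001,-73.9901,0,A01
A01S,Example Station,40.6999,-73.9899,0,A01
B02,Another Station,40.7500,-73.9800,1,
"""


def read_stations(text):
    """Return (stop_id, name, lat, lon) for rows marked as stations.

    In GTFS, location_type 1 means a station; 0 or blank means a stop or
    platform, which can point to its station through parent_station.
    """
    stations = []
    for row in csv.DictReader(io.StringIO(text)):
        if row.get("location_type", "").strip() == "1":
            stations.append(
                (row["stop_id"], row["stop_name"],
                 float(row["stop_lat"]), float(row["stop_lon"]))
            )
    return stations


def platforms_by_station(text):
    """Group platform stop_ids under the station named in parent_station."""
    groups = {}
    for row in csv.DictReader(io.StringIO(text)):
        parent = row.get("parent_station", "").strip()
        if parent:
            groups.setdefault(parent, []).append(row["stop_id"])
    return groups


if __name__ == "__main__":
    for stop_id, name, lat, lon in read_stations(STOPS_TXT):
        print(stop_id, name, lat, lon)
    print(platforms_by_station(STOPS_TXT))
```

Once you have station coordinates like these, joining them to a borough or precinct layer is ordinary spatial work, which in R is where sf comes in.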
Release 2 of the 2024 GSS cross-section and the 1972–2024 cumulative data are now available. I’ve updated gssr and gssrdoc to incorporate them. There are quite a few changes in the data and variables, thanks in part to some changes in data collection methods and a privacy/disclosure review.
The gssr and gssrdoc packages are the nicest way to get General Social Survey data up and running in R. The figure above shows (survey-weighted) trends derived from the immameco question.
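“Survey-weighted” means each respondent counts in proportion to a sampling weight instead of counting one-for-one. Here is a minimal sketch of the point estimate, a weighted share per year computed as sum(w × yes) / sum(w). It is written in Python with invented records. It is not gssr’s API and not the code behind the figure, and it ignores what the survey design does to standard errors. The function name `weighted_share_by_year` is hypothetical.

```python
from collections import defaultdict

# Hypothetical respondent records: (year, answered_yes, weight).
# Values are invented for illustration; the real GSS supplies its own
# weight variables and has far more cases per year.
RESPONSES = [
    (2021, True, 1.0),
    (2021, False, 3.0),
    (2024, True, 2.0),
    (2024, True, 1.0),
    (2024, False, 1.0),
]


def weighted_share_by_year(rows):
    """Return {year: sum(w * yes) / sum(w)} for each survey year."""
    num = defaultdict(float)  # weighted count of "yes" answers
    den = defaultdict(float)  # total weight of all answers
    for year, yes, w in rows:
        den[year] += w
        if yes:
            num[year] += w
    return {year: num[year] / den[year] for year in sorted(den)}


if __name__ == "__main__":
    print(weighted_share_by_year(RESPONSES))
```

With these toy rows the weighted shares are 0.25 for 2021 and 0.75 for 2024. Unweighted, the same rows would give 0.5 and about 0.67, which is why the weights matter when drawing a trend line.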
Here I continue my efforts to design visualizations that are as poorly suited as possible to being displayed on phones. It looks pretty good on a big monitor, or six feet wide on a wall.
I made a version of this plot a few years ago and revisited it this morning because I’m updating various datasets and code. “Manhattan plot” is a term sometimes used for a kind of scatter plot where the x-values are fairly continuous and the y-values have long-tailed distributions, so the plot looks like a skyline. This one is a bar chart rather than a scatter plot, but it’s still a kind of Manhattan plot of Manhattan.
Regular readers know that I maintain gssr and gssrdoc, two packages for R. The former makes the General Social Survey’s annual, cumulative, and panel datasets available in a way that’s easy to use in R. The latter makes the survey’s codebook available in R’s integrated help system. It documents every GSS variable as if it were a function or object in R, so you can look up any variable exactly as you would any function, whether from the R console or in the IDE of your choice. As a bonus, because I use pkgdown to document the packages, I get a website as a side effect. In the case of gssrdoc this means a browsable index of all the GSS variables.
The GSS is the Hubble Space Telescope of American social science: our longest-running representative view of many aspects of the character and opinions of American households. The data is freely available from NORC, but NORC distributes it in SPSS, SAS, and Stata formats. I wrote these packages to make it easier to use in R. If you want to know the relationship between these various platforms, I have you covered. But the important thing is that R is free and open-source software, and the others are not.