Final version excerpt, clearly labeled
Data and Context
...Or how I got my mushroom data visualization post taken down from Reddit because they thought I was going to poison everybody.
This was such a wild 24 hours. A data visualization project, where the focus was to present data in an attractive, accurate, and informative way, was taken down from r/dataisbeautiful within 12 hours.
Long story short, data people didn't like it because I didn't clearly explain the source material, and mushroom people didn't like it because they thought I was trying to create a definitive field identification guide that could potentially be harmful.
I learned quite a bit from this experience, especially that the UX of consuming data needs to clearly explain what is being presented and guide users on how to think about it.
Brag: I found some good chanterelles this year.
In March 2023, I took Federica Fragapane's data visualization course on Domestika. In the course, we learned how to source, clean, and graph data from a CSV file.
For my final project, I took an 8000+ record CSV dataset of mushroom characteristics and created an infographic in Illustrator. Seems pretty innocuous, yes?
Reader, it was not.
The Data Source
The CSV file I used featured 23 species of mushrooms (this will be important later!), collected in the early 80s and made available online by the UCI (University of California, Irvine) Machine Learning Repository. It was donated to UCI ML on 27 April 1987.
Context and content:
Although this dataset is quite old, mushrooms don't evolve that quickly, so the data is still relevant (this will also be important later!). And the first question everybody asks when I pick a mushroom is "Is it poisonous?"
Twenty-two features, including odor, rings, gill shape and color, cap color, spore print, and growth area, were recorded, along with toxicity.
Sorting and cleaning:
22 characteristics to 4.
I downloaded the CSV and got to work in Google Sheets. I chose to focus on only a handful of characteristics: I eliminated categories without a significant amount of data, such as mushrooms with double rings, and categories with no obvious correlation to toxicity, such as stem texture. I kept the most recognizable and universal features, namely:
spore print color
I also looked at substrate, but chose to eliminate that category as well, simply for length.
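I did this filtering by hand in Google Sheets. For anyone who prefers code, a rough Python equivalent might look like this. It is a minimal sketch under stated assumptions, not my actual workflow: it assumes a CSV with a header row using column names such as `class`, `ring-number`, and `spore-print-color`, the UCI single-letter codes (`p` = poisonous, `e` = edible; `o` = one ring, `t` = two rings), and a hypothetical `min_count` threshold standing in for "no significant amount of data." The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the UCI coding: class p = poisonous, e = edible;
# ring-number o = one, t = two; spore-print-color k = black, n = brown, w = white.
SAMPLE_CSV = """class,ring-number,spore-print-color
p,o,k
e,o,n
e,o,k
p,o,w
e,t,w
p,o,n
e,o,n
"""

def load_columns(text, keep):
    """Read CSV text and keep only the columns named in `keep`."""
    reader = csv.DictReader(io.StringIO(text))
    return [{col: row[col] for col in keep} for row in reader]

def rare_values(rows, column, min_count):
    """Return trait values with fewer than `min_count` records (candidates to drop)."""
    counts = Counter(row[column] for row in rows)
    return {value: n for value, n in counts.items() if n < min_count}

rows = load_columns(SAMPLE_CSV, keep=["class", "ring-number", "spore-print-color"])
print(len(rows))                                       # 7
print(rare_values(rows, "ring-number", min_count=2))   # {'t': 1}
```

The two-ring value gets flagged because only one record has it, which mirrors the "double rings" category I dropped.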
Traditional to Stylized
I could have stopped here, and in fact many people said that my final presentation was too difficult to read (more on that later).
This is a bit of initial graphing created in Google Sheets and RAWGraphs that helped me, well, visualize the data for the first time.
And why it got banned.
Unpacking the criticisms and defending some decisions.
What I uploaded to Reddit
First impressions and caveats:
This is not traditional digital data visualization, and was not meant to be part of a dashboard or interface of any digital product. This is intended to be more of a graphic for print.
As Federica Fragapane said about her own work:
"I wish to invite the readers in to looking at my pieces and to explore them, when the users' context allows it."
(she's Italian, and this is the most Italian thing anyone can say.)
Looking back, this is definitely an early draft, and there are many design decisions I no longer stand by. Still, I want to point out a few things:
1) The colors chosen were for high contrast without being garish: I wanted an elegant feel, but I didn't want to stick to only "mushroomy" colors, because they looked a bit too earthy and bland. I went overboard, though.
2) I used photos for some visual aid, since it's quite difficult to describe what gills are or a spore print is without an example. Plus I just liked it.
3) Record counts vary widely between traits (from thousands of records down to 10 in some cases), and I struggled quite a bit with showing them accurately.
I posted this on Reddit's r/dataisbeautiful, and after 12 hours it was removed. Here's basically why:
"I can't use this to identify poisonous mushrooms" 🍄❓🔍
Redditors were confused about the intended use of this chart. They wanted a clear field guide "key" to identify unknown mushrooms.
This was not, nor was it ever intended to be, a field guide. It shows the likelihood of a mushroom being toxic based on various traits.
Aggregate data is not granular!
In subsequent posts on other boards, I stated that this was largely decorative and not intended for mushroom identification.
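Each chart boils down to one aggregation: for every value of a trait, what share of the records with that value are labeled poisonous. Here is a minimal Python sketch of that calculation, assuming UCI-style rows where `class` is `p` (poisonous) or `e` (edible). The function name `poisonous_share` and the sample rows are hypothetical, not real counts from the dataset.

```python
from collections import defaultdict

def poisonous_share(rows, trait):
    """For each value of `trait`, return (share of records labeled poisonous, record count)."""
    totals = defaultdict(int)
    poisonous = defaultdict(int)
    for row in rows:
        value = row[trait]
        totals[value] += 1
        if row["class"] == "p":  # UCI coding: p = poisonous, e = edible
            poisonous[value] += 1
    return {value: (poisonous[value] / totals[value], totals[value]) for value in totals}

# Hypothetical records, for illustration only (not real counts from the dataset).
rows = [
    {"class": "p", "spore-print-color": "w"},
    {"class": "p", "spore-print-color": "w"},
    {"class": "e", "spore-print-color": "w"},
    {"class": "e", "spore-print-color": "n"},
    {"class": "e", "spore-print-color": "n"},
    {"class": "p", "spore-print-color": "n"},
    {"class": "e", "spore-print-color": "k"},
]

for value, (share, n) in sorted(poisonous_share(rows, "spore-print-color").items()):
    print(f"{value}: {share:.0%} poisonous across {n} records")
```

Notice how little the black (`k`) result means: 0% poisonous from a single record. That is the aggregate-versus-granular problem in miniature, since a share computed over a group of records says nothing certain about the individual mushroom in your hand.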
"Smaller traits are hard to see" 🔬
"I’m not a huge fan of the last two visualizations. I’m not positive, but I think your overall purpose here was to show how well various traits correlate with toxicity. The third one makes this difficult in two ways: first by not putting the same traits on the same horizontal level, and second by also trying to include the relative sizes of the traits. For the rarer traits, it’s hard to see which small circle is smaller and by how much."
This is going to be the challenge of any data visualization that has to show vastly different quantities, and I'm new enough that I haven't quite worked it out.
I would love a more interactive platform that could expand and contract so these points could be seen better.
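A common technique for showing very different quantities is to make each circle's area, rather than its radius, proportional to the count, so the radius grows with the square root of the count. Here is a minimal Python sketch, assuming a hypothetical `max_radius` for the largest circle and illustrative counts of 1,000 and 10 records.

```python
import math

def area_scaled_radii(counts, max_radius):
    """Map record counts to circle radii so that circle AREA is proportional to count."""
    largest = max(counts.values())
    return {
        label: max_radius * math.sqrt(count / largest)
        for label, count in counts.items()
    }

# Illustrative counts spanning the "thousands vs. ten" range (not real dataset values).
counts = {"common trait": 1000, "rare trait": 10}
radii = area_scaled_radii(counts, max_radius=50.0)
for label, r in radii.items():
    print(f"{label}: radius {r:.1f}")
```

Here a 100-fold difference in records becomes a 10-fold difference in radius (50 vs. 5). That keeps the comparison honest, but the small circle is still tiny, which is why printed labels with the actual counts, or an interactive zoom, might be needed alongside the circles.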
"The data is old" 🧓
From the r/Dataisbeautiful moderator:
The data is from 30 years ago which brings into question it's accuracy. While you did note this is a couple response comments, it wasn't in the main comment required for citing the data source and tool used to create the visual.
Mushrooms haven't evolved in the past 30 years, so IDK what the issue is.
People use old data all the time, how old is too old?
The mod was actually really nice, but I just don't understand this point at all.
"This data set isn't useful, and the resulting graphs are misleading," 👎📊
"this is based on data from a very outdated field guide and only accounts for 23 species. This is not even close to being statistically significant enough to draw any conclusions at all
ETA: it even states this in one of your sources: “The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like ``leaflets three, let it be'' for Poisonous Oak and Ivy.”
😨 My bad, and it got my post taken down.
This one hurt, and he was an *ss about it, but he was right. Here's the worst mistake I made:
8000+ RECORDS are not the same as 8000+ SPECIES. There are about 10,000 mushroom species in North America, so covering them all would be a massive undertaking.
This dataset only examined 23 species of Agaricus and Lepiota mushrooms. (These are among the most common genera of mushrooms, with several hundred species between them, but still.)
Here's the problem: Users thought this chart represented ALL mushrooms in North America, and that just is not the case. Therefore, users would draw incorrect conclusions about all mushrooms' toxicity based on this data.
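The records-versus-species mix-up is easy to check once a dataset has a species column: count the rows, then count the distinct species. The UCI file doesn't name a species on each row, so this is a minimal Python sketch on hypothetical records with placeholder species names.

```python
from collections import Counter

# Hypothetical records: several rows can describe the same species.
records = [
    {"species": "species A", "cap-color": "w"},
    {"species": "species A", "cap-color": "n"},
    {"species": "species A", "cap-color": "w"},
    {"species": "species B", "cap-color": "y"},
    {"species": "species B", "cap-color": "w"},
]

rows_per_species = Counter(r["species"] for r in records)
print(f"{len(records)} records, but only {len(rows_per_species)} species")
print(rows_per_species.most_common())  # how many rows each species contributes
```

Five rows here describe just two species, which is the same gap (at a much larger scale) between 8000+ records and 23 species.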
I solemnly swear: ✋
💬 I will clearly explain the source and scope of my data.
✨ Aesthetics will come second to presentation.
🤏 I will work harder to show data points that are smaller in a clear way.
💀 I will explain the correct purpose and use of my data so people won't accidentally poison themselves.
📊 I will learn more about data visualization so I can avoid these problems in the future.
There are still some problems I'm working on right now, like the small data points, but I really learned a lot from this project and I hope to try more iterations in the future.