Cooperative Institute for Research in Environmental Sciences
Tuesday, November 12, 2024

Social media posts reveal regional patterns in seasonal allergies

CU Boulder scientists use machine learning to identify how allergy intensity and timing vary across the US

A pine tree releasing yellow pollen into the air
A pine tree releasing pollen into the wind.
- W. Carter/Wikimedia Commons

Over 25 percent of adults in the U.S. suffer from seasonal allergies, but scientists have struggled to track allergy trends because cases don’t always require medical care. Some allergy sufferers venture online to post about their itchy eyes, runny noses, and sneezing on social media or to search for remedies. Now, CU Boulder scientists are harnessing information from online activity to track allergy intensity across the U.S. The work, published recently in PNAS Nexus, reveals important regional patterns, including an allergy “hotspot” in the Southeastern U.S. and a winter allergy season in Colorado, Texas, and Florida.

“There isn’t a good metric for measuring the intensity of seasonal allergies,” said Elías Stallard-Olivera, a PhD student in Ecology and Evolutionary Biology at CU Boulder and lead author of the paper. “Traditional allergy prediction methods, like pollen counts, often fall short, making it essential to develop more reliable ways of identifying and tracking allergen sources.”

Stallard-Olivera and Noah Fierer, CIRES fellow and CU Boulder professor of Ecology and Evolutionary Biology, decided to explore readily available data online. They used machine learning to identify and extract counts of allergy-related social media posts on X, formerly known as Twitter, between 2016 and 2022. They then grouped the data by county.

Using online activity to track health trends is not new, but in recent years, scientists have raised concerns about the accuracy of the approach, including in Google’s flu tracker. To validate their dataset, the team added an extra step — they used a statistical technique called “cointegration” to compare the occurrence of allergy-related social media posts to Google search frequency and hospital records from California.

Economists use cointegration to track the relationship between two or more variables over time. Variables are “cointegrated” when they exhibit the same patterns; for example, when the price of cacao beans rises, the price of chocolate also tends to increase. The study is one of the first to apply the statistical technique to online activity.

The team found a strong cointegration between the allergy-related online data and hospital records, which meant they could use their social media dataset as a “proxy” for seasonal allergy intensity. 
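
To make the idea concrete, here is a minimal sketch of one common cointegration check, the Engle-Granger two-step procedure. The article does not say which test the team used, so this is an illustration rather than their method. The series are made-up random walks, the names (posts, visits, unrelated) are hypothetical, and the critical value is an approximate textbook figure.

```python
import random

def ols(x, y):
    """Least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def engle_granger_stat(x, y):
    """Dickey-Fuller t-statistic on the residuals of y regressed on x."""
    a, b = ols(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    lagged = resid[:-1]
    diff = [resid[t] - resid[t - 1] for t in range(1, len(resid))]
    # Regress the change in the residual on its previous level (no intercept).
    sll = sum(e * e for e in lagged)
    rho = sum(e * d for e, d in zip(lagged, diff)) / sll
    errors = [d - rho * e for e, d in zip(lagged, diff)]
    s2 = sum(u * u for u in errors) / (len(diff) - 1)
    return rho / (s2 / sll) ** 0.5

def random_walk(n, rng):
    """Cumulative sum of standard normal steps (a trending, non-stationary series)."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

rng = random.Random(0)
posts = random_walk(300, rng)                        # made-up weekly post volume
visits = [2.0 * p + rng.gauss(0, 1) for p in posts]  # follows the same trend, plus noise
unrelated = random_walk(300, rng)                    # an independent random walk

# Assumption: approximate asymptotic 5% critical value for two variables
# with an intercept. Statistics below it suggest cointegration.
CRITICAL_5PCT = -3.34

stat_linked = engle_granger_stat(posts, visits)
stat_unrelated = engle_granger_stat(posts, unrelated)
print(f"posts vs. visits:    {stat_linked:.2f}")
print(f"posts vs. unrelated: {stat_unrelated:.2f}")
```

A strongly negative statistic means the gap between the two series keeps returning to zero, which is the behavior cointegration describes. Real data would usually call for extra lag terms and a library implementation rather than this bare-bones version.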

Stallard-Olivera and Fierer calculated Z-scores, a measure of how many standard deviations a data point lies from the average, for U.S. counties with populations greater than 500,000. Counties with higher, more positive Z-scores represent areas with more intense seasonal allergies, while counties with lower, more negative Z-scores represent areas with less intense seasonal allergies. They then used the Z-scores to build annual and monthly maps of seasonal allergy intensity.
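
As a rough illustration of the Z-score step, the sketch below standardizes a handful of county values. The county names and post rates are invented for the example, and the function name z_scores is hypothetical; the study's actual inputs and preprocessing are not described here.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (value - mean) / standard deviation for each county."""
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        raise ValueError("all values are identical; Z-scores are undefined")
    return {county: (v - mu) / sigma for county, v in values.items()}

# Made-up share of posts that mention allergies, per county.
post_rate = {
    "County A": 0.012,
    "County B": 0.020,
    "County C": 0.016,
    "County D": 0.008,
}

scores = z_scores(post_rate)
# List counties from most to least intense.
for county, z in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{county}: {z:+.2f}")
```

Here County B lands well above the average and County D well below it, which is how a map of Z-scores separates more intense regions from less intense ones.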

Their results show that seasonal allergies are most severe in the Southeastern U.S. and least severe in Florida and Southern California. The team also discovered stark differences between regions within California; for example, allergies in the Central Valley are much more intense than elsewhere in the state. 

A map of the United States showing seasonal allergy intensity using Z-scores. Highest Z-scores (yellow dots) are found on the East Coast. Lowest Z-scores (purple) are found in Florida and Southern California.

Seasonal patterns in allergy intensity across the 144 counties in the United States with a population above 500,000, based on the Z-score of Twitter post volume averaged across 2016–2022 related to seasonal allergies. Spring months (March through May) represent the dominant allergy season, and fall months (September through October) represent the second most prominent season. Summer and winter allergy seasons are less pronounced but can be important in specific regions, including Colorado, Florida, and Texas. Figure from Stallard-Olivera and Fierer, 2024.

When they dug into the monthly data, the researchers found that spring (March through May) is the most dominant allergy season across the U.S., with fall (September through October) coming in second. However, peak intensity and timing were not the same in every region. In Florida, peak allergy season is March, but in the Midwest and Northeast, the peak is in April and May.

Their analysis also revealed some surprising patterns, such as a small uptick in allergies during winter months in Colorado, Florida, and Texas.

Maps of the United States showing allergy intensity by month.

Weekly allergy intensity, as measured by the raw probability of allergy-related Twitter posts, in California by county, arranged north to south for all counties with a sufficiently large volume of geolocated Twitter data (population >500,000). Allergy intensity was calculated based on allergy-related Twitter posts generated over a period from 2016 January 1 through 2022 December 31. Figure from Stallard-Olivera and Fierer, 2024.

The team’s work would not have been possible without the power of social media. “We are just thankful that so many allergy sufferers are willing to complain about their allergies on social media,” Fierer said.

Now, the team is looking toward the future. They will use this study’s findings to investigate the link between weather and seasonal allergies. The team also plans to integrate the data on seasonal allergies with DNA-based analyses of air samples from across the U.S. to identify the specific allergens, including molds and pollens, responsible for the spikes in seasonal allergies.

“Now that we know when allergies are most likely to spike, the next step is to better determine the specific triggers of allergies and why the spikes occur when they do,” Fierer said.
