Explore a database of the most popular “Florida Man” headlines

The web app uses parsed headlines from the most highly rated Florida Man subreddit posts of all time.

Kevin McElwee
4 min read · Mar 2, 2020

For almost a decade, “Florida Man” has been a mainstay antihero of internet culture. Headlines like “Florida man too fat for jail” and “Florida man steals dinosaur bones” are easy fodder for meme-ification. In early 2013, “Florida Man” was canonized on Twitter with @_FloridaMan and on Reddit with the r/FloridaMan subreddit. And after seven years of retweeting and upvoting, we can gather the most popular headlines to see what makes a “Florida Man” headline successful.

Below is a quick overview of the numbers behind these headlines and here is a link to a web app where you can explore the articles on your own.

And just as a quick warning: “Florida Man” headlines often contain adult content.

DATA ANALYSIS

By gender

There are more than four times as many “Florida Man” headlines as there are “Florida Woman” headlines, though given that the data was pulled from the “Florida Man” subreddit, that shouldn’t come as much of a surprise.

By publication

By a large margin, the Tampa Bay Times (shown below as tampabay.com) is the largest producer of Florida Man material, or at least of material that generally ranks well online. Its stories also include some of the highest- and lowest-scored URLs in the data collected.

Fox News is the only national outlet in the Top 10 (abcactionnews.com is specific to the Tampa Bay region), and it largely files these articles under its “U.S.” desk.

Reddit score by publication (outliers removed)

As shown in the box plot above, there is no dramatic difference in how redditors rate the different news organizations.

By content

Florida’s “sunshine laws” allow public access to government materials, including arrest records. These laws’ effect on the “Florida Man” phenomenon is clear when looking at the verb distribution below. Verbs like “arrest”, “steal”, and “shoot” are very common.

Many of the top verbs, highlighted in red, pertain to events likely to be found in police records.

WEB APP

Here is a link to my headline explorer. The platform allows you to easily sift through the database. You can sort alphabetically or by frequency.

METHODOLOGY

The Reddit API

Data was pulled using Reddit’s API. The praw Python library (the Python Reddit API Wrapper) makes it relatively straightforward to collect URLs; only a few lines of code were necessary.

The full code can be found in this notebook. Each page object provided by the API contains 100 posts, and unfortunately, Reddit only provides the top 1000 submissions, so our database cannot be expanded to the entire subreddit. (Advice on how to get more data is welcome!)
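As a rough illustration of the collection step described above (this is not the notebook's code), here is a minimal sketch using praw. The function names `collect_top_posts` and `fetch_florida_man`, the choice of fields, and the credential parameters are assumptions made for the example; you would need to register your own Reddit app to get real credentials.

```python
from typing import Dict, List


def collect_top_posts(subreddit, limit: int = 1000) -> List[Dict[str, object]]:
    """Return title, URL and score for a subreddit's all-time top posts.

    `subreddit` is expected to behave like a praw Subreddit: it needs a
    .top(time_filter=..., limit=...) method that yields submissions.
    praw handles the paging for us; Reddit caps the listing at roughly
    1000 items, so a larger limit will not return more posts.
    """
    rows = []
    for post in subreddit.top(time_filter="all", limit=limit):
        rows.append({"title": post.title, "url": post.url, "score": post.score})
    return rows


def fetch_florida_man(client_id: str, client_secret: str, user_agent: str):
    """Hypothetical wrapper: needs praw installed and real API credentials."""
    import praw  # imported here so the helper above works without praw

    reddit = praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )
    return collect_top_posts(reddit.subreddit("FloridaMan"))
```

Keeping the loop separate from the praw setup means the collection logic can be tried out on any object with a compatible `.top()` method, without touching the network.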

Why Reddit?

When compared to Twitter, Reddit’s data is easier to parse. The main “Florida Man” handle on Twitter includes images and commentary, whereas Reddit posts are often just a URL with a headline. Furthermore, the “Florida Man” subreddit has 640k followers, and the Twitter account has 412k followers, so they provide similar indicators for an article’s popularity.

Headline parsing with NLTK

I parsed the headlines using NLTK (the Natural Language Toolkit), a Python library that can tokenize a sentence and tag its parts of speech. To standardize the verbs for the analysis and web app, we can also use the WordNetLemmatizer class to reduce every verb to its base form (e.g. “give”, “gave”, and “giving” would all be reduced to “give”).

The function below helped me properly sort through the verbs in headlines, though some manual entry and cleaning was required.
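To make the idea concrete, here is a minimal sketch of verb extraction with part-of-speech tags and lemmatization. It is an illustration under assumptions, not the notebook's function: the names `headline_verbs` and `nltk_headline_verbs` are made up for this example, and the NLTK variant needs the tokenizer, tagger, and WordNet data installed via `nltk.download` (the exact resource names vary by NLTK version).

```python
from typing import Callable, List, Sequence, Tuple

Tagger = Callable[[Sequence[str]], List[Tuple[str, str]]]
Lemmatizer = Callable[[str], str]


def headline_verbs(tokens: Sequence[str], tag: Tagger, lemmatize: Lemmatizer) -> List[str]:
    """Return the base form of every token tagged as a verb.

    Penn Treebank verb tags all start with "VB" (VB, VBD, VBG, VBN, VBP, VBZ).
    """
    return [
        lemmatize(word.lower())
        for word, pos in tag(tokens)
        if pos.startswith("VB")
    ]


def nltk_headline_verbs(headline: str) -> List[str]:
    """Same idea wired to NLTK; requires nltk and its downloaded data."""
    import nltk
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    return headline_verbs(
        nltk.word_tokenize(headline),
        nltk.pos_tag,
        # pos="v" tells the lemmatizer to treat the word as a verb
        lambda word: lemmatizer.lemmatize(word, pos="v"),
    )
```

Taggers trained on ordinary prose can mislabel words in terse headlines, which is one reason a manual pass over the results is still worthwhile.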

Cleaning Data

Reddit-specific posts were dropped, and some posts (like cartoons or mugshots) were removed because they were not conducive to the web app’s format. Basic cleaning and verb parsing are documented in this notebook. While many headlines were double-checked by hand, some repeats and errors definitely remain.
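One way the first cleaning step could be automated is sketched below: derive a publication label from each URL's host (which is how a label like tampabay.com would arise) and drop links hosted on Reddit itself. The host list in `REDDIT_HOSTS` and the function names are assumptions for this example; the article does not say exactly how “Reddit-specific” was defined, and removing cartoons or mugshots would still need a manual pass.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlparse

# Assumed set of hosts treated as "Reddit-specific"; adjust as needed.
REDDIT_HOSTS = {"reddit.com", "old.reddit.com", "redd.it", "i.redd.it", "v.redd.it"}


def publication(url: str) -> str:
    """Return the URL's host, lowercased, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def drop_reddit_posts(rows: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only rows whose URL points outside Reddit, tagging each with its host."""
    kept = []
    for row in rows:
        host = publication(str(row["url"]))
        if host and host not in REDDIT_HOSTS:
            kept.append({**row, "publication": host})
    return kept
```

The `publication` field added here can then be used to group scores by outlet.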

Data and notebooks

If you are interested in seeing the scripts or the data, they are available at this link, and the full cleaned dataset is available here.

Questions? Corrections? Contact me or see more projects on my website.
