Job Postings Word Cloud

Over the past year, we have published 32 job postings from 20 Major League Baseball teams and 15 job postings from TrackMan, Baseball Information Solutions, Inside Edge, STATS Inc, TruMedia, Wasserman Media Group and the Sydney Blue Sox. At Paul Swydan’s suggestion, I created word clouds to summarize these postings. These give a quick overview of what those jobs entail and the required qualifications. For those not familiar with the research and data science side of baseball, I’ll explain a few of the software tools which are prominent in the job postings and can be found in the word clouds.

To make the word cloud, I collected all the pieces we’ve published since January 2015 that contained “Job Posting” in the title. I separated the text content of each post into two different categories: job description and qualifications. From there, I took those two documents into R and used the tm package to clean the text, removing punctuation and unnecessary words like articles and prepositions. The package also tabulated the words. Additionally, I removed some other words like baseball, experience and strong. These words occurred frequently in the posts, but they were either obvious or not helpful. Then with the processed text data, I constructed the graphic using the aptly named wordcloud package. If you are unfamiliar with word clouds, larger words indicate that the specific word was found more often in the job postings.
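The cleaning and tabulating steps described above were done in R with the tm package. As a rough illustration of the same idea, here is a minimal Python sketch using only the standard library. The stop-word list is a tiny stand-in for the much longer one tm provides, and the sample text is made up for the example; neither comes from the actual job postings.

```python
import re
from collections import Counter

# A small illustrative stop-word list (assumption); tm ships a much longer one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "to", "with", "for", "on", "is", "are"}
# Words dropped by hand because they were obvious or unhelpful, as described in the article.
CUSTOM_DROPS = {"baseball", "experience", "strong"}

def word_frequencies(text):
    """Lowercase the text, strip punctuation, drop unhelpful words, and count the rest."""
    words = re.findall(r"[a-z]+", text.lower())
    kept = [w for w in words if w not in STOP_WORDS and w not in CUSTOM_DROPS]
    return Counter(kept)

# Hypothetical snippet standing in for the combined qualifications text.
sample = "Strong experience with SQL and R. Experience in baseball research. SQL queries in R or Python."
freqs = word_frequencies(sample)
for word, count in freqs.most_common(3):
    print(word, count)
```

The resulting counts are what a word cloud tool scales its font sizes by: the higher the count, the larger the word.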

Job Posting Descriptions

The job description word cloud contains the jargon commonly found in any job posting, such as communication, environment and strategy. But words like queries, assist, develop and research summarize what most of these jobs entail.

Job Posting Qualifications

I find the qualifications more interesting than the descriptions since they mention the specific skills candidates need to be considered. Out of the software tools, SQL occurs the most often. SQL is a database querying language. There are many different implementations of SQL databases such as MySQL and Microsoft SQL Server. There are some differences between them, but the structure of the query language is similar. SQL is popular because most baseball data is kept in large relational databases, which are like very large, very robust spreadsheets. Speaking of spreadsheets, Excel does show up in the word cloud, but not as much as other tools such as SQL, R or Python. R is a statistical programming language that is used for creating and evaluating models. Python can be used in a similar manner as R, but R has more statistics-centric packages.
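For readers who have never seen SQL, here is a small sketch of what a query against a relational table looks like, run through Python's built-in sqlite3 module. The table name, columns and rows are invented for illustration and do not reflect any team's or FanGraphs' actual database.

```python
import sqlite3

# In-memory database; the table and its rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batting (player TEXT, season INTEGER, hr INTEGER)")
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?)",
    [("Player A", 2014, 20), ("Player A", 2015, 25), ("Player B", 2015, 31)],
)

# Total home runs per player, highest first.
rows = conn.execute(
    "SELECT player, SUM(hr) AS total_hr "
    "FROM batting GROUP BY player ORDER BY total_hr DESC"
).fetchall()
print(rows)
conn.close()
```

The same SELECT statement would run with little or no change on MySQL or Microsoft SQL Server, which is why the skill transfers between teams using different databases.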

While much attention in the public sphere is centered on the data and the information derived from that data, communication of that information to the front offices and the team is almost as important. The information is useless if the front office, coaches and players can’t access and understand it. Software used to make end-user tools is found in the word cloud, too, such as JavaScript, HTML and Tableau. HTML and JavaScript are both used to create interactive, browser-based interfaces, and Tableau is an enterprise-centric data visualization tool that’s available in a free public version.

At FanGraphs, we use many of these tools on a daily basis. Excel is ubiquitous: we use it for data prep, data visualization and even checking .csv files. For more in-depth or customized research, we use SQL to create data sets for articles. R has been used for many research-intensive projects such as Jonah Pemstein’s posts. Bill Petti has written an introduction to using R for baseball statistics at the Hardball Times, and he is creating a package to make it easier for people to access data from FanGraphs and Baseball Reference to use in their own analysis.

The analytical tools I highlighted are mostly open-source, and there are many free web resources available for anyone willing to put in the time and effort to learn them.

The last interesting word to pop up in the qualification word cloud is weekends. Who doesn’t love to work weekends?






2 Comments
Baron Samedi
9 years ago

If you’re not a player or an owner, working in baseball seems like a horrible life.

Paul Swydan
9 years ago
Reply to  Baron Samedi

If your goal is to maximize your income, it certainly isn’t an optimal job. If you’re looking to do something you love, and you love baseball, there aren’t many jobs that are more optimal.