FAQ: Exporting Data

One of the most frequent questions we get is: The player names and team names are garbled with HTML when I export data. How can I fix this?

Simple Tag Removal:

To remove any HTML tags from the data, all you need is a quick Find & Replace All.

Enter “<*>” (no quotes) in the Find what box, leave the Replace with box blank, and hit Replace All. The * is a wildcard, so this pattern matches every tag regardless of what is between the angle brackets.

And that’s it. You now have a worksheet completely free of all HTML.
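
For anyone who would rather clean the export outside of Excel, here is a minimal Python sketch of the same idea. It is only a rough analogue of the “<*>” wildcard, not the method described above: the regular expression, the function name strip_tags, and the sample cell (including the playerid 1234) are all made up for illustration.

```python
import re

# Rough analogue of Excel's "<*>" wildcard: a "<", any run of
# characters that are not ">", then a closing ">".
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags(cell: str) -> str:
    """Remove anything that looks like an HTML tag from one cell value."""
    return TAG_PATTERN.sub("", cell)


# Hypothetical exported cell; the real export may look different.
sample = '<a href="statss.aspx?playerid=1234&position=SS">Example Player</a>'
print(strip_tags(sample))
```

Running strip_tags over every cell of an exported file would leave just the visible text, the same as the Replace All above.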

More Advanced Parsing:

Because the export leaves the HTML in, there are some neat tricks you can use if you want to preserve the playerid.

Step 1: Highlight the entire A column and run these Replace All operations in sequence, replacing each with nothing (enter them without the surrounding quotes):

“<a href="statss.aspx?playerid=”
“position*>”
“<*>”

Step 2: Insert an empty column to the right of the first column, so the split in Step 3 has somewhere to put the player names without overwriting existing data.

Step 3: Finally, run the Text to Columns wizard, which you can open by pressing Alt, D, E in that order or by clicking it under the Data menu. Choose Other as the delimiter, enter the & symbol, and hit Finish. You’ll have a properly formatted worksheet with playerids preserved.
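
The three steps above can also be sketched in Python. This is an illustration under assumptions, not an official tool: the function name split_player_cell is hypothetical, and the sample cell layout is inferred from the replacement strings in Step 1 rather than taken from a real export.

```python
import re
from typing import Tuple


def split_player_cell(cell: str) -> Tuple[str, str]:
    """Return (playerid, name) from one exported player cell."""
    # Replacement 1: drop the literal link prefix.
    cell = cell.replace('<a href="statss.aspx?playerid=', "")
    # Replacement 2: analogue of "position*>", which removes the
    # position parameter through the end of the opening tag.
    cell = re.sub(r"position[^>]*>", "", cell)
    # Replacement 3: analogue of "<*>", which removes remaining tags.
    cell = re.sub(r"<[^>]*>", "", cell)
    # Text to Columns step: split on the "&" left between id and name.
    player_id, _, name = cell.partition("&")
    return player_id, name


# Hypothetical exported cell; the playerid and name are made up.
sample = '<a href="statss.aspx?playerid=1234&position=SS">Example Player</a>'
print(split_player_cell(sample))
```

One difference from the wizard: partition splits only on the first & it finds, so a name that itself contains an & would stay in one piece, whereas Text to Columns would split it into extra columns.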

Using similar techniques, you can preserve teamids and player positions. If you find yourself doing more involved parsing frequently, you can also record the steps as a macro for a one-click solution.





David Appelman is the creator of FanGraphs.

14 Comments
Jacob Smith
12 years ago

Great post, David. Thank you.