Max Garber Metis Data Science Blog:

Board Game Geek - Linear Regression

Board Game Geek Ratings

Max Garber

Project Description Backstory:

Using information we scrape from the web, build linear regression models from which we can learn about movies, sports, or categories.

Data:

acquisition: web scraping

storage: flat files

sources: (as listed below or any other publicly available information)

movie: boxofficemojo.com, imdb.com

sports: sports-reference.com

Skills:

basics of the web (requests, HTML, CSS, JavaScript)

web scraping

numpy and pandas

statsmodels, scikit-learn

Analysis:

linear regression is required, other regression methods are optional



The board game website BoardGameGeek (BGG) assigns each board game a rating based on user reviews. I built a model relating features of a board game to its BGG rating.

Web Scraping

The top games by number of votes were scraped from BoardGameGeek using Beautiful Soup. Additional information for each game was scraped from its individual BGG page with the help of Selenium. Game expansions were ignored, and games published before 1900 were discarded.
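The two filtering rules above can be sketched in pandas. This is only an illustration, not the code used for the project: the column names `name`, `year_published`, and `is_expansion` and the three sample rows are made up for the example.

```python
import pandas as pd

# Hypothetical scraped records; column names and values are assumptions.
games = pd.DataFrame({
    "name": ["Base Game", "Base Game: Expansion", "Very Old Game"],
    "year_published": [2005, 2008, 1850],
    "is_expansion": [False, True, False],
})

# Keep base games only, and drop anything published before 1900.
keep = (~games["is_expansion"]) & (games["year_published"] >= 1900)
filtered = games[keep].reset_index(drop=True)

print(filtered)
```

Only the first row survives: the second is an expansion and the third predates 1900.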

[Figure: Geek ratings vs. features]

Fit

The data were fit with linear regression using L1 and L2 regularization, with the regularization strengths chosen by grid search.

[Figure: BGG fit]
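One way to combine L1 and L2 penalties and tune them by grid search is scikit-learn's `ElasticNet` wrapped in `GridSearchCV`. The sketch below assumes that setup; the post does not say which estimator was actually used, and the synthetic data, the parameter grid, and the 5-fold cross-validation are all placeholders rather than the project's real settings.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scraped features and ratings (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([0.8, 0.0, -0.5, 0.0, 0.3])
y = 6.5 + X @ true_coef + rng.normal(scale=0.1, size=200)

# Standardize features so the penalty treats every coefficient alike.
pipeline = make_pipeline(StandardScaler(), ElasticNet(max_iter=10_000))

# alpha sets the overall penalty strength; l1_ratio mixes L1 (1.0) and L2 (0.0).
param_grid = {
    "elasticnet__alpha": [0.001, 0.01, 0.1, 1.0],
    "elasticnet__l1_ratio": [0.1, 0.5, 0.9],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="r2")
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

Scaling inside the pipeline means the scaler is refit on each training fold, so no information from the held-out fold leaks into the fit.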

Basic Game Information

Fitted relationships between basic game information and the game score.

[Figure: Basic game information]

Mechanics

Effects of a game's mechanics on its score.

[Figures: Number of mechanics; Mechanics]
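A game can list several mechanics, so a regression needs them turned into numeric columns. A minimal sketch of one common encoding follows, assuming each game carries a list of mechanic names: one 0/1 indicator column per mechanic plus a count of mechanics. The `games` frame, its column names, and the sample values are hypothetical, and the post does not state that this exact encoding was used.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical games, each with a list of mechanic names (assumption).
games = pd.DataFrame({
    "name": ["Game A", "Game B", "Game C"],
    "mechanics": [["Dice Rolling", "Hand Management"], ["Hand Management"], []],
})

# One indicator column per distinct mechanic.
encoder = MultiLabelBinarizer()
indicators = pd.DataFrame(
    encoder.fit_transform(games["mechanics"]),
    columns=encoder.classes_,
    index=games.index,
)

# Add the number of mechanics per game as its own feature.
features = indicators.assign(num_mechanics=games["mechanics"].map(len))

print(features)
```

The same pattern would apply to categories, which are also multi-valued per game.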

Categories

Effects of a game's categories on its score.

[Figure: Categories]

Publishers

Effects of a game's publisher on its score.

[Figure: Publishers]