Board Game Geek - Linear Regression
Board Game Geek Ratings
Max Garber
Project Description
Backstory: Using information we scrape from the web, build linear regression models from which we can learn about movies, sports, or categories.
Data:
  acquisition: web scraping
  storage: flat files
  sources: (as listed below or any other publicly available information)
    movie: boxofficemojo.com, imdb.com
    sports: sports-reference.com
Skills:
  basics of the web (requests, HTML, CSS, JavaScript)
  web scraping
  numpy and pandas
  statsmodels, scikit-learn
Analysis: linear regression is required; other regression methods are optional
The board game website BoardGameGeek (BGG) publishes ratings for board games based on user reviews. I built a model relating different features of a board game to its BGG rating.
Web Scraping
The top games by number of votes were scraped from BoardGameGeek using Beautiful Soup. Additional information for each game was scraped from its individual BGG page with the help of Selenium. Game expansions were ignored, and games published before 1900 were discarded.
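The filtering rule described above (ignore expansions, discard games published before 1900) can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the record layout, the field names `is_expansion` and `year_published`, and the sample entries are all hypothetical stand-ins for whatever the scraper produced.

```python
# Minimal sketch of the post-scrape filter. Field names and sample
# records are hypothetical; the real scraped data may look different.

def keep_game(game, min_year=1900):
    """Return True if a scraped record should stay in the dataset."""
    if game["is_expansion"]:
        return False  # expansions are ignored
    # games published before min_year are discarded
    return game["year_published"] >= min_year

scraped = [
    {"name": "Hypothetical Base Game", "year_published": 1995, "is_expansion": False},
    {"name": "Hypothetical Expansion", "year_published": 2001, "is_expansion": True},
    {"name": "Hypothetical Antique Game", "year_published": 1850, "is_expansion": False},
]

kept = [g for g in scraped if keep_game(g)]
kept_names = [g["name"] for g in kept]
```

Here only the first record survives: the second is an expansion and the third predates the cutoff.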
Fit
The data was fit using linear regression with L1 and L2 regularization, with the regularization strengths chosen via a grid search.
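One way to set up such a fit in scikit-learn is an elastic net, which combines the L1 and L2 penalties, tuned with `GridSearchCV`. The sketch below assumes that setup; the source does not say whether the two penalties were combined or searched separately. The data is synthetic and the grid values are arbitrary choices for illustration, not the ones used in the project.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the game features and ratings (assumption:
# the real feature matrix came from the scraped BGG data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([1.5, 0.0, -2.0, 0.0, 0.5])
y = X @ true_coef + 7.0 + rng.normal(scale=0.1, size=200)

# alpha scales the overall penalty; l1_ratio mixes L1 (1.0) and L2 (0.0).
# These grid values are illustrative only.
param_grid = {
    "alpha": [0.001, 0.01, 0.1, 1.0],
    "l1_ratio": [0.1, 0.5, 0.9],
}

search = GridSearchCV(
    ElasticNet(max_iter=10_000),
    param_grid,
    cv=5,  # 5-fold cross-validation for each grid point
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

best_params = search.best_params_
best_model = search.best_estimator_  # refit on all data with best_params
```

After fitting, `best_model.coef_` holds one coefficient per feature, which is what the sections below interpret.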
Basic Game Information
Fitted relationships between basic game information and the game score.
Mechanics
Effects of a game's mechanics on the score of the game.
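For a linear model to estimate a per-mechanic effect, each mechanic needs its own numeric column. A common approach is a 0/1 indicator per mechanic, so that each fitted coefficient reads as the estimated change in rating associated with a game having that mechanic, holding the other features fixed. The source does not state which encoding was used, so the sketch below is an assumption; the helper `multi_hot` and the sample games are hypothetical.

```python
# Minimal sketch: turn each game's list of mechanics into 0/1 indicator
# columns. The helper name and the sample records are hypothetical.

def multi_hot(games, key):
    """Return (vocab, rows): sorted tag names and one 0/1 row per game."""
    vocab = sorted({tag for g in games for tag in g[key]})
    rows = [[1 if tag in g[key] else 0 for tag in vocab] for g in games]
    return vocab, rows

games = [
    {"name": "Game A", "mechanics": ["Dice Rolling", "Trading"]},
    {"name": "Game B", "mechanics": ["Worker Placement"]},
    {"name": "Game C", "mechanics": ["Dice Rolling"]},
]

vocab, rows = multi_hot(games, "mechanics")
# vocab -> ["Dice Rolling", "Trading", "Worker Placement"]
# rows  -> [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
```

The same helper would apply to any multi-valued tag field by changing `key`.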
Categories
Effects of a game's categories on the score of the game.
Publishers
Effects of a game's publisher on the score of the game.