Lahman

Lahman’s baseball database contains complete batting and pitching statistics from 1871 to 2014, plus fielding statistics, standings, team stats, managerial records, post-season data, and more.

Original source: www.seanlahman.com

Versions

  • Lahman_2014 (by Jan Motl)

    • added foreign key constraints and deleted records that did not conform to the key constraints

Dataset details

Associated task: Regression
Domain: Sport
Data types:
Size: 74.1 MB
Count of tables: 25
Count of rows: 470,225
Count of columns: 353
Missing values: Yes
Compound keys: No
Loops: Yes
Type: Real
Instance count: 23,111
Target table: salaries
Target column: salary
Target ID: teamID, playerID, lgID
Target timestamp: yearID
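
The target fields listed above describe a regression task: predict `salary` in the `salaries` table, where an instance is identified by `teamID`, `playerID` and `lgID` and timestamped by `yearID`. The following is a minimal sketch of how those fields might be turned into a query. The helper name `build_target_query` and the `TARGET` dictionary are hypothetical and not part of the dataset; only the table and column names come from the listing.

```python
# Target definition copied from the dataset details above.
TARGET = {
    "table": "salaries",
    "column": "salary",
    "id": ["teamID", "playerID", "lgID"],
    "timestamp": "yearID",
}


def build_target_query(target):
    """Return a SELECT listing the ID columns, the timestamp, then the target column."""
    columns = target["id"] + [target["timestamp"], target["column"]]
    quoted = ", ".join(f"`{name}`" for name in columns)
    return f"SELECT {quoted} FROM `{target['table']}`"


print(build_target_query(TARGET))
```

The resulting statement can be run in any MariaDB client once connected to the database.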

How to download the dataset

The datasets are publicly available directly from a MariaDB database.

  1. Open your favourite MariaDB client (MySQL Workbench works, but see FAQ)
  2. Use the following credentials:
    • hostname: relational.fit.cvut.cz
    • port: 3306
    • username: guest
    • password: relational
  3. Export the "lahman_2014" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or SQL dump).
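
For a scripted alternative to a graphical client, the steps above could be sketched in Python roughly as follows. This assumes the third-party PyMySQL driver is installed and that the server is reachable from your network; the function names `write_csv` and `export_table` are hypothetical helpers, and only the connection details come from the list above.

```python
import csv

# Connection details copied from the steps above.
CONNECTION = {
    "host": "relational.fit.cvut.cz",
    "port": 3306,
    "user": "guest",
    "password": "relational",
    "database": "lahman_2014",
}


def write_csv(path, header, rows):
    """Write a header row followed by the data rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def export_table(table, path, connection=CONNECTION):
    """Dump one table to CSV. Needs network access and the PyMySQL package."""
    import pymysql  # assumption: third-party driver, imported lazily

    conn = pymysql.connect(**connection)
    try:
        with conn.cursor() as cursor:
            # The table name is interpolated directly, so pass trusted names only.
            cursor.execute(f"SELECT * FROM `{table}`")
            header = [column[0] for column in cursor.description]
            write_csv(path, header, cursor.fetchall())
    finally:
        conn.close()
```

Calling `export_table("salaries", "salaries.csv")` would then write the target table to a local file; the other tables can be exported the same way, one call per table.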