The traffic accident database consists of all accidents that happened in Slovenia’s capital city, Ljubljana, between 1995 and 2005.
Adventure Works 2014 (OLTP version) is a sample database for Microsoft SQL Server that replaced the Northwind and pubs sample databases shipped earlier. The database describes a fictitious multinational bicycle manufacturer called Adventure Works Cycles.
Airline on-time data are reported each month to the U.S. Department of Transportation (DOT), Bureau of Transportation Statistics (BTS) by the 16 U.S. air carriers that have at least 1 percent of total domestic scheduled-service passenger revenues, plus two other carrie…
The task is to predict the rank of teams.
Transactional data from a Czech debit card company specialising in payments at petrol pumps.
A database containing geospatial information, as well as SAT average scores and Free-or-Reduced-Price Meal eligibility data, for California schools.
The goal is to predict the outcome of a match.
The schema is for Classic Models, a retailer of scale models of classic cars. The database contains typical business data such as customers, orders, order line items, products and so on.
A slightly more complex artificial database with loops.
Artificial data from a Czech bank.
Officer-involved shootings as disclosed by the Dallas Police Department. Includes separate tables for officer and subject/suspect information.
The employees test database: a small, fake database of employees.
Ergast.com is a web service that provides a database of Formula 1 races, starting from the 1950 season until today. The dataset includes information such as the time taken in each lap, the time taken for pit stops, the performance in the qualifying rounds etc. of all Fo…
The PKDD'99 Financial dataset contains 606 successful and 76 unsuccessful loans along with their information and transactions. The standard task is to predict the loan outcome for finished loans (A vs B in loan.status) at the time of the loan start (defined by loan.dat…
Anonymised data from a hospital in Hradec Kralove, Czech Republic, about treatment and medication.
PAKDD'15 Data Mining Competition: The task is to reconstruct the information about user’s gender from product viewing logs. The data were obtained from simulations of product viewing activities of users with known gender. The data closely follow the real-life distribut…
Data on deputies and senators in the Czech Republic.
The GO Sales dataset from IBM contains information about daily sales, methods, retailers, and products of a fictitious outdoor equipment retail chain “Great Outdoors” (GO). The task is to predict sale quantity.
Bulgarian court decision metadata.
The PKDD'99 Medical dataset describes 41 patients with thrombosis.
A geography dataset from the University of Göttingen describes 114 Christian countries and 71 non-Christian countries.
A database with information about basketball matches from the National Basketball Association. Lists players, teams, and matches with action counts for each player.
2015 NCAA Basketball Tournament.
The Northwind database contains the sales data for a fictitious company called Northwind Traders, which imports and exports specialty foods from around the world.
A database with information about football matches from the UK Premier League. Lists players, teams, and matches with action counts for each player.
The pubs sample database is modeled after a book publishing company.
The venerable sakila test database: a small, fake database of movies.
You are a member of the Sales Management team in a large retail bank. The current date is July 02, 2007. Your Sales Director has just asked you to generate additional revenues of $1,500,000 before September 01, 2007. You must find ways to sell more "Credit++" – the ne…
Seznam.cz is a web portal and search engine in the Czech Republic. The data represent online advertisement expenditures from Seznam's "wallet". Table description: client: location and domain field of the client (anonymized) dobito: prepaid into a wallet in Czech cur…
The San Francisco Dept. of Public Health’s database of eateries, inspections of those eateries, and violations found during the inspections. The task is to predict the unscheduled inspection scores from 2013 to 2016. The scores range from 1 to 100, where 100 means that…
An anonymized dump of all user-contributed content on the Stats Stack Exchange network.
TPC-C is the benchmark published by the Transaction Processing Performance Council (TPC) for Online Transaction Processing (OLTP).
TPC-D represents a broad range of decision support (DS) applications that require complex, long-running queries against large, complex data structures.
TPC-DS is the new decision support benchmark that models several generally applicable aspects of a decision support system, including queries and data maintenance. Although the underlying business model of TPC-DS is a retail product supplier, the database schema, data …
TPC-H is the benchmark published by the Transaction Processing Performance Council (TPC) for decision support.
The VOC database provides a peephole view into the administrative system of an early multinational company, the Vereenigde geoctrooieerde Oostindische Compagnie (VOC for short, the Dutch East India Company), established on March 20, 1602.
Walmart challenges participants to accurately predict the sales of 111 potentially weather-sensitive products (like umbrellas, bread, and milk) around the time of major weather events at 45 of their retail locations.