Mission
To support the growth of relational machine learning.
How to cite
Cite this article.
FAQ
- Why are the datasets not stored in CSV files?
- Because CSV files do not store data types, primary keys (PKs), foreign keys (FKs), or other constraints.
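To illustrate what a database keeps and a CSV export loses, here is a minimal Java sketch that lists the primary and foreign keys of one database through standard JDBC metadata calls. It assumes that a MySQL-compatible JDBC driver (for example MySQL Connector/J) is on the classpath. It reuses the host, port, and public guest credentials from the connection string given later in this FAQ. The names SchemaPeek and jdbcUrl are hypothetical and exist only for this example.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

class SchemaPeek {
    // Hypothetical helper: builds a JDBC URL of the same shape as the one in this FAQ.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws SQLException {
        // Database names are case sensitive, so "mutagenesis" must be lower case.
        String database = "mutagenesis";
        String url = jdbcUrl("relational.fit.cvut.cz", 3306, database);

        // Assumption: a MySQL-compatible JDBC driver is available at runtime.
        try (Connection conn = DriverManager.getConnection(url, "guest", "relational")) {
            DatabaseMetaData meta = conn.getMetaData();
            try (ResultSet tables = meta.getTables(database, null, "%", new String[] {"TABLE"})) {
                while (tables.next()) {
                    String table = tables.getString("TABLE_NAME");

                    // Primary key columns of this table.
                    try (ResultSet pk = meta.getPrimaryKeys(database, null, table)) {
                        while (pk.next()) {
                            System.out.println("PK " + table + "." + pk.getString("COLUMN_NAME"));
                        }
                    }

                    // Foreign keys of this table and the columns they reference.
                    try (ResultSet fk = meta.getImportedKeys(database, null, table)) {
                        while (fk.next()) {
                            System.out.println("FK " + table + "." + fk.getString("FKCOLUMN_NAME")
                                    + " -> " + fk.getString("PKTABLE_NAME")
                                    + "." + fk.getString("PKCOLUMN_NAME"));
                        }
                    }
                }
            }
        }
    }
}
```

None of this key information would survive a plain CSV export of the individual tables.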
- Do I have to install a database client?
- No. You can use a web database client instead and open "Relational dataset repo".
- Why am I not able to connect to the database?
- If you are connecting to the database over a corporate/university network, firewalls could be the culprit (they may block port 3306).
Try to access the database with a different internet provider (e.g., with your cellular provider).
Also, keep in mind that database names are case sensitive. Database "mutagenesis" is not the same database as "Mutagenesis".
If the problems persist, contact us and provide the following information:
- Your database client and its version (e.g., MySQL Workbench 6.3.10).
- The database name you tried to connect to (e.g., mutagenesis).
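Before contacting us, it may help to check whether port 3306 is reachable from your network at all. The following Java sketch attempts a plain TCP connection and needs nothing beyond the standard library. The class name PortCheck, the method canReach, and the 5-second timeout are assumptions made for this example. The host name is the one used in the connection string later in this FAQ.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
    // Returns true if a TCP connection to host:port can be opened within timeoutMs.
    static boolean canReach(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolved host names.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = "relational.fit.cvut.cz";
        if (canReach(host, 3306, 5000)) {
            System.out.println("Port 3306 is reachable; check the database name and credentials.");
        } else {
            System.out.println("Port 3306 is not reachable; a firewall may be blocking it.");
        }
    }
}
```

If the check fails on a corporate or university network but succeeds over a cellular connection, a firewall is the likely cause.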
- Why does MySQL Workbench complain about an incompatible/nonstandard server version?
- We use MariaDB, an open-source fork of MySQL, hence the warning. For everything that the public account permits, it is safe to ignore the message.
- Why can't mysqldump find COLUMN_STATISTICS in information_schema?
- MariaDB keeps this table in mysql.column_stats instead. Use one of the workarounds.
- What should I do if I need the data in an ILP format?
- See a collection of datasets at ILPnet2.
Or use a conversion tool, where you have to change the connection parameters in src/Read.java
from: read.setConnection("jdbc:mysql://mantong01.dyndns.org:3306/mln","temp","Passw0rd");
to: read.setConnection("jdbc:mysql://relational.fit.cvut.cz:3306/mutagenesis","guest","relational");
- Why do the datasets contain missing values/composite keys/strange data types/any other ugly thing you may think of?
- Because they are also present in the real datasets.
- What is the point of including artificial datasets?
- While datasets like Adventure Works may not contain any pattern that could be found during modeling, they still increase the diversity of the repository. For example, Adventure Works has the highest table count in the whole repository.
If your algorithm can process all the tables present in Adventure Works, it may be able to process real-world datasets.
Tools that use our repository
dm: Relational Data Models, a package for working with relational data in R.
Data Xtractor, a visual SQL query builder for Windows.
getML, a propositionalization library in Python.