The repository exists to support the growth of relational machine learning.
How to cite
Cite this article.
- Why are the datasets not stored in CSV files?
- Because CSV files do not store information about data types, primary keys (PKs), foreign keys (FKs), or other constraints.
- Why a MySQL database?
- Because in combination with ClowdFlows you can process the datasets online.
Just open one of the public workflows (like Wordification or Cross-validation), change the credentials in the "MySQL Connect" operator to the credentials from the repository, and you are ready to go!
- Why am I not able to connect to the database?
- If you are connecting to the database over a corporate network, the corporate firewall could be the culprit.
Try to access the database through a different internet provider (e.g. your cellular provider). If the problem persists, contact us.
- Why does MySQL Workbench complain about an incompatible/nonstandard server version?
- We use MariaDB, an open-source fork of MySQL, hence the warning. For everything the public account permits, it is safe to ignore the message.
- What should I do if I want the datasets in an ILP format?
- See a collection of datasets at ILPnet2.
- Why do the datasets contain missing values/composite keys/strange data types/any other ugly thing you may think of?
- Because they are also present in the real datasets.
- What's the point of including artificial datasets?
- While datasets like Adventure Works may not contain any pattern that could be found during modeling, they still increase the diversity of the repository. For example, Adventure Works has the highest table count in the whole repository.
If your algorithm can process all the tables present in Adventure Works, it may be able to process real-world datasets.
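
To illustrate the answer about CSV files above, here is a minimal sketch of what a database keeps and a CSV export loses. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for the repository's MySQL/MariaDB server, and the tables `customer` and `orders` as well as the helpers `schema_info` and `to_csv` are hypothetical names invented for this example.

```python
import csv
import io
import sqlite3

# Assumption: SQLite replaces the repository's MySQL/MariaDB server so that
# the example runs offline. The tables below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL
    );
    INSERT INTO customer VALUES (1, 'Alice');
    INSERT INTO orders VALUES (10, 1, 19.5);
""")


def schema_info(conn, table):
    """Return declared column types, PK columns and FKs of a table."""
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # foreign_key_list rows: (id, seq, table, from, to, ...)
    fks = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return {
        "types": {c[1]: c[2] for c in columns},
        "primary_key": [c[1] for c in columns if c[5] > 0],
        "foreign_keys": [(fk[3], fk[2], fk[4]) for fk in fks],
    }


def to_csv(conn, table):
    """Export a table to CSV text: a header row followed by the data rows."""
    cur = conn.execute(f"SELECT * FROM {table}")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([d[0] for d in cur.description])
    writer.writerows(cur)
    return buf.getvalue()


# The database knows the types and keys of `orders` ...
info = schema_info(conn, "orders")
print(info)

# ... while the CSV round trip returns nothing but strings.
rows = list(csv.reader(io.StringIO(to_csv(conn, "orders"))))
print(rows)
```

After the round trip through CSV, every value is a plain string, and nothing in the file says that `order_id` is a primary key or that `customer_id` refers to `customer`. On a MySQL or MariaDB server the same metadata would typically be read from `information_schema` rather than from SQLite `PRAGMA` statements.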