Musk

The Musk database describes molecules that occur in different conformations. Each molecule is either musk or non-musk, and one of its conformations determines this property. This is known as a multiple-instance problem. It is modeled by two tables, molecule and conformation, joined by a one-to-many association. Conformation contains a molecule identifier plus 166 continuous features; molecule contains only the identifier and the class. There are two versions of the dataset: MuskSmall, with 92 molecules and 476 conformations, and MuskLarge, with 102 molecules and 6,598 conformations.
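The multiple-instance setting can be illustrated with a short sketch. It assumes the standard multiple-instance rule: a molecule (the "bag") is musk if at least one of its conformations (the "instances") is. The per-conformation scores would come from some conformation-level classifier, which is hypothetical here, as are the function name `molecule_labels` and the 0.5 threshold. This is not a method prescribed by the dataset.

```python
from collections import defaultdict


def molecule_labels(conformation_scores, threshold=0.5):
    """Aggregate conformation-level scores into molecule-level labels.

    conformation_scores: iterable of (molecule_id, score) pairs, one per
    row of the conformation table (the "many" side of the association).
    Assumed multiple-instance rule: a molecule is labelled musk (True)
    if its highest-scoring conformation scores above `threshold`.
    """
    best = defaultdict(lambda: float("-inf"))
    for molecule_id, score in conformation_scores:
        # Keep only the strongest conformation seen for each molecule.
        best[molecule_id] = max(best[molecule_id], score)
    return {m: s > threshold for m, s in best.items()}
```

For example, with made-up identifiers, `molecule_labels([("m1", 0.1), ("m1", 0.9), ("m2", 0.4)])` labels `m1` musk because one of its conformations scores highly, while `m2` is non-musk. Taking the maximum per molecule is what makes a single conformation sufficient to determine the molecule's class.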

Original source: sourceforge.net

Versions

  • MuskLarge (by Arnaud Barragao)

  • MuskSmall (by Arnaud Barragao)

Dataset details

Associated task:
Classification
Domain:
Medicine
Data types:
Size:
5.8 MB
Count of tables:
2
Count of rows:
6,380
Count of columns:
170
Missing values:
No
Compound keys:
No
Loops:
No
Type:
Real
Instance count:
102
Target table:
molecule
Target column:
class
Target ID:
molecule_name
Target timestamp:
?

How to download the dataset

The datasets are publicly available directly from a MySQL database.

  1. Open your favourite MySQL client (for example MySQL Workbench)
  2. Use the following credentials:
    • hostname: relational.fit.cvut.cz
    • port: 3306
    • username: guest
    • password: relational
  3. Export "MuskLarge" database (or other version of the dataset, if available) in your favourite format (e.g. CSV or SQL dump).
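The export step can also be scripted instead of done through a GUI client. Below is a minimal sketch that writes one table to CSV through any Python DB-API 2.0 connection. The helper `export_table_to_csv` is hypothetical and not part of the repository or any library. The table names molecule and conformation are taken from the description above.

```python
import csv
import re


def export_table_to_csv(conn, table, out):
    """Write every row of `table` as CSV (with a header row) to the
    writable text file object `out`. Returns the number of data rows.

    `conn` is any DB-API 2.0 connection, e.g. one opened with PyMySQL.
    """
    # Table names cannot be bound as query parameters, so only allow
    # plain identifiers before interpolating them into the SQL text.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    cur = conn.cursor()
    try:
        # Backticks quote identifiers in MySQL (SQLite accepts them too).
        cur.execute(f"SELECT * FROM `{table}`")
        writer = csv.writer(out)
        # cursor.description holds one tuple per column; item 0 is its name.
        writer.writerow([col[0] for col in cur.description])
        count = 0
        for row in cur.fetchall():
            writer.writerow(row)
            count += 1
    finally:
        cur.close()
    return count
```

As a usage example, assuming the third-party PyMySQL package is installed and the server is reachable with the credentials listed above, `conn = pymysql.connect(host="relational.fit.cvut.cz", port=3306, user="guest", password="relational", database="MuskLarge")` opens a connection. Then, inside `with open("molecule.csv", "w", newline="") as fh:`, the call `export_table_to_csv(conn, "molecule", fh)` exports the target table. Repeat with "conformation" for the second table. Taking a connection rather than opening one keeps the helper independent of a particular MySQL driver.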