CORA

The Cora dataset consists of 2708 scientific publications, each classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector in which each entry indicates the absence or presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.
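As an illustration of the 0/1-valued word vector described above, here is a minimal sketch. The five-word dictionary, the sample paper words, and the helper name `to_binary_vector` are hypothetical and chosen only for the example; the real dictionary has 1433 words.

```python
def to_binary_vector(words, dictionary):
    """Return a 0/1 list with one entry per dictionary word."""
    present = set(words)
    # 1 if the dictionary word occurs in the paper, 0 otherwise.
    return [1 if term in present else 0 for term in dictionary]


# Hypothetical toy dictionary; the real Cora dictionary has 1433 unique words.
dictionary = ["graph", "learning", "network", "neural", "probabilistic"]

# Hypothetical words extracted from one paper; repeats do not change the vector.
paper_words = ["neural", "network", "learning", "network"]

vector = to_binary_vector(paper_words, dictionary)
print(vector)  # [0, 1, 1, 1, 0]
```

Only presence is recorded, so a word that occurs several times in a publication still contributes a single 1.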

Original source: linqs.cs.umd.edu

Versions

  • CORA (by Arnaud Barragao)

Dataset details

Associated task: Classification
Domain: Education
Data types:
Size: 4.6 MB
Count of tables: 3
Count of rows: 57,884
Count of columns: 6
Missing values: No
Compound keys: No
Loops: Yes
Type: Real
Instance count: 2,708
Target table: paper
Target column: class_label
Target ID: paper_id
Target timestamp: ?

How to download the dataset

The dataset is publicly available directly from a MySQL database.

  1. Open your favourite MySQL client (for example, MySQL Workbench).
  2. Use the following credentials:
    • hostname: relational.fit.cvut.cz
    • port: 3306
    • username: guest
    • password: relational
  3. Export the "CORA" database (or another version of the dataset, if available) in your favourite format (e.g. CSV or an SQL dump).
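
For readers who prefer a script to a graphical client, the steps above can be sketched in Python. This is an outline under stated assumptions, not an official export tool: it assumes the third-party PyMySQL driver is installed (`pip install pymysql`), that the server is reachable with the credentials listed above, and that one CSV file per table is acceptable. The function names `write_csv` and `export_database` and the output directory `cora_csv` are hypothetical. Table names are read from the server with `SHOW TABLES` rather than hard-coded.

```python
import csv
from pathlib import Path

# Connection settings copied from the steps above.
CONNECTION = {
    "host": "relational.fit.cvut.cz",
    "port": 3306,
    "user": "guest",
    "password": "relational",
    "database": "CORA",
}


def write_csv(path, header, rows):
    """Write one table (column names followed by its rows) to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def export_database(out_dir="cora_csv"):
    """Dump every table in the database to <out_dir>/<table>.csv."""
    # Assumption: PyMySQL is installed; it is imported here so the
    # CSV helper above stays usable without the driver.
    import pymysql

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    connection = pymysql.connect(**CONNECTION)
    try:
        with connection.cursor() as cursor:
            # Ask the server for its table names instead of guessing them.
            cursor.execute("SHOW TABLES")
            tables = [row[0] for row in cursor.fetchall()]
            for table in tables:
                cursor.execute(f"SELECT * FROM `{table}`")
                header = [column[0] for column in cursor.description]
                write_csv(out / f"{table}.csv", header, cursor.fetchall())
    finally:
        connection.close()
    return tables
```

Calling `export_database()` should create the `cora_csv` directory and return the list of exported table names. Each table is fetched into memory in full, which is reasonable for a database of this size (4.6 MB) but would need streaming for much larger ones.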