Reddit Threads
Dataset information
Discussion-based and non-discussion-based threads from Reddit, collected in May 2018. Each thread is one graph: nodes are the Reddit users who participate in the thread, and links are replies between them. The task is to predict whether a thread is discussion-based or not (binary graph classification).
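As a rough illustration of the graph construction described above, the sketch below turns a list of replies into an undirected adjacency map. The function name `build_thread_graph`, the user IDs, the label value, and the choice to drop self-replies are all assumptions made for this example; they are not taken from the dataset files.

```python
from collections import defaultdict


def build_thread_graph(replies):
    """Build an undirected adjacency map from (replier, replied_to) user pairs."""
    adjacency = defaultdict(set)
    for replier, replied_to in replies:
        if replier == replied_to:
            continue  # assumption: self-replies are ignored
        # The dataset is undirected, so each reply links both users.
        adjacency[replier].add(replied_to)
        adjacency[replied_to].add(replier)
    return dict(adjacency)


# Hypothetical thread: four users, one repeated interaction between u0 and u2.
replies = [("u1", "u0"), ("u2", "u0"), ("u3", "u2"), ("u0", "u2")]
graph = build_thread_graph(replies)
num_edges = sum(len(neighbors) for neighbors in graph.values()) // 2

# Hypothetical graph label (which value means "discussion" is an assumption).
label = 1
```

Repeated replies between the same two users collapse into a single link here, which matches an unweighted graph with no edge features.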
Properties
Number of graphs: 203,088
Directed: No.
Node features: No.
Edge features: No.
Graph labels: Yes. Binary-labeled.
Temporal: No.
Stats | Min | Max |
Nodes | 11 | 97 |
Density | 0.021 | 0.382 |
Diameter | 2 | 27 |
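The Density and Diameter rows presumably follow the usual definitions for undirected graphs: density is the fraction of possible node pairs that are linked, 2m / (n(n - 1)), and diameter is the longest shortest path between any two nodes. The sketch below computes both for a toy graph under that assumption; the function names and the example graph are illustrative only.

```python
from collections import deque


def density(adjacency):
    """Fraction of possible undirected edges that are present."""
    n = len(adjacency)
    m = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def eccentricity(adjacency, source):
    """Largest shortest-path distance from source, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    if len(dist) != len(adjacency):
        raise ValueError("graph is not connected")
    return max(dist.values())


def diameter(adjacency):
    """Longest shortest path over all node pairs of a connected graph."""
    return max(eccentricity(adjacency, node) for node in adjacency)


# Toy path graph 0 - 1 - 2 - 3 (not taken from the dataset).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
path_density = density(path)    # 3 edges out of 6 possible pairs
path_diameter = diameter(path)  # distance between the two end nodes
```

Running one breadth-first search per node is quadratic in the worst case, which is acceptable for graphs with at most 97 nodes.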
Possible tasks
Graph classification
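Because the graphs carry no node or edge features, a graph classifier has to work from structure alone. The following is a minimal sketch of one possible baseline, not the method of the cited paper: each graph is summarized by a normalized degree histogram and assigned to the nearest class centroid. All names, the `max_degree` cutoff, and the star/path toy graphs with their labels are hypothetical.

```python
def degree_histogram(adjacency, max_degree=4):
    """Share of nodes per degree; degrees above max_degree fall into the last bin."""
    hist = [0] * (max_degree + 1)
    for neighbors in adjacency.values():
        hist[min(len(neighbors), max_degree)] += 1
    total = sum(hist)
    return [count / total for count in hist]


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def predict(features, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(features, centroids[label]))


def make_star(n):
    """Toy graph: node 0 linked to every other node."""
    adjacency = {0: set(range(1, n))}
    for i in range(1, n):
        adjacency[i] = {0}
    return adjacency


def make_path(n):
    """Toy graph: nodes 0..n-1 linked in a chain."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


# Toy training set with arbitrary labels (0 = star, 1 = path); not real Reddit data.
train = [(make_star(5), 0), (make_star(6), 0), (make_path(5), 1), (make_path(6), 1)]
centroids = {
    label: centroid([degree_histogram(g) for g, y in train if y == label])
    for label in (0, 1)
}

star_prediction = predict(degree_histogram(make_star(7)), centroids)
path_prediction = predict(degree_histogram(make_path(7)), centroids)
```

On real data the same pipeline shape applies: map every graph to a fixed-length vector, then fit any standard binary classifier on the vectors and the graph labels.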
Paper: https://arxiv.org/abs/2003.04819
Github Page: https://github.com/benedekrozemberczki/karateclub
Source (citation)
B. Rozemberczki, O. Kiss, R. Sarkar. Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs. CIKM 2020.
@inproceedings{karateclub,
title = {{Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs}},
author = {Benedek Rozemberczki and Oliver Kiss and Rik Sarkar},
year = {2020},
pages = {3125--3132},
booktitle = {Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM '20)},
organization = {ACM},
}
Files