Reddit Threads

Dataset information

Discussion-based and non-discussion-based threads from Reddit, which we collected in May 2018. Nodes are Reddit users who participate in a thread, and links are replies between them. The task is to predict whether a thread is discussion-based or not (binary classification).

Properties
Number of graphs: 203,088
Directed: No.
Node features: No.
Edge features: No.
Graph labels: Yes. Binary-labeled.
Temporal: No.
Stats      Min     Max
Nodes      11      97
Density    0.021   0.382
Diameter   2       27
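
The per-graph statistics above (node count, density, diameter) can be reproduced for a single thread with a few lines of standard-library Python. The following is a minimal sketch, not the code used to build the dataset: it assumes a thread is given as a list of (user, user) reply pairs, that density is the usual undirected definition 2m / (n(n-1)), and that the graph is connected. The function names and the toy `replies` list are hypothetical.

```python
from collections import deque


def build_adjacency(replies):
    """Build an undirected adjacency map from (user, user) reply pairs."""
    adj = {}
    for u, v in replies:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:  # ignore self-replies
            adj[u].add(v)
            adj[v].add(u)  # storing both directions makes the edge undirected
    return adj


def density(adj):
    """Undirected density: 2m / (n(n-1)), with n nodes and m distinct edges."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(neighbors) for neighbors in adj.values()) // 2
    return 2 * m / (n * (n - 1))


def eccentricity(adj, source):
    """Largest shortest-path distance from source, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return max(dist.values())


def diameter(adj):
    """Longest shortest path in the graph (assumes the graph is connected)."""
    return max(eccentricity(adj, node) for node in adj)


# Toy thread (made-up data): user 4 and user 1 reply to each other,
# which still counts as a single undirected edge.
replies = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 1)]
adj = build_adjacency(replies)
stats = {"nodes": len(adj), "density": density(adj), "diameter": diameter(adj)}
print(stats)
```

Running one breadth-first search per node is quadratic in the worst case, which is inexpensive for graphs of at most 97 nodes.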

Possible tasks
Graph classification

Paper: https://arxiv.org/abs/2003.04819
Github Page: https://github.com/benedekrozemberczki/karateclub

Source (citation)

  • B. Rozemberczki, O. Kiss, R. Sarkar: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs, 2020.
    @misc{karateclub2020,
        title={An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs},
        author={Benedek Rozemberczki and Oliver Kiss and Rik Sarkar},
        year={2020},
        eprint={2003.04819},
        archivePrefix={arXiv},
        primaryClass={cs.LG}
    }

Files

File                 Description
reddit_threads.zip   Reddit Threads dataset