CRED-1: An Open Multi-Signal Domain Credibility Dataset for Misinformation Pre-Bunking

Mar 21, 2026

—

I’m excited to share CRED-1, an open, reproducible domain-level credibility dataset that I’ve been working on as part of my doctoral research at Frankfurt University of Applied Sciences. The preprint is now available on SSRN, and the full dataset and pipeline are openly accessible on GitHub and Zenodo.

What is CRED-1?

CRED-1 provides credibility scores for 2,672 domains known to publish misinformation, conspiracy theories, or other unreliable content. Unlike existing approaches that rely on a single ground-truth label, CRED-1 combines multiple openly licensed signals into a composite credibility score ranging from 0.0 (least credible) to 1.0 (most credible).

Multi-Signal Scoring

The dataset integrates signals from diverse sources:

Fact-checker ratings from curated source lists
Media Bias/Fact Check (MBFC) classifications
Domain age and registration history via WHOIS
TLS certificate quality and transparency
Web popularity signals
Google Safe Browsing and Fact Check Tools API data

This multi-signal approach lets researchers and developers define their own credibility thresholds rather than depending on a single binary label.
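To show how several signals can be folded into one score between 0.0 and 1.0 and then cut at a threshold of your own choosing, here is a minimal Python sketch. The post does not specify CRED-1's aggregation method. The sketch assumes each signal has already been normalized to [0, 1] and combines them with a weighted mean, which is one common choice. The signal names, the weights, the 0.4 threshold, and the example values are all hypothetical and are not taken from the CRED-1 pipeline.

```python
from collections.abc import Mapping

# Hypothetical weights for illustration only; CRED-1's actual weighting may differ.
EXAMPLE_WEIGHTS: dict[str, float] = {
    "fact_checker": 0.35,
    "mbfc": 0.25,
    "domain_age": 0.15,
    "tls": 0.10,
    "popularity": 0.15,
}


def composite_score(signals: Mapping[str, float],
                    weights: Mapping[str, float] = EXAMPLE_WEIGHTS) -> float:
    """Weighted mean of the supplied signals, each assumed to lie in [0, 1].

    Missing signals are skipped and the remaining weights renormalized,
    so a domain without, e.g., WHOIS data still receives a score.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for name, weight in weights.items():
        value = signals.get(name)
        if value is None:
            continue  # signal unavailable for this domain
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"signal {name!r} out of range: {value}")
        weighted_sum += weight * value
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no known signals supplied")
    return weighted_sum / total_weight


def is_flagged(score: float, threshold: float = 0.4) -> bool:
    """Flag a domain when its score falls below a caller-chosen (hypothetical) threshold."""
    return score < threshold


# Example: weak fact-checker and MBFC ratings, no TLS signal available.
example = {"fact_checker": 0.1, "mbfc": 0.2, "domain_age": 0.5, "popularity": 0.3}
score = composite_score(example)
print(f"{score:.3f}", is_flagged(score))
```

Because the threshold is a parameter rather than a fixed label, a browser extension could use a strict cut-off while a research study sweeps several values over the same scores.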

Designed for Pre-Bunking

A key design goal of CRED-1 is pre-bunking: flagging potentially unreliable domains before users engage with their content, rather than debunking claims after they’ve already spread. The dataset is deliberately compact enough for on-device deployment in mobile apps, browser extensions, or edge computing scenarios, with no server calls required.

This privacy-preserving approach means users can benefit from credibility signals without their browsing behavior being sent to external servers.
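A minimal sketch of what such an on-device check could look like in Python, assuming the scores have been loaded into a local dictionary keyed by domain. The table contents below use reserved `.example` names and are hypothetical, as is the `lookup_score` helper; the point is only that the URL is parsed and matched locally, with no network request.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical in-memory table; a real app would load the dataset file it ships with.
LOCAL_SCORES: dict[str, float] = {
    "unreliable.example": 0.12,
    "satire.example": 0.35,
}


def lookup_score(url: str, table: dict[str, float] = LOCAL_SCORES) -> Optional[float]:
    """Return the score for the URL's host or its closest listed parent domain.

    Runs entirely locally: the URL is never sent to a server.
    Expects a full URL with a scheme (e.g. "https://..."); otherwise no host is found.
    """
    host = urlsplit(url).hostname  # lowercased, with port and credentials removed
    if host is None:
        return None
    labels = host.rstrip(".").split(".")
    # Try "news.unreliable.example", then "unreliable.example", then "example".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in table:
            return table[candidate]
    return None


print(lookup_score("https://www.unreliable.example/story?id=1"))
```

Walking up the parent labels lets one entry cover its subdomains (including `www.`). On shared hosting platforms, where unrelated sites sit under a common parent domain, a boundary based on the Public Suffix List would be safer than this naive walk.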

Domain Distribution

The dataset covers domains across several categories, including:

Unreliable: 1,117 domains (41.8%)
Bias: 561 domains (21.0%)
Fake: 493 domains (18.5%)
Conspiracy: 153 domains (5.7%)
Satire: 94 domains (3.5%)
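The shares can be recomputed from the stated 2,672-domain total with a few lines of Python. The counts are copied from the list above; the script only redoes the arithmetic, and it shows that these five categories account for 2,418 domains, leaving 254 that fall outside them.

```python
TOTAL_DOMAINS = 2672

# Category counts as reported for CRED-1.
category_counts = {
    "Unreliable": 1117,
    "Bias": 561,
    "Fake": 493,
    "Conspiracy": 153,
    "Satire": 94,
}

# Print each category's share of the full dataset to two decimals.
for name, count in category_counts.items():
    print(f"{name:<11} {count:>5}  {100 * count / TOTAL_DOMAINS:6.2f}%")

listed = sum(category_counts.values())
print(f"Listed: {listed}, outside these categories: {TOTAL_DOMAINS - listed}")
```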

Open and Reproducible

Everything is open:

The full Python pipeline rebuilds the dataset from scratch
All source data comes from openly licensed datasets (no proprietary dependencies)
Licensed under CC BY 4.0

Citation

If you use CRED-1 in your research, please cite:

@article{loth2026cred1,
  title     = {{CRED-1}: An Open Multi-Signal Domain Credibility
               Dataset for Automated Pre-Bunking of Online
               Misinformation},
  author    = {Loth, Alexander and Kappes, Martin
               and Pahl, Marc-Oliver},
  year      = {2026},
  doi       = {10.2139/ssrn.6448466},
  url       = {https://ssrn.com/abstract=6448466},
  note      = {Preprint available at SSRN}
}

To cite the dataset archive directly:

@dataset{loth2026cred1data,
  title     = {{CRED-1}: An Open Multi-Signal Domain
               Credibility Dataset},
  author    = {Loth, Alexander},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18769460}
}

This work is part of my ongoing doctoral research on agentic AI and information integrity. Feedback, questions, and collaboration ideas are welcome. Feel free to open an issue on the GitHub repository or reach out directly.