I’m excited to share CRED-1, an open, reproducible domain-level credibility dataset that I’ve been working on as part of my doctoral research at Frankfurt University of Applied Sciences. The preprint is now available on SSRN, and the full dataset and pipeline are openly accessible on GitHub and Zenodo.
What is CRED-1?
CRED-1 provides credibility scores for 2,672 domains known to publish misinformation, conspiracy theories, or other unreliable content. Unlike existing approaches that rely on a single ground-truth label, CRED-1 combines multiple openly licensed signals into a composite credibility score ranging from 0.0 (least credible) to 1.0 (most credible).
Multi-Signal Scoring
The dataset integrates signals from diverse sources:
- Fact-checker ratings from curated source lists
- Media Bias/Fact Check (MBFC) classifications
- Domain age and registration history via WHOIS
- TLS certificate quality and transparency
- Web popularity signals
- Google Safe Browsing and Fact Check Tools API data
This multi-signal approach lets researchers and developers define their own credibility thresholds rather than depending on a single binary label.
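To make the idea of a composite score and a user-defined threshold concrete, here is a minimal sketch. It is not the CRED-1 scoring formula: the signal names, the weights, the weighted-mean aggregation, and the 0.3 default threshold are all assumptions chosen for illustration. Each signal is assumed to be normalized to the 0.0 to 1.0 range already.

```python
from typing import Dict, Optional

# Hypothetical signal names and weights, chosen only for illustration;
# they are NOT the weights used by CRED-1.
WEIGHTS: Dict[str, float] = {
    "fact_checker": 0.4,
    "mbfc": 0.3,
    "domain_age": 0.1,
    "tls": 0.1,
    "popularity": 0.1,
}


def composite_score(signals: Dict[str, Optional[float]]) -> Optional[float]:
    """Weighted mean of the available signals, each assumed scaled to [0, 1].

    Signals that are absent or None are skipped and the remaining weights
    are renormalized. Returns None if no known signal is available.
    """
    total = 0.0
    weight_sum = 0.0
    for name, weight in WEIGHTS.items():
        value = signals.get(name)
        if value is None:
            continue
        clamped = min(1.0, max(0.0, value))  # guard against out-of-range input
        total += weight * clamped
        weight_sum += weight
    if weight_sum == 0.0:
        return None
    return total / weight_sum


def is_flagged(score: Optional[float], threshold: float = 0.3) -> bool:
    """Flag a domain when its score falls below a caller-chosen threshold."""
    return score is not None and score < threshold


# Example: one signal is missing (None), so its weight is left out.
example = {"fact_checker": 0.1, "mbfc": 0.2, "domain_age": 0.5, "tls": None}
score = composite_score(example)
print(score, is_flagged(score))
```

Because the threshold is a parameter rather than part of the data, a browser extension could warn only below a strict cutoff while a research study uses a more permissive one on the same scores.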
Designed for Pre-Bunking
A key design goal of CRED-1 is pre-bunking: flagging potentially unreliable domains before users engage with their content, rather than debunking claims after they’ve already spread. The dataset is deliberately compact enough for on-device deployment in mobile apps, browser extensions, or edge computing scenarios, with no server calls required.
This privacy-preserving approach means users can benefit from credibility signals without their browsing behavior being sent to external servers.
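An on-device check can be as simple as a local dictionary lookup. The sketch below assumes the scores have already been loaded into memory as a mapping from domain to score. The table contents, the `lookup` function, and the subdomain fallback rule are hypothetical and not taken from the CRED-1 repository.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical in-memory table: domain -> composite credibility score.
# In a real app this would be loaded once from the bundled dataset file.
SCORES: Dict[str, float] = {
    "example-unreliable.test": 0.12,
    "example-satire.test": 0.45,
}


def lookup(url: str, scores: Dict[str, float]) -> Optional[float]:
    """Return the score for the URL's host or its closest listed parent domain.

    Runs entirely locally: no network request is made.
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    # Try the full host first, then drop subdomain labels one at a time
    # (news.example.test -> example.test). Stop before the bare TLD.
    for i in range(max(len(labels) - 1, 0)):
        candidate = ".".join(labels[i:])
        if candidate in scores:
            return scores[candidate]
    return None


print(lookup("https://www.example-unreliable.test/article?id=1", SCORES))
print(lookup("https://example.org/", SCORES))
```

The function expects absolute URLs with a scheme. Anything it cannot parse, and any domain not in the table, returns `None`, which a caller should treat as "unknown" rather than "credible".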
Domain Distribution
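The dataset covers domains across several categories. The categories listed here account for 2,418 of the 2,672 domains (about 90%); each percentage is a share of the full dataset: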
- Unreliable: 1,117 domains (41.8%)
- Bias: 561 domains (21.0%)
- Fake: 493 domains (18.4%)
- Conspiracy: 153 domains (5.7%)
- Satire: 94 domains (3.5%)
Open and Reproducible
Everything is open:
- The full Python pipeline rebuilds the dataset from scratch
- All source data uses openly licensed datasets (no proprietary dependencies)
- Licensed under CC BY 4.0
Links
- Preprint (SSRN): doi:10.2139/ssrn.6448466
- GitHub Repository: github.com/aloth/cred-1
- Dataset Archive (Zenodo): doi:10.5281/zenodo.18769460
Citation
If you use CRED-1 in your research, please cite:
@article{loth2026cred1,
  title  = {{CRED-1}: An Open Multi-Signal Domain Credibility Dataset for Automated Pre-Bunking of Online Misinformation},
  author = {Loth, Alexander and Kappes, Martin and Pahl, Marc-Oliver},
  year   = {2026},
  doi    = {10.2139/ssrn.6448466},
  url    = {https://ssrn.com/abstract=6448466},
  note   = {Preprint available at SSRN}
}
To cite the dataset archive directly:
@dataset{loth2026cred1data,
  title     = {{CRED-1}: An Open Multi-Signal Domain Credibility Dataset},
  author    = {Loth, Alexander},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18769460}
}
This work is part of my ongoing doctoral research on agentic AI and information integrity. Feedback, questions, and collaboration ideas are welcome. Feel free to open an issue on the GitHub repository or reach out directly.
