CRED-1: An open multi-signal domain credibility dataset for misinformation research with 2,672 domains and 5 enrichment signals

Introducing CRED-1: An Open Domain Credibility Dataset for Fighting Online Misinformation

Feb 28, 2026

How do you know whether a news source is trustworthy? That question is at the heart of my latest research project. Today, I am releasing CRED-1, an open multi-signal domain credibility dataset that assigns credibility scores to 2,672 domains known for publishing misinformation, conspiracy theories, or other unreliable content.

Why Another Dataset?

Existing source lists like OpenSources.co or Iffy.news provide valuable labels, but they are binary: a domain is either flagged or not. Real-world credibility is more nuanced. A satirical outlet is different from a state propaganda channel, and a site flagged by multiple independent lists is more concerning than one flagged by a single source.

CRED-1 addresses this by combining multiple openly licensed source lists with five computed enrichment signals to produce a continuous credibility score between 0.0 (least credible) and 1.0 (most credible).

Five Independent Signals

Each domain in the dataset is enriched with signals from independent sources:

Source Category (50% weight): Consensus label from OpenSources.co and Iffy.news (fake, unreliable, conspiracy, satire, mixed).
Iffy.news Score (15%): Credibility rating from the Iffy.news index, derived from Media Bias/Fact Check assessments.
Fact-Check Frequency (15%): Number of fact-check claims found via the Google Fact Check Tools API. More claims suggest more scrutiny from fact-checkers.
Web Popularity (5%): Tranco Top-1M rank. Higher reach means higher potential impact.
Domain Age (5%): Registration date via RDAP/WHOIS. Freshly registered domains are a common indicator of disposable misinformation sites.

Additionally, Google Safe Browsing acts as an override: any domain flagged for malware or social engineering has its score capped at 0.05, regardless of its other signals.
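To make the weighting and the override concrete, here is a minimal sketch of how a score of this kind could be combined. It is an illustration under assumptions, not the published scoring model: it assumes each signal has already been mapped to a sub-score in [0, 1], the key names and the `credibility_score` function are hypothetical, and because the weights listed above add up to 0.90, the sketch divides by the total weight of the signals that are present.

```python
# Weights as listed in the post. They sum to 0.90, so this sketch
# renormalizes by the total weight (an assumption, not the paper's method).
WEIGHTS = {
    "source_category": 0.50,
    "iffy_score": 0.15,
    "fact_check": 0.15,
    "popularity": 0.05,
    "domain_age": 0.05,
}
SAFE_BROWSING_CAP = 0.05  # hard cap for domains flagged by Safe Browsing


def credibility_score(signals, safe_browsing_flagged=False):
    """Combine per-signal sub-scores (each assumed in [0, 1]) into one score.

    Signals that are missing or None are skipped, and the remaining
    weights are renormalized (hypothetical handling of missing data).
    """
    present = {
        name: value
        for name, value in signals.items()
        if name in WEIGHTS and value is not None
    }
    if not present:
        raise ValueError("no usable signals")

    total_weight = sum(WEIGHTS[name] for name in present)
    weighted_sum = sum(
        WEIGHTS[name] * min(max(value, 0.0), 1.0)  # clamp to [0, 1]
        for name, value in present.items()
    )
    score = weighted_sum / total_weight

    # The override only ever lowers a score; it never raises one.
    if safe_browsing_flagged:
        score = min(score, SAFE_BROWSING_CAP)
    return round(score, 3)
```

With this shape, a domain whose sub-scores are all 0.5 ends up at 0.5, and the same domain drops to 0.05 once the Safe Browsing flag is set.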

Key Numbers

2,672 domains with credibility scores
Score range: 0.000 to 0.962 (mean: 0.299)
Category breakdown: 50% mixed, 22% unreliable, 18.4% fake, 5.7% conspiracy, 3.5% satire
Tranco matches: 704 domains (26.3%), including 56 in the Top 10K
Fact-check claims: 67 domains with 332 total claims

Designed for On-Device Deployment

One of the core design goals is privacy. The compact JSON format (117 KB) is small enough to ship inside a browser extension or mobile app, so lookups need no server calls and no browsing history leaves the device. This makes CRED-1 well suited to pre-bunking: warning users before they engage with unreliable content, right at the delivery stage of the misinformation kill chain.

I am already integrating CRED-1 into Trackless Links, my Safari extension for tracker removal, to add credibility warnings for flagged domains.
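An on-device check of this kind can be sketched in a few lines. The example below is only an illustration: it assumes the compact file is a flat JSON object mapping each domain to its score, which may not match the actual CRED-1 schema, and the `load_scores` and `lookup` helpers as well as the sample entry are made up for the example.

```python
import json
from urllib.parse import urlsplit


def load_scores(json_text):
    """Parse the dataset; assumes a flat {"domain": score} JSON object."""
    return {
        domain.lower(): float(score)
        for domain, score in json.loads(json_text).items()
    }


def lookup(url, scores):
    """Return (matched_domain, score) for a URL, or None if it is not listed.

    Tries the full hostname first, then each parent domain, so that
    "news.example.com" matches an entry for "example.com".
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    # Stop before the bare top-level label (e.g. "com").
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in scores:
            return candidate, scores[candidate]
    return None


sample = '{"example.com": 0.12}'  # made-up entry, not taken from CRED-1
scores = load_scores(sample)
print(lookup("https://www.example.com/story?id=1", scores))
```

Everything happens in memory on the local dictionary, which is the property that keeps the visited URL on the device.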

Fully Reproducible

The entire pipeline is open source. Two Python scripts rebuild the dataset from scratch using only the standard library (no external dependencies). You can reproduce every score, extend the dataset with new sources, or adapt the scoring model for your own research.

python3 pipeline/build_dataset.py        # Fetch & merge sources
python3 pipeline/enrich_dataset.py       # Enrich with 5 signals

Get the Dataset

GitHub: github.com/aloth/cred-1
Zenodo (DOI): 10.5281/zenodo.18769460
License: CC BY 4.0 (free to use, even commercially, with attribution)

Paper

A companion paper describing the methodology, scoring model, and limitations has been submitted to Data in Brief (Elsevier) and is available as an arXiv preprint. If you use CRED-1 in your research, please cite:

@article{loth2026cred1,
  title   = {CRED-1: An Open Multi-Signal Domain Credibility Dataset
             for Automated Pre-Bunking of Online Misinformation},
  author  = {Loth, Alexander},
  journal = {Data in Brief},
  year    = {2026},
  doi     = {10.5281/zenodo.18769460}
}

What’s Next

CRED-1 v1.0 is a starting point. I am working on automated monthly updates via a CI/CD pipeline that detects new domains from upstream sources, enriches them incrementally, and publishes delta releases to Zenodo. If you have ideas for additional signals or data sources, open an issue or reach out.
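The detection step of such an update could look roughly like the sketch below. It is a hypothetical illustration rather than the planned pipeline: it assumes each snapshot is a dictionary mapping domains to category labels, and `compute_delta` is an invented helper name.

```python
def compute_delta(previous, upstream):
    """Compare two {domain: category} snapshots (hypothetical shape).

    Returns the domains that were added, removed, or relabeled, so that
    only the added and changed ones need to be enriched again.
    """
    prev_domains = set(previous)
    new_domains = set(upstream)
    return {
        "added": sorted(new_domains - prev_domains),
        "removed": sorted(prev_domains - new_domains),
        "changed": sorted(
            d for d in prev_domains & new_domains if previous[d] != upstream[d]
        ),
    }


# Made-up snapshots for illustration only.
old = {"a.example": "fake", "b.example": "satire"}
new = {"a.example": "unreliable", "c.example": "conspiracy"}
delta = compute_delta(old, new)
```

Here `delta` reports `c.example` as added, `b.example` as removed, and `a.example` as changed.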

Fighting misinformation is a collective effort. I hope CRED-1 makes it a little easier.
