JudgeGPT & RogueGPT: Open-Source Platforms for Studying AI-Generated Misinformation

Can people tell AI-written news from human-written journalism? As large language models grow more capable, the answer is becoming increasingly uncomfortable. This is the question at the heart of two new research platforms we have been building at Microsoft: JudgeGPT and RogueGPT.

Both projects are open source under the GNU General Public License v3, and both have companion papers accepted at The Web Conference 2026 (WWW ’26).

The Problem: Industrialized Deception

Generative AI has created an asymmetric arms race. Producing convincing synthetic news now costs almost nothing. Detecting it reliably does not. While the research community has focused heavily on automated detection, a critical gap exists in understanding human perception: how do ordinary readers judge authenticity when the lines between human- and machine-generated text are increasingly blurred?

Two papers accepted at WWW ’26 address this directly:

  • “Industrialized Deception: The Collateral Effects of LLM-Generated Misinformation on Digital Ecosystems” (arXiv:2601.21963) — examines the systemic effects of LLM-generated misinformation on digital platforms, trust networks, and information ecosystems.
  • “Eroding the Truth-Default: A Causal Analysis of Human Susceptibility to Foundation Model Hallucinations and Disinformation in the Wild” (arXiv:2601.22871) — reports the human perception findings from stimuli generated by our framework. Key finding: the human “truth-default” — the cognitive bias toward assuming content is genuine — is being measurably eroded by the presence of LLM-generated content.

RogueGPT: Controlled Stimulus Generation

RogueGPT is the first stage of the research pipeline. It generates controlled news stimuli — fragments of text produced under explicit experimental conditions — and stores them in a shared MongoDB database with full provenance metadata.

The current corpus contains 2,663 multilingual news fragments:

  • 37 model configurations across 11 providers (OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, Microsoft, Zhipu, Moonshot, Qwen, MiniMax)
  • 4 languages: English, German, French, Spanish
  • 3 formats: tweet, headline, short article
  • 5 journalistic styles per language (e.g., NYT, BBC, CNN, Fox News, WSJ for English)
  • 222 human-sourced fragments (164 legitimate, 58 fake news) as experimental anchors
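The dimensions above are captured per fragment as provenance metadata. A minimal sketch of what one record might contain — the field names here are illustrative, not the actual schema stored by core.py:

```python
# Illustrative fragment record; field names are hypothetical,
# not the actual RogueGPT schema.
fragment = {
    "text": "Markets rallied today after ...",
    "source": "machine",        # "machine" or "human"
    "model": "gpt-4o",          # generating model; None for human fragments
    "provider": "OpenAI",
    "language": "en",           # en / de / fr / es
    "format": "article",        # tweet / headline / article
    "style": "nyt",             # journalistic style emulated
    "label": "legitimate",      # ground-truth veracity
    "params": {"temperature": 0.7, "seed": 42},
}
```

Storing the generation parameters alongside the text is what later allows human judgments to be linked back to specific experimental conditions.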

RogueGPT follows a layered architecture that separates data logic from its interfaces: a core.py data layer, a cli.py for scripted ingestion and retrieval, a Streamlit app.py for interactive use, and an MCP server (mcp_server.py) exposing tools for AI agent integration via the Model Context Protocol.
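A hedged sketch of that split — an in-memory list stands in for the MongoDB collection, and the function names are illustrative rather than the real core.py API:

```python
# Hypothetical sketch of the data-layer / interface separation.

def ingest(collection: list, text: str, **metadata) -> dict:
    """Core data-layer operation: store a fragment with its metadata."""
    record = {"text": text, **metadata}
    collection.append(record)
    return record

def retrieve(collection: list, limit: int = 10, **filters) -> list:
    """Core data-layer operation: exact-match filter on metadata fields."""
    hits = [r for r in collection
            if all(r.get(k) == v for k, v in filters.items())]
    return hits[:limit]

# The CLI, the Streamlit app, and the MCP tools would all be thin
# wrappers around operations like these:
db = []
ingest(db, "Breaking: ...", model="gpt-4o", language="en")
results = retrieve(db, model="gpt-4o")
```

Keeping every interface a thin wrapper over one data layer means a fragment ingested from the CLI is immediately visible to the app and to MCP clients.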

Fake news fragments in the corpus are not artificially constructed examples. They are sourced from domains classified as fake, unreliable, or conspiracy in the CRED-1 Domain Credibility Dataset, ensuring they reflect authentic disinformation language.

The full corpus is available on Zenodo under restricted access for academic research: DOI: 10.5281/zenodo.18703138.

JudgeGPT: Human Evaluation at Scale

JudgeGPT is the second stage: a live research platform that systematically collects and analyzes human judgments on news authenticity. Participants are presented with fragments from the RogueGPT corpus and asked to evaluate them on three 7-point scales:

  • Source attribution: Human vs. machine-generated
  • Veracity assessment: Legitimate vs. fake news
  • Topic familiarity: Self-reported domain knowledge

After each submission, participants receive immediate feedback: the ground truth, the specific model that produced the content, and whether their assessment was correct. This “inoculation” mechanism serves both educational and research purposes while sustaining participant motivation.
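The feedback step reduces to comparing each rating against the fragment's ground truth. A minimal sketch, assuming the scale's low end maps to one label and the high end to the other, with the midpoint treated as undecided (the platform's actual scale orientation and scoring may differ):

```python
def feedback(rating: int, low_label: str, high_label: str, truth: str) -> dict:
    """Score one 7-point rating against the fragment's ground truth.

    Ratings 1-3 count as the low-end label, 5-7 as the high-end
    label, and the midpoint 4 as undecided.
    """
    if rating == 4:
        verdict = None  # undecided: neither label chosen
    else:
        verdict = high_label if rating >= 5 else low_label
    return {"truth": truth, "verdict": verdict, "correct": verdict == truth}

# Participant rated 6 on source attribution; the fragment was machine-made.
result = feedback(6, low_label="human", high_label="machine", truth="machine")
```

Returning the ground truth alongside the verdict is what lets the interface show participants not just whether they were right, but which model actually wrote the text.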

A shareable score card is generated every 5 responses, enabling participants to challenge others to evaluate the exact same set of fragments — creating a viral recruitment mechanism for the study.
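The score-card mechanic can be sketched as simple batching: every five completed responses close out a card that records its score and the exact fragment IDs, so a challenged friend can be served the same set. Field and function names here are hypothetical:

```python
def score_cards(responses: list, card_size: int = 5) -> list:
    """Group completed responses into shareable score cards.

    Each card keeps the fragment ids it covers, so a challenge link
    can replay exactly the same fragments for another participant.
    """
    cards = []
    for i in range(0, len(responses) - card_size + 1, card_size):
        batch = responses[i:i + card_size]
        cards.append({
            "fragment_ids": [r["fragment_id"] for r in batch],
            "score": sum(r["correct"] for r in batch),
            "out_of": card_size,
        })
    return cards

# Seven responses: one full card of five, two waiting for the next card.
responses = [{"fragment_id": i, "correct": i % 2 == 0} for i in range(7)]
cards = score_cards(responses)
```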

You can participate in the live survey at judgegpt.streamlit.app.

Why This Matters

The research infrastructure we have built is designed to produce findings that are reproducible and falsifiable. Every fragment has full provenance: which model generated it, under what parameters, from what seed. Human responses are linked to those parameters, enabling multi-factor analysis — not just “can humans detect AI content?” but which models are hardest to detect, in which languages, by which demographic groups.
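Because every response is linked to the fragment's provenance, accuracy can be broken out along any factor or combination of factors. A minimal sketch on synthetic data (not real study results):

```python
from collections import defaultdict

def accuracy_by(responses, *factors):
    """Detection accuracy grouped by one or more provenance factors."""
    totals = defaultdict(lambda: [0, 0])  # key -> [correct, total]
    for r in responses:
        key = tuple(r[f] for f in factors)
        totals[key][0] += r["correct"]
        totals[key][1] += 1
    return {k: c / n for k, (c, n) in totals.items()}

# Synthetic example responses, for illustration only.
responses = [
    {"model": "gpt-4o", "language": "en", "correct": True},
    {"model": "gpt-4o", "language": "en", "correct": False},
    {"model": "gpt-4o", "language": "de", "correct": True},
    {"model": "claude", "language": "en", "correct": True},
]

by_model = accuracy_by(responses, "model")
by_model_lang = accuracy_by(responses, "model", "language")
```

The same function answers both the coarse question ("how detectable is each model?") and the finer one ("how does that vary by language?") from one response table.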

This matters because the policy conversation around AI-generated misinformation is often based on anecdote and intuition. The goal of JudgeGPT and RogueGPT is to replace that with data.

Getting Started

RogueGPT runs from a source checkout:

git clone https://github.com/aloth/RogueGPT
cd RogueGPT
pip install -r requirements.txt

# Ingest a fragment
python cli.py ingest --text "..." --model "gpt-4o" --language en --style nyt --format article

# Retrieve fragments
python cli.py retrieve --model "gpt-4o" --language en --limit 10

JudgeGPT is a Streamlit app:

git clone https://github.com/aloth/JudgeGPT
cd JudgeGPT
pip install -r requirements.txt
streamlit run app.py

Both repos are licensed under GPLv3. Contributions and dataset access requests for academic research are welcome.