From Misinformation to Agentic AI: Where My Research Is Heading

Two years ago, I started studying how AI generates misinformation. The question felt urgent: large language models were producing convincing fake news, synthetic quotes, and fabricated facts at scale. What could researchers, journalists, and technologists do about it?

That question led to four papers at The Web Conference 2026 (WWW ’26), two open-source platforms — JudgeGPT and RogueGPT — and a lot of hard-won clarity about what the real problem is.

Where I Have Been

The misinformation research broke into two tracks. The first was evaluating AI-generated content: building systems to assess whether text was human- or machine-written, and whether its claims were verifiable. JudgeGPT emerged from this track: an open platform for benchmarking how well AI systems can judge the authenticity of news and narratives.

The second track was about provoking controlled failure. RogueGPT generates synthetic misinformation in a controlled research setting, creating stimulus material for human and automated evaluation. You have to understand how something breaks to understand how to fix it.

Along the way, I surveyed domain experts — journalists, fact-checkers, digital forensics practitioners — and documented their perception of a growing verification crisis. The consensus: current tools and workflows cannot keep pace with AI-generated content. Reproducible provenance, not detection after the fact, is the most promising path forward.

What I Noticed

Something shifted in my thinking as the agent era arrived. The misinformation problem is fundamentally about trust in AI output: can I believe what this system says? But agentic AI introduces a harder question: can I trust what this system does?

When an AI agent reads your email, schedules a meeting, edits a document, or submits a form on your behalf, the verification challenge multiplies. You are no longer just evaluating text. You are evaluating sequences of actions — often opaque, often irreversible. The stakes are higher and the audit trail is thinner.

I kept running into the same questions:

  • How do humans verify that an agent did what they intended it to do?
  • What trust calibration is appropriate for autonomous AI acting in real-world systems?
  • How do we design agent-tool interfaces so that safety is structural, not incidental?

These are not rhetorical questions. They are gaps where research is needed.

Where I Am Going

My research is expanding into agentic AI systems, with a focus on trust, oversight, and safety. The core insight from the misinformation work transfers: detection after the fact is too slow. You need to build verifiability into the system from the start.

For agentic systems, that means structured interfaces, audit logs, human-in-the-loop checkpoints, and capability boundaries that are explicit rather than implicit. It means thinking carefully about what agents should be able to do, not just what they can do.
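
To make "capability boundaries that are explicit" concrete, here is a minimal sketch of a human-in-the-loop checkpoint. Every name and type below is an illustrative assumption, not part of any existing system: the point is only that irreversible actions are gated behind confirmation and every action lands in an audit log.

    // TypeScript sketch (illustrative only): gate irreversible actions behind
    // explicit human confirmation, and record every action for later audit.
    type Action = { tool: string; args: Record<string, unknown>; irreversible: boolean };

    async function executeWithCheckpoint(
      action: Action,
      confirm: (a: Action) => Promise<boolean>, // e.g. a prompt shown to the user
      run: (a: Action) => Promise<string>,      // the tool call itself
    ): Promise<string> {
      // Reversible actions proceed; irreversible ones require explicit approval.
      if (action.irreversible && !(await confirm(action))) {
        return "rejected by user";
      }
      const result = await run(action);
      // Append to an audit log so the action is traceable after the fact.
      console.log(JSON.stringify({ ...action, result, at: new Date().toISOString() }));
      return result;
    }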

What I Am Building

PowerSkills is an early practical case study. It is an open-source toolkit (MIT licensed) that gives AI agents structured access to Windows capabilities: Outlook email and calendar, Edge browser via Chrome DevTools Protocol, desktop automation, and shell commands. Every action returns a consistent JSON envelope with status, exit code, and timestamp.
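
As a rough sketch of what that envelope might look like, the shape below captures the three fields named above. The exact field names and any extra fields are assumptions on my part, not the toolkit's documented schema; the repository is the source of truth.

    // TypeScript sketch (field names are assumptions, not the documented schema)
    interface SkillResult {
      status: "ok" | "error";  // outcome of the action
      exitCode: number;        // exit code of the underlying command
      timestamp: string;       // ISO 8601 time the action completed
      output?: string;         // assumed: textual output, if the action produced any
    }

    const example: SkillResult = {
      status: "ok",
      exitCode: 0,
      timestamp: "2026-01-15T09:30:00Z",
    };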

PowerSkills - AI agents with structured Windows capabilities

The design philosophy of PowerSkills reflects the research question: each skill is self-describing (agents discover capabilities via SKILL.md), actions are atomic and auditable, and capabilities are modular so you grant agents only what they need. It is a framework for bounded access with a clear interface, not unlimited delegation.
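
A rough sketch of what "grant agents only what they need" could mean in practice: the SKILL.md convention comes from the project, but the loader, directory layout, and capability names below are hypothetical, shown only to illustrate handing an agent a whitelist of skills rather than blanket system access.

    // TypeScript sketch (hypothetical loader; skill names are examples only).
    import { readFile } from "node:fs/promises";

    async function loadGrantedSkills(granted: string[]): Promise<Map<string, string>> {
      const skills = new Map<string, string>();
      for (const name of granted) {
        // Each skill directory is assumed to describe itself in a SKILL.md file.
        const description = await readFile(`skills/${name}/SKILL.md`, "utf8");
        skills.set(name, description);
      }
      return skills; // the agent only ever sees the capabilities listed here
    }

    // Grant calendar access but not shell access, for example:
    // const skills = await loadGrantedSkills(["outlook-calendar"]);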

PowerSkills is compatible with the AgentSkills ecosystem and works with Claude Code, Cursor, Codex, Gemini CLI, and 40+ other agents via npx skills add aloth/PowerSkills.

Open Questions for the Community

These are the questions I am actively working through:

  1. Verification at action time. How should humans verify agent intent before execution? What is the right UX for confirming an agent action without creating so much friction that delegation becomes pointless?
  2. Trust calibration over time. What signals indicate that an agent is behaving within expected bounds, and how do users develop appropriate trust?
  3. Capability provenance. When an agent takes an action, who authorized it? How do we trace delegation in multi-agent systems?
  4. Failure modes under pressure. Agents are often used precisely when humans are too busy to pay attention. How do we design for safe degradation when oversight lapses?

The misinformation research taught me that trust in AI systems is not binary. It is dynamic, context-dependent, and deeply human. I expect the agentic research to teach me the same lesson, harder.

If you are working on any of these questions, I would like to talk. Find me on Bluesky or Mastodon.


Related: JudgeGPT and RogueGPT | The Verification Crisis paper | PowerSkills on GitHub