Researchers Detect AI-Written Fiction With 93 Percent Accuracy by Looking Only at Plot Structure

Research | Patryk Raba | July 5, 2026

A team from the University of Maryland and Google DeepMind showed that AI-generated literary fiction can be identified by narrative structure alone, even when writing style is carefully disguised. The study covered more than 61,000 short stories and coincides with a high-profile controversy over the Commonwealth Short Story Prize.

Contents

How StoryScope Works
How AI Prose Differs From Human Writing
The Commonwealth Prize Controversy
What It Means for Publishers and Contests

Researchers from the University of Maryland and Google DeepMind have published StoryScope, a tool that detects AI-written short stories with 93.2 percent accuracy by analyzing only narrative structure: how a plot is built, how threads are introduced, and how endings are resolved. It does not rely on sentence style, which has until now been the main signal used by AI detectors.

How StoryScope Works

The study's authors, Jenna Russell, Rishanth Rajendhran, Chau Minh Pham, Mohit Iyyer and John Wieting, built an automated system that answers hundreds of detailed structural questions about each story, such as whether the narrator states the story's moral outright, whether the plot contains subplots, and how far the order of telling departs from the chronological order of events. The resulting 304 features define a space in which human-written and AI-generated texts occupy clearly distinct regions.
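To make the idea of a feature space concrete, here is a minimal sketch in Python. It is not the StoryScope implementation: the three question names, the toy answers, and the nearest-centroid classifier are all assumptions chosen for illustration, and the real system works with 304 features rather than three.

```python
from statistics import mean

# Hypothetical structural questions (illustrative only; the study uses 304 features).
QUESTIONS = ["moral_stated_outright", "has_subplots", "chronology_rearranged"]


def to_vector(answers):
    """Turn a dict of question -> numeric answer into a fixed-order feature vector."""
    return [float(answers[q]) for q in QUESTIONS]


def centroid(vectors):
    """Per-feature mean of a list of equal-length vectors."""
    return [mean(column) for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(labeled_stories):
    """Compute one centroid per label from (answers, label) pairs."""
    by_label = {}
    for answers, label in labeled_stories:
        by_label.setdefault(label, []).append(to_vector(answers))
    return {label: centroid(vectors) for label, vectors in by_label.items()}


def predict(model, answers):
    """Return the label whose centroid lies closest to the story's vector."""
    vector = to_vector(answers)
    return min(model, key=lambda label: squared_distance(model[label], vector))


# Made-up training data, not taken from the study.
TRAIN = [
    ({"moral_stated_outright": 1, "has_subplots": 0, "chronology_rearranged": 0.1}, "ai"),
    ({"moral_stated_outright": 1, "has_subplots": 0, "chronology_rearranged": 0.0}, "ai"),
    ({"moral_stated_outright": 0, "has_subplots": 1, "chronology_rearranged": 0.7}, "human"),
    ({"moral_stated_outright": 0, "has_subplots": 1, "chronology_rearranged": 0.4}, "human"),
]

model = fit(TRAIN)
new_story = {"moral_stated_outright": 1, "has_subplots": 0, "chronology_rearranged": 0.2}
print(predict(model, new_story))
```

The point of the sketch is only that nothing about word choice or sentence rhythm enters the vector, so polishing the prose leaves the classifier's input unchanged.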

The key finding is that classifiers based solely on narrative features retain more than 97 percent of the accuracy of models that also use text style. In other words, even when the prose is deliberately edited to remove AI-typical phrasing and sentence rhythm, plot structure reveals the text's origin almost as effectively.
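One way to read the retention figure is as a simple ratio of two accuracies. The sketch below rests on two assumptions that the article does not confirm: that the 93.2 percent figure is the accuracy of the structure-only classifier, and that "retained accuracy" means structure-only accuracy divided by the accuracy of the model that also sees style. The function names are hypothetical.

```python
def retention(structure_only_accuracy, full_accuracy):
    """Share of the full model's accuracy kept by the structure-only model."""
    return structure_only_accuracy / full_accuracy


def max_full_accuracy(structure_only_accuracy, min_retention):
    """Upper bound on the full model's accuracy implied by a retention floor."""
    return structure_only_accuracy / min_retention


# Assumed inputs: 0.932 structure-only accuracy, retention above 0.97.
upper_bound = max_full_accuracy(0.932, 0.97)
print(f"Full-model accuracy would be below about {upper_bound:.3f}")
```

Under those assumptions, adding style features would buy at most about three percentage points over structure alone, which is why the article treats structure as carrying most of the signal.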

How AI Prose Differs From Human Writing

The team described recurring patterns in texts generated by language models: over-explaining the story's moral directly in the text, linear single-track plots, and limited moral ambiguity in characters' choices. Human-written texts, by contrast, show greater temporal complexity in the narrative and a tendency to present characters' choices as morally ambiguous, without a clear resolution of who is right.

The researchers also noted distinct fingerprints for individual models. Claude tends toward flat tension escalation without clear peaks, GPT frequently reaches for dream sequences as a plot device, and Gemini bases characterization mainly on external description rather than characters' internal reflection.

"AI-generated stories cluster in a shared, narrow region of narrative space, while human-authored texts show far greater diversity."
(from the StoryScope research paper, Jenna Russell and co-authors)

The Commonwealth Prize Controversy

Publication of the study coincided with a high-profile scandal surrounding the 2026 Commonwealth Short Story Prize. One of the winning entries, the short story "Serpent in the Grove," was accused of being written entirely by artificial intelligence. Pangram, a company that offers AI-text detection tools, ran the piece through its system and got a result indicating that 100 percent of the text was AI-generated. Pangram also flagged two other winning stories as suspicious.

The Commonwealth Foundation responded that until a sufficiently reliable tool or procedure exists for detecting AI use in previously unpublished fiction, the organization must rely on a principle of trust toward authors submitting work to the competition. The case showed that existing style-based detectors fail against authors who deliberately mask traces of AI involvement.

What It Means for Publishers and Contests

For Polish publishing houses, literary editors, and writing contest organizers, the StoryScope results mean that sentence-style analysis alone is no longer enough to verify authorship. The structure-based method, though it requires more complex analysis of the whole text, provides a signal that is much harder to game, since authors writing with AI assistance rarely alter a story's deep architecture, even when they carefully polish the language.

The team has made its code and methodology publicly available, allowing other researchers and developers of anti-plagiarism tools to build their own classifiers based on the same 304 features. This is the first study of this scale to systematically show that stories generated by different models share recognizable, recurring structural patterns, regardless of how much their stylistic layer varies.

Sources: AI Fiction Detection Reaches 93% on Structure Alone (techtimes.com), StoryScope: Investigating idiosyncrasies in AI fiction (arxiv.org), Could a controversial award-winning short story signal a new era of literary AI slop (france24.com)