What if your favorite AI chatbot wasn’t just trained on books, but could recreate them, word-for-word? A recent study demonstrates that open-weight language models can reproduce copyrighted book passages verbatim, raising serious questions about how these models store proprietary data and how easily it can be extracted.
The paper investigates how open-weight LLMs, such as LLaMA or Pythia, memorize exact passages from their training data and can reproduce them even without adversarial prompting. The authors found that text segments from copyrighted books, sometimes entire paragraphs, could be extracted using relatively simple techniques.
This finding contradicts common claims by AI developers that these models only learn statistical patterns rather than specific content.
Legal issues
The research provides concrete evidence that could help shape legal arguments about AI, potentially affecting how future models are trained and used.
In current lawsuits, plaintiffs claim that LLMs act like illegal copy machines that hold onto and reproduce copyrighted content. On the other side, AI companies argue that LLMs don’t store or copy anything directly; instead, they learn general patterns and statistical relationships from data.
The reality is more nuanced. If a model memorizes and can reproduce exact text from its training data, it’s not just learning patterns; it may be storing protected content. That could support the argument that the model includes a “copy” of copyrighted works it memorized.
Memorization and extraction
Training-data extraction and memorization are closely related, but they have important differences.
Memorization means that a model has stored specific content from its training data inside its parameters. This isn’t about remembering facts in general: it refers to the model retaining exact or nearly exact pieces of the data it was trained on, such as a paragraph from a book. The content exists somewhere in the model’s parameters whether or not anyone has ever tried to get it out.
Extraction is actually getting that memorized content out of the model, typically by prompting it in just the right way so that it reproduces a specific phrase, sentence, or passage from its training data. A successful extraction therefore proves memorization, and both have serious implications for privacy and copyright law.
Using probabilities to measure memorization
The authors propose a probabilistic approach to measuring memorization in language models. Instead of treating memorization as a binary outcome (either a passage is reproduced or it isn’t), this method quantifies how likely a model is to reproduce a given text, based on the probabilities it assigns to each next token.
Unlike earlier techniques, which relied on manually crafted prompts or trial-and-error guesswork, this probabilistic approach is more aligned with how LLMs naturally function: by assigning conditional probabilities to each next token based on the preceding ones.
They concentrate on the Books3 dataset and test only open-weight models (e.g., LLaMA, Mistral, Pythia), where they have full access to the models’ weights and output probabilities. Each selected passage (denoted as z) is divided into two parts (a minimal code sketch of the split follows the list):
- Prefix: A sequence of tokens taken from the beginning of the passage, used as the prompt.
- Suffix: The remaining portion of the text, which is the target output the researchers hope the LLM will generate.
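Here’s a minimal sketch of that split, assuming a Hugging Face tokenizer for Pythia and an illustrative 50-token prefix; the passage and variable names are placeholders, not the paper’s exact setup:

```python
# Hypothetical sketch: splitting a passage z into a prompt prefix and target suffix.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-12b")

passage = "It is a truth universally acknowledged, that a single man ..."  # stand-in for a Books3 excerpt
token_ids = tokenizer.encode(passage)

PREFIX_LEN = 50                        # illustrative prefix length
prefix_ids = token_ids[:PREFIX_LEN]    # fed to the model as the prompt
suffix_ids = token_ids[PREFIX_LEN:]    # the continuation we check for verbatim reproduction
```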
The model is given the prefix as input and asked to continue generating text. The goal is to see whether the LLM completes the prefix with the exact suffix from the original memorized text (see the figure below).

To quantify this, they calculate pz, the probability that the model reproduces the suffix given the prefix, by multiplying the conditional probabilities assigned to each token in the suffix. A higher pz indicates stronger memorization and easier extraction.
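In code, this amounts to summing token log-probabilities and exponentiating at the end. Here is a sketch where a small open-weight model stands in for the ones the paper tests; the helper name pz is ours:

```python
# Sketch: pz as the product of conditional suffix-token probabilities,
# accumulated in log space for numerical stability.
import math
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m").eval()

def pz(prefix_ids: list[int], suffix_ids: list[int]) -> float:
    input_ids = torch.tensor([prefix_ids + suffix_ids])
    with torch.no_grad():
        logits = model(input_ids).logits[0]           # shape: (seq_len, vocab_size)
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    for i, token in enumerate(suffix_ids):
        # Logits at position len(prefix) + i - 1 predict the token at position len(prefix) + i.
        total += log_probs[len(prefix_ids) + i - 1, token].item()
    return math.exp(total)
```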
To test actual extraction, the study uses two complementary techniques that provide a more nuanced picture than prior binary memorization checks:
- Greedy extraction: The model generates tokens one by one using greedy decoding. If at least 50 consecutive tokens match the original text exactly, the passage is considered memorized (see the sketch after this list).
- Probabilistic extraction: This method estimates the probability that the model will generate the full suffix verbatim (reproduce a passage word-for-word), given the prefix. This approach allows the researchers to estimate memorization strength at different confidence thresholds (e.g., pz ≥ 0.01 or pz ≥ 0.75), revealing how deeply specific passages are embedded in the model’s internal parameters.
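A hedged sketch of the greedy check, reusing the prefix/suffix split from above; the 50-token match length mirrors the paper’s criterion, but the function name is ours:

```python
# Sketch: greedy extraction test. Deterministic decoding, then an exact-match
# comparison against the first 50 tokens of the true suffix.
import torch

def greedy_extracted(model, prefix_ids, suffix_ids, match_len=50) -> bool:
    input_ids = torch.tensor([prefix_ids])
    with torch.no_grad():
        output = model.generate(
            input_ids,
            max_new_tokens=match_len,
            do_sample=False,  # greedy decoding: always pick the most likely next token
        )
    generated = output[0, len(prefix_ids):].tolist()
    return generated == suffix_ids[:match_len]
```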
The figure below illustrates the extraction probability pz of different models, for the “careless people” quote from The Great Gatsby. It clearly demonstrates that open-weight LLMs can indeed memorize and reproduce copyrighted text, and the ease of this extraction varies significantly between models.
The left plot shows the direct pz (the probability of extracting the passage in a single attempt), while the right plot translates this probability into how many attempts (n) would be needed to extract the text with a certain overall probability (e.g., 99% certainty).

There’s a substantial difference in the direct extractability of memorized copyrighted text across different open-weight LLMs. Here are the results for the models they tested:
- Llama 1 30B is the most prone to directly reproducing this specific memorized text among the Llama models tested, with a 35.2% chance of extraction in a single attempt. Llama 1 65B has a lower pz than Llama 1 30B, indicating that larger models don’t always memorize or reproduce equally effectively, or that certain intermediate sizes might be more susceptible to this particular type of memorization.
- Phi 4 and Pythia 12B are much less likely to directly reproduce the memorized text than the Llama models, implying either less memorization or a greater resistance to simple extraction techniques for this specific content.
The right plot translates the abstract memorization scores (pz) into practical effort: how hard it is to extract a memorized quote (higher pz means easier, faster extraction; lower pz means many more attempts). In simpler terms, it shows how many prompts you’d need to extract a specific passage from the model at a desired confidence level (a quick numerical check follows the list below).
Each curve represents a different model:
- Llama 1 30B (red dashed): high pz = 0.352 → strongly memorized → only about 10 prompts needed to extract it with 99% confidence.
- Llama 1 65B (red solid): pz = 0.210 → still strongly memorized → around 20 prompts for a confident extraction.
- Pythia 12B (green): pz = 0.055 → weakly memorized → on the order of 80–100 prompts to extract confidently.
- Phi-4 (orange): pz = 0.016 → barely memorized → nearly 300 prompts to extract the passage reliably even once.
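These attempt counts follow from treating each prompt as an independent draw: the chance of at least one verbatim extraction in n attempts is 1 − (1 − pz)^n, so reaching 99% confidence requires n ≥ log(0.01)/log(1 − pz). A quick sanity check under that independence assumption:

```python
# Attempts needed so that P(at least one verbatim extraction) >= confidence,
# assuming independent sampling attempts: 1 - (1 - pz)**n >= confidence.
import math

def attempts_needed(pz: float, confidence: float = 0.99) -> int:
    return math.ceil(math.log(1 - confidence) / math.log(1 - pz))

for name, p in [("Llama 1 30B", 0.352), ("Llama 1 65B", 0.210),
                ("Pythia 12B", 0.055), ("Phi-4", 0.016)]:
    print(f"{name}: {attempts_needed(p)} attempts")
# Llama 1 30B: 11, Llama 1 65B: 20, Pythia 12B: 82, Phi-4: 286
```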
Even for the models less prone to direct extraction (like Phi-4), the curves show that a persistent “attacker” or researcher can still achieve a high probability of extracting the memorized content by increasing the number of prompts.
How much do LLMs memorize from books? A deep dive into extraction rates
How much a model memorizes, and how easily that content can be extracted, depends on the extraction strategy employed, the model’s architecture and size, and, crucially, how often the target text appears in its training data.
The figure below compares two key methods of measuring memorization: greedy extraction and probabilistic extraction.
- Greedy extraction refers to prompting the model with the start of a passage and checking if it completes the rest word-for-word using its default decoding path.
- Probabilistic extraction calculates the chance that a model would eventually produce the exact continuation, even if it doesn’t happen immediately.

Every model shows much higher probabilistic extraction rates (orange) than greedy ones (blue), especially larger models like Llama 3.1 70B, which reaches a probabilistic extraction rate of 0.8, meaning 80% of the tested passages have a non-trivial chance of being reproduced verbatim. This gap illustrates that greedy sampling significantly underestimates the extent of memorization in LLMs.
The table on the right shows how much of each book can be extracted at different pz thresholds. Some works, like Harry Potter, show extensive memorization: at pz ≥ 0.01 (at least a 1% chance of exact reproduction), Llama 3.1 70B memorizes 91.14% of the book. In contrast, Sandman Slim shows minimal memorization across all models. This highlights that certain texts in the training data are retained far more strongly than others.
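One plausible way to tabulate such per-book numbers, assuming pz has already been computed for passages covering the book (the function and variable names here are illustrative, not the paper’s code):

```python
# Sketch: share of a book counted as memorized at a given threshold, i.e. the
# fraction of its passages whose extraction probability pz clears the threshold.
def extraction_rate(pz_values: list[float], threshold: float) -> float:
    return sum(p >= threshold for p in pz_values) / len(pz_values)

# e.g., with pz computed for every passage of a book:
# extraction_rate(book_pz_values, 0.01)  -> lenient threshold (1% chance)
# extraction_rate(book_pz_values, 0.75)  -> strict threshold (75% chance)
```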
Conclusion
This research demonstrates that LLMs, including models like Llama 3.1 70B, do memorize precise passages from some of the books within their training datasets, even copyrighted works like Harry Potter and the Sorcerer’s Stone. However, this memorization is highly selective and influenced by factors such as the model’s scale, architecture, and the particular book involved. While most of the content in datasets like Books3 is not memorized verbatim, the clear retention of specific, word-for-word content raises critical questions about AI and copyright compliance.
Read more:
Paper on arXiv: “Extracting memorized pieces of (copyrighted) books from open-weight language models”