Technical Reports from the Front Lines of Software & Systems.

Amazon recently introduced an “Ask the Book” feature in Kindle that allows readers to ask questions about a book and receive answers constrained to what they’ve read so far. Amazon has not published any technical or engineering details about how this feature is built. There is no public blog post, no whitepaper, and no architecture diagram.

This post is therefore a best-effort reconstruction: an attempt to reason from observable behavior, product constraints, and system-level incentives to arrive at a design that would plausibly work at Amazon’s scale.

I am not claiming this is how Amazon built it. I am claiming that, given what the system demonstrably does, this is a design that fits the constraints unusually well.


What the Product Tells Us (Without Saying Anything)

There are several important things the product makes clear just by how it behaves.

1. It only works for books purchased through Amazon

The feature does not work for:

  • sideloaded PDFs
  • imported EPUBs
  • personal documents

This is a critical signal. It suggests the system relies on preprocessed, structured, backend-owned book content, not raw text extracted on the fly. Amazon controls the ingestion, chunking, and metadata for purchased Kindle books; they do not for user-imported files.

That alone rules out a large class of “lightweight” or on-device approaches.


2. The entire book up to your last read page is in scope

The feature is not limited to:

  • the current page
  • the current chapter
  • recently viewed text

You can ask questions about material far earlier in the book, as long as it is before your current reading position. This works reliably even when the span covers hundreds of pages.

That makes one thing very clear: the system cannot be passing all eligible text into the model. The context window would explode, latency would be unacceptable, and costs would be uncontrolled.

Therefore, text selection is mandatory.


3. The answers are genuinely contextual

The system:

  • handles synonyms (“thievery” → “stealing money”)
  • responds correctly to exact phrases
  • answers questions about distant parts of the book
  • refuses to speculate when the book is ambiguous

This is not keyword search plus snippet display. It is a language model answering questions — but with very strict grounding.


A Design That Fits These Constraints

If I were building this at Amazon, here is the architecture I would converge on.

1. Offline preprocessing (Amazon-controlled content only)

For every Kindle book eligible for the feature:

  • Split the book into chunks (e.g., a few hundred tokens each).
  • Assign each chunk metadata:
    • book_id
    • start_location
    • end_location
    • possibly chapter_id
  • Generate embeddings for each chunk.
  • Index the chunks for:
    • lexical retrieval (BM25)
    • semantic retrieval (vector search)

This preprocessing step neatly explains why the feature excludes sideloaded content: the system depends on pre-indexed text with precise positional metadata.
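To make the shape of this concrete, here is a minimal sketch of that ingestion step. The chunk size, the field names, and the whitespace "tokenizer" are all assumptions for illustration, not Amazon's actual choices:

```python
# Sketch of the offline ingestion step: split a book into fixed-size token
# windows, each carrying the positional metadata the spoiler gate will need.
from dataclasses import dataclass, field

CHUNK_TOKENS = 300  # "a few hundred tokens"; the exact size is a guess

@dataclass
class Chunk:
    book_id: str
    start_location: int   # position of the chunk's first token
    end_location: int     # position of the chunk's last token
    text: str
    embedding: list = field(default_factory=list)  # filled in by an embedder

def chunk_book(book_id: str, text: str) -> list[Chunk]:
    """Split a book into token windows with precise positional metadata."""
    tokens = text.split()  # stand-in for a real tokenizer
    chunks = []
    for start in range(0, len(tokens), CHUNK_TOKENS):
        window = tokens[start:start + CHUNK_TOKENS]
        chunks.append(Chunk(
            book_id=book_id,
            start_location=start,
            end_location=start + len(window) - 1,
            text=" ".join(window),
        ))
    return chunks
```

In a real pipeline each chunk's `embedding` would then be computed by a model and the chunks written to both a lexical and a vector index.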


2. A hard spoiler gate

At query time:

  1. Determine the user’s last read location.
  2. Apply a strict filter:

     book_id = X AND end_location <= last_read_location

This filter is non-negotiable. It is what enforces spoiler safety at the system level rather than trusting the model to behave.
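The gate itself is trivial to express; what matters is that it runs before any model ever sees the text. A minimal sketch, with field names taken from the metadata described above and everything else assumed:

```python
# The spoiler gate: keep only chunks fully contained in the portion the
# reader has already seen. Runs as a hard filter, before retrieval scoring.
def spoiler_gate(chunks: list[dict], book_id: str, last_read_location: int) -> list[dict]:
    """Drop any chunk that extends past the reader's last known position."""
    return [
        c for c in chunks
        if c["book_id"] == book_id and c["end_location"] <= last_read_location
    ]
```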


3. Retrieval over the eligible text

Once filtered, the remaining text may still be very large. The system must choose relevant excerpts.

Based on observed behavior, this retrieval step almost certainly supports:

  • exact phrase matching
  • synonym and paraphrase matching
  • long-distance recall

That strongly suggests hybrid retrieval:

  • lexical search for precision
  • vector search for semantic recall

Neither signal alone covers all three behaviors — lexical search misses paraphrases, and vector search can miss exact phrases — but together they satisfy:

  • the synonym test
  • the exact-phrase test
  • the distance test
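As a toy illustration of how the two signals could be fused, here is a hybrid ranker in which simple term overlap stands in for BM25 and cosine similarity stands in for the vector index. The 50/50 weighting is arbitrary; real systems tune it:

```python
# Toy hybrid retrieval: blend a lexical score and a semantic score into one
# ranking. Both scorers are deliberately crude stand-ins for BM25 / k-NN.
import math

def lexical_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear in the document (BM25 stand-in)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_rank(query, query_vec, docs, top_k=3, w_lex=0.5, w_sem=0.5):
    """Rank (text, vector) pairs by a weighted blend of both signals."""
    scored = [
        (w_lex * lexical_score(query, text) + w_sem * cosine(query_vec, vec), text)
        for text, vec in docs
    ]
    return [text for _, text in sorted(scored, reverse=True)[:top_k]]
```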

4. Prompt assembly and generation

The top retrieved excerpts are then:

  • assembled into a prompt
  • paired with strong instructions
  • passed to a general-purpose LLM

The model is not trained on the book. The book text is supplied at inference time only.
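A hypothetical version of that assembly step. The instruction wording here is invented, but it is the kind of wording that would produce the refusal behavior the feature exhibits:

```python
# Hypothetical prompt assembly: retrieved excerpts are wrapped in strict
# grounding instructions before being sent to a general-purpose LLM.
GROUNDING_RULES = (
    "Answer ONLY from the excerpts below. "
    "If the excerpts do not explicitly answer the question, say so. "
    "Do not speculate, infer, or reveal anything beyond the excerpts."
)

def build_prompt(question: str, excerpts: list[str]) -> str:
    """Assemble instructions, numbered excerpts, and the question into a prompt."""
    numbered = "\n\n".join(
        f"[Excerpt {i + 1}] {text}" for i, text in enumerate(excerpts)
    )
    return f"{GROUNDING_RULES}\n\n{numbered}\n\nQuestion: {question}\nAnswer:"
```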


Why This Architecture Fits Amazon Particularly Well

This design is not just plausible in the abstract — it aligns unusually well with Amazon’s incentives and ecosystem.

AWS-native infrastructure

A system like OpenSearch fits naturally here:

  • it supports BM25 and vector search
  • it supports metadata filtering
  • it scales horizontally
  • it is built and operated by AWS (an Amazon company)

Using OpenSearch (or an internal derivative) minimizes organizational friction and leverages existing expertise.
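For concreteness, the spoiler gate maps directly onto OpenSearch's query DSL as hard `filter` clauses in a `bool` query. This shows only the lexical leg; the semantic leg would be a parallel k-NN query with the same filter. Index and field names are invented:

```python
# Sketch of a spoiler-gated lexical query in OpenSearch query DSL.
# Filter clauses do not affect scoring; they simply exclude documents,
# which is exactly the behavior a spoiler gate needs.
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"text": "who stole the champagne"}}
            ],
            "filter": [
                {"term": {"book_id": "B00EXAMPLE"}},                  # this book only
                {"range": {"end_location": {"lte": 48231}}},          # already-read text only
            ],
        }
    },
    "size": 8,
}
```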


Rights, control, and safety

By limiting retrieval to:

  • owned books
  • known content
  • strict positional filters

Amazon avoids:

  • training-rights controversies
  • cross-book leakage
  • uncontrolled hallucination

This is a conservative, defensible design — exactly what you would expect for a feature operating on copyrighted text.


What the Model’s Behavior Reveals

Some of the most interesting signals come from how the model refuses to answer.

Ambiguity test

When asked about an implicit idea (e.g., whether a character might be a ghost), the system responded that:

  • the book does not explicitly state this
  • it cannot speculate

That tells us the model is operating under extreme grounding constraints.


The champagne test

In the book:

  • a character denies stealing champagne
  • later, the narration reveals he did steal it

When asked whether he stole it, the system answered no, citing what the character explicitly said at the time.

When corrected, the system explained its internal principles:

“Starting every response with what the book actually says rather than what it doesn’t say; extreme literalism—stating only what’s explicitly written without inferring or adding plausible details; precision with identity—verifying who said or did what before attributing anything…”

This is revealing. It suggests:

  • strong system-level instructions
  • an evaluation rubric that prioritizes explicit textual evidence
  • penalties for inference, speculation, or narrative synthesis

In other words, the system is optimized to behave more like a faithful textual analyst than a “smart reader.”
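If I were evaluating such a system, one crude automated check along those lines would be to flag answer sentences with weak textual support in the retrieved excerpts. This is a stand-in for whatever grounding evaluation Amazon actually runs, not a claim about it:

```python
# Crude grounding check: flag answer sentences whose word overlap with every
# retrieved excerpt falls below a threshold. A real rubric would be far more
# sophisticated (entailment models, attribution, etc.).
def unsupported_sentences(answer: str, excerpts: list[str], threshold: float = 0.5) -> list[str]:
    """Return answer sentences not strongly supported by any excerpt."""
    flagged = []
    for sentence in answer.split("."):
        words = set(sentence.lower().split())
        if not words:
            continue
        support = max(
            (len(words & set(e.lower().split())) / len(words) for e in excerpts),
            default=0.0,
        )
        if support < threshold:
            flagged.append(sentence.strip())
    return flagged
```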


Why This Matters

This feature is interesting not because it is flashy, but because it is carefully constrained.

Amazon appears to have optimized for:

  • correctness over creativity
  • literal grounding over inference
  • safety over impressiveness

From an engineering perspective, that points directly toward:

  • retrieval with hard filters
  • conservative prompting
  • models instructed to refuse rather than guess

Whether or not Amazon ever publishes details, the observable behavior already tells a coherent story.


Final Thoughts

Amazon has not explained how Ask the Book works. They may never do so. But systems leave fingerprints in their behavior, and this one is surprisingly consistent.

If you were tasked with building an AI feature that:

  • operates over long copyrighted texts,
  • avoids spoilers,
  • scales globally,
  • and must not hallucinate,

you would likely end up very close to this design.

If you’ve built similar systems and see flaws in this reasoning — or better alternatives — I’d welcome the critique. The point of this post is not certainty, but informed discussion.
