Technical Reports from the Front Lines of Software & Systems.

Amazon recently introduced an “Ask the Book” feature in Kindle that allows readers to ask questions about a book and receive answers constrained to what they’ve read so far. Amazon has not published any technical or engineering details about how this feature is built. There is no public blog post, no whitepaper, and no architecture diagram.

This post is therefore a best-effort reconstruction: an attempt to reason from observable behavior, product constraints, and system-level incentives to arrive at a design that would plausibly work at Amazon’s scale.

I am not claiming this is how Amazon built it. I am claiming that, given what the system demonstrably does, this is a design that fits the constraints unusually well.


What the Product Tells Us (Without Saying Anything)

There are several important things the product makes clear just by how it behaves.

1. It only works for books purchased through Amazon

The feature does not work for:

  • sideloaded PDFs
  • imported EPUBs
  • personal documents

This is a critical signal. It suggests the system relies on preprocessed, structured, backend-owned book content, not raw text extracted on the fly. Amazon controls the ingestion, chunking, and metadata for purchased Kindle books; they do not for user-imported files.

That alone rules out a large class of “lightweight” or on-device approaches.


2. The entire book up to your last read page is in scope

The feature is not limited to:

  • the current page
  • the current chapter
  • recently viewed text

You can ask questions about material far earlier in the book, as long as it is before your current reading position. This works reliably even when the span covers hundreds of pages.

That makes one thing very clear: the system cannot be passing all eligible text into the model. The context window would explode, latency would be unacceptable, and costs would be uncontrolled.

Therefore, text selection is mandatory.


3. The answers are genuinely contextual

The system:

  • handles synonyms (“thievery” → “stealing money”)
  • responds correctly to exact phrases
  • answers questions about distant parts of the book
  • refuses to speculate when the book is ambiguous

This is not keyword search plus snippet display. It is a language model answering questions — but with very strict grounding.


A Design That Fits These Constraints

If I were building this at Amazon, here is the architecture I would converge on.

1. Offline preprocessing (Amazon-controlled content only)

For every Kindle book eligible for the feature:

  • Split the book into chunks (e.g., a few hundred tokens each).
  • Assign each chunk metadata:
    • book_id
    • start_location
    • end_location
    • possibly chapter_id
  • Generate embeddings for each chunk.
  • Index the chunks for:
    • lexical retrieval (BM25)
    • semantic retrieval (vector search)

This preprocessing step neatly explains why the feature excludes sideloaded content: the system depends on pre-indexed text with precise positional metadata.
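To make the shape of this concrete, here is a minimal sketch of that ingestion step. The chunk size, the field names, and the whitespace "tokenizer" are all assumptions for illustration, not Amazon's actual choices:

```python
# Sketch of the offline ingestion step: split a book into fixed-size token
# windows, each carrying the positional metadata the spoiler gate will need.
from dataclasses import dataclass, field

CHUNK_TOKENS = 300  # "a few hundred tokens"; the exact size is a guess

@dataclass
class Chunk:
    book_id: str
    start_location: int   # position of the chunk's first token
    end_location: int     # position of the chunk's last token
    text: str
    embedding: list = field(default_factory=list)  # filled in by an embedder

def chunk_book(book_id: str, text: str) -> list[Chunk]:
    """Split a book into token windows with precise positional metadata."""
    tokens = text.split()  # stand-in for a real tokenizer
    chunks = []
    for start in range(0, len(tokens), CHUNK_TOKENS):
        window = tokens[start:start + CHUNK_TOKENS]
        chunks.append(Chunk(
            book_id=book_id,
            start_location=start,
            end_location=start + len(window) - 1,
            text=" ".join(window),
        ))
    return chunks
```

In a real pipeline each chunk's `embedding` would then be computed by a model and the chunks written to both a lexical and a vector index.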


2. A hard spoiler gate

At query time:

  1. Determine the user’s last read location.
  2. Apply a strict filter:

     book_id = X AND end_location <= last_read_location

This filter is non-negotiable. It is what enforces spoiler safety at the system level rather than trusting the model to behave.
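The gate itself is trivial to express; what matters is that it runs before any model ever sees the text. A minimal sketch, with field names taken from the metadata described above and everything else assumed:

```python
# The spoiler gate: keep only chunks fully contained in the portion the
# reader has already seen. Runs as a hard filter, before retrieval scoring.
def spoiler_gate(chunks: list[dict], book_id: str, last_read_location: int) -> list[dict]:
    """Drop any chunk that extends past the reader's last known position."""
    return [
        c for c in chunks
        if c["book_id"] == book_id and c["end_location"] <= last_read_location
    ]
```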


3. Retrieval over the eligible text

Once filtered, the remaining text may still be very large. The system must choose relevant excerpts.

Based on observed behavior, this retrieval step almost certainly supports:

  • exact phrase matching
  • synonym and paraphrase matching
  • long-distance recall

That strongly suggests hybrid retrieval:

  • lexical search for precision
  • vector search for semantic recall

Neither signal alone covers all three behaviors — lexical search misses paraphrases, and vector search can miss exact phrases — but together they satisfy:

  • the synonym test
  • the exact-phrase test
  • the distance test
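As a toy illustration of how the two signals could be fused, here is a hybrid ranker in which simple term overlap stands in for BM25 and cosine similarity stands in for the vector index. The 50/50 weighting is arbitrary; real systems tune it:

```python
# Toy hybrid retrieval: blend a lexical score and a semantic score into one
# ranking. Both scorers are deliberately crude stand-ins for BM25 / k-NN.
import math

def lexical_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear in the document (BM25 stand-in)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_rank(query, query_vec, docs, top_k=3, w_lex=0.5, w_sem=0.5):
    """Rank (text, vector) pairs by a weighted blend of both signals."""
    scored = [
        (w_lex * lexical_score(query, text) + w_sem * cosine(query_vec, vec), text)
        for text, vec in docs
    ]
    return [text for _, text in sorted(scored, reverse=True)[:top_k]]
```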

4. Prompt assembly and generation

The top retrieved excerpts are then:

  • assembled into a prompt
  • paired with strong instructions
  • passed to a general-purpose LLM

The model is not trained on the book. The book text is supplied at inference time only.
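A hypothetical version of that assembly step. The instruction wording here is invented, but it is the kind of wording that would produce the refusal behavior the feature exhibits:

```python
# Hypothetical prompt assembly: retrieved excerpts are wrapped in strict
# grounding instructions before being sent to a general-purpose LLM.
GROUNDING_RULES = (
    "Answer ONLY from the excerpts below. "
    "If the excerpts do not explicitly answer the question, say so. "
    "Do not speculate, infer, or reveal anything beyond the excerpts."
)

def build_prompt(question: str, excerpts: list[str]) -> str:
    """Assemble instructions, numbered excerpts, and the question into a prompt."""
    numbered = "\n\n".join(
        f"[Excerpt {i + 1}] {text}" for i, text in enumerate(excerpts)
    )
    return f"{GROUNDING_RULES}\n\n{numbered}\n\nQuestion: {question}\nAnswer:"
```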


Why This Architecture Fits Amazon Particularly Well

This design is not just plausible in the abstract — it aligns unusually well with Amazon’s incentives and ecosystem.

AWS-native infrastructure

A system like OpenSearch fits naturally here:

  • it supports BM25 and vector search
  • it supports metadata filtering
  • it scales horizontally
  • it is built and operated by AWS (an Amazon company)

Using OpenSearch (or an internal derivative) minimizes organizational friction and leverages existing expertise.
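For concreteness, the spoiler gate maps directly onto OpenSearch's query DSL as hard `filter` clauses in a `bool` query. This shows only the lexical leg; the semantic leg would be a parallel k-NN query with the same filter. Index and field names are invented:

```python
# Sketch of a spoiler-gated lexical query in OpenSearch query DSL.
# Filter clauses do not affect scoring; they simply exclude documents,
# which is exactly the behavior a spoiler gate needs.
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"text": "who stole the champagne"}}
            ],
            "filter": [
                {"term": {"book_id": "B00EXAMPLE"}},                  # this book only
                {"range": {"end_location": {"lte": 48231}}},          # already-read text only
            ],
        }
    },
    "size": 8,
}
```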


Rights, control, and safety

By limiting retrieval to:

  • owned books
  • known content
  • strict positional filters

Amazon avoids:

  • training-rights controversies
  • cross-book leakage
  • uncontrolled hallucination

This is a conservative, defensible design — exactly what you would expect for a feature operating on copyrighted text.


What the Model’s Behavior Reveals

Some of the most interesting signals come from how the model refuses to answer.

Ambiguity test

When asked about an implicit idea (e.g., whether a character might be a ghost), the system responded that:

  • the book does not explicitly state this
  • it cannot speculate

That tells us the model is operating under extreme grounding constraints.


The champagne test

In the book:

  • a character denies stealing champagne
  • later, the narration reveals he did steal it

When asked whether he stole it, the system answered no, citing what the character explicitly said at the time.

When corrected, the system explained its internal principles:

“Starting every response with what the book actually says rather than what it doesn’t say; extreme literalism—stating only what’s explicitly written without inferring or adding plausible details; precision with identity—verifying who said or did what before attributing anything…”

This is revealing. It suggests:

  • strong system-level instructions
  • an evaluation rubric that prioritizes explicit textual evidence
  • penalties for inference, speculation, or narrative synthesis

In other words, the system is optimized to behave more like a faithful textual analyst than a “smart reader.”
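If I were evaluating such a system, one crude automated check along those lines would be to flag answer sentences with weak textual support in the retrieved excerpts. This is a stand-in for whatever grounding evaluation Amazon actually runs, not a claim about it:

```python
# Crude grounding check: flag answer sentences whose word overlap with every
# retrieved excerpt falls below a threshold. A real rubric would be far more
# sophisticated (entailment models, attribution, etc.).
def unsupported_sentences(answer: str, excerpts: list[str], threshold: float = 0.5) -> list[str]:
    """Return answer sentences not strongly supported by any excerpt."""
    flagged = []
    for sentence in answer.split("."):
        words = set(sentence.lower().split())
        if not words:
            continue
        support = max(
            (len(words & set(e.lower().split())) / len(words) for e in excerpts),
            default=0.0,
        )
        if support < threshold:
            flagged.append(sentence.strip())
    return flagged
```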


Why This Matters

This feature is interesting not because it is flashy, but because it is carefully constrained.

Amazon appears to have optimized for:

  • correctness over creativity
  • literal grounding over inference
  • safety over impressiveness

From an engineering perspective, that points directly toward:

  • retrieval with hard filters
  • conservative prompting
  • models instructed to refuse rather than guess

Whether or not Amazon ever publishes details, the observable behavior already tells a coherent story.


Final Thoughts

Amazon has not explained how Ask the Book works. They may never do so. But systems leave fingerprints in their behavior, and this one is surprisingly consistent.

If you were tasked with building an AI feature that:

  • operates over long copyrighted texts,
  • avoids spoilers,
  • scales globally,
  • and must not hallucinate,

you would likely end up very close to this design.

If you’ve built similar systems and see flaws in this reasoning — or better alternatives — I’d welcome the critique. The point of this post is not certainty, but informed discussion.
