Technical Reports from the Front Lines of Software & Systems.

As organizations scale their data pipelines and onboard diverse external data partners, it’s natural to revisit the foundational architectural models that shape our data platform. One such model — the medallion architecture — has gained traction for organizing data lakes into Bronze, Silver, and Gold tiers. But how strictly should we adhere to the canonical definitions? And are we over-engineering the Bronze layer in ways that hinder agility?

This post explores a pragmatic approach to the medallion model, emphasizing flexibility, fast ingestion, and staged transformation — especially in contexts where data comes from many heterogeneous sources.


💡 The Core Analogy: Pallets, Shelves, and Displays

Imagine your data platform as a warehouse:

  • Bronze is the receiving bay. Data lands here quickly and in bulk — often messy, wrapped, and unstructured.
  • Silver is the shelving area, where items are unpacked and labeled for easy access.
  • Gold is the storefront, where products are curated and arranged for customers and decision-makers.

🥉 Bronze: Raw Ingestion, Not a Bottleneck

The Bronze layer should prioritize acceptance over accessibility. Think of it as a staging area that captures data in its original form — warts and all — with minimal transformation.

“Bronze should be like raw pallets of data in the receiving bay of the warehouse. It’s more important to receive the data than to process it.”

For example, if a partner sends XML, store the XML as-is with metadata about the source and ingestion timestamp. If a CSV comes in, store the file with minimal parsing. This allows for:

  • Fast onboarding of new data sources
  • Preservation of lineage and fidelity
  • Deferred transformation, reducing upfront complexity

Critically, this lowers the barrier to entry for data ingestion teams, enabling scale and flexibility.


🥈 Silver: First-Class Queryable Data

The Silver tier is where the real structure emerges. Here, raw data is unwrapped, normalized, and validated to produce:

  • Consistent schemas
  • Well-defined types
  • Queryable tables (e.g., Iceberg, Delta Lake)

This is where your analytics, reporting, and machine learning workflows begin to interface with data in a meaningful way. For instance:

  • XML is parsed into structured tables
  • CSVs are validated and deduplicated
  • Nulls and type mismatches are resolved

Silver balances data quality and access, setting the stage for downstream consumers.


🥇 Gold: Business-Curated Outputs

At the Gold layer, you’re aligning data with business meaning and use cases. This tier often includes:

  • Aggregates (e.g., monthly revenue by region)
  • Entity resolution (e.g., matching customer records across sources)
  • Joins across domains (e.g., linking people, topics, and works)

Gold is what populates dashboards, APIs, and executive reports. It’s the final, polished product.


🧪 Should Bronze Be Queryable?

It’s a nuanced question.

Bronze DesignQueryable?When to Use
Raw (e.g., XML blobs, JSON dumps)❌ NoIngestion-first strategies, especially when data formats vary widely
Lightly structured (e.g., CSVs parsed into rows)✅ SometimesWhen it’s low effort to make data queryable
Fully structured (normalized, typed)⚠️ RiskyCan lead to early coupling, transformation bottlenecks, and brittle onboarding processes

The key point is this:

“If your Bronze efforts are slowing down data onboarding, you’re doing too much too early.”


✅ Recommended Design Principles

  1. Make Bronze cheap and fast: Prioritize ingestion, not transformation.
  2. Structure at Silver: This is your first real transformation point.
  3. Curate in Gold: Apply business logic and build domain-specific outputs.
  4. Embrace traceability: Ensure metadata and lineage are tracked, even in raw form.
  5. Design for change: Your partners will evolve, and so should your ingestion strategy.

Final Thoughts

Data platforms grow fastest when friction is minimized. By viewing the medallion tiers not as rigid walls but as stages of readiness, teams can scale ingestion, maintain flexibility, and defer complexity until it’s truly needed.

Whether you’re just adopting medallion architecture or looking to refine it, ask yourself:

  • Are we onboarding data quickly enough?
  • Are we doing too much too soon?
  • Are we treating raw and queryable as the same thing?

If not, perhaps it’s time to let Bronze just be… Bronze.

Leave a comment