The Hidden Cost of ETL | Aditya Gollapudi

We've spent a decade moving data to the cloud. Now it's time to move the applications. The cost of not doing so is hiding in plain sight.

In over twenty years of working with data architectures — from on-premises data warehouses to cloud-native Lakehouse platforms — I've watched one assumption persist almost unchallenged: that operational systems and analytical systems are fundamentally separate concerns. One writes. One reads. ETL pipelines bridge the gap.

That assumption made sense in 2005. It was defensible in 2015. In 2026, it is the single most expensive architectural decision most enterprises are still making by default.

The ETL tax nobody talks about

Every enterprise I've worked with has a version of the same problem. Data is generated in operational systems: order management, inventory, customer support, ERP. It gets extracted, transformed, and loaded into an analytical platform. Dashboards get built. Reports get generated. Insights get surfaced hours, sometimes days, after the underlying event occurred.

We've normalized this lag so thoroughly that we've stopped questioning it. But the latency gap between an operational event and an analytical insight isn't just a technical inconvenience. It is a direct constraint on every business decision your organization makes.

Consider what that gap actually costs. A customer service agent working from data that is six hours stale cannot tell a caller whether their order shipped this morning. A demand planner working from yesterday's inventory feed is making replenishment decisions on information that no longer reflects reality. A risk model running on last night's batch is blind to this morning's market movement.
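To put rough numbers on that lag, here is a toy model of a periodic batch pipeline. It is a sketch under stated assumptions: a fixed extract schedule and a constant run time, with the class and its parameters invented for illustration rather than taken from any product. An event that lands just after an extract cutoff waits a full interval for the next run, then waits for that run to finish. So the worst-case staleness is the interval plus the run time, and for events spread evenly across the interval the average is half the interval plus the run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchSchedule:
    """Hypothetical toy model of a periodic ETL job (not tied to any product)."""
    interval_hours: float  # time between extract cutoffs
    run_hours: float       # time for extract + transform + load to finish

    def visibility_delay(self, hours_after_cutoff: float) -> float:
        """Hours until an event becomes queryable, given how long after the
        most recent extract cutoff it happened (0 <= value < interval)."""
        if not 0 <= hours_after_cutoff < self.interval_hours:
            raise ValueError("event must fall within one interval")
        # Wait for the next cutoff, then wait for that run to complete.
        return (self.interval_hours - hours_after_cutoff) + self.run_hours

    def worst_case_hours(self) -> float:
        # An event just after a cutoff waits a full interval plus one run.
        return self.interval_hours + self.run_hours

    def average_hours(self) -> float:
        # Events spread evenly over the interval wait half of it on average.
        return self.interval_hours / 2 + self.run_hours


nightly = BatchSchedule(interval_hours=24, run_hours=2)
print(nightly.worst_case_hours())  # 26
print(nightly.average_hours())     # 14.0

six_hourly = BatchSchedule(interval_hours=6, run_hours=0.5)
print(six_hourly.worst_case_hours())  # 6.5
```

Even a fairly aggressive six-hour schedule leaves some events invisible to analytics for more than six hours; shortening the interval narrows the window but never closes it while the copy is refreshed in batches.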

We have accepted these constraints as the cost of doing business with data. We shouldn't.

What Databricks Lakebase changes

The introduction of Databricks Lakebase reframes this problem entirely. By integrating a PostgreSQL-compatible transactional database directly into the Lakehouse architecture, Lakebase eliminates the boundary between operational and analytical data. There is no longer a separate operational system feeding a separate analytical platform through ETL pipelines. There is one governed, unified environment — and applications can run directly against it.

The implications of this are more significant than they first appear.

When an operational application runs on Lakebase, every transaction is immediately available for analytics — not after a pipeline runs, not after a scheduled batch, but instantly. The AI agent querying inventory data is looking at the same source of truth as the warehouse management system updating it. The Genie-powered analytics dashboard a CFO opens at 9am reflects a decision made at 8:59am.
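The architectural difference can be sketched in a few lines. This is a deliberately simplified illustration, assuming Python's built-in `sqlite3` as a stand-in for any SQL transactional store; it does not model Lakebase itself, its storage, sync, or governance, and the helper names (`make_store`, `place_order`, `shipped_count`, `run_etl`) are hypothetical. The first pattern keeps an analytical copy refreshed by a batch job; the second has the application and the analytical query share one store.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """An in-memory SQL database standing in for a transactional store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    return conn


def place_order(conn: sqlite3.Connection, order_id: int, status: str) -> None:
    """Operational write, as an order-management app might issue it."""
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO orders (id, status) VALUES (?, ?)",
                     (order_id, status))


def shipped_count(conn: sqlite3.Connection) -> int:
    """Analytical read: how many orders have shipped?"""
    row = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE status = 'shipped'").fetchone()
    return row[0]


def run_etl(source: sqlite3.Connection, target: sqlite3.Connection) -> None:
    """Hypothetical batch job: fully reload the analytical copy from the source."""
    rows = source.execute("SELECT id, status FROM orders").fetchall()
    with target:
        target.execute("DELETE FROM orders")
        target.executemany("INSERT INTO orders (id, status) VALUES (?, ?)", rows)


# Pattern 1: separate operational store and analytical copy, bridged by ETL.
ops, warehouse = make_store(), make_store()
place_order(ops, 1, "shipped")
print(shipped_count(warehouse))  # 0: the copy has not been refreshed yet
run_etl(ops, warehouse)
print(shipped_count(warehouse))  # 1: visible only after the batch runs

# Pattern 2: the app and the analytics query share one store.
shared = make_store()
place_order(shared, 1, "shipped")
print(shipped_count(shared))  # 1: visible as soon as the write commits
```

In the first pattern, correctness of the analytical answer depends on when `run_etl` last ran; in the second, there is no copy to fall behind.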

Architect's note

Unity Catalog plays a critical role here that often gets overlooked in Lakebase discussions. When your operational app and your analytical layer share the same governed data foundation, lineage, access control, and data quality enforcement apply uniformly — across both workloads, with no additional configuration. This is not a small thing. In regulated industries especially, the ability to demonstrate that the data your application served to a customer is traceable to the same governed source as your compliance reports is genuinely transformative.

The real barrier isn't technology — it's legacy application coupling

That said, I want to be direct about where the real difficulty lies. Moving to Lakebase is not primarily a technology problem. The technology is proven, and for organizations already on Databricks the path is shorter than most teams expect.

The real challenge is that operational applications in most enterprises are deeply coupled to their underlying databases. Stored procedures, proprietary SQL dialects, embedded transformation logic, tightly integrated ETL pipelines — these are not just technical artifacts. They represent years of institutional knowledge encoded in code that nobody fully understands anymore, running in systems nobody wants to touch.

This is precisely why migration efforts stall. A team scopes the work, estimates eighteen months of manual rewriting, and the project gets deprioritized. The legacy system stays. The ETL tax keeps compounding.

5,000+: legacy pipelines migrated for a global pharma enterprise

4 weeks: end-to-end migration timeline using DBShift™ automation

80%: reduction in time and cost vs. a manual migration approach

We recently helped a global pharmaceutical enterprise migrate over 5,000 Informatica BDM pipelines to Databricks in four weeks — a project that had been estimated at over a year of manual effort. The automation-led approach using DBShift™ achieved more than 90% conversion accuracy, maintained continuity of critical pharmaceutical operations throughout, and delivered a governed, scalable foundation the team could build on immediately.

The lesson I take from that engagement — and from similar work across manufacturing, financial services, and retail — is that the coupling problem is solvable. What it requires is an automation-first mindset, not a rewrite-from-scratch mindset.

What I tell enterprise architects who are evaluating this

If you are an enterprise architect evaluating whether to move operational workloads onto Lakebase, here is the framing I would suggest.

Do not think of this as a migration project. Think of it as retiring a tax. The ETL infrastructure you are maintaining today — the pipelines, the orchestration, the scheduling, the failure handling, the monitoring — is overhead that exists solely to compensate for the separation of your operational and analytical layers. Every sprint your engineering team spends on that infrastructure is a sprint not spent building capability.

Lakebase does not just give you faster analytics. It gives you back that engineering capacity. It gives your application developers a single platform — one governance model, one catalog, one security boundary — instead of two systems that must be kept in sync. And it gives your AI and analytics teams the one thing they have always needed but rarely had: data that is current at the moment of the question.

The organizations I see pulling ahead in their data maturity are not the ones with the most sophisticated analytics platforms. They are the ones that have closed the gap between where data is created and where it is used. Of all the architectural paths I have seen in twenty years of this work, Lakebase is the most direct route to closing that gap.

The question worth asking is not whether your operational apps belong on the Lakehouse. They do. The question is how long you can afford to wait.