Data Warehouse vs Data Lake vs Data Lakehouse

Comparison of the capabilities of a data warehouse, a data lake, and a data lakehouse

| Capability | Data Warehouse | Data Lake | Data Lakehouse |
| --- | --- | --- | --- |
| Use cases | BI and reporting; low-latency, high-concurrency workloads | Supports many use cases, not just data science and ETL, in modern and mature environments; for example, operational reporting, self-service data, data sharing, customer 360 and archiving | Unified platform that supports BI, data science, ML and AI workloads |
| Data format | Closed, proprietary format | Open format | Open format |
| Data type | Structured data, with limited support for semistructured data | All types: structured data, semistructured data, textual data, unstructured (raw) data | All types: structured data, semistructured data, textual data, unstructured (raw) data |
| Data access | SQL only | Open APIs for direct access to files with SQL, R, Python and other languages | SQL, along with API extensions to access tables and data |
| Reliability | High quality: reliable data with ACID transactions | Low quality: becomes a data swamp if implemented without data catalogs and the right use cases and governance | High quality: reliable data with ACID transactions |
| Governance and security | Fine-grained security and governance at the row/column level for tables | Poor governance, as security needs to be applied to files | Fine-grained security and governance at the row/column level for tables |
| Performance | High | Low | High |
| Scalability | Scalable but expensive | Highly scalable at low cost | Scales to hold any amount of data at low cost, regardless of type |
| Streaming | Partial; limited scale | Yes | Yes |
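
The Reliability row is easier to see with a small sketch. The example below is only an illustration under stated assumptions: Python's built-in `sqlite3` stands in for a transactional table, a plain CSV file stands in for raw lake storage, and the `ROWS` data and both function names are hypothetical. Neither stand-in is a real warehouse or lake. The same batch, which contains one invalid row, is loaded both ways.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical batch: the third row is invalid because its amount is missing.
ROWS = [("o-1", 120.0), ("o-2", 75.5), ("o-3", None)]


def load_into_table(rows):
    """Transactional load: either every row commits or none does."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, amount REAL NOT NULL)")
    try:
        with conn:  # commits on success, rolls back if the block raises
            conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    except sqlite3.IntegrityError:
        pass  # the NOT NULL violation aborts the whole batch
    count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    conn.close()
    return count


def load_into_file(rows, path):
    """Plain file load: rows written before the failure stay in the file."""
    try:
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            for row_id, amount in rows:
                if amount is None:
                    raise ValueError(f"missing amount for {row_id}")
                writer.writerow([row_id, amount])
    except ValueError:
        pass  # nothing undoes the rows already written
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("rows visible in table:", load_into_table(ROWS))
        print("rows visible in file:", load_into_file(ROWS, os.path.join(tmp, "orders.csv")))
```

The table load leaves no rows behind after the failure, while the file keeps the two rows written before it. Readers of the file then see a partial batch unless some extra layer tracks which writes completed. According to the comparison above, a lakehouse keeps the open file format while providing the transactional behavior of the first function.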