Data Warehouse vs Data Lake vs Data Lakehouse
Comparison of the capabilities of a data warehouse, a data lake and a data lakehouse
Capabilities | Data Warehouse | Data Lake | Data Lakehouse |
Use Cases | BI and reporting; low-latency, high-concurrency workloads | Many use cases beyond data science and ETL in modern, mature environments, for example operational reporting, self-service data, data sharing, customer 360 and archiving | Unified platform that supports BI, data science, ML and AI workloads |
Data Format | Closed proprietary format | Open format | Open format |
Data Type | Structured data, with limited support for semistructured data | All types: structured, semistructured, textual and unstructured (raw) data | All types: structured, semistructured, textual and unstructured (raw) data |
Data Access | SQL only | Open APIs for direct access to files with SQL, R, Python and other languages | SQL, along with API extensions to access tables and data |
Reliability | High quality: reliable data with ACID transactions | Low quality: can become a data swamp if implemented without data catalogs, suitable use cases and governance | High quality: reliable data with ACID transactions |
Governance and Security | Fine-grained security and governance at the row/column level for tables | Poor governance, as security has to be applied at the file level | Fine-grained security and governance at the row/column level for tables |
Performance | High | Low | High |
Scalability | Scalable but expensive | Highly scalable at low cost | Scales to hold any amount of data at low cost, regardless of type |
Streaming | Partial; limited scale | Yes | Yes |
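
The Data Access row can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a reference to any particular product. A CSV file in a temporary directory stands in for a data lake object, an in-memory SQLite database stands in for a warehouse table, and the `sales` data and all function names are hypothetical. It contrasts opening and parsing a file directly with asking a SQL engine for the same rows.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical sample data, used only for this illustration.
ROWS = [("alice", 30), ("bob", 45)]


def read_file_directly(path):
    """Lake-style access: open the raw file and parse it in application code."""
    with open(path, newline="") as f:
        return [(r["name"], int(r["amount"])) for r in csv.DictReader(f)]


def query_table(conn):
    """Warehouse-style access: let a SQL engine return typed rows."""
    cursor = conn.execute("SELECT name, amount FROM sales ORDER BY name")
    return cursor.fetchall()


def demo():
    # A CSV file stands in for an object stored in a data lake (assumption).
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sales.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["name", "amount"])
            writer.writerows(ROWS)
        from_file = read_file_directly(path)

    # An in-memory SQLite database stands in for a warehouse table (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", ROWS)
    conn.commit()
    from_sql = query_table(conn)
    conn.close()
    return from_file, from_sql


if __name__ == "__main__":
    file_rows, sql_rows = demo()
    print(file_rows)
    print(sql_rows)
```

In the file path, the schema and the type conversion live in the reading code. In the SQL path, they live with the table. A lakehouse, as described in the table, aims to offer the table-style interface on top of open-format files.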