Introduction to Data Lakes
✍ What Is a Data Lake?
A data lake is a central repository designed to store, process, and protect large volumes of data of all types: structured, semi-structured, and unstructured. It stores data in its native format, so a wide variety of data can be kept and used without worrying about size limitations.
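To make "store data in its native format" concrete, here is a minimal sketch that uses a local directory as a stand-in for cloud object storage such as Amazon S3, Azure Data Lake Storage, or Google Cloud Storage. The `land_raw` helper and the `raw/<source>/ingest_date=<YYYY-MM-DD>/` layout are hypothetical conventions chosen for illustration, not any provider's API. The point is that files are copied byte for byte, with no schema imposed at write time.

```python
import datetime as dt
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def land_raw(lake_root: Path, source: str, src_file: Path,
             ingest_date: Optional[dt.date] = None) -> Path:
    """Copy a file into a hypothetical raw zone without parsing or converting it.

    Assumed layout (illustrative only):
        <lake_root>/raw/<source>/ingest_date=<YYYY-MM-DD>/<original file name>
    """
    ingest_date = ingest_date or dt.date.today()
    target_dir = lake_root / "raw" / source / f"ingest_date={ingest_date.isoformat()}"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / src_file.name
    # copy2 copies the bytes unchanged (plus file metadata): no schema is applied on write
    shutil.copy2(src_file, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        lake = tmp_path / "lake"
        # Hypothetical sample inputs in different native formats
        samples = {
            "orders.csv": b"order_id,amount\n1,9.99\n2,24.50\n",
            "clicks.json": b'{"user": "u1", "page": "/home"}\n',
            "logo.png": b"\x89PNG\r\n\x1a\n",  # only the 8-byte PNG signature
        }
        for name, payload in samples.items():
            src = tmp_path / name
            src.write_bytes(payload)
            landed = land_raw(lake, "demo_source", src, dt.date(2024, 1, 15))
            assert landed.read_bytes() == payload  # stored exactly as received
            print(landed.relative_to(lake))
```

Partitioning paths by source and ingestion date is a common convention because it makes later batch jobs easy to scope, but it is a design choice rather than a requirement.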
✍ Data Lake Characteristics
A data lake provides a 𝗵𝗶𝗴𝗵𝗹𝘆 𝘀𝗰𝗮𝗹𝗮𝗯𝗹𝗲 𝗮𝗻𝗱 𝘀𝗲𝗰𝘂𝗿𝗲 𝗽𝗹𝗮𝘁𝗳𝗼𝗿𝗺 that can ingest data from any system, at any speed, without worrying about data volume.
A data lake can include structured data from 𝗿𝗲𝗹𝗮𝘁𝗶𝗼𝗻𝗮𝗹 𝗱𝗮𝘁𝗮𝗯𝗮𝘀𝗲𝘀 (𝗿𝗼𝘄𝘀 𝗮𝗻𝗱 𝗰𝗼𝗹𝘂𝗺𝗻𝘀), 𝘀𝗲𝗺𝗶-𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲𝗱 𝗱𝗮𝘁𝗮 (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs), and binary data (images, audio, video).
A data lake supports processing data in 𝗿𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 𝗼𝗿 𝗯𝗮𝘁𝗰𝗵 𝗺𝗼𝗱𝗲 𝗮𝗻𝗱 𝗮𝗻𝗮𝗹𝘆𝘇𝗶𝗻𝗴 𝗱𝗮𝘁𝗮 with SQL, Python, R, or other major languages, and third-party tools are also available.
A data lake 𝗵𝗲𝗹𝗽𝘀 𝘁𝗼 𝗲𝗻𝗵𝗮𝗻𝗰𝗲 𝗮𝗻𝗱 𝗼𝗽𝘁𝗶𝗺𝗶𝘇𝗲 reporting, visualization, advanced analytics, and machine learning.
It provides a foundation for analytics and is especially valuable for data-heavy industries such as 𝗙𝗶𝗻𝗮𝗻𝗰𝗶𝗮𝗹 𝗦𝗲𝗿𝘃𝗶𝗰𝗲𝘀, 𝗛𝗲𝗮𝗹𝘁𝗵𝗰𝗮𝗿𝗲, 𝗧𝗲𝗹𝗲𝗰𝗼𝗺𝗺𝘂𝗻𝗶𝗰𝗮𝘁𝗶𝗼𝗻𝘀, and 𝗘𝗻𝘁𝗲𝗿𝘁𝗮𝗶𝗻𝗺𝗲𝗻𝘁.
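The processing and analysis characteristic listed above (batch analysis with SQL and Python over data kept in its native format) is often called *schema-on-read*: structure is applied when the data is queried, not when it is stored. The sketch below assumes raw JSON-lines clickstream records whose field names (`user`, `page`, `ms`) and sample values are hypothetical. It parses them in Python, skips malformed lines, and loads the result into an in-memory SQLite table for a SQL aggregation. Python's standard `sqlite3` module is only a small stand-in here for a distributed lake query engine.

```python
import json
import sqlite3

# Hypothetical raw records as they might sit in the lake: one JSON object per line,
# including a record without the optional "ms" field and one malformed line.
RAW_LINES = [
    '{"user": "u1", "page": "/home", "ms": 120}',
    '{"user": "u2", "page": "/pricing", "ms": 340}',
    '{"user": "u1", "page": "/pricing"}',
    'not valid json',
    '{"user": "u3", "page": "/home", "ms": 95}',
]


def read_with_schema(lines):
    """Apply a schema at read time: keep known fields, coerce types, skip bad rows."""
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real pipeline might route these to a quarantine area instead
        if not isinstance(rec, dict):
            continue
        user, page = rec.get("user"), rec.get("page")
        if not isinstance(user, str) or not isinstance(page, str):
            continue  # required fields missing or of the wrong type
        ms = rec.get("ms")
        yield user, page, int(ms) if isinstance(ms, (int, float)) else None


def page_stats(lines):
    """Batch job: load the parsed records and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (user_id TEXT, page TEXT, ms INTEGER)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", read_with_schema(lines))
    rows = conn.execute(
        "SELECT page, COUNT(*) AS views, AVG(ms) AS avg_ms "
        "FROM clicks GROUP BY page ORDER BY page"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for page, views, avg_ms in page_stats(RAW_LINES):
        print(page, views, avg_ms)
```

Swapping the in-memory SQLite table for a real engine changes the scale, not the idea: raw records stay untouched in the lake, and each consumer decides at read time which fields and types it needs.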
✍ Cloud Providers and Related Services
📂 Data Lake on AWS
📂 Data Lake on Azure
📂 Google Cloud Data Lake
📂 Oracle Data Lakehouse
Tags
#aws #oraclecloud #googlecloud #datalake #azure