Introduction to Data Lakes

Do you know what a data lake is?

A data lake is a central repository designed to store, process, and protect large volumes of all types of data (structured, semi-structured, and unstructured). It stores data in its native format, so a wide variety of data can be kept and used without worrying about size limitations.
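To make "native format" concrete, here is a minimal sketch in Python. It assumes a local temporary directory standing in for the lake's storage, and the `ingest` helper, the `raw/<source>/` layout, and the sample files are all hypothetical names chosen for illustration, not part of any specific product.

```python
import json
import tempfile
from pathlib import Path


def ingest(lake_root: Path, source: str, filename: str, payload: bytes) -> Path:
    """Store the payload unchanged under raw/<source>/ and return its path."""
    target_dir = lake_root / "raw" / source
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(payload)  # no parsing or conversion: native format is kept
    return target


# Assumption: a temporary directory stands in for the lake's storage layer.
lake = Path(tempfile.mkdtemp(prefix="lake_"))

# Made-up sample payloads, one per kind of data.
ingest(lake, "crm", "customers.csv", b"id,name\n1,Ada\n2,Lin\n")       # structured
ingest(lake, "web", "events.json", json.dumps({"user": 1}).encode())  # semi-structured
ingest(lake, "support", "ticket.txt", b"The printer is offline.")     # unstructured
ingest(lake, "media", "logo.bin", bytes([0x89, 0x50, 0x4E, 0x47]))    # binary

# List everything that landed in the lake, relative to its root.
stored = sorted(p.relative_to(lake).as_posix() for p in lake.rglob("*") if p.is_file())
print(stored)
```

Every file is written byte for byte as it arrived. Any structure is applied later, when the data is read for analysis.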

Data Lake Characteristics

  • A data lake provides a 𝗵𝗶𝗴𝗵𝗹𝘆 𝘀𝗰𝗮𝗹𝗮𝗯𝗹𝗲 𝗮𝗻𝗱 𝘀𝗲𝗰𝘂𝗿𝗲 𝗽𝗹𝗮𝘁𝗳𝗼𝗿𝗺 that can ingest data from any system at any speed, regardless of size.

  • A data lake can include structured data from 𝗿𝗲𝗹𝗮𝘁𝗶𝗼𝗻𝗮𝗹 𝗱𝗮𝘁𝗮𝗯𝗮𝘀𝗲𝘀 (𝗿𝗼𝘄𝘀 𝗮𝗻𝗱 𝗰𝗼𝗹𝘂𝗺𝗻𝘀), 𝘀𝗲𝗺𝗶-𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲𝗱 𝗱𝗮𝘁𝗮 (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs), and binary data (images, audio, video).

  • Data can be processed in 𝗿𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 𝗼𝗿 𝗯𝗮𝘁𝗰𝗵 𝗺𝗼𝗱𝗲 and analyzed using SQL, Python, R, other major languages, or third-party tools.

  • A data lake 𝗵𝗲𝗹𝗽𝘀 𝘁𝗼 𝗲𝗻𝗵𝗮𝗻𝗰𝗲 𝗮𝗻𝗱 𝗼𝗽𝘁𝗶𝗺𝗶𝘇𝗲 reporting, visualization, advanced analytics, and machine learning.

  • It provides the foundation for analytics and is highly valuable for data-heavy industries such as 𝗳𝗶𝗻𝗮𝗻𝗰𝗶𝗮𝗹 𝘀𝗲𝗿𝘃𝗶𝗰𝗲𝘀, 𝗵𝗲𝗮𝗹𝘁𝗵𝗰𝗮𝗿𝗲, 𝘁𝗲𝗹𝗲𝗰𝗼𝗺𝗺𝘂𝗻𝗶𝗰𝗮𝘁𝗶𝗼𝗻𝘀, 𝗮𝗻𝗱 𝗲𝗻𝘁𝗲𝗿𝘁𝗮𝗶𝗻𝗺𝗲𝗻𝘁.
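As a rough illustration of batch analysis with Python and SQL, the sketch below reads a few semi-structured JSON records, applies structure only at read time, and then queries them. The sample records are made up, and the in-memory SQLite database is only a stand-in for whatever query engine a real data lake would use.

```python
import json
import sqlite3

# Made-up raw records, as they might sit in the lake in their native JSON form.
raw_lines = [
    '{"user": "ada", "action": "login", "ms": 120}',
    '{"user": "lin", "action": "login", "ms": 95}',
    '{"user": "ada", "action": "search", "ms": 310, "query": "lakes"}',
]

# Structure is applied when reading: each line is parsed only now.
records = [json.loads(line) for line in raw_lines]

# Batch analysis in plain Python: count events per user.
events_per_user = {}
for rec in records:
    events_per_user[rec["user"]] = events_per_user.get(rec["user"], 0) + 1

# The same batch loaded into an in-memory SQLite table (a stand-in query engine).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_name TEXT, action TEXT, ms INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(r["user"], r["action"], r["ms"]) for r in records],
)

# Batch analysis in SQL: average duration per action.
avg_ms = conn.execute(
    "SELECT action, AVG(ms) FROM events GROUP BY action ORDER BY action"
).fetchall()
conn.close()

print(events_per_user)
print(avg_ms)
```

The extra `query` field on the third record is simply ignored here. Because the raw record is kept unchanged, a later analysis can still use it.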

Cloud Providers and Related Services

