Data Lakes: Why a Lake and not an Ocean


Dustin Dewberry

What is a “Data Lake”? Think of it as an architectural approach that lets you store massive amounts of structured, unstructured, and raw data in a central location. Storing data has become inexpensive, but a data strategy built on little more than “hope” carries other risks: eventually, your Data Lake will fail to meet the needs of its end users.

The challenge with a Data Lake is not so much creating one as taking advantage of the opportunities it presents, such as solving business problems and meeting business goals. Many organizations that build successful Data Lakes do so gradually, maturing the lake as they figure out which data and metadata matter to the organization’s challenges and goals.

What you want to avoid is storing a massive amount of data and hoping to do something with it down the road.

Three ways to avoid creating a Data Lake built on hope:

  1. Align around primary functionality based on business goals, then work backwards to establish a framework for future success. This approach has several positive downstream effects: it helps stakeholders understand the value the data asset can provide, and it educates them about the potential uses of the data.
  2. Establish a measurement plan to track the value of the data asset over time. This plan also helps create the framework for your data access and retention policies.
  3. Audit potential data sources to determine whether the data is valid and whether it needs to be stored. Some data can’t be re-created; it is important to identify those sources and have a corresponding policy. The results of this audit also help you create your governance policies. Data ownership and how data gets used can be a difficult topic. Approaching the governance problem from a functional standpoint helps ensure that your policies enable your end users to meet their needs.
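
The audit in step 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataSource` fields, the action names, and the sample sources are all hypothetical and do not come from any particular tool or from the panel discussion. The idea is simply that every candidate source is tied to a business goal before it is stored, and that data which cannot be re-created gets an explicit policy decision.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataSource:
    # All fields are hypothetical, chosen only for this sketch.
    name: str
    business_goal: Optional[str]  # the goal this source supports, if any
    reproducible: bool            # can the data be re-created if it is lost?


def audit(sources: List[DataSource]) -> List[dict]:
    """Return one audit decision per candidate source."""
    decisions = []
    for source in sources:
        if source.business_goal is None and source.reproducible:
            # No goal and easy to re-create: storing it now is storing on hope.
            action = "defer"
        elif source.business_goal is None:
            # No goal yet, but the data cannot be re-created: needs a human decision.
            action = "review"
        elif not source.reproducible:
            # Supports a goal and cannot be re-created: store under an explicit policy.
            action = "store_with_retention_policy"
        else:
            action = "store"
        decisions.append({"source": source.name, "action": action})
    return decisions


# Made-up candidate sources, for illustration only.
candidates = [
    DataSource("web_clickstream", "reduce checkout abandonment", reproducible=False),
    DataSource("crm_export", "improve audience segmentation", reproducible=True),
    DataSource("legacy_logs", None, reproducible=False),
]

for decision in audit(candidates):
    print(decision["source"], "->", decision["action"])
```

In practice the decision rules would come from your own governance discussions; the value of writing them down, even this simply, is that each stored source has a stated reason for being in the lake.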

Many of the challenges we experience with Data Lakes are not new. Because a Data Lake makes data readily available to be categorized, processed, analyzed, and consumed by diverse groups within an organization, it compounds many long-standing data warehousing problems.

Bottom line: As with all great things, it’s what you’re able to do with them that makes them great. Data Lakes are no exception to that rule. Keeping your business challenges and goals as your primary focus will help your Data Lake succeed.

*Dustin appeared on a panel at RampUp’s 2018 Conference titled “Sink or Swim: Using a First-Party Data Lake.”