The Fifth Elephant 2019

Gathering of 1000+ practitioners from the data ecosystem

Data Quality Management @Walmart Data Lake

Submitted by Ravi Ramchandran (@rramchan) on Feb 27, 2019

Session type: Full talk of 40 mins
Status: Rejected


Erroneous decisions made from bad data are not only inconvenient, but also extremely costly. According to Gartner research, “the average financial impact of poor data quality on organizations is $9.7 million per year.”
In additional research, organizations that Gartner surveyed “estimate that poor-quality data is costing them on average $14.2 million annually.” Bad data is definitely bad for business.
In a data lake environment, robust ecosystem products are required to govern data quality. Since data lakes are based on an ELT model, where data is loaded first and transformed later, quality is much more difficult to govern. With legacy systems and an existing lake, the task becomes even more of an uphill one.
In this talk I will take the audience on the journey of building DQAF (Data Quality Assessment Framework), Walmart's product for continuous data quality assessment, and show how we are changing the entire way we look at quality. DQAF is based on Wang’s philosophy of TDQM (Total Data Quality Management). We believe that, to increase productivity, organizations must manage information as they manage products.


Key areas I will cover

Data Quality Management - Overview and Challenges

  • Definition of Data Governance, and how Data Quality Management ties into it
  • Different facets of Data Quality Management
  • Problem Areas
  • Solution Areas

The DQAF (Data Quality Assessment Framework) - Providing Data Quality Management for Walmart Data Lake

  • Overall Data Quality Management Journey for Walmart
  • Introduction to DQAF, and its Definition
  • Architecture Overview
  • Architecture Blocks Details


  • How DQAF Solved our Problem Areas, and Built into our Solution Areas
  • Quality Improvement is a Continuous Journey! - Journey Going Forward
  • Thinking of Quality in your Data Lake - Tips on how to get started in small steps


Internet Access

Speaker bio

Ravi is a Senior Architect with the GDAP (Global Data and Analytics Platforms) Group in Walmart Labs. Ravi started out building products primarily for banks and governments. He developed a passion for scaling large products early on, when he worked with the Gujarat Govt on creating workflow solutions. Looking for ways to impact lives directly, he moved towards healthcare, where he built API and cloud products for PHC, IoT devices, MIC, data pipelines, and data conversational engines.
With data being the next frontier, Ravi has been dabbling in all things Data Platform for the last couple of years. He was instrumental in developing the IoT Platform for GE Healthcare. Currently Ravi is responsible for key oversight of Walmart's next-gen Data Platform on hybrid cloud, primarily in the areas of Quality, Governance, Privacy and Operability.



  • Anwesha Das (@anweshasrkr) a year ago

    Thank you for submitting this proposal. We require slides and preview video by 11th March, latest, to evaluate your proposal and make a decision.

    • Ravi Ramchandran (@rramchan) Proposer a year ago

      I will send a slide outline to you by Monday after lunch. Do IM, mail, or call me if further details are required; I can explain more to help convey the gist.

      • Ravi Ramchandran (@rramchan) Proposer a year ago

        Hi Anwesha,
        Slides are updated.

    • Ravi Ramchandran (@rramchan) Proposer a year ago

      Updated the video. I will try for better audio as well. If you need any more clarification, please do comment.

  • Anwesha Sarkar (@anweshaalt) a year ago

    We will get back to you shortly.

  • Zainab Bawa (@zainabbawa) a year ago

    Hello Ravi, thanks for the submission. Is the Data Quality Assessment Framework open source? Is this something participants at The Fifth Elephant can use after listening to this talk? Apart from describing DQAF, what is the takeaway for participants? What will they learn from your talk?

  • Ravi Ramchandran (@rramchan) Proposer a year ago (edited a year ago)

    Hi Zainab,
    We propose to make the framework open source; it's a roadmap item. One thing that people tend to think is that quality is just running quality rules, and doing it at scale. That's just 30% of the problem (lots of tools are already available).
    Where companies struggle is in the remaining 70%: quantifying results (how), scoring (for a business unit and for the entire org), and defining and enabling continuous quality monitoring.
    I will share the concepts for doing this, which are repeatable and can be implemented in any organization. This is the direct takeaway.
    With CCPA/GDPR, governance of data becomes key. Companies are struggling to implement it, and the terminology is very hazy. I will also give a picture of where Quality Management fits in with respect to Governance. (Next year I hope to also showcase the Governance Roadmap that we are implementing.) This will be very helpful for people implementing governance for their lakes.
    I will be glad to provide more details. Do let me know.

    • Zainab Bawa (@zainabbawa) a year ago

      Thanks for the detailed response, Ravi. We will complete the review and get back within 4-5 days.

  • Venkata Pingali (@pingali) a year ago

    Hi! Ravi,

    Critical topic. Glad to see the proposal.

    The slides discussing the core framework are empty. Can you fill in more details for the audience to make an assessment of the talk?


    • Zainab Bawa (@zainabbawa) a year ago

      Thanks Venkata. Ravi, Venkata is one of the reviewers for your proposal. We’ll need to see more detailed content on the core framework in order to make a full assessment.

      • Ravi Ramchandran (@rramchan) Proposer a year ago

        Thanks Zainab

    • Ravi Ramchandran (@rramchan) Proposer a year ago

      Hi Venkata,
      Thank you for reviewing.
      I have updated the slides with the core framework and other aspects. Please note that this is a draft.
      I will improve it further once it's confirmed.
      If you have any other inputs, I would be glad to elaborate on them.


      • Ravi Ramchandran (@rramchan) Proposer a year ago

        Paul, I updated the deck again with specific solution details. Please use this version for the review.

  • Paul Meinshausen (@pmeins) a year ago

    Hi Ravi,

    It’s a great topic that deserves attention. As Venkata and Zainab have pointed out, it’s important to have the slides complete for us to review your proposal. So looking forward to seeing them completed and posted!

    In the meantime, I’d suggest that you focus the talk as much as possible on concrete examples from your experience at Walmart or elsewhere. Your abstract and initial slides are concept-overloaded and acronym-heavy. Stories that convey a lesson are more meaningful to your listeners than a formal framework that is unlikely to translate well to a large and diverse audience. And then in terms of your slides: cut down on text-heavy slides. The audience is there to hear you speak to them, not to read long passages on the screen.

    These two critiques/suggestions go together. When you have a lot of text and acronyms and lists on your slides, it’s usually a good hint that your content has too much abstract conceptual baggage and not enough clear and compelling stories.



    • Ravi Ramchandran (@rramchan) Proposer a year ago

      Hi Paul,
      Thanks for the detailed feedback. I have updated the presentation accordingly (adding more data would require a few approvals). Please note that this is only a draft, and I will improve on it further once the presentation is a confirmed proposal.


      • Ravi Ramchandran (@rramchan) Proposer a year ago

        Paul, I updated the deck again with more focus on takeaways. Please use this version for the review.

      • Zainab Bawa (@zainabbawa) a year ago

        Sounds like the classic chicken-and-egg problem. :) We need to see details before we can confirm the talk. In the current form, the talk is too high level and doesn’t serve the needs of participants who come to The Fifth Elephant.

    • Ravi Ramchandran (@rramchan) Proposer a year ago

      Thanks for the great feedback; it helped me improve the talk. (That's irrespective of whether this presentation moves forward or not.)
      I have done a new (or rather, revamped) version after looking at your inputs. Do have a look and let me know if this is more relevant.
      Considering the short time that I have, if you can give me a sense that it's likely to make the cut, then I will spend more time on this; otherwise we can look at it for the future.

  • Zainab Bawa (@zainabbawa) a year ago

    Ravi, as Paul mentioned, the slides and the content have far too much jargon. This makes the proposed talk very dense and indigestible.

    Also, in the current form, this talk is too high level to arrive at any concrete learnings and outcomes.

    The way to turn around this talk is to make it an experiential case study, covering:

    1. Why did Walmart decide it wanted to have a Data Quality Assessment Framework? What were the factors that led to this decision? Could the pushes have led to something other than DQAF?
    2. How did you go about building this framework? Why did you choose the approach that you did? Did you consider other approaches? What were the points of comparison?
    3. What challenges did you face in building this framework?
    4. How do different teams in Walmart use this framework? How has the situation changed since the launch and adoption of DQAF?
    5. Are there any trade-offs or compromises?
    6. How are teams adjusting to this? Do you see resistance in this approach? How do you reconcile resistance to using this approach?

    Covering the above points will make this more interesting and palatable.

    • Ravi Ramchandran (@rramchan) Proposer a year ago (edited a year ago)

      Zainab, This is great feedback. Appreciate your specific pointers.
      Is this the final feedback, or can I give it another shot?
      What I can see is that there is value, but it needs refinement.
      I think all the data points you are asking for are present in my notes; the only question is whether I can bring them out within the timelines.

      • Zainab Bawa (@zainabbawa) a year ago

        Suggest you take about a week or so, and submit the revised slides by 28 May.

        • Ravi Ramchandran (@rramchan) Proposer a year ago

          Zainab, I have redone the entire presentation based on your feedback.
          Can you do a quick review and give me a sense of how it looks? If it's in the right direction, I can add more pointers and make it meaningful.
          I didn't want to spend more effort if it's not going to make the cut.

  • Venkata Pingali (@pingali) a year ago

    Hi! Ravi,

    Had a chance to review your updated slides.

    I am sure a lot of thought and discussion has gone into DQAF. But the essence of the challenges and approach is not coming through due to a mix of presentation style and choice/flow of content. That is a lost opportunity because it is an important problem - persistent, industry-wide, and lacks good frameworks of thought and action.

    Please see if these thoughts help. My objective is to maximize the value of time for the audience*:

    Presentation style:

    1. I am still missing the flavor of presentation that Paul was referring to. Typically FifthEl presentations tend to have an experiential flavor where the speaker talks about problems he/she encountered first hand and gives you enough detail to put yourself in their position. Your presentation has a ‘third-person’ flavor, which works in many contexts but is not common in this context (FifthEl).

    2. Also consistent with Paul’s other suggestion - can you make the slides less text heavy and compress the background discussion? You can assume a reasonably knowledgeable audience. They have an appreciation for data quality.

    Content (Some of which repeats Zainab’s points):

    1. Can you reframe data quality requirements into a definition of the problem (what is data quality? Data has quality if it is x, y, and z). Examples will help make this concrete.
    2. Can you give a sense of “why” you care about the problem (scale/impact of problem), with quantification by any metric you can share (data volume, people etc.)? The audience knows the importance in the abstract, but it will help to make the problem tangible to them.
    3. Can you move the “what makes the problem non-trivial” discussion before the architecture discussion? (problem space; objectivity, lineage, ownership). Examples will help a lot here to highlight the tussles underlying the design.
    4. What were the considerations for the architecture? (expressiveness, maintenance costs, longevity etc.)
    5. After discussing the architecture, can you add what the implementation looked like (in the slides themselves)?
    6. What did you not know when you built the system? What is its current state? You could add this after the demo.
    7. Slide 23 - maybe you can give some guidance as well.

    * Full disclosure: I have a proposal too. Suggestions draw upon my past presentations at FifthEl.

    • Ravi Ramchandran (@rramchan) Proposer a year ago

      Hi Venkata,
      I did a refresh of the entire deck, based on the feedback that you gave.
      Can you do a quick review and indicate if it's on the right track? I am planning to add some more detail if the current look makes sense.

  • Abhishek Balaji (@booleanbalaji) a year ago

    Hi Ravi,

    Here’s some additional feedback:
    - More work & effort is needed to make the presentation understandable and consumable.
    - The presentation and the solution presented seem very complicated, and the message gets lost. What is the key takeaway for an audience member?
    - Restructure the talk around how your product scaled, how the needs changed and how data quality became critical. This sort of introduction to your talk might pique the audience's interest.

    Ravi, we recognize that your idea is good, but the structure does not fit well for the audience at The Fifth Elephant. The proposal would need a lot of work on the content and would need you to work with one of our community members to make it suitable for the audience. If not, you can take some more time and update the content. We’ll consider this for a future edition of The Fifth Elephant where it would be a better fit.

