Rootconf Pune edition

On security, network engineering and distributed systems


Implementing distributed tracing in FaaS

Submitted by Bhavin Gandhi (@bhavin192) on Monday, 15 July 2019

Section: Full talk (40 mins) Category: Monitoring and logging


Abstract

Microservices and functions have changed the way applications are built and deployed. Adopting these distributed architectures has helped teams implement scalable, efficient, and reliable systems, but it has also made operational tasks like debugging quite tricky. With functions, these tasks become even more complicated when message queues like Kafka sit in between. Having a system that we can ask questions about the state of our services is crucial. Distributed tracing adds this kind of observability: it gives more insight into how our services communicate.

This talk will be a walkthrough of my journey implementing distributed tracing in functions that run on a Function as a Service platform called Fission. It will also briefly cover the changes made to Fission’s components, as well as to Jaeger’s client library, along the way.

Outline

  1. What is observability?
    - Logging and metrics
    - Distributed tracing
  2. About Fission
    - Fission function environments
  3. About Jaeger
    - Instrumenting applications using client libraries
  4. Architecture of demo application
    - Kafka as message queue
  5. Instrumenting the functions
    - Modifications made to Fission’s environments
    - Propagating the context through Kafka
  6. Changes made to Fission
  7. Changes made to Jaeger’s Python library
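
As an illustration of outline item 5 (propagating the context through Kafka), the following is a minimal, standard-library-only Python sketch and not the code from the talk. It assumes message headers can be modelled as a list of (key, bytes) pairs, and it serialises the context in the `trace-id:span-id:parent-id:flags` layout that Jaeger uses for its `uber-trace-id` header. `SpanContext`, `start_span`, `inject`, and `extract` are hypothetical names chosen for this sketch; a real function would rely on the Jaeger client library instead.

```python
import secrets
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: message headers are modelled as a list of (key, bytes) pairs.
Headers = List[Tuple[str, bytes]]

# Header name used by Jaeger's text propagation format.
TRACE_HEADER = "uber-trace-id"


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical, simplified stand-in for a tracer's span context."""
    trace_id: int
    span_id: int
    parent_id: int = 0
    flags: int = 1  # 1 means the trace is sampled


def start_span(parent: Optional[SpanContext] = None) -> SpanContext:
    """Start a root span, or a child span that shares the parent's trace ID."""
    span_id = secrets.randbits(64) or 1
    if parent is None:
        return SpanContext(trace_id=secrets.randbits(64) or 1, span_id=span_id)
    return SpanContext(parent.trace_id, span_id, parent.span_id, parent.flags)


def inject(ctx: SpanContext, headers: Headers) -> None:
    """Producer side: serialise the context into the message headers."""
    value = f"{ctx.trace_id:x}:{ctx.span_id:x}:{ctx.parent_id:x}:{ctx.flags:x}"
    headers.append((TRACE_HEADER, value.encode("ascii")))


def extract(headers: Headers) -> Optional[SpanContext]:
    """Consumer side: rebuild the context, or return None if it is absent."""
    for key, raw in headers:
        if key == TRACE_HEADER:
            trace_id, span_id, parent_id, flags = (
                int(part, 16) for part in raw.decode("ascii").split(":")
            )
            return SpanContext(trace_id, span_id, parent_id, flags)
    return None


# Function A publishes a message; function B consumes it and continues the trace.
producer_span = start_span()
message_headers: Headers = []
inject(producer_span, message_headers)

consumer_span = start_span(parent=extract(message_headers))
```

Because the consumer's span reuses the producer's trace ID and records the producer's span ID as its parent, a tracing backend can stitch both functions into one trace even though they only communicate through the queue.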

Requirements

  • A basic understanding of microservices is helpful

Speaker bio

Bhavin works at InfraCloud Technologies, Pune. His main areas of interest are Free/Libre and Open Source Software, DevSecOps, containers, and Kubernetes. He has contributed to Fission as well as to Jaeger’s components.

Links

Slides

https://bhavin192.gitlab.io/talks/2019/rootconf-pune/distributed-tracing-faas.html

Comments

  • Zainab Bawa (@zainabbawa) Reviewer 4 months ago

    Thanks for an interesting proposal, Bhavin. I have a few questions:

    1. Tell us how your proposal builds upon or differs from this proposal: https://hasgeek.tv/rootconf/2018-day-2/1509-distributed-tracing-with-jaeger-at-scale You can respond in the comments.
    2. We also had a workshop on Open FaaS as part of Rootconf 2018. Again, what is the difference between the workshop proposal and your talk (beyond the fact that you will be talking about your experience and use cases): https://hasgeek.com/rootconf/2018/proposals/design-and-implement-a-scalable-application-using–cNnMN2CSMLaBLKgreCRXwG

    We have shortlisted your proposal. The next steps are:

    1. Respond to the above comments in 7 days.
    2. Submit your slides, within 10 days, from the date of this comment and upload the link here, so that we can evaluate your slides.
    3. After receiving your slides, we will organize a review walk-through in the following week to close on your proposal and move it to confirmed, subject to the review decision.

    Look forward to your response.

  • Bhavin Gandhi (@bhavin192) Proposer 4 months ago
    1. This proposal talks more about implementing tracing in applications and functions. Basically, it builds upon the concepts explained in ‘Distributed tracing with Jaeger at scale’. I will also briefly cover the client libraries used to achieve that, whereas ‘Distributed tracing with Jaeger at scale’ focuses more on the overall importance of tracing. The first point might sound like a slight overlap, but it will help set the ground for the rest of the talk.
    2. That is a really great workshop proposal. I will be covering only specific components of Fission and the modifications made to those components, whereas the workshop covered the complete what, why, and how of FaaS.

    Both 1 and 2 set a really good foundation for my talk proposal.

    • Zainab Bawa (@zainabbawa) Reviewer 4 months ago

      Thanks for the response, Bhavin. We will go ahead and confirm your proposal. You will hear from us on pre-event rehearsals next week, when we set this up. Meanwhile, start preparing your slides.

  • Talina Shrotriya (@talina06) 3 months ago (edited 3 months ago)

    Hi @bhavin192, quick notes on today’s rehearsal.

    1. Spend more time explaining why distributed tracing is needed, i.e. for Observability. Talk about Observability in detail: what does it mean? Then mention the ways of achieving Observability. Do note that Observability goes beyond monitoring, so you will have to explain it in that manner.
    2. For the next slide, start with introducing your demo example which is a microservices based example.
    3. For point 2, include an architecture diagram (A very simple one, which depicts data flow and components involved), then proceed to explain how you are using Fission and Jaeger for this. Talk about them here.
    4. In the slide where you explain resources: explain triggers, environment and functions using diagrams in 3 separate slides.
    5. Remove the code for the user function; only explain how you’re wrapping the function inside the initialize_tracing function.
    6. Split the initialize_tracing function’s code into 3 slides, each explaining one step of adding tracing. Spend a good amount of time delving into the nitty-gritty here.
    7. Then, speak of why you need to link context spans and explain that slide, then proceed to explaining what you changed in jaeger client and fission.
    8. When switching from one of the above concepts to the next, add a transition slide. Make these transition slides interesting by giving them catchy names.
    9. There was a moment when you referenced a previous slide, instead of going back, simply add that slide again.
    10. Do mention tracer.close(), and cover in separate slides whether there is ever a performance overhead and how to beat it.
    11. Instead of concluding with what you learned, conclude with the same point you make, i.e. this is why one should implement distributed tracing.

    Thank you.

  • Anwesha Sarkar (@anweshaalt) Reviewer 3 months ago

    Hello,

    Thank you, Talina, for such detailed feedback.

    Bhavin, you need to submit your revised slides based on the feedback by 30th August 2019 (latest).

    Look forward to your reply.

    Regards
    Anwesha

  • Aaditya Talwai (@talwai) 3 months ago
    1. Show a situation where the function fails or is slow, and the equivalent trace in Jaeger. Focus on the value of distributed tracing as a debugging tool in addition to end-to-end visualization.
    2. Spend some time on why Tracing in FaaS is non-trivial and requires bespoke solutions. Short-lived / Opaque runtimes, lack of access to infrastructure, stateless.
    3. Echoing Talina’s point - spend time explaining the trigger, environment, and function concepts in Fission. Take extra time on the significance of the ‘/specialize’ endpoint because your solution depends on it.
    4. Your code uses flask’s g object without discussing where it comes from or what it does. Do not breeze over this since it is a critical piece. Understanding it will help the audience adapt your solution to other languages / FaaS platforms.
    5. Elaborate on why Distributed Tracing is a complement to traditional logs + metrics. What problems does it help you solve and why should a team invest in it?
  • Cyrus Dasadia (@extremeunix) 3 months ago

    @bhavin192: I like the idea of the talk,

    Slide 3: TBH, move the ‘validator’ block down-right and let the page look like a synchronous call-tree. Audiences like simplicity when viewing visual representations unless you are explicitly highlighting complexity. In your case you are trying to show modularity of serverless and not complexity.

    Slide 4: Observability: Being the core theme of your talk, this begs to be a section and not just a slide. You might want to spend a bit more time explaining why ‘you’ think it is important and how lack of observability exacerbates tracing/debugging in serverless environments.

    From there onwards, you are walking the audience through an idea of distributed tracing of functions and not an absolute implementation of Fission+Jaeger. The latter makes your target audience a very finite subset, whereas the former helps folks get a general idea of how one could approach the idea of ‘Observability of serverless functions’. So take time to explain a bit more about the alternative approaches and the concepts, and then pick one, e.g. Fission+Jaeger.

    Lastly, please add more visuals - it keeps things interesting :)

    For the rest of the part, go with @talina06’s recommendations, I will not reiterate those.

    Good luck!

  • Zainab Bawa (@zainabbawa) Reviewer 3 months ago

    Thanks for all the valuable feedback, folks. Bhavin, this should give you a good idea of how to rework your slides and the storyline.

  • Anwesha Das (@anweshasrkr) 3 months ago

    Microservices and its Characteristics

    Monolith to Microservices has been the talk of the town for a while. Distributed architectures have increased the scalability and efficiency of systems. In his talk, Bhavin will discuss implementing distributed tracing in FaaS. In the following two links we are trying to elucidate the term Microservices and its characteristics.

    To further the discussion join us at Rootconf Pune 2019 on 21st September 2019.

  • Anwesha Das (@anweshasrkr) 3 months ago

    Hello,

    The deadline for submitting your revised slides was 30th August. I haven’t received an update on your revised slides. Since the conference is drawing near, 4 September is the hard stop for your revised slides. It is crucial that you submit your revised slides on time. There are a lot of steps to be carried out after the submission of the revised slides.

    I hope you understand the time crunch. Look forward to your cooperation.

    Regards,
    Anwesha

    • Bhavin Gandhi (@bhavin192) Proposer 3 months ago

      Sorry for not submitting the revised slides on time. I have been going through some weird situations, and I should have informed you about that. Anyway, I have updated the slides, and they are available at the same link.

  • Anwesha Sarkar (@anweshaalt) Reviewer 3 months ago (edited 3 months ago)

    Explaining Observability

    Let us understand what observability is and what its future looks like: https://www.youtube.com/watch?v=MkSdvPdS1oA

    To further the discussion join us at Rootconf Pune 2019 on 21st September 2019.

  • Anwesha Sarkar (@anweshaalt) Reviewer 3 months ago

    Bhavin,

    Here is the feedback from today’s rehearsal:

    1. After he mentions the 3 aspects of observability (logs, metrics and tracing), he starts talking about his demo application on the next slide.
      - The transition could be done better.
      - Observability somehow gets lost as he goes ahead. That is okay if he brings it back at the end of his talk, but he needs to conclude with how he used tracing to make the application observable.
    2. The 2nd demo needs to be before the takeaway points.
    3. The speaker needs to be more energetic.
    4. Time taken: 37 minutes; scheduling it for 40 minutes.

    Submit your revised slides by Monday, 9th.

    Regards,
    Anwesha

    • Bhavin Gandhi (@bhavin192) Proposer 3 months ago

      I have updated the slides as discussed.
