Elixir Wizards is an interview-format podcast for engineers who use the Elixir programming language. The show launched in early 2019. Each season centers on a specific topic or set of topics, and each interview explores the guest's experience and opinions on it. Elixir Wizards is hosted by Eric Oestrich and Sundi Myint of SmartLogic, a dev shop that’s been building custom software since 2005 and running Elixir applications in production since 2015. Learn more about how SmartLogic uses Phoenix and Elixir: https://smartlogic.io/phoenix-and-elixir?utm_source=podcast
Telemetry & Observability for Elixir Apps at Cars.com with Zack Kayser & Ethan Gunderson
Zack Kayser and Ethan Gunderson, Software Engineers at Cars Commerce, join the Elixir Wizards to share their expertise on telemetry and observability in large-scale systems. Drawing from their experience at Cars.com, a platform that handles high traffic and many concurrent users, they discuss the technical and organizational challenges of scaling applications, managing microservices, and implementing effective observability practices.
The conversation highlights the pivotal role observability plays in diagnosing incidents, anticipating system behavior, and asking unplanned questions of a system. Zack and Ethan explore tracing, spans, and the unique challenges introduced by LiveView deployments and WebSocket connections.
They also discuss the benefits of OpenTelemetry as a vendor-agnostic instrumentation tool, the significance of Elixir’s telemetry library, and practical steps for developers starting their observability journey. Additionally, Zack and Ethan introduce their upcoming book, Instrumenting Elixir Applications, which will offer guidance on integrating telemetry and tracing into Elixir projects.
Topics Discussed:
- Cars.com’s transition to Elixir and scaling solutions
- The role of observability in large-scale systems
- Uncovering insights by asking unplanned system questions
- Managing high-traffic and concurrent users with Elixir
- Diagnosing incidents and preventing recurrence using telemetry
- Balancing data collection with storage constraints
- Sampling strategies for large data volumes
- Tracing and spans in observability
- LiveView’s influence on deployments and WebSocket behavior
- Mitigating downstream effects of socket reconnections
- Contextual debugging for system behavior insights
- Observability strategies for small vs. large-scale apps
- OpenTelemetry for vendor-agnostic instrumentation
- Leveraging OpenTelemetry contrib libraries for easy setup
- Elixir’s telemetry library as an ecosystem cornerstone
- Tracing as the first step in observability
- Differentiating observability from business analytics
- Profiling with tools from the OpenTelemetry Erlang project
- The value of profiling for performance insights
- Making observability tools accessible and impactful for developers
Links Mentioned:
https://www.carscommerce.inc/
https://www.cars.com/
https://hexdocs.pm/telemetry/readme.html
https://kubernetes.io/
https://github.com/ninenines/cowboy
https://hexdocs.pm/bandit/Bandit.html
https://hexdocs.pm/broadway/Broadway.html
https://hexdocs.pm/oban/Oban.html
https://www.dynatrace.com/
https://www.jaegertracing.io/
https://newrelic.com/
https://www.datadoghq.com/
https://www.honeycomb.io/
https://fly.io/phoenix-files/how-phoenix-liveview-form-auto-recovery-works/
https://www.elastic.co/
https://opentelemetry.io/
https://opentelemetry.io/docs/languages/erlang/
https://opentelemetry.io/docs/concepts/signals/traces/
https://opentelemetry.io/docs/specs/otel/logs/
https://github.com/runfinch/finch
https://hexdocs.pm/telemetry_metrics/Telemetry.Metrics.html
https://opentelemetry.io/blog/2024/state-profiling
https://www.instrumentingelixir.com/
https://prometheus.io/
https://www.datadoghq.com/dg/monitor/ts/statsd/
https://x.com/kayserzl
https://github.com/zkayser
https://bsky.app/profile/ethangunderson.com
https://github.com/open-telemetry/opentelemetry-collector-contrib
Special Guests: Ethan Gunderson and Zack Kayser.