The podcast about Python and the people who make it great
Be Data Driven At Any Scale With Superset
Summary
Becoming data driven is the stated goal of a large and growing number of organizations. To achieve that goal they need a reliable and scalable method of accessing and analyzing the data that they have. While business intelligence solutions have been around for ages, they don’t all work well with the systems that we rely on today, and a majority of them are not open source. Superset is a Python-powered platform for exploring your data and building rich, interactive dashboards that put the information your organization needs in front of the people who need it. In this episode, Maxime Beauchemin, the creator of Superset, shares how the project got started and why it has become such a widely used and popular option for exploring and sharing data at companies of all sizes. He also explains how it functions, how you can customize it to fit your specific needs, and how to get it up and running in your own environment.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle-tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, HubSpot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial.
- Your host as usual is Tobias Macey and today I’m interviewing Max Beauchemin about Superset, an open source platform for data exploration and visualization.
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by giving an overview of what Superset is and what it might be used for?
- What problem were you trying to solve when you created it?
- What tools or platforms did you consider before deciding to build something new?
- There are a few different ways that someone might categorize Superset, such as business intelligence, data exploration, dashboarding, or data visualization. How would you characterize it, and how does it fit into the current state of the industry and ecosystem?
- What are some of the lessons that you have learned from your work on Airflow that you applied to Superset?
- Can you give an overview of how Superset is implemented?
- How have the goals, design and architecture evolved since you first began working on it?
- Given its origin as a hackathon project, the choice of Python seems natural. What are some of the challenges that choice has posed over the life of the project?
- If you were to start the whole project over today what might you do differently?
- Can you describe what’s involved in getting started with a new setup of Superset?
- What are the available interfaces and integration points for someone who wants to extend it or add new functionality?
- What are some of the most often overlooked, misunderstood, or underused capabilities of Superset?
- One of the perennial challenges with a tool that allows users to build data visualizations is the potential to build dashboards or charts that are visually appealing but ultimately meaningless or wrong. How much guidance does Superset provide to help users select a useful representation of the data?
- In addition to being the original author and a project maintainer you have also started a company to offer Superset as a service. What are your goals with that business and what is the opportunity that it provides?
- What are some of the most interesting, innovative, or unexpected ways that you have seen Superset used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while building and growing the Superset project and community?
- When is Superset the wrong choice?
- What do you have planned for the future of Superset and Preset?
Keep In Touch
- @mistercrunch on Twitter
- mistercrunch on GitHub
Picks
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat.
Links
- Superset
- Preset
- Airflow
- Airbnb
- Lyft
- Django
- Flask
- CRUD == Create, Read, Update, Delete
- Business Intelligence
- Apache Druid
- Presto
- Trino (formerly known as Presto SQL)
- Redash
- Looker
- Metabase
- Flask App Builder
- React Redux
- TypeScript
- GraphQL
- Celery
- Redis
- RabbitMQ
- S3
- Airbnb Superset Blog Post
- D3
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA