The podcast about Python and the people who make it great
Data Exploration and Visualization Made Effortless with Lux
Summary
Data exploration is an important step in any analysis or machine learning project. Visualizing the data that you are working with makes that exploration faster and more effective, but having to remember and write all of the code to build a scatter plot or histogram is tedious and time-consuming. To eliminate that friction, Doris Lee helped create the Lux project, which wraps your Pandas data frame and automatically generates a set of visualizations without you having to lift a finger. In this episode she explains how Lux works under the hood, what inspired her to create it in the first place, and how it can help you produce better results. The Lux project is a valuable addition to the toolbox of anyone who is doing data wrangling with Pandas.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it’s easy to get started with the next generation of deployment and scaling, powered by the battle-tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, HubSpot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial.
- Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at pythonpodcast.com/hightouch.
- Your host as usual is Tobias Macey, and today I’m interviewing Doris Lee about Lux, a Python library that facilitates fast and easy data exploration by automating the visualization and data analysis process.
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what Lux is and how the project got started?
- What is the role of visualization in a data science workflow?
- What are the challenges that data scientists face in the exploratory phase of their analysis?
- There is a wide variety of data visualization tools in the Python ecosystem with differing areas of focus. What is the role of Lux in that ecosystem?
- How does Lux compare to tools such as Yellowbrick (scikit-yb)?
- What is the workflow for someone using Lux in their analysis and what problems does it solve for them?
- Can you talk through how Lux is architected?
- How have the goals and design of Lux changed or evolved since you first began working on it?
- Data visualization is a broad field. How do you determine which kinds of charts or plots are best suited to a particular data set or exploration?
- What are some of the capabilities of Lux that are often overlooked or underutilized?
- How has Lux impacted your own work in data analysis/data science?
- What are some of the other gaps that you see in the available tooling for data science?
- What are some of the most interesting, innovative, or unexpected ways that you have seen Lux used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on and with Lux?
- When is Lux the wrong choice?
- What do you have planned for the future of the project?
Keep In Touch
Picks
- Tobias
  - Pirates of the Caribbean movies
- Doris
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- Lux
- UC Berkeley
- RISE Lab
- School of Information
- Pandas
- Bokeh
- Seaborn
- Altair
- Matplotlib
- Grammar of Graphics
- Plotly
- Scikit Yellowbrick
- D3.js
- Vega
- NumPy
- xarray
- TensorFlow
- Jupyter Widget
- Choropleth Map
- G10 Countries
- Ray
- Modin
- Dask
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA