The podcast about Python and the people who make it great
Growing Dask To Make Scaling Python Data Science Easier At Coiled
Summary
Python is a leading choice for data science due to the immense number of libraries and frameworks readily available to support it, but it is still difficult to scale. Dask is a framework designed to transparently run your data analysis across multiple CPU cores and multiple servers. Using Dask lifts a limitation for scaling your analytical workloads, but brings with it the complexity of server administration, deployment, and security. In this episode Matthew Rocklin and Hugo Bowne-Anderson discuss their recently formed company Coiled and how they are working to make it easier to use and maintain Dask in production. They share the goals for the business, their approach to building a profitable company based on open source, and the difficulties they face while growing a new team during a global pandemic.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- This portion of Python Podcast is brought to you by Datadog. Do you have an app in production that is slower than you like? Is its performance all over the place (sometimes fast, sometimes slow)? Do you know why? With Datadog, you will. You can troubleshoot your app’s performance with Datadog’s end-to-end tracing and in one click correlate those Python traces with related logs and metrics. Use their detailed flame graphs to identify bottlenecks and latency in that app of yours. Start tracking the performance of your apps with a free trial at datadog.com/pythonpodcast. If you sign up for a trial and install the agent, Datadog will send you a free t-shirt.
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
- Your host as usual is Tobias Macey and today I’m interviewing Matthew Rocklin and Hugo Bowne-Anderson about their work building a business around the Dask ecosystem at Coiled.
Interview
- Introductions
- How did you get introduced to Python?
- Can you give a quick overview of what Dask is and your motivations for creating it?
- How has Dask changed or evolved in the past 3 1/2 years since we last talked about it?
- How has the rest of the ecosystem changed in that time?
- After working on Dask for the past few years, what led you to the decision to build a business around it?
- What are the sharp edges of programming for Dask that users are looking for help on solving?
- What are the difficulties that users face in deploying and maintaining a production installation of Dask?
- What are the limitations of Dask when scaling both up and down?
- What are you building at Coiled to improve the user experience for users of Python and Dask?
- What are your thoughts on the pros and cons of orienting your messaging around the scalability of Python, as opposed to focusing on a specific industry or problem domain?
- What are the challenges that you are facing in managing the tensions between the open source and proprietary work that you are doing?
- How are you handling the ongoing governance of the Dask project?
- What are some of the most interesting, unexpected, or challenging lessons that you have learned while building and launching a company based on an open source project?
- What do you have planned for the future of both Coiled and Dask?
Keep In Touch
Picks
- Tobias
- The Hobbit
- Audiobook
- Audible Free Trial (affiliate link)
- Matt
- Hugo
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- Sign up for the Coiled Beta!
- Coiled
- Dask
- Data Engineering Podcast Interview About Dask
- PyData
- NumPy
- SciPy
- Cell Biology
- Datacamp
- Dataframed
- Matthew Rocklin on Podcast.__init__ about functional programming with Toolz
- IPython Notebook
- PyTorch
- Airflow
- Prefect
- XGBoost
- Tornado
- Coiled Blog Post About The Goals of Dask
- Spark
- AsyncIO
- Concurrent.futures
- Pangeo
- Xarray
- RAPIDS
- Nvidia
- Cuda
- Celery
- Life Sciences
- Tensorflow
- Snorkel
- Dagster
- DevOps
- Docker
- Kubernetes
- Metaflow
- Ray
- Anyscale
- Yarn
- Gartner Hype Cycle
- Travis Oliphant
- Postgres
- Amazon ECS
- Django
- Django Allauth
- Quansight
- Wes McKinney
- Ursa Labs
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA