The podcast about Python and the people who make it great
Building The Seq Language For Bioinformatics
Summary
Bioinformatics is a complex and computationally demanding domain. The intuitive syntax of Python and its extensive set of libraries make it a great language for bioinformatics projects, but its runtime performance often falls short of the computational efficiency that the domain requires. Ariya Shajii created the Seq language to bridge the divide between the performance of languages like C and C++ and the ecosystem of Python, with built-in support for commonly used genomics algorithms. In this episode he describes his motivation for creating a new language, how it is implemented, and how it is being used in the life sciences. If you are interested in experimenting with sequencing data then give this a listen and then give Seq a try!
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, node balancers, a 40 Gbit/s public network, fast object storage, and a brand new managed Kubernetes platform, all controlled by a convenient API, you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they’ve got dedicated CPU and GPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on great conferences. And now, the events are coming to you, with no travel necessary! We have partnered with organizations such as ODSC and Data Council. Upcoming events include the Observe 20/20 virtual conference on April 6th and ODSC East, which has also gone virtual, starting April 16th. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
- Your host as usual is Tobias Macey and today I’m interviewing Ariya Shajii about Seq, a programming language built for bioinformatics and inspired by Python.
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what Seq is and your motivation for creating it?
- What was lacking in other languages or libraries for your use case that is made easier by creating a custom language?
- If someone is already working in Python, possibly using BioPython, what might motivate them to consider migrating their work to Seq?
- Can you give an impression of the scope and nature of the tasks or projects that a biologist or geneticist might build with Seq?
- What was your process for identifying and prioritizing features and algorithms that would be beneficial to the target audience?
- For someone using Seq can you describe their workflow and how it might differ from performing the same task in Python?
- How is Seq implemented?
- What are some of the features that are included to simplify the work of bioinformatics?
- What was your process of designing the language and runtime?
- How has the scope or direction of the project evolved since it was first conceived?
- What impact do you anticipate Seq having on the domain of bioinformatics and genomics?
- What have you found to be the most interesting, unexpected, and/or challenging aspects of building a language for this problem domain?
- What is in store for the future of Seq?
Keep In Touch
Picks
- Tobias
- Ariya
- Breakthrough documentary
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- Seq
- MIT CSAIL
- Bioinformatics
- LLVM
- Intermediate Representation
- MatLab
- Moore’s Law
- BioPython
- Smith Waterman Algorithm
- Hamming Distance
- Pattern Matching in Functional Programming
- SIMD == Single Instruction Multiple Data
- Computational Genomics
- Phylogenetics
- Sequence Read Archive public data set
- Google Cloud Life Sciences
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA