Conversations with the hackers, leaders, and innovators of the software world. Hosts Adam Stacoviak and Jerod Santo face their imposter syndrome so you don’t have to. Expect in-depth interviews with the best and brightest in software engineering, open source, and leadership. This is a polyglot podcast. All programming languages, platforms, and communities are welcome. Open source moves fast. Keep up.

ANTHOLOGY — Open source AI

May 24, 2023 1:38:06 94.35 MB

This week on The Changelog we’re taking you to the hallway track of The Linux Foundation’s Open Source Summit North America 2023 in Vancouver, Canada. Today’s anthology episode features: Beyang Liu (Co-founder and CTO at Sourcegraph), Denny Lee (Developer Advocate at Databricks), and Stella Biderman (Executive Director and Head of Research at EleutherAI).

Special thanks to our friends at GitHub for sponsoring us to attend this conference as part of Maintainer Month.

Changelog++ members get a bonus 3 minutes at the end of this episode and zero ads. Join today!

Sponsors:

  • DevCycle – Build better software with DevCycle. Feature flags, without the tech debt. DevCycle is a Feature Flag Management platform designed to help you build maintainable code at scale.
  • Sentry – See the untested code causing errors - or whether it’s partially or fully covered - directly in your stack trace, so you can avoid similar errors from happening in the future. Use the code CHANGELOG and get the team plan free for three months.
  • Rocky Linux – Enterprise Linux, the open source community way.
  • Fly.io – The home of Changelog.com — Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.

Show Notes:

The common denominator for these conversations is open source AI.

Beyang Liu and his team at Sourcegraph are focused on enabling more developers to understand code, and we're very interested in their approach to Cody, a completely open source, model-agnostic coding assistant.

Denny Lee and the team at Databricks recently released Dolly 2.0, the first open source, instruction-following LLM that has been fine-tuned on a human-generated instruction dataset and is licensed for research and commercial use. They want to be the platform of choice for the future of AI development.

Stella Biderman gave the keynote address on generative AI at the conference and works at the base layer doing open source research, model training, and AI ethics. Stella trained the EleutherAI Pythia model family that Databricks used to create Dolly 2.0.

Timestamps:

(00:00) - This week on The Changelog
(01:44) - Sponsor: DevCycle
(04:38) - Start the show!
(05:46) - We met Beyang 10 years ago!
(06:08) - The mission of Sourcegraph
(07:22) - Adam still Googles, just less
(08:30) - Plugins make models interesting
(09:35) - When did you start thinking about this?
(12:16) - This is a "Eureka!" moment in time
(13:11) - The gospel of text based input
(15:44) - Is this the future interface of Sourcegraph?
(17:21) - Iterating the interface
(17:59) - How can you access Cody?
(18:27) - Cody is open source
(20:13) - How does it get code intelligence?
(21:58) - What about privacy?
(26:11) - GPT for X
(26:53) - Cody vs Copilot
(29:25) - Open source + model agnostic
(31:22) - What's next?
(33:19) - How high up the stack can AI tooling go?
(36:07) - Is this a step change to plateau?
(38:21) - The ultimate flattener
(42:56) - Will AI swallow all of programming?
(45:52) - Sponsor: Sentry
(50:08) - We're fine-tuned
(50:51) - JIT conference presenter
(52:32) - This time 4 weeks ago
(53:54) - Let's generate our own data
(55:05) - All 15,000 Q&A data is open
(56:12) - Verbose is not always desirable
(56:42) - I want my own Dolly 2.0
(58:14) - How did you collect the Q&A data?
(1:00:39) - We thought we'd need more data
(1:01:40) - Dolly proved it could be done
(1:03:24) - Google's leaked memo
(1:06:06) - Databricks' play in this chess game
(1:08:45) - Turning AI on our transcripts
(1:11:03) - Chain or foundational model?
(1:12:42) - Sponsor: Rocky Linux
(1:15:19) - The base layer
(1:16:27) - What should the world know?
(1:17:40) - Where does the money come from?
(1:18:13) - Training LLMs is NOT that expensive
(1:22:07) - Focused on open source AI research
(1:25:49) - Interpreting LLMs
(1:28:30) - Influencing the properties of the model
(1:31:40) - Do you have fear of where this is going?
(1:32:58) - Connecting with Stella and team
(1:34:07) - Stella's news source is their Discord server
(1:36:22) - Outro