Hello! This is The Vergecast, the flagship podcast of The Verge... and your life. Every Friday, Nilay Patel and Dieter Bohn make sense of the week's tech news with help from our wide-ranging staff. Join us every week for a fun, deeply nerdy, often off-the-rails conversation about what's happening now (and next) in technology and gadgets.

How to train your data

June 25, 2026 0:26:41 4.82 MB

Training data is the raw material of the AI industry. Claude, ChatGPT, Gemini, and the rest are built on top of oceans of stuff. What is that stuff? Books. Blog posts. YouTube videos. Reddit comments. All of it and more, in virtually incomprehensible quantities. Alex Reisner, a staff writer at The Atlantic who has been investigating training data, explains how AI companies get all this data, why they'd really prefer you not know what's in it, and whether training data could ever be a fair trade.


Subscribe to The Verge for unlimited access to theverge.com, subscriber-exclusive newsletters, and our ad-free podcast feed.

We love hearing from you! Email your questions and thoughts to vergecast@theverge.com or call us at 866-VERGE11.
