Natural Language Processing and How ML Models Understand Text

A weekly Python podcast hosted by Christopher Bailey with interviews, coding tips, and conversation with guests from the Python community. The show covers a wide range of topics including Python programming best practices, career tips, and related software development topics. Join us every Friday morning to hear what's new in the world of Python programming and become a more effective Pythonista.

July 29, 2022 | 0:58:49 | 57.49 MB

How do you process and classify text documents in Python? What are the fundamental techniques and building blocks for Natural Language Processing (NLP)? This week on the show, Jodie Burchell, developer advocate for data science at JetBrains, talks about how machine learning (ML) models understand text.

Jodie explains how ML models require data in a structured format, which involves transforming text documents into columns and rows. She covers the most straightforward approach, called binary vectorization. We discuss the bag-of-words method and the tools of stemming, lemmatization, and count vectorization.
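As a rough illustration of the idea of turning documents into rows and columns, here is a minimal bag-of-words sketch using only the standard library. It is not code from the episode: the function names (`tokenize`, `build_vocabulary`, `count_vectorize`, `binary_vectorize`) and the two sample sentences are made up for this example, and the tokenizer is deliberately naive (no stemming, lemmatization, or stop-word removal).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Return a sorted list of every distinct token across all documents."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def count_vectorize(documents, vocabulary):
    """One row per document, one column per vocabulary word, cells hold counts."""
    rows = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        rows.append([counts[word] for word in vocabulary])
    return rows


def binary_vectorize(documents, vocabulary):
    """Same layout as count_vectorize, but cells only record presence (1) or absence (0)."""
    return [
        [1 if count > 0 else 0 for count in row]
        for row in count_vectorize(documents, vocabulary)
    ]


# Hypothetical sample documents, chosen only for illustration.
documents = ["the cat sat on the mat", "the dog chased the cat"]

vocabulary = build_vocabulary(documents)
count_rows = count_vectorize(documents, vocabulary)
binary_rows = binary_vectorize(documents, vocabulary)

print(vocabulary)
for row in count_rows:
    print(row)
```

Each document becomes a row and each vocabulary word a column, so the result can be handed to a classifier as an ordinary table. Word order is discarded, which is why the approach is called a "bag" of words.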

We jump into word embedding models next. Jodie talks about WordNet, Natural Language Toolkit (NLTK), Word2vec, and Gensim. Our conversation lays a foundation for starting with text classification, implementing sentiment analysis, and building projects using these tools. Jodie also shares multiple resources to help you continue exploring NLP and modeling.
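To give a feel for what a word embedding is, the sketch below compares a few word vectors with cosine similarity, a common way to measure how close two embeddings are. The three-dimensional vectors in `toy_embeddings` are invented by hand for illustration and are not output from Word2vec or Gensim; real embeddings are learned from a corpus and usually have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors (an assumption for this example, not trained values).
toy_embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "banana": [0.1, 0.05, 0.95],
}

king_queen = cosine_similarity(toy_embeddings["king"], toy_embeddings["queen"])
king_banana = cosine_similarity(toy_embeddings["king"], toy_embeddings["banana"])

print(f"king vs queen:  {king_queen:.3f}")
print(f"king vs banana: {king_banana:.3f}")
```

With these toy values, "king" and "queen" score much closer to 1 than "king" and "banana". A trained embedding model aims for the same property: words used in similar contexts end up with similar vectors.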

Course Spotlight: Learn Text Classification With Python and Keras

In this course, you’ll learn about Python text classification with Keras, working your way from a bag-of-words model with logistic regression to more advanced methods, such as convolutional neural networks. You’ll see how you can use pretrained word embeddings, and you’ll squeeze more performance out of your model through hyperparameter optimization.

Topics:

00:00:00 – Introduction
00:02:47 – Exploring the topic
00:06:00 – Perceived sentience of LaMDA
00:10:24 – How do we get started?
00:11:16 – What are classification and sentiment analysis?
00:13:03 – Transforming text into rows and columns
00:14:47 – Sponsor: Snyk
00:15:27 – Bag-of-words approach
00:19:12 – Stemming and lemmatization
00:22:05 – Capturing N-grams
00:25:34 – Count vectorization
00:27:14 – Stop words
00:28:46 – Term Frequency / Inverse Document Frequency (TF-IDF) vectorization
00:32:28 – Potential projects for bag-of-words techniques
00:34:07 – Video Course Spotlight
00:35:20 – WordNet and NLTK package
00:37:27 – Word embeddings and Word2vec
00:45:30 – Previous training and too many dimensions
00:50:07 – How to use Word2vec and Gensim?
00:51:26 – What types of projects for Word2vec and Gensim?
00:54:41 – Getting into GPT and BERT in another episode
00:56:11 – How to follow Jodie’s work?
00:57:36 – Thanks and goodbye
