The Power of BERT: NLP Topic Modelling and Analyzing Podcast Transcripts
Recently, while scrolling through podcast episodes, I had the idea to run topic modelling on podcast transcripts to help me decide whether an episode covered topics interesting enough to listen to. This article covers how I did that and what the results looked like.
BERT: Bidirectional Encoder Representations from Transformers
BERT's architecture is built on self-attention mechanisms. If you want to learn more about BERT, I recommend reading this book:
BERT outputs word embeddings, which can be used for a variety of tasks such as text summarization and topic modelling. Unlike traditional topic modelling / NLP techniques, BERT doesn't require preprocessing steps such as stemming, lemmatizing, or stop-word removal. In this case I am going to use BERT's topic modelling abilities through the BERTopic library.
If you wish to try it yourself, you can find BERTopic here: https://pypi.org/project/bertopic/
The Podcast Episode:
I picked the most recent podcast episode here. Honestly, the title alone is a good enough description, but let's see whether BERT can pick out the topics mentioned in the title (and perhaps other side conversations). I used Selenium to scrape the transcript from this site; if you want to learn how to web scrape, see: https://www.scrapingbee.com/blog/selenium-python/
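The scraping step can be sketched roughly as below. The URL and CSS selector are placeholders (the actual transcript page will use different markup), and the sentence splitter is a naive regex I use here for illustration; Selenium's imports are kept inside the function so the splitter can be reused on its own.

```python
import re


def split_sentences(text):
    """Naively split a transcript into sentences to feed BERTopic one per document."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def scrape_transcript(url, selector="div.transcript"):
    """Load a page with Selenium and return its transcript as a list of sentences.

    `selector` is a hypothetical placeholder; inspect the real page to find
    the element that actually holds the transcript text.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumes chromedriver is on PATH
    try:
        driver.get(url)
        text = driver.find_element(By.CSS_SELECTOR, selector).text
    finally:
        driver.quit()
    return split_sentences(text)
```

For the episode I used, this kind of scrape-and-split yielded roughly 1000 sentences to model.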
There are many articles on how to use BERT; here's one that may help if you want to get started with topic modelling: https://towardsdatascience.com/topic-modeling-with-bert-779f7db187e6
Evidently the BERT model picks up relevant topics from the transcript, very similar to the title of the podcast (albeit with slightly cryptic names):
- Meaning_life_love
- religion_religious_theism_religions
- wisdom_rationality_puzzle_solving
- myths_patterns_stories_mythos
- sin_immoral_evil_immortality
Other interesting subjects:
- consciousness_unconcious_do_concious
- cognition_congitive_distributed_science
- flow_state_induction_need
- bullshit_deception_truth
- illusion_reality_we_math
- data_neural_networks_overfitting
- Death_mortality_problematic_die
- Video_games_world_game
It also picks up moments when the two speakers agree or disagree, under the category no_yes_yeah_very. This is interesting, as it indicates that during these time frames the two speakers share similar or dissimilar views (or simply a misunderstanding followed by a clarification) on whatever subject they were discussing:
Another interesting topic is also picked up: shampoo.
To create these visuals I used the Streamlit package. You can try it out here: https://docs.streamlit.io/library/get-started
Conclusion: Overall, BERT is really good at identifying relevant topics. At the same time, it does generate some garbage topics, which can likely be addressed by tuning hyperparameters. Training took less than 2 minutes on roughly 1000 sentences, and the whole project was up and running in less than 3 hours. However, BERT has some issues when creating topic names: they often come out awkward and reordered. If you know why this happens, I'd love to hear your explanation!