Peter’s R — NLP resources

Peter Hahn
Jan 9, 2022

Text mining with R

For years I have been struggling with NLP in R. My first steps were based on the well-known book by Julia Silge and David Robinson: Text Mining with R.

The book builds on the "tidy" data format introduced by Hadley Wickham. This format has a specific structure:

- Each variable is a column
- Each observation is a row
- Each type of observational unit is a table

Applied to text mining, this means:

- one token per row.
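
As a minimal sketch of what "one token per row" looks like in practice, the snippet below uses `unnest_tokens()` from the tidytext package. The two example sentences and the column names `doc`, `text` and `word` are made up for illustration, and the code assumes that dplyr and tidytext are installed.

```r
library(dplyr)
library(tidytext)

# Two made-up example sentences, one document per row (illustrative data)
docs <- tibble(
  doc  = c(1, 2),
  text = c("Tidy data makes text mining easier.",
           "Each token gets its own row.")
)

# unnest_tokens() splits the text column into word tokens
# (lowercased, punctuation stripped by default) and returns
# one token per row, keeping the doc column alongside each word
tidy_docs <- docs %>%
  unnest_tokens(word, text)

print(tidy_docs)
```

The result is an ordinary tibble, so the usual dplyr verbs such as `count()` or `filter()` can be applied to the tokens directly.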

The basic techniques covered in the book (tokenization, TF-IDF and topic modelling) give an excellent introduction to tidy text and its basic applications.
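
To make TF-IDF a little more concrete, here is a small sketch using `bind_tf_idf()` from tidytext. The two toy documents and all object names are assumptions chosen for illustration, not taken from the book, and the code assumes that dplyr and tidytext are installed.

```r
library(dplyr)
library(tidytext)

# Two toy documents (illustrative data only)
docs <- tibble(
  doc  = c("a", "b"),
  text = c("the cat sat on the mat",
           "the dog chased the cat")
)

# Tokenize to one word per row, then count each word per document
word_counts <- docs %>%
  unnest_tokens(word, text) %>%
  count(doc, word, sort = TRUE)

# bind_tf_idf() adds the columns tf, idf and tf_idf;
# its arguments are the term column, the document column and the count column
scored <- word_counts %>%
  bind_tf_idf(word, doc, n) %>%
  arrange(desc(tf_idf))

print(scored)
```

Words that occur in both documents, such as "the" and "cat", get an inverse document frequency of zero, so only the words unique to one document receive a positive TF-IDF score.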

Machine learning for text analysis

One step further is the new textbook Supervised Machine Learning for Text Analysis in R by Emil Hvitfeldt and Julia Silge.

This book gives deeper insights into basic techniques such as tokenization, stop words, stemming and word embeddings. It also covers machine learning methods such as regression, classification and deep learning.

Basic knowledge of the tidyverse and of the material covered in Text Mining with R (see above) is helpful.

Natural language processing for prediction

Recently I discovered this series of stories on Medium:

Natural Language Processing for predictive purposes with R, written by Jurrian van Nagelkerke and Wouter van Gils.

The authors meticulously describe their efforts to predict the next Michelin star based on restaurant reviews from guests. The articles cover basic bag-of-words techniques as well as more advanced ones, such as embeddings and transformers.

Reproducing their steps and applying them to my recent problems advanced my knowledge.

Semantic information extraction

This video is a presentation from Why-R 2021. It gave me some deeper insights into embeddings and their use in sentences.

I hope these resources are useful for your further studies. My next story will cover some keyboard shortcuts within RStudio and some useful macros I have written with Keyboard Maestro to ease my coding within RStudio. You can find more stories about R here: Peter’s R

Stay tuned.

If you enjoy reading this and want to support my further writing, consider signing up as a Medium member. You’ll get full access to all stories on Medium. If you sign up using my link, I’ll earn a small commission.

[https://kphahn57.medium.com/membership]
