Peter’s R — Solving prolonged waiting-times with tidymodels P1

Part 1: Problem and data

Peter Hahn
Sep 29, 2022

Complaints about waiting times in a hospital are a frequent problem, and our hospital for orthopedic surgery is no exception. I’m the head of the hand surgery unit. Approximately 80% of our operations are outpatient procedures and 20% are inpatient procedures.

Histogram of waiting time in minutes

Waiting time has a mean of 130 minutes and a median of 115 minutes. Because the mean lies above the median, the distribution is right-skewed, with a long tail of very long waits.
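The relationship between mean and median can be checked quickly in base R. The following is only a minimal sketch: the vector `t_diff` is filled with synthetic gamma-distributed values (an assumption made for illustration, not the hospital data), so the numbers will differ from those reported above.

```r
# Synthetic, skewed stand-in for t_diff in minutes (NOT the hospital data)
set.seed(42)
t_diff <- rgamma(1000, shape = 4, scale = 30)

mean_wait   <- mean(t_diff)
median_wait <- median(t_diff)

# A mean above the median points to a tail toward long waiting times
c(mean = mean_wait, median = median_wait)
mean_wait > median_wait

# Histogram comparable to the one shown in the article
hist(t_diff, breaks = 30,
     main = "Waiting time (synthetic data)", xlab = "Minutes")
```

With the real export, `t_diff` would simply be the corresponding column of the data frame.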

To analyze this problem, I exported the data of all operations from 2019 to September 2022 from our HIS (hospital information system). Rather than jumping straight to the final result, I will walk you through my reasoning.

After some preprocessing, I identified the following variables for a first round of analysis and modeling. The variable names are a mix of German and English.

Variables

age: age of the patient
arzt_code: name of the doctor (encoded)
t_diff: time from admission to operation (variable to be optimized)
p2p: time from the beginning of an operation until the beginning of the next operation
ops: Code of operating procedure (ICPM) abbreviated
icd: International Classification of Diseases abbreviated
stat: outpatient (1) or inpatient (0)
sn_zeit: Schnitt-Naht-Zeit (incision-to-suture time, from the first cut to the final suture)

Data

The data is available at https://github.com/phahn57/medium_wz, so you can follow every step yourself and experiment with it as you like. The repository also contains some of the code I used, with one .qmd file for each part of the analysis.

Exploratory data analysis

The first step is always EDA. I will condense it here to a few basic steps.

Basic EDA with skimr

We have three character variables and five numeric variables. Two of the character variables, ops and icd, still have many unique values even in their abbreviated form.
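An overview like this can be produced with `skim()` from the skimr package. The sketch below is an illustration only: `wait_demo` is a small synthetic data frame that merely mimics the column names and types listed above, and all values and codes in it are made-up placeholders.

```r
library(skimr)

# Synthetic stand-in with the same column names and types as the article's data.
# All values and codes are made-up placeholders, not hospital data.
set.seed(1)
n <- 200
wait_demo <- data.frame(
  age       = sample(18:90, n, replace = TRUE),
  arzt_code = sample(c("doc_a", "doc_b", "doc_c"), n, replace = TRUE),
  t_diff    = rgamma(n, shape = 4, scale = 30),
  p2p       = rgamma(n, shape = 6, scale = 10),
  ops       = sample(c("ops_a", "ops_b", "ops_c"), n, replace = TRUE),
  icd       = sample(c("icd_a", "icd_b", "icd_c"), n, replace = TRUE),
  stat      = rbinom(n, 1, 0.8),
  sn_zeit   = rgamma(n, shape = 3, scale = 10),
  stringsAsFactors = FALSE
)

# One summary row per variable, grouped by variable type
overview <- skim(wait_demo)
overview

# Count the variables per type (character vs. numeric)
table(overview$skim_type)
```

For character columns the summary includes the number of unique values, which is where high-cardinality variables become visible.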

You can get more detailed insights with the “explore” library, which works interactively.

image of the interactive “explore” library

ggplot2 can produce additional plots, for example a boxplot of the operating time per surgeon:

wait %>% ggplot(aes(arzt_code,sn_zeit)) + geom_boxplot()

Problem analysis

The aim is to reduce t_diff, the time from admission until the beginning of the operation. This time depends on several variables within the operation, such as sn_zeit, and on other variables that are not in the data. The right admission time also depends on the number and length of the preceding operations. It therefore seems best to predict p2p or sn_zeit for each operation and to sum these times from the beginning of the day in order to calculate the admission schedule for each patient.
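The scheduling idea can be sketched in a few lines of base R. Everything concrete here is an assumption made for illustration: the predicted durations in `predicted_p2p`, the 08:00 start of the day, and the `lead_time` a patient needs between admission and operation are hypothetical values, not results from the model.

```r
# Hypothetical predicted p2p values (minutes) for one day's operations, in order
predicted_p2p <- c(45, 30, 60, 25)

# Assumed start of the first operation and assumed preparation time per patient
first_start <- as.POSIXct("2022-09-29 08:00", tz = "UTC")
lead_time   <- 60  # minutes between admission and start of operation

# p2p runs from the start of one operation to the start of the next,
# so each start time is the sum of all preceding p2p values
start_offset <- c(0, cumsum(head(predicted_p2p, -1)))
op_start     <- first_start + start_offset * 60   # POSIXct arithmetic is in seconds
admission    <- op_start - lead_time * 60

schedule <- data.frame(
  slot      = seq_along(predicted_p2p),
  op_start  = format(op_start, "%H:%M"),
  admission = format(admission, "%H:%M")
)
schedule
```

The later a patient's slot, the more the prediction errors of all preceding operations accumulate, which is why the quality of the p2p or sn_zeit model matters for the whole day.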

Modeling

With some experience from Kaggle competitions, I decided to use machine learning to solve this problem. Just at that time, the new book by Max Kuhn and Julia Silge, Tidy Modeling with R, was published; it is also available online. Because I wanted to dive deeper into the tidyverse, I completed the entire project with tidy modeling. Along the way I read the book chapter by chapter and improved my project step by step.

Book cover

Tidy modeling

The tidymodels packages follow the design philosophy of the tidyverse by Hadley Wickham.
“The tidymodels framework is like mlr or caret in adopting a unification of the function interface, as well as enforcing consistency in the function names and return values.” We will see the advantages in detail in the following parts of the journey.

Part 2

Part 2 will focus on a basic linear model and will introduce data splitting, recipes, workflows, and model fitting.
Part 2 is here

Before you leave

Unlike my previous Kaggle competitions, and apart from a few small projects, this is my first project that is deployed in production … today. So you can follow me from the very beginning through deployment and, hopefully, positive results.

Stay tuned.

Written by Peter Hahn

Former Hand surgeon now busy with Data Science, Rstat, Machine learning, Aikido