Occupancy from beginning and end date with R

Tidyverse and purrr

Peter Hahn
3 min read · Jul 12, 2023
Photo by Jordan Graff on Unsplash

A common problem: you have the start and end dates of several stays, for example in a hospital or hotel, or of cars in a parking lot, and you want to know how many people or cars are present on each day (the occupancy). How can you solve this in R?

Necessary libraries

library(tidyverse)

Data

Here are some example data:

# Assign the example data to df_stat, the name used in the code below
df_stat <- structure(list(
  aufnr = c("282", "339", "258", "415", "210",
            "357", "436", "421", "382", "059"),
  aday = structure(c(17903, 17904, 17906, 17910, 17911,
                     17913, 17917, 17918, 17920, 17924), class = "Date"),
  eday = structure(c(17904, 17916, 17908, 17921, 17919,
                     17919, 17922, 17929, 17926, 17929), class = "Date")),
  row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

which you can use to build this tibble:

   aufnr aday       eday
   <chr> <date>     <date>
 1 282   2019-01-07 2019-01-08
 2 339   2019-01-08 2019-01-20
 3 258   2019-01-10 2019-01-12
 4 415   2019-01-14 2019-01-25
 5 210   2019-01-15 2019-01-23
 6 357   2019-01-17 2019-01-23
 7 436   2019-01-21 2019-01-26
 8 421   2019-01-22 2019-02-02
 9 382   2019-01-24 2019-01-30
10 059   2019-01-28 2019-02-02

Build a tibble with all dates

The first step is to build a tibble with all dates from the first admission to the last discharge.

date_sequence <- seq(min(df_stat$aday), max(df_stat$eday), by = "day")

census_data <- tibble(
date = date_sequence)

The first command builds a sequence of all dates from the first admission to the last discharge. This is the sequence:

"2019-01-07" "2019-01-08" "2019-01-09" "2019-01-10"
"2019-01-11" "2019-01-12" "2019-01-13" "2019-01-14"
"2019-01-15" "2019-01-16" "2019-01-17" "2019-01-18"
"2019-01-19" "2019-01-20" "2019-01-21" "2019-01-22"
"2019-01-23" "2019-01-24" "2019-01-25" "2019-01-26"
"2019-01-27" "2019-01-28" "2019-01-29" "2019-01-30"
"2019-01-31" "2019-02-01" "2019-02-02"

The second command builds a tibble from the sequence (vector).
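As a quick sanity check on the calendar, here is a minimal sketch in base R. It assumes only the first admission and last discharge dates from the example data; the names first_day, last_day and n_days are illustrative and not part of the original code.

```r
# First admission and last discharge from the example data
first_day <- as.Date("2019-01-07")
last_day  <- as.Date("2019-02-02")

# seq() on Date objects with by = "day" includes both endpoints
date_sequence <- seq(first_day, last_day, by = "day")

# Number of calendar days covered, counting both endpoints
n_days <- as.integer(last_day - first_day) + 1L

# The sequence should have exactly one entry per calendar day
length(date_sequence) == n_days
```

If the two numbers differ, the most likely cause is that one of the columns is not of class Date.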

Count the occupancy for every date

census <- date_sequence %>%
map_dbl(~ sum(.x >= df_stat$aday & .x <= df_stat$eday))

census_data$num <- census

This shows how concise the code becomes with the map functions from the purrr package.
map_dbl returns the result of the mapping as a double vector. The tilde ~ is shorthand for an anonymous function.
The command iterates over date_sequence and, for every date .x, counts the stays in df_stat for which .x lies between the admission date (aday) and the discharge date (eday), both inclusive.
The second command adds the result to census_data as the column num.
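To make the counting step concrete, the following sketch works through a single date by hand and then repeats the count for every date in base R. It rebuilds the admission and discharge dates of the example data as a plain data frame; count_present and one_day are hypothetical names introduced for illustration, and vapply stands in for map_dbl here only to show the logic without purrr.

```r
# Example stays, same dates as in the example data above
df_stat <- data.frame(
  aday = as.Date(c("2019-01-07", "2019-01-08", "2019-01-10", "2019-01-14",
                   "2019-01-15", "2019-01-17", "2019-01-21", "2019-01-22",
                   "2019-01-24", "2019-01-28")),
  eday = as.Date(c("2019-01-08", "2019-01-20", "2019-01-12", "2019-01-25",
                   "2019-01-23", "2019-01-23", "2019-01-26", "2019-02-02",
                   "2019-01-30", "2019-02-02"))
)

# Hypothetical helper: number of stays that include a given day
# (admission day and discharge day both count)
count_present <- function(day) {
  sum(day >= df_stat$aday & day <= df_stat$eday)
}

# One date worked by hand
one_day <- as.Date("2019-01-22")
present <- one_day >= df_stat$aday & one_day <= df_stat$eday
which(present)  # rows 4 to 8 are present on that day
sum(present)    # TRUE counts as 1, so the occupancy is 5

# The same count for every day; indexing keeps the Date class intact
date_sequence <- seq(min(df_stat$aday), max(df_stat$eday), by = "day")
census <- vapply(seq_along(date_sequence),
                 function(i) count_present(date_sequence[i]),
                 integer(1))
```

A useful cross-check is that sum(census) equals the total number of occupied days, sum(df_stat$eday - df_stat$aday + 1). If the discharge day should not count as occupied, the second comparison would become day < df_stat$eday; that is a modelling choice, not something the original code does.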

Plot the data

We can build a tsibble (tidy temporal data) from the result and plot it with autoplot. This needs two extra libraries.

library(tsibble)
library(fable)
census_ts <- as_tsibble(census_data, index = date)
autoplot(census_ts)
Values from the tibble plotted with autoplot
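If tsibble and fable are not available, a similar picture can be drawn with ggplot2 directly. The sketch below is an alternative to the autoplot call, not the method used above; it uses only the first week of the example data, with counts taken from the occupancy rule described earlier, and the step geometry and axis labels are assumptions.

```r
library(ggplot2)

# First week of the example data: one row per day with its occupancy
census_data <- data.frame(
  date = seq(as.Date("2019-01-07"), as.Date("2019-01-13"), by = "day"),
  num  = c(1, 2, 1, 2, 2, 2, 1)
)

# A step line suits counts that change only from one day to the next
p <- ggplot(census_data, aes(x = date, y = num)) +
  geom_step() +
  labs(x = "Date", y = "Occupancy")

p
```

With the full census_data tibble from the steps above, the same ggplot call applies unchanged because it only relies on the columns date and num.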
