Occupancy from beginning and end date with R

Tidyverse and purrr

Jul 12, 2023

Often you face the following problem: you have the beginning and end dates of several stays, e.g. patients in a hospital, guests in a hotel or cars in a parking lot, and you want to know how many people or cars are present on each day (the occupancy). How can you solve that problem in R?

Necessary libraries

library(tidyverse)

Data

Here are some example data:

# aday = begin (admission) day, eday = end (discharge) day of each stay
df_stat <- structure(list(
  aufnr = c("282", "339", "258", "415", "210",
            "357", "436", "421", "382", "059"),
  aday = structure(c(17903, 17904, 17906, 17910, 17911,
                     17913, 17917, 17918, 17920, 17924), class = "Date"),
  eday = structure(c(17904, 17916, 17908, 17921, 17919,
                     17919, 17922, 17929, 17926, 17929), class = "Date")),
  row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

Assign it to df_stat (the name used in the rest of the code) to get this tibble:

   aufnr aday       eday      
   <chr> <date>     <date>    
 1 282   2019-01-07 2019-01-08
 2 339   2019-01-08 2019-01-20
 3 258   2019-01-10 2019-01-12
 4 415   2019-01-14 2019-01-25
 5 210   2019-01-15 2019-01-23
 6 357   2019-01-17 2019-01-23
 7 436   2019-01-21 2019-01-26
 8 421   2019-01-22 2019-02-02
 9 382   2019-01-24 2019-01-30
10 059   2019-01-28 2019-02-02
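If you prefer to type the example data by hand instead of pasting the structure() output, a minimal sketch (assuming only the tibble package) builds the same tibble from ISO date strings with as.Date():

```r
library(tibble)

# Same ten stays as above: aufnr = stay ID, aday = admission day, eday = discharge day
df_stat <- tibble(
  aufnr = c("282", "339", "258", "415", "210",
            "357", "436", "421", "382", "059"),
  aday  = as.Date(c("2019-01-07", "2019-01-08", "2019-01-10", "2019-01-14", "2019-01-15",
                    "2019-01-17", "2019-01-21", "2019-01-22", "2019-01-24", "2019-01-28")),
  eday  = as.Date(c("2019-01-08", "2019-01-20", "2019-01-12", "2019-01-25", "2019-01-23",
                    "2019-01-23", "2019-01-26", "2019-02-02", "2019-01-30", "2019-02-02"))
)

df_stat
```

Dates are stored internally as days since 1970-01-01, which is why the structure() version shows numbers such as 17903 (= 2019-01-07).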

Build a tibble with all dates

The first step is to build a tibble with all dates from the first admission to the last discharge.

date_sequence <- seq(min(df_stat$aday), max(df_stat$eday), by = "day")

census_data <- tibble(date = date_sequence)

The first command builds the sequence of all days from the first admission day (min of aday) to the last discharge day (max of eday), both included. This is the sequence:

"2019-01-07" "2019-01-08" "2019-01-09" "2019-01-10"
"2019-01-11" "2019-01-12" "2019-01-13" "2019-01-14"
"2019-01-15" "2019-01-16" "2019-01-17" "2019-01-18"
"2019-01-19" "2019-01-20" "2019-01-21" "2019-01-22"
"2019-01-23" "2019-01-24" "2019-01-25" "2019-01-26"
"2019-01-27" "2019-01-28" "2019-01-29" "2019-01-30"
"2019-01-31" "2019-02-01" "2019-02-02"

The second command turns this vector into a tibble with a single column, date.
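Because seq() includes both endpoints, the tibble should have one row per calendar day, i.e. the number of days between the first admission and the last discharge plus one. A quick sanity check, sketched in base R with the first and last dates of the example data typed in:

```r
# First admission and last discharge in the example data
first_day <- as.Date("2019-01-07")
last_day  <- as.Date("2019-02-02")

date_sequence <- seq(first_day, last_day, by = "day")

# Both endpoints are included, so the length is the difference in days plus one
n_days <- as.integer(last_day - first_day) + 1
length(date_sequence) == n_days  # TRUE
n_days                           # 27
```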

Count the occupancy for every date

census <- date_sequence %>%
  map_dbl(~ sum(.x >= df_stat$aday & .x <= df_stat$eday))

census_data$num <- census

This shows how concisely the map functions from the purrr package express the computation.
map_dbl() applies a function to every element of the input and returns the results as a double vector. The tilde ~ is purrr's shorthand for an anonymous function whose argument is .x.
For every date .x in date_sequence, the command counts the stays in df_stat for which .x lies between the admission day (aday) and the discharge day (eday), both days included: the comparison yields one TRUE or FALSE per stay, and sum() counts the TRUE values.
The second line adds the results as the column num to census_data.
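To make the shorthand concrete, the sketch below writes the formula ~ sum(...) out as an ordinary function of one date; the helper name count_present is hypothetical. It also shows a modelling choice worth checking: because of <=, a stay is counted on its discharge day, so on 2019-01-08 both the departing stay 282 and the arriving stay 339 are counted. If you want a "midnight census" that does not count the discharge day, an assumed variant simply uses < instead:

```r
library(tibble)
library(purrr)

df_stat <- tibble(
  aufnr = c("282", "339", "258", "415", "210",
            "357", "436", "421", "382", "059"),
  aday  = as.Date(c("2019-01-07", "2019-01-08", "2019-01-10", "2019-01-14", "2019-01-15",
                    "2019-01-17", "2019-01-21", "2019-01-22", "2019-01-24", "2019-01-28")),
  eday  = as.Date(c("2019-01-08", "2019-01-20", "2019-01-12", "2019-01-25", "2019-01-23",
                    "2019-01-23", "2019-01-26", "2019-02-02", "2019-01-30", "2019-02-02"))
)
date_sequence <- seq(min(df_stat$aday), max(df_stat$eday), by = "day")

# Hypothetical helper: the purrr formula written as a named function of one date d
count_present <- function(d) sum(d >= df_stat$aday & d <= df_stat$eday)
census <- map_dbl(date_sequence, count_present)

# Assumed variant: discharge day excluded ("midnight census")
census_midnight <- map_dbl(date_sequence,
                           function(d) sum(d >= df_stat$aday & d < df_stat$eday))

tibble(date = date_sequence, num = census, num_midnight = census_midnight)
```

Which convention is right depends on how your institution defines occupancy; the article's code uses the inclusive one.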

Plot the data

We can build a tsibble (tidy temporal data) from the result and plot it with autoplot(). This needs two extra libraries:

library(tsibble)
library(fable)
census_ts <- as_tsibble(census_data, index = date)
autoplot(census_ts)
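If you do not need the tsibble/fable stack, a minimal sketch with ggplot2 alone draws the same series. geom_step() is a choice made here, not in the original, because occupancy changes in whole-day steps; the counts are recomputed in base R so the block runs on its own:

```r
library(ggplot2)

# Example stays: admission (aday) and discharge (eday) days
aday <- as.Date(c("2019-01-07", "2019-01-08", "2019-01-10", "2019-01-14", "2019-01-15",
                  "2019-01-17", "2019-01-21", "2019-01-22", "2019-01-24", "2019-01-28"))
eday <- as.Date(c("2019-01-08", "2019-01-20", "2019-01-12", "2019-01-25", "2019-01-23",
                  "2019-01-23", "2019-01-26", "2019-02-02", "2019-01-30", "2019-02-02"))

# Occupancy per day, discharge day included (same rule as the article)
date <- seq(min(aday), max(eday), by = "day")
num  <- vapply(date, function(d) sum(d >= aday & d <= eday), integer(1))
census_data <- data.frame(date = date, num = num)

p <- ggplot(census_data, aes(x = date, y = num)) +
  geom_step() +
  labs(x = "Date", y = "Occupancy")
p
```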