Occupancy from beginning and end date with R

Tidyverse and purrr

Peter Hahn
Jul 12, 2023

A common problem: you have the start and end dates of many stays, e.g. patients in a hospital, guests in a hotel, or cars in a parking lot. You want to know how many people or cars are present on each day (the occupancy). How can you solve this in R?

Necessary libraries
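The code for this step is not shown here. Judging from the functions used later (tibble(), %>%, map_dbl(), as_tsibble(), autoplot()), a plausible setup is the following. The exact package list is an assumption:

```r
# Assumed setup, reconstructed from the functions used in this post
library(tidyverse)   # tibble(), the %>% pipe and purrr's map_dbl()
library(tsibble)     # as_tsibble() for the plotting step
library(fabletools)  # autoplot() method for tsibble objects
```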



Here are some example data (as printed by dput()). The code below refers to them as df_stat:

df_stat <- structure(list(aufnr = c("282", "339", "258", "415", "210",
                                    "357", "436", "421", "382", "059"),
    aday = structure(c(17903, 17904, 17906, 17910, 17911,
                       17913, 17917, 17918, 17920, 17924), class = "Date"),
    eday = structure(c(17904, 17916, 17908, 17921, 17919,
                       17919, 17922, 17929, 17926, 17929), class = "Date")),
  row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

Running this code creates the following tibble. aufnr is the admission number, aday the admission date and eday the discharge (end) date:

   aufnr aday       eday
   <chr> <date>     <date>
 1 282   2019-01-07 2019-01-08
 2 339   2019-01-08 2019-01-20
 3 258   2019-01-10 2019-01-12
 4 415   2019-01-14 2019-01-25
 5 210   2019-01-15 2019-01-23
 6 357   2019-01-17 2019-01-23
 7 436   2019-01-21 2019-01-26
 8 421   2019-01-22 2019-02-02
 9 382   2019-01-24 2019-01-30
10 059   2019-01-28 2019-02-02
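The structure() call stores dates as day counts since 1970-01-01, which is hard to read. A sketch of an equivalent way to build the same tibble with tibble() and ISO date strings. The check at the end is an extra safeguard and not part of the original post:

```r
library(tibble)

# Same example data as the structure() call, written with readable dates
df_stat <- tibble(
  aufnr = c("282", "339", "258", "415", "210",
            "357", "436", "421", "382", "059"),
  aday = as.Date(c("2019-01-07", "2019-01-08", "2019-01-10", "2019-01-14",
                   "2019-01-15", "2019-01-17", "2019-01-21", "2019-01-22",
                   "2019-01-24", "2019-01-28")),
  eday = as.Date(c("2019-01-08", "2019-01-20", "2019-01-12", "2019-01-25",
                   "2019-01-23", "2019-01-23", "2019-01-26", "2019-02-02",
                   "2019-01-30", "2019-02-02"))
)

# Sanity check: no stay may end before it starts
stopifnot(all(df_stat$eday >= df_stat$aday))
df_stat
```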

Build a tibble with all dates

The first step is to build a tibble with all dates from the first admission to the last discharge.

date_sequence <- seq(min(df_stat$aday), max(df_stat$eday), by = "day")

census_data <- tibble(date = date_sequence)

The first command builds a sequence of all days from the earliest admission to the latest discharge. This is the sequence:

"2019-01-07" "2019-01-08" "2019-01-09" "2019-01-10"
"2019-01-11" "2019-01-12" "2019-01-13" "2019-01-14"
"2019-01-15" "2019-01-16" "2019-01-17" "2019-01-18"
"2019-01-19" "2019-01-20" "2019-01-21" "2019-01-22"
"2019-01-23" "2019-01-24" "2019-01-25" "2019-01-26"
"2019-01-27" "2019-01-28" "2019-01-29" "2019-01-30"
"2019-01-31" "2019-02-01" "2019-02-02"

The second command builds a tibble from the sequence (vector).
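seq() with by = "day" includes both endpoints, so the sequence has (last day − first day) + 1 elements. A small check, with the two boundary dates copied from the example data so the block runs on its own:

```r
library(tibble)

# Earliest admission and latest discharge of the example data
first_day <- as.Date("2019-01-07")  # min(df_stat$aday)
last_day  <- as.Date("2019-02-02")  # max(df_stat$eday)

date_sequence <- seq(first_day, last_day, by = "day")
census_data <- tibble(date = date_sequence)

length(date_sequence)                 # 27
as.numeric(last_day - first_day) + 1  # also 27: both ends are included
```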

Count the occupancy for every date

# For each date, count the stays with aday <= date <= eday
census <- date_sequence %>%
  map_dbl(~ sum(.x >= df_stat$aday & .x <= df_stat$eday))

# Add the counts as a new column
census_data$num <- census

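This shows how concisely you can work with the map functions from the purrr package.
map_dbl() applies a function to each element of its input and returns the results as a double (numeric) vector. The tilde ~ is purrr's short form for an anonymous function, in which .x stands for the current element.
For every date in the sequence (.x), the function compares .x with all stays in df_stat and sums the resulting TRUE/FALSE vector. In other words, it counts the stays for which .x lies between the admission date (aday) and the discharge date (eday). Both ends are included, so a stay is counted on its admission day and on its discharge day.
The second line adds the results as a new column, num.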
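Putting the steps together, here is a self-contained sketch of the same count written with dplyr::mutate(), so the column is created inside the pipe instead of being assigned with $. The second column, num_midnight, is a hypothetical variant that is not part of the original post. It uses < instead of <= for eday, so the discharge day is not counted, similar to a hospital midnight census:

```r
library(tibble)
library(dplyr)
library(purrr)

# Example stays (same dates as above, admission numbers omitted)
df_stat <- tibble(
  aday = as.Date(c("2019-01-07", "2019-01-08", "2019-01-10", "2019-01-14",
                   "2019-01-15", "2019-01-17", "2019-01-21", "2019-01-22",
                   "2019-01-24", "2019-01-28")),
  eday = as.Date(c("2019-01-08", "2019-01-20", "2019-01-12", "2019-01-25",
                   "2019-01-23", "2019-01-23", "2019-01-26", "2019-02-02",
                   "2019-01-30", "2019-02-02"))
)

census_data <- tibble(
  date = seq(min(df_stat$aday), max(df_stat$eday), by = "day")
) %>%
  mutate(
    # As in the post: admission day and discharge day both count
    num = map_dbl(date, ~ sum(.x >= df_stat$aday & .x <= df_stat$eday)),
    # Hypothetical variant: the discharge day is not counted
    num_midnight = map_dbl(date, ~ sum(.x >= df_stat$aday & .x < df_stat$eday))
  )

census_data
```

With the example data, num adds up to 77 patient-days and peaks at 5 on 2019-01-22 and 2019-01-23. num_midnight adds up to 67, one day less per stay. Which convention fits depends on the question. A hotel that counts nights, for instance, would use the second one.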

Plot the data

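We can build a tsibble (tidy temporal data) from the result and plot it with autoplot(). For this we need two extra libraries: tsibble for as_tsibble() and a package that provides the autoplot() method for tsibbles, such as fabletools.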

census_ts <- as_tsibble(census_data, index = date)
autoplot(census_ts, num)
Values from the tibble plotted with autoplot
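If you would rather not convert to a tsibble, plain ggplot2 can draw the same series. A minimal sketch: the counts below are the values the map_dbl() step produces for the example data, typed in so the block runs on its own. geom_step() is an alternative to the post's autoplot(), chosen because the occupancy stays constant within each day:

```r
library(tibble)
library(ggplot2)

# Daily occupancy for the example data (result of the map_dbl() step)
census_data <- tibble(
  date = seq(as.Date("2019-01-07"), as.Date("2019-02-02"), by = "day"),
  num  = c(1, 2, 1, 2, 2, 2, 1, 2, 3, 3, 4, 4, 4, 4,
           4, 5, 5, 4, 4, 3, 2, 3, 3, 3, 2, 2, 2)
)

# Step plot: the count only changes from one day to the next
p <- ggplot(census_data, aes(x = date, y = num)) +
  geom_step() +
  labs(x = "Date", y = "Occupancy")

print(p)
```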




Former Hand surgeon now busy with Data Science, Rstat, Machine learning, Aikido
