Occupancy from beginning and end date with R
Tidyverse and purrr
A common problem: you have the start and end dates of several stays, for example in a hospital or a hotel, or of cars in a parking lot, and you want to know how many people or cars are present on each day (the occupancy). How can you solve this in R?
Necessary libraries
library(tidyverse)
Data
Here are some example data:
df_stat <- structure(list(aufnr = c("282", "339", "258", "415", "210",
"357", "436", "421", "382", "059"),
aday = structure(c(17903, 17904, 17906, 17910, 17911,
17913, 17917, 17918, 17920, 17924), class = "Date"),
eday = structure(c(17904, 17916, 17908, 17921, 17919,
17919, 17922, 17929, 17926, 17929), class = "Date")),
row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
which you can use to build this tibble:
aufnr aday eday
<chr> <date> <date>
1 282 2019-01-07 2019-01-08
2 339 2019-01-08 2019-01-20
3 258 2019-01-10 2019-01-12
4 415 2019-01-14 2019-01-25
5 210 2019-01-15 2019-01-23
6 357 2019-01-17 2019-01-23
7 436 2019-01-21 2019-01-26
8 421 2019-01-22 2019-02-02
9 382 2019-01-24 2019-01-30
10 059 2019-01-28 2019-02-02
Build a tibble with all dates
The first step is to build a tibble with all dates from the first admission to the last discharge.
date_sequence <- seq(min(df_stat$aday), max(df_stat$eday), by = "day")
census_data <- tibble(
date = date_sequence)
The first command builds a sequence of all dates from the earliest admission date (aday) to the latest discharge date (eday). This is the sequence:
"2019-01-07" "2019-01-08" "2019-01-09" "2019-01-10"
"2019-01-11" "2019-01-12" "2019-01-13" "2019-01-14"
"2019-01-15" "2019-01-16" "2019-01-17" "2019-01-18"
"2019-01-19" "2019-01-20" "2019-01-21" "2019-01-22"
"2019-01-23" "2019-01-24" "2019-01-25" "2019-01-26"
"2019-01-27" "2019-01-28" "2019-01-29" "2019-01-30"
"2019-01-31" "2019-02-01" "2019-02-02"
The second command builds a tibble from the sequence (vector).
Count the occupancy for every date
census <- date_sequence %>%
  map_dbl(~ sum(.x >= df_stat$aday & .x <= df_stat$eday))
census_data$num <- census
This shows how concisely we can work with the map functions from the purrr library.
map_dbl returns the result of the mapping as a double vector. The tilde ~ is shorthand for an anonymous function.
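To make the shorthand concrete, here is a minimal sketch that writes the same mapping twice, once with the tilde and once as an explicit function. The vector x and the names doubled_formula and doubled_function are only illustrative and are not part of the occupancy example.

```r
library(purrr)

x <- c(1, 2, 3)

# Formula shorthand: .x stands for the current element
doubled_formula <- map_dbl(x, ~ .x * 2)

# The same mapping written as an explicit anonymous function
doubled_function <- map_dbl(x, function(value) value * 2)

# Both calls should return the same double vector
identical(doubled_formula, doubled_function)
```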
The command iterates over date_sequence and compares each entry (.x) with df_stat. For every date it counts the stays for which .x lies between the beginning (aday) and the end (eday), both days included.
The second command adds the result as the column num.
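The counting logic does not depend on purrr. As a rough sketch, the same idea can be written in base R with vapply. The example uses only the first three stays from the data above so that the result can be checked by hand; the names stays, days and occupancy are chosen for this sketch.

```r
# First three stays from the example data (admission and discharge dates)
stays <- data.frame(
  aday = as.Date(c("2019-01-07", "2019-01-08", "2019-01-10")),
  eday = as.Date(c("2019-01-08", "2019-01-20", "2019-01-12"))
)

# All days from the first admission to the last discharge
days <- seq(min(stays$aday), max(stays$eday), by = "day")

# For each day, count the stays whose interval contains that day
# (both the admission day and the discharge day count as present)
occupancy <- vapply(
  seq_along(days),
  function(i) sum(days[i] >= stays$aday & days[i] <= stays$eday),
  numeric(1)
)

data.frame(date = days, num = occupancy)
```

On 2019-01-08 two stays overlap (the first ends, the second begins), so that day is counted as 2. If the discharge day should not count as occupied, the second comparison would have to be < instead of <=.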
Plot the data
We can build a tsibble (tidy temporal data) from the result and plot it using autoplot. We need two extra libraries.
library(tsibble)
library(fable)
census_ts <- as_tsibble(census_data, index = date)
autoplot(census_ts)
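If you prefer not to load the extra libraries, a plain ggplot2 line chart is one possible alternative. The sketch below assumes a data frame with the same column names as census_data (date and num); census_demo is a small stand-in created only for this example.

```r
library(ggplot2)

# Small stand-in for census_data with the same column names (date, num)
census_demo <- data.frame(
  date = seq(as.Date("2019-01-07"), by = "day", length.out = 5),
  num  = c(1, 2, 1, 2, 2)
)

# Line chart of occupancy over time
p <- ggplot(census_demo, aes(x = date, y = num)) +
  geom_line() +
  labs(x = "Date", y = "Occupancy")

print(p)
```

Replacing census_demo with census_data should give a comparable plot for the full period.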