How I tried to solve day 61

Peter Hahn
3 min read · Jun 6, 2020

Cassie Kozyrkov asked if we can find the pattern in the data [https://towardsdatascience.com/when-not-to-use-machine-learning-or-ai-8185650f6a29] and predict the dose for day 61.
Although I am a hand surgeon, I am interested in data science and accepted the challenge.

The data

(1,28) (2,17) (3,92) (4,41) (5,9) (6,87) (7,54) (8,3) (9,78) (10,67) (11,1) (12,67) (13,78) (14,3) (15,55) (16,86) (17,8) (18,42) (19,92) (20,17) (21,29) (22,94) (23,28) (24,18) (25,93) (26,40) (27,9) (28,87) (29,53) (30,3) (31,79) (32,66) (33,1) (34,68) (35,77) (36,3) (37,56) (38,86) (39,8) (40,43) (41,92) (42,16) (43,30) (44,94) (45,27) (46,19) (47,93) (48,39) (49,10) (50,88) (51,53) (52,4) (53,80) (54,65) (55,1) (56,69) (57,77) (58,3) (59,57) (60,86)

Plot the data

The first and most important step is plotting the data. A simple x-y plot gave this:

Simple plot of the data

The data appear to come from three sine functions, with Cassie apparently drawing the values from the three functions in turn. I labeled each point with a family (a, b, or c) and plotted the data again, using one color per family.

Plot enhanced with families and colors

Day 60 belongs to family c, so the entry for day 61 must come from family a.

Last lines of the data
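
The family labels can be reproduced with a few lines of code. The following is a minimal Python sketch (the original calculations are in R); it assumes the families simply alternate a, b, c from day 1 onward, and the names `DOSES` and `family` are made up for this illustration.

```python
# Doses for days 1..60, copied from the data above.
DOSES = [
    28, 17, 92, 41, 9, 87, 54, 3, 78, 67,
    1, 67, 78, 3, 55, 86, 8, 42, 92, 17,
    29, 94, 28, 18, 93, 40, 9, 87, 53, 3,
    79, 66, 1, 68, 77, 3, 56, 86, 8, 43,
    92, 16, 30, 94, 27, 19, 93, 39, 10, 88,
    53, 4, 80, 65, 1, 69, 77, 3, 57, 86,
]


def family(day):
    """Return 'a', 'b' or 'c', assuming the families alternate day by day."""
    return "abc"[(day - 1) % 3]


# Group the (day, dose) pairs by family.
families = {"a": [], "b": [], "c": []}
for day, dose in enumerate(DOSES, start=1):
    families[family(day)].append((day, dose))

print("family of day 60:", family(60))
print("family of day 61:", family(61))
print("last points of family a:", families["a"][-3:])
```

Under this assumption, day 61 continues the series of days 1, 4, 7, ..., 58.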

How to reveal the value for day 61 from family a?

First solution: visual extrapolation

I refined the y-axis ticks and added three lines for orientation. The curves appear to be symmetric around their minimum and maximum values.

Refined plot with lines for orientation

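These lines show that a value after the turning point is slightly smaller than its mirror-image value before the turning point. For family a, the last turning point is the minimum on day 55: day 58 (dose 3) lies just below day 52 (dose 4), so day 61 should lie just below day 49 (dose 10).
Visual extrapolation gives me a value of 9 for day 61.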
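
The symmetry argument can be checked against the numbers. The sketch below is only an illustration in Python: it assumes the turning point of family a is the observed minimum on day 55, and the helper `mirror` is a made-up name. It is not necessarily how the plot was read.

```python
# Family a (days 1, 4, 7, ..., 58), copied from the data above.
FAMILY_A = {
    1: 28, 4: 41, 7: 54, 10: 67, 13: 78, 16: 86, 19: 92, 22: 94,
    25: 93, 28: 87, 31: 79, 34: 68, 37: 56, 40: 43, 43: 30, 46: 19,
    49: 10, 52: 4, 55: 1, 58: 3,
}

# Assumption: the turning point is the day with the smallest observed dose.
turning_day = min(FAMILY_A, key=FAMILY_A.get)


def mirror(day):
    """Day at the same distance on the other side of the turning point."""
    return 2 * turning_day - day


# Day 58 has a known mirror partner, so the symmetry can be inspected there.
print("day 58:", FAMILY_A[58], "mirror day", mirror(58), ":", FAMILY_A[mirror(58)])

# Day 61 is unknown; its mirror partner gives an upper reference value.
print("mirror of day 61 is day", mirror(61), "with dose", FAMILY_A[mirror(61)])
```

If the pattern seen at day 58 carries over, the dose for day 61 should be a little below the dose of its mirror day.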

Second solution: calculate the sine function

The general formula for a sine function is:
f(x)=a⋅sin(b⋅(x−c))+d

From the data table for family a we get:
maximum: x = 22, y = 94
minimum: x = 55, y = 1
d = (y at maximum + y at minimum) / 2 = 47.5
a = (y at maximum − y at minimum) / 2 = 46.5 (here a is the amplitude, not the family label)
b = 2π / period
I estimate the period as 2 ⋅ (x at minimum − x at maximum), which is 66.
c is the x-value at which the sine term is zero and the curve rises through d; I estimated it from the curve and the data as approximately 5.4.
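
The parameter estimates can be reproduced directly from the family a values. This is a minimal Python sketch rather than the original R code; the variable names are chosen for this illustration, and the phase shift is not derived here because it was read off the plot.

```python
import math

# Family a (days 1, 4, 7, ..., 58), copied from the data above.
FAMILY_A = {
    1: 28, 4: 41, 7: 54, 10: 67, 13: 78, 16: 86, 19: 92, 22: 94,
    25: 93, 28: 87, 31: 79, 34: 68, 37: 56, 40: 43, 43: 30, 46: 19,
    49: 10, 52: 4, 55: 1, 58: 3,
}

# Days with the largest and smallest observed dose.
x_max = max(FAMILY_A, key=FAMILY_A.get)
x_min = min(FAMILY_A, key=FAMILY_A.get)
y_max, y_min = FAMILY_A[x_max], FAMILY_A[x_min]

offset = (y_max + y_min) / 2      # d in the formula
amplitude = (y_max - y_min) / 2   # a in the formula
period = 2 * abs(x_min - x_max)   # maximum to minimum is half a period
frequency = 2 * math.pi / period  # b in the formula

print("d =", offset)
print("a =", amplitude)
print("period =", period)
print("b =", round(frequency, 4))
```

Because family a is sampled only every third day, the true extremes may lie between the observed days, so these values are rough estimates.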

I built a function and calculated the dose for four days and compared them to the given values.

f(1) = 28.5867461 dataset: 28
f(16) = 86.8558433 dataset: 86
f(37) = 53.6791736 dataset: 56
f(55) = 1.0021071 dataset: 1

Except for x = 37, where the function is off by about 2.3, the results are within 1 of the data. The parameters of the function are only estimates, which can explain the deviation.

Evaluating the formula for day 61 gives a value of 8.62.
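
For readers who want to repeat the check, here is a minimal Python sketch of the function with the estimated parameters (a = 46.5, period = 66, c = 5.4, d = 47.5). The original calculation was done in R; the function name `dose` is made up for this illustration.

```python
import math

# Estimated parameters from the text above.
A = 46.5               # amplitude
B = 2 * math.pi / 66   # 2*pi divided by the estimated period
C = 5.4                # phase shift, read off the plot
D = 47.5               # vertical offset


def dose(x):
    """Sine model f(x) = a*sin(b*(x - c)) + d for family a."""
    return A * math.sin(B * (x - C)) + D


# Compare the model with four known values of family a.
for day, observed in [(1, 28), (16, 86), (37, 56), (55, 1)]:
    print(f"day {day}: model {dose(day):.7f}, dataset {observed}")

# Prediction for the unknown day.
print(f"day 61: model {dose(61):.2f}")
```

Small changes in the phase shift move the prediction noticeably, so the result is best read as "about 9" rather than as an exact value.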

Final Solution

Both of my simple methods predict a dose of about 9 for day 61.
I am publishing this one day before Cassie publishes her code for generating the data.

Conclusion

Plotting the data is the important first step. From there, further analysis is possible.

Calculations

The calculations are in R. Details are here: https://rpubs.com/phahn/622013

Written by Peter Hahn

Former Hand surgeon now busy with Data Science, Rstat, Machine learning, Aikido