How I tried to solve day 61
Cassie Kozyrkov asked if we can find the pattern in the data [https://towardsdatascience.com/when-not-to-use-machine-learning-or-ai-8185650f6a29] and predict the dose for day 61.
Although I am a hand surgeon, I am interested in data science and accepted the challenge.
The data
(1,28) (2,17) (3,92) (4,41) (5,9) (6,87) (7,54) (8,3) (9,78) (10,67) (11,1) (12,67) (13,78) (14,3) (15,55) (16,86) (17,8) (18,42) (19,92) (20,17) (21,29) (22,94) (23,28) (24,18) (25,93) (26,40) (27,9) (28,87) (29,53) (30,3) (31,79) (32,66) (33,1) (34,68) (35,77) (36,3) (37,56) (38,86) (39,8) (40,43) (41,92) (42,16) (43,30) (44,94) (45,27) (46,19) (47,93) (48,39) (49,10) (50,88) (51,53) (52,4) (53,80) (54,65) (55,1) (56,69) (57,77) (58,3) (59,57) (60,86)
Plot the data
The first and most important step is plotting the data. A simple x-y plot gave this:
The data appear to come from three sine functions. Cassie seems to draw the values alternately from these three functions. I assigned each point to a family (a, b, c) and plotted again, using one color per family.
Day 60 is from family c, so the entry for day 61 must be from family a.
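The family assignment can be sketched in a few lines of Python. This is only an illustration, not the R code linked in the Calculations section. The names `doses`, `family_of` and `by_family` are made up for this sketch, and it assumes the families simply rotate a, b, c starting with day 1.

```python
# Doses for days 1..60, copied from the data listing above.
doses = [
    28, 17, 92, 41, 9, 87, 54, 3, 78, 67, 1, 67, 78, 3, 55,
    86, 8, 42, 92, 17, 29, 94, 28, 18, 93, 40, 9, 87, 53, 3,
    79, 66, 1, 68, 77, 3, 56, 86, 8, 43, 92, 16, 30, 94, 27,
    19, 93, 39, 10, 88, 53, 4, 80, 65, 1, 69, 77, 3, 57, 86,
]

FAMILIES = "abc"


def family_of(day):
    """Assumed rule: day 1 -> a, day 2 -> b, day 3 -> c, day 4 -> a, ..."""
    return FAMILIES[(day - 1) % 3]


# Collect the (day, dose) pairs of each family.
by_family = {name: [] for name in FAMILIES}
for day, dose in enumerate(doses, start=1):
    by_family[family_of(day)].append((day, dose))

print("day 60:", family_of(60), "| day 61:", family_of(61))
for name, points in by_family.items():
    print(name, points)
```

Printed per family, the values rise and fall smoothly, which is the pattern the colored plot shows.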
How to reveal the value for day 61 from family a?
First solution: visual extrapolation
I refined the y-axis ticks and added three lines for orientation. The curves appear to be symmetric around their minimum and maximum values.
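These lines show that a value after the turning point is slightly smaller than its mirror value before the turning point. In family a, for example, day 58 (3) lies just below day 52 (4), its mirror around the minimum at day 55.
Visual extrapolation gives me a value of 9 for day 61.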
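The visual argument can also be written down as arithmetic. The sketch below is one possible numerical reading of it, not a method stated in the text. It assumes that the small offset seen between the mirror pair closest to the minimum (days 52 and 58) carries over to the next pair (days 49 and 61). The variable names are illustrative.

```python
# Family a values around its minimum at day 55, taken from the data listing.
family_a_tail = {49: 10, 52: 4, 55: 1, 58: 3}

turning_day = 55
target_day = 61

# Mirror the target day around the turning point: 61 -> 49.
mirror_day = 2 * turning_day - target_day

# Offset observed for the known mirror pair (52, 58): 3 - 4 = -1.
offset = family_a_tail[58] - family_a_tail[52]

# Assumption: the same offset applies to the pair (49, 61).
estimate = family_a_tail[mirror_day] + offset
print(f"mirror of day {target_day} is day {mirror_day}; estimate = {estimate}")
```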
Second solution: calculate the sine function
The general form of a sine function is:
f(x)=a⋅sin(b⋅(x−c))+d
From the data table we get for family a:
maximum: x = 22, y = 94
minimum: x = 55, y = 1
d = (y at maximum + y at minimum) / 2 = (94 + 1) / 2 = 47.5
a = (y at maximum − y at minimum) / 2 = (94 − 1) / 2 = 46.5
b = 2π / period
I estimate the period as twice the distance between the x-values of the minimum and the maximum: 2 ⋅ (55 − 22) = 66.
c is the x-value at which the argument of the sine is zero. I estimated it from the curve and the data as approximately 5.4.
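As a check on the arithmetic, the parameter estimates can be reproduced with a short Python sketch. It is an illustration only and not the R code behind this post. The list `family_a` holds every third data point starting at day 1, and c = 5.4 is entered by hand because it was read off the plot rather than computed.

```python
import math

# (day, dose) pairs of family a: every third data point, starting at day 1.
family_a = [
    (1, 28), (4, 41), (7, 54), (10, 67), (13, 78), (16, 86), (19, 92),
    (22, 94), (25, 93), (28, 87), (31, 79), (34, 68), (37, 56), (40, 43),
    (43, 30), (46, 19), (49, 10), (52, 4), (55, 1), (58, 3),
]

# Locate the highest and lowest point of the family.
x_max, y_max = max(family_a, key=lambda point: point[1])
x_min, y_min = min(family_a, key=lambda point: point[1])

d = (y_max + y_min) / 2          # vertical shift (midline)
a = (y_max - y_min) / 2          # amplitude
period = 2 * abs(x_max - x_min)  # a maximum and the next minimum are half a period apart
b = 2 * math.pi / period         # angular frequency
c = 5.4                          # horizontal shift, estimated by eye from the plot

print(f"max at x={x_max} (y={y_max}), min at x={x_min} (y={y_min})")
print(f"a={a}, b={b:.5f}, c={c}, d={d}, period={period}")
```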
I built a function from these parameters, calculated the dose for four days of family a, and compared the results with the given values.
f(1) = 28.5867461 dataset: 28
f(16) = 86.8558433 dataset: 86
f(37) = 53.6791736 dataset: 56
f(55) = 1.0021071 dataset: 1
Except for x = 37, the results lie within one unit of the data. The function contains estimated parameters, which can explain the deviation at x = 37.
Using the formula and calculating for day 61 gives a value of 8.62.
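The evaluation can be sketched in Python as follows. Again, this is an illustration and not the R code linked under Calculations. The function name `dose` and the constant names are made up here, and the parameter values are the estimates from the text.

```python
import math

# Parameter estimates from the text.
A = 46.5               # amplitude a
B = 2 * math.pi / 66   # b = 2*pi / period, with period 66
C = 5.4                # horizontal shift c, estimated from the plot
D = 47.5               # vertical shift d


def dose(x):
    """Evaluate f(x) = a * sin(b * (x - c)) + d for family a."""
    return A * math.sin(B * (x - C)) + D


# Compare the fitted curve with four known values of family a.
observed = {1: 28, 16: 86, 37: 56, 55: 1}
for day, given in observed.items():
    print(f"f({day}) = {dose(day):.7f}  dataset: {given}")

# Predict the unknown value for day 61.
print(f"f(61) = {dose(61):.2f}")
```

Rounded to a whole number, the prediction for day 61 is 9.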
Final Solution
Both of my primitive methods predict a dose of 9 for day 61.
I am publishing this one day before Cassie publishes her code for generating the data.
Conclusion
The important first step is plotting the data. From there, further analysis is possible.
Calculations
All calculations were done in R. Details are here: https://rpubs.com/phahn/622013