Our article COVID-19 pandemic control: balancing detection policy and lockdown intervention under ICU sustainability just appeared in MMNP (Mathematical Modelling of Natural Phenomena).

# Tag Archives: covid-19

# Moindres carrés, STT5100

On avance dans le cours STT5100 de **modèles linéaires appliqués**. Les supports de cours sont en ligne sur https://github.com/freakonometrics/STT5100. Cette session (mais aussi celle d’hiver) étant en distanciel, le cours est asynchrone, et je poste régulièrement des capsules vidéos. Les capsules en lien avec les moindres carrés sont en ligne

- Régressions video + pdf (40:01)
- Régression simple sur variable continue (1) video + pdf (41:33)
- Régression simple sur variable continue (2) video + pdf (30:26)
- Régression simple sur variable continue (3) video + pdf (30:39)
- Régression simple sur variable factorielle video + pdf (39:59)
- Régression multiple (1) video + pdf (45:56)
- Régression multiple (2) video + pdf (37:40)
- Modèle Gaussien et tests video + pdf (55:43)
- Bootstrap video + pdf (45:37)
- Diagnostique video + pdf (50:08)
- ANOVA (1) video + pdf (37:52)
- ANOVA (2) video + pdf (28:28)
- ANOVA (3) video + pdf (16:42)
- Régression pondérée video + pdf (16:33)
- Multicolinéarité (1) video + pdf (43:25)
- Sélection de variables video + pdf (44:13)
- MCG & hétéroscédasticité video + pdf (21:30)
- Multicolinéarité (2) video + pdf (25:05)
- ANOVA (4) pdf (
*hors programme*) - Transformations video + pdf (13:20)
- Lissage, non-linéarités video + pdf (34:51)
- Discontinuités video + pdf (21:36)
- Exemple video + pdf (40:24)
- Exemple video + pdf (28:17)

# Modèles épidémiologiques pour analyse coût-efficacité sous incertitude

Mardi 29 septembre, le matin (heure canadienne), je donnerai un exposé au groupe de travail modcov19, du CNRS, en France, pour présenter notre article COVID-19 pandemic control: balancing detection policy and lockdown intervention under ICU sustainability

# R0 and the exponential growth of a pandemic, an update

A few days ago, I wrote a blog post – R0 and the exponential growth of a pandemic – where I was trying to generate some visualization of some exponential growth, in the context of a pandemic. After giving some thoughts, the previous graph might not be the best one to see an exponential based contagion.

Having graphs evolving, from the left to the right, gives us the (false) idea of some temporal evolution. Which is no necessarily correct. It simply means that contaminated people will contaminate other people, and we look at the number of iterations here. So maybe some concentric dots would look better.

And from a technical perspective, what I did was fun, but probably too complicated. In my previous post, I wanted to pack *optimally* k identical disks intro a unit circle. On http://hydra.nat.uni-magdeburg.de/packing, it was possible to get the “best known packings of equal circles in a circle”, with the coordinates. But as we will see, we can use something much more simple here.

My idea is now to create some picture like one below, with concentric colored dot. In the center, we have the first people that were contaminated, and then, we can see the transmission, somehow

From a technical perspective, here, I use a different strategy. I decided to draw random points, uniformly. The problem with randomness is the natural high discrepancy, with monte carlo methods: it is very likely that some disks will overlap. It is not a major issue, but it might distort the message. So I decided to use some low-discrepancy sequences, such as Halton‘s sequence.

library(randtoolbox) S = halton(n=5000, dim = 2)*2-1 |

Here, I have disk coordinates in [-1,+1]^2. Then, to get disks in a circle, I simply compute the distance to the origin (0,0),

D0 = S[,1]^2+S[,2]^2 |

and take the ranks. If I want to visualize k=200 people, I consider the 200 smaller ranks. To get concentric circles, each part having k_i individuals, I use as thresholds R_0^{\bar k_{i-1}},R_0^{\bar k_{i}},R_0^{\bar k_{i+1}}, etc, where \bar k_i=\bar k_{i-1}+k_i,

R0 = rank(D0,ties.method = "random") C0 = as.numeric(cut(R0,c(0,cumsum(k)+.5)),100000) |

where

R0=1.8 k=round(R0^(seq(1,9,by=2))) |

Then we can plot the dots, with appropriate colors,

points(S,pch=19,col=colrpal[C0],cex=.75) |

And of course, we can try that with different values, for R_0

R0=2.2 k=round(R0^(seq(1,9,by=2))) kmax=max(k) S = halton(n=5000, dim = 2)*2-1 plot(S,col="light yellow",axes=FALSE,xlab="",ylab="",xlim=c(-1.3,1),ylim=c(-1,1),cex=.75,pch=19) D0 = S[,1]^2+S[,2]^2 R0 = rank(D0,ties.method = "random") C0 = as.numeric(cut(R0,c(0,cumsum(k)+.5)),100000) points(S,pch=19,col=colrpal[C0],cex=.75) |

# R0 and the exponential growth of a pandemic

For some dissemination work, I want to create a nice graph to explain the exponential growth in pandemics, related to the value of R_0. Recall that R_0 corresponds to the average number of people that a contagious person can infect. Hence, with R_0=1.5, 4 people will contaminate 6 people, and those 6 will contaminate 9, etc. After n iteration, the number of contaminated people is simply R_0{}^n. As explained by Daniel Kahneman

people, certainly including myself, don’t seem to be able to think straight about exponential growth. What we see today are infections that occurred 2 or 3 weeks ago and the deaths today are people who got infected 4 or 5 weeks ago. All of this is I think beyond intuitive human comprehension

For different values of R_0 (on each row), I wanted to visualise the number of contaminated people after 3, 5 or 7 iterations, since graphs are usually the most simple way to give some intuition. The graph I had in mind was the following

(to be honest, I am quite sure I had seen it somewhere, but I cannot find where). The main challenge here is pack *optimally* k identical circles intro a unit circle: we need here the location of the points (center of the disks) and the radius. It seems to be a rather complicated mathematical problem. Nicely, on http://hydra.nat.uni-magdeburg.de/packing, it is possible to get the “best known packings of equal circles in a circle” (up to 5000, but many k‘s are missing). For instance, for k=37, we have

And interestingly, on the same website, we can get the coordinates of the centers, for example with 37 disks, so it is possible to recreate the R graph.

k = 37 base = read.table(paste("http://hydra.nat.uni-magdeburg.de/packing/cci/txt/cci",k,".txt",sep=""), header=FALSE) |

The problem, as discussed earlier, is that some cases are not solved, yes, for instance k=2^{12}=4096: the next feasable case is 4105. To avoid that issue, one can use

T = "Error" while(T == "Error"){ T = substr(try(base = read.table(paste("http://hydra.nat.uni-magdeburg.de/packing/cci/txt/cci",k,".txt",sep=""), header=FALSE),silent = TRUE),1,5) k=k+1 } k=k-1 |

Now we can almost plot it. The problem is that the radius of the circles is missing, here. But we can compute it

D=as.matrix(dist(x = base[,2:3])) diag(D)=1e5 i=which(D == min(D), arr.ind = TRUE) r = D[i[1,1],i[1,2]] |

Here the radius is

r [1] 0.2959118 |

To plot it, use

plot(base$V2,base$V3,xlim=c(-1,1),ylim=c(-1,1)) n=100 theta=seq(0,pi,length=n+1) circ= function(x,y,r,h=1){ vu=x+r*cos(theta) vv=r*sin(theta) cbind(c(vu,rev(vu))*h,c(y+vv,y-rev(vv))*h) } for(i in 1:k) polygon(circ(base[i,2],base[i,3],r/2*.95),col=colr,border=NA) |

We can now use that code to create the graph above, with k=R_0{}^n for various values of n

And we can also use it to visualize more subtile differences, like R_0=1.1, R_0=1.3, R_0=1.5 and R_0=1.7

# Hidding values in the output of the summary function for a (linear) regression

Since our Fall 2020 session will be 100% online (and off-site), I have to work hard this summer to prepare online quizz and exams. I started intensively to play with Achim’s awesome r-exams package. But there are still a few things that I wanted to add, so I will post a series of posts on my blogs to keep tracks of updated functions I will write. Most of them a modification of R internal functions, so the code is hard to read. Here is the file, and I will update it frequently

url = "http://freakonometrics.free.fr/onlineExams.R" source(url) |

I have updated the summary function (more precisely the summary.lm function). To see how it works, consider a simple regression

library(car) reg = lm(prestige ~ women, data=Prestige) my_summary(reg) Call: lm(formula = prestige ~ women, data = Prestige) Residuals: Min 1Q Median 3Q Max -33.444 -12.391 -4.126 13.034 39.185 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 48.69300 2.30760 21.101 2e-16 *** women -0.06417 0.05385 -1.192 0.236 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 17.17 on 100 degrees of freedom Multiple R-squared: 0.014, Adjusted R-squared: 0.004143 F-statistic: 1.42 on 1 and 100 DF, p-value: 0.2362 |

A classical question I ask in my quizz is to hide the p-value of the F-test, and ask what it is (to make sure that students understand the equivalence between the F and the *t* test, in a simple regression). To hide the p-value, use

my_summary(reg, Fisher=TRUE) Call: lm(formula = prestige ~ women, data = Prestige) Residuals: Min 1Q Median 3Q Max -33.444 -12.391 -4.126 13.034 39.185 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 48.69300 2.30760 21.101 2e-16 *** women -0.06417 0.05385 -1.192 0.236 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 17.17 on 100 degrees of freedom Multiple R-squared: 0.014, Adjusted R-squared: 0.004143 F-statistic: 1.42 on 1 and 100 DF, p-value: ■■■■■ |

(and then, in a multiple choice exam, I ask if it is 1%, 5%, 12%, 23%, 47%, for example). That one was easy, since all those lines are based on the cat function, so I just modify it, if necessary

if(Fisher) cat("\nF-statistic:", formatC(x$fstatistic[1L], digits = digits), "on", x$fstatistic[2L], "and", x$fstatistic[3L], "DF, p-value:", "■■■■■") if(!Fisher) cat("\nF-statistic:", formatC(x$fstatistic[1L], digits = digits), "on", x$fstatistic[2L], "and", x$fstatistic[3L], "DF, p-value:", format.pval(pf(x$fstatistic[1L], x$fstatistic[2L], x$fstatistic[3L], lower.tail = FALSE), digits = digits)) |

(here I use the unicode ‘black square‘ symbol to hide numbers). Of course, I can hide the value of \sigma, or the (adjusted or not) R ^2, etc.

Now, a little bit more tricky: what if we want to change the regression table, with the coefficients, their standard deviation, etc. It is tricky since those values are numeric, with an appropriate format (not too many digits), but it can be done easily since that formating is done through the printCoefmat function. So in my code, I have my internal function, where I ask to put some ‘black square‘ (and the good number to keep a readable format) at some specific locations. Consider a more complex regression

reg = lm(prestige ~ ., data=Prestige) |

and assume that we want to hide the value of the intercet, \widehat{\beta}_0 (i.e. located at (1,1) in the matrix) and the p-value of the *t*-test for the fourth one (i.e. located at (4,4) in the matrix – since the first colum is \widehat{\beta}_3, the second one its standard deviation, the thirst one the t value, and then, the fourth one, the p-value of the test). I use the following two vectors

vligne = c(1,4), vcolonne = c(1,4) |

with rows and columns in the matrix (of course, the two should have the same length). Then, the good thing is that the printCoefmat function convert numerical values into characters (to have things that look like columns actually). So we simply have to remove numerical digits, and use squares instead

Cf2=Cf if(length(vligne)>0){ for(i in 1:length(vligne)){ long = nchar(Cf[vligne[i],vcolonne[i]]) Cf2[vligne[i],vcolonne[i]] = paste(rep("■",long),collapse = "") }} |

Then, we print the updated version of the table

print.default(Cf2, quote = quote, right = right, na.print = na.print,...) |

For example, here, it would be

my_summary(reg, vligne=c(1,4), vcolonne=c(1,4)) Call: lm(formula = prestige ~ ., data = Prestige) Residuals: Min 1Q Median 3Q Max -12.9863 -4.9813 0.6983 4.8690 19.2402 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) ■■■■■■■■■■ 8.018e+00 -1.513 0.13380 education 3.933e+00 6.535e-01 6.019 3.64e-08 *** income 9.946e-04 2.601e-04 3.824 0.00024 *** women 1.310e-02 3.019e-02 0.434 ■■■■■■■ census 1.156e-03 6.183e-04 1.870 0.06471 . typeprof 1.077e+01 4.676e+00 2.303 0.02354 * typewc 2.877e-01 3.139e+00 0.092 0.92718 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 7.037 on 91 degrees of freedom (4 observations deleted due to missingness) Multiple R-squared: 0.841, Adjusted R-squared: 0.8306 F-statistic: 80.25 on 6 and 91 DF, p-value: < 2.2e-16 |

Of course, it is hand made, I do not check for typos (like you should not ask to put squares in the seventh column), but that works well enough to generate random regressions in a quizz (or identical regressions on subsamples of a large dataset), and to hide values, in a quizz.

# Lancement du groupe en épidémiologie et en santé publique du centre de recherches mathématiques

Ce vendredi, lancement des activités du groupe en épidémiologie et en santé publique du centre de recherches mathématiques de Montréal. D’autres évènements seront à venir…

# Talk on Reinforcement Learning at IACS

I will give a 30min talk at IACS in Toronto tomorrow, to present our recent State-of-the-Art of Reinforcement Learning, in Economics and Finance. The paper is online on ArXiv. and the slides are available online too,

# ENBIS Seminar, about Covid-19

Wednesday, I will be giving a talk at the seminar at the European Network for Business and Industrial Statistics (ENBIS), to present our model on COVID-19 pandemic control: balancing detection policy and lockdown intervention under ICU sustainability. The slides can be downloaded from here.

# Talk at CMAP on optimal Control and COVID-19

Friday, I will be giving a talk at the seminar at Ecole Polytechnique, on our model on COVID-19 pandemic control: balancing detection policy and lockdown intervention under ICU sustainability. The slides can be downloaded from here.

# Automne 2020, à distance

C’est maintenant officiel, la session d’automne se fera (elle aussi) à distance, majoritairement. On en saura plus cet été, je pense…

# COVID19 pandemic control: balancing detection policy and lockdown intervention under ICU sustainability

After almost two months, with Romuald Elie, Chi (Tran Viet Chi) and Mathieu Laurière, we finally have a draft of the paper related to our recent work, entitled “COVID-19 pandemic control: balancing detection policy and lockdown intervention under ICU sustainability” (available on HAL ArXiv and MedrXiv)

Our model is an extention of the classical SIR model, but more realistic for the COVID-19 (pronounded “cider”)

We use scenarios to see the impact on various quantities

but also optimal control to see the best strategy, when it comes to lockdown, and testing. All comments are welcome…

# Testing for Covid-19 in the U.S.

For almost a month, on a daily basis, we are working with colleagues (Romuald, Chi and Mathieu) on modeling the dynamics of the recent pandemic. I learn of lot of things discussing with them, but we keep struggling with the tests. Paul, in Montréal, helped me a little bit, but I think we will still have to more to get a better understand. To but honest, we stuggle with two very simple questions

**how many people are tested on a daily basis ?**

Recently, I discovered Modelling COVID-19 exit strategies for policy makers in the United Kingdom, which is very close to what we try to do… and in the document two interesting scenarios are discussed, with, for the first one, “1 million ‘reliable’ daily tests are deployed” (in the U.K.) and “5 million ‘useless’ daily tests are deployed”. There are about 65 millions unhabitants in the U.K. so we talk here about 1.5% people tested, on a daily basis, or 7.69% people ! It could make sense, but our question was, at some point, is that realistic ? where are we today with testing ? In the U.S. https://covidtracking.com/ collects interesting data, on a daily basis, per state.

url = "https://raw.githubusercontent.com/COVID19Tracking/covid-tracking-data/master/data/states_daily_4pm_et.csv" download.file(url,destfile="covid.csv") base = read.csv("covid.csv") |

Unfortunately, there is no information about the population. That we can find on wikipedia. But in that table, the state is given by its full name (and the symbol in the previous dataset). So we new also to match the two datasets properly,

url="https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States_by_population" download.file(url,destfile = "popUS.html") #pas contaminé 2/3 R=3 library(XML) tables=readHTMLTable("popUS.html") T=tables[[1]][3:54,c("V3","V4")] names(T)=c("state","pop") url="https://en.wikipedia.org/wiki/List_of_U.S._state_abbreviations" download.file(url,destfile = "nameUS.html") tables=readHTMLTable("nameUS.html") T2=tables[[1]][13:63,c(1,4)] names(T2)=c("state","symbol") T=merge(T,T2) T$population = as.numeric(gsub(",", "", T$pop, fixed = TRUE)) names(base)[2]="symbol" base = merge(base,T[,c("symbol","population")]) |

Now our dataset is fine… and we can get a function to plot the number of people tested in the U.S. (cumulated). Here, we distinguish between the positive and the negative,

drawing = function(st ="NY"){ sbase=base[base$symbol==st,c("date","positive","negative","population")] sbase$DATE = as.Date(as.character(sbase$date),"%Y%m%d") sbase=sbase[order(sbase$DATE),] par(mfrow=c(1,2)) plot(sbase$DATE,(sbase$positive+sbase$negative)/sbase$population,ylab="Proportion Test (/population of state)",type="l",xlab="",col="blue",lwd=3) lines(sbase$DATE,sbase$positive/sbase$population,col="red",lwd=2) legend("topleft",c("negative","positive"),lwd=2,col=c("blue","red"),bty="n") title(st) plot(sbase$DATE,sbase$positive/(sbase$positive+sbase$negative),ylab="Ratio of positive tests",ylim=c(0,1),type="l",xlab="",col="black",lwd=3) title(st)} |

Let us start with New York

drawing("NY") |

As at now, 4% of the entiere population got tested… over 6 weeks…. The graph on the right is the proportion of people who tested positive… I won’t get back on that one here today, I keep it for our work. In New Jersey, we got about 2.5% of the entiere population tested, overall,

drawing("NJ") |

Let us try a last one, Florida

drawing("FL") |

As at today, it is 1.5% of the population, over 6 weeks. Overall, in the U.S. less than 0.1% people are tested, on a daily basis. Which is far from the 1.5% in the U.K. scenarios. Now, here come the second question,

**what are we actually testing for ?**

On that one, my experience in biology is… very limited, and Paul helped me. He mentioned this morning a nice report, from a lab in UC Berkeley

One of my question was for instance, if you get tested positive, and you do it again, can you test negative ? Or, in the context of our data, do we test different people ? are some people tested on a regular basis (perhaps every week) ? For instance, with antigen tests (Reverse Transcription Quantitative Polymerase Chain Reaction (RT-qPCR) – also called molecular or PCR – Polymerase Chain Reaction – test) we test if someone is infectious, while with antibody test (using serological immunoassays that detect viral-specific antibodies — Immunoglobin M (IgM) and G (IgG) — also called serology test), we test for immunity. Which is rather different…

I have no idea what we have in our database, to be honest… and for the past six weeks, I have seen a lot of databases, and most of the time, I don’t know how to interpret, I don’t know what is measured… and it is scary. So, so far, we try to do some maths, to test dynamics by tuning parameters “the best we can” (and not estimate them). But if anyone has good references on testing, in the context of Covid-19 (for instance on specificity, sensitivity of all those tests) I would love to hear about it !

# “Family History” and Life Insurance

Today and tomorrow, I will attend the Online International Conference in Actuarial science, data science and finance, organised by colleagues in Lyon. But I won’t be in Lyon, I will be at home, in Montréal…

I will give a talk on wednesday afternoon, on a joint paper with Ewen Gallic and Olivier Cabrignac. Slides are available here, and if I can get a copy of the video, I will share it…

# Telematics: a revolution ?

Tomorrow morning, Laurence Barry will present some joint work at the Online International Conference in Actuarial science, data science and finance, organised by colleagues in Lyon. The paper, “Personalization as a Promise: Can Big Data Change the Practice of Insurance?” is online, as a Working Paper of the PARI chair,

The purpose of this paper is to measure the impact of technologies from the Big Bang. data on thehe pricing ofhe products car insurance. The first part describes how the aggregated view buildsuit by statistics enables highlighting invisible regularities at the individual level. Despite a very granular segmentations in automobile insurance, the approach remained classificatory, hypothesizing the risk identity of individuals from the same class. The second part highlights the reversal of big data-induced perspective in the’analysis ofgiven ; awith theur volume and the new algorithms, the aggregate viewpoint is questioned.. The hypothesis of class homogeneity is becoming increasingly difficult to test. maintain, especially since predictive analysis boasts the ability to predict the rs results at the individual level. The third part is studying the’influence of telematics boxes able to import the new pinsurance aradigm automobile. However, a reading of the most recent research articles on a pricing automobile including this new monter that the epistemological leap, at least for now, has not taken place.