File Exchange

image thumbnail

Generalized SEIR Epidemic Model (fitting and computation)

version 4.8.6 (13.6 MB) by E. Cheynet
Numerical implementation of an extended SEIR model with time-dependent death and recovery rates

215 Downloads

Updated 28 Jun 2020

GitHub view license on GitHub

A generalized SEIR model with seven states [2] is numerically implemented. The implementation is done from scratch except for the fitting, that relies on the function "lsqcurvfit". Therefore, the present implementation likely differs from the one used in ref.[2].

This Matlab implementation includes also some major differences with respect to ref. [2]. Among them is the expression of the death rate and recovery rate, which are analytical and empirical functions of the time. The idea behind this time-dependency as that the death and recovery rate should converge toward a constant value as the time increases. If the death rate is kept constant, the number of death may be overestimated. Births and natural death are not modelled here. This means that the total population, including the number of deceased cases, is kept constant. Note that ref. [2] is a preprint that is not peer-reviewed and I am not qualified enough to judge the quality of the paper.

The present submission contains:

- A function SEIQRDP.m that is used to simulate the time histories of the infectious, recovered and dead cases (among others)
- A function fit_SEIQRDP.m that estimates the eight parameters used in SEIQRDP.m in the least square sense.
- One example file Documentation.mlx, which presents the numerical implementation.
- One example file Example_province_region.mlx, which uses data collected by the Johns Hopkins University for the COVID-19 epidemy [3] for Hubei province (China).
- One example file Example_Country.mlx, which uses data collected by the Johns Hopkins University for the COVID-19 epidemy [3] for a country.
- One file "ItalianRegions.mlx" written by Matteo Secli (https://github.com/matteosecli) that I have modified for a slightly more robust fitting.
- One example file ChineseProvinces.mlx, which illustrates how the function fit_SEIQRDP.m is used in a for loop to be fitted to the data [3] from the different Chinese provinces.
- One example "uncertaintiesIssues.mlx", which illustrates the danger of fitting limited data sets.
- One example "Example_US_cities.mlx" that illustrates the fitting when "recovered" data are not available.
- One example simulateMultipleWaves,mlx that illustrates the fitting for multiple epidemic waves.
- One function getDataCOVID, which read from [3] the data collected by Johns Hopkins University.
- One function getDataCOVID_ITA written by Matteo Secli (https://github.com/matteosecli), that collects the updated data of the COVID-19 pandemic in Italy from the Italian government [4]
- One function getDataCOVID_US that collects the updated data in the USA from [3]
- One function checkRates.m that plots the fitted and computed death and recovery rates (quality check)
- One function getMultipleWaves.m that simulate and fit the SEIRQDP model to the situations where multiple epidemic waves are detected.

If you need to import data that are not included in the present submission, you can interactively import them in the Matlab workspace (https://se.mathworks.com/videos/importing-data-from-text-files-interactively-71076.html)

Any question, comment, or suggestion is welcome.

References

[1] https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology#Bio-mathematical_deterministic_treatment_of_the_SIR_model

[2] Peng, L., Yang, W., Zhang, D., Zhuge, C., & Hong, L. (2020). Epidemic analysis of COVID-19 in China by dynamical modeling. arXiv preprint arXiv:2002.06563.

[3] https://github.com/CSSEGISandData/COVID-19

[4] https://github.com/pcm-dpc/COVID-19

Cite As

Cheynet, E. Generalized SEIR Epidemic Model (Fitting and Computation). Zenodo, 2020, doi:10.5281/ZENODO.3911854.

View more styles

Comments and Ratings (101)

E. Cheynet

Hi Bruno,

the vector [a,b,c,...] groups the different guess for each parameters. That is a compact way to group the different guess values. The lower and upper bound are set inside the function fit_SEIQRDP, so the user do not need to specify them.

There are 10 parameters in total, so the fitting is not done by trying to find the ten best values simultaneously (at least not directly). A first fitting is done for the fatality (3 parameters) and recovery rates (3 parameters). The parameters are estimated using the individual measured rates. This fitting gives the narrow upper and lower boundaries for the parameters kappa and lambda. So after the first step, six of the ten parameters are almost fixed.

Bruno Silva

Hi Cheynet,

Thanks for the answer. You clarified part of my doubts. Sorry if I wasn't clear on my lambda and kappa question: When I define the vector [a, b, c], what does that mean? For example: 'a' is a low limit, and 'c' is the upper limit, and 'b' is a rate? Or is the time series defined by [low limit, midpoint, upper limit]?

E. Cheynet

Hi Bruno,

Thank a lot for the feedback!

I am not sure I understood properly your question. So I try a long answer:
lambda_guess and kappa_guess are values used to facilitate the fitting. Let say one wishes to get the minimum of a function. If the values of the first guess provided by the users are close to the absolute minimum, the algorithm will converge appropriately. If the guess values are not adequate, one may end with a local minimum. That is a common problem with least-square fitting algorithms.

If your questions are about the "physical" values of kappa and lambda, the file Documentation.mlx describes them, but I think there is a visual bug in the online display of the file Documentation.mlx because these equations do not appear.

The dispersion in checkRates is from the reported data (the "measured" values). So the dispersion depends on the data source. If you consider data from Italy, the dispersion is quite small for example. The dispersion will be larger if you take for example France or Belgium.

Did I answer appropriately your questions?

Bruno Silva

Hello, Cheynet.
First of all, congratulations to your interesting work. I have a few small questions to ask: What is the meaning of choosing vectors for lambda_guess and kappa_guess?
In the checkRates test, the dispersion of the measured values must always present a dispersion similar to the example_Country? What are these measured values?

E. Cheynet

Hi Raj,
Q is the number of active cases. In the present Matlab submission, It is called "quarantined" because it is assumed that each case that tested positive is isolated. This new "state", denoted Q can have two outcomes: recovered or deceased. Note that in the case of the US states and cities, the number of active cases was/is not available, so I am using the number of totals confirmed cases minus the number of deceased cases, instead. It's not ideal, but it's better than nothing.

Raj Kishor

Hi Cheynet, a very small doubt. Is the Q (quarantined cases) is same as total active cases? or it is equal to total confirmed cases??

E. Cheynet

Hi Vikas,
Oh I see! thank you, I did not noticed the typo it for the point (2). I will correct it in the next version.

Hello Cheynet, great to see you consider my thoughts. As far as (2) is concerned, as I pointed out earlier, the typos exist in
Documentation.mlx illustration where you explain the Numerical solution part.
This typos is not there in any m files or implementation part. So results are not affected by this typos. However, one
should not get confused when he see the matrix A in illustration para of documentation.mlx.
Regards

E. Cheynet

Hi Vikas,

Thank a lot for the feedback!
1) You are right. I also noted that kappa_0 and lambda_0 should have the dimension of the inverse of a time. I have updated the file documentation.mlx with appropriate definition of these constant.
2) It is strange, I still see that the matrix is 7x7. is it in the function SEIQRDP or fit_SEIQRDP?
3) You raised an important point! I have spent a couple of hours looking at E0 and I0. They, indeed, can affect the fit. Ideally, they should also be free parameters, but that makes a lot of unknown... I have updated the examples with a slightly modified version of your suggestions, which seems to work well with data from China.

Hello E. Cheynet. Many congratulation for wonderful work. Some typos/suggestion need attention such as :
1. Both k1 and lamda1 should have dimension of T^(-1) or other depending upon different formula for recovery and mortality rate. It isdefinitely not having any impact on results but mathematically the exponential term should have zero dimension as a whole.
2. The matrix A should be 7by7, but it is showing 7by8 currently.
3. Current E0 and I0 is making model highly sensitive to perturb data specially for India in terms of final estimation. So setting initial guess for incubation time of 5 days (As reported in most of the literature) should correspond to following assumptions
I0 = 0.1*Q0(1); % Initial number of infectious cases 10% of reported case
E0 = 5*I0; % Initial number of exposed cases is 5 times initial infected.
Regards

E. Cheynet

Hi Javier,
That is a good question. Both E and I are not known. So E0 and I0 are not known either. If the number of confirmed case is not zero, then it is highly unlikely that E0 or I0 is zero. It would be wrong to set them as equal to zero as it would hinder the simulation. The choice of E0 and I0 is arbitrary. I chose to use a value of E0 and I0 equal to Q0, but other choices are possible.

Hello E. Cheynet! Very interesting project! I am a bit of a beginner and have a question, what crietria did you follow when choosing a value for the initial exposed and infected population (E0 and I0)? I see that Q0 is the initial active cases (Q0=Confirmed(1)-Recovered(1)-Deaths(1)), but why did you set the values E0 and I0 to the initial confirmed cases (Confirmed(1), total repoerted cases at initial time)?
Thank you!

E. Cheynet

To avoid any misunderstanding: no machine learning is involved in this model, so "model training" is a non-sense here.

E. Cheynet

Hi Tsardoz,

If you look at the function fit_SEIQRDP, you will see that the fitting algorithm does not fit all the parameters at the same time. For example, for the fatality rate, a first estimate is obtained and constrained to limit the risk to achieve local minima. For the recovery rate, if available, a similar approach is used. This greatly reduce the problem of overfitting. Could it be improved? Without a doubt. Nevertheless, the user is still responsible for assessing whether the fitting is meaningful or not. This is partly the purpose of the function "checkRate.m".

Besides, the fitting algorithm relies on a constrained range of parameters. This is the reason why lsqcurvfit is used instead of nlinfit. Finally, evaluating what are "realistic parameter values" is barely meaningful since there is limited information about which parameter is realistic, what they really mean, and what is the source of data. Example: there was a discussion in this thread that pointed out that the parameter delta used by Peng et al. was incorrectly defined as the "inverse of the quarantine time". The physical interpretation of the parameter was the problem, not its value.

Tsardoz

"There will be overfitting if the fatality or recovery rate is constant with time."
Opposite is true I am afraid. The more parameters you try and estimate with limited data the more models overfit and produce poor results.
The goal should be to produce realistic parameter values not LSE on model training set predictions.

E. Cheynet

Hi Tsardoz,
I ignore how you compute the R0 value, which means I cannot tell if you apply an appropriate method. Note that I did not (so far) include any calculation of the basic reproduction number. That is a study on its own. There will be overfitting if the fatality or recovery rate is constant with time. It is explicitly stated that the model deals with time-dependent recovery and mortality rates. The appropriate use of the fitting algorithm is the user's responsibility. In summary: garbage in, garbage out.

E. Cheynet

Hi Md Humayun Kabir,
Check that the function 'getDataCOVID' is in your workspace. The Matlab version should not affect this issue.

Hello E. Cheynet,
My MATLAB is the 2018b version. When I run .mlx file, I am facing the following problem:
Undefined function or variable 'getDataCOVID'.
Could you please let me know what wrong am I doing?
Advance Thank You.

E. Cheynet

Hi Lu Krul,
λ0,λ1,κ0,κ1 are empirical coefficients obtained by the least-square fitting. If possible, I calculate a first estimate using the calculated rate of deceased and recovered cases. The final estimate is obtained using the total set of equations.
As the variable "guess" denotes, this is a guess. It's arbitrarily chosen at the beginning.
I did not try to calculate the basic reproduction number. I cannot help much here.
The "tau" variable is a time delay that improves the modelling of the time-dependant recovery rate. λ0,λ1 are non-dimensional coefficients to be adjusted. λ0 is the recovery rate as time becomes large.
κ0 and κ1 are also empirical coefficients. κ0 is the initial death rate. κ1 includes implicitly the information about a time delay for the death rate. I may change him for a more explicit formula in a further version.

Lu Krul

Hi!
I am a beginner, and I am very interested in your project, which helps me a lot. To better understand your work, I have several questions to ask.
How did you get λ0,λ1,κ0,κ1?
How did you estimate the initial GUESS parameter?
What is the formula for basic reproduction number? Is the formula β*δ*(1-α)^T correct?
What’s the meaning of the three variables in lamda_guess? What’s the meaning of the two variables in kappa_guess?
I would appreciate it if u could give me a reply <3

Fryese39

E. Cheynet

Hi Bill,
That is a good question. If the number of recovered is unknown, I think it is a good idea to set the initial conditions with a number of recovered E0 different from zero. I did not try to see how a different value of E0 would affect the computed curves of quarantined and deceased. The number you propose may be realistic. I don't know enough on the topic to judge. I guess the choice of E0 would affect the parameter "lambda". However, the system of ODEs is coupled, so the parameter "lambda" is involved in more than one equation. That implies that E0 has only a partial (but maybe important?) role in the fitting procedure.

Bill Budd

Hi Etienne, Many thanks for your heroic efforts in handling the case where the number of Recovered is unknown. Your examples highlight the US but I am mainly interested in getting a forward prediction for the UK, where I live (the UK also doesn't publish numbers for Recovered).

I was wondering, for the case where the Recovered is unknown, have you tried seeding the solver with a number of Recovered equal to a simple multiple of the number of Deaths?

I tried this for the UK using Recovered=Deaths*1.5; with no other adjustments to the input arrays and the code returned very similar results to a time-delayed version of Italy, which has good reported data and very plausible curve fits.

I was surprised to see that the ratio of Recovered/Deaths for the fitted curves for my UK example, which was seeded with a scalar of 1.5, is time variant and quite similar to the one for Italy. So your solver seems to have almost magical properties.

No reply needed, but interested in any comments you may have.

Bill Chen

E. Cheynet

Hi ZAFAR, there is no .csv file to download. A function read the content of a table from a GitHub repository and store it as a csv. Then the csv is imported in a local workspace, read and deleted. There are alternatives approach that can be used though.

Need to download .csv file

Hi Cheynet, sorry for the posting of the script. I just thought that you could run the script for 'US' and 'Korea' and see that it works. You might be interested in how the plots of the "Active" cases are approaching a Peak ...around the 30th of April. What is interesting is that a lot of scientists are using the data and predictions of the Confirmed Daily Change to predict the Peak, which is a symmetric shaped curve, whereas the prediction curve of the "Active" chases is asymmetric with a long tail, which I believe is more realistic. Thanks for your comments!

E. Cheynet

Hi Dale,
I do not recommend posting an entire Matlab script in the comment section. It is known that there is no "recovered data" for US cities. I have no error with the country "Japan". The data loads properly and the fitting completes successfully with the up-to-date version I uploaded. You may have used an older version that was more prone to bugs.

E.Cheynet, run the code I posted for Location 'US' , 'Korea', and 'Japan' and you will see the problem. 'US' and 'Korea' work, but 'Japan' does not. When you look at the csv data files, it appears there is data for Confirmed, Deaths and Recovered, but when you use Location 'Japan' your codes does not pick up tableConfired, tableDeaths or tableRecovered.

E. Cheynet

Hi Dale,
I am not sure I understand what you mean. If you talk about the Github page, you cannot modify the master branch, which is the one I use.

E.Cheynet, note that if you run your code with my additions, the figures need the file folders to be setup. That is, if you run the US data, there must be a file sub folder labeled US in the Figs folder...Figs/US...or if your location is Korea, then there must be a file setup as ...Figs/Korea.

E. Cheynet

Hi Samia,
I cannot reproduce the error you get (with R2019). Check if you use the latest version of the submission. If you still have the problem, you can use the manual debugging tools by Matlab

Hello E. Cheynet,
my MATLAB is 2019 version. I am facing the following problem:

Error using lsqcurvefit (line 262)
Function value and YDATA sizes are not equal.

Error in fit_SEIQRDP (line 90)
[Coeff,~,residual,~,~,~,jacobian] = lsqcurvefit(@(para,t) modelFun1(para,t),...

E. Cheynet

Another alternative is to use the function: websave('dummy.csv',yourFileName); which will read the file you want to access on the internet and store it as a csv file. Then you can use a traditional approach to read the csv file in Matlab.

E. Cheynet

Hi M Farooq,
Check if your Matlab version is recent enough. I think you need at least Matlab R2018b to use the function "delimitedTextImportOptions".
Alternatively, you can manually import the table you want and ask Matlab to automatically generate for you a function that you will use to the data.

M Farooq

M Farooq

Hello E. Cheynet,

I'm trying to run
[tableConfirmed,tableDeaths,tableRecovered,time] = getDataCOVID()
and getting following error

Undefined function or variable 'delimitedTextImportOptions'.

Error in getDataCOVID (line 14)
opts = delimitedTextImportOptions("NumVariables", Ndays+5);

Could you plz let me know what wrong am I doing?
Regards

E. Cheynet

As far as I know, the rate "delta" has the dimension of the inverse of a time so it should not be expressed in percentage but in days^(-1).

E. Cheynet

Thank Dionne. I have redefined the variable delta as the rate at which infectious cases enter in quarantine in the updated code.

Thanks E. Cheynet for the fast response and all your efforts in creating this fantastic code implementation. Treating delta as a rate makes much more sense. Just to be explicit about delta v delta^(-1) in the papers and in your code: In the code, delta = 1/8 means that people are entering quarantine at a rate of 12.5%, while delta = 1/10 means entering at a rate of 10%. Thus, increasing the denominator would logically decrease the actual number of quarantines and increase infections.

E.Cheynet...THANKS for including the state of Washington.

E. Cheynet

Hi Dionne,
I agree with you that the definition of delta is weird. There is maybe a better definition if you look at the paper by ref. [2]. In their paper, they do not define it as the inverse of quarantine time but the rate at which people enter quarantine. This make more sense....

[2] Munz, P., Hudea, I., Imad, J., & Smith, R. J. (2009). When zombies attack!: mathematical modelling of an outbreak of zombie infection. Infectious Disease Modelling Research Progress, 4, 133-150.

Delta is defined as "inverse of average quarantine time", which I take to mean the inverse time in days an infected individual spends in quarantine. But, increasing the denominator in delta (i.e., increasing the number of days in quarantine) results in more infections. Treating delta as time in quarantine results in increased quarantine time yielding fewer infections, but the overall curves are unrealistic. I cannot tell from the code if I am misunderstanding the meaning of delta (perhaps it is rate of quarantine, though the cited Peng et al paper defines it as quarantine time) or if there is a bug. Has anyone else played with the delta parameter?

E.Cdheynet...Here is what I have tried...I modified your code with a new Locations statement for Example_US_cities.mlx as follows:

fprintf(['Most recent update: ',datestr(time(end)),'\n'])
Location = ", Washington";
try
indC = find(contains(tableConfirmed.Combined_Key,Location)==1);
indD = find(contains(tableDeaths.Combined_Key,Location)==1);
catch exception
searchLoc = strfind(tableConfirmed.Combined_Key,Location);
indC = find([searchLoc{:}]==1);

searchLoc = strfind(tableConfirmed.Combined_Key,Location);
indD = find([searchLoc{:}]==1);
end

Here is my output:
Most recent update: 04-Apr-2020
Combined_Key
______________________________

"Adams, Washington, US"
"Asotin, Washington, US"
"Benton, Washington, US"
.
.
.
"Whitman, Washington, US"
"Yakima, Washington, US"
"Out of WA, Washington, US"
"Unassigned, Washington, US"

Error Messages:

Matrix index is out of range for deletion.

Error in datetime/parenAssign (line 58)
thisData(rowIndices) = [];

Error in DaleExample_Washington (line 38)
time(Confirmed<=minNum)= [];

E. Cheynet

"Example_US_cities.mlx" contains one example for an entire state. A state is here defined as the sum of every city collected by the database associated with the same state name. Apparently it seems good enough if we look at the total population.

E. Cheynet

Hi Dale,
That should be possible for individual states. However, one would need the database to exist and to be up-to-date on e.g. Github. I did not try to find one.

E. Cheynet

Hi Dale,
Thank a lot for your message!
I am not sure I understand your request, but the present Matlab submission contains already an example with cities in the US: "Example_US_cities.mlx"

E. Cheynet

Hi Jacobo,

The main reason is that the script removes confirmed cases with little data, which are not meaningful for the fitting. Depending on the number of confirmed cases, it is possible that the variable "Confirmed" becomes empty. It is also possible to trigger the error if one runs (manually) two times the same lines because the size of the vector "Confirmed" changes. In both cases, the user has the responsibility to check the size of the vector "Confirmed".

Jacobo CHoy

Hello,

I can not run canada with the example3 file. It says on line 40 : " Matrix index is out of range for deletion".

Hi E.Cheynet, I have been playing with your code...which is terrific! I an now trying to analyze individual states, like the state of Washington. I am not having much luck. Do you have any code like the examples you posted for multiple cities?

E. Cheynet

The bad fit for some of the US cities seems to be due to the fact that I modified the recovery rate to make it constant. That was not a good idea. So I decided to come back to a time-dependant recovery rate.

The new codes with the example from the US run well with my Matlab 20a with the optimization toolbox. The new US cities live script is very nice (with a few poor fits to some cities), but it is a shame that the US doesn't have accurate data on those that were infected but that are now recovered. It would be interesting to know what China, Italy and France are doing to get those recovered data that the US is not doing.

Nik N

Hello, Thaks for your nice codes,
I have error in getDatacovid.m in line : opts = delimitedTextImportOptions

Thanks again.

Rejikumar K

Hello,
I am Reji Kumar.
My MATLAB is older than 2012 version. How can I use the code in it? Anybody Please help.

Hi Etienne,
Thank you very much for explaining. Now my doubt is clear. So in the figure in mlx file 'Confirmed(Reported)'=Confirmed(JHU data) -Recovered(JHU data) -Death(JHU data) ; from JHU data and then matched to Quarntined Q(t). I was confusing 'Confirmed(Reported)' in your code as 'Confirmed' numbers in JHU data. Thank you again.

Hi Etienne,
Thank you very much for explaining. I still have a gap in my understanding. Please ignore my last post. Here is my query. In the Figure generated in mlx files how are the confirmed(reported), recovered(reported), Deaths(reported) are related to actual JHU data? Especially how confirmed(reported) data in the figure is related to the 'confirmed' data in JHU. Since, confirmed(reported) data points decrease after sometime in the figure but 'confirmed' data in JHU never falls.

E. Cheynet

Hi Ranajoy,
You raise a fair point. In the Github depository of John Hopkins University, only three categories are available: Recovered, confirmed and deceased. Their definition of "confirmed" means "has been tested positive". That is why the number of confirmed cannot decrease in this database. I guess the term "confirmed (infectious)" is not the best choice as the infectious cases do not remain infectious forever. It is true that in the examples I uploaded, I also use the term confirmed. The confirmed cases are, in fact, the quarantined cases since I assume that in a real-life situation, as soon as someone is "confirmed", they are put in quarantine, which is likely the closest from the definition "active cases = confirmed-recovered-deaths".

Hello,
If I understood correctly, the data is being extracted from John Hopkins site where there are four categories Confirmed, Deaths, Recovered, Active in the John Hopkins github (where Confirmed=Deaths +Recovered+Active ; at any point of time in data ). Thus 'Confirmed' as per the John Hopkins definition can never fall as it is the sum total of Death, Recovered, Active. But in this generalized SEIR model the 'Confirmed' graph keeps falling after a time as per the model , so should we not take 'Active' data instead of 'Confirmed' when we match with the model?

E. Cheynet

Hi Eugene,
It seems that John Hopkins university has started uploading new dataset with specific focus on the US on their Github page. I will wait they finish uploading everything before looking at the data.

All of the *mlx files are running beautifully now. The plots of the actual data for the Exampl4.mlx are still showing some negative values but with a note that these are ignored. The most amazing graphic is the Lombardi region graph in ItalianRegions.mlx. That flateening of the infected curve is obvious but would be missed if the recovered and dead states were ignored.
It is a shame that you have been unable to fit the US data with your SEIR model because of a lack of data on recovered. Just now (4/1 1 PM EST) Johns Hopkins reports 190740 US confirmed infected, 4127 dead, and 7141 recovered.

E. Cheynet

Update: The data for the Italian regions have been updated after the database format was changed. If you try to run the old version of the code, it will not work well.

E. Cheynet

Hei Eugene,
I have also noticed this warning. Apparently, the number of reported recovered + deaths became higher than the number of "confirmed" in some of the cases. That's weird but can be due to reporting errors. Since it's an issue that comes from the database, I have implemented a quick-and-dirty solution. In the updated the function fit_SEIQRD, I simply forbidding the number of "actual cases" to become negative and set it to zero

Bill Budd

Hi Etienne, I fixed the issue I mentioned yesterday by desampling the finely sampled Fitted curves before taking the 'diff'. I now get a very nice plot of Fitted vs Reported daily values, at least for Italy. Thanks for your patience and hope you get your wifi working.

E. Cheynet

Hi,
Sorry for the delay in replies. I have temporarly lost access to wifi (only mobile net). I try to reply in a near future

Bill Budd

Hi Etienne, I'm trying not to be lazy here but I only started using Matlab yesterday (I have experience in IDL). I hope you can give me some pointers to solving a question relating to your SEIR modelling .

I took your Example3 and made it work for several countries including UK, where I live. I then applied the 'diff' operator to the cumulative arrays (Fitted and Reported) to simulate the daily cases.

I get good agreement between diff-ing the cumulative Reported arrays in your code and the Reported daily cases from other sources, so my 'diff' is working ok. However when applying 'diff' to the Fitted curves, they underestimate the Reported daily cases by a long way.

I traced this to the fact that the Fitted curves are more finely sampled than the Reported values (eg 471 elements compared to 37 in the case of Italy).

Please could you give me some advice in very general terms, of how to make the Fitted arrays conformable with the Reported arrays so that the 'diff' returns comparable results, eg to make D array conformable with Deaths in Example3.

Appreciate any advice you can offer and Thanks again for some great work

Jacobo CHoy

Hello,

I am getting an issue while modeling for canada with the example3 file. It says on line 40 : " Matrix index is out of range for deletion".

Hi, I just ran Example4 after downloading the latest code and data. Before today, the SEIR model was running nicely for all provinces, but today 9 of the Chinese Provinces are modeling negative confirmed cases. The program prints a warning.

Bill Budd

Hi, FYI I just downloaded ECheynet-SEIR-aefecf6 and ran all the examples in Matlab R2020a and they all execute fine.

Many thanks and Respect to you Sir for some excellent code!

E. Cheynet

Hi MJ,
You can check the value of time(1). If time(1) >=datetime(2020,04,01), then N= 0. I noted also that the databased by John Hopkins university has been changed yesterday, so the examples won't work well. I need to update examples 1 to 4.

MJ

Something is not right here, as it returns N = 0, is this dt correct? Thanks for your help!!!
dt = 0.1; % time step
time1 = datetime(time(1)):dt:datetime(2020,04,01);
N = numel(time1);
t = [0:N-1].*dt;

Thanks much. It is a shame that the Johns Hopkins site isn't putting recovered data in the csv files. Let's hope that they start putting in the recovered data so that your code can be run with future dates. For the first time, the Johns Hopkins site was listing recovered for the US on the JH website. On their interactive web page, today is the 1st day that they are providing the recovered data for the US (827 dead, 354 recovered Italy: 20,499 dead and 112,982 recovered):
https://coronavirus.jhu.edu/map.html

Thank you very much, Cheynet.

E. Cheynet

Hi Joshua and Eugene,
You are right, there was a typo in the url address used. It should now be corrected. The Github account from John Hopkins university uses an updated dataset without the number of "recovered cases". That is a pity because this is a valuable piece of information for the fitting procedure. Therefore I decided to use the data updated up to the 23-03-2020 only. The main reason is that the purpose of the submission was simply to shows how the fitting of a generalized SEIR model can be done in Matlab.

Please I downloaded the new update but I having this error message. Please, how can I fix this

Error using urlreadwrite (line 92)
The server did not find a resource to match this request.

Error in urlwrite (line 52)
[f,status] = urlreadwrite(mfilename,catchErrors,url,filename,varargin{:});

Error in getDataCOVID (line 36)
urlwrite(fileName,'dummy.csv');

Etienne. I downloaded the new files. Example1 runs, but Example2, Example3 and Example4 all stop with the following error. Thanks so much for your work on this code.

Error using urlreadwrite (line 92)
The server did not find a resource to match this request.
Error in urlwrite (line 52)
[f,status] = urlreadwrite(mfilename,catchErrors,url,filename,varargin{:});
Error in getDataCOVID (line 36)
urlwrite(fileName,'dummy.csv');

E. Cheynet

The examples files should be working normally now.

E. Cheynet

Hi Eugene,
Thank for the information! I see that Johns Hopkins University has created a new database. However, they do no longer include the number of "recovered" as a function of the time. That is strange... I will try to update the example this evening.

Etienne, All 4 examples and the Italy analysis were working last night, but when Johns Hopkins updated their data, Example2, Example3, and Example4 all produce the error that the time variable for 3-24-2020 is invalid (NaT in the time vector). The Italy example still works, but it downloads its data from Italy. Here is the error message, and the last element of the time vector is NaT, not a valid date.
Error using dateformverify (line 18)
DATESTR failed converting date number to date vector. Date number out of range.
Error in datestr (line 200)
S = datef.ormverify(dtnumber, dateformstr, islocal);
Error in datetime/datestr (line 801)
s = datestr(datenum(this),varargin{:});

Hi.

How can I fix the following:

>> fit_SEIQRDP(Q, R, D, Npop, E0, I0, time, guess, varargin)
Attempt to execute SCRIPT varargin as a function:

Thanks

E. Cheynet

Hi Eugene,
Thank for the message! I had not noticed the financial toolbox was required for the function "today". In the updated version of the code I have written: "floor(datenum(now)" instead of "datenum(today)"

These codes are really great. Thank you Dr E. Cheynet, for helping us with these.
Please, can you help us with how to plot the residuals?

This programm is outstanding. I'll be using it in my graduate probability and statistics course to demonstrate fitting the parameters of a model with data. I needed one edit to get the program running, "today" which retrieves today's date is only available in Mathworks financial toolbox, so I just typed in today's date in the appropriate format.

E. Cheynet

Hi Jose,
Thank you for your suggestion. Unfortunately, I not the qualifications nor the time to achieve or contribute to this work.

E. Cheynet

He Jacobo,

I do not think it will work very well for the world due to the "time delays" between the different epidemy locations. It is also important to remember that one should not try to do long-term prediction with the function SEIQRDP.m. This is due to the large uncertainties that exist, especially when dataset are limited.

Jacobo CHoy

Hello,
Hi Jacobo,
I have updated the example files with more pedagogic description of the variable "guess" and what are the numbers used.
Response:
Seems you have fixed it and made more intuitive! Amazing thanks. Really good code.
___
Further questions:
I am trying to make a SEIR model for the world. I have used your code, but maybe creating a region called "world" adding all cases, deaths, etc... would help?
I went around the issue by adding 4 .mat files which I imported from a .csv where I added the region "world".
Just as a suggestion, it would be a good idea to have a region for the world.

So far I am unable to get a proper prediction for the world, the curve is not steep enough. I am still trying with several values.

Jose Reis

Hi! I agree.... If you don't mind, although at a slower pace, we can collaborate using this platform..... I already have some good people working on it, so if there is anything you would like to implement and would like to parallelize with other people, let me know.

This is what I am thinking right now:
-> creat a "wrapper" around the "fitter" to allow running all lines of data (countries and regions)
-> output to a .csv or xls the input and output from the Fit
-> add other metrics to the simulated results, such as
---> days until peak
---> total number of infected
---> total number of dead
-> merge the data (maybe with an API) with other demographic information, maybe from the WEF or WHO
-> run some data analytics in the output data, trying to find patterns and correlations
-> run a Correlation Matrix between input and output, so as to assess the sensibility of the fitting parameters into the results both of the fit and the "post proccessed" variables

What do you think?

E. Cheynet

Hi Jose,
I have actually uploaded an updated version where the "contains" function was replaced by "strfind" yesterday, for those who work with older Matlab version. I do not advise you about revealing your email. You will get heavily spammed otherwise.

Jose Reis

Hello Cheynet,

This is what I did to work around the compatibility issue

%%%%%% Used in 2015 since "contains" wouldnt work
% id=strfind(tableRecovered.ProvinceState,'US');
% Index = find(not(cellfun('isempty',id)));

For the other issue, I simply used the plot command instead of the original one, as follows:

plot(time1,Q,'r',time1,R,'c',time1,D,'g',time1,I,'b','linewidth',2);
hold on
plot(time,Confirmed-Recovered-Deaths,'ro',time,Recovered,'bo',time,Deaths,'go','linewidth',1);

I am leading a group of friends here in Brazil, working on different models, one being one of them. If you could get in contact with us, that would be great. I dont know if I could publish my email here, but you can find me on linkdin as José Roberto Clark Reis.

Thanks,

José

Fabian

Oh that was simultaneously posted. Thanks a lot Mr. Cheynet, that is awesome! I am a scientist in the filed of Electrochemistry but right now i am using my time at home to take a look into other important topics right now:)

E. Cheynet

Hi Fabian,
I have uploaded a new version where the DATA.mat file is no longer required. Using the function "getDataCOVID", the data are directly read from the Github repository mentioned above.

Fabian

Hey Community,

i might have a very basic question. I got the recent statistic data for the disease from GitHub (.csv files). How can i transfer the data to the Data.m file?
I tried to copy paste additional columns into the single tables (recovered, confirmed, deaths, time) directly in Matlab and saving the changes of the Data.m file.
I am getting the error:
Error using strfind
Cell must be a cell array of character vectors.

Which i do not really understand since this function is, as far as i understood, used to find the desired country i want to evaluate. (I did not change anything in the code of example 3)
When i reload the original Data.m file, the code works well.

In general: Thank you for the awesome work!

Best regards,

Fabian

E. Cheynet

Hi Jacobo,
I have updated the example files with more pedagogic description of the variable "guess" and what are the numbers used.

Jacobo CHoy

*I solved the issue* => i don't know which the issue was I just ran it in a .mlx instead of a .m
____
question for example 4 => What are the numbers in your guess? what is the order I can't figure it out.
I found them to be
alpha: protection rate
beta: infection rate
gamma: Inverse of the average latent time
delta: inverse of the average quarantine time
lambda0 and lambda1: coefficients used in the time-dependant cure rate
kappa0 and kappa1: coefficient used in the time-dependant mortality rate

Jacobo CHoy

I published a request for help here: https://la.mathworks.com/matlabcentral/answers/511876-issue-with-seir-model-for-mathlab
I am looking forward to using your software for my work and I really hope I could use it.

I am getting this mistake:

Error using SEIQRDP
Too many input arguments.

Error in example3 (line 34)
[S,E,I,Q,R,D,P] = SEIQRDP(alpha1,beta1,gamma1,delta1,Lambda1,Kappa1,Npop,E0,I0,Q0,R0,D0,t);

E. Cheynet

Hei Jose,
Thank for the feedback!
The function "contains" was introduced in Matlab 2016b and if you have an older version, you will not be able to use it. I think an alternative to it is " strfind"

For the second error, this may also be a compatibility issue, although it is a little more obscure to me. The function "datetime" was introduced in Matlab 2014b, but there have been maybe some updates that only work with more recent versions. I did write the function with Matlab 2019b. Alternatively, you can use "datenum" and the function "datetick".

Jose Reis

Hi,

I am trying to run your very nice implementation. Example 1 works fine. Example 2, I ran into at least two problems. The first one is with funcion "contains", right in the beginining of the code. I bypassed it and started it with the 156 value corresponding to the Hubei province. It ran ok.

Then, it halts when I come into:

>> figure
semilogy(time1,Q,'r',time1,R,'b',time1,D,'k');
Error using semilogy
A numeric or double convertible argument is expected

I am using 2015a. Could this be a compatibility issue?

Thanks

Updates

4.8.6

See release notes for this release on GitHub: https://github.com/ECheynet/SEIR/releases/tag/v4.8.6

4.8.4

See release notes for this release on GitHub: https://github.com/ECheynet/SEIR/releases/tag/v4.8.4

4.8.3

See release notes for this release on GitHub: https://github.com/ECheynet/SEIR/releases/tag/v4.8.3

4.8.2

See release notes for this release on GitHub: https://github.com/ECheynet/SEIR/releases/tag/v4.8.2

4.8.1

See release notes for this release on GitHub: https://github.com/ECheynet/SEIR/releases/tag/v4.8.1

4.8

See release notes for this release on GitHub: https://github.com/ECheynet/SEIR/releases/tag/v4.8

4.7

See release notes for this release on GitHub: https://github.com/ECheynet/SEIR/releases/tag/v4.7

4.6

See release notes for this release on GitHub: https://github.com/ECheynet/SEIR/releases/tag/v4.6

4.5

See release notes for this release on GitHub: https://github.com/ECheynet/SEIR/releases/tag/v4.5

4.4

See release notes for this release on GitHub: https://github.com/ECheynet/SEIR/releases/tag/v4.4

4.3

See release notes for this release on GitHub: https://github.com/ECheynet/SEIR/releases/tag/v4.3

4.2

See release notes for this release on GitHub: https://github.com/ECheynet/SEIR/releases/tag/v4.2

4.1

See release notes for this release on GitHub: https://github.com/ECheynet/SEIR/releases/tag/v4.1

4.0

See release notes for this release on GitHub: https://github.com/ECheynet/SEIR/releases/tag/v4.0

3.6

A bug was found when calling the function "strfind", which is only used when the function "contains" is not available. The bug should be now corrected.

3.5

Summary updated

3.4

Description updated

3.3

Improved fitting for the case where only the sum "recovered + quarantined" is known. See the example with the cities in the US.

3.2

Added example file with US data when the fitting need to be applied without knowledge of the "recovered" cases + I have written "Active" cases rather than "confirmed" to avoid any confusion with the number of "tested positive"

3.1

Added an example for french regions. However, the data quality is not as good as expected.

3.0

The new database by John Hopkins University is now used.

2.3

Added one example file "uncertaintiesIssues.mlx" illustrating the danger of fitting limited data sets. + Some optional output for the function fit_SEIQRDP.m are now provided.

2.2.2

To avoid using the financial toolbox, the portion of the code "datenum(today)" was replaced with "floor(datenum(now)"

2.2.1

typo

2.2

An additional example for the Italian regions (credit to https://github.com/matteosecli)

2.1

correction of a minor bug in getDataCOVID.m that raised an error for Matlab version < R2019b

2.0

A new function is introduced to download directly the up-to-date data from https://github.com/CSSEGISandData/COVID-19

1.10.1

Description updated

1.10

Added an optional argument in the function fit_SEIQRDP.m to choose the tolerance for the fitting and to hide the iterations, which can be annoying. Added an example file to shows how the fitting is applied iteratively to multiple locations

1.9.2

Minor preformance improvement in SEIQRDP.m

1.9.1

typos

1.9

Description updated

1.8

- New example file added (Italy) and more exhaustive database
- Minor speed up improvement

1.7

Description updated (8 parameters are used after version 1.2 instead of six parameters)

1.6

A possible error has been spotted in Example 2: I was using as input the number of "confirmed case", which should be interpreted as the number of quarantined cases instead of the number of "infectious cases".

1.5

Added Github repository

1.4.2

Some typos (again!)

1.4.1

typos

1.4

The description of the numerical method is done in Example1.mlx
The coefficients kappa and lambda are now better defined

1.3.3

typos

1.3.2

Description

1.3.1

Description updated

1.3

Description updated

1.2

The death rate is now a decreasing function of the time while the recovery rate is increasing with time.

1.1

Corrected the population number in example 2

1.0.4

Title updated

1.0.3

Description updated

1.0.2

Title updated

1.0.1

Description updated

MATLAB Release Compatibility
Created with R2019b
Compatible with R2018b to any release
Platform Compatibility
Windows macOS Linux