Source: SEJ.com

The Data Challenge in the Days of Corona

Levent Can Özokutucu

--

When the corona epidemic that started in China turned into a Europe-centered epidemic, humanity’s war against this microorganism (which, by some definitions, is not even considered alive) opened on two fronts:

  1. To isolate and treat infected patients.
  2. To prevent the emergence of new patients.

How can you fight a disease if you don’t have a treatment? By trying to predict the future with data-driven analysis, of course. It all comes down to one graph, for example…👇

Source: NYT

Keeping the number of cases within the country’s healthcare capacity helps both to contain the spread of the virus and to care for patients. In other words, when the number of patients peaks, it is crucial that the peak stays below the healthcare capacity line. This reassures citizens, and the survival rate rises with adequate care and treatment. So how can we predict this?

This is where data-driven analysis comes in. You already know your hospital capacity and equipment level. Starting from the first case (t = 0), you calculate a spreading rate from the speed at which new patients appear. By comparing this rate with the spread and death rates the virus showed in countries hit earlier, you can estimate the infection rate for your own population. But can you find it? Can you get the right information when there is so much dirty information around?
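As a rough illustration of what "calculating a spreading rate" can look like, here is a minimal Python sketch. It assumes simple exponential growth at a constant rate, which real epidemics do not follow for long, and the case counts and the capacity figure are made up for the example, not real data. The function names are hypothetical.

```python
import math

def daily_growth_rate(cumulative_cases):
    """Average daily growth rate between the first and last observation."""
    first, last = cumulative_cases[0], cumulative_cases[-1]
    days = len(cumulative_cases) - 1
    return (last / first) ** (1 / days) - 1

def doubling_time(rate):
    """Days needed for the case count to double at a constant daily rate."""
    return math.log(2) / math.log(1 + rate)

def days_until_capacity(current_cases, rate, capacity):
    """Days until cases exceed capacity if the growth rate never changes."""
    if current_cases >= capacity:
        return 0
    return math.ceil(math.log(capacity / current_cases) / math.log(1 + rate))

# Made-up cumulative counts for days t = 0..5 (NOT real data).
cases = [10, 14, 20, 28, 40, 56]
rate = daily_growth_rate(cases)
days_left = days_until_capacity(cases[-1], rate, capacity=1000)

print(f"daily growth rate: {rate:.1%}")
print(f"doubling time: {doubling_time(rate):.1f} days")
print(f"days until the assumed capacity of 1000 is exceeded: {days_left}")
```

The whole calculation depends on t = 0 and on the reported counts being right. If either is wrong, every number it prints is wrong as well.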

CoronadurumTR

Question 1: How bad is the situation?

This is the question everyone is curious about. Have enough precautions been taken? Even if the graphic above (from 23 March 2020) suggests that we are facing a much worse situation than Italy, it is not enough to draw a meaningful conclusion, because we do not know when the first case occurred in our country. Was it really March 11? Or was there already a case in the country when a coronavirus case was found on the THY plane? According to Reuters, a passenger who took off from Istanbul on March 3 tested positive.

Singapore sends Turkish Airlines flight home empty after coronavirus case

Answer 1: It is not possible to predict how the epidemic will progress in the country by looking at the graphic above. As explained, comparing ourselves with Italy or any other country in a chart like this is not meaningful, because we do not know how many people in the country were already infected on March 11. In other words, we do not know t = 0, the date the first case entered the country. The healthcare capacities of the countries are another variable.

Question 2: How many people will be sick?
The competent authorities announce patient and test numbers every day. This allows us to make assumptions about the days ahead. Although the first announced figures doubled every day, like the grains on a chessboard, the rate of increase slowed down two weeks after the first corona case. So how high will the number of patients go?

Answer 2: Again, this is a question we cannot answer quickly. To know how many people are sick with Covid-19, we first need to know how many people have been tested. (Of course, this assumes that the tests are reliable. We are not sure about the reliability of the tests conducted in the country before the rapid test kits arrived from China on 20 March. For this discussion, we assume that the test kits are reliable.)

We need to know how many people have been tested, but the test numbers from the authorities are inconsistent. The tweets from different days disagree by 703 people. Since the tweets are still online, it is clear that a typing error was made. Therefore, we are not sure how many people have been tested (p-value > 0.05). However, we have learned that the more tests are performed early on, the lower the infection and mortality rates turn out to be. South Korea and Iceland both tested 5,000 to 10,000 people per million, isolated infected individuals from the community, and tried to trace the links between the infected and the non-infected. The aim is to increase the number of tests with the new test kits.

I highly recommend looking at Dr. Murat Kubilay’s article according to the bias on corona data.

Question 3: Is the mortality rate constant?
Looking at the data from all over the world, it seems possible to arrive at an average mortality rate.

Answer 3: When an epidemic ends, you can calculate the death rate of that outbreak as number of deaths / number of cases. However, it is wrong to talk about such a rate while the epidemic continues, because we do not yet know the number of affected people that the formula requires. We can only make assumptions. So the correct formula should be as follows:

Mortality rate on day x = (number of deaths on day x) / (number of cases on day x − t)

t = time from the onset of a case to death.

E.g., March 20 deaths / March 13 cases (t = 7, the average number of days from the onset of a case to death).
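To see how much the lag matters, the formula above can be put into a few lines of Python. This is a minimal sketch: the figures are invented and are treated here as cumulative totals indexed by day, which is an assumption, since the formula does not say whether daily or cumulative counts are meant. The function name is hypothetical.

```python
def lagged_mortality_rate(deaths_by_day, cases_by_day, day, lag=7):
    """Deaths on `day` divided by cases on `day - lag`.

    Returns None when there were no cases on the earlier day.
    """
    earlier = day - lag
    if earlier < 0:
        raise ValueError("not enough history for the chosen lag")
    cases = cases_by_day[earlier]
    if cases == 0:
        return None
    return deaths_by_day[day] / cases

# Invented cumulative figures for days 0..7 (NOT real data).
cases = [100, 130, 170, 220, 290, 380, 500, 650]
deaths = [0, 0, 1, 1, 2, 2, 3, 4]

naive = deaths[7] / cases[7]                                # same-day ratio
lagged = lagged_mortality_rate(deaths, cases, day=7, lag=7)

print(f"same-day ratio: {naive:.2%}")
print(f"lagged ratio:   {lagged:.2%}")
```

While case numbers are still rising, the same-day ratio divides by people who have only just become cases, so it comes out well below the lagged one. Neither figure is reliable while the case counts themselves are uncertain.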

Let’s repeat: if the data you have is incorrect, your formula will give the wrong result too. It is not correct to quote a death rate for any country until the outbreak ends.

Question 4: Do those who already have a chronic disease become sick faster?
According to current data and information from countries that have brought the epidemic under control, such as China, patients with conditions such as diabetes or high blood pressure are affected quickly and are difficult to treat. Can we establish this correlation yet?

Answer 4: Yes, but only for specific geographies. To establish this correlation properly, we first need to investigate why the survivors survived. At the moment, this result is being reached from data collected only on cases that ended in death. Another variable may be the free-traveling younger generation. When the epidemic ends, it may turn out that more deaths occurred in regions where free movement is high, since young people are more frequent carriers of the virus. To answer all of these questions, we have to wait for the pandemic to end or for all case data to be shared transparently.

The Data Challenge for White-Collar Workers

Now that we have entered the world of Big Data and found ourselves in a “data lake,” we have a lot to learn from the corona epidemic. The data you have can help you find your way out of an unknown situation.

But on one condition: that you can interpret it correctly!

The epidemic is a real-life example of the mistakes we make, consciously or unconsciously, when reading data, and which we often encounter in business life.

Can we read the data we have correctly? Or are we looking for data to confirm what we already believe? How realistic can we remain in this chaotic environment?

It is also difficult to make decisions when there is dirty information around.
Here are three simple suggestions to get you started:

  1. Determine your metrics correctly.
    What would you like to measure? Quality or quantity? If you know what you are looking for, working out how to search for it will be much easier.
  2. Avoid showy reporting.
    A common mistake is to report an increase as a percentage rather than as an absolute quantity. Of course, a 100% increase from 3 to 6 and a 100% increase from 500 to 1000 do not carry the same weight.
  3. Learn and adapt. (Learn, Unlearn, Relearn)
    As the data analysis gets deeper, you may need to exclude some of your data from the analysis. Or you may feel that you are moving in a completely different direction. In that case, stay aware of the process and do not insist on reaching a particular result. Otherwise, you will arrive at the data you believe in, not the correct data.
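The point of the second suggestion fits in a few lines of Python. This is a minimal sketch with a hypothetical helper name, using the two increases mentioned in the list:

```python
def describe_change(before, after):
    """Return the absolute change and the percentage change together."""
    absolute = after - before
    percent = absolute / before * 100
    return absolute, percent

for before, after in [(3, 6), (500, 1000)]:
    absolute, percent = describe_change(before, after)
    print(f"{before} -> {after}: +{absolute} ({percent:.0f}%)")
```

Both lines show the same percentage, but the absolute changes are 3 and 500. Reporting the two numbers side by side keeps the percentage from hiding the scale.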

Clearly, reading data is not easy, and it demands serious training and knowledge from anyone who takes on this job. If you play your matches without training, you will always find the fault in the referee.

--

Levent Can Özokutucu

🔑Marketing Manager, 🏄‍♂️KiteSurf Lover, 🚗Motorsports Professional, 🕯️Arts&History Reader, 👶Most importantly, father!