I Model Biology

My name is Torkel, a mathematicians and one of the Pathsense early stage researchers. Within Pathsense my focus is on creating and analysing mathematical models of how B. Subtilis senses stress. Why does a mathematician start working in biology? Because mathematics is fun and biology is interesting, that’s why. By working as an applied mathematician within biology I can get the enjoyment of working with mathematics (I have heard that not everyone enjoys mathematics, but certainly that’s a small minority?). In addition rather than working with abstract mathematical objects I get to focus on actual and interesting problems within biology. Because of the applied nature of the work I also get the satisfaction of seeing how my research can lead to improvements in the real world (Theoretical Mathematics also leads to improvements in the real world, but it might take centuries before those improvements actually happens).

In my blog I will try to describe simple mathematical models (within biology). Using simple examples I will try to explain various features of mathematical models. In this blog post I will describe a model for population growth. While the model itself is hardly interesting I will use it to explain some terminology (so I do not have to explain that while also dealing with a more complicated model). In my next post I hope to move on to a slightly more interesting model of infectious diseases.

A Very Simple Mathematical Model, Population Growth

First let us look at a very basic biological model, that of population growth. While this model will have little practical use it will serve as a first introduction of the various parts of a mathematical model. We will be looking at the the population growth in the European Union. The type of model we will consider is a so called dynamic model, that is a model which considers change over time. Simple put these models are composed of two parts:

  • The state of the model.
  • A rule of how the state develops with time.

The state of the model is usually the object or system we want to model, in this case it will be the population of the EU. We will designate the population as p. The population will change with time (t), it will be a function of time and we will write it as p(t). For every value of t we can assign it one number, e.g. in the year 2015 the population was 508,191,116 people hence we have p(2015)=508191116. Usually we do not know the value of this function for all values of t (e.g. we do not know what value to assign to p(2020)). If this was the case we would already know everything there is to know about the system and there would be no need to model it (however a product of a successful model is often an estimate of the function p(t) for future time points). What we usually do know is some kind of initial condition, what the state of the model is at some point in time. In our case we know that the population of the EU is 508,191,116 in the year 2015. We would usually designate this time point as t0=0, and then we would let t be a measure of how much time have passed since the year 2015. In this case t=5 would correspond to the year 2020.

The first part of making a mathematical model is to determine what the state should be. The second (and generally harder) part is to determine how this state develops with time (the time evolution of the system). According eurostat (http://ec.europa.eu/eurostat all data used here will be take from eurostat) the population increase in 2015 was 0.35%. The time evolution part of the system consists of a differential equation, which is a mathematical objects describing how a function changes with respect to time (note: they may also describe change with respect to other variables, but that is not important for our applications). For our model we have the differential equation
dpdt = 0.0035·p
Here dpdt is an expression for the rate of change in the function p(t) as time changes (also called its derivative with respect to t). If  dpdt=0 then there would be no change in the value of p(t) and the function would remain constant for all values. If  dpdt attains some large value we would expect p(t) to change a lot as we change t. Note here that the rate of the function p(t) at a certain time actually depends on the value of p(t). In the year of 2015 the rate of change is an increase with about 1,800,000 persons per year. However in 2020 when the population have increased the rate of increase will be even higher (this is not to surprising, the larger population of 2020 will be able to reproduce at a higher rate than that of 2015).

Now when we have created our model we can start to analyse it to see what kind of conclusions we can draw about our system. One thing we could use it for is to predict what the population of the EU will be in the future. If we know both the population in 2015 and how the population should increase with time then this is possible. We can do this in two different way:

  • We can find an analytic solution to the problem.
  • We can find a numerical solution the the problem.

Some note on terminology. In this case I use the term system to describe our differential equation, primarily this term makes sense for the differential equations produced by more complicated models. However it is technically correct in this case as well. In many cases the terms model and system are equivalent. Our model consists of a system (which determines the time evolution of some phenomena). The states are implicitly defined by the system. If we have a system and some initial condition we can find a solution to it, that is the exact values with which the state evolves with time. When solving the system using a numerical method we can also say that we simulate the system.

We will start with the analytic solution. I mentioned that the expression  dpdt = 0.0035·p was called an differential equation. Just like a normal equation it can be solved and the value of some variable can be determined (e.g. the equation 2x+5=9 have the solution x=2). A differential equation can be solved by finding a function that fulfil it. In this case it is possible to determine that the function p(t)=508191116·e0.0035·t is a solution to our differential equation (where e≈2.718) given the initial condition that p(0)=508191116. The details of how this solution is found is of no importance here. We can then predict the population in 2020 by inserting the value 5 into this function (remember, we are measuring in years after 2015). In this case we can predict the population in 2020 as p(5)=508191116·e0.0035·5=517162733.219. Here we can note that that the predicted population number is a decimal number, which is obviously not accurate (we cannot have 0.219 of a person). However the model still tells us that we will have about 517,162,733 persons in the EU in 2020. This simple model is not accurate and the actual population in 2020 will likely be several hundred thousands off of our prediction, under these circumstances we do not need to worry about the error of 0.219. It is possible to make models which only will yield integer population numbers, however in many cases it is simpler to employ a model like this one and not worry about the decimal population numbers.

While it was possible to find an analytic solution to our system in the form of a single function that gives us exact population numbers for any given point in time this will not be the case for the vast majority of all systems we will look at. Instead we can solve our systems numerically. While the analytic solution requires us to use mathematics to find a solution, when we solve the system numerically we will use a computer to numerically simulate the evolution of our system with time.

Very roughly this is what we will do. We start at some initial condition (in our case p(0)=508191116). Next we will try to calculate p(0+Δt) for some very small value of Δt, lets say Δt=0.01. We will estimate this value as
p(0+Δt)≈p(0)+Δt· dpdt =p(0)+Δt·0.0035·p(0) ⇒
p(0+0.01)=p(0)+0.01· dpdt =p(0)+0.01·0.0035·p(0)
p(0)=508191116 + 0.01·0.0035·508191116 = 508208902.689
Now when we have the value of p(0.01) we can use the same approach to calculate p(0.01+Δt)=p(0.02) from p(0.01) as we did to calculate p(0.01) from p(0). In a similar manner we can calculate the entire function p(t) for all values of t. This method for numerically solving differential equations is called Eulers Method (named after the Swiss mathematician Leonhard Euler) and is one of the simplest methods for numerically solving a differential equation and as the value of t increases its estimates of p(t) will become less and less accurate. In reality we would not use this method but some more advanced method which can provide better estimates of the solution. While it is encouraged to learn about the numerical methods one uses for simulating a system it is not actually needed.

As we can see in the figure below the numerical solutions appears identical to the analytic solution (in reality they are ever so slightly different). While these methods will never generate the exact solution, the error is so small that we for all practical purposes can consider it identical to the real solution.

        To the left, the analytic solution. To the right, the numeric solution. For all practical
purposes the two solutions are identical.

Reviewing our Model, is it Accurate?

We now have a complete model which we can use to predict the population growth of the EU. However we will have to ask whenever our model accurate, will we be able to trust the predictions it makes? There are several ways we can do this. For a starter we can can see if it is able to predict real data. In our case we can use it to predict the population in any year after 2015. Since a few years have passed since then we can look into the databases and see that the population in 2017 was 511,805,088. Or model predicts a population of 511,760,933. While this number is quite close to the actual number we should note that 2 years is not a very long time and we should not claim the model is good simply because it pass this simple test (however if the model did not make an accurate guess of the population in 2017 it would most likely be bad).

We can make another test of the accuracy of this model by running the simulation backwards (This is not possible for all models, but in this case it is relatively simple to check what the model predicts for the year 2005 by setting t=-10). The model predicts that the population in 2005 should have been about 490 millions, which is 10 millions less than the actual value of 500 millions. Compare it to the 2015 value the reduction predicted by the model is more than twice the actual reduction. This clearly shows that the model might not be very accurate. From this we should not be surprised if we get a similar sized error when we use the model to predict the population of the EU in year 2025.

At this stage we might start to think about what is wrong with our model. We might think about the data from 2015 an notice that there were an unusually high amount of immigration into the EU as compared to other years. This likely caused us to overestimate the population growth, and this is likely also what made the model predict such a low value for the population in 2005 (equivalent to it overestimated the population growth between the years 2005 and 2015). When we have found a potential problem with our model we can proceed at try to correct it accordingly. For example while population growth as a result of birth and death might be tied to the current population size, immigration is likely less so. Maybe we should separate these two types of growth in our differential equation and make it
dpdt =n·p + i
were n·p is the natural growth due to birth and death and i is constant immigration. We could also think on whenever we should use the the growth statistics from only the year 2015, maybe we should use an average over several years? Once we have thought about these kinds of problems we might come up with a new model which we can try an evaluate. In the end when we are satisfied with our tests of a model we can try and use it to make some predictions. Most time modelling will be spent iterating between various models, trying to find one that as accurately as possible depicts reality. I will now leave this introductory example. Next I hope to move on to a slightly more complicated (yet very simple) model.

In short summary the important part is that our model is composed of a state which changes over time as well as some rule for how this change occur. Our state will be a function of time and the rule will be a differential equation. Once we have defined our model we can from any given initial condition use some numerical method to simulate the time evolution of the system and thus also make predictions of the subject of our model.