Introduction to Probability With Biological Data
Probability is fundamental to statistical modeling with biological data, and it builds on several concepts covered earlier in this series. The initial articles introduced the uses of statistics and offered some insight into the concept of sampling. We then surveyed the general array of model types that populate this series, discussed descriptive statistics, and examined in detail the methods of developing estimates for biological data. In our most recent article, we took our statistical analysis to the next level by establishing relationships between two variables with correlation. Presently, we turn to the intricacies of probability in biological models. Let us begin.
Probability fundamentally rests upon the idea of a random trial: a procedure with several potential outcomes, none of which can be predicted in advance. Examples include flipping a coin, rolling a die, and many more.
The basic components for conceptualizing probability are the event and the sample space. An event is any particular outcome that may occur, while the sample space is the set of all possible outcomes. With these concepts in mind, the probability of an event can readily be defined as the proportion of trials in which that event occurs. We can model probability mathematically and symbolically as follows:

P[A] = (number of occurrences of event A) / (total number of trials)
Because probabilities are proportions of potential outcomes, every probability must fall between 0 and 1. If the probability is 0, the event never occurs; if the probability is 1, the event always occurs.
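To make the "proportion of occurrences" definition concrete, here is a small Python sketch (the function name is ours, not from any particular library) that estimates a probability by simulating many random trials:

```python
import random

def estimate_probability(event, trial, n_trials=100_000):
    """Estimate P(event) as the proportion of trials in which it occurs."""
    hits = sum(1 for _ in range(n_trials) if event(trial()))
    return hits / n_trials

random.seed(0)
# A fair coin flip as the random trial; the event of interest is "heads".
p_heads = estimate_probability(lambda outcome: outcome == "H",
                               lambda: random.choice(["H", "T"]))
print(round(p_heads, 2))  # close to 0.5
```

The estimate converges toward the true probability as the number of trials grows, which is exactly the proportion-based definition above.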
Mutual Exclusivity and Probability Distribution
Two events are said to be mutually exclusive when it is impossible for them to occur simultaneously. For example, a flipped coin cannot simultaneously land on heads and tails; thus, these events are mutually exclusive. To model mutual exclusivity, we might say that:

P[A and B] = 0, whenever A and B are mutually exclusive
A probability distribution describes how probability is allocated across the possible values of a variable. With regard to a random trial specifically, the probability distribution gives the probability of each possible outcome of the trial.
Discrete Probability Distribution
A probability distribution for a discrete variable provides the probability of each possible outcome of the variable. For example, when it comes to rolling a die, there are six possible outcomes, each of which is equally likely. Therefore, each event has a probability of 1/6 ≈ 0.167. The probability distribution appears as it does in the graph below:
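The die's distribution is simple enough to write down directly; a short Python sketch using exact fractions:

```python
from fractions import Fraction

# Probability distribution for a single fair six-sided die:
# each of the six outcomes is equally likely.
die_distribution = {face: Fraction(1, 6) for face in range(1, 7)}

for face, p in die_distribution.items():
    print(face, float(p))  # each face has probability 1/6 ≈ 0.167
```

Using `Fraction` keeps the probabilities exact, which makes it easy to verify that they sum to one.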
Here, we can see that the particular event is plotted on the x-axis, while the y-axis reflects the probability of this event occurring.
Suppose we are rolling two dice and want to compute the probability of each possible sum of the two dice. The probability distribution would then look like:
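The two-dice distribution can be built by enumerating all 36 equally likely ordered outcomes; a Python sketch:

```python
from collections import Counter
from itertools import product
from fractions import Fraction

# Enumerate all 36 equally likely (die1, die2) outcomes and tally the sums.
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))
sum_distribution = {total: Fraction(count, 36) for total, count in sums.items()}

print(sum_distribution[7])   # 6/36 = 1/6, the most likely sum
print(sum_distribution[2])   # 1/36, only (1, 1) produces it
```

Unlike a single die, the sums are not equally likely: middle values such as 7 can be formed in more ways than extreme values such as 2 or 12.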
Continuous Probability Distributions
Whereas a discrete variable can take only a fixed set of precise values, a continuous variable can take on infinitely many potential values. The probability distribution for a continuous variable is a curve whose height gives the probability density, which describes how probability is spread over a range of values.
With continuous variables, the height of the curve at a single x-value does not reflect the probability of that exact value. Because a continuous variable can take infinitely many values between any two bounds, the probability of obtaining any single exact value is infinitesimally small. It is therefore more informative to ask about the probability of obtaining some value within a range, which is given by the area beneath the probability density curve: the area under the curve between boundaries a and b is the probability of obtaining a value between a and b. Therefore, the probability may be computed by integration.
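As an illustration of computing probability as area under a density curve, the sketch below uses the standard normal distribution as an example density (our choice for convenience; the article does not specify one) and evaluates the area in closed form via the error function rather than explicit numerical integration:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative area under a normal density up to x, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def prob_between(a, b, mu=0.0, sigma=1.0):
    """P(a < X < b): the area under the density curve between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# The probability of any single exact value is effectively zero,
# but the probability over a range is a meaningful area.
print(round(prob_between(-1, 1), 3))  # ≈ 0.683 for a standard normal
```

Any other density would work the same way: integrate (here, difference the CDF) between the two bounds to get the probability of landing in that range.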
The Addition Rule
Statistics and probability can be used to answer ‘either/or’ questions as well as ‘and’ questions, though the two require different rules. The first case involves mutually exclusive events, where we compute the probability of one event occurring or another. For example, if we are rolling two dice, we may want to compute the probability of rolling a 7 or an 11. Mathematically speaking, the probability of an ‘or’ scenario appears as:

P[A or B] = P[A] + P[B]
This computation reflects the addition rule, which states that the probability that either of two mutually exclusive events occurs is the sum of their individual probabilities.
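The 7-or-11 example can be verified by enumerating the outcomes; a Python sketch:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely rolls

def p_sum(total):
    """Probability that two dice sum to the given total."""
    return Fraction(sum(1 for a, b in outcomes if a + b == total), len(outcomes))

# Rolling a 7 and rolling an 11 are mutually exclusive, so the
# addition rule applies: P(7 or 11) = P(7) + P(11).
p_7_or_11 = p_sum(7) + p_sum(11)
print(p_7_or_11)  # 6/36 + 2/36 = 8/36 = 2/9
```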
Sum of Mutually Exclusive Probabilities Adds Up to 1
The probabilities of all mutually exclusive events in a given sample space must sum to one. A series of ‘or’ statements proves this conviction. Let us consider this idea with respect to rolling a die:

P[1] + P[2] + P[3] + P[4] + P[5] + P[6] = 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1
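This sum-to-one property is easy to verify for a die in Python:

```python
from fractions import Fraction

# The six faces of a die are mutually exclusive and exhaust the
# sample space, so their probabilities must sum to exactly 1.
total = sum(Fraction(1, 6) for _ in range(6))
print(total)  # 1
```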
The Multiplication Rule
Two events are independent if the outcome of one event does not affect the outcome of the other. For example, the probability of rolling a one on a die is 1/6. The probability of rolling a one again is unaffected by the outcome of the previous roll, and thus remains 1/6. Suppose that we desire to compute the probability of rolling a one twice in a row. Because these two events are independent, we may use the multiplication rule to compute this probability.
Because the two events are independent, the probability of both of them occurring may be computed using the multiplication rule, which mathematically models as follows:

P[A and B] = P[A] × P[B]
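A quick check of the multiplication rule for the two-rolls example:

```python
from fractions import Fraction

# Multiplication rule for independent events: the probability of
# rolling a one, and then rolling a one again, is the product of
# the two individual probabilities.
p_one = Fraction(1, 6)
p_two_ones = p_one * p_one
print(p_two_ones)  # 1/36
```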
Independent events make for rapid probability computation because the influence of one event on another does not need to be taken into consideration. However, much of biology involves variables whose behavior is interdependent. Therefore, there are a variety of probability computations that apply to dependent events.
Suppose we have two variables, x and y, where x reflects the environment and y reflects the sex of the offspring. If x has two possible events, hot and cold, and y has two possible events, male and female, then we must take all of these possibilities into account. Computing the probabilities of particular outcomes will necessarily depend on the addition and multiplication rules.
If we intend to compute the probability of a chance event, we must take into account all of the potentially confounding features that may affect the outcome. Conditional probability refers to the probability of one event occurring given that another event has occurred. The conditional probability for a certain event models as:

P[A | B] = P[A and B] / P[B]
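Conditional probability can be computed directly from a table of joint counts. The counts below are purely hypothetical numbers invented to illustrate the hot/cold environment and offspring-sex example:

```python
# Hypothetical joint counts (invented for illustration): offspring sex
# tallied under two environmental temperatures.
counts = {
    ("hot", "male"): 30, ("hot", "female"): 20,
    ("cold", "male"): 15, ("cold", "female"): 35,
}
total = sum(counts.values())

p_hot_and_male = counts[("hot", "male")] / total
p_hot = (counts[("hot", "male")] + counts[("hot", "female")]) / total

# P(male | hot) = P(hot and male) / P(hot)
p_male_given_hot = p_hot_and_male / p_hot
print(p_male_given_hot)  # 0.3 / 0.5 = 0.6
```

Conditioning on the hot environment restricts the sample space to the 50 hot-environment offspring, of which 30 are male.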
If we want to compute the probability of an event across a variety of different conditions, we must sum its probability in all of those conditions, weighting each by the probability of the condition itself. This concept reflects the law of total probability:

P[A] = P[A | B1] × P[B1] + P[A | B2] × P[B2] + …
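A sketch of the law of total probability, again using hypothetical numbers invented for the environment/offspring example:

```python
# Hypothetical numbers, invented for illustration: the chance of each
# environment, and the chance of a male offspring in each environment.
p_hot, p_cold = 0.5, 0.5
p_male_given_hot, p_male_given_cold = 0.6, 0.3

# Law of total probability: sum the conditional probabilities,
# each weighted by the probability of its condition.
p_male = p_male_given_hot * p_hot + p_male_given_cold * p_cold
print(p_male)  # 0.6*0.5 + 0.3*0.5 = 0.45
```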
Rearranging the conditional probability formula lets us compute the probability of a combination of two dependent events. In this case, the relationship is modeled with an ‘and’ statement such that:

P[A and B] = P[A | B] × P[B]
With Bayes’ theorem, we may state:

P[A | B] = P[B | A] × P[A] / P[B]
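Bayes’ theorem reverses the direction of conditioning; a sketch with the same hypothetical numbers as the environment/offspring example, asking for P(hot | male) given P(male | hot):

```python
# Hypothetical numbers, invented for illustration.
p_hot, p_cold = 0.5, 0.5
p_male_given_hot, p_male_given_cold = 0.6, 0.3

# The denominator P(male) comes from the law of total probability.
p_male = p_male_given_hot * p_hot + p_male_given_cold * p_cold

# Bayes' theorem: P(hot | male) = P(male | hot) * P(hot) / P(male)
p_hot_given_male = p_male_given_hot * p_hot / p_male
print(round(p_hot_given_male, 3))  # 0.6*0.5/0.45 ≈ 0.667
```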
The Takeaway
This article has taken a great deal of time to expand upon the theory and essential computations that underlie the concept of probability. Probability is an essential feature of statistical modeling in biology because it allows us to predict the likelihood of obtaining a specific outcome. Furthermore, we have elaborated upon the different methods of computing probabilities based on the independence and dependence of variables.
Additionally, we have discussed at length the methods of computing probabilities for different events under a variety of conditions. All of these concepts will prove quite helpful in the multitude of scenarios where computing probability is necessary. We look forward to seeing you at the next articles in our series, where we will expand upon the subjects of regression and hypothesis testing.