A Guide to Estimations with Biological Data: Topics of Data Modeling

Introduction to Estimations with Biological Data

Our series on modeling biological data has covered a variety of statistical concepts. The first articles introduced the topic, addressing the uses of statistics and offering some insight into the concept of sampling. Following this, we surveyed the general array of model types that populate this series. Our most recent article turned to more computational concepts through a discussion of descriptive statistics. All of these concepts are important for deriving quantitative insights into a sample of a population. However, efforts to model biological data rarely stop at describing a sample. Rather, the primary goal is to gain some insight into the population as a whole. Estimation with biological data achieves this goal by using a sample to infer a feature of the population.

Estimations with Biological Data

We have already had a slight introduction to the subject at hand. In previous articles, we obtained the mean of a sample and denoted it by the symbol ‘x̅’. However, this only delivers the average of the sample. Once we have quantified the average of a sample, we can use this statistic to estimate the mean of the entire population, a parameter denoted ‘μ’. In the same manner, while ‘s’ denotes the standard deviation of the sample, we use ‘σ’ to denote the standard deviation of the population.

These population-wide parameters are estimated from their sample counterparts. Through estimation, we take our sample measurements and use them to infer the values across the population. This idea is the premise of the present article.

Sampling Distribution of an Estimate

The concept of statistical estimation revolves around inferring a particular parameter of a population from features of a sample. When we make an estimate, one of the principal concerns is how closely the estimated value reflects the true value in the population.

To gauge the accuracy of an estimate, we can consider its sampling distribution. The sampling distribution of an estimate is the probability distribution of all the values the estimate could have taken when sampling from the population.

The sampling distribution appears quite similar to a histogram when plotted. Its distinguishing feature is that the vertical axis is a probability, reflecting how likely the estimate is to fall within a particular interval of values. From the sampling distribution, we can read off the value the estimate is expected to take and how much it varies from sample to sample.

Estimating the Mean From a Random Sample

Suppose we randomly select 100 individuals from a population and measure some particular attribute of each. Furthermore, suppose that we then represent these measurements using a histogram. Due to random chance, the resulting histogram is not precisely the distribution of the population. However, important features of the population distribution can be inferred from it, such as its location, spread, and shape.

In most cases, the mean of the sample is a bit different from the true population mean. These differences are inevitable, because a sample contains only a subset of the population. Nevertheless, it's up to the statistician to discern whether these differences are large enough to undermine the estimate.

Sampling Distribution of the Mean

When computing the sample mean for a specific sample, we obtain a specific value. However, with another sample, we could just as easily have obtained a different value. If we imagine repeating this process of taking a random sample and computing its mean an infinite number of times, the resulting values form a probability distribution of our estimate. This is the sampling distribution of the mean.

The sampling distribution is a theoretical construct. We never observe it directly, because in practice we usually have only one sample. It represents the values the estimate could take across all possible samples. Taking a random sample from a population and calculating the x̅ value is equivalent to randomly sampling a single value of x̅ from the sampling distribution.

It’s critical to note that the spread of the sampling distribution depends on the number of observations in each sample. Generally speaking, the greater the number of observations, the narrower the sampling distribution, and the more precise the estimate.
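
To make the idea of repeated sampling concrete, here is a minimal simulation sketch. Since we cannot sample an infinite number of times, it approximates the sampling distribution of the mean with 2,000 repeats. The population values, the sample sizes of 10 and 100, and the helper name `simulate_sample_means` are all assumptions made up for illustration, not data from this series.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, n_repeats, seed=0):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_repeats)
    ]


# Hypothetical population: 10,000 simulated measurements (made-up values).
population_rng = random.Random(42)
population = [population_rng.gauss(50.0, 10.0) for _ in range(10_000)]

# Approximate the sampling distribution of the mean for two sample sizes.
means_small = simulate_sample_means(population, sample_size=10, n_repeats=2000)
means_large = simulate_sample_means(population, sample_size=100, n_repeats=2000)

# The standard deviation of the simulated means measures the spread
# of each approximate sampling distribution.
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

print(f"population mean:                {statistics.mean(population):.2f}")
print(f"spread of sample means (n=10):  {spread_small:.2f}")
print(f"spread of sample means (n=100): {spread_large:.2f}")
```

Under these assumptions, the means from the larger samples should cluster more tightly around the population mean, which is the narrowing described above.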

Computing Uncertainty of an Estimate

Computations with Standard Error

The standard error of an estimate is the standard deviation of its sampling distribution. It reflects how much the estimate would typically differ from the true population value from one sample to the next. Thus, the standard error quantifies the precision of an estimate. The smaller the standard error, the more precise the estimate.

Standard Error of the Mean

Suppose we specifically want to quantify how precise the estimate of a population mean is. To compute the standard error of the mean, we divide the standard deviation by the square root of the number of observations, that is, σ/√n. The standard error of the mean indicates the precision of our estimate of the mean. In practice, the standard deviation of the population is rarely available, so the standard deviation of the sample, s, serves as a proxy, giving s/√n.
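
As a small worked sketch of this calculation, the function below divides the sample standard deviation by the square root of the number of observations. The function name `standard_error_of_mean` and the measurement values are hypothetical and chosen only for illustration. The sample standard deviation stands in for the population value, as described above.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Return s / sqrt(n), using the sample standard deviation s as a proxy for sigma."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    return statistics.stdev(values) / math.sqrt(n)


# Hypothetical measurements (illustrative values only).
measurements = [2, 4, 4, 4, 5, 5, 7, 9]
sem = standard_error_of_mean(measurements)
print(f"sample mean:    {statistics.mean(measurements):.3f}")
print(f"standard error: {sem:.3f}")
```

Because n sits under a square root, quadrupling the number of observations only halves the standard error, all else being equal.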

Confidence Intervals

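Computing a confidence interval allows the uncertainty of an estimate to be quantified. A confidence interval is a range of values, computed from the sample, that is likely to contain the true value of the parameter. A 95% confidence interval, for example, is constructed so that intervals built this way from repeated samples would contain the true value about 95% of the time.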
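
One common way to build such an interval for a mean is to take the sample mean plus or minus a multiple of the standard error. The sketch below assumes a normal approximation, which gives a multiplier of roughly 1.96 for 95% confidence. For small samples, a critical value from the t distribution is usually more appropriate. The function name `approx_confidence_interval` and the data are hypothetical.

```python
import math
import statistics


def approx_confidence_interval(values, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value from the standard normal distribution
    # (assumption: the sample is large enough for this approximation).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem


# Hypothetical measurements (illustrative values only).
measurements = [2, 4, 4, 4, 5, 5, 7, 9]
low, high = approx_confidence_interval(measurements)
print(f"approximate 95% confidence interval: ({low:.2f}, {high:.2f})")
```

Asking for a higher confidence level, such as 0.99, widens the interval, while a smaller standard error narrows it.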

Summarizing Estimations with Biological Data

  • Estimation seeks to infer the value of a population parameter from a sample
  • The sampling distribution of an estimate gives the probability of each value the estimate could take
  • One particular use of the sampling distribution is in assessing an estimate of the mean
  • A single random sample can also be used to infer features of the population, though chance makes it an imperfect picture
  • A variety of techniques exist for quantifying the uncertainty of an estimate
  • The standard error is the standard deviation of the sampling distribution of the estimate
  • Standard error of the mean is the quotient between the population standard deviation and the square root of the number of observations in a sample, with the sample standard deviation serving as a proxy when needed
  • Confidence intervals are another measure of uncertainty, providing a range that we can be, for example, 95% confident contains the true value of a parameter

For more on estimations, check out this article.
