Using CONFIDENCE function

Could someone confirm the use of the CONFIDENCE function for me in the following sample dataset? I am either using the function incorrectly or there is an error in the function.

The data are from http://people.math.sfu.ca/~cschwarz/Stat-301/Handouts/node76.html

100, 105, 97, 103, 96, 106, 102, 97, 99, 103

I calculate the following. Sample std. deviation = 3.5214, n = 10, std. error (std. deviation/SQRT n) = 1.1136, t with n-1 degrees of freedom = 2.2622, 95% confidence limits (std error * t) = plus or minus 2.519.

Using the CONFIDENCE function in Numbers as follows: CONFIDENCE (0.05,3.5214, 10), I get 95% confidence limits = plus or minus 2.1825.

I have tried this same comparison for 5 online datasets for which others have calculated confidence limits. See the following.

http://www.ehow.com/how5933144calculate-confidence-interval-mean.html
http://www.stattutorials.com/EXCEL/EXCEL-DESCRIPTIVE-STATISTICS.html
http://www.comfsm.fm/~dleeling/statistics/notes009.html http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
http://www.stattutorials.com/EXCELDATA/

My calculation using std error and t matches their results in all cases. Using the CONFIDENCE function in numbers as above produces a different (lower) answer in all cases.

I suspect that I am not using the CONFIDENCE function correctly, but its mathematics are quite opaque to me, and there is the possibility that the function is actually calculating the confidence limits incorrectly.

Any help you can give me with this would be most welcome.

Dave Mayhood

iMac, Mac OS X (10.6.6)

Posted on Jan 25, 2011 8:06 PM

Reply
11 replies

Jan 26, 2011 1:00 AM in response to Dave Mayhood

Further to this issue, I have calculated the true confidence limits of the examples for the CONFIDENCE function provided in the Numbers Functions Browser. The results cause me to believe that the function is calculating the confidence intervals incorrectly, especially for small samples (n < 30 or so).

For normally-distributed data, the confidence limits are provided by t(s/SQRT n) in small samples (<30 or so), where t is Student's t with n-1 degrees of freedom, s is standard deviation, and n is sample size. As sample n increases, the t distribution increasingly approaches the normal distribution, so that by n >120 or so, the two distributions are virtually identical, and either t or z (the normal coefficient) can be used.

The Numbers Function Browser examples return the following results when the CONFIDENCE function is used (top line). The true confidence intervals as calculated by t*(s/SQRT n) are on the second line.

0.62, 0.52, 0.44, 0.36, 0.31 *Numbers CONFIDENCE function*
0.72, 0.58, 0.47, 0.37, 0.32 *t(s/SQRT n)*

Comparing only the 4 examples with the same alpha value (0.05), the same comparison would be as follows, with the n values and % error values below.

0.62, 0.44, 0.36, 0.31 *Numbers CONFIDENCE function*
0.72, 0.47, 0.37, 0.32 *t(s/SQRT n)*
10, 20, 30, 40 n
13.4, 6.4, 4.2, 3.1 *% error*

Note that the percentage errors decrease as sample n increases. For this reason I suspect that CONFIDENCE is using the normal distribution rather than the t distribution to calculate confidence intervals, which is only accurate for large sample sizes. Using the normal distribution for small sample n will give confidence intervals that are too small.

My conclusion is that the CONFIDENCE function in Numbers should be used to calculate confidence intervals only for sample sizes (n) > 120, although the function will give reasonably accurate results for samples with n > 60 or so. For smaller sample sizes, calculate the standard error, use Numbers' TINV function to get the t value, and calculate confidence intervals as t(s/SQRT n). Using this method will always be correct for normally-distributed data, no matter what the sample size.

Jan 26, 2011 12:23 PM in response to Dave Mayhood

Dave,

Have you tried the same function in Excel or OpenOffice? The Numbers algorithms seem always to be identical to Excel's as far as the end user can tell. I imagine this would be for compatibility and that even if you prove some weakness in the function, they will opt for compatibility. On the other hand, if you prove an incompatibility, I'd think they would be all over it.

Jerry

Jan 26, 2011 1:28 PM in response to F Shippey

F Shippey, thanks for the idea. I now have done that. with the suggestions that (1) users should be warned that the CONFIDENCE function is not accurate for small samples, and that the function, while accurate for large samples, could be made useful for all sample sizes by using Student's t.

All of this depends of course on my guess being correct -- that the CONFIDENCE algorithm uses the normal distribution to set confidence limits. I would still like to see someone well-versed in stats to check that claim. What I do know is that calculating confidence intervals from sample standard deviation, n and the appropriate t value always gives the correct answer for data that can reasonably be assumed to come from an underlying normally-distributed population. The CONFIDENCE function in Numbers does not seem to do that.

Jan 26, 2011 2:08 PM in response to Jerrold Green1

Jerry,

Thanks -- another good idea.

I don't have Excel or access to it, but there are other ways I can check its calculations. OpenOffice I don't use, but maybe I should. I will look into getting a copy and trying your idea.

I think the main issue is this. In my experience, most datasets for which confidence intervals are needed are quite small. You could probably argue that confidence intervals really are most useful for small datasets, because that is where precision concerns are the greatest. So it is precisely when accurate confidence intervals are most needed that unaware users are most likely to use the CONFIDENCE function, which as I say is seriously inaccurate for small samples. As I also noted, there is another approach that will give the correct answer for small samples, but users need to be made aware that the built-in function for confidence intervals in Numbers is not what they should use in such a case.

An indication that this is not well understood is that the very examples used in the documentation for the CONFIDENCE function are all small samples. The expected distribution is Student's t, not the normal distribution. Yet the documentation also clearly states

+The confidence estimate assumes that values in the sample are normally distributed.+

Another point worth mentioning is that confidence intervals computed using t appear to be conservative at moderately large to large sample sizes (i.e., n > 60 or so). That is, they are marginally wider than the same intervals calculated using a coefficient derived from the normal distribution. This means that you are (marginally) less likely to erroneously detect differences among sample means when in fact there are none. In many applications, this is a good thing.

And finally, like other statistics confidence limits are estimators calculated from slightly to very squishy data to begin with. Close is usually good enough. If your work depends on detecting a difference just beyond the confidence limit, you really should go out and collect more and better data, just to make sure. But you can save yourself a lot of grief by using the theoretically most accurate algorithm to determine the confidence intervals in the first place.

Jan 26, 2011 3:07 PM in response to KOENIG Yvan

F. Shippey & Yvan Koenig,

Thanks, yes. Your link and the User Guide give the Numbers CONFIDENCE example data that I mention in my posts, although I got the data from the Function Browser built in to Numbers.

Try calculating the confidence interval from the expression t*(s/SQRTn) using the data at that link. You can obtain the t you need from standard statistical tables for Student's t, or from the TINV function built in to Numbers. Enter the tables with the alpha value and n-1 degrees of freedom. You should get the values I calculated in my second post.

By the way, the above formula for confidence intervals is not mine. The formula t*(s/SQRTn), or some algebraic equivalent, is given in all basic statistics texts that I know of as the formula for determining confidence limits for small sample sizes from populations whose underlying distribution is expected to be normal. One example is

Zar, J. H. 1974. Biostatistical analysis. Prentice-Hall, Inc. Englewood, NJ. xiv+620 p. (There are much more recent editions.)

Also, there are many online Google-able treatments of confidence limits for the means of normally-distributed populations.

Dave Mayhood

Jan 27, 2011 12:05 AM in response to Badunit

Badunit wrote:
Assuming the Numbers function is the same as the Excel function,

http://support.microsoft.com/kb/828124

the formula is the same as =NORMSINV(1 – alpha/2) * sigma / SQRT(n)



Badunit, your link allowed me to resolve the last lingering issue. Thank you so much.

CONFIDENCE in Numbers returns the same values as CONFIDENCE in Excel as used on the example data in the above link. Clearly both spreadsheets use the CONFIDENCE function in the same way, and both call the NORMSINV function, not TINV. That is, neither uses Student's t.

The consequence is that CONFIDENCE in both Excel and Numbers will give strictly accurate results only for large samples. For small samples, it is better to use t*(s/SQRT n).

The example data in the above link show how large the difference is in a relatively large sample (n = 50). For those data

CONFIDENCE gives an upper and lower limit for the true mean as 100.8 - 109.2;
*t(s/SQRT n)* gives an upper and lower limit for the true mean as 100.7 - 109.3.

This difference is negligible for all practical purposes (even though the error is 2.5%), but remember that this is a largish sample with a fairly small standard deviation relative to the mean.

Let's say in the same example that the sample size is much smaller, which would typically lead to a much larger standard deviation. Let's use the following.

alpha = 0.05 (i.e., the same as in the example)
std deviation = 30
n = 15
mean = 105 (i.e., the same as in the example)

In this case,

CONFIDENCE returns 15.1818157426, giving lower & upper limits 89.8 - 120.2
*t(s/SQRT n)* returns 16.613446246946, giving lower & upper limits 88.4 - 121.6

The error is 8.6%. This is a problem now, because CONFIDENCE gives a clearly and falsely narrower confidence interval that could lead to erroneous conclusions.

The conclusion is that CONFIDENCE should only be used for large samples. t*(s/SQRT n) is much preferable for small samples, and I would argue as before that t*(s/SQRT n) is preferable to CONFIDENCE in all cases because it is never going to give an erroneous result -- particularly a result that is too low, which is the critical issue.

I want to thank everyone for helping me to work through this problem. It really was driving me crazy.

Dave Mayhood

May 20, 2011 8:03 PM in response to Dave Mayhood

More information about s (STDEV) and sigma (STDEVP) may help. The sample standard deviation (s) is only an estimate of sigma (the population standard deviation). The CONFIDENCE function assumes sigma is known. We use z (NORMSINV) when we know sigma and t (TINV) when we don't and only have the sample standard deviation (s). As we are estimating the population mean, using statistics from a sample, we may not know what sigma is (the formula to calculate sigma uses the population mean). In some settings it is my understanding that the variation is well known and the CONFIDENCE function could be used, though my experience is that most use the t distribution. I hope this is helpful.

This thread has been closed by the system or the community team. You may vote for any posts you find helpful, or search the Community for additional answers.

Using CONFIDENCE function

Welcome to Apple Support Community
A forum where Apple customers help each other with their products. Get started with your Apple Account.