Clustify: Organize Your Documents

Exact Confidence Interval Calculator

This calculator computes the exact confidence interval for sampling without replacement, so it can be used for predictive coding calculations where very low or very high prevalence, or a small sample size, may cause approximate formulas to give wrong results. Enter the total number of documents in the project, the number of documents randomly sampled, and the number of sampled documents that were found to be relevant; the calculator returns the range of values for the prevalence of relevant documents (the fraction of all documents that are relevant) at the requested confidence level.
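To illustrate how an approximate formula can fail at low prevalence, here is a minimal Python sketch of the common normal-approximation (Wald) interval with a finite population correction. This is a swapped-in comparison method, not the calculator's approach, and `wald_interval` is a hypothetical name. When no relevant documents turn up in the sample, the approximate interval collapses to zero width, claiming certainty that the prevalence is exactly 0.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(N, n, k, confidence_pct=95):
    """Normal-approximation interval for prevalence (illustrative only; assumes 0 < n < N)."""
    p_hat = k / n  # sample proportion of relevant documents
    z = NormalDist().inv_cdf(0.5 + confidence_pct / 200)  # two-sided z-score, ~1.96 for 95%
    fpc = (N - n) / (N - 1)  # finite population correction for sampling without replacement
    half_width = z * sqrt(p_hat * (1 - p_hat) / n * fpc)
    # Clip to [0, 1], since a prevalence cannot fall outside that range.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# With 0 relevant documents in a sample of 500, the "interval" has zero width.
print(wald_interval(100000, 500, 0))
```

With only a few relevant documents in the sample, the raw lower limit also goes negative and has to be clipped, another sign that the approximation is unreliable in this regime.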

Inputs: Confidence Level (%), Population Size, Sample Size, Number of Relevant Items in Sample
Outputs: Lower Bound for Prevalence (%), Upper Bound for Prevalence (%)

Technical Details: The calculator above uses the Clopper-Pearson approach to compute the exact confidence interval for the hypergeometric distribution (sampling without replacement). No assumption is made that the sample size or the number of relevant items falls within a particular range, and the requested confidence level acts as a lower bound on the actual coverage, which makes the interval rather conservative. Because the number of relevant items in a sample is discrete, the coverage as a function of prevalence is bumpy. Approximate methods often aim to make the average of those bumps match the requested confidence level, which means that for some prevalence values their actual confidence is lower than requested; that never happens with the exact approach. If you want the confidence interval for the binomial distribution (sampling with replacement), just enter a population size that is much larger than the sample size (e.g., 1000 times larger).
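The Clopper-Pearson procedure for the hypergeometric case can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Clustify's implementation: `exact_prevalence_ci` and `coverage` are hypothetical names; a candidate number of relevant documents K is kept when both one-sided tail probabilities of the observed count are at least α/2 (the calculator may handle ties at exactly α/2 differently); and probabilities are computed exactly with integer binomial coefficients, which can be slow for very large populations. The `coverage` helper computes the actual probability that the interval contains the true prevalence, so it can be used to check that the requested confidence level really is a lower bound.

```python
from fractions import Fraction
from math import comb


def _tail_at_least(N, K, n, k):
    """Exact P(X >= k), X = relevant items in n draws without replacement from N items, K relevant."""
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, n + 1))
    return Fraction(hits, comb(N, n))


def _tail_at_most(N, K, n, k):
    """Exact P(X <= k) under the same hypergeometric model."""
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(0, k + 1))
    return Fraction(hits, comb(N, n))


def exact_prevalence_ci(N, n, k, confidence_pct=95):
    """Hypothetical helper: exact (Clopper-Pearson style) interval for the prevalence K/N.

    Assumption: a candidate K is kept when both tail probabilities of the observed
    k are >= alpha/2. Returns (lower, upper) as exact Fractions.
    """
    if not 0 <= k <= n <= N or N < 1:
        raise ValueError("need 0 <= k <= n <= N and N >= 1")
    half_alpha = (1 - Fraction(str(confidence_pct)) / 100) / 2

    # P(X >= k) never decreases as K grows: binary-search the smallest kept K.
    lo, hi = 0, N
    while lo < hi:
        mid = (lo + hi) // 2
        if _tail_at_least(N, mid, n, k) >= half_alpha:
            hi = mid
        else:
            lo = mid + 1
    k_lower = lo

    # P(X <= k) never increases as K grows: binary-search the largest kept K.
    lo, hi = 0, N
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if _tail_at_most(N, mid, n, k) >= half_alpha:
            lo = mid
        else:
            hi = mid - 1
    k_upper = lo

    return Fraction(k_lower, N), Fraction(k_upper, N)


def coverage(N, n, K, confidence_pct=95):
    """Exact probability that the interval contains the true prevalence K/N."""
    covered = 0
    for k in range(0, n + 1):
        weight = comb(K, k) * comb(N - K, n - k)  # number of samples with exactly k relevant
        if weight == 0:
            continue
        low, high = exact_prevalence_ci(N, n, k, confidence_pct)
        if low <= Fraction(K, N) <= high:
            covered += weight
    return Fraction(covered, comb(N, n))


# Coverage for every possible prevalence in a tiny population, at 90% confidence.
values = [coverage(20, 5, K, 90) for K in range(21)]
print(float(min(values)), float(max(values)))
```

Printing the coverage for each K shows the bumps described above; by construction the minimum never drops below the requested 0.9, and the maximum is 1 (reached, for example, at K = 0, where every sample contains no relevant items).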