| Title: | Calculating Optimum Sampling Effort in Community Ecology |
|---|---|
| Description: | A system for calculating the optimal sampling effort, based on the ideas of "Ecological cost-benefit optimization" as developed by A. Underwood (1997, ISBN 0 521 55696 1). Data are obtained from simulated ecological communities, and the optimization proceeds through the following functions: (1) prep_data() takes the original dataset and creates simulated sets that serve as a basis for estimating statistical power and type II error. (2) sim_beta() estimates the statistical power for the different sampling efforts specified by the user. (3) sim_cbo() then calculates the optimal sampling effort, based on the statistical power and the sampling costs. Additionally, (4) scompvar() calculates the variation components necessary for (5) Underwood_cbo() to calculate the optimal combination of number of sites and samples depending on either an economic budget or a desired statistical accuracy. Lastly, (6) plot_power() helps the user visualize the results of sim_beta(). |
| Authors: | Edlin Guerra-Castro [aut, cph] (ORCID: <https://orcid.org/0000-0003-3529-4507>), Arturo Sanchez-Porras [aut, cre] (ORCID: <https://orcid.org/0000-0002-1691-286X>) |
| Maintainer: | Arturo Sanchez-Porras |
| License: | GPL (>= 3) |
| Version: | 1.0.0 |
| Built: | 2026-05-27 07:11:24 UTC |
| Source: | https://github.com/arturosp/ecocbo |
A system for calculating the optimal sampling effort, based on the ideas of "Ecological cost-benefit optimization" as developed by A. Underwood (1997, ISBN 0 521 55696 1). Data are obtained from simulated ecological communities, and the optimization relies on the following functions: (1) scompvar() calculates the variation components necessary for (2) Underwood_cbo() to calculate the optimal combination of number of sites and samples depending on either an economic budget or a desired statistical accuracy. Additionally, (3) sim_beta() estimates statistical power and type II error by using Permutational Multivariate Analysis of Variance, (4) sim_cbo() combines those power estimates with the sampling costs to suggest a sampling effort, and (5) plot_power() plots the results of sim_beta().
The functions in the ecocbo package can be used to identify the optimal number of sites and samples that must be considered in a community ecology study by using simulated data. Together with the SSP package, ecocbo proposes a novel approach to determining the appropriate sampling effort in community ecology studies.
ecocbo is composed of five functions: prep_data gives the data the format required by the other functions in the package; scompvar calculates the components of variation for the analyzed dataset; and sim_cbo estimates the number of sites and samples that optimizes the cost-benefit of an ecological sampling study. For more information on the data, sim_beta calculates statistical power for different sampling efforts, and plot_power plots those results to help the user choose a combination of sampling effort and power.
ecocbo is being developed on GitHub (https://github.com/arturoSP/ecocbo), where up-to-date versions can be found.
The ecocbo development team is Edlin Guerra-Castro and Arturo Sanchez-Porras.
Underwood, A. J. (1997). Experiments in ecology: their logical design and interpretation using analysis of variance. Cambridge university press.
Underwood, A. J., & Chapman, M. G. (2003). Power, precaution, Type II error and sampling design in assessment of environmental impacts. Journal of Experimental Marine Biology and Ecology, 296(1), 49-70.
Anderson, M. J. (2014). Permutational multivariate analysis of variance (PERMANOVA). Wiley statsref: statistics reference online, 1-15.
Guerra‐Castro, E. J., Cajas, J. C., Simões, N., Cruz‐Motta, J.J., & Mascaró, M. (2021). SSP: an R package to estimate sampling effort in studies of ecological communities. Ecography, 44(4), 561-573.

    # Load the package, then load and adjust the data.
    library(ecocbo)
    data(epiDat)
    simResults <- prep_data(data = epiDat, type = "counts", Sest.method = "average",
                            cases = 5, N = 100, M = 10, n = 5, m = 6, k = 20,
                            transformation = "none", method = "bray", dummy = TRUE,
                            useParallel = FALSE, model = "single.factor")
    simResults

    # Computing components of variation
    compVar <- scompvar(data = simResults)
    compVar

    # Determination of statistical power
    epiBetaR <- sim_beta(simResults, alpha = 0.05)
    epiBetaR

    # Cost-benefit optimization
    cboResult <- sim_cbo(epiBetaR, cn = 75)
    cboResult

    # Visualization of statistical power
    plot_power(data = epiBetaR, method = "power")

The dataset contains the results of applying ecocbo::sim_beta() to the dataset from the PAPIIT experiment. The result is a list with 4 components.

    betaNested

An object of class "ecocbo_beta", also a list containing four components. The format is:

$Power, a data frame with:
- number of sites considered for the result.
- number of replicates within each site for the result.
- estimated statistical power.
- estimated type II error.
- estimated pseudoF value that corresponds to the 1-alpha quantile of the distribution of pseudoF.

$Results, a data frame with:
- simulation from which the results are obtained.
- number of resample for the result.
- number of sites considered for the result.
- number of replicates within each site for the result.
- observed F value for the experimental design, when all observations belong to one site.
- observed F value for the experimental design, when observations belong to different sites.
- calculated mean squares among sites in the experiment.
- calculated mean squares for the residuals in the experiment.

$alpha: usually 0.05

$model: "nested.symmetric"

class: "ecocbo_beta"

This dataset can be used to study the variability of the pseudoF-statistic, beta and the power when an experiment is applied to a varying number of samples, sampling units, or sampling sites.
Data available from the GitHub Digital Repository: https://github.com/edlinguerra/IA206320_publico/tree/main/datos (Guerra-Castro et al. 2022).
The dataset contains biomass data from monitoring fish individuals in coral reefs in Puerto Rico, through the Puerto Rico Coral Reef Monitoring Program.

    dataFish

A dataframe. The data is formed by:
A list of regions in which the observations were made.
A list of locations in which the observations were made.
A series of observations.
A series of observations.
This dataset can be used to test the functions under the nested.symmetric model, using the cover/biomass version of the algorithms.
Data available from the PRCRMP.
The dataset contains the results of applying ecocbo::sim_beta() to an excerpt from the dataset epibionts from the package SSP. The result is a list with 4 components.

    epiBetaR

An object of class "ecocbo_beta", also a list containing four components. The format is:

$Power, a data frame with:
- number of sites considered for the result.
- number of replicates within each site for the result.
- estimated statistical power.
- estimated type II error.
- estimated pseudoF value that corresponds to the 1-alpha quantile of the distribution of pseudoF.

$Results, a data frame with:
- simulation from which the results are obtained.
- number of resample for the result.
- number of sites considered for the result.
- number of replicates within each site for the result.
- observed F value for the experimental design, when all observations belong to one site.
- observed F value for the experimental design, when observations belong to different sites.
- calculated mean squares among sites in the experiment.
- calculated mean squares for the residuals in the experiment.

$alpha: usually 0.05

$model: nested.symmetric

class: ecocbo_beta

This dataset can be used to study the variability of the pseudoF-statistic, beta and the power when an experiment is applied to a varying number of samples, sampling units, or sampling sites.
Data available from the GitHub Digital Repository: https://github.com/edlinguerra/SSP/tree/master/data (Guerra-Castro et al. 2022).
This dataset is a subset of the epibionts dataset from SSP, built from the three local communities that differ the most.

    epiDat

A data frame with count of individuals for 24 observations on 151 species.
Data available from the Dryad Digital Repository: doi:10.5061/dryad.3bk3j9kj5 (Guerra-Castro et al. 2020).
This is a dataset containing a subset from the macrofauna recorded in the PAPIIT experiment.

    macrofDat

A dataframe with counts of individuals for 43 observations on 34 species.
Data available from the GitHub Digital Repository: https://github.com/edlinguerra/IA206320_publico/tree/main/datos (Guerra-Castro et al. 2022).
Visualizes the statistical power of a study as a function of the sampling effort.
The power curve plot illustrates how power increases with sample size, while
the density plot highlights the overlapping areas where α and β
are significant.

    plot_power(data, cbo = NULL, n = NULL, m = NULL, method = "power", completePlot = TRUE)

| Argument | Description |
|---|---|
| data | Object of class |
| cbo | Optional. Object of class |
| n | Optional. Integer. Number of samples |
| m | Optional. Integer. Number of replicates |
| method | Character. Type of plot to generate: "power", "density", "both", or "surface". |
| completePlot | Logical. Is the plot to be drawn complete? If TRUE the plot will be trimmed to present a better distribution of the density plot. |

A plot displaying:
If method = "power", power curves for different values of m, with the
selected n highlighted in red.
If method = "density": a density plot of observed pseudo-F values with
a vertical line indicating significance from sim_beta().
If method = "both": a composite figure with both the power curve and the
density plot.
If method = "surface": a surface plot for the statistical power in different
sampling designs.
The selected values of m, n, and the corresponding component of variation
are displayed in all cases.
Edlin Guerra-Castro, Arturo Sanchez-Porras
Underwood, A. J. (1997). Experiments in ecology: their logical design and interpretation using analysis of variance. Cambridge university press.
Underwood, A. J., & Chapman, M. G. (2003). Power, precaution, Type II error and sampling design in assessment of environmental impacts. Journal of Experimental Marine Biology and Ecology, 296(1), 49-70.
sim_beta()
scompvar()
sim_cbo()
prep_data()

    # Power curve visualization
    plot_power(data = epiBetaR, method = "power")

    # Density plot of pseudo-F values
    plot_power(data = betaNested, method = "density")

    # Composite plot with both power curve and density plot
    plot_power(data = betaNested, method = "both")

Formats and arranges the initial data so that it can be readily used by the other functions in the package. The function first gets the species names and the number of samples for each species from the input data frame. Then, it permutes the sampling efforts and calculates the pseudo-F statistic and the mean squares for each permutation. Finally, it returns a data frame with the permutations, pseudo-F statistic, and mean squares.
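
The pseudo-F statistic and mean squares mentioned above can be illustrated with a small base R sketch. It follows the standard distance-based partitioning of sums of squares used in PERMANOVA (Anderson 2014), but it is not the package's code: the helper name `pseudo_F_sketch`, the toy data, and the use of Euclidean distance instead of Bray-Curtis are assumptions made for illustration.

```r
# Hypothetical helper (not part of ecocbo): one-factor pseudo-F from a
# distance object, using sums of squared inter-point distances.
pseudo_F_sketch <- function(d, groups) {
  d2 <- as.matrix(d)^2
  groups <- factor(groups)
  N <- nrow(d2)
  a <- nlevels(groups)
  # Total sum of squares: squared distances between all pairs, divided by N.
  ssT <- sum(d2[lower.tri(d2)]) / N
  # Within-group sum of squares, accumulated group by group.
  ssW <- sum(vapply(levels(groups), function(g) {
    idx <- which(groups == g)
    sub <- d2[idx, idx, drop = FALSE]
    sum(sub[lower.tri(sub)]) / length(idx)
  }, numeric(1)))
  msA <- (ssT - ssW) / (a - 1)  # mean squares among groups
  msR <- ssW / (N - a)          # mean squares of the residuals
  c(pseudoF = msA / msR, MSA = msA, MSR = msR)
}

# Toy data: one variable measured in three hypothetical sites.
set.seed(7)
y <- c(rnorm(5, 10), rnorm(5, 12), rnorm(5, 11))
g <- rep(c("A", "B", "C"), each = 5)
res <- pseudo_F_sketch(dist(y), g)
res
```

With a single variable and Euclidean distance, this pseudo-F coincides with the classical one-way ANOVA F, which gives a convenient sanity check for the partitioning.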

    prep_data(data, type = "counts", Sest.method = "average", cases = 5, N = 100,
              M = NULL, n, m = NULL, k = 50, transformation = "none",
              method = "bray", dummy = FALSE, useParallel = TRUE,
              model = "single.factor", jitter.base = 0.5)

| Argument | Description |
|---|---|
| data | Data frame where columns represent species names and rows correspond to samples. |
| type | Character. Nature of the data to be processed. It may be presence / absence ("P/A"), counts of individuals ("counts"), or coverage ("cover"). |
| Sest.method | Character. Method for estimating species richness using |
| cases | Integer. Number of simulated datasets. |
| N | Integer. Total number of samples simulated per site. |
| M | Integer. Total number of replicates simulated per dataset. Not needed for single factor experiments. |
| n | Integer. Maximum number of samples to consider (must be |
| m | Integer. Number of replicates to consider. (must be |
| k | Integer. Number of resampling iterations. Defaults to 50. |
| transformation | Character. Transformation applied to reduce the weight of dominant species: "square root", "fourth root", "Log (X+1)", "P/A", "none". |
| method | Character. Dissimilarity metric used |
| dummy | Logical. If |
| useParallel | Logical. If |
| model | Character. Select the model to use. Options are |
| jitter.base | Numeric. Standard deviation multiplier used to add Gaussian jitter to |

The input dataset should have:
One or two leading columns for treatment/replicate labels.
Subsequent columns representing species presence/absence, counts, or coverage.
"single.factor" requires a single column for replicates.
"nested.symmetric" requires two columns: treatment and replicate in that
order.
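
The two layouts described above can be sketched with toy data in base R. All object names, labels, and counts here (`singleDat`, `nestedDat`, `sp1` to `sp3`, and so on) are hypothetical and do not come from the package datasets.

```r
# Toy community matrix: 8 samples x 3 hypothetical species (counts).
set.seed(42)
counts <- matrix(rpois(24, lambda = 3), nrow = 8,
                 dimnames = list(NULL, c("sp1", "sp2", "sp3")))

# "single.factor": one leading label column, then the species columns.
singleDat <- data.frame(site = rep(c("A", "B", "C", "D"), each = 2), counts)

# "nested.symmetric": two leading label columns (treatment first, then
# replicate), followed by the species columns.
nestedDat <- data.frame(treatment = rep(c("T1", "T2"), each = 4),
                        replicate = rep(c("s1", "s2", "s3", "s4"), each = 2),
                        counts)

str(singleDat)
str(nestedDat)
```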
prep_data() returns an object of class "ecocbo_data".
An object of class "ecocbo_data" is a list containing:
$Results, a data frame that lists the estimates of pseudoF for
simH0 and simHa, useful for statistical power analysis. It also
includes mean squares for variance component estimation.
$model, a label for keeping track of the model that is being used
in the analysis.
$a, an integer for the number of treatments recorded from the original
data.
Edlin Guerra-Castro, Arturo Sanchez-Porras
Underwood, A. J. (1997). Experiments in ecology: their logical design and interpretation using analysis of variance. Cambridge university press.
Underwood, A. J., & Chapman, M. G. (2003). Power, precaution, Type II error and sampling design in assessment of environmental impacts. Journal of Experimental Marine Biology and Ecology, 296(1), 49-70.
sim_beta()
plot_power()
sim_cbo()
scompvar()

    simResults <- prep_data(data = epiDat, type = "counts", Sest.method = "average",
                            cases = 5, N = 100, M = 10, n = 5, m = 5, k = 30,
                            transformation = "none", method = "bray", dummy = FALSE,
                            useParallel = FALSE, model = "single.factor",
                            jitter.base = 0)
    simResults

Computes the average components of variation among sampling units and within samples in relation to sampling effort.

    scompvar(data, n = NULL, m = NULL)

| Argument | Description |
|---|---|
| data | Object of class |
| n | Optional. Integer. Number of samples to consider. |
| m | Optional. Integer. Number of replicates to consider. |

If m or n are set to NULL, the function automatically uses the
largest available values from the experimental design set in sim_beta().
A data frame containing the values for the variation component
among sites compVarA and in the residuals compVarR.
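
As a rough illustration of how such components relate to mean squares, the sketch below applies the classical ANOVA (method-of-moments) estimators. This is an assumption about the general technique, not a transcription of what scompvar() does internally; the helper name `comp_var_sketch` and the input values are hypothetical.

```r
# Hypothetical helper (not part of ecocbo): classical variance component
# estimators from mean squares, with n samples per site.
comp_var_sketch <- function(msA, msR, n) {
  data.frame(
    # Among-site component; negative estimates are truncated at zero,
    # which is a common convention rather than a package guarantee.
    compVarA = pmax((msA - msR) / n, 0),
    # Residual component is the residual mean square itself.
    compVarR = msR
  )
}

cv <- comp_var_sketch(msA = 0.9, msR = 0.3, n = 5)
cv
```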
Edlin Guerra-Castro, Arturo Sanchez-Porras
Underwood, A. J. (1997). Experiments in ecology: their logical design and interpretation using analysis of variance. Cambridge university press.
Underwood, A. J., & Chapman, M. G. (2003). Power, precaution, Type II error and sampling design in assessment of environmental impacts. Journal of Experimental Marine Biology and Ecology, 296(1), 49-70.
sim_beta()
plot_power()
sim_cbo()
prep_data()

    scompvar(data = simResults)
    scompvar(data = simResults, n = 5, m = 2)

Estimates the statistical power of a study by comparing variation under null
and alternative hypotheses. For instance, if the beta error is 0.25, there is
a 25% chance of failing to detect a real difference, and the power of the study
is 1 - β, that is, 0.75 in this case.
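
The relationship between the critical pseudo-F value, beta, and power can be sketched in base R. The simulated values below are drawn from F distributions purely for illustration; they are not output from prep_data() or sim_beta(), and the variable names are hypothetical.

```r
set.seed(1)
# Hypothetical pseudo-F values under the null and the alternative hypothesis.
pseudoFH0 <- rf(1000, df1 = 2, df2 = 12)      # no difference among sites
pseudoFHa <- rf(1000, df1 = 2, df2 = 12) * 3  # inflated by a real effect

alpha <- 0.05
# Critical value: the (1 - alpha) quantile of the null distribution.
fCrit <- quantile(pseudoFH0, probs = 1 - alpha)

# Power: share of alternative-hypothesis values at or above the critical value.
power <- mean(pseudoFHa >= fCrit)
beta <- 1 - power  # type II error

c(fCrit = unname(fCrit), power = power, beta = beta)
```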

    sim_beta(data, alpha = 0.05)

| Argument | Description |
|---|---|
| data | An object of class |
| alpha | Numeric. Significance level for Type I error. Defaults to 0.05. |

The function displays a summary matrix with estimated power values for various sampling efforts.
A list of class "ecocbo_beta", containing:
$Power: a data frame with power and beta estimates across different
sampling efforts (m sites and n samples).
$Results: a data frame with pseudo-F estimates for simH0 and simHa.
$alpha: significance level for Type I error.
Edlin Guerra-Castro, Arturo Sanchez-Porras
Underwood, A. J. (1997). Experiments in ecology: their logical design and interpretation using analysis of variance. Cambridge university press.
Underwood, A. J., & Chapman, M. G. (2003). Power, precaution, Type II error and sampling design in assessment of environmental impacts. Journal of Experimental Marine Biology and Ecology, 296(1), 49-70.
Anderson, M. J. (2014). Permutational multivariate analysis of variance (PERMANOVA). Wiley statsref: statistics reference online, 1-15.
Guerra‐Castro, E. J., Cajas, J. C., Simões, N., Cruz‐Motta, J. J., & Mascaró, M. (2021). SSP: an R package to estimate sampling effort in studies of ecological communities. Ecography, 44(4), 561-573.
plot_power()
scompvar()
sim_cbo()
prep_data()
SSP::assempar()
SSP::simdata()

    sim_beta(data = simResults, alpha = 0.05)

Given a table of statistical power estimates produced by sim_beta,
sim_cbo finds the sampling design (number of replicates/site and sites)
that minimizes total cost while achieving a user‐specified power threshold.

    sim_cbo(data, cn, cm = NULL, perm = 100)

| Argument | Description |
|---|---|
| data | Object of class |
| cn | Numeric. Cost per sampling unit. |
| cm | Numeric. Fixed cost per replicate. |
| perm | Integer. Minimum number of permutations needed to reject the null hypothesis. Defaults to 100, which allows rejection at alpha = 0.05; the user can increase this value to make the test stricter (e.g. 200 for alpha = 0.01 or 5000 for alpha = 0.001). |

A data frame with one row per candidate design. In the single factor
case, the results include the available n values, their statistical
power and cost. For the nested symmetric experiments, the results include all
the available values for m, the optimal n, according to the
power, and the associated cost. The results also mark a suggested sampling
effort, based on the cost and power range as selected by the user.
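
The selection logic for the single factor case can be sketched as follows. The power table, the 0.80 threshold, and the cost rule (cost = n times cn) are assumptions for illustration only and are not taken from the package's implementation.

```r
# Hypothetical power estimates for a single-factor design.
powerTab <- data.frame(n = 2:6,
                       Power = c(0.35, 0.58, 0.76, 0.88, 0.94))

cn <- 80             # cost per sampling unit
targetPower <- 0.80  # assumed power threshold

# Cost of each candidate design under the assumed cost rule.
powerTab$Cost <- powerTab$n * cn

# Keep designs that reach the threshold, then pick the cheapest one.
candidates <- powerTab[powerTab$Power >= targetPower, ]
best <- candidates[which.min(candidates$Cost), ]
best
```

Here the cheapest design that reaches the threshold uses n = 5 samples at a cost of 400.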
Edlin Guerra-Castro, Arturo Sanchez-Porras
Underwood, A. J. (1997). Experiments in ecology: their logical design and interpretation using analysis of variance. Cambridge university press.
Underwood, A. J., & Chapman, M. G. (2003). Power, precaution, Type II error and sampling design in assessment of environmental impacts. Journal of Experimental Marine Biology and Ecology, 296(1), 49-70.
sim_beta()
plot_power()
scompvar()
Underwood_cbo()

    # Optimization of single factor experiment
    sim_cbo(data = epiBetaR, cn = 80)

    # Optimization of a nested factor experiment
    sim_cbo(data = betaNested, cn = 80, cm = 180)

The dataset contains the results of applying ecocbo::prep_data() to epiDat. The result is a list with one level: $Results is a data frame with the results of applying PERMANOVA to epiDat a number of times; it contains the values of pseudoF and the mean squares for different repeated sampling efforts.

    simResults

An object of class "ecocbo_data", also a list containing one data frame. The format is:

$Results, a data frame with:
- simulation from which the results are obtained.
- number of resample for the result.
- number of replicates within each site for the result.
- observed F value for the experimental design, when all observations belong to one site.
- observed F value for the experimental design, when observations belong to different sites.
- calculated mean squares for the residuals in the experiment.

$model: "single.factor"

class: ecocbo_data

This dataset can be used to study the variability of the pseudoF-statistic, beta and the power when an experiment is applied to a varying number of samples, sampling units, or sampling sites.
Data available from the Dryad Digital Repository: doi:10.5061/dryad.3bk3j9kj5 (Guerra-Castro et al. 2020).
The dataset contains the results of applying ecocbo::prep_data() to epiDat. The result is a list with one level: $Results is a data frame with the results of applying PERMANOVA to epiDat a number of times; it contains the values of pseudoF and the mean squares for different repeated sampling efforts.

    simResultsNested

An object of class "ecocbo_data", also a list containing one data frame. The format is:

$Results, a data frame with:
- simulation from which the results are obtained.
- number of resample for the result.
- number of sites considered for the result.
- number of replicates within each site for the result.
- observed F value for the experimental design, when all observations belong to one site.
- observed F value for the experimental design, when observations belong to different sites.
- calculated mean squares among sites in the experiment.
- calculated mean squares for the residuals in the experiment.

$model: "single.factor"

class: ecocbo_data

This dataset can be used to study the variability of the pseudoF-statistic, beta and the power when an experiment is applied to a varying number of samples, sampling units, or sampling sites.
Source data is available from https://github.com/edlinguerra/IA206320_publico/tree/main/datos (Guerra-Castro et al. 2020).
Applies a cost-benefit optimization model based on either a desired level of precision or a predefined budget, following the approach of Underwood (1997).

    Underwood_cbo(comp.var, multSE = NULL, budget = NULL, a = NULL, ca = NULL, cm = NULL, cn)

| Argument | Description |
|---|---|
| comp.var | Data frame as obtained from |
| multSE | Optional. Numeric. Required multivariate standard error for the sampling experiment. |
| budget | Optional. Numeric. Total budget available for the sampling experiment. |
| a | Numeric. Number of treatments to consider. |
| ca | Numeric. Cost per treatment. |
| cm | Numeric. Cost per replicate. |
| cn | Numeric. Cost per sampling unit. |

A data frame containing the optimized values for m number of
sites to sample and n number of samples per site.
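
The kind of calculation involved can be sketched with the textbook two-stage allocation formulas, in which the optimal number of samples per site is sqrt((cm * compVarR) / (cn * compVarA)) and the number of sites then follows from either the budget or the target standard error. This is a simplified sketch under stated assumptions, not the package's implementation: the helper name `two_stage_sketch` is hypothetical, the treatment-level arguments `a` and `ca` are ignored, and `multSE` is treated as a plain standard error of the mean.

```r
# Hypothetical helper (not part of ecocbo): textbook cost-benefit allocation
# for m sites with n samples each.
two_stage_sketch <- function(compVarA, compVarR, cm, cn,
                             budget = NULL, multSE = NULL) {
  # Samples per site that balance cost and variance, rounded, at least 1.
  n <- max(1, round(sqrt((cm * compVarR) / (cn * compVarA))))
  if (!is.null(budget)) {
    # Each site costs cm plus n samples at cn each.
    m <- floor(budget / (cm + n * cn))
  } else if (!is.null(multSE)) {
    # Variance of the mean is (compVarA + compVarR / n) / m.
    m <- ceiling((compVarA + compVarR / n) / multSE^2)
  } else {
    stop("Provide either 'budget' or 'multSE'.")
  }
  data.frame(n = n, m = m)
}

byBudget <- two_stage_sketch(0.04, 0.16, cm = 180, cn = 80, budget = 4200)
bySE <- two_stage_sketch(0.04, 0.16, cm = 180, cn = 80, multSE = 0.1)
byBudget
bySE
```

With these made-up inputs both routes suggest n = 3 samples per site and m = 10 sites.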
Edlin Guerra-Castro, Arturo Sanchez-Porras
Underwood, A. J. (1997). Experiments in ecology: their logical design and interpretation using analysis of variance. Cambridge university press.
Underwood, A. J., & Chapman, M. G. (2003). Power, precaution, Type II error and sampling design in assessment of environmental impacts. Journal of Experimental Marine Biology and Ecology, 296(1), 49-70.
sim_beta()
plot_power()
scompvar()
sim_cbo()

    compVar <- scompvar(data = simResults)

    # Optimization based on budget constraint
    Underwood_cbo(comp.var = compVar, multSE = NULL, budget = 20000,
                  a = 3, ca = 2500, cn = 100)

    # Optimization based on precision constraint
    Underwood_cbo(comp.var = compVar, multSE = 0.15, cn = 150)
