5 Special topics
The utility of SECR is enhanced by several extensions to the basic theory in the preceding chapters.
The most important of these are the linear sub-models for detection and density that we describe in their respective chapters. Others we list here.
5.1 Multi-session likelihood
The data may comprise multiple independent datasets to be analysed together. We call these ‘sessions’, following the terminology of secr. They may be synchronous spatial samples from non-overlapping populations or samples of the same population widely separated in time. If there are J such datasets we can denote them \Omega_j, j = 1,...,J. The likelihood to be maximized is then \prod_j L(\phi, \theta | \Omega_j). While the parameter vector (\phi, \theta) is common to all sessions, the mechanism of linear submodels allows ‘real’ parameter values to be specific to a session, common to multiple sessions, or modelled as a function of session-level covariates.
Failure of the independence assumption in a multi-session analysis results in underestimation of the sampling variance.
5.2 Groups
Users of MARK (Cooch & White, 2023) will be familiar with the stratification of a population into ‘groups’, each with potentially different values for parameters such as survival probability. Grouping requires that each animal is assigned to a group on first detection, and that the assignment is permanent. Then density and any detection parameter may be modelled as a function of the grouping factor. It seems natural to treat females and males as two groups as their movement behaviour and hence detection parameters may differ greatly, but in Chapter 16 we suggest other ways to model sex differences. Group models are available in secr only when maximizing the full likelihood. When maximizing the conditional likelihood the model may specify an individual factor covariate directly, with the same effect as grouping.
5.3 Effective sampling area
The effective sampling area is defined (Borchers & Efford, 2008) as a(\theta) \equiv \int_{R^2} p_\cdot(\mathbf x; \theta) \, d\mathbf x. \tag{5.1}
This is a scalar effective area for which \hat D = n / a(\hat \theta) is an unbiased estimate of density as developed in Section 5.4. It does not correspond to a geographic region or delimited polygon on the ground. It bears no relation to the traditional ‘effective trapping area’ A_W (e.g., Otis et al., 1978) for which \hat D = \hat N / A_W, given a non-spatial population estimate \hat N and boundary strip width W. Variation in a(\theta) depends not only on obviously spatial quantities such as the extent of the detector array and the spatial scale of detection \sigma, but also on non-spatial quantities such as sampling effort (e.g., the number of days of trapping) and the intercept of the distance-dependent detection function (g_0, \lambda_0).
Gardner et al. (2009) and Royle et al. (2009) defined an ‘effective trapping area’ or ‘effective sample area’ A_e somewhat differently, omitting the intercept of the detection function. To our knowledge their definition has not found further use. A version closer to ours appears in Royle et al. (2014, Section 5.12).
Efford & Mowat (2014) defined a ‘single-detector sampling area’ a_0(\theta) = 2\pi \lambda_0 \sigma^2 that is equal to Eq. 5.1 for an isolated detector with detection functions HHN or HEX. For K isolated detectors a(\theta) = Ka_0(\theta), but overlap of the ‘catchment areas’ of adjacent detectors leads to a(\theta) < Ka_0(\theta).
5.4 Conditional likelihood
The detection parameters (\theta) may be estimated by maximizing the likelihood conditional on n (Eq. 3.3). When density is constant this reduces to
L_n (\theta| \Omega) \propto \prod_{i=1}^n \frac{\int_{R^2} \mathrm{Pr}(\omega_i | \mathbf x; \theta) \; d\mathbf x}{a(\theta)}.
Conditioning on n allows individual covariates \mathbf{z}_i to be included in the detection model, so that the parameter vector takes a potentially unique value \theta_i = f(\mathbf{z}_i) for each individual. The effective sampling area then varies among individuals as a function of their covariates. A corresponding Horvitz-Thompson-like derived estimate of density is \hat D_{HT} = n / \sum_{i=1}^n a(\hat \theta_i) (Borchers & Efford, 2008).
When the conditional likelihood is maximized, the inverse Hessian provides variances for \theta. The variance of the derived estimate of density depends also on the distribution of n. Following Huggins (1989), \mathrm{var}(\hat D_{HT}) = s^2 + \hat {\mathbf G}^T_\theta \hat {\mathbf I} \hat {\mathbf G}_\theta \tag{5.2} where s^2 is the variance of \hat D when \theta is known, \hat {\mathbf I} is the estimated information matrix (inverse Hessian), and \hat {\mathbf G} is a vector containing the gradients of \hat D with respect to the elements of \theta, evaluated at the maximum likelihood estimates. Numerical evaluation of the second term is straightforward.
When the distribution of AC is inhomogeneous Poisson and detections are independent we expect n to have a Poisson distribution. If n is Poisson then s^2 = \sum_{i=1}^n a(\hat \theta_i)^{-2}. This simplifies to n/a(\hat \theta)^2 in the absence of individual covariates.
When N(A) is fixed, n is binomial and s^2 = \sum_{i=1}^n [1 - a(\hat \theta_i)/|A|] / a(\hat \theta_i)^2.
The variance from Eq. 5.2 is on the natural scale, and the Wald confidence interval computed on this scale is symmetrical. Intervals that are symmetric on the log scale and asymmetric on the natural scale have better coverage properties. These are obtained as (\hat D_{HT}/C, \hat D_{HT}C) where C = \exp \{ z_{\alpha/2} \sqrt{\log[1 + \frac{\mathrm{var}(\hat D_{HT})}{\hat D_{HT}^2}]} \} (Burnham et al., 1987; Chao, 1989).
5.5 Relative density
Conditioning on n without requiring uniform density as in the previous section leads to another result that is useful. We can estimate relative density, the distribution of AC in relation to habitat covariates, rather than absolute density. Conditioning allows individual covariates of detection \mathbf z_i as before, and fitting is faster than for absolute density (Efford, 2025).
We define relative density by D^\prime(\mathbf x | \phi^-) \equiv k^{-1} D(\mathbf x | \phi) where \phi^- is a reduced vector of coefficients for the density sub-model and k is a constant of proportionality. Then we can discard the first factor in Eq. 3.1 and maximize a likelihood based on the second factor alone L_r(\phi^-, \theta | \Omega, n) \propto \prod_{i=1}^n \frac{\int \mathrm{Pr}(\omega_i | \theta, \mathbf x) \, D^\prime(\mathbf x | \phi^-) \, p_\cdot(\mathbf x | \theta, \mathbf z_i) \; d\mathbf x}{\int D^\prime(\mathbf x | \phi^-) \, p_\cdot(\mathbf x | \theta, \mathbf z_i) \; d\mathbf x}. \tag{5.3} For a log link the new parameter vector \phi^- corresponds to the original \phi with one coefficient, the intercept, fixed at zero. For an identity link the intercept is fixed at 1 and other coefficients are scaled. A derived estimate of the constant of proportionality is \hat k = \sum_{i=1}^n 1 / \int D^\prime(\mathbf x|\hat \phi^-) p_\cdot(\mathbf x| \hat \theta, \mathbf z_i)\, d \mathbf x. A derived estimate of the absolute density at point \mathbf x is \hat D(\mathbf x | n, \hat \phi^-, \hat \theta) = \sum_{i=1}^n \frac{D^\prime(\mathbf x| \hat \phi^-)}{\int D^\prime(\mathbf x| \hat \phi^-) p_\cdot(\mathbf x| \hat \theta, \mathbf z_i)\, d \mathbf x} .
This approach makes the same sampling assumptions as the full model, including that individuals enter the sample by a spatial detection process whose parameter vector \theta we estimate from the data. For poisson-distributed AC, parameter estimates from maximizing Eq. 5.3 are identical to those of the full likelihood except for the missing density intercept.
5.6 Adjustment for spatially selective prior marking
In some scenarios, the only individuals at risk of detection are those that were marked in an earlier phase of the study. This is the case for the automated detection of animals with implanted acoustic telemetry tags or passive integrated transponders (PIT) (e.g., Whoriskey et al., 2019). The data resemble mark–resight data (Appendix E) except that they do not include counts of unmarked animals. Estimates of \phi^- using Eq. 5.3 then describe only the distribution of the individuals marked previously. Even if we can assume that AC are stationary between the two phases, \phi^- is biased as a model for population distribution as it incorporates the spatial selectivity of marking.
Lack of information on the marking phase also afflicts mark-resight analyses (e.g., Efford & Hunter, 2018). The possible ‘solutions’ are all ad hoc. We can assume that the probability of becoming marked was independent of location, or we can scale p_\cdot(\mathbf x; \phi^-) by an externally computed variable q(\mathbf x) that is proportional to the spatial probability of marking. Essentially, q(\mathbf x) is an offset in the model for relative density.
An example is shown in Section 11.7. A model with flat (constant) probability of prior marking is inevitably a poor fit because in reality tagged individuals are concentrated near the detectors.
5.7 Population size N
Population size is the number of individual AC in a particular region; we denote this N(A) for region A. For a flat density surface \mathrm{E}[N(A)] = D.|A| where |A| is the area of A.
D and N(A) are interchangeable for specified A. In most applications of SECR there is no naturally defined region A, and therefore \hat N(A) depends on the arbitrary choice of A. This weakness is not shared by \hat D.
Population size is sometimes termed ‘abundance’. We avoid this usage because ‘abundance’ has also been used as an umbrella term for density and population size, and its overtones are vague and biblical rather than scientific.
The population size of any region B may be predicted post hoc from a fitted density model using \hat N(B) = \int_B \hat D(\mathbf x) \, d\mathbf x. The prediction variance of \hat N(B) follows from Poisson assumptions regarding D(\mathbf x) (Efford & Fewster, 2013).
5.8 Finite mixture models
Finite mixture models for individual heterogeneity of capture probability were formalised for non-spatial capture–recapture by Pledger (2000) and remain widely used (e.g., Cooch & White, 2023). These are essentially random-effect models in which the distribution of capture probability comprises two or more latent classes, each with a capture probability and probability of membership.
Borchers & Efford (2008, p. 381) gave the likelihood for a Poisson SECR model with U latent classes in proportions \psi = (\psi_1, …, \psi_U). In all examples we have tried U is 2 or 3. For each class u there is an associated vector of detection parameters \theta_u (collectively \theta). Omitting the constant multinomial term, \mathrm{Pr}(\Omega | n, \phi,\theta, \psi) \propto \prod_{i=1}^n \sum_{u=1}^U\int\frac{\mathrm{Pr}\{\omega_i | \mathbf x; \theta_u\}}{p_\cdot(\mathbf x; \theta_u)} f(\mathbf x, u | \omega_i>0) \,d\mathbf x where f(\mathbf x, u | \omega_i > 0) = \frac{D(\mathbf x; \phi) p_\cdot(\mathbf x; \theta_u) \psi_u}{\sum_{u=1}^U \int D(\mathbf x; \phi) p_\cdot(\mathbf x; \theta_u) \psi_u \; d\mathbf x}.
The expected number of detected animals n, replacing Eq. 3.2, is a weighted sum over latent classes: \Lambda(\phi, \theta,\psi) = \sum_{u=1}^U \psi_u \int D(\mathbf x; \phi) p_\cdot(\mathbf x;\theta_u) \; d\mathbf x. \tag{5.4} Integration is over points in potential habitat, as usual.
5.9 Hybrid finite mixture models
We can modify the finite mixture likelihood for data in which the class membership of some or all individuals is known. Indicate the class membership of the i-th individual by a variable u_i that may take values 0, 1, …, U, where u_i = 0 indicates an individual of unknown class, and the class frequencies are n_0, n_1, …, n_U (not to be confused with n_1,…,n_C in Eq. 3.3). We assume here that detection histories are sorted by class membership, starting with the unknowns.
The expression for \lambda in Eq. 5.4 is unchanged, but we must split \mathrm{Pr}(\Omega | n, \phi, \theta, \psi) and include a multinomial term for the observed distribution over classes:
\begin{split} \mathrm{Pr}(\Omega | n, \phi,\theta, \psi) \propto \; &\prod_{i=1}^{n_0}\sum_{u=1}^U\int\frac{\mathrm{Pr}\{\omega_i | \mathbf x; \theta_u\}}{p_\cdot(\mathbf x ; \theta_u)} f(\mathbf x ,u|\omega_i>0) \; d\mathbf x \\ &\times \prod_{i={n_0+1}}^n \int \frac{\mathrm{Pr}\{\omega_i | \mathbf x; \theta_{u_i}\}}{p_\cdot(\mathbf x ; \theta_{u_i})} f'(\mathbf x | \omega_i>0; u_i) \; d\mathbf x \\ &\times {n - n_0 \choose n_1, ...,n_U} \prod_{u=1}^U \left[ \frac{\lambda_u}{\lambda} \right] ^{n_u}, \end{split} \tag{5.5}
where \lambda_u = \psi_u \int D(\mathbf x ) p_\cdot(\mathbf x ; \theta_u) \; d\mathbf x, and the multinomial coefficient {n - n_0 \choose n_1, ...,n_U} is a constant that can be omitted. Rather than representing the joint probability density of \mathbf x and u_i as in f(\cdot) previously, f'(\cdot) is the probability density of \mathbf x for given u_i: f'(\mathbf x | \omega_i > 0; u_i) = \frac{D(\mathbf x )p_\cdot(\mathbf x ; \theta_{u_i})}{\int D(\mathbf x )p_\cdot(\mathbf x ; \theta_{u_i}) d\mathbf x }.
The likelihood conditions on the number of known-class animals detected (n-n_0), rather than modelling class identification as a random process. It assumes that the probability that class will be recorded does not depend on class, and that such recording when it happens is without error.
For homogeneous density the likelihood simplifies to \begin{split} \mathrm{Pr}(\Omega | n, \phi,\theta, \psi) \propto &\prod_{i=1}^{n_0}\sum_{u=1}^U\int\frac{\mathrm{Pr}\{\omega_i | \mathbf x; \theta_u\} \psi_u} {\sum_u a(\theta_u) \psi_u } \; d\mathbf x \\ &\times \prod_{i={n_0+1}}^n \int \frac{\mathrm{Pr}\{\omega_i | \mathbf x; \theta_{u_i}\}}{a(\theta_{u_i})} \; d\mathbf x \prod_{u=1}^U { \left[ \frac{a(\theta_u)\psi_u} {\sum_u a(\theta_u) \psi_u } \right] ^{n_u}}, \end{split}
where a(\theta_u) = \int p_\cdot(\mathbf x ; \theta_{u}) \; d\mathbf x.
5.10 Alternative parameterizations
The ‘real’ parameters in SECR are typically assumed to be independent (orthogonal). However, some parameter pairs co-vary in predictable ways owing to constraints on animal behaviour. Here it is more straightforward to work with the hazard detection functions. The intercept of the detection function \lambda_0 declines with increasing \sigma, all else being equal (Efford & Mowat, 2014). This is inevitable if detection is strictly proportional to time spent near a point, given a bivariate home range utilisation model (pdf for activity). Also, home-range size and the SECR parameter \sigma decline with population density (Efford et al., 2016). Allowing for covariation may improve biological insight and lead to more parsimonious models.
Covariation may be ‘hard-wired’ into SECR models by reparameterization. In each case a new ‘surrogate’ parameter is proportional to a combination of the co-varying parameters. One of the co-varying parameters is seen as driving variation, while the other is inferred from the surrogate and the driver. Deviations from the expected covariation are implied when the surrogate is found to vary (i.e. a model with varying surrogate is superior to a model with constant surrogate).
For concreteness, consider a difference in home range size over the seasons, causing variation in \sigma. A reasonable null hypothesis is that there will be reciprocal seasonal variation in \lambda_0 such that a_0 = 2 \pi \lambda_0 \sigma^2 is constant. Here variation in \sigma is the driver, a_0 is the surrogate, and \lambda_0 is derived.
Parameters | Driver | Surrogate | Derived | Effect |
---|---|---|---|---|
(D, a_0, \sigma) | \sigma | a_0 | \lambda_0 = a_0 / (2 \pi \sigma^2) | Reciprocal \sigma^2, \lambda_0 |
(D, \lambda_0, k) | D | k | \sigma = k/\sqrt D | Density-dependent \sigma |
(D, a_0, k) | D | k, a_0 | \sigma = k/\sqrt D | both |
\lambda_0 = a_0 / (2 \pi \sigma^2) |
See Appendix H for further detail on alternative parameterizations and their implementation in secr.
5.11 Model-based location of AC
Assume we have fitted a spatial model by integrating over the unknown locations of AC for a given SECR dataset \Omega = {\omega_1, \omega_2, ..., \omega_n}. We may retrospectively infer the probability density of the AC corresponding to each detection history, using the model and the estimated parameters:
f(\mathbf x | \omega_i; \hat \phi, \hat \theta) = \frac{ \mathrm{Pr} (\omega_i | \mathbf x; \hat \theta) D(\mathbf x ; \hat \phi)}
{\int_{R^2} \mathrm{Pr}(\omega_i | \mathbf x; \hat \theta) D(\mathbf x ; \hat \phi) \, d \mathbf x}.
\tag{5.6} This is equivalent to the posterior distribution of each latent AC in Bayesian applications. See fxi
and related functions in secr. For known \theta and known \phi (unless D(\mathbf x; \phi) uniform) the modal location of the AC for animal i may be estimated by maximizing \mathrm{Pr} (\omega_i | \mathbf x; \theta)D(\mathbf x; \phi) with respect to \mathbf x. The distribution often has more than one mode for animals at the edge of a detector array or searched area. It should not be confused with the home range utilisation pdf.
Section 11.6 discusses the use and interpretation of f(\mathbf x | \omega_i; \hat \phi, \hat \theta) (see also Durbach et al., 2024).