3  Likelihood-based SECR

Part II provides background on the theory of SECR that is mostly for reference. This chapter covers general notation and theory for point detectors. Chapter 4 describes theory for area and transect searches. Chapter 5 has extensions for multiple sessions, groups, effective sampling area, conditional estimation, relative density, population size, finite mixtures, hybrid mixture models, alternative parameterizations and model-based location of activity centres (‘fxi’). These chapters aim to be independent of any particular implementation, although some cross-references to secr creep in.

3.1 Notation

We use notation and terminology from Borchers & Efford (2008), with minor variations and extensions from Efford, Borchers, et al. (2009), Efford (2011) and elsewhere (Table 3.1).

Table 3.1: Mathematical notation for SECR
Category Symbol Meaning
General
AC activity centre
\mathbf x point (x,y) in the plane
Data
n number of individuals detected
S number of sampling occasions
K number of detectors
\omega_i spatial detection history of the i-th animal
\Omega set of all detection histories \omega_i, i = 1..n
Parameters
D(\mathbf x; \phi)1 intensity at \mathbf x of AC Poisson point process
\phi parameter vector for AC point process
\theta vector of detection parameters (minimally (g_0, \sigma) or (\lambda_0,\sigma))
g_0 intercept of distance-probability-of-detection function
\lambda_0 intercept of distance-hazard-of-detection function
\sigma spatial scale parameter of distance-detection function
Model
d_k(\mathbf x) distance of point \mathbf x from detector k
\lambda(d) hazard of detection at distance d (distance-hazard-of-detection function)
g(d) probability of detection at distance d (distance-probability-of-detection function)
p_\cdot(\mathbf x; \theta) probability that an individual with AC at \mathbf x is detected at least once
h_{isk}(\mathbf x; \theta) hazard of detection at detector k for animal i on occasion s
p_{isk}(\mathbf x; \theta) probability of detection corresponding to h_{isk}(\mathbf x; \theta)
p_{k}(\mathbf x; \theta) p_{isk}(\mathbf x; \theta) constant across individuals i and occasions s
\Lambda(\phi, \theta) expected number of detected individuals n
Habitat
A potential habitat (‘habitat mask’, ‘state space’) A \subset R^2
|A| area of A
N(A) number of AC in A

3.2 Likelihood

Parameters of the state model (\phi) and the detection model (\theta) are estimated jointly by maximizing the logarithm of the likelihood:

L(\phi, \theta | \Omega ) = \mathrm{Pr}(n| \phi, \theta) \; \mathrm{Pr}(\Omega | n, \phi, \theta). \tag{3.1}

When density is constant across space, \phi drops out of the rightmost term, which then relates to the detection (observation) model alone, and maximization of this component gives unbiased estimates of \theta (see Conditional likelihood).

3.2.1 Number of individuals

If AC follow an inhomogeneous Poisson process then n has a Poisson distribution with parameter \Lambda(\phi, \theta) = \int_{R^2} D(\mathbf x; \phi) \, p_\cdot(\mathbf x;\theta) \, d\mathbf x, \tag{3.2} where D(\mathbf x; \phi) is the density at \mathbf x and p_\cdot(\mathbf x|\theta) is the overall probability of detecting an AC at \mathbf x (see Section 3.4 and Chapter 4). Thus \mathrm{Pr}(n | \phi, \theta) = \Lambda^n \exp (-\Lambda) / n!.

If the population size in a defined area A is considered to be fixed rather than Poisson then the distribution of n is binomial with size N(A).

The region of integration R^2 refers to the entire plane, an infinite extent in two dimensions. Why use this when we know animal populations do not go on forever, and we later restrict integration to just part of the plane for computational reasons? The reason is that the integration over R^2 is a clear, model-based benchmark; restriction to a finite region A is arbitrary and must be done deliberately, with awareness of the consequences. Restriction intrudes distractions such as whether N(A) is fixed or Poisson.

3.2.2 Detection histories

In general we have \mathrm{Pr} (\Omega | n, \phi, \theta) = \binom {n}{n_1,...,n_C} \prod_{i=1}^n \mathrm{Pr}( \omega_i | \omega_i>0, \phi, \theta), \tag{3.3}

where \omega_i>0 indicates a non-empty detection history. The multinomial coefficient uses the frequencies n_1,...,n_C of each of the C observed histories. The coefficient is a constant not involving parameters and it can be omitted without changing the model fit (consistent inclusion or exclusion is needed for valid likelihood-based comparisons such as those using AIC).

We do not know the true AC locations, but they can be integrated2 out of the likelihood using an expression for their spatial distribution, i.e.  \mathrm{Pr}( \omega_i | \omega_i>0, \phi, \theta) = \int_{R^2} \mathrm{Pr}( \omega_i | \omega_i>0, \theta, \mathbf x) \, f(\mathbf x | \omega_i>0, \phi, \theta) \; d\mathbf x \tag{3.4} where f(\mathbf x| \omega_i>0, \phi, \theta) is the conditional density of an AC given that the individual was detected. The conditional density is given by f(\mathbf x| \omega_i>0, \phi, \theta) = \frac{D(\mathbf x ; \phi) p_\cdot(\mathbf x ; \theta)}{\Lambda(\phi,\theta)}. \tag{3.5}

3.3 Distance-dependent detection

The key idea of SECR is that the probability of detecting a particular animal at a particular detector on one occasion can be represented as a function of the distance between its AC and the detector. The function should decline effectively to zero at large distances. Distances are not observed directly, and we rely on functions of somewhat arbitrary shape. Fortunately, the estimates are not very sensitive to the choice. Detection functions are covered in detail in Chapter 10. Either probability g(d) or hazard \lambda(d) may be modelled as a function of distance. A halfnormal form is commonly used (e.g., g(d) = g_0 \exp (-d^2/2/\sigma^2)). The shapes of, e.g., halfnormal g(d) and halfnormal \lambda(d) are only subtly different, but \lambda(d) is preferred because it lends itself to mathematical manipulation and occurs more widely in the literature (often with different notation).

The function \lambda(d) may be transformed into a probability with g(d) = 1 - \exp[-\lambda(d)] and the reverse transformation is \lambda(d) = -\log[1-g(d)]. The intercept parameter g_0 has been replaced by \lambda_0; although the name \sigma is retained for the spatial scale parameter this is not exactly interchangeable between the models.

3.4 Detector types

The SECR data \omega_i for each detected individual comprise a matrix with dimensions S (occasions) and K (detectors). The matrix entries \omega_{isk} may be binary (0/1) or integer (0, 1, 2, …). Various probability models exist for \omega_{isk}. The appropriate probability model follows in most cases directly from the type of detector device; we therefore classify probability models according to device type. Table 3.2 matches this classification to that of Royle et al. (2014). This section covers passive detection at a point; area searches are considered later.

Table 3.2: Detector types based on Efford & Boulanger (2019, Table 1) with cross references to Royle et al. (2014)
Detector type Royle et al. Data
Binary proximity Bernoulli1 binary animal \times occasion \times detector
Count proximity
   Poisson Poisson integer animal \times occasion \times detector
   Binomial Binomial integer animal \times occasion \times detector
Multi-catch trap Multinomial binary animal \times occasion
Single-catch trap binary animal \times occasion, exclusive
Area search integer animal \times occasion \times detector
Transect search integer animal \times occasion \times detector
Exclusive area search2 binary animal \times occasion
Exclusive transect search2 binary animal \times occasion
  1. Also ‘Binomial’ in Royle & Gardner (2011)
  2. ‘Exclusive’ here means that an individual can be detected at no more than one detector (polygon or transect) per occasion.

For each type of detector we require \mathrm{Pr}(\omega_{isk} | \mathbf x) and the overall probability of detection p_\cdot(\mathbf x). For some detector types it is more natural to express the probability in terms of the occasion- and trap-specific hazard h_{sk} = \lambda[d_k(\mathbf x); \theta^\prime] = -\log(1-g[d_k(\mathbf x); \theta]) than the probability p_{sk} (\mathbf x) = g[d_k(\mathbf x); \theta] = 1 - \exp\{-\lambda[d_k(\mathbf x); \theta^\prime]\}3.

We summarise the probability models in Table 3.3, with comments on each point detector type below.

Table 3.3: Summary of point detector types (conditioning on \theta omitted to save space)
Detector type \mathrm{Pr}(\omega_{isk} | \mathbf x) p_\cdot(\mathbf x)
Binary proximity p_{sk}(\mathbf x) ^{\omega_{isk}} [1-p_{sk}(\mathbf x)]^{(1-\omega_{isk})} 1 - \prod_s\prod_k 1 - p_{sk} (\mathbf x)
Count proximity
   Poisson \{h_{sk} (\mathbf x)^{\omega_{isk}} \exp [-h_{sk}(\mathbf x)]\} / \omega_{isk}! 1 - \exp [- \sum_s \sum_k h_{sk}(\mathbf x)]
   Binomial1 \binom{B_s}{\omega_{isk}} p_{sk}(\mathbf x)^{\omega_{isk}} [1-p_{sk}(\mathbf x)]^{(B_s-\omega_{isk})} 1 - \prod_s\prod_k [1 - p_{sk} (\mathbf x)]^{B_s}
Multi-catch trap2 \{1 - \exp [-H_s(\mathbf x)]\}\frac{h_{sk}(\mathbf x)}{H_s(\mathbf x)} 1 - \exp[ -\sum_s H_s(\mathbf x)]
  1. B_s is the size of the binomial distribution, the number of opportunities for detection, assumed constant across detectors
  2. H_s = \sum_k h_{sk}(\mathbf x) is the hazard summed over traps

3.4.1 Binary proximity detector

A proximity detector records the presence of an individual at or near a point without restricting its movement. The data are binary when there can be no more than one detection of an individual at a particular detector on a particular occasion, or repeat detections are deliberately discarded.

Note

Discarding hard-won data may seem perverse, but it may be justified if the fraction is small (i.e. repeat detections rare). Binary data are less prone to problems due to non-independence of repeated visits to a detector, especially when a few individuals generate many repeat detections. Binary models also tend to be faster to fit and dodge the question of what discrete distribution should be used for the counts.

Assuming independence among detectors, the distance-detection model applies directly as the probability of detection in a particular detector, and the overall probability of detection is the complement of the product of probabilities of non-detection in all detectors.

3.4.2 Count proximity detectors

Some devices count visits by each individual: each datum is then a non-negative integer rather than binary. Automatic cameras (‘camera traps’) are a common example. Modelling depends on the distribution of the counts, for which the primary options are Poisson and binomial.

3.4.2.1 Poisson count proximity detector

If each visit by an individual is independent of other visits in a given interval (occasion) then the number of visits has a Poisson distribution. The sole parameter of the discrete Poisson distribution is the expected number of visits.

Independence may seem implausible, but it is commonly indistinguishable from non-independence when the expected number is small. However, it is not uncommon for one or two individuals to be photographed many times by one camera, and the Poisson is then unsuitable. The negative binomial is an alternative that models the overdispersion that results from non-independence, while requiring an additional parameter. There do not yet seem to be convincing applications of the negative binomial in SECR, and collapsing data to binary is recommended.

3.4.2.2 Binomial count proximity detector

Binomial counts arise when there is a known, finite number of opportunities for detection within each occasion that we denote B_s. This is the result when binary proximity data over many occasions are collapsed to a count on single occasion (e.g., Efford, Dawson, et al., 2009). The initial number of occasions is known (B_s = S) and places an upper limit on the count. Collapsing is often efficient. It precludes modelling parameter variation or learned responses across occasions.

Each count is binomial with size B_s and probability equal to the per-occasion detection probability.

3.4.3 Multi-catch trap

A trap is a device that detains an animal until it is released, allowing only one detection of that animal per occasion. The single-detector, single-AC probability from a distance-dependent detection function (preceding section) must be modified to allow for prior capture in a different trap: traps effectively “compete” for animals. If the trap remains open for captures of further animals then the solution is a straightforward competing risk model (Borchers & Efford, 2008).

The competing risk model uses the occasion- and trap-specific hazard h_{sk}.

3.4.4 Single-catch trap

A single-catch trap can catch only one animal at a time. This entails competition both among traps for animals and among animals for traps. No simple likelihood is available. Simulation-based methods (Efford, 2004, 2023) must be used for unbiased estimation of \theta and trend in density unless the time of detection has been recorded (Distiller & Borchers, 2015). The multi-catch likelihood provides nearly unbiased estimates of uniform density.

3.5 Fixed N

The number of individuals of the target species (population size or N) is the parameter of primary interest in non-spatial closed population capture–recapture. This is seldom true in spatial capture–recapture. It is nevertheless the source of considerable confusion for two related reasons:

  1. The population size depends on the extent of the modelled habitat (‘habitat mask’ or ‘state space’, area A) which is usually arbitrary, and
  2. Having defined an arbitrary extent for numerical convenience (Chapter 12) the results, particularly the variance of \hat D or \hat N(A), depend on whether the enclosed population is assumed to be a random variable or fixed.

We expand here on the consequences of treating N as fixed.

The formulation of the state model as a Poisson process in two dimensions (Chapter 1) does not refer to population size. Under that model, the number of AC in an arbitrary area A is a poisson-distributed random variable regardless of whether the 2-D process is homogeneous or inhomogeneous.

The state model may also be cast as a ‘conditional’ or ‘binomial’ Poisson process’ (Illian et al., 2008). For an arbitrary area A the number of AC N(A) is then considered fixed rather than Poisson. This is implicit in the Bayesian formulation of Royle et al. (2014). The distribution of n in the conditional model is binomial with size N(A) and probability p_c(\phi, \theta) = \int_A p_\cdot(\mathbf x; \theta) f(\mathbf x; \phi) d\mathbf x, where f(\mathbf x; \phi) = D(\mathbf x; \phi) / \int_A D(\mathbf x; \phi) d\mathbf x.

The form conditional on N(A) leads to narrower confidence intervals for density owing to the exclusion of variation in N(A) among realisations of the Poisson process for AC. This makes sense when A contains an isolated population with a natural boundary, but most applications do not meet this criterion.

3.6 Confidence intervals

Maximizing the log likelihood leads to a straightforward estimate of the asymptotic covariance matrix \mathbf V of the beta parameters. If \hat \theta is the vector of estimates and \mathbf H(\hat \theta) is the Hessian matrix evaluated at \hat \theta then an estimate of the covariance matrix is \hat {\mathbf V} = \mathbf H(\hat \theta)^{-1}.

Hessian

The Hessian matrix is the square matrix of second-order partial derivatives of the log likelihood. For more on asymptotic variances of MLE see Seber (1982), Borchers et al. (2002), Cooch & White (2023) 1.3.2, and many statistics texts.

The sampling error of MLE is asymptotically normal, and symmetric (Wald) intervals for SECR parameters appear to work well on the link scale i.e. \hat \theta_j \pm z_{\alpha/2} \hat \sigma_j is a 100(1-\alpha)\% interval for \hat \theta_j where -z_{\alpha/2} is the \alpha/2 quantile of the standard normal deviate (z_{0.025} = 1.96) and \hat \sigma_j^2 is the estimated variance from \hat{\mathbf V}.

On back transformation to the natural (‘real’) scale these intervals become asymmetrical and generally have good coverage properties.

The method of profile likelihood is also available (e.g., Evans et al. (1996); secr::confint.secr), but it is seldom used as no problem has been shown with intervals based on asymptotic variances. Similarly, the additional computation required by parametric bootstrap methods is not usually warranted.

3.7 Varying effort

When sampling effort varies between detectors or over time in a capture–recapture study we expect a commensurate change in the number of detections. Allowing for known variation in effort when modelling detections has these benefits:

  • detection parameters are related to a consistent unit of effort (e.g., one trap set for one day)
  • the fit of the detection model is improved
  • trends in the estimates may be modelled without confounding.

Borchers & Efford (2008) allowed the duration of exposure to vary between sampling occasions in their competing-hazard model for multi-catch traps. Efford et al. (2013) generalised the method to allow joint variation in effort over detectors and over time (occasions), and considered other detector types.

We use T_{sk} for the effort on occasion s at detector k. At its simplest, T_{sk} can be a binary indicator taking the values 0 (detector not used) or 1 (detector used) (when T_{sk} = 0, no detections are possible and \omega_{isk} = 0). For small, continuously varying, T_{sk} we expect the number of detections to increase linearly with T_{sk}; saturation may occur at higher effort, depending on the detector type. Examples of possible effort variables are the number of days that each automatic camera was operated in a tiger study, or the number of rub trees sampled for DNA in each grid cell of a grizzly bear study.

Following convention in non-spatial capture–recapture (Cooch & White, 2023) we could model g_0 or \lambda_0 on the link scale (logit or log) as a linear function of T_{sk} (a time covariate if constant across detectors, a detector covariate if constant across occasions, or a time-varying detector-level covariate). However, this is suboptimal because varying effort has a linear effect only on \lambda_0 for Poisson count detectors, and the estimation of additional parameters is an unnecessary burden. T_{sk} is like an offset in a generalised linear model: it can be included in the SECR model without estimating an additional coefficient.

The SECR models for various detectors (Table 3.3) are expressed in terms of either the probability p_{sk} or the hazard h_{sk}. Each of these scales differently with T_{sk} as shown in Table 3.4. Only in the Poisson case is the expected number of detections linear on effort.

Table 3.4: Including effort in SECR models for various detector types. p^\prime_{sk}(\mathbf x) and h^\prime_{sk}(\mathbf x) replace the matching quantities in Table 3.3.
Detector type Adjusted probability or hazard
Multi-catch trap h^\prime_{sk}(\mathbf x) = h_{sk}(\mathbf x) T_{sk}; H^\prime_s(\mathbf x) = \sum_k h^\prime_{sk}(\mathbf x)
Binary proximity p^\prime_{sk}(\mathbf x) = 1 - (1 - p_{sk}(\mathbf x))^{T_{sk}}
Poisson count proximity h^\prime_{sk}(\mathbf x) = h_{sk}(\mathbf x) T_{sk}
Binomial count proximity see below

For binomial count detectors we use a formulation not based directly on instantaneous hazard, as explained by Efford et al. (2013). For these detectors T_{sk} (assumed integer) is taken as the size of the binomial (maximum possible detections) and p_{sk}(\mathbf x) is unchanged.


  1. We use D(\mathbf x) in preference to \lambda(\mathbf x) because \lambda has multiple meanings.↩︎

  2. Integration is commonly performed by summing over many small cells for a finite region near the detectors, as both \mathrm{Pr}(\omega_i) and f(\mathbf x) decline to zero at greater distances. We state the model in terms of the real plane and defer discussion of the region of integration to Chapter 12.↩︎

  3. The parameter vectors \theta and \theta^\prime differ for detection functions expressed in terms of probability (g()) and hazard (\lambda()). ↩︎