Through his questioning of the validity of various analyses in our paper (Emlen and Wrege 2004), Nisbet (2005) teaches a valuable lesson regarding the importance of clear methods and avoiding circularity in statistical analyses. Nisbet is correct to criticize us for failing to detail our sexing methods, but we show below that the analyses in our paper are both valid and robust. Characteristics of our study species, combined with intensive observation, made our sexing of individuals extremely reliable, but we argue more generally that a low frequency of errors in assigning subjects to groups does not necessarily compromise conclusions drawn from statistical inference.
In our 2004 paper, we erred by not explicitly describing how we sexed our study animals. Resident Wattled Jacanas (Jacana jacana) (i.e. adults defending exclusive territories and observed displaying either sexual or reproductive behavior or both) were sexed initially through behavioral observations (764 h of focal behavior samples, plus uncounted thousands of hours of ad libitum behavioral observations), confirmed by measures of mass for nearly all individuals. We found no overlap in the mass distributions of resident males and females (Fig. 1A). Similarly, the mass distributions of floater males and floater females did not overlap each other (Fig. 1B). Although we could not use behavioral criteria to sex floaters, the clear difference in mass, as well as an easily perceived “chunkiness” to females in the hand, made sexing decisions seem unambiguous. Nonetheless, our Table 1 (Emlen and Wrege 2004) pools residents and floaters of each sex and, therefore, those data were inappropriate for a statistical test of mass difference between males and females. Nisbet correctly points out that this is circular, because the sexing of floaters was based at least in part on their mass at capture. That circularity, however, does not extend to tests of sexual size dimorphism in other characters (nor, if sexing was based on behavioral criteria, to the testing of all hypotheses about sex-specific behavior). For example, in our Table 1, we examined the sexual dimorphism of morphological characters associated with territorial defense (wing spur) and sexual signaling (shield and wattle size and color), none of which would a priori vary with body mass. Even purely structural characters, such as tarsus length, are not necessarily correlated with body mass. In situations where the grouping character is necessarily highly correlated with some other character of interest, statistical methods can be used to examine residual differences, after removing the correlated component.
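As an illustration of the last point (and not the analysis used in Emlen and Wrege 2004), the following minimal Python sketch shows one way to examine a between-sex difference in a character after removing the component that covaries with mass. It assumes a single linear relationship with a common within-sex slope. The function names and all data values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean

def common_within_group_slope(mass, trait, sex):
    """Slope of trait on mass estimated within each sex, then pooled (assumed common)."""
    sxy = 0.0
    sxx = 0.0
    for group in set(sex):
        x = [m for m, s in zip(mass, sex) if s == group]
        y = [t for t, s in zip(trait, sex) if s == group]
        mx, my = mean(x), mean(y)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        sxx += sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def mass_adjusted_difference(mass, trait, sex):
    """Female minus male mean of the trait after removing the mass-related component."""
    slope = common_within_group_slope(mass, trait, sex)
    grand_mass = mean(mass)
    adjusted = [t - slope * (m - grand_mass) for m, t in zip(mass, trait)]
    females = [a for a, s in zip(adjusted, sex) if s == "F"]
    males = [a for a, s in zip(adjusted, sex) if s == "M"]
    return mean(females) - mean(males)

# Hypothetical values: masses do not overlap between the sexes, and the trait
# increases with mass within each sex.
mass = [90, 100, 110, 140, 150, 160]
sex = ["M", "M", "M", "F", "F", "F"]
trait = [9, 10, 11, 15, 16, 17]

raw_difference = mean(trait[3:]) - mean(trait[:3])
adjusted_difference = mass_adjusted_difference(mass, trait, sex)
print(raw_difference, adjusted_difference)
```

In this constructed example the raw difference between the sexes is much larger than the mass-adjusted difference, because most of the raw difference is carried by mass. When the mass distributions of the groups do not overlap, as here, the adjustment relies on extrapolating the within-sex slope, so such residual differences should be interpreted with caution.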
Nisbet's letter raises another point with broad applicability: do errors in assignment of subjects to groups compromise the validity of statistical inference based on those groups? For example, Nisbet suggests that a few large male jacanas acting like females, or small females acting like males, could cast doubt on our conclusions about sex-specific behaviors and sexual selection. We disagree. Any method used to determine the sex of an individual (or determine any other form of group assignment) is subject to human error (e.g. measurement and recording error), including genetic methods for sexing (e.g. through contamination, scoring, and mislabeling error). In addition, errors may arise because of factors intrinsic to the method (e.g. overlapping distributions on the discriminating character[s]). From a statistical inference viewpoint, such errors often are not a problem, unless the errors are both relatively frequent and introduce a bias with respect to the hypotheses being tested. Unbiased errors, even if relatively frequent, increase unexplained variance and thus reduce the probability of type I error (i.e. rejecting a null hypothesis when it is true). However, unbiased errors would tend to increase type II error (i.e. failing to reject a null hypothesis when it is false), and could be misleading if, for example, data were pooled on the basis of failure to reject the null hypothesis. Clearly, researchers must assess the potential frequency of errors, and carefully examine whether such errors could bias tests of hypotheses.
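The effect of unbiased assignment errors on unexplained variance can be illustrated with a small simulation. The Python sketch below is hypothetical throughout: the sample size, the true difference between sexes (2 standard deviations), and the 10% error rate are assumptions chosen for illustration and are not estimates from our jacana data. It compares the observed difference in group means and the pooled within-group variance before and after labels are flipped at random, independently of the character being measured. It does not itself compute type I or type II error rates.

```python
import random
from statistics import mean, variance

def summarize(rows):
    """Return (female mean minus male mean, pooled within-group variance)."""
    males = [x for s, x in rows if s == "M"]
    females = [x for s, x in rows if s == "F"]
    pooled_var = (
        (len(males) - 1) * variance(males) + (len(females) - 1) * variance(females)
    ) / (len(males) + len(females) - 2)
    return mean(females) - mean(males), pooled_var

def simulate(n_per_sex=2000, true_diff=2.0, error_rate=0.1, seed=0):
    """Compare summaries under true labels and under randomly mislabeled sexes."""
    rng = random.Random(seed)
    # True sex and a character that differs between the sexes by true_diff.
    birds = [("M", rng.gauss(0.0, 1.0)) for _ in range(n_per_sex)]
    birds += [("F", rng.gauss(true_diff, 1.0)) for _ in range(n_per_sex)]
    # Unbiased error: every bird has the same chance of being mislabeled,
    # regardless of its value for the character.
    flip = {"M": "F", "F": "M"}
    labeled = [(flip[s] if rng.random() < error_rate else s, x) for s, x in birds]
    return summarize(birds), summarize(labeled)

(true_gap, true_var), (observed_gap, observed_var) = simulate()
print(true_gap, true_var)
print(observed_gap, observed_var)
```

Under these assumptions the mislabeled data show a smaller difference between group means and a larger within-group variance than the correctly labeled data, so a real difference becomes harder to detect rather than easier to manufacture.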
Inspection of Figure 1 shows that the only sexing error we could have made was to classify a small female floater as male. The intensive schedule of behavioral observation essentially eliminated the possibility that any such females would remain incorrectly sexed had they achieved resident status. As part of the male floater class, such individuals would appear as extreme outliers in the statistical analyses of resident versus floater morphology presented in our Table 2 (Emlen and Wrege 2004), and such outliers were not observed. Finally, small females classed as males during censuses to estimate the floater population would have tended to increase our estimates of variance in male lifetime mating success (and decrease that of females), a conservative contaminant that would not affect our conclusions.
This research was supported by National Science Foundation grant IBN-9317988 to S.T.E.