Hypotheses relating to the annual frequency distribution of mammalian births are commonly tested using a goodness-of-fit procedure. Several interacting factors influence the statistical power of these tests, but no power studies have been conducted using scenarios derived from biological hypotheses. Corresponding to theories relating reproductive output to seasonal resource fluctuation, we simulated data reflecting a winter reduction in birth frequency to test the effect of four factors (sample size, maximum effect size, the temporal pattern of response and the number of categories used for analysis) on the power of three goodness-of-fit procedures – the G and Chi-square tests and Watson’s U2 test. Analyses resulting in high power all had a large maximum effect size (60%) and were associated with a sample size of 200 on most occasions. The G-test was the most powerful when data were analysed using two temporal categories (winter and other) while Watson’s U2 test achieved the highest power when 12 monthly categories were used. Overall, the power of most modelled scenarios was low. Consequently, we recommend using power analysis as a research planning tool, and have provided a spreadsheet enabling a priori power calculations for the three tests considered.