# Econstudentlog

## Statistical Models for Proportions and Probabilities

“Most elementary statistics books discuss inference for proportions and probabilities, and the primary readership for this monograph is the student of statistics, either at an advanced undergraduate or graduate level. As some of the recommended so-called ‘‘large-sample’’ rules in textbooks have been found to be inappropriate, this monograph endeavors to provide more up-to-date information on these topics. I have also included a number of related topics not generally found in textbooks. The emphasis is on model building and the estimation of parameters from the models.

It is assumed that the reader has a background in statistical theory and inference and is familiar with standard univariate and multivariate distributions, including conditional distributions.”

The above quote is from the book’s preface. The book is highly technical – here’s a screencap of a page roughly in the middle:

I think the above picture provides some background as to why I do not think it’s a good idea to provide detailed coverage of the book here. Not all pages are that bad, but this is a book on mathematical statistics. The technical nature of the book made it difficult for me to know how to rate it – when reading books like this one, I like to ask myself whether I would be able to spot an error in the coverage. In some contexts here I clearly would not be able to do that (given the time I was willing to spend on the book), and when that’s the case I always feel hesitant about rating (/‘judging’) books of this nature. I should note that there are pretty much no spelling/formatting errors, and the language is easy to understand (‘if you know enough about statistics…’).

I did have one major problem with part of the coverage towards the end of the book, but it didn’t much alter my general impression of the book. The problem was that the author seems to apply (/recommend?) a hypothesis-testing framework for model selection, a practice which, although widely used, is frankly considered bad statistics by Burnham and Anderson in their book on model selection. In the relevant section of the book Seber discusses an approach to modelling which starts out with a ‘full model’ including both primary effects and various (potentially multi-level) interaction terms (he deals specifically with data derived from multiple (independent?) multinomial distributions, but where the data comes from is not really important here), and he then uses hypothesis tests of whether interaction terms are zero to determine whether or not those interactions should be included in the model. For people who don’t know, this model selection method is both very commonly used and a very wrong way to do things; using hypothesis testing as a model selection mechanism is methodologically invalid, something Burnham and Anderson talk a lot about in their book.
I assume I’ll be covering Burnham and Anderson’s book in more detail later on here on the blog, so for now I’ll just make this key point here and then return to that stuff later – if you did not understand the comments above you shouldn’t worry too much about it, I’ll go into much more detail when talking about that stuff later. This problem was the only real problem I had with Seber’s book.
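To make the distinction a bit more concrete, here is a minimal sketch of the two approaches on a toy problem. This is not Seber's setup (he works with multinomial data and interaction terms); it is a simplified stand-in I made up: two binomial samples, where the 'small' model has one common proportion and the 'full' model has a separate proportion per sample. The sketch computes the likelihood-ratio statistic a hypothesis test would be based on, alongside the AIC values that an information-theoretic comparison of the kind Burnham and Anderson advocate would use. The function names and the example counts are hypothetical.

```python
import math


def binom_loglik(successes, trials, p):
    """Binomial log-likelihood without the constant binomial coefficient."""
    ll = 0.0
    if successes > 0:
        ll += successes * math.log(p)
    if trials - successes > 0:
        ll += (trials - successes) * math.log(1 - p)
    return ll


def compare_models(x1, n1, x2, n2):
    """Compare a common-proportion model with a separate-proportions model.

    Returns the AIC of each model and the likelihood-ratio statistic
    for testing 'the two proportions are equal'.
    """
    # Small model: one shared proportion (1 free parameter), fitted by ML.
    p_common = (x1 + x2) / (n1 + n2)
    ll_common = binom_loglik(x1, n1, p_common) + binom_loglik(x2, n2, p_common)

    # Full model: one proportion per sample (2 free parameters), fitted by ML.
    ll_separate = binom_loglik(x1, n1, x1 / n1) + binom_loglik(x2, n2, x2 / n2)

    return {
        "aic_common": 2 * 1 - 2 * ll_common,
        "aic_separate": 2 * 2 - 2 * ll_separate,
        "lr_stat": 2 * (ll_separate - ll_common),
    }


if __name__ == "__main__":
    # Hypothetical counts: 10/100 successes in one sample, 50/100 in the other.
    print(compare_models(10, 100, 50, 100))
```

With the hypothesis-testing approach you would compare `lr_stat` to a chi-square cutoff and keep or drop the extra parameter depending on an arbitrary significance level; with the AIC approach you simply rank the candidate models by their AIC values, lower being better.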

Although I’ll not talk a lot about what the book was about (not only because it might be hard for some readers to follow, I should point out, but also because detailed coverage would take a lot more time than I’d be willing to spend on this stuff), I decided to add a few links to relevant stuff he talks about in the book. Quite a few pages in the book are spent on talking about the properties of various distributions, how to estimate key parameters of interest, and how to construct confidence intervals to be used for hypothesis testing in those specific contexts.

Some of the links below deal with stuff covered in the book; a few others, however, just deal with stuff I had to look up in order to understand what was going on in the coverage:

Inverse sampling.
Binomial distribution.
Hypergeometric distribution.
Multinomial distribution.
Binomial proportion confidence interval. (Coverage of the Wilson score interval, Jeffreys interval, and the Clopper-Pearson interval included in the book).
Fisher’s exact test.
Marginal distribution.
Fisher information.
Moment-generating function.
Factorial moment-generating function.
Delta method.
Multidimensional central limit theorem (the book applies this, but doesn’t really talk about it).
Matrix function.
McNemar’s test.
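
As a small illustration of the binomial proportion confidence intervals mentioned in the list above, here is a rough sketch of two of them, the Wilson score interval and the Clopper-Pearson ('exact') interval, using only the Python standard library. This is my own sketch, not code or notation from the book. The function names are made up, and the Clopper-Pearson limits are found by simple bisection on the binomial tail probabilities rather than via beta quantiles (which is how they are usually written). The Jeffreys interval is left out because it needs beta quantiles, which the standard library does not provide.

```python
import math
from statistics import NormalDist


def wilson_interval(x, n, conf=0.95):
    """Wilson score interval for a binomial proportion (x successes in n trials)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = x / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_increasing(g, tol=1e-12):
    """Bisection on [0, 1] for a function that goes from negative to positive."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson_interval(x, n, conf=0.95):
    """Clopper-Pearson interval, obtained by inverting the two binomial tail tests."""
    alpha = 1 - conf
    # Lower limit: the p at which P(X >= x | p) equals alpha / 2.
    lower = 0.0 if x == 0 else _root_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2
    )
    # Upper limit: the p at which P(X <= x | p) equals alpha / 2.
    upper = 1.0 if x == n else _root_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p)
    )
    return lower, upper


if __name__ == "__main__":
    # Hypothetical data: 5 successes in 10 trials.
    print(wilson_interval(5, 10))
    print(clopper_pearson_interval(5, 10))
```

The Clopper-Pearson interval is guaranteed to have at least the nominal coverage, which tends to make it wider than the Wilson interval for the same data; which one is preferable depends on how much conservatism you want.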

January 11, 2015 - Posted by | Books, Mathematics, Statistics
