Question

In terms of letter frequency, how can the most frequent letters differ when the text I'm analyzing is shorter?

At the moment, I'm comparing the letter frequencies of a long text and a subtext taken from it. To my surprise, the most frequent letters changed. In the long text it was the letter e followed by the letter t, but in the short text it was t followed by e. Also, when I checked the frequencies in different types of text (e.g. news articles), the letter frequencies changed too, including which letter was the most frequent.

The bottom line is, how can that be possible? It makes no sense to me.

Explanation / Answer

Speaking in statistical terms, this is the difference between the law of large numbers and the "law of small numbers" (e.g. see Poisson distribution).

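Short texts are too small a sample to give reliable frequency estimates. In more detail: if you assume statistically independent letters (not true in general, but usable as a simplification), the variance of the measured relative frequencies is much higher for short texts, so you have to expect a larger gap between the expected value and your actual measurement. When two letters have similar frequencies, such as e and t, that gap is easily large enough to swap their order.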
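To make the variance argument concrete, here is a minimal simulation sketch. It assumes independent letters and approximate English frequencies of 12.7% for e and 9.1% for t (illustrative values, not measured from your texts); the function names are hypothetical. Under independence, the relative frequency of a letter with probability p in n letters has standard deviation sqrt(p(1-p)/n), so it shrinks as the text grows.

```python
import math
import random

# Approximate relative frequencies in typical English text (assumed values).
P_E, P_T = 0.127, 0.091


def freq_std(p, n):
    """Standard deviation of a letter's relative frequency in n independent letters."""
    return math.sqrt(p * (1 - p) / n)


def flip_rate(n, trials=2000, seed=0):
    """Fraction of simulated n-letter texts in which 't' outnumbers 'e'."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        e = t = 0
        for _ in range(n):
            u = rng.random()
            if u < P_E:
                e += 1
            elif u < P_E + P_T:
                t += 1
            # Any other draw stands for one of the remaining letters.
        if t > e:
            flips += 1
    return flips / trials


if __name__ == "__main__":
    for n in (50, 500, 5000):
        print(n, round(freq_std(P_E, n), 4), flip_rate(n, trials=300))
```

In this simplified model, a noticeable share of the very short texts rank t above e, while long texts almost never do. Real text is not independent letter by letter, so treat the numbers as a rough guide only.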

If you want to know whether a sample is consistent with a given frequency distribution, use a statistical hypothesis test such as the chi-squared test. Its result (the p-value) tells you how likely a deviation at least as large as the one you observed would be if the sample really followed the given distribution.
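As a rough sketch of the chi-squared idea, the following computes Pearson's statistic, the sum over categories of (observed - expected)^2 / expected, for the letter counts of a text. The function name and the three-category reference distribution (e, t, and everything else pooled as "other") are assumptions chosen for illustration; a real analysis would use all 26 letters and a reference table you trust.

```python
from collections import Counter

# Hypothetical reference distribution; the values must sum to 1.
REFERENCE = {"e": 0.127, "t": 0.091, "other": 0.782}


def chi_squared_statistic(text, expected_freq):
    """Pearson chi-squared statistic of the letter counts in text.

    expected_freq maps each category to its expected relative frequency
    and must contain the key "other", which pools all unlisted letters.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    n = len(letters)
    counts = Counter(c if c in expected_freq else "other" for c in letters)
    return sum(
        (counts[k] - n * p) ** 2 / (n * p) for k, p in expected_freq.items()
    )


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    print(chi_squared_statistic(sample, REFERENCE))
```

With three categories there are 2 degrees of freedom, and the 5% critical value is about 5.991: a statistic above that suggests the sample deviates from the reference by more than chance alone would typically produce. The test assumes independent observations, which letters in running text only approximate, and it becomes unreliable when the expected counts are very small, as they are in very short texts.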
