¶ … Zipf's Law and Benford's Law, in order for the reader to clearly understand both of the laws and their significance to mathematics. "Zipf's law, named after the Harvard linguistic professor George Kingsley Zipf (1902-1950), is the observation that frequency of occurrence of some event (P), as a function of the rank (i) when the rank is determined by the above frequency of occurrence, is a power-law function $P_i \sim 1/i^{a}$ with the exponent a close to unity" (Li, n.d.).
"Benford's Law, also called the first-digit law, states that in lists of numbers from many (but not all) real-life sources of data, the leading digit 1 occurs much more often than the others (namely about 30% of the time)" (Wikipedia, 2006). Both of these statements, however, must be clarified in order to understand what these laws mean and what they are about.
Under Benford's Law, the larger a digit is, the less likely it is to be the leading digit of any given number (Bogomolny, n.d.). This applies to a wide variety of figures, from those that have social significance to those that are more closely tied to the natural world. These can include numbers taken from newspaper articles, stock prices, electricity bills, population numbers, areas and lengths of rivers, death rates, mathematical and physical constants, and any processes that are described by 'power laws,' which are very common in nature (Bogomolny, n.d.).
To explain this, suppose that first digits follow some particular distribution. If so, that distribution must be completely independent of the measuring system that is used. More specifically, if an individual converts from feet to meters, for example, the distribution should not change (Bogomolny, n.d.). Such a distribution is termed 'scale invariant,' and therefore it is also logarithmic (Bogomolny, n.d.). When measuring the distance or the length of something, the first non-zero digit should have the same distribution regardless of the unit of measure. This unit could be inches, yards, meters, feet, miles, light years, or virtually any other unit of measurement (Hill, 1995).
Consider the example of feet and yards. Because there are three feet in every yard, a length whose first digit is 1 when measured in yards (between 1 and 2 yards) has a first digit of 3, 4, or 5 when measured in feet (between 3 and 6 feet). The probability that the first digit is 1 in yards must therefore equal the probability that it is 3, 4, or 5 in feet (Bogomolny, n.d.). By applying this idea to all of the possible scales that could be used for measurement, one gets a distribution that is logarithmic. Combining that with the facts that $\log_{10}(1) = 0$ and $\log_{10}(10) = 1$, Benford's Law is seen (Bogomolny, n.d.). In other words, if there is a distribution of first digits that applies to a data set no matter what unit of measurement is used, the only distribution that fits that requirement is the one given by Benford's Law (Hill, 1995).
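The unit-change argument above can be checked numerically. The following is a minimal sketch, not taken from the cited sources: it assumes a made-up set of lengths whose logarithms are evenly spaced over six orders of magnitude, and the names `first_digit`, `digit_frequencies`, and `lengths_in_feet` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter

def first_digit(x: float) -> int:
    """Return the leading non-zero digit of a positive number."""
    # Scientific notation always begins with the leading digit, e.g. "3.048000e-01".
    return int(f"{x:e}"[0])

def digit_frequencies(values):
    """Relative frequency of each leading digit 1..9 in a list of positive numbers."""
    counts = Counter(first_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}

# Assumption: illustrative lengths with evenly spaced logarithms over six decades.
N = 60_000
lengths_in_feet = [10 ** (6 * k / N) for k in range(N)]

# One foot is exactly 0.3048 meters.
FEET_TO_METERS = 0.3048
lengths_in_meters = [x * FEET_TO_METERS for x in lengths_in_feet]

freq_feet = digit_frequencies(lengths_in_feet)
freq_meters = digit_frequencies(lengths_in_meters)

# The two first-digit distributions should be nearly identical.
largest_gap = max(abs(freq_feet[d] - freq_meters[d]) for d in range(1, 10))

for d in range(1, 10):
    print(f"digit {d}: feet {freq_feet[d]:.4f}   meters {freq_meters[d]:.4f}")
print(f"largest difference: {largest_gap:.5f}")
```

Under these assumptions the first-digit frequencies are essentially unchanged by the conversion, which is what scale invariance requires.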
A more precise form of Benford's Law follows if one assumes that it is the logarithms of the numbers, rather than the numbers themselves, that are uniformly distributed (Hill, 1995). In other words, a number is just as likely to be between 100 and 1000 (with a logarithm between 2 and 3) as it is to be between 10,000 and 100,000 (with a logarithm between 4 and 5) (Bogomolny, n.d.). For many sets of numbers, especially those that grow exponentially, such as stock prices and incomes, this assumption appears to be very reasonable.
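When the logarithms are uniformly distributed, the share of numbers whose first digit is d equals the width of the interval from log10(d) to log10(d + 1), which is log10(1 + 1/d). This is the standard statement of Benford's Law rather than a formula quoted from the sources above. A short sketch of the resulting probabilities, with `benford_probability` as a hypothetical helper name:

```python
import math

def benford_probability(d: int) -> float:
    """Probability that the leading digit is d, assuming uniformly distributed logarithms."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    # Width of the interval [log10(d), log10(d + 1)) within one decade.
    return math.log10(1 + 1 / d)

benford = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.3f}")
```

The value for the digit 1 is log10(2), or roughly 30%, which matches the figure quoted at the start of this discussion, and the probabilities fall steadily as the digit grows.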
For numbers that are drawn from certain distributions, such as human heights and IQ scores, which generally follow a normal distribution, this law does not hold (Bogomolny, n.d.). However, if an individual 'mixes' numbers from various distributions, such as by taking numbers from articles in newspapers, Benford's Law reappears (Hill, 1995). This can be proven mathematically: if an individual repeatedly chooses a probability distribution at random, and then chooses a number at random from that distribution, the resulting list of numbers will fall in line with Benford's Law (Bogomolny, n.d.).
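The mixing result can be illustrated with a rough simulation, which is a sketch and not a proof. Everything specific here is an assumption made for illustration: the three distribution families, the random scales spanning six orders of magnitude, the seed, and the sample size. The wide spread of scales is what allows this particular mixture to come close to Benford's Law.

```python
import math
import random
from collections import Counter

def first_digit(x: float) -> int:
    """Return the leading non-zero digit of a positive number."""
    return int(f"{x:e}"[0])

def draw_from_random_distribution(rng: random.Random) -> float:
    """Pick a distribution at random, then draw one number from it."""
    # Assumption: scales are spread evenly (in the logarithm) over six decades.
    scale = 10 ** rng.uniform(0, 6)
    kind = rng.choice(("uniform", "exponential", "normal"))
    if kind == "uniform":
        return rng.uniform(0, scale)
    if kind == "exponential":
        return rng.expovariate(1 / scale)
    return abs(rng.gauss(scale, scale / 10))

rng = random.Random(2006)  # fixed seed so the run is repeatable
samples = []
while len(samples) < 50_000:
    value = draw_from_random_distribution(rng)
    if value > 0:  # a leading digit is only defined for non-zero values
        samples.append(value)

counts = Counter(first_digit(v) for v in samples)
observed = {d: counts[d] / len(samples) for d in range(1, 10)}
expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
max_gap = max(abs(observed[d] - expected[d]) for d in range(1, 10))

for d in range(1, 10):
    print(f"digit {d}: observed {observed[d]:.3f}   Benford {expected[d]:.3f}")
```

None of the three families follows Benford's Law at a single fixed scale, yet the pooled sample tracks the logarithmic distribution closely.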
Now that Benford's Law has been explained from a mathematical standpoint, so that the reader has a better idea of both what the law says and what it means, it is time to address Zipf's Law. Benford's Law is highly important to the field of mathematics, but Zipf's Law also has a great deal of significance and therefore deserves the same explanation and study.
As originally stated, Zipf's Law holds that, in a corpus of natural language utterances, the frequency of any word is roughly inversely proportional to its rank in the frequency table (Li, n.d.). In other words, the most frequent word will appear roughly twice as often as the second most frequent word, which in turn will appear twice as often as the fourth most frequent word, and this trend continues for the entire list of words. Zipf's Law also relates to probability distributions and the power law (Li, n.d.; Hill, 1995).
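The rank and frequency relationship can be made concrete with a small sketch. It assumes an idealized vocabulary in which frequency is exactly proportional to one over the rank raised to an exponent; the function name `zipf_probabilities` and the vocabulary size of 1000 are hypothetical choices, and real word counts follow the pattern only approximately.

```python
def zipf_probabilities(n_words: int, exponent: float = 1.0) -> list[float]:
    """Idealized relative frequencies for ranks 1..n_words under a power law."""
    weights = [1 / rank ** exponent for rank in range(1, n_words + 1)]
    total = sum(weights)
    # Normalize so that the frequencies sum to 1.
    return [w / total for w in weights]

probs = zipf_probabilities(1000)

# Index 0 holds rank 1, index 1 holds rank 2, and so on.
print(f"rank 1 / rank 2: {probs[0] / probs[1]:.2f}")
print(f"rank 2 / rank 4: {probs[1] / probs[3]:.2f}")
```

With the exponent set to 1, both ratios come out at two, matching the description of the first, second, and fourth most frequent words.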
Zipf's Law is not a theoretical law, but is rather an experimental law (Li, n.d.). Distributions that follow Zipf's Law are commonly called Zipfian distributions. These kinds of distributions are seen in many different types of phenomena, but many argue that the Zipfian distributions observed in real life are somewhat controversial, and that they may not be true Zipfian distributions (Li, n.d.). The easiest way to observe Zipf's Law at work is to draw a scatterplot of the data, with log (rank order) and log (frequency) as the axes (Hill, 1995). Data that follow Zipf's Law then fall roughly along a straight line with a slope near -1, since the exponent of the power law is close to unity.
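The slope of such a log-log plot can also be estimated without drawing it. The sketch below fits an ordinary least-squares line to log (rank) against log (frequency). The helper name `loglog_slope` is hypothetical, and the counts are invented ideal values used only to show the calculation, not measured data.

```python
import math

def loglog_slope(frequencies) -> float:
    """Least-squares slope of log10(frequency) against log10(rank)."""
    ordered = sorted(frequencies, reverse=True)  # rank 1 is the most frequent item
    xs = [math.log10(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log10(f) for f in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance

# Assumption: made-up counts that follow frequency = 12000 / rank exactly.
ideal_counts = [12000 / rank for rank in range(1, 201)]
slope = loglog_slope(ideal_counts)
print(f"estimated slope: {slope:.3f}")
```

For real data the estimated slope will deviate from -1, and the size of that deviation is one way to judge how closely a data set follows a Zipfian distribution.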