To allow an opportunity to experiment with the guidelines presented here, I am including a small piece of mathematics often called the birthday paradox. The first part is a concise mathematical article explaining the problem and solving it. The second is an imaginary Reader’s attempt to understand the article by using the appropriate reading protocol. This article’s topic is probability and is accessible to a bright and motivated reader with no background at all.
为了做个实验来展示这里介绍的能力和方法,我引入了名为“生日悖论”的数学讨论。第一部分是一篇数学论文,讲解这个问题并解决它,第二部分是一个虚构的读者尝试使用之前的阅读规则以理解它。该论文的主题是概率论,它可以为完全没有知识背景但聪明、积极的读者所理解。
Our Reader Attempts to Understand the Birthday Paradox 阅读理解
In this section, a naive Reader tries to make sense out of the last few paragraphs. The Reader’s part is a metaphor for the Reader thinking out loud, and the Professional’s comments represent research on the Reader’s part. The appropriate protocols are centered and bold at various points in the narrative.
这一节中,一个新手读者想要读懂最后几段。下面的“读者”表示我们虚构的那位读 者,“教授”的注解表示与读者的探讨。在叙述中涉及的几点规则居中加粗表 示。
My Reader may seem to catch on to things relatively quickly. However, be assured that in reality a great deal of time passes between each of my Reader’s comments, and that I have left out many of the Reader’s remarks that explore dead-end ideas. To experience what the Reader experiences requires much more than just reading through his/her lines. Think of his/her part as an outline for your own efforts.
这位读者看起来好像很快找到了事物的相关性。然而,为了确认这一点,要在他的评注中花去大量的时间,我会略过那些他钻牛角尖的讨论。为了体验到读者的经验,就要深入到他/她的字里行间。通过你的努力去想像他/她的思路。
Know Thyself 自省
Reader (R): I don’t know anything about probability, can I still make it through?
我一点儿也不了解概率,我还能读懂它吗?
Professional (P): Let’s give it a try. We may have to backtrack a lot at each step.
让我们试试看,我们可能要回溯好多步。
R: What does the phrase “30 random students” mean?
“任意的” 30 个学生是什么意思?
“When I use a word, it means just what I choose it to mean 当我使用一个词,它就是我为它选择的意思”
P: Good question. It doesn’t mean that we have 30 spacy or scatter-brained people. It means we should assume that the birthdays of these 30 people are independent of one another and that every birthday is equally likely for each person. The author writes this more technically a little further on: “Assume that the birthdays of n people are uniformly distributed among 365 days of the year.”
好问题,它不是说我们有 30 个宽广的或者心不在焉的人。它的意思是我们假定 这 30 个人每个人的生日都独立于其他人,选择机会都完全平等。更技术化一点 的说法是:“假设这 n 个人的生日平均分布于一年的 365 天中。”
R: Isn’t that obvious? Why bother saying that?
这不是很明显么?为什么要说的这么罗嗦?
P: Yes, the assumption is kind of obvious. The author is just laying the groundwork. The sentence guarantees that everything is normal and that the solution does not involve some imaginative, fanciful science fiction.
是的,某种意义上来讲这个假设很明显。作者只是做一个背景设定。这个声明确 认每件事都是平凡的,这个解答不会引入什么科幻小说式的想像。
R: What do you mean?
你的意思是?
P: For example, the author is not looking for a solution like this: everyone lives in Independence Land and is born on the 4th of July, so the chance of two or more people with the same birthday is 100%. That is not the kind of solution mathematicians enjoy. Incidentally, the assumption also implies that we do not count leap years. In particular, nobody in this problem is born on February 29. Continue reading.
例如,作者没有做这样的解答:每个人都生活在一个独立的大陆,生于七月四 日,所以每两个或更多的人生日相同的机率为 100%。这不是数学家喜欢的答案。 顺便说一下,这个假设还意味着我们不需要计算闰年,特别是这个问题中没有人 生于二月二十九日。下一个。
R: I don’t understand that long formula, what’s n?
我不理解那个很长的公式,什么是 n?
P: The author is solving the problem for any number of people, not just for 30. The author, from now on, is going to call the number of people n.
作者解决了任意多个人的问题,不止是 30,于是作者称人数为 n。
R: I still don’t get it. So what’s the answer?
我不太理解,所以那个答案是?
Don’t Be a Passive Reader - Try Some Examples 不要做一个被动的读者,尝试一下
P: Well, if you want the answer for 30, just set n = 30.
好的,如果你想知道 30 人时的答案,n 就是 30。
R: Ok, but that looks complicated to compute. Where’s my calculator? Let’s see: 365 × 364 × 363 × ... × 336. That’s tedious, and the final exact value won’t even fit on my calculator. It reads:
OK,不过看起来计算过程很复杂。我的计算器哪儿去了?我们看看: 365 × 364 × 363 × ... × 336 。这也太可怕了,我的计算器都显示不全结果了。 它读作:
2.1710301835085570660575334772481e+76
If I can’t even calculate the answer once I know the formula, how can I possibly understand where the formula comes from?
如果我所知的这个公式都不能拿来让我算一次答案,我怎么理解它呢?
P: You are right that this answer is inexact, but if you actually go on and do the division, your answer won’t be too far off.
没错,答案并不准确。不过,如果你用除法去消元,就根本不会有这么复杂。
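The calculator's overflow can be sidestepped on a computer. The following is a minimal sketch, assuming Python is available (the transcript itself uses only a hand calculator). Python integers have no size limit, so the 30-term product can be held exactly and then divided as the Professional suggests. The names `numerator`, `denominator`, and `ratio` are illustrative and do not come from the article.

```python
from fractions import Fraction
from math import prod

# Top of the fraction: 365 × 364 × ... × 336 (30 terms), held exactly.
numerator = prod(range(336, 366))
# Bottom of the fraction: 30 copies of 365 multiplied together.
denominator = 365 ** 30

# The exact ratio, before the final subtraction from 1.
ratio = Fraction(numerator, denominator)

print(numerator)                   # every digit, not a rounded readout
print(f"{float(numerator):.4e}")   # the same value in calculator style
print(f"{float(ratio):.5f}")       # the result of doing the division
```

Working with exact integers first and converting to a decimal only at the end means no rounding happens until the last step.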
R: The whole thing makes me uncomfortable. I would prefer to be able to calculate it more exactly. Is there another way to do the calculation?
这事儿让我很感棘手。我想把它算得更精确一点。有没有其它的算法?
P: How many terms in your product? How many terms in the product on the bottom?
你乘了几项?底部的式子总共要多少项相乘?
R: You mean 365 is the first term and 364 is the second? Then there are 30 terms. There are also 30 terms on the bottom, (30 copies of 365).
你的意思是 365 是第一项,364 是第二项?那么有 30 项,下面也有 30 项 ,(30 个 365).
P: Can you calculate the answer now?
现在你能计算答案了吗?
R: Oh, I see. I can pair up each top term with each bottom term, and do 365/365 as the first term, then multiply by 364/365, and so on for 30 terms. This way the product never gets too big for my calculator. (After a few minutes)... Okay, I got 0.29368, rounded to 5 places.
哦,我明白了。我可以将每一个分子和每一个分母因数配对,第一项是 365/365,然后乘 364/365,类推 30 项。这样乘积就不会大到超出我的计算器了。(过几分钟后)……OK,我求得 0.29368,四舍五入到 5 位。
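The Reader's pairing trick translates directly into a short loop. This is a sketch under the article's assumptions (365 equally likely days, no leap year). The function name `no_match_probability` and the parameter `days` are hypothetical names chosen for illustration.

```python
def no_match_probability(n, days=365):
    """Multiply (days/days) * ((days-1)/days) * ... for n paired terms."""
    product = 1.0
    for k in range(n):
        # Pair the k-th top term (days - k) with one bottom term (days),
        # so the running product never leaves the range 0..1.
        product *= (days - k) / days
    return product

print(round(no_match_probability(30), 5))
```

Because each factor is at most 1, the running product can never overflow, which is why the calculation now fits on a calculator.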
P: What does this number mean?
那么这个数字意味着什么?
Don’t Miss the Big Picture 不要失去大局观
R: I forgot what I was doing. Let’s see. I was calculating the answer for n = 30. The 0.29368 is everything except for subtracting from 1. If I keep going I get 0.70632. Now what does that mean?
我忘了我要干啥了。让我想想,我计算了 n=30 时的答案。0.29368 是还没有从 1 中减去之前的部分。接下去我可以求得 0.70632。那么这说明了什么?
P: Knowing more about probability would help, but this simply means that the chance that two or more out of the 30 people have the same birthday is 70,632 out of 100,000 or about 71%.
知道多一点更好理解,不过也可以简单的理解为 100,000 次实验中,有 70,632 次机会,30 个人中有至少两个生日相同,这个机率约为 71%。
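One way to make "70,632 out of 100,000" concrete is to simulate many imaginary classes of 30 and count how often a shared birthday turns up. The sketch below is not part of the article. It assumes Python's standard `random` module, and the function name, trial count, and seed are arbitrary choices made for illustration. The estimate should land near 0.71 but will not match it exactly.

```python
import random

def estimate_shared_birthday(n=30, trials=20_000, seed=0):
    """Estimate the chance that at least two of n people share a birthday."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Independent, equally likely birthdays numbered 1..365 (no leap day).
        birthdays = [rng.randint(1, 365) for _ in range(n)]
        if len(set(birthdays)) < n:  # fewer distinct days than people
            hits += 1
    return hits / trials

print(estimate_shared_birthday())
```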
R: That’s interesting. I wouldn’t have guessed that. You mean that in my class with 30 students, there’s a pretty good chance that at least two students have the same birthday?
这很有趣,我没有想到。你的意思是我班上有 30 个学生的话,有很大的机会至少有两个学生生日相同?
P: Yes, that’s right. You might want to take bets before you ask everyone their birthday. Many people don’t think that a duplicate will occur. That’s why some authors call this the birthday paradox.
没错,就是这意思。你在问过他们生日前可能很想打这个赌。很多人都没有想到 会有重复出现。这就是为什么有些作者称之为生日悖论。
R: So that’s why I should read mathematics, to make a few extra bucks?
所以我应该读些数学,可以弄点儿外快?
P: I see how that might give you some incentive, but I hope the mathematics also inspires you without the monetary prospects.
这事儿看来给了你一些鼓励,不过我希望数学还能激励你多一些非功利的想法。
R: I wonder what the answer is for other values of n. I will try some more calculations.
我对 n 为其它值的答案很有兴趣,我想再多算几个。
P: That’s a good idea. We can even make a picture out of all your calculations. We could plot a graph of the number of people versus the chance that a duplicate birthday occurs, but maybe this can be left for another time.
是个好主意,我们甚至可以把你所有的计算结果画成图。我们可以来个人数与生 日重复事件之意的关系曲线,不过这事儿可以下次再弄。
R: Oh look, the author did some calculations for me. He says that for n = 30 the answer is about 71%; that’s what I calculated too. And, for n = 23 it’s about 50%. Does that make sense? I guess it does. The more people there are, the greater the chance of a common birthday. Hey, I am anticipating the author. Pretty good. Okay, let’s go on.
瞧,作者为我做了一些计算,他说 n=30 时答案约为 71%,我验算过了。并且, n=23 时答案约为 50%。这可信吗?我觉得可以。选更多的人数,生日重复的机 会总是会更大。嘿,我会抢答了。太棒了。OK,我们继续。
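The Reader's plan to try other values of n can be sketched as a small table. The code assumes the same 365-day model as the article. The function name `shared_birthday_probability` and the particular values of n are illustrative choices.

```python
def shared_birthday_probability(n, days=365):
    """P(n) = 1 - Q(n), with Q(n) built up one paired term at a time."""
    q = 1.0
    for k in range(n):
        q *= (days - k) / days
    return 1.0 - q

for n in (5, 10, 20, 23, 30, 40, 50):
    print(f"n = {n:2d}   P(n) = {shared_birthday_probability(n):.3f}")
```

Reading down the printed column shows the chance of a common birthday rising as n grows, which is the pattern the Reader anticipated.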
P: Good, now you’re telling me when to continue.
很好,现在你觉得我们可以继续了。
Don’t Read Too Fast 不要读的太快
R: It seems that we are up to the proof. This must explain why that formula works. What’s this Q(n)? I guess that P stands for probability but what does Q stand for?
看起来我们做出了证明。我们一定要弄清公式的原理。什么是 Q(n) ?我猜 P 代表概率的意思,不过 Q 代表什么?
P: The author is defining something new. He is using Q just because it’s the next letter after P, but Q(n) is also a probability, and closely related to P(n). It’s time to take a minute to think. What is Q(n) and why is it equal to 1 – P(n)?
作者定义了很多东西,这里使用 Q 只是因为它是 P 的下一个字母。Q(n) 也是 个概率,它强相关于 P(n)。我们该花几分钟想想了。什么是 Q(n)?为什么它等 于 1 - P(n)?
R: Q(n) is the probability that no two people have the same birthday. Why does the author care about that? Don’t we want the probability that at least two have the same birthday?
Q(n) 是没有生日相同的概率,为什么作者强调这个?我们不能去考虑至少两个 人在同一个生日的概率?
P: Good point. The author doesn’t tell you this explicitly, but between the lines, you can infer that he has no clue how to calculate P(n) directly. Instead, he introduces Q(n) which supposedly equals 1 – P(n). Presumably, the author will proceed next to tell us how to compute Q(n). By the way, when you finish this article, you may want to deal with the problem of calculating P(n) directly. That’s a perfect follow up to the ideas presented here.
很好。作者没有明确的告诉你,但是在文中你可以看出他没办法直接计算 P(n)。 相反他引入了相当于 1-P(n) 的 Q(n)。大概作者接下来会告诉我们如何计算 Q(n)。顺便,当你读完论文,可能想解决直接计算 P(n) 的方法。这是个很好的 想法。
R: First things first.
那就尽快动手吧。
P: Ok. So once we know Q(n), then what?
OK,我们知道 Q(n) 以后,接下来干什么?
R: Then we can get P(n). Because if Q(n) = 1 – P(n), then P(n) = 1 – Q(n). Fine, but why is Q(n) = 1 – P(n)? Does the author assume this is obvious?
我们可以得到 P(n)。因为如果 Q(n) = 1 - P(n),那么 P(n) = 1 - Q(n)。很 好,但是为什么 Q(n) = 1 - P(n)?作者认为这很明显么?
P: Yes, he does, but what’s worse, he doesn’t even tell us that it is obvious. Here’s a rule of thumb: when an author says “clearly this is true” or “this is obvious,” take 15 minutes to convince yourself it is true. If an author doesn’t even bother to say this, but just implies it, take a little longer.
是啊,他这么想,更糟糕的是他甚至没说这是显然的。有一个基本守则:如果某 作者对你说这是显然的或肯定为真,那就应该能在 15 分钟内向你说明。如果该 作者甚至懒得说出这一点,只是暗示了一下,大概会花更长时间。
R: How will I know when I should stop and think?
我怎么知道我应该在哪里停下来?
P: Just be honest with yourself. When in doubt, stop and think. When too tired, go watch television.
诚实地面对自己就好了。困惑的时候就停下来想一想。太累了就去看看电视。
R: So why is Q(n) = 1 – P(n)?
那么为什么 Q(n) = 1 – P(n) ?
P: Let’s imagine a special case. If the chance of getting two or more of the same birthdays is 1/3, then what’s the chance of not getting two or more?
让我们想像一个特殊的情况。如果有两个或更多的人同一生日的机率是 1/3。那 么没有两个或更多重复的机率是多少?
R: It’s 2/3, because the chance of something not happening is the opposite of the chance of it happening.
是 2/3 。因为没发生某事的机率是发生此事件的机率的互斥数。
Make the Idea Your Own 建立你自己的思想
P: Well, you should be careful when you say things like opposite, but you are right. In fact, you have discovered one of the first rules taught in a course on probability. Namely, that the probability that something will not occur is 1 minus the probability that it will occur. Now go on to the next paragraph.
很好,讨论对互斥性的时候要小心一些,不过你答对了。事实上你已经在概率论 的道路上发现了第一个法则。即某事件发生的概率是 1 减它没有发生的概率。 现在我们讨论下一部分。
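In symbols, the rule the Reader has just found can be written as follows. The letter A, standing for any event, and the notation Pr are labels introduced here for illustration. P(n) and Q(n) are the article's own symbols.

```latex
\[
  \Pr(\text{not } A) = 1 - \Pr(A),
  \qquad\text{so}\qquad
  Q(n) = 1 - P(n).
\]
% The special case from the dialogue:
\[
  1 - \tfrac{1}{3} = \tfrac{2}{3}.
\]
```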
R: It seems to be explaining why Q(n) is equal to the long, complex-looking formula shown. I will never understand this.
看来这解释了为什么 Q(n) 等于看起来这么复杂的公式。我可能永远也理解不了。
P: The formula for Q(n) is tough to understand and the author is counting on your diligence, persistence, and/or background here to get you through.
Q(n) 的公式确实很难理解,作者也是像你一样坚持不懈的计算和推导中才做到。
R: He seems to be counting all possibilities of something and dividing by the total possibilities, whatever that means. I have no idea why.
不管怎么说,看起来他计算了某事所有的可能性再除总的可能性。我不太理解。
P: Maybe I can fill you in here on some background before you try to check out any more details. When all outcomes are equally likely, the probability of a particular type of outcome is defined in mathematics to be the number of ways that type of outcome can occur divided by the total number of possible outcomes. For example, the probability that you throw a four when throwing a die is 1/6, because there is one possible 4 and there are six possible outcomes. What’s the probability you throw a four or a three?
可能在你发掘出更多细节前,我可以给你一些背景知识。某一特定类型的事件发 生的概率在数学上定义为:此类事件所有可能发生的结果总和除以此类型事件可 能发生的总数。例如,你扔骰子时扔出 4 的概率是 1/6.因为有六种可能,4 是其中一种。你扔出 4 或 3 的概率是多少?
R: Well, I guess 2/6 (or 1/3), because the total number of outcomes is still six but I have two possible outcomes that work.
我猜是 2/6 (即 1/3)。因为所有可能结果的总数是6,但是我有两种可能的结 果。
P: Good. Here’s a harder example. What about the chance of throwing a sum of four when you roll two dice? There are three ways to get a four (1-3, 2-2, 3-1) while the total number of possible outcomes is 36. That is 3/36 or 1/12. Look at the following 6 by 6 table and convince yourself.
很好,有个更难一点儿的例子。当你扔两个骰子的时候,扔出 4 的概率是多大 呢?在所有36种可能中,有三种可能会扔出四(1-3,2-2,3-1)。即3/36,也 就是 1/12。你可以根据下面的 6x6 表格验证一下。
1-1, 1-2, 1-3, 1-4, 1-5, 1-6
2-1, 2-2, 2-3, 2-4, 2-5, 2-6
3-1, 3-2, 3-3, 3-4, 3-5, 3-6
4-1, 4-2, 4-3, 4-4, 4-5, 4-6
5-1, 5-2, 5-3, 5-4, 5-5, 5-6
6-1, 6-2, 6-3, 6-4, 6-5, 6-6
What about the probability of throwing a 7?
扔出 7 的机率是多少?
R: Wait. What does 1-1 mean? Doesn’t that equal 0?
等等, 1-1 是什么意思?不应该是0么?
P: Sorry, my bad. I was using the minus sign as a dash, just to mean a pair of numbers, so 1-1 means a roll of one on each die - snake eyes.
对不起,我的错。我推进的太快了。这里只代表一对数字, 1-1 表示扔出两个 1.
R: Couldn’t you have come up with a better notation?
你不能找个更好的标注法么?
P: Well maybe I could/should have, but commas would look worse, a slash would look like division, and anything else might be just as confusing. We aren’t going to publish this transcript anyway.
说不定有更好的方法,不过逗号看起来好浪费,斜杠看起来像除法,每一种看起 来都有些岐义。不管怎么说,我们又不会拿这个手抄本去出版。
R: That’s a relief. Well, I know what you mean now. To answer your question, I can get a seven in six ways via 1-6, 2-5, 3-4, 4-3, 5-2, or 6-1. The total number of outcomes is still 36, so I get 6/36 or 1/6. That’s weird, why isn’t the chance of rolling a 4 the same as for rolling a 7?
好的,我知道你的意思了。为了回答你的问题,我找到了六种可能的组合 1-6, 2-5, 3-4, 4-3, 5-2, 和 6-1 。结果的总数还是36,所以我求得 6/36,即 1/6。这太奇怪了,为什么 扔出 4 的机率和 扔出 7 的不一样?
P: Because not every sum is equally likely. The situation would be very different if we were simply spinning a wheel with the sums 2 through 12 listed in equally spaced intervals. In that case, each one of the 11 sums would have probability 1/11.
因为不是每种总数都一样。这种情况跟我们简单的在 12 种平均分布的样本中任 取两个不一样。这种情况下,11 个样本中任选一个的机率是 1/11。
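The two-dice table can also be counted by machine, which makes it easy to see that the sums are not equally likely. This is a minimal sketch. The names `rolls`, `ways`, and `sum_probability` are hypothetical and chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely rolls of two dice, as (first, second) pairs.
rolls = list(product(range(1, 7), repeat=2))
# How many of those rolls give each sum from 2 to 12.
ways = Counter(a + b for a, b in rolls)

def sum_probability(total):
    """Favorable rolls divided by all possible rolls."""
    return Fraction(ways[total], len(rolls))

for total in range(2, 13):
    print(total, ways[total], sum_probability(total))
```

The printed counts rise toward the middle sum and fall off again, unlike the wheel, where each of the 11 sums would get the same share.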
R: Okay, now I am an expert. Is probability just about counting?
OK,现在我也是专家了。概率就是数数嘛。
P: Sometimes, yes. But counting things is not always so easy.
某种意义上是这样,不过有时候数数也很难啊。
R: I see, let’s go on. By the way, did the author really expect me to know all this? My friend took Probability and Statistics and I am not sure he knows all this stuff.
我懂,咱们继续。顺便问一句,作者真的认为我应该对这些都懂吗?我的朋友就 是搞概率和统计的,我觉得他也未必都懂。
P: There’s a lot of information implied in a small bit of mathematics. Yes, the author expected you to know all this, or to discover it yourself just as we have done. If I hadn’t been here, you would have had to ask yourself these questions and answer them by thinking, looking in a reference book, or consulting a friend.
这些东西只是数学知识的沧海一粟。没错,作者希望你都懂,或者就像你刚才那 样去学习它们。如果我不在这儿,你可以向自己提问,然后通过思考,查阅参考 书或请教朋友去弄懂它们。
R: So the chance that there are no two people with the same birthday is the number of possible sets of n birthdays without a duplicate divided by the total number of possible sets of n birthdays.
所以没有两人生日相同的机率是 n 个人所有不重复生日的组合数目除以 n 个人所有可能的生日组合总数的商。
P: Excellent summary.
完全正确。
R: I don’t like using n, so let me use 30. Perhaps the generalization to n will be easy to see.
我不喜欢用 n,那我用 30。可能用 n 来表示更为通用。
P: Great idea. It is often helpful to look at a special case before understanding the general case.
太对了。通常在理解通用情况时我们会先尝试特定的情景。
R: So how many sets of 30 birthdays are there total? I can’t do it. I guess I need to restrict my view even more. Let’s pretend there are only two people.
那么 30 人生日总共有多少种组合?我算不出来,大概得再加一些限制。让我们 假设只有两个人的情况。
P: Fine. Now you’re thinking like a mathematician. Let’s try n = 2. How many sets of two birthdays are there total?
很好,现在你开始像个数学家一样思考了,让我们考虑一下 n = 2。有多少种生 日组合?
R: I number the birthdays from 1 to 365 and forget about leap years. Then these are all the possibilities:
忽略闰年,记可能的生日为1到365.下面是所有的可能:
1-1, 1-2, 1-3, ... , 1-365,
2-1, 2-2, 2-3, ... , 2-365,
...
365-1, 365-2, 365-3, ... , 365-365.
P: When you write 1-1, do you mean 1-1 = 0, as in subtraction?
这里你写的 1-1 不是 1-1=0?
R: Stop teasing me. You know exactly what I mean.
别想糊弄我,你知道我的意思。
P: Yes, I do, and nice choice of notation, I might add. Now how many pairs of birthdays are there?
好的,看来我选的这个标记还挺好用。现在有多少对生日组合?
R: There are 365 × 365 total possibilities for two people.
两个人的话共有 365 x 365 种可能。
P: And how many are there when there are no duplicate birthdays?
那么有多少种不重复的组合?
R: I can’t use 1-1, or 2-2, or 3-3 or ... 365-365, so I get
去掉 1-1 或 2-2 或 3-3……那么得到
1-2, 1-3, ... , 1-365,
2-1, 2-3, ... , 2-365,
...
365-1, 365-2, ... , 365-364.
The total number here is 365 × 364 since each row now has 364 pairs instead of 365.
总共有 365 x 364 种,因为现在每行有 364 对而不是 365。
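For two people the table is small enough to list in full, so the Reader's counts can be checked by brute force. The sketch below uses the Reader's numbering of birthdays from 1 to 365. The variable names are illustrative.

```python
DAYS = 365  # birthdays numbered 1..365, leap years ignored

# Every ordered pair "a-b" of birthdays for two people.
all_pairs = [(a, b) for a in range(1, DAYS + 1) for b in range(1, DAYS + 1)]
# Strike out 1-1, 2-2, ..., 365-365.
distinct_pairs = [(a, b) for a, b in all_pairs if a != b]

print(len(all_pairs))                        # total number of pairs
print(len(distinct_pairs))                   # pairs with no duplicate
print(len(distinct_pairs) / len(all_pairs))  # Q(2)
```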
P: Good. You are going a little quickly here, but you’re 100% right. Can you generalize now to 30? What is the total number of possible sets of 30 birthdays? Take a guess. You’re getting good at this.
很好。这次你做的很快,不过完全弄对了。30 人生日的总的样本数是多少?猜 猜看,你很擅长这个的。
R: Well, if I had to guess (it’s not really a guess; after all, I already know the formula), I would say that for 30 people you get 365 × 365 × ... × 365, with 30 factors, for the total number of possible sets of birthdays.
OK让我猜一下,(也不算真的猜,毕竟我知道公式),30 人生日的所有可能样 本是 365 x 365 x 365 ... x365,共 30 次。
P: Exactly. Mathematicians write 365^30. And what is the number of possible sets of 30 birthdays without any duplicates?
很好,数学家会将其写做 365^30。那么 30 个不重复的生日组合会是多少?
R: I know the answer should be 365 × 364 × 363 × 362 × ... × 336 (that is, start at 365 and keep multiplying by the next smaller number until there are 30 factors), but I am not sure I really see why this is true. Perhaps I should do the case with three people first, and work my way up to 30?
我知道这个答案应该是 365 × 364 × 363 × 362 × ... × 336 ,(即从 365 开始逐项递减相乘,共 30 项),但是我不确信它是对的。或许我应该先算 一下三个人的,然后逐项增加到 30?
P: Splendid idea. Let’s quit for today. The whole picture is there for you. When you are rested and you have more time, you can come back and fill in that last bit of understanding.
这个想法非常好。今天先到这里。你已经有了自己的构思。等你休息一下,有了 时间,可以回来完成最终的学习和理解。
R: Thanks a lot; it’s been an experience. Later.
非常感谢,我学到了很多知识。再见。
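The three-person case that the Reader proposes as a next step can be explored by brute force before attempting 30. The sketch below is an assumption-laden illustration, not the article's argument. It uses a hypothetical 20-day year (`SMALL_YEAR`) so that listing every triple stays fast, and the function name `count_no_duplicates` is invented here. It then uses `math.perm`, which needs Python 3.8 or later, to compute the product the Reader expects for 365 days and 30 people.

```python
from itertools import product
from math import perm

def count_no_duplicates(people, days):
    """Count birthday lists of the given length in which no day repeats."""
    return sum(
        1
        for birthdays in product(range(1, days + 1), repeat=people)
        if len(set(birthdays)) == people
    )

SMALL_YEAR = 20  # hypothetical short year, so brute force stays fast

# Brute-force count for three people versus the product 20 × 19 × 18.
print(count_no_duplicates(3, SMALL_YEAR), SMALL_YEAR * 19 * 18)

# The same falling product for the real problem: 365 × 364 × ... × 336.
print(perm(365, 30))
```

Seeing the brute-force count agree with the falling product in a small case is only evidence, not a proof. The reasoning for why each new person has one fewer day available is still left to the Reader.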