Sunday, November 21, 2010

haskell shell prototype

Just a very crude prototype...

Result:
*Funs> select (from "/home/march/temp/text/\\.(tex|csv)$") >>= whereLines "end"
["\\end{center}","\\end{document}"]
Code:


module Funs where
import Data.List
import Control.Monad
import System.Directory
import System.FilePath
import System.IO
import Text.Regex.PCRE

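-- The kind of store a path refers to; only Text is handled so far.
data Store t p = Text p
               | Csv p
               | Bin p
               | File p

-- Read a text store and return its lines.
selectFrom :: Store t FilePath -> IO [String]
selectFrom (Text store) = do
  content <- readFile store
  return $ lines content

-- Flatten the per-file line lists into a single list.
unionIt :: IO [[String]] -> IO [String]
unionIt = liftM concat

-- Keep only the paths that are existing files and read all of their lines.
select :: IO [FilePath] -> IO [String]
select paths = paths
               >>= \ps -> unionIt (filterM doesFileExist ps
                              >>= mapM (\p -> selectFrom (Text p)))

-- Split the argument into a directory and a regex, then list the entries
-- of that directory whose names match the regex.
from :: FilePath -> IO [FilePath]
from path = let (dir, regex) = splitFileName path
            in liftM (\names -> [dir ++ name | name <- names, name =~ regex])
                     ((getDirectoryContents . takeDirectory) dir)

-- Keep only the lines that match the regex.
whereLines :: String -> [String] -> IO [String]
whereLines reg ls = return $ filter (\line -> line =~ reg :: Bool) ls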
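This code already covers the basic select / from / where pattern, and could even be called a usable implementation, but I would like it to go further:
  • a real front-end script
  • output fields that can be set after select, such as the matching file name and line number, plus further grouping of lines and files by regex
  • where embedded in the select process, for true streaming filtering
  • support for limit and offset
  • support for CSV
  • support for group by … having
  • support for order by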
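
As a rough illustration of the limit / offset and line-number items on the list, here is a minimal sketch. The names `limit`, `offset` and `numbered` are hypothetical and not part of the Funs module above; they only assume that each stage has the same `[a] -> IO [b]` shape as `whereLines`, so that stages chain with `>>=`.

```haskell
module Main where

-- Hypothetical helpers, not part of the original Funs module.

-- Keep at most n rows (SQL-style LIMIT).
limit :: Int -> [a] -> IO [a]
limit n = return . take n

-- Skip the first n rows (SQL-style OFFSET).
offset :: Int -> [a] -> IO [a]
offset n = return . drop n

-- Pair each row with its 1-based position, so a later stage can output it.
numbered :: [a] -> IO [(Int, a)]
numbered = return . zip [1 ..]

main :: IO ()
main = do
  let rows = ["\\begin{center}", "\\end{center}", "\\end{document}"]
  -- Skip one row, then keep one row.
  picked <- return rows >>= offset 1 >>= limit 1
  print picked
  -- Tag every row with its line number.
  tagged <- numbered rows
  print tagged
```

With the original module loaded, the same helpers could presumably be appended to a query, for example `select (from ...) >>= whereLines "end" >>= limit 1`. This still reads every file in full, so it does not address the streaming item on the list.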

Thursday, November 18, 2010

New slide setup used for the "Python Crash Course" talk

For the last set of slides I adjusted the LaTeX code I normally use. The new preamble is as follows:

\documentclass[utf8x, notes=hide]{beamer}

%\usepackage[bars]{beamerthemetree} % Beamer Theme v 2.2
\usetheme{boxes} % Beamer theme
\usecolortheme{seahorse} % Beamer color theme

\usepackage[boldfont,slantfont]{xeCJK}
\usepackage{fontspec}
\setmainfont{DejaVu Serif}
\setsansfont{DejaVu Sans}
\setmonofont{DejaVu Sans Mono}%{Monaco}
\setCJKmainfont{文泉驿正黑}
\setCJKsansfont{文泉驿微米黑}
\setCJKmonofont{WenQuanYi Micro Hei Mono}
\setCJKfamilyfont{tt}{Monaco}

\usepackage{color}
\definecolor{listinggray}{gray}{0.9} 
\usepackage{xcolor}

\usepackage{listings}
\lstset{language=Python,
numbers=left,
backgroundcolor=\color{listinggray},
frame=single,
framexleftmargin=7mm,
frameshape={RYN}{y}{y}{RYN}}
\usepackage{hyperref}
\usepackage{graphicx}
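The main changes are the customization of the Chinese and Western fonts through xeCJK, and the customization of the lstlisting code area (rounded corners, line numbers).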
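
For reference, a minimal sketch of a document body that would exercise this preamble. The frame title and the Python snippet are placeholders; the sketch assumes the file is compiled with xelatex (which fontspec and xeCJK require) and that the fonts named above are installed. The `fragile` option is what lets beamer accept verbatim-like environments such as lstlisting inside a frame.

```latex
\begin{document}

% [fragile] is required for lstlisting inside a beamer frame
\begin{frame}[fragile]{Hello}
\begin{lstlisting}
def greet(name):
    print("Hello, " + name)

greet("world")
\end{lstlisting}
\end{frame}

\end{document}
```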

Tuesday, October 26, 2010

[Translation] How to Read Mathematics

This article is part of the book Rediscovering Mathematics, which is due out in early 2011. - Rediscovering Mathematics: Patriot Ledger

How to Read Mathematics 如何阅读数学

Mathematics is “a language that can neither be read nor understood without initiation.”
未经启蒙的话,数学是门既不可读又无法理解的语言。
A reading protocol is a set of strategies that a reader must use in order to benefit fully from reading the text. Poetry calls for a different set of strategies than fiction, and fiction a different set than non-fiction. It would be ridiculous to read fiction and ask oneself what is the author’s source for the assertion that the hero is blond and tanned; it would be wrong to read non-fiction and not ask such a question. This reading protocol extends to a viewing or listening protocol in art and music. Indeed, much of the introductory course material in literature, music and art is spent teaching these protocols.
阅读的原则是一些可以让读者受益的方法。诗歌和小说有不同的方法,小说与非 虚构有不同的方法。读了小说以后就要求证主人公的原型和作者写的一样金发碧眼而 且有褐色皮肤就太可笑了。读了非虚构的文学以后不对这些有质疑也不对。这是 欣赏艺术和音乐的原则。的确,很多文学、音乐和艺术作品已经过时了,不再适 合这个原则。
Mathematics has a reading protocol all its own, and just as we learn to read literature, we should learn to read mathematics. Students need to learn how to read mathematics, in the same way they learn how to read a novel or a poem, listen to music, or view a painting. Ed Rothstein’s book, Emblems of Mind, a fascinating book emphasizing the relationship between mathematics and music, touches implicitly on the reading protocols for mathematics.
数学有其自己的阅读原则,就像我们学习文学阅读,我们也应该学习数学阅读。 就像学生需要学会如何阅读长篇小说和诗歌,学会听音乐和观赏画作,他们也同 样需要学会阅读数学。Ed Rothstein 的《心灵的象征》,是本迷人的书,强调 数学和音乐之间的关系,触及了数学阅读的原则。
When we read a novel we become absorbed in the plot and characters. We try to follow the various plot lines and how each affects the development of the characters. We make sure that the characters become real people to us, both those we admire and those we despise. We do not stop at every word, but imagine the words as brushstrokes in a painting. Even if we are not familiar with a particular word, we can still see the whole picture. We rarely stop to think about individual phrases and sentences. Instead, we let the novel sweep us along with its flow and carry us swiftly to the end. The experience is rewarding, relaxing and thought provoking.
当我们阅读长篇小说时,会被情节和人物吸引。我们试图理解情节线索,搞明白 它们如何影响人物的各方面。我们会想像那些被我们赞赏和厌恶的人物成为真实 存在的人物。我们欲罢不能,用文字在脑海中描述图景。甚至我们不必关注每个 具体的字眼,仍可以洞察整个画卷。我们很少停下来思考某个段落或句子。相 反,我们任由小说将我们抛出剧情的激流,裹挟着我们直达结尾。这种体验很 棒,放松而剌激。
Novelists frequently describe characters by involving them in well-chosen anecdotes, rather than by describing them by well-chosen adjectives. They portray one aspect, then another, then the first again in a new light and so on, as the whole picture grows and comes more and more into focus. This is the way to communicate complex thoughts that defy precise definition.
小说家经常将角色置于精心构造的桥段中以描写他们,而不是用精挑细选的形容 词来表达。他们描述一个面孔,然后是另一个,然后第一个又出现在新的瞬间。 反复如此直到整个画卷越来越清晰。这是不使用精确定义而表达复杂含义的方法。
Mathematical ideas are by nature precise and well defined, so that a precise description is possible in a very short space. Both a mathematics article and a novel are telling a story and developing complex ideas, but a math article does the job with a tiny fraction of the words and symbols of those used in a novel. The beauty in a novel is in the aesthetic way it uses language to evoke emotions and present themes which defy precise definition. The beauty in a mathematics article is in the elegant efficient way it concisely describes precise ideas of great complexity.
数学思想天然就是定义完全而精确的,故可以在很短的容量中精确的描述。数学 论文和小说都是讲述一个故事以展显复杂的思想,但是数学论文只用小说词语和 符号的一小部分数量就完成这件事。小说之美在于优美的运用语言来唤起情绪但 拒绝精确定义。数学之美在于它优雅高效的表达方式,简洁的描述出复杂事物中 精确的思想。
What are the common mistakes people make in trying to read mathematics? How can these mistakes be corrected?
想要阅读数学的人通常会犯什么错误?如何避免它们?

Don’t Miss the Big Picture 不要忘记大局

“Reading Mathematics is not at all a linear experience ... Understanding the text requires cross references, scanning, pausing and revisiting”
数学阅读不是线性的体验……理解文本需要交叉引用、扫读、停顿和重读。
Don’t assume that understanding each phrase will enable you to understand the whole idea. This is like trying to see a portrait painting by staring at each square inch of it from the distance of your nose. You will see the detail, texture and color but miss the portrait completely. A math article tells a story. Try to see what the story is before you delve into the details. You can go in for a closer look once you have built a framework of understanding. Do this just as you might reread a novel.
不要想当然的认为理解每一个段落会有助于你理解整个思想。这就像想要从每一 个平方英寸入手去尝试看清你鼻尖前的一幅肖像油画。你可以看到细节,纹理和 颜色,但是看不到整个肖像。每篇数学论文都讲一个故事。在你深入细节前,尝 试了解故事。你可以在构建起理解的框架后再深入。这就像你读小说时一样。

Don’t be a Passive Reader 不要做一个被动的读者

“A three-line proof of a subtle theorem is the distillation of years of activity. Reading mathematics… involves a return to the thinking that went into the writing”
某个巧妙的定理的三段证明可以是经年思考的升华。阅读数学……将我们带回写 下它们的那个时刻。
Explore examples for patterns. Try special cases.
探索模式的示例,尝试特定的例子。
A math article usually tells only a small piece of a much larger and longer story. The author usually spends months discovering things, and going down blind alleys. At the end, he organizes it all into a story that covers up all the mistakes (and related motivation), and presents the completed idea in clean neat flow. The way to really understand the idea is to re-create what the author left out. Read between the lines.
数学论文通常只讲述更大更长的故事中的一个片段。作者通常花费几个月发现事 物,走入死胡同。最终,他将这一切写入一个故事,讲述所有的错误(及相关的 动机),然后在清晰的情节中给出完整的思想。阅读字里行间,这种方式确实可 以重现作者表达的思想。
Mathematics says a lot with a little. The reader must participate. At every stage, he/she must decide whether or not the idea being presented is clear. Ask yourself these questions:
数学在很短的篇幅里讲述很多。读者必须参与其中。在每个章节,他/她必须判 断是否明白了其中的思想。自省以下的问题:
Why is this idea true?
为什么这个想法是对的?
Do I really believe it?
我是否确信它?
Could I convince someone else that it is true?
我能说服别人也信服它吗?
Why didn't the author use a different argument?
为什么作者没有用一个不同的论据?
Do I have a better argument or method of explaining the idea?
我有没有一个更好的论据或方法来说明这个思想?
Why didn't the author explain it the way that I understand it?
为什么作者不用我理解到的方式去阐述它?
Is my way wrong?
我的方法错了吗?
Do I really get the idea?
我确实理解这些方法了吗?
Am I missing some subtlety?
我是否搞错了某些细节?
Did this author miss a subtlety?
作者是否错失了某些细节?
If I can't understand the point, perhaps I can understand a similar but simpler idea?
如果我不能理解这个观点,我是否能理解类似的更简单的思想?
Which simpler idea?
哪个思想更简单?
Is it really necessary to understand this idea?
对于理解这个思想,它确实是必要的吗?
Can I accept this point without understanding the details of why it is true?
在不理解它成立的细节依据时,我能接受这个观点吗?
Will my understanding of the whole story suffer from not understanding why the point is true?
在容忍一些我不理解的内容后,我能理解整个故事吗?
Putting too little effort into this participation is like reading a novel without concentrating. After half an hour, you wake up to realize the pages have turned, but you have been daydreaming and don’t remember a thing you read.
在参与过程中投入的努力太少,就会像阅读一本没有精要的长篇小说。半小时过 去,你清醒过来,意识到翻页了,但是完全不记得刚刚的白日梦中读到了什么。

Don’t Read Too Fast 别读的太快

Reading mathematics too quickly results in frustration. A half hour of concentration in a novel might net the average reader 20-60 pages with full comprehension, depending on the novel and the experience of the reader. The same half hour in a math article buys you 0-10 lines depending on the article and how experienced you are at reading mathematics. There is no substitute for work and time. You can speed up your math reading skill by practicing, but be careful. Like any skill, trying too much too fast can set you back and kill your motivation. Imagine trying to do an hour of high-energy aerobics if you have not worked out in two years. You may make it through the first class, but you are not likely to come back. The frustration from seeing the experienced class members effortlessly do twice as much as you, while you moan the whole next day from soreness, is too much to take.
数学阅读过快会导致挫败。在阅读小说时,视读者的经历和小说而异,半小时内可以 读 20 到 60 页并完全理解。同样阅读数学论文半小时,视你数学阅读的经验和 论文而异,可以读 0 到 10行。这些劳作和时间无可取代。但是要记住,你可以 通过实践提升数学阅读技巧,加快速度。和任何其它技巧一样,如果你做的过快, 可能会徒劳无功,浪费自己的积极性。设想一下,如果你兩年没有做过高强度的健 身操,现在来它一个小时。你也许做的一极棒,但是不会想再来一次。当你第二 天在痛苦中抱怨的时候,老兵级别的轻松达到你的两倍进度,这是你望尘莫及的 程度。
For example, consider the following theorem from Levi Ben Gershon’s manuscript Maaseh Hoshev (The Art of Calculation), written in 1321.
例如,考虑 Levi Ben Gershon 1321 年的 Maaseh Hoshev 手稿(计算的艺术)。
“When you add consecutive numbers starting with 1, and the number of numbers you add is odd, the result is equal to the product of the middle number among them times the last number.” It is natural for modern day mathematicians to write this as:
“当你从 1 开始连续的累加数值,累加奇数个,其结果等于中位数与最后一个数 的积。”现代数学写作:
\[ \sum_{i=1}^{2k+1} i = (k+1)(2k+1) \]
A reader should take as much time to unravel the two-inch version as he would to unravel the two-sentence version. An example of Levi’s theorem is that 1 + 2 + 3 + 4 + 5 = 3×5.
读者花在理解两英寸版本上的时间应该与两句版本一样。例如,Levi’s 定理的 一个示例如下: 1 + 2 + 3 + 4 + 5 = 3×5 
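
To try a few special cases of Levi's theorem, here is a small sketch that checks it numerically. It assumes the usual modern reading of the statement: an odd count of numbers is written as 2k+1, so the middle number is k+1 and the last number is 2k+1. The function names are made up for this example.

```haskell
module Main where

-- Sum of the first 2k+1 positive integers, added up directly.
oddSum :: Integer -> Integer
oddSum k = sum [1 .. 2 * k + 1]

-- Middle number times last number, as the theorem states.
middleTimesLast :: Integer -> Integer
middleTimesLast k = (k + 1) * (2 * k + 1)

-- Check the identity for every k from 0 up to the given bound.
leviHolds :: Integer -> Bool
leviHolds bound = all (\k -> oddSum k == middleTimesLast k) [0 .. bound]

main :: IO ()
main = do
  print (oddSum 2, middleTimesLast 2) -- the 1 + 2 + 3 + 4 + 5 = 3×5 case
  print (leviHolds 1000)
```

A finite check like this is not a proof, but it is the kind of experiment with examples that the article recommends before reading on.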

Make the Idea your Own 建立你自己的思想

The best way to understand what you are reading is to make the idea your own. This means following the idea back to its origin, and rediscovering it for yourself. Mathematicians often say that to understand something you must first read it, then write it down in your own words, then teach it to someone else. Everyone has a different set of tools and a different level of “chunking up” complicated ideas. Make the idea fit in with your own perspective and experience.
阅读时最好的理解方式是建立你自己的思想。这意味着追寻思想的脉络,重 新发现它。数学家们经常讲你要理解某事就要先去读它,然后用自己的语言写下 来,再去向别人传授。每个人都有独特的工具集,“归纳”复杂事物的层次也不同。 让这个思想符合你自己的想法和体验。
“When I use a word, it means just what I choose it to mean”
“当我选择一个词语,它就是我为它选择的含义”
(Humpty Dumpty to Alice in Through the Looking Glass by Lewis Carroll)
“The meaning is rarely completely transparent, because every symbol or word already represents an extraordinary condensation of concept and reference”
这意味着很少有完整的说明,因为每一个符号和词语都代表了一个章节的缩写或引用参考
[4]
A well-written math text will be careful to use a word in one sense only, making a distinction, say, between combination and permutation (or arrangement). A strict mathematical definition might imply that “yellow rabid dog” and “rabid yellow dog” are different arrangements of words but the same combination of words. Most English speakers would disagree. This extreme precision is utterly foreign to most fiction and poetry writing, where using multiple words, synonyms, and varying descriptions is de rigueur.
书写良好的数学文本会非常注意用语无岐义,区分明显,如组合与排列(或放 置)。严格的数学定义可以说明“黄疯狗”和“疯黄狗”是不同的词语排列方 式,但它们是同样的组合。大多数说英语的人都不会同意这个。这种极端精确的 用法对大多数小说和诗绝对都是陌生的,它们出于礼仪需要使用复合词,同义词 和各种描述。
A reader is expected to know that an absolute value is not about some value that happens to be absolute, nor is a function about anything functional.
对读者来说,他应该知道绝对值不是绝对会出现的一些值,也不是某种有实用价 值的函数。
A particular notorious example is the use of “It follows easily that” and equivalent constructs. It means something like this:
有个特别臭名昭著的例子是滥用“显然可以得到”及其等价物,它相当于下面 的意思:
One can now check that the next statement is true with a certain amount of essentially mechanical, though perhaps laborious, checking. I, the author, could do it, but it would use up a large amount of space and perhaps not accomplish much, since it'd be best for you to go ahead and do the computation to clarify for yourself what's going on here. I promise that no new ideas are involved, though of course you might need to think a little in order to find just the right combination of good ideas to apply.
现在可以用一系列明确的步骤机械的判明下一表述的正确性,即使可能是很繁重 的步骤。我,即作者,可以做到它,但是它可能会占用大量的空间,而这可能无 法接受,而由你来完成这些计算,去验证它更好。我承诺不会有什么新的思想再 引入进来,当然你还是需要稍动一下脑子,把好点子正确的组合起来。
In other words, the construct, when used correctly, is a signal to the reader that what’s involved here is perhaps tedious and even difficult, but involves no deep insights. The reader is then free to decide whether the level of understanding he/she desires requires going through the details or warrants saying “Okay, I’ll accept your word for it.”
另一方面,使用得当的话,这种形式很有用。它给读者一个提示,这里可以参与 进来,也许很复杂,很困难,但是不需要更深入的洞见。读者可以自己决定其理 解层次,可以深入细节,也可以讲“OK,我相信你的说法。”
Now, regardless of your opinion about whether that construct should be used in a particular situation, or whether authors always use it correctly, you should understand what it is supposed to mean. “It follows easily that” does not mean
于是,不管你怎么想,要么这种形式用在非常特殊的情况,要么作者总是很得体 的使用它,你应该能理解它的含义。“显然可以得到”不意味着
if you can’t see this at once, you’re a dope,
如果你一下子看不懂,你就是个笨蛋,
neither does it mean
也不表示
this shouldn’t take more than two minutes,
这玩意儿用不了两分钟就能看明白,
but a person who doesn’t know the lingo might interpret the phrase in the wrong way, and feel frustrated. This is apart from the issue that one person’s tedious task is another person’s challenge, so the author must correctly judge the audience.
但是不懂这些隐语的人会因为误解了内容而感到挫折。甲之废话,乙之真言,这 一直是争议所在,作者要恰当的评估读者的水平。

Know Thyself 了解你自己

Texts are written with a specific audience in mind. Make sure that you are the intended audience, or be willing to do what it takes to become the intended audience.
文章总是为由假想中的读者写就,请确认你正是这样的读者,或者至少想要变成一个 适宜的读者。
T.S.Eliot’s
A Song for Simeon:
西面颂歌 [5] 
Lord, the Roman hyacinths are blooming in bowls and
The winter sun creeps by the snow hills;
The stubborn season has made stand.
My life is light, waiting for the death wind,
Like a feather on the back of my hand.
Dust in sunlight and memory in corners
Wait for the wind that chills towards the dead land.
For example, Eliot’s poem pretty much assumes that its readers are going to either know who Simeon was or be willing to find out. It also assumes that its reader will be somewhat experienced in reading poetry and/or is willing to work to gain such experience. He assumes that they will either know or investigate the allusions here. This goes beyond knowledge of things like who Simeon was. For example, why are the hyacinths “Roman?” Why is that important?
举个例子,艾略特这首诗就做了不少假设:读者要么知道西面是谁,要么愿意去 了解他是谁;读者要么有一定诗歌鉴赏基础,要么愿意提高这方面的能力;读者 要么知晓诗中的典故,要么愿意研究个中精妙,比如为什么是“罗马”风信子, 为什么这个意象很重要。
Eliot assumes that the reader will read slowly and pay attention to the images: he juxtaposes dust and memory, relates old age to winter, compares waiting for death with a feather on the back of the hand, etc. He assumes that the reader will recognize this as poetry; in a way, he’s assuming that the reader is familiar with a whole poetic tradition. The reader is supposed to notice that alternate lines rhyme, but that the others do not, and so on.
艾略特还认定读者们会慢慢阅读这首诗,并注意到那些意象:回忆与灰尘的并 置,年老与冬日的关联,将等待死亡降临比作手背上的羽毛等等。读者应该能够 读到这首诗的本质,换句话说,读者应该通晓诗歌的创作传统;读者还应注意到 这首诗押隔行韵等等……这些都是艾略特对自己读者做的假设。
Most of all, he assumes that the reader will read not only with the mind, but also with his/her emotions and imagination, allowing the images to summon up this old man, tired of life but hanging on, waiting expectantly for some crucial event, for something to happen.
大多数情况,他假设读者不仅仅用思想,也用感情和想像去阅读,让已经对人生 感到疲惫,但是仍在坚持,在期待着某些重要的大事将会发生的老人形像鲜明起来。
Most math books are written with assumptions about the audience: that they know certain things, that they have a certain level of “mathematical maturity,” etc. Before you start to read, make sure you know what the author expects you to know.
大多数数学书假定读者为:他们有一定程度的基础,已经有一个确定的“数学基 础”等等。在你开始阅读之前,请确认你已经了解作者希望你了解的知识。

An Example of Mathematical Writing 一个数学写作的例子

To allow an opportunity to experiment with the guidelines presented here, I am including a small piece of mathematics often called the birthday paradox. The first part is a concise mathematical article explaining the problem and solving it. The second is an imaginary Reader’s attempt to understand the article by using the appropriate reading protocol. This article’s topic is probability and is accessible to a bright and motivated reader with no background at all.
为了做个实验来展示这里介绍的能力和方法,我引入了名为“生日悖论”的数学 讨论。第一分部分是是一篇数学论文,讲解这个问题并解决它,第二部分是一个 虚构的读者尝试使用之前的阅读规则以理解它。该论文的主题是概率论,它可以 为完全没有知识背景但聪明、积极的读者所理解。

The Birthday Paradox 生日悖论

A professor in a class of 30 random students offers to bet that there are at least two people in the class with the same birthday (month and day, but not necessarily year). Do you accept the bet? What if there were fewer people in the class? Would you bet then?
有个教授打赌说他班上的30个学生中至少有两个生日在同一天(同月同日,可以 是不同年)。你要不要跟他打赌?如果班上的人再少些呢?还要不要赌?
Assume that the birthdays of n people are uniformly distributed among 365 days of the year (assume no leap years for simplicity). We prove that, the probability that at least two of them have the same birthday (month and day) is equal to:
假设 n 个人的生日平均分布在一年的 365 天中(简单起见,不考虑闰年)。我 们证明,至少有两人生日相同的概率为:
\[ 1 - \frac{365 \times 364 \times 363 \times \cdots \times (365 - n + 1)}{365^{n}} \]
What is the chance that among 30 random people in a room, there are at least two or more with the same birthday? For n = 30, the probability of at least one matching birthday is about 71%. This means that with 30 people in your class, the professor should win the bet 71 times out of 100 in the long run. It turns out that with 23 people, she should win about 50% of the time.
同一个班中随机的 30 人,至少有两人生日相同的机率是多少?令 n = 30,至少两人生日相同的概率约为 71%。这意味着如果班上有 30 人,长期来讲教授 100 次里能赢 71 次。23 个人的话,她的赢面约为 50%。
Here is the proof: Let P(n) be the probability in question. Let Q(n) = 1 – P(n) be the probability that no two people have a common birthday. Now calculate Q(n) by calculating the number of n birthdays without any duplicates and divide by the total number of n possible birthdays. Then solve for P(n).
证明如下:令 P(n) 为问题中的概率。令 Q(n) = 1 - P(n) 为不存在两个生日相同的人的概率。现在用 n 个生日全部不重复的情况数除以 n 个生日所有可能的情况数,算出 Q(n)。然后求得 P(n)。
The total number of n birthdays without duplicates is:
n 个生日的所有不重复的组合是:
365 × 364 × 363 × ... × (365 − n + 1).
This is because there are 365 choices for the first birthday, 364 for the next and so on for n birthdays. The total number of n birthdays without any restriction is just 365^n because there are 365 choices for each of n birthdays. Therefore, Q(n) equals
这是因为第一个生日有 365 种可能的选择,第二个有 364 种,依次递推直至第 n 个生日。不加任何限制时,n 个生日的总数就是 365^n,因为 n 个生日中的每一个都有 365 种选择。因此,Q(n) 等于
\[ \frac{365 \times 364 \times 363 \times \cdots \times (365 - n + 1)}{365^{n}} \]
Solving for P(n) gives P(n) = 1 – Q(n) and hence our result.
根据 P(n) = 1 - Q(n) ,求得 P(n),解之可得前述答案。
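
The result can be checked numerically. The sketch below is one possible way to do it, not part of the article. It builds Q(n) as a product of ratios 365/365 × 364/365 × ..., so no intermediate value becomes large. The names `qNoMatch` and `pMatch` are invented for this example, and the 365 equally likely birthdays are the article's own assumption.

```haskell
module Main where

-- Q(n): probability that n birthdays are all different, computed as
-- (365/365) * (364/365) * ... * ((365 - n + 1)/365).
qNoMatch :: Int -> Double
qNoMatch n = product [fromIntegral (365 - k) / 365 | k <- [0 .. n - 1]]

-- P(n) = 1 - Q(n): probability that at least two birthdays coincide.
pMatch :: Int -> Double
pMatch n = 1 - qNoMatch n

main :: IO ()
main = mapM_ report [23, 30]
  where
    report n = putStrLn (show n ++ " people: " ++ show (pMatch n))
```

For n = 30 this gives a value close to the 71% quoted above, and for n = 23 a value just over 50%.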

Our Reader Attempts to Understand the Birthday Paradox 阅读理解

In this section, a naive Reader tries to make sense out of the last few paragraphs. The Reader’s part is a metaphor for the Reader thinking out loud, and the Professional’s comments represent research on the Reader’s part. The appropriate protocols are centered and bold at various points in the narrative.
这一节中,一个新手读者想要读懂最后几段。下面的“读者”表示我们虚构的那位读 者,“教授”的注解表示与读者的探讨。在叙述中涉及的几点规则居中加粗表 示。 [6]
My Reader may seem to catch on to things relatively quickly. However, be assured that in reality a great deal of time passes between each of my Reader’s comments, and that I have left out many of the Reader’s remarks that explore dead-end ideas. To experience what the Reader experiences requires much more than just reading through his/her lines. Think of his/her part as an outline for your own efforts.
这位读者看起来好像很快找到了事物的相关性。然而,为了确认这一点,要在他 的评注中花去大量的时间,我会略过那些他钻牛角尖的讨论。为了体验到读者的 经验,就要深入到他/她的字里行见。通过你的努力去想像他/她的思路。
Know Thyself 自省
Reader (R): I don’t know anything about probability, can I still make it through?
我一点儿也不了解概率,我还能读懂它吗?
Professional (P): Let’s give it a try. We may have to backtrack a lot at each step.
让我们试试看,我们可能要回溯好多步。
R: What does the phrase “30 random students” mean?
“任意的” 30 个学生是什么意思?
“When I use a word, it means just what I choose it to mean 当我使用一个词,它就是我为它选择的意思”
P: Good question. It doesn’t mean that we have 30 spacy or scatter-brained people. It means we should assume that the birthdays of these 30 people are independent of one another and that every birthday is equally likely for each person. The author writes this more technically a little further on: “Assume that the birthdays of n people are uniformly distributed among 365 days of the year.”
好问题,它不是说我们有 30 个宽广的或者心不在焉的人。它的意思是我们假定 这 30 个人每个人的生日都独立于其他人,选择机会都完全平等。更技术化一点 的说法是:“假设这 n 个人的生日平均分布于一年的 365 天中。”
R: Isn’t that obvious? Why bother saying that?
这不是很明显么?为什么要说的这么罗嗦?
P: Yes, the assumption is kind of obvious. The author is just setting the groundwork. The sentence guarantees that everything is normal and the solution does not involve some imaginative, fanciful science fiction.
是的,某种意义上来讲这个假设很明显。作者只是做一个背景设定。这个声明确 认每件事都是平凡的,这个解答不会引入什么科幻小说式的想像。
R: What do you mean?
你的意思是?
P: For example, the author is not looking for a solution like this: everyone lives in Independence Land and is born on the 4th of July, so the chance of two or more people with the same birthday is 100%. That is not the kind of solution mathematicians enjoy. Incidentally, the assumption also implies that we do not count leap years. In particular, nobody in this problem is born on February 29. Continue reading.
例如,作者没有做这样的解答:每个人都生活在一个独立的大陆,生于七月四 日,所以每两个或更多的人生日相同的机率为 100%。这不是数学家喜欢的答案。 顺便说一下,这个假设还意味着我们不需要计算闰年,特别是这个问题中没有人 生于二月二十九日。下一个。
R: I don’t understand that long formula, what’s n?
我不理解那个很长的公式,什么是 n?
P: The author is solving the problem for any number of people, not just for 30. The author, from now on, is going to call the number of people n.
作者解决了任意多个人的问题,不止是 30,于是作者称人数为 n。
R: I still don’t get it. So what’s the answer?
我不太理解,所以那个答案是?
Don’t Be a Passive Reader - Try Some Examples 不要做一个被动的读者,尝试一下
P: Well, if you want the answer for 30, just set n = 30.
好的,如果你想知道 30 人时的答案,n 就是 30。
R: Ok, but that looks complicated to compute. Where’s my calculator? Let’s see: 365 × 364 × 363 × ... × 336. That’s tedious, and the final exact value won’t even fit on my calculator. It reads:
OK,不过看起来计算过程很复杂。我的计算器哪儿去了?我们看看: 365 × 364 × 363 × ... × 336 。这也太可怕了,我的计算器都显示不全结果了。 它读作:
2.1710301835085570660575334772481e+76
If I can’t even calculate the answer once I know the formula, how can I possibly understand where the formula comes from?
如果我所知的这个公式都不能拿来让我算一次答案,我怎么理解它呢?
P: You are right that this answer is inexact, but if you actually go on and do the division, your answer won’t be too far off.
没错,答案并不准确。不过,如果你用除法去消元,就根本不会有这么复杂。
R: The whole thing makes me uncomfortable. I would prefer to be able to calculate it more exactly. Is there another way to do the calculation?
这事儿让我很感棘手。我想把它算得更精确一点。有没有其它的算法?
P: How many terms in your product? How many terms in the product on the bottom?
你的乘积里有几项?分母的乘积里又有几项?
R: You mean 365 is the first term and 364 is the second? Then there are 30 terms. There are also 30 terms on the bottom, (30 copies of 365).
你的意思是 365 是第一项,364 是第二项?那么有 30 项,下面也有 30 项 ,(30 个 365).
P: Can you calculate the answer now?
现在你能计算答案了吗?
R: Oh, I see. I can pair up each top term with each bottom term, and do 365/365 as the first term, then multiply by 364/365, and so on for 30 terms. This way the product never gets too big for my calculator. (After a few minutes)... Okay, I got 0.29368, rounded to 5 places.
哦,我明白了。我可以将每一个分子因数和每一个分母因数配对,第一项是 365/365,然后乘 364/365,类推 30 项。这样乘积就不会大到计算器放不下。(过几分钟后)……OK,我求得 0.29368,保留 5 位小数。
P: What does this number mean?
那么这个数意味着什么?
Don’t Miss the Big Picture 不要失去大局观
R: I forgot what I was doing. Let’s see. I was calculating the answer for n = 30. The 0.29368 is everything except for subtracting from 1. If I keep going I get 0.70632. Now what does that mean?
我忘了我要干啥了。让我想想,我在计算 n=30 时的答案。0.29368 是除了“用 1 去减”这一步之外的全部结果。继续算下去我得到 0.70632。那么这说明了什么?
P: Knowing more about probability would help, but this simply means that the chance that two or more out of the 30 people have the same birthday is 70,632 out of 100,000 or about 71%.
知道多一点更好理解,不过也可以简单的理解为 100,000 次实验中,有 70,632 次机会,30 个人中有至少两个生日相同,这个机率约为 71%。
R: That’s interesting. I wouldn’t have guessed that. You mean that in my class with 30 students, there’s a pretty good chance that at least two students have the same birthday?
这很有趣,我没有想到。你的意思是我班上有 30 个学生的话,有很大的机会至少有两个学生生日相同?
P: Yes that’s right. You might want to take bets before you ask everyone their birthday. Many people don’t think that a duplicate will occur. That’s why some authors call this the birthday paradox.
没错,就是这意思。你在问过他们生日前可能很想打这个赌。很多人都没有想到 会有重复出现。这就是为什么有些作者称之为生日悖论。
R: So that’s why I should read mathematics, to make a few extra bucks?
所以我应该读些数学,可以弄点儿外快?
P: I see how that might give you some incentive, but I hope the mathematics also inspires you without the monetary prospects.
这事儿看来给了你一些鼓励,不过我希望数学还能激励你多一些非功利的想法。
R: I wonder what the answer is for other values of n. I will try some more calculations.
我对 n 为其它值的答案很有兴趣,我想再多算几个。
P: That’s a good idea. We can even make a picture out of all your calculations. We could plot a graph of the number of people versus the chance that a duplicate birthday occurs, but maybe this can be left for another time.
是个好主意,我们甚至可以把你所有的计算结果画成图。我们可以来个人数与生 日重复事件之意的关系曲线,不过这事儿可以下次再弄。
R: Oh look, the author did some calculations for me. He says that for n = 30 the answer is about 71%; that’s what I calculated too. And, for n = 23 it’s about 50%. Does that make sense? I guess it does. The more people there are, the greater the chance of a common birthday. Hey, I am anticipating the author. Pretty good. Okay, let’s go on.
瞧,作者为我做了一些计算,他说 n=30 时答案约为 71%,我验算过了。并且, n=23 时答案约为 50%。这可信吗?我觉得可以。选更多的人数,生日重复的机 会总是会更大。嘿,我会抢答了。太棒了。OK,我们继续。
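
The Reader's plan to try other values of n can be sketched as follows. This is an illustrative experiment rather than anything from the article: `pMatch` recomputes the formula under the same 365-day assumption, and `smallestClass` is a hypothetical helper that searches for the first class size whose probability reaches a given target.

```haskell
module Main where

-- P(n) for n people, assuming 365 equally likely birthdays.
pMatch :: Int -> Double
pMatch n = 1 - product [fromIntegral (365 - k) / 365 | k <- [0 .. n - 1]]

-- Smallest class size whose probability of a shared birthday reaches the
-- target. Intended for targets in (0, 1]: P(366) is exactly 1, so the
-- search over n = 1, 2, 3, ... stops by then.
smallestClass :: Double -> Int
smallestClass target = head [n | n <- [1 ..], pMatch n >= target]

main :: IO ()
main = do
  -- A small table of n against P(n).
  mapM_ (\n -> putStrLn (show n ++ "\t" ++ show (pMatch n))) [5, 10 .. 40]
  -- The break-even class size for the professor's bet.
  print (smallestClass 0.5)
```

The table shows the probability rising with n, as the Reader anticipates, and the search agrees with the article's remark that 23 people give roughly even odds.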
P: Good, now you’re telling me when to continue.
很好,现在你觉得我们可以继续了。
Don’t Read Too Fast 不要读的太快
R: It seems that we are up to the proof. This must explain why that formula works. What’s this Q(n)? I guess that P stands for probability but what does Q stand for?
看起来我们做出了证明。我们一定要弄清公式的原理。什么是 Q(n) ?我猜 P 代表概率的意思,不过 Q 代表什么?
P: The author is defining something new. He is using Q just because it’s the next letter after P, but Q(n) is also a probability, and closely related to P(n). It’s time to take a minute to think. What is Q(n) and why is it equal to 1 – P(n)?
作者定义了很多东西,这里使用 Q 只是因为它是 P 的下一个字母。Q(n) 也是 个概率,它强相关于 P(n)。我们该花几分钟想想了。什么是 Q(n)?为什么它等 于 1 - P(n)?
R: Q(n) is the probability that no two people have the same birthday. Why does the author care about that? Don’t we want the probability that at least two have the same birthday?
Q(n) 是没有生日相同的概率,为什么作者强调这个?我们不能去考虑至少两个 人在同一个生日的概率?
P: Good point. The author doesn’t tell you this explicitly, but between the lines, you can infer that he has no clue how to calculate P(n) directly. Instead, he introduces Q(n) which supposedly equals 1 – P(n). Presumably, the author will proceed next to tell us how to compute Q(n). By the way, when you finish this article, you may want to deal with the problem of calculating P(n) directly. That’s a perfect follow up to the ideas presented here.
很好。作者没有明确的告诉你,但是在文中你可以看出他没办法直接计算 P(n)。 相反他引入了相当于 1-P(n) 的 Q(n)。大概作者接下来会告诉我们如何计算 Q(n)。顺便,当你读完论文,可能想解决直接计算 P(n) 的方法。这是个很好的 想法。
R: First things first.
那就尽快动手吧。
P: Ok. So once we know Q(n), then what?
OK,我们知道 Q(n) 以后,接下来干什么?
R: Then we can get P(n). Because if Q(n) = 1 – P(n), then P(n) = 1 – Q(n). Fine, but why is Q(n) = 1 – P(n)? Does the author assume this is obvious?
我们可以得到 P(n)。因为如果 Q(n) = 1 - P(n),那么 P(n) = 1 - Q(n)。很 好,但是为什么 Q(n) = 1 - P(n)?作者认为这很明显么?
P: Yes, he does, but what’s worse, he doesn’t even tell us that it is obvious. Here’s a rule of thumb: when an author says clearly this is true or this is obvious, then take 15 minutes to convince yourself it is true. If an author doesn’t even bother to say this, but just implies it, take a little longer.
是啊,他这么想,更糟糕的是他甚至没说这是显然的。有一个基本守则:如果某 作者对你说这是显然的或肯定为真,那就应该能在 15 分钟内向你说明。如果该 作者甚至懒得说出这一点,只是暗示了一下,大概会花更长时间。
R: How will I know when I should stop and think?
我怎么知道我应该在哪里停下来?
P: Just be honest with yourself. When in doubt, stop and think. When too tired, go watch television.
诚实的面对自己就好了。困惑的时候就停下来想一想。太累了就去看看电视。
R: So why is Q(n) = 1 – P(n)?
那么为什么 Q(n) = 1 – P(n) ?
P: Let’s imagine a special case. If the chance of getting two or more of the same birthdays is 1/3, then what’s the chance of not getting two or more?
让我们想像一个特殊的情况。如果有两个或更多的人同一生日的机率是 1/3。那 么没有两个或更多重复的机率是多少?
R: It’s 2/3, because the chance of something not happening is the opposite of the chance of it happening.
是 2/3 。因为没发生某事的机率是发生此事件的机率的互斥数。
Make the Idea Your Own 建立你自己的思想
P: Well, you should be careful when you say things like opposite, but you are right. In fact, you have discovered one of the first rules taught in a course on probability. Namely, that the probability that something will not occur is 1 minus the probability that it will occur. Now go on to the next paragraph.
很好,讨论对互斥性的时候要小心一些,不过你答对了。事实上你已经在概率论 的道路上发现了第一个法则。即某事件发生的概率是 1 减它没有发生的概率。 现在我们讨论下一部分。
R: It seems to be explaining why Q(n) is equal to the long, complex-looking formula shown. I will never understand this.
看来这解释了为什么 Q(n) 等于看起来这么复杂的公式。我可能永远也理解不了。
P: The formula for Q(n) is tough to understand and the author is counting on your diligence, persistence, and/or background here to get you through.
Q(n) 的公式确实很难理解,作者也是像你一样坚持不懈的计算和推导中才做到。
R: He seems to be counting all possibilities of something and dividing by the total possibilities, whatever that means. I have no idea why.
不管怎么说,看起来他计算了某事所有的可能性再除总的可能性。我不太理解。
P: Maybe I can fill you in here on some background before you try to check out any more details. The probability of the occurrence of a particular type of outcome is defined in mathematics to be: the total number of possible ways that type of outcome can occur divided by the total number of possible outcomes. For example, the probability that you throw a four when throwing a die is 1/6. Because there is one possible 4, and there are six possible outcomes. What’s the probability you throw a four or a three?
可能在你发掘出更多细节前,我可以给你一些背景知识。某一特定类型的事件发 生的概率在数学上定义为:此类事件所有可能发生的结果总和除以此类型事件可 能发生的总数。例如,你扔骰子时扔出 4 的概率是 1/6.因为有六种可能,4 是其中一种。你扔出 4 或 3 的概率是多少?
R: Well I guess 2/6 (or 1/3) because the total number of outcomes is still six but I have two possible outcomes that work.
我猜是 2/6 (即 1/3)。因为所有可能结果的总数是6,但是我有两种可能的结 果。
P: Good. Here’s a harder example. What about the chance of throwing a sum of four when you roll two dice? There are three ways to get a four (1-3, 2-2, 3-1) while the total number of possible outcomes is 36. That is 3/36 or 1/12. Look at the following 6 by 6 table and convince yourself.
很好,有个更难一点儿的例子。当你扔两个骰子的时候,扔出 4 的概率是多大 呢?在所有36种可能中,有三种可能会扔出四(1-3,2-2,3-1)。即3/36,也 就是 1/12。你可以根据下面的 6x6 表格验证一下。
1-1, 1-2, 1-3, 1-4, 1-5, 1-6
2-1, 2-2, 2-3, 2-4, 2-5, 2-6
3-1, 3-2, 3-3, 3-4, 3-5, 3-6
4-1, 4-2, 4-3, 4-4, 4-5, 4-6
5-1, 5-2, 5-3, 5-4, 5-5, 5-6
6-1, 6-2, 6-3, 6-4, 6-5, 6-6
What about the probability of throwing a 7?
扔出 7 的机率是多少?
R: Wait. What does 1-1 mean? Doesn’t that equal 0?
等等, 1-1 是什么意思?不应该是0么?
P: Sorry, my bad. I was using the minus sign as a dash, just to mean a pair of numbers, so 1-1 means a roll of one on each die - snake eyes.
对不起,是我的错。我是把减号当作连接号来用的,只表示一对数字,所以 1-1 表示两个骰子都扔出 1,也就是“蛇眼”。
R: Couldn’t you have come up with a better notation?
你不能找个更好的标注法么?
P: Well maybe I could/should have, but commas would look worse, a slash would look like division, and anything else might be just as confusing. We aren’t going to publish this transcript anyway.
说不定有更好的方法,不过逗号看起来好浪费,斜杠看起来像除法,每一种看起 来都有些岐义。不管怎么说,我们又不会拿这个手抄本去出版。
R: That’s a relief. Well, I know what you mean now. To answer your question, I can get a seven in six ways via 1-6, 2-5, 3-4, 4-3, 5-2, or 6-1. The total number of outcomes is still 36, so I get 6/36 or 1/6. That’s weird, why isn’t the chance of rolling a 4 the same as for rolling a 7?
好的,我知道你的意思了。为了回答你的问题,我找到了六种可能的组合 1-6, 2-5, 3-4, 4-3, 5-2, 和 6-1 。结果的总数还是36,所以我求得 6/36,即 1/6。这太奇怪了,为什么 扔出 4 的机率和 扔出 7 的不一样?
P: Because not every sum is equally likely. The situation would be very different if we were simply spinning a wheel with the sums 2 through 12 listed in equally spaced intervals. In that case, each one of the 11 sums would have probability 1/11.
因为并不是每个点数和出现的可能性都相同。如果我们只是转动一个转盘,上面把 2 到 12 这些点数和等间隔地排列,情况就大不一样了。那种情况下,11 个点数和中每一个的概率都是 1/11。
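The 6 by 6 table can also be enumerated by machine, which makes it easy to see that the sums are not equally likely. The sketch below assumes two fair dice; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 ordered rolls of two dice, e.g. (1, 3) is the "1-3" of the table.
rolls = list(product(range(1, 7), repeat=2))

# Count how many rolls produce each sum.
ways = Counter(a + b for a, b in rolls)

# Probability of each sum: ways to get it divided by the 36 possible rolls.
prob = {total: Fraction(count, len(rolls)) for total, count in ways.items()}

print(ways[4], prob[4])  # 3 1/12
print(ways[7], prob[7])  # 6 1/6
```

On the spinning wheel, by contrast, each of the 11 sums would simply get probability 1/11.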
R: Okay, now I am an expert. Is probability just about counting?
OK,现在我也是专家了。概率就是数数嘛。
P: Sometimes, yes. But counting things is not always so easy.
某种意义上是这样,不过有时候数数也很难啊。
R: I see, let’s go on. By the way, did the author really expect me to know all this? My friend took Probability and Statistics and I am not sure he knows all this stuff.
我懂,咱们继续。顺便问一句,作者真的认为我应该对这些都懂吗?我的朋友就 是搞概率和统计的,我觉得他也未必都懂。
P: There’s a lot of information implied in a small bit of mathematics. Yes, the author expected you to know all this, or to discover it yourself just as we have done. If I hadn’t been here, you would have had to ask yourself these questions and answer them by thinking, looking in a reference book, or consulting a friend.
这些东西只是数学知识的沧海一粟。没错,作者希望你都懂,或者就像你刚才那 样去学习它们。如果我不在这儿,你可以向自己提问,然后通过思考,查阅参考 书或请教朋友去弄懂它们。
R: So the chance that there are no two people with the same birthday is the number of possible sets of n birthdays without a duplicate divided by the total number of possible sets of n birthdays.
所以没有两人生日相同的机率,是 n 个生日不重复的组合数目除以 n 个生日所有可能组合的总数。
P: Excellent summary.
完全正确。
R: I don’t like using n, so let me use 30. Perhaps the generalization to n will be easy to see.
我不喜欢用 n,那我先用 30。也许之后推广到 n 就很容易看出来了。
P: Great idea. It is often helpful to look at a special case before understanding the general case.
太对了。通常在理解通用情况时我们会先尝试特定的情景。
R: So how many sets of 30 birthdays are there total? I can’t do it. I guess I need to restrict my view even more. Let’s pretend there are only two people.
那么 30 人生日总共有多少种组合?我算不出来,大概得再加一些限制。让我们 假设只有两个人的情况。
P: Fine. Now you’re thinking like a mathematician. Let’s try n = 2. How many sets of two birthdays are there total?
很好,现在你开始像个数学家一样思考了,让我们考虑一下 n = 2。有多少种生 日组合?
R: I number the birthdays from 1 to 365 and forget about leap years. Then these are all the possibilities:
忽略闰年,记可能的生日为1到365.下面是所有的可能:
1-1, 1-2, 1-3, ... , 1-365,
2-1, 2-2, 2-3, ... , 2-365,
...
365-1, 365-2, 365-3, ... , 365-365.
P: When you write 1-1, do you mean 1-1 = 0, as in subtraction?
这里你写的 1-1 不是 1-1=0?
R: Stop teasing me. You know exactly what I mean.
别想糊弄我,你知道我的意思。
P: Yes I do, and nice choice of notation I might add. Now how many pairs of birthdays are there?
好的,看来我选的这个标记还挺好用。现在有多少对生日组合?
R: There are 365 × 365 total possibilities for two people.
两个人的话共有 365 x 365 种可能。
P: And how many are there when there are no duplicate birthdays?
那么有多少种不重复的组合?
R: I can’t use 1-1, or 2-2, or 3-3 or ... 365-365, so I get
去掉 1-1 或 2-2 或 3-3……那么得到
1-2, 1-3, ... , 1-365, 2-1, 2-3, ... , 2-365, ... 365-1, 365-2, ... , 365-364
The total number here is 365 × 364 since each row now has 364 pairs instead of 365.
总共有 365 x 364 种,因为现在每行有 364 对而不是 365。
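For two people the table is small enough to enumerate outright. This sketch assumes 365 equally likely birthdays and no leap years, as in the dialogue, and counts both the full table and the pairs left after removing 1-1, 2-2, and so on.

```python
from itertools import product

days = range(1, 366)  # birthdays numbered 1 to 365

# Every ordered pair of birthdays for two people.
pairs = list(product(days, repeat=2))

# Drop the diagonal entries where both people share a birthday.
distinct = [(a, b) for a, b in pairs if a != b]

print(len(pairs) == 365 * 365)     # True
print(len(distinct) == 365 * 364)  # True
```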
P: Good. You are going a little quickly here, but you’re 100% right. Can you generalize now to 30? What is the total number of possible sets of 30 birthdays? Take a guess. You’re getting good at this.
很好。这次你做的很快,不过完全弄对了。30 人生日的总的样本数是多少?猜 猜看,你很擅长这个的。
R: Well if I had to guess, (it’s not really a guess, after all, I already know the formula), I would say that for 30 people you get 365 × 365 ×... × 365, 30 times, for the total number of possible sets of birthdays.
OK让我猜一下,(也不算真的猜,毕竟我知道公式),30 人生日的所有可能样 本是 365 x 365 x 365 ... x365,共 30 次。
P: Exactly. Mathematicians write 365^30. And what is the number of possible sets of 30 birthdays without any duplicates?
很好,数学家会将其写做 365^30。那么 30 个不重复的生日组合会是多少?
R: I know the answer should be 365 × 364 × 363 × 362 × ... × 336, (that is, start at 365 and multiply by one less for 30 times), but I am not sure I really see why this is true. Perhaps I should do the case with three people first, and work my way up to 30?
我知道这个答案应该是 365 × 364 × 363 × 362 × ... × 336 ,(即从 365 开始逐项递减相乘,共 30 项),但是我不确信它是对的。或许我应该先算 一下三个人的,然后逐项增加到 30?
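Working up from two people to three to thirty can be tried numerically. The function below is a sketch under the dialogue's assumptions (365 equally likely birthdays, no leap years); its name `no_shared_birthday` is hypothetical and does not come from the original text.

```python
from fractions import Fraction


def no_shared_birthday(n, days=365):
    """Probability that n people all have different birthdays."""
    favorable = 1
    for k in range(n):
        favorable *= days - k  # 365 * 364 * ... with n factors in total
    # Divide by the total number of birthday assignments, days ** n.
    return Fraction(favorable, days ** n)


for n in (2, 3, 30):
    q = no_shared_birthday(n)
    # Second number: chance that at least two people share a birthday.
    print(n, float(q), float(1 - q))
```

For n = 30 the last factor in the product is 365 - 29 = 336, matching the product written out above, and the chance of at least one shared birthday comes out at roughly 0.71.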
P: Splendid idea. Let’s quit for today. The whole picture is there for you. When you are rested and you have more time, you can come back and fill in that last bit of understanding.
这个想法非常好。今天先到这里。你已经有了自己的构思。等你休息一下,有了 时间,可以回来完成最终的学习和理解。
R: Thanks a lot; it’s been an experience. Later.
非常感谢,我学到了很多知识。再见。
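[1] Emblems of Mind, Edward Rothstein, Avon Books, page 15.
[2] ibid, page 16.
[3] ibid, page 38.
[4] ibid, page 16.
[5] "A Song for Simeon" is a poem by the great poet T. S. Eliot; Simeon is a prophet in the Bible story. It is beyond my ability to translate, so I have left it untranslated here. (Translator's note)
[6] Because of the documentation tool I use, what you see may only be bold text. I try to reproduce the original formatting in the HTML output as far as I can, and I will prepare a separate LaTeX version. Writing HTML by hand would give full control over the formatting, but it would slow my translation down too much. (Translator's note)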

Saturday, October 9, 2010

Set the following in a Sphinx project's conf.py, and the .tex file it generates will produce a correct Chinese PDF when compiled with the xelatex command.

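# Raw string: keeps the LaTeX backslashes literal (a plain '\u' is a syntax error in Python 3).
# Note: newer Sphinx releases expect this text under latex_elements['preamble'] instead.
latex_preamble = r'''\usepackage{xunicode}
\usepackage{xltxtra}
\usepackage{verbatim}
\usepackage{fontspec}
\setsansfont{UMingCN}
\setromanfont{UKaiCN}
\XeTeXlinebreaklocale "zh"
\XeTeXlinebreakskip = 0pt plus 1pt
'''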
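Adjust the font settings to suit your own machine and taste. The point is that this is about as small a configuration as possible: it needs neither xeCJK nor CJKutf8, and the same set can be reused in other xelatex setups. With Sphinx and xelatex (TeX Live), this gives a fairly ideal environment for writing documents with both HTML and PDF output.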
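For readers on a more recent Sphinx, the same preamble would probably be written roughly as follows. This is only a sketch and has not been tested against any particular Sphinx version: it assumes the `latex_engine` option and the `'preamble'` key of `latex_elements`, and it copies the packages and font names from the post unchanged, so they may need adjusting or trimming.

```python
# conf.py fragment (sketch): assumes a Sphinx version that supports
# latex_engine and latex_elements; font names are copied from the post.
latex_engine = 'xelatex'

latex_elements = {
    # Raw string so the LaTeX backslashes are passed through unchanged.
    'preamble': r'''
\usepackage{xunicode}
\usepackage{xltxtra}
\usepackage{verbatim}
\usepackage{fontspec}
\setsansfont{UMingCN}
\setromanfont{UKaiCN}
\XeTeXlinebreaklocale "zh"
\XeTeXlinebreakskip = 0pt plus 1pt
''',
}
```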