[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:	Mathematics at DEC

Moderator:	RUSURE::EDP

Created:	Mon Feb 03 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2083
Total number of notes:	14613

1015.0. "Probabilty/Megabucks question" by MSD35::ASFOUR () Thu Jan 19 1989 16:52

    I have a probabilty/megabucks question.
    
    Given the set of integers 1 through 36. If you pick any 6 numbers
    (once you pick a number, it's out of the set, and you can't pick
    it again), then what kind of distribution do you get for the sum
    of these numbers?
    
    A friend of mine claims that the distribution of the sum is normal.
    Can anyone prove or disprove this?
    
    thx.

T.R	Title	User	Personal Name	Date	Lines
1015.1		AITG::DERAMO	Daniel V. {AITG,ZFC}:: D'Eramo	`Thu Jan 19 1989 18:18`	11
	If you select k distinct numbers from 1, ..., n and add them, then the distribution of the sum will approach a normal distribution as k and n increase "to infinity" with k/n decreasing to zero [this last restriction may not be necessary, but some restriction is, as the sum for k=n is always the same value n(n+1)/2]. Your friend would be correct if he said it was "roughly normal." But it isn't exactly a normal distribution. Dan
1015.2	Exact formula for "with replacement" case.	CADSYS::COOPER	Topher Cooper	`Fri Jan 20 1989 13:24`	60
	As it happens, I've been working on a related problem: the sum of of k integers drawn uniformly from the range 0..N-1 with replacement. This has application to a statistical procedure used widely in parapsychology in the last decade or so -- the sum of ranks statistic (I'll be glad to describe the statistic and its application if anyone is interested). Anyway if you let S be a particular sum, the probability of that sum is: k S/N E / k \ / S - NE + k - 1 \ p(S) = (1/N ) SUM (-1) ( ) ( ) E=0 \ E / \ k - 1 / And the cumulative probability is obviously just: k S M/N E / k \ / M - NE + k - 1 \ P(S) = (1/N ) SUM SUM (-1) ( ) ( ) M=0 E=0 \ E / \ k - 1 / If S1 is the sum of k integers drawn uniformly from the range 1..N, then simply replace S by S1-k in the above equations. What is the effect of non-replacement on the above? My intuition is as follows: 1) The extremes of the lower tail will be "pushed up" towards the middle since much of the lower tail is created by repetitions of small values (for example, the smallest value with non-zero probability in the with replacement case is 0, with probability 1/Nk; the smallest value with non-zero probability in the without replacement case is (k2 - k)/2 with probability k!(N-k)!/N! ). 2) The extremes of the upper tail will be "pushed down" towards the middle. The curve will continue to be perfectly symmetric. 3) The middle of the curve will retain roughly the same relative shape, but will become "rougher" in fine texture (the with replacement case is piecewise rather "smooth"). Various combinations are pulled out rather randomly in relationship to position, roughening it all up. I am unsure as to whether the without replacement case is more or less "normal"; I rather suspect that it depends on your measure of similarity. The major interference with normality in the with replacement case for small k is that the extreme tails are too heavy, which would argue for the without replacement case being "closer to normal". But this is only in the extreme tails, which only concerns very rare events, and so is rarely applicable. The roughening in the middle which I speculate on, would tend to make the highly probable center portion less normal. If you measure "normality" in terms of overall shape, therefore, I would guess that the without replacement case would be perhaps a bit more normal. If, on the other hand, you measure "normality" by something like the expected* deviation then the with replacement would probably win. Topher
1015.3	Another great lottery system bites the dust	POOL::HALLYB	The smart money was on Goliath	`Fri Jan 20 1989 17:26`	9
	> A friend of mine claims that the distribution of the sum is normal. Surely your friend isn't then going to proceed to tell you that therefore YOU should pick lottery tickets where the numbers total to 111 (which is all of mean/median/mode of the distribution). It doesn't work that way! John
1015.4	A way to prove it?	MSD35::ASFOUR		`Mon Jan 23 1989 17:15`	33
	Thanks for the reponse. I was trying to derive the probability of the sum at home over the weekend. I looked in my probabilty book (the famous Papoulis!) and was reminded that if you have statistically independant random variables with different distributions, then the distribution of their sum is the convolution of their disributions. The text proves this for the sum of two r.v.'s. But it seems that you can extend it for more (or can you?) Since we don't have replacement, the probability distribution of the first number is uniform with p(first_digit)=1/36.The distribution of the second digit is also uniform with p(s_d)=1/35. But, are these distributions statistically independant? Can we apply the convolution rule, here? Does what I'm saying make sense? (I was never any good at probability!) If the concept is sane, then the p() of the sum of two numbers would be a convolution of two uniform distributions, which would look like a triangle. \| \| \| /\ \| \| \| / \ \|----+ \|----+ \| / \ \| \| * \| \| = \|/ \ +----+-------- +----+--------- +------------ For the sum of three, it's quadratic, etc... which might lead to something close to normal as you sum up more and more numbers...I haven't crunched it out, yet. Any takers?
1015.5	... you should see the total!	POOL::HALLYB	The smart money was on Goliath	`Tue Jan 24 1989 13:05`	14
	> was reminded that if you have statistically independant random There is no "a" in "independent". Anyway, no, the numbers are not independent. Clearly, if the first number is 15 then the second number can't be 15. To my mind the theory part of this is hard (spelling is easier :-). Maybe you can just write a program to brute-force through all the possible combinations and count 'em up yourself. The exact distribution would be of interest to everybody. I think all you have to do is look elsewhere in this file (??) for a permutation generator and add up the first 6 numbers from each permutation. The average will be 111...
1015.6	The brute force solution	MSD35::ASFOUR		`Wed Jan 25 1989 17:36`	314
	I brute forced the solution as .5 suggested. I picked every possible sum (1402410240 of them!) and figured out the frequency of each. Now, since this is a distribution, the area under the curve should=1. So, I normalized the frequencies by dividing each frequency by 1402410240 to make their sum =1. (note: although there are only 1947792 different combinations, ) ( they can be made in 1402410240 ways...I think! please correct ) (me if I'm wrong. ) Now, if this where a normal distribution, then its mean would be 111, and the peak at the mean is peak= 1/(sqrt(2pi)sigma) but we know from the histogram that the peak frequency (which occurs at sum=111), is 23136480. From this information we can conclude that if the distribution where normal, it would be N(111,24.2) So I plotted both the normalized histogram, and the Guassian with mean 111, and sigma=24.2. Guess what? they look identical except at the extremes.It is close enough to call it Gaussian, at least for my purposes! I've enclosed the histogram of the sums, and the program to plot the curves. The program also creates a norm_hist.out which contains each sum, it's frequency, the normalized histogram, and the corresponding Guassian. Make sure when you extract the histogram to call it hist.out...the program takes this data and does the rest. Enjoy. yousif. p.s. the program is in Fortran...not commented, and could be better. But, it works. --------------------histogram data------------------------------------------ sum total 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 15 0 16 0 17 0 18 0 19 0 20 0 21 720 22 720 23 1440 24 2160 25 3600 26 5040 27 7920 28 10080 29 14400 30 18720 31 25200 32 31680 33 41760 34 51120 35 64800 36 79200 37 97920 38 117360 39 143280 40 169200 41 203040 42 238320 43 281520 44 326880 45 383040 46 440640 47 510480 48 583920 49 670320 50 761040 51 868320 52 978480 53 1107360 54 1242000 55 1395360 56 1555200 57 1737360 58 1924560 59 2136240 60 2355120 61 2598480 62 2849040 63 3127680 64 3411360 65 3724560 66 4044960 67 4393440 68 4748400 69 5135040 70 5524560 71 5946480 72 6372000 73 6827760 74 7285680 75 7776000 76 8263440 77 8782560 78 9298800 79 9843120 80 10380960 81 10948320 82 11502000 83 12083040 84 12649680 85 13237920 86 13807440 87 14398560 88 14963040 89 15546240 90 16101360 91 16668720 92 17202960 93 17749440 94 18254880 95 18768960 96 19241280 97 19715040 98 20142720 99 20572560 100 20948400 101 21323520 102 21644640 103 21958560 104 22215600 105 22466880 106 22654800 107 22834800 108 22953600 109 23059440 110 23102640 111 23136480 112 23102640 113 23059440 114 22953600 115 22834800 116 22654800 117 22466880 118 22215600 119 21958560 120 21644640 121 21323520 122 20948400 123 20572560 124 20142720 125 19715040 126 19241280 127 18768960 128 18254880 129 17749440 130 17202960 131 16668720 132 16101360 133 15546240 134 14963040 135 14398560 136 13807440 137 13237920 138 12649680 139 12083040 140 11502000 141 10948320 142 10380960 143 9843120 144 9298800 145 8782560 146 8263440 147 7776000 148 7285680 149 6827760 150 6372000 151 5946480 152 5524560 153 5135040 154 4748400 155 4393440 156 4044960 157 3724560 158 3411360 159 3127680 160 2849040 161 2598480 162 2355120 163 2136240 164 1924560 165 1737360 166 1555200 167 1395360 168 1242000 169 1107360 170 978480 171 868320 172 761040 173 670320 174 583920 175 510480 176 440640 177 383040 178 326880 179 281520 180 238320 181 203040 182 169200 183 143280 184 117360 185 97920 186 79200 187 64800 188 51120 189 41760 190 31680 191 25200 192 18720 193 14400 194 10080 195 7920 196 5040 197 3600 198 2160 199 1440 200 720 201 720 202 0 203 0 204 0 205 0 206 0 207 0 208 0 209 0 210 0 ----------------------program to do the rest--------------------------------- include 'sys$library:uisentry' include 'sys$library:uisusrdef' integer ihist(210),sum real nhist(210),guass(210) character40 ha sigma=24.18175709 !calculated as described in note sum=0 open(unit=1,file='hist.out',status='old') open(unit=2,file='norm_hist.out',status='new') read (1,2)ha 2 format(a40) write(2,4) 4 format(' sum histogram normalized hist guassian ') do 10 i=1,210 10 read(1,1)k,ihist(i) 1 format(' ',i3,i10) do 20 i=1,210 nhist(i)=real(ihist(i))/1402410240.0 guass(i)=1/(2.506628274sigma)* * exp(-((111.0-real(i))/(sigma))*2/2.0) 20 write(2,3)i,ihist(i),nhist(i),guass(i) 3 format(' ',i3,3x,i10,3x,f12.10,3x,f12.10) close (2) close(1) ivd=uis$create_display(0.0,0.0,210.0,0.02,30.0,30.0) iwd=uis$create_window(ivd,'sys$workstation') call uis$plot(ivd,0,0.0,0.0) do 30 i=1,210 call uis$plot(ivd,0,real(i-1),nhist(i-1), real(i),nhist(i)) 30 continue call uis$plot(ivd,0,0.0,0.0) do 40 i=1,210 call uis$plot(ivd,0,real(i-1),guass(i-1), * real(i),guass(i)) 40 continue pause end
1015.7		AITG::DERAMO	Daniel V. {AITG,ZFC}:: D'Eramo	`Wed Jan 25 1989 19:24`	19
	re .6 There are 1402410240 permutations of 6 distinct numbers selected out of 36, and 1947792 combinations. Each combination of 6 different numbers has 6! = 720 possible orders that the six could have been selected in. As a check, you should find that every count in your histogram is divisible by 720, and that 720 * 1947792 is 1402410240. So you could have had your program check the 1947792 combinations instead of the 1402410240 permutations. Question: How would this change have affected your computation of sigma and the resulting true Gaussian curve? Dan
1015.8		MSD35::ASFOUR		`Thu Jan 26 1989 16:30`	21
	re .7 If I were to count the combinations and not the permutations, that would mean that each frequency would really be divided by 6! =720 (since it would occur 720 times less) But, the total number of combinations would be 1947792. So to normalize the new frequencies, I would divide by 1947792... So, in effect I've divided the old frequeny by 720*1947792=1402410240 and should get the same normalized histogram, with the same mean at 111 and the same peak of 0.01649... Using the equation in .5, yields the same sigma, too. We're back to the N(111,24.2) So, nothing really changes, except the effort in obtaining the original histogram (which, I admit, is a significant effort). yousif.
1015.9		AITG::DERAMO	Daniel V. {AITG,ZFC}:: D'Eramo	`Thu Jan 26 1989 19:29`	8
	re .7, .8 Okay. I asked because I wasn't sure if there was a factor of sqrt(720) thrown in there somewhere in the computation of that "24.2" in N(111, 24.2) that would change the relative shape of the Gaussian you compared the actual histogram to. Dan
1015.10		ANT::JANZEN	Mr. MSI ECL Test	`Thu Feb 02 1989 17:03`	2
	Note 609 is about the lottery, too. Tom