| Title: | Mathematics at DEC |
| Moderator: | RUSURE::EDP |
| Created: | Mon Feb 03 1986 |
| Last Modified: | Fri Jun 06 1997 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 2083 |
| Total number of notes: | 14613 |
I liked this problem from USENET, and thought others might, too.
- Gilbert
Here's a probability question whose answer should be known, but I don't
know it.
Suppose that A and B are two random binary strings of length n.
What is the expected size of the longest common substring, where the
subtring doesn't have to appear contiguously in A and B?
For example, if A = 001011 and B = 101101, an example of a longest common
substring is 0101, which appears as 0-101-- in A and -01-01 in B.
A good asymptotic estimate would do if an exact answer isn't available.
Experiment suggests a value close to 0.82*n for large n.
Please e-mail any suggestions (even if you post them, as news often gets
lost on the way here). Thanks.
Brendan McKay
Paper: Computer Science Department, Australian National University,
GPO Box 4, Canberra, ACT 2601, Australia.
Telephone: + 61 62 49 3845 Telex: AA 62760 NATUNI
ACSnet: [email protected] ARPA: bdm%[email protected]
CSNET: [email protected]@csnet-relay.csnet JANET: anucsd.oz!bdm@ukc
UUCP: {uunet,ubc-vision,ukc,mcvax,prlb2,hplabs,enea,mulga}!munnari!anucsd!bdm
BITNET: [email protected] (or similar)
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 803.1 | what if they must be contiguous ? | VIDEO::OSMAN | type video::user$7:[osman]eric.vt240 | Mon Dec 14 1987 15:45 | 13 |
What is the largest expected common substring when comparing two
length-n strings if the substring must be contiguous ?
For instance, consider two random strings of length 8, which I
make up as we speak, by letting my fingers randomly hit the 0 and
1 keys:
01100110
10101010
What do we have here ? Largest is 10, length 2.
/Eric
| |||||
| 803.2 | CLT::GILBERT | Builder | Wed Dec 16 1987 19:07 | 41 | |
Let E(a,b) be the expectation of the length of the longest
discontiguous match between random binary strings of length
a and b, respectively.
Then E(a,b) = E(b,a), E(a,0) = 0, and
E(1,1) = 1/2
E(1,2) = 3/4
E(1,3) = 7/8
E(1,b) = ( 2^b - 1 )/( 2^b )
E(2,2) = 9/8
E(2,3) = 23/16
E(2,b) = ( 4*2^b - 2*b - 3 )/( 2*2^b )
E(3,3) = 58/32
E(3,4) = 137/64
E(3,5) = 308/128
E(3,6) = 667/256
E(3,7) = 1406/512
E(3,8) = 2909/1024
E(4,4) = 323/128
E(4,5) = 732/256
E(4,6) = 1612/512
E(4,7) = 3467/1024
E(4,8) = 7313/2048
E(5,5) = 1662/512
E(5,6) = 3677/1024
E(5,7) = 7975/2048
E(5,8) = 17019/4096
E(6,6) = 8151/2048
E(6,7) = 17739/4096
E(6,8) = 38049/8192
E(7,7) = 38678/8192
E(7,8) = 83186/16384
E(8,8) = 178212/32768
| |||||
| 803.3 | SSDEVO::LARY | Wed Dec 16 1987 21:44 | 7 | ||
E(3,b) = ( 12*2^b - 2*b^2 - 3*b - 11 )/( 4*2^b ) E(4,b) = ( 32*2^b - 4*(b^3)/3 - (b^2)/2 - 103*b/6 - 27 ) / ( 8*2^b ) Note: 4*(b^3)/3 + (b^2)/2 + 103*b/6 is always an integer, as it equals b^3 + 17*b + b*(b+1)*(2*b+1)/6 and the rightmost term is always an integer. | |||||
| 803.4 | SSDEVO::LARY | Wed Dec 16 1987 22:04 | 22 | ||
Another way of looking at E(a,b) is E(a,b) = a - Pa(b)/(2^b), where: P1(b) = 1 3 P2(b) = b + - 2 1 2 3 11 P3(b) = - b + - b + -- 2 4 4 1 3 1 2 103 27 P4(b) = - b + -- b + --- b + -- 6 16 48 8 Do these polynomials look familiar?? (I'd feel a lot better if P3's 11/4 were 9/4.....) | |||||
| 803.5 | Another look at it | SQM::HALLYB | Khan or bust! | Thu Dec 17 1987 10:54 | 19 |
N E(N,N) from Peter equals:
1 E(1,1) = 1/2 2 / 4^1
2 E(2,2) = 9/8 18 / 4^2
3 E(3,3) = 58/32 116 / 4^3
4 E(4,4) = 323/128 646 / 4^4
5 E(5,5) = 1662/512 3324 / 4^5
6 E(6,6) = 8151/2048 16302 / 4^6
7 E(7,7) = 38678/8192 77356 / 4^7
8 E(8,8) = 178212/32768 356424 / 4^8
Anybody recognize the sequence above ^^^ ?
Factoring the elements seems to lead nowhere, and after 5 they
no longer grow with N-factorial. Maybe some clever reader will
recognize some direct or recursive relation. Maybe Peter will
furnish us with higher order E(N,N).
John
| |||||