| Title: | Mathematics at DEC | 
| Moderator: | RUSURE::EDP | 
| Created: | Mon Feb 03 1986 | 
| Last Modified: | Fri Jun 06 1997 | 
| Last Successful Update: | Fri Jun 06 1997 | 
| Number of topics: | 2083 | 
| Total number of notes: | 14613 | 
I liked this problem from USENET, and thought others might, too.
					- Gilbert
 
Here's a probability question whose answer should be known, but I don't
know it.
 
Suppose that A and B are two random binary strings of length n.
What is the expected size of the longest common substring, where the
subtring doesn't have to appear contiguously in A and B?
 
For example, if  A = 001011 and B = 101101, an example of a longest common
substring is 0101, which appears as 0-101-- in A and -01-01 in B.
 
A good asymptotic estimate would do if an exact answer isn't available.
Experiment suggests a value close to  0.82*n  for large n.
 
Please e-mail any suggestions (even if you post them, as news often gets
lost on the way here).   Thanks.
 
Brendan McKay
 
Paper:     Computer Science Department, Australian National University,
           GPO Box 4, Canberra, ACT 2601, Australia.
Telephone: + 61 62 49 3845                Telex: AA 62760 NATUNI
ACSnet:    [email protected]                  ARPA:  bdm%[email protected]
CSNET:  [email protected]@csnet-relay.csnet   JANET: anucsd.oz!bdm@ukc
UUCP: {uunet,ubc-vision,ukc,mcvax,prlb2,hplabs,enea,mulga}!munnari!anucsd!bdm
BITNET:  [email protected] (or similar)
| T.R | Title | User | Personal Name | Date | Lines | 
|---|---|---|---|---|---|
| 803.1 | what if they must be contiguous ? | VIDEO::OSMAN | type video::user$7:[osman]eric.vt240 | Mon Dec 14 1987 15:45 | 13 | 
|     What is the largest expected common substring when comparing two
    length-n strings if the substring must be contiguous ?
    
    For instance, consider two random strings of length 8, which I
    make up as we speak, by letting my fingers randomly hit the 0 and
    1 keys:
    
    	01100110
    	10101010
    
    What do we have here ?  Largest is 10, length 2.
    
    /Eric
 | |||||
| 803.2 | CLT::GILBERT | Builder | Wed Dec 16 1987 19:07 | 41 | |
|     Let E(a,b) be the expectation of the length of the longest 
    discontiguous match between random binary strings of length
    a and b, respectively.
    Then E(a,b) = E(b,a), E(a,0) = 0, and
	E(1,1) = 1/2
	E(1,2) = 3/4
	E(1,3) = 7/8
	E(1,b) = ( 2^b - 1 )/( 2^b )
	E(2,2) = 9/8
	E(2,3) = 23/16
	E(2,b) = ( 4*2^b - 2*b - 3 )/( 2*2^b )
	E(3,3) = 58/32
	E(3,4) = 137/64
	E(3,5) = 308/128
	E(3,6) = 667/256
	E(3,7) = 1406/512
	E(3,8) = 2909/1024
	E(4,4) = 323/128
	E(4,5) = 732/256
	E(4,6) = 1612/512
	E(4,7) = 3467/1024
	E(4,8) = 7313/2048
	E(5,5) = 1662/512
	E(5,6) = 3677/1024
	E(5,7) = 7975/2048
	E(5,8) = 17019/4096
	E(6,6) = 8151/2048
	E(6,7) = 17739/4096
	E(6,8) = 38049/8192
	E(7,7) = 38678/8192
	E(7,8) = 83186/16384
	E(8,8) = 178212/32768
 | |||||
| 803.3 | SSDEVO::LARY | Wed Dec 16 1987 21:44 | 7 | ||
| E(3,b) = ( 12*2^b - 2*b^2 - 3*b - 11 )/( 4*2^b ) E(4,b) = ( 32*2^b - 4*(b^3)/3 - (b^2)/2 - 103*b/6 - 27 ) / ( 8*2^b ) Note: 4*(b^3)/3 + (b^2)/2 + 103*b/6 is always an integer, as it equals b^3 + 17*b + b*(b+1)*(2*b+1)/6 and the rightmost term is always an integer. | |||||
| 803.4 | SSDEVO::LARY | Wed Dec 16 1987 22:04 | 22 | ||
| Another way of looking at E(a,b) is E(a,b) = a - Pa(b)/(2^b), where: P1(b) = 1 3 P2(b) = b + - 2 1 2 3 11 P3(b) = - b + - b + -- 2 4 4 1 3 1 2 103 27 P4(b) = - b + -- b + --- b + -- 6 16 48 8 Do these polynomials look familiar?? (I'd feel a lot better if P3's 11/4 were 9/4.....) | |||||
| 803.5 | Another look at it | SQM::HALLYB | Khan or bust! | Thu Dec 17 1987 10:54 | 19 | 
| N	E(N,N) from Peter		equals:
1    	E(1,1) = 1/2	 		     2 / 4^1
2	E(2,2) = 9/8			    18 / 4^2
3	E(3,3) = 58/32	 		   116 / 4^3
4	E(4,4) = 323/128		   646 / 4^4
5	E(5,5) = 1662/512		  3324 / 4^5
6	E(6,6) = 8151/2048		 16302 / 4^6
7	E(7,7) = 38678/8192		 77356 / 4^7
8	E(8,8) = 178212/32768		356424 / 4^8
    
    Anybody recognize the sequence above ^^^ ?
    Factoring the elements seems to lead nowhere, and after 5 they
    no longer grow with N-factorial.  Maybe some clever reader will
    recognize some direct or recursive relation.  Maybe Peter will
    furnish us with higher order E(N,N).
      John
 | |||||