| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 4510.1 |  | HERON::KAISER |  | Thu Mar 28 1996 02:49 | 16 | 
|  | > Does anyone have the ability to extract off one of the Internet Server
> ALL the Internal URLs?
As soon as Alta Vista is cloned for internal use.
> We have the Easynotes_Conference to list all the available
> Notesfiles/Conferences.
EASYNOTES_CONFERENCE doesn't list all available conferences, only the ones
that people have thought to announce there.  There are many conferences not
mentioned there.  But a good web spider will find all interlinked web
pages.  It'll still be possible to set up an isolated island of web pages,
but as soon as someone outside the island links to the island ... wham!
they're indexable.
___Pete
 | 
| 4510.2 |  | VANGA::KERRELL | salva res est | Thu Mar 28 1996 04:01 | 6 | 
|  | re.1:
There's an internal search engine off Digital's internal home page - does this
not use a web spider to build the index?
Dave.
 | 
| 4510.3 |  | CIM::LOREN | Loren Konkus | Thu Mar 28 1996 05:56 | 4 | 
|  |     I find that the AIT Announcement Server is pretty useful for finding
    internal stuff. See:
    
    	http://www-ad.mso.dec.com/announce/pa-toc.html
 | 
| 4510.4 | Digital's Internal World-Wide Web index (DWI) | LGP30::FLEISCHER | without vision the people perish (DTN 227-3978, TAY1) | Thu Mar 28 1996 06:30 | 20 | 
|  | re Note 4510.2 by VANGA::KERRELL:
> There's an internal search engine off Digital's internal home page - does this
> not use a web spider to build the index?
  
        I think you're thinking of the Digital Web Indexer,
        http://src-www.pa.dec.com/cgi-bin/dwi, which uses some of the
        same technology as Alta Vista.  
        However, the Digital Web Indexer does not use a spider but
        relies on distributed gatherer programs to send information
        to the index.  (This is similar to the Harvest architecture,
        and similar to the enterprise catalog server recently
        announced by Netscape.)
        Since it depends upon gatherer programs outside of the direct
        control of the maintainers of the index, coverage is
        inconsistent.
        Bob
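
The push model described in .4 can be illustrated with a minimal sketch. This is not the Digital Web Indexer's or Harvest's actual code; `CentralIndex`, `run_gatherer`, and the `.example` hostnames are hypothetical names made up for illustration. It only shows why coverage depends on which sites run a gatherer: the index never fetches anything itself.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical index that only knows what gatherers send it."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of URLs

    def submit(self, url, text):
        """Called by a gatherer to push one page's text to the index."""
        for word in text.lower().split():
            self.postings[word].add(url)

    def search(self, word):
        """Return the URLs whose submitted text contained the word."""
        return sorted(self.postings.get(word.lower(), set()))


def run_gatherer(index, local_pages):
    """A gatherer walks its own site's pages and pushes each one."""
    for url, text in local_pages.items():
        index.submit(url, text)


# Site A runs a gatherer; site B has pages but never runs one.
site_a = {"http://site-a.example/notes.html": "notes conference list"}
site_b = {"http://site-b.example/notes.html": "more notes here"}

index = CentralIndex()
run_gatherer(index, site_a)
print(index.search("notes"))  # site B's page is absent from the result
```

A spider-based index would instead reach site B as soon as any indexed page linked to it, which is the difference .1 points out.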
 | 
| 4510.5 | personal spider / indexer ??? | SAYER::ELMORE | through the looking glass | Thu Mar 28 1996 17:14 | 18 | 
|  |     I'd like a slight "spider" variation.  I would like to find a program
    that starts at a given WEB page, or, optionally, takes your own
    hotlinks/bookmarks, and traverses, then indexes every linked page from
    there.
    Ideally you could specify "how many levels deep" to go.
    I've seen [somewhere] some software that wakes up to look at
    hotlinks/bookmarks/history URLs to see if pages have been recently
    updated.  That's close, but I'm looking for an indexer too.  My
    bookmarks are already basically what I need, but I can never remember
    what bookmark contains what piece of information...therefore my
    [personal] need for the [personal] index.
    
    I'm sure I could write a spider script of sorts that follows URLs
    around, but not the indexer.
    
    --Steve
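
The personal spider plus indexer asked for in .5 can be sketched in a few dozen lines. This is a rough illustration under stated assumptions, not an existing tool: `PageParser` and `crawl_and_index` are hypothetical names, the caller supplies a `fetch(url)` function that returns HTML text (or None on failure), and the "index" is just a word-to-URLs dictionary. `max_depth` is the "how many levels deep" setting.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class PageParser(HTMLParser):
    """Collects href targets and text content from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def crawl_and_index(start_urls, fetch, max_depth):
    """Breadth-first crawl from start_urls, at most max_depth links deep.

    Returns a dict mapping each lowercase word to the set of URLs
    whose text contains it.
    """
    index = defaultdict(set)
    seen = set()
    queue = deque()
    for url in start_urls:  # e.g. the URLs from a bookmark file
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        for word in re.findall(r"[a-z0-9]+", " ".join(parser.text).lower()):
            index[word].add(url)
        if depth < max_depth:
            for href in parser.links:
                # Resolve relative links and drop any #fragment.
                target = urldefrag(urljoin(url, href)).url
                if target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
    return index
```

Passing the bookmark URLs as `start_urls` with `max_depth=0` indexes only the bookmarked pages themselves; larger values follow links outward. A real `fetch` would need timeouts and some politeness toward the servers it visits.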
 | 
| 4510.6 | Intranet Alta Vista Trial Offer! | LJSRV2::POWELL |  | Tue Apr 02 1996 10:33 | 8 | 
|  |     You may have noticed that AltaVista is now under test internally.
    
    Try URL:   altavista.pa.dec.com/ and see what happens!
    I just noticed this entry this week, but don't know how long the test
    will run.  Looks like we're really going to make Alta Vista a product.
    
    Good luck!
    
 | 
| 4510.7 | yellow pages idea great | SALES::ICS::DIRICO |  | Fri May 10 1996 13:56 | 13 | 
|  |     All of the inconsistent search stuff aside, since this web stuff took
    off quickly and now is quite large to pull in and
    control/maintain/organize...
    
    I love the idea of a yellow pages of intranet URLs.  I think as the web
    becomes a more vital way to communicate within the company as
    notesfiles/public directories/email decrease...the yellow pages is a
    key first step to build from.
    
    My first thought is that someone from Corporate Communications should
    publish this, but then again, maybe not.  Any other thoughts?  
    
    Mary Beth
 | 
| 4510.8 |  | QUARK::LIONEL | Free advice is worth every cent | Fri May 10 1996 14:03 | 5 | 
|  | Re: .7
See .3
		Steve
 | 
| 4510.9 |  | TENNIS::KAM | Kam WWSE 714/261.4133 DTN/535.4133 IVO | Fri May 10 1996 14:06 | 5 | 
|  |     I'd like to see a yellow pages cuz I can't do a search if I don't know
    what phrase to supply the search engine.  I saw some URLs posted in a
    Notesfile.  I went to Altavista.pa.dec.com and searched for the
    information and it didn't find it.  Therefore, I'm missing some
    valuable information.
 | 
| 4510.10 | not quite mission-critical yet | LGP30::FLEISCHER | without vision the people perish (DTN 227-3978, TAY1) | Fri May 10 1996 15:10 | 10 | 
|  | re Note 4510.9 by TENNIS::KAM:
>     I went to Altavista.pa.dec.com and searched for the
>     information and it didn't find it.  
        I don't believe that this is maintained as a production
        system, and thus may not always be available, may not be
        updated very often (or at all), etc.
        Bob
 | 
| 4510.11 |  | QUARK::LIONEL | Free advice is worth every cent | Fri May 10 1996 16:04 | 3 | 
|  | Kam, have you TRIED the AIT Announcement Server?
			Steve
 | 
| 4510.12 |  | TENNIS::KAM | Kam WWSE 714/261.4133 DTN/535.4133 IVO | Fri May 10 1996 16:46 | 6 | 
|  |     I'm looking for a Digital ONLY Yellow Pages.  This Company has so much
    information that I don't want it cluttered with information outside
    this company.
    
    	Regards,
    
 | 
| 4510.13 |  | plugh.ibg.ljo.dec.com::needle | Money talks. Mine says "Good-Bye!" | Fri May 10 1996 17:20 | 6 | 
|  | The information at altavista.pa.dec.com is in beta test.  It's not maintained
and is not a public service yet.  When it does become public, it would be
reasonable to expect a service of the quality of altavista.digital.com for
the intranet.
j.
 | 
| 4510.14 | exactly what would you like to see? | LGP30::FLEISCHER | without vision the people perish (DTN 227-3978, TAY1) | Fri May 10 1996 18:55 | 68 | 
|  | re Note 4510.0 by tennis.ivo.dec.com::KAM:
> We have the Easynotes_Conference to list all the available 
> Notesfiles/Conferences.  Can we create some mechanism to keep track of
> all the internal URLs?
        What that mechanism might be depends upon what you mean by
        "all the internal URLs".
        Note that the Easynotes_Conference conference does not list
        all of the internal topics and replies, it just lists the
        conferences.
        We do have separate services that actually search the content
        of most of the conferences (e.g., Comet at
        http://encke.alf.dec.com/cgi/v4.2 ).
        You use the former (Easynotes_Conference) when looking for
        an appropriate conference.  It identifies conferences by
        overall topic.
        You use the latter (Comet) when looking for specific notes on
        a very specific subject, regardless of the conference
        containing them.
        I suspect that with the Web we need both kinds of service. 
        The nature of the Web makes the analogue of the former, an
        index of topical or thematic collections of pages, a little
        harder to define than does DEC Notes.  However, it probably
        should be an index of home pages (or what we called "front
        pages", as in the first page of a magazine or book) with a
        little description of the overall topic or theme of the
        service to which that page represents the entry.  This is
        what the Announcement Directory set out to be.
        The latter is simply AltaVista -- an index of all web pages
        (not password or otherwise protected) (note that the Comet
        URL listed above also provides an index of most Digital web
        pages).
> If what I am looking for is not available, I would like to create 
> something that will list all the available internally URLs, whether 
> their personnel, private, or public URLs.  
  
        So the question remains:  do you want to index every page as
        an entry in this list, or do you want to list every
        *collection* of related pages (recognizing that some
        significant "collections" may only be one page)?
        The former is a bit easier to do -- it can be done
        automatically, which is what AltaVista does.
        The latter is harder because it requires, for now, human
        intelligence to select the things to be registered, either in
        the form of a central staff, or through conventions followed
        by all who publish on the internal network (e.g., registering
        your own collections).
        If you'd like to do the latter, and implement a more robust
        version of the Announcement Directory, I'd be glad to see you
        do it and I'd offer any help I can.  There's a product in
        there, I'm sure (a number of similar products have been
        announced).  But hurry -- we in the group of which I am a
        part are likely to get our notices this coming week.
        Bob
 | 
| 4510.15 |  | QUARK::LIONEL | Free advice is worth every cent | Fri May 10 1996 21:10 | 5 | 
|  |     Re: .12
    
    Ok, so now I KNOW you haven't looked at it.
    
    			Steve
 | 
| 4510.16 | http://www-ad.mso.dec.com/announce/pa-toc.html | LGP30::FLEISCHER | without vision the people perish (DTN 227-3978, TAY1) | Sat May 11 1996 07:45 | 28 | 
|  | follow-on to Note 4510.14:
        One obvious comparison to the Announcement Directory is the
        Yahoo service.  I hesitate to make this comparison because
        Yahoo has full-time people, essentially librarians working
        in cyberspace, who carefully construct and maintain a rich
        classification hierarchy.
        The Announcement Directory has no staff to do this, so the
        only classifications provided are those that can be
        automatically determined, e.g., whether a URL is owned by
        Digital or not, and whether it is external to Digital or on
        Digital's Intranet.  We can also do obvious sorts, such as by
        date and title.
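
As a rough illustration of the kind of automatic classification and sorting meant here (not the Announcement Directory's actual code), the sketch below classifies a URL purely by hostname suffix. The suffixes are assumptions drawn from hostnames seen in this topic (`.dec.com` for the intranet, `.digital.com` for Digital's public hosts), and `classify`, `sort_entries`, and the entry fields are hypothetical names.

```python
from datetime import date
from urllib.parse import urlsplit

# Assumed domain conventions, not an authoritative list.
INTRANET_SUFFIXES = (".dec.com",)
DIGITAL_PUBLIC_SUFFIXES = (".digital.com",)


def classify(url):
    """Classify a URL by its hostname suffix alone."""
    host = (urlsplit(url).hostname or "").lower()
    if host.endswith(INTRANET_SUFFIXES):
        return "digital-intranet"
    if host.endswith(DIGITAL_PUBLIC_SUFFIXES):
        return "digital-external"
    return "non-digital"


def sort_entries(entries):
    """Sort directory entries by date, then by title (case-insensitive)."""
    return sorted(entries, key=lambda e: (e["date"], e["title"].lower()))


entries = [
    {"title": "Web Indexer", "date": date(1996, 3, 28),
     "url": "http://src-www.pa.dec.com/cgi-bin/dwi"},
    {"title": "Announcements", "date": date(1996, 3, 28),
     "url": "http://www-ad.mso.dec.com/announce/pa-toc.html"},
]
for entry in sort_entries(entries):
    print(entry["title"], classify(entry["url"]))
```

Anything beyond this, such as grouping entries by subject, is where the human effort (or the natural language processing) comes in.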
        (There are opportunities for the application of advanced
        natural language processing techniques here.)
        It was hoped that the Digital community would provide
        informal maintenance of the entries (anybody can add, and
        actually anybody can delete and replace an entry).  To some
        extent this happens, but it is far from being as
        well-maintained as Yahoo.  (Nobody has it in their job
        description to maintain it.)  On the other hand, its content
        is probably as well-maintained as Easynotes_Conference.
        Bob
 | 
| 4510.17 | some references | LGP30::FLEISCHER | without vision the people perish (DTN 227-3978, TAY1) | Tue May 14 1996 11:49 | 16 | 
|  |         More follow-on:
        Two articles have recently appeared on the Web that address
        aspects of this subject.  One is:
            http://www.cio.com/WebMaster/0596_field.html
        	-- "Finding the Way", by former DECie Tim Horgan
        Another is:
            http://gnn.com/wr/96/05/10/webarch/index.html
        	-- "Revenge of the Librarians"
        Bob
 |