Network Working Group                                          R. Moats
Request for Comments: 2517                                     R. Huber
Category: Informational                                            AT&T
                                                          February 1999
More recently, there has been discussion of directories that map from a company name to a domain name or web site. Many people are using DNS as a directory today to find this type of information about a given company. Typically when DNS is used, users guess the domain name of the company they are looking for and then prepend "www.". This makes it highly desirable for a company to have an easily guessable name.
There are two major problems here. As the number of assigned names increases, it becomes more difficult to get an easily guessable name. Also, the TLD must be guessed as well as the name. While many users just guess ".COM" as the "default" TLD today, there are many two-letter country code top-level domains in current use as well as other gTLDs (.NET, .ORG, and possibly .EDU) with the prospect of additional gTLDs in the future. As the number of TLDs in general use increases, guessing gets more difficult.
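The guessing procedure described above can be sketched in a few lines. This is only an illustration of why the search space grows with the number of TLDs: the function name candidate_hosts and the default TLD list are assumptions made for this example, not part of any system described in this document.

```python
import re

# Hypothetical default list for illustration; the text above names
# .COM, .NET, .ORG and (possibly) .EDU as commonly guessed gTLDs.
DEFAULT_TLDS = ("com", "net", "org", "edu")


def candidate_hosts(company, tlds=DEFAULT_TLDS):
    """Return guessed web host names for a company name, one per TLD."""
    # Collapse the name to a single label: lowercase, no spaces, and only
    # the letters, digits and hyphens permitted in a DNS label.
    label = re.sub(r"[^a-z0-9-]", "", company.lower().replace(" ", ""))
    label = label.strip("-")
    if not label:
        return []
    # Prepend "www." and try each TLD in turn.
    return ["www.%s.%s" % (label, tld) for tld in tlds]
```

Each additional TLD adds one more candidate per company name, so the number of guesses a user (or a tool) must try grows linearly with the number of TLDs in general use.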
Between July 1996 and our shutdown in March 1998, the InterNIC Directory
and Database Services project maintained the Netfind search engine [1]
and the associated database that maps organization information to domain
names. This database thus acted as the type of Internet directory that
associates company names with domain names. We also built WWWSeeker, a
system that used the Netfind database to find web sites associated with
a given organization. The experience gained from maintaining and
growing this database provides valuable insight into the issues of providing
a directory service. We present it here to allow future implementors
to avoid some of the blind alleys that we have already explored.
When maintaining the database, existing domains may be verified via
direct DNS lookups rather than a "tree walk." "Tree walks" should therefore
be the choice of last resort for directory population, and bulk retrieval
should be used whenever possible.
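As a minimal sketch of verifying existing entries by direct lookup, the following checks each known domain with a single resolver call instead of walking the tree. The function names and the injectable resolver parameter are assumptions for illustration. Note that an address lookup failing does not prove a domain has been deleted (a domain may have no address record at its apex), so failures are best treated as candidates for a closer check.

```python
import socket


def domain_resolves(domain, resolver=socket.getaddrinfo):
    """Return True if a direct address lookup of `domain` succeeds."""
    try:
        resolver(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: the name did not resolve; UnicodeError: malformed label.
        return False


def verify_domains(domains, resolver=socket.getaddrinfo):
    """Split known domains into (still_resolving, failed) lists."""
    alive, failed = [], []
    for domain in domains:
        if domain_resolves(domain, resolver):
            alive.append(domain)
        else:
            failed.append(domain)
    return alive, failed
```

Passing the resolver as a parameter keeps the check testable without network access and leaves room to substitute a different lookup method.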
In our experience this information contains many factual and typographical
errors and requires further examination and processing to improve its quality.
Further, TLD registrars that support WHOIS typically only support WHOIS
information for second level domains (e.g., ne.us) as opposed to lower level
domains (e.g., windrose.omaha.ne.us). Also, there are TLDs without
registrars, TLDs without WHOIS support, and still other TLDs that use other
methods (HTTP, FTP, gopher) for providing organizational information.
Based on our experience, an implementor of an Internet directory needs
to support multiple protocols for directory population. An automated
WHOIS search tool is necessary, but isn't enough.
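One way to structure multi-protocol population is a per-TLD table of handlers, with WHOIS as just one entry. The sketch below is an assumption-laden illustration, not the tool we used: the names whois_query and populate, the handler table, and the injectable connect parameter are all hypothetical. The WHOIS exchange itself follows the usual pattern of sending the query terminated by CRLF to TCP port 43 and reading until the server closes the connection.

```python
import socket


def whois_query(name, server, port=43, timeout=10.0,
                connect=socket.create_connection):
    """Send one WHOIS query to `server` and return the raw text reply."""
    with connect((server, port), timeout) as conn:
        conn.sendall(name.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    # WHOIS replies carry no declared charset; latin-1 never fails to decode.
    return b"".join(chunks).decode("latin-1")


def populate(domain, handlers):
    """Look up `domain` with the handler registered for its TLD, if any."""
    tld = domain.rsplit(".", 1)[-1].lower()
    handler = handlers.get(tld)
    if handler is None:
        return None  # no known automated source for this TLD
    return handler(domain)
```

Handlers for TLDs that publish organizational information over HTTP, FTP, or gopher would be registered in the same table, and a None result marks the TLDs that still need a manual or custom approach.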
Whether this is feasible depends on the frequency of database updates
provided. Because of the rate of growth of allocated domain names
(150K-200K new allocated domains per month in early 1998), we provided
monthly updates of the database. To rebuild the database each month (based
on the above time estimate) would require between 3 and 5 machines to be
dedicated full time (independent of machine architecture). Instead,
we checkpointed the allocated domain list and rebuilt on an incremental
basis during one weekend of the month. This allowed us to complete the
update on between 1 and 4 machines (3 Sun SPARCstation 20s and a dual-processor
SPARCserver 690) without full dedication over a couple of days. Further,
by coupling incremental updates with periodic refresh of existing data
(which can be done during another part of the month and doesn't require
full dedication of machine hardware), older records would be periodically
updated when the underlying information changes. The tradeoff is
timeliness and accuracy of data (some data in the database may be old)
against hardware and processing costs.
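The checkpoint-and-increment approach can be sketched as a comparison of two domain lists. This is a minimal illustration under the assumption that both lists fit in memory as sets; the function name and the keys of the returned plan are hypothetical and do not describe our actual scripts.

```python
def plan_incremental_update(checkpoint, current):
    """Compare the checkpointed domain list with the current one.

    Returns a dict of sorted lists:
      "add"     - newly allocated domains that must be gathered now
      "drop"    - domains no longer allocated (candidates for removal)
      "refresh" - domains in both lists, to be refreshed at another time
    """
    old, new = set(checkpoint), set(current)
    return {
        "add": sorted(new - old),
        "drop": sorted(old - new),
        "refresh": sorted(old & new),
    }
```

Only the "add" list needs to be processed during the monthly update window; the "refresh" list corresponds to the periodic refresh of existing data, which can be spread over the rest of the month.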
Increasing the database size only requires more disk space to hold the
database and inverted tree. Of course, using database technology would
probably improve performance and scalability, but we had not reached the
point where this technology was required.
The protocols discussed here do not have built-in security features. If one source machine is spoofed while the directory data is being gathered, substantial amounts of incorrect and misleading data could be pulled into the directory and spread to a wider audience.
In general, building a directory from registry data will not open any
new security holes, since the data is already available to the public. However,
existing security and accuracy problems with the data sources are likely to be
amplified.
[1] Schwartz, M. F. and C. Pu, "Applying an Information Gathering
    Architecture to Netfind: A White Pages Tool for a Changing and
    Growing Internet", University of Colorado Technical Report
    CU-CS-656-93, December 1993, revised July 1994.
    URL: ftp://ftp.cs.colorado.edu/pub/cs/techreports/schwartz/Netfind

[2] Sollins, K., "Plan for Internet Directory Services", RFC 1107,
    July 1989.

[3] Hardcastle-Kille, S., Huizer, E., Cerf, V., Hobby, R. and S. Kent,
    "A Strategic Plan for Deploying an Internet X.500 Directory
    Service", RFC 1430, February 1993.

[4] Postel, J. and C. Anderson, "White Pages Meeting Report", RFC 1588,
    February 1994.

[5] Lottor, M., "Network Wizards Internet Domain Survey", available
    from http://www.nw.com/zone/WWW/top.html
EMail: jayhawk@att.com
Rick Huber
AT&T
Room C3-3B30, 200 Laurel Ave. South
Middletown, NJ 07748
USA
EMail: rvh@att.com
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind, provided
that the above copyright notice and this paragraph are included on all
such copies and derivative works. However, this document itself may
not be modified in any way, such as by removing the copyright notice or
references to the Internet Society or other Internet organizations, except
as needed for the purpose of developing Internet standards in which case
the procedures for copyrights defined in the Internet Standards process
must be followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK
FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED
TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR
A PARTICULAR PURPOSE.