1.2 On the Internet and internets

A word on "the Internet," and on "internets" in general, is in order. In print, the difference between the two seems slight: one is always capitalized, one isn't. The distinction between their meanings, however, is significant. The Internet, with a capital "I," refers to the network that began its life as the ARPAnet and continues today as, roughly, the confederation of all TCP/IP networks directly or indirectly connected to commercial U.S. backbones. Seen close up, it's actually quite a few different networks (commercial TCP/IP backbones, regional TCP/IP networks, corporate and U.S. government TCP/IP networks, and TCP/IP networks in other countries) interconnected by high-speed digital circuits.

A lowercase internet, on the other hand, is simply any network made up of multiple smaller networks using the same internetworking protocols. An internet (little "i") isn't necessarily connected to the Internet (big "I"), nor does it necessarily use TCP/IP as its internetworking protocol. There are isolated corporate internets, and there are Xerox XNS-based internets and DECnet-based internets.

The new term "intranet" is really just a marketing term for a TCP/IP-based "little i" internet, used to emphasize the use of technologies developed and introduced on the Internet within a company's internal corporate network. An "extranet," on the other hand, is an internet that connects partner companies, or a company to its distributors, suppliers, and customers.

1.2.1 The History of the Domain Name System

Through the 1970s, the ARPAnet was a small, friendly community of a few hundred hosts. A single file, HOSTS.TXT, contained all the information you needed to know about those hosts: it held a name-to-address mapping for every host connected to the ARPAnet. The familiar UNIX host table, /etc/hosts, was compiled from HOSTS.TXT (mostly by deleting fields that UNIX didn't use).
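To make the idea of a flat name-to-address table concrete, here is a minimal Python sketch that builds such a mapping from text in the /etc/hosts layout (an address followed by one or more names). The function name `parse_host_table`, the sample host names, and the addresses (taken from the 192.0.2.0/24 documentation range) are assumptions made for illustration; the original HOSTS.TXT file used a different, richer record format that is not reproduced here.

```python
def parse_host_table(text):
    """Build a name-to-address mapping from /etc/hosts-style text."""
    table = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace; skip empty lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Keep the first entry seen for a name (an assumption of this
            # sketch, mirroring a top-to-bottom scan of the file).
            table.setdefault(name.lower(), address)
    return table


# Hypothetical entries, for illustration only.
SAMPLE = """\
# address     canonical name and aliases
192.0.2.1     alpha  mailhub
192.0.2.2     beta
"""

hosts = parse_host_table(SAMPLE)
print(hosts["mailhub"])
```

Every lookup depends on having the complete, current file on hand, which is why distributing one table to every host mattered so much.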

HOSTS.TXT was maintained by SRI's Network Information Center (dubbed "the NIC") and distributed from a single host, SRI-NIC.[1] ARPAnet administrators typically emailed their changes to the NIC, and periodically used FTP to connect to SRI-NIC and retrieve the current HOSTS.TXT. Their changes were compiled into a new HOSTS.TXT once or twice a week. As the ARPAnet grew, however, this scheme became unworkable. The size of HOSTS.TXT grew in proportion to the number of ARPAnet hosts. Moreover, the traffic generated by the update process increased even faster: every additional host meant not only another line in HOSTS.TXT, but potentially another host updating from SRI-NIC.

[1] SRI is the Stanford Research Institute in Menlo Park, California. SRI conducts research into many different areas, including computer networking.

And when the ARPAnet moved to the TCP/IP protocols, the population of the network exploded. Now there was a host of problems with HOSTS.TXT:

Traffic and load: The toll on SRI-NIC, in terms of the network traffic and processor load involved in distributing the file, was becoming unbearable.
Name collisions: No two hosts in HOSTS.TXT could have the same name. However, while the NIC could assign addresses in a way that guaranteed uniqueness, it had no authority over host names. There was nothing to prevent someone from adding a host with a conflicting name and breaking the whole scheme. Someone adding a host with the same name as a major mail hub, for example, could disrupt mail service to much of the ARPAnet.
Consistency: Maintaining consistency of the file across an expanding network became harder and harder. By the time a new HOSTS.TXT could reach the farthest shores of the enlarged ARPAnet, a host across the network had changed addresses, or a new host had sprung up that users wanted to reach.

The essential problem was that the HOSTS.TXT mechanism didn't scale well. Ironically, the success of the ARPAnet as an experiment led to the failure and obsolescence of HOSTS.TXT.

The ARPAnet's governing bodies chartered an investigation into a successor for HOSTS.TXT. Their goal was to create a system that solved the problems inherent in a unified host table system. The new system should allow local administration of data, yet make that data globally available. Decentralizing administration would eliminate the single-host bottleneck and relieve the traffic problem, and local management would make the task of keeping data up to date much easier. The new system should also use a hierarchical name space to name hosts, which would ensure the uniqueness of names.
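A short Python sketch may help show why a hierarchical name space ensures uniqueness while still allowing local administration. Each node only has to keep the labels of its own children distinct, so two sites can each pick the same host label without conflict. The `Zone` class and the labels `site-a`, `site-b`, and `mailhub` are hypothetical names invented for this illustration; this is not how any name server is implemented.

```python
class Zone:
    """A node in a hypothetical hierarchical name space."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = {}

    def full_name(self):
        # Join labels from this node up to (but not including) the root.
        if self.parent is None:
            return self.label
        parent_name = self.parent.full_name()
        return f"{self.label}.{parent_name}" if parent_name else self.label

    def add(self, label):
        # Uniqueness is enforced only among siblings, so each
        # administrator checks nothing but its own part of the tree.
        if label in self.children:
            raise ValueError(f"{label!r} already exists under {self.full_name()!r}")
        child = Zone(label, self)
        self.children[label] = child
        return child


root = Zone("")                   # unnamed root of the tree
site_a = root.add("site-a")       # administered locally by one site
site_b = root.add("site-b")       # administered locally by another
hub_a = site_a.add("mailhub")
hub_b = site_b.add("mailhub")     # same label, different parent: no clash

print(hub_a.full_name())
print(hub_b.full_name())
```

In a flat table, the second `mailhub` would have collided with the first. Here the two full names differ because their parents differ, and a duplicate is rejected only within a single parent.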

Paul Mockapetris, then of USC's Information Sciences Institute, was responsible for designing the architecture of the new system. In November 1983, he released RFCs 882 and 883, which described the Domain Name System. These RFCs were superseded by RFCs 1034 and 1035, the current specifications of the Domain Name System.[2] RFCs 1034 and 1035 have now been augmented by many other RFCs, which describe potential DNS security problems, implementation problems, administrative gotchas, mechanisms for dynamically updating name servers and for securing domain data, and more.

[2] RFCs are Request for Comments documents, part of the relatively informal procedure for introducing new technology on the Internet. RFCs are usually freely distributed and contain fairly technical descriptions of the technology, often intended for implementors.

