The Language of the Web
To use the WWW, you need to know something about the language
used to communicate on the Web. This language has three main
components:
- Uniform Resource Locators (URLs)
URLs provide the hypertext links between one document and another.
These links can use a variety of protocols (e.g., ftp, gopher, or
http) to reach documents on other machines (or on your own machine).
- Hypertext Markup Language (HTML)
WWW documents contain a mixture of directives (markup) and text or
graphics. The markup directives do such things as make a word appear
in bold type. This is similar to the way UNIX users write nroff or
troff documents, and MPE users write with Galley, TDP, or Prose. For
PC users, this is completely different from WYSIWYG editing. However,
a number of tools are now available on the market that hide the actual
HTML.
- Common Gateway Interface (CGI)
Servers use CGI to execute local programs. CGI programs provide
a gateway between the HTTP server software and the host machine.
Uniform Resource Locators (URLs) specify the access-method (how),
the server name (where), and the location (what) needed for a WWW
client to find and access a WWW object. The general form of a URL is
access-method://server-name[:port]/location
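As a rough illustration of the general form above, the following Python sketch splits a URL into its access method, server name, port, and location using the standard library's urllib.parse. The function name describe_url and the FTP host ftp.example.com are made up for this example; only www.robelle.com comes from this document.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the parts named in the general form."""
    parts = urlsplit(url)
    return {
        "access_method": parts.scheme,  # how
        "server_name": parts.hostname,  # where
        "port": parts.port,             # None when the URL omits it
        "location": parts.path,         # what
    }


if __name__ == "__main__":
    for example in (
        "http://www.robelle.com/",
        "ftp://ftp.example.com:2121/pub/readme.txt",  # hypothetical host
    ):
        print(describe_url(example))
```

Note that the port comes back as None when the URL does not name one; the client is then expected to fall back to the default for the access method.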
Access Methods
The three most popular access methods are
- http:
This is the Hypertext Transfer Protocol, the method provided by WWW
servers. It includes hypertext linking, the hypertext markup language,
and server scripts.
- gopher:
Gopher was developed at the University of Minnesota as a distributed
campus information service. There are gopher servers everywhere, and
many of them provide campus-wide information systems. Gopher
information is organized into menus. Because hypertext provides the
same services as gopher and more, many sites are moving from
gopher-supplied information to WWW-supplied information.
- ftp:
The File Transfer Protocol is one of the oldest and most popular of
all Internet services. You can access millions of files, including
documentation, source code, and other useful objects, on anonymous FTP
archives. You can use a WWW browser to view and retrieve information
from FTP archives.
Server Name
The server name is an IP host name or an IP address. WWW server names
often start with "www", as in www.robelle.com or www.mayfield.hp.com.
The port number is usually not needed. If there are many servers on
one machine (e.g., two different WWW servers on the same host), you
would use a port number to select one of them. By default, WWW
servers are on port 80. Other protocols have different ports (e.g.,
the default for FTP is 21). Most users never need to know about port
numbers.
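The way a client picks a port can be sketched in a few lines of Python. This is only an illustration: the function name effective_port and the example hosts are invented, and the table holds the defaults mentioned above (80 for WWW, 21 for FTP) plus 70, the standard gopher port, which this document does not state.

```python
from urllib.parse import urlsplit

# Defaults for the three access methods discussed here. 80 and 21 are
# given in the text; 70 for gopher is added for this sketch.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}


def effective_port(url):
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # an explicit port always wins
    return DEFAULT_PORTS.get(parts.scheme)  # None for unknown methods


if __name__ == "__main__":
    print(effective_port("http://www.robelle.com/"))
    print(effective_port("http://www.example.com:8080/"))  # hypothetical
```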
Welcome Page
Most WWW servers provide a welcome or home page. This is the document
that you see if you specify a machine name, but not a document name
(see all the examples above under "Server Name"). Good WWW welcome
pages provide a short description of the information the WWW server
provides, as well as links to all the other information available on
the server. The welcome page must be explicitly configured for each
WWW server. If you access a WWW server without giving a document
name, and receive the error message "no document found", you should
try one of the following common document names: welcome.html,
index.html, or default.html.
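If you wanted to automate that guess, a small Python sketch could build the candidate URLs for you. The helper name welcome_candidates is hypothetical; the three document names are the ones listed above.

```python
from urllib.parse import urljoin

# The common welcome-page names suggested in the text.
COMMON_WELCOME_NAMES = ("welcome.html", "index.html", "default.html")


def welcome_candidates(server_url):
    """List URLs to try when a server reports no welcome page."""
    # Make sure the base ends with "/" so names are appended, not swapped in.
    base = server_url if server_url.endswith("/") else server_url + "/"
    return [urljoin(base, name) for name in COMMON_WELCOME_NAMES]


if __name__ == "__main__":
    for candidate in welcome_candidates("http://www.robelle.com"):
        print(candidate)
```

The sketch only builds the URLs; whether any of them exists depends on how the server in question was configured.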
Location
The location can be a filename, a directory, a directory and filename,
a server-script name, or something specific to the access-method.
Filenames and directory structure often change, so don't be surprised
if a URL that worked a few months ago no longer works now.
When you write documents for WWW, you use the
Hypertext Markup Language (HTML). In a markup language, you mix
your text with the marks that indicate how formatting is to take
place. Most WWW browsers have an option to "View Source" that will
show you the HTML for the current document that you are viewing.
Each WWW browser renders HTML in its own way. Character-mode browsers
use terminal highlights (e.g., inverse video, dim, or underline) to
show links, bold, italics, and so on. Graphical browsers use
different typefaces, colors, and bold and italic formats to display
different HTML marks. Writers have to remember that each browser in
effect has its own HTML style sheet. For example, Lynx and Mosaic do
not insert a blank line before unnumbered lists, but Netscape
does.
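To make the idea of a per-browser style sheet concrete, here is a toy Python sketch that renders the same HTML fragment under two invented "style sheets". It is not how any real browser works: the class name, the two style tables, and the sample text are all assumptions made for illustration, and only the bold, italic, and link tags are handled.

```python
from html.parser import HTMLParser


class ToyRenderer(HTMLParser):
    """Render a few tags as plain-text markers chosen by a style table."""

    def __init__(self, style):
        super().__init__()
        self.style = style  # maps tag name -> (opening, closing) marker
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.style.get(tag, ("", ""))[0])

    def handle_endtag(self, tag):
        self.out.append(self.style.get(tag, ("", ""))[1])

    def handle_data(self, data):
        self.out.append(data)


def render(html, style):
    renderer = ToyRenderer(style)
    renderer.feed(html)
    renderer.close()  # flush any text still buffered by the parser
    return "".join(renderer.out)


# Two hypothetical browsers, each with its own idea of how marks look.
STYLE_A = {"b": ("*", "*"), "i": ("_", "_"), "a": ("[", "]")}
STYLE_B = {"b": ("<<", ">>"), "i": ("/", "/"), "a": ("{", "}")}

SAMPLE = 'Read the <b>welcome</b> page <a href="index.html">here</a>.'

if __name__ == "__main__":
    print(render(SAMPLE, STYLE_A))
    print(render(SAMPLE, STYLE_B))
```

The markup is identical in both calls; only the style table changes, which is the situation an HTML writer faces with different browsers.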
If you want to see how your browser handles standard and non-standard
HTML, try the WWW Test Pattern. The test pattern will show differences
between your browser, standard HTML, and other browsers.
Creating HTML
Creating HTML is awkward, but not that difficult. The most common
method of creating HTML is to write the raw markup language using a
standard text editor. If you are creating HTML yourself, we have
found the chapter "Authoring for the Web" in the O'Reilly book
"Managing Internet Information Services" to be an excellent resource.
You might also find the HTML Quick Reference useful.
Bob Green, founder of Robelle, finds HTML Writer to be useful for
learning HTML. Instead of hiding the HTML tags, HTML Writer provides
menus with all of the HTML elements and inserts these into a text
window. To see how your documents look, you must use a separate Web
browser.
If you don't want to deal directly with HTML, you can get a WYSIWYG
HTML editor. On the PC, we have tried HoTMetaL and the Microsoft Word
Internet add-on. HoTMetaL is produced by SoftQuad. There is a free
version, which we found somewhat unreliable, and a professional
version. HoTMetaL probably works best if you are writing HTML
documents from scratch (we tried to edit existing documents, some of
which may have had invalid HTML).
Microsoft has produced a new add-on to Microsoft Word that produces
HTML. The Internet Assistant is available from Microsoft at no charge.
You will need to know the basic concepts of Microsoft Word to take
advantage of the Internet Assistant. Since we are not experienced
Microsoft Word users, we found that the Internet Assistant didn't help
us much.
The HTML area of WWW is changing quickly. Users do not want to go
back to ASCII text editing after they've used WYSIWYG editors for the
last several years. The Web itself carries a list of WYSIWYG HTML
editors for a variety of operating systems.
The Common Gateway Interface (CGI) provides a method for WWW servers
to invoke other programs. You can write these programs with any tool
or language. They usually return HTML as their output. The Robelle WWW
server statistics are provided by a CGI script that runs the getstats
program.
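A minimal CGI program, sketched here in Python, might look like the following. It is not the Robelle statistics script: the function name build_response and the page text are invented. It relies only on the usual CGI convention that the server passes request details in environment variables (SERVER_NAME and REQUEST_METHOD are used here) and reads the program's standard output, which starts with a Content-Type header and a blank line.

```python
import os
from html import escape


def build_response(environ):
    """Return a complete CGI reply: header, blank line, then HTML."""
    # Escape the values so they cannot inject markup into the page.
    server = escape(environ.get("SERVER_NAME", "unknown"))
    method = escape(environ.get("REQUEST_METHOD", "GET"))
    body = (
        "<html><head><title>CGI sketch</title></head>\n"
        f"<body><p>Served by {server} via {method}.</p></body></html>\n"
    )
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The WWW server runs this program and sends its output to the client.
    print(build_response(os.environ), end="")
```

Keeping the reply in a function that takes the environment as an argument makes the script easy to try outside a server, for example by passing a plain dictionary.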
Forms
The WWW supports simple forms with text boxes, radio buttons, and
pull-down lists. Forms are processed by CGI scripts.
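As a sketch of what such a script does with a submitted form, the Python below decodes URL-encoded form data and echoes it back as HTML. The field names (name for a text box, colour for a radio button) and the function name handle_form are hypothetical, and the sketch assumes the form data arrives as a query string, as it would for a form submitted with the GET method.

```python
from html import escape
from urllib.parse import parse_qs


def handle_form(query_string):
    """Decode submitted form fields and return an HTML reply."""
    fields = parse_qs(query_string)  # each value is a list of strings
    # Hypothetical fields: "name" (text box), "colour" (radio button).
    name = fields.get("name", ["(no name)"])[0]
    colour = fields.get("colour", ["(none chosen)"])[0]
    return (
        "<html><body>\n"
        f"<p>Name: {escape(name)}</p>\n"
        f"<p>Colour: {escape(colour)}</p>\n"
        "</body></html>\n"
    )


if __name__ == "__main__":
    # In a real CGI script this string would come from the server,
    # typically via the QUERY_STRING environment variable.
    print(handle_form("name=Bob+Green&colour=blue"), end="")
```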