CSE2325/3325 lect3

FIT3084: eXtensible HyperText Markup Language (XHTML)

In the previous lecture:

The WWW is a document network linked by hyperlinks.
The WWW and Web browser mask the complexities of accessing computers and files on the Internet to simplify the retrieval of information from remote computers.

In this lecture:

What is a URL and how is it used to retrieve data from the WWW?
What is HTML and how is it used to construct a web page?
Suggested reading: Sebesta, chapters 2 and 8.1-8.5.

Identifying files on the Internet.

The Internet ("The Net") is a global network (of networks) of computers.

Every computer on the Net has a unique numerical address (an IP address) and a people-friendly equivalent.

130.194.64.81 ...is the numerical address for... molly.cs.monash.edu.au

The Net is divided into domains, and subdomains.

molly	is the machine name.
cs	is the Computer Science subdomain.
monash	is the Monash University domain.
edu	indicates the address is educational. What other extensions are there for different types of institutions?
au	indicates the address is Australian. What other extensions are there for different countries?
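As a rough illustration of the breakdown above, the Python sketch below splits the example name into its dot-separated parts. The function name `split_hostname` is made up for this example, and real host names do not always have exactly five parts.

```python
def split_hostname(hostname):
    """Split a dotted host name into its labels, most specific first."""
    return hostname.lower().split(".")


labels = split_hostname("molly.cs.monash.edu.au")

# Unpacking works here only because this particular name has five labels.
machine, subdomain, domain, kind, country = labels
print(machine, subdomain, domain, kind, country)
```

Reading the labels from right to left moves from the most general domain (the country) to the most specific (the machine).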

Every file on a computer has a filename unique for that machine. Combined with the address of its host computer, that filename gives every file on the Internet a unique name.

Steps for Retrieving Documents from the Web.

Computers on the Internet called name servers keep lists of numerical IP addresses & people-friendly names and translate between them.

1) A web browser (client) sends a request using HyperText Transfer Protocol (HTTP) for a document, specified by its unique name, to a remote (server) machine.

The unique file name is specified within a Uniform Resource Locator (URL)...

Protocol://server_domain_name/file_path

The protocol may be omitted within some web browsers in which case HTTP is assumed.
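The three parts of a URL can be picked apart with Python's standard `urllib.parse` module. This is only a sketch to make the structure concrete; the URL is the example address used elsewhere in these notes.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.cs.monash.edu.au/~aland/index.html")

print(parts.scheme)    # the protocol
print(parts.hostname)  # the server domain name
print(parts.path)      # the file path on that server
```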

Absolute URLs

http://www.cs.monash.edu.au/~aland/index.html

ftp://ftp.cs.monash.edu.au/pub/

are absolute because they include a domain name and a path.

Relative URLs

index.html

../index.html

are relative because they specify a path and domain name by reference to (usually) the URL of the file currently open in the browser (often referred to as the base).
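A browser turns a relative URL into an absolute one by combining it with the base. The sketch below does the same with `urljoin` from Python's standard library, assuming the base is the example page used earlier in these notes.

```python
from urllib.parse import urljoin

# Assumed base: the URL of the document currently open in the browser.
base = "http://www.cs.monash.edu.au/~aland/index.html"

same_dir = urljoin(base, "index.html")       # a file in the same directory
parent_dir = urljoin(base, "../index.html")  # a file one directory up

print(same_dir)
print(parent_dir)
```

Note that `..` steps up out of the `~aland` directory, so the second result points at the server's top-level `index.html`.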

Locations within documents

http://www.cs.monash.edu.au/~aland/index.html#chapter

index.html#fred

The text after the # symbol indicates a location within the document specified by the URL.

These locations are named whilst the document is being created. The #location is an optional part of a URL. When would it be useful to specify a location within a document in a URL?
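To see that the #location really is a separate, optional part, the sketch below uses `urldefrag` from Python's standard library to split it from the rest of the URL. The variable names are illustrative only.

```python
from urllib.parse import urldefrag

document_url, location = urldefrag(
    "http://www.cs.monash.edu.au/~aland/index.html#chapter"
)
print(document_url)  # the part that identifies the document
print(location)      # the named location within it

# The same split works on a relative URL.
relative_doc, relative_location = urldefrag("index.html#fred")
```

The browser requests only the document part from the server; it uses the location itself, after the document arrives, to decide where to scroll.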

2) A web server program on a remote machine always 'listens' on a 'well-known' port for incoming requests. (Port 80 for HTTP)

3) The web server checks the client's access privileges. If all is well, it sends the requested document.

4) The browser displays the document retrieved from the server on the client machine in human-readable form.

A web document is anything accessed with a single request from a client to a server.

Try this in your own time*

Commands to type.	Explanation.
telnet www.csse.monash.edu.au 80	Telnet to the school's WWW server (on port 80)
GET /index.html HTTP/1.0	Access the web page "index.html" using the GET command which the browser would normally do for you. Follow your command with two carriage returns.
>> The server should send you the HTML of file "index.html"	See? The protocol isn't magic, you can participate in it manually.

* A little exercise taken from Lloyd Allison's old notes
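The same manual exchange can be sketched in Python with a plain TCP socket. The function names `build_get_request` and `fetch` are made up for this example, and the `Host` line is an addition that the telnet exercise does not send; many present-day servers expect it.

```python
import socket


def build_get_request(path, host):
    """Return the bytes a client would send for a simple HTTP/1.0 GET."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",  # assumption: not in the telnet exercise above
        "",               # the empty line that ends the request...
        "",               # ...i.e. the second "carriage return"
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path="/index.html", port=80):
    """Send the request over TCP and return the server's raw reply."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_get_request(path, host))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("www.csse.monash.edu.au")` would attempt the same request as the telnet session, provided that server is still reachable. The reply begins with a status line and headers, followed by the HTML itself.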

Hyper Text Markup Language (HTML)

HTML is a document-layout and hyperlink specification language that was derived from Standard Generalized Markup Language (SGML).

HTML tags specify:

the structure of text and embedded elements (images, sounds, tables etc.) of a document;
hyper-links to other web pages.

Several versions of HTML were approved by the WWW Consortium (W3C) (see http://www.w3.org/ ). The last of these versions was HTML 4.01 approved in 1999.

How was HTML supposed to be used?

HTML was intended for specification of document structure, not control of document appearance.

I.e. HTML was not originally intended for graphic design & typography.

Originally the browser interpreted and displayed a document's elements as it liked. Hence the final appearance of a document was up to the client browser, not the HTML author.

E.g. HTML allows the specification of a Heading level 2 but the client decides that all Headings level 2 shall be displayed in bold, 12pt Times Roman text (or otherwise).

HTML moved towards specification of document appearance with the addition of Style Sheets and tags that allowed specification of exact fonts and colours, but...

...differences between the way browsers of different authorship displayed and interpreted HTML made things tricky for designers.

Some browsers incorporated proprietary extensions to HTML...

...which did not work on other browsers (e.g. Micro$oft Explorer & Netscape Navigator).

eXtensible HyperText Markup Language (XHTML)

HTML is everywhere, but it is sloppy all around: it is sloppily coded and sloppily interpreted by browsers.
Since the year 2000, XHTML has been approved by W3C.
XHTML has strict syntactic rules based on XML.
XHTML may be checked for correctness automatically using a validation tool.
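A full validation tool checks a document against the XHTML DTD. As a much smaller sketch of the idea, the Python below only tests whether a fragment is well-formed XML, which is a necessary first step but not complete validation. The function name `is_well_formed` is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


strict = "<p>Oh what a <i>lovely</i> day <br /> for a walk!</p>"
sloppy = "<p>Oh what a <i>lovely</i> day <br> for a walk!</p>"

print(is_well_formed(strict))
print(is_well_formed(sloppy))
```

The second fragment is the kind of HTML that browsers tolerate, but the unclosed `<br>` breaks the XML rules, so the parser rejects it.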

Writing Your Own XHTML

For best results use one of the following:

An ordinary text editor (or one extended for XHTML)
A 'what you see is what you get' (WYSIWYG) editor

For often poor results, use:

A word processor with XHTML export

You will also require:

A web browser
(You can use the web browser to view XHTML files from your local drive. Files don't need to be uploaded to the Net until you are ready for everybody to look at them.)
An image editor

XHTML tags

Appear in lowercase characters between < and > symbols in the XHTML code
Assist page layout by specifying the type of content between them
Are not displayed by the browser (they are parsed by it though!)
Always have an open / close pair: <tag> </tag>
If the open / close tag pair does not contain any content, it may be written <tag /> to indicate both the open and close tag. *Note: The space before the forward slash is not required by XML, but it is recommended so that older HTML browsers handle the tag correctly.
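To check that the short form really does stand for an open / close pair, the sketch below parses both spellings with Python's standard XML parser and compares the results. The fragment is adapted from the sample page in these notes.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<p>a walk<br></br></p>")
short_form = ET.fromstring("<p>a walk<br /></p>")

# Both spellings produce the same element tree, so they serialise identically.
print(ET.tostring(long_form, encoding="unicode"))
print(ET.tostring(short_form, encoding="unicode"))
```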

Sample XHTML document code

In the sample code that follows, for interest's sake, XHTML-specific tags are marked in blue. The remaining tags were also present in HTML although it was possible to get away with missing some of them out altogether.

<?xml version = "1.0" encoding = "utf-8"?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns = "http://www.w3.org/1999/xhtml">

<head>

<title> A silly, simple, sample page </title>

</head>

<body>

<!-- Document content goes here -->
<h2>A Grand Day</h2>
<p>
Oh what a <i>lovely</i> day <br /> for a walk!
</p>
<p>
Let's wander over to CEMA's <a href="http://www.csse.monash.edu.au/~cema">home page</a> and take a look!
</p>

</body>

</html>

The page produced by this code is available.

Some special tags

<?xml version = "..." encoding = "..."?>
is an XML declaration stating that what follows is based on XML. It includes the XML version number and the character encoding of the document.
<!DOCTYPE ... >
is a document type declaration that refers to the document type definition (DTD) and states (amongst other things) that the document complies with XHTML 1.1.
<html xmlns = " ... "></html>
tags tell the browser the document is entirely HTML. xmlns refers to an XML namespace that contains the specification of the tags. It is included for the same reason namespaces are provided in programming languages like C++: so that conflicting definitions for names do not cause problems.

An XHTML document has two parts, a head and a body...

<head></head> tags contain information about the document (e.g. A <title> for the page)
<body></body> contains document content
<!-- --> demarcate comments which are ignored by the browser but useful to humans
A hyperlink (usually to another document) is specified with the anchor tag:

<a href="linked_to_doc_URL#anchor_name"> clickable elements go here </a>

An anchor within a document can be made:

<a name="anchor_name"> anchored elements go here </a>
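To show how a program (such as a browser) finds the hyperlinks in a page, the sketch below collects every href from the anchor tags using Python's standard `html.parser` module. The class name `LinkCollector` and the function `extract_links` are made up for this example; the markup is taken from the sample page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Remember the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(markup):
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.links


sample = (
    "<p>Let's wander over to CEMA's "
    '<a href="http://www.csse.monash.edu.au/~cema">home page</a> '
    "and take a look!</p>"
)
print(extract_links(sample))
```

An anchor that only names a location (one with name but no href) is not a link, so it is skipped.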

Inline Images

The basic requirements for an image tag are the source (src) attribute (a URL) of the image and some alternate (alt) text to display if images are turned off. (See these important notes on accessibility).

Additional attributes of the image tag may also be added. For example,

width & height attributes allow page layout to occur whilst/before images are downloaded and allow the image to be scaled by the browser.
border determines pixel width of the image border. This is especially useful if the image is used as a hyperlink...

[Example images: border=0, border=1, border=3]

align can be left, center or right and determines where text will be placed in relation to the image
Images must be in the GIF, JPEG (or PNG) format for inline display.
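Since alt text is a basic requirement, it is easy to check for automatically. The sketch below lists the images in a well-formed fragment that have no alt attribute. The function name `images_missing_alt` and the file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def images_missing_alt(fragment):
    """Return the src of every <img> in the fragment that lacks alt text."""
    root = ET.fromstring(fragment)
    return [img.get("src") for img in root.iter("img") if img.get("alt") is None]


# Hypothetical fragment: the first image has alt text, the second does not.
fragment = '<p><img src="a.gif" alt="A" /><img src="b.png" /></p>'
print(images_missing_alt(fragment))
```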

Image Maps

Images may be over-laid with regions.
Each region may be made a hyperlink activated by a mouse click.
Image maps may be client or server side.
Client-side image maps are more efficient. Server-side maps are hardly used nowadays.
A sample XHTML image map:
<img src="myImage.JPG" usemap="#myImageMap" alt="bugs" />
<map id="myImageMap">
<area href="page1.html" shape="circle" coords="152,113,14" alt="page 1" />
<area href="page2.html" shape="poly" coords="241,64,235,91,332,91,338,67" alt="page 2" />
</map>

Click on the bugs above to see an image map in action!

(See these important notes on accessibility)
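With a client-side map the browser itself works out which region a click falls in. The sketch below imitates that test in Python for the two regions in the sample map, using the same coordinates. The function names are made up, and the ray-casting test for the polygon is one common technique, not necessarily what any particular browser does.

```python
import math


def parse_coords(coords):
    return [int(value) for value in coords.split(",")]


def in_circle(x, y, coords):
    """coords is "centre_x,centre_y,radius"."""
    cx, cy, radius = parse_coords(coords)
    return math.hypot(x - cx, y - cy) <= radius


def in_polygon(x, y, coords):
    """coords is "x1,y1,x2,y2,..."; ray-casting point-in-polygon test."""
    values = parse_coords(coords)
    vertices = list(zip(values[0::2], values[1::2]))
    inside = False
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        if (y1 > y) != (y2 > y):  # this edge crosses the click's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:       # crossing lies to the right of the click
                inside = not inside
    return inside


# (hit test, coords, href) for each area in the sample map above.
AREAS = [
    (in_circle, "152,113,14", "page1.html"),
    (in_polygon, "241,64,235,91,332,91,338,67", "page2.html"),
]


def link_for_click(x, y):
    """Return the href of the first area containing the click, or None."""
    for hit_test, coords, href in AREAS:
        if hit_test(x, y, coords):
            return href
    return None


print(link_for_click(152, 113))
print(link_for_click(280, 80))
```

Because this test needs no round trip to the server, the link can be followed straight away, which is why client-side maps are the more efficient option.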

Ordered and Unordered Lists

<h4>Spot the odd one out</h4>

<ul>
<li>Tomatoes</li>
<li>Potatoes
<ul>
     <li>sweet</li>
     <li>rotten</li>
</ul>
</li>
<li>Elephantoes</li>
</ul>

Spot the odd one out

Tomatoes
Potatoes

sweet
rotten

Elephantoes

<h4>Spot the dog</h4>

 <ol>

<li>Collar</li>
 <li>Cat</li>
 <li>Caterpillar</li>

</ol>

Spot the dog

1. Collar
2. Cat
3. Caterpillar

Additional things to research

Tables - very useful for laying out pages.

this	is
a simple	table

<table>
<tr>
<td>this</td>
<td>is</td>
</tr>
<tr>
<td>a simple</td>
<td>table</td>
</tr>
</table>
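Table markup is repetitive, so it is often generated by a program rather than typed by hand. The sketch below builds the simple table above from a list of rows using Python's standard XML library. The function name `build_table` is made up for this example.

```python
import xml.etree.ElementTree as ET


def build_table(rows):
    """Return XHTML table markup with one <tr> per row and one <td> per cell."""
    table = ET.Element("table")
    for row in rows:
        tr = ET.SubElement(table, "tr")
        for cell in row:
            td = ET.SubElement(tr, "td")
            td.text = cell
    return ET.tostring(table, encoding="unicode")


markup = build_table([["this", "is"], ["a simple", "table"]])
print(markup)
```

Building the markup as a tree guarantees that every open tag gets its matching close tag.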

Forms - useful to obtain data from users

Formatting and other tags

Tags such as <br />, <hr />, <span>, <div> & <meta> are all handy to know about, as are many others... do some reading to find out what tags are available. Some of them will be touched upon in later lectures.

Web References:

http://www.w3.org/MarkUp/ - the official specs for XHTML (and HTML).
http://validator.w3.org/file-upload.html - a checker to be certain your web pages comply with the standard.

This lecture's key point(s):

A Uniform Resource Locator consists of a protocol, domain name and path. Together these uniquely identify a document on the WWW and are used to retrieve that particular document.
XHTML is a simple method of identifying sections of text, images, tables etc. to be displayed within a web browser in formats governed largely by the web browser itself.
