The
explosive growth of the World Wide Web since its inception in 1993 has
seen growing expectations from the user base, in terms of the
capabilities of tools and the means of representing information. First
generation web standards, such as early variants of HTML and those
beloved mainstays of web imagery, JPEG and GIF87/89, are less than ideal
for the delivery of the kind of content many users now seek.
Of particular importance are the limitations of existing
standards in controlling the presentation of data and imagery, the
frequently poor density of image data, and the difficulties
experienced in machine driven analysis of web content.
The latter has proven to be a major issue, as it limits what
can be usefully done by automated systems such as search engines. We
have all had the experience of typing into a search engine what would
appear to be unambiguous specifiers, only to be inundated with mostly
irrelevant URLs. What is an annoyance to a human reader, who has the
intelligence and the ability to use heuristics to filter out irrelevant
content, amounts to an extremely difficult if not intractable problem
for automated systems. Without the ability to use heuristics as a human
being does, automated systems sink very quickly indeed.
Other problems with the existing HTML 1-3 based web standards
are the cumbersome handling of mathematical expressions, a major
hindrance to the wider use of hypertext for publishing scientific
papers, and the frequently highly inconsistent presentation of markup
across different browsers. We have all had the experience of trying to
produce a web page layout, only to find that it doesn't quite look
the same on browsers other than the one used while testing the
page.
Which of these is the most pressing concern at this time?
Judging from activity in web standards development, all of these areas
are evidently considered to be serious issues.
In this month's feature we will briefly survey some of the
current draft standards or recommendations being explored by the World
Wide Web Consortium (W3C at http://www.w3.org/TR/), to provide the
reader with some idea of current trends in web publishing standards.
Vector Graphics Standards
One of the most glaring inadequacies of the existing web
standards base is the inability to support vector graphics particularly
well. Vector graphics are images which are represented by mathematical
descriptions of lines, shapes, curves, and areas, rather than fields of
pixels set to individual colours.
Vector graphics have some highly attractive qualities for
presentation:
-
They are very dense, in the sense that often complicated
entities can be described with fairly simple and terse expressions. Even
without lossless compression, an image of considerable complexity can
be presented in a file of modest size. With lossless compression the
size is frequently comparable to bitmap imagery of much lower
resolution.
-
They provide quality which is limited only by the
rendering device, such as a display running the browser or laser
printer. Doubling the rendering resolution (ie zooming) of a bitmap
image produces something of significantly inferior quality, whereas a
vector graphic representation of the same image will usually lose no
quality if we zoom in.
-
Lack of ambiguity in representation. Many bitmap formats
vary in presentation quality with the quality of the algorithm used to
decode and render the image. JPEGs are a typical example. Vector images
do not suffer this problem as frequently.
-
The ability to cleanly translate the image into another vector
representation with no loss of information and thus image quality. If an
alternate vector representation standard can support the same graphics
primitives, an exact mapping can be performed.
There are many vector graphics based file formats in
existence, examples being PostScript/EPS, Computer Graphics Metafile
(CGM), HPGL, Adobe Illustrator and a host of others.
Vector graphics representation is the basis of all engineering
drawing packages, and it is by far the best way of representing charts,
plots, graphs, line drawings and line illustrations.
For a web environment, this technique is close to ideal, since
the quality of the image is limited primarily by the quality of the
browser and the display or printer. The currently widespread use of GIF
and JPEG for this purpose usually does a disservice to most such
imagery.
At this stage two schemes are being explored as the basis of
future web standards.
The first is the Scalable Vector Graphics (SVG) language,
derived from the XML standard. The aim of SVG is to provide a genuine
vector graphics language, capable of representing conventional graphical
shapes, such as lines, areas, curves, text and embedded bitmap images,
which can be grouped, styled, transformed and composited. Facilities
exist for shading and colour transitions. The SVG model goes further
than conventional vector representations, since it embeds features
allowing access to global attributes, as well as providing facilities
for scripting. Animation can be incorporated, extending the established
XML model.
SVG is now at the Candidate Recommendation phase in the W3C
standards scheme, while effort continues on the related SMIL animation
scheme.
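To give a flavour of the language, the following is a hand-written sketch of a small SVG document; the shapes, dimensions and attribute values are invented for illustration, and reflect the draft syntax at the time of writing.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="130">
  <!-- a grouped, styled pair of primitives -->
  <g stroke="black" fill="none">
    <circle cx="60" cy="60" r="40" fill="yellow"/>
    <line x1="110" y1="20" x2="190" y2="100"/>
  </g>
  <text x="60" y="120" text-anchor="middle">a circle</text>
</svg>
```

A file like this describes the image exactly, at any rendering resolution, in a few hundred bytes of text.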
Another proposal proceeding through the same standardisation
system is the WebCGM standard, a web optimised profile of the widely
used ISO Computer Graphics Metafile (ISO/IEC 8632:1992) vector
graphics standard. CGM has been a mainstay of the CAD/CAM industry, and
is also the only industry standard vector format accepted by most
Microsoft tools (... hint for xfig users who need to submit work in MS
Word or PPT).
The attractiveness of WebCGM lies in its huge established
industry base, which makes the development of robust output filters a
very economical proposition, since it amounts to little more than
tweaking existing and often very mature code. The WebCGM proposal is
largely derived from the CGM Open consortium's CGM model (see
http://www.cgmopen.org/) and the ATA (Air Transport
Association) CGM profile, the latter simplified to conform to
existing W3C standardisation requirements.
While SVG is a newer and arguably a more flexible scheme than
WebCGM, the latter is almost off-the-shelf. Incorporation of either or
both into future browsers will provide an unprecedented increase in the
quality of web graphics, and hopefully will also see the eventual
disappearance of those tedious, oversized GIFs and JPEGs so popular with
graphics rich websites.
Bitmap Graphics Standards
Another important graphical standard to recently emerge is the
Portable Network Graphics (PNG) standard, designed to replace the GIF87a
and GIF89a standards, without the legal encumbrance of the compression
standard in GIF. PNG is however a much more sophisticated standard than
GIF, and in some respects is considered a replacement for the Adobe
TIFF format.
PNG retains many of the nice properties of GIF: it supports
progressive display during a download, transparency, lossless
compression and 256 colour indexed images. However, it also
provides new features, such as 48 bit per pixel true colour, 16 bit per
pixel grayscale, a per pixel alpha channel for transparency information,
embedded gamma correction to accommodate arbitrary displays, reliable
error detection using a 32-bit CRC code, and faster initial rendering
than a GIF.
In many respects, PNG fills the gap between the legacy
technology GIF, evolved in the era of 8-bit graphics adaptors, and JPEG,
which as a lossy compression scheme frequently damages fine detail in
bitmap images. Web page designers who are fussy about bitmap image
quality, this writer included, have long suffered the indignity of
putting up with either miserable colour palettes in GIFs or loss of
sharpness in JPEGs. PNG provides a TIFF-like large colour palette, but
does so with lossless compression, so high image quality is always retained.
Markup Languages - HTML 4.01
The mainstay of the current web is the trusty Hyper Text
Markup Language, or HTML. A number of HTML versions remain in use, with
sites being written around versions 1, 2 and 3. The latest incarnation
of basic HTML is HTML 4.01, yet another incrementally improved variant
of the basic product.
Like earlier versions of HTML, HTML 4 is built around the well
loved mechanisms of the HTTP protocol, hypertext, and the Uniform
Resource Identifier (URL for traditionalists). While earlier
standardisation efforts were aimed primarily at forcing
interoperability, HTML 4 adds numerous pieces of extra functionality:
-
style sheets.
-
scripting.
-
frames.
-
embedding objects.
-
text read in directions other than left to right.
-
better table support.
-
better forms support.
-
ISO/IEC 10646 standard support for international character
sets.
The extensions to the basic HTML we love and know so well are
quite extensive in a number of areas.
The changes to the table scheme are built around the IETF RFC
1942 model, and are designed to add column groups and column widths, the
latter to allow display to begin as the table data is received by a browser.
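As an illustration, a hypothetical table using the new column group markup might look like the following; the column widths are invented, and it is these declared widths which allow a browser to begin laying out the table before all of its data has arrived.

```html
<TABLE border="1">
  <COLGROUP>
    <COL width="120">   <!-- fixed width in pixels -->
    <COL width="2*">    <!-- relative width -->
  </COLGROUP>
  <TR><TH>Format</TH><TH>Type</TH></TR>
  <TR><TD>PNG</TD><TD>bitmap</TD></TR>
  <TR><TD>SVG</TD><TD>vector</TD></TR>
</TABLE>
```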
The IMG and APPLET elements in earlier HTML are now replaced
by the generic OBJECT, which is intended for displaying images, video,
sound, mathematical equations, and includes provisions for alternate
renderings where the browser cannot handle the intended rendering.
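A sketch of that fallback behaviour, using an invented file name; the browser renders the image if it can, and falls back to the nested content otherwise.

```html
<OBJECT data="clock.png" type="image/png">
  <!-- rendered only if the browser cannot display the image -->
  An analogue clock showing the current time.
</OBJECT>
```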
A big enhancement in HTML 4 is the adoption of a style sheet
mechanism to control the layout of a document. Style sheets will be
either embedded in the HTML document, or provided via an external style
sheet document, and will cover elements or groups of elements in an HTML
document. With a style sheet mechanism, HTML authors will be able to
locally and globally control attributes such as font information,
alignment and colours in an HTML document.
Scripting support is improved in HTML 4, the intent being to
allow for dynamic web page forms which adaptively change as the reader
types responses into them.
In perspective, the biggest gain to be seen from HTML4 is the
style sheet mechanism, since it provides a means of putting some
consistency into website presentation. This has been, arguably, the
greatest weakness of the HTML markup mechanism to date.
XHTML 1.0 - The Extensible HyperText Markup Language
The next step in the evolution of HTML is XHTML, which in the
simplest of terms is a reformulation of the HTML 4 standard in XML. The
long term aim of the W3C standards community is a transition to XML, and
XHTML is an important transitional phase in this process, since it provides a
bridge between what will be established HTML 4 applications and the
future generation of XML based browsers and production tools.
XML is a more powerful SGML derivative than the relatively
lightweight HTML, itself also derived from SGML. SGML is considered to
be extremely complex and powerful, and its complexity has largely been
the reason why it has not been widely adopted in practical tools.
HTML, as a minimal subset of SGML, could not keep up with the
expectations of web users, and the strategy of migrating to XML is
intended mainly to bypass the plague of proprietary enhancements to
HTML, which has been a feature of the web in recent years. By providing
a standard which is powerful enough to make all proprietary HTML
variants irrelevant, the W3C aims to discourage proprietary players from
contaminating the standard with incompatible modifications.
One of the basic aims of the XML standard is to make it easy
to define new types of markup, and XHTML is intended to allow
straightforward inclusion of XML additions into XHTML documents. Another
important aim is to provide mechanisms which allow web servers to
optimise the presentation of web site content, depending upon the type
of browser and display device being used to access it.
The XHTML proposal describes three categories of compliance
for a document: Strict, where only mandatory features are used, and
Transitional and Frameset, which correspond to their HTML 4
equivalents. The XML namespace mechanism will be supported.
The XHTML proposal also details important differences from
HTML4:
-
Documents must be well formed, i.e. closing tags must be
used, elements must be properly nested, and it is not permitted to
overlap elements in the manner tolerated by many browsers (i.e. sloppy
HTML syntax).
-
Element and attribute names, i.e. tags, must be in lower
case, since XML is case sensitive. In XML, <li> and <LI>
amount to different things.
-
End tags are mandatory, and the common HTML practice of
implied end tags is not permitted in XHTML. The classical
<p>blah<p>blah<p>blah construct is illegal and must
be presented as
<p>blah</p><p>blah</p><p>blah</p>.
-
All attribute values must be quoted. The example cited is
<td rowspan="3"> against <td rowspan=3>.
-
Attribute minimisation is not permitted. Constructs like <dl
compact> must be written out in full, i.e. <dl compact="compact">.
-
Empty elements must be explicitly terminated, either with an
end tag or the shorthand form, e.g. <br/><hr/> against the
illegal <br><hr>.
-
Leading and trailing whitespace is stripped from
attribute values.
-
An element associated with a script or style should be
declared as having #PCDATA content.
-
The SGML exclusion mechanism is not supported in XHTML.
-
Fragment identifier naming is changed. In HTML 4, elements
of types a, applet, form, frame, iframe, img and map carry the
name attribute. In XHTML, the id attribute is used as the fragment
identifier instead, to comply with XML syntax.
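Pulling these rules together, a minimal well formed XHTML 1.0 Strict document might look like the following sketch; note the XML declaration, the namespace attribute, the lower case tags and the explicitly closed empty element. The title and text are invented for illustration.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>A minimal XHTML page</title>
  </head>
  <body>
    <p>Well formed, lower case, explicitly closed.<br /></p>
  </body>
</html>
```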
The XHTML proposal provides guidelines for making XHTML documents
viewable using HTML compliant browsers. A number of syntactic tricks
are applied, to accommodate problem areas such as empty elements,
element minimisation, embedding style, line breaks, fragment
identifiers, character encoding, boolean attributes, etc. Careful use of
the syntax will allow the creation of documents which are XHTML
compliant, yet also compatible with most HTML 4, and possibly earlier,
HTML browsers.
Document Object Model
The Document Object Model (DOM) is intended to provide an
interface mechanism through which programs and scripts can dynamically
access and update the content, structure and style of documents. The DOM
is to be independent of both the platform and the programming language used.
The model is to incorporate a family of core interfaces
used to create and manipulate the structure and contents of a
document, but also optional modules with interfaces aimed at supporting
XML, HTML, generic style sheets and Cascading Style Sheets.
Level 2 of the DOM specification is currently in the
midst of some argument over the mechanism for handling namespace URIs;
given the complexity and ambitious aims of the DOM, this should come as
no surprise.
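To give a flavour of what DOM scripting looks like, the following hypothetical page uses the core interfaces to graft a new node into the document tree; the element id and the text content are invented for illustration.

```html
<html>
  <body>
    <ul id="readinglist">
      <li>HTML 4.01 specification</li>
    </ul>
    <script type="text/javascript">
      // Build a new node via the core DOM interfaces and
      // append it to the existing list in the document tree.
      var item = document.createElement("li");
      item.appendChild(document.createTextNode("XHTML 1.0 specification"));
      document.getElementById("readinglist").appendChild(item);
    </script>
  </body>
</html>
```

The same interfaces are intended to work identically from any language with a DOM binding, not just from a browser's script engine.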
Cascading Style Sheets
The Cascading Style Sheets (CSS) specification is one of
the most interesting and potentially useful ideas in the new crop of web
standard proposals.
The intent of CSS, currently at Level 2 (CSS2), is to
wholly divorce the presentation style of a web document from its
content. Features are to include content positioning, downloadable
fonts, table layout, features for internationalisation, automatic
counters and numbering, in addition to support for visual browsers,
aural devices, printers, braille devices, and handheld devices. An
inheritance mechanism allows style properties to propagate from
ancestor elements in the document tree to their descendants.
Like other proposal specifications in this crop, CSS is
both ambitious and far reaching, and is designed to fit with HTML4,
XHTML and XML. In many respects, it aims to emulate the well established
LaTeX style file model, but also aims to accommodate greatly differing
media.
The basic model for CSS processing is that an agent such
as a browser reads in a document, parses its content and creates a tree
structure to describe it. It identifies the intended media for the
document, and finds all of the style sheets either embedded in the
document or pointed to. Every element found in the document tree
structure will have particular properties for the presentation medium in
question, and each of these is assigned values given by cascading and
inheritance rules, and the contents of the style sheets. Given this
information, the agent then generates a formatting structure, to
describe exactly how it will render the document, and then renders the
document.
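The cascade and inheritance rules can be sketched with a pair of hypothetical style rules; where two selectors match the same element, the more specific rule wins, and properties not set on an element are inherited from its ancestors in the document tree.

```css
/* a hypothetical fragment of an external style sheet */
BODY    { color: black; font-family: serif } /* inherited by descendants */
P       { color: black }   /* matches every paragraph */
P.aside { color: maroon }  /* more specific, so it wins
                              for <P class="aside"> */
```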
How CSS2 will appear in a HTML document is best
illustrated by purloining the example presented in the CSS2
specification document. The starting point is a tiny but complete HTML
document:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<HTML>
  <HEAD>
    <TITLE>Bach's home page</TITLE>
  </HEAD>
  <BODY>
    <H1>Bach's home page</H1>
    <P>Johann Sebastian Bach was a prolific composer.</P>
  </BODY>
</HTML>
Using CSS2 to import an
external style sheet, we alter the document thus:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<HTML>
  <HEAD>
    <TITLE>Bach's home page</TITLE>
    <LINK rel="stylesheet" href="bach.css" type="text/css">
  </HEAD>
  <BODY>
    <H1>Bach's home page</H1>
    <P>Johann Sebastian Bach was a prolific composer.</P>
  </BODY>
</HTML>
The <LINK rel="stylesheet" href="bach.css" type="text/css">
element identifies the stylesheet, points to its location as bach.css
and identifies its format as text/css. In this manner, the document's
appearance can be globally changed by altering a single line in the
file. We need not be that aggressive, and the CSS2 specification does
allow embedding of local style information (not unlike LaTeX). The cited
example is:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<HTML>
  <HEAD>
    <TITLE>Bach's home page</TITLE>
    <STYLE type="text/css">
      H1 { color: blue }
      BODY {
        font-family: "Gill Sans", sans-serif;
        font-size: 12pt;
        margin: 3em;
        color: red;
      }
    </STYLE>
  </HEAD>
  <BODY>
    <H1>Bach's home page</H1>
    <P>Johann Sebastian Bach was a prolific composer.</P>
  </BODY>
</HTML>
Cascading Style Sheets are likely to become one of the
most popular and widely used features of the new look package of web
standards, and web designers would be well advised to watch developments
in this area very carefully.
Mathematical Markup Language
The Mathematical Markup Language or MathML is clearly an
instance of the web assaulting that previously inviolate domain of LaTeX
and TeX, the markup of mathematical notation and structure.
The intent of MathML is to produce an XML based markup
language for the accurate representation of mathematical notation, which
is human readable, with the assumption that in most instances
conversion tools or WYSIWYG equation editors would be used to generate
the MathML source (we can safely assume that the first conversion tool
will be a LaTeX to MathML translator).
A detailed discussion of MathML syntax is best left to the
W3C website paper (http://www.w3.org/TR/REC-MathML/), but it is
illustrative to again purloin a W3C example from the specification
document:
(a + b)^2
can be represented in MathML as:
<msup>
  <mfenced>
    <mrow>
      <mi>a</mi>
      <mo>+</mo>
      <mi>b</mi>
    </mrow>
  </mfenced>
  <mn>2</mn>
</msup>
Summary
It is clear that the current standards development effort
in markup languages, and associated web oriented vector and bitmap
graphics standards, will transform the web we so love over the next
decade. The process will see a gradual transition to XML, through the
intermediate XHTML, and the proliferation of standards such as CSS2,
MathML, SVG, WebCGM and PNG will see web users enjoying a quality of
presentation and interoperability difficult to imagine with today's
kludged package of standards.
Is there a downside to this process? Arguably yes,
insofar as simple text editor based production of web pages will
become increasingly tricky to get right, as the complexity and syntactic
tightness of these emerging standards asserts itself.
However, the benefits of the new standards clearly
outweigh the drawbacks, and the tired cliche "the best is yet to come"
definitely holds true here.