Early word processors used "markups". These were instructions embedded in the documents that specified how parts of the text should be presented and included effects like bold type, italics, superscript, etc. While this was very useful, these instructions were of little use in helping determine the content or meaning of any parts of the document. In the late 1960's, the utility of the "markup" idea was extended by some researchers (including Charles Goldfarb). These people realized that marking electronic documents with more general tags that indicated the meaning of parts of the documents could be very valuable. For example, many documents have a title, an author, a date, etc. If all of the similar portions of the documents in a collection were marked up in a standard manner, it would be possible to write programs that could locate documents by a certain author, produce outlines of all of the documents, etc. In addition, using the meaningful markup tags, one could print the documents in different styles based on preferences. For example, you could print a document in one font for one person and in another font for another person by specifying at the time of printing that titles should be printed in a certain size, the authors' names printed in italics, etc.



2.1 SGML

This concept proved so useful that the Standard Generalized Markup Language (SGML) was made into an international standard (ISO 8879) in the mid-1980's. While SGML is used by thousands of institutions, companies and organizations, it is a very large, complicated standard and does not lend itself to casual use.



2.2 HTML

HTML was first developed by Tim Berners-Lee as a hypertext language for linking information among researchers. His system used a set of uniform addresses to refer to documents on different computers, a set of rules (a protocol) for transmitting the documents and a simple markup language for encoding the documents. He based his HTML on SGML, but used only a small subset of markers or tags. Documents using these tags could be interpreted by programs written for a number of different computer architectures. Because the programs and the computer platforms varied, the documents would appear differently in a program written for a very high-powered computer that could display different fonts, sizes and colors, than they would in a program written for a low-end text-based system. But the important notion was that these documents could be requested from a server, read by almost anyone with a computer and could provide links to other related information.


The World Wide Web spread more quickly than anyone expected. People wanted to include more than just text in these documents: images, icons, text styles and other media. The rapid spread of the WWW left standards organizations in the dust. Two major browsers emerged, and each gradually responded to the demand for more complex media and more style in different ways, with different tags. This divergence meant that pages developed for one browser might not be interpreted correctly by the other. This was in direct opposition to the initial concept of the WWW, which was to provide universal access to information. Today, there are five major browsers: IE, Firefox, Safari, Opera and Chrome. All five handle HTML almost identically, though legacy IE browsers (IE5, IE6, IE7, IE8) tend to cause the most issues because they remain in widespread use throughout the business world.


HTML is still a means of formatting text for display in a browser window. It is not a procedural programming language like C, C++, Java, Pascal, or Fortran. However, a variety of HTML, DHTML, has programming aspects to it. HTML5 is the latest standard version of HTML. It is the first major release in ten years and is helping make HTML a powerful tool once again. It also brings the first major updates to HTML forms since HTML 4.



2.3 XML

XML is another derivative of SGML, developed by the World Wide Web Consortium (W3C) in the mid-1990's. It is called "extensible" because it does not consist of a fixed set of tags like HTML. It is really a "meta" language: a language from which to create other languages. It provides a consistent set of rules for creating these other languages and can be thought of as a lighter version of SGML.


XML was developed by the W3C in response to a need for consistency among browser markup languages (see XHTML) and in response to the desire for a means of using tags that were meaningful in terms of content, not just page appearance. The latter is close to the initial goals of SGML, and, in fact, XML makes it easier to share SGML-style documents over the WWW. In a way, XML can be viewed as an intermediary between HTML and SGML.


XML furnishes a common syntax for the creation of specialized markup languages for any domain or discipline. It is not a procedural programming language like C++, Java or Fortran. It is a means of describing information that will be stored, transmitted to others and processed by a program written to interpret it. There are already specialized markup languages for Mathematics (MathML), Chemistry (CML) and business data (XBRL). With XML, a group can create a markup language for books, games, sports, teams, people, animals, finance, products, services, etc. (Deitel, Deitel & Nieto, 2001, p25) For example, a book may be described in XML like this:

<book>
<title>Gone with the Wind</title>
<author>
<firstname>Margaret</firstname>
<lastname>Mitchell</lastname>
<flag gender="F" />
</author>
<publisher>Warner Books</publisher>
<isbn>0446365386</isbn>
<review>
Sometimes only remembered for the epic motion picture and
"Frankly … I don't give a damn," Gone with the
Wind was initially a compelling and entertaining novel.
It was the sweeping story of tangled passions and the rare
courage of a group of people in Atlanta during the time of
Civil War that brought those cinematic scenes to life. The
reason the movie became so popular was the strength of its
characters--Scarlett O'Hara, Rhett Butler, and Ashley Wilkes
--all created here by the deft hand of Margaret Mitchell, in
this, her first novel.
</review>
</book>

The XML code shows an element, book, which has subelements title, author, publisher, isbn and review. The subelement author has subelements for the first and last names of the author. It also contains an empty element, flag. The flag element has an attribute, gender, which indicates the gender of the author as "M" or "F". Empty elements can be closed either by placing a slash at the end of the beginning tag or by explicitly using a closing tag. In other words, the following two sets of tags are equivalent.

<flag gender="F" />

<flag gender="F"></flag>
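
One way to check this equivalence is to parse both forms and compare the results. The sketch below uses Python's standard xml.etree.ElementTree module. The choice of Python and the helper name summarize are assumptions made for this illustration, not part of the original example.

```python
import xml.etree.ElementTree as ET

def summarize(xml_text):
    """Return the parts of a parsed element that matter for comparison.

    summarize is a hypothetical helper written for this illustration.
    """
    element = ET.fromstring(xml_text)
    # An empty element has no text and no child elements.
    return (element.tag, dict(element.attrib), element.text or "", len(element))

short_form = summarize('<flag gender="F" />')
long_form = summarize('<flag gender="F"></flag>')

# Both spellings produce the same element once parsed.
print(short_form == long_form)
```

Once parsed, a program cannot tell which of the two spellings the author used.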

In order to process an XML document, an XML parser is employed. The XML parser locates the tags and comments in an XML document. Programs written in Java, C++ or other languages can then respond to the elements found by the parser. For example, one might write a program that displays a book's title in bold print and the author in italics. More complicated programs might also search for other books by this author and provide a link to those records.
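
As a rough illustration of a program responding to parsed elements, the sketch below reads a shortened copy of the book record and builds a line with the title in bold and the author in italics. It uses Python's standard xml.etree.ElementTree parser. The function name format_book and the choice of HTML <b> and <i> tags for the output are assumptions made for this example.

```python
import html
import xml.etree.ElementTree as ET

# A shortened copy of the book record shown above (review omitted).
BOOK_XML = """
<book>
  <title>Gone with the Wind</title>
  <author>
    <firstname>Margaret</firstname>
    <lastname>Mitchell</lastname>
    <flag gender="F" />
  </author>
  <publisher>Warner Books</publisher>
  <isbn>0446365386</isbn>
</book>
"""

def format_book(xml_text):
    """Return one line of HTML: bold title, italic author (hypothetical helper)."""
    book = ET.fromstring(xml_text)            # the parser locates the elements
    title = book.findtext("title")            # text inside <title>
    first = book.findtext("author/firstname")
    last = book.findtext("author/lastname")
    # Escape the text so that characters such as & or < stay valid in the output.
    return "<b>{}</b> by <i>{} {}</i>".format(
        html.escape(title), html.escape(first), html.escape(last))

print(format_book(BOOK_XML))
```

The program never searches the raw text for angle brackets itself. It only asks the parser for elements by name, which is what makes the same code reusable for every document that follows the book structure.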


XML documents can optionally reference a DTD (document type definition) or a schema. The DTD or schema contains a formal definition for the XML used in a document. Some parsers, called validating parsers, read the DTD or schema and check that the tags used in the document conform to the formal definition.


If we wanted to place the XML code above in a complete XML document, it might look like this:

<?xml version="1.0"?>
<!-- gwtw.xml -->
<!DOCTYPE book SYSTEM "book.dtd">

<book>
<title>Gone with the Wind</title>
<author>
<firstname>Margaret</firstname>
<lastname>Mitchell</lastname>
<flag gender="F" />
</author>
<publisher>Warner Books</publisher>
<isbn>0446365386</isbn>
<review>
Sometimes only remembered for the epic motion picture and
"Frankly … I don't give a damn," Gone with the
Wind was initially a compelling and entertaining novel.…
</review>
</book>

The first line is an optional declaration indicating that this document conforms to a particular version of XML. The next line is a comment giving the name of this particular file, which uses ".xml" as the file extension. The third line indicates that the root element of this file is "book" and that this file is based on a DTD found in "book.dtd". The keyword SYSTEM indicates that the name which follows it is the location of an external DTD.


The DTD for the book XML document might be as follows:

<!-- book.dtd -->

<!ELEMENT book (title, author+, publisher, isbn, review*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (firstname, lastname, flag)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT flag EMPTY>
<!ATTLIST flag gender (M | F) "F">
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ELEMENT review (#PCDATA)>

The first ELEMENT defines the rule for a book. It says that a book consists of a title, one or more authors, a publisher, an isbn and optional reviews, in that order. The plus, +, after author indicates that there must be at least one author and can be more than one. Other indicators used in DTDs are the asterisk, *, indicating optional elements that can occur not at all or any number of times, and the question mark, ?, indicating optional elements that can occur at most once.
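
The occurrence indicators behave much like the same symbols in regular expressions. The sketch below takes advantage of that by rewriting the content model for book as a Python regular expression over the names of the child elements. This is only an illustration of what +, * and the comma mean. It is not a DTD validator, and the function name follows_book_model and the placeholder values T, P and 1 in the sample records are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# The content model (title, author+, publisher, isbn, review*) rewritten as a
# regular expression over a comma-terminated list of child element names.
BOOK_MODEL = re.compile(r"title,(author,)+publisher,isbn,(review,)*")

def follows_book_model(xml_text):
    """Check only the order and count of the children of <book>."""
    book = ET.fromstring(xml_text)
    child_names = "".join(child.tag + "," for child in book)
    return book.tag == "book" and BOOK_MODEL.fullmatch(child_names) is not None

# Placeholder records: two authors and no review, then no author at all.
two_authors = ("<book><title>T</title><author/><author/>"
               "<publisher>P</publisher><isbn>1</isbn></book>")
no_author = "<book><title>T</title><publisher>P</publisher><isbn>1</isbn></book>"

print(follows_book_model(two_authors))  # author+ allows more than one author
print(follows_book_model(no_author))    # author+ requires at least one author
```

The first record is accepted because review* permits zero reviews. The second is rejected because author+ requires at least one author.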


The title ELEMENT contains a keyword which indicates that the content of a title is parsed character data (#PCDATA). Parsed character data should not contain any markup characters used to indicate tags, such as the ' < ' and ' > ' angle brackets. To include these characters in the content of XML tags, the entity codes discussed under the HTML section should be used (e.g., &lt;, &gt;, etc.). Another type recognized by XML is character data (CDATA), which appears as an attribute type in ATTLIST declarations and in CDATA sections. A CDATA section holds one or more characters that the parser will not process (i.e., the parser will not look for other tags inside it). The other content types that can occur in an ELEMENT declaration are ANY and EMPTY.
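
The following sketch shows the round trip for parsed character data: markup characters are replaced by entity codes before the text is placed inside a tag, and the parser turns them back into the original characters. It uses the escape function from Python's standard xml.sax.saxutils module. The sample text is made up for this illustration.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Made-up text containing characters that would be mistaken for markup.
raw_text = "if a < b & b > c"

# escape() replaces &, < and > with the entity codes &amp;, &lt; and &gt;.
escaped = escape(raw_text)
print(escaped)

# Placed inside a tag, the escaped text parses cleanly, and the parser
# hands the original characters back to the program.
title = ET.fromstring("<title>" + escaped + "</title>")
print(title.text == raw_text)
```

Placing raw_text between the tags without escaping it first would cause the parser to report an error, because it would try to read "< b" as the start of a tag.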


The ELEMENT author specifies three child components: firstname, lastname and flag. The flag ELEMENT indicates that this is an empty tag: there should be no content between the beginning and ending tag (see above). The ATTLIST for flag defines the gender attribute of the flag element. This attribute can be either M or F, and the default for a gender attribute that is omitted will be "F". (The example above was adapted from one in Deitel, Deitel & Nieto, 2001, p644)



2.4 XHTML

XHTML is the most widely used version of HTML and was the standard until the recent promotion of HTML5. While HTML5 may be the newest standard, the power and popularity of XHTML make learning it essential. This reference focuses on XHTML. XHTML uses the tags of HTML but conforms to the stricter syntax rules of XML and, unlike HTML, is extensible like XML. The differences between HTML and XHTML include the following:

  1. All XHTML tags and attributes must be in lowercase letters. HTML allows either case, and even mixed case.
  2. All non-empty elements must have corresponding closing tags. This means that the <p> tag used in HTML to mean "paragraph" must have a corresponding ending </p> tag in XHTML. In XHTML, the <p>…</p> tags mark the beginning and ending of a paragraph.
  3. Tags that are usually empty in HTML, such as horizontal rule <hr> tags and line break <br> tags can use the combined beginning/ending empty tag format seen in XML. So, the horizontal rule tag in XHTML is <hr /> and the line break tag in XHTML is <br />. Similarly, an HTML image tag such as <img src="arrow.gif" alt="next"> becomes <img src="arrow.gif" alt="next" />. Note the space between the tag information and the "/>" in these tags.
  4. In XHTML, nested tags must be closed from the inside out. HTML would often accept a line such as: <i><b>This is bold italics.</i></b> However, XHTML requires that the inner bold <b> tag be closed before the outer, italics <i> tag. Thus this line in XHTML is: <i><b>This is bold italics.</b></i>
  5. In XHTML, all attribute values must be in quotes, even numeric values such as a border width on an image.
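
Because XHTML follows the syntax rules of XML, several of these differences can be checked with any XML parser. The sketch below feeds pairs of HTML-style and XHTML-style fragments to Python's standard xml.etree.ElementTree parser. The helper name is_well_formed and the surrounding <p> tags in the samples are assumptions made for this illustration. A well-formedness check of this kind covers rules 2 through 5 only. The lowercase rule comes from the XHTML DTD and needs a validator.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if an XML parser accepts the fragment (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

samples = [
    # Rule 4: nested tags must be closed from the inside out.
    "<p><i><b>This is bold italics.</i></b></p>",
    "<p><i><b>This is bold italics.</b></i></p>",
    # Rules 2 and 3: every element is closed, empty ones with " />".
    "<p>First line<br>Second line</p>",
    "<p>First line<br />Second line</p>",
    # Rule 5: attribute values must be in quotes, even numeric ones.
    '<p><img src="arrow.gif" alt="next" border=0 /></p>',
    '<p><img src="arrow.gif" alt="next" border="0" /></p>',
]

for fragment in samples:
    print(is_well_formed(fragment), fragment)
```

In each pair, the first fragment is the form that HTML browsers often tolerate and the second is the XHTML form. Only the second form in each pair is accepted by the parser.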

A very minimal XHTML document is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>An XHTML document</title>
</head>

<body>
<p>
This document uses XHTML instead of HTML.
</p>
</body>
</html>

The basic XHTML document begins with a <!DOCTYPE> declaration. In the section on XML above, the <!DOCTYPE> declaration used the keyword SYSTEM to refer to a DTD (document type definition) that was local. Here, the keyword PUBLIC refers to the official XHTML definition, which is located at the W3 Consortium.


The beginning html tag must include the attribute xmlns and the value "http://www.w3.org/1999/xhtml". This attribute identifies the namespace of the document.
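
The effect of the namespace can be seen by parsing a small XHTML document with a namespace-aware XML parser. In the sketch below, which uses Python's standard xml.etree.ElementTree module, the <!DOCTYPE> declaration is left out to keep the example short, and the constant name XHTML_NS is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

page = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>An XHTML document</title></head>"
    "<body><p>This document uses XHTML instead of HTML.</p></body>"
    "</html>"
)

# ElementTree reports each tag together with its namespace, in braces.
print(page.tag)

# A plain name no longer matches, because every element belongs to the namespace.
print(page.find("head"))

# The namespace-qualified name does match.
title = page.find("{%s}head/{%s}title" % (XHTML_NS, XHTML_NS))
print(title.text)
```

The namespace is what lets a program distinguish an XHTML title element from a title element in some other XML vocabulary, such as the book example above.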


There are some other differences between HTML and XHTML, but the transition from HTML to XHTML is relatively painless. One beneficial aspect of XHTML is that the more rigid structure identified by the <!DOCTYPE> declaration at the beginning of the document facilitates the use of validators, which can verify whether or not the document has correct XHTML syntax. One such validator can be found at http://validator.w3.org/.



2.5 DHTML

DHTML is a means of animating, varying and controlling aspects of webpages and responding to user actions without contacting the server and downloading a new page. DHTML is implemented using combinations of HTML, style sheets and scripting. One should commit to using DHTML with caution because there is no agreed-upon standard for DHTML. This means that the major browsers, IE, Firefox, Safari, Opera and Chrome, use different versions of DHTML. A page developed for one browser will not necessarily work correctly when viewed with another. Scripts can be written that detect the user's browser type and version, and users can be directed to the correct version of pages developed for their browser. Of course, this entails more work from the developers and designers. More details of the browser differences and how to cope with them can be found at the Web Developers Virtual Library, http://www.wdvl.com/Authoring/DHTML/CB/. The W3 Consortium is addressing the foundations of the differences between the browsers' DOMs (Document Object Models), hoping to come up with a consensus and one standard for DHTML. A DOM provides the interface between the HTML, CSS and scripting language (usually JavaScript).


In DHTML, an HTML document is viewed as an object hierarchy consisting of the nested elements in the document. Each object or element in the document can be identified by an id attribute. The attributes of the objects can then be manipulated by JavaScript. The following simple example is adapted from one in Deitel, Deitel & Nieto, 2001, p436. When the document is loaded, the change() function is called. This function displays a small alert box containing the text enclosed in the <p> tag in the document body. When the user clicks the "OK" button on the box, the text in the body of the document changes to "You clicked the OK button!".

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<head>
<title>An Example of a DHTML document</title>
<script type="text/javascript">
<!--
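// Called when the page finishes loading (see the onload attribute on <body>).
function change()
{
// Look the paragraph up by its id instead of relying on a browser-specific global.
var pText = document.getElementById( "pText" );
alert( pText.innerText ); // show the current text in an alert box
pText.innerText = "You clicked the OK button!"; // then replace it
}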
// -->
</script>
</head>
<!-- DHTML Demo -->

<body onload="change()">
<p id="pText">
This text will change, thanks to DHTML.
</p>
</body>
</html>

References

  1. Bos, Bert, XML Introduction (accessed August, 2002) http://www.w3.org/XML/1999/XML-in-10-points
  2. Deitel, H. M., Deitel, P. J. & Nieto, T. R. (2001) Internet & World Wide Web: How to Program. 2nd Edition. Prentice Hall, NJ.
  3. Flynn, P. XML FAQ (accessed August, 2002) http://www.ucc.ie/xml/#acro
  4. Richmond, A. The Web Developers Virtual Library Introduction to XHTML (accessed August, 2002) http://www.wdvl.com/Authoring/Languages/XML/XHTML/
  5. Veen, J. (2001) The Art and Science of Web Design. New Riders: Indianapolis, IN.
  6. HTML and XHTML Information www.w3c.org/markup

Introduction to Web Design by Cynthia J. Martincic :: Credits