Doc Steve
Web Coding Service

Fully Accessible Web Code, Custom Written by Hand
Specializing in html, xml, css, and U.S.§508

Web Technical Notes

Validating as XML with embedded non-parsable elements


Validating with Embedded JavaScript

A problem that is often encountered when migrating to xml is that pages will not validate with an embedded javascript. Without knowing what causes the problem, many developers simply move the script into an external script library and then load it through the src attribute of the script element. While this approach works, and indeed produces the benefit of placing often-used functions into a central resource, it can be annoying when a simple script is needed in one file.

The underlying issue is that xml parses all elements in a document, because elements may contain pertinent sub-elements. The problem, then, is that in parsing the script element the parser encounters all manner of special characters (e.g., < and &), and validation fails. Escaping the characters as entity references does not solve the problem either, since a browser that handles the page as html does not decode entity references inside a script element, and the script then breaks.

Note: Although there are five predefined entity references in xml . . .

&lt; &gt; &amp; &apos; &quot;

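only the characters "<" and "&" are strictly illegal in content. Apostrophes, quotation marks, and greater-than signs are legal and generally do not disrupt parsing (the one exception being a greater-than sign in the sequence "]]>", which must be escaped).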
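
For ordinary text content (not script code), escaping only the characters that are strictly illegal might look like the following sketch. The function name escapeXmlText is hypothetical, chosen just for this illustration, and the third replacement covers the "]]>" sequence, the one place where a greater-than sign must be escaped.

```javascript
// Hypothetical helper (not part of any library): escapes the characters
// that XML does not allow to appear literally in text content.
function escapeXmlText(text) {
  return text
    .replace(/&/g, "&amp;")        // must run first, or it would re-escape the entities added below
    .replace(/</g, "&lt;")         // "<" would otherwise start a tag
    .replace(/\]\]>/g, "]]&gt;");  // "]]>" may not appear literally in content
}

console.log(escapeXmlText("if (a < b && c > d)"));
```

The lone greater-than sign, the apostrophe, and the quotation mark are left alone here, since they are legal in content.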

The solution is actually easy: the code segment not to be parsed must be embedded in a CDATA section, done thusly:

<![CDATA[
   .
   .
   . 
]]>

So, for example,

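<script type="text/javascript">
<!-- 
//<![CDATA[
  // Single-quoted strings let the attribute keep its double quotes;
  // "<\/" keeps an html parser from seeing an end tag inside the script.
  document.writeln('<table summary="Test Only">');
  document.writeln('  <tr><td>Hi!<\/td><\/tr><\/table>');
// ]]> -->
</script>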

Note the generous use of comment tags there, too: the JavaScript engine allows the string "<!--" to occur at the start of a SCRIPT element, and ignores further characters until the end of the line. JavaScript interprets "//" as starting a comment extending to the end of the current line. This is needed to hide the string "-->" from the JavaScript parser (vis-à-vis these comment strings, analogous constructs may be used with other scripting languages according to their own comment syntax). Generally, the code works without the comments as well. Indeed, when the page is handled by a true xml parser (for example, when it is served as application/xhtml+xml), everything between "<!--" and "-->" is an xml comment that the parser may discard, script and all, so leaving the comment tags out is the safer form there.
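
As a rough sketch of the pattern above without the optional comment tags, a build script could wrap script source before placing it in a page. The name wrapInCdata is hypothetical. The one thing that cannot appear inside a CDATA section is the sequence "]]>", which would end the section early, so this sketch simply refuses such input rather than trying to repair it.

```javascript
// Hypothetical helper: wraps JavaScript source in a CDATA section whose
// markers are hidden from the JavaScript engine behind "//" line comments.
function wrapInCdata(code) {
  // A CDATA section ends at the first "]]>", so the code must not contain it.
  if (code.includes("]]>")) {
    throw new Error('script contains "]]>", which would end the CDATA section early');
  }
  return "//<![CDATA[\n" + code + "\n//]]>";
}

console.log(wrapInCdata("if (a < b && c) { run(); }"));
```

An expression such as a[b[0]]>1 would be rejected; adding a space before the greater-than sign is one way to make it acceptable.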

Reference:

W3C Extensible Markup Language (XML) 1.0 (Second Edition), § 2.7, "CDATA Sections"



Comments: Parsing Failures in New Browsers

A related issue is comments, and since all good programming includes plenty of comments (i.e., self-documentation), this may be important.

The typical comment is written as follows:

  <!-- comment -->

In the new browsers (IE6, Mozilla 1), if you document your html with comments, and you index your comments with a double-dash, you get a failure:

  <!-- This is how I did this: 
  -- I did this
  -->

And this also does not work:

<!-- This is how I did this: 
  -- (1) first I did this
  -- (2) then I did that
  -- (3) then I did that other thing
  -->

But this does:

<!-- This is how I did this: 
  -- (1) first I did this
  -- (2) then I did that
  -->

This probably looks strange at first: one would expect all of these to work, and in older browsers they did, but the new browsers fail on some of them. When one fails, all hell breaks loose, because the comment is actually still open and swallows the markup that follows it. But then a pattern emerges: an unpaired double dash is the problem.

If you do any of the following, it works fine:


So what was happening here?

The pattern observed is that an odd number of indexes (1, 3, 5 ...) fails. However, such a comment can be closed cleanly with a double close ( --> --> or simply -- --> ), whatever the odd number happens to be.

What is happening is that, even without the opening angle-bracket/exclamation-point, the double-dash is being treated as a comment-open indicator, and likewise, without the closing angle-bracket, it is being treated as a comment-close indicator. Therefore, even numbers come out (ahem) even, while odd numbers require the double close. And one cannot simply always close such comment lists with a double close, since the opens and closes must always pair up, with no open comment left hanging: with an even number of increments, neither Mozilla nor IE6 handles either form of the double close as part of the comment.
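
The toggling behavior described above can be modeled in a few lines. This is only a simplified sketch of the observed pattern, not the actual Mozilla or IE6 parser logic, and the name findDeclarationEnd is hypothetical: every "--" flips the comment state, and the declaration ends at the first ">" found while no comment is open.

```javascript
// Simplified model (an assumption, not browser source code): returns the
// index just past the ">" that closes the declaration, or -1 if the
// comment is still open when the input runs out.
function findDeclarationEnd(source) {
  if (!source.startsWith("<!--")) return -1;
  let inComment = true; // the "--" of "<!--" opened the first comment
  let i = 4;
  while (i < source.length) {
    if (source.startsWith("--", i)) {
      inComment = !inComment; // each double dash toggles open/close
      i += 2;
    } else if (!inComment && source[i] === ">") {
      return i + 1; // ">" outside a comment ends the declaration
    } else {
      i += 1;
    }
  }
  return -1;
}

const oneIndex = "<!-- This is how I did this:\n  -- I did this\n  -->";
const twoIndexes = "<!-- This is how I did this:\n  -- (1) first\n  -- (2) then\n  -->";
console.log(findDeclarationEnd(oneIndex) === -1);                    // comment left open
console.log(findDeclarationEnd(twoIndexes) === twoIndexes.length);   // closes cleanly
```

Under this model, appending a second close to the one-index example lets it end exactly at the final ">", while appending it to the two-index example ends the declaration early and leaves the extra close outside the comment.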

While it may seem really odd how Mozilla is picking up the double dash as a comment (but only within an already correctly-opened existing comment), the Mozilla & IE6 parsers are written to do exactly what they are doing, with the basis for it in the W3C's specification of comments. The following is from the html 4 specification:

White space is not permitted between the markup declaration open delimiter("<!") and the comment open delimiter ("--"), but is permitted between the comment close delimiter ("--") and the markup declaration close delimiter (">"). A common error is to include a string of hyphens ("---") within a comment. Authors should avoid putting two or more adjacent hyphens inside comments.

So there.

Solution:

So don't do that -- don't increment with a double dash -- use a number, some other symbol, or split the double dash -- and be on the lookout for legacy pages that fail now that browsers take this directive to heart.
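
To hunt for such legacy pages, a quick scan along the following lines may help. It is only a sketch: the name findRiskyComments is hypothetical, and the regular expression assumes each comment runs from "<!--" to the nearest "-->", which is a simplification rather than a full parser.

```javascript
// Hypothetical helper: returns every comment whose body contains two or
// more adjacent hyphens, which the html 4 specification advises against.
function findRiskyComments(markup) {
  const risky = [];
  // Lazily match from "<!--" to the nearest "-->", across line breaks.
  for (const match of markup.matchAll(/<!--([\s\S]*?)-->/g)) {
    if (match[1].includes("--")) {
      risky.push(match[0]);
    }
  }
  return risky;
}

const page = "<p>a</p><!-- fine --><!-- list:\n -- one\n -- two\n --><p>b</p>";
console.log(findRiskyComments(page).length);
```

Each comment returned is a candidate for renumbering or for splitting its double dashes.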

Reference:

W3C HTML 4.01 Specification; Chap. 3, "On SGML and HTML"; § 3.2.4: "Comments"


Document: http://
Revised:



Copyright © 2004
Steve Sconfienza, Ph.D.
All Rights Reserved