Affiliations: CTO, Opera Software; Chairman, YesLogic
Date: 2009-03-31
This case study is part of a series.
Footnotes, or references, in Wikipedia articles are created with the <ref> element in the wiki markup. Here is a simple example from the article on Norway:
The wiki markup is converted to HTML code on Wikipedia's servers. The above code results in two HTML chunks: a paragraph with a footnote call, and a footnote with a footnote marker. Here is the resulting HTML code of the paragraph with the footnote call:
And here is the resulting HTML code of the footnote with the footnote marker:
In a browser, the HTML code is presented approximately like this:
In 1319, Sweden and Norway were united under King Magnus Eriksson. In 1349, the Black Death killed between 50% and 60% of the population,[17] resulting in a period of decline, both socially and economically.
...
The presentation is fine, but the underlying markup can be simplified.
It is possible to simplify the HTML markup while retaining the same presentation in common browsers. The key to the simplification is to describe the presentation in CSS instead of using presentational HTML elements. Also, the number of elements is reduced by having one element serve as both source and target anchor. The presentation and the link functionality remain the same:
In 1319, Sweden and Norway were united under King Magnus Eriksson. In 1349, the Black Death killed between 50% and 60% of the population,17 resulting in a period of decline, both socially and economically.
...
The CSS code used to style the footnote call is simple:
a.ref { font-size: 83%; vertical-align: 35% }
Also, the square brackets around the footnote call are generated by the style sheet:
a.ref:before { content: '[' } a.ref:after { content: ']' }
Some legacy browsers, most notably IE6 and IE7, do not support CSS 2.1 generated content. To allow for graceful degradation in these browsers, the square brackets must be inserted into the markup rather than generated by the style sheet. This is done in this example:
In 1319, Sweden and Norway were united under King Magnus Eriksson. In 1349, the Black Death killed between 50% and 60% of the population,[17] resulting in a period of decline, both socially and economically.
...
The HTML code used in this example is:
This is also the HTML code that I propose for Wikipedia to use.
The table below compares the current markup of Wikipedia articles with the two proposals.
markup | elements | attributes | pseudo-elements |
---|---|---|---|
current markup | 7 | 9 | 0 |
CSS-based solution | 3 | 7 | 2 |
CSS-based without generated content | 3 | 7 | 0 |
As can be seen, the number of elements is more than halved, and the number of attributes is reduced. The reported gain is per footnote, and mature articles can have many footnotes. For example, the article on the United States has more than 200 footnotes; at 200 footnotes, the saving is 800 elements and 400 attributes in a single article. Only inline elements and attributes on inline elements are counted in the table above.
The gains in the two CSS-based solutions are similar. The version that uses generated content may be slightly more complex for browsers to process, because it uses two pseudo-elements per footnote. However, it also leaves more flexibility: the square brackets in the pseudo-elements can be removed or exchanged with other content.
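As a minimal sketch of that flexibility: assuming the `a.ref:before` and `a.ref:after` rules shown earlier are in effect, a later rule in the style sheet could exchange the brackets for other characters, or drop the generated content entirely. The parentheses are only an illustration and are not part of the proposal.

```css
/* Illustration only: exchange the square brackets for parentheses.
   These rules must appear after the original a.ref:before and
   a.ref:after rules so that they win in the cascade. */
a.ref:before { content: '(' }
a.ref:after  { content: ')' }

/* Alternative (use instead of the two rules above):
   generate no brackets at all. */
/* a.ref:before, a.ref:after { content: none } */
```

Neither variant requires any change to the HTML, which is the point of keeping the brackets out of the markup.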
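The gains, as described in the table, are achieved by:
- removing the presentational b and sup elements
- using a single a element as both source and target anchor
- removing the span elements around the square brackets in the footnote call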
Also, some of the code is slightly more complex:
The markup proposed in this study isn't just simpler, it also allows for better reuse of the content. Consider this code in the current markup:
<b><a href="#cite_ref-16a" title="">^</a></b>
and this code in the new markup:
<a class="backref" href="#cite_ref-16b">^</a>
As can be seen, the new markup adds a class ("backref") to the element that contains the back reference (which points to the footnote call). This way, the character that denotes the back reference ("^") can be removed when necessary. For example, it doesn't make sense to show the character in a printed version of the article.
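One way to do this, sketched here rather than taken from the original pages, is a print-only rule keyed on the `backref` class:

```css
/* Hide the back reference character ("^") when the article is printed;
   a link back to the footnote call has no use on paper. */
@media print {
  a.backref { display: none }
}
```

On screen the rule has no effect, so the back reference remains visible and clickable there.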
CSS 2.1 offers features that can further improve the presentation of the content. Here is a simple example where square brackets are added to the footnote marker:
In 1319, Sweden and Norway were united under King Magnus Eriksson. In 1349, the Black Death killed between 50% and 60% of the population,[17] resulting in a period of decline, both socially and economically.
...
Also, the back reference character has been removed in the above presentation. This shows how the proposed markup allows for presentational flexibility not offered by the current markup.
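A sketch of style rules that could produce such a presentation follows. The selector `a.marker` is a hypothetical name for the element holding the footnote marker, since the actual selector is not shown here; `a.backref` is the class from the proposed markup.

```css
/* Hypothetical selector: assumes the footnote marker is an a element
   with class "marker". Adds square brackets around the marker. */
a.marker:before { content: '[' }
a.marker:after  { content: ']' }

/* Remove the back reference character from the presentation. */
a.backref { display: none }
```

As with the footnote call, browsers without CSS 2.1 generated content would show the marker without brackets.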
Reducing the number of elements and attributes may seem like an interesting but useless exercise in the days of cheap memory and fast connections. Still, when multiplied by a vast number of articles and page requests, I believe the gains are significant. Along with other optimizations, Wikipedia's bandwidth usage can be reduced, and more articles can fit into smaller machines.
Also, the presentational flexibility offered by the proposed markup is, in itself, a reason to use it.
Comments from Reisio improved this paper.