Readability of textual content
Readability in an Accessibility Context
Readability
Among the more obscure and little-known accessibility guidelines are those which recommend improving the readability of content. Both WCAG 1.0 and WCAG 2.0 carry such advice, in checkpoint 14.1 and Success Criterion 3.1.5 respectively.
The intent behind these guidelines is to encourage authors to write in a way that improves readers’ chances of comprehension. It is not a suggestion to “dumb down” content, but rather to avoid unnecessary complexity by making the text easier to “scan”.
In a larger context, readability studies include fields such as legibility of print, reader motivation, and reading conditions. For the most part such areas are difficult to measure on the web; it is, for example, impossible to guarantee the use of a specific typeface, no matter how well chosen. It might simply not be available in the reader’s browser.
What we can measure, with some degree of objectivity, is the complexity of content structure: sentences, phrases and words. The basic theory is sound, and revolves around the way humans process information by pattern matching.
Under normal circumstances, a human will read by scanning lines of text. Imagine a window moving along the line, showing a number of words at a time. The window is moved 8 to 10 times per line, and each pause between movements lasts approximately 0.2 seconds.
When the sentence structure is complicated it is often necessary to move the window backwards, what some call a “regression”, in order to fully assimilate the content. This might happen with texts containing compound-complex sentences, such as in this example:
In America everybody is of the opinion that he has no social superiors, since all men are equal, but he does not admit that he has no social inferiors, for, from the time of Jefferson onward, the doctrine that all men are equal applies only upwards, not downwards. (Bertrand Russell)
The more complex the structure, the higher a “reading age” is required, lest comprehension be hampered by sheer difficulty in scanning the text.
A more thorough examination of this subject, including references to supporting research, can be found in the article Readability by Chris & Keith Johnson.
It is well worth noting that this article warns against human testing of structural complexity, and mentions that teachers in studies have been shown to misjudge the reading age of a text by as much as eight years, possibly due to their own extensive reading competence.
Other authors warn against uncritical use of automated testing, a position Greytower share.
When working with individual texts nothing can beat the human eye. It is relatively easy for a person fluent in a given language to determine whether specific content is hard to read. When given the task of testing hundreds, or perhaps thousands, of pages, human analysis becomes error-prone, and a tool is required.
Automatic Testing in siteSifter
siteSifter is designed to test multiple documents, and to apply a number of different analysis algorithms to each. The first task is therefore to pick one or more readability tests to apply. This was not trivial: by some sources, over 200 different algorithms exist for this purpose.
Luckily one of them is “local” to our main development office: the LIX, or LäsbarhetsIndeX, a Swedish method constructed in 1968 by C.H. Björnsson.
The index is calculated by first establishing the number of words in the text (O), the number of long words (L, defined as words with more than six characters) and the number of sentences (M). The LIX value is then found by the formula
Lm = O / M
Lo = (L / O) * 100
LIX = Lm + Lo
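As an illustration, the calculation might be sketched in Perl as follows; the helper name lix is ours, and the tokenisation is deliberately naive: words split on whitespace (so attached punctuation counts toward word length) and sentences split on terminal punctuation.

#!/usr/bin/perl
use strict;
use warnings;

# Calculate the LIX value of a plain-text string.
sub lix {
    my ($text) = @_;

    my @words     = split ' ', $text;            # O: total number of words
    my @long      = grep { length > 6 } @words;  # L: words of more than six characters
    my @sentences = split /[.!?]+/, $text;       # M: number of sentences

    return 0 unless @words and @sentences;

    my $lm = @words / @sentences;                # average sentence length
    my $lo = (@long / @words) * 100;             # percentage of long words

    return $lm + $lo;
}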
The resulting number is compared to a translation table, normally adjusted according to the type of text tested. Since it is virtually impossible for an automated tool to tell the difference between a children’s book and a scientific discourse, we use a standard table as follows:
LIX       Readability                                  Example
< 30      Very low / Easily read                       Children’s books
30 – 40   Normal / Easily read                         Fiction, popular magazines
40 – 50   Medium difficulty / Medium hard to read      Normal newspaper texts
50 – 60   Difficult / Hard to read                     Government documents
> 60      Very difficult / Very difficult to read      Scientific papers, research notes
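In code, the lookup is a straightforward cascade; this sketch (the helper name lix_category is ours) simply mirrors the rows above:

sub lix_category {
    my ($lix) = @_;

    return 'Very low / Easily read'                  if $lix < 30;
    return 'Normal / Easily read'                    if $lix < 40;
    return 'Medium difficulty / Medium hard to read' if $lix < 50;
    return 'Difficult / Hard to read'                if $lix < 60;
    return 'Very difficult / Very difficult to read';
}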
The first section of this article measures out at 43/44, a text of medium difficulty, equivalent to normal newspaper content.
One advantage of the LIX index is that it is not dependent on the language used; it “only” measures sentence complexity. Many other tests, perhaps the majority, are designed to operate on English only. An example would be the Gunning-Fog Index from 1952, which relies, among other things, on counting syllables.
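The standard Fog formula is 0.4 * (average sentence length + percentage of words of three or more syllables). A rough sketch, using a naive vowel-group count as a stand-in for proper syllable counting and ignoring the usual exceptions for proper nouns and compound words, could look like this:

sub gunning_fog {
    my ($text) = @_;

    my @words     = split ' ', $text;
    my @sentences = split /[.!?]+/, $text;

    return 0 unless @words and @sentences;

    # Treat each run of vowels as one syllable; three or more marks a "complex" word.
    my @complex = grep { my @v = lc($_) =~ /[aeiouy]+/g; @v >= 3 } @words;

    return 0.4 * (@words / @sentences + 100 * @complex / @words);
}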
The following quote from “A Study in Scarlet” (Conan Doyle, 1887) scores 18.5 on the Gunning-Fog Index, using a revised algorithm from 1980:
The reader may set me down as a hopeless busybody, when I confess how much this man stimulated my curiosity, and how often I endeavoured to break through the reticence which he showed on all that concerned himself. Before pronouncing judgment, however, be it remembered, how objectless was my life, and how little there was to engage my attention. My health forbade me from venturing out unless the weather was exceptionally genial, and I had no friends who would call upon me and break the monotony of my daily existence. Under these circumstances, I eagerly hailed the little mystery which hung around my companion, and spent much of my time in endeavouring to unravel it.
On the same sample, LIX yields a value of 53 — i.e. quite difficult to read. Most people would likely agree. Our second example is taken from “The Tale of Jimmy Rabbit” (Bailey, 1916):
Jimmy Rabbit was very busy. He was getting ready for May Day. And he intended to hang two May baskets. One of them was already finished, and filled with things that Jimmy himself liked - such as strips of tender bark from Farmer Green's young fruit trees, and bits of turnip from his vegetable cellar. You might almost think that Farmer Green himself ought to have hung that basket. But Jimmy Rabbit never once thought of such a thing. He expected to hang it on the door of a neighbor's house, where there lived a young girl-rabbit. Jimmy had made that basket the best he knew how.
This text scores 26 on the LIX test (very easily read), and 6.8 on the Gunning-Fog Index. We conclude, based on these simple comparisons and on other material, that the Swedish algorithm gives a reasonable indication of how readable material may be.
Again we wish to stress that, since no readability algorithm measures the quality of content, but only very simple factors such as the ratio of words to sentences, siteSifter merely flags documents which score too high on these tests for human review. Reality is far too complicated for us to give a straight yes or no answer as to whether a text is truly readable.
Implementation
In order to extract a suitable section of text, we do some preparatory work on the webpage. Many algorithms are designed to operate on 100-word chunks, but this is not a limitation we impose in our implementation.
First, to avoid the short words and phrases normally used in various LINK elements, we extract the body content only, by way of this simple regular expression:
s#.*?<\s*body.*?>(.*?)</body>.*#$1#gsi
In normal, everyday cases, a text is made easier to read by breaking it up into parts, often separated by headers and vertical space. These items give the eye a break, and help us refocus. With webpages the text may be accessed without headers, and since we are in any case aiming for the worst-case scenario, we remove them:
s#<\s*h[1-6].*?>.*?</h[1-6]>##gsi
Next we remove all occurrences of EMBED, OBJECT, SCRIPT, MARQUEE, and APPLET. These elements are all designed to embed external content, and are rarely used for plain text. In addition they are only used correctly, i.e. with proper alternative content, in an extremely low number of cases, so we can safely ignore them.
Form controls and tables are removed next. Both constructs will, in normal use, represent short bursts of content meant to be read in conjunction with other structures such as table headers and control labels.
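As a sketch, these removal steps might be written in the same substitution style as above, assuming well-nested, non-overlapping elements (the variable $html is ours):

for my $element (qw(embed object script marquee applet form table)) {
    $html =~ s#<\s*$element\b.*?>.*?</$element\s*>##gsi;
}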
Finally we replace lists (UL, OL and DL) with comment-based placeholders of the form:
<!-- placeholder elementname-index -->
This is done, once more, to remove the short bursts of content commonly kept in lists. Should this later prove to reduce the content to an unacceptable level, the lists will be restored.
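One way the replacement could be sketched, assuming a simple running counter per element name:

my %index;
for my $element (qw(ul ol dl)) {
    $html =~ s{<\s*$element\b.*?>.*?</$element\s*>}
              {'<!-- placeholder ' . $element . '-' . $index{$element}++ . ' -->'}gsie;
}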
When the markup is prepared, we convert it to plain text using the Lynx browser. There are two primary reasons for this choice: the browser is used in real life, and is very well adapted to handle even very poor-quality markup; and it is exceptionally good at creating plain-text versions of HTML.
lynx -nolist -force_html -dump -stdin
The output from Lynx is then used as the basis for a LIX calculation. Should the resulting score be higher than 50, the document in question is flagged as possibly in violation of the associated checkpoint, and left for the user to review.
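Glued together, these final steps might look like the sketch below, which pipes the prepared markup through Lynx via the core IPC::Open2 module and reuses the hypothetical lix helper from the earlier sketch:

use IPC::Open2;

# Feed the prepared markup to Lynx and read back the plain-text rendering.
my $pid = open2(my $lynx_out, my $lynx_in,
                'lynx', '-nolist', '-force_html', '-dump', '-stdin');
print {$lynx_in} $html;
close $lynx_in;

my $text = do { local $/; <$lynx_out> };
waitpid $pid, 0;

# Flag the document for human review when the score exceeds the threshold.
my $score = lix($text);
warn sprintf "Flag for review: LIX %.1f exceeds 50\n", $score if $score > 50;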
Final Words
The checkpoint described above is only one of a great many tests which siteSifter performs on websites daily. Some of these flag items for human review; others make objective and precise determinations on whether a document passes or not.
Although no readability algorithm can decide the quality of content, we find the LIX test a valuable tool for getting a rough idea of whether or not a page needs to be manually examined.
This applies in particular to situations where the team performing the accessibility review is not the same as the one publishing content, or where the review includes a great number of pages.
Notes
- The algorithms described are not applicable to ideographic writing systems such as Chinese.
- Even though Success Criterion 3.1.5 in WCAG 2.0 suggests that “proper names” can be removed before testing, we do not do so, since the majority of testing algorithms are not built with such a criterion in place.