Tetropy

by Tina Holmboe, 18th of June 2012

Abstract

In this companion piece to the upcoming article on EPUB creation, we’ll discuss using easily written markup languages for e–book creation, and Tetropy in particular.

Introduction

Creating an electronic book is in many ways similar to creating a printed one. Once the content is in place, a manuscript is prepared. After one or more iterations, during which editors and author work to iron out difficulties, that manuscript is sent to the printer.

In the electronic world the printer is a specialist who uses a digital tool to create an e–book. Once again the manuscript goes through a number of iterations, in which draft copies are sent back to the author and the editors, and changes are made.

One way of streamlining this process is to ensure that the source material is in a format conducive to fast, consistent processing.

A manuscript should be written in such a way that it can be automatically converted into an e–book. The less manual handling the better, as each step that involves a human operator adds to the processing time, and hence the cost.

Some e–book formats, such as EPUB, lend themselves very well to this practice. More on that in a later article.

This leaves the question of how to write the manuscript so that this automatic processing can be done. We’ve got a modest suggestion for a language which fits the purpose.

Please note that there are several different styles in use for manuscript formatting, and that the language that follows is an amalgamation of several systems and, of course, an entirely new system in its own right. The methodology described above also varies between publishers.

The Gory Details

Tetropy, from Greytower Technologies, is a markup language related to Markdown and Wikitext. Like HTML it uses plain text in which certain well–defined symbols — text strings — are embedded to indicate structure.

The main difference between Tetropy and the HTML family of markup languages is simplicity. Whereas HTML or XHTML would use:

<p>
 To Sherlock Holmes she is always THE woman. I have seldom heard
 him mention her under any other name. In his eyes she eclipses
 and predominates the whole of her sex.
</p>

<p>
 All emotions, and that one particularly, were abhorrent to his cold,
 precise but admirably balanced mind. He was, I take it, the most
 perfect reasoning and observing machine that the world has seen, but
 as a lover he would have placed himself in a false position.
</p>

Tetropy would use:

 To Sherlock Holmes she is always THE woman. I have seldom heard
 him mention her under any other name. In his eyes she eclipses
 and predominates the whole of her sex.

 All emotions, and that one particularly, were abhorrent to his cold,
 precise but admirably balanced mind. He was, I take it, the most
 perfect reasoning and observing machine that the world has seen, but
 as a lover he would have placed himself in a false position.

This, by necessity, makes it a less capable language than HTML, but in return the source code is eminently readable and very, very easy to write. The source of a Tetropy document will look very similar to a newspaper article, or a page from a book.

Plain text source documents, for example from Project Gutenberg, can subsequently be transformed into reasonably good — and accessible — markup very quickly. Some hands–on post–processing will be required for perfection, if indeed that is what you seek.

As you can imagine, this format is not designed for web applications or splash–pages, but rather for processing large amounts of text into reflowable markup for e–books, articles, and so forth.

Primary Syntax

Tetropy can be written in one of three ways, or in a combination of them. The primary syntax, so named because it is the one most users will see and take advantage of, is designed to be simple, and to resemble “normal” text as much as possible.

A sentence is a sequence of characters which ends in punctuation (dot, comma, exclamation mark, question mark, or right parenthesis). A paragraph is one or more sentences; paragraphs are separated from each other by an empty line.

Example:

 Tetropy can be written in one of three ways, or a combination of
 them. The primary syntax, so named because it is the one most users
 will see and take advantage of, is designed to be simple, and to
 resemble "normal" text as much as possible.

 A sentence is a sequence of characters which ends in punctuation
 (dot, comma, exclamation mark, question mark, or right parenthesis). A
 paragraph is one or more sentences; paragraphs are separated from each
 other by an empty line.
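To make the paragraph rule concrete, here is a minimal Python sketch, not the reference parser, that splits text on empty lines and wraps each block in a P element. The function name `paragraphs_to_html` is hypothetical, and headings, lists, tables and phrase markup are deliberately ignored.

```python
import html
import re


def paragraphs_to_html(source: str) -> str:
    """Wrap each empty-line-separated block of text in a <p> element.

    Sketch of the primary-syntax paragraph rule only; headings, lists,
    tables and phrase markup are left out.
    """
    # An empty line (possibly holding spaces or tabs) separates paragraphs;
    # several empty lines in a row count as a single separator.
    blocks = re.split(r"\n\s*\n", source.strip())
    out = []
    for block in blocks:
        # Join the wrapped source lines of one paragraph into a single line.
        text = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if text:
            out.append("<p>" + html.escape(text, quote=False) + "</p>")
    return "\n".join(out)
```

Fed the two Holmes paragraphs shown earlier, it yields two P elements like the HTML version, with each paragraph's lines joined into one.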

A section heading is a line which does not end in punctuation, and which is not part of a paragraph — a line standing alone, in other words.

Example:

Primary Syntax

Tetropy can be written in one of three ways, or a combination of
them. The primary syntax, so named because it is the one most users
will see and take advantage of, is designed to be simple, and to
resemble "normal" text as much as possible.

The phrase “Primary Syntax” will be deduced to be a heading, and marked up as such.
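The heading rule can be layered on top of the paragraph rule. The Python sketch below is our illustration, not the reference parser: `classify_block` and `blocks_to_html` are hypothetical names, the closing punctuation comes from the sentence definition above, and using H2 for deduced headings is an assumption, since the level isn't specified here. HTML escaping is omitted for brevity.

```python
import re

# Closing punctuation, per the definition of a sentence above.
SENTENCE_END = (".", ",", "!", "?", ")")


def classify_block(block: str) -> str:
    """Return "heading" or "paragraph" for one empty-line-separated block."""
    lines = [line.strip() for line in block.splitlines() if line.strip()]
    # A heading stands alone on a single line and does not end in punctuation.
    if len(lines) == 1 and not lines[0].endswith(SENTENCE_END):
        return "heading"
    return "paragraph"


def blocks_to_html(source: str, heading_level: int = 2) -> str:
    """Convert blocks to <hN> or <p>; heading_level is an assumed default."""
    out = []
    for block in re.split(r"\n\s*\n", source.strip()):
        text = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if not text:
            continue
        if classify_block(block) == "heading":
            tag = f"h{heading_level}"
        else:
            tag = "p"
        out.append(f"<{tag}>{text}</{tag}>")
    return "\n".join(out)
```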

URIs are automatically detected, including images and e–mail addresses: just type them normally, and the proper markup will be constructed. A URI such as http://www.google.com will be turned into a link. See the section on “Secondary Syntax” for more details on links, including how to prevent them from being turned into markup, how to add titles, how to embed images, and how to supply class and ID information.
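Automatic URI detection can be approximated with a regular expression. The sketch below is a simplification under stated assumptions: it recognises only `http`/`https` URIs and plain e-mail addresses, keeps trailing sentence punctuation outside the link, and does not treat image URIs specially. `autolink` is a hypothetical name, and full URI syntax (RFC 3986) is considerably richer than these patterns.

```python
import re

# Simplified patterns: an http(s) URI, or a bare e-mail address.
LINK_RE = re.compile(
    r"(?P<url>https?://[^\s<>\"]+)"
    r"|(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
)
# Sentence punctuation that usually follows, rather than belongs to, a URI.
TRAILING = ".,!?)"


def autolink(text: str) -> str:
    def replace(match: re.Match) -> str:
        if match.group("email"):
            address = match.group("email")
            return f'<a href="mailto:{address}">{address}</a>'
        url, trail = match.group("url"), ""
        # Move trailing punctuation out of the link.
        while url and url[-1] in TRAILING:
            url, trail = url[:-1], url[-1] + trail
        return f'<a href="{url}">{url}</a>{trail}'

    return LINK_RE.sub(replace, text)
```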

Simple data tables are written as you would expect, with data in rows and columns. All you need to remember is to start each line with a tab, and to place another tab between each “cell”.

The horizontal size of each tab does not matter, but make sure that your text editor does not turn them into sequences of regular space characters.

This example:

	a	b	c	d	e
	f	g	h	i	j
	k	l	m	n	o
	p	q	r	s	t

will result in this HTML table:

a	b	c	d	e
f	g	h	i	j
k	l	m	n	o
p	q	r	s	t
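The tab rule for tables is simple enough to sketch in a few lines of Python. This is our illustration, not the reference parser: `tab_table_to_html` is a hypothetical name, exactly one tab is assumed between cells, and lines that do not start with a tab are ignored.

```python
def tab_table_to_html(source: str) -> str:
    """Turn tab-indented, tab-separated lines into an HTML table."""
    rows = []
    for line in source.splitlines():
        if not line.startswith("\t"):
            continue  # only lines that start with a tab belong to the table
        cells = line[1:].split("\t")  # one tab between each cell
        rows.append("<tr>" + "".join(f"<td>{c.strip()}</td>" for c in cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```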

Lists follow the same principle as paragraphs, headers, and tables: write them as you normally would, one item per line. Use “*” as the bullet:

 * Banana
 * Carrots
 * Blueberries

• Banana
• Carrots
• Blueberries

or, if you need a numbered list:

 1. Banana
 2. Carrots
 3. Blueberries

1. Banana
2. Carrots
3. Blueberries
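Both list forms can be recognised line by line. In this Python sketch, which is ours rather than the reference parser's, `list_to_html` is a hypothetical name, and a block is treated as a list only if every non-empty line carries the same kind of marker.

```python
import re

BULLET_RE = re.compile(r"^\s*\*\s+(.+)$")     # "* item"
NUMBER_RE = re.compile(r"^\s*\d+\.\s+(.+)$")  # "1. item"


def list_to_html(block: str) -> str:
    """Turn a block of "*" or "1." items, one per line, into UL or OL."""
    lines = [line for line in block.splitlines() if line.strip()]
    for pattern, tag in ((BULLET_RE, "ul"), (NUMBER_RE, "ol")):
        matches = [pattern.match(line) for line in lines]
        # Every line must carry the same kind of marker.
        if lines and all(matches):
            items = "".join(f"<li>{m.group(1).strip()}</li>" for m in matches)
            return f"<{tag}>{items}</{tag}>"
    raise ValueError("block is not a list")
```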

A sequence of five or more hyphen or underscore characters forms a horizontal divider. This:


-----

becomes a horizontal divider (an HR element in the generated HTML).
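A Python sketch of the divider rule follows. Beyond what is stated above, it assumes that the divider stands alone on its line and consists of one kind of character (all hyphens or all underscores); `is_divider` and `replace_dividers` are hypothetical names, and the HTML 4.01 form `<hr>` is emitted.

```python
import re

# Five or more hyphens, or five or more underscores, alone on a line.
DIVIDER_RE = re.compile(r"^[ \t]*(?:-{5,}|_{5,})[ \t]*$")


def is_divider(line: str) -> bool:
    return DIVIDER_RE.match(line) is not None


def replace_dividers(source: str) -> str:
    """Replace every divider line with an <hr> element (HTML 4.01 form)."""
    return "\n".join("<hr>" if is_divider(line) else line for line in source.splitlines())
```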

E–book Specific Markup

Tetropy has a growing number of book–specific (or manuscript–specific) forms of markup. A sequence of three number signs (hash marks) alone on a line signifies a scene change. As a rule, Tetropy uses two–character codes, but in this case we’ve decided to introduce an exception for clarity.

###

As no equivalent markup exists in XHTML, it will most likely be expanded into a construction such as this:

 <p class="scene_change">&nbsp;</p>
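Expanding the scene change marker is a plain line substitution. In the Python sketch below (the name `expand_scene_changes` is hypothetical), only a line consisting of exactly three number signs, ignoring surrounding spaces, is replaced, using the construction shown above.

```python
# The marker, and the expansion shown above.
SCENE_MARKER = "###"
SCENE_HTML = '<p class="scene_change">&nbsp;</p>'


def expand_scene_changes(source: str) -> str:
    """Replace lines consisting only of ### with the scene change paragraph."""
    return "\n".join(
        SCENE_HTML if line.strip() == SCENE_MARKER else line
        for line in source.splitlines()
    )
```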

Many manuscript editors use underlined text to indicate that a specific portion of content is a character’s literal thoughts. Tetropy uses faux underlines to achieve the same:

"I wonder what that fellow is looking for?" I asked, pointing to a
stalwart, plainly-dressed individual who was walking slowly down the
other side of the street, looking anxiously at the numbers. He had a
large blue envelope in his hand, and was evidently the bearer of a
message.

"You mean the retired sergeant of Marines," said Sherlock Holmes.

__"Brag and bounce!"__ thought I to myself. __"He knows that I cannot verify his guess."__

In this case, the sections indicated will generate:

 <p>
  "I wonder what that fellow is looking for?" I asked, pointing to a
  stalwart, plainly-dressed individual who was walking slowly down the
  other side of the street, looking anxiously at the numbers. He had a
  large blue envelope in his hand, and was evidently the bearer of a
  message.
 </p>

 <p>
  "You mean the retired sergeant of Marines," said Sherlock Holmes.
 </p>

 <p>
  <span class="literal_thought">"Brag and bounce!"</span> thought I to
  myself. <span class="literal_thought">"He knows that I cannot verify
  his guess."</span>
 </p>

It is then up to the associated stylesheet to render thoughts in a suitable fashion — often this is done using italics.
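The faux underline can be matched with a non-greedy regular expression, so that two thoughts on one line stay separate. This Python sketch (the name `mark_thoughts` is hypothetical) produces the SPAN construction shown above; it does no escaping and does not handle nested markup.

```python
import re

# Non-greedy, so that two thoughts on the same line are matched separately.
THOUGHT_RE = re.compile(r"__(.+?)__", re.DOTALL)


def mark_thoughts(text: str) -> str:
    """Wrap __...__ spans in the literal_thought SPAN shown above."""
    return THOUGHT_RE.sub(r'<span class="literal_thought">\1</span>', text)
```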

Raw Markup

You can also use markup directly. An implementation of Tetropy should leave such HTML untouched rather than converting it.

The PRE element is considered special, but differs somewhat from its HTML cousin. In Tetropy, PRE signifies content which is also code, so that it is possible, for example, to include literal markup.
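One way to honour these rules is to split the text into raw markup and ordinary text, and to run the Tetropy conversion on the ordinary parts only. The Python sketch below is an assumption about how this could be done, not a description of the reference parser: `convert_outside_markup` is a hypothetical name, whole PRE elements are passed through untouched, and how markup inside PRE is then escaped for display is not covered.

```python
import re
from typing import Callable

# Whole PRE elements first, then any other single tag, count as raw markup.
RAW_MARKUP_RE = re.compile(r"(<pre\b.*?</pre>|<[^>]+>)", re.DOTALL | re.IGNORECASE)


def convert_outside_markup(text: str, convert: Callable[[str], str]) -> str:
    """Apply `convert` to ordinary text only, passing raw markup through."""
    parts = RAW_MARKUP_RE.split(text)
    # With a capturing group, re.split puts the matched markup at odd indexes.
    return "".join(part if i % 2 else convert(part) for i, part in enumerate(parts))
```

Text between tags, such as the content of a B element, is still converted; only the tags themselves and complete PRE elements are protected.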

Secondary Syntax

The secondary syntax is a tad more complicated, and requires you to remember a number of so–called meta–characters. I won’t use that term ever again :)

Inside a paragraph, or a header, you can add meaning to words or phrases by using the following techniques:

 //phrase//            - The phrase is emphasised
 **phrase**            - The phrase is strongly emphasised
 = phrase =            - A first level heading
 == phrase ==          - A second level heading
 === phrase ===        - A third level heading
 ==== phrase ====      - A fourth level heading
 ===== phrase =====    - A fifth level heading
 ====== phrase ======  - A sixth level heading
 ||phrase||            - The phrase is marked as 'deleted text'
 ++phrase++            - The phrase is marked as 'inserted text'
 %%phrase%%            - The phrase is marked as a citation
 __phrase__            - The phrase is a 'literal thought'
 ~~phrase~~            - The phrase is subscripted
 ^^phrase^^            - The phrase is superscripted
 ``phrase``            - The phrase is code

This is a mostly useless paragraph containing a number of phrases marked up using secondary syntax or Tetropy’s phrase markup syntax; specifically all of them after another. It’s followed by six hyphens in a row to create a horizontal divider.

Example:

 This //is// a **mostly** useless paragraph containing a number of
 ||phrases|| marked up using ++secondary++ syntax or %%Tetropy's%%
 phrase markup syntax; specifically ~~all~~ of ^^them^^ after
 ``another``. It's followed by six hyphens in a row to create a
 horizontal divider.
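Because each phrase marker is a doubled character, the table above translates naturally into a list of substitutions. The Python sketch below applies them one after another. The HTML element chosen for each marker (EM, STRONG, DEL, INS, CITE, SUB, SUP, CODE and the literal_thought SPAN) is our assumption based on the descriptions in the table, `phrase_markup` is a hypothetical name, and headings are left out because they are line-level constructs. A `//` directly after a colon is skipped so that `http://` survives, a simplification of whatever URI protection a real parser does.

```python
import re

# Assumed mapping from Tetropy phrase delimiters to HTML start and end tags.
PHRASE_MARKUP = [
    ("**", "<strong>", "</strong>"),
    ("//", "<em>", "</em>"),
    ("||", "<del>", "</del>"),
    ("++", "<ins>", "</ins>"),
    ("%%", "<cite>", "</cite>"),
    ("__", '<span class="literal_thought">', "</span>"),
    ("~~", "<sub>", "</sub>"),
    ("^^", "<sup>", "</sup>"),
    ("``", "<code>", "</code>"),
]


def phrase_markup(text: str) -> str:
    """Replace each delimiter pair with its element, one delimiter at a time."""
    for delim, start, end in PHRASE_MARKUP:
        d = re.escape(delim)
        # Do not treat the "//" of "http://" as an emphasis marker.
        guard = r"(?<!:)" if delim == "//" else ""
        pattern = re.compile(guard + d + r"(.+?)" + d)
        text = pattern.sub(lambda m, s=start, e=end: s + m.group(1) + e, text)
    return text
```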

Links can be controlled in more detail by prefixing them with + and -. If you, for example, write an image URI as +URI, then code will be created which embeds the image instead of linking to it. If you write any URI as -URI, then it’ll be included literally, and not converted to a link.

Example:

http://www.greytower.net                           - creates a link
-http://www.greytower.net                          - is used literally
http://www.greytower.net/images/wcag1AAA-blue.png  - creates a link
+http://www.greytower.net/images/wcag1AAA-blue.png - embeds the image

In the above, http://www.greytower.net/images/wcag1AAA-blue.png will create a link to the image, while adding a ‘+’ in front of the URI will embed it.
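The prefixes can be handled at the point where a URI is found. The Python sketch below assumes that a prefix only counts at the start of a word, that an image is recognised by its file extension (the list of extensions is our assumption), and that a `+` in front of a non-image URI simply produces a link; `convert_uris` is a hypothetical name. An empty ALT text is added to the image, in line with what the Output section says about IMG elements.

```python
import re

# Assumed list of extensions that mark a URI as an image.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg")
# An optional + or - prefix, only at the start of a word, then an http(s) URI.
URI_RE = re.compile(r"(?<!\S)([+-]?)(https?://\S+)")


def _convert(match: re.Match) -> str:
    prefix, uri = match.group(1), match.group(2)
    if prefix == "-":
        return uri  # used literally: drop the prefix, emit no markup
    if prefix == "+" and uri.lower().endswith(IMAGE_EXTENSIONS):
        return f'<img src="{uri}" alt="">'  # embed instead of linking
    return f'<a href="{uri}">{uri}</a>'


def convert_uris(text: str) -> str:
    return URI_RE.sub(_convert, text)
```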

Abbreviations, Acronyms, Quotations, and so forth

Certain constructions used within paragraphs are of a more complex nature. Take, for example, the abbreviation “URI”, used above. It expands to the phrase “Uniform Resource Identifier”, and should properly be written as

<abbr title="Uniform Resource Identifier">URI</abbr>

In Tetropy, you’d write:

 A URI (+Uniform Resource Identifier+) is a string which ...

or, in other words, the way you would usually write an abbreviation and its meaning in regular text, but with the addition of the “+” character inside the parentheses.

You can do the same for acronyms:

 I've spent some time studying
 COBOL (-Common Business Oriented Language-)

and inline quotations:

 Do you recall the line "the mirror crack'd" (*http://en.wikipedia.org/wiki/The_Lady_of_Shalott*) 
 by that famous poet Whatshisname?
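All three constructions follow the same pattern: a word or quoted phrase, then a parenthesised note whose first and last characters select the element. In this Python sketch the elements (ABBR, ACRONYM, and Q with a CITE attribute) are our assumption, the abbreviated word is taken to be the single word right before the parenthesis, and `inline_constructs` is a hypothetical name.

```python
import html
import re

ABBR_RE = re.compile(r"(\S+)\s+\(\+(.+?)\+\)")          # URI (+...+)
ACRONYM_RE = re.compile(r"(\S+)\s+\(-(.+?)-\)")         # COBOL (-...-)
QUOTE_RE = re.compile(r'"([^"]+)"\s*\(\*(\S+?)\*\)')    # "quote" (*URI*)


def inline_constructs(text: str) -> str:
    text = ABBR_RE.sub(
        lambda m: f'<abbr title="{html.escape(m.group(2))}">{m.group(1)}</abbr>', text)
    text = ACRONYM_RE.sub(
        lambda m: f'<acronym title="{html.escape(m.group(2))}">{m.group(1)}</acronym>', text)
    # The Q element supplies its own quotation marks, so the straight quotes go.
    text = QUOTE_RE.sub(
        lambda m: f'<q cite="{html.escape(m.group(2))}">{m.group(1)}</q>', text)
    return text
```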

Tables

Table headers can be created by adding a circumflex (or “tophat”) character before and after the content of each cell you want to make into a header.

Example:

	^a^	^b^	^c^	^d^	^e^
	f	g	h	i	j
	k	l	m	n	o
	p	q	r	s	t

will produce:

a	b	c	d	e
f	g	h	i	j
k	l	m	n	o
p	q	r	s	t
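Recognising header cells only requires checking each cell for a leading and trailing circumflex before it is emitted. A hypothetical Python illustration follows: `cell_to_html` and `row_to_html` are our names, and the THEAD/TBODY wrapping described under Output is left out.

```python
import re

# A header cell is wrapped in circumflexes: ^text^
HEADER_RE = re.compile(r"\^(.*)\^")


def cell_to_html(cell: str) -> str:
    cell = cell.strip()
    match = HEADER_RE.fullmatch(cell)
    if match:
        return f"<th>{match.group(1).strip()}</th>"
    return f"<td>{cell}</td>"


def row_to_html(line: str) -> str:
    """Convert one tab-indented, tab-separated line into a table row."""
    body = line[1:] if line.startswith("\t") else line
    return "<tr>" + "".join(cell_to_html(c) for c in body.split("\t")) + "</tr>"
```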

Tertiary — or Advanced — Syntax

Advanced Usage: Links

If you want even finer control over links, that is also possible. The general syntax is as follows:

(([URI][Link Text][Attributes]))

  will generate <a href="URI" Attributes>Link Text</a>



  will generate <img src="URI" alt="alt" Attributes>

  while this



  will generate <a href="URI 2"><img src="URI 1" Attributes></a>

You can, for example, embed an image and include both an ALT text and style information:

(({http://www.greytower.net/images/wcag1AAA-blue.png} {} {alt="WCAG 1.0 'AAA' Compliant" style="display: block ; margin-left: auto ; margin-right: auto ;"}))

which will yield the embedded image, with the ALT text “WCAG 1.0 'AAA' Compliant” and the given style information.

Advanced Usage: Tables

Tetropy follows a similar philosophy to DokuWiki when it comes to tables. Each cell is separated from the next by a pipe (“|”) symbol, and each row from the next by a single newline (not two!).

Example:

|  ^Heading 1^  |  ^Heading 2^  |  ^Heading 3^ |
| Row 1 Col 1   | Row 1 Col 2   | Row 1 Col 3  |
| Row 2 Col 1   | Row 2 Col 2   | Row 2 Col 3  |
| Row 3 Col 1   | Row 3 Col 2   | Row 3 Col 3  |

will produce:

Heading 1	Heading 2	Heading 3
Row 1 Col 1	Row 1 Col 2	Row 1 Col 3
Row 2 Col 1	Row 2 Col 2	Row 2 Col 3
Row 3 Col 1	Row 3 Col 2	Row 3 Col 3
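A minimal Python sketch of the pipe syntax: rows are lines that begin and end with a pipe, cells are split on the pipes in between, and circumflexed cells become TH. `pipe_table_to_html` is a hypothetical name, and a pipe inside a cell (including the `||` deleted-text marker) would break this naive split.

```python
import re

# A header cell is wrapped in circumflexes: ^text^
HEADER_RE = re.compile(r"\^(.*)\^")


def pipe_table_to_html(source: str) -> str:
    """Convert |-delimited rows (one per line) into an HTML table."""
    rows = []
    for line in source.splitlines():
        line = line.strip()
        if len(line) < 2 or not (line.startswith("|") and line.endswith("|")):
            continue  # not a table row
        cells = []
        for cell in line[1:-1].split("|"):
            cell = cell.strip()
            match = HEADER_RE.fullmatch(cell)
            cells.append(f"<th>{match.group(1)}</th>" if match else f"<td>{cell}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```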

Advanced Usage: Classes and IDs

In some circumstances you might want to add classes and/or IDs to elements. This can be done as follows:

 =={foo} This is a header ==

will produce:

 <h2 id="foo">This is a header</h2>

and:

 (bar)To Sherlock Holmes she is always THE woman. I have seldom heard
 him mention her under any other name. In his eyes she eclipses
 and predominates the whole of her sex.

would produce:

 <p class="bar">
  To Sherlock Holmes she is always THE woman. I have seldom heard
  him mention her under any other name. In his eyes she eclipses
  and predominates the whole of her sex.
 </p>
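Both forms can be picked up with a regular expression at the start of a heading or paragraph. The Python sketch below is our illustration: `heading_to_html` and `paragraph_to_html` are hypothetical names, IDs and class names are assumed to consist of letters, digits, underscores and hyphens, and a class is only recognised when the text follows the closing parenthesis directly, as in the example, so that an ordinary parenthetical remark followed by a space is left alone.

```python
import re

# Opening run of = signs (the level), optional {id}, text, matching closing run.
HEADING_RE = re.compile(r"^(={1,6})(?:\{([\w-]+)\})?\s*(.*?)\s*\1$")
# A (class) prefix directly followed by text.
CLASS_RE = re.compile(r"^\(([\w-]+)\)(?=\S)")


def heading_to_html(line: str) -> str:
    match = HEADING_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a heading: {line!r}")
    level, ident, text = len(match.group(1)), match.group(2), match.group(3)
    id_attr = f' id="{ident}"' if ident else ""
    return f"<h{level}{id_attr}>{text}</h{level}>"


def paragraph_to_html(block: str) -> str:
    text = " ".join(line.strip() for line in block.splitlines() if line.strip())
    match = CLASS_RE.match(text)
    if match:
        return f'<p class="{match.group(1)}">{text[match.end():]}</p>'
    return f"<p>{text}</p>"
```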

Advanced Usage: HTML

Tetropy will do its very best to recognise markup and avoid touching it. Even so, we suggest you use as little of it as possible, as it may interfere with the selected output filter.

Using elements from HTML 3.2 that don’t exist in XHTML 1.1, for example, would have consequences for the output, and could in a worst–case scenario make the markup not well–formed.

Or just plain ugly …

Output

The exact markup produced from a Tetropy–format document depends on the output–filter chosen — and, of course, the parser. At the moment our reference parser can write HTML 4.01 Strict, XHTML 1.1 and EPUB 2 – the latter being a specialised form of XHTML.

The default output is HTML 4.01 Strict, where paragraphs create P–elements, emphasis EM, inline quotations Q, and so forth.

For XHTML 1.1, underlining is transformed into a SPAN element (in lowercase) with a STYLE attribute. Tetropy will attempt to follow the specifications it supports as closely as possible, both syntactically and semantically.

The THEAD and TBODY elements are produced for both HTML 4.01 Strict and XHTML 1.1, provided the table contains TH elements.

An empty ALT–text is added to IMG elements if none is present.

As far as possible, the reference parser attempts to produce accessible markup. You will have noticed, of course, that Tetropy has no physical markup for font changes or colour. All stylistic information is assumed to be in stylesheets.

Notes

If the text file uses tab or white–space indentation to separate paragraphs, this will be normalised.
Normalisation of indentation will not be done if the text contains raw markup.

Why “Tetropy”?

It’s a pun. Seriously. Don’t worry about it. Here, take a cookie. I promise, by the time you’re done eating it, you’ll feel as right as rain.

Conclusion

What, then, are the advantages to this type of markup language?

It’s simple. The resulting document will look very much like a traditional manuscript as sent to a publisher.
It’s not tied to a certain application or platform. You can write Tetropy in whichever editor or word processor you prefer, as long as it can save as plain UTF–8 text.
The manuscript, once done, can be used again and again to produce a number of different formats (HTML, XHTML, etc.) without manual conversion.
The manuscript is future–proof to a large degree — it’s plain text, nothing more, nothing less.
Did we remember to mention that it is simple? :) Easy to write, easy to read, easy to understand. Most of the codes involved you are likely already familiar with from other contexts.

It is this very simplicity (the fact that you can write an entire book without doing anything more than putting a blank line between paragraphs) which has made Tetropy our format of choice for articles, news, blog entries and internal documentation.

It could also work very well for books.