Discount

Discount is free software released under the terms of a BSD-style license.

If you find it useful, please consider making a contribution to help support onward development.

donate?

download

Discount 2.2.7d, released 22-Aug-2023

description

This is my implementation of John Gruber’s Markdown text to html language. There’s not much here that differentiates it from any of the existing Markdown implementations except that it’s written in C instead of one of the vast flock of scripting languages that are fighting it out for the Perl crown.

Markdown provides a library that gives you formatting functions suitable for marking down entire documents or lines of text, a command-line program that you can use to mark down documents interactively or from a script, and a tiny (3 programs so far) suite of example programs that show how to fully utilize the markdown library.

My markdown also does, by default, various smartypants-style substitutions.

The program

The markdown program is a trivial compiler that reads in a markdown file and writes out a html document or – if you use the -d flag – an outline showing the parse tree. It has a few options:

-d : is, as previously mentioned, the flag that makes markdown produce a parse tree instead of a html document.

-F <flags> : sets various flags that change how markdown works. The flags argument is a somewhat less than obvious bitmask – for example, -F 0x4 tells markdown to not do the smartypants translations on the output. (There are cases – like running the test suite – where this is a useful feature.)

-f <flags> : sets various flags that change how markdown works. Unlike -F, these are the names of the flags (you can get a list of the supported flags with the -f? option; supported flags + synonyms with -Vf?) optionally prefixed with no or - to turn them off. To reuse the example to disable smartypants, you’d do -f nopants (“pants” is a synonym for “smarty” == smartypants.)

-o file : tells markdown to write the output to file.

-V : tells you a markdown version number and how the package was configured. For example

    $ markdown -V
    markdown: discount 2.2.2 TAB=8 DEBUG

tells you that this is markdown 2.2.2, and that the package
was configured with support for sensible tabs & debugging
malloc.

-VV : is like -V, except it also returns the current values of many of the flags that can be set with -f or -F.

The library

This page lists 22 public functions in the markdown library, broken into five categories:

Input functions

  1. MMIOT *mkd_in(FILE *f, int flags) reads a markdown input file and returns a MMIOT containing the preprocessed document (which is then fed to markdown() for final formatting).

  2. MMIOT *mkd_string(char *bfr, int size, int flags) reads the markdown input file that’s been written into bfr and returns a preprocessed blob suitable for feeding to markdown(). This function exists because annotations uses mmap() to access message files instead of traditional file i/o. (If you’re going to port Markdown to an AS/400, this function is the droid you’ve been looking for.)

“Big Picture”-style processing functions

  1. int markdown(MMIOT *doc, FILE *out, int flags) formats a document (created with mkd_in() or mkd_string()) and writes the resulting HTML document to out.

  2. int mkd_line(char *bfr, int size, char **out, int flags) allocates a buffer, then formats the text string into that buffer. The differences from markdown() are that it doesn’t support quoting, footnotes (“reference links”), multiple paragraphs, lists, code sections, or pure html sections.

  3. int mkd_generateline(char *bfr, int size, FILE *out, int flags) formats the text string and writes the resulting HTML fragment to out. It is exactly like mkd_line() except that it writes the output to a FILE*.

Fine-grained access to the internals

  1. int mkd_compile(MMIOT *doc, int flags) takes a document created by mkd_in() or mkd_string() and compiles it into a tree of block elements.

  2. int mkd_generatehtml(MMIOT *doc, FILE *out) generates html from a compiled document.

  3. int mkd_document(MMIOT *doc, char **text) returns (in text) a pointer to the compiled html document, and (in the return code) the size of that document.

  4. int mkd_css(MMIOT *doc, char **out) allocates a buffer and populates it with any style blocks found in the document.

  5. int mkd_generatecss(MMIOT *doc, FILE *out) prints any style blocks in the document.

  6. int mkd_toc(MMIOT *doc, char **out) allocates a buffer, populates it with a table of contents, assigns it to out, and returns the length of the buffer.

    To get a table of contents, you must mkd_compile() the document with the MKD_TOC flag (described below).

  7. int mkd_generatetoc(MMIOT *doc, FILE *out) writes a table of contents to out; other than writing to a FILE*, it operates exactly like mkd_toc()

  8. int mkd_dump(MMIOT *doc, FILE *f, int flags, char *title) prints a block structure diagram of a compiled document.

  9. void mkd_cleanup(MMIOT *doc) releases the MMIOT allocated for the document.

Document header access functions

  1. char *mkd_doc_title(MMIOT *doc) returns the % title line.
  2. char *mkd_doc_author(MMIOT *doc) returns the % author(s) line.
  3. char *mkd_doc_date(MMIOT *doc) returns the % date line.

Url callback functions

  1. void mkd_e_url(MMIOT*, char* (callback)(char*,int,void*)) sets up a callback function that is called whenever discount processes a []() or <link> construct. The callback function is passed a pointer to the url, the size of the url, and a data pointer (null or supplied by mkd_e_data())
  2. void mkd_e_flags(MMIOT*, char *(callback)(char*,int,void*)) sets up a callback to provide additional arguments to the tags generated by []() and <link> constructs. If, for instance, you wanted to add target="_blank" to every generated url, you could just make a callback function that returned that string.
  3. void mkd_e_code(MMIOT*, char *(callback)(char*,int,void*)) sets up a callback to format the contents of a code block.
  4. void mkd_e_free(char *, void*) is called to free any allocated memory returned by the url or flags callbacks.
  5. void mkd_e_data(MMIOT*, void*) assigns a callback data area to the url & flags callbacks.
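As a rough illustration of the callback contract described above, here is a minimal sketch of a flags callback and its matching free function. It does not link against the library: the `sketch_callback` typedef and the names `add_target` and `release` are hypothetical, and the callback shape is taken from the signatures listed on this page (the const-qualification in the real header may differ).

```c
#include <stdlib.h>
#include <string.h>

/* Callback shape as listed above: url text, its size, a data pointer.
 * This typedef is local to the sketch, not taken from the library header. */
typedef char *(*sketch_callback)(char *, int, void *);

/* Flags callback: return extra attributes for the generated tag.
 * The string is heap-allocated so the free callback can release it.
 * If a data pointer was supplied, it is treated as the attribute text. */
static char *add_target(char *url, int size, void *data)
{
    const char *attr = data ? (const char *)data : "target=\"_blank\"";
    char *copy = malloc(strlen(attr) + 1);

    (void)url;    /* this sketch adds the same attribute to every link */
    (void)size;
    if (copy)
        strcpy(copy, attr);
    return copy;
}

/* Free callback: release whatever add_target() returned. */
static void release(char *text, void *data)
{
    (void)data;
    free(text);
}

/* With the library linked in, registration would presumably look like:
 *     mkd_e_flags(doc, add_target);
 *     mkd_e_free(doc, release);
 *     mkd_e_data(doc, NULL);
 */
```

Returning allocated memory and pairing it with a free callback keeps ownership clear: the library never has to guess whether a returned string is static or heap-allocated.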

The flags argument in markdown(), mkd_text(), mkd_in(), mkd_string(), mkd_compile(), and mkd_generatehtml() is a mask of the following flag bits:

Flag | Action
-----|-------
MKD_NOLINKS | Don’t do link processing, block <a> tags
MKD_NOIMAGE | Don’t do image processing, block <img>
MKD_NOPANTS | Don’t run smartypants()
MKD_NOHTML | Don’t allow raw html through AT ALL
MKD_STRICT | Disable SUPERSCRIPT, RELAXED_EMPHASIS
MKD_TAGTEXT | Process text to go inside an html tag; no emphasis or html expansion & embedded html will be stripped out.
MKD_NO_EXT | Don’t allow pseudo-protocols
MKD_CDATA | Generate code for xml ![CDATA[...]]
MKD_NOSUPERSCRIPT | No A^B
MKD_NORELAXED | Emphasis happens everywhere
MKD_NOTABLES | Don’t process PHP Markdown Extra tables.
MKD_NOSTRIKETHROUGH | Forbid ~~strikethrough~~
MKD_TOC | Do table-of-contents processing
MKD_1_COMPAT | Compatibility with MarkdownTest_1.0
MKD_AUTOLINK | Make http://foo.com a link even without <>s
MKD_SAFELINK | Paranoid check for link protocol
MKD_NOHEADER | Don’t process document headers
MKD_TABSTOP | Expand tabs to 4 spaces
MKD_NODIVQUOTE | Forbid >%class% blocks
MKD_NOALPHALIST | Forbid alphabetic lists
MKD_NODLIST | Forbid definition lists
MKD_EXTRA_FOOTNOTE | Enable PHP Markdown Extra-style footnotes (warning: not the later version of multiple-paragraph ones.)
MKD_NOSTYLE | Don’t extract <style> blocks
MKD_NODLDISCOUNT | Disable discount-style definition lists
MKD_DLEXTRA | Enable PHP Markdown Extra definition lists
MKD_FENCEDCODE | Enable Github-style fenced code blocks.
MKD_GITHUBTAGS | Allow dashes & underscores in element names
MKD_HTML5ANCHOR | Use the html5 namespace for anchor names
MKD_LATEX | Enable embedded LaTeX (mathjax-style)
MKD_EXPLICITLIST | Don’t merge adjacent numbered/bulleted lists
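Because the flags argument is a bitmask, individual flags can be tested with a bitwise AND and two masks can be compared with XOR. The sketch below uses only the two numeric values quoted on this page (0x4 for disabling smartypants and 0x2000 for MKD_1_COMPAT); the `SKETCH_` names and both helper functions are hypothetical and are not the library's own definitions.

```c
/* Only the two values quoted on this page are used here; the names are
 * local to this sketch, not the real header definitions. */
#define SKETCH_NOPANTS  0x0004u   /* "-F 0x4": no smartypants */
#define SKETCH_1_COMPAT 0x2000u   /* MKD_1_COMPAT (0x2000) */

/* Return nonzero when every bit of `flag` is set in `mask`. */
static int flag_is_set(unsigned int mask, unsigned int flag)
{
    return (mask & flag) == flag;
}

/* Return the bits that are set in exactly one of the two masks. */
static unsigned int flag_difference(unsigned int a, unsigned int b)
{
    return a ^ b;
}
```

Applied to the two MARKDOWN_FLAGS values used for the test suite runs further down this page, `flag_difference(0x20004u, 0x22004u)` yields 0x2000, so the only difference between the two runs is the MKD_1_COMPAT bit.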

Language bindings

I have an experimental C++ binding that lives on Github in mkdio.h++. It implements a couple of RAII objects; MKIOT – can’t call the class MMIOT because it clashes with the C MMIOT it wraps – for standard markdown (plus my extensions, of course) and GFIOT for github-flavo(u)red markdown. Alas, it is undocumented, but the mkdio.h++ header file is pretty simple and a trivial program that uses it is included in the mkdio.h++ sccs tree.

Smartypants substitutions

  1. ``text'' is translated to “text”.
  2. "double-quoted text" becomes “double-quoted text”
  3. 'single-quoted text' becomes ‘single-quoted text’
  4. don't is “don’t.” as well as anything-else’t. (But foo'tbar is just foo'tbar.)
  5. And it's is “it’s,” as well as anything-else’s (except not foo'sbar and the like.)
  6. (tm) becomes ™
  7. (r) becomes ®
  8. (c) becomes ©
  9. 1/4th becomes ¼th. Ditto for 1/4 (¼), 1/2 (½), 3/4ths (¾ths), and 3/4 (¾).
  10. ... becomes …
  11. . . . also becomes …
  12. --- becomes —
  13. -- becomes –
  14. A^B becomes A<sup>B</sup>. Complex superscripts can be enclosed in ()s, so A^(B+2) becomes A<sup>B+2</sup>.
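To make the ordering issue in the list above concrete (--- has to be matched before --), here is a minimal sketch of a table-driven substitution pass. It is not discount's implementation: `sketch_rules` and `sketch_smarty` are hypothetical names, only a handful of the substitutions are covered, and writing the results as HTML entities is this sketch's choice.

```c
#include <stddef.h>
#include <string.h>

/* A few of the substitutions listed above, as (source, HTML entity) pairs.
 * Longer patterns come first so "---" is not consumed as "--" plus "-". */
static const struct { const char *from, *to; } sketch_rules[] = {
    { ". . .", "&hellip;" },
    { "...",   "&hellip;" },
    { "---",   "&mdash;"  },
    { "--",    "&ndash;"  },
    { "(tm)",  "&trade;"  },
    { "(r)",   "&reg;"    },
    { "(c)",   "&copy;"   },
};

/* Copy `in` to `out`, applying the rules. Returns 0 on success, or -1 if
 * `out` (of `outsize` bytes, including the NUL) is too small. */
static int sketch_smarty(const char *in, char *out, size_t outsize)
{
    const size_t nrules = sizeof sketch_rules / sizeof sketch_rules[0];
    size_t used = 0;

    if (outsize == 0)
        return -1;
    while (*in) {
        const char *piece = NULL;
        size_t eat = 1, len = 1, i;

        /* First matching rule wins; the table order encodes priority. */
        for (i = 0; i < nrules; i++) {
            size_t n = strlen(sketch_rules[i].from);
            if (strncmp(in, sketch_rules[i].from, n) == 0) {
                piece = sketch_rules[i].to;
                eat = n;
                len = strlen(piece);
                break;
            }
        }
        if (used + len + 1 > outsize)
            return -1;
        if (piece)
            memcpy(out + used, piece, len);
        else
            out[used] = *in;   /* no rule matched: copy one character */
        used += len;
        in += eat;
    }
    out[used] = '\0';
    return 0;
}
```

Quote handling and the 't / 's rules need context (what precedes and follows the quote), so they are deliberately left out of this table-only sketch.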

Language extensions

My markdown was written so I could replace the fairly gross homemade text to html prettifier that I wrote for annotations, so I’ve extended it in a few ways; I’ve put support for paragraph centering in so that I don’t have to hand enter the <center> and </center> tags (nowadays I generate a css-styled <p> block, because that’s xhtml compatible instead of the now-deprecated <center> block element.) I’ve added support for specifying image sizes, and I’ve written a not-earthshatteringly-horrible markup extension for definition lists.

Paragraph centering : To center a paragraph, frame it with -> and <-.

    ->this is a test<-

produces a css-styled <p> block that centers “this is a test”.

Specifying image sizes : An image size is defined by adding an additional =widthxheight field to the image tag:

    ![dust mite](url =widthxheight)

produces an <img> for “dust mite” with that width and height.

Definition lists : To mark up a definition list, left-justify the label and frame it with = characters, then put the body of the list item on the next line, indented 4 spaces.

    =hey!=
        This is a definition list

produces

    <dt>hey!</dt>
    <dd>This is a definition list</dd>

A definition list label is just a regular line of markdown code, so you can put links and images into it.

In discount 1.2.3, the definition list syntax has been extended so that you can define sequential <dt> blocks by doing

    =tag1=
    =tag2=
        data.

which generates

    <dt>tag1</dt>
    <dt>tag2</dt>
    <dd>data.</dd>

If you want a definition list with a trailing empty tag, give it a body that’s just a html comment:

    =placeholder!=
        <!-- this space intentionally left blank -->

produces

    <dt>placeholder!</dt>
    <dd><!-- this space intentionally left blank --></dd>

In discount 2.0.4 I extended the definition list syntax to allow PHP Markdown Extra definition lists, which means that source like

    tag1
    : data

now generates

    <dt>tag1</dt>
    <dd>data</dd>

Alpha lists : Ordered lists with alphabetic labels (enabled by --enable-alpha-list during configuration) are supported in the same way that numeric ordered lists are:

    a. first item
    b. second item

generates

 a. first item
 b. second item

New pseudo-protocols for [] links : I wanted to be able to apply styles inline without having to manually enter the <span class="xxx"></span> html. So I redid the [][] code to support some new “protocols” within my markdown:

abbr:description : The label will be wrapped by <abbr title="description">...</abbr>

class:name : The label will be wrapped by <span class="name">...</span>

id:name : The label will be wrapped by <a id="name">...</a>

raw:text : Text will be written verbatim to the output. The protocol was inspired by a short thread on the markdown mailing list about someone wanting to embed LaTeX inside <!-- --> and finding, to their distress, that markdown mangled it.

Passing text through in comments seems to be a path to unreadable madness, so I didn’t want to do that. This is, to my mind, a better solution.

Style blocks : accept <style></style> blocks and set them aside for printing via mkd_style().

Class blocks : A blockquote with a first line of > %class% will become <div class="class"> instead of a <blockquote>.

Tables : PHP Markdown Extra-style tables are supported:

     aaa | bbbb
    -----|------
    hello|sailor

becomes the following table:

 aaa | bbbb
-----|------
hello|sailor

And much of the rest of the current table syntax (alignment, handling of orphan columns) follows the PHP Markdown Extra spec.

Document Headers : Pandoc-style document headers are supported; if the first three lines in the document begin with a % character, they are taken to be a document header in the form of

    % Document title
    % Document author
    % Document date

and can be retrieved by the library functions mkd_doc_title(), mkd_doc_author(), and mkd_doc_date().

Note that I implement Pandoc document headers as they were documented in 2008; any Pandoc changes since then will not be reflected in my implementation.
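The three-line rule above can be sketched without the library. This is a minimal illustration, assuming each header field fits on one line and that all three lines must begin with %; `struct sketch_header`, `sketch_header_line()` and `sketch_parse_header()` are hypothetical names, and this is not how discount itself parses headers.

```c
#include <stddef.h>
#include <string.h>

/* Fixed-size fields keep the sketch free of allocation. */
struct sketch_header {
    char title[128];
    char author[128];
    char date[128];
};

/* Copy the text of one "% ..." line (without the marker, leading blanks,
 * or newline) into dst. Returns a pointer to the start of the next line,
 * or NULL if the line does not begin with '%'. */
static const char *sketch_header_line(const char *p, char *dst, size_t dstsize)
{
    size_t linelen, n;

    if (*p != '%')
        return NULL;
    p++;
    while (*p == ' ' || *p == '\t')
        p++;
    linelen = strcspn(p, "\n");
    n = linelen < dstsize ? linelen : dstsize - 1;   /* truncate long fields */
    memcpy(dst, p, n);
    dst[n] = '\0';
    p += linelen;
    return *p == '\n' ? p + 1 : p;
}

/* Return 1 and fill `h` if the first three lines all begin with '%'.
 * Return 0 otherwise; `h` may then be partially filled and should be ignored. */
static int sketch_parse_header(const char *doc, struct sketch_header *h)
{
    const char *p = doc;

    if ((p = sketch_header_line(p, h->title, sizeof h->title)) == NULL)
        return 0;
    if ((p = sketch_header_line(p, h->author, sizeof h->author)) == NULL)
        return 0;
    if ((p = sketch_header_line(p, h->date, sizeof h->date)) == NULL)
        return 0;
    return 1;
}
```

A document with only one or two % lines is rejected by this sketch, which matches the "first three lines" wording above.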

Fenced code blocks : If called with the MKD_FENCEDCODE option, Pandoc-style fenced code blocks are supported; blocks of code wrapped in ~~~ lines are treated as code just as if it was indented the traditional 4 spaces. Github-flavored-markdown fenced code blocks (blocks wrapped in backtick lines) are also supported.

Both of these formats support the github-flavored-markdown class
extension where you can put a word at the end of the opening backtick
line and have the block given that class.

Embedded LaTeX (mathjax) : If called with the MKD_LATEX option, text wrapped in $$...$$, \[...\], and \(...\) is passed unchanged (except for encoding <, >, and &) to the output for processing by a LaTeX renderer.

This collides with how Markdown escapes '[', ']', '(', and ')' – if discount is called with MKD_LATEX, \( and \[ will only map to ( and [ if corresponding \) or \]s are not found in the same paragraph.

Github checkbox list items : If configured with the --github-checkbox flag, discount will understand github-style checkboxes and generate checkboxes using either html entities (--github-checkbox w/o an argument) or <input> elements (--github-checkbox=input)

How standard is it?

When I run the standard test suite (version 1.0.3) from daringfireball, MarkdownTest.pl reports:

$ MARKDOWN_FLAGS=0x20004 ./MarkdownTest.pl --tidy --script=/usr/local/bin/markdown
Amps and angle encoding ... OK
Auto links ... OK
Backslash escapes ... OK
Blockquotes with code blocks ... OK
Code Blocks ... OK
Code Spans ... OK
Hard-wrapped paragraphs with list-like lines ... OK
Horizontal rules ... OK
Inline HTML (Advanced) ... OK
Inline HTML (Simple) ... OK
Inline HTML comments ... OK
Links, inline style ... OK
Links, reference style ... OK
Links, shortcut references ... OK
Literal quotes in titles ... OK
Markdown Documentation - Basics ... OK
Markdown Documentation - Syntax ... OK
Nested blockquotes ... OK
Ordered and unordered lists ... OK
Strong and em together ... OK
Tabs ... OK
Tidyness ... OK


22 passed; 0 failed.

When I run the old standard test suite from daringfireball, MarkdownTest.pl reports:

$ MARKDOWN_FLAGS=0x22004 ./MarkdownTest.pl --tidy --script=/usr/local/bin/markdown
Amps and angle encoding ... OK
Auto links ... OK
Backslash escapes ... OK
Blockquotes with code blocks ... OK
Hard-wrapped paragraphs with list-like lines ... OK
Horizontal rules ... OK
Inline HTML (Advanced) ... OK
Inline HTML (Simple) ... OK
Inline HTML comments ... OK
Links, inline style ... OK
Links, reference style ... OK
Literal quotes in titles ... OK
Markdown Documentation - Basics ... OK
Markdown Documentation - Syntax ... OK
Nested blockquotes ... OK
Ordered and unordered lists ... OK
Strong and em together ... OK
Tabs ... OK
Tidyness ... OK


19 passed; 0 failed.

Most of the “how to get standards compliant” changes that went in were cleaning up corner cases and blatant misreading of the spec, but there were two places where I had to do a horrible hack to get compliant:

  1. To pass the Hard-wrapped paragraphs with list-like lines test, I had to modify mkd_compile() so that it would have top-level paragraphs absorb adjacent list items, but I had to retain the old (and, IMO, correct) behavior of a new list forcing a block break within indented (quoted, inside lists) blocks.
  2. To pass the Markdown Documentation - Syntax test in MarkdownTest 1.0, I had to change the behavior of code blocks from “preserve trailing whitespace” to “preserve trailing whitespace unless it’s the first line in the block.” From version 1.3.3 on, this is no longer the default, but the flag MKD_1_COMPAT (0x2000) turns it on again for testing purposes.

Does this markdown treat tabs as 4 spaces?

By default, yes, it does. The habit of compensating for broken editors that give no way to indent except for tabbing by setting tabstops to 4 is so intertwined with this language that treating tabs properly would be the moral equivalent of dropping nuclear devices into the testsuite.

But if you use a proper tabstop (8 characters), you can configure markdown with --with-tabstop and it will expand tabs to 8 spaces. If you’ve configured your markdown like this (markdown -V will report TAB=8) and you need to mark up text from other sources, you can set the input flag MKD_TABSTOP to revert those documents back to the icky standard 4-space tab.
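The difference between the two tabstop settings is easiest to see by expanding the same line both ways. The sketch below is a generic tab expander, assuming tabs advance to the next multiple of the tabstop; `sketch_expand_tabs()` is a hypothetical helper, not a function from the library.

```c
#include <stddef.h>

/* Expand tabs in `in` to spaces, advancing to the next multiple of
 * `tabstop` columns. Returns the output length, or (size_t)-1 if `out`
 * (of `outsize` bytes, including the NUL) is too small or tabstop < 1. */
static size_t sketch_expand_tabs(const char *in, int tabstop,
                                 char *out, size_t outsize)
{
    size_t used = 0, col = 0;

    if (tabstop < 1 || outsize == 0)
        return (size_t)-1;
    for (; *in; in++) {
        if (*in == '\t') {
            /* Number of spaces needed to reach the next tab stop. */
            size_t pad = (size_t)tabstop - (col % (size_t)tabstop);
            if (used + pad + 1 > outsize)
                return (size_t)-1;
            while (pad-- > 0) {
                out[used++] = ' ';
                col++;
            }
        } else {
            if (used + 2 > outsize)
                return (size_t)-1;
            out[used++] = *in;
            col = (*in == '\n') ? 0 : col + 1;   /* column resets per line */
        }
    }
    out[used] = '\0';
    return used;
}
```

With a tabstop of 4, a line starting with a single tab ends up indented 4 spaces, which is exactly the indent that marks a code block; with a tabstop of 8 the same line is indented 8 spaces.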

Source Code

To build discount, untar your selected tarball, cd into the directory it creates, then do configure.sh to generate your Makefiles. After doing this, a make should give you a functional stack of programs and libraries.

Discount builds, for me, on MacOS 10.12, FreeBSD 4.8, NetBSD 8, Minix 3, and Debian Linux (dunno which version, but it’s a systemd joint that’s running a 3.10 kernel.) It may build on SLS Linux and Windows with mingw, but I’m not sure about that.

Archived releases

Older versions of the code are still available.

Trivia

  1. This document is generated from markdown source.
  2. I’ve got a public mirror of my sccs repository on github.