Discount 1.6.7, released 31-Aug-2010.
This is my implementation of John Gruber’s Markdown text to html language. There’s not much here that differentiates it from any of the existing Markdown implementations except that it’s written in C instead of one of the vast flock of scripting languages that are fighting it out for the Perl crown.
Markdown provides a library that gives you formatting functions suitable for marking down entire documents or lines of text, a command-line program that you can use to mark down documents interactively or from a script, and a tiny (1 program so far) suite of example programs that show how to fully utilize the markdown library.
My markdown also does, by default, various smartypants-style substitutions.
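Here is a minimal, table-driven sketch of what a smartypants-style pass does. It is not discount's code. The function name smarty(), its buffer-and-capacity interface, and the short substitution table are assumptions for illustration. The table covers only (tm), (r), (c), ... and --.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical substitution table; discount's real list is longer. */
static const struct { const char *from; const char *to; } subs[] = {
    { "(tm)", "&trade;"  },
    { "(r)",  "&reg;"    },
    { "(c)",  "&copy;"   },
    { "...",  "&hellip;" },
    { "--",   "&mdash;"  },
};

/* Copy `in` to `out` (capacity `cap`), replacing each table entry with its
   html entity. Returns the output length, or -1 if `out` is too small. */
int smarty(const char *in, char *out, size_t cap)
{
    size_t o = 0;
    while (*in) {
        const char *rep = NULL;
        size_t skip = 1;
        for (size_t k = 0; k < sizeof subs / sizeof subs[0]; k++) {
            size_t n = strlen(subs[k].from);
            if (strncmp(in, subs[k].from, n) == 0) {
                rep = subs[k].to;
                skip = n;
                break;
            }
        }
        const char *src = rep ? rep : in;     /* entity, or the char itself */
        size_t len = rep ? strlen(rep) : 1;
        if (o + len >= cap)
            return -1;                        /* leave room for the NUL */
        memcpy(out + o, src, len);
        o += len;
        in += skip;
    }
    if (cap == 0)
        return -1;
    out[o] = '\0';
    return (int)o;
}
```

A real pass also has to leave code spans and html tags alone. This sketch does not handle that.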
The markdown program is a trivial compiler that reads in a markdown
file and writes out a html document or, if you use the -d flag,
an outline showing the parse tree. It does have a few options:
-d is, as previously mentioned, the flag that makes markdown
produce a parse tree instead of a html document.
-F <flags> sets various flags that change how markdown works.
The flags argument is a somewhat less than obvious bitmask; for
example, -F 0x4 tells markdown to not do the smartypants
translations on the output. (There are cases, like running the
test suite, where this is a useful feature.)
-o file tells markdown to write the output to file.
-V tells you a markdown version number and how the package
was configured. For example
$ markdown -V
markdown: discount 1.0.0 DL_TAG HEADER TAB=8
tells you that this is markdown 1.0.0, and that the package was configured with support for definition lists, pandoc document headers, and sensible tabs.
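Because -F takes a raw bitmask, it helps to see how such a value breaks down into individual bits. The DEMO_* names below are hypothetical. The two values come from this page: -F 0x4 turns off smartypants, and MKD_1_COMPAT is 0x2000. Calling strtoul() with base 0 accepts the 0x prefix.

```c
#include <stdlib.h>

/* Hypothetical names; values taken from the text on this page. */
#define DEMO_NOPANTS  0x0004UL   /* -F 0x4: no smartypants translations */
#define DEMO_1_COMPAT 0x2000UL   /* MKD_1_COMPAT */

/* Parse a -F argument such as "0x2204". Base 0 lets strtoul accept
   hex (0x...), octal (0...), or decimal. */
unsigned long parse_flags(const char *arg)
{
    return strtoul(arg, NULL, 0);
}

/* Nonzero if every bit in `bits` is set in `mask`. */
int flags_set(unsigned long mask, unsigned long bits)
{
    return (mask & bits) == bits;
}
```

For example, the 0x2204 used for the old test suite further down includes both bits. The 0x0204 used for the 1.0.3 suite includes the smartypants bit but not the compatibility bit.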
There are 17 public functions in the markdown library, broken into three categories:
MMIOT *mkd_in(FILE *f, int flags) reads a markdown input file
and returns a MMIOT containing the preprocessed document,
which is then fed to markdown() for final formatting.
MMIOT *mkd_string(char *bfr, int size, int flags) reads the
markdown input file that’s been written into bfr and
returns a preprocessed blob suitable for feeding to markdown().
This function exists because annotations uses mmap() to
access message files instead of traditional file i/o. (If
you’re going to port Markdown to an AS/400, this function is
the droid you’ve been looking for.)
int markdown(MMIOT *doc, FILE *out, int flags) formats a document
(created with mkd_in() or mkd_string()) and writes the
resulting HTML document to out.
int mkd_line(char *bfr, int size, char **out, int flags) allocates a
buffer, then formats the text string into that buffer.
The differences from markdown() are that it doesn’t support quoting,
footnotes (“reference links”), multiple paragraphs, lists, code
sections, or pure html sections.
int mkd_generateline(char *bfr, int size, FILE *out, int flags)
formats the text string and writes the resulting HTML fragment to out.
It is exactly like mkd_line() except that it writes the output to
a FILE*.
int mkd_compile(MMIOT *doc, int flags) takes a document created by
mkd_in() or mkd_string() and compiles it into a tree of block
elements.
int mkd_generatehtml(MMIOT *doc, FILE *out) generates html from
a compiled document.
int mkd_document(MMIOT *doc, char **text) returns (in text) a
pointer to the compiled html document, and (in the return code)
the size of that document.
int mkd_css(MMIOT *doc, char **out) allocates a buffer and populates
it with any style blocks found in the document.
int mkd_generatecss(MMIOT *doc, FILE *out) prints any style blocks in
the document.
int mkd_toc(MMIOT *doc, char **out) allocates a string, populates it
with a table of contents, assigns it to out, and returns the length of
the string.
To get a table of contents, you must compile the document with
mkd_compile() using the MKD_TOC flag (described below).
int mkd_generatetoc(MMIOT *doc, FILE *out) writes a table of contents
to out; other than writing to a FILE* instead of a string, it operates
exactly like mkd_toc().
int mkd_dump(MMIOT *doc, FILE *f, int flags, char *title) prints
a block structure diagram of a compiled document.
void mkd_cleanup(MMIOT *doc) releases the MMIOT allocated for the
document.
char *mkd_doc_title(MMIOT *doc) returns the % title line.
char *mkd_doc_author(MMIOT *doc) returns the % author(s) line.
char *mkd_doc_date(MMIOT *doc) returns the % date line.
void mkd_e_url(MMIOT*, char* (callback)(char*,int,void*))
sets up a callback function that is called whenever discount
processes a []() or <link> construct. The callback function
is passed a pointer to the url, the size of the url, and a data
pointer (null or supplied by mkd_e_data()).
void mkd_e_flags(MMIOT*, char *(callback)(char*,int,void*))
sets up a callback to provide additional arguments to the tags
generated by []() and <link> constructs. If, for instance,
you wanted to add target="_blank" to every generated url, you
could just make a callback function that returned that string.
void mkd_e_free(char *, void*) is called to free any allocated
memory returned by the url or flags callbacks.
void mkd_e_data(MMIOT*, void*) assigns a
callback data area to the url & flags callbacks.
The flags argument in
markdown(), mkd_text(), mkd_in(),
mkd_string(), mkd_compile(), and
mkd_generatehtml() is a mask of the following flag bits:
MKD_NOLINKS: don’t allow “<a” or expand “[][]” into a link.
MKD_NOIMAGE: don’t allow “<img” or expand “![][]” into an IMG tag.
MKD_NOHTML: replace all <’s with &lt;.
MKD_NOPANTS: don’t do the smartypants substitutions.
MKD_NOHEADER: don’t process pandoc-style document headers.
MKD_TABSTOP: use the standard 4-space tab even if markdown was configured --with-tabstop.
MKD_NO_EXT
MKD_STRICT
MKD_TOC: do table of contents processing (<h1>, <h2>, etc will include an id="name" argument.)
MKD_1_COMPAT: turn on the compatibility hack used for the old MarkdownTest 1.0 suite.
MKD_AUTOLINK: turn urls into links even without <>s.
MKD_SAFELINK: whenever “[][]” is expanded into a link, check the url; if it isn’t a local reference, http://, https://, ftp://, or news://, it will not be converted into a hyperlink.
MKD_NOTABLES: don’t process tables.
The smartypants substitutions are:
``text'' is translated to “text”.
"double-quoted text" becomes “double-quoted text”.
'single-quoted text' becomes ‘single-quoted text’.
don't becomes “don’t”, as well as anything-else’t. (But foo'tbar is just foo'tbar.)
it's becomes “it’s”, as well as anything-else’s (except not foo'sbar and the like.)
(tm) becomes ™
(r) becomes ®
(c) becomes ©
1/4th becomes ¼th. Ditto for 1/4 (¼), 1/2 (½), 3/4ths (¾ths), and 3/4 (¾).
... becomes …
. . . also becomes …
-- becomes —
A hyphen with spaces on both sides ( - ) becomes –, but A-B remains A-B.
A^B becomes A<sup>B</sup>.
My markdown was written so I could replace the fairly gross homemade
text to html prettifier that I wrote for annotations, so I’ve extended
it in a few ways: I’ve put in support for paragraph centering
so that I don’t have to hand-enter the <center> and </center> tags,
I’ve added support for specifying image sizes, and I’ve written a
not-earthshatteringly-horrible markup extension for definition lists.
To center a paragraph, frame it with -> and <-.
->this is a test<-
produces
<center>this is a test</center>
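Here is a sketch of how the ->…<- framing could be recognized. It assumes the paragraph has already been joined into one string with surrounding whitespace trimmed. center_paragraph() is a hypothetical helper, not a discount function.

```c
#include <stdio.h>
#include <string.h>

/* If `para` is framed as ->text<-, write <center>text</center> into `out`
   and return 1. Returns 0 if it is not framed or `out` is too small. */
int center_paragraph(const char *para, char *out, size_t cap)
{
    size_t n = strlen(para);
    if (n < 4 || strncmp(para, "->", 2) != 0 || strcmp(para + n - 2, "<-") != 0)
        return 0;
    /* %.*s prints just the text between the two markers */
    int w = snprintf(out, cap, "<center>%.*s</center>", (int)(n - 4), para + 2);
    return w >= 0 && (size_t)w < cap;
}
```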
To specify an image size, add a =widthxheight field to the image tag:
![dust mite](http://dust mite =150x150)
produces
<img src="http://dust mite" height=150 width=150 alt="dust mite">
To mark up a definition list, surround the label with = characters, then put the body of the list
item on the next line, indented 4 spaces.
=hey!=
    This is a definition list
produces
<dt>hey!</dt> <dd>This is a definition list</dd>
A definition list label is just a regular line of markdown code, so you can put links and images into it.
In discount 1.2.3, the definition list syntax has been
extended so that you can define sequential <dt> blocks by doing
=tag1=
=tag2=
data.
which generates
<dt>tag1</dt>
<dt>tag2</dt>
<dd>data.</dd>
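Here is a sketch of recognizing a =label= line. Stacked label lines, as in the 1.2.3 form above, would each pass this test and become their own <dt>. dl_label() is a hypothetical helper. It rejects lines made only of = so that a header underline is not mistaken for a label.

```c
#include <string.h>

/* If `line` has the form =label=, copy the label into `out` (capacity
   `cap`) and return 1; otherwise return 0. */
int dl_label(const char *line, char *out, size_t cap)
{
    size_t n = strlen(line);
    if (n < 3 || line[0] != '=' || line[n - 1] != '=')
        return 0;
    if (strspn(line, "=") == n)      /* "=====" is a header underline */
        return 0;
    size_t len = n - 2;              /* drop the leading and trailing '=' */
    if (len >= cap)
        return 0;
    memcpy(out, line + 1, len);
    out[len] = '\0';
    return 1;
}
```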
Ordered lists with alphabetic labels (enabled by --enable-alpha-list
during configuration) are supported in the same way that numeric ordered
lists are:
a. first item
b. second item
generates an ordered list with alphabetic labels.
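Here is a sketch of the marker test an alphabetic ordered list needs. It assumes the same shape as a numeric marker: a letter, a period, then whitespace. alpha_list_item() is hypothetical and only accepts a single letter.

```c
#include <ctype.h>

/* If `line` starts with a marker such as "a. " (after up to 3 leading
   spaces), return the offset of the item text; otherwise return -1. */
int alpha_list_item(const char *line)
{
    int i = 0;
    while (i < 3 && line[i] == ' ')
        i++;
    if (!isalpha((unsigned char)line[i]) || line[i + 1] != '.')
        return -1;
    if (line[i + 2] != ' ' && line[i + 2] != '\t')
        return -1;
    return i + 3;
}
```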
I wanted to be able to apply styles inline without having
to manually enter the <span class="xxx">…</span>
html. So I redid the [][] code to support some new
“protocols” within my markdown:
abbr:description becomes <abbr title="description">…</abbr>
class:name becomes <span class="name">…</span>
id:name becomes <a id="name">…</a>
raw:text means the text will be written verbatim to the output. The protocol
was inspired by a short thread on the markdown mailing list
about someone wanting to embed LaTeX inside <!-- --> and
finding, to their distress, that markdown mangled it.
Passing text through in comments seems to be a path to unreadable madness, so I didn’t want to do that. This is, to my mind, a better solution.
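Here is a sketch of how the four protocols in the table could map onto html, given the link text and url of a [text](url) construct. expand_pseudo() is hypothetical. It does no html escaping. For raw:, it assumes the part after the colon is what gets written verbatim.

```c
#include <stdio.h>
#include <string.h>

/* Expand [text](url) when `url` uses one of the pseudo-protocols.
   Returns what snprintf returns (the length of the html), or -1 if the
   url is an ordinary link. No html escaping is done on either argument. */
int expand_pseudo(const char *text, const char *url, char *out, size_t cap)
{
    if (strncmp(url, "abbr:", 5) == 0)
        return snprintf(out, cap, "<abbr title=\"%s\">%s</abbr>", url + 5, text);
    if (strncmp(url, "class:", 6) == 0)
        return snprintf(out, cap, "<span class=\"%s\">%s</span>", url + 6, text);
    if (strncmp(url, "id:", 3) == 0)
        return snprintf(out, cap, "<a id=\"%s\">%s</a>", url + 3, text);
    if (strncmp(url, "raw:", 4) == 0)
        return snprintf(out, cap, "%s", url + 4);   /* passed through verbatim */
    return -1;                                      /* not a pseudo-protocol */
}
```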
Markdown also looks for <style>…</style> blocks and sets them aside for printing
via mkd_style().
A blockquote whose first line is > %class% will become a
<div class="class"> instead of a <blockquote>.
PHP Markdown Extra-style tables are supported:
aaa | bbbb
-----|------
hello|sailor
becomes the following table:
| aaa | bbbb |
|---|---|
| hello | sailor |
And much of the rest of the current table syntax (alignment, handling of orphan columns) follows the PHP Markdown Extra spec.
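Here is a sketch of the first step in handling a table row: splitting it on | and trimming spaces from each cell. split_row() is hypothetical. It does not handle escaped pipes or the alignment line. A row written with leading and trailing pipes would produce empty first and last cells.

```c
#include <string.h>

/* Split one table row such as "hello|sailor" on '|' into at most `max`
   trimmed cells, modifying `row` in place. Returns the number of cells. */
int split_row(char *row, char *cells[], int max)
{
    int n = 0;
    char *p = row;
    while (n < max) {
        char *bar = strchr(p, '|');
        if (bar)
            *bar = '\0';                       /* terminate this cell */
        while (*p == ' ')
            p++;                               /* trim leading spaces */
        char *end = p + strlen(p);
        while (end > p && end[-1] == ' ')
            *--end = '\0';                     /* trim trailing spaces */
        cells[n++] = p;
        if (!bar)
            break;
        p = bar + 1;
    }
    return n;
}
```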
When I run the standard test suite (version 1.0.3) from
daringfireball, MarkdownTest.pl reports:
$ MARKDOWN_FLAGS=0x0204 ./MarkdownTest.pl --tidy --script=/usr/local/bin/markdown
Amps and angle encoding ... OK
Auto links ... OK
Backslash escapes ... OK
Blockquotes with code blocks ... OK
Code Blocks ... OK
Code Spans ... OK
Hard-wrapped paragraphs with list-like lines ... OK
Horizontal rules ... OK
Inline HTML (Advanced) ... OK
Inline HTML (Simple) ... OK
Inline HTML comments ... OK
Links, inline style ... OK
Links, reference style ... OK
Links, shortcut references ... OK
Literal quotes in titles ... OK
Markdown Documentation - Basics ... OK
Markdown Documentation - Syntax ... OK
Nested blockquotes ... OK
Ordered and unordered lists ... OK
Strong and em together ... OK
Tabs ... OK
Tidyness ... OK
22 passed; 0 failed.
When I run the old standard test suite from daringfireball,
MarkdownTest.pl reports:
$ MARKDOWN_FLAGS=0x2204 ./MarkdownTest.pl --tidy --script=/usr/local/bin/markdown
Amps and angle encoding ... OK
Auto links ... OK
Backslash escapes ... OK
Blockquotes with code blocks ... OK
Hard-wrapped paragraphs with list-like lines ... OK
Horizontal rules ... OK
Inline HTML (Advanced) ... OK
Inline HTML (Simple) ... OK
Inline HTML comments ... OK
Links, inline style ... OK
Links, reference style ... OK
Literal quotes in titles ... OK
Markdown Documentation - Basics ... OK
Markdown Documentation - Syntax ... OK
Nested blockquotes ... OK
Ordered and unordered lists ... OK
Strong and em together ... OK
Tabs ... OK
Tidyness ... OK
19 passed; 0 failed.
Most of the “how to get standards compliant” changes that went in were cleaning up corner cases and blatant misreading of the spec, but there were two places where I had to do a horrible hack to get compliant:
I changed mkd_compile() so that it would have top-level
paragraphs absorb adjacent list items, but I had to retain the
old (and, IMO, correct) behavior of a new list forcing a block
break within indented (quoted, inside lists) blocks.
MKD_1_COMPAT (0x2000) turns it on again for testing purposes.
By default, markdown does treat tabs as 4 spaces. The habit of compensating for broken editors that give no way to indent except for tabbing by setting tabstops to 4 is so intertwined with this language that treating tabs properly would be the moral equivalent of dropping nuclear devices into the testsuite.
But if you use a proper tabstop (8 characters), you can configure
markdown with --with-tabstop and it will expand tabs to 8
spaces. If you’ve configured your markdown like this (markdown -V
will report TAB=8) and you need to mark up text from other
sources, you can set the input flag MKD_TABSTOP to revert those
documents back to the icky standard 4-space tab.
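Here is a sketch of column-aware tab expansion. Each tab advances to the next multiple of the tabstop rather than inserting a fixed number of spaces. expand_tabs() is hypothetical. Passing 8 models a --with-tabstop build. Passing 4 models the standard behavior that MKD_TABSTOP reverts to.

```c
#include <stddef.h>
#include <string.h>

/* Expand each tab in `in` to spaces up to the next multiple of `tabstop`
   columns, writing into `out` (capacity `cap`). Returns the output length,
   or -1 if `out` is too small. */
int expand_tabs(const char *in, int tabstop, char *out, size_t cap)
{
    size_t o = 0;
    int col = 0;                              /* current output column */
    for (; *in != '\0'; in++) {
        int n = (*in == '\t') ? tabstop - col % tabstop : 1;
        if (o + (size_t)n >= cap)
            return -1;
        for (int k = 0; k < n; k++)
            out[o++] = (*in == '\t') ? ' ' : *in;
        col = (*in == '\n') ? 0 : col + n;    /* newlines reset the column */
    }
    if (cap == 0)
        return -1;
    out[o] = '\0';
    return (int)o;
}
```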
To build discount, untar your selected tarball, cd into the directory it creates, then do
configure.sh
to generate your Makefiles. After doing this, a
make
should give you a functional stack of programs and libraries.
Discount builds, for me, on SLS Linux, MacOS 10.5, FreeBSD 4.8, and RHEL3. It may build on Windows with mingw, but I’m not sure about that.
version 1.6.7 repairs one defect in backtick handling: if a code span was closed by more backticks than it was opened with, discount would consume the starting characters within the code span (so ``foo``` would become <code>oo</code> instead of <code>foo</code>`.) Ooops. This was a simple case of mishandling the tick matching: if the # of opening ticks was different from the # of closing ticks, I would set the # of ticks to consume to the # of closing ticks no matter what. 1.6.7 corrects this feature.
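Here is a sketch of tick matching that reproduces the corrected result above. The span is closed by the first run of at least as many ticks as opened it. Both the content start and the closing delimiter use the opening count, so ``foo``` yields foo with one ` left over. match_code_span() is hypothetical. The original Markdown.pl is stricter: it wants a closing run of exactly the same length.

```c
#include <stddef.h>

/* `s` starts at a run of backticks. On success return 1 and set *start and
   *len to the span's content, and *consumed to the characters used,
   including both delimiters. Return 0 if the span is never closed. */
int match_code_span(const char *s, size_t *start, size_t *len, size_t *consumed)
{
    size_t open = 0;
    while (s[open] == '`')
        open++;
    if (open == 0)
        return 0;
    for (size_t i = open; s[i] != '\0'; i++) {
        if (s[i] != '`')
            continue;
        size_t run = 0;
        while (s[i + run] == '`')
            run++;
        if (run >= open) {
            *start = open;          /* skip exactly the opening ticks */
            *len = i - open;
            *consumed = i + open;   /* close with as many ticks as opened */
            return 1;
        }
        i += run - 1;               /* too short: skip this run */
    }
    return 0;
}
```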
version 1.6.6 repairs two defects, one in the markdown compiler and one in theme:
In theme, I needed to take into account that the source filename
might not have an extension when I’m making the .html filename.
The old behavior was to look for a dot and put the .html after
that, but I didn’t check to see if there was actually a dot there
before appending the .html!
This did not work out too well if there was no dot.
In the markdown parser, I wasn’t handling escapes of the
open square bracket inside a []() construct.
So a link like
[foo\[and\]bar](does not work properly!)
would not parse because
my square bracket matcher would look for an additional ]
to match the \[ inside the [] part.
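Both 1.6.6 fixes are small enough to sketch. html_name() derives the .html name from a source filename whether or not it has an extension. match_bracket() finds the ] that closes a [, treating backslash-escaped brackets as ordinary text. Both are hypothetical helpers, not theme's or discount's actual code.

```c
#include <stdio.h>
#include <string.h>

/* Build "<name>.html" from `src`: replace the extension if there is one,
   otherwise just append .html. Returns 1 on success, 0 if `cap` is too small. */
int html_name(const char *src, char *out, size_t cap)
{
    const char *slash = strrchr(src, '/');
    const char *base = slash ? slash + 1 : src;
    const char *dot = strrchr(base, '.');   /* only look in the basename */
    if (dot == base)
        dot = NULL;                         /* ".profile" has no extension */
    size_t stem = dot ? (size_t)(dot - src) : strlen(src);
    int w = snprintf(out, cap, "%.*s.html", (int)stem, src);
    return w >= 0 && (size_t)w < cap;
}

/* `s` points at an opening '['. Return the index of its matching ']',
   or -1. A backslash escapes the next character, so \[ and \] never
   change the nesting depth. */
long match_bracket(const char *s)
{
    int depth = 0;
    for (long i = 0; s[i] != '\0'; i++) {
        if (s[i] == '\\' && s[i + 1] != '\0') {
            i++;                            /* skip the escaped character */
            continue;
        }
        if (s[i] == '[')
            depth++;
        else if (s[i] == ']' && --depth == 0)
            return i;
    }
    return -1;
}
```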
version 1.6.5 repairs six defects and adds two new features.
The bugfixes are:
_C99_SNPRINTF_EXTENSION to tell
the APE cc that, yes, I do really want to be
using snprintf() and I know that C99 has it
return the size it wants to populate.
<style> blocks as mystery node during markdown -d
There is one race condition if you are using discount in a threaded environment; the code that initializes the blocktag list (for html blocks) has an internal test-then-set variable, but it’s not atomic, and it’s possible to either (a) have two threads go into that initializer at the same time and both attempt to initialize, or (b) have a second thread attempt to mark down some code while the first thread is still initializing. Either way it would produce hours of debugging fun.
To correct this defect, I’ve created the new function mkd_initialize(),
which does all of the initialization. This function is called from
within the markdown parser (so existing code will still work) but if
you’re doing markdown from a threaded environment, you must call
mkd_initialize() before you start spinning off threads.
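The fix described here is to call mkd_initialize() explicitly before starting threads. For comparison, the same one-time setup can be made race-free with POSIX pthread_once(). That call runs an initializer exactly once, and any thread calling it returns only after the initializer has finished. This is a swapped-in technique, not what discount does. ensure_initialized() and the counter are hypothetical stand-ins for the blocktag setup. Older glibc versions need -pthread when linking.

```c
#include <pthread.h>

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static int init_count = 0;          /* how many times init_tables() ran */

/* Stand-in for building the blocktag list. */
static void init_tables(void)
{
    init_count++;
}

/* Safe to call from any number of threads: init_tables() runs exactly
   once, and every caller returns only after it has completed. */
void ensure_initialized(void)
{
    pthread_once(&init_once, init_tables);
}

int times_initialized(void)
{
    return init_count;
}
```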
And the features:
in relation to this, add the library function mkd_add_html5_tags(),
which adds (globally, and non-removably) a handful of
new tags for html5 support.
There are actually two new functions here — mkd_add_html5_tags()
is an additional library function that will not be linked in unless
you demand it (or build the library as a dynamic library), and
mkd_define_tag(char * tag, int selfclose) is a function that
allows you to add html tags as you wish (so you could, if you
wished, do something like mkd_define_tag("html", 0) to treat
the contents of a html document as html) and which is used by
mkd_initialize() to define the standard markdown html block
tags, mkd_add_html5_tags() to add the narsty new html5 tags,
or any user-written function to add the tags you want.
It’s not threadsafe, so you need to define your tags before spinning off threads.
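Here is a sketch of the kind of registry mkd_define_tag(char * tag, int selfclose) implies. It keeps a table of tag names, each with a self-closing bit, and looks them up when deciding whether a line opens an html block. define_tag(), find_tag(), the fixed table size, and the case-insensitive lookup are all assumptions. Like the real function, this sketch is not threadsafe.

```c
#include <ctype.h>
#include <string.h>

#define MAX_TAGS 64                 /* arbitrary limit for the sketch */

struct blocktag { char name[16]; int selfclose; };
static struct blocktag tags[MAX_TAGS];
static int ntags = 0;

/* Case-insensitive string equality. */
static int same_tag(const char *a, const char *b)
{
    while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) == tolower((unsigned char)*b);
}

/* Add a tag; returns 0 if the table is full or the name is too long. */
int define_tag(const char *name, int selfclose)
{
    if (ntags == MAX_TAGS || strlen(name) >= sizeof tags[0].name)
        return 0;
    strcpy(tags[ntags].name, name);
    tags[ntags].selfclose = selfclose;
    ntags++;
    return 1;
}

/* Return the entry for `name`, or NULL if it was never defined. */
const struct blocktag *find_tag(const char *name)
{
    for (int i = 0; i < ntags; i++)
        if (same_tag(tags[i].name, name))
            return &tags[i];
    return NULL;
}
```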
version 1.6.4 repairs a single defect (in the handling of html blocks), adds a pair of features (both via github), and adds a couple of test cases to further exercise the code block handler changes from version 1.6.3.
The defect was that text following the closing tag of a html block would cause following content to be misprocessed (besides, of course, the existing defect that that text would not be processed.) The bugfix is that I now split following content into a new logical line, so input like
<p>blah blah</p>foo
which used to output
<p>blah blah</p>foo
now outputs
<p>blah blah</p>
<p>foo</p>
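Here is a sketch of finding where trailing text starts after an html block's closing tag, so that text can be split into its own logical line. split_after_close() is hypothetical. It matches the closing tag case-sensitively and assumes the caller already knows the block's tag name.

```c
#include <stdio.h>
#include <string.h>

/* Return the offset just past "</tag>" in `line`, i.e. where any trailing
   text begins, or -1 if the closing tag is not on this line. */
long split_after_close(const char *line, const char *tag)
{
    char close[32];
    int w = snprintf(close, sizeof close, "</%s>", tag);
    if (w < 0 || (size_t)w >= sizeof close)
        return -1;                          /* tag name too long */
    const char *end = strstr(line, close);
    return end ? (long)(end - line) + w : -1;
}
```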
The (creeping) features are
're, 'll, 've, 'm, and 'd so I sucked those
changes into the baseline code.
lang:<whatever> which would expand into
<span lang=<whatever>…</span>
The additional testcases are to catch mishandled backslashes in code sections (most recently documented at http://code.reddit.com/ticket/695.) This defect was corrected in version 1.6.3, and I testcased it then, but more tests are always good.
version 1.6.3 repairs code section handling by the simple expedient of ripping most of the existing handlers out and replacing them with newer (and more dingus-compliant) code.
version 1.6.2 cleans up a couple of defects, adds manpages for the new callback functions introduced in version 1.6.0, and rewrites the emphasis generator (again) to make it work in a more xhtml-compliant manner when fed pathological emphasis.
The bugfixes are:
<a tag.
_ or * when they’re in the
middle of whitespace.
[]() link.
<h6>[]-<]: won’t erroneously become
a footnote.
version 1.6.1 fixes two scripting exploits, corrects three edge cases to make them work more like the reference implementation, and lightly modifies list handling to make it work a little more like the reference implementation (but, hopefully, without triggering the awful failure case that the reference implementation has.)
The two security holes that were fixed were:
< then just copy the rest of the tag to the output
until I reached a ‘>’. So if someone did
<hi<script>attack<hi</script>, the <script> would be
dumped to the output just as prettily as you please.
" and ', so a malformed title string could contain
arbitrary text, up to and including scripts.
The three compatibility changes are:
<br/>‘ed,
And the tweak is to change how list items absorb new paragraphs. Discount traditionally would absorb new paragraphs if they were indented one level (4 spaces), and punt them back to the parent level if they weren’t indented that much. The reference implementation, on the other hand, captures new paragraphs into the preceding list if, apparently, they are indented at all. The people at github, who, apparently, are using Discount, have a bug report against list handling where
* list indented 3 spaces
* list indented 3 spaces
paragraph indented 3 spaces
doesn’t absorb “paragraph …” into the second list item. I have modified the indent handler to absorb items indented as deeply as the text on the preceding list item instead of requiring a 4 space indent. It doesn’t appear to break any of my test cases (markdowntest 1.0, markdowntest 1.0.3, and any of my flock of tests), and it allows for slightly more readable lists.
version 1.6.0 adds one new feature that’s large enough to require a version bump; I’ve now added callback routines so that a program that uses discount can rewrite the urls in
[]() and <link> constructs,
as well as add arguments to the tags generated by []()
and <link> constructs.
As an example, if you wanted to ensure that no urls were relative, you could set up a basename callback to add a prefix to every relative url (this was hardwired into discount prior to 1.6.0, but has been newly made into a callback.)
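Here is a sketch of such a prefixing callback, written to the char *(callback)(char*,int,void*) shape listed for mkd_e_url() in the function list. It makes three assumptions, so check mkdio.h before relying on any of them:

- the data pointer set with mkd_e_data() holds the prefix string;
- the url is not NUL-terminated, which is why the size argument is used;
- returning NULL leaves the url untouched.

prefix_relative() and free_url() are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Url callback: `url` points at `size` bytes (not NUL-terminated) and
   `data` is assumed to hold a prefix such as "http://example.com/".
   Returns a malloc'd, prefixed copy of a relative url, or NULL for an
   url with a scheme (a ':' before any '/'), assumed to mean "unchanged". */
char *prefix_relative(char *url, int size, void *data)
{
    const char *prefix = data;
    for (int i = 0; i < size && url[i] != '/'; i++)
        if (url[i] == ':')
            return NULL;                      /* http:, mailto:, ... */
    size_t plen = strlen(prefix);
    char *out = malloc(plen + (size_t)size + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, prefix, plen);
    memcpy(out + plen, url, (size_t)size);
    out[plen + (size_t)size] = '\0';
    return out;
}

/* Free routine for whatever prefix_relative() returned. */
void free_url(char *p, void *data)
{
    (void)data;                               /* unused here */
    free(p);
}
```

Pairing the callback with a matching free routine follows the mkd_e_free() description above. Whatever the url callback allocates is handed back to be released.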
older versions of the code are still available.