Discount

Discount is free software released under the terms of a BSD-style license.

If you find it useful, please consider making a contribution to help support onward development.

download

Discount 2.2.7d, released 22-Aug-2023

description

This is my implementation of John Gruber’s Markdown text to html language. There’s not much here that differentiates it from any of the existing Markdown implementations except that it’s written in C instead of one of the vast flock of scripting languages that are fighting it out for the Perl crown.

Markdown provides a library that gives you formatting functions suitable for marking down entire documents or lines of text, a command-line program that you can use to mark down documents interactively or from a script, and a tiny (3 programs so far) suite of example programs that show how to fully utilize the markdown library.

My markdown also does, by default, various smartypants-style substitutions.

The program

The markdown program is a trivial compiler that reads in a markdown file and writes out a html document or – if you use the -d flag – an outline showing the parse tree. It has a few options:

-d : is, as previously mentioned, the flag that makes markdown produce a parse tree instead of a html document.
-F <flags> : sets various flags that change how markdown works. The flags argument is a somewhat less than obvious bitmask – for example, -F 0x4 tells markdown to not do the smartypants translations on the output. (There are cases – like running the test suite – where this is a useful feature.)
-f <flags> : sets various flags that change how markdown works. Unlike -F, these are the names of the flags (you can get a list of the supported flags with the -f? option; supported flags + synonyms with -Vf?) optionally prefixed with no or - to turn them off. To reuse the example to disable smartypants, you’d do -f nopants (“pants” is a synonym for “smarty” == smartypants.)
-o file : tells markdown to write the output to file.
-V : tells you a markdown version number and how the package was configured. For example

    $ markdown -V
    markdown: discount 2.2.2 TAB=8 DEBUG

tells you that this is markdown 2.2.2, and that the package
was configured with support for sensible tabs & debugging
malloc.

-VV : is like -V, except it also returns the current values of many of the flags that can be set with -f or -F.
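
The -F bitmask can be hard to read at a glance, so here is a minimal sketch of how such a mask is parsed and tested. It only uses the two bit values this page states (0x4 turns off smartypants, and MKD_1_COMPAT is 0x2000); the SKETCH_ names and both helper functions are made up for the example and are not part of discount.

```c
#include <stdlib.h>

/* Bit values taken from this page: -F 0x4 disables smartypants and
 * MKD_1_COMPAT is 0x2000.  The SKETCH_ names are hypothetical. */
#define SKETCH_NOPANTS   0x0004UL
#define SKETCH_1_COMPAT  0x2000UL

/* Parse a -F style argument such as "0x4" or "0x22004".
 * Base 0 lets strtoul() accept hex, octal, or decimal. */
unsigned long parse_flag_mask(const char *arg)
{
    return strtoul(arg, NULL, 0);
}

/* Nonzero when every bit of `flag` is set in `mask`. */
int flag_is_set(unsigned long mask, unsigned long flag)
{
    return (mask & flag) == flag;
}
```

With these helpers, the MARKDOWN_FLAGS value 0x22004 used for the old test suite further down has both bits set, while 0x20004 has smartypants disabled but not the 1.0 compatibility bit.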

The library

There are 22 public functions in the markdown library listed here, broken into five categories:

Input functions

MMIOT *mkd_in(FILE *f, int flags) reads a markdown input file and returns a MMIOT containing the preprocessed document. (which is then fed to markdown() for final formatting.)
MMIOT *mkd_string(char *bfr, int size, int flags) reads the markdown input file that’s been written into bfr and returns a preprocessed blob suitable for feeding to markdown(). This function exists because annotations uses mmap() to access message files instead of traditional file i/o. (If you’re going to port Markdown to an AS/400, this function is the droid you’ve been looking for.)

“Big Picture”-style processing functions

int markdown(MMIOT *doc, FILE *out, int flags) formats a document (created with mkd_in() or mkd_string()) and writes the resulting HTML document to out.
int mkd_line(char *bfr, int size, char **out, int flags) formats the text string bfr, allocates a buffer for the result, and returns that buffer in out. Unlike markdown(), it doesn’t support quoting, footnotes (“reference links”), multiple paragraphs, lists, code sections, or pure html sections.
int mkd_generateline(char *bfr, int size, FILE *out, int flags) formats the text string and writes the resulting HTML fragment to out. It is exactly like mkd_line() except that it writes the output to a FILE*.

Fine-grained access to the internals

int mkd_compile(MMIOT *doc, int flags) takes a document created by mkd_in() or mkd_string() and compiles it into a tree of block elements.
int mkd_generatehtml(MMIOT *doc, FILE *out) generates html from a compiled document.
int mkd_document(MMIOT *doc, char **text) returns (in text) a pointer to the compiled html document, and (in the return code) the size of that document.
int mkd_css(MMIOT *doc, char **out) allocates a buffer and populates it with any style blocks found in the document.
int mkd_generatecss(MMIOT *doc, FILE *out) prints any style blocks in the document.
int mkd_toc(MMIOT *doc, char **out) allocates a buffer, populates it with a table of contents, assigns it to out, and returns the length of the buffer.

To get a table of contents, you must mkd_compile() the document with the MKD_TOC flag (described below.)
int mkd_generatetoc(MMIOT *doc, FILE *out) writes a table of contents to out; other than writing to a FILE*, it operates exactly like mkd_toc()
int mkd_dump(MMIOT *doc, FILE *f, int flags, char *title) prints a block structure diagram of a compiled document.
void mkd_cleanup(MMIOT *doc) releases the MMIOT allocated for the document.

Document header access functions

char *mkd_doc_title(MMIOT *doc) returns the % title line.
char *mkd_doc_author(MMIOT *doc) returns the % author(s) line.
char *mkd_doc_date(MMIOT *doc) returns the % date line.

Url callback functions

void mkd_e_url(MMIOT*, char* (callback)(char*,int,void*)) sets up a callback function that is called whenever discount processes a []() or <link> construct. The callback function is passed a pointer to the url, the size of the url, and a data pointer (null or supplied by mkd_e_data())
void mkd_e_flags(MMIOT*, char *(callback)(char*,int,void*)) sets up a callback to provide additional arguments to the tags generated by []() and <link> constructs. If, for instance, you wanted to add target="_blank" to every generated url, you could just make a callback function that returned that string.
void mkd_e_code(MMIOT*, char *(callback)(char*,int,void*)) sets up a callback to format the contents of a code block.
void mkd_e_free(char *, void*) is called to free any allocated memory returned by the url or flags callbacks.
void mkd_e_data(MMIOT*, void*) assigns a callback data area to the url & flags callbacks.
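
As a sketch of the target="_blank" idea mentioned above, the two functions below have the callback shapes given in this list: char *(callback)(char*,int,void*) for the flags callback, and a (char*, void*) function to release what it returned. The names blank_target and release_text are hypothetical, and returning a malloc()ed copy is an assumption about ownership; check mkdio.h for the exact prototypes before registering them.

```c
#include <stdlib.h>
#include <string.h>

/* Flags callback sketch: ignores the url and always returns the same
 * extra attribute.  The result is heap-allocated so that the matching
 * free callback below can release it. */
char *blank_target(char *url, int size, void *data)
{
    static const char attr[] = "target=\"_blank\"";
    char *copy = malloc(sizeof attr);

    (void)url; (void)size; (void)data;   /* unused in this sketch */
    if (copy != NULL)
        memcpy(copy, attr, sizeof attr); /* copies the trailing NUL too */
    return copy;
}

/* Free callback sketch: releases a string returned by blank_target(). */
void release_text(char *text, void *data)
{
    (void)data;
    free(text);
}

/* Registration would look something like
 *     mkd_e_flags(doc, blank_target);
 *     mkd_e_free(doc, release_text);
 * but that part is not compiled here because it needs the library. */
```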

The flags argument in markdown(), mkd_line(), mkd_generateline(), mkd_in(), mkd_string(), and mkd_compile() is a mask of the following flag bits:

Flag	Action
MKD_NOLINKS	Don’t do link processing, block `<a>` tags
MKD_NOIMAGE	Don’t do image processing, block `<img>`
MKD_NOPANTS	Don’t run `smartypants()`
MKD_NOHTML	Don’t allow raw html through AT ALL
MKD_STRICT	Disable `SUPERSCRIPT`, `RELAXED_EMPHASIS`
MKD_TAGTEXT	Process text to go inside an html tag; no emphasis or html expansion & embedded html will be stripped out.
MKD_NO_EXT	Don’t allow pseudo-protocols
MKD_CDATA	Generate code for xml `<![CDATA[...]]>`
MKD_NOSUPERSCRIPT	No `A^B`
MKD_NORELAXED	Emphasis happens everywhere
MKD_NOTABLES	Don’t process PHP Markdown Extra tables.
MKD_NOSTRIKETHROUGH	Forbid `~~strikethrough~~`
MKD_TOC	Do table-of-contents processing
MKD_1_COMPAT	Compatibility with MarkdownTest_1.0
MKD_AUTOLINK	Make `http://foo.com` a link even without `<>`s
MKD_SAFELINK	Paranoid check for link protocol
MKD_NOHEADER	Don’t process document headers
MKD_TABSTOP	Expand tabs to 4 spaces
MKD_NODIVQUOTE	Forbid `>%class%` blocks
MKD_NOALPHALIST	Forbid alphabetic lists
MKD_NODLIST	Forbid definition lists
MKD_EXTRA_FOOTNOTE	Enable PHP Markdown Extra-style footnotes (warning: not the later version of multiple-paragraph ones.)
MKD_NOSTYLE	Don’t extract `<style>` blocks
MKD_NODLDISCOUNT	Disable discount-style definition lists
MKD_DLEXTRA	Enable PHP Markdown Extra definition lists
MKD_FENCEDCODE	Enable Github-style fenced code blocks.
MKD_GITHUBTAGS	Allow dashes & underscores in element names
MKD_HTML5ANCHOR	Use the html5 namespace for anchor names
MKD_LATEX	Enable embedded LaTeX (mathjax-style)
MKD_EXPLICITLIST	Don’t merge adjacent numbered/bulleted lists

Language bindings

I have an experimental C++ binding that lives on Github in mkdio.h++. It implements a couple of RAII objects; MKIOT – can’t call the class MMIOT because it clashes with the C MMIOT it wraps – for standard markdown (plus my extensions, of course) and GFIOT for github-flavo(u)red markdown. Alas, it is undocumented, but the mkdio.h++ header file is pretty simple and a trivial program that uses it is included in the mkdio.h++ sccs tree.

Smartypants substitutions

``text'' is translated to “text”.
"double-quoted text" becomes “double-quoted text”
'single-quoted text' becomes ‘single-quoted text’
don't becomes “don’t”, as does anything-else’t. (But foo'tbar is just foo'tbar.)
And it's becomes “it’s”, as does anything-else’s (but not foo'sbar and the like.)
(tm) becomes ™
(r) becomes ®
(c) becomes ©
1/4th becomes ¼th. Ditto for 1/4 (¼), 1/2 (½), 3/4ths (¾ths), and 3/4 (¾).
... becomes …
. . . also becomes …
--- becomes —
-- becomes –
A^B becomes A<sup>B</sup>. Complex superscripts can be enclosed in ()s, so A^(B+2) becomes A<sup>B+2</sup>.
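
To make the substitution rules concrete, here is a toy sketch of a few of them (ellipses, dashes, and the (tm)/(r)/(c) marks). It is not discount's implementation: the function name toy_smarty, the table, and the choice of named html entities for the output are all assumptions. The one real point it illustrates is ordering, since --- has to be tried before -- or it would never match.

```c
#include <string.h>

struct subst { const char *from; const char *to; };

/* Longer patterns come first so that "---" wins over "--". */
static const struct subst table[] = {
    { ". . .", "&hellip;" },
    { "...",   "&hellip;" },
    { "---",   "&mdash;"  },
    { "--",    "&ndash;"  },
    { "(tm)",  "&trade;"  },
    { "(r)",   "&reg;"    },
    { "(c)",   "&copy;"   },
};

/* Copy `in` to `out`, applying the table.  Returns the number of bytes
 * written (not counting the NUL), or -1 if `out` is too small. */
int toy_smarty(const char *in, char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;
    while (*in) {
        const char *rep = NULL;
        size_t skip = 1, add, i;

        for (i = 0; i < sizeof table / sizeof table[0]; i++) {
            size_t n = strlen(table[i].from);
            if (strncmp(in, table[i].from, n) == 0) {
                rep = table[i].to;
                skip = n;
                break;
            }
        }
        add = rep ? strlen(rep) : 1;
        if (used + add + 1 > outsize)
            return -1;
        if (rep)
            memcpy(out + used, rep, add);
        else
            out[used] = *in;
        used += add;
        in += skip;
    }
    out[used] = '\0';
    return (int)used;
}
```

The quote and apostrophe rules need context (what precedes and follows the quote mark), so they are left out of this sketch.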

Language extensions

My markdown was written so I could replace the fairly gross homemade text to html prettifier that I wrote for annotations, so I’ve extended it in a few ways; I’ve put support for paragraph centering in so that I don’t have to hand enter the <center> and </center> tags (nowadays I generate a css-styled <p> block, because that’s xhtml compatible instead of the now-deprecated <center> block element.) I’ve added support for specifying image sizes, and I’ve written a not-earthshatteringly-horrible markup extension for definition lists.

Paragraph centering : To center a paragraph, frame it with -> and <-.

    ->this is a test<-

produces a css-centered <p> block containing the text this is a test.

Specifying image sizes : An image size is defined by adding an additional =widthxheight field to the image tag, after the url:

    ![dust mite](url =widthxheight)

produces an <img> tag for the dust mite image with that width and height.
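
A minimal sketch of reading the =widthxheight field, assuming it has already been split off from the rest of the image tag. The function name parse_image_size is hypothetical and this is not how discount itself parses the tag; it only shows the shape of the field.

```c
#include <stdio.h>

/* Parse a field of the form "=WIDTHxHEIGHT", e.g. "=150x100".
 * Returns 1 and fills *width / *height on success, 0 otherwise. */
int parse_image_size(const char *field, int *width, int *height)
{
    int w, h, used = 0;

    /* %n records how many characters were consumed, so trailing
     * junk after the height can be rejected. */
    if (sscanf(field, "=%dx%d%n", &w, &h, &used) != 2)
        return 0;
    if (field[used] != '\0' || w <= 0 || h <= 0)
        return 0;
    *width = w;
    *height = h;
    return 1;
}
```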

Definition lists : To mark up a definition list, left-justify the label and frame it with = characters, then put the body of the list item on the next line, indented 4 spaces.

    =hey!=
        This is a definition list

produces

    <dt>hey!</dt>
    <dd>This is a definition list</dd>

A definition list label is just a regular line of markdown code,
so you can put links and images into it.

In [discount 1.2.3](older.html#1.2.3), the definition list syntax has been
extended so that you can define sequential `<dt>` blocks by doing

=tag1=
=tag2=
    data.

which generates

<dt>tag1</dt>
<dt>tag2</dt>
<dd>data.</dd>

If you want a definition list with a trailing empty tag, give it a body
that's just a html comment:

    =placeholder!=
        <!-- this space intentionally left blank -->

produces

    <dt>placeholder!</dt>
    <dd><!-- this space intentionally left blank --></dd>



In [discount 2.0.4](#v2.0.4) I extended the definition list
syntax to allow PHP Markdown Extra definition lists,
which means that source like

tag1
: data

now generates

<dt>tag1</dt>
<dd>data</dd>

alpha lists : Ordered lists with alphabetic labels (enabled by --enable-alpha-list during configuration) are supported in the same way that numeric ordered lists are:

a. first item
b. second item

generates

 a. first item
 b. second item

New pseudo-protocols for [] links : I wanted to be able to apply styles inline without having to manually enter the <span class="xxx">…</span> html. So I redid the [][] code to support some new “protocols” within my markdown:

`abbr:`_description_
  : The label will be wrapped by `<abbr title="`_description_`">`...`</abbr>`
`class:`_name_
  : The label will be wrapped by `<span class="`_name_`">`...`</span>`
`id:`_name_
  : The label will be wrapped by `<a id="`_name_`">`...`</a>`
`raw:`_text_
  : _Text_ will be written verbatim to the output.   The protocol
was inspired by a short thread on the markdown mailing list
about someone wanting to embed LaTeX inside `<!-- -->` and
finding, to their distress, that markdown mangled it.

Passing text through in comments seems to be a path to unreadable
madness, so I didn't want to do that.   This is, to my mind, a
better solution.

Style blocks : accept <style>…</style> blocks and set them aside for printing via mkd_generatecss() (or for retrieval via mkd_css().)

Class blocks : A blockquote with a first line of > %class% will become <div class="class"> instead of a <blockquote>.

Tables : PHP Markdown Extra-style tables are supported;

     aaa | bbbb
    -----|------
    hello|sailor

becomes the following table:

 aaa | bbbb
-----|------
hello|sailor

And much of the rest of the current table syntax (alignment, handling
of orphan columns) follows the [PHP Markdown Extra] spec.

Document Headers : Pandoc-style document headers are supported; if the first three lines in the document begin with a % character, they are taken to be a document header in the form of

% Document title
% Document author
% Document date

and can be retrieved by the [library functions](id:document_header)
`mkd_doc_title()`, `mkd_doc_author()`, and `mkd_doc_date()`.

Note that I implement Pandoc document headers as they were documented
in 2008;  any Pandoc changes since then will not be reflected in my
implementation.
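
The three-line header rule can be sketched as follows. This is an illustration of the rule as described here (all of the first three lines must begin with %), not discount's parser; struct doc_header, header_line(), parse_header(), and the 128-byte field size are all made up for the example.

```c
#include <string.h>

struct doc_header {
    char title[128];
    char author[128];
    char date[128];
};

/* Copy the text of one "% text" line into dst (truncating if needed).
 * Returns a pointer to the start of the next line, or NULL if the
 * line does not begin with '%'. */
static const char *header_line(const char *p, char *dst, size_t dstsize)
{
    const char *eol;
    size_t len, n;

    if (*p != '%')
        return NULL;
    p++;
    while (*p == ' ' || *p == '\t')
        p++;
    eol = strchr(p, '\n');
    len = eol ? (size_t)(eol - p) : strlen(p);
    n = (len < dstsize) ? len : dstsize - 1;
    memcpy(dst, p, n);
    dst[n] = '\0';
    return eol ? eol + 1 : p + len;
}

/* Returns 1 and fills hdr when the first three lines all start with
 * '%'; returns 0 (hdr contents unspecified) otherwise. */
int parse_header(const char *text, struct doc_header *hdr)
{
    const char *p = text;

    if ((p = header_line(p, hdr->title, sizeof hdr->title)) == NULL)
        return 0;
    if ((p = header_line(p, hdr->author, sizeof hdr->author)) == NULL)
        return 0;
    if ((p = header_line(p, hdr->date, sizeof hdr->date)) == NULL)
        return 0;
    return 1;
}
```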

Fenced code blocks : If called with the MKD_FENCEDCODE option, Pandoc-style fenced code blocks are supported; blocks of code wrapped in ~~~ lines are treated as code just as if they were indented the traditional 4 spaces. Github-flavored-markdown fenced code blocks (blocks wrapped in backtick lines) are also supported.

Both of these formats support the github-flavored-markdown class
extension where you can put a word at the end of the opening backtick
line and have the block given that class.

Embedded LaTeX (mathjax) : If called with the MKD_LATEX option, text wrapped in $$…$$, \[…\], and $…$ is passed unchanged (except for encoding <, >, and &) to the output for processing by a LaTeX renderer.

This collides with how Markdown escapes '[', ']', '(', and ')' -- if discount is called with `MKD_LATEX`, `\(` and `\[` will only map to `(` and `[` if corresponding `\)` or `\]`s are **not** found in the same paragraph.
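
The "passed unchanged except for encoding <, >, and &" step can be sketched like this. The function name encode_math is hypothetical, and the use of the &lt;, &gt;, and &amp; entities is an assumption about what the encoding looks like; everything else in the span, including the $$ delimiters, is copied through untouched.

```c
#include <string.h>

/* Copy a LaTeX span to `out`, encoding only <, >, and &.
 * Returns the number of bytes written (not counting the NUL),
 * or -1 if `out` is too small. */
int encode_math(const char *in, char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;
    for (; *in; in++) {
        char one[2];
        const char *rep;
        size_t n;

        switch (*in) {
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        case '&': rep = "&amp;"; break;
        default:
            one[0] = *in;
            one[1] = '\0';
            rep = one;
            break;
        }
        n = strlen(rep);
        if (used + n + 1 > outsize)
            return -1;
        memcpy(out + used, rep, n);
        used += n;
    }
    out[used] = '\0';
    return (int)used;
}
```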

Github checkbox list items : If configured with the --github-checkbox flag, discount will understand github-style checkboxes and generate checkboxes using either html entities (--github-checkbox w/o an argument) or <input> elements (--github-checkbox=input)

How standard is it?

When I run the standard test suite (version 1.0.3) from daringfireball, MarkdownTest.pl reports:

$ MARKDOWN_FLAGS=0x20004 ./MarkdownTest.pl --tidy --script=/usr/local/bin/markdown
Amps and angle encoding ... OK
Auto links ... OK
Backslash escapes ... OK
Blockquotes with code blocks ... OK
Code Blocks ... OK
Code Spans ... OK
Hard-wrapped paragraphs with list-like lines ... OK
Horizontal rules ... OK
Inline HTML (Advanced) ... OK
Inline HTML (Simple) ... OK
Inline HTML comments ... OK
Links, inline style ... OK
Links, reference style ... OK
Links, shortcut references ... OK
Literal quotes in titles ... OK
Markdown Documentation - Basics ... OK
Markdown Documentation - Syntax ... OK
Nested blockquotes ... OK
Ordered and unordered lists ... OK
Strong and em together ... OK
Tabs ... OK
Tidyness ... OK


22 passed; 0 failed.

When I run the old standard test suite from daringfireball, MarkdownTest.pl reports:

$ MARKDOWN_FLAGS=0x22004 ./MarkdownTest.pl --tidy --script=/usr/local/bin/markdown
Amps and angle encoding ... OK
Auto links ... OK
Backslash escapes ... OK
Blockquotes with code blocks ... OK
Hard-wrapped paragraphs with list-like lines ... OK
Horizontal rules ... OK
Inline HTML (Advanced) ... OK
Inline HTML (Simple) ... OK
Inline HTML comments ... OK
Links, inline style ... OK
Links, reference style ... OK
Literal quotes in titles ... OK
Markdown Documentation - Basics ... OK
Markdown Documentation - Syntax ... OK
Nested blockquotes ... OK
Ordered and unordered lists ... OK
Strong and em together ... OK
Tabs ... OK
Tidyness ... OK


19 passed; 0 failed.

Most of the “how to get standards compliant” changes that went in were cleaning up corner cases and blatant misreading of the spec, but there were two places where I had to do a horrible hack to get compliant:

To pass the Hard-wrapped paragraphs with list-like lines test, I had to modify mkd_compile() so that it would have top-level paragraphs absorb adjacent list items, but I had to retain the old (and, IMO, correct) behavior of a new list forcing a block break within indented (quoted, inside lists) blocks.
To pass the Markdown Documentation - Syntax test in MarkdownTest 1.0, I had to change the behavior of code blocks from “preserve trailing whitespace” to “preserve trailing whitespace unless it’s the first line in the block.” From version 1.3.3 on, this is no longer the default, but the flag MKD_1_COMPAT (0x2000) turns it on again for testing purposes.

Does this markdown treat tabs as 4 spaces?

By default, yes, it does. The habit of compensating for broken editors that give no way to indent except for tabbing by setting tabstops to 4 is so intertwined with this language that treating tabs properly would be the moral equivalent of dropping nuclear devices into the testsuite.

But if you use a proper tabstop (8 characters), you can configure markdown with --with-tabstop and it will expand tabs to 8 spaces. If you’ve configured your markdown like this (markdown -V will report TAB=8) and you need to mark up text from other sources, you can set the input flag MKD_TABSTOP to revert those documents back to the icky standard 4-space tab.
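
The difference between a 4-column and an 8-column tabstop is easy to see in a small sketch. expand_tabs() below is a hypothetical helper, not a discount function; it pads each tab out to the next multiple of the tabstop, which is what "expanding tabs" means in both configurations.

```c
#include <string.h>

/* Expand tabs in `in` to spaces, padding to the next multiple of
 * `tabstop`.  Returns the number of bytes written (not counting the
 * NUL), or -1 on a bad tabstop or if `out` is too small. */
int expand_tabs(const char *in, char *out, size_t outsize, int tabstop)
{
    size_t col = 0;    /* column within the current line */
    size_t used = 0;

    if (tabstop <= 0 || outsize == 0)
        return -1;
    for (; *in; in++) {
        if (*in == '\t') {
            size_t pad = (size_t)tabstop - (col % (size_t)tabstop);
            if (used + pad + 1 > outsize)
                return -1;
            memset(out + used, ' ', pad);
            used += pad;
            col += pad;
        } else {
            if (used + 2 > outsize)
                return -1;
            out[used++] = *in;
            col = (*in == '\n') ? 0 : col + 1;
        }
    }
    out[used] = '\0';
    return (int)used;
}
```

With a tabstop of 4, the text after a single leading tab lands in column 4, which is exactly the indentation markdown wants for a code block; with a tabstop of 8 the same tab counts as two levels of indentation.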

Source Code

To build discount, untar your selected tarball, cd into the directory it creates, then do configure.sh to generate your Makefiles. After doing this, a make should give you a functional stack of programs and libraries.

Discount builds, for me, on MacOS 10.12, FreeBSD 4.8, NetBSD 8, Minix 3, and Debian Linux (dunno which version, but it’s a systemd joint that’s running a 3.10 kernel.) It may build on SLS Linux and Windows with mingw, but I’m not sure about that.

Version 2.2.7d One more maintenance release to fix a few more bugs:
- markdown extra footnotes were slightly broken – two adjacent footnotes ([^1][^2]) were being treated as a regular old hyperlink because the code I put in originally was a huge old monster bodge.
- change the description for -fstrict.
- add a fistful of test cases for the new! improved! Markdown.pl compatible <tags>
- rework the tag handler to more closely adhere to Markdown.pl’s observed behavior; also treat incomplete tags as actual tags in compatibility mode (-fstrict)
- fix weird behavior on freebsd 4.8 w/ gcc 2.95.4; if an #ifdef … #else … #endif wraps the end of a if () it pukes up an error about a syntax error in a macro; pull those tests out and conditionally #define a macro instead
- the behavior of tags-starting-with-alpha-/-! was not like that of the reference implementation; fixed (in a very script-kiddy leaky fashion which I can’t resolve with a runtime flag thanks to the 32 bit flags field in the 2.x series :-( )
version 2.2.7b Another maintenance release to fix a bug in the Makefile; I didn’t have the proper dependencies for the pandoc_headers tool, so if a parallel make was tried it would fall over complaining about missing objects when it tried to link everything together.
version 2.2.7 A maintenance release to clean up a buffer overflow when generating label names (backported from the new v3 under [slow] development)
version 2.2.6 should have been released last fall, but I was too busy working on breaking the published interface so I could expand the flag structure to permit more than 32 feature flags. *sigh*

It’s not a huge release; it fixes a few bugs, tweaks a few things, and adds a couple of features:

=bugs=
- deal with cached text before handling fenced code blocks (because otherwise it would accumulate the contents of things between the blocks and spit them out afterwards.)
- don’t give a code block class if it’s just an empty string
- mkd2html wasn’t passing pgm to hoptusage()
- mkd2html wasn’t properly handling two arguments for input & output files. While I was at it, add the -G option to use gfm_in() to read the input file in (and preserve newlines)
- push the fencedcode block handler up to the toplevel scanner so it will handle blank lines inside the fence (it was splitting the code block up into paragraphs and treating them as text.)
=features=
- add an anchor format callback, plus add the -x option to the markdown program to squash toc anchors to (close to) github compatibility
- add an external code block formatter to the markdown program
- clean up all references to flags; define them EVERYWHERE as mkd_flag_t
- support external code formatters
=tweaks=
- configure to
  1. use the modern standard uintXX_t types for DWORD, WORD, BYTE if possible and
  2. if a program is defined via an environment variable (like CC) try to use that variable instead of doing a path search in AC_PROG
- make a test program to examine pandoc header elements
- patch the cmakefile so that it has an option to only generate a library
- set the Plan9 CFLAGS to the two extensions (instead of having them be part of the CC macro) and wipe out any CFLAGS (like -T, which tells the Plan9 posix C compiler to be stupid) that already exists
- tweak the Makefile macros a bit so people can wedge in their own compilers (BUILD) and linkers (LINK) during a build
- when dumping the parse tree, say which header is being dumped – h1..h6 vs just header
version 2.2.4 fixes a couple of small buglets, adds a couple of new features, and tweaks the html5 support module to chase the html5 standard:
1. (bugfix) when splitting a line, null terminate the new line; also don’t inherit the dle from the previous line
2. (bugfix) fix the broken footnote code; allow multi-paragraph footnotes, not just one line of footnote.
3. (feature) support github checkbox list items (static configuration option)
  1. --github-checkbox does the checkbox with html check entities
  2. --github-checkbox=input does them with html <input> elements (set to disabled so that people can’t check and uncheck willy-nilly)
4. (feature) on systems that support it, use the ‘destructor’ attribute on mkd_shlib_destructor() so it will run automatically when the library unloads.
5. (feature) add --cxx-binding option to configure.sh to generate a mkdio.h that’s got an extern "C" wrapper around it.
6. (tweak)delete HGROUP from the list of html5 elements, because the w3c working group decided to punt it
The bugfix for not null-terminating a split line was reported by Github user fCorleone, who was running an input mangler program (afl) to stress discount by feeding random garbage to it. It saw that the splitline() function (used when breaking out embedded chunks of html) was not null-terminating the split line chunks, and was throwing errors on that. (This is something that clang on modern macos catches if you use the -fsanitize=address option.)

I never actually use footnotes, so I’d been running for about a decade assuming that footnotes were one or maybe two lines, but github user somasis pointed out that this was wrong wrong wrong! Ooops; at least it was fixable w/o much pain and suffering.

HGROUP was caught by (no longer registered) github user Crypto-Anarchist in their own branch of discount, so I cherry-picked that changeset and pulled it back into the mainline discount.
version 2.2.3a 2.2.3 has a configuration glitch (not properly testing for the existence of S_ISSOCK, et seq) which 2.2.3a corrects. I also pulled the plug on the single use of alloca() in theme.c (Windows MinGW doesn’t support it properly(?) and in this one case I’m better off just malloc()ing the offending thing and letting it be garbage collected when the program finishes.)
version 2.2.3 I turn around and blink, and suddenly 13 months have gone by while I slowly tested various beta versions of 2.2.3, and now it’s 2018 and a small collection of cosmetic, portability, and build tweaks, plus some actually bad bugs, have been cleaned up:
1. Have tools/branch be a no-op if there’s no git that I can use to check for my SCCS status.
2. Add user-contributed cmake support
3. check for non-null T(link) in the safelink function; rename paranoia.t to safelink.t, add a couple more tests for safelinks
4. tweak the safelink detection code to allow more types of url fragments
5. when finding the installer, check that -s works (doesn’t work on Minix 3?)
6. Correct makepage & theme to use the new set_flag() return scheme (0==success, !0==pointer to bogus flag)
7. if an unknown flag was in the middle of a comma-delimited flag string (like -flatex,bogus,footnote), the markdown program would incorrectly report the first flag as unknown (because set_flag returned 0 on error, 1 on success and the strtok() of the flag string had already replaced the commas up to that point with nulls.)
   Change it so that set_flag returns null on successful processing and a pointer to the offending flag on an unknown one.
8. Tweak the install rule to install a GNU-style .pc file iff @MK_PKGCONFIG@ is not defined as #
9. Hand-resolve a conflict in the msvc Makefile
10. Add in paranoid list splitting [EXPLICITLIST] (the default behavior many many versions ago before I realized it wasn’t the standard) (and take the last of the 32-bit flag mask for it) between adjacent ordered and unordered lists.
11. Create a utility function [notspecial()] for theme & mkd2html – check a filename to see if it’s a special file that doesn’t need to be deleted or to have a .html suffix added to it. It only works on machines that have a stat() system call + the S_ISFIFO, S_ISCHR, and S_ISSOCK macros, otherwise it thinks that nothing is special.
12. Add a README for the utilities in the tools subdirectory
13. Add the git branch name into the version string if we’re not on the master branch.
14. Add support for NMAKE and Visual C++ toolset. (courtesy of Martin Hofmann’s (tin-pot) fork of Discount).
15. Add a ‘dirty’ flag to the Document structure & set it whenever a callback changes (github issue #136) so that the next mkd_compile will regenerate the document.
16. Tweak superscript handling to be able to superscript something wrapped in html
17. Tweak the show_flags() function so that if it’s called verbosely it will show synonyms for named flags. (Calling it verbosely is done by the -V option, which is overloaded for verbosity when listing flags.)
18. Process html blocks in compile() as well as in compile_document(); compile_document() needs to handle <style> blocks and compile() needs to handle html blocks that are nested inside blockquotes, tables, lists, &c.
19. Add --h1-title as an option to configure.sh; this enables code that uses the first h1 in a document as the document title (in mkd2html & theme)
20. add gethopt() – a q&d getopt clone that accepts both full-word & single character options – instead of doing kludgy bespoke argument parsing in mkd2html & theme.
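
The set_flag() change described in the 2.2.3 list above is easier to follow with a toy version. In this sketch the flag table and the function names are hypothetical (only the flag names come from examples on this page); what it shows is why returning a pointer to the offending token works where a 0/1 return did not, since strtok() has already chopped the caller's string at the commas by the time the error is reported.

```c
#include <string.h>

/* Hypothetical flag table for the sketch. */
static const char *known_flags[] = { "latex", "footnote", "smarty", "pants" };

static int is_known(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof known_flags / sizeof known_flags[0]; i++)
        if (strcmp(name, known_flags[i]) == 0)
            return 1;
    return 0;
}

/* Walk a comma-delimited flag string (modified in place by strtok).
 * Returns NULL when every flag is known, otherwise a pointer to the
 * first unknown flag inside the caller's buffer. */
char *sketch_set_flag(char *optionstring)
{
    char *arg;

    for (arg = strtok(optionstring, ","); arg != NULL; arg = strtok(NULL, ","))
        if (!is_known(arg))
            return arg;
    return NULL;
}
```

After a failed call on "latex,bogus,footnote" the original buffer reads as just "latex", which is why printing it produced the wrong flag name; the returned pointer reads as "bogus".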
version 2.2.2 A few months worth of bugfixes, mainly for annoying bugs but one for a serious one:
- In mkd_line() I was doing a sneaky hack to take the rendered text out of the MMIOT it was in, which was fine if I was building discount w/o --with-amalloc, but horribly wrong with --with-amalloc; the return from mkd_line() is a freeable string, so my sneak (I was pulling the contents of the rendered string out, then zeroing the pointers in the MMIOT; if amalloc was active, the rendered string was in the middle of an allocated block which could not be freed (and would cause core dumps on some systems. Kludges always come back to bite you in the ass)) was terribly terribly wrong.
- Redo toc anchor generation to reduce namespace collisions (by encoding out of namespace characters to -XX- hex sequences (except space, which encodes to just -) in html4, %XX in html5 (formerly urlencodedanchor)).
- Add html5anchor as a synonym for urlencodedanchor (urlencoded is deprecated, but will stick around like an unwelcome guest until the next major release) now that I’m doing html5 encoding there.
- When generating a TOC, don’t anchor-encode the human-readable contents of the links.
- “quote” “everything” “when” “I” “generate” “librarian.sh”
- Horrifying kludge to deal with destination directories with spaces in them;
  - quote filename arguments to AC_SUB (via the __ac_quote builtin), and
  - instead of using the shell to check for sed substitute markers (if I do echo "$*" | sed -e 's/;/\\;/' it collapses a layer of backslashes needlessly. It’s awful enough to do echo $* | sed -e 's/ /\\\\ /' (so sed won’t eat the backslashes) but to do sed -e 's/ /\\\\\\\\ /' to keep sed + the shell from eating the backslashes is just appalling) I generate the little C program config.sed, which generates the sed pattern, escaping the ; and NOT collapsing backslashes.
  
  Ugh. It’s all pretty gross, but it looks like it works? At least for discount.
- Always have mkd_xmlpage() generate a title element, even if it’s empty
- Add <form> … </form> to the set of known block tags
- Use \r instead of ^C for the internal hard end of line marker (which means if I don’t unescape it when rendering it won’t leave ^C poop in the output.)
- Bring the installation instructions a little closer to up to date.
- When there’s either pkg-config on the system or if the configure option --pkg-config is used, generate a pkgconfig .pc file.
version 2.2.1 A few months of small bugfixes, a few tweaks, and some more conversion of static flags to runtime ones.
- Update the muñoz test case for a text fragment with an 0xff
- Kludge peek() and poke() to not sign extend on machines with signed chars, so a 0xff character will not sign extend and become an EOF. This breaks a test in muñoz.t (which tripped the 0xff becomes EOF bug) so that test needed to be rewritten.
- Have configure.sh not do the WinDef.h short circuit
- include "config.h" to pick up the definition of DWORD/WORD/BYTE (windows support) & add a new label field for better TOC label generation
- Wasn’t updating config.sub with {scalar_type}->{scalar_type} on windows machines.
- Tweak mkd_xmlpage() so that it only uses the published interface.
- Eat one of the two remaining flag bits (flag_t structure – or 64-bit int, which isn’t portable to old Unices – here we come!) to make LaTeX support a runtime flag.
- Add some missing dependencies to the makefile
- In makepage, I was using the wrong argument for the file to make a page from. needed to check for argc > 0 & argv[0] (after shifting argc/argv by optind) but was checking argc > 1 & argv[1], which was something less than useful.
- Strip out --enable-all-features from configure.sh (obsolete now after the mass conversion of features from compile-time to run-time.)
- Add config.h to the includes for amalloc (for the if #define to make clang stfu)
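
The 0xff problem mentioned in the 2.2.1 list above comes down to how a byte is widened to an int. The two helpers below are a minimal illustration; their names are made up and they are not discount's peek() and poke(). On the usual two's-complement machines, reading the byte 0xff through a signed char gives -1, which is the value most C libraries use for EOF, while reading it through an unsigned char gives 255.

```c
/* Read one byte the way a plain `char` behaves on a machine where
 * char is signed: values above 0x7f come back negative. */
int peek_signed(const char *p)
{
    return *(const signed char *)p;
}

/* Read one byte without sign extension: always 0..255. */
int peek_unsigned(const char *p)
{
    return *(const unsigned char *)p;
}
```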
version 2.2.0 Many MANY tweaks over the last year, including…
- mathjax support (--with-latex – changed to the runtime flag MKD_LATEX in 2.2.1) ($$..$$, \(..\), and \[..\], not $..$)
- make the amalloc() paranoia malloc library even more paranoid by putting markers at the start and the end of each allocated block.
- Redo comment block handling; standard markdown only treats comments as block html if the start comment marker starts at the beginning of a line and the end comment marker is at the end of a line.
- clean up broken and insufficiently paranoid parts of configure.inc
- When attempting to match the closing tag of an html block, don’t advance the match index unless that character actually matches. (defect: <p></>* was splitting into 2 lines when it should have generated <p><p></>*</p>)
- if mkd_compile() is called multiple times, actually recompile the document if the flags change.
- When processing automatic links, explicitly allow extended utf-8 characters as part of the url.
- Tweak configure.inc to quote __cwd & __d so that a $__cwd with spaces in the path won’t make configure.sh (or make install) puke
- messed up the fwrite() error check in mkd_generatehtml(),
- return EOF instead of -1 on error in mkd_xhtmlpage(),
- if the mkd output fails, exit with nonzero status
- Handle error conditions and pass errorcodes out of various mkd_xxx() output functions (inspired by a patch written by Koen Punt)
- in mkd_document() don’t pad the generated html with a 0 unless it’s actually generated.
- If the C compiler generates .dSYM directories, get rid of them during distclean
- Change the configure.sh message for the --with-(foo) variables to reflect what it’s actually doing with them.
- Theme really wants the old behavior of --with-(foo), so pass those settings into theme in a #define
- Have configure.sh just state that an option is not supported instead of dying.
- Convert many configure-time settings to runtime flags
- Manpage editing by Nathan Phillip Brink
- Update plan9 support.
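The "markers at the start and the end of each allocated block" idea from the amalloc() item above can be sketched roughly as follows. This is not the amalloc code; the names guarded_malloc, guarded_check, and guarded_free, the GUARD value, and the block layout are all assumptions made for illustration:

```c
#include <stdlib.h>
#include <string.h>

#define GUARD 0x5a5a5a5aUL          /* hypothetical marker value */

/* Assumed layout: [header: size + marker][user bytes ...][marker] */
struct header { size_t size; unsigned long guard; };

static void *
guarded_malloc(size_t size)
{
    unsigned long guard = GUARD;
    struct header *h = malloc(sizeof *h + size + sizeof guard);

    if ( h == NULL ) return NULL;
    h->size = size;
    h->guard = GUARD;                                    /* leading marker */
    memcpy((char*)(h+1) + size, &guard, sizeof guard);   /* trailing marker */
    return h+1;
}

/* Returns 1 when both markers are intact, 0 if either was overwritten. */
static int
guarded_check(void *p)
{
    struct header *h = (struct header*)p - 1;
    unsigned long tail;

    memcpy(&tail, (char*)p + h->size, sizeof tail);
    return h->guard == GUARD && tail == GUARD;
}

static void
guarded_free(void *p)
{
    if ( p ) free((struct header*)p - 1);
}
```

A write that runs one byte past the end of the block lands in the trailing marker, so guarded_check() can report it before the damage spreads into the allocator's own bookkeeping.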
version 2.1.8a When I put in the patch to …
- Change the mail demangler to a debian-specific ‘always mangle one way’ hack. (enabled with the configure.sh option --debian-glitch)
I messed up the format string and made the mangled email address into a fixed bogus string. Sigh. Fixed (thanks to a patch from Alessandro Ghedini), updated (and I really have to expand the runtime configuration flags array to be long enough to fit 64 settings, but that’s a fix for a different day) and released.
version 2.1.8 After a year or so of letting the code sit and slowly accumulate fixes, a new version which fixes a wad of bugs and adds a few new features. Some of this code is from other people, and those changes will be marked with their names:
- FINALLY address the bug where markdown extra-style footnotes lose numbering when they show up in nested elements; I was not carrying the m-e reference# inside the footnotes structure, but was instead carrying it in the parent structure and not updating it. So I changed the footnotes structure to include the reference + the list of footnotes, which made the misnumbering go away on my tests.
- Fix makefile distclean to cleanup all the generated files and corrected the names of the installed sample program man pages to end in .1 (Mark Pizzolato mark@infocomm.com)
- Change the mail demangler to a debian-specific ‘always mangle one way’ hack. (enabled with the configure.sh option --debian-glitch)
- Add --with-unmangled-email compile-time flag to disable mailto: mangling
- Allow the magic output filename -, which means send output to stdout instead of to a file.
- Fix a bug where autolink + github flavored markdown absorbs the ^C eoln character into a link at the end of a line.
- Tweak install.samples so that the user can supply a SAMPLE_PFX on the command line (for example, SAMPLE_PFX=discount- make install.samples) to install the sample programs with a package-specific prefix.
- Emit pages in utf-8 instead of us-ascii (simply a change to the Content-Type meta) (Nathan Phillip Brink binki@gentoo.org)
- Patch the horrible list handler to support long numeric list items (George Hartzell hartzell@alerce.com)
- Various bugfixes (Masayoshi Sekimura sekimura@gmail.com)
- Fix support for CFLAGS=-m32 ./configure.sh by using CFLAGS for all build invocations of CC. (Nathan Phillip Brink binki@gentoo.org)
- Github-style language attributes on fenced code blocks (Loren Segal lsegal@amazon.com)
- When defining WORD & DWORD, check first for the MS Windows WinDef.h file; if found, include it instead of defining WORD & DWORD ourselves.
- support url-encoded anchor links with --with-urlencoded-anchor option (Daisuke Murase typester@cpan.org)
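The "magic output filename -" item in the list above is a common convention. A minimal sketch of how a program might honor it, assuming nothing about the markdown program's actual code; open_output and close_output are hypothetical names:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: "-" selects stdout, anything else is
 * opened for writing.  Returns NULL if the file can't be opened. */
static FILE *
open_output(const char *name)
{
    if ( strcmp(name, "-") == 0 )
        return stdout;
    return fopen(name, "w");
}

/* Flush stdout but don't close it; close anything we opened ourselves. */
static int
close_output(FILE *f)
{
    return (f == stdout) ? fflush(f) : fclose(f);
}
```

Keeping the stdout case out of fclose() means later diagnostics or output from the same process still have somewhere to go.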
version 2.1.6 does nothing except for some bugfixes (and ignores some particularly scary ones that I /must/ fix soon) and adds two small features.

The bugfixes are:
1. A < at the end of the input is exactly the same as \<(space)
2. Markdown.pl does not appear to escape \<[nonwhite] sequences. Sigh.
3. Tweak the previous Markdown does not escape... commit to simply push out the backslash and back up to the start of the <[nonwhite] sequence, so -fnohtml will continue to work.
4. Treat hard <br/> (via two spaces) as whitespace.
5. Tweak divquote handling so that two adjacent divquotes won’t die if there is a space between the second > & leading %
6. Tweak one of the list tests back to the previous behavior (I’ve put in a hack for list indentation, and accidentally committed the changes. Oops!)
The features are that I now use styles for table cell alignment instead of align=, and that I’m using the 3-clause BSD license for this release (because there is one widely used closed-source license that claims that you can’t dynamically link with code that uses the 4-clause license. Fine. I’ll 3-clause this release to make the stupid GPL happy.)
version 2.1.5a does even more cleanup to deal with clang, plus adds a few small features:
- MKD_NOSTYLE – treat <style> blocks as regular html.
- some github flavored markdown support; gfm_…() input methods that put hardbreaks (== two spaces) at the end of every input line.
- support for github flavored markdown backtick-delimited code blocks (in addition to tilde-delimited codeblocks)
- in the markdown program, add
  1. -S flag (tell markdown to spit out style sections)
  2. -n flag (tell markdown not to output generated text)
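The effect of the gfm_…() input methods mentioned above (a hard break, i.e. two spaces, at the end of every input line) can be illustrated with a small text filter. This is only a sketch of the transformation, not how the library does it; add_hardbreaks is a hypothetical name:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc()ed copy of text with two
 * spaces inserted before every newline.  The caller frees the result. */
static char *
add_hardbreaks(const char *text)
{
    size_t newlines = 0, i, j = 0, len = strlen(text);
    char *out;

    for ( i = 0; i < len; i++ )
        if ( text[i] == '\n' ) newlines++;

    out = malloc(len + 2*newlines + 1);
    if ( out == NULL ) return NULL;

    for ( i = 0; i < len; i++ ) {
        if ( text[i] == '\n' ) {    /* two spaces + newline == hard break */
            out[j++] = ' ';
            out[j++] = ' ';
        }
        out[j++] = text[i];
    }
    out[j] = '\0';
    return out;
}
```

Feeding the result to a regular markdown parser then produces a line break wherever the input had one, which is the github-flavored behavior.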
version 2.1.3 cleans up a couple of bugs that have been here for a while, plus tweaks the build process a bit for better compilation with the LLVM C compiler, mingw, and cygwin.

The bugfixes are
1. Stop tripping over tables with leading |s; the first implementation of tables treated leading |s as the end of an empty table cell. 2.1.3 corrects this to properly be merely decoration.
  
  As a side-effect, you now need to have all the rows of the table either with a leading | or not; you cannot mix and match them, sorry.
2. For some mysterious reason I was treating the <br> tag as a block-level html tag, so my html blockifier would split paragraphs with explicit <br>s in them.
3. The table of contents code was apparently generating bad html (something I never noticed, because I never use that feature, alas!) but Stefano D'Angelo contributed a patch to clean up the generated html to make it correct.
version 2.1.2 tweaks table handling so that tables with leading |’s won’t end up generating empty false <td></td>s, and that tables with trailing |’s won’t end up getting those pipes included in the output.
version 2.1.1.3 corrects a defect in smartypants processing that has been there since the beginning of time; I’d managed to misread the reference smartypants documentation and thought that 1 dash made an – and 2 dashes made an — (it’s actually 2 and 3 dashes respectively.) John Foerch read the documentation recently, noticed it was wrong, and sent me a note pointing it out. Whoops! But it’s fixed now (as is this page, which is, regrettably, the only documentation about smartypants.)
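The corrected rule (2 dashes make an en dash, 3 make an em dash) can be sketched as a tiny substitution pass. This is an illustration only: the function name smart_dashes is made up, emitting the &ndash; and &mdash; entities is an assumption about the output form, and real smartypants processing has more context rules than this:

```c
#include <string.h>

/* Hypothetical helper: copy src into dst (size bytes, including the
 * terminating NUL), replacing "---" with &mdash; and "--" with &ndash;.
 * Returns the number of bytes written, not counting the NUL. */
static size_t
smart_dashes(const char *src, char *dst, size_t size)
{
    size_t used = 0;

    if ( size == 0 ) return 0;
    while ( *src ) {
        const char *piece;
        size_t eat, len;
        char one[2];

        /* try the longest run first so "---" isn't read as "--" + "-" */
        if ( strncmp(src, "---", 3) == 0 )     { piece = "&mdash;"; eat = 3; }
        else if ( strncmp(src, "--", 2) == 0 ) { piece = "&ndash;"; eat = 2; }
        else { one[0] = *src; one[1] = '\0'; piece = one; eat = 1; }

        len = strlen(piece);
        if ( used + len >= size ) break;       /* out of room: stop early */
        memcpy(dst + used, piece, len);
        used += len;
        src += eat;
    }
    dst[used] = '\0';
    return used;
}
```

A single dash passes through untouched, which is the behavior the misreading got wrong.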
version 2.1.1.2 corrects one small defect in block handling, plus changes the format of the output for failed tests in the test suite.

The defect in block handling is that the reference implementation has text block absorbing adjacent code, so

~~~
text
text
text
    code
~~~

will generate

~~~
text text text code
~~~

instead of a paragraph of text(s) followed by code.

The change in failed test output makes it output first the source code of the failed test, and then the differences between the expected output and the generated output.
version 2.1.1.1 implements PHP markdown extra-style fenced code sections, where your chunks of code are surrounded by ~~~ lines instead of being indented 4 spaces.

Fenced code sections are a configuration option, not a runtime option, and are enabled by using the configure.sh flag --with-fenced-code (FENCED-CODE in the version string).

There are a few optimizations inside markdown.c to support fenced code detection without slowing the parser down, but those optimizations don’t cause any of my test cases to fail. Version 2.1.1 is still a fairly experimental release despite this, so take some care with it.
version 2.1.0 cleans up a small collection of warts and build bugs, plus adds some fairly small enhancements to the theme and makepage programs:
- more modifications to configure.sh so that it generates all its test scripts in the build directory.
- more MacOS support in configure.sh; check to see if .dSYM folders are created when a test script is compiled, and, if so, don’t forget to delete them when they’re cleaned up.
- makepage now accepts markdown option flags à la the markdown program (via -Fxxxx, -fname, or in the MARKDOWN_FLAGS environment variable.)
- strip bitfields out of opts[] – I can’t initialize a bitfield on plan9 cc.
- add a -E flag to theme to ignore context-sensitivity on <?theme xxx?> substitutions.

Archived releases

older versions of the code are still available.

Trivia

This document is generated from markdown source.
I’ve got a public mirror of my sccs repository on github.

/~orc/Code/discount/index.text
Fri Mar 29 23:44:01 2024
