redoc Design and Internals

This document describes the general approach and design of redoc for developers interested in contributing.

Two-way R Markdown workflows are challenging because R Markdown and knitr workflows are lossy: the compiled document does not contain all of the information in the source. We are also limited by the information that pandoc can carry from markdown to the final format and back.

Compiling the document: General Approach

To produce a Reversible Reproducible Document in Word (a “redoc”), the redoc() format first pre-parses the source .Rmd file. knitr doesn’t expose its parser to developers, so I’ve lifted most of the code for this parser from knitr and rmarkdown. The parser captures YAML headers, code chunks, and inline code, gives names to unnamed chunks and inline code sections, and wraps them in named <div> and <span> tags with unique id values and the class "redoc". The contents of those sections are stored in a file called filename.codelist.yml.
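The wrapping step can be illustrated with a short Python sketch. This is not redoc's code: redoc does this in R with a parser adapted from knitr, and it also captures YAML headers, which are skipped here. The function name wrap_rmd, the regular expressions, and the id scheme (such as redoc-chunk-1 and redoc-inline-1) are hypothetical, and the returned dictionary only stands in for the contents of filename.codelist.yml.

```python
import re

# Hypothetical patterns: inline R code such as `r 1 + 1`, and fenced {r ...} chunks.
INLINE_RE = re.compile(r"`r [^`]+`")
CHUNK_RE = re.compile(r"^```\{r[^}]*\}\n.*?^```[ \t]*$", re.MULTILINE | re.DOTALL)


def wrap_rmd(text):
    """Wrap code chunks and inline code in redoc-class tags.

    Returns the rewritten text plus a dict mapping each generated id to the
    original code (a stand-in for filename.codelist.yml).
    """
    codelist = {}
    counters = {"chunk": 0, "inline": 0}

    def wrap(kind, tag, sep):
        # Build a substitution callback that records the match under a new id.
        def repl(match):
            counters[kind] += 1
            code_id = f"redoc-{kind}-{counters[kind]}"
            codelist[code_id] = match.group(0)
            return f'<{tag} class="redoc" id="{code_id}">{sep}{match.group(0)}{sep}</{tag}>'
        return repl

    # Chunks become <div> blocks (newlines keep the chunk fences on their own lines);
    # inline code becomes <span> elements.
    text = CHUNK_RE.sub(wrap("chunk", "div", "\n"), text)
    text = INLINE_RE.sub(wrap("inline", "span", ""), text)
    return text, codelist
```

Edge cases such as inline-code syntax inside a chunk body, or chunks opened with longer fences, are ignored in this sketch.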

redoc() then knits the pre-parsed .Rmd file, so code output ends up wrapped within the same <span> and <div> tags as the original chunks.

When pandoc converts the knitted document to .docx, redoc passes it through a series of pandoc Lua filters (found in inst/lua-filters). These filters do three things:

Convert sections in <span> and <div> tags of class redoc to hidden custom styles whose names correspond to their unique IDs, so that they are retained in the Word document.
Insert hidden text in place of code sections that have no output.
Convert CriticMarkup syntax to Word tracked changes.
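redoc's filters are written in Lua. Purely as an illustration of the first of the three tasks, the Python sketch below walks pandoc's JSON AST (the output of pandoc -t json) and gives every Div or Span with class redoc a custom-style attribute equal to its id; pandoc's docx writer turns the custom-style attribute into a named Word style. The function name add_custom_styles is hypothetical, and making those styles hidden, inserting hidden text, and converting CriticMarkup are not shown.

```python
def add_custom_styles(node):
    """Recursively walk a pandoc JSON AST and give every redoc-class
    Div/Span a custom-style attribute named after its id."""
    if isinstance(node, list):
        for item in node:
            add_custom_styles(item)
    elif isinstance(node, dict):
        if node.get("t") in ("Div", "Span"):
            # Attr is [identifier, [classes], [[key, value], ...]].
            identifier, classes, attributes = node["c"][0]
            if "redoc" in classes:
                # Drop any existing custom-style, then map the element to a style named after its id.
                attributes[:] = [kv for kv in attributes if kv[0] != "custom-style"]
                attributes.append(["custom-style", identifier])
        for value in node.values():
            add_custom_styles(value)
    return node
```

To run it as a pandoc JSON filter (pandoc --filter), it would need a small wrapper that reads the AST from stdin with json.load and writes the modified AST back to stdout with json.dump.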

Then, using the post_processor argument of rmarkdown::output_format() and functions from officer, redoc stores the original .Rmd and the .codelist.yml file in the Word document. Because .docx files are just ZIP archives, this is straightforward, except that some metadata must be added to ensure Word preserves these files when the document is edited.
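Because a .docx file is a ZIP archive, storing extra files in it needs nothing more than a ZIP library. redoc does this in R with officer; the Python sketch below uses the standard zipfile module instead. The redoc/ folder inside the archive and the function names embed_files and extract_file are hypothetical, and the metadata that Word needs in order to keep these files after editing is deliberately left out. Reading a file back out, as dedoc() does with *.codelist.yml, is the reverse operation.

```python
import zipfile


def embed_files(docx_path, files, folder="redoc"):
    """Append text files to a .docx (a ZIP archive).

    `files` maps a file name to its text content. The folder name is a
    hypothetical choice; the metadata Word needs to keep these parts when
    the document is edited is NOT added here.
    """
    with zipfile.ZipFile(docx_path, mode="a", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(f"{folder}/{name}", content)  # str content is stored as UTF-8


def extract_file(docx_path, name, folder="redoc"):
    """Read one embedded file back out of the .docx as text."""
    with zipfile.ZipFile(docx_path) as zf:
        return zf.read(f"{folder}/{name}").decode("utf-8")
```

Because zipfile can only append, embedding a file under a name that already exists adds a duplicate entry rather than replacing it; a fuller implementation would rewrite the archive.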

If the option diagnostics=TRUE is set, information about the R session and current software versions is also stored in the Word document for later debugging.

If highlight_output=TRUE is set, the post-processor also modifies all Word document styles to color the redoc-class sections.

De-Rendering Documents

When the dedoc() function is run, it extracts the *.codelist.yml file from the .docx file.

Then pandoc converts the .docx back to markdown. One custom Lua filter converts tracked changes to CriticMarkup, and another replaces any elements with the custom redoc styles with placeholders of the form [[chunk-id]]. dedoc() then uses the data in the *.codelist.yml file to replace these placeholders with the original chunks (or inline code). If chunk output has been deleted or modified beyond recognition, redoc tries to be smart about placement and restores the code near its original location. Depending on the policies selected via dedoc()’s block_missing or inline_missing arguments, the restored code may be wrapped in an HTML comment or not restored at all.
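The placeholder-restoring step can be sketched in Python as a regular-expression substitution over pandoc's markdown output. restore_code and its missing argument are hypothetical: missing only imitates the two outcomes described for block_missing and inline_missing (wrapping the code in an HTML comment, or dropping it), and its values are not dedoc()'s actual option names. Unlike redoc, this sketch appends orphaned code at the end of the document instead of placing it near its original location.

```python
import re

# Placeholders of the form [[chunk-id]].
PLACEHOLDER_RE = re.compile(r"\[\[([A-Za-z0-9_-]+)\]\]")


def restore_code(markdown, codelist, missing="comment"):
    """Replace [[chunk-id]] placeholders with the original code.

    `missing` is a hypothetical policy for code whose placeholder was deleted:
    "comment" appends it wrapped in an HTML comment, "omit" drops it.
    """
    if missing not in ("comment", "omit"):
        raise ValueError(f"unknown policy: {missing!r}")
    found = set()

    def repl(match):
        code_id = match.group(1)
        if code_id not in codelist:
            return match.group(0)  # leave unknown placeholders untouched
        found.add(code_id)
        return codelist[code_id]

    restored = PLACEHOLDER_RE.sub(repl, markdown)
    lost = [code_id for code_id in codelist if code_id not in found]
    if missing == "comment":
        for code_id in lost:
            restored += f"\n\n<!-- {codelist[code_id]} -->"
    return restored
```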

Customizing and developing with redoc

redoc() is based on rmarkdown::word_document() and can be extended in the same way.

The simplest form of extension is defining additional parts of the document to be wrapped and stored in the *.codelist.yml file. These are defined as a list of functions in the wrappers argument of redoc(). Each function captures one type of code; by default these are R chunks and inline code, HTML comments, YAML blocks, some LaTeX, pandoc-style citations, and pandoc raw spans and blocks.

You can capture other types of code by adding functions, which are described in the ?wrappers documentation. If the code is simple enough to be captured with a regular expression, these functions can be generated with make_wrapper().
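As a rough analogue in Python, a regex-based wrapper factory and a runner for a list of wrappers might look like the sketch below. make_regex_wrapper and apply_wrappers are hypothetical names; they do not reproduce the signature of redoc's make_wrapper() or the R functions documented in ?wrappers, only the idea of turning a pattern into a function that wraps each match and records it in a shared codelist.

```python
import re
from itertools import count


def make_regex_wrapper(label, pattern, tag="span"):
    """Return a function that wraps every match of `pattern` in a
    redoc-class tag and records the original text in a codelist dict."""
    regex = re.compile(pattern)
    ids = count(1)  # ids stay unique for the lifetime of this wrapper

    def wrapper(text, codelist):
        def repl(match):
            code_id = f"{label}-{next(ids)}"
            codelist[code_id] = match.group(0)
            return f'<{tag} class="redoc" id="{code_id}">{match.group(0)}</{tag}>'
        return regex.sub(repl, text)

    return wrapper


def apply_wrappers(text, wrappers):
    """Run a list of wrapper functions in order, sharing one codelist."""
    codelist = {}
    for wrapper in wrappers:
        text = wrapper(text, codelist)
    return text, codelist


# Example: wrap HTML comments, then pandoc-style citations such as [@smith2020].
wrappers = [
    make_regex_wrapper("redoc-comment", r"(?s)<!--.*?-->"),
    make_regex_wrapper("redoc-citation", r"\[@[^\]]+\]"),
]
```

This naive version does not stop a later wrapper from matching text already captured by an earlier one (for example, a citation inside an HTML comment), so the order of the list matters.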

When building additional formats based on redoc(), it is important to use the base_format argument of rmarkdown::output_format(). rmarkdown then merges the post_processor functions of redoc() and your format so that redoc()'s post-processor runs after yours.
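The ordering described above can be modeled as function composition. The Python sketch below mirrors the post_processor signature from rmarkdown's output_format() documentation (it receives the metadata, input file, output file, and clean and verbose flags, and returns the output file); merge_post_processors here is a hypothetical Python stand-in, not rmarkdown's R code. The overlay (your format's post-processor) runs first, and whatever output path it returns is handed to the base format's (redoc's) post-processor.

```python
def merge_post_processors(base, overlay):
    """Chain two post-processors: `overlay` (your format's) runs first,
    then `base` (redoc's) runs on the output file it returned."""
    if base is None:
        return overlay
    if overlay is None:
        return base

    def merged(metadata, input_file, output_file, clean, verbose):
        output_file = overlay(metadata, input_file, output_file, clean, verbose)
        return base(metadata, input_file, output_file, clean, verbose)

    return merged
```

Running redoc's step last means the files it embeds are added to the final .docx rather than to an intermediate file that your post-processor might replace.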

Future versions of redoc will include a reversible version of officedown::rdocx_document().