Word to Markdown using Pandoc


Markdown has become the de facto standard for writing software documentation. This post discusses converting Word documents to Markdown using Pandoc.


If you haven’t already, install Pandoc. Word documents need to be in the docx format. Legacy binary doc files are not supported by Pandoc.

Pandoc supports several flavors of Markdown (md), such as the popular GitHub Flavored Markdown (GFM). To produce a GFM document from a docx file, run

pandoc -t gfm --extract-media . -o file.md file.docx

The --extract-media option tells Pandoc to extract embedded media to a ./media folder. All media in the generated Markdown link to files in that folder.

Generating the Markdown document is the first step. If you’re happy with the output, you can stop here. Otherwise, read on for additional changes that make the document easier to maintain, and easier to read in HTML renderers such as GitHub’s markup.

Markdown Editor

You’ll need a text editor to edit an md file. I use Visual Studio Code (Code), which has built-in support for editing and previewing Markdown files, along with a few additional plugins that make editing Markdown more productive.

Tables

Pandoc renders tables whose cells each contain a single line of text (wrapped or not) using the pipe table syntax. Column alignment is not preserved, so you’ll have to add it back manually. Tables whose cells contain complex data such as lists are rendered using the HTML table syntax. Tables with complex layouts, such as merged cells, often end up missing columns, so review all tables carefully. I suggest simplifying complex tables in the original docx before conversion.
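In a pipe table, alignment lives in the delimiter row (the second row): a leading colon means left, a trailing colon means right, and both mean center. The following is a minimal Python sketch for adding alignment back, assuming a simple table already isolated as a list of lines with no escaped pipes in its cells. The function name and the alignment letters are hypothetical.

```python
def align_pipe_table(lines, alignments):
    """Rewrite the delimiter row (second line) of a GFM pipe table.

    alignments holds one letter per column: "l" (left), "r" (right),
    "c" (center), or "-" (no explicit alignment).
    """
    markers = {"l": ":---", "r": "---:", "c": ":---:", "-": "---"}
    header_cells = lines[0].strip().strip("|").split("|")
    if len(header_cells) != len(alignments):
        raise ValueError("need one alignment per column")
    delimiter = "|" + "|".join(markers[a] for a in alignments) + "|"
    return [lines[0], delimiter] + lines[2:]


table = [
    "| Name | Size |",
    "|------|------|",
    "| a.png | 12 |",
]
print("\n".join(align_pipe_table(table, "lr")))
```

Only the delimiter row changes. Header and body rows are passed through untouched, so the same call works on tables of any length.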

Both formats are relatively easy to edit by hand, and Markdown editors such as Typora support editing tables visually. At the moment, Typora does not support HTML tables, and it does not handle large documents (several hundred pages) well.

Table of Contents

Pandoc dumps the table of contents (TOC) of the original docx as one line per topic. I suggest removing that TOC and generating a hyperlinked TOC with the Markdown TOC plugin for Code.
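If you would rather script this step than use the plugin, the idea is simple: collect the ATX headings, turn each one into an anchor, and emit a nested list of links. The following Python sketch is hypothetical and not how Markdown TOC works internally. `github_slug` only approximates GitHub’s anchor rules, and the code assumes backtick code fences.

```python
import re

FENCE = "`" * 3  # three backticks open or close a fenced code block


def github_slug(title, seen):
    """Approximate GitHub's heading anchor: lowercase, drop punctuation other
    than hyphens and underscores, turn spaces into hyphens, and suffix
    repeated anchors with -1, -2, ..."""
    slug = re.sub(r"[^\w\- ]", "", title.strip().lower()).replace(" ", "-")
    count = seen.get(slug, 0)
    seen[slug] = count + 1
    return slug if count == 0 else f"{slug}-{count}"


def build_toc(markdown):
    """Return a nested, hyperlinked TOC for the ATX (#-style) headings."""
    seen, toc, in_fence = {}, [], False
    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence  # '#' lines inside code blocks are not headings
            continue
        match = re.match(r"^(#{1,6})\s+(.+?)\s*$", line)
        if match and not in_fence:
            level, title = len(match.group(1)), match.group(2)
            toc.append("  " * (level - 1) + f"- [{title}](#{github_slug(title, seen)})")
    return "\n".join(toc)
```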

The plugin can also add, update, or remove section numbering. If the Word document has cross-references that use section numbers, keeping the numbering gives you a consistent document, at least for the moment. In the long term, I suggest avoiding section numbers and replacing textual cross-references with intra-document hyperlinks. The TOC generated by Markdown TOC shows intra-document hyperlinking in action.
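The following is a hypothetical script-based sketch of removing section numbers from headings, similar to what the plugin does inside Code. It also shows why textual cross-references become stale: the number disappears from the heading but not from the body text. Headings that genuinely start with a number, such as a year, would be stripped too.

```python
import re

# An ATX heading followed by a section number such as "2", "2.1" or "2.1.3."
NUMBERED_HEADING = re.compile(r"^(#{1,6}\s+)\d+(?:\.\d+)*\.?\s+", re.MULTILINE)


def strip_section_numbers(markdown):
    """Remove leading section numbers from Markdown headings only."""
    return NUMBERED_HEADING.sub(r"\1", markdown)


print(strip_section_numbers("## 2.1 Tables\n### 2.1.1. Pipe tables\nSee section 2.1."))
```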

Images

Images are exported in their native format and size. They are inserted in the document using the GFM ![caption](path) syntax, or using the img tag inside HTML tables. The ![caption](path) syntax cannot set an image’s size, so you may need to resize the images themselves to get a consistent size.
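Before resizing, it helps to see which extracted images are out of line. The following Python sketch reads PNG dimensions directly from the file header: an 8-byte signature, then the IHDR chunk holding big-endian width and height. It only handles PNG files, and the function names are hypothetical.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def parse_png_size(header: bytes):
    """Return (width, height) from the first 24 bytes of a PNG file.

    Layout: 8-byte signature, 4-byte IHDR length, b"IHDR",
    then 4-byte big-endian width and height.
    """
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    return struct.unpack(">II", header[16:24])


def report_png_sizes(media_dir="media"):
    """Print the size of every PNG under media_dir, e.g. Pandoc's ./media folder."""
    for path in sorted(Path(media_dir).rglob("*.png")):
        with path.open("rb") as f:
            width, height = parse_png_size(f.read(24))
        print(f"{width:>5} x {height:<5} {path}")
```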

Diagrams

Pandoc is unable to convert diagrams drawn with the shapes and figures available in Word. You’ll need to recreate them, for example by taking a screenshot of each diagram as rendered by Word. Alternatively, you can use mermaid.js syntax to create diagrams such as flowcharts and message sequence charts, and embed them in the Markdown document.


When this post was written, GitHub didn’t render Mermaid diagrams (it has since added support), but Code can render them with the help of the Mermaid Preview plugin and similar plugins.

Render PDF

To render a PDF using Pandoc, which requires a PDF engine (LaTeX by default) to be installed

pandoc file.md -f gfm -o file.pdf

To render HTML instead, change the extension of the output file from pdf to html. Add -s (--standalone) to get a complete HTML page rather than an HTML fragment

pandoc file.md -f gfm -s -o file.html

Regular Expressions

Regular expressions significantly speed up bulk search-and-replace operations. Use plain find and replace when that is enough.

Some useful regular expressions

^#+\s*$               search empty headings
\s+$                  search lines with trailing spaces
\b\s\s+\b             search repeated spaces between words
\|.*\|                search through all rows of pipe tables
section\s+(?![\s\d])  search cross-references that start with "section" but lack a section number
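To run similar checks across many converted files outside the editor, the following Python sketch may be useful. The patterns are adapted for Python: `\s` also matches newlines there, so `[ \t]` is used where a match must stay on one line, and the section check rejects any whitespace or digit after "section". The code assumes LF line endings. `lint_markdown` and the check names are hypothetical.

```python
import re

# Python adaptations of the editor searches for Markdown clean-up.
CHECKS = {
    "empty heading": r"^#+[ \t]*$",
    "trailing spaces": r"[ \t]+$",
    "repeated spaces": r"\b[ \t]{2,}\b",
    "section reference without number": r"section\s+(?![\s\d])",
}


def lint_markdown(text):
    """Return (line number, check name, line) for every match, grouped by check."""
    lines = text.split("\n")
    results = []
    for name, pattern in CHECKS.items():
        for match in re.finditer(pattern, text, re.MULTILINE | re.IGNORECASE):
            line_no = text.count("\n", 0, match.start()) + 1
            results.append((line_no, name, lines[line_no - 1]))
    return results
```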

.NET Core class library solution from scratch


This post documents using the dotnet command to create a class library solution from scratch. The solution contains a class library project and an MSTest unit test project that tests the class library.

To create an empty solution called MySolution.sln

dotnet new sln [--force] -n MySolution

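sln is just one of several templates supported by the command. To see a list, try dotnet new -l. Additional templates, such as the AvaloniaUI templates, can be installed using dotnet new --install. On .NET 7 and later SDKs, the preferred forms are dotnet new list and dotnet new install.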

To create a new class library project

dotnet new classlib [--force] -n MyLibrary

This creates a folder called MyLibrary and a MyLibrary.csproj file in it. Any C# files in the MyLibrary folder will be compiled during build.

If the MyLibrary folder already exists, use --force to replace the existing project file.

If your project has an AssemblyInfo.cs that contains assembly attributes, edit the project file to disable automatic generation of assembly attributes

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
    <GenerateAssemblyInfo>false</GenerateAssemblyInfo>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.CSharp" Version="4.4.0" />
  </ItemGroup>

</Project>

Otherwise, you’ll get errors such as

obj/Debug/netcoreapp2.0/MyLibrary.AssemblyInfo.cs(10,12): error CS0579: Duplicate 'System.Reflection.AssemblyCompanyAttribute' attribute ...

Also, note the Microsoft.CSharp package reference in the project file. It is required to use C# language features such as dynamic. Without it, you’ll get an error such as

MyClass.cs(177,50): error CS0656: Missing compiler required member 'Microsoft.CSharp.RuntimeBinder.CSharpArgumentInfo.Create'

To add the package reference, change into the MyLibrary project folder and run

dotnet add MyLibrary.csproj package Microsoft.CSharp

Then, run the following to restore packages from NuGet

dotnet restore

Change to the solution folder, then add the class library project to the solution and build the solution

dotnet sln [MySolution.sln] add MyLibrary/MyLibrary.csproj
dotnet build

Specifying the solution name is optional if the folder contains only one solution file.

To add a new MSTest unit test project

dotnet new mstest [--force] -n MyLibraryTest

Change into MyLibraryTest, then add a project reference to MyLibrary and the package reference

dotnet add MyLibraryTest.csproj reference ../MyLibrary/MyLibrary.csproj
dotnet add MyLibraryTest.csproj package Microsoft.CSharp
dotnet restore

Change to the solution folder, add the test project to the solution, build, and run the unit tests

dotnet sln add MyLibraryTest/MyLibraryTest.csproj
dotnet build
dotnet test MyLibraryTest

That wraps up the basic usage of dotnet to create and maintain a simple .NET Core class library project.

Highlighting problems in Lua dissectors


Here’s a snippet of code from the nordic_ble dissector that shows how you can highlight problems in Lua dissectors using add_expert_info

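        -- tree, tvb, micok, hf_nordic_ble_micok and UART_PACKET_FLAGS_INDEX are defined earlier in the dissector
        local item = tree:add_le(hf_nordic_ble_micok, tvb(UART_PACKET_FLAGS_INDEX, 1), micok > 0)
        if micok == 0 then
            -- MIC is bad: flag a checksum warning and hint at the likely cause
            item:add_expert_info(PI_CHECKSUM, PI_WARN, "MIC is bad")
            item:add_expert_info(PI_UNDECODED, PI_WARN, "Decryption failed (wrong key?)")
        end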

NOTE: add_expert_info is now deprecated, so I recommend using add_proto_expert_info instead.

GRASP SOLID for effective object-oriented programming


Objects are responsible for their state and behavior. Assigning responsibilities to objects effectively makes maintenance of a program less cumbersome.

This post summarizes the GRASP patterns and SOLID principles. They may be thought of as the principles and patterns underlying the design patterns described in Design Patterns: Elements of Reusable Object-Oriented Software.

GRASP Patterns

General Responsibility Assignment Software Patterns (GRASP) were popularized by Craig Larman in his book Applying UML and Patterns. They help identify the objects a program requires and their responsibilities.

Creator

Assign the responsibility for creating an object to the object that contains it, or that has the information required to create it.
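The following minimal Python sketch uses hypothetical Order and OrderLine classes. An Order contains its lines, so by Creator it is the object that creates them. Callers never construct an OrderLine themselves.

```python
class OrderLine:
    def __init__(self, product, quantity):
        self.product = product
        self.quantity = quantity


class Order:
    """Contains OrderLines, so (by Creator) it is responsible for creating them."""

    def __init__(self):
        self.lines = []

    def add_line(self, product, quantity):
        line = OrderLine(product, quantity)  # the container creates what it contains
        self.lines.append(line)
        return line
```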

Controller

Assign the responsibility for handling system events to an object that represents a module’s facade, or to a use case handler. Beware of a fat controller.

High Cohesion (HC)

Assign responsibilities so that each object’s state and behavior are closely related. The Don’t Repeat Yourself (DRY) rule helps maintain HC.

Indirection

Assign responsibility to an intermediary object so that coupling is low.

Information Expert (IE)

Assign a responsibility to the object that has the information needed to fulfill it.
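The following Python sketch shows Information Expert with hypothetical classes. Each line knows its price and quantity, so it computes its own subtotal. The order knows its lines, so it computes the total by delegating to them rather than reaching into their fields.

```python
from dataclasses import dataclass, field


@dataclass
class OrderLine:
    unit_price: float
    quantity: int

    def subtotal(self):
        # The line holds price and quantity, so it is the expert for its subtotal.
        return self.unit_price * self.quantity


@dataclass
class Order:
    lines: list = field(default_factory=list)

    def total(self):
        # The order holds the lines, so it is the expert for the total.
        return sum(line.subtotal() for line in self.lines)


order = Order([OrderLine(2.5, 4), OrderLine(1.0, 3)])
print(order.total())  # 13.0
```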

Low Coupling (LC)

Creation, inheritance, type references, and message passing all result in coupling. Assign responsibilities so that coupling is low.

Polymorphism

Assign behavior to subclass when related behavior varies by type. Prefer aggregation and composition to inheritance.
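The following is a small Python sketch with hypothetical shape classes. Instead of a function that switches on the type of each shape, each subclass supplies its own `area()`. Adding a new shape then requires no change to `total_area`.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        ...


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


def total_area(shapes):
    # No isinstance checks: the behavior that varies by type lives in each subclass.
    return sum(shape.area() for shape in shapes)
```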

Protected Variations (PV)

Assign responsibility to a new object that provides a stable interface around known points of instability.
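In the following Python sketch, the unstable part is where exchange rates come from. It might be a fixed table today and a web service tomorrow. The hypothetical `ExchangeRates` protocol is the stable interface wrapped around that instability, so `to_usd` is unaffected when the source changes.

```python
from typing import Protocol


class ExchangeRates(Protocol):
    """Stable interface around an unstable source of rates."""

    def rate(self, currency: str) -> float: ...


class FixedRates:
    """One implementation; a web-service client could be another."""

    def __init__(self, rates):
        self._rates = dict(rates)

    def rate(self, currency):
        return self._rates[currency]


def to_usd(amount, currency, rates: ExchangeRates):
    # Depends only on the stable interface, not on where the rates come from.
    return amount * rates.rate(currency)
```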

Pure Fabrication (PF)

Assign responsibility to a new object not derived from the domain to ensure LC and HC.

SOLID Principles

SOLID is a set of more general principles popularized by Robert C. Martin, aka Uncle Bob, in his book Agile Software Development: Principles, Patterns, and Practices.

Single-Responsibility Principle (SRP)

A class or module does one thing well. See HC.

Open-Closed Principle (OCP)

A class, module, or function should be open for extension but closed for modification.
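The following is a minimal Python sketch using hypothetical pricing rules. `apply_rules` is closed for modification, and new behavior is added by writing another rule function rather than editing it.

```python
from typing import Callable, List

# A pricing rule takes a price and returns a (possibly reduced) price.
PricingRule = Callable[[float], float]


def apply_rules(price: float, rules: List[PricingRule]) -> float:
    """Closed for modification: extensions arrive as new rules, not edits here."""
    for rule in rules:
        price = rule(price)
    return price


def ten_percent_off(price: float) -> float:
    return price * 0.9


def flat_five_off(price: float) -> float:
    return max(price - 5, 0)


print(apply_rules(100, [ten_percent_off, flat_five_off]))
```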

Liskov Substitution Principle (LSP)

Subclasses must be substitutable for their base (super) classes: code written against a base class must keep working when given any of its subclasses.
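The textbook violation is a Square that inherits from Rectangle. In the following Python sketch, `stretch` is written against Rectangle and expects that doubling the width doubles the area. Square keeps its sides equal, which breaks that expectation, so it is not substitutable.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def set_width(self, width):
        self.width = width

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    """Violates LSP: keeping the sides equal changes what set_width does."""

    def __init__(self, side):
        super().__init__(side, side)

    def set_width(self, width):
        self.width = self.height = width


def stretch(rect: Rectangle):
    # Written against Rectangle: doubling the width should double the area.
    before = rect.area()
    rect.set_width(rect.width * 2)
    return rect.area() == 2 * before


print(stretch(Rectangle(2, 3)))  # True
print(stretch(Square(2)))        # False: Square is not substitutable for Rectangle
```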

Interface Segregation Principle (ISP)

Avoid situations where clients depend on a fat interface. A change made to a fat interface for any one client affects all the other clients, so split it into smaller, client-specific interfaces.
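The following Python sketch uses hypothetical device classes and splits printing and scanning into separate protocols. A simple printer implements only what it supports, and `print_all` depends only on the narrow Printer interface, so changes to Scanner cannot affect it.

```python
from typing import Protocol


class Printer(Protocol):
    def print_document(self, doc: str) -> str: ...


class Scanner(Protocol):
    def scan(self) -> str: ...


class SimplePrinter:
    """Implements only printing; nothing forces it to stub out scan()."""

    def print_document(self, doc):
        return f"printed: {doc}"


class MultiFunctionDevice:
    def print_document(self, doc):
        return f"printed: {doc}"

    def scan(self):
        return "scanned page"


def print_all(printer: Printer, docs):
    # Depends on the narrow Printer interface only.
    return [printer.print_document(d) for d in docs]
```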

Dependency Inversion Principle (DIP)

Prevent modules in higher (more abstract) layers of an architecture from being impacted by changes in modules in lower layers: both should depend on abstractions, and abstractions should not depend on concrete implementations.
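In the following Python sketch, the high-level Greeter owns the abstraction it needs (a hypothetical MessageStore). The low-level InMemoryStore depends on that abstraction by implementing it. Swapping in a file or database store would not touch Greeter.

```python
from typing import List, Protocol


class MessageStore(Protocol):
    """Abstraction owned by the high-level layer."""

    def save(self, message: str) -> None: ...


class InMemoryStore:
    """Low-level detail that implements the abstraction."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def save(self, message: str) -> None:
        self.messages.append(message)


class Greeter:
    """High-level policy: knows nothing about how messages are stored."""

    def __init__(self, store: MessageStore) -> None:
        self.store = store

    def greet(self, name: str) -> None:
        self.store.save(f"Hello, {name}")


store = InMemoryStore()
Greeter(store).greet("Ada")
print(store.messages)  # ['Hello, Ada']
```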

SSL/TLS decryption in Wireshark


Wireshark’s SSL dissector can decrypt SSL/TLS traffic, given the server’s private key in PFX/P12 or PEM format. To check whether you’re using the right private key, derive the public key from it and compare its modulus with the modulus of the first certificate in the chain the server sends in its Certificate message, right after the SERVER HELLO.

$ openssl rsa -text -in key.pem -pubout
Private-Key: (2048 bit)
modulus:
00:97:c6:a5:01:d6:36:b3:25:fa:83:9c:93:75:dd:
bb:dc:f6:ef:78:b8:b5:cc:20:1c:35:9a:ba:3d:8d:
d3:94:9b:b0:b2:6c:e7:79:83:3c:07:37:1f:8f:e5:
02:f8:f4:ac:9b:7c:1a:b6:74:6f:73:f5:57:34:30:
5b:32:5a:3b:ba:bd:65:dc:cc:98:30:13:01:fb:0b:
3c:f3:e3:6c:da:9b:3d:47:1f:5f:c3:12:a2:4f:21:
dc:cc:39:90:9d:83:05:b3:06:40:d3:62:25:fe:8b:
e9:1e:ca:a2:d8:0f:9d:cd:84:10:62:15:0e:f3:ab:
cb:d6:fc:92:cf:ff:04:75:17:c6:c7:2d:d6:05:c6:
c1:ce:4e:77:c4:fc:fc:c5:ff:37:4f:83:bb:93:f9:
0f:2f:06:70:6a:55:37:e5:6f:0c:92:5e:14:99:0d:
87:2a:e6:d4:30:f9:de:fb:b5:c6:5e:e8:f5:98:5e:
19:4b:8f:53:8a:e5:f1:87:7b:69:99:4d:a0:55:02:
a0:57:5d:bf:ca:0b:84:8c:23:ed:f6:e5:7a:97:4b:
3e:3f:bb:38:29:0e:11:28:53:6d:d4:d8:69:88:5f:
2d:23:28:e6:43:97:e0:51:db:e8:a8:c7:c5:9f:c3:
9d:11:48:d3:51:8c:5f:ba:ab:c0:60:30:26:e2:c9:
54:8b
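Comparing a 2048-bit modulus by eye is error-prone. The following Python sketch normalizes two modulus dumps before comparing them, for example the one printed by openssl and one copied from Wireshark’s certificate details. It assumes each input contains only hex digits, optionally separated by colons or whitespace. A leading 00 byte does not affect the comparison. The function names are hypothetical.

```python
import re


def modulus_to_int(dump: str) -> int:
    """Parse a hex modulus with optional ':' separators and whitespace."""
    hex_digits = re.sub(r"[\s:]", "", dump)
    return int(hex_digits, 16)  # raises ValueError on anything that isn't hex


def same_modulus(a: str, b: str) -> bool:
    return modulus_to_int(a) == modulus_to_int(b)
```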


Export private key in pfx or p12 file to pem format


The following openssl command exports the private key in a pfx or p12 file to pem

openssl pkcs12 -nodes -in file.pfx -out key.pem -nocerts

If you need the public key for the private key in key.pem

openssl rsa -in key.pem -out key.pub -pubout

If you need information on the public key (modulus, exponent…)

openssl rsa -in key.pem -pubout -text

Or, starting from the public key in key.pub

openssl rsa -pubin -in key.pub -text