Monoids are everywhere

Semigroups are those structures that have an associative “combine” operation.

The simplest example is the integers under addition:

1 + 2 + 3 = (1 + 2) + 3 = 1 + (2 + 3)

Note that this definition does not require a 0 (identity) element.

If we add the requirement that there must be some 0 element,

i.e. a + 0 = 0 + a = a

then such a structure is called a monoid.

Semigroups are everywhere and so are monoids.

ℕ¹ \ {0} i.e. ℕ₁ is a semigroup under +, but not a monoid
ℕ with 0 i.e. ℕ₀ is a semigroup and a monoid under +
The set of positive integers ℤ⁺ is a semigroup under +, but not a monoid
ℤ⁺ ∪ {0} (the non-negative integers) is a monoid and a semigroup under +, and so is ℤ itself
ℕ₁ is a monoid and a semigroup under ×
ℕ₁ \ {1} is a semigroup but not a monoid under ×
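To make the two laws concrete, here is a minimal Haskell sketch using the `Sum` and `Product` wrappers from `Data.Monoid` in base. It assumes `Int` as a stand-in for the number sets above (so it is closer to ℤ than to ℕ), and it only spot-checks the laws on a few values rather than proving them.

```haskell
import Data.Monoid (Product (..), Sum (..))

main :: IO ()
main = do
  -- Associativity under +: (1 + 2) + 3 == 1 + (2 + 3)
  print ((Sum 1 <> Sum 2) <> Sum 3 == Sum 1 <> (Sum 2 <> Sum (3 :: Int)))
  -- Identity under +: mempty for Sum is Sum 0
  print (getSum (Sum 5 <> mempty) == (5 :: Int))
  -- Identity under ×: mempty for Product is Product 1
  print (getProduct (mempty <> Product 5) == (5 :: Int))
```

Each line prints `True`. The same type gets two different monoids depending on which operation and which identity you pick, which is why Haskell uses wrapper types here.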

All this jargon seems meaningless in the context of imperative programming but is relevant in the world of functional programming.

The monoid and semigroup abstractions are useful as constraints on the input of a function. In Haskell and Scala this is usually done with type classes, with the constraint supplied on the type parameter.

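// Scala (Monoid is assumed here to be the Cats type class)

def combine[A: Monoid](xs: List[A]): A = Monoid[A].combineAll(xs)

def sort[A: Ordering](xs: List[A]): List[A] = xs.sorted

-- Haskell

qsort :: (Ord a) => [a] -> [a]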
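As a rough illustration of what such a constraint buys you, the sketch below defines a hypothetical helper `combineAll` (my name, not a library function; base already ships the equivalent `mconcat`). It knows nothing about its element type except that it is a `Monoid`.

```haskell
import Data.Monoid (Product (..), Sum (..))

-- Hypothetical helper: fold a list using whatever Monoid instance `a` has.
combineAll :: Monoid a => [a] -> a
combineAll = foldr (<>) mempty

main :: IO ()
main = do
  print (getSum (combineAll (map Sum [1, 2, 3 :: Int])))            -- 6
  print (getProduct (combineAll (map Product [1, 2, 3, 4 :: Int]))) -- 24
  -- The empty list is fine too: the result is the identity element.
  print (getSum (combineAll ([] :: [Sum Int])))                     -- 0
```

The empty-list case is where the monoid requirement matters: with only a semigroup there would be nothing sensible to return.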

This might still seem arcane, but here are some other common monoids:

Strings with concatenation, where “” is the 0 element.
Text files with concatenation, where the empty file can be considered the 0 element.
Lists of anything, where the empty list is the 0 element and list concat is the operation.
Markdown files with concatenation, where the empty file is again the 0 element.
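These instances already exist in Haskell's Prelude, so a small sketch needs no imports. The "files" below are just strings standing in for file contents, which is an assumption of the example and not how a real tool reads files.

```haskell
main :: IO ()
main = do
  -- Strings: (<>) is concatenation and mempty is "".
  putStrLn ("semi" <> "group" <> mempty)
  -- Lists of anything: (<>) is (++) and mempty is [].
  print ([1, 2] <> [] <> [3 :: Int])
  -- "Files" as plain Strings: the empty one changes nothing.
  let files = ["# Title\n", "", "Body text\n"]
  print (mconcat files == "# Title\nBody text\n")
```

This prints `semigroup`, then `[1,2,3]`, then `True`.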

Segueing into Markdown: I had a file in MediaWiki format that needed to be moved to GitHub Markdown, which is how I discovered a tool called pandoc, courtesy of some LLM chatting. Coincidentally, pandoc is written in Haskell, and there’s a running joke that it is the only piece of useful software written in Haskell.

pandoc helped me convert some really intricate sections into GitHub Markdown (although some tables gave it trouble, most likely because I did not supply the formats correctly).

The .md file in question ² was easier to edit and correct in smaller chunks. So I split the original MediaWiki file into a set of per-section .md files ³ and added links to the sections in ⁴. However, I noticed that browsing these files on GitHub was a pain: you lose context when you navigate back and forth between links. What was easy to write and edit was not easy to read. So I decided to regenerate the original monolithic content using pandoc. I created a list of files ⁵ and then supplied that list as input to pandoc:

# In nushell
let files = cat list-of-files.txt | grep -v "#" | xargs
pandoc $files -o Eke.md

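However, this failed on my Mac. I took it for a limit on the number of inputs, although “File name too long” is the error for an over-long path (too many arguments would read “Argument list too long”), so the likelier cause is that the whole list reached pandoc as a single string.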

pandoc: .. openBinaryFile: invalid argument (File name too long)

One of the fixes for this:

cat list-of-files.txt | xargs pandoc -o Eke.md

just works, since joining .md files is an associative operation and so forms a semigroup (a monoid, in fact, with the empty file as the 0 element). Running the operation on the whole list of inputs yields the same result as running it one file at a time.
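The associativity argument can be sketched in Haskell. This is a toy model, assuming file contents are plain strings and that joining is simple concatenation; pandoc itself parses and re-renders its inputs, so the model is only an approximation. `chunksOf` is a hypothetical helper written inline so that the example needs nothing beyond the Prelude.

```haskell
-- Hypothetical helper: split a list into chunks of at most n items (n > 0).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

main :: IO ()
main = do
  -- Stand-ins for the contents of the section files.
  let sections  = ["# Intro\n", "## Part 1\n", "## Part 2\n", "## End\n"]
      allAtOnce = mconcat sections
      inBatches = mconcat (map mconcat (chunksOf 2 sections))
      oneByOne  = foldl (<>) mempty sections
  -- Grouping does not matter, only the order of the files does.
  print (allAtOnce == inBatches && inBatches == oneByOne)
```

This prints `True`. Note that associativity lets you regroup the files but not reorder them: concatenation is not commutative.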

See ⁶ for the actual command.

In the end, I have a single .md file that contains all the sections, so there is no need to navigate back and forth. To see the difference, open ² and ⁴ in your browser and scroll down.

References: