Ch 3 — Phases 19–15 — Deep TOC, Markdown Structure, OCR & Eke Pipeline

2026-03-17 to 2026-03-20

Phase 19 — Deep 3-Level TOC + Section Anchors (2026-03-20)

Scope: All 11 books with kn.md (02, 03, 07-vol1, 07-vol2, 08, 14, 17, 25, 27, 28, 29).

Changes:

Added <a id="sec-N-M"> anchors at all section headings and <a id="sub-N-M-K"> at all subsection headings in every kn.md
ಪರಿವಿಡಿ TOC in each kn.md extended to list all three levels (chapter → section → subsection)
Cross-links inserted after every sec/sub anchor: [Eke →](./SLUG-kn-eke#sec-N-M) in kn.md; [ಕನ್ನಡ →](./SLUG-kn#sec-N-M) in kn-eke.md
Chapter nav fragments corrected to #adhyAya-N throughout
Index back-links added to all file headers: [← ಸೂಚಿ](./README) in kn.md and [← sUci](./README) in kn-eke.md
kn-eke.md self-referential header links corrected (they previously pointed to the wrong file)
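
The anchor and cross-link convention listed above can be sketched as follows. This is a minimal illustration rather than the project's actual tooling: the function name `add_eke_crosslinks` is hypothetical, and it assumes each anchor sits alone on its own line.

```python
import re

# Matches a line holding only a section or subsection anchor,
# e.g. <a id="sec-3-2"></a> or <a id="sub-3-2-1">.
# The closing </a> is treated as optional (an assumption about the files).
ANCHOR_RE = re.compile(r'^<a id="((?:sec|sub)-\d+(?:-\d+)+)">(?:</a>)?\s*$')

def add_eke_crosslinks(lines, slug):
    """Return a copy of lines with an [Eke →] link after each sec/sub anchor."""
    out = []
    for line in lines:
        out.append(line)
        m = ANCHOR_RE.match(line)
        if m:
            # Same fragment id on both sides, so kn.md and kn-eke.md stay aligned.
            out.append(f"[Eke →](./{slug}-kn-eke#{m.group(1)})")
    return out
```

The reverse direction (a [ಕನ್ನಡ →] link in kn-eke.md) would use the same fragment id with the `-kn` file name.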

Phase 18 — docs/ sync + Noto Sans Kannada + ettuge-sync skill (2026-03-20)

Root cause fixed: All 57 docs/dnsbhat/ files were stale — Phase 17 OCR cleanup and kn.md changes went to src/ but were never copied to docs/ (the GitHub Pages source). This caused garbled rendering of books 25, 15, and all other Phase 17-touched books.

Changes:

docs/dnsbhat/ — synced 57 files from src/main/md/kannada/dnsbhat/ preserving Jekyll nav front matter (title, parent, nav_order) in each file
docs/_sass/custom/custom.scss — added Noto Sans Kannada via Google Fonts for correct nukta (U+0CBC ಼) rendering; previously Georgia/system fonts silently dropped nukta-modified clusters
.claude/skills/ettuge-sync/ — new skill (ettuge-sync) automates the full post-phase sync pipeline: staleness detection → CLAUDE.md updates → claude-prompt updates → docs/ sync → global skill copy → regenerate combined docs files → commit and push
.claude/skills/ettuge-sync/scripts/sync_docs.py — standalone script for src→docs sync (Step 4a of ettuge-sync skill)
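
A minimal sketch of the front-matter-preserving sync described above, assuming that docs/ files begin with a Jekyll front matter block fenced by `---` lines and that src/ files may or may not have one. The helper names `split_front_matter` and `sync_text` are hypothetical and are not taken from sync_docs.py.

```python
def split_front_matter(text):
    """Split a page into (front_matter, body); front_matter keeps both '---' fences."""
    if text.startswith("---\n"):
        # Search from index 3 so an empty front matter ("---\n---\n") is also found.
        end = text.find("\n---\n", 3)
        if end != -1:
            cut = end + len("\n---\n")
            return text[:cut], text[cut:]
    return "", text

def sync_text(src_text, docs_text):
    """Build new docs content: the existing docs front matter plus the fresh src body."""
    front, _ = split_front_matter(docs_text)
    _, body = split_front_matter(src_text)
    return front + body
```

Keeping the front matter from the docs/ copy means nav fields such as title, parent, and nav_order survive each sync even though they never appear in src/.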

Phase 17 — Nudi Encoding Cleanup, u’ → u^, TOC Restructure, Citation Quote Convention (2026-03-18–19)

Multi-part phase completing Nudi/WX glyph-map artifact cleanup, fixing the unrounded-u Eke marker, restructuring TOCs, removing residual OCR structural artifacts, and establishing a canonical citation-quote convention for the published site.

Sub-phase A — Nudi character-level cleanup (books 17 and 14)

Books 17 and 14 were typeset in the Nudi legacy font. WX decoding produced Kannada Unicode text but left unmapped Latin glyph-map residuals, which had to be resolved by cross-referencing the original PDF.

Book 17 — symbols resolved:

Symbol	U+	→	Replacement	Count	Context
`ù`	00F9	→	ಱ (archaic RA, U+0CB1)	85	vowel-displacement pattern
`Â`	00C2	→	`ᵒ` (modifier letter small o, U+1D52)	24	Havyaka suffix marker
`ï`	00EF	→	್ (virama, U+0CCD)	7	unrounded-u context
`û`	00FB	→	಼ (nukta, U+0CBC)	3	—
`Ð`	00D0	→	direct reconstructions	2	two Tamil loanwords: ಞೆಙ್ಙೋಳ್, ಞಙ್ಙು
`Œ`	0152	→	char + ಼ (nukta)	21	lowered vowels (ಅ಼, ಎ಼, ಒ಼) — pattern A prefix and B infix

Additional compound OCR garbles fixed: ಯುೀ → ಯೇ (75×), ೊೀ → ೋ (37×), ೂೀ → ೋ (3×), ದುು → ದು (1×).
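
The one-to-one rows of the book 17 table and the compound fixes above could be applied roughly as below. This is a sketch under assumptions: `clean_book17` is a hypothetical name, the replacements are applied blindly to the whole text, and `Ð` and `Œ` are left out because the table describes them as context-dependent.

```python
# One-to-one glyph replacements (Latin residue → Kannada / modifier letter).
GLYPH_MAP = str.maketrans({
    "\u00F9": "\u0CB1",  # ù → ಱ (archaic RA)
    "\u00C2": "\u1D52",  # Â → ᵒ (modifier letter small o)
    "\u00EF": "\u0CCD",  # ï → ್ (virama)
    "\u00FB": "\u0CBC",  # û → ಼ (nukta)
})

# Compound OCR garbles, as (wrong sequence, corrected sequence).
COMPOUND_FIXES = [
    ("\u0CAF\u0CC1\u0CC0", "\u0CAF\u0CC7"),  # ಯುೀ → ಯೇ
    ("\u0CCA\u0CC0", "\u0CCB"),              # ೊೀ → ೋ
    ("\u0CC2\u0CC0", "\u0CCB"),              # ೂೀ → ೋ
    ("\u0CA6\u0CC1\u0CC1", "\u0CA6\u0CC1"),  # ದುು → ದು
]

def clean_book17(text):
    """Apply the single-character map first, then the compound sequence fixes."""
    text = text.translate(GLYPH_MAP)
    for wrong, right in COMPOUND_FIXES:
        text = text.replace(wrong, right)
    return text
```

In practice each class of replacement would still be checked against the PDF, since a Latin character in running text is not always a glyph-map residual.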

New Eke rules for book 17’s archaic symbols: ಱ → R/Ra, ೞ → Z/Za, ಙ → G/Ga, ಞ → Y/Ya (halant/full akshara); ಼ (nukta) → : suffix (e.g. ಅ಼ → a:); ᵒ → pass through as-is; ಉ್ (unrounded u) → u^.

Book 14 — symbols resolved:

Symbol	U+	Replacement	Count	Context
`«`	00AB	`<`	16	etymological source arrow (§4.6)
`»`	00BB	`>` (4×) / `,` (7×)	11	word-change notation / clause joins
`¢`	00A2	vowel extender	1	`ಮಧ್ಯದಲ್ಲೆ ¢ → ಮಧ್ಯದಲ್ಲೇ`
`£`	00A3	`,`	1	clause join (§12.2)
`©`	00A9	deleted	1	page-break artifact block (§9.1 running header)
`(ಆಕ)`	—	`(೮ಕ)`	1	OCR misread of Kannada digit ೮

Sub-phase B — Eke u’ → u^ fix (2026-03-18)

All 8 existing kn-eke.md files (03, 07-vol1, 07-vol2, 14, 17, 25, 27, 28) were regenerated with u^ (caret) for the unrounded-u vowel ಉ್, replacing the earlier u' (apostrophe). Reason: the apostrophe was ambiguous in citation-quote contexts and in Markdown processors. Books 27 and 29 were regenerated again after the fragment cleanup below. Commit: 9a9b8fe.

Sub-phase C — OCR structural artifact removal

Commit	Books	What was removed
`dc21662`	27, 28, 29	Per-page running chapter headers embedded in body text
`66a7c62`	27	Page-break orphaned fragments before section headings
`61d2f36`	29	Page-split sentence fragment rejoined to its paragraph
`5412429`	08	Page-break orphaned fragment lines (3 instances)
`6b072f1`	03	Stray `ಚ` page-break fragment isolated before a section heading
`949ed17`	25	Entire OCR’d anukaraNike (preface) block removed from body (202 lines) — preface had been OCR’d twice, appearing a second time mid-body

Sub-phase D — TOC restructure (all kn.md files)

All books with kn.md now have a clean ಒಳಪಿಡಿ/ಪರಿವಿಡಿ section with <a id> anchors and section-link tables. Books 03 and 27 received new full TOCs in this phase; other books were already clean.

Book	TOC header	Anchor scheme	Count
03	`## ಒಳಪಿಡಿ`	`sec-N-M`, `sec-N-M-P`	100 sections, 3 levels
07	`## ಒಳಪಿಡಿ`	`adhyAya-N`	4 (vol1) + 2 (vol2)
08	`## ಒಳಪಿಡಿ`	mixed	38
14	`## ಒಳಪಿಡಿ`	mixed	164
17	`## ಪರಿವಿಡಿ`	`adhyAya-N`	12
25	`## ಪರಿವಿಡಿ`	`adhyAya-N`	11
27	`## ಒಳಪಿಡಿ`	`part-N`, `sec-N-M`, `sub-N-M-K`	221 (5+32+184)
28	`## ಪರಿವಿಡಿ`	`adhyAya-N`	12
29	`## ಪರಿವಿಡಿ`	`adhyAya-N`	11

Commits: 20bb002 (book 03 — new full 3-level TOC), ad6be57 (book 27 — new 3-tier TOC with 221 anchors).

Sub-phase E — Citation quote convention (books 07, 17, 25, 28)

DNS Bhat’s books were typeset with a backtick (U+0060) as the typographic open-quote and an apostrophe (U+0027 or U+2019) as the close. The backtick triggers Markdown code-span rendering on the published site.

Decision: Replace with curly single quotes ‘word’ (U+2018 open / U+2019 close), the convention already used natively in books 03 and 27 (Sarvam OCR output). The vowel-modification marker u^ (unrounded-u in Eke) is explicitly not a citation quote and is left unchanged.

Close-char per book: U+0027 (ASCII apostrophe) for books 07, 25, 28; U+2019 (right single quotation mark) for book 17.

Implementation: retrieved each file's pre-intermediate-commit state via git (HEAD^:{path}), then applied a DOTALL regex (`CONTENT’ → ‘CONTENT’, with a maximum 300-character span to handle citations split across page breaks). A double-backtick pass ran first (``CONTENT’’ → ‘CONTENT’, for direct speech). Orphaned opens and closes were handled case by case.
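
The two-pass conversion can be sketched as below. The exact patterns used in the project are not recorded here, so these regexes are an assumption that follows the description: a double-backtick pass first, then single backticks, each limited to a 300-character span, accepting either U+0027 or U+2019 as the close character.

```python
import re

CURLY = "\u2018" + r"\1" + "\u2019"  # replacement template: ‘CONTENT’

# Pass 1: ``CONTENT'' or ``CONTENT’’ (direct speech).
DOUBLE_RE = re.compile(r"``([^`]{1,300}?)(?:''|\u2019\u2019)", re.DOTALL)
# Pass 2: `CONTENT' or `CONTENT’ (ordinary citation).
SINGLE_RE = re.compile(r"`([^`]{1,300}?)['\u2019]", re.DOTALL)

def convert_quotes(text):
    """Replace backtick-opened citations with curly single quotes."""
    # re.DOTALL mirrors the description above; the negated class [^`]
    # already matches newlines, so spans split across lines are covered.
    text = DOUBLE_RE.sub(CURLY, text)
    return SINGLE_RE.sub(CURLY, text)
```

A backtick with no close character within 300 characters is left untouched, which is what leaves the orphaned opens for manual review.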

Book (file pair)	Quotes converted	Notable edge cases
07 vol1 kn + eke	~400	OCR fix `ನವi್ಮಲ್ಲಿ → ನಮ್ಮಲ್ಲಿ` (+ Eke `navaimalli → nammalli`); double-citation-mark display `('')` → `('')`; 1 orphaned-open vocab gloss
07 vol2 kn + eke	~300	1 orphaned open (parallel entry); 1 nested outer backtick; 2 isolated OCR fragment orphans (backtick removed); 1 orphaned close
17 kn + eke	15	4 list-gloss items with OCR-dropped close; 1 bibliography backtick before garbled English title (backtick removed)
25 kn + eke	4	4 double-backtick direct-speech citations; 0 residual backticks after regex
28 kn + eke	~30	3 translation glosses with OCR-dropped close; 1 number-structure example

Commits: 500a296 (intermediate ^..^ convention — superseded), 971e918 (final curly single quotes — 10 files across 5 books).

All Nudi Latin artifacts (0x80–0xFF) now cleared across all kn.md files except © (genuine copyright symbol, preserved in books 03 and 27).
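
A check of this kind can be expressed in a few lines. The sketch below is illustrative only: `latin1_residue` is a hypothetical helper, and treating © as the sole allowed character follows the exception stated above.

```python
from collections import Counter

def latin1_residue(text, allowed=frozenset("\u00A9")):
    """Count characters in U+0080–U+00FF, ignoring the allowed set (© by default)."""
    return Counter(
        ch for ch in text
        if "\u0080" <= ch <= "\u00FF" and ch not in allowed
    )
```

An empty Counter for a file means no Latin-1 glyph-map residue remains in it.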

Phase 16 — Cross-Link Audit + Nav Transformation Fix (2026-03-17)

Motivation: After adding cross-links to kn.md files in prior phases, three issues remained:

kn.md cross-links used wrong label ([ingliS →] — Eke romanisation of “English” — instead of [English →])
gen_kn_eke.py passed [English →] | [Eke →] nav lines through verbatim, so regenerated kn-eke.md files had self-referential [Eke →] links pointing at themselves
02-kn.md had zero cross-links (the user reported #ch2 had no navigation to English or Eke)

Audit of all kn.md files for cross-links:

Book	[English →] links	[ingliS →] links	Status
02	0	0	❌ Missing — added 60
03	9 (1/chapter)	0	✅
07 vol1	4 (1/chapter)	0	✅
07 vol2	2 (1/chapter)	0	✅
08	38 (1/section)	0	✅
14	0	82	❌ Wrong label — renamed to [English →]
17	12	0	✅
25	11	0	✅
27	5	0	✅
28	12 (1/chapter)	0	✅
29	11 (1/chapter)	0	✅

Fix 1 — Book 14 kn.md: rename [ingliS →] → [English →] (82 occurrences; kn-eke.md already correct, not regenerated)

Fix 2 — gen_kn_eke.py: proper nav-link transformation

Previously: [English →](en) | [Eke →](kn-eke) was passed through verbatim into kn-eke.md — creating self-referential Eke links.

Now: when generating kn-eke.md, these lines are transformed to the correct perspective:

[English →](./book-en#en-anchor) | [Eke →](./book-kn-eke#sec-id)
  ↓  (in kn-eke.md)
[ಕನ್ನಡ →](./book-kn#sec-id) | [English →](./book-en#en-anchor)

The kn URL is derived by stripping -eke from the Eke filename in the [Eke →] link.
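
A minimal sketch of that transformation, assuming the nav line has exactly the two-link shape shown above. The name `transform_nav` and the regex are illustrative and are not taken from gen_kn_eke.py.

```python
import re

# Matches: [English →](EN_URL) | [Eke →](EKE_URL)
NAV_RE = re.compile(
    r"^\[English →\]\((?P<en>[^)]+)\) \| \[Eke →\]\((?P<eke>[^)]+)\)$"
)

def transform_nav(line):
    """Rewrite a kn.md nav line into its kn-eke.md perspective."""
    m = NAV_RE.match(line)
    if not m:
        return line  # not a nav line: pass through unchanged
    eke_path, _, fragment = m.group("eke").partition("#")
    # Derive the kn URL by stripping the -eke suffix from the Eke filename.
    kn_path = eke_path[:-len("-eke")] if eke_path.endswith("-eke") else eke_path
    kn_url = f"{kn_path}#{fragment}" if fragment else kn_path
    return f"[ಕನ್ನಡ →]({kn_url}) | [English →]({m.group('en')})"
```

Because the fragment is carried over unchanged, the generated [ಕನ್ನಡ →] link lands on the same section the reader was viewing.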

Fix 3 — Book 02 kn.md: 60 cross-links added (every chapter + section anchor)

Anchor-to-English-anchor mapping (30 unique chapters/sections):

ch1, sec-1-[1-3] → part-1--philosophy-and-core-principles
ch2, sec-2-[1-3] → part-2--framework-overview
ch3, sec-3-[1-2] → part-3--adjective-to-noun--ತನ
ch4, sec-4-[1-6] → parts-45--verb-to-noun
ch6, sec-6-1 → part-6--zero-derivation
ch7, sec-7-[1-3] → part-7--noun-to-noun
ch8-ch11, ch13-ch14, ch18-ch19, ch29-36, ch37-52 (and their sections) → most specific en.md anchor

Regenerations:

File	Old lines	New lines	Change
`02-...-kn-eke.md`	491 (no nav)	611 (with nav)	+60 nav links; correct `[ಕನ್ನಡ →]` format
`07-...-vol1-kn-eke.md`	20,183	20,183	Nav fixed: `[English →] | [Eke →]` → `[ಕನ್ನಡ →] | [English →]`
`07-...-vol2-kn-eke.md`	13,331	13,331	Same nav fix

Verbatim content audit (all kn-eke.md files): All 11 books confirmed verbatim — non-empty line counts match kn.md exactly.
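
The line-count comparison behind this audit can be sketched as follows. The helper names are hypothetical, and matching counts are only a necessary condition for verbatim transliteration, not proof of it.

```python
def nonempty_line_count(text):
    """Number of lines that contain something other than whitespace."""
    return sum(1 for line in text.splitlines() if line.strip())

def is_line_aligned(kn_text, eke_text):
    """True when kn.md and kn-eke.md have the same number of non-empty lines."""
    return nonempty_line_count(kn_text) == nonempty_line_count(eke_text)
```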

Commit: fix(02,14): add kn.md cross-links, fix ingliS→English, fix kn-eke nav transformation

Phase 15 — Holistic kn-eke.md Audit + Nav Fix + Stale-Eke Regeneration (2026-03-17)

Motivation: After Phase 14, a cross-book audit revealed two systemic issues that had been fixed one book at a time in prior commits, and two that hadn’t been fixed at all.

Issue 1 — Nav link hygiene (fixed holistically in commit 4964158)

All kn-eke.md files had inconsistent nav-link labels. Patterns found and corrected:

Old pattern	Correct	Books affected
`[ಕನ್ನಡ →]` (hybrid Eke in Kannada label)	`[ಕನ್ನಡ →]`	02, 07, 14, 18, 27, 29
`[ingliS →]` (Eke romanisation of “English”)	`[English →]`	02, 14
`[English →] | [Eke →](kn-eke#...)` (self-referential)	`[ಕನ್ನಡ →](kn#adhyAya-N) | [English →](en#...)`	03, 17, 25, 28

Total: 12 files, 18,746 insertions across the single holistic commit.

Issue 2 — Book 07 OCR page headers/footers (fixed in commit 98c2c7e)

After Phase 14 cleaned vol1-kn.md and vol2-kn.md, the corresponding kn-eke.md files were still stale — generated from the uncleaned source. Transliterated page headers remained:

File	Lines before	Lines after	Pattern removed
`vol1-kn.md`	20,475	20,185	`N / kannaDa barahada sollarime`, garbled M¼À
`vol2-kn.md`	13,928	13,333	copyright line, `N / kannaDa barahada sollarime`, chapter headers

Issue 3 — Book 07 kn-eke.md files stale after OCR cleanup (fixed in this phase)

The vol1-kn-eke.md (20,473 lines) and vol2-kn-eke.md (13,929 lines) had been generated from the uncleaned Phase 14 kn.md, before the header/footer removal. After those artifacts were removed from kn.md, the kn-eke.md files still contained their transliterated equivalents:

4 / kannaDa barahada sollarime — page headers from left-page running headers
Copyright line in Eke form
Section separators from chapter titles printed at top of print pages

Fix: Regenerate both from the cleaned kn.md using gen_kn_eke.py.
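
Before regenerating, a stale kn-eke.md can be recognised by the transliterated running headers it still carries. The sketch below only covers the `N / kannaDa barahada sollarime` pattern named above; the regex and the name `find_page_headers` are assumptions for illustration.

```python
import re

# A page number, a slash, then the transliterated book title, alone on a line.
PAGE_HEADER_RE = re.compile(r"^\s*\d+\s*/\s*kannaDa barahada sollarime\s*$")

def find_page_headers(lines):
    """Return the 0-based indexes of lines that look like running page headers."""
    return [i for i, line in enumerate(lines) if PAGE_HEADER_RE.match(line)]
```

A non-empty result indicates the file was generated from an uncleaned source and should be regenerated rather than patched by hand.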

Issue 4 — Book 02 kn-eke.md was hand-authored summaries, not verbatim Eke (fixed in this phase)

The earliest kn-eke.md in the collection (book 02, Kannadalle Hosapadagalannu Kattuva Bage) was written manually as a companion document with explanatory Eke text, not as a verbatim transliteration of kn.md. At sections like sec-4-4, the kn-eke.md had analytical explanation (“esaka padakkE -ka oTTannu sErisi upakaraNavannu hesarisuvA…”) while kn.md had verbatim Kannada word lists and body text. The file was 835 lines against kn.md’s 553 lines (about 51% larger, expanded by the hand-authored explanations).

Fix: Regenerate from kn.md using gen_kn_eke.py, replacing hand-authored content with verbatim Eke.

Regenerations in this phase (all via gen_kn_eke.py, 0 residual Kannada chars):

File	Old lines	New lines	Source	Reduction
`02-...-kn-eke.md`	835 (hand-authored)	491 (verbatim)	`02-...-kn.md` (553L)	−344 (removed summaries)
`07-...-vol1-kn-eke.md`	20,473 (stale)	20,183 (clean)	`07-...-vol1-kn.md` (20,185L)	−290 (removed page headers)
`07-...-vol2-kn-eke.md`	13,929 (stale)	13,331 (clean)	`07-...-vol2-kn.md` (13,333L)	−598 (removed page headers/footers)
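
The "0 residual Kannada chars" criterion can be checked by counting code points in the Kannada Unicode block. This is a sketch: `kannada_char_count` is a hypothetical name, and any lines that legitimately contain Kannada script (such as a [ಕನ್ನಡ →] nav label) would need to be excluded before counting.

```python
def kannada_char_count(text):
    """Count code points in the Kannada Unicode block (U+0C80–U+0CFF)."""
    return sum(1 for ch in text if "\u0C80" <= ch <= "\u0CFF")
```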

Known residual: 07-...-vol1-kn.md line 11206 has (4) M¼À: — a garbled WX-encoded list entry (1 occurrence). Requires original PDF to determine correct Kannada. All other character-level cleanup is complete.

Commit: fix(02,07): regenerate kn-eke.md verbatim — drop hand-authored summaries and stale page headers