
Phases 19–15 — Deep TOC, Markdown Structure, OCR & Eke Pipeline

2026-03-19 to 2026-03-20


Phase 19 — Deep 3-Level TOC + Section Anchors (2026-03-20)

Scope: All 11 books with kn.md (02, 03, 07-vol1, 07-vol2, 08, 14, 17, 25, 27, 28, 29).

Changes:

  • Added <a id="sec-N-M"> anchors at all section headings and <a id="sub-N-M-K"> at all subsection headings in every kn.md
  • ಪರಿವಿಡಿ TOC in each kn.md extended to list all three levels (chapter → section → subsection)
  • Cross-links inserted after every sec/sub anchor: [Eke →](./SLUG-kn-eke#sec-N-M) in kn.md; [ಕನ್ನಡ →](./SLUG-kn#sec-N-M) in kn-eke.md
  • Chapter nav fragments corrected to #adhyAya-N throughout
  • Index back-links added to all kn.md headers: [← ಸೂಚಿ](./README) and [← sUci](./README) in kn-eke.md
  • kn-eke.md self-referential header links corrected (were pointing to wrong file)
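The anchor and cross-link scheme above can be illustrated with a small sketch. The helper name `add_cross_links` and the `slug` argument are hypothetical; the project's actual tooling for this step is not shown in this log, so treat this as one possible way to emit the link lines, assuming each anchor sits on its own line.

```python
import re

# Matches the section/subsection anchors described above,
# e.g. <a id="sec-3-2"> or <a id="sub-3-2-1">.
ANCHOR_RE = re.compile(r'<a id="((?:sec|sub)-\d+(?:-\d+)+)">')


def add_cross_links(lines, slug, is_eke=False):
    """Return a copy of `lines` with a cross-link line after each sec/sub anchor.

    Hypothetical helper: `slug` is the book's filename prefix (SLUG in the text).
    """
    out = []
    for line in lines:
        out.append(line)
        match = ANCHOR_RE.search(line)
        if match:
            anchor = match.group(1)
            if is_eke:
                # kn-eke.md links back to the Kannada file.
                out.append(f"[ಕನ್ನಡ →](./{slug}-kn#{anchor})")
            else:
                # kn.md links forward to the Eke file.
                out.append(f"[Eke →](./{slug}-kn-eke#{anchor})")
    return out
```

Chapter-level anchors such as `adhyAya-N` are deliberately not matched here, since the text describes cross-links only after sec/sub anchors.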


Phase 18 — docs/ sync + Noto Sans Kannada + ettuge-sync skill (2026-03-20)

Root cause fixed: All 57 docs/dnsbhat/ files were stale — Phase 17 OCR cleanup and kn.md changes went to src/ but were never copied to docs/ (the GitHub Pages source). This caused garbled rendering of books 25, 15, and all other Phase 17-touched books.

Changes:

  • docs/dnsbhat/ — synced 57 files from src/main/md/kannada/dnsbhat/ preserving Jekyll nav front matter (title, parent, nav_order) in each file
  • docs/_sass/custom/custom.scss — added Noto Sans Kannada via Google Fonts for correct nukta (U+0CBC ಼) rendering; previously Georgia/system fonts silently dropped nukta-modified clusters
  • .claude/skills/ettuge-sync/ — new skill (ettuge-sync) automates the full post-phase sync pipeline: staleness detection → CLAUDE.md updates → claude-prompt updates → docs/ sync → global skill copy → regenerate combined docs files → commit and push
  • .claude/skills/ettuge-sync/scripts/sync_docs.py — standalone script for src→docs sync (Step 4a of ettuge-sync skill)
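As a rough illustration of what "sync while preserving Jekyll front matter" could look like, the sketch below merges the body of a src file with the front matter of the existing docs page. The contents of sync_docs.py are not shown in this log; the function names `split_front_matter` and `sync_text` are assumptions made for this example only.

```python
def split_front_matter(text):
    """Split a Jekyll page into (front_matter_block, body).

    The front matter block, if present, is the leading '---' ... '---' section.
    Returns an empty front matter string when the page has none.
    """
    if text.startswith("---\n"):
        # Search from index 3 so an empty block ('---\n---\n') is also found.
        end = text.find("\n---\n", 3)
        if end != -1:
            cut = end + len("\n---\n")
            return text[:cut], text[cut:]
    return "", text


def sync_text(src_text, docs_text):
    """Take the body from src and keep the front matter of the docs page."""
    front, _ = split_front_matter(docs_text)
    _, body = split_front_matter(src_text)
    return front + body
```

Keeping the merge as a pure string function makes it easy to test without touching the file system; a real script would wrap it in a loop over the 57 file pairs.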


Phase 17 — Nudi Encoding Cleanup, u’ → u^, TOC Restructure, Citation Quote Convention (2026-03-18–19)

Multi-part phase completing Nudi/WX glyph-map artifact cleanup, fixing the unrounded-u Eke marker, restructuring TOCs, removing residual OCR structural artifacts, and establishing a canonical citation-quote convention for the published site.

Sub-phase A — Nudi character-level cleanup (books 17 and 14)

Books 17 and 14 were typeset in Nudi legacy font. WX-decoding produced Kannada Unicode text but left unmapped Latin glyph-map residuals that required cross-referencing the original PDF.

Book 17 — symbols resolved:

| Symbol | U+ | Replacement | Count | Context |
| --- | --- | --- | --- | --- |
| ù | 00F9 | ಱ (archaic RA, U+0CB1) | 85 | vowel-displacement pattern |
| Â | 00C2 | ᵒ (modifier letter small o, U+1D52) | 24 | Havyaka suffix marker |
| ï | 00EF | ್ (virama, U+0CCD) | 7 | unrounded-u context |
| û | 00FB | ಼ (nukta, U+0CBC) | 3 | |
| Ð | 00D0 | direct reconstructions | 2 | two Tamil loanwords: ಞೆಙ್ಙೋಳ್, ಞಙ್ಙು |
| Œ | 0152 | char + ಼ (nukta) | 21 | lowered vowels (ಅ಼, ಎ಼, ಒ಼); pattern A prefix and B infix |

Additional compound OCR garbles fixed: ಯುೀ → ಯೇ (75×), ೊೀ → ೋ (37×), ೂೀ → ೋ (3×), ದುು → ದು (1×).
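The compound fixes listed above amount to ordered string replacements. A minimal sketch follows; the function name `fix_compound_garbles` is hypothetical, and the returned counts are only there so a run can be compared against the figures quoted above.

```python
# Ordered (garbled, corrected) pairs taken from the list above.
COMPOUND_FIXES = [
    ("ಯುೀ", "ಯೇ"),
    ("ೊೀ", "ೋ"),
    ("ೂೀ", "ೋ"),
    ("ದುು", "ದು"),
]


def fix_compound_garbles(text):
    """Apply each replacement in order; return (fixed_text, counts_per_pattern)."""
    counts = {}
    for bad, good in COMPOUND_FIXES:
        counts[bad] = text.count(bad)  # occurrences before replacing
        text = text.replace(bad, good)
    return text, counts
```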

New Eke rules for book 17’s archaic symbols: ಱ → R/Ra, ೞ → Z/Za, ಙ → G/Ga, ಞ → Y/Ya (halant/full akshara); ಼ (nukta) → : suffix (e.g. ಅ಼ → a:); ᵒ → passed through as-is; ಉ್ (unrounded u) → u^.

Book 14 — symbols resolved:

| Symbol | U+ | Replacement | Count | Context |
| --- | --- | --- | --- | --- |
| « | 00AB | < | 16 | etymological source arrow (§4.6) |
| » | 00BB | > (4×) / , (7×) | 11 | word-change notation / clause joins |
| ¢ | 00A2 | vowel extender | 1 | ಮಧ್ಯದಲ್ಲೆ ¢ → ಮಧ್ಯದಲ್ಲೇ |
| £ | 00A3 | , | 1 | clause join (§12.2) |
| © | 00A9 | deleted | 1 | page-break artifact block (§9.1 running header) |
| (ಆಕ) | | (೮ಕ) | 1 | OCR misread of Kannada digit ೮ |

Sub-phase B — Eke u’ → u^ fix (2026-03-18)

All 8 existing kn-eke.md files (03, 07-vol1, 07-vol2, 14, 17, 25, 27, 28) were regenerated with u^ (caret) for the unrounded-u vowel ಉ್, replacing the earlier u' (apostrophe). Reason: the apostrophe caused rendering ambiguity in citation-quote contexts and in Markdown processors. Books 27 and 29 were regenerated again after the fragment cleanup below. Commit: 9a9b8fe.

Sub-phase C — OCR structural artifact removal

| Commit | Books | What was removed |
| --- | --- | --- |
| dc21662 | 27, 28, 29 | Per-page running chapter headers embedded in body text |
| 66a7c62 | 27 | Page-break orphaned fragments before section headings |
| 61d2f36 | 29 | Page-split sentence fragment rejoined to its paragraph |
| 5412429 | 08 | Page-break orphaned fragment lines (3 instances) |
| 6b072f1 | 03 | Stray page-break fragment isolated before a section heading |
| 949ed17 | 25 | Entire OCR’d anukaraNike (preface) block removed from body (202 lines); the preface had been OCR’d twice, appearing a second time mid-body |

Sub-phase D — TOC restructure (all kn.md files)

All books with kn.md now have a clean ಒಳಪಿಡಿ/ಪರಿವಿಡಿ section with <a id> anchors and section-link tables. Books 03 and 27 received new full TOCs in this phase; other books were already clean.

| Book | TOC header | Anchor scheme | Count |
| --- | --- | --- | --- |
| 03 | ## ಒಳಪಿಡಿ | sec-N-M, sec-N-M-P | 100 sections, 3 levels |
| 07 | ## ಒಳಪಿಡಿ | adhyAya-N | 4 (vol1) + 2 (vol2) |
| 08 | ## ಒಳಪಿಡಿ | mixed | 38 |
| 14 | ## ಒಳಪಿಡಿ | mixed | 164 |
| 17 | ## ಪರಿವಿಡಿ | adhyAya-N | 12 |
| 25 | ## ಪರಿವಿಡಿ | adhyAya-N | 11 |
| 27 | ## ಒಳಪಿಡಿ | part-N, sec-N-M, sub-N-M-K | 221 (5+32+184) |
| 28 | ## ಪರಿವಿಡಿ | adhyAya-N | 12 |
| 29 | ## ಪರಿವಿಡಿ | adhyAya-N | 11 |

Commits: 20bb002 (book 03 — new full 3-level TOC), ad6be57 (book 27 — new 3-tier TOC with 221 anchors).

Sub-phase E — Citation quote convention (books 07, 17, 25, 28)

DNS Bhat’s books were typeset with backtick (U+0060) as typographic open-quote and apostrophe (U+0027 or U+2019) as close. Backtick triggers Markdown code-span rendering on the published site.

Decision: Replace with curly single quotes ‘word’ (U+2018 open / U+2019 close), the convention already used natively in books 03 and 27 (Sarvam OCR output). The vowel-modification marker u^ (unrounded-u in Eke) is explicitly not a citation quote and is left unchanged.

Close-char per book: U+0027 (ASCII apostrophe) for books 07, 25, 28; U+2019 (right single quotation mark) for book 17.

Implementation: retrieved HEAD^:{path} via git to get the state before the intermediate commit, then applied a DOTALL regex (`CONTENT' → ‘CONTENT’, max 300-char span to handle page-break-split citations), with a double-backtick pass first (``CONTENT'' → ‘CONTENT’ for direct speech). Orphaned opens/closes were handled case by case.
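The two-pass conversion described above can be sketched as follows. The exact patterns used in the project are not recorded here, so `DOUBLE_RE`, `SINGLE_RE` and `convert_citation_quotes` are assumptions; the sketch accepts either close character (U+0027 or U+2019) rather than switching per book.

```python
import re

# Pass 1: double-backtick direct speech, closed by '' or ’’.
# DOTALL lets '.' cross line breaks (page-break-split citations).
DOUBLE_RE = re.compile(r"``(.{1,300}?)(?:''|’’)", re.DOTALL)

# Pass 2: single citations, closed by ' or ’. The span is capped at
# 300 characters and may not contain another backtick.
SINGLE_RE = re.compile(r"`([^`]{1,300}?)['’]", re.DOTALL)


def convert_citation_quotes(text):
    """Convert backtick-opened citations to curly single quotes (U+2018/U+2019)."""
    text = DOUBLE_RE.sub(r"‘\1’", text)
    text = SINGLE_RE.sub(r"‘\1’", text)
    return text
```

A backtick with no close within 300 characters is left untouched, which is one reason orphaned opens still need manual review. An orphaned open could also pair with an unrelated apostrophe nearby, so the output of a pass like this should be diffed before committing.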

| Book (file pair) | Quotes converted | Notable edge cases |
| --- | --- | --- |
| 07 vol1 kn + eke | ~400 | OCR fix ನವi್ಮಲ್ಲಿ → ನಮ್ಮಲ್ಲಿ (+ Eke navaimalli → nammalli); double-citation-mark display ('')(''); 1 orphaned-open vocab gloss |
| 07 vol2 kn + eke | ~300 | 1 orphaned open (parallel entry); 1 nested outer backtick; 2 isolated OCR fragment orphans (backtick removed); 1 orphaned close |
| 17 kn + eke | 15 | 4 list-gloss items with OCR-dropped close; 1 bibliography backtick before garbled English title (backtick removed) |
| 25 kn + eke | 4 | 4 double-backtick direct-speech citations; 0 residual backticks after regex |
| 28 kn + eke | ~30 | 3 translation glosses with OCR-dropped close; 1 number-structure example |

Commits: 500a296 (intermediate ^..^ convention — superseded), 971e918 (final curly single quotes — 10 files across 5 books).

All Nudi Latin artifacts (0x80–0xFF) now cleared across all kn.md files except © (genuine copyright symbol, preserved in books 03 and 27).
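A check of this kind is easy to automate. The sketch below counts any remaining characters in the U+0080–U+00FF range, with © allowed; the name `residual_latin1` and the allow-list mechanism are illustrative assumptions, not the project's own audit script.

```python
from collections import Counter

# © is kept where it is a genuine copyright symbol.
ALLOWED = {"\u00a9"}


def residual_latin1(text, allowed=ALLOWED):
    """Count characters in U+0080–U+00FF that are not explicitly allowed."""
    return Counter(
        ch for ch in text
        if 0x80 <= ord(ch) <= 0xFF and ch not in allowed
    )
```

An empty `Counter` for a file means no Nudi Latin residue remains in that range. Symbols outside it, such as Œ (U+0152), would need a separate check.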



Motivation: After adding cross-links to kn.md files in prior phases, three issues remained:

  1. kn.md cross-links used wrong label ([ingliS →] — Eke romanisation of “English” — instead of [English →])
  2. gen_kn_eke.py passed [English →] | [Eke →] nav lines through verbatim, so regenerated kn-eke.md files had self-referential [Eke →] links pointing at themselves
  3. 02-kn.md had zero cross-links (the user reported #ch2 had no navigation to English or Eke)

Audit of all kn.md files for cross-links:

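| Book | [English →] links | [ingliS →] links | Status |
| --- | --- | --- | --- |
| 02 | 0 | 0 | ❌ Missing; added 60 |
| 03 | 9 (1/chapter) | 0 | |
| 07 vol1 | 4 (1/chapter) | 0 | |
| 07 vol2 | 2 (1/chapter) | 0 | |
| 08 | 38 (1/section) | 0 | |
| 14 | 0 | 82 | ❌ Wrong label; renamed to [English →] |
| 17 | 12 | 0 | |
| 25 | 11 | 0 | |
| 27 | 5 | 0 | |
| 28 | 12 (1/chapter) | 0 | |
| 29 | 11 (1/chapter) | 0 | |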

Fix 1 — Book 14 kn.md: rename [ingliS →] → [English →] (82 occurrences; kn-eke.md already correct, not regenerated)

Fix 2 — gen_kn_eke.py: proper nav-link transformation

Previously: [English →](en) | [Eke →](kn-eke) was passed through verbatim into kn-eke.md — creating self-referential Eke links.

Now: when generating kn-eke.md, these lines are transformed to the correct perspective:

[English →](./book-en#en-anchor) | [Eke →](./book-kn-eke#sec-id)
  ↓  (in kn-eke.md)
[ಕನ್ನಡ →](./book-kn#sec-id) | [English →](./book-en#en-anchor)

The kn URL is derived by stripping -eke from the Eke filename in the [Eke →] link.
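The transformation can be sketched in a few lines. This is not the code from gen_kn_eke.py, which is not reproduced in this log; `NAV_RE` and `transform_nav_line` are hypothetical names, and the sketch assumes the nav line has exactly the `[English →](…) | [Eke →](…)` shape shown above.

```python
import re

# Matches a kn.md nav line of the form shown above.
NAV_RE = re.compile(
    r"\[English →\]\((?P<en>[^)]+)\)\s*\|\s*\[Eke →\]\((?P<eke>[^)]+)\)"
)


def transform_nav_line(line):
    """Rewrite a kn.md nav line into the kn-eke.md perspective."""
    def repl(match):
        eke_path, _, fragment = match.group("eke").partition("#")
        # Derive the kn URL by stripping the trailing "-eke" from the Eke filename.
        if eke_path.endswith("-eke"):
            kn_path = eke_path[: -len("-eke")]
        else:
            kn_path = eke_path  # assumption: leave unexpected targets untouched
        kn_url = kn_path + ("#" + fragment if fragment else "")
        return f"[ಕನ್ನಡ →]({kn_url}) | [English →]({match.group('en')})"

    return NAV_RE.sub(repl, line)
```

Lines that do not match the pattern pass through unchanged, so the function can be applied to every line of a file.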

Fix 3 — Book 02 kn.md: 60 cross-links added (every chapter + section anchor)

Anchor-to-English-anchor mapping (30 unique chapters/sections):

  • ch1, sec-1-[1-3] → part-1--philosophy-and-core-principles
  • ch2, sec-2-[1-3] → part-2--framework-overview
  • ch3, sec-3-[1-2] → part-3--adjective-to-noun--ತನ
  • ch4, sec-4-[1-6] → parts-45--verb-to-noun
  • ch6, sec-6-1 → part-6--zero-derivation
  • ch7, sec-7-[1-3] → part-7--noun-to-noun
  • ch8-ch11, ch13-ch14, ch18-ch19, ch29-36, ch37-52 (and their sections) → most specific en.md anchor

Regenerations:

| File | Old lines | New lines | Change |
| --- | --- | --- | --- |
| 02-...-kn-eke.md | 491 (no nav) | 611 (with nav) | +60 nav links; correct [ಕನ್ನಡ →] format |
| 07-...-vol1-kn-eke.md | 20,183 | 20,183 | Nav fixed: [English →] \| [Eke →] → [ಕನ್ನಡ →] \| [English →] |
| 07-...-vol2-kn-eke.md | 13,331 | 13,331 | Same nav fix |

Verbatim content audit (all kn-eke.md files): All 11 books confirmed verbatim — non-empty line counts match kn.md exactly.
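The line-count criterion used in this audit can be expressed directly. The helper names below are hypothetical, and matching non-empty line counts is a necessary check rather than proof that every line is a faithful transliteration.

```python
def non_empty_count(text):
    """Number of lines that contain something other than whitespace."""
    return sum(1 for line in text.splitlines() if line.strip())


def is_verbatim_pair(kn_text, eke_text):
    """True when kn.md and kn-eke.md have the same number of non-empty lines."""
    return non_empty_count(kn_text) == non_empty_count(eke_text)
```

Blank lines are ignored on purpose: the generated Eke file may differ from its source in spacing without differing in content.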

Commit: fix(02,14): add kn.md cross-links, fix ingliS→English, fix kn-eke nav transformation



Phase 15 — Holistic kn-eke.md Audit + Nav Fix + Stale-Eke Regeneration (2026-03-17)

Motivation: After Phase 14, a cross-book audit revealed two systemic issues that had been fixed one book at a time in prior commits, and two that hadn’t been fixed at all.

Issue 1 — Nav link hygiene (fixed holistically in commit 4964158)

All kn-eke.md files had inconsistent nav-link labels. Patterns found and corrected:

| Old pattern | Correct | Books affected |
| --- | --- | --- |
| [ಕನ್ನಡ →] (hybrid Eke in Kannada label) | [ಕನ್ನಡ →] | 02, 07, 14, 18, 27, 29 |
| [ingliS →] (Eke romanisation of “English”) | [English →] | 02, 14 |
| [English →] \| [Eke →](kn-eke#...) (self-referential) | [ಕನ್ನಡ →](kn#adhyAya-N) \| [English →](en#...) | 03, 17, 25, 28 |

Total: 12 files, 18,746 insertions across the single holistic commit.

Issue 2 — Book 07 OCR page headers/footers (fixed in commit 98c2c7e)

After Phase 14 cleaned vol1-kn.md and vol2-kn.md, the corresponding kn-eke.md files were still stale — generated from the uncleaned source. Transliterated page headers remained:

| File | Lines before | Lines after | Pattern removed |
| --- | --- | --- | --- |
| vol1-kn.md | 20,475 | 20,185 | N / kannaDa barahada sollarime, garbled M¼À |
| vol2-kn.md | 13,928 | 13,333 | copyright line, N / kannaDa barahada sollarime, chapter headers |

Issue 3 — Book 07 kn-eke.md files stale after OCR cleanup (fixed in this phase)

The vol1-kn-eke.md (20,473 lines) and vol2-kn-eke.md (13,929 lines) had been generated from the uncleaned Phase 14 kn.md, before the header/footer removal. After those artifacts were removed from kn.md, the kn-eke.md files still contained their transliterated equivalents:

  • 4 / kannaDa barahada sollarime — page headers from left-page running headers
  • Copyright line in Eke form
  • Section separators from chapter titles printed at top of print pages

Fix: Regenerate both from the cleaned kn.md using gen_kn_eke.py.

Issue 4 — Book 02 kn-eke.md was hand-authored summaries, not verbatim Eke (fixed in this phase)

The earliest kn-eke.md in the collection (book 02, Kannadalle Hosapadagalannu Kattuva Bage) was written manually as a companion document with explanatory Eke text, not as a verbatim transliteration of kn.md. At sections like sec-4-4, the kn-eke.md had analytical explanation (“esaka padakkE -ka oTTannu sErisi upakaraNavannu hesarisuvA…”) while kn.md had verbatim Kannada word lists and body text. The file was 835 lines against kn.md’s 553 lines (about 51% larger, expanded by hand-authored explanations).

Fix: Regenerate from kn.md using gen_kn_eke.py, replacing hand-authored content with verbatim Eke.

Regenerations in this phase (all via gen_kn_eke.py, 0 residual Kannada chars):

| File | Old lines | New lines | Source | Reduction |
| --- | --- | --- | --- | --- |
| 02-...-kn-eke.md | 835 (hand-authored) | 491 (verbatim) | 02-...-kn.md (553L) | −344 (removed summaries) |
| 07-...-vol1-kn-eke.md | 20,473 (stale) | 20,183 (clean) | 07-...-vol1-kn.md (20,185L) | −290 (removed page headers) |
| 07-...-vol2-kn-eke.md | 13,929 (stale) | 13,331 (clean) | 07-...-vol2-kn.md (13,333L) | −598 (removed page headers/footers) |
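The "0 residual Kannada chars" condition can be verified with a scan over the Kannada Unicode block. The function below is an illustrative sketch with a hypothetical name; it reports line numbers so that any hit can be located in the regenerated file.

```python
def kannada_residue(text):
    """Return (line_number, char) pairs for Kannada-block characters (U+0C80–U+0CFF)."""
    return [
        (number, ch)
        for number, line in enumerate(text.splitlines(), start=1)
        for ch in line
        if 0x0C80 <= ord(ch) <= 0x0CFF
    ]
```

An empty list corresponds to the zero-residue result quoted above. Files that intentionally carry a Kannada nav label would need those lines excluded before the check.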

Known residual: 07-...-vol1-kn.md line 11206 has (4) M¼À: — a garbled WX-encoded list entry (1 occurrence). Requires original PDF to determine correct Kannada. All other character-level cleanup is complete.

Commit: fix(02,07): regenerate kn-eke.md verbatim — drop hand-authored summaries and stale page headers