From self-1.md (Oct 2024 – Feb 2026)
Understanding Transformer Reasoning on Graph Algorithms (Oct 2024) Research analyzing how transformers internally represent and execute graph algorithms — finding that they implement something structurally similar to breadth-first search in their attention patterns when trained on graph traversal tasks. The work advances mechanistic interpretability by connecting observable attention behavior to known algorithmic primitives. arxiv.org — https://arxiv.org/abs/2405.18512
nvidia/Llama-3.1-Nemotron-70B (Oct 2024) NVIDIA’s instruction-tuned variant of Llama-3.1-70B, optimized for instruction following and helpfulness through a combination of RLHF and preference data collected from human annotators. Nemotron-70B ranked highly on chat and instruction-following benchmarks, demonstrating that post-training alignment can substantially improve a strong open base model. huggingface.co — https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct
TapeAgent AI Framework ServiceNow’s TapeAgent framework structures agentic AI workflows around a “tape” — an append-only log of observations, thoughts, actions, and results — that makes agent behavior auditable, reproducible, and debuggable. The tape abstraction addresses a key pain point in agentic systems: understanding what the agent did and why it made particular decisions. github.com — https://github.com/ServiceNow/TapeAgents
Santiago: RAG Pipelines Tweet A detailed thread by Santiago Valdarrama explaining multi-step RAG pipeline construction with evaluation strategies — covering query reformulation, hybrid sparse-dense retrieval, cross-encoder re-ranking, faithfulness evaluation with LLM-as-judge, and the RAGAS framework. A reference for building production RAG systems that actually handle adversarial and edge-case queries. x.com — https://x.com/svpino/
DeepSeek-R1 PDF (Jan 2025) The full technical report for DeepSeek-R1 — a reasoning model trained primarily with reinforcement learning (GRPO) that demonstrated that extended chain-of-thought reasoning can be elicited through RL without requiring massive supervised reasoning data. R1’s open release and competitive performance with o1 made it a watershed moment for open-source reasoning AI. arxiv.org — https://arxiv.org/abs/2501.12948
Scale Test-Time Compute (HuggingFace) HuggingFace’s blog post on test-time compute scaling — the discovery that allocating more inference-time computation (extended chain-of-thought, beam search, majority voting over multiple samples) improves reasoning performance substantially, often more cost-effectively than training a larger model. This finding reframed the compute trade-off between training and inference. huggingface.co — https://huggingface.co/blog/scaling-test-time-compute
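One of the simplest test-time compute strategies mentioned above, majority voting (self-consistency), can be sketched in a few lines: sample several chain-of-thought traces, extract each trace's final answer, and keep the most common one. The sampled answers below are illustrative placeholders, not output from any real model.

```python
# Self-consistency by majority vote: more samples = more inference
# compute traded for higher accuracy, with no change to the model.
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Return the most frequent final answer across sampled traces."""
    return Counter(answers).most_common(1)[0][0]

# Final answers extracted from five hypothetical sampled reasoning traces:
samples = ["42", "42", "41", "42", "40"]
print(majority_vote(samples))  # → 42
```

In practice the samples come from the same model run at nonzero temperature, and answer extraction (parsing the final line of each trace) is the fiddly part.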
Jeff Dean NeurIPS 2024 Talk Jeff Dean’s keynote at NeurIPS 2024 covering Google DeepMind’s view of the future trajectory of AI — including multi-modal foundation models, AI for science (AlphaFold lineage), AI-assisted drug discovery, and the infrastructure challenges of training and serving increasingly large models at global scale. https://neurips.cc/virtual/2024/105966
DeepSeek-v3 101 (Jan 2025) An overview article explaining the DeepSeek-v3 architecture and training process: mixture-of-experts with load balancing, multi-head latent attention for KV cache compression, FP8 training for efficiency, and a pipeline parallelism strategy that achieved dramatically lower training costs than comparable models. V3 demonstrated that frontier-quality models could be trained for a fraction of the cost of GPT-4 class models. huggingface.co — https://huggingface.co/papers/2412.19437
7B Model RLHF Reasoning (Jan 2025) A paper demonstrating that a 7B parameter model trained with reinforcement learning from human feedback (specifically GRPO) on reasoning traces can achieve competitive performance on mathematical and logical reasoning benchmarks — challenging the assumption that strong reasoning requires very large models. The work is part of the “small models, big reasoning” research wave inspired by DeepSeek-R1. arxiv.org — https://arxiv.org/abs/
Hugging Face Journal Club: DeepSeek R1 A HuggingFace journal club deep dive on the DeepSeek R1 paper — covering the GRPO training algorithm, the “cold-start” supervised fine-tuning phase, the role of format rewards (requiring models to produce reasoning in <think> tags), and comparative analysis with OpenAI’s o1. The discussion highlights what is genuinely novel versus what builds on prior RL-for-reasoning work. huggingface.co — https://huggingface.co/papers/2501.12948
GRPO Explained (tweet) Group Relative Policy Optimization — the reinforcement learning algorithm at the core of DeepSeek-R1’s training — explained in a tweet thread. GRPO avoids training a separate value/critic model by estimating advantages from the relative rewards of a group of sampled responses to each prompt, making it substantially more memory-efficient than standard PPO for LLM training. x.com — https://x.com/
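The group-relative advantage at the heart of GRPO can be sketched directly: for a group of responses to one prompt, each response's advantage is its reward standardized against the group's mean and standard deviation, so no learned critic is needed. The reward values below are illustrative.

```python
# Minimal sketch of GRPO's group-relative advantage estimation,
# assuming scalar rewards for G sampled responses to one prompt.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each response = (reward - group mean) / group std."""
    mu = mean(rewards)
    sigma = stdev(rewards)  # sample std over the group
    return [(r - mu) / sigma for r in rewards]

# Four sampled answers to the same prompt, scored for correctness:
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
```

Correct answers get positive advantages and incorrect ones negative, and the advantages sum to zero within the group; these per-token-broadcast advantages then feed a PPO-style clipped objective, minus the value model.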
RAT — Retrieval Augmented Thinking (arXiv 2403.05313) A prompting technique that interleaves retrieval with chain-of-thought reasoning — the model retrieves relevant context, reasons over it, identifies gaps, retrieves again, and iterates — rather than retrieving once before generating the full response. RAT reduces hallucinations on knowledge-intensive tasks by anchoring each reasoning step in retrieved evidence. arxiv.org — https://arxiv.org/abs/2403.05313
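The retrieve-reason-retrieve loop can be sketched as follows; the toy keyword retriever, the two-entry corpus, and the hard-coded follow-up query are all hypothetical stand-ins for a real retrieval index and LLM calls.

```python
# Sketch of the RAT loop: retrieve, reason over the evidence,
# identify a gap, retrieve again — rather than retrieving once.
CORPUS = {
    "grpo": "GRPO estimates advantages from group-relative rewards.",
    "ppo": "PPO uses a learned value model as a baseline.",
}

def retrieve(query: str) -> str:
    """Toy retrieval: first corpus passage whose key appears in the query."""
    for key, passage in CORPUS.items():
        if key in query.lower():
            return passage
    return ""

def rat_answer(question: str, max_steps: int = 3) -> list[str]:
    """Interleave retrieval with reasoning steps, stopping on no new evidence."""
    evidence, steps = [], []
    query = question
    for _ in range(max_steps):
        passage = retrieve(query)
        if not passage or passage in evidence:
            break  # nothing new retrieved: stop iterating
        evidence.append(passage)
        steps.append(f"Given: {passage}")
        # A real system would ask the LLM to name the remaining gap;
        # the follow-up query here is hard-coded for illustration.
        query = "how does PPO differ"
    return steps

print(rat_answer("Why is GRPO cheaper than PPO?"))
```

Each reasoning step is anchored to a retrieved passage, which is what gives the technique its hallucination-reduction effect on knowledge-intensive tasks.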
Open Thoughts Reasoning Dataset (Jan 2025) An open-source dataset for training reasoning models, providing long chain-of-thought solutions to mathematics and science problems annotated with step-level correctness labels. Created as a community response to the data scarcity that limits reproducibility of DeepSeek-R1-style RL training. github.com — https://github.com/open-thoughts/open-thoughts
One-Hot Encoding in ML An explainer on one-hot encoding — converting categorical variables into binary indicator vectors — covering when to use it (nominal categories with no ordinal relationship), its dimensionality cost on high-cardinality features, and alternatives like embedding layers and target encoding for machine learning applications. towardsdatascience.com — https://towardsdatascience.com/decoding-one-hot-encoding-a-beginners-guide-to-categorical-data-058582240e86/
Beyond Black Box LLMs (Quanta) Quanta’s coverage of mechanistic interpretability research — the program of reverse-engineering what computations neural networks actually perform by analyzing circuits of interacting attention heads and MLP neurons. The article surveys Anthropic’s “superposition” hypothesis, Neel Nanda’s work on modular-arithmetic grokking, and the broader goal of making LLM internals as transparent as classical algorithms. quantamagazine.org — https://www.quantamagazine.org/why-language-models-are-so-hard-to-understand-20250430/

SUP — State Update Prompts A prompting technique for tracking state across long agentic interactions by explicitly prompting the model to produce structured state update summaries at each step — preventing the context drift and “amnesia” that afflicts agents in extended tasks. SUP makes implicit state tracking explicit and verifiable, improving reliability in multi-step agent workflows. arxiv.org — https://arxiv.org/abs/
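The core mechanism can be sketched as a loop that merges model-produced structured state deltas into an explicit state object; the JSON-delta format below is an assumption for illustration, since the paper's exact template is not quoted here.

```python
# Sketch of a State Update Prompt loop: after each agent step, the
# model emits a structured state delta, which is merged into an
# explicit state dict instead of living implicitly in the context.
import json

def apply_state_update(state: dict, update_json: str) -> dict:
    """Merge a model-produced JSON state delta into the running state."""
    delta = json.loads(update_json)
    new_state = dict(state)
    new_state.update(delta)
    return new_state

state = {"goal": "book flight", "step": 0, "chosen_flight": None}
# What a model's structured state update might look like after one step:
update = '{"step": 1, "chosen_flight": "UA 812"}'
state = apply_state_update(state, update)
print(state)
```

Because the state is explicit, it can be validated, logged, and re-injected into later prompts, which is what makes the tracking verifiable rather than implicit.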
AI Engineering — Chip Huyen (O’Reilly) Chip Huyen’s book covering the full stack of production AI engineering: data pipelines, model evaluation, serving infrastructure, monitoring, and organizational patterns for deploying LLM-based systems. Grounded in Huyen’s experience building ML systems at NVIDIA, Snorkel, and advising AI startups — one of the most practically useful books on the topic. oreilly.com — https://www.oreilly.com/library/view/ai-engineering/9781098166298/
Create Apps Without Code Using DeepSeek + RooCode (YouTube, Jan 2025) A tutorial demonstrating building functional web applications entirely through natural language instructions to an AI agent (RooCode) backed by a local DeepSeek model — with no manual code writing. The video represents the “vibe coding” phenomenon where sufficiently capable code generation tools allow non-programmers to build working software. https://www.youtube.com/watch?v=2Frayo_8ovQ
GitHub: Roo-Code Roo-Code is an open-source agentic coding assistant for VS Code, similar to Cursor but with a focus on local model support and extensibility. It supports multiple backends (Ollama, OpenAI-compatible APIs, Anthropic) and provides a chat interface with file editing, terminal access, and project-wide code understanding. github.com — https://github.com/RooVetGit/Roo-Code
Local DeepSeek R1 Hardware Setup (Reddit/LocalLLaMA, Jan 2025) A Reddit thread discussing the hardware requirements for running various sizes of the DeepSeek R1 model locally — from the 7B distilled version (feasible on a consumer GPU with 16GB VRAM) through the full 671B MoE model (requiring multi-GPU server configurations). A practical reference for assessing local inference feasibility. reddit.com — https://www.reddit.com/r/LocalLLaMA/
MIT Researchers Train More Reliable AI Agents (Nov 2024) MIT research on improving the reliability of multi-step AI agents through better reward shaping, error recovery training, and uncertainty-aware action selection — addressing the core problem that agents compound errors across long task horizons, causing failure rates to grow exponentially with task length. mit.edu — https://news.mit.edu/2024/mit-researchers-develop-efficiency-training-more-reliable-ai-agents-1122
LLM Post-Training: DeepSeek Tweet Thread A thread covering post-training techniques used in DeepSeek models: supervised fine-tuning on instruction data, RLHF with GRPO, and distillation from the larger DeepSeek-R1 model into smaller variants. The thread is notable for explaining why the distilled 7B and 14B models perform so well — they inherit the reasoning traces from the larger model’s RL training. x.com — https://x.com/
OpenAI Operator vs Browser Use (Reddit/LocalLLaMA) A comparison of OpenAI’s Operator (a hosted web browser agent) and Browser Use (open-source alternative using local models) — evaluating task success rates, supported websites, latency, and cost. The comparison reveals that the gap between proprietary and open-source browser agents narrowed dramatically with DeepSeek-class models in early 2025. reddit.com — https://www.reddit.com/r/LocalLLaMA/
Thoughts on a Month With Devin AI Coding Agent (answer.ai, Jan 2025) An honest assessment of Devin — Cognition AI’s autonomous software engineering agent — after extended use, finding it impressive on well-defined isolated tasks but unreliable on tasks requiring deep codebase understanding, complex debugging, or collaboration with existing engineering workflows. The review provides a sober counterpoint to Devin’s launch marketing. answer.ai — https://www.answer.ai/posts/2025-01-08-devin.html
LLM Agent Book Shelved (tweet) An author tweets about shelving a book on LLM agents because the field moved too fast for traditional publishing timelines — by the time the book would publish, the specific frameworks and techniques covered would be outdated. A vivid illustration of the unprecedented pace of development in AI tooling. x.com — https://x.com/
Chatbot Fundamental Limitations (Quanta, Jan 2025) Quanta coverage of the core architectural constraints limiting chatbot capabilities: the context window bottleneck, inability to update weights at inference time, probabilistic generation without truth verification, and lack of persistent memory across sessions. The article argues these are not engineering problems to be engineered away but fundamental properties of the current architecture. quantamagazine.org — https://www.quantamagazine.org/chatbot-software-begins-to-face-fundamental-limitations-20250131/
AI Decodes Animal Calls (Nature, Dec 2024) Nature coverage of ML models trained to find structure in animal vocalizations — specifically, using dimensionality reduction and clustering to identify distinct call types in whale, bat, and bird communication, and sequence modeling to detect statistical regularities that might correspond to semantic content. An early step toward interspecies communication research. nature.com — https://www.nature.com/articles/d41586-024-04050-5
Demystifying DeepSeek (Thoughtworks) Thoughtworks’ accessible explainer on what makes DeepSeek models architecturally distinctive: multi-head latent attention (compressing the KV cache via low-rank projections), mixture-of-experts routing with load balancing, FP8 training, and multi-token prediction objectives. The post situates DeepSeek in the broader landscape of efficient frontier model design. thoughtworks.com — https://www.thoughtworks.com/insights/blog/generative-ai/demystifying-deepseek
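The KV-cache compression idea behind multi-head latent attention can be sketched with plain matrix shapes: project hidden states down to a small latent, cache only the latent, and up-project to keys and values when attention needs them. The dimensions and random weights below are illustrative, not DeepSeek's actual configuration.

```python
# Rough sketch of multi-head latent attention's KV compression via
# low-rank projections (dimensions illustrative, not DeepSeek's).
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, seq = 64, 8, 10

W_down = rng.normal(size=(d_model, d_latent))   # compression
W_up_k = rng.normal(size=(d_latent, d_model))   # decompress to keys
W_up_v = rng.normal(size=(d_latent, d_model))   # decompress to values

h = rng.normal(size=(seq, d_model))             # hidden states
latent = h @ W_down                             # cached: seq x d_latent
K, V = latent @ W_up_k, latent @ W_up_v         # rebuilt on the fly

# Cache shrinks from 2*seq*d_model floats (K and V) to seq*d_latent:
print(latent.shape, K.shape, V.shape)
print((seq * d_latent) / (2 * seq * d_model))   # compression ratio
```

With these toy numbers the cache is 1/16th the size of storing K and V directly; the trade-off is the extra up-projection matmuls at attention time.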
Google AI Toolbox Introduction (Feb 2025) Google’s GenAI Toolbox — a collection of open-source tools for building production AI applications including model evaluation, prompt management, retrieval pipelines, and monitoring. Positioned as the Google-native alternative to LangChain/LlamaIndex for teams already invested in GCP infrastructure. googleapis.github.io — https://googleapis.github.io/genai-toolbox/getting-started/introduction/
State of LLM Reasoning Models (arXiv 2501.09686) A survey of the reasoning model landscape in early 2025: OpenAI o1/o3, DeepSeek-R1, Gemini Thinking, Claude Extended Thinking — covering training methodologies (RL, distillation, chain-of-thought supervision), benchmark comparisons, and failure modes. A useful snapshot of where reasoning AI stood at the start of the reasoning model era. arxiv.org — https://arxiv.org/abs/2501.09686
Building ML Systems: Trillion FLOPS Talk (YouTube) A talk on the infrastructure considerations for training and serving trillion-parameter-scale models — covering distributed training strategies, gradient checkpointing, mixed precision, interconnect bandwidth requirements, and the economics of large-scale compute procurement. Relevant context for anyone thinking about the practical constraints on scaling AI systems. https://www.youtube.com/watch?v=139UPjoq7Kw
Where Does Meaning Live — Category Theory and LLMs (arXiv, Apr 2025) A paper connecting categorical semantics — specifically functorial semantics and string diagrams from applied category theory — to how language models encode meaning in their embedding spaces. The work proposes that compositional meaning in LLMs can be analyzed through the lens of monoidal categories, linking formal semantics to neural representations. arxiv.org — https://arxiv.org/abs/
To Make Language Models Work Better (prompting research) A research paper on systematic prompting techniques — including structured output constraints, step-back prompting, and self-consistency decoding — that reliably improve model performance across diverse task types without any model modification. The paper provides ablations showing which techniques are robust versus task-specific. arxiv.org — https://arxiv.org/abs/
Virtual Machinations: LLMs as Computers (ACM Queue) An ACM Queue essay by Erik Meijer proposing that LLMs are best understood as universal virtual machines rather than statistical text generators — capable of “running” any computable function that can be expressed in natural language, with implications for understanding LLM capabilities and limitations through the lens of computability theory. queue.acm.org — https://queue.acm.org/detail.cfm?id=3676287
Understanding Attention in LLMs (arXiv 2409.03752) A technical paper providing a systematic analysis of attention mechanisms in transformer language models — covering multi-head attention’s implicit factored representation, the role of attention sink tokens, and empirical patterns in what individual attention heads attend to across layers. A complement to 3Blue1Brown’s visual treatment with more mathematical depth. arxiv.org — https://arxiv.org/abs/2409.03752
Generalized Transformers from Applicative Functors (cybercat.institute, Feb 2025) A blog post from the CyberCat Institute showing that transformer attention can be derived as a special case of applicative functor composition in category theory — providing a principled algebraic foundation for why transformer architectures work and how they could be generalized. A mathematically sophisticated connection between functional programming abstractions and neural architectures. cybercat.institute — https://cybercat.institute/2025/02/12/transformers-applicative-functors/
Indic-Parler TTS (Dec 2024) AI4Bharat’s Indic-Parler is a text-to-speech model supporting multiple Indian languages with natural prosody and speaker diversity. Built on the Parler-TTS architecture with Indian language text and audio training data, it addresses the significant gap in high-quality TTS for languages like Kannada, Telugu, Malayalam, and Tamil. huggingface.co — https://huggingface.co/ai4bharat/
Pralekha — Indic Document Alignment (Dec 2024) AI4Bharat’s Pralekha tool for aligning Indic language documents to create parallel corpora — essential infrastructure for training translation and multilingual models. The tool handles the challenges of Indic script alignment, including different segmentation conventions, script normalization, and the lack of strong sentence-boundary signals in some languages. github.com — https://github.com/AI4Bharat/Pralekha
Ollama deepseek-r1 Local Deployment (Jan 2025) Running DeepSeek-R1 via Ollama — a local model serving tool that manages model downloads, quantization, and an OpenAI-compatible API endpoint. The Ollama integration makes it possible to run R1’s reasoning capabilities on consumer hardware without any cloud dependencies, enabling offline and private AI applications. ollama.ai — https://ollama.ai/library/deepseek-r1
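A minimal local session looks like the following; the `7b` tag is one of several distilled sizes (check the Ollama library page for the tags actually published), and the second command uses Ollama's OpenAI-compatible endpoint on its default port.

```shell
# Pull and chat with a distilled DeepSeek-R1 locally:
ollama pull deepseek-r1:7b
ollama run deepseek-r1:7b "Why is the sky blue? Think step by step."

# Ollama also serves an OpenAI-compatible API on localhost:11434,
# so existing OpenAI-client code can point at it unchanged:
curl http://localhost:11434/v1/chat/completions \
  -d '{"model": "deepseek-r1:7b",
       "messages": [{"role": "user", "content": "hello"}]}'
```

The OpenAI-compatible endpoint is what lets tools like Roo-Code (above) swap a cloud backend for a local R1 with only a base-URL change.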
LLM Reasoning State — State of LLMs Mar 2025 (arXiv 2503.22732) A March 2025 survey of where reasoning models stood: benchmarks on AIME, MATH-500, and code reasoning; comparisons across o3, DeepSeek-R1, Gemini Thinking, and Claude 3.7; and open questions about whether extended thinking traces reflect genuine multi-step reasoning or sophisticated pattern completion. A useful temporal snapshot. arxiv.org — https://arxiv.org/abs/2503.22732
NotebookLM App (May 2025) Google’s NotebookLM updated to support audio podcast generation — AI hosts discuss uploaded documents in a conversational format, making dense technical content accessible through listening. The tool is particularly useful for reviewing long research papers or personal note collections, as demonstrated by the Sep 2024 personal summary archived above. notebooklm.google.com — https://notebooklm.google.com/
After Three Years: Modular’s CUDA Alternative Is Ready (May 2025) A retrospective on Modular’s three-year effort to build the Mojo programming language and MAX inference engine as alternatives to NVIDIA’s CUDA ecosystem. The piece covers what Modular got right (Python-compatible syntax, performance competitive with CUDA on modern GPUs) and what challenges remain (ecosystem maturity, model coverage). eetimes.com — https://www.eetimes.com/after-three-years-modulars-cuda-alternative-is-ready/
Modular DeepSeek Democratizing Compute Modular’s blog post on running DeepSeek models efficiently on diverse hardware using their MAX inference engine — demonstrating that frontier model inference is becoming accessible beyond NVIDIA H100 clusters, running well on AMD GPUs, Intel accelerators, and Apple Silicon. modular.com — https://www.modular.com/blog/democratizing-compute-part-1-deepseeks-impact-on-ai
AI Is Helping Decode Animals’ Speech (Nature, Sep 2025) An updated Nature feature on ML tools for animal communication research — covering the Earth Species Project’s work on dolphin whistle classification, efforts to decode prairie dog “sentences,” and bee waggle dance analysis. The piece notes the challenge of distinguishing meaningful signal from statistical noise in animal vocalizations without ground-truth translations. nature.com — https://www.nature.com/articles/d41586-025-02917-9
AI Scientist Song-Chun Zhu Leaves US for China (Guardian, Sep 2025) A long-form Guardian profile on why Song-Chun Zhu — a UCLA AI researcher known for his work on scene understanding and cognitive AI — returned to China’s Peking University, citing research freedom, resource availability, and a more supportive environment for ambitious long-horizon AI research. The piece captures broader geopolitical dynamics in AI talent flows. theguardian.com — https://www.theguardian.com/news/ng-interactive/2025/sep/16/song-chun-zhu-why-one-of-the-worlds-most-brilliant-ai-scientists-left-the-us-for-china
AI Experts Return From China Stunned: US Grid Too Weak (Fortune, Aug 2025) Fortune’s reporting on AI researchers visiting Chinese data centers who found that China’s electrical grid capacity for AI compute substantially exceeds what is available in the US — with single data centers drawing power equivalent to small cities. The piece argues that energy infrastructure, not chips or algorithms, may become the binding constraint on AI development. fortune.com — https://fortune.com/2025/08/14/data-centers-china-grid-us-infrastructure/
GPT-5.2 Derives New Result in Theoretical Physics OpenAI preprint documenting GPT-5.2 proposing a new closed-form formula for a gluon scattering amplitude — a result in quantum chromodynamics that was subsequently formally verified by physicists. The first instance of an AI system making a genuine new discovery in fundamental physics, not just solving known problems. openai.com — https://openai.com/index/new-result-theoretical-physics/
LLM-Generated Lean 4 Proofs — Dylan Miller A research paper benchmarking GPT-5, Gemini 2.5, and Claude 3.7 on formal theorem proving in Lean 4 — measuring not just success rates but proof style (readability, length, use of tactics vs. automation). The results show frontier models approaching human mathematician-level performance on undergraduate-level theorem proving. github.com — https://github.com/lampless/LLM-Generated-Lean4-Proofs/blob/main/Dylan%20Miller_%20LLM-Generated%20Lean4%20Proofs.pdf
Model Context Protocol Intro — Anthropic (Feb 2025) Anthropic’s announcement of the Model Context Protocol (MCP) — an open standard for connecting AI models to external data sources, APIs, and tools in a composable, permission-controlled way. MCP enables AI assistants to access databases, file systems, and APIs without custom per-integration code, and has since been adopted by major AI tools. anthropic.com — https://www.anthropic.com/news/model-context-protocol
OpenClaw 21 Use Cases — Matthew Berman A tweet thread cataloging 21 practical use cases for OpenClaw — an agentic workflow tool built around Claude Code — ranging from autonomous code review and documentation generation to self-improving prompt libraries and competitive intelligence automation. The thread illustrates how persistent AI agents running on consumer hardware are enabling solo developers to operate at team scale. x.com — https://x.com/MatthewBerman/status/2023843493765157235
How to Run a 24/7 AI Company for $50/Month (OpenClaw) A guide demonstrating how to run continuous AI agent workflows on a Mac Mini using OpenClaw — with agents handling customer queries, content generation, code review, and data processing autonomously. The $50/month figure covers API costs for moderate usage, illustrating the dramatic reduction in the cost of automating knowledge work. x.com — https://x.com/ziwenxu_/status/2023610499024171077
OpenClaw Memory Fix Guide Tips for preventing AI agents from losing context across long sessions — including strategies for explicit memory summaries, structured state serialization, and prompting the agent to maintain a running context document. Context management across long agent sessions remains one of the primary engineering challenges in production agentic workflows. x.com — https://x.com/KSimback/status/2024431606002319739
Greg Isenberg: 24/7 Digital Employees With OpenClaw A thread on building cash-flowing automation businesses using AI agents — where “digital employees” handle routine tasks (social media management, customer support, content production) continuously without human supervision. Isenberg argues this represents a fundamental shift in what a small team can accomplish economically. x.com — https://x.com/gregisenberg/status/2024247983999521123
Anthropic Free Short Course: Build Skills With Claude Code A free course from Anthropic covering how to build custom skills and workflows deployable across Claude Code, the Claude API, Claude SDK, and VS Code extensions — enabling developers to create reusable AI capabilities that can be invoked across multiple interfaces from a single implementation. x.com — https://x.com/sentientt_media/status/2025142906051498085
Fine-Tuning LLMs: 114-Page Comprehensive Guide A paper exploring fine-tuning strategies from foundational (full fine-tuning, LoRA, QLoRA) through advanced (DPO, IPO, ORPO for alignment) to multimodal applications (vision-language fine-tuning). At 114 pages, it serves as a handbook covering both theoretical motivation and practical implementation guidance across the full spectrum of adaptation techniques. x.com — https://x.com/techwith_ram/status/2025255030585237931
DOSA Code Discovery: One Gene Links Heart, Obesity, Sleep Apnea (Feb 2026) Bengaluru BRIC-inStem scientists discover a KCNA2 gene mutation that links susceptibility to obesity, cardiovascular disease, and sleep apnea through a shared molecular pathway — a pleiotropic variant that may explain why these three conditions so frequently co-occur. The discovery was made using genomic analysis of multi-disorder families and validated in model organisms. deccanherald.com — https://www.deccanherald.com/india/karnataka/bengaluru/the-dosa-code-one-gene-three-disorders-3899925
Signal Creator Moxie Marlinspike Launches Confer AI (Jan 2026) Moxie Marlinspike — the cryptographer who built Signal’s end-to-end encryption protocol — launches Confer, an AI assistant with end-to-end encrypted conversations and a commitment that Anthropic (the model provider) cannot read user queries. The project applies the same privacy-by-design philosophy from Signal to AI assistants. arstechnica.com — https://arstechnica.com/security/2026/01/signal-creator-moxie-marlinspike-wants-to-do-for-ai-what-he-did-for-messaging/
Recommended AI Books List — John Crickett 21 AI/ML book recommendations from software engineering educator John Crickett, covering the full stack from mathematical foundations (linear algebra, probability) through ML theory (Shalev-Shwartz, Bishop) to production systems (Chip Huyen) and recent LLM-focused texts. A useful reading list for engineers making the transition from software to AI development. x.com — https://x.com/johncrickett/status/2026288507312910547
aidnn by Isotopes AI (Oct 2025) Isotopes AI’s multi-agentic platform for data analysis and business decision support — combining LLM reasoning with structured data querying, visualization generation, and automated insight extraction. Positioned for enterprise analytics teams wanting to reduce dependence on data scientists for routine analytical workflows. isotopes.ai — https://isotopes.ai/#about-us
User Personal Note: AI Project Ideas (Feb 2026) A collection of personal project ideas captured in notes: (1) procrastination agent — an AI that monitors and nudges against procrastination patterns; (2) Eke subtitle generation — generating romanized Kannada subtitles for Kannada video content; (3) extended Wiktionary from DNS Bhat methodology — using Ellara Kannada word-coining principles to generate etymological entries; (4) self-history clustering — clustering personal notes into an indexed searchable book; (5) research paper translation to Ellara Kannada; (6) team-of-rivals approach for vocabulary (multiple models debating the best native Kannada word for a concept); (7) newsfeed auto-summarization in Kannada.
Mechanistic Interpretability
Grigory Sapunov: Anthropic reverse-engineered Claude 3.5 Haiku — 6D helical manifolds x.com — https://x.com/che_shr_cat/status/2023729615055782140?s=20
A tweet by Grigory Sapunov (an AI researcher) discussing research claiming to reverse-engineer the internal representation structure of Claude 3.5 Haiku — finding that features are organized in 6-dimensional helical manifolds. A fascinating mechanistic interpretability result. [→ machine-learning-ai]
ML & AI Channels
Machine Learning Street Talk (YouTube) https://www.youtube.com/channel/UCMLtBahI5DMrt0NPvDSoIRQ
Long-form interviews with leading ML researchers and practitioners: Tim Scarfe and Keith Duggar’s show covers LLM theory, reinforcement learning, mechanistic interpretability, and AI safety with unusual technical depth. One of the best English-language ML podcasts for practitioners who want more than surface-level coverage. [→ machine-learning-ai]
The Robot Brains Podcast (YouTube) https://www.youtube.com/channel/UCXNviQjBONXljxkJzNV-Xbw
Pieter Abbeel’s podcast interviewing AI and robotics researchers; covers embodied AI, reinforcement learning, robot learning, and the frontier of AI capabilities. Abbeel is a leading RL researcher at UC Berkeley and co-founder of Covariant. [→ machine-learning-ai; robotics]
Coding Tech (YouTube) https://www.youtube.com/channel/UCtxCXg-UvSnTKPOzLH4wJaQ
Aggregator of technology conference talks covering software engineering, ML/AI, cloud infrastructure, and programming languages. Good for keeping up with conference content without attending. [→ machine-learning-ai; infrastructure]
Claude Code Skills + Playwright MCP Can Automate ANYTHING (YouTube, Watch History)
Video documenting an advanced Claude Code + Playwright MCP workflow for browser automation — demonstrating how AI-assisted coding tools can compose with web automation to create sophisticated agentic pipelines. Directly relevant to the ettuge skill system and automation infrastructure. [→ machine-learning-ai; tools]