Document → Markdown Converter

PythonClaude VisionLaTeX2026 — Case study

Problem

Working with dense academic material — lecture slides, DOCX notes, PDFs, and textbooks with equations — created two separate bottlenecks. Reformatting documents for Obsidian was tedious and inconsistent. Feeding raw files into AI pipelines was expensive in tokens and produced noisy output, hurting response quality.

Approach

Built a Python pipeline that accepts any common document format (PPTX, DOCX, PDF) and outputs clean Markdown. Claude Vision handles the hard parts: describing images and diagrams in natural language, and converting mathematical notation to LaTeX. The LaTeX choice specifically minimises token overhead when the converted files are later fed into LLM contexts — structured maths compresses far better than rasterised images or approximate alt-text.

Outcome

Study notes are now generated automatically from source materials and live directly in Obsidian. Documents routed through the converter use significantly fewer tokens in downstream AI pipelines, reducing cost and improving context quality.