epub2text Documentation ======================= A niche CLI tool to extract text from EPUB files with smart cleaning capabilities. .. toctree:: :maxdepth: 2 :caption: Contents: installation usage api changelog Features -------- - **Smart Navigation Parsing**: Supports both EPUB3 (NAV HTML) and EPUB2 (NCX) navigation formats - **Selective Extraction**: Extract specific chapters by range or interactive selection - **Flexible Output Formatting**: - One paragraph per line with customizable separators - One sentence per line using spaCy NLP - Automatic line splitting at clause boundaries for long lines - **Smart Text Cleaning**: - Remove bracketed footnotes (``[1]``, ``[42]``) - Remove page numbers (standalone, at line ends, with dashes) - Normalize whitespace and paragraph breaks - Preserve ordered lists with proper numbering - **Rich Interactive UI**: Beautiful terminal output with tables and tree views - **Pipe-Friendly**: Works as both CLI tool and Python library - **Nested Chapter Support**: Handles hierarchical chapter structures - **Full Dublin Core Metadata**: Extract all EPUB metadata fields Quick Start ----------- Install epub2text:: pip install epub2text Extract text from an EPUB file:: epub2text extract book.epub List chapters:: epub2text list book.epub Show metadata:: epub2text info book.epub Indices and tables ================== * :ref:`genindex` * :ref:`modindex` * :ref:`search`