Usage Guide
===========

epub2text provides three main commands: ``list``, ``extract``, and ``info``.

List Chapters
-------------

Display all chapters in an EPUB file::

    # Table format (default)
    epub2text list book.epub

    # Tree format (shows hierarchy)
    epub2text list book.epub --format tree

The table format shows chapter numbers, titles, character counts, and nesting levels.
The tree format displays the hierarchical structure of nested chapters.

Extract Text
------------

Basic Extraction
~~~~~~~~~~~~~~~~

Extract all chapters to stdout::

    epub2text extract book.epub

Extract to a file::

    epub2text extract book.epub -o output.txt

Chapter Selection
~~~~~~~~~~~~~~~~~

Extract specific chapters by number::

    # Single chapter
    epub2text extract book.epub -c 1

    # Multiple chapters
    epub2text extract book.epub -c 1,3,5

    # Chapter range
    epub2text extract book.epub -c 1-5

    # Complex range
    epub2text extract book.epub -c 1-5,7,9-12 -o selected.txt

Interactive chapter selection::

    epub2text extract book.epub --interactive

Output Formatting
~~~~~~~~~~~~~~~~~

**Paragraph Mode** (``-p, --paragraphs``): One line per paragraph::

    epub2text extract book.epub --paragraphs

**Sentence Mode** (``-s, --sentences``): One line per sentence (requires spaCy)::

    epub2text extract book.epub --sentences

**Max Line Length** (``-m, --max-length``): Split long lines at clause boundaries::

    epub2text extract book.epub --max-length 80

**Paragraph Separators**: By default, paragraphs are separated by two spaces at the
start of each new paragraph. You can customize this behavior::

    # Use empty lines between paragraphs
    epub2text extract book.epub --empty-lines

    # Custom separator (e.g., tab)
    epub2text extract book.epub --separator "\\t"

    # No separator
    epub2text extract book.epub --separator ""

Text Cleaning Options
~~~~~~~~~~~~~~~~~~~~~

By default, epub2text applies smart text cleaning. You can disable or customize it::

    # Disable all cleaning (raw output)
    epub2text extract book.epub --raw

    # Keep bracketed footnotes like [1]
    epub2text extract book.epub --keep-footnotes

    # Keep page numbers
    epub2text extract book.epub --keep-page-numbers

    # Hide chapter markers
    epub2text extract book.epub --no-markers

Output Control
~~~~~~~~~~~~~~

Control which lines are output::

    # Skip first 10 lines
    epub2text extract book.epub --offset 10

    # Limit to 100 lines
    epub2text extract book.epub --limit 100

    # Add line numbers
    epub2text extract book.epub --line-numbers

Language Model
~~~~~~~~~~~~~~

For sentence-level formatting, you can specify a different spaCy language model::

    # Use German language model
    epub2text extract book.epub --sentences --language-model de_core_news_sm

Show Metadata
-------------

Display EPUB metadata and statistics::

    # Panel format (default)
    epub2text info book.epub

    # Table format
    epub2text info book.epub --format table

    # JSON format (for scripting)
    epub2text info book.epub --format json

The ``info`` command displays:

- Title
- Authors
- Contributors
- Publisher
- Publication Year
- Identifier (ISBN, UUID, etc.)
- Language
- Rights (copyright)
- Coverage
- Description
- Chapter count
- Total character count

Chapter Markers
---------------

Extracted text includes chapter markers in the format::

    <>
    Chapter text content here...

    <>
    More content...

Use ``--no-markers`` to hide these markers.

Examples
--------

Extract a book for text-to-speech processing::

    # One sentence per line, suitable for TTS
    epub2text extract book.epub --sentences -o book.txt

Create a clean plain text version::

    # Paragraphs with empty lines, no markers
    epub2text extract book.epub --paragraphs --empty-lines --no-markers -o book.txt

Extract specific chapters with line length limit::

    # Chapters 1-5 with max 100 chars per line
    epub2text extract book.epub -c 1-5 --max-length 100 -o excerpt.txt

Get metadata as JSON for scripting::

    epub2text info book.epub --format json | jq '.title'
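
To make the ``-c`` selection syntax from the Chapter Selection section more concrete, the sketch below shows one way a spec such as ``1-5,7,9-12`` could be expanded into individual chapter numbers. It is an illustration only, not epub2text's actual implementation: the function name ``parse_chapter_spec`` is hypothetical, and the handling of duplicates and reversed ranges is an assumption.

```python
def parse_chapter_spec(spec: str) -> list[int]:
    """Expand a spec such as "1-5,7,9-12" into a sorted list of chapter numbers.

    Hypothetical helper for illustration; epub2text may parse specs differently.
    """
    chapters: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            # A range like "9-12" covers both endpoints.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            chapters.update(range(start, end + 1))
        else:
            chapters.add(int(part))
    # Assumption: duplicates collapse and the result is in ascending order.
    return sorted(chapters)


if __name__ == "__main__":
    print(parse_chapter_spec("1-5,7,9-12"))
    # → [1, 2, 3, 4, 5, 7, 9, 10, 11, 12]
```

Under these assumptions, ``-c 1-5,7,9-12`` selects ten chapters: 1 through 5, 7, and 9 through 12.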