Changelog
All notable changes to epub2text will be documented here.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Added
Full Dublin Core metadata support: identifier, language, contributors, rights, coverage
New extract command formatting options:
    --paragraphs, -p: One line per paragraph
    --sentences, -s: One sentence per line (requires spaCy)
    --max-length, -m N: Split long lines at clause boundaries
    --separator TEXT: Custom paragraph separator (default: two spaces)
    --empty-lines: Use empty lines between paragraphs
    --offset N: Skip first N lines of output
    --limit N: Limit output to N lines
    --line-numbers, -n: Add line numbers to output
New info command --format option: panel (default), table, or json
Ellipsis handling: "..." and ". . ." are no longer treated as sentence boundaries
Comprehensive test suite for metadata and formatters
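
The ellipsis entry above can be illustrated with a small sketch. This is not the project's implementation (the --sentences option is listed as requiring spaCy); it is a regex-only approximation, and the names split_sentences, _ELLIPSIS, and _BOUNDARY are hypothetical. It protects the dots of an ellipsis before splitting, so neither "..." nor ". . ." can end a sentence.

```python
import re

# Placeholder character that stands in for a protected dot.
# Assumption: it never occurs in the input text.
_PLACEHOLDER = "\x00"

# Three or more dots, optionally separated by single spaces: "..." or ". . ."
_ELLIPSIS = re.compile(r"\.(?: ?\.){2,}")

# Candidate boundary: whitespace that follows ., ! or ?
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences without breaking at ellipses."""
    # Hide every dot that belongs to an ellipsis, keeping any inner spaces.
    protected = _ELLIPSIS.sub(
        lambda m: m.group(0).replace(".", _PLACEHOLDER), text
    )
    parts = _BOUNDARY.split(protected)
    # Restore the hidden dots and drop empty fragments.
    return [p.replace(_PLACEHOLDER, ".") for p in parts if p.strip()]


if __name__ == "__main__":
    for sentence in split_sentences("Wait... what? He paused . . . then left. Done."):
        print(sentence)
```

A naive split on sentence-ending punctuation would break after "Wait..." and after each dot of ". . ."; the sketch keeps both ellipses inside their sentences.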
Changed
Refactored extract command with cleaner, more intuitive options
Improved get_metadata() function with reduced complexity
Default paragraph separator changed from newlines to two-space prefix
Removed
Deprecated --format-style option (replaced by --paragraphs, --sentences)
Deprecated --no-clean option (replaced by --raw)
Deprecated --no-chapter-titles option (replaced by --no-markers)
Deprecated --no-empty-lines option (default behavior now uses separators)
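
For scripts that still pass the old options, the replacements listed above could be applied with a helper along these lines. It is only a sketch: migrate_args and the three tables are hypothetical and not part of epub2text. Dropping --no-empty-lines assumes the new default already matches the old behavior, and --format-style is only reported because the right replacement depends on the style value that was used.

```python
# One-to-one renames taken from the list above.
RENAMED = {
    "--no-clean": "--raw",
    "--no-chapter-titles": "--no-markers",
}

# Assumption: safe to drop, since separators are now the default behavior.
DROPPED = {"--no-empty-lines"}

# No single replacement: choose --paragraphs or --sentences by hand.
MANUAL = {"--format-style"}


def migrate_args(args: list[str]) -> tuple[list[str], list[str]]:
    """Return (updated arguments, arguments that need manual attention)."""
    updated: list[str] = []
    manual: list[str] = []
    for arg in args:
        # Compare only the flag name, so "--flag=value" forms are recognized.
        flag = arg.split("=", 1)[0]
        if flag in RENAMED:
            updated.append(RENAMED[flag])
        elif flag in DROPPED:
            continue
        else:
            if flag in MANUAL:
                manual.append(arg)
            updated.append(arg)
    return updated, manual


if __name__ == "__main__":
    new_args, todo = migrate_args(["--no-clean", "--no-empty-lines", "--format-style=x"])
    print(new_args)
    print(todo)
```

Arguments the helper does not recognize pass through unchanged, so it can be run over a full argument list.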
[0.1.0] - 2025-01-01
Added
Initial release
EPUB parsing with NAV HTML (EPUB3) and NCX (EPUB2) support
Chapter listing with table and tree formats
Text extraction with chapter selection
Smart text cleaning (footnotes, page numbers, whitespace)
Rich terminal UI with progress indicators
Python library API