More changes to preprocessing and chunking logic:
- better (but more expensive) filtering of HTML
- don't compress whitespace until after sentence detection
- use SBD for sentence detection instead of spaCy, it's much smaller
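
A minimal sketch of why the whitespace ordering matters, not the code in this change: `split_sentences` is a hypothetical regex stand-in for the real sentence detector (it is not the SBD library's API), and `compress_whitespace` and `chunk_sentences` are illustrative names. It assumes that line breaks left behind by HTML filtering still carry boundary information, which collapsing whitespace too early would throw away.

```python
import re


def compress_whitespace(text: str) -> str:
    """Collapse every run of whitespace into a single space."""
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> list[str]:
    """Hypothetical stand-in for the sentence detector.

    Breaks on blank lines first, then after '.', '!' or '?' followed by
    whitespace. A real detector handles abbreviations, quotes, etc.
    """
    sentences = []
    for block in re.split(r"\n\s*\n", text):
        for piece in re.split(r"(?<=[.!?])\s+", block.strip()):
            if piece:
                sentences.append(piece)
    return sentences


def chunk_sentences(raw: str) -> list[str]:
    """Detect sentences on the raw text, then compress whitespace per sentence."""
    return [compress_whitespace(s) for s in split_sentences(raw)]


raw = "Release notes\n\nFilter   HTML first.\nThen split\tsentences."

# Ordering described above: sentence detection first, compression last.
late = chunk_sentences(raw)

# Opposite ordering: the blank line after the heading is gone before the
# splitter runs, so the heading is glued onto the first sentence.
early = split_sentences(compress_whitespace(raw))
```

With this input, `late` keeps the heading as its own unit, while `early` merges it into the sentence that follows.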