Commit a8e36b44 authored by Eric Frias

More changes to preprocessing and chunking logic:

 - better (but more expensive) filtering of HTML
 - don't compress whitespace until after sentence detection
 - use SBD for sentence detection instead of spaCy; it's much smaller
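
The ordering in the second item can be illustrated with a minimal sketch. It assumes that line breaks serve as sentence-boundary cues, which is one plausible reason to delay whitespace compression; the commit message does not state the reason. The splitter below is a naive regex stand-in, not SBD or spaCy, and the names split_sentences, compress_whitespace, and preprocess are hypothetical, not taken from the changed files.

```python
import re

# Hypothetical stand-in for a real sentence detector: split after ., !, or ?
# followed by whitespace, or at a blank line (paragraph break).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+|\n\s*\n")
_WHITESPACE = re.compile(r"\s+")


def split_sentences(text):
    """Naive splitter; keeps the original whitespace inside each piece."""
    return [piece for piece in _BOUNDARY.split(text) if piece.strip()]


def compress_whitespace(text):
    """Collapse every run of whitespace to a single space and trim the ends."""
    return _WHITESPACE.sub(" ", text).strip()


def preprocess(text):
    """Detect sentences first, then compress whitespace in each sentence."""
    return [compress_whitespace(s) for s in split_sentences(text)]
```

With an input such as "Release notes\n\nFixed   the parser.  Added\ttests.", preprocess keeps the heading as its own piece because the blank line is still present when boundaries are detected. Compressing whitespace first would turn the blank line into a single space and merge the heading into the sentence that follows it.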
parent b87f9fa8