Add paragraph_threshold to the paragraph_tokenize function
#806

Adding the `paragraph_threshold` argument. Following the original paper, 'Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation', the paragraph threshold can now be adjusted through the `paragraph_threshold` argument. This threshold corresponds to the alpha value described in the paper's method section. By default, the paragraph threshold is set to 0.5.

Here is a usage:

- `paragraph_threshold=0.5`: the default
- `paragraph_threshold=0.8`: more conservative segmentation (fewer paragraph breaks)
- `paragraph_threshold=0.05`: less conservative segmentation (more paragraph breaks)
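
To illustrate how the threshold affects the result, here is a minimal sketch, not the implementation in this PR. It assumes the model yields one paragraph-break probability per sentence; the helper `split_paragraphs` and the probabilities in `break_probs` are hypothetical and made up for the example.

```python
from typing import List, Sequence


def split_paragraphs(
    sentences: Sequence[str],
    break_probs: Sequence[float],
    paragraph_threshold: float = 0.5,
) -> List[List[str]]:
    """Group sentences into paragraphs (hypothetical helper).

    A paragraph ends after a sentence whose break probability is
    strictly greater than paragraph_threshold.
    """
    if len(sentences) != len(break_probs):
        raise ValueError("sentences and break_probs must have the same length")

    paragraphs: List[List[str]] = []
    current: List[str] = []
    for sentence, prob in zip(sentences, break_probs):
        current.append(sentence)
        if prob > paragraph_threshold:
            # Confident enough that a paragraph ends here.
            paragraphs.append(current)
            current = []
    if current:
        # Keep any trailing sentences as the last paragraph.
        paragraphs.append(current)
    return paragraphs


# Made-up data: four sentences with assumed break probabilities.
sentences = ["A.", "B.", "C.", "D."]
break_probs = [0.10, 0.60, 0.90, 0.30]

for threshold in (0.5, 0.8, 0.05):
    result = split_paragraphs(sentences, break_probs, threshold)
    print(threshold, len(result), result)
```

With these assumed probabilities, raising the threshold to 0.8 merges more sentences into each paragraph, while lowering it to 0.05 splits after every sentence.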