Split on sentence and other boundaries #58
@lmullen that's right.
R/modifiers.r
Maybe it would be better to make the default value of skip_word_none NA, and then do:

```r
if (identical(skip_word_none, NA)) {
  skip_word_none <- type == "word"
}
```

This would also need some doc updates.
When will this be merged?
@lmullen do you want to finish this off? It also needs a bullet point in NEWS.
This bug causes warnings if options(warnPartialMatchArgs=TRUE).
…e to use a formula and hyphens to break long lines in the source code.
…h no placeholders.
@hadley Sorry, I messed up squashing the pull request. Mind if I resubmit this as a new, clean PR?
Yeah, sure.
This pull request fixes a problem with splitting on boundaries other than words. Currently, splitting on sentence boundaries returns a list containing an empty character vector.

The problem is that boundary() sets skip_word_none = TRUE by default. But if stringi::stri_split_boundaries() is called with any boundary type other than word and skip_word_none is TRUE, it returns an empty character vector. For non-word boundaries, this fix sets skip_word_none to FALSE unless the user has deliberately chosen otherwise.

The PR adds tests for sentence splitting.
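A minimal sketch of how the NA default suggested in review resolves to a concrete value. `resolve_skip_word_none()` is a hypothetical helper written for illustration, not stringr's actual code, and the set of boundary types is assumed to match boundary()'s `type` argument. It shows the intended rule: skip "none" tokens only for word boundaries, and never override an explicit TRUE or FALSE from the user.

```r
# Hypothetical helper (not stringr's real implementation): turns the NA
# default for skip_word_none into TRUE/FALSE based on the boundary type.
resolve_skip_word_none <- function(type = c("character", "line_break", "sentence", "word"),
                                   skip_word_none = NA) {
  type <- match.arg(type)
  if (identical(skip_word_none, NA)) {
    # No explicit choice from the user: skip "none" tokens only for words,
    # since other boundary types would otherwise produce empty results.
    skip_word_none <- type == "word"
  }
  # An explicit TRUE/FALSE passed by the user is returned unchanged.
  skip_word_none
}

resolve_skip_word_none("word")            # TRUE
resolve_skip_word_none("sentence")        # FALSE
resolve_skip_word_none("sentence", TRUE)  # TRUE (user's choice is kept)
```

Using NA as the sentinel keeps a single default in the signature while still distinguishing "not specified" from an explicit FALSE.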