Add alpha value in wtpsplit #807

@pavaris-pm

Description

According to the original paper, 'Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation', wtpsplit lets us adjust the paragraph threshold through the paragraph_threshold argument. This threshold corresponds to the alpha value described in the paper's method section. I have already opened PR #806, which includes usage and example output for this alpha value, so that we can specify the paragraph threshold for wtpsplit when needed. What do you think?
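To make the effect of the threshold concrete, here is a minimal sketch of the idea. It does not call wtpsplit; the function name `group_into_paragraphs` and the break probabilities are hypothetical, and I am assuming that a paragraph break is placed wherever the predicted probability exceeds the threshold.

```python
from typing import List


def group_into_paragraphs(
    sentences: List[str],
    break_probs: List[float],
    paragraph_threshold: float = 0.5,
) -> List[List[str]]:
    """Group sentences into paragraphs (illustrative helper, not wtpsplit API).

    break_probs[i] is a made-up probability that a paragraph break
    follows sentences[i]. A break is placed when it exceeds the threshold.
    """
    if len(sentences) != len(break_probs):
        raise ValueError("sentences and break_probs must have the same length")

    paragraphs: List[List[str]] = []
    current: List[str] = []
    for sentence, prob in zip(sentences, break_probs):
        current.append(sentence)
        if prob > paragraph_threshold:
            # Probability is above the threshold: close the current paragraph.
            paragraphs.append(current)
            current = []
    if current:
        # Keep any trailing sentences that were not followed by a break.
        paragraphs.append(current)
    return paragraphs


# Hypothetical inputs, for illustration only.
sentences = ["A.", "B.", "C.", "D."]
break_probs = [0.2, 0.7, 0.4, 0.9]

low = group_into_paragraphs(sentences, break_probs, paragraph_threshold=0.3)
high = group_into_paragraphs(sentences, break_probs, paragraph_threshold=0.8)
```

Under this assumption, a lower threshold yields more, shorter paragraphs (`low` has three), while a higher threshold merges them (`high` has one).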

Labels: enhancement (enhance functionalities)
