[Suggestion] Add consonant-remover method

## Detailed description
I suggest adding a dictionary-based consonant-remover method that collapses a repeated trailing consonant.
For example: เริศศศศศศศศศศศศศศ -> เริศ
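For comparison, a dictionary-free version of this transformation fits in a single regular expression. This is only a minimal sketch. The helper name `collapse_trailing_consonant` is hypothetical, and the sketch assumes that Thai consonants are the code points U+0E01 (ก) to U+0E2E (ฮ). Because it collapses every trailing run, it would also shorten any dictionary word that legitimately ends in a repeated consonant. That is why the proposal checks against a dictionary.

```python
import re

# Assumption: Thai consonants are U+0E01 (ก) .. U+0E2E (ฮ).
# Matches one consonant followed by one or more copies of itself at the end.
_TRAILING_REPEAT = re.compile(r"([\u0e01-\u0e2e])\1+$")


def collapse_trailing_consonant(text):
    """Hypothetical helper: reduce a trailing run of one Thai consonant to a single copy."""
    return _TRAILING_REPEAT.sub(r"\1", text)


print(collapse_trailing_consonant("เริศศศศศศศศศศศศศศ"))  # เริศ
```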

## Context
I am doing text mining on Pantip, and quite a few people write things like "เริศศศศศศศศศศศศศศ" to express their emotions. The current `pythainlp.util.normalize()` only removes duplicated vowels, so there is no method to handle this yet. Current tokenizers may split this into "เริศ / ศศศศศศศศศศศศศ", which adds noise to the analysis.
Also, the implementation turned out to be somewhat long, so I would like this method to be available in the PyThaiNLP library.
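If the tokenizer has already split the word, one possible workaround is to drop tokens that only repeat the last character of the preceding token. This is a hypothetical sketch, not a PyThaiNLP function: `drop_repeat_tokens` is an illustrative name, and the input is assumed to be a plain list of token strings. The example tokens are written by hand, not taken from real tokenizer output.

```python
def drop_repeat_tokens(tokens):
    """Hypothetical helper: remove tokens made up solely of the previous
    kept token's final character, e.g. ["เริศ", "ศศศ"] -> ["เริศ"]."""
    kept = []
    for token in tokens:
        # Noise token: non-empty and every character equals the last
        # character of the previous kept token
        if kept and kept[-1] and token and set(token) == {kept[-1][-1]}:
            continue
        kept.append(token)
    return kept


print(drop_repeat_tokens(["เริศ", "ศศศศศศศศศศศศศ"]))  # ['เริศ']
```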

## Possible implementation
My implementation is below. It was originally part of a loop over `sentences`, so here it is rewritten as a standalone function that returns the normalized sentence.
```python
from pythainlp.util import isthaichar


def remove_trailing_repeats(sentence, dictio):
    """Collapse a repeated trailing Thai character, e.g. เริศศศศศศศศศศศศศศ -> เริศ.

    `dictio` is the word dictionary (an iterable of str). It may be extended
    at runtime, so the repeater words are collected on every call.
    """
    # Duplication typically appears at the end of the sentence
    if len(sentence) <= 2 or not isthaichar(sentence[-1]) or sentence[-1] != sentence[-2]:
        return sentence

    dup = sentence[-1]

    # Dictionary words that legitimately end with two or more `dup`
    # characters, excluding words made up entirely of `dup`
    repeaters = [
        word for word in dictio
        if len(word) > 2 and word.endswith(dup * 2) and set(word) != {dup}
    ]

    # Strip the whole trailing run of `dup`
    sentence_head = sentence.rstrip(dup)

    # If the stripped sentence ends with a repeater's stem, restore the
    # number of trailing `dup` characters that the dictionary word has
    for repeater in repeaters:
        rep_head = repeater.rstrip(dup)
        if sentence_head.endswith(rep_head):
            return sentence_head + dup * (len(repeater) - len(rep_head))

    # Otherwise keep a single `dup`
    return sentence_head + dup
```

If this plan looks good, I can open a PR.