makeqstrdata: use an extremely accurate dictionary heuristic #4978
Try to accurately measure the costs of including a word in the dictionary vs the gains from using it in messages. This saves about 160 bytes on trinket_m0 ja, the fullest translation for that board. Other translations on the same board all have savings, ranging from 24 to 228 bytes.

```
Translation      Before  After  Savings
ja                 1164   1324      160
de_DE              1260   1396      136
fr                 1424   1652      228
zh_Latn_pinyin     1448   1520       72
pt_BR              1584   1736      152
pl                 1592   1640       48
es                 1724   1816       92
ko                 1724   1816       92
fil                1764   1800       36
it_IT              1896   2040      144
nl                 1956   2136      180
ID                 2072   2180      108
cs                 2124   2148       24
sv                 2340   2448      108
en_x_pirate        2644   2740       96
en_GB              2652   2752      100
el                 2656   2768      112
en_US              2656   2768      112
hi                 2656   2768      112
```
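As a rough illustration of weighing dictionary cost against message gains, here is a minimal sketch. It is not the code from this PR: the function name `net_savings_bits`, its parameters, and the assumption that a dictionary entry costs a fixed number of bits per character are all hypothetical.

```python
def net_savings_bits(word, occurrences, bits_per_char, code_bits, dict_bits_per_char=8):
    """Estimate the net gain, in bits, of adding `word` to the dictionary.

    Hypothetical model, not the heuristic from makeqstrdata:
      bits_per_char      -- dict mapping each character to its inline encoded size
      code_bits          -- size of the code that would replace the word in a message
      dict_bits_per_char -- assumed storage cost per character of a dictionary entry
    """
    # Size of the word when spelled out character by character in a message.
    inline_bits = sum(bits_per_char[ch] for ch in word)
    # Gain: every occurrence shrinks from inline_bits to code_bits.
    gain = occurrences * (inline_bits - code_bits)
    # Cost: the word has to be stored once in the dictionary.
    cost = len(word) * dict_bits_per_char
    return gain - cost
```

Under this model a candidate is only worth adding when the result is positive; a word that is short or rare can come out negative even though each individual use saves bits.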
I was puzzled by why the dictionary words were sorted by length. It was because TextSplitter sorted its parameter, instead of a copy. This doesn't affect encoding size, but does affect the encoding NUMBER of the found words. We'll deliberately restore sorting by length next, for other reasons, but not by spooky action.
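The side effect described here can be reproduced with a small sketch. The two classes below are hypothetical stand-ins, not the actual `TextSplitter`; they only contrast sorting the caller's list in place with sorting a copy.

```python
class SplitterSortsInPlace:
    """Hypothetical stand-in: sorts the list it was handed."""

    def __init__(self, words):
        words.sort(key=len, reverse=True)  # mutates the caller's list
        self.words = words


class SplitterSortsCopy:
    """Hypothetical stand-in: sorts a private copy instead."""

    def __init__(self, words):
        self.words = sorted(words, key=len, reverse=True)  # caller's list untouched


words = ["the", "import", "pin"]

SplitterSortsCopy(words)
order_after_copy = list(words)      # still in the caller's original order

SplitterSortsInPlace(words)
order_after_in_place = list(words)  # now reordered, longest word first
```

Any code that later numbers the words by their position in `words` gets different numbers depending on which variant ran first, even though the set of words is unchanged.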
The above stats apply to the FIRST commit only. The second commit may affect things a little bit, but only for the better. The problem with potentially accepting "negative valued" dictionary words only came into play with another change I'm testing locally which allowed for more dictionary terms.
tyomitch left a comment
The improvement in the heuristic's accuracy is excellent, I just wish the comment describing the new heuristic were more accurate.
Thanks to tyomitch for suggesting the comment could be more accurate.

Try to accurately measure the costs of including a word in the dictionary vs the gains from using it in messages.
This saves about 160 bytes on trinket_m0 ja, the fullest translation for that board both before and after this change. Other translations on the same board all show savings, ranging from 24 to 228 bytes. The list is sorted with the fullest translations (before the change) at the top. The numbers shown are the "bytes free in flash firmware" reported during a local build.
Deltas for adafruit_proxlight_trinkey_m0 in CI: