Hotfix for tiktoken removal #231
Conversation
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
PR Code Suggestions ✨
Explore these optional code suggestions:
This already broke once; we should add extensive test cases using various code samples to ensure we don't get more regressions.
…tiktoken`) Here is an optimized version of your code. The bottleneck is minimal, since the computation is a single multiplication and a cast to int, which is already fast. A very minor optimization is still possible: use integer division directly and skip the `int()` call. The `from __future__ import annotations` import (available since Python 3.7) can also be dropped if nothing in the module relies on postponed annotation evaluation. The version below avoids the floating-point multiplication and conversion overhead and gives the same result as `int(len(s)*0.25)` for non-negative integer `len(s)`.
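A minimal sketch of the suggested rewrite (the function name and docstring are illustrative, not necessarily what `codeflash/code_utils/code_utils.py` actually uses):

```python
def token_count_estimate(s: str) -> int:
    """Rough token count at roughly 4 characters per token (illustrative helper)."""
    # Integer division replaces int(len(s) * 0.25): no float multiply, no int() cast,
    # and the same value for any non-negative length.
    return len(s) // 4
```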
⚡️ Codeflash found optimizations for this PR 📄 70% (0.70x) speedup for
…tiktoken`) Here is an optimized version of your code. The multiplication and the conversion to int are already fast; the remaining overhead comes from the floating-point operations in `len(s)*0.3`. Multiplying by 3 and integer-dividing by 10 produces the same result while staying in integer arithmetic. The version below avoids the floating-point multiplication and the `int()` cast and is slightly faster; all comments and signatures are preserved.
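A minimal sketch of this variant (again with an illustrative name; agreement with `int(len(s)*0.3)` holds for realistic string lengths, though it depends on how the float product rounds):

```python
def token_count_estimate(s: str) -> int:
    """Rough token count at roughly 0.3 tokens per character (illustrative helper)."""
    # Multiply by 3 and integer-divide by 10 instead of computing int(len(s) * 0.3),
    # keeping the whole estimate in integer arithmetic.
    return len(s) * 3 // 10
```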
⚡️ Codeflash found optimizations for this PR 📄 39% (0.39x) speedup for
User description
We increase the compression ratio from 0.5 to 0.3
PR Type
Enhancement
Description
Changes walkthrough 📝
code_utils.py (codeflash/code_utils/code_utils.py): Update token length estimation factor
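A sketch of the change the walkthrough describes, assuming the estimator in `codeflash/code_utils/code_utils.py` looks roughly like this (the function name is illustrative):

```python
def encoded_tokens_len(s: str) -> int:
    """Character-based token estimate used after the tiktoken removal (illustrative name)."""
    # Estimation factor lowered from 0.5 to 0.3, per the PR description.
    return int(len(s) * 0.3)
```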