
Conversation

@aseembits93 (Contributor) commented May 20, 2025

User description

We lower the token-length estimation factor from 0.5 to 0.3 (i.e., we assume stronger compression).


PR Type

Enhancement


Description

  • Adjust the token-length estimation factor to 30%

Changes walkthrough 📝

Relevant files

Enhancement
  • codeflash/code_utils/code_utils.py (+1/-1): update the token length estimation factor; change the encoded_tokens_len factor to 30%
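
For context, the change amounts to a one-line edit (a hedged sketch; the previous 0.5 factor is inferred from the user description above):

    # codeflash/code_utils/code_utils.py, line 16 (per the review comments below)
    def encoded_tokens_len(s: str) -> int:
        # was: return int(len(s) * 0.5)
        return int(len(s) * 0.3)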

    @aseembits93 requested a review from KRRT7 on May 20, 2025 18:03
    @github-actions bot commented

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Rounding behavior

    The new implementation uses float multiplication and int conversion, which may round down to 0 for short strings. Consider edge cases and possibly using ceil or enforcing a minimum token count.

    return int(len(s)*0.3)
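
    As a quick illustration of the truncation edge case (a hedged sketch, not code from the PR; estimate_safe is an illustrative name):

        import math

        def estimate(s: str) -> int:          # current behavior under review
            return int(len(s) * 0.3)

        assert estimate("abc") == 0           # int(3 * 0.3) == int(0.899...) == 0

        # One possible mitigation: round up and enforce a floor of 1 for non-empty input.
        def estimate_safe(s: str) -> int:
            return max(1, math.ceil(len(s) * 0.3)) if s else 0

        assert estimate_safe("abc") == 1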
    Missing tests

    There are no unit tests covering the updated compression ratio logic. Add tests for typical strings and edge cases to ensure correct behavior.

    return int(len(s)*0.3)

    @github-actions bot commented

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Suggestion 1 (Category: General): Round up token estimate

    Avoid undercounting tokens by rounding up the estimate; use integer math to apply a
    ceiling instead of truncating.

    codeflash/code_utils/code_utils.py [16]

    -return int(len(s)*0.3)
    +return (len(s) * 3 + 9) // 10
    Suggestion importance[1-10]: 5


    Why: The proposed ceiling calculation correctly avoids undercounting tokens by using integer math for a precise rounding up, enhancing estimate accuracy.

    Impact: Low
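
    To check the ceiling arithmetic concretely (a hedged sketch; ceil_estimate is an illustrative name, not part of the codebase):

        import math

        def ceil_estimate(n: int) -> int:
            # (n * 3 + 9) // 10 == ceil(n * 3 / 10) for non-negative n
            return (n * 3 + 9) // 10

        assert all(ceil_estimate(n) == math.ceil(n * 3 / 10) for n in range(1000))
        print(ceil_estimate(5))  # 2, whereas int(5 * 0.3) truncates to 1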
    Suggestion 2 (Category: General): Use named constant for ratio

    Extract the magic multiplier into a named constant for clarity and easier future
    adjustments.

    codeflash/code_utils/code_utils.py [16]

    -return int(len(s)*0.3)
    +ENCODE_RATIO = 0.3
    +return int(len(s) * ENCODE_RATIO)
    Suggestion importance[1-10]: 4


    Why: Introducing ENCODE_RATIO makes the multiplier more descriptive and simplifies future updates, though it’s a minor readability improvement.

    Impact: Low

    @KRRT7 (Contributor) left a comment


    This already broke once; we should add extensive test cases using various samples of code to ensure we don't get more regressions.
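
    For illustration, regression tests along these lines could pin the estimator's behavior (a hedged pytest sketch; the sample strings and expected values are assumptions derived from the 0.3 factor):

        import pytest

        from codeflash.code_utils.code_utils import encoded_tokens_len

        @pytest.mark.parametrize(
            ("text", "expected"),
            [
                ("", 0),           # empty input
                ("abc", 0),        # short strings currently truncate to zero
                ("a" * 10, 3),     # int(10 * 0.3) == 3
                ("a" * 100, 30),   # int(100 * 0.3) == 30
            ],
        )
        def test_encoded_tokens_len(text: str, expected: int) -> None:
            assert encoded_tokens_len(text) == expected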

    @misrasaurabh1 misrasaurabh1 merged commit 6d2d249 into main May 20, 2025
    15 checks passed
    codeflash-ai bot added a commit that referenced this pull request May 21, 2025
    …tiktoken`)
    
    Here is an optimized version of your code. The bottleneck is minimal, as the computation is a single multiplication and a cast to int, which is already fast. However, a very minor optimization is possible: avoid the `int()` call by using integer division directly.
    
    You can also remove the `from __future__ import annotations` line if the module does not rely on postponed annotation evaluation (note that, despite being available since Python 3.7, it is not the default in any released Python version).
    
    Here is an optimized version.
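
    (The commit's code block did not survive extraction; from the surrounding description it was presumably something like:)

        def encoded_tokens_len(s: str) -> int:
            return len(s) // 4  # integer division; same result as int(len(s) * 0.25)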
    This avoids floating point multiplication and conversion overhead, and gives the same result as `int(len(s)*0.25)` for non-negative integer `len(s)`.
    codeflash-ai bot commented May 21, 2025

    ⚡️ Codeflash found optimizations for this PR

    📄 70% (0.70x) speedup for encoded_tokens_len in codeflash/code_utils/code_utils.py

    ⏱️ Runtime: 40.1 microseconds → 23.6 microseconds (best of 237 runs)

    I created a new dependent PR with the suggested changes. Please review:

    If you approve, it will be merged into this PR (branch remove-tiktoken).

    codeflash-ai bot added a commit that referenced this pull request May 21, 2025
    …tiktoken`)
    
    Here is an optimized version of your code. The multiplication and conversion to int are already very fast; to trim the remaining overhead, we can replace the float operations in `len(s)*0.3` with integer arithmetic: multiplying by 0.3 is equivalent to multiplying by 3 and integer-dividing by 10.
    
    Here's the optimized code.
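
    (The code block is again missing; from the description, presumably:)

        def encoded_tokens_len(s: str) -> int:
            return len(s) * 3 // 10  # integer math in place of int(len(s) * 0.3)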
    This avoids floating point multiplication and `int()` casting, and is slightly faster.  
    All comments and signatures are preserved.
    codeflash-ai bot commented May 21, 2025

    ⚡️ Codeflash found optimizations for this PR

    📄 39% (0.39x) speedup for encoded_tokens_len in codeflash/code_utils/code_utils.py

    ⏱️ Runtime: 42.8 microseconds → 30.8 microseconds (best of 277 runs)

    I created a new dependent PR with the suggested changes. Please review:

    If you approve, it will be merged into this PR (branch remove-tiktoken).

    @aseembits93 aseembits93 deleted the remove-tiktoken branch May 28, 2025 02:27
