Skip to content

chore: add tdigest commands #5110

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 13, 2025
Merged

chore: add tdigest commands #5110

merged 1 commit into from
May 13, 2025

Conversation

kostasrim
Copy link
Contributor

The following commands are implemented:

  • create
  • add
  • rank
  • revrank
  • byrank
  • byrevrank
  • reset
  • info
  • min
  • max
  • cdf

@kostasrim kostasrim self-assigned this May 12, 2025
}

auto cb = [key, compression](Transaction* tx, EngineShard* es) -> OpResult<bool> {
auto& db_slice = tx->GetDbSlice(es->shard_id());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the callback implementation from each command to Op(CommandName), as we do for other data types (e.g., json_family). I believe it's not a good approach to keep all the logic inside these static command functions, as it significantly reduces readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't even bother for now, this is sketch -- and we do have callback within functions in other places.

In general I agree, we should keep those functions as small as possible. I am happy to polish if we ever actually merge any of this functionality upstream.

if (!it) {
return it.status();
}
auto* wrapper = it->it->second.GetRobjWrapper();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding a GetTdHistogram method to CompactObj, similar to the existing GetJson method, since these two lines are repeated across multiple T-Digest methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I will add it in a follow up PR. not hat important for now

return cmd_cntx.rb->SendError(res.status());
}

double TdGetByRank(td_histogram_t* td, double total_obs, double rnk) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this to the anonymous namespace, along with the other methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above, happy to polish one we actually consider this work for merging. I will move all those to unamed namespaces after I get tests+functionility done

const double max = td_max(td);

ByRankResult result;
for (auto rnk : ranks) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do result.reserve(ranks.size())

rb->SendLong(res->mem_usage);
}

struct MinMax {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using MinMax = std::pair

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explicitly wanted a struct here toi avoid first and second

std::vector<size_t> ranks;
while (parser.HasNext()) {
double val = parser.Next<double>();
ranks.push_back(val);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ranks is vector of size_t, and val is double. Something is wrong here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the internal implementation converts to int. I will check all of these in detail once I add tests

const double max = td_max(td);

RankResult result;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do reserve

std::vector<size_t> values;
while (parser.HasNext()) {
double val = parser.Next<double>();
values.push_back(val);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

;
auto* td = (td_histogram_t*)wrapper->inner_obj();
Result result;
for (auto val : values) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reserve

@kostasrim kostasrim merged commit 224ef8f into prob_base May 13, 2025
2 checks passed
@kostasrim kostasrim deleted the kpr7 branch May 13, 2025 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants