Skip to content

Conversation

raphw
Copy link
Contributor

@raphw raphw commented Sep 13, 2023

Adds an abstraction for reporting metrics, reports and logs. This makes the majority of agent-core uses obsolete within plugins, with few exceptions.

@github-actions github-actions bot added agent-java community Issues and PRs created by the community triage labels Sep 13, 2023
@github-actions
Copy link

👋 @raphw Thanks a lot for your contribution!

It may take some time before we review a PR, so even if you don’t see activity for some time, it does not mean that we have forgotten about it.

Every once in a while we go through a process of prioritization, after which we are focussing on the tasks that were planned for the upcoming milestone. The prioritization status is typically reflected through the PR labels. It could be pending triage, a candidate for a future milestone, or have a target milestone set to it.

@raphw raphw changed the title Metric api Introduce API for metric and log reporting within tracer-api. Sep 13, 2023
@raphw raphw force-pushed the metric-api branch 2 times, most recently from 8366c75 to 425d897 Compare September 27, 2023 21:55
…n already be read by constructor injection. Add warning to use the global tracer from listeners, and make init method unavailable for regular plugins.
import co.elastic.apm.agent.impl.ElasticApmTracer;
import co.elastic.apm.agent.tracer.LifecycleListener;

public interface InitLifecycleListener extends LifecycleListener {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion it is not a good idea to have an inheritance hierarchy for LifecycleListener, because this gets confusing when the instantiation happens via ServiceLoader.

I assume you did this separation so that LifecycleListener implementors don't necessarily have to override init(tracer), correct? I think it would be better to make LifecycleListener an abstract class with empty default implementations for the methods.

In cases were this causes trouble due to multiple-inheritance those use-cases should fallback to using composition instead of inheritance for the second superclass.

Copy link
Contributor Author

@raphw raphw Oct 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this separation as there is only few uses of this method and it is all internal. Beyond that I could not imagine a scenario where a plugin would need the init method, that's why I wanted to avoid exposing it. I kept them in the same hierarchy as those few uses also require other lifecycle methods and might keep state. Without this, I would have separated the interfaces but the service loader would instantiate a new class.


import javax.annotation.Nullable;

public interface DataWriter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaking this class is only used by the OpenTelemetry and Micrometer plugins to report metrics.

In my opinion this abstraction is on the wrong level: Users of this API have to effectively build correct JSON for the elastic specific IntakeV2 protocol.

I think the abstraction should rather be something like a MetricSetBuilder where:

  • A metricset is a group of metrics which have the same labels
  • The builder allows adding double and long scalar metric values
  • The builder allows adding histogram metric values
  • The builder terminates with a report() operation causing the data to be sent

Maybe it would be even better to allow providing Labels with each metric value, abstracting away the grouping into metricsets based on labels from API users.

With #3016 we wanted to remove the duplicate metricset serialization logic, which would make introducing this kind of abstraction a lot simpler. Unfortunately we haven't been able to prioritize this yet and likely won't be able to do so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about a even further refactoring, but I did not want to derive too much from the original implementation in a first attempt. I would prefer to see such a refactoring to be incremental.

Copy link
Contributor

@JonasKunz JonasKunz Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this needs a lot more refactoring work to arrive at what I suggested with the MetricSetBuilder thingy.

The problem I see with the DataWrite is that it introduces an abstraction which actually isn't one. Even with later refactoring, that intermediate abstraction doesn't bring any reasonable benefit.

So instead of first getting rid of the core-dependency for the OpenTelemetry and Micrometer instrumentations and then cleaning it up, I think the order should be reversed here to not worsen the code for a potential future refactor, which might be long in the future.

So the order should rather be to first refactor those plugins to allow the introduction of a good abstraction afterwards. I know this means that we'll have to stay with those two plugins depending on core for now, as we won't be doing that biggish refactor soon, but for us this is the preferred solution.


ScheduledExecutorService getSharedSingleThreadedPool();

void addShutdownHook(Closeable job);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make more sense to have shutdown hooks as part of LifecycleListeners?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it myself but the hook is registered programmatically and the instance state cannot be transferred that easily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. The problem is that LifecycleListeners are instantiated as singletons, and the singletons are not accessible from instrumentations without piping them through an ugly static variable.

In my opinion this should rater be solved by making ElasticApmTracer.getLifecycleListener part of the tracer-api.
This way instrumentations also get access to the start() method, which could be called after stop() if the tracer is resumed.

I had a look at the only usage of addShutdownHook, which currently is the micrometer instrumentation. This instrumentation would then register a custom LifecycleListener, which in turn allows the MicrometerMetricsReporters to register themselves to be flushed on shutdown.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, after an internal discussion we decided to stay with just adding addShutdownHook to the Tracer. This should keep things simple for now, if we want to, we can refactor later.


DataWriter newWriter(int maxSerializedSize);

ScheduledExecutorService getSharedSingleThreadedPool();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this method to either the general Tracer interface or a separate interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a correlation between a plugin reporting and needing a custom lifecycle. Wouldn't it make sense to keep the two together for this reason?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I forgot to add an initial comment, with which my proposal here would make more sense:

I was thinking of splitting ReportingTracer into MetricsTracer and LogReportingTracer. Because both would still need the getSharedSingleThreadedPool() (I think?) I would move that to the parent interface.

SylvainJuge
SylvainJuge previously approved these changes Oct 17, 2023
Copy link
Member

@SylvainJuge SylvainJuge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we need to keep it as simple as possible, thus I think it's better if we/you:

  • while plugins do not need the ability to use agent init, I think it's not worth having a separate InitLifecycleListener for that, there is only a single implementation that does not inherit from the abstract impl.
  • remove the extra ReportingTracer and ServiceAwareTracer as their only purpose it to group related features.

@raphw
Copy link
Contributor Author

raphw commented Oct 17, 2023

Certainly, I can remove the extra interface, this can always be refactored later. I did not want to expose it also because a lot of the global state is not initialized at this point, but it also adds complexity.

As for the second suggestion: my hope was that alternative users could not implement this feature and disable some code being executed, such as the servlet instrumentation, if that information is not needed.

@JonasKunz
Copy link
Contributor

  • remove the extra ReportingTracer and ServiceAwareTracer as their only purpose it to group related features.

As for the second suggestion: my hope was that alternative users could not implement this feature and disable some code being executed, such as the servlet instrumentation, if that information is not needed.

For me personally splitting up the tracer functionality into multiple interfaces is fine. It allows to easily distinguish different "classes" of instrumentation plugins based on what tracer-variant they use and makes the interfaces less "overwhelming"

@raphw
Copy link
Contributor Author

raphw commented Oct 18, 2023

I could add a super interrface LifecycleTracer or just add it to the basic Tracer. It's only log reporting and metric tracers that need this API, but I thought about splitting them up, too.

@JonasKunz
Copy link
Contributor

We've had an internal discussion on these changes and decided on the following:

  • We strongly prefer having init(Tracer) in the LifecycleListener, even if the Tracer would be only partially initialized at that point for plugins. That is fine and preferable to having multiple interfaces in our opinion.

  • We think that splitting up the Tracer interface into specialized sub-interfaces introduces more complexity than it brings benefits. For the rare cases that someone else want's to define a tracer, they should rather implement unsupported operations as no-op. This means we prefer to push that complexity from plugins having to tracer.require what they need to the implementors of the Tracer interface. As a result, we should (in a different PR) also remove the ServiceAwareTracer introduced in Moves ServiceInfo to tracer module and reduce dependency exposure to core module of plugins. #3272 , sorry for having to row back here.

  • For the DataWriter and subsequently the OpenTelemetry and Micrometer plugins, see this comment

@raphw
Copy link
Contributor Author

raphw commented Nov 20, 2023

I removed the init-lifecycle listener and moved the shutdown-hook method.

As for the interfaces: the idea was that custom extensions could piggy back on the existing mechanism. So that one could hook into the current tracer API but add my doCustomLogic method to a subinterface that only makes sense for my custom logic. In a way, it is meant as an extension mechanism for plugins beyond the current project. Or did you want to keep the require and probe methods?

As for the writer: the advantage of the added abstraction is that it is now easy to add as a no-op. That's rather impossible before as there is no interface and the constructor will allocate on any occasion. The API use is also simplified as the stringBuilder that is used as a buffer is moved to the writer and away from the plugins. I would also argue that this first refactoring makes all subsequent refactoring more easy for this reason. Are you sure that I should undo this one?

@raphw
Copy link
Contributor Author

raphw commented Dec 13, 2023

Short reminder with regards to my questions, thanks!

@SylvainJuge
Copy link
Member

Short reminder with regards to my questions, thanks!

Thanks for the reminder, we've been quite busy but I hope to we'll back to it soon.

@JonasKunz
Copy link
Contributor

Hey @raphw , sorry for the very late response.

As for the writer: the advantage of the added abstraction is that it is now easy to add as a no-op. That's rather impossible before as there is no interface and the constructor will allocate on any occasion. The API use is also simplified as the stringBuilder that is used as a buffer is moved to the writer and away from the plugins. I would also argue that this first refactoring makes all subsequent refactoring more easy for this reason. Are you sure that I should undo this one?

The alternatives I see are the following:

  1. Keep the metrics related plugins dependent on apm-core, do not add metric-specific functionality to the Tracer
  2. Your approach: Add the metric specific abstraction on JSON-level to theTracer to make micrometer + opentelemetry metrics plugins independent from core. However, actual third party Tracer would have a hard way of providing a meaningful implementation, because the abstraction basically is "write elastic IntakeV2 metricset JSON to an interface".
  3. Do a refactor and create an actual abstraction for metrics to the Tracer on the correct level (=Tracer accepts metric values instead of JSON data)

We ruled approach 3 out for now, because it simply is too much effort.
Tbh I don't see the benefit of approach 2. over 1. : I can't see how that abstraction helps us in making code easier to understand or third-party Tracers in making a Tracer easier to implement: I can't see how anyone would do a non-Noop implementation of the metric related functionality here. That's why I'm thinking that the micrometer + opentelemetry metrics plugin simply are not suited for being reuused by others at the moment and should stay dependent on apm-core.

As for the interfaces: the idea was that custom extensions could piggy back on the existing mechanism. So that one could hook into the current tracer API but add my doCustomLogic method to a subinterface that only makes sense for my custom logic. In a way, it is meant as an extension mechanism for plugins beyond the current project. Or did you want to keep the require and probe methods?

If you see a great benefit of require and probe for third-party tracer implementations, I'm fine with keeping them. We would also use them for plugins which are not suited to be reusable by others at the moment to get an ElasticApmTracer (e.g. micrometer, Otel metrics). We were just thinking that we put all methods we use in any of our non apm-core dependendent plugins into the single Tracer interface instead of spreading them across sub-interfaces to keep things simple for us.

cc: @jackshirazi @SylvainJuge please give feedback whether you have a different opinion here

@raphw
Copy link
Contributor Author

raphw commented Jan 19, 2024

As for (1) - I would try to get the entire JSON writer into a custom module. This mainly to shield the plugin from other dependencies. I will see how this goes and fallback to not breaking out the core dependency.

As for the other bit: I think I will then leave it as the unwrapping is useful for getting "internal access".

I'll wait for concluding feedback and set into motion thereafter. @jackshirazi @SylvainJuge @JonasKunz

@raphw raphw closed this Feb 5, 2024
@raphw
Copy link
Contributor Author

raphw commented Feb 5, 2024

I closed this PR in favor of separate PRs, one for logs (#3515), one for FAAS (#3516) the one for metrics capture (TBD).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-java community Issues and PRs created by the community triage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants