[ML] Correct query times for model plot and forecast #327
lib/api/CForecastRunner.cc
Outdated
core_t::TTime bucketLength{model.s_ForecastModel->params().bucketLength()};
core_t::TTime startTime{model_t::sampleTime(
    feature, forecastJob.s_StartTime, bucketLength)};
core_t::TTime endTime{model_t::sampleTime(
Would it be possible to fix it in CAnomalyJob::doForecast instead? CForecastRunner is just a dumb worker and should not have any important logic. CAnomalyJob::doForecast calls into the runner and sets startTime to m_LastResultsTime. It seems to me that adjusting it there does the same thing but is a bit cleaner. endTime is just relative to startTime anyway.
Maybe the same can be done for model plots.
The problem is this is feature specific. So it is tricky to push it higher up if the forecast is being run over a job with multiple detectors with different features.
I could create a wrapper which implements the logic in the model library. I can't directly push the feature into the forecast function (because it is in the maths library, which can't depend on EFeature). I could supply a callback to compute the offset start and end times and have this use the wrapper from the model library.
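As a rough illustration of the callback option described above, here is a minimal sketch. All names (`TTimeAdjuster`, `forecastQueryTimes`, `centreOfBucketAdjuster`) are hypothetical and are not part of ml-cpp. `TTime` stands in for `core_t::TTime`, and the bucket-centre offset is assumed purely for illustration. The point is only that the lower-level routine applies an opaque adjustment and never sees the feature.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in for core_t::TTime (assumed to be an integer count of seconds).
using TTime = std::int64_t;

// Hypothetical callback type: maps a bucket start time to the time at which
// the model should be queried.
using TTimeAdjuster = std::function<TTime(TTime)>;

// Hypothetical "maths layer" routine. It knows nothing about features and
// simply applies whatever adjustment the caller supplies.
std::vector<TTime> forecastQueryTimes(TTime startTime,
                                      TTime endTime,
                                      TTime bucketLength,
                                      const TTimeAdjuster& adjust) {
    assert(bucketLength > 0);
    // Calling an empty std::function would throw, so check up front.
    assert(static_cast<bool>(adjust));
    std::vector<TTime> times;
    for (TTime time = startTime; time < endTime; time += bucketLength) {
        times.push_back(adjust(time));
    }
    return times;
}

// Hypothetical "model layer" helper. It builds the adjuster for a feature
// which is ASSUMED, for illustration only, to be sampled at the bucket centre.
TTimeAdjuster centreOfBucketAdjuster(TTime bucketLength) {
    return [bucketLength](TTime bucketStart) {
        return bucketStart + bucketLength / 2;
    };
}
```

In this arrangement only the helper that builds the adjuster would need to know about the feature, so the dependency from the maths library on EFeature is avoided.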
Alternatively, how about I add a function to actually run the forecast to model_t which wraps up this detail. Given we only have the maths::CTimeSeriesModel here (for good reason) this seems like it might be the cleanest option.
> The problem is this is feature specific. So it is tricky to push it higher up if the forecast is being run over a job with multiple detectors with different features.
OK, I see, and agree that's too complicated.
What about inside model.s_ForecastModel->forecast(...)?
That hits the library dependency issue mentioned above. However, what if I add a CForecastDataSink::SForecastModelWrapper::forecast function which takes the forecast job? This could wrap all the functionality now in this loop.
Sounds good. I am also OK with keeping the current version, given that the alternatives are too complicated.
I think I like the idea of wrapping this in SForecastModelWrapper. It seems more natural to me than doing it in this loop, which is really just about scheduling. I'll make the change and see how it looks.
f980f26. Note that none of the members of SForecastModelWrapper are needed outside of the new forecast function, so I converted to a class.
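For readers without the commit to hand, the shape of the wrapper discussed in this thread might look roughly like the sketch below. It is not the code from f980f26. Every name here (`CForecastModelWrapperSketch`, `SForecastJobSketch`, `s_Duration`, the two callback types) is hypothetical, and the end time computation is an assumption, since the snippet under review is truncated at that point. `TTime` stands in for `core_t::TTime`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Stand-in for core_t::TTime (assumed to be an integer count of seconds).
using TTime = std::int64_t;

// Hypothetical stand-in for the forecast job holding only the fields this
// sketch needs. s_Duration is an assumed field name.
struct SForecastJobSketch {
    TTime s_StartTime;
    TTime s_Duration;
};

// Hypothetical wrapper. The members are private because nothing outside
// forecast() needs them, which is the motivation given for using a class.
class CForecastModelWrapperSketch {
public:
    // Hypothetical signatures, not the real model_t or maths interfaces.
    using TSampleTimeFunc = std::function<TTime(TTime bucketStart, TTime bucketLength)>;
    using TForecastFunc = std::function<void(TTime startTime, TTime endTime)>;

    CForecastModelWrapperSketch(TTime bucketLength,
                                TSampleTimeFunc sampleTime,
                                TForecastFunc forecastModel)
        : m_BucketLength{bucketLength}, m_SampleTime{std::move(sampleTime)},
          m_ForecastModel{std::move(forecastModel)} {
        assert(m_BucketLength > 0);
    }

    // Apply the feature specific offset to both ends of the interval and
    // then run the forecast. ASSUMPTION: the end is start plus duration.
    void forecast(const SForecastJobSketch& job) const {
        TTime startTime{m_SampleTime(job.s_StartTime, m_BucketLength)};
        TTime endTime{m_SampleTime(job.s_StartTime + job.s_Duration, m_BucketLength)};
        m_ForecastModel(startTime, endTime);
    }

private:
    TTime m_BucketLength;
    TSampleTimeFunc m_SampleTime;
    TForecastFunc m_ForecastModel;
};
```

With something like this, the scheduling loop would only need to call `forecast(job)` on each wrapper and would not need to know how the query times are offset.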
hendrikmuhs
left a comment
LGTM
I removed the
We discussed this some more. There were some misunderstandings about the nature of the change, but there was also a change to the default offsets in time buckets at which forecast points were requested. I reverted to the old style of defining the forecast points at "bucket time", i.e. offset zero, in #332. We will target this and #332 together at 6.5.4.
We were querying for the model bounds and forecast points at the beginning of each bucket. Instead we should match the time offset we apply to bucket samples when we update the model.
The upshot was that model bounds and forecasts were (typically) offset in time with respect to the data values. The problem is particularly noticeable for long bucket lengths. For example, the figures below show the model bounds for 1 day buckets before and after the fix.
Before:

After:
