-
Notifications
You must be signed in to change notification settings - Fork 20
Add a dashboard in a new metrics.rs module #48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This change introduces a simple HTML table for quick reference about individual symbol status. The table is served using a new HTTP endpoint, under "/dashboard".
jayantk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me. I agree with ali's comments too, so i'll let him approve when you've addressed them.
| pub struct MetricsServer { | ||
| /// Used to pull the state of all symbols in local store | ||
| local_store_tx: mpsc::Sender<Message>, | ||
| global_store_lookup_tx: mpsc::Sender<Lookup>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i presume in the future you're going to create another component that aggregates metrics over time (like gets sent info on each update or whatever to track in accumulators), and the data from that component will be accessible here as well?
I think the current information here is a good start to a dashboard, but I'm pretty sure that people will want to see metrics over some time interval. Most problems (like dropped RPC requests, etc) will happen sporadically and you'll want historical data to see whether or not they've been happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The state obtained by this dashboard will not benefit the metrics necessarily. Explanation below.
The main piece of Rust boilerplate for adding metrics is that the Rust Prometheus crate does not let you discover existing metrics easily from global metrics state. This state is kept in a Registry object, which is the top-level container used to gather and export formatted metrics in bulk. The preferred method for creating/referencing metrics is linking counter/gauge objects to the registry, and using the objects themselves as references to individual metrics, presumably right beside other state variables.
This pattern was reasonably easy for the attester, where we focused on the overall performance of the system and got away with storing a handful of static metrics as globals.
In the agent, we'll likely need to add the counter objects per-symbol dynamically and only make the common Registry static. I plan to extend global/local store state to hold the new objects and update them with incoming Update* messages. I'ii also take good a look at Richard's comments and screenshots to get an idea of what could work well.
| let price_string = if let Some(global_data) = price_data.global_data { | ||
| let expo = global_data.expo; | ||
| let price_with_expo: f64 = global_data.agg.price as f64 * 10f64.powi(expo); | ||
| format!("{:.2}", price_with_expo) | ||
| } else { | ||
| "no data".to_string() | ||
| }; | ||
|
|
||
| let last_publish_string = if let Some(global_data) = price_data.global_data { | ||
| if let Some(datetime) = | ||
| NaiveDateTime::from_timestamp_opt(global_data.timestamp, 0) | ||
| { | ||
| datetime.format("%Y-%m-%d %H:%M:%S").to_string() | ||
| } else { | ||
| format!("Invalid timestamp {}", global_data.timestamp) | ||
| } | ||
| } else { | ||
| "no data".to_string() | ||
| }; | ||
|
|
||
| let last_local_update_string = if let Some(local_data) = price_data.local_data { | ||
| if let Some(datetime) = | ||
| NaiveDateTime::from_timestamp_opt(local_data.timestamp, 0) | ||
| { | ||
| datetime.format("%Y-%m-%d %H:%M:%S").to_string() | ||
| } else { | ||
| format!("Invalid timestamp {}", local_data.timestamp) | ||
| } | ||
| } else { | ||
| "no data".to_string() | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpik: maybe in the future I move these to another method to separate the html rendering vs what data to display.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already broke ground on the metrics front, starting with rearranging the modules. It definitely looks better that way and is coming soon.
ali-behjati
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I think it's good to wait a bit for Tom to review this too.
I left some comments and appreciate if you address them before merging.
| @@ -1,3 +1,4 @@ | |||
| #![recursion_limit = "256"] | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add the reasoning as a comment here too?
(they lived happily ever after)
Overview
This change introduces a simple HTML table for quick reference about individual symbol status. The table is served using a new HTTP endpoint, under "/dashboard".
Screenshot
click here
Changes to existing code
agent.rstogether with other async jobs. It reuses the logger and some of the channel tx's.New code
metrics.rs- a new metrics- and dashboard oriented module. I plan to also add Prometheus metrics there and I'll probably exile dashboard generation logic to something likedashboard.rs. lmk if it's not okay with you.Testing
integration-testsconfig (that's also how I generated the screenshot). The timestamps and price values update often, as expected. The rightmost "last local update time" column displays "no data" unil updates start flowing in from example publisher, as expected. In the screenshot you can notice the column contains a newer timestamp, demonstrating that the local data is more up to date ...as expected.Review Highlights
metrics.rs- I am not 100% happy with the readability of the code that generates the HTML and does lookups in the local/global stores. As I move this code to its own module, It will probably make more sense.