Skip to content

Conversation

@drozdziak1
Copy link
Contributor

@drozdziak1 drozdziak1 commented Mar 15, 2023

Overview

This change introduces a simple HTML table for quick reference about individual symbol status. The table is served using a new HTTP endpoint, under "/dashboard".

Screenshot

click here

Changes to existing code

  • The HTTP endpoint is spawned in agent.rs together with other async jobs. It reuses the logger and some of the channel tx's.
  • Macro recursion limit is raised to 256 (the new type-checked HTML crate is macro-heavy)
  • New dependencies
    • typed-html - ergonomic HTML manipulation

New code

  • metrics.rs - a new metrics- and dashboard oriented module. I plan to also add Prometheus metrics there and I'll probably exile dashboard generation logic to something like dashboard.rs. lmk if it's not okay with you.

Testing

  • The existing Pytest harness continues to pass.
  • I verified my code using the example publisher and the integration-tests config (that's also how I generated the screenshot). The timestamps and price values update often, as expected. The rightmost "last local update time" column displays "no data" unil updates start flowing in from example publisher, as expected. In the screenshot you can notice the column contains a newer timestamp, demonstrating that the local data is more up to date ...as expected.

Review Highlights

  • metrics.rs - I am not 100% happy with the readability of the code that generates the HTML and does lookups in the local/global stores. As I move this code to its own module, It will probably make more sense.

This change introduces a simple HTML table for quick reference about
individual symbol status. The table is served using a new HTTP
endpoint, under "/dashboard".
Copy link
Contributor

@jayantk jayantk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me. I agree with ali's comments too, so i'll let him approve when you've addressed them.

pub struct MetricsServer {
/// Used to pull the state of all symbols in local store
local_store_tx: mpsc::Sender<Message>,
global_store_lookup_tx: mpsc::Sender<Lookup>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i presume in the future you're going to create another component that aggregates metrics over time (like gets sent info on each update or whatever to track in accumulators), and the data from that component will be accessible here as well?

I think the current information here is a good start to a dashboard, but I'm pretty sure that people will want to see metrics over some time interval. Most problems (like dropped RPC requests, etc) will happen sporadically and you'll want historical data to see whether or not they've been happening.

Copy link
Contributor Author

@drozdziak1 drozdziak1 Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state obtained by this dashboard will not benefit the metrics necessarily. Explanation below.

The main piece of Rust boilerplate for adding metrics is that the Rust Prometheus crate does not let you discover existing metrics easily from global metrics state. This state is kept in a Registry object, which is the top-level container used to gather and export formatted metrics in bulk. The preferred method for creating/referencing metrics is linking counter/gauge objects to the registry, and using the objects themselves as references to individual metrics, presumably right beside other state variables.

This pattern was reasonably easy for the attester, where we focused on the overall performance of the system and got away with storing a handful of static metrics as globals.

In the agent, we'll likely need to add the counter objects per-symbol dynamically and only make the common Registry static. I plan to extend global/local store state to hold the new objects and update them with incoming Update* messages. I'ii also take good a look at Richard's comments and screenshots to get an idea of what could work well.

Comment on lines +164 to +194
let price_string = if let Some(global_data) = price_data.global_data {
let expo = global_data.expo;
let price_with_expo: f64 = global_data.agg.price as f64 * 10f64.powi(expo);
format!("{:.2}", price_with_expo)
} else {
"no data".to_string()
};

let last_publish_string = if let Some(global_data) = price_data.global_data {
if let Some(datetime) =
NaiveDateTime::from_timestamp_opt(global_data.timestamp, 0)
{
datetime.format("%Y-%m-%d %H:%M:%S").to_string()
} else {
format!("Invalid timestamp {}", global_data.timestamp)
}
} else {
"no data".to_string()
};

let last_local_update_string = if let Some(local_data) = price_data.local_data {
if let Some(datetime) =
NaiveDateTime::from_timestamp_opt(local_data.timestamp, 0)
{
datetime.format("%Y-%m-%d %H:%M:%S").to_string()
} else {
format!("Invalid timestamp {}", local_data.timestamp)
}
} else {
"no data".to_string()
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpik: maybe in the future I move these to another method to separate the html rendering vs what data to display.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already broke ground on the metrics front, starting with rearranging the modules. It definitely looks better that way and is coming soon.

Copy link
Collaborator

@ali-behjati ali-behjati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I think it's good to wait a bit for Tom to review this too.
I left some comments and appreciate if you address them before merging.

@@ -1,3 +1,4 @@
#![recursion_limit = "256"]
Copy link
Collaborator

@ali-behjati ali-behjati Mar 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add the reasoning as a comment here too?

@drozdziak1 drozdziak1 merged commit 9374b3f into main Mar 21, 2023
@drozdziak1 drozdziak1 deleted the drozdziak1/agent-dashboard branch March 21, 2023 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants