Skip to content

Logging structured data #5240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Mar 4, 2020
Merged

Conversation

thk123
Copy link
Contributor

@thk123 thk123 commented Feb 21, 2020

On #5237 it was pointed out it is a bit clunky to have to switch between message handler UIs. This PR attempts to simplify this by allowing sending XML to the message handler, and it will convert it to JSON or plain text if those UIs are in play.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • [na] My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Thoughts?

@thk123 thk123 added the RFC Request for comment label Feb 21, 2020
@thk123 thk123 force-pushed the logging-structured-data branch 2 times, most recently from ee747d9 to 146502e Compare February 21, 2020 17:43
Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps rename xmlt to structured_messaget or something now it's the all-purpose structured-log-message type

#include <algorithm>
json_objectt from_xml(xmlt data);

std::string xml_to_pretty(const xmlt &data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that xml_to_pretty isn't outputting json, I'd be tempted to call this new module something slightly different, e.g. xml_conversion.{cpp,h} or xml_output_conversion.{cpp,h} or something similar? Also, given that both these two functions are (at a high level) "reproduce XML in some other output format", why are the names of the two so different, and why does one take a const xmlt &, but the other takes a plain xmlt ? I'd ideally like to see both of these as const xmlt & inputs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved by removing this file

Copy link
Contributor

@chrisr-diffblue chrisr-diffblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like this a lot - it's a relatively "simple" change to the interfaces and opens the scope for a whole lot of other code to get cleaned up - many thanks :-)

I'd like to see the naming and type signatures cleaned up a bit if possible, just to try to keep things as consistent as possible, so that's the only reason I'm requesting changes on this PR.

For additional bonus karma though, it would be super cool if the xml -> {whatever} conversion could also do a bit of name formatting conversion, e.g. converting element names from XML style (e.g. hyphenated words, like current-unwinding)to JSON style formatting (e.g. camel-case currentUnwinding) when converting to JSON. I appreciate that might be hard/impossible to do robustly, but if it is possible it would be really helpful in keeping the XML/JSON/plaintext output looking stylistically consistent with other parts of code that output XML/JSON/etc. This part is really not a blocking comment though - definitely in the unreasonable wish category :-)

Copy link
Member

@peterschrammel peterschrammel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might remove a few cases, but not all. Eg the json api supports streaming of arrays and objects.

@@ -4,6 +4,6 @@ main.c
^EXIT=10$
^SIGNAL=0$
"messageText": "VERIFICATION FAILED"
"currentUnwinding": 1
"current-unwinding": "1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything-is-a-string is problematic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved: data is now a jsont so in particular can be a json_numbert

Copy link
Contributor

@danpoe danpoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good step towards cleaning up the logging code. As mentioned by @chrisr-diffblue, it would be nice if names could be output according to xml and json conventions.

}
xmlt data{"current-unwinding"};
data.data = std::to_string(unwind);
log.statistics() << data;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a bit surprising that the result here is converted to a different format. Could we introduce a structured_messaget type as suggested by @smowton? It could well be implemented via xmlt for the time being (i.e., contain a member xmlt and just delegate to that). Then we could add a print(const structured_messaget &msg) method to ui_message_handlert which converts a structured_messaget to the appropriate format. The existing print() methods could be left unchanged.

@thk123 thk123 force-pushed the logging-structured-data branch from 146502e to ca4d244 Compare February 26, 2020 14:31
@thk123
Copy link
Contributor Author

thk123 commented Feb 26, 2020

Based on the broadly positive reception, I have tidied this up. It now uses an intermediate structure (essentially nested key/value pairs). It also addresses:

  • naming (xml now uses kebab case, pretty printing uses regular spacing, json uses camelCase)
  • numbers as numbers not strings for json

Hope this is still of use, ready for re-review @chrisr-diffblue @peterschrammel

@thk123 thk123 force-pushed the logging-structured-data branch 3 times, most recently from 724abf3 to 4d6fdfa Compare February 26, 2020 17:04
@thk123 thk123 force-pushed the logging-structured-data branch from 504a4e3 to 4ca6b57 Compare February 27, 2020 10:00
@thk123 thk123 removed the RFC Request for comment label Feb 27, 2020
@thk123 thk123 changed the title RFC: Logging structured data Logging structured data Feb 27, 2020
Copy link
Contributor

@chrisr-diffblue chrisr-diffblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, just a few queries 👍


xmlt to_xml() const;
jsont to_json() const;
std::string to_pretty() const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it more consistent to call this to_plain ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure? .pretty is the terminology used for converting irep to a plain text representation? Happy to change if you are sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly don't mind :-) Just asking the question, and your answer seems equally valid.

return output_data;
}

xmlt structured_datat::to_xml() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One design question I have (which I honestly don't know what the right answer is :-) ) is - who's responsibility is it to do the conversion from structured_data_entryt to {xml,json,plain} ? It feels like the knowledge of what (say) xml needs to look like (e.g. kebab_case) should be more closely located with the various XML utility functions. So perhaps the design should be that structure_data.{h,cpp} has a to_xm() to_json() and to_plain(), and those basically just be simple wrappers around other functions like xml_node(...) in the XML utility modules? Then all the knowledge of what XML should look like would be locate in the XML utility modules. Of course that means xml/json/etc modules need to know a little bit about structured_data... so it's not a clear choice either way.

Copy link
Contributor

@danpoe danpoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good, some additional suggestions

{
static structured_data_entryt data_node(const jsont &data);
static structured_data_entryt
entry(std::map<labelt, structured_data_entryt> children);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think structured_data_entryt unnecessarily exposes the fact that jsont and std::map are used to store the data and children. Could we instead have constructors which take string and int, and a method to add children? 🐝

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't agree with this - this allows you to easily create a constant structured_data using the existing type initialization C++ provides. Further, I'd either need to recreate constructors for all the json types, or limit what kind of json data you can output.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third option: a very small handful of specialized constructors/helper functions for the "common" cases, but leaving the full exposure of jsont/std::map for any of the less common cases?

class structured_datat
{
public:
explicit structured_datat(std::map<labelt, structured_data_entryt> data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐝

return data.value;
}
std::map<labelt, structured_data_entryt>
structured_data_entryt::get_children() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could return a const reference

}
bool labelt::operator<(const labelt &other) const
{
return camel_case() < other.camel_case();
Copy link
Contributor

@danpoe danpoe Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could directly compare vectors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to prefer this? Here I rely on the library implemented alphabetical sort of strings, which seems simpler than writing a sort for vector

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's already implemented in the standard library for vectors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible thing to be aware of, though not entirely sure if it applies here, and if so, whether it even matters or not... is that a lot of standard library sort routines on strings are actually locale sensitive... meaning you may/will get different sort orders on different platforms, or different users on the same platform if they change locale. If you want this to be at all stable in its sort order, you need to be aware of that. Locale sensitive sorting has caused issues with CI in the past (e.g. output from CBMC in different orders between CI and local runs)

.map(to_lower_string)
.collect<std::vector<std::string>>();
}
std::string labelt::camel_case() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newlines between methods (here and at other places)

this->components = make_range(components)
.map(to_lower_string)
.collect<std::vector<std::string>>();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make it a precondition that the inputs need to be in all lowercase? That would ensure better consistency in the uses of the API. Additionally we could add a precondition checking that there is at least one component (then we can get rid of the !components.empty() checks), and a precondition checking that each component is non-empty.

}
structured_datat data{{{labelt({"current", "unwinding"}),
structured_data_entryt::data_node(
json_numbert{std::to_string(unwind)})}}};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems a bit too verbose for this common use case. Could we have some convenience methods for the most common cases (keys with 1-2 components, values of type string or int)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a nice to have - happy for you to do in a follow up PR but won't as part of this PR.

@thk123
Copy link
Contributor Author

thk123 commented Mar 3, 2020

@chrisr-diffblue @danpoe comments addressed

@thk123 thk123 force-pushed the logging-structured-data branch from 9652a79 to 077e545 Compare March 3, 2020 12:13
Copy link
Contributor

@chrisr-diffblue chrisr-diffblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Very many thanks for persevering with this! It sucks to volunteer a solution and then have lots of nits picked as a thanks, but you've stuck with it. I'm happy for this to go in if you could just fix up the comment headers/copyrights and update a couple of comments.

src/util/json.h Outdated
@@ -465,4 +467,18 @@ inline const json_stringt &to_json_string(const jsont &json)

bool operator==(const jsont &left, const jsont &right);

/// Convert the structured_data into an json object. For the example structured
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

structured_data -> structured_datat - also there's no example structured data here now, as this comment has moved from it's original location (sorry...).

src/util/xml.h Outdated
bool operator==(const xmlt &a, const xmlt &b);
bool operator!=(const xmlt &a, const xmlt &b);

/// Convert the structured_data into an xml object. For the example structured
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with the json version, structured_data -> structured_datat and the example structure data will likely need copying into this comment as well. (sorry again...)

@@ -0,0 +1,167 @@
// Copyright 2016-2020 Diffblue Limited. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the proper CBMC comment header, and no Diffblue copyright.

@@ -0,0 +1,86 @@
// Copyright 2016-2020 Diffblue Limited. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the proper CBMC comment header, and no Diffblue copyright.

@@ -0,0 +1,14 @@
// Copyright 2016-2020 Diffblue Limited. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be a Diffblue copyright, just authorship.

@@ -0,0 +1,179 @@
// Copyright 2016-2020 Diffblue Limited. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be a Diffblue copyright, just authorship.

@@ -0,0 +1,93 @@
// Copyright 2016-2020 Diffblue Limited. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be a Diffblue copyright, just authorship.

@thk123 thk123 force-pushed the logging-structured-data branch from 077e545 to 95d0c36 Compare March 3, 2020 13:43
@thk123
Copy link
Contributor Author

thk123 commented Mar 3, 2020

@chrisr-diffblue done if you want to re-check otherwise will merge on green CI

@chrisr-diffblue
Copy link
Contributor

Thanks @thk123 - could I be a real pain and ask for proper Module titles in the comment headers, rather than blanks? a git grep Module: in your CBMC checkout should give your a good flavour of what's been done else where. If thats too much hassle after all this, then feel free to merge.

@thk123 thk123 force-pushed the logging-structured-data branch from d1a8afb to c0c513d Compare March 4, 2020 10:02
@thk123
Copy link
Contributor Author

thk123 commented Mar 4, 2020

apt-get failure - restarting cbmc-linux-clang

@thk123 thk123 force-pushed the logging-structured-data branch from c0c513d to 1879a8d Compare March 4, 2020 11:04
@thk123 thk123 merged commit 1e6ce0b into diffblue:develop Mar 4, 2020
@thk123 thk123 deleted the logging-structured-data branch March 4, 2020 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants