-
Notifications
You must be signed in to change notification settings - Fork 280
Logging structured data #5240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Logging structured data #5240
Conversation
ee747d9
to
146502e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps rename xmlt
to structured_messaget
or something now it's the all-purpose structured-log-message type
src/util/convert_xml_json.h
Outdated
#include <algorithm> | ||
json_objectt from_xml(xmlt data); | ||
|
||
std::string xml_to_pretty(const xmlt &data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that xml_to_pretty
isn't outputting json, I'd be tempted to call this new module something slightly different, e.g. xml_conversion.{cpp,h}
or xml_output_conversion.{cpp,h}
or something similar? Also, given that both these two functions are (at a high level) "reproduce XML in some other output format", why are the names of the two so different, and why does one take a const xmlt &
, but the other takes a plain xmlt
? I'd ideally like to see both of these as const xmlt &
inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved by removing this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like this a lot - it's a relatively "simple" change to the interfaces and opens the scope for a whole lot of other code to get cleaned up - many thanks :-)
I'd like to see the naming and type signatures cleaned up a bit if possible, just to try to keep things as consistent as possible, so that's the only reason I'm requesting changes on this PR.
For additional bonus karma though, it would be super cool if the xml -> {whatever} conversion could also do a bit of name formatting conversion, e.g. converting element names from XML style (e.g. hyphenated words, like current-unwinding
)to JSON style formatting (e.g. camel-case currentUnwinding
) when converting to JSON. I appreciate that might be hard/impossible to do robustly, but if it is possible it would be really helpful in keeping the XML/JSON/plaintext output looking stylistically consistent with other parts of code that output XML/JSON/etc. This part is really not a blocking comment though - definitely in the unreasonable wish category :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might remove a few cases, but not all. Eg the json api supports streaming of arrays and objects.
@@ -4,6 +4,6 @@ main.c | |||
^EXIT=10$ | |||
^SIGNAL=0$ | |||
"messageText": "VERIFICATION FAILED" | |||
"currentUnwinding": 1 | |||
"current-unwinding": "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything-is-a-string is problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved: data is now a jsont
so in particular can be a json_numbert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good step towards cleaning up the logging code. As mentioned by @chrisr-diffblue, it would be nice if names could be output according to xml and json conventions.
} | ||
xmlt data{"current-unwinding"}; | ||
data.data = std::to_string(unwind); | ||
log.statistics() << data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be a bit surprising that the result here is converted to a different format. Could we introduce a structured_messaget
type as suggested by @smowton? It could well be implemented via xmlt
for the time being (i.e., contain a member xmlt
and just delegate to that). Then we could add a print(const structured_messaget &msg)
method to ui_message_handlert
which converts a structured_messaget
to the appropriate format. The existing print()
methods could be left unchanged.
146502e
to
ca4d244
Compare
Based on the broadly positive reception, I have tidied this up. It now uses an intermediate structure (essentially nested key/value pairs). It also addresses:
Hope this is still of use, ready for re-review @chrisr-diffblue @peterschrammel |
724abf3
to
4d6fdfa
Compare
504a4e3
to
4ca6b57
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, just a few queries 👍
src/util/structured_data.h
Outdated
|
||
xmlt to_xml() const; | ||
jsont to_json() const; | ||
std::string to_pretty() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it more consistent to call this to_plain
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure? .pretty
is the terminology used for converting irep to a plain text representation? Happy to change if you are sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly don't mind :-) Just asking the question, and your answer seems equally valid.
src/util/structured_data.cpp
Outdated
return output_data; | ||
} | ||
|
||
xmlt structured_datat::to_xml() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One design question I have (which I honestly don't know what the right answer is :-) ) is - who's responsibility is it to do the conversion from structured_data_entryt
to {xml,json,plain} ? It feels like the knowledge of what (say) xml needs to look like (e.g. kebab_case) should be more closely located with the various XML utility functions. So perhaps the design should be that structure_data.{h,cpp} has a to_xm()
to_json()
and to_plain()
, and those basically just be simple wrappers around other functions like xml_node(...)
in the XML utility modules? Then all the knowledge of what XML should look like would be locate in the XML utility modules. Of course that means xml/json/etc modules need to know a little bit about structured_data... so it's not a clear choice either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good, some additional suggestions
{ | ||
static structured_data_entryt data_node(const jsont &data); | ||
static structured_data_entryt | ||
entry(std::map<labelt, structured_data_entryt> children); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think structured_data_entryt
unnecessarily exposes the fact that jsont
and std::map
are used to store the data and children. Could we instead have constructors which take string and int, and a method to add children? 🐝
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't agree with this - this allows you to easily create a constant structured_data using the existing type initialization C++ provides. Further, I'd either need to recreate constructors for all the json types, or limit what kind of json data you can output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Third option: a very small handful of specialized constructors/helper functions for the "common" cases, but leaving the full exposure of jsont/std::map for any of the less common cases?
class structured_datat | ||
{ | ||
public: | ||
explicit structured_datat(std::map<labelt, structured_data_entryt> data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🐝
src/util/structured_data.cpp
Outdated
return data.value; | ||
} | ||
std::map<labelt, structured_data_entryt> | ||
structured_data_entryt::get_children() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could return a const reference
src/util/structured_data.cpp
Outdated
} | ||
bool labelt::operator<(const labelt &other) const | ||
{ | ||
return camel_case() < other.camel_case(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could directly compare vectors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to prefer this? Here I rely on the library implemented alphabetical sort of strings, which seems simpler than writing a sort for vector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's already implemented in the standard library for vectors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possible thing to be aware of, though not entirely sure if it applies here, and if so, whether it even matters or not... is that a lot of standard library sort routines on strings are actually locale sensitive... meaning you may/will get different sort orders on different platforms, or different users on the same platform if they change locale. If you want this to be at all stable in its sort order, you need to be aware of that. Locale sensitive sorting has caused issues with CI in the past (e.g. output from CBMC in different orders between CI and local runs)
.map(to_lower_string) | ||
.collect<std::vector<std::string>>(); | ||
} | ||
std::string labelt::camel_case() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newlines between methods (here and at other places)
this->components = make_range(components) | ||
.map(to_lower_string) | ||
.collect<std::vector<std::string>>(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make it a precondition that the inputs need to be in all lowercase? That would ensure better consistency in the uses of the API. Additionally we could add a precondition checking that there is at least one component (then we can get rid of the !components.empty()
checks), and a precondition checking that each component is non-empty.
} | ||
structured_datat data{{{labelt({"current", "unwinding"}), | ||
structured_data_entryt::data_node( | ||
json_numbert{std::to_string(unwind)})}}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems a bit too verbose for this common use case. Could we have some convenience methods for the most common cases (keys with 1-2 components, values of type string or int)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a nice to have - happy for you to do in a follow up PR but won't as part of this PR.
@chrisr-diffblue @danpoe comments addressed |
9652a79
to
077e545
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Very many thanks for persevering with this! It sucks to volunteer a solution and then have lots of nits picked as a thanks, but you've stuck with it. I'm happy for this to go in if you could just fix up the comment headers/copyrights and update a couple of comments.
src/util/json.h
Outdated
@@ -465,4 +467,18 @@ inline const json_stringt &to_json_string(const jsont &json) | |||
|
|||
bool operator==(const jsont &left, const jsont &right); | |||
|
|||
/// Convert the structured_data into an json object. For the example structured |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
structured_data
-> structured_datat
- also there's no example structured data here now, as this comment has moved from it's original location (sorry...).
src/util/xml.h
Outdated
bool operator==(const xmlt &a, const xmlt &b); | ||
bool operator!=(const xmlt &a, const xmlt &b); | ||
|
||
/// Convert the structured_data into an xml object. For the example structured |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with the json version, structured_data
-> structured_datat
and the example structure data will likely need copying into this comment as well. (sorry again...)
src/util/structured_data.cpp
Outdated
@@ -0,0 +1,167 @@ | |||
// Copyright 2016-2020 Diffblue Limited. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the proper CBMC comment header, and no Diffblue copyright.
src/util/structured_data.h
Outdated
@@ -0,0 +1,86 @@ | |||
// Copyright 2016-2020 Diffblue Limited. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the proper CBMC comment header, and no Diffblue copyright.
@@ -0,0 +1,14 @@ | |||
// Copyright 2016-2020 Diffblue Limited. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be a Diffblue copyright, just authorship.
unit/util/structured_data.cpp
Outdated
@@ -0,0 +1,179 @@ | |||
// Copyright 2016-2020 Diffblue Limited. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be a Diffblue copyright, just authorship.
unit/util/xml.cpp
Outdated
@@ -0,0 +1,93 @@ | |||
// Copyright 2016-2020 Diffblue Limited. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be a Diffblue copyright, just authorship.
077e545
to
95d0c36
Compare
@chrisr-diffblue done if you want to re-check otherwise will merge on green CI |
Thanks @thk123 - could I be a real pain and ask for proper Module titles in the comment headers, rather than blanks? a |
d1a8afb
to
c0c513d
Compare
apt-get failure - restarting cbmc-linux-clang |
This handles printing a sequence of words in different formats
This will be used to log data, independent of what message handler is uses.
Add tests for capitalize function
This stops structured_datat growing if more formats are added
c0c513d
to
1879a8d
Compare
On #5237 it was pointed out it is a bit clunky to have to switch between message handler UIs. This PR attempts to simplify this by allowing sending XML to the message handler, and it will convert it to JSON or plain text if those UIs are in play.
Thoughts?