associate variables' names to tmp_post #6847

Merged

Conversation

qinheping
Collaborator

@qinheping qinheping commented May 10, 2022

For a pointer dereference with a side effect (*ptr++), the existing CBMC code in goto_convert_side_effect.cpp creates a temporary variable tmp_post and later dereferences tmp_post instead of ptr. One problem with this approach is that the temporary variable tmp_post loses all information about the original pointer ptr, so the trace cannot tell which pointer caused the failure.

This patch associates the name of the original variable with the temporary variable, so that the trace reflects a more precise cause of the failure.

Example:

#include <stddef.h>

int memcmp(const void *s1, const void *s2, size_t n)
{
  int res=0;
  const unsigned char *sc1=s1, *sc2=s2;
  for(; n!=0; n--)
  {
    res = (*sc1++) - (*sc2++);
    if (res != 0)
      return res;
  }
  return res;
}

In the above function, if the out-of-bounds check fails on the pointer dereference (*sc1++), the trace will give the violated property as

dereference failure: pointer outside object bounds in *tmp_post

However, without looking at the goto program, users will have no clue what tmp_post is. With this patch, the violated property becomes

dereference failure: pointer outside object bounds in *tmp_post_memcmp::1::sc1

where the suffix of the temporary variable indicates the cause of the failure.
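
The naming idea can be sketched outside of CBMC roughly as follows. This is a minimal, standalone illustration under stated assumptions, not the code in goto_convert_side_effect.cpp: the helpers `temp_suffix` and `temp_name`, the use of `std::optional`, and the `"tmp_"` prefix and `"_"` separator are all hypothetical choices made to reproduce the names shown above.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: builds the suffix for a post-increment temporary.
// `origin` is the name of the variable the temporary was copied from when the
// operand is a plain variable, and std::nullopt for any other expression.
std::string temp_suffix(const std::optional<std::string> &origin)
{
  std::string suffix = "post";
  if(origin.has_value())
    suffix += "_" + *origin; // assumed separator
  return suffix;
}

// Hypothetical helper: the name under which the temporary is displayed.
std::string temp_name(const std::optional<std::string> &origin)
{
  return "tmp_" + temp_suffix(origin); // assumed prefix
}
```

With this sketch, a temporary derived from `memcmp::1::sc1` would be named `tmp_post_memcmp::1::sc1`, while a temporary with no known origin keeps the plain name `tmp_post`.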

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • [N/A] The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • [N/A] My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • [N/A] White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Collaborator

@tautschnig tautschnig left a comment

Would it be possible to please add a test? Such a test would probably go in the regression/goto-cc-cbmc/ and would either use a pattern checking for the expected name in the counterexample trace or could use --show-goto-functions.

@codecov

codecov bot commented May 10, 2022

Codecov Report

Merging #6847 (5819620) into develop (5bf57ef) will increase coverage by 0.21%.
The diff coverage is 100.00%.

❗ Current head 5819620 differs from pull request most recent head b1f440a. Consider uploading reports for the commit b1f440a to get more accurate results

@@             Coverage Diff             @@
##           develop    #6847      +/-   ##
===========================================
+ Coverage    77.78%   78.00%   +0.21%     
===========================================
  Files         1567     1567              
  Lines       179769   187228    +7459     
===========================================
+ Hits        139840   146039    +6199     
- Misses       39929    41189    +1260     
Impacted Files Coverage Δ
src/goto-programs/goto_convert_side_effect.cpp 95.54% <100.00%> (+0.05%) ⬆️
src/util/find_symbols.h 61.53% <0.00%> (-38.47%) ⬇️
src/ansi-c/ansi_c_language.h 66.66% <0.00%> (-8.34%) ⬇️
src/solvers/flattening/boolbv.h 57.69% <0.00%> (-4.81%) ⬇️
jbmc/src/java_bytecode/convert_java_nondet.cpp 76.85% <0.00%> (-2.60%) ⬇️
src/ansi-c/ansi_c_language.cpp 94.49% <0.00%> (-2.02%) ⬇️
src/goto-analyzer/taint_analysis.cpp 76.95% <0.00%> (-1.79%) ⬇️
src/analyses/custom_bitvector_analysis.cpp 54.24% <0.00%> (-1.72%) ⬇️
src/analyses/goto_rw.h 73.80% <0.00%> (-1.20%) ⬇️
src/goto-instrument/dump_c.cpp 79.32% <0.00%> (-1.13%) ⬇️
... and 72 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

    std::string suffix = "post";
    if(auto sym_expr = expr_try_dynamic_cast<symbol_exprt>(tmp))
    {
      const irep_idt &base_name = ns.lookup(sym_expr->get_identifier()).base_name;
Collaborator

Clang-format:

-      const irep_idt &base_name = ns.lookup(sym_expr->get_identifier()).base_name;
+      const irep_idt &base_name =
+        ns.lookup(sym_expr->get_identifier()).base_name;

Collaborator Author

Thanks. I applied the fix.

@qinheping qinheping force-pushed the convert_side_effect_with_variable_names branch from 8a3244f to 18f65f5 Compare May 11, 2022 23:09
@qinheping
Collaborator Author

Would it be possible to please add a test? Such a test would probably go in the regression/goto-cc-cbmc/ and would either use a pattern checking for the expected name in the counterexample trace or could use --show-goto-functions.

I added a test that matches the name of the checked variable against tmp_post_ptr.
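
The kind of pattern check such a test performs can be sketched as below. This is only an illustration of matching the expected temporary name in a line of output; it is not the regression test added in this PR, and the function `mentions_temp_for` is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical check: does a line of verifier output name a post-increment
// temporary derived from the variable `var`?
bool mentions_temp_for(const std::string &output_line, const std::string &var)
{
  // Matches e.g. "*tmp_post_ptr". `var` is assumed to consist only of
  // identifier characters, so it is not escaped before being spliced in.
  const std::regex pattern("\\*tmp_post_" + var + "\\b");
  return std::regex_search(output_line, pattern);
}
```

The trailing word boundary keeps a check for `ptr` from also accepting a temporary derived from a different variable such as `ptr2`.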

@qinheping qinheping force-pushed the convert_side_effect_with_variable_names branch 2 times, most recently from fe258c1 to 5819620 Compare May 11, 2022 23:28
@SaswatPadhi
Contributor

SaswatPadhi commented May 12, 2022

I like this idea, but I think we should generalize this beyond just symbols to be more useful.
Currently, the failed assertion also shows the source line number, so that helps in identifying the source expression behind a tmp_post intermediate.
The proposed fix only works for intermediates derived from a symbol, i.e. if op is a symbol, as you check with a try_dynamic_cast. So it wouldn't be super useful for intermediates for function calls and other expressions.

Would it be possible to keep these tmp_post variables around as they are, but in the assertion that's printed show the precise subexpression that violated the assertion?
I'm just thinking out loud. It may not be super easy to implement.

EDIT: Sorry! I was typing on my phone and accidentally hit "Close PR" :|

@SaswatPadhi SaswatPadhi reopened this May 12, 2022
Collaborator

@tautschnig tautschnig left a comment

Please note that there appears to be stray whitespace at the end of lines. https://github.com/diffblue/cbmc/runs/6400807830?check_suite_focus=true says which ones, but arguably the cppcheck report is a bit more useful on this occasion as whitespace changes are difficult to understand in unified diff output...

@tautschnig
Collaborator

I like this idea, but I think we should generalize this beyond just symbols to be more useful. Currently, the failed assertion also shows the source line number, so that helps in identifying the source expression behind a tmp_post intermediate. The proposed fix only works for intermediates derived from a symbol, i.e. if op is a symbol, as you check with a try_dynamic_cast. So it wouldn't be super useful for intermediates for function calls and other expressions.

Would it be possible to keep these tmp_post variables around as they are, but in the assertion that's printed show the precise subexpression that violated the assertion? I'm just thinking out loud. It may not be super easy to implement.

A challenge here is that the assertions are only generated after the transformations touched in this PR, so the code generating assertions does not even have access to the original expression anymore. Now it may well be possible to work around that in the following way (but this is just a sketch of an idea, I haven't tried this): For any introduced temporary symbol, set the pretty_name to be the string representation (as computed by from_expr_using_mode) of the original expression. I have no idea whether this breaks any other tooling, but I'm curious what the test added by @Herbping would look like if following that approach.
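
The pretty_name idea could look roughly like the following standalone sketch. It does not use CBMC's symbol type or from_expr_using_mode: the `TempSymbol` struct and the helpers `make_temp` and `display_name` are hypothetical, and the source text of the original expression is simply passed in as a string.

```cpp
#include <string>

// Hypothetical, simplified symbol record; a real symbol has more fields.
struct TempSymbol
{
  std::string name;        // unique internal identifier
  std::string pretty_name; // optional human-readable name, may be empty
};

// Hypothetical factory: remember the text of the expression the temporary
// stands for at the moment the temporary is introduced.
TempSymbol make_temp(
  const std::string &unique_name,
  const std::string &original_expr_text)
{
  return TempSymbol{unique_name, original_expr_text};
}

// Name to show in a property description: prefer the pretty name if set.
const std::string &display_name(const TempSymbol &symbol)
{
  return symbol.pretty_name.empty() ? symbol.name : symbol.pretty_name;
}
```

Because the text is recorded when the temporary is created, code that runs later and no longer sees the original expression can still print it.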

@SaswatPadhi
Contributor

Thanks @tautschnig.

I'm curious what the test added by @Herbping would look like if following that approach.

Would be great if the from_expr_using_mode approach gives *(ptr++) and the assertion appears as:

[main.pointer_dereference.5] line 8 dereference failure: pointer outside object bounds in *(ptr++): FAILURE

@qinheping
Collaborator Author

Thanks @tautschnig.

I'm curious what the test added by @Herbping would look like if following that approach.

Would be great if the from_expr_using_mode approach gives *(ptr++) and the assertion appears as:

[main.pointer_dereference.5] line 8 dereference failure: pointer outside object bounds in *(ptr++): FAILURE

Good idea! I will draft another PR to realize this approach.

    if(auto sym_expr = expr_try_dynamic_cast<symbol_exprt>(tmp))
    {
      const irep_idt &base_name =
        ns.lookup(sym_expr->get_identifier()).base_name;
Member

Please change to ns.lookup(*sym_expr) for stronger typing!
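
To illustrate what "stronger typing" buys here, the following standalone sketch contrasts a lookup by raw identifier string with a lookup by a typed symbol expression. The classes `Symbol`, `SymbolExpr` and `Namespace` are hypothetical stand-ins written for this example, not CBMC's symbolt, symbol_exprt or namespacet.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only.
struct Symbol
{
  std::string name;
  std::string base_name;
};

// A typed wrapper: holding one means the string is meant to name a symbol.
class SymbolExpr
{
public:
  explicit SymbolExpr(std::string identifier)
    : identifier_(std::move(identifier))
  {
  }

  const std::string &identifier() const
  {
    return identifier_;
  }

private:
  std::string identifier_;
};

class Namespace
{
public:
  void insert(Symbol symbol)
  {
    std::string key = symbol.name;
    table_.emplace(std::move(key), std::move(symbol));
  }

  // Weakly typed: accepts any string, whether or not it names a symbol.
  const Symbol &lookup(const std::string &identifier) const
  {
    const auto it = table_.find(identifier);
    if(it == table_.end())
      throw std::out_of_range("unknown symbol: " + identifier);
    return it->second;
  }

  // Strongly typed: only a symbol expression can be passed.
  const Symbol &lookup(const SymbolExpr &expr) const
  {
    return lookup(expr.identifier());
  }

private:
  std::map<std::string, Symbol> table_;
};
```

Both overloads return the same symbol; the typed one lets the compiler reject a call that passes some unrelated string-like value by mistake.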

@qinheping
Collaborator Author

I created a new PR#6855 which I think is a better approach for this problem.

@qinheping qinheping closed this May 13, 2022
@tautschnig tautschnig reopened this May 25, 2022
@tautschnig
Collaborator

Re-opening as #6855 needs a lot more testing (and perhaps discussion). @Herbping could you please address the whitespace issues so that this one can be merged?

@qinheping qinheping force-pushed the convert_side_effect_with_variable_names branch from 5819620 to b1f440a Compare May 25, 2022 22:00
@qinheping
Collaborator Author

qinheping commented May 25, 2022

Re-opening as #6855 needs a lot more testing (and perhaps discussion). @Herbping could you please address the whitespace issues so that this one can be merged?

Sounds great @tautschnig. I applied Daniel's comment and addressed the whitespace issue.

@tautschnig tautschnig merged commit c6f83a2 into diffblue:develop May 25, 2022
@qinheping qinheping deleted the convert_side_effect_with_variable_names branch January 18, 2023 21:54