Skip to content

Conversation

@MathiasVP
Copy link
Contributor

This PR gets rid of operands as SSA variables for range analysis. Merely removing those SSA variables causes nontermination on Wireshark, and in a previous PR I reduced the issue to the snippet:

struct X { int n; };
void read_argument(const X *);

void nonterminating_without_operands_as_ssa(X *x) {
  read_argument(x);
  while (x->n) {
    x->n--;
  }
}

the problem is the PhiInstruction before the entry to the while loop that merges the incoming edge from (1) x->n-- and from (2) the initial value of x at the entry point of the function. The second operand of that PhiInstruction is an inexact operand which means that the IR has deduced that it's not guaranteed that the entire operand will be consumed by the PhiInstruction. This can be seen by the ~ prefix on the operand in the IR for the above:

4| void nonterminating_without_operands_as_ssa(X*)
4|   Block 0
4|     v4_1(void)           = EnterFunction                  :
4|     m4_2(unknown)        = AliasedDefinition              :
4|     m4_3(unknown)        = InitializeNonLocal             :
4|     m4_4(unknown)        = Chi                            : total:m4_2, partial:m4_3
4|     r4_5(glval<X *>)     = VariableAddress[x]             :
4|     m4_6(X *)            = InitializeParameter[x]         : &:r4_5
4|     r4_7(X *)            = Load[x]                        : &:r4_5, m4_6
4|     m4_8(unknown)        = InitializeIndirection[x]       : &:r4_7
5|     r5_1(glval<unknown>) = FunctionAddress[read_argument] :
5|     r5_2(glval<X *>)     = VariableAddress[x]             :
5|     r5_3(X *)            = Load[x]                        : &:r5_2, m4_6
5|     r5_4(X *)            = Convert                        : r5_3
5|     v5_5(void)           = Call[read_argument]            : func:r5_1, 0:r5_4
5|     m5_6(unknown)        = ^CallSideEffect                : ~m4_4
5|     m5_7(unknown)        = Chi                            : total:m4_4, partial:m5_6
5|     v5_8(void)           = ^BufferReadSideEffect[0]       : &:r5_4, ~m4_8
-|   Goto -> Block 1

6|   Block 1
6|     m6_1(int)        = Phi                : from 0:~m4_8, from 2:m7_7 <--- Here!
6|     m6_2(unknown)    = Phi                : from 0:m4_8, from 2:m7_8
6|     r6_3(glval<X *>) = VariableAddress[x] :
6|     r6_4(X *)        = Load[x]            : &:r6_3, m4_6
6|     r6_5(glval<int>) = FieldAddress[n]    : r6_4
6|     r6_6(int)        = Load[?]            : &:r6_5, m6_1
6|     r6_7(int)        = Constant[0]        :
6|     r6_8(bool)       = CompareNE          : r6_6, r6_7
6|     v6_9(void)       = ConditionalBranch  : r6_8
-|   False -> Block 3
-|   True -> Block 2

7|   Block 2
7|     r7_1(glval<X *>) = VariableAddress[x] :
7|     r7_2(X *)        = Load[x]            : &:r7_1, m4_6
7|     r7_3(glval<int>) = FieldAddress[n]    : r7_2
7|     r7_4(int)        = Load[?]            : &:r7_3, m6_1
7|     r7_5(int)        = Constant[1]        :
7|     r7_6(int)        = Sub                : r7_4, r7_5
7|     m7_7(int)        = Store[?]           : &:r7_3, r7_6
7|     m7_8(unknown)    = Chi                : total:m6_2, partial:m7_7
-|   Goto (back edge) -> Block 1

9|   Block 3
9|     v9_1(void)  = NoOp                 :
4|     v4_9(void)  = ReturnIndirection[x] : &:r4_7, m6_2
4|     v4_10(void) = ReturnVoid           :
4|     v4_11(void) = AliasedUse           : ~m5_7
4|     v4_12(void) = ExitFunction         :

And when one uses the getAnInput() predicate on PhiInstructions, only the exact operands are returned. So we were missing the phi input at this edge, and this made the modulus analysis nonterminating.

This PR fixes this by replacing getAnInput() with getAnOperand().getAnyDef(). This means we'll include those inexact operands as well.

Commit-by-commit review recommended:

  • The first commit deletes operands from range analysis SSA
  • The second commit implements the above small fix
  • The final commit does a bit of cleanup

@MathiasVP MathiasVP requested a review from a team as a code owner November 8, 2023 09:58
@github-actions github-actions bot added the C++ label Nov 8, 2023
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Nov 8, 2023
@aschackmull
Copy link
Contributor

Something looks potentially a bit off. This is just based on a cursory glance, but based on the qldoc fo getAnyDef then that doesn't sound like a proper way to implement getting a phi input. If we have any sort of bound on that input, and it isn't an actual exact match for the input (as the qldoc says very well might be the case), then I fear that we'll conclude a wrong bound for the phi.

@MathiasVP
Copy link
Contributor Author

Something looks potentially a bit off. This is just based on a cursory glance, but based on the qldoc fo getAnyDef then that doesn't sound like a proper way to implement getting a phi input. If we have any sort of bound on that input, and it isn't an actual exact match for the input (as the qldoc says very well might be the case), then I fear that we'll conclude a wrong bound for the phi.

Thanks for the comments. I think the QLDoc is slightly misleading. Let me try to give some more context:

getAnyDef and getDef really doesn't do very different things. This is the definition of getDef(): https://github.com/github/codeql/blob/main/cpp/ql/lib/semmle/code/cpp/ir/implementation/aliased_ssa/Operand.qll#L85. The only difference is the condition involving MustExactlyOverlap. This condition is what prevents phi.getAnInput() from returning m4_8 in:

4 | Block 0:
4 |   m4_8(unknown)    = InitializeIndirection[x]       : &:r4_7
...
6 | Block 1:
6 |   m6_1(int)        = Phi                : from 0:~m4_8, from 2:m7_7

because:

@MathiasVP MathiasVP requested a review from rdmarsh2 November 8, 2023 15:26
@rdmarsh2
Copy link
Contributor

rdmarsh2 commented Nov 8, 2023

In this case it works, because the inexact operand is coming from the InitializeIndirection, whcih is just an unknown-size array of the same type, but I think Schack is right that it won't work in general. Consider the following:

void f(int x, int b) {
  float *p = &x;

  if (b) 
  {
    *p = 10.0;
  } else {
    x = 10;
  }
  // phi here has an inexact read from a float at a different type.
  printf("%d\n", x);
}

I think the phi node will end up with upper and lower bounds of 10 even though the float value is bit-converted to an int and will be wildly different.

@MathiasVP
Copy link
Contributor Author

MathiasVP commented Nov 8, 2023

Hmm... But isn't that the behavior we'd get anyway on main because we have the (inexact) operand as the SSA variable?

I'm not even sure we support float* -> int* conversion in the current version of the library. I can't produce any constant bound on x in the above example.

@MathiasVP
Copy link
Contributor Author

So I think that, if anything is wrong on this PR, then I think it'll also be wrong currently on main anyway @rdmarsh2?

@MathiasVP
Copy link
Contributor Author

I had a good discussion with @aschackmull about this PR on Zoom today, and we came to roughly the same conclusion as you @rdmarsh2. Schack will open an alternative PR (that keeps inexact operands as SSA variables) which will supersede this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C++ no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants