Skip to content

Add certain types of indirect function calls to the C++ call graph #7520

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
gsingh93 opened this issue Jan 5, 2022 · 1 comment
Closed
Labels
C++ question Further information is requested

Comments

@gsingh93
Copy link
Contributor

gsingh93 commented Jan 5, 2022

I'm dealing with a codebase that makes use of a lot calls to functions pointers stored in global/static arrays, which results in the call graph (Function.calls) not being very helpful. Here's a small example:

typedef unsigned long uintptr_t;

void func1() {}
void func2() {}
void func3() {}
void func4() {}

struct table_entry_t {
  int other_data;
  uintptr_t func;
};

uintptr_t func_table[] = {(uintptr_t)&func1, (uintptr_t)&func2};

table_entry_t nested_func_table[] = {{0, (uintptr_t)&func3},
                                     {1, (uintptr_t)&func4}};

void foo() {
  for (int i = 0; i < sizeof(func_table) / sizeof(func_table[0]); i++) {
    ((void (*)())func_table[i])();
  }

  for (int i = 0; i < sizeof(nested_func_table) / sizeof(nested_func_table[0]);
       i++) {
    ((void (*)())nested_func_table[i].func)();
  }
}

When I ask CodeQL to get the functions foo calls, I'd like it to return func1,func2, func3, and func4. I've solved this issue for now by implementing a class modeling function tables defined using array aggregate literals, which I'll show below, but I'm mainly opening this issue to ask whether CodeQL can add better support for these indirect pointers in the built-in call graph.

In the example below, I've defined FunctionTableArrayAggregateLiteral to model function tables, and then I add a custom edges predicate which checks not only for direct function calls with a.calls(b), but also indirect calls through a function table (technically I don't actually verify if the function pointer is called, but this hasn't been a problem for me so far):

import cpp

class FunctionTableArrayAggregateLiteral extends ArrayAggregateLiteral {
  FunctionTableArrayAggregateLiteral() { this.getAChild*() instanceof FunctionAccess }
  Function getFunctions() { result = this.getAChild*().(FunctionAccess).getTarget() }
}

query predicate edges(Function a, Function b) {
  a.calls(b)
  or
  exists(FunctionTableArrayAggregateLiteral functionTable |
    // The function table is accessed inside `a`
    functionTable.getEnclosingVariable().getAnAccess().getEnclosingFunction() = a and
    // All functions inside the table are added as edges
    b = functionTable.getFunctions()
  )
}

predicate getCallGraph(Function start, Function end, string startName, string endName) {
  edges+(start, end) and
  start.hasName(startName) and
  end.hasName(endName)
}

from Function start, Function end
where getCallGraph(start, end, "entry_func", end.getName())
select start, end

Of course this can lead to false positives in certain cases (just because one function in a function table is used doesn't mean all of them are), so I don't expect this exact type of solution to be used, but maybe something like it can be considered.

@gsingh93 gsingh93 added the question Further information is requested label Jan 5, 2022
@MathiasVP MathiasVP added the C++ label Jan 10, 2022
@MathiasVP
Copy link
Contributor

Hi @gsingh93,

Thanks for the issue. This is indeed something we would like to improve. For this specific case, your current solution is probably the best (or at least the easiest one). I don't think you'll get the behavior you want from Function.calls since that predicate mainly deal with syntax.

The predicate we're currently working on improving is the one you get from importing semmle.code.cpp.ir.dataflow.ResolveCall. You can use it like this:

import cpp
import semmle.code.cpp.ir.dataflow.ResolveCall

query predicate edges(Function a, Function b) {
  // resolve any calls inside the `a` function.
  resolveCall(any(Call call | call.getEnclosingFunction() = a)) = b
}

predicate getCallGraph(Function start, Function end, string startName, string endName) {
  edges+(start, end) and
  start.hasName(startName) and
  end.hasName(endName)
}

from Function start, Function end
where getCallGraph(start, end, "entry_func", end.getName())
select start, end

There are still some things missing from this library, however. For your example it doesn't give anything useful since it handles global variable initialization poorly (read: not at all). However, if you move the two function tables into foo like this:

void foo() {
  uintptr_t func_table[] = {(uintptr_t)&func1, (uintptr_t)&func2};

  for (int i = 0; i < sizeof(func_table) / sizeof(func_table[0]); i++) {
    ((void (*)())func_table[i])();
  }

  table_entry_t nested_func_table[] = {{0, (uintptr_t)&func3},
                                     {1, (uintptr_t)&func4}};

  for (int i = 0; i < sizeof(nested_func_table) / sizeof(nested_func_table[0]);
       i++) {
    ((void (*)())nested_func_table[i].func)();
  }
}

you get this:

edges
test.cpp:13:6:13:8 | foo | test.cpp:3:6:3:10 | func1 |
test.cpp:13:6:13:8 | foo | test.cpp:4:6:4:10 | func2 |

i.e., we now resolve the calls in the first loop. The calls in the second loop isn't resolved as it involves field read and writes, but you can hack around it by using a dataflow configuration like this:

import cpp
import semmle.code.cpp.ir.dataflow.DataFlow

// A configuration for finding function accesses flowing into function-pointer calls
class Conf extends DataFlow::Configuration {
  Conf() { this = "Conf" }

  override predicate isSource(DataFlow::Node source) { source.asExpr() instanceof FunctionAccess }

  override predicate isSink(DataFlow::Node sink) { sink.asExpr() = any(ExprCall call).getExpr() }
}

query predicate edges(Function a, Function b) {
  exists(FunctionAccess funcAccess, DataFlow::Node sink |
    // Flow from a function access to some sink (which is the expression of some `ExprCall`).
    any(Conf conf).hasFlow(DataFlow::exprNode(funcAccess), sink) and
    // And the call happens inside function `a`
    sink.getEnclosingCallable() = a and
    // And the function pointer is a pointer to function `b`
    b = funcAccess.getTarget()
  )
}

predicate getCallGraph(Function start, Function end, string startName, string endName) {
  edges+(start, end) and
  start.hasName(startName) and
  end.hasName(endName)
}

from Function start, Function end
where getCallGraph(start, end, "entry_func", end.getName())
select start, end

Again, because we're still improving the support for global variables on anything that starts by importing semmle.code.cpp.ir.*, we're not identifying the calls in your original example involving global variables. But for the one with local variables we get all the function pointer calls:

edges
test.cpp:13:6:13:8 | foo | test.cpp:3:6:3:10 | func1 |
test.cpp:13:6:13:8 | foo | test.cpp:4:6:4:10 | func2 |
test.cpp:13:6:13:8 | foo | test.cpp:5:6:5:10 | func3 |
test.cpp:13:6:13:8 | foo | test.cpp:6:6:6:10 | func4 |

I hope that helps!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants