Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

codeql js taint tracking - write "recursive" additional taint step #19098

Open
DSimsek000 opened this issue Mar 24, 2025 · 2 comments
Open

codeql js taint tracking - write "recursive" additional taint step #19098

DSimsek000 opened this issue Mar 24, 2025 · 2 comments
Assignees
Labels
javascript Pull requests that update Javascript code question Further information is requested

Comments

@DSimsek000
Copy link

I have the following code:

function source() {
    fn0();
    fn1();
}

function fn0() {
    const a = process.env.SECRET;
    sink(a);
}

function fn1() {
    fn0();
}

In this example everything coming from the env is tainted. However, I do not want to find a taint path that starts at fn0 (where we have the env propread), but from source.

I came up with the following config that uses isAdditionalFlowStep to check whether the current function has a call to a function where an env propread is made:

module Config implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    exists(Function f |
      f.getName() = "source" and
      (
        source.asExpr() = f.getAParameter()
        or
        source.(DataFlow::FunctionNode).getFunction() = f
      )
    )
  }

  predicate isSink(DataFlow::Node node) {
    exists(DataFlow::CallNode cn |
      cn.getAnArgument() = node and
      cn.getCalleeName() = "sink"
    )
  }

  predicate isAdditionalFlowStep(DataFlow::Node fromNode, DataFlow::Node toNode) {
    exists(DataFlow::FunctionNode fnEnv, DataFlow::PropRead propRead |
      propRead = toNode and
      propRead.getPropertyName() = "SECRET" and
      propRead.asExpr().getEnclosingFunction() = fnEnv.getFunction() and
      fromNode.(DataFlow::InvokeNode).getACallee() = fnEnv.getFunction()
    )
    or
    toNode.(DataFlow::InvokeNode).getEnclosingFunction() =
      fromNode.(DataFlow::FunctionNode).getFunction()
  }
}

This finds the taint path source -> fn0, but not source->fn1->fn0.
I am not sure how to restructure the predicate to introduce these taint steps in a recursive manner.

@DSimsek000 DSimsek000 added the question Further information is requested label Mar 24, 2025
@mbg
Copy link
Member

mbg commented Mar 24, 2025

Hi @DSimsek000 👋🏻

It sounds like you want to identify functions as sources which (transitively) call a function that reads an environment variable?

I'd suggest starting with a predicate to identify functions which access environment variables (based on what you have in your isAdditionalFlowStep):

predicate readsFromEnvironment(Function f, PropAccess p) {
  p.getPropertyName() = "SECRET" and
  p.getEnclosingFunction() = f
}

You can write a predicate to check if one function calls another:

Function getACaller(Function callee) {
  exists(DataFlow::InvokeNode node |
    node.getACallee() = callee and
    node.getEnclosingFunction() = result
  )
}

And then find functions which call such a function:

from Function caller, Function callee
where
  readsFromEnvironment(callee, _) and
  caller = getACaller+(callee)
select caller, callee

This gets you two results (source -> fn0 and fn1 -> fn0). Note that source -> fn0 covers both the path directly to fn0 as well as via fn1.

If you further want to see every path from a function that (transitively) calls another that reads from environment variables, then you could look at CallGraph.qll for inspiration.

@mbg mbg self-assigned this Mar 24, 2025
@mbg mbg added the javascript Pull requests that update Javascript code label Mar 24, 2025
@DSimsek000
Copy link
Author

Hey @mbg, thanks for the suggestion! That definitely helps, and I also just noticed the bug in my isAdditionalFlowStep predicate that I shared above. I forgot to add the taint propagator from CallNode to FunctionNode:

toNode.(DataFlow::FunctionNode).getFunction() = fromNode.(DataFlow::InvokeNode).getACallee()

I think thats why source -> fn1 -> fn0 was not shown.

Here is the fixed version that "preserves" this information about the call graph in a path query. I am not sure if this is the best way to solve the problem, but a quick test showed that it seems to work in the example above.

predicate isAdditionalFlowStep(DataFlow::Node fromNode, DataFlow::Node toNode) {
     (
      exists(DataFlow::FunctionNode fnEnv, DataFlow::PropRead propRead |
        propRead = toNode and
        propRead.getPropertyName() = "SECRET" and
        propRead.asExpr().getEnclosingFunction() = fnEnv.getFunction() and
        fromNode.(DataFlow::InvokeNode).getACallee() = fnEnv.getFunction()
      )
      or
      toNode.(DataFlow::InvokeNode).getEnclosingFunction() =
        fromNode.(DataFlow::FunctionNode).getFunction()
      or
      toNode.(DataFlow::FunctionNode).getFunction() = fromNode.(DataFlow::InvokeNode).getACallee()
    )
  }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
javascript Pull requests that update Javascript code question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants