Basics of Static Analysis

The X You Need To Understand As For Static Analysis

  1. What are the differences between static analysis and (dynamic) testing?

    Static analysis is performing before running the program.

  2. Understand soundness, completeness, false negatives, and false positives.

    Soundness means that the set of answers contains all of the correct ones, and completeness represents that all of the answers in the set are correct while there maybe exists some correct answers not being included in the set.

    False negatives (FNs) represent the true ones yet not being recognized, while false positives (FPs) represent the fake ones yet being recognized as true.

  3. Why soundness is usually required by static analysis?

    Not being recognized is more serious than costing more.

  4. How to understand abstraction and over-approximation?

    Abstraction is using a more simple one to represent a set of ones, which is focused on the semantic values or simplifies the analysis, and over-approximation is merging several paths into a single one, which is using a more general way to conclude some situations, ensuring the soundness. (not perfect)

  5. The relation between compilers and static analyzers

    compiler.drawio (1)
  6. Understand 3AC and its common forms (in IR jimple)

    Three-Address Code (3AC) is the basis of static analysis, which is a type of intermedia representation (IR). It contains at most 3 operators in one command. It has some extensive forms, such as SSA, which includes control flow information in its semantic information. Its common forms involve loop, do-while, class, and so on. (not perfect)

  7. How to build basic blocks on top of IR?

    Find all of the entries.

    Entry: 1. Usually the first command; 2. The target of the jump operation; 3. The command follows the jump operation.

  8. How to construct control flow graphs on top of BBs?

    Find all of the edges.

    Edge: 1. Conditional/Unconditional jump; 2. The original order excepts the blocks ended with an unconditional jump.

    And then, add the entries and exists at the beginning of each control flow.

  9. Understand the three data flow analyses:

    • reaching definitions

      Based on each program points, analyse whether a definition can be spreaded from P to Q.

    • live variables

      Based on each program points, analyse whether a variable is dead.

    • available expressions

      Based on each program points, analyse whether 1) all paths passed involve x op y, and 2) there is no redefinition of x or y after the last x op y.

  10. Can tell the differences and similarities of the three data flow analyses

    Analysis Comparison

    Reaching Definitions Live Variables Available Expressions
    Domain Set of definitions Set of variables Set of expressions
    Direction Forwards Backwards Forwards
    May/Must May May Must
    Boundary OUT[entry] = ∅ IN[exit] = ∅ OUT[entry] = ∅
    Initialization OUT[B] = ∅ IN[B] = ∅ OUT[B] = U
    Transfer function OUT/IN = gen ∪ (IN/OUT - kill) OUT/IN = gen ∪ (IN/OUT - kill) OUT/IN = gen ∪ (IN/OUT - kill)
    Meet
  11. Understand the iterative algorithm and can tell why it is able to terminate

    Find an application-specific solution to the may/must analysis.

    Based on "OUT = gen ∪ (IN - kill)", We can clearly see that the main difference between OUT and IN is "gen - kill", and we know that "IN - OUT" >= 0, and "gen - kill" is unvariable, so there must exist a termiate of OUT.

image-20250203205208153

  1. Understand the functional view of iterative algorithm
  2. The definitions of lattice and complete lattice
  3. Understand the fixed-point theorem
  4. How to summarize may and must analyses in lattices
  5. The relation between MOP and the solution produced by the iterative algorithm
  6. Constant propagation analysis
  7. Worklist algorithm

image-20250204203429407

image-20250204203639383

  1. How to build call graph via class hierarchy analysis

    Class hierarchy analysis (CHA) is an efficient way to construct call graph.

    Initialize, remove m from worklist, judge m whether is in RM, if so, remove m from RM and Dispatch(cs) and go on.

  2. Concept of interprocedural control-flow graph

    ICFG = CFG + call & return edges

  3. Concept of interprocedural data-flow analysis

    Excepting introprocedural data flow, some data must be served as parameters crossing methods. So, there are data flows when methods calling and returning.

  4. Interprocedural constant propagation

    When it encountering returning a value as a constant, it must obey LHS to kill the local parameters.

  5. What is pointer analysis?

    Find which objects a pointer can point to.

  6. Understand the key factors of pointer analysis

    Heap abstraction, context sensitivity, flow sensitivity and analysis scope.

  7. Understand what we analyze in pointer analysis

    Local variables, static field, instance field and array elements

    i.e., operations like new, assign, store, load, call.

  8. Concept of context sensitivity (C.S.)

  9. Concept of context-sensitive heap (C.S. heap)

  10. Why C.S. and C.S. heap improve precision

  11. Context-sensitive pointer analysis rules

  12. Concept of information flow security

  13. Confidentiality and integrity

  14. Explicit flows and covert channels

  15. Use taint analysis to detect unwanted information flow