Basics of Static Analysis

#Static Analysis 2025-01-30

The X You Need To Understand As For Static Analysis

What are the differences between static analysis and (dynamic) testing?

Static analysis is performing before running the program.
Understand soundness, completeness, false negatives, and false positives.

Soundness means that the set of answers contains all of the correct ones, and completeness represents that all of the answers in the set are correct while there maybe exists some correct answers not being included in the set.

False negatives (FNs) represent the true ones yet not being recognized, while false positives (FPs) represent the fake ones yet being recognized as true.
Why soundness is usually required by static analysis?

Not being recognized is more serious than costing more.
How to understand abstraction and over-approximation?

Abstraction is using a more simple one to represent a set of ones, which is focused on the semantic values or simplifies the analysis, and over-approximation is merging several paths into a single one, which is using a more general way to conclude some situations, ensuring the soundness. (not perfect)
The relation between compilers and static analyzers
Understand 3AC and its common forms (in IR jimple)

Three-Address Code (3AC) is the basis of static analysis, which is a type of intermedia representation (IR). It contains at most 3 operators in one command. It has some extensive forms, such as SSA, which includes control flow information in its semantic information. Its common forms involve loop, do-while, class, and so on. (not perfect)
How to build basic blocks on top of IR?

Find all of the entries.

Entry: 1. Usually the first command; 2. The target of the jump operation; 3. The command follows the jump operation.
How to construct control flow graphs on top of BBs?

Find all of the edges.

Edge: 1. Conditional/Unconditional jump; 2. The original order excepts the blocks ended with an unconditional jump.

And then, add the entries and exists at the beginning of each control flow.
Understand the three data flow analyses:
- reaching definitions
  
  Based on each program points, analyse whether a definition can be spreaded from P to Q.
- live variables
  
  Based on each program points, analyse whether a variable is dead.
- available expressions
  
  Based on each program points, analyse whether 1) all paths passed involve x op y, and 2) there is no redefinition of x or y after the last x op y.

Can tell the differences and similarities of the three data flow analyses

Analysis Comparison

	Reaching Definitions	Live Variables	Available Expressions
Domain	Set of definitions	Set of variables	Set of expressions
Direction	Forwards	Backwards	Forwards
May/Must	May	May	Must
Boundary	OUT[entry] = ∅	IN[exit] = ∅	OUT[entry] = ∅
Initialization	OUT[B] = ∅	IN[B] = ∅	OUT[B] = U
Transfer function	OUT/IN = gen ∪ (IN/OUT - kill)	OUT/IN = gen ∪ (IN/OUT - kill)	OUT/IN = gen ∪ (IN/OUT - kill)
Meet	∪	∪	∩

Understand the iterative algorithm and can tell why it is able to terminate

Find an application-specific solution to the may/must analysis.

Based on "OUT = gen ∪ (IN - kill)", We can clearly see that the main difference between OUT and IN is "gen - kill", and we know that "IN - OUT" >= 0, and "gen - kill" is unvariable, so there must exist a termiate of OUT.

Understand the functional view of iterative algorithm
The definitions of lattice and complete lattice
Understand the fixed-point theorem
How to summarize may and must analyses in lattices
The relation between MOP and the solution produced by the iterative algorithm
Constant propagation analysis
Worklist algorithm

How to build call graph via class hierarchy analysis

Class hierarchy analysis (CHA) is an efficient way to construct call graph.

Initialize, remove m from worklist, judge m whether is in RM, if so, remove m from RM and Dispatch(cs) and go on.
Concept of interprocedural control-flow graph

ICFG = CFG + call & return edges
Concept of interprocedural data-flow analysis

Excepting introprocedural data flow, some data must be served as parameters crossing methods. So, there are data flows when methods calling and returning.
Interprocedural constant propagation

When it encountering returning a value as a constant, it must obey LHS to kill the local parameters.
What is pointer analysis?

Find which objects a pointer can point to.
Understand the key factors of pointer analysis

Heap abstraction, context sensitivity, flow sensitivity and analysis scope.
Understand what we analyze in pointer analysis

Local variables, ~~static field~~, instance field and ~~array elements~~

i.e., operations like new, assign, store, load, call.
Concept of context sensitivity (C.S.)
Concept of context-sensitive heap (C.S. heap)
Why C.S. and C.S. heap improve precision
Context-sensitive pointer analysis rules
Concept of information flow security
Confidentiality and integrity
Explicit flows and covert channels
Use taint analysis to detect unwanted information flow