The X You Need To Understand As For Static Analysis
What are the differences between static analysis and (dynamic) testing?
Static analysis is performing before running the program.
Understand soundness, completeness, false negatives, and false positives.
Soundness means that the set of answers contains all of the correct ones, and completeness represents that all of the answers in the set are correct while there maybe exists some correct answers not being included in the set.
False negatives (FNs) represent the true ones yet not being recognized, while false positives (FPs) represent the fake ones yet being recognized as true.
Why soundness is usually required by static analysis?
Not being recognized is more serious than costing more.
How to understand abstraction and over-approximation?
Abstraction is using a more simple one to represent a set of ones, which is focused on the semantic values or simplifies the analysis, and over-approximation is merging several paths into a single one, which is using a more general way to conclude some situations, ensuring the soundness. (not perfect)
The relation between compilers and static analyzers
Understand 3AC and its common forms (in IR jimple)
Three-Address Code (3AC) is the basis of static analysis, which is a type of intermedia representation (IR). It contains at most 3 operators in one command. It has some extensive forms, such as SSA, which includes control flow information in its semantic information. Its common forms involve loop, do-while, class, and so on. (not perfect)
How to build basic blocks on top of IR?
Find all of the entries.
Entry: 1. Usually the first command; 2. The target of the jump operation; 3. The command follows the jump operation.
How to construct control flow graphs on top of BBs?
Find all of the edges.
Edge: 1. Conditional/Unconditional jump; 2. The original order excepts the blocks ended with an unconditional jump.
And then, add the entries and exists at the beginning of each control flow.
Understand the three data flow analyses:
reaching definitions
Based on each program points, analyse whether a definition can be spreaded from P to Q.
live variables
Based on each program points, analyse whether a variable is dead.
available expressions
Based on each program points, analyse whether 1) all paths passed involve x op y, and 2) there is no redefinition of x or y after the last x op y.
Can tell the differences and similarities of the three data flow analyses
Analysis Comparison
Reaching Definitions Live Variables Available Expressions Domain Set of definitions Set of variables Set of expressions Direction Forwards Backwards Forwards May/Must May May Must Boundary OUT[entry] = ∅ IN[exit] = ∅ OUT[entry] = ∅ Initialization OUT[B] = ∅ IN[B] = ∅ OUT[B] = U Transfer function OUT/IN = gen ∪ (IN/OUT - kill) OUT/IN = gen ∪ (IN/OUT - kill) OUT/IN = gen ∪ (IN/OUT - kill) Meet ∪ ∪ ∩ Understand the iterative algorithm and can tell why it is able to terminate
Find an application-specific solution to the may/must analysis.
Based on "OUT = gen ∪ (IN - kill)", We can clearly see that the main difference between OUT and IN is "gen - kill", and we know that "IN - OUT" >= 0, and "gen - kill" is unvariable, so there must exist a termiate of OUT.

- Understand the functional view of iterative algorithm
- The definitions of lattice and complete lattice
- Understand the fixed-point theorem
- How to summarize may and must analyses in lattices
- The relation between MOP and the solution produced by the iterative algorithm
- Constant propagation analysis
- Worklist algorithm


How to build call graph via class hierarchy analysis
Class hierarchy analysis (CHA) is an efficient way to construct call graph.
Initialize, remove m from worklist, judge m whether is in RM, if so, remove m from RM and Dispatch(cs) and go on.
Concept of interprocedural control-flow graph
ICFG = CFG + call & return edges
Concept of interprocedural data-flow analysis
Excepting introprocedural data flow, some data must be served as parameters crossing methods. So, there are data flows when methods calling and returning.
Interprocedural constant propagation
When it encountering returning a value as a constant, it must obey LHS to kill the local parameters.
What is pointer analysis?
Find which objects a pointer can point to.
Understand the key factors of pointer analysis
Heap abstraction, context sensitivity, flow sensitivity and analysis scope.
Understand what we analyze in pointer analysis
Local variables,
static field, instance field andarray elementsi.e., operations like new, assign, store, load, call.
Concept of context sensitivity (C.S.)
Concept of context-sensitive heap (C.S. heap)
Why C.S. and C.S. heap improve precision
Context-sensitive pointer analysis rules
Concept of information flow security
Confidentiality and integrity
Explicit flows and covert channels
Use taint analysis to detect unwanted information flow