US 7,530,107 B1
Systems, methods and computer program products for string analysis with security labels for vulnerability detection
Kouichi Ono, Tokyo (Japan); Mika Saito, Yamato (Japan); Naoshi Tabuchi, Tokyo (Japan); and Takaaki Tateishi, Yamato (Japan)
Assigned to International Business Machines Corporation, Armonk, N.Y. (US)
Filed on Dec. 19, 2007, as Appl. No. 11/960,153.
Int. Cl. G06F 21/00 (2006.01)
U.S. Cl. 726/25 [717/154]
1 Claim
 
1. In a computer system configured to analyze security-labeled strings and to detect vulnerability, a method consisting of:
receiving a program with security labels previously assigned to strings known to have a vulnerability, wherein the strings have an associated character set, given by Σ, and an associated security label set, given by Z, wherein a set of security-labeled characters is defined by a Cartesian product of the associated character set and the security label set, given by Σ×Z, and wherein a set of security-labeled strings is defined by a set of sequences of the security-labeled characters, given by (Σ×Z)*, wherein the program with security labels is given as (Fi∪C)×(Σ×Z)*, where Fi is an input function and C is a constant;
translating the program into a static single assignment form;
constructing a control flow graph from the static single assignment form, the control flow graph having basic blocks as nodes;
extracting instructions relating to string functions and object variables;
calculating pre-conditions of variables for the basic blocks, having goto instructions and jump instructions, the jump instructions having jump edges defined when a respective object variable is true and fall through edges defined when the respective object variable is false;
wherein calculating the pre-conditions of the variables for the basic blocks includes obtaining a condition under which the program reaches each of the basic blocks;
extracting constraints among the variables subject to a rule set for translating pre-conditions;
solving the constraints and obtaining a set of strings that the object variables form as a context-free grammar given by L(G), from a regular language expression, given by L(R), to obtain the set of security-labeled strings, wherein the context-free grammar is obtained by applying a transducer given by T=(Σ×Z, Γ×Z, S, s0, ρ, ω) where Γ is a set of output characters, S is a set of states with s0 as an initial state, ρ is a state-transition function and ω is an output function such that ρ∈S×(Σ×Z)→S and ω∈S×(Σ×Z)→(Γ×Z);
checking if the set of security-labeled strings satisfies a rule of the rule set for translating pre-conditions, thereby assuring that no vulnerability exists on the security-labeled strings;
identifying locations in the program where a vulnerability is detected, the vulnerability being detected when no common element φ exists given by L(G)∩L(R)≠φ and no vulnerability being detected when the common element φ does exist given by L(G)∩L(R)=φ; and
identifying a false detection of vulnerabilities in the security-labeled strings via rules that include a must rule type, a must-not rule type, a target variable and the security-labeled strings,
wherein the must rule indicates that a string assigned to the target variable matches with the regular expression with security labels, and the must-not rule indicates that the string assigned to the target variable does not match with the regular expression with security labels, thereby indicating the false detection.
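
Illustrative sketch (not part of the claim): the Python fragment below is one possible reading of three elements recited above, namely security-labeled strings as sequences over Σ×Z, a transducer with a state-transition function ρ and an output function ω, and the checking of must and must-not rules against a regular expression with security labels. Everything named in it is an assumption made for illustration: the one-letter labels T and S, the single-state sanitizer, the label-then-character encoding, and the helper names label, run_transducer, encode and violates. A finite list of concrete values stands in for the language L(G) of a context-free grammar, so the sketch does not perform the grammar-based intersection with L(R) that the claim describes.

```python
import re
from typing import List, Tuple

LabeledChar = Tuple[str, str]        # (character, label): an element of Sigma x Z
LabeledString = List[LabeledChar]    # an element of (Sigma x Z)*

TAINTED, SAFE = "T", "S"             # hypothetical one-letter labels for Z


def label(text: str, lab: str) -> LabeledString:
    """Attach the same security label to every character of text."""
    return [(ch, lab) for ch in text]


def run_transducer(s0, rho, omega, word: LabeledString) -> LabeledString:
    """Apply a transducer that emits one labeled symbol per input symbol."""
    state, out = s0, []
    for sym in word:
        out.append(omega(state, sym))   # output depends on current state and input
        state = rho(state, sym)         # then move to the next state
    return out


# Hypothetical sanitizer with a single state: tainted angle brackets
# are rewritten to underscores and relabeled as safe.
def rho(state: str, sym: LabeledChar) -> str:
    return state


def omega(state: str, sym: LabeledChar) -> LabeledChar:
    ch, lab = sym
    if lab == TAINTED and ch in "<>":
        return ("_", SAFE)
    return sym


def encode(word: LabeledString) -> str:
    """Flatten to label-then-character pairs so a plain regex can inspect labels."""
    return "".join(lab + ch for ch, lab in word)


def violates(rule_type: str, pattern: str, values: List[LabeledString]) -> bool:
    """Check a 'must' or 'must-not' rule against a finite set of candidate values."""
    matches = [re.fullmatch(pattern, encode(v), re.DOTALL) is not None for v in values]
    if rule_type == "must-not":
        return any(matches)      # some value matches the forbidden pattern
    if rule_type == "must":
        return not all(matches)  # some value fails the required pattern
    raise ValueError(f"unknown rule type: {rule_type}")


# Pattern over the encoding: the value contains a tainted '<' somewhere.
# fullmatch plus the two-character groups keeps label/character pairs aligned.
TAINTED_LT = r"(?:..)*T<(?:..)*"

raw = label("<b>", SAFE) + label("<script>", TAINTED)
clean = run_transducer("q0", rho, omega, raw)

print(violates("must-not", TAINTED_LT, [raw]))    # unsanitized value
print(violates("must-not", TAINTED_LT, [clean]))  # value after the sanitizer
```

In this sketch the must-not rule is reported as violated for the unsanitized value, because its tainted portion still contains '<', and as satisfied after the sanitizer has relabeled the rewritten characters. Keeping the label next to each character, rather than one label per string, is what lets the check distinguish the safe "<b>" prefix from the tainted "<script>" suffix within the same value.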