KIV Specifications for the VeriCode Code Generation

This page documents the KIV formalization of

Gerhard Schellhorn, Stefan Bodenmüller, Wolfgang Reif:
VeriCode: Correct Translation of Abstract Specifications to C Code.
Submitted to iFM 2024.

Project Overview gives an overview of the entire specification hierarchy.

KIV-Projects always link to an overview which gives the dependency graph of specifications. Clicking on one specification gives its text, and links to a theorem base summary (which lists names of theorems) and a theorem base listing (which shows the full sequent for each theorem). From there individual theorems and the full details of each proof are accessible.

The KIV project is a first specification that attempts to mechanize the results from our paper submitted to iFM 2024. It is still far from complete and may contain bugs. Nevertheless, it sets up the main specifications and datatypes needed for verifying the main theorems. Compared to the paper, where well-typedness is taken for granted, the formal specifications make well-typedness constraints explicit by defining predicates. Here is an overview over the main specifications, bottom-up:

Specifications of Syntax

Sort contains the data type of sorts: the primitive sort Atom, and the sorts L and P of lists and pairs.
OpXProc defines functions, variables and procedure symbols, together with their sorts.
Expr defines expressions (which each have a sort), and a predicate wt? that captures well-typed expressions.
Accessform defines access forms that satisfy predicate acc? together with functions that split an accessform into its assigned variable (asgvar) and its selector chain (selchain), together with the inverse function (mkacc) that constructs one from a variable and a selector chain. Access forms are used as left hand sides of assignments in concrete programs.
Exprpreds defines various predicates on expressions that define restricted classes. Concrete programs use these restrictions to define legal programs.
RHS defines right hand sides. Right hand sides are used in source programs (unrestricted) and in target programs, where they must be flat. Predicate rhs.crhs? expresses this constraint.
Prog defines programs. We use one type of program that is general enough to capture both abstract and concrete programs. Well-formed abstract/concrete programs must satisfy the predicates wfs? or wft?, respectively. Primitive functions vars (free variables), letv (variables bound by let's), asgv (assigned variables, including reference parameters of procedures) are defined as well.

Specifications of the Semantics of Abstract Programs

Sem-OpProc defines the algebra needed. Type U is the disjoint union of all carrier sets for each sort. The elements of the carrier set for sort Atom are constructed from elements of a type At (which includes the booleans tt and ff, and the 'void' atom written '○' in the paper) using an embedding ⌜ . ⌝. The semantics of constants c, functions f and procedures proc are c ^A, f ^A and proc ^A. proc^A(u,u0, u1, u2) holds, iff a call proc[e,x] where e evaluates to u and v(x) = u0 terminates, produces a valuation v' where v'(x) = u1 (the reference parameter has been updated), and returns u2. An element of the carrier set for sort L(s) is constructed from a list of elements [u1 ... un] (all of type s) with mklist(s, [u1, ...un]). Note that the sort of elements has to be given since the list may be empty. Similarly, an element of the carrier set for sort P(s1, s2) is constructed with mkpair.
Sem-Expr gives the abstract semantics ⟦e⟧(v) of expressions e using well-typed (predicate wt?) valuations v: Map(X,U). Again, the expression must be defined (predicate def?), which requires that head is not called on an empty list.
Sem-RHS gives the semantics of right hand sides ⟦rhs⟧(v,v',u'). Again, there is a def? predicate.
Sem-Prog defines the small-step semantics of abstract programs as a relation config ==> config' on configurations. An abstract configuration ⟨p,(v,u)⟩a consists of a program p, a valuation v and a computed value u (which may be void).

Specifications of the Semantics of Concrete Programs

Sem-Accessform defines well-defined access forms: wdsc?(u, sc) holds if the selectors of the selector chains sc can be applied without selecting head from the empty list. For a well-defined selector chain, the substructure of u at the selector chain can be selected with u[sc]. It can be replaced with another element u' by calling u[sc := u'].
OHeap defines the heap structure used in the concrete semantics as an instance of the library for separation logic used in KIV (maplets, separating conjunction etc. are defined there). The instance defines the objects O that are stored under references (of type Ref(O)), so the heap is of type Map(Ref(O), O), with the constraint that the map may never store nullref: Ref(O). The specification defines iterated separating conjunction Π* (needed for lists). The type Val consists of references and primitive values. An object of type O is a container (either a list or a pair) that is constructed from a list 'elems' of values (of type Val), which consists of elements (two for type s = P(s1, s2), any number for type s = L(s0)). It is constructed with mk(elems, s). The constructor abstracts from the internal structure of the heap object, which may be different for each type. It is just relevant that it stores the contained elements and allows updating one of them. The specification also defines wt?(val, s)(h) which checks that the heap h stores a well-defined structure of type s under val (and this structure makes up the entire heap h). The specification also defines structural equality (val1 == val2)(h) (the paper defines this via the abstraction relation, here a direct definition is given that can be proven to be equivalent).
Sem-CExpr is the semantics ⟦e⟧(st,h) of concrete expressions. It uses a heap h: Heap(O) and a store st: Map(X,Val). The semantics is defined, when e satisfies predicates def?(e, st, h) and legal?(e). The latter (see specification Exprpreds) defines expressions that do not allocate memory (used e.g. as the conditon of an if statement).
abs-Val defines the abstraction relation abs(val, u)(h) that asserts that heap h stores the data structure u under val. absNull(selchain, val, u) is defined too, which asserts the same except that the substructure under sc in the heap is the null reference. predicate leadsto(r,r0)(h) asserts that the structure stored under reference r in heap h includes reference r0.
Sem-CRHS specifies the semantics of concrete right hand sides as ⟦rhs⟧(st, h, st', h', val). The paper writes ⟨rhs, st, h⟩ => ⟨st', h', val⟩ instead. The semantics is defined (predicate def?) for specific right hand sides (which in particular uses flat expressions only) only, that use variables allocated in st obnly, and again make no attempt to compute the head of an empty list. It uses a similar convention for pre- and post-conditions as the one for programs (see next item).
Finally Sem-CProg defines the small-step semantics of concrete programs: conf ==> conf'. A concrete configuration conf has the form ⟨cp, (st, h, val)⟩c with a concrete program cp, a store st, a heap h, and a result value val. Transitions have to satisfy a precondition defined as 'pre(hP)(cp, st, h, val)' for each program construct cp (where it is relevant). The axiom that defines the precondition is always pre-construct-def. The precondition uses a heap predicate hP for the heap part that is not affected by the transition. There is a successful execution of the transition iff the precondition holds (axiom pre-construct). All successful executions (to conf' = ⟨cp', (st', h', val')⟩c) satisfy the postconditon post(hP)(cp, st, h, val)(cp', st', h', val') (axiom sem-construct). The postcondition is defined with axiom post-construct-def.
It should be noted, that the semantics of a call proc[x,y] (as part of Sem-CRHS) states that when initially abs(st[x],u)(h) and abs(st[x],u0)(h) hold, every u1, u2, such that proc^A(u, u0, u1, u2) holds, implies that there is an execution, such that afterwards abs(st[y],u1) and abs(val', u2) hold. This is stronger than just asserting that there is at least an execution with some u1,u2. This is adequate since we allow a procedure to be nondeterministic, even though the small programming language we define does not have a nondeterministic construct (it would be easy to add a construct like "p1 or p2").

Specifications of Procedure Declarations

There are tentative specifications of the semantics of procedure declarations on the abstract and concrete level. So far these are adequate for nonrecursive procedures. For (mutually) recursive procedures a least fixpoint of a set of procedure declarations would be necessary.

Compile Function, Abstraction Relation, Lemmas and Theorems

compile-aux defines some auxiliary functions and programs for translating abstract to concrete programs, in particular routines that flatten programs. The programs often use a set 'forbs' of variables as a reference parameter, that are already used and forbidden when new variables are needed. When a new variable of sort s is created with new(forbs,s), the variable is always immediately added to forbs.
compile contains the main compile# function, which first calls shiftLeftFlatten# and then translate#. The first corresponds to the first two steps described in the paper, the second to the last step. The programs are simplified versions of what is implemented in the code generator (which does shiftLeft first, but does not fully flatten, since this is unnecessary when producing Scala Code. Stronger flattening is done in another pass that is specific to producing C programs).
The translation creates bindings: List[(X,E)] which are a list of pairs of variables x and expressions e that are used in Let(x, e, ...). Specification Sem-bindings defines some auxiliary functions on these, in particular ⟦bindings⟧(v) adds them to a valuation (given that the variables are disjoint, and the expressions evaluated over v).
Specification abs defines the abstraction relation abs(v, st)(h) (the predicate is overloaded here, the paper defines 'Abs' with a capital letter) between abstract and concrete states. The 'theorems' section of the specification gives tentative lemmas and theorems that we intend to prove (subject to corrections).
The translation is sufficient to prove several lemmas that correspond to entries in the paper. All lemmas and proofs can be found in the theorem base of the specification. The theorem bases of other specifications (open the specification, then go to the theorem base) contain lots of lemmas already, most of them related to well-typedness of results.
- Lemmas compile-asg-cons-to-other-var-atomic, compile-asg-cons-to-same-var-atomic, compile-asg-cons-to-same-var-nonatomic, compile-asg-setFirst-to-same-var-nonatomic, compile-call, compile-let-tail-tail give results for compiling simple abstract programs (some of these are also given in the paper).
- Lemmas ac-asgconstosamevar, ac-asgtailtosamevar, and ac-call are cases of the main lemma (part 1) that show that executing an abstract step can be simulated by a positive number of concrete steps. Operator EF+(conf, λ conf'. pred(conf')) asserts that there is a positive number of steps of the small-step semantics (starting from conf) that reaches a state conf' satisfying predicate pred.
- Lemmas ca-asgconstosamevar, ca-asgtailtosamevar, and ca-call are cases of the other direction of the main lemma in the paper. They show that executing the compiled code always leads to a state (here with an empty program) that corresponds to the execution of an abstract step.

Used basic libraries:

The basic KIV library with basic data types.
KIV separation library with a formalization of Separation Logic.

Back to the overview of all KIV projects.
For general informations on KIV and a download site for the whole system refer to this page.

[Imprint] [Data Privacy Statement] [E-Mail]