Views
Language reference/Data types
From MCRL2
| Data types |
This page describes all built-in data types with their associated functions and operators, and describes how custom data types can be defined.
The BNF syntax description of data types can be found on its own page.
Contents |
Predefined functions
The following table lists predefined functions that are available for all data types:
| Equality | _==_ | Inequality | _!=_ | Conditional | if(_,_,_) | ||
| Less than | _<_ | Less than or equal | _<=_ | Greater than or equal | _>=_ | Greater than | _>_ |
For built-in data types (including structured types) the predefined functions are completely defined. For user defined data types, these are only partially defined (see the section on user-defined data types for detailed information).
Basic types
mCRL2 has a number of elementary datatypes, namely booleans (Bool), positive natural numbers (Pos), natural numbers (Nat), integers (Int) and real numbers (Real). Typical expression are true, false, x>17 && y<=24 and forall x:Nat. max(x,3)+5==min(y,8). All datatypes, including the standard data types, are internally represented using abstract data types. This has the advantage that numbers do not have a limited range. In particular, there is no largest number in any domain, and there are no smallest integers and reals.
The operators on boolean data types are given in the following table.
| True | true | False | false |
| Negation | !_ | Implication | _=>_ |
| Conjunction | _&&_ | Disjunction | _||_ |
| Universal quantification | forall _:_._ | Existential quantification | exists _:_._ |
Operators on numbers are given by the following table. In A2B the letters A and B stand for Pos, Nat, Int and Real. Of course, A and B cannot be the same. Note that mCRL2 is a strictly typed language, with a simple form of type coercion. Elements of type Pos can automatically become Nat, Nat can become Int and Int can become Real. For example 1 is of type Pos, but it can be used at any place where a number of an other type is expected. Internally, an explicit function of the form A2B(_) is written around it. The pretty printers of mCRL2 sometimes print this function, but sometimes omit it to improve readability.
| Positive numbers | 1, 2, 3, ... | Natural numbers | 0, 1, 2, 3, ... | Integers | ..., -2, -1, 0, 1,2, ... |
| Fractions | 0/1,0/2,0/3, ..., 1/1,1/2,1/3, ..., -1/1,-1/2,-1/3, ... 2/1,2/2,2/3, ..., -2/1,-2/2,-2/3, ... | ||||
| Minimum | min(_,_) | Maximum | max(_,_) | Absolute value | abs(_) |
| Negation | -_ | Successor | succ(_) | Predecessor | pred(_) |
| Addition | _+_ | Subtraction | _-_ | Multiplication | _*_ |
| Division | _/_ | Integer div | _div_ | Integer mod | _mod_ |
| Floor | floor(_) | Ceiling | ceil(_) | Rounding | round(_) |
| Exponentiation | exp(_,_) | Conversion | A2B(_) | ||
Lists
For every sort A the sort of lists over sort A can be made by writing List(A). The following operations are provided for lists:
| List enumeration | [_,...,_] | ||
| Element test | _in_ | Length | #_ |
| Cons | _|>_ | Snoc | _<|_ |
| Concatenation | _++_ | Element at position | _._ |
| The first element of a list | head(_) | The list without its first element | tail(_) |
| The last element of a list | rhead(_) | The list without its last element | rtail(_) |
The empty list is represented by an empty list enumeration, i.e. []. Also note that the lists [a,b], a |> [b] and [a] <| b are all equivalent.
Since abstract data types are used, it is not needed to provide a definition if no clear choice is possible. For instance the head of the empty list [] is not defined. In practice this means that head([]) will not be simplified any further. The _._ operator is zero-based.
Sets and bags
Sets and bags are common mathematical notions, and variants of lists, as they are often encountered in other specification languages. The types of sets and bags over a type A are declared by Set(A) and Bag(A). Sets can be denoted by explicit enumeration, e.g. {1,3,5,8} or by set comprehension. A typical example of a set is the set of all prime numbers, which can be written as
Bags are denoted by explicitly indicating the multiplicity of the elements. E.g. {a:1,b:2,c:3} is the bag with one a, two b's and three c's, and {a:1,b:1,b:1,c:1,c:2} denotes the same bag. The bag where each positive number n occurs n+1 times, is denoted as {n:Pos | n+1}.
Operators on sets are given by the following table.
| Set enumeration | {_,...,_} | Bag enumeration | {_:_,...,_:_} | Comprehension | {_:_|_} |
| Element test | _in_ | Bag multiplicity | count(_,_) | ||
| Subset/subbag | _<=_ | Proper subset/subbag | _<_ | Set complement | !_ |
| Union | _+_ | Difference | _-_ | Intersection | _*_ |
| Convert set to bag | Set2Bag(_) | Convert bag to set | Bag2Set(_) |
Structured types
Structured types are types that are typically used in functional programming languages. A structured type is of the following form:
struct c_1(p_1_1: S_1_1, ..., p_1_k1: S_1_k1)?r_1 c_2(p_2_1: S_2_1, ..., p_2_k2: S_2_k2)?r_2 ... c_n(p_n_1: S_n_1, ..., p_n_kn: S_n_kn)?r_n
This defines the type which we designate by S for brevity, together with the following functions:
- constructor functions c_i: S_i_1 # ... # S_i_ki -> S;
- projection functions p_i_j: S -> S_i_j;
- recogniser functions r_i: S -> Bool.
Projection and recogniser functions are optional; if a projection function is not specified the subsequent : symbol should also be left out, likewise if a recogniser function is not specified its preceding ? symbol should also be left out (see below for examples).
Note that structured sorts are often used in combination with sort references:
sort S = struct ... ;
This defines a structured sort which has S as an alias.
A well known and illustrative example of a structured sort is the definition of the tree data structure in which elements of some sort A can be stored:
sort Tree = struct leaf(A) | node(Tree,Tree);
This specifies that a tree is either an expression of the form leaf(a) where a is an expression of sort A, or a tree is an expression of the form node(u,v) where u and v are expressions of sort Tree.
The functions leaf: A -> Tree and node: Tree # Tree -> Tree are the constructors of sort Tree. Two expressions consisting of constructors of a structured sort are equal if and only if the constructor symbols are syntactically equal and the arguments are equal.
Projection and recogniser functions for the leaves and nodes of the tree can be declared as follows:
sort Tree = struct leaf(val: A)?is_leaf | node(left: Tree, right: Tree)?is_node;
This declares a sort Tree with:
- constructor functions leaf: A -> Tree and node: Tree # Tree -> Tree;
- projection functions val: Tree -> A and left,right: Tree -> Tree; on a tree of the shape leaf(a) the function val yields a, on a tree of the shape node(u,v), the functions left and right yield u and v, respectively;
- recogniser functions is_leaf,is_node: Tree -> Bool; on a tree t, is_leaf(t) yields true if and only if t is of the shape leaf(a), and is_node(t) yields true if and only if t is of the shape node(u,v).
An often used structured type is the enumerated type that consists of a finite number of elements. For instance, the sort MachineMode can be declared by
sort MachineMode = struct idle?is_idle | running?is_running | broken?is_broken;
A last common example is the sort of pairs for given sorts A and B.
sort Pair = struct pair(fst: A, snd: B);
For a pair p the expression fst(p) gives its first element and snd(p) the second element.
Functions
mCRL2 features the full use of functions. For example, the sort of functions from Nat to Nat is denoted by Nat -> Nat. Also functions from functions to functions, etc. are allowed, e.g. (Nat -> Nat)->(Nat -> Nat). There are two operators on functions, given in the following table.
| Function application | _(_,...,_) | Lambda abstraction | lambda _:_._ |
So, a function that multiplies two numbers, can be written as lambda n:Pos.lambda m:Pos.n*m. Note that the shorter lambda n,m:Pos.n*m is also allowed.
Functions are very useful to represent arrays. An array in which natural numbers can be stored is
sort Array = Nat -> Nat;
Getting the i-th element of an array f:Array is done by f(i), i.e. via ordinary function application. It is possible to define operations on an array as follows (see user-defined data below). For instance the update function update(i,m,f) that sets value m at position i in array f is defined by:
map update: Nat # Nat # Array -> Array; var i, n: Nat; f: Array; eqn update(i,n,f) = lambda j:Nat . if(i==j, n, f(j));
Where clauses
Where clauses may be used as an abbreviation mechanism in data expressions. A where clause is of the form e whr a0 = e0, . . . an = an end, where n >= 0. Here, e is a data expression and, for all i, 0 <= i < n, ai is an identifier and ei is a data expression. Expression e is called the body and each equation ai = ei is called a local definition. Each identifier ai is used as an abbreviation for ei in e, even if ai is already defined in the context. Also, an identifier ai may not occur in any of the expressions ej , 0 <= j < n. As a consequence, the order in which expressions occur is irrelevant.
Where clauses can be useful for efficiency reasons. For example, we can define a function that computes the square of the sum of two numbers as follows:
map square_sum: Int # Int -> Int; var x,y: Int; eqn square_sum(x,y) = (x+y) * (x+y);
When evaluating square_sum(e0,e1), this will rewrite to (e0+e1) * (e0+e1), which causes the expression e0+e1 to be evaluated twice.
Using a where clause we enforce that the right-hand sides of local definitions is evaluated exactly once:
var x,y: Int; eqn square_sum(x,y) = z*z whr z = x+y end;
Note that technically a where clause just introduces a beta-redex, so we could also have defined the following:
var x,y: Int; eqn square_sum(x,y) = (lambda z: Int. z*z)(x+y);
User-defined data
Besides using standard types and type constructors, there is a very direct way to define user defined sorts with a set of constructors and mappings. Internally, all standard types are defined in terms of these primitives.
For user defined sorts the predefined functions are only partially defined:
- The function if is completely defined.
- The operators !=, > and >= are defined in terms of ==, < and <=, respectively.
- The operators ==, < and <= are partially defined by x == x = true, x < x = false and x <= x = true. It is necessary to add equations to complete the definition to use these operators properly. In particular == is used in communication of actions using this data type. Furthermore, <= and < are required when using sets and bags of this data type.
Sorts can be declared by simply stating their existence:
sort D;A sort represents a set of elements that cannot be empty. There is no limit on the number of elements a sort can represent. It can be finite, but also countably or uncountably infinite.
Sorts can be derived from other sorts. By writing
sort E = D;
sort E becomes equal to sort D.
It is possible to define constructors for a sort NAT as in the following example where the natural numbers are defined as done in Peano arithmetic (and which is not the way the predefined sort Nat is defined internally in mCRL2, because it is far too inefficient):
sort NAT; cons zero: NAT; successor: NAT -> NAT;
If for some sort D there is a constructor with target sort D, we call the sort D a constructor sort. This implies that all terms of sort D can be written as the application of a constructor to subterms. In case of the sort NAT above, this subterm also consists of the application of a constructor. So, each element of sort NAT can be written as successor(successor(...successor(0)...)).
It is not possible that for some sort D there is only one constructor of the form
cons f: D -> D;
This implies that all terms consists of an unbounded sequence of applications of f, but this is not a proper term. Hence, no term with constructors can be made. So, as all elements of sort D should be equal to this non existing term, the domain D must be empty. But this is in contradiction with the fact that domains should at least contain one element. If a sort has no constructors, it can contain elements that can not be expressed by terms. Constructor domains always have at most a countably infinite number of elements. Non constructor sorts can have many more. Hence, the domain of real numbers can only be defined without constructors.
Besides constructors, auxiliary functions, generally called mappings, can be defined. For instance the definition of plus on the sort NAT can be declared as follows:
map plus: NAT # NAT -> NAT;
As plus is not a constructor, we know that each term of the form plus(t,u) can be written using constructors successor and zero only. We can use equations to make this relation explicit. Data types that are defined using constructors, mappings and equations are generally called equational abstract data types. The defining equations for plus are:
var n,m: NAT; eqn plus(n,zero) = n; plus(n,succesor(m)) = succesor(plus(n,m));
Using the keyword var the variables are declared that are used in the equation section immediately following the declaration.
The mechanism using map and equations is also very useful to define functions on built in data types. For instance defining a mapping count that counts the number of elements in a Tree as defined above, can be done as follows:
var a: A, t,t': Tree; eqn count(leaf(a)) = 1; count(node(t,t')) = count(t) + count(t');
The equations can also have conditions. For instance a successor function S on natural numbers that provides successors modulo some number N is written by:
eqn n < N -> S(n) = n + 1; n == N -> S(n) = 1;
The condition is written before an implication arrow.
The formal interpretation of equations is that terms are equal if they can be proven equal using the equations and standard equational reasoning. Furthermore, terms are not equal if assuming that they are equal leads to the conclusion that true equals false by equational reasoning.
However, the tools generally use term rewriting as the technique to reason with the equations. This means that the left hand side of each equation is syntactically matched to a term that must be rewritten. By doing this the variables in the left hand side get values. These values are substituted in the condition and the right hand side. Free variables are not allowed, in the sense that the variables that occur in the condition and the right hand side must occur in left hand side of each rewrite rule (note that not all rewriters check this). If the condition rewrites to true, the rewrite rule is applied and the term matching the lhs of the equation is replaced by the instantiated right hand side. If the condition rewrites to anything else but true, the rewrite rule is not applied.
The mCRL2 toolset contains two innermost and two just-in-time (jitty) rewrite strategies. The innermost rewriters apply the rewrite rules depth first on a term. It rewrites this subterm first to normal form, which is a term to which no further rewrite rule can be applied. This has the disadvantage that unnecessary rewriting is done. For example, in if(b,t1,t2) first t1 and t2 are rewritten, and depending on whether b is true or false only one of these is selected. A better strategy is to first rewrite b, and based on its value to determine whether t1 or t2 must be rewritten. This is what the jitty rewriter does.
For both rewriters there is a compiling version that translates the rewrite rules to C code, which is subsequently compiled and dynamically loaded in the main programme. The performance of the compiling rewriter is much better than that of the interpreting rewriters, and its use is strongly recommended for tasks which are rewriting intensive, such as the generation of a state space. Unfortunately, due to the lack of standard compilers, the compiling rewriters are not available on the Windows platform. This is also the reason that the compiling rewriters are not the default in the tools.
Because equations are dealt with as rewrite rules, there is a pitfall. The left hand side of an equation is matched on a term. However, the matching only works if the term has the same shape as the left hand side of the equation. So, writing a count function on Tree as done above is ok, because trees in normal form have as main function symbols either leaf or node. However, with natural numbers, a compact and very effective internal representation is used, which causes the following to go astray:
map Fib: Nat -> Nat; var n: Nat; eqn Fib(0) = 1; Fib(1) = 1; Fib(n + 2) = Fib(n) + Fib(n + 1);
The terms Fib(1), Fib(2), etc. will not be rewritten, because the internal representation of 1 is different from the 1 in the equation. Hence, they do not match. More fundamental is that 2 in no way has the shape n+2 and no matching will take place here either. The solution for this is to avoid pattern matching in the left hand side, if there is uncertainty about the shape of the normal forms of terms. In the concrete case above, it is strongly recommended to write:
map Fib: Nat -> Nat; var n: Nat; eqn n == 0 -> Fib(n) = 1; n == 1 -> Fib(n) = 1; n >= 2 -> Fib(n) = Fib(Int2Nat(n - 1)) + Fib(Int2Nat(n - 2));
Note that subtraction of two natural numbers yields an integer. In order to get the types right, a conversion function Int2Nat is required.
Language reference
| Processes
|
Copyright © 2005-2012 Technische Universiteit Eindhoven.
