<Database Tutorial>
<bigwig> is equipped with an internal lightweight database capable of storing all of <bigwig>'s native values. In this way, external shared variables are accessed and manipulated as local ones, presenting the programmer with one uniform concept of data manipulation. A variable is made "shared" (a.k.a. persistent/global/static) by prepending its declaration with the type modifier shared. A shared variable is shared among all session threads. Among the native values are composite values of type tuple, relation, and vector.
For the moment, our solution is based on a very general iteration operator called factor, but we are currently integrating an external database with a subset of SQL into <bigwig>. As can be seen in the "SQL macro tutorial", <bigwig>'s syntactic macro abstraction mechanism can be used to clothe our solution as standard SQL queries.
service { // A Page Counter
  session Counter() {
    shared int n;
    /* initially zero by default */
    n++; // increase visible to all session threads
    exit (html) n;
  }
}
The variable n declared in this example is prefixed with the type modifier shared. Shared variables are shared among all session threads: when one session thread updates n, subsequent reads in other session threads see the latest written value. Every <bigwig> service has its own dedicated (internal) database in which its shared variables are stored.
service {
  session LastAccess() {
    shared time last; // initially `notime'
    time t; // initially `notime'
    t = last;
    last = now();
    if (t==notime) t = now();
    exit (html) t;
  }
}
This example underlines that all of <bigwig>'s types (except file handles) can be made shared, even time. The shared variable last holds the date and time at which the session was last accessed, and the session exits the previously stored value to the client's browser (on the very first access, when last is still notime, the current time is shown instead). The time value is output by casting it to an html value, which yields a default formatting as specified in the type conversion section of the reference manual. Various getX functions (where X is one of Year, Month, Day, Hour, Minute, Second, or Weekday) exist for extracting the components of a time value. These can be used to format time values more appropriately.
service {
  html makeGuestBook(vector string w) {
    int i;
    html H = <html><ul><[guests]></ul></html>;
    html GuestDoc = <html>
      <li><[guest]>
      <[guests]>
    </html>;
    for (i=0; i<|w|; i++) {
      /* |w| is the length of vector w */
      H = H <[guests = GuestDoc <[guest = w[i]]];
    }
    return H;
  }
  session Sign() {
    shared vector string v;
    html SignDoc = <html>
      Please sign the guest book:<br>
      <input type="text" name="guest">
    </html>;
    string s;
    show SignDoc receive [s = guest];
    v = v + vector { s }; // vector constant
    exit makeGuestBook(v); // call-by-value
  }
}
This guestbook service has a shared vector v intended to hold the names of all the people who have signed the guestbook. The Sign session initially outputs a document prompting the client for a name. This name is subsequently appended to the guestbook list (v). The expression "vector { s }" is a constant string vector of length one. Finally, a document showing the names of all the people who have signed the guestbook is constructed by a call to the function makeGuestBook and exited to the client. The call causes the vector argument to be copied (call-by-value), so the function operates on this copy (referred to as w). The function iterates through the vector and builds the document by plugging each name into a document template, producing a document that holds the list of names.
Notes on efficiency!
service {
  shared vector int v;
  session InefficientSum() {
    int i, sum;
    /* highly inefficient!!! */
    for (i=0; i<|v|; i++) { // database lookup (v)
      sum += v[i]; // database lookup (v)
    }
    exit (html) sum;
  }
  session EfficientSum() {
    int i, sum;
    vector int local_v;
    local_v = v; // only one database lookup (v)
    /* highly efficient! */
    for (i=0; i<|local_v|; i++) { // mem. (local_v)
      sum += local_v[i]; // memory lookup (local_v)
    }
    exit (html) sum;
  }
}
This service has two sessions, both calculating the sum of the integer values in the shared vector v (assumed to hold some numbers to be summed). The first session, InefficientSum, shows how not to do this; the second session, EfficientSum, computes the same result efficiently. The InefficientSum session iterates through the shared vector using a for statement. Notice the references to v in the condition expression and in the statement body of the for statement. Each of these references causes the shared vector v to be read from the database (disk), even if it has just been read, since another session thread could have modified it in the meantime. Consequently, the for statement performs two database lookups in every iteration, yielding a very inefficient calculation. The session EfficientSum instead looks up v (in the database) once, assigns (by value) this vector to a local integer vector variable called local_v, and performs the sum calculation on this local copy, avoiding the many database lookups. Thus, whenever it is required to iterate through a shared vector structure, it is preferable to ``work on a local copy''.
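The cost difference described above can be made concrete with a small model. The following is a Python sketch, not <bigwig> code: the class SharedVector and its lookups counter are hypothetical stand-ins for the internal database, and the only assumption is the one stated in the text, namely that every reference to a shared vector re-reads it.

```python
class SharedVector:
    # Hypothetical stand-in for a shared vector kept in the service database.
    # Every call to read() models one database lookup of the whole vector.
    def __init__(self, values):
        self._values = list(values)
        self.lookups = 0

    def read(self):
        self.lookups += 1
        return list(self._values)  # a fresh copy, as if re-read from disk


def inefficient_sum(v):
    # Mirrors InefficientSum: v is referenced in the condition and in the body.
    total = 0
    i = 0
    while i < len(v.read()):   # lookup in the loop condition
        total += v.read()[i]   # lookup in the loop body
        i += 1
    return total


def efficient_sum(v):
    # Mirrors EfficientSum: one lookup, then work on the local copy.
    local_v = v.read()
    total = 0
    for x in local_v:
        total += x
    return total


numbers = [3, 1, 4, 1, 5]
slow, fast = SharedVector(numbers), SharedVector(numbers)
print(inefficient_sum(slow), slow.lookups)  # prints: 14 11
print(efficient_sum(fast), fast.lookups)    # prints: 14 1
```

For a vector of length n, the model counts 2n+1 lookups in the first loop (the condition is evaluated once more than the body) against a single lookup in the second.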
service {
  session S() {
    schema Person { // Declare a "Person" schema
      bool is_male;
      int age;
      string name;
    }
    tuple Person p; // Declare p as a Person-tuple
    /* p = tuple {is_male=false, age=0, name=""} */
    p = tuple { is_male=true, age=42, name="John" };
    /* p = tuple {is_male=true,age=42,name="John"} */
    p.age++;
    /* p = tuple {is_male=true,age=43,name="John"} */
    ...;
  }
}
This service defines a schema called Person which has three components, namely a boolean is_male, an integer age, and a string name. This schema is subsequently used to declare a tuple p. Tuples correspond to structs in C, and their components initially hold the default values of their respective types. In this example, the variable p will initially have is_male set to false, age set to zero, and name set to "" (the empty string). The first statement assigns to p a constant tuple expression with a schema that is compatible with that of Person. The next statement increases the age component of p by one (from 42 to 43).
Tuples are unordered
service {
  session S() {
    schema Person {
      bool is_male;
      int age;
      string name;
    }
    tuple Person p1, p2;
    p1 = tuple { is_male=true, age=42, name="John" };
    p2 = tuple { name="John", is_male=true, age=42 };
    if (p1 == p2) {
      /* They are equal (i.e. exec. proceeds here) */
      ...;
    }
  }
}
This example illustrates that the components of a tuple are unordered: p1 and p2 list the same components in different orders yet hold identical values, so the condition expression p1 == p2 of the if statement evaluates to true.
Tuple manipulation
service {
  session S() {
    schema A {
      int n;
      int m;
      float f;
    }
    schema B {
      int n;
      string s;
    }
    schema C {
      int n, m;
      float f;
      string s;
    }
    tuple A a;
    tuple B b;
    tuple C c;
    a = tuple { n=42, f=3.14, m=7 };
    b = tuple { s="foo", n=87 };
    c = a << b; // tuple left-overwrite
    /* c = tuple { n=87, f=3.14, m=7, s="foo"} */
    c.s = "bar";
    /* c = tuple { n=87, f=3.14, m=7, s="bar"} */
    b = c \+ (n, s); // tuple project
    /* b = tuple { n=87, s="bar"} */
    ...;
  }
}
This example is not useful as a service; it merely illustrates the tuple operators "<<" (with its dual ">>") and "\+" (with its dual "\-"). Three schemas A, B, and C are defined and used to declare three tuple variables a, b, and c. The first and second statements assign tuple constants to a and b. The third statement assigns to c the ``tuple left overwrite'' of a and b. This operation yields a tuple whose schema is the union of the schemas of the two arguments; components with the same name are required to have the same type. The result contains the union of the components of the two arguments, picking the value from the right argument whenever a component is present in both. Thus, after this operation, c has 87 as its n component. The dual operator ">>" instead picks the value from the left argument when a component is present in both. The fourth statement assigns the constant string "bar" to the s component of c. The fifth and final statement assigns to b the projection of c onto the two components n and s. A dual tuple operation "\-" exists that, instead of naming the components to keep, names the ones to ``throw away''.
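The four tuple operators can be mimicked with dictionaries to check one's understanding of them. This is a Python sketch of the semantics as described above, not <bigwig> code; the helper names left_overwrite, right_overwrite, project, and drop are hypothetical, and the type check is only a rough runtime approximation of the schema requirement.

```python
def left_overwrite(a, b):
    # Model of a << b: union of the components, the right argument wins.
    for name in a.keys() & b.keys():
        if type(a[name]) is not type(b[name]):
            raise TypeError(f"component {name!r} has different types")
    return {**a, **b}


def right_overwrite(a, b):
    # Model of a >> b: union of the components, the left argument wins.
    return left_overwrite(b, a)


def project(t, names):
    # Model of t \+ (names): keep only the named components.
    return {name: t[name] for name in names}


def drop(t, names):
    # Model of t \- (names): throw away the named components.
    return {name: value for name, value in t.items() if name not in names}


a = {"n": 42, "f": 3.14, "m": 7}
b = {"s": "foo", "n": 87}

c = left_overwrite(a, b)
print(c["n"])                                          # prints: 87
print(right_overwrite(a, b)["n"])                      # prints: 42
print(project(c, ("n", "s")) == drop(c, ("m", "f")))   # prints: True
```

The last line shows that projecting onto n and s gives the same tuple as throwing away m and f, which is the sense in which the two projection operators are duals.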
Vectors of tuples
service {
  session S() {
    int i, age_sum;
    schema Person {
      bool is_male;
      int age;
      string name;
    }
    vector Person v;
    v = vector {
      tuple { is_male=true, age=43, name="John" },
      tuple { is_male=false, age=42, name="Jane" }
    };
    for (i=0; i<|v|; i++) {
      age_sum += v[i].age;
    }
    /* Here, age_sum = 85 */
    ...;
  }
}
Schemas and vectors can also be used to define ``tuple vectors'', which are vectors with tuples as entries. This service defines a Person-vector variable v and assigns to it a constant vector holding two tuples. The for statement will subsequently iterate through this vector and calculate the sum of the age components in the vector in age_sum (producing 85).
service {
  session S() {
    int n;
    schema Person {
      bool is_male;
      int age;
      string name;
    }
    relation Person r;
    r = relation {
      tuple { is_male=true,age=43,name="John" },
      tuple { is_male=false,age=42,name="Jane" },
      tuple { is_male=false,age=42,name="Jane" }
    };
    n = |r|; // n is 2 (not 3)
  }
}
This service again defines the schema Person used in the previous examples. This time, the schema is used to define a relation r (of Persons). Relations differ from vectors in that there is no ordering on the (tuple) elements and duplicates are not kept: a relation can be seen as a set of tuples. Thus, the constant relation assigned to r in the first statement will immediately ``reduce'' to a relation of ``length'' 2, ignoring the multiplicity of the tuple mentioned twice in the constant relation. Consequently, the expression |r| evaluates to 2 and not 3. The primary operation on relations is called factor and is discussed below.
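The set-of-tuples view can be tried out directly. The following Python sketch (not <bigwig> code) models a tuple as a frozenset of its components and a relation as a set of such tuples; the helper make_tuple is a hypothetical name introduced only for this illustration.

```python
def make_tuple(**components):
    # A tuple is modelled as an unordered, hashable collection of
    # (name, value) pairs, so the order of the components is irrelevant.
    return frozenset(components.items())


r = {
    make_tuple(is_male=True, age=43, name="John"),
    make_tuple(is_male=False, age=42, name="Jane"),
    make_tuple(name="Jane", age=42, is_male=False),  # same tuple, other order
}

print(len(r))  # prints: 2
```

The third entry collapses into the second because the two are equal as tuples, which is the behaviour the text attributes to |r|.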
service {
  session S() {
    int age_sum;
    schema Person {
      bool is_male;
      int age;
      string name;
    }
    relation Person r;
    r = relation {
      tuple { is_male=true, age=38, name="Homer" },
      tuple { is_male=false, age=34, name="Marge" },
      tuple { is_male=true, age=10, name="Bart" },
      tuple { is_male=false, age=8, name="Lisa" },
      tuple { is_male=false, age=1, name="Maggie" }
    };
    factor (r) {
      age_sum += #.age;
    }; // Note: semi-colon required
    /* Here, age_sum is (38+34+10+8+1 =) 91 */
    ...;
  }
}
The simplest factor expression takes one argument which must be a relation (here r). The (statement) body of the factor expression will be executed precisely once for each tuple in the relation given as argument. In each iteration the special (read-only) variable ``#'' will be set to the value of the current tuple. Thus the above example will calculate the sum of the ages of the persons in the relation r.
Factor (and return)
service {
  session S() {
    int age_sum;
    schema Person {
      bool is_male;
      int age;
      string name;
    }
    relation Person r, s;
    r = relation {
      tuple { is_male=true, age=38, name="Homer" },
      tuple { is_male=false, age=34, name="Marge" },
      tuple { is_male=true, age=10, name="Bart" },
      tuple { is_male=false, age=8, name="Lisa" },
      tuple { is_male=false, age=1, name="Maggie" }
    };
    s = factor (r) {
      if (#.is_male) {
        return #; // Add current tuple to result.
      }
    };
    /* Here, s contains `Homer' and `Bart'. */
    ...;
  }
}
A factor expression evaluates to a relation. In the previous example this resulting value was ignored. Here, however, the value is assigned to the relation (Person) variable s. Return statements are permitted in the (statement) body of a factor expression, and the value resulting from execution of the factor expression is the union of all the tuples (or relations) returned in its body. Consequently, the value of s in the example will be all the tuples for which the is_male field is true (that is, `Homer' and `Bart'). The returned tuples (or relations) are not required to have the same schema as ``#'', but they are all required to have the same schema as each other; this schema becomes the type of the factor expression as a whole.
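The ``union of everything returned'' rule can be modelled in a few lines. This is a Python sketch, not <bigwig> code: the function factor below is a hypothetical simplification in which the body is a function that returns at most one tuple (or None for ``no return''), tuples are dictionaries, and relations are duplicate-free lists.

```python
def factor(relation, body):
    # Run body once per tuple; collect the returned tuples without duplicates.
    result = []
    for current in relation:          # `current` plays the role of #
        returned = body(current)
        if returned is not None and returned not in result:
            result.append(returned)
    return result


r = [
    {"is_male": True, "age": 38, "name": "Homer"},
    {"is_male": False, "age": 34, "name": "Marge"},
    {"is_male": True, "age": 10, "name": "Bart"},
    {"is_male": False, "age": 8, "name": "Lisa"},
    {"is_male": False, "age": 1, "name": "Maggie"},
]

# Same schema as #: keep the tuples whose is_male component is true.
s = factor(r, lambda t: t if t["is_male"] else None)
print([t["name"] for t in s])        # prints: ['Homer', 'Bart']

# Different schema from #: return only the name of everyone under 18.
minors = factor(r, lambda t: {"name": t["name"]} if t["age"] < 18 else None)
print(minors)  # prints: [{'name': 'Bart'}, {'name': 'Lisa'}, {'name': 'Maggie'}]
```

The second call illustrates the last point of the paragraph: the returned tuples have a schema of their own (just name), and that schema is the type of the result.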
Factor (and identifier arguments)
service {
  session S() {
    int n;
    schema Person {
      bool is_male;
      int age;
      string name, hair;
    }
    relation Person r;
    r = relation {
      tuple { is_male=true, age=38,
              name="Homer", hair="none" },
      tuple { is_male=false, age=34,
              name="Marge", hair="blue" },
      tuple { is_male=true, age=10,
              name="Bart", hair="yellow" },
      tuple { is_male=false, age=8,
              name="Lisa", hair="yellow" },
      tuple { is_male=false, age=1,
              name="Maggie", hair="yellow" }
    };
    factor (r; is_male, hair) {
      n++;
    };
    /* Here, n = 4. */
    ...;
  }
}
A variant of the factor expression takes a comma separated list of identifiers after a semi-colon following the first (relation) argument. The relation resulting from the evaluation of the expression argument is projected onto these attributes (which must name attributes in the relational argument) forming a new relation for which any duplicates are removed. The statement will then be executed once per tuple in this relation, setting "#" to the value of the current tuple. Thus, the factor expression will iterate through the relation:
relation {
  tuple {is_male = true, hair = "none"},
  tuple {is_male = false, hair = "blue"},
  tuple {is_male = true, hair = "yellow"},
  tuple {is_male = false, hair = "yellow"}
}
The first three tuples are derived from `Homer', `Marge', and `Bart', respectively, whereas the last tuple comes from both `Lisa' and `Maggie', since they are indistinguishable with respect to gender and hair colour (recall that speaking of the ``first'' or ``fourth'' tuple does not really make sense, since the tuples of a relation are unordered). The body of the factor expression in the example simply increments the integer variable n once for each of the four tuples, leaving n with a final value of 4 after execution of the factor expression.
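The project-then-deduplicate step can be reproduced with a short model. This is a Python sketch, not <bigwig> code; project_distinct is a hypothetical helper, tuples are dictionaries, and a relation is a list without duplicates.

```python
def project_distinct(relation, names):
    # Project every tuple onto the named attributes and drop duplicates.
    projected = []
    for t in relation:
        p = {name: t[name] for name in names}
        if p not in projected:
            projected.append(p)
    return projected


r = [
    {"is_male": True, "age": 38, "name": "Homer", "hair": "none"},
    {"is_male": False, "age": 34, "name": "Marge", "hair": "blue"},
    {"is_male": True, "age": 10, "name": "Bart", "hair": "yellow"},
    {"is_male": False, "age": 8, "name": "Lisa", "hair": "yellow"},
    {"is_male": False, "age": 1, "name": "Maggie", "hair": "yellow"},
]

n = 0
for current in project_distinct(r, ("is_male", "hair")):
    n += 1        # the body of the factor expression: n++
print(n)          # prints: 4
```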
Factor (and `@')
service {
  session S() {
    schema Person {
      bool is_male;
      int age;
      string name, hair;
    }
    relation Person r;
    schema AgeName {
      int age;
      string name;
    }
    relation AgeName a;
    r = relation {
      tuple { is_male=true, age=38,
              name="Homer", hair="none" },
      tuple { is_male=false, age=34,
              name="Marge", hair="blue" },
      tuple { is_male=true, age=10,
              name="Bart", hair="yellow" },
      tuple { is_male=false, age=8,
              name="Lisa", hair="yellow" },
      tuple { is_male=false, age=1,
              name="Maggie", hair="yellow" }
    };
    a = factor (r; is_male, hair) {
      if (|@|==2) return @;
    };
    ...;
  }
}
Inside the (statement) body of a factor expression, the special (read-only) variable ``@'' is available. Its schema consists of the attributes of the relation given to the factor expression as argument (here, is_male, age, name, and hair) without the ones named in the identifier list (here, is_male and hair). Thus, in this example, the type of ``@'' is a relation with schema age (int) and name (string). In each iteration, ``@'' contains the contributions of all the tuples that gave rise to the current tuple ``#'', with the attributes named in the identifier list projected away. The tuples `Homer', `Marge', and `Bart' each give rise to a ``@'' relation of size 1 containing that tuple's age and name. The tuple derived from `Lisa' and `Maggie' gives rise to a ``@'' relation of size 2 (which is returned from the factor expression). Thus, after the factor expression, a will contain the following relation:
relation {
  tuple { age = 8, name = "Lisa" },
  tuple { age = 1, name = "Maggie" }
}
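The relationship between ``#'' and ``@'' amounts to grouping. The following Python sketch (not <bigwig> code) models it under the reading given above; the helper groups is hypothetical and returns one (current, rest) pair per iteration, where current plays the role of ``#'' and rest the role of ``@''.

```python
def groups(relation, names):
    # Group tuples by their projection onto `names`; each group keeps the
    # remaining attributes of its members (the "rest" relation).
    out = []  # list of [current, rest] entries
    for t in relation:
        current = {name: t[name] for name in names}
        rest = {k: v for k, v in t.items() if k not in names}
        for entry in out:
            if entry[0] == current:
                if rest not in entry[1]:
                    entry[1].append(rest)
                break
        else:
            out.append([current, [rest]])
    return [(current, rest) for current, rest in out]


r = [
    {"is_male": True, "age": 38, "name": "Homer", "hair": "none"},
    {"is_male": False, "age": 34, "name": "Marge", "hair": "blue"},
    {"is_male": True, "age": 10, "name": "Bart", "hair": "yellow"},
    {"is_male": False, "age": 8, "name": "Lisa", "hair": "yellow"},
    {"is_male": False, "age": 1, "name": "Maggie", "hair": "yellow"},
]

a = []
for current, rest in groups(r, ("is_male", "hair")):
    if len(rest) == 2:            # if (|@|==2) return @;
        a.extend(t for t in rest if t not in a)
print(a)  # prints: [{'age': 8, 'name': 'Lisa'}, {'age': 1, 'name': 'Maggie'}]
```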
Factor (and multiple relation arguments)
service {
  session S() {
    schema Person {
      int age;
      string name;
    }
    relation Person m, f, all;
    m = relation {
      tuple { age=38, name="Homer" },
      tuple { age=10, name="Bart" }
    };
    f = relation {
      tuple { age=34, name="Marge" },
      tuple { age=8, name="Lisa" },
      tuple { age=1, name="Maggie" }
    };
    all = factor (m, f) {
      return #;
    };
    ...;
  }
}
The factor expression can also take multiple (relation) arguments. The iteration is then performed over the attributes common to all arguments: the schema of the iterated relation is the intersection of the argument schemas (which in this case is Person, since both arguments are of schema Person), and, as the result below shows, a tuple takes part in the iteration if it occurs in any of the arguments. Thus, after the factor expression, the variable all will be:
relation {
  tuple { age=38, name="Homer" },
  tuple { age=34, name="Marge" },
  tuple { age=10, name="Bart" },
  tuple { age=8, name="Lisa" },
  tuple { age=1, name="Maggie" }
}
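One reading that is consistent with this result is that each argument is projected onto the common attributes and the iteration ranges over all the resulting tuples. The Python sketch below (not <bigwig> code) models that reading; it is an assumption inferred from the example, and the function factor with its explicit common parameter is hypothetical.

```python
def factor(relations, common, body):
    # Collect the distinct projections of all arguments onto the common
    # attributes, then run the body once per projected tuple.
    current_tuples = []
    for relation in relations:
        for t in relation:
            key = {name: t[name] for name in common}
            if key not in current_tuples:
                current_tuples.append(key)
    result = []
    for current in current_tuples:    # `current` plays the role of #
        returned = body(current)
        if returned is not None and returned not in result:
            result.append(returned)
    return result


m = [{"age": 38, "name": "Homer"}, {"age": 10, "name": "Bart"}]
f = [
    {"age": 34, "name": "Marge"},
    {"age": 8, "name": "Lisa"},
    {"age": 1, "name": "Maggie"},
]

everyone = factor([m, f], ("age", "name"), lambda current: current)
print(len(everyone))  # prints: 5
```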
Factor (and `@n')
service {
  session S() {
    schema Person {
      int age;
      string name;
    }
    relation Person p;
    schema Account {
      int amount;
      string name;
    }
    relation Account a;
    schema PersonAccount {
      int age, amount;
      string name;
    }
    relation PersonAccount pa;
    p = relation {
      tuple { age=38, name="Homer" },
      tuple { age=34, name="Marge" },
      tuple { age=10, name="Bart" },
      tuple { age=8, name="Lisa" },
      tuple { age=1, name="Maggie" }
    };
    a = relation {
      tuple { name="Homer", amount=87 },
      tuple { name="Marge", amount=42 },
      tuple { name="Bart", amount=1 },
      tuple { name="Lisa", amount=304 }
    };
    pa = factor (p, a) {
      return cart(relation {#}, cart(@1, @2));
    };
    ...;
  }
}
This example exhibits a factor expression with two (relation) arguments that have different schemas. The iteration is performed over the attributes common to both arguments, which in this example is just name (string). For each iteration, the special (read-only) variables ``@1'' and ``@2'' hold the ``rest'' relations of the first and second arguments, respectively: the tuples of that argument that agree with ``#'' on the common attributes, with those attributes projected away. Consequently, for the iteration where ``#'' is equal to tuple { name="Homer" }, @1 and @2 will respectively be the relations:
relation { tuple { age=38 } }
relation { tuple { amount=87 } }
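Note that `Maggie' has no tuple in a, so (assuming cart denotes the Cartesian product) her @2 is empty and her iteration contributes nothing to the result. Thus, after the factor expression, the variable pa will be: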
relation {
  tuple { name="Homer", age=38, amount=87 },
  tuple { name="Marge", age=34, amount=42 },
  tuple { name="Bart", age=10, amount=1 },
  tuple { name="Lisa", age=8, amount=304 }
}
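The whole computation behaves like a join on name, which can be checked with a small model. This is a Python sketch, not <bigwig> code: the helpers rest, cart, and join are hypothetical, cart is assumed to be the Cartesian product, and the iteration is assumed to range over the name values found in either argument.

```python
from itertools import product


def rest(relation, key, common):
    # Tuples of `relation` that agree with `key` on the common attributes,
    # with those attributes projected away (the role of @1 and @2).
    out = []
    for t in relation:
        if all(t[name] == key[name] for name in common):
            r = {k: v for k, v in t.items() if k not in common}
            if r not in out:
                out.append(r)
    return out


def cart(*relations):
    # Cartesian product: merge one tuple from each relation.
    out = []
    for combo in product(*relations):
        merged = {}
        for t in combo:
            merged.update(t)
        if merged not in out:
            out.append(merged)
    return out


def join(p, a, common):
    keys = []
    for t in p + a:
        key = {name: t[name] for name in common}
        if key not in keys:
            keys.append(key)
    result = []
    for key in keys:                  # `key` plays the role of #
        at1, at2 = rest(p, key, common), rest(a, key, common)
        for t in cart([key], cart(at1, at2)):
            if t not in result:
                result.append(t)
    return result


p = [{"age": 38, "name": "Homer"}, {"age": 34, "name": "Marge"},
     {"age": 10, "name": "Bart"}, {"age": 8, "name": "Lisa"},
     {"age": 1, "name": "Maggie"}]
a = [{"name": "Homer", "amount": 87}, {"name": "Marge", "amount": 42},
     {"name": "Bart", "amount": 1}, {"name": "Lisa", "amount": 304}]

pa = join(p, a, ("name",))
print(len(pa))  # prints: 4
print(pa[2])    # prints: {'name': 'Bart', 'age': 10, 'amount': 1}
```

In this model `Maggie' drops out because her second rest relation is empty, so the product for her iteration is empty as well.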
bigwig@brics.dk Last updated: November 2, 2001