- <body>
- <html>
- <head>
- <meta http-equiv="Content-Language" content="en-us">
- <meta name="GENERATOR" content="Microsoft FrontPage 5.0">
- <meta name="ProgId" content="FrontPage.Editor.Document">
- <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
- <title>Saffron overview</title>
- <style>
- A:link { color:#000066; }
- A:visited { color:#666666; }
- A.clsIncCpyRt,
- A.clsIncCpyRt:visited,
- P.clsIncCpyRt {
- font-weight:normal; font-size:75%; font-family:verdana,arial,helvetica,sans-serif;
- color:black;
- text-decoration:none;
- }
- A.clsLeftMenu,
- A.clsLeftMenu:visited {
- color:#000000;
- text-decoration:none;
- font-weight:bold; font-size:8pt;
- }
- A.clsBackTop,
- A.clsBackTop:visited {
- margin-top:10; margin-bottom:0;
- padding-bottom:0;
- font-size:75%;
- color:black;
- }
- A:hover,
- A.clsBackTop:hover,
- A.clsIncCpyRt:hover,
- A:active { color:blue; }
- A.clsGlossary {
- font-size:10pt;
- color:green;
- }
- BODY { font-size:80%; font-family:verdana,arial,helvetica,sans-serif; }
- BUTTON.clsShowme,
- BUTTON.clsShowme5 {
- font-weight:bold; font-size:11; font-family:arial;
- width:68; height:23;
- position:relative; top:2;
- background-color:#002F90;
- color:#FFFFFF;
- }
- DIV.clsBeta {
- font-weight:bold;
- color:red;
- }
- DIV.clsDocBody { margin-left:10px; margin-right:10px; }
- DIV.clsDocBody HR { margin-top:0; }
- DIV.clsDesFooter { margin:10px 10px 0px 223px; }
- DIV.clsFPfig { font-size:80%; }
- DIV.clsHi {
- padding-left:2em;
- text-indent:-2em
- }
- DIV.clsShowme { margin-bottom:.5em; margin-top:.5em; }
- H1{
- font-size:145%;
- margin-top:1.25em; margin-bottom:0em;
- }
- H2 {
- font-size:135%;
- margin-top:1.25em; margin-bottom:.5em;
- }
- H3 {
- font-size:128%;
- margin-top:1em; margin-bottom:0em;
- }
- H4 {
- font-size:120%;
- margin-top:.8em; margin-bottom:0em;
- }
- H5 {
- font-size:110%;
- margin-top:.8em; margin-bottom:0em;
- }
- H6 {
- font-size:70%;
- margin-top:.6em; margin-bottom:0em;
- }
- HR.clsTransHR {
- position:relative; top:20;
- margin-bottom:15;
- }
- P.clsRef {
- font-weight:bold;
- margin-top:12pt; margin-bottom:0pt;
- }
- PRE {
- background:#EEEEEE;
- margin-top:1em; margin-bottom:1em; margin-left:0px;
- padding:5pt;
- }
- PRE.clsCode, CODE.clsText { font-family:'courier new',courier,serif; font-size:130%; }
- PRE.clsSyntax { font-family:verdana,arial,helvetica,sans-serif; font-size:120%; }
- SPAN.clsEntryText {
- line-height:12pt;
- font-size:8pt;
- }
- SPAN.clsHeading {
- color:#00319C;
- font-size:11pt; font-weight:bold;
- }
- SPAN.clsDefValue, TD.clsDefValue { font-weight:bold; font-family:'courier new' }
- SPAN.clsLiteral, TD.clsLiteral { font-family:'courier new'; }
- SPAN.clsRange, TD.clsRange { font-style:italic; }
- SPAN.clsShowme {
- width:100%;
- filter:dropshadow(color=#000000,OffX=2.5,OffY=2.5,Positive=1);
- position:relative;
- top:-8;
- }
- TABLE { font-size:100%; }
- TABLE.clsStd {
- background-color:#444;
- border:1px none;
- cellspacing:0;
- cellpadding:0
- }
- TABLE.clsStd TH,
- BLOCKQUOTE TH {
- font-size:100%;
- text-align:left; vertical-align:top;
- background-color:#DDD;
- padding:2px;
- }
- TABLE.clsStd TD,
- BLOCKQUOTE TD {
- font-size:100%;
- vertical-align:top;
- background-color:#EEE;
- padding:2px;
- }
- TABLE.clsParamVls,
- TABLE.clsParamVls TD { padding-left:2pt; padding-right:2pt; }
- #TOC { visibility:hidden; }
- UL UL, OL UL { list-style-type:square; }
- .clsHide { display:none; }
- .clsShow { }
- .clsShowDiv {
- visibility:hidden;
- position:absolute;
- left:230px; top:140px;
- height:0px; width:170px;
- z-index:-1;
- }
- .#pBackTop { display:none; }
- #idTransDiv {
- position:relative;
- width:90%; top:20;
- filter:revealTrans(duration=1.0, transition=23);
- }
- /*** INDEX-SPECIFIC ***/
- A.clsDisabled {
- text-decoration:none;
- color:black;
- cursor:text;
- }
- A.clsEnabled { cursor:auto; }
- SPAN.clsAccess { text-decoration:underline; }
- TABLE.clsIndex {
- font-size:100%;
- padding-left:2pt; padding-right:2pt;
- margin-top: 17pt;
- }
- TABLE.clsIndex TD {
- margin:3pt;
- background-color:#EEEEEE;
- }
- TR.clsEntry { vertical-align:top; }
- TABLE.clsIndex TD.clsLetters {
- background-color:#CCCCCC;
- text-align:center;
- }
- TD.clsMainHead {
- background-color:#FFFFFF;
- vertical-align:top;
- font-size:145%; font-weight:bold;
- margin-top:1.35em; margin-bottom:.5em;
- }
- UL.clsIndex { margin-left:20pt; margin-top:0pt; margin-bottom:5pt; }
- LI OL { padding-bottom: 1.5em }
- /*** GALLERY/TOOLS/SAMPLES ***/
- FORM.clsSamples { margin-bottom:0; margin-top:0; }
- H1.clsSampH1 {
- font-size:145%;
- margin-top:.25em; margin-bottom:.25em;
- }
- H1.clsSampHead {
- margin-top:5px; margin-bottom:5px;
- font-size:24px; font-weight:bold; font-family:verdana,arial,helvetica,sans-serif;
- }
- H2.clsSampTitle {
- font-size:128%;
- margin-top:.2em; margin-bottom:0em;
- }
- TD.clsDemo {
- font-size:8pt;
- color:#00319C;
- text-decoration:underline;
- }
- .clsSampDnldMain { font-size:11px; font-family:verdana,arial,helvetica,sans-serif; }
- .clsShowDesc { cursor:hand; }
- A.clsTools {
- color:#0B3586;
- font-weight:bold;
- }
- H1.clsTools, H2.clsTools {
- color:#0B3586;
- margin-top:5px;
- }
- TD.clsToolsHome {
- font-size:9pt;
- line-height:15pt;
- }
- SPAN.clsToolsTitle {
- color:#00319C;
- font-size:11pt; font-weight:bold;
- text-decoration:none;
- }
- /*** DESIGN ***/
- P.cat {
- font-size:13pt;
- color:#787800;
- text-decoration:none;
- margin-top:18px;
- }
- P.author {
- font-size:9pt; font-style:italic;
- line-height:13pt;
- margin-top:10px;
- }
- P.date {
- font-size:8pt;
- line-height:12px;
- margin-top:0px;
- color:#3366FF;
- }
- P.graph1 {
- line-height:13pt;
- margin-top:-10px;
- }
- P.col {
- line-height:13pt;
- margin-top:10px; margin-left:5px;
- }
- P.cal1 {
- text-decoration:none;
- margin-top:-10px;
- }
- P.cal2 {margin-top:-10px; }
- P.photo { font-size:8pt; }
- /*** DOCTOP ***/
- #tblNavLinks A {
- color:black;
- text-decoration:none;
- font-family:verdana,arial,helvetica,sans-serif;
- }
- #lnkShowText, #lnkSyncText, #lnkSearchText, #lnkIndexText { font-size:8pt; font-weight:bold; }
- #lnkPrevText, #lnkNextText, #lnkUpText { font-size:7.5pt; font-weight:normal; }
- DIV.clsBucketBranch {
- margin-left:10px; margin-top:15px; margin-bottom:-10pt;
- font-style:italic; font-size:85%;
- }
- DIV.clsBucketBranch A,
- DIV.clsBucketBranch A:link,
- DIV.clsBucketBranch A:active,
- DIV.clsBucketBranch A:visited { text-decoration:none; color:black; }
- DIV.clsBucketBranch A:hover { color:blue; }
- /*** SDK, IE4 ONLY ***/
- DIV.clsExpanded, A.clsExpanded { display:inline; color:black; }
- DIV.clsCollapsed, A.clsCollapsed { display:none; }
- SPAN.clsPropattr { font-weight:bold; }
- #pStyles, #pCode, #pSyntax, #pEvents, #pStyles {display:none; text-decoration:underline; cursor:hand; }
- /*** jhyde added ***/
- CODE { color:maroon; font-family:'courier new' }
- DFN { font-weight:bold; font-style:italic; }
- </style>
- </head>
- <!--
- This sentence is here to fool javadoc (which is looking for a period, and otherwise finds one
- inside one of our header tables).
- -->
- <table border="1" class="clsStd" width="100%">
- <tr>
- <td colspan="2"><a href="index.html">Top</a> |
- <a href="http://public.perforce.com/guest/julian_hyde/saffron/doc/index.html">Web home</a> |
- <a href="http://sourceforge.net/projects/saffron/">SourceForge home</a>
- <div align="right">
- <a href="http://sourceforge.net">
- <img src="http://sourceforge.net/sflogo.php?group_id=46646&type=1" border="0" alt="SourceForge.net Logo">
- </a>
- </div>
- </td>
- </tr>
- <tr>
- <td colspan="2"><em>$Id: //guest/julian_hyde/saffron/doc/overview.html#5 $</em></td>
- </tr>
- <tr>
- <td colspan="2"><em>(C) Copyright 2002, Julian Hyde</em></td>
- </tr>
- <tr>
- <th align="right">Author</th>
- <td>Julian Hyde (<a href="mailto:julian.hyde@mail.com">julian.hyde@mail.com</a>)</td>
- </tr>
- <tr>
- <th align="right">Created</th>
- <td>November 5<sup>th</sup>, 2002</td>
- </tr>
- </table>
- <h1>Saffron overview</h1>
- <p>Julian Hyde, November 5th, 2001.</p>
- <h2>Overview</h2>
- <p>Saffron is a seamless integration of SQL-like relational expressions into Java.
- As we shall see, the tight integration means that relational expressions are
- much more powerful than those offered by SQL.</p>
- <h2>How is Saffron different from SQL?</h2>
- <ol>
- <li><b>You can use Saffron expressions directly from Java code</b>. For example,
- an <code>exists </code>condition:<blockquote>
- <pre>if (exists (select from emp where emp.salary > 50000)) {
- ...
- }</pre>
- </blockquote>
- <p>An <code>in </code>condition:</p>
- <blockquote>
- <pre>if ("Fred" in (select emp.ename from emp where emp.salary > 50000)) {
- ...
- }</pre>
- </blockquote>
- <p>A <code>for </code>loop:</p>
- <blockquote>
- <pre>for (emp in (select from emp order by emp.salary)) {
- System.out.println(emp.ename + " earns " + emp.salary);
- }</pre>
- </blockquote>
- <p>And an array initialized using a relational expression:</p>
- <blockquote>
- <pre>Emp[] programmers = (select from emp
- where emp.title.equals("Programmer"));</pre>
- </blockquote>
- </li>
- <li><b>Saffron</b><b> uses Java expression syntax</b>. You can call java methods,
- construct objects and arrays, and access java objects just as you would in
- regular java code. For example,<blockquote>
- <pre>select emp.ename.substring(ename.indexOf(".")) from emp</pre>
- </blockquote>
- </li>
- <li><b>Saffron</b><b> relational expressions use Java's type system</b>. Runtime type
- conversions (of, say, a <code>DECIMAL(8,2)</code> column to a java <code>float
- </code>variable) are inefficient, error-prone, and can lose information.
- The names of tables and columns are checked at compile time, so your programs
- are more maintainable.</li>
- <li><b>Saffron</b><b> doesn't use 3-valued logic</b>. A condition in Saffron
- always
- evaluates to <code>true</code> or <code>false</code>, whereas in SQL a
- condition can sometimes evaluates to NULL.</li>
- <li><b>Saffron</b><b> expressions are evaluated efficiently</b>. If an expression
- involves only Java objects (no tables), the Saffron compiler will generate Java
- code which is almost as efficient as hand-written code.</li>
- <li><b>Saffron</b><b> accesses heterogeneous data better</b>. Consider the following
- query, which accesses data from an in-memory array (<code>manufacturers</code>), and tables from two
- different relational databases (<code>orderDb </code>and <code>productDb</code>).<blockquote>
- <pre>String[] manufacturers = new String[] {"Heinz", "Guinness"};
- int[] customers = (
- select order.customerId
- from productDb.products as product
- join orderDb.orders as order on order.productId == productId
- where product.manufacturer in manufacturers);</pre>
- </blockquote>
- Should the query be executed on the orders database or the products database?
- And how should the manufacturer list be sent over: as a hard-coded in-list, or
- by restarting the query for each manufacturer in the list? It really
- doesn't matter: the Saffron optimizer will decide for you.
- </li>
- </ol>
- <h2>How does Saffron improve Java?</h2>
- <p>We have already discussed how Saffron makes your life easier if you are
- embedding SQL statements in Java using JDBC. But Saffron can help even if
- you are not accessing data stored in a relational database. This is
- because in-memory data is often relational, and the algorithms to deal with it
- are often the familiar relational operations: sort, join, filter, union,
- intersect, minus.</p>
- <p>Suppose I have two lists of customers, and I want to create a list of both,
- with duplicates eliminated. An experienced Java programmer would probably
- write something like this:</p>
- <blockquote>
- <pre>Customer[] addCustomerLists(Customer[] list1, Customer[] list2) {
- Hashtable hashtable = new Hashtable();
- for (int i = 0; i < list1.length; i++) {
- Customer c = list1[i];
- hashtable.put(c, c);
- }
- for (int i = 0; i < list2.length; i++) {
- Customer c = list2[i];
- hashtable.put(c, c);
- }
- Customer[] list3 = new Customer[hashtable.size()];
- hashtable.asList().copyInto(list3);
- return list3;
- }</pre>
- </blockquote>
- <p>They would test it, notice that the customers are now in arbitrary order, and
- add</p>
- <blockquote>
- <pre> java.util.Arrays.sort(list3);</pre>
- </blockquote>
- <p>just before returning the result. In Saffron, a programmer would write</p>
- <blockquote>
- <pre>Customer[] list3 = select from (list1 union list2) c order by c;</pre>
- </blockquote>
- <p>They don't need to know Hashtable is the 'correct' data structure to use.
- They tell the system <i>what </i>they want, and the system figures <i>how </i>to
- compute it efficiently.</p>
- <p>This is a rather simple example. It gets better, because Saffron is a <i>
- closed </i>language. This means that you can use a Saffron expression anywhere </p>
- <h2>Grammar</h2>
- <p><i>Definition</i>. A <dfn>set expression</dfn> is
- an expression which returns more than one item; namely, a query, an array, an
- iterator or collection, or a <code>Table</code>.</p>
- <h3>Relational expressions</h3>
- <blockquote>
- <pre><code>select</code> [<code>{</code>[<i>expression</i> [<code>as </code><i>alias</i>]], ...<code>}</code> | <i>expression</i>]
- [<code>from </code><i>from-expression</i>]
- [<code>where </code><i>expression</i>]
- [<code>order by </code>[<i>expression </i>[<code>asc </code>| <code>desc</code>]], ...]</pre>
- <pre><i>expression </i><code>union </code><i>expression</i></pre>
- <pre><i>expression </i><code>intersect </code><i>expression</i></pre>
- <pre><i>expression </i><code>minus </code><i>expression</i></pre>
- </blockquote>
- <p>The syntax for the from list <i>from-expression</i> is as follows:</p>
- <blockquote>
- <pre><i>expression </i>[as alias]</pre>
- <pre><i>from-expression </i>[<code>left</code> | <code>right</code> | <code>full</code> | ] <code>join</code><i>
- </i> <i>from-expression </i>[<code>on</code> <i>condition</i>]</pre>
- </blockquote>
- <p>Notes:</p>
- <ol>
- <li>The select list can have 3 forms.<ol>
- <li>An <i><dfn>empty select list</dfn> </i>means return everything from the from clause.
- If the from clause has precisely one item, the result is the same type as that
- item (for example, each row from <code>select from emp</code> has type <code>
- Emp</code>); otherwise, the result is a record, with one field per from item
- (for example, each row from <code>select from emp join dept</code> has type
- <code>{Emp emp; Dept dept}</code>; each row
- from <code>select</code> has type <code>{}</code>).</li>
- <li>A <dfn>single-item select list</dfn> means
- return rows without boxing them into records. So, for example, each row from
- <code>select three from new int[] {1, 2, 3} as three</code> is an <code>int</code>. </li>
- <li>A <dfn>boxed select list</dfn> (enclosed in
- <code>{</code> ... <code>}</code>) means create a record for each row. The
- optimizer will create a synthetic class to describe the type of the row. Two
- rows with the same types and aliases, in the same order, will have the same
- synthetic class (and are therefore union-compatible). For example, each
- row from <code>select {three} from new int[] {1, 2, 3} as three</code> is
- <code>class Synthetic$124 { int three; }</code>; the type of <code>select {emp.empno,
- emp.ename as name} from emp</code> is <code>class Synthetic$527 { int empno;
- String name; }</code>.</li>
- </ol>
- </li>
- <li>Aliases of items in from and select lists default to variable, field, and
- method names. The aliases of the fields in <code>Emp emp; select emp,
- dept.deptno, dept.dname.substring(3) from dept</code> are <code>emp</code>,
- <code>deptno</code>, and <code>substring</code>, respectively. Other
- kinds of expression have no default alias, and therefore an alias must be
- specified. Whether implied or explicit, the aliases in a select or from
- list must be distinct.</li>
- <li>If the from clause is missing, the input to the select statement is a
- table with one row and no columns.</li>
- <li>Expressions in the from list must be set expressions.
- Any kind of expression is allowed in the select list; queries are returned as
- arrays.</li>
- <li>If <code>on</code><i> condition </i>is omitted from a <i>from-expression</i>
- it defaults to <code>true</code> (hence, a cartesian product). The keywords
- <code>left</code>, <code>right</code> and <code>full</code> indicate that an
- outer join is to be performed.</li>
- <li>No plans to do <i>aggregation</i>.</li>
- </ol>
- <h3>Boolean expressions</h3>
- <p>We add the following boolean expressions to Java:</p>
- <blockquote>
- <pre><i>expression</i> <code>in </code><i>expression</i></pre>
- <pre><code>exists </code>(<i>expression</i>)</pre>
- </blockquote>
- <p>The right-hand expressions must be set expressions.</p>
- <p>These expressions can, of course, occur inside relational expressions. The
- sub-query so created is said to be <dfn>correlated</dfn> if it refers to rows
- from the outer query. For example, the following query
- is correlated on <code>dept.deptno</code>:</p>
- <blockquote>
- <pre><code>select
- from </code>depts <code>as </code>dept
- <code>where exists </code>(
- <code>select from </code>emps <code>as </code>emp
- <code> where </code>emp.deptno == dept.deptno &&
- emp.gender.equals("F"))</pre>
- </blockquote>
- <p>We do not provide a <code>not</code> operator to negate these. Use the
- regular Java <code>!</code> operator, for example, <code>!("apple" in fruit)</code>
- or <code>!exists (select from fruit where fruit.equals("apple")</code>.</p>
- <h3>Statements</h3>
- <blockquote>
- <pre><code>for </code>([<i>type</i>]<i> variable</i> <code>in </code><i>expression</i>) {
- <i>statement list
- </i>}</pre>
- </blockquote>
- <h3>Aggregation</h3>
- <p>Simple example:</p>
- <blockquote>
- <pre><code>select</code> emp.deptno, sum(emp.sal)<code> as</code> sumSal<code>
- group by</code> emp.deptno<code>
- from </code>emps <code>as </code>emp</pre>
- </blockquote>
- <p>Unlike SQL, the <code>group by</code> clause comes before the <code>from</code>
- clause, and is mandatory if you want to aggregate. The equivalent of the SQL <code>select
- sum(emp.sal) from emp</code> has an empty <code>group by</code> clause in Saffron: <code>select sum(emp.sal)
- group by from emp</code>.</p>
- <p>In Saffron, the <code>where</code> clause is applied <i>after</i> aggregation
- has been performed, more like a SQL <code>having</code> clause than a <code>
- where</code> clause. Expressions in the <code>where</code> clause can only
- reference aggregations and grouping expressions, not ungrouped expressions. If
- you want to filter the rows going into the aggregation, use a subquery in the
- from list:</p>
- <blockquote>
- <pre><code>select</code> femp.deptno, sum(femp.sal)<code> as</code> sumSal<code>
- group by</code> femp.deptno<code>
- from </code>(
- <code>select</code>
- <code>from</code> emps <code>as</code> emp
- <code>where</code> emp.gender.equals("F")) <code>as </code>femp
- <code>where </code>avg(femp.sal) > 50000</pre>
- </blockquote>
- <p>There are 3 kinds of aggregation function:</p>
- <ol>
- <li>Builtin aggregation functions (such as <code>count</code>) are treated
- specially by the system, and generate optimized code.</li>
- <li>A regular java function which takes a set (array, collection, or iterator)
- will be called if its argument matches the element type in the actual usage.
- For example, if <code>emp.sal</code> is of type <code>double</code>, then
- <code>double sum(double[] a)</code> would match. Likewise, <code>int
- count(Iterator iter)</code> would match <code>select count(emp) from emps as
- emp</code>, because Emp can be up-cast to Object, the element type of Iterator.</li>
- <li>User-defined aggregations.</li>
- </ol>
- <pA user-defined aggregation is a class which implements <code>interface {@link
- saffron.AggregationExtender}</code>. If you invoke the {@link
- saffron.AggregationExtender#aggregate} pseudo-method, the system generates the
- right code to calculate the aggregation.</p>
- <p>Now let's see how the Saffron system invokes a user-defined aggregation
- while evaluating a query. Suppose we have written <code>class MySum</code>,
- which can total both <code> int</code> and <code>double</code> values, and
- evaluate the query</p>
- <blockquote>
- <pre><code>select</code> MySum(i)
- <code>from new int</code>[] {2, 3, 4} <code>as</code> i</pre>
- </blockquote>
- <p>The Saffron system makes following calls. Note how the <code>resolve</code>
- method deals with operator overloading. Note how <code>int</code>s are wrapped
- as <code>Integer</code>s then in arrays to be passed to <code>addOne</code>, and returned from
- <code>result</code>.</p>
- <blockquote>
- <pre>AggregatorClass sum = new MySum();
- AggregationExtender intSum = sum.resolve(new Class[] {Integer.TYPE});
- Object total = intSum.start(null);
- intSum.addOne(total, new Object[] {new Integer(2)});
- intSum.addOne(total, new Object[] {new Integer(4)});
- Object total2 = intSum.start(null);
- intSum.addOne(total2, new Object[] {new Integer(3)});
- intSum.merge(total, total2);
- System.out.println("Result is " +
- ((Integer) intSum.result(total)).intValue());</pre>
- </blockquote>
- <p>Examples of more complex aggregations:</p>
- <ul>
- <li>The <code>median</code> aggregation ({@link saffron.ext.Median}) requires a
- variable-size workspace. A typical implementation would use an {@link java.util.ArrayList} to collect the values, sort them, then return the middle one.</li>
- <li>The <code>nth</code> aggregation {@link saffron.ext.Nth}) returns the <i>n</i>th item of a set. <i>n</i>
- is passed to the <code>start</code> method, so must be independent of any
- particular row. Example:<blockquote>
- <pre><code>select</code> new Nth(3).aggregate(name) <code>from</code> (<code>select from</code> emps <code>order by</code> salary)</pre>
- </blockquote>
- </li>
- <li>The <code>closest</code> aggregation computes the nearest of a set of
- points to a given point. Its <code>start</code> method receives the latitude
- and longitude of the initial point (as <code>new Object[] {new Double(42), new
- Double(129)}</code>), and the <code>addOne</code> method receives the latitude
- and longitude of each subsequent point.<blockquote>
- <pre><code>select</code> closest(42, -129, office.latitude, office.longitude)
- <code>from</code> offices <code>as</code> office</pre>
- </blockquote>
- </li>
- </ul>
- <p>Notes & issues:</p>
- <ol>
- <li>Arguments are zero or more primitive types or objects. Primitives are
- passed wrapped as objects.</li>
- <li>A robust aggregation of median (not possible without invoking custom
- code-generation) would internally expand it to<blockquote>
- <pre><code>select </code>nth(count() / 2, sorted)
- <code>from</code> (<code>select from new int</code>[] {2, 3, 4} <code>as</code> i <code>order by</code> i) <code>as </code>sorted</pre>
- </blockquote>
- </li>
- <li>Distinct aggregations. Syntax? Implementation?</li>
- <li>Could we use aggregations as rolling averages? (I guess the <code>{@link saffron.AggregationExtender}.result()</code> method would have to be non-destructive.)</li>
- </ol>
- <h2>Type-checking</h2>
- <p>One of Java's most important features is its strong type system. To
- take advantage of this, every relational expression is always has a result which
- is an array type.</p>
- <p>This doesn't mean that the array is actually created. If you use the
- <code>for (... in ...)</code> construct, for example, the system will implement the loop
- using an iterator, and is unlikely to construct the result as an array.
- And the optimizer tries hard not to convert intermediate sets, such as that
- created by an <code>in </code>operator, into arrays.</p>
- <p>Regular java expressions can also be used as relational expressions:</p>
- <ul>
- <li>arrays</li>
- <li>collections (<code>Collection</code>, including <code>Vector</code>)</li>
- <li>iterators (<code>Enumeration</code>, <code>Iterator</code>)</li>
- </ul>
- <p>The problem with collections and iterators (until generics arrive in Java
- 1.4) is that we don't know what type of objects they contain. We allow the
- programmer to specify that type information using by a dummy cast to an array
- type. For example:</p>
- <blockquote>
- <pre>Vector fruit = new Vector();
- fruit.addElement("apple");
- fruit.addElement("banana");
- fruit.addElement("orange");
- fruit.addElement("mango");
- String[] tropicalFruit = (select fruit
- from (String[]) fruit
- where fruit.indexOf("n") >= 0);</pre>
- </blockquote>
- <p>You can cast a collection or iterator to an array of primitive types,
- provided that the collection does not contain null values.</p>
- <h3>Casting input datasets</h3>
- <p>Some data sources, such as vectors, iterators and external data sources such
- as JDBC tables, contain very little type information, and this makes it
- difficult to write concise queries. For example,</p>
- <blockquote>
- <pre>Vector emps = new Vector();
- emps.addElement(fred);
- emps.addElement(barney);
- for (emp in select from emps as emp where ((Emp) emp).deptno == 20) {
- print(((Emp) emp).name);
- }</pre>
- </blockquote>
- <p>Better to tell the system just once that <code>emps </code>is a collection of
- <code>Emp </code>objects:</p>
- <blockquote>
- <pre>for (emp in select from (Emp[]) emps as emp where emp.deptno == 20) {
- print(emp.name);
- }</pre>
- </blockquote>
- <p>The cast does not necessarily cause the vector to be converted to an array:
- we are simply providing the Saffron compiler with more type information, and
- allowing it to choose the most efficient plan.</p>
- <h3>Determining the output type</h3>
- <p>By default, a relational expression returns an array. You can produce results
- in a different format by using a dummy cast, like this:</p>
- <blockquote>
- <pre>Iterator emps;
- Iterator femaleEmps = (Iterator)
- (select from (Emp[]) emps as emp
- where emp.gender.equals("F"));</pre>
- </blockquote>
- <p>This is particularly useful where the output is large (even infinite!) and
- you to iterate over the first few rows without computing the whole result.</p>
- <p>The available types are {@link java.util.Iterator}, {@link
- java.util.Enumeration}, {@link java.util.Collection}, {@link java.util.Vector},
- and <code>type[]</code>. In future, we may add {@link java.sql.ResultSet}.</p>
- <h3><a name="Tables_and_Schemas">Tables and Schemas</a></h3>
- <p>The interfaces Table, Schema and Connection allow you to access data sources
- which have arbitrary access methods. A table (<code>interface {@link
- saffron.Table}</code>) represents a set of relational data, and it provides
- methods to describe the data source, discover its access methods, and so
- forth. Schemas (<code>interface {@link saffron.Schema}</code>) make it easy to
- expose a set of tables and the information necessary to compile programs which
- use them. If you can to access data in a JDBC database, derive a class from
- <code>class {@link saffron.ext.ReflectSchema}</code> and create a member of
- type {@link saffron.ext.JdbcTable} for each table your wish to access
- (<code>class {@link sales.Sales}</code> is an example of this.)</p>
- <p>Let's look at what happens when you compile a program which uses a
- schema.</p>
- <ol>
- <li>The compiler notices a piece of code (<code>sales.emps</code>) which
- accesses a member or method of an object (<code>sales</code>) which implements <code>
- interface {@link saffron.Connection}</code>.</li>
- <li>The compiler calls the static method <code>getSchema</code> on that
- object. (Why call a static method, not a method in the interface?
- Well, remember that this is happening at compile time. The real object
- does not exist yet.)</li>
- <li>The compiler invokes the schema's <code>getMemberTable</code> or <code>
- getMethodTable</code> method, to retrieve a <code>Table</code>. If there is no
- such table, these methods may throw <code>openjava.mop.NoSuchMemberException</code>,
- which will be reported as a compilation error.</li>
- <li>The compiler invokes the table's <code>toRel</code> method, passing the
- piece of parse tree which represents the expression which will yields the
- schema. (The <i>real </i>schema object only gets created at run time,
- remember.)</li>
- <li>The compiler calls schema's getTable method with the parse object
- be for example, class Sales.</li>
- </ol>
- <p>In particular, we provide <code>abstract class {@link saffron.ext.JdbcTable}
- implements {@link saffron.Table}</code> to make it easy to access tables via
- JDBC. We recommend that you group tables into a schema class, which extends
- <code>class {@link saffron.ext.ReflectSchema}</code>. For
- example,</p><blockquote>
- <pre>class SalesConnection
- extends saffron.ext.JdbcConnection
- implements saffron.Connection
- {
- public SalesConnection(String connect) {
- super(connect);
- }
- // required for saffron.Connection
- public static saffron.Schema getSchema() {
- return new SalesSchema();
- }
- };
- class SalesSchema
- extends saffron.ext.ReflectSchema
- implements saffron.Schema
- {
- public static class Emp {
- public int empno;
- public String name;
- public int deptno;
- };
- public static class Dept {
- public int deptno;
- public String name;
- };
- public saffron.Table emp = new saffron.ext.JdbcTable(
- this, "EMP", Emp.class);
- public saffron.Table depts = new saffron.ext.JdbcTable(
- this, "DEPT", Dept.class);
- };
- SalesConnection sales = new SalesConnection("jdbc:odbc:MyDatabase");
- for (emp in sales.emps) {
- <i>some code</i>
- }</pre>
- </blockquote>
- <p>The Saffron compiler creates an instance of the schema, and interrogates it
- regarding the objects it contains. When it sees the expression <code>sales.emps</code>,
- it invokes <code>SalesConnection.getSchema().getTableForMember("emps")</code> to
- get a {@link saffron.Table} which describes <code>emps</code> and, in particular,
- that it returns rows of type <code>Emp</code>. The translator then calls
- the table's <code>toRel()</code> method to create a relational expression node;
- in this case a {@link saffron.rel.TableAccess}. (The translator actually converts <code>
- sales.emp</code> into a brief intermediate expression <code>(Emp[])
- sales.contentsAsArray("emps")</code>. The <code>contentsAsArray</code> function
- is a placeholder which has special meaning to the translator. Note how an array
- cast is used to provide type information.)</p>
- <p>As well as type information, the schema could in future provide information
- about volumetrics, indexes, and even custom rules for optimizing the query.</p>
- <p><i>todo</i>: Expose <code>Map</code>s (including <code>Hashtable</code>s) as
- relations with 2 columns.</p>
- <h3>Non-relational schemas</h3>
- <p>A schema could be used to present non-relational data in a relational format.
- It would know that 'getDept(40)' was the same as 'select from dept where deptno
- = 40' and 'dept.getEmployees()' is the same as 'select from emp where emp.deptno
- == dept.deptno'. We would need a mechanism to construct an <dfn>
- extent</dfn> (the set of all instances of a given
- entity). The schema could declare an explicit method to find the instances of an
- entity; another way is to use mandatory relationships (for example, since every
- employee must work in a department, given all departments, we can find all
- employees by traversing the 'works in' relationship). It helps if the
- relationship is many- (or one-) to-one, since we don't need to eliminate
- duplicates. There could be more than one access path to an entity.</p>
- <p>How to represent such mappings? Views: dept.getEmployees() is equivalent to
- select from emp where emp.deptno == dept.deptno. Better, parameterized views.
- Example {@link sales.Reflect} presents Java's type system ({@link
- java.lang.Class}, {@java.lang.reflect.Field}, <i>et cetera</i>) as
- relations.</p>
- <p><i>todo: </i>Could allow a type (e.g. <code>boolean</code>,
- <code>int</code>) to serve as a relation. Could implement by iterating over all
- possible values of that type (rarely practical) or joining to an attribute of a
- more finite relation. Example: <code>select from int as i where i in (select
- emp.deptno from emp)</code>.</p>
- <h2>Null semantics</h2>
- <p>Saffron uses Java, not SQL, null semantics. <code>null</code>
- is always the Java null value. Every boolean expression evaluates to
- either <code>true </code>or <code>false</code>. And yes, <code>null ==
- null</code> evaluates to <code>true</code>.</p>
- <h2>Comparison semantics</h2>
- <p>In Saffron, as in Java, the <code>==</code> operator
- compares addresses, not content. The expression <code>"abc" == "a"
- + "bc"</code> may, on some Java systems, evaluate to false, so you should
- write <code>"abc".equals("a" + "bc")</code> instead.</p>
- <p>The <code>in </code>operator expands to expressions which use the <code>
- equals()</code> and <code>hashCode()</code> methods, so you should only uses it
- on classes for which these methods are well-behaved. And the expression</p>
- <blockquote>
- <pre>null in new String[] {"apple", "orange"}</pre>
- </blockquote>
- <p>will give a <code>NullPointerException</code>.</p>
- <p>The <code>order by</code> operator requires that objects implement the <code>
- interface java.lang.Comparable</code>. (<i>todo: Compile- or run-time error if
- not?</i>)</p>
- <p>Primitive types are compared using the primitive operators <code>==</code>,
- <code><</code> etc.</p>
- <p>Two instances of a synthetic classes are equal if and only if each of their
- components are equal. Synthetic classes are compared lexicographically.
- Example:</p>
- <blockquote>
- <pre>select
- from (select from twos join threes) as i
- join (select from twos join threes) as j
- where i < j</pre>
- </blockquote>
- <h2>Primitive values</h2>
- <p>Saffron accepts primitive values, such as <code>int</code>, <code>char</code>,
- <code>double</code>, in most places that it accepts objects. A couple of
- exceptions:</p>
- <ol>
- <li>Outer joins need to produce null values. For example,<pre>select {twos, threes}
- from new int[] {2, 4, 6, 8} as twos
- left join new int[] {3, 6, 9} as threes
- on twos == threes</pre>
- would produce<pre>2, 0
- 4, 0
- 6, 6
- 8, 0 </pre>
- </li>
- <li>Saffron uses Java collections to implement certain algorithms. For example,
- in the following expression, it uses a Hashtable to detect values which have
- already been emitted.
- <pre>select from new int[] {2, 4, 6, 8} as twos
- minus
- select from new int[] {3, 6, 9} as threes</pre>
- It automatically boxes and unboxes primitive values.</li>
- </ol>
- <h2>Planning and evaluation</h2>
- <p>The Saffron system runs in static and dynamic mode.</p>
- <p>In static mode, relational expressions are embedded into Java programs. A
- pre-processor checks that the relational expressions are valid, and converts
- them into regular java. There are many ways to evaluate a relational expression, </p>
- <p>A query optimizer chooses an evaluation plan.</p>
- <h2>Evaluation model</h2>
- <h2>Implementation issues</h2>
- <h3>Composite rows</h3>
- <p>If a relational expression has more than one expression in its <code>select
- </code>clause, it produces rows which are not atomic. For example,</p>
- <blockquote>
- <pre>int[] twos = new int[] {2,4,6,8};
- int[] threes = new int[] {3,6,9};
- for (i in (select twos, threes twos cross join threes)) {
- ...
- }</pre>
- </blockquote>
- <p>creates rows which are pairs of <code>int</code>s. It is translated to</p>
- <blockquote>
- <pre>int[] twos = new int[] {2,4,6,8};
- int[] threes = new int[] {3,6,9};
- for (int x = 0; x < twos.length; x++) {
- for (int y = 0; y < threes.length; y++) {
- Synth1 i = new Synth1(twos[x], threes[y]);
- ...
- }
- }</pre>
- </blockquote>
- <p>This illustrates two issues with composite rows. First, these rows need
- a type. The Saffron compiler solves this issue by generating a synthetic
- class (in this case, <code>class Synth1</code>). We don't know what this
- class will be called, so we use <code>for (<i>variable</i> in ...)</code>
- syntax, which lets the system infer the type for us.</p>
- <p>Second, the system creates a new object for every row, which might be
- expensive. In this case, they are emitted from the query, so we can't avoid
- creating them. If the composite rows are not emitted, the Saffron optimizer will
- do its best to avoid creating them.</p>
- <h2>How the optimizer works</h2>
- <p>The job of the optimizer is to convert a parse tree containing Saffron expressions into a regular parse tree. Its input and output, therefore, are both
- parse trees.</p>
- <p>In between times, it searches for Saffron expressions, and converts them to
- tree of intermediate relational expressions (<code>class {@link saffron.rel.Rel}</code>).</p>
- <p>These relational expressions still contain fragments of regular Java
- expressions for select expressions, method calls, and so forth. The Java
- expressions are translated into internal form. Except for references to
- variables outside the Saffron expression, variables tend to disappear, to be
- replaced by mysterious-looking expressions. For example, in the expression</p>
- <blockquote>
- <pre>select
- from location
- join (select
- from emp
- join dept on emp.deptno == dept.deptno)
- on location.state.equals(dept.state) </pre>
- </blockquote>
- <p>the join condition <code>location.state.equals(dept.state)</code> becomes <code>$input0.state.equals($input1.$f1.state)</code>.
- Here, <code>$input0</code> and <code>$input1</code> represent the 2
- inputs to the join node (which is the context for this Java fragment); <code>$f1</code>
- is the first field of the composite row produced by the inner query. That
- field's type is Dept, and therefore it has a <code>state</code> field.</p>
- <p>The optimizer uses a dynamic programming approach based upon that used by the
- Volcano optimizer. Starting with the initial expression, it applies
- transformation rules to convert it, and its sub-expressions, into semantically
- equivalent expressions. It estimates the cost of each expression, and
- cultivates variants of expressions until it finds cheap implementations of them.</p>
- <p>An expression can be implemented using several <dfn>calling
- conventions</dfn>. Calling conventions include array,
- vector, iterator. The most important (and efficient) calling convention is
- inline. Inline means that a piece of code will be executed for every row
- in the expression; no collection of objects is created, just <i>stuff happens</i>.
- An inline implementation of</p>
- <blockquote>
- <pre>for (o in (select from emp join dept on emp.deptno == dept.deptno)) {
- System.out.println(o.emp.ename + " works in " + o.dept.dname);
- }</pre>
- </blockquote>
- <p>would be</p>
- <blockquote>
- <pre>for (int i = 0; i < emp.length; i++) {
- for (int j = 0; j < dept.length; j++) {
- if (emp[i].deptno == dept[j].deptno) {
- Synth1 o = new Synth1(emp[i], dept[j]);
- System.out.println(o.emp.ename + " works in " + o.dept.dname);
- }
- }
- }</pre>
- </blockquote>
- <p>This is rather tricky to implement, because the stuff in the middle of the
- for loop corresponds to stuff at the top of Rel tree. Each Rel node's
- implement method has to handle a callback from a child saying 'I have placed a
- row in variable <i>xyz</i>; please generate code to do something with it.'</p>
- <p>If two expressions have the same meaning but different calling conventions,
- they are not immediately interchangeable. It may be cheaper to evaluate a
- particular expression as an array, but another algorithm may find it more
- convenient if that expression is a vector. For this reason, there are
- special kinds of rule called <dfn>conversion rules</dfn>.
- They can convert from one convention to another, but they have a non-zero cost.
- The optimizer may decide that it is cheaper to build a vector in the first
- place, rather than paying the cost of converting it to a vector later.</p>
- <p>There is a special kind of node called a <dfn>Terminator</dfn>
- (<code>class {@link saffron.rel.java.Terminator}</code>).
- After a tree has been optimized, a Terminator is placed on the root to drive
- code-generation. It behaves somewhat like a Rel node (in that it has
- inputs, and can be called-back to generate code), but it does not actually
- produce any result. (<i>todo</i>: Maybe Terminator should be a calling
- convention?)</p>
- <h1>Appendix 1: Design notes</h1>
- <h2>Syntax of <code>group by</code></h2>
- <p>There are several problems with a SQL-like aggregation:</p>
- <ol>
- <li>If a <code>select</code> statement contains a <code>group by</code> clause,
- the validation rules for expressions within that statement are altered: only
- expressions which are in the <code>group by</code> can be used outside of an
- aggregate function.</li>
- <li>Even if there is no <code>group by</code> clause present, aggregation
- implicitly occurs if an aggregate function occurs anywhere in the <code>select</code>
- statement. For example, every expression in<blockquote>
- <pre>select emp.gender.toLowerCase(), emp.name
- from emp
- where emp.name.startsWith("P")</pre>
- </blockquote>
- <p>becomes illegal if an aggregation is added:</p>
- <blockquote>
- <pre>select emp.gender.toLowerCase(), emp.name
- from emp
- where emp.name.startsWith("P") && min(emp.salary) > 50000</pre>
- </blockquote>
- <p>This syntax problem could be fixed fairly easily, by requiring an empty <code>
- group by</code> clause in such cases.</p>
- </li>
- <li>If we allow functions on sets to be used as aggregation functions, then
- certain statements are ambiguous. Consider the following:<blockquote>
- <pre>int count(Object[] a) {
- int n = 0;
- for (int i = 0; i < a.length; i++) {
- if (a[i] != null) {
- n++;
- }
- }
- return n;
- }
- class Emp {
- String name;
- Person[] dependents;
- };
- select count(emp.dependents) from emp;</pre>
- </blockquote>
- <p>It is not clear whether Saffron should return a row for each employee with
- the number of the dependents that employee has, or a single number, the number
- of employees whose <code>dependents</code> field is not null.</li>
- </ol>
- <p>The underlying problem is that SQL aggregation makes a scalar expression
- behave like a set, and this undermines the usual rules of expression validation
- and type-checking. We can make the set explicit, but the resulting expression</p>
- <blockquote>
- <pre>select deptno,
- sum(select emp.sal from emps as emp
- where emp.deptno == deptno) as sumSal,
- min(select emp.sal from emps as emp
- where emp.deptno == deptno) as minSal
- from (select distinct deptno from emps) as deptno</pre>
- </blockquote>
- <p>is more verbose -- the underlying set, <code>emps</code>, is referenced three
- times -- and perhaps less efficient.</p>
- <p>We need to make grouping an aggregation explicit.</p>
- <p>QUEL aggregations are more like scalar sub-queries. They are more verbose. They are also more powerful: note how we can sum only female employees'
- salaries. In this syntax, the set of values to be aggregated over (in this case,
- <code>select ... from emp</code>) is formed for each aggregate expression; the
- optimizer recognizes common expressions and evaluates them only once. Example:</p>
- <blockquote>
- <pre><code>select distinct </code>emp.deptno,
- min(<code>select</code> emp.sal <code>from </code>emps <code>as </code>emp2
- <code>where </code>emp2.deptno == emp.deptno) <code>as </code>minSal
- sum(<code>select</code> emp.sal <code>from </code>emps <code>as </code>emp2
- <code>where </code>emp2.deptno == emp.deptno &&
- gender.equals("F")) <code>as </code>sumFemaleSal
- <code>from </code>emps <code>as </code>emp</pre>
- </blockquote>
- <h2>Aggregation types</h2>
- <p>Jim Gray et al. [<a href="#GBLP96">GBLP96</a>] classify aggregation operators into 3
- categories, in increasing order of complexity:</p>
- <ul>
- <li><dfn>Distributive</dfn> aggregations requires fixed storage, and two totals can
- be merged. Example: <code>sum</code>, <code>count</code>, <code>min</code>,
- <code>max</code>. To merge two partial sums, take <code>sum(S1 u S2) =
- sum({sum(S1), sum(S2)})</code>. (Here, <code>S1</code> and <code>S2</code> are
- the sets of values which formed the partial sums, and <code>S1 u S2</code> is
- their union, the set of all values.) <code>min</code> and <code>max</code>
- work the same. <code>count</code> is different, in that it cannot be used to
- merge two sums; instead, you use <code>+</code>: <code>count(S1 u S2) =
- count(S1) + count(S2)</code>.</li>
- <li><dfn>Algebraic</dfn> aggregations also require
- fixed storage, but two totals cannot be merged. Example: <code>avg</code>,
- <code>stddev</code>.</li>
- <li><dfn>Holistic</dfn> aggregations require
- variable storage. Example: <code>median</code>, <code>countDistinct</code>.</li>
- </ul>
- <p>Some algebraic aggregates can be decomposed into distributive aggregates, as
- follows: avg(S) = sum(S) / count(S). We could provide better
- support for these; maybe a new method <code>Expression {@link saffron.rel.Aggregation}.preExpand()</code>, which is called before the parse tree is
- converted into a {@link saffron.rel.Aggregate} relational operator.</p>
- <p>It is hard to implement holistic aggregations efficiently. For example, <code>
- median</code> can be translated into an efficient relational expressions by
- hand:</p>
- <blockquote>
- <pre>select {nth(countEmp / 2, salary) as medianSalary}
- group by {emp.deptno}
- from (select from emp order by emp.deptno) empSorted
- join (select {emp.deptno, count() as empCount}
- group by {emp.deptno} from emp) empCounted
- on empSorted.deptno == empCounted.deptno</pre>
- </blockquote>
- <p>Similarly distinct <code>count</code>:</p>
- <blockquote>
- <pre>select {
- empGrouped.deptno,
- count(empGrouped.gender) as countDistinctGender}
- group by {empGrouped.deptno}
- from (
- select distinct {emp.deptno, emp.gender}
- from emps as emp) as empGrouped</pre>
- </blockquote>
- <p>We should supply an extender which can transform either the parse tree or the
- relational operator tree in such a way.</p>
- <h2>Start parameters for aggregations</h2>
- <p>Aggregations can take parameters per query, per group, and per row. Examples:</p>
- <ul>
- <li>Conventional aggregations take parameters only per row.</li>
- <li>The nth(count, value), count is per-group, and value is per-row.</li>
- <li>In min(collator, value), collator is per-query, and value is per-row.</li>
- </ul>
- <p>One way to provide per-group parameters is for aggregations to have two sets
- of parameters: the first passed at start time, the other passed per row.</p>
- <p>But custom aggregations can also do the job. They are a little less efficient,
- and the syntax is a little more verbose, but what's going on is clearer.
- </p>
- <p>Custom aggregation to do locale-specific min:</p>
- <blockquote>
- <pre>select {emp.dept.deptno,
- new LocaleMin(emp.dept.locale).apply(name) minName}
- group by {emp.dept}
- from emps as emp</pre>
- </blockquote>
- <h2>Problems with iterators</h2>
- <p><i>The ideas in this section are not implemented. So, if your iterator is
- infinite, the optimizer may generate a plan which never finishes. And it may
- generate a plan which requires iterators to be restarted.</i></p>
- <p>Iterators (and enumerations) present peculiarities that finite data sources such as JDBC tables, arrays,
- and collections don't present.</p>
- <p><b>Restarting an iterator</b></p>
- <p>First, they need to be restarted. Suppose the optimizer implemented</p>
- <blockquote>
- <pre>Iterator nouns, verbs;
- for (i in (String[]) verbs intersect (String[]) nouns) {
- print(verb + " is both a noun and a verb");
- }</pre>
- </blockquote>
- <p>as a nested loops join,</p>
- <blockquote>
- <pre>while (nouns.hasNext()) {
- String noun = (String) nouns.next();
- while (verbs.hasNext()) {
- String verb = (String) verbs.next();
- if (verb.equals(noun)) {
- print(verb " is both a noun and a verb");
- }
- }
- }</pre>
- </blockquote>
- <p>This doesn't work. The problem is, the second time around the outer loop,
- <code>verbs.hasNext()</code> has already returned <code>false</code>.</p>
- <p>If possible, you should use a class which implements {@link java.util.Collection}
- or {@link saffron.runtime.Iterable}; the optimizer calls the <code>iterate()</code>
- method each time it needs to start an iteration.</p>
- <p>If you supply an {@link java.util.Iterator}, the optimizer wraps it in a {@link
- saffron.runtime.BufferedIterator}.</p>
- <p><i>Enhancement</i>: If the plan never requires that the iterator is
- restarted, the optimizer should leave the iterator unwrapped.</p>
- <p><b>Infinite iterators</b></p>
- <p>The second problem with iterators is that they can be infinite. This is a bad
- idea if you're copying the results to an array, but you can achieve spectacular
- results if you're prepared to go with the whole <i>streaming thing</i>. Here's
- an example:</p>
- <blockquote>
- <pre>class MultiplesOf implements Iterator {
- int i, n;
- MultiplesOf(int n) {
- this.n = n;
- this.i = 0;
- }
- public boolean hasNext() {
- return true;
- }
- public Object next() {
- return i += n;
- }
- public void remove() {
- throw new UnsupportedOperationException();
- }
- }
- Iterator twos = new MultiplesOf(2);
- Iterator tens = (Iterator) (select from twos where twos % 5 == 0);</pre>
- </blockquote>
- <p>We have created one infinite iterator from another!</p>
- <p>The problem is that the optimizer might be tempted to do stupid things, like
- storing the results in an intermediate array. We tell it not to by implementing
- an empty interface, {@link saffron.runtime.Infinite}.</p>
- <p><i>We have not implemented support for infinite iterators. Expressions
- involving infinite iterators might just go on for ever.</i></p>
- <h1>Appendix 2: Futures</h1>
- <p>List:</p>
- <ol>
- <li>Aggregation<ol>
- <li>Allow built-in aggregations in regular java (that is, outside <code>
- select</code> statements with a <code>group by</code> clause). Example #1:
- <code>select {deptno, count(select from emps as e2 where e2.deptno ==
- e1.deptno) as countEmp} from emps as e1</code>. Example #2: <code>if (count(select from
- emps as e2 where e2.deptno == e1.deptno</code>.</li>
- </ol>
- </li>
- <li>Schema on top of persistence layer -- JDO say. Or JAXB. Talk to Castor
- folks about it.</li>
- <li>Refactoring.<ol>
- <li>search for obsolete & deprecated</li>
- </ol>
- </li>
- <li>Demos<ol>
- <li>Swing application where you can enter expressions or statements, and
- execute them. Can create a set of named connections.</li>
- </ol>
- </li>
- <li>Environment<ol>
- <li>log</li>
- <li>hsqldb</li>
- </ol>
- </li>
- <li>OpenJava<ol>
- <li>Obsolete MSJavaCompiler maybe</li>
- <li>Drive compiler from system properties</li>
- <li>Drive parser from system properties (or only have one parser, and turn off
- saffron features programmatically).</li>
- </ol>
- </li>
- </ol>
- <h1>Appendix 3: References</h1>
- <dl>
- <dt><a name="GBLP96">GBLP96</a></dt>
- <dd>J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: A relational
- aggregation operator generalizing group-by, cross-tab, and sub-totals. In <em>
- IEEE 12th International Conference on Data Engineering</em>, pages 152—159,
- 1996 (<a href="http://research.microsoft.com/~gray/DataCube.doc">doc</a>).</dd>
- </dl>
- <table border="1" width="100%" class="clsStd">
- <tr>
- <td>End <i>$Id: //guest/julian_hyde/saffron/doc/overview.html#5 $</i></td>
- </tr>
- </table>
- </html>
- </body>
# | Change | User | Description | Committed | |
---|---|---|---|---|---|
#5 | 1748 | Julian Hyde | saffron: convert unit tests to JUnit; add CallingConvention.ITERABLE; lots of other stuf...f; release 0.1. « |
23 years ago | |
#4 | 1485 | Julian Hyde | Saffron/Mondrian: John Sichi's feedback. | 23 years ago | |
#3 | 1474 | Julian Hyde |
saffron: Aggregations are working. Renamed 'aggregator' to 'aggregation'. |
23 years ago | |
#2 | 1467 | Julian Hyde |
saffron: First saffron check-in; incorporate my changes to openjava. |
23 years ago | |
#1 | 1461 | Julian Hyde | saffron: First check in. Just documents, and the unmodified OpenJava 20001010 source... files. « |
23 years ago |