Wednesday, February 28, 2007

Review topic: SQL, overview

WROX Publishing offers an Microsoft SQL Server 7.0 book dated 1999 (I am sure that there are newer versions) authored by Robert Vieira. He offers a lot of detailed discussions of normalization, and a perspective on various techniques for complex set processing, a comparison of
(1) nested queries with subqueries
(2) correlated subqueries (processing in both directions)
(3) joins
(4) cursors.

A cursor offers a query, which may be very complex, that returns a data set whose rows may then be processed sequentially as in a procedural language program, where there is a lot of processing for each "record" (like a row).

The SQL community tries to get programmers to think in terms of "set processing" whenever possible, which is why tests on the subject often present puzzling and complicated queries. But when a query is well written, complicated programming or hash logic outside the query is often avoided. On the other hand, system performance must be considered. Correlated subqueries and cursors sometimes degrade performance.

Thursday, February 15, 2007

Review topics: SQL wrap-up

ANSI SQL sources seem to indicate that UNION functions (which combine two or more result sets) are supposed to eliminate duplicates. However, with DB2, Murach indicates that it may or may not, depending on how it is coded, and a UNION may or may not provide acceptable performance in production. In DB2, UNIONs with unnamed intermediate columns apparently retain the names in the first SELECT, which can give misleading results.

With the ORDER BY, the acceptable options are ASC and DESC.

Column names should always be separated by commas.

A JOIN, compared to a UNION, brings together two or more tables as matched (with unmatched row-column values for FULLs) into a single result set.

A self-join relates to a table joined to itself.

A subquery must always be coded in parentheses.

When multiple tables are "joined" in a query without a join condition, a Cartesian Product results.

A subquery after an IN list can retain only one column.

Tuesday, February 13, 2007

SQL review -- more topics

Microsoft Access SQL supports the crosstab funtionality of converting rows to columns through the TRANSFORM and PIVOT parameters. Here is a web reference. This does not appear to be part of standard ANSI.

When you design a query in Access with the query wizard, and then go into View in the toolbar, you can look at the SQL statement that it generates. In some cases, punctuation is added to make it easier to read.

There are some conventions in Ansi SQL that Access does not support, such as using an AS parameter directly in a subsequent WHERE clause.

Access documentation claims that it re-engineers correlated subqueries for more efficient operation (rather than redoing the whole subordinate query multiple times) when possible.

More on Ansi:

Note the difference between a DROP TABLE, where all rows are lost and are not recoverable, and a DELETE FROM followed by a DROP. The DELETE backs up every record on the log.

A foreign key can refer to a primary key or to a unique constraint. There is no order to a constraint.

A subquery in an IN list may reference only one column.

A JOIN (inner or outer) is not a set operation. A MINUS is a valid set operation.

Ansi supports single quotes only.

NULLs cannot be a key value.

Remember in removing data that you can't delete a row that is referenced for a foreign key.

Wednesday, February 07, 2007

Review topic: SQL, correlated subqueries, and more misc

If you wanted to know all of the customers in an invoice table that haven't been coded in a master customer table, you could code a correlated subquery. DB2, for example, will run the inner subquery for every row of the outer query, which is often inefficient. To find a "non match" you would code a WHERE NOT EXISTS before the subquery, which still has to be run for every row. A more efficient way to code is simply to use the "NOT IN" option and code the second query.

A synonym for the INNER JOIN is "equi-join" (or equijoin).

A UNION will eliminate duplicates but otherwise display all output "rolled up" and sometimes column names will have misleading meaning.

The SELECT DISTINCT ignores NULL, but the COUNT(*) includes NULL.

To define an index, the first part of the composite key must be used. An index cannot be created on a VIEW.

ANSI SQL always recongizes aggregate functions (SUM, AVG) but does not necessarily offer triggers, checkpoints. b-trees, or even stored procedures. But stored procedures are typically very important in many large DB2 shops.

When a NULL is concatenated in any string, you get a NULL result. This may be important in shops that do batch cycle database replication from a mainframe legacy (DB2) to a midtier in another DBMS.

ANSI SQL recognizes CASE structures (as in most languages) and even remainder on division.

To get rid of an entire table, you use the DROP command (not DELETE) and have to be careful about foreign key dependencies (INT REFERENCES) when the table was created.

View names must be unique and may not specify orders, because of the way they are executed. Views created by SELECT DISTINCT can't be updated.

Back in the early 1990s, RBsase (from Microrim) was the first relational dabatase to offer SQL on a home PC (it as a specific menu item). Around 1990 or so, Ashton-Tate created dBase4, where you had to go into SQL mode to use SQL (otherwise it was like dBase III+). RBase made a lot of creating views.

When Microsoft offers testdrive versions of Visual Studio ADO .NET, it supplies a script to create all the tables in SQL Server, with many CREATE TABLE and INSERT statements to create the test databases for the classwork exercises for its Visual C# and Visual Basic, etc. books. The user could conceivably pattern his own database construction after these.

A table can have only one clustered index, but may have other non-clustered indices.

Here is a reference for normal forms. The first normal form would maintain that a column must be an elementary item and not itself a list. The second normal form requires that a item depend on only one key. The third normal form is the "if and only if" rule and removes any "transitive dependency". That is: Data should depend on the key, the whole key, and nothing but the key!

Tuesday, February 06, 2007

Review Topic: SQL, various comments 1

It appears that a foreign key may or may not have the same column name as a column in the original table. Most textbook examples use a different name. In my own past experiences with MySQL, I found using the same column name and qualifying with the table name worked fine.

The WHERE clause is not required in a SELECT query.

A SELECT with a GROUP BY followed by a HAVING can apply different levels of elimination. On an Invoice table, the SELECT could eliminate orders less than a certain $$ amount, and then the HAVING could eliminate groups (say by customer number) that don't meet a minimum number of orders. Imagine such a SELECT, for example, in reporting on salesmen who did or did not meet a quota. The HAVING clause is coded at the end of the statement.

In DB2, NULL is always the last value in a collating sequence. It is like "missing" in SAS. A NULL value for a column in a row does not count in a true or false determination. The AVG function does not include the count of rows with NULL values in the denominator (it does appear to be included in COUNT).

You can code IS NULL but you do not code "= NULL" because NULL is not a value.

It is common for SQL examinations to ask how many rows are returned by a query, so the rules for NULL are important.

In Microsoft Access, the RUNSQL function in the Macro facility appears to requre the creation of a new table to be accepted.