Intro To Structured Query Language (SQL)

The intent of this tutorial is to provide a basic introduction to writing queries so that beginners can begin to extract data from an existing relational database. While there are other SQL commands that can be executed to create tables and indexes or to insert, update and delete data, this tutorial will concentrate only on SQL queries to assist the reader in retrieving pre-existing data through definition and relevant example.

The SQL used in this tutorial is "ANSI" (American National Standards Institute), or standard SQL. No specific database management systems features will be referenced.

SQL is a query language that allows access to data residing in relational database management systems (RDBMS), such as Sybase, Oracle, Informix, DB2, Microsoft SQL Server, Access and many others. To retrieve information users execute 'queries' to pull the requested information from the database using criteria that is defined by the user.
 
A query, in its simplest form is constructed using the following basic query statements SELECT, FROM, WHERE and ORDER BY. The SELECT clause defines what columns or fields you want to see in your results, the FROM clause defines from what table the columns reside in, the WHERE clause defines any special criteria that must be met in order to be displayed, and finally the ORDER BY clause in which you define the sequence you want to display the results. While the only two query clauses that are required are SELECT and FROM, they are almost always accompanied by the WHERE and ORDER BY clauses to restrict the amount of data retrieved and to present it in an orderly fashion.  

Database Tables
Keys
SQL Select Clause
SQL Where Clause
Order By
Compound Conditions
Using IN & BETWEEN
Using LIKE
Joining Tables
Using Aliases & Column Concatenation
Aggregate Functions
Group By
Distinct
For More Information

Database Tables

Tables are probably a good place to start. Data that resides in a relational database is stored in what are referred to as tables. If you think of a table as a series of rows and columns similar to a spreadsheet then then you should have no problem following this tutorial. Each column contains a piece of data such as first name and birth date, while each row contains all of the data elements for a single record, or individual. For discussion purposes we'll use a few sample tables shown below, one to store family related information and one to store information specific to each individual family member. Another table describes attendee types and would be referred to as a code table. A couple of other tables list classes and classes attended by individuals that began on a certain date. In the example below you will notice that the information in the individual table contains a column that also exists in the family table, FAM_ID. This unique identifier links the two tables together, that is where the term relational comes into play. The other tables contain similarly shared columns, but more about that later, here is what the sample tables look like.
 
There are 4 family records in the Family table.
FAMILY
FAM_IDNAME_LASTSALUTATIONSTREET_ADDRESSCITYSTATEMAIL_CODE
001SamplesJohn & Mary1234 Main StreetAnytownOH40000
002JonesBob376 North West StreetSomwhereOK60000
003SmithJohn & Nancy123 Treelined LaneOverthereCA90000
004BrownTom123 Elm StreetAnywhereFL20000

 
There are 7 individual or family member records in the Individual table related to 4 family records.
INDIVIDUAL
IND_IDFAM_IDNAME_LASTNAME_FIRSTBIRTH_DTMARITAL_STATUSSEXSALARYATTEND_TP
001001SamplesJohn1959-06-25MarriedM35000MEMBR
002001SamplesMary1959-12-31MarriedF60000MEMBR
003002JonesBob1960-02-12SingleM40000MEMBR
004003SmithJohn1963-08-23MarriedM105000VISIT
005003SmithNancy1962-11-14MarriedF0VISIT
006004BrownTom1961-11-11DivorcedM80000MEMBR
007003SmithSusie1997-04-28SingleF0UNK

 
There are 3 records in the Attendee Type table.
ATTENDEE_TYPE
ATTEND_TPATTEND_DESC
MEMBRMember
VISITVisitor
UNKUnknown

 
There are 4 records in the CLASS table.
CLASS
CLASS_IDCLASS_DESC
E101English 101
E201English 201
S101Spanish 101
S201Spanish 201

 
There are 9 records in the CLASS_HISTORY table.
CLASS_HISTORY
CLASS_IDCLASS_DTIND_ID
E1011999-03-01001
E1011999-03-01002
E1011999-03-01004
E1011999-03-01005
E1012000-03-01003
E1012000-03-01006
E2012000-09-01002
E2012000-09-01004
E2012000-09-01005

 
Back To Top
 


Keys

A primary key is a column or set of columns that uniquely identifies the rest of the data in any given row. For example, in the FAMILY table, the FAM_ID column uniquely identifies that row. The IND_ID column does the same in the INDIVIDUAL table. These unique column keys are used throughout the database to track the individuals and families, rather than the individual or family names. So even if two John Smith's existed in the INDIVIDUAL table they could be uniquely referenced using the IND_ID value.
 
A foreign key is a column in a table where that column is a primary key of another table, which means that any data in a foreign key column must have corresponding data in the other table where that column is the primary key. This relationship preserves the referential integrity of the database.
 
Note that each record in the INDIVIDUAL table references a column (FAM_ID) from the FAMILY table. The INDIVIDUAL FAM_ID column in this case is a foreign key because it is a primary key in the FAMILY table. A record could not reside in the INDIVIDUAL table without a related record in the FAMILY table.
 
Back To Top
 


SQL Select Clause

Remember that the minimum requirements for writing a SQL query is the SELECT & FROM clauses. The SELECT clause is where you request the pieces of information, the columns (separated by commas), that you want to see, and the FROM clause is where you define the table or tables from which the columns reside in.
 
For example:
SELECT column_name, column_name, column_name ...
FROM table_name
 
So if we wanted to retrieve a list of all families and their addresses, the query would look like this;
 
SELECT NAME_LAST, SALUTATION, STREET_ADDRESS, CITY, STATE, MAIL_CODE
FROM FAMILY;
 
The results of the query would be:
NAME_LASTSALUTATIONSTREET_ADDRESSCITYSTATEMAIL_CODE
SamplesJohn & Mary1234 Main StreetAnytownOH40000
JonesBob376 North West StreetSomwhereOK60000
SmithJohn & Nancy123 Treelined LaneOverthereCA90000
BrownTom123 Elm StreetAnywhereFL20000

 
Note that we did not request to see the FAM_ID column, specifically we asked to see the columns called NAME_LAST, SALUTATION, STREET_ADDRESS, CITY, STATE and MAIL_CODE. Each column and table name must be specified exactly as it is defined and will not contain any spaces.
 
If you wanted to see every column without having to type each column name you could use an asterisk (*);
 
SELECT *
FROM table_name
 
Or in our example:
SELECT *
FROM FAMILY;
Note that in our example the column FAM_ID would then appear if you used the asterisk.
 
Back To Top
 


SQL Where Clause

The WHERE clause is used to conditionally retrieve only the information that you want to display. Here is a list of the operators that can be used. It is important to note that when querying character or date/datetime type of data that the parameters used in the WHERE clause be surrounded with an apostrophe ( ' ), when querying against numeric type data it is not required.
 
Relational Operators
=Equal
<> or != (check your manual)Not Equal
<Less Than
>Greater Than
<=Less Than or Equal To
>=Greater Than or Equal To

 
To see how the WHERE clause works lets take a look at some examples.
 
If we wanted to see a list of individuals that were married you could run the following query;
SELECT NAME_LAST, NAME_FIRST
FROM INDIVIDUAL
WHERE MARITAL_STATUS = 'Married';
 
NAME_LASTNAME_FIRST
SamplesJohn
SamplesMary
SmithJohn
SmithNancy

 
Back To Top
 


Order By

The ORDER BY clause is used to sort the result set in the desired sequence. The syntax for order by is;
ORDER BY COLUMNNAME [ASC/DESC]
 
The following example sorts the result in descending first name within ascending last name order. The default for the ORDER BY clause is ascending order so if you want your results to be in ascending order it is not necessary to specify the ASC parameter.
SELECT NAME_LAST, NAME_FIRST
FROM INDIVIDUAL
WHERE MARITAL_STATUS = 'Married';
ORDER BY NAME_LAST ASC, NAME_FIRST DESC
 
NAME_LASTNAME_FIRST
SamplesMary
SamplesJohn
SmithNancy
SmithJohn

An alternate syntax for ORDER BY is to use the column numbers as they appear in the SELECT statement. In the example we could have used;
ORDER BY 1, 2 DESC
 
In certain situations (such as with union queries which is not discussed in this tutorial) you are required to use the column number because the column names can be different.
 
Back To Top
 


Compound Conditions

The AND operator joins two or more conditions, and displays a row only if ALL of the conditions are met.
 
For example, building upon our query above, we may want to further restrict the query results using the date of birth column. If we only want to retrieve Individuals that are married and have a birth date on or before December 31, 1959 we could write;
 
SELECT NAME_LAST, NAME_FIRST
FROM INDIVIDUAL
WHERE MARITAL_STATUS = 'Married' AND BIRTH_DT <= '1959-12-31';
 
NAME_LASTNAME_FIRST
SamplesJohn
SamplesMary

 
Note: we could also have written the birth_dt portion of the where clause using the < operator;
...AND BIRTH_DT < '1960-01-01'
 
also noteworthy is the fact that many databases combine date and time column types into a single datetime type of field which could prove extremely important when writing queries. When adding or updating a datetime field unless the time portion is explicity defined it will usually default to 00:00:00 (hours, minutes, seconds), which is the equivalent to midnight 12 a.m.
 
For instance, if the time portion of the BIRTH_DT for Mary Samples was 7:39 p.m. (19:39:00) the above query would not retrieve the record for Mary Samples. In this case we might want to query using the following example.
...AND BIRTH_DT <= '1959-12-31 23:59:59'
so that it would include the entire day for 12/31/1959 through 11:59:59 p.m.
 
The OR operator joins two or more conditions, and displays a row if ANY of the conditions are met.
 
For example, if we wanted to see a list of who had a marital_status of single or divorced with a birth date earlier than January 1, 1990 we could write a query like this;
 
SELECT NAME_LAST, NAME_FIRST
FROM INDIVIDUAL
WHERE ( MARITAL_STATUS = 'Single' OR MARITAL_STATUS = 'Divorced' ) AND BIRTH_DT <= '1990-01-01';
 
NAME_LASTNAME_FIRST
JonesBob
BrownTom

The above query is for demonstration purposes only. Using an IN statement might provide a more efficient method to retrieve the information when querying against a single column using multiple arguments.
 
You can combine AND and OR statements in your WHERE clause. If you do you will want to parenthesize your different statements to make sure you will be returning the results you think you are requesting, plus will add clarity to your query.
For example:
...WHERE column_x = 0 AND (column_y = 1 OR column_z = 2)
will not return the same results as
...WHERE (column_x = 0 AND column_y = 1) OR column_z = 2
 
Back To Top
 


Using IN & BETWEEN

In some cases a more efficient method to using compound statements is to use the IN or BETWEEN statements.
 
In our previous example we wrote a query to retrieve those individuals with a marital status of either 'Single' or 'Divorced' that used a compound where clause (using OR).
 
A more clear and efficient method to write the query is to use the IN statement, which does the equivalent of multiple OR statements for you. The IN statement is followed by a comma separated list of values to search for within a set of parenthesis. The following query will return the same result set as the earlier example;
 
SELECT NAME_LAST, NAME_FIRST
FROM INDIVIDUAL
WHERE MARITAL_STATUS IN ('Single','Divorced') AND BIRTH_DT <= '1990-01-01';
 
The BETWEEN statement will search a column based upon a range of values. For example if we wanted return a list of individuals with a birth date within a certain range, for instance January 1, 1959 and December 31, 1959, we could write the following query.
 
SELECT NAME_LAST, NAME_FIRST
FROM INDIVIDUAL
WHERE BIRTH_DT BETWEEN '1959-01-01 00:00:00' AND '1959-12-31 23:59:59';
 
NAME_LASTNAME_FIRST
SamplesJohn
SamplesMary

Note that the search is inclusive of both parameters, that means that anyone born at midnight on 1/1/1959 or at 11:59 pm on 12/31/1959 would meet the query requirements and be retrieved. It is a better alternative than querying where a date is >= AND <= to some date range.
 
Another option is to use NOT with an IN or BETWEEN. This would retrieve the direct opposite results. Examples of using NOT are;
...WHERE MARITAL_STATUS NOT IN ('Single','Divorced');
or
...WHERE BIRTH_DT NOT BETWEEN '1959-01-01 00:00:00' AND '1959-12-31 23:59:59';
 
Back To Top
 


Using LIKE

The LIKE operator can be used to search a column at a somewhat higher level using special wildcard characters. For instance if you want to return records where a column value begins or ends with a certain character or set of characters.
 
If you wanted to find all of the individuals whose last name began with an 'S' you could write a query like this;
 
SELECT NAME_LAST, NAME_FIRST
FROM INDIVIDUAL
WHERE NAME_LAST LIKE 'S%'
 
NAME_LASTNAME_FIRST
SamplesJohn
SamplesMary
SmithJohn
SmithNancy
SmithSusie

 
The percent sign ( % ) represents any possible character or group of characters that would follow the 'S'.
To search for those individuals whose last name ends with an 'S' you would use
...LIKE '%S'
To search for those individuals whose last name contained an 'S' somewhere you would use
...LIKE '%S%'
 
The underscore character ( _ ) can be used to represent a single character. This comes in handy when you know the exact position you are expecting a certain character to appear. For instance if you wanted to search for those individuals whose last name contained an S in the second position you would use
...LIKE '_S%'
 
As with IN and BETWEEN, you can also use the NOT operator with LIKE, to search for individuals whose last name did not start with an 'S' you would use
...NAME_LAST NOT LIKE 'S%'
 
For further details on using LIKE and other SQL commands you should consult your database manual or a good SQL reference book.
 
Back To Top
 


Joining Tables

Recalling our discussion about keys, the reason that they exist is to uniquely identify records in a table and are used to preserve relationships between data across multiple tables so that the data does not have to be duplicated in every table. Therefore it is necessary to retrieve data from two or more tables in order to produce the desired results.
 
In our example we see that a relationship between the INDIVIDUAL and FAMILY tables exist through the FAM_ID column. This allows us to tie the two tables together and determine the mailing address for each individual. The following example joins the INDIVIDUAL and FAMILY tables together to obtain the mailing address for women;
 
SELECT INDIVIDUAL.NAME_LAST, INDIVIDUAL.NAME_FIRST,
FAMILY.STREET_ADDRESS, FAMILY.CITY, FAMILY.STATE, FAMILY.MAIL_CODE
FROM INDIVIDUAL, FAMILY
WHERE INDIVIDUAL.FAM_ID = FAMILY.FAM_ID
AND INDIVIDUAL.SEX='F';
 
Note that both tables that we are querying to obtain information from are listed in the FROM clause. The WHERE clause contains a statement that ties the two tables together by comparing the FAM_ID column from both tables, and the WHERE clause contains a statement to further restrict the results to only women.
 
Also note that columns in the query are prefixed using the table from which they reside followed by a period ( . ), this is referred to as dot notation and avoids ambiguity in having the query engine determine which column to use on its own. This is only necessary when a column exists in more than one table being joined. For instance because the column SEX only resides in the INDIVIDUAL table and the address fields only reside inthe FAMILY table, it was not necessary to use dot notation, but was instead used for clarity.
 
We can join to a third table, ATTENDEE_TYPE to display the Attendee Type Description for each individual. A relationship between the INDIVIDUAL and ATTENDEE_TYPE table exists through the column ATTEND_TP, which is the primary key in the ATTENDEE_TYPE table and a foreign key in the INDIVIDUAL table.
 
SELECT INDIVIDUAL.NAME_LAST, INDIVIDUAL.NAME_FIRST,
   FAMILY.STREET_ADDRESS, FAMILY.CITY, FAMILY.STATE, FAMILY.MAIL_CODE,
   ATTENDEE_TYPE.ATTEND_DESC
FROM INDIVIDUAL, FAMILY, ATTENDEE_TYPE
WHERE INDIVIDUAL.FAM_ID = FAMILY.FAM_ID
   AND INDIVIDUAL.ATTEND_TP = ATTENDEE_TYPE.ATTEND_TP
   AND INDIVIDUAL.SEX='F';
 
Here's another example. To produce a report showing the ID and Name of those individuals that have taken a class, we will need to join the CLASS table, CLASS_HISTORY and INDIVIDUAL tables together. The CLASS_HISTORY table lists all individuals that attended a class beginning on a certain day, the CLASS table lists the description of each class and the INDIVIDUAL table lists each individual record.
 
SELECT INDIVIDUAL.NAME_LAST, INDIVIDUAL.NAME_FIRST, CLASS_HISTORY.CLASS_ID, CLASS.CLASS_DESC
FROM INDIVIDUAL IND, CLASS_HISTORY HST, CLASS CLS
WHERE CLASS_HISTORY.CLASS_ID = CLASS.CLASS_ID
   AND CLASS_HISTORY.IND_ID = INDIVIDUAL.IND_ID

 
Back To Top
 


Using Alias's & Column Concatenation

Alias's are names that are assigned to tables in the FROM clause of a query that are then used as prefixes in all dot notations for columns referenced in the query. This makes it easier to write queries as well as provides clarity in reading a query, it may also be required to eliminate ambiguity caused when two or more tables are joined together and the query engine cannot determine which table the column you are referencing should be used. The alias can be one or more characters as long as it is unique and helps make the query more clear to you. It could be as simple as the letters a, b, c which would result in columns being referenced as a.columnname, b.columnname, etc.
 
Concatenation is a fancy term used to describe the process of combining two or more character columns together.
 
In the example below the alias's are IND for the INDIVIDUAL table and FAM for the FAMILY table. We are also using column concatenation to combine the last and first name columns and defining them as Name. The STREET_ADDRESS column is being alias'd as Street Address, and we are also concatenating the CITY, STATE & MAIL_CODE columns and aliasing it as City State Zip. Note that in our concatenation definition we are also embedding commas and spaces where necessary.
 
SELECT 'Name' = IND.NAME_LAST + ', ' + IND.NAME_FIRST,
   'Street Address' = FAM.STREET_ADDRESS,
   'City State Zip' = FAM.CITY + ', ' + FAM.STATE + ' ' + FAM.MAIL_CODE
FROM INDIVIDUAL IND, FAMILY FAM
WHERE IND.FAM_ID = FAM.FAM_ID
   AND IND.SEX='F';
 
NameStreet AddressCity State Zip
Samples, Mary1234 Main StreetAnytown, OH 40000
Smith, Nancy123 Treelined LaneOverthere, CA 90000
Smith, Susie123 Treelined LaneOverthere, CA 90000

 
An alternate method to create a column alias can also be used using the following syntax;
SELECT COLUMNNAME AS ALIASNAME ...
 
Back To Top
 


Aggregate Functions

Aggregate functions are functions that summarize results of a query instead of reporting each detail record. This document will discuss the following aggregate functions;

  • SUM (columnname) reports the column total for all rows meeting the conditions in the WHERE clause, column must be numeric
  • AVG (columnname) reports the average for the column
  • MAX (columnname) reports the the largest figure for the column
  • MIN (columnname) reports the smallest figure for the column
  • COUNT(*) reports the number of rows meeting the conditions in the WHERE clause
This query reports the total number of records (COUNT), and the total (SUM) and average (AVG) of all salaries from the INDIVIDUAL table.
SELECT COUNT(*), SUM(SALARY), AVG(SALARY)
FROM INDIVIDUAL;
 
This query reports the smallest (MIN) and largest (MAX) salary amount from the INDIVIDUAL table WHERE the ATTEND_TP is equal to MEMBR and the SALARY amount is greater than zero.
SELECT MIN(SALARY), MAX(SALARY)
FROM INDIVIDUAL
WHERE ATTEND_TP = 'MEMBR'
   AND SALARY > 0;
 
MIN(SALARY)MAX(SALARY)
3500080000

 
Aggregate functions can be combined with the GROUP BY clause to report aggregate figures at sub total levels. Refer to the Group By section for more information.
 
Back To Top
 


Group By

The GROUP BY clause can be used to report aggregate figures at sub total levels. For instance we could further break down the query example discussed in the aggregate section to report totals by ATTEND_TP instead of reporting a single grand total figure for all individuals that meet the WHERE criteria.
 
Here is an example of a query that reports the total (SUM) and average (AVG) salary amount from the INDIVIDUAL table for each ATTEND_TP where the SALARY amount is greater than zero. This time however the results are broken down by attendee_type.
 
SELECT 'Attendee Type'=ATTEND_TP 'Total Salary'=SUM(SALARY), 'Average Salary'=AVG(SALARY)
FROM INDIVIDUAL
WHERE SALARY > 0
GROUP BY ATTEND_TP
 
Note that the GROUP BY clause follows the WHERE clause.
 
Attendee TypeTotal SalaryAverage Salary
MEMBR21500053750
VISIT105000105000

 
Back To Top
 


Distinct

Getting Rid of Duplicates. If you wanted to produce a report showing the ID and Name of only those individuals that have ever taken a class, you may not want them to be listed more than once if they had attended more than one class. To prevent duplicate names you can use the DISTINCT keyword. The DISTINCT keyword is placed in the SELECT clause preceeding the column that would produce duplication.
 
SELECT DISTINCT IND.NAME_LAST, IND.NAME_FIRST
FROM INDIVIDUAL IND, CLASS_HISTORY HST
WHERE HST.IND_ID = IND.IND_ID
 
Back To Top
 


For More Information

As mentioned at the start, there are many other SQL commands that can be used to write much more complex queries. The intent of this tutorial was meant to provide a basic introduction to writing queries for the beginner to extract data from an existing relational database. For more information I encourage you to seek additional resources available over the world wide web as well as from a variety of books available from your local or on-line bookstore. Here are a couple of books to get you started available from Amazon.com.
 
Sams Teach Yourself SQL in 10 Minutes (4th Edition)
Sams Teach Yourself SQL in 10 Minutes has established itself as the gold standard for introductory SQL books, offering a fast-paced accessible tutorial to the major themes and techniques involved in applying the SQL language. Forta's examples are clear and his writing style is crisp and concise. As with earlier editions, this revision includes coverage of current versions of all major commercial SQL platforms. New this time around is coverage of MySQL, and PostgreSQL. All examples have been tested against each SQL platform, with incompatibilities or platform distinctives called out and explained.
  
Beginning SQL Queries: From Novice to Professional (Books for Professionals by Professionals)
Beginning SQL Queries is aimed at laypeople who need to extract information from a database who are new to SQL. The book is especially useful for business intelligence analysts who must ask more complex questions of their database than their GUI based reporting software supports. Such people might be business owners wanting to target specific customers, scientists and students needing to extract subsets of their research data, or end users wanting to make the best use of databases for their clubs and societies.

Back To Top

 

This article also available in Serbo-Croatian language translated by Jovana Milutinovich of Webhostinggeeks.com.

© Copyright 1996-2012, Mount Vernon, Ohio USA, Thunder Software
This document is available for use by any internet user, but may not be included in any other document,
web site or server, published in any other form or copied or mass produced in any way without permission.