In this post I’m going to attempt to elevate your SQL skills to a satisfactory level in 5 steps, each step can be read in about a minute for a total of 5 minutes of your time. The SQL syntax I’m going to use is called T-SQL (Transact-SQL) and is compatible with SQL Server for example. Different databases can use different syntax but for the most part they are pretty similar and you can easily google the equivalent syntax for whatever database you may be working with.
Here is the agenda for the next 5 minutes:
Step 1 – The basics
Step 2 – Filtering
Step 3 – Joining
Step 4 – Grouping
Step 5 – Advanced SQL
Without wasting any time, let’s get started.
STEP 1 – THE BASICS
I’m guessing if you’re reading this you already understand the concept of tables and databases but basically a table is like a spreadsheet made up of columns and rows and a database is a collection of tables. SQL is the language we use to retrieve or query rows from those tables. Now, consider the following customer table:
Table: LU_CUSTOMER
Columns: CUSTOMER_ID, CUST_FIRST_NAME, CUST_LAST_NAME, ZIPCODE, AGE_YEARS
Say we want to query all the columns and records in the table LU_CUSTOMER. Well, just about all SQL queries begin with the keyword “SELECT” followed by the columns you want to return with each column separated by a comma. In this case we want all the columns so we would type “SELECT CUSTOMER_ID, CUST_FIRST_NAME, CUST_LAST_NAME, ZIPCODE, AGE_YEARS“.
The next step is to tell the database which table we want to select from by using the keyword “FROM” followed by the table name. So when you put it all together you get “SELECT CUSTOMER_ID, CUST_FIRST_NAME, CUST_LAST_NAME, ZIPCODE, AGE_YEARS FROM LU_CUSTOMERS”. This query will return all the customer records from the LU_CUSTOMER table. (NOTE: If you want to return ALL columns, instead of typing each column name you can simply use an asterisk “*”, for example to query all columns from a table use SELECT * FROM TABLE_NAME)
Now, if you want to sort the results you would add the “ORDER BY” keyword followed by the columns you want to sort by. For example, to order by CUSTOMER_ID, you would use “ORDER BY CUSTOMER_ID”. The default sorting is ascending but you could add “DESC” after the column name to order in descending order. The final query with sorting would look like this:
STEP 2 – FILTERING
Usually when you are querying a table you only want a specific set of records. Now, using our previous SQL query, let’s say you only want customers with the ZIPCODE equal to 85253. Filters are added to a query by using the keyword “WHERE” followed by the condition. So, in our example we would add WHERE ZIPCODE = 85253 to our above query.
The resulting query would be:
Below is an example of query results.
It is also worth mentioning that you can use other operators such as ZIPCODE > 85253, which would return all customers with zip codes greater than 85253. If we wanted multiple zip codes like 85253, 19124 and 07013, we could modify the WHERE statement to use the keyword “IN” like this: WHERE ZIPCODE IN (85253, 19124, 07013).
Now, taking it a step further you can apply multiple filters in a WHERE clause. For example, if we wanted all customers with a zip code of 85253 and a last name of Janssen, our filter would look like WHERE ZIPCODE = 85253 AND CUST_LAST_NAME = ‘Janssen’. It is also worth pointing out that string or text values must be contained within single quotes.
STEP 3 – JOINING
Now, databases will usually have many tables and a single query will usually need data from multiple tables to obtain the desired results. To accomplish this you will need to join tables together using SQL.
Recall the LU_CUSTOMER table from the previous steps and now let’s add a new table that contains customer sales information named CUSTOMER_SLS. This table simply contains the total dollar sales and total units for each customer.
Table: CUSTOMER_SLS
Columns: CUSTOMER_ID, TOT_DOLLAR_SALES, TOT_UNIT_SALES
Suppose we need to write a query that returns a customers zip code, last name and total sales units for all customers with a zip of 85253. Well, customer name is in the LU_CUSTOMER table and unit sales is in the CUSTOMER_SLS table. So, to join these two tables together we first build our select statement we learned in step 1 which would read SELECT ZIPCODE, CUST_LAST_NAME, TOT_UNIT_SALES.
Next, we add FROM LU_CUSTOMER. Now we need to join the two tables together based on a common field. Usually related tables will have an identity column in common. In this case both tables contain CUSTOMER_ID. In order to join these tables we use the keyword “JOIN” followed by the table name and “ON” followed by a link between matching columns.
The resulting query with a join will look like this:
Notice how we set the CUSTOMER_ID from LU_CUSTOMER equal to the CUSTOMER_ID from CUSTOMER_SLS. This is how you tell the database to link the two tables together.
It is worth noting that when you are working with tables that have the same column names, in order for the database to know which column you are referring to you must prefix the column name with “TABLE.” or in our example LU_CUSTOMER.CUSTOMER_ID and CUSTOMER_SLS.CUSTOMER_ID.
Now, what we have done is joined the CUSTOMER_SLS table to the LU_CUSTOMER table based on the CUSTOMER_ID. This query will only return customers that have matching customer id’s in both tables. For example, if customer id 12345 has a record in the LU_CUSTOMER table but does NOT have a matching record in the CUSTOMER_SLS table, then this customer would not be included in this query. This type of join where only matching records are returned is called an INNER JOIN and is the most common type of join.
Now, what if we wanted the query to return all customers even if they don’t have a matching record in the CUSTOMER_SLS table? Well, this type of join is called an outer join. If we wanted the query to function this way, we would simply change the keyword “JOIN” to “LEFT OUTER JOIN” and that’s it.
I could spend a lot more time on joins but we only have 5 minutes! 🙂
The final query with outer join would look like:
Below are some example query results using an outer join, notice the TOT_UNITS_SALES columns are empty (NULL) for some of the records. These customers did NOT have a matching record in the CUSTOMER_SLS table so the value is empty because there is no value.
STEP 4 – GROUPING
Grouping is a feature of SQL that allows you to total or aggregate data. For example, using the above query from the previous step, suppose we wanted to modify the query so it returned the zip code and total units sales for customers with the zip code 85253. This is where you will need to use the keyword “GROUP BY”. A group by does just that, it groups records together so you can perform some type of total or aggregate function. In our case we want to group by zip code, so we would add the statement GROUP BY ZIPCODE below the “WHERE” filter.
The query would look like this:
Now, any fields that are not in the “GROUP BY” must be wrapped in some sort of aggregate function like SUM, AVG or COUNT. In our example, we want the TOTAL unit sales so we used the SUM function with the column TOT_UNIT_SALES like this SUM(TOT_UNIT_SALES). In a nutshell this SQL statement tells the database we only want one record for each zip code and the other field “TOT_UNIT_SALES” will be a total of units sales for ALL records returned by the query. In this case the only records that will be totaled are ones with a zip code of 85253 as specified in the WHERE filter.
Below is an example of query results.
Let’s take this one step further and remove the filter “WHERE ZIPCODE = 85253”. The query will now return total units sales for ALL zip codes. Now, returning total units sales for all zip codes is great but let’s suppose you are only interested in zip codes where the total unit sales is greater than 2000. How would you accomplish this? You might be thinking you would add a filter “WHERE TOT_UNIT_SALES > 2000” but this is wrong because this will filter customer records before they are grouped by zip code.
Remember, the goal of our query is to return the TOTAL unit sales by zip code and we want to filter the results AFTER they are grouped. To accomplish this you need to use the “HAVING” keyword after your “GROUP BY”. The HAVING clause is how you can add filtering to the results of a GROUP BY. In this example you would add HAVING SUM(TOT_UNIT_SALES) > 2000 to the query.
The resulting query would be:
The below screenshot shows sample query results showing all zip codes returned have a total unit sales greater than 2000.
STEP 5 – ADVANCED SQL
It is important to think of SQL queries like spontaneous tables. Understanding this makes it easier to realize that queries can be joined just like tables. That’s right, queries can be joined to tables and other queries.
For example, let’s consider the following query:
In this SQL query I have joined together the results of two separate queries! This is completely valid SQL.
Now, let’s talk about the rules, first each query must be wrapped in parenthesis and aliased. To alias something simply means to give it a name. As you can see outside the parenthesis of each query I have used the name “query1” and “query2” respectively. You might also have noticed that I aliased a column in the second query. After SUM(TOT_UNIT_SALES), I used the alias name “TOTAL_SALES”.
See below screen shot for example results from this query.
As you can see the last column header reads “TOTAL_SALES” which is the alias name I gave this column.
Congratulations you now know satisfactory SQL!
Feel free to leave comments below.