Thursday, January 1, 2015

TOP in T-SQL




Using the TOP clause lets you select n rows from the beginning of the selected output. 
You can use TOP with the following syntax in T-SQL :

        SELECT TOP n
        *
        FROM table_name;

TOP is mostly used to fetch the TOP n rows according to any particular ORDER . For example selecting the TOP employees taking the maximum salary in an Organization. The query would look like this:

        SELECT TOP 5
        *
        FROM emp
        ORDER BY salary;

We can use the TOP to get random outputs also without using ORDER BY according to our requirements. We may use TOP to select randomly TOP 5 lucky draw winner.

TOP can be used with two other options PERCENT and WITH TIES. These are optional and can be skipped while using the TOP. We will look into each in detail one by one:

  • PERCENT : This option is used if we want to use the TOP to select a certain percentage of all the data in the table, such as getting TOP 25% of the entries. The syntax for using TOP with PERCENT is :

        SELECT TOP 25 PERCENT
        *
        FROM emp;

The above query will select TOP 25% of the rows from a table.

  • WITH TIES : This option can only be used with ORDER BY. WITH TIES is used to encourage the DUPLICATE values for the last row in the result. When used with WITH TIES option, TOP will return TOP n rows and if the last row is 5th then if the 6th row has the same value as the 5th row then using the WITH TIES option the 6th row will also be returned . The syntax for using WITH TIES with TOP is :

        SELECT TOP 25 PERCENT
        *
        FROM emp;


emp is the table containing the following data :

Empno
ename
job
mgr
hiredate
sal
comm
deptno
7369
SMITH
CLERK
7902
1980-12-17
800.00
NULL
20
7499
ALLEN
SALESMAN
7698
1981-02-20
1600.00
300.00
30
7521
WARD
SALESMAN
7698
1981-02-22
1250.00
500.00
30
7566
JONES
MANAGER
7839
1981-02-04
2975.00
NULL
20
7654
MARTIN
SALESMAN
7698
1981-09-29
1250.00
1400.00
30
7698
BLAKE
MANAGER
7839
1981-01-05
2850.00
NULL
30
7782
CLARK
MANAGER
7839
1981-06-09
2450.00
NULL
10
7788
SCOTT
ANALYST
7566
1987-07-13
3000.00
NULL
20
7839
KING
PRESIDENT
NULL
1981-11-17
5000.00
NULL
10
7844
TURNER
SALESMAN
7698
1981-08-09
1500.00
0.00
30
7876
ADAMS
CLERK
7788
1987-07-13
1100.00
NULL
20
7900
JAMES
CLERK
7698
1981-12-03
950.00
NULL
30
7902
FORD
ANALYST
7566
1981-12-03
3000.00
NULL
20
7934
MILLER
CLERK
7782
1982-01-23
1300.00
NULL
10










The output of this query will be:

Empno
ename
Job
mgr
hiredate
sal
comm
deptno
7369
SMITH
CLERK
7902
1980-12-17
800.00
NULL
20
7499
ALLEN
SALESMAN
7698
1981-02-20
1600.00
300.00
30
7521
WARD
SALESMAN
7698
1981-02-22
1250.00
500.00
30
7566
JONES
MANAGER
7839
1981-02-04
2975.00
NULL
20
7654
MARTIN
SALESMAN
7698
1981-09-29
1250.00
1400.00
30

Now WITH TIES won’t make any difference in our case. The result seems similar to the query without using the WITH TIES. But if we had 2 entry in our table as :

7654
MARTIN
SALESMAN
7698
1981-09-29
1250.00
1400.00
3
7654
MARTIN
SALESMAN
7698
1981-09-29
1250.00
1400.00
30

Then using TOP without WITH TIES will not return both the rows because on of the above duplicate rows will not be returned as only one comes in TOP 5 and the other one is at 6th position so only one is returned.

But using TOP with WITH TIES will return BOTH 5th and 6th row.

Empno
ename
Job
mgr
Hiredate
sal
comm
deptno
7369
SMITH
CLERK
7902
1980-12-17
800.00
NULL
20
7499
ALLEN
SALESMAN
7698
1981-02-20
1600.00
300.00
30
7521
WARD
SALESMAN
7698
1981-02-22
1250.00
500.00
30
7566
JONES
MANAGER
7839
1981-02-04
2975.00
NULL
20
7654
MARTIN
SALESMAN
7698
1981-09-29
1250.00
1400.00
30
7654
MARTIN
SALESMAN
7698
1981-09-29
1250.00
1400.00
30

NOTE : the duplicate values are only displayed id they are the last rows in our output. This will not work with any row appearing at the position less than our n (TOP n) because for any row at position less than `n` the duplicate values are returned even without using WITH TIES and the duplicates are counted in the no. of rows returned.


TOP is executed at last while executing a query because the user expects the final output to be shortened according to the number n , not the random output. For example when using the TOP clause with ORDER BY statement , you will expect it to result the TOP n rows of the ORDERED LIST not the TOP n rows of the random data in the TABLE and then ORDER it. Here is an example to make you understand better:

Consider the following query which selects the TOP 5 salary takers in a company

       SELECT TOP 5
       *
       FROM emp;
       ORDER BY sal;

Our query will return the following rows (TOP 5 salary takers):

Empno
ename
job
mgr
hiredate
sal
comm
deptno
7839
KING
PRESIDENT
NULL
1981-11-17
5000.00
NULL
10
7902
FORD
ANALYST
7566
1981-12-03
3000.00
NULL
20
7788
SCOTT
ANALYST
7566
1987-07-13
3000.00
NULL
20
7566
JONES
MANAGER
7839
1981-02-04
2975.00
NULL
20
7698
BLAKE
MANAGER
7839
1981-01-05
2850.00
NULL
30

But imagine if the TOP clause were to run before the ORDER BY clause what would have been the output:

First the Database will SELECT the TOP 5 rows FROM the table which stores data randomly without any ORDER.
Then the database would execute the ORDER BY clause on the TOP 5 rows selected from the randomly ordered data. The output will look like this:

Empno
ename
job
mgr
hiredate
sal
comm
deptno
7566
JONES
MANAGER
7839
1981-02-04
2975.00
NULL
20
7499
ALLEN
SALESMAN
7698
1981-02-20
1600.00
300.00
30
7521
WARD
SALESMAN
7698
1981-02-22
1250.00
500.00
30
7369
SMITH
CLERK
7902
1980-12-17
800.00
NULL
20
1001
Varun
NULL
NULL
NULL
100.00
NULL
NULL

So if the TOP clause would have been executes before the ORDER BY clause we would get unexpected results.

No comments:

Post a Comment