In a previous tip, I described how to find gaps in sequences using Oracle's powerful analytic functions. TechRepublic member Sachin Jayashekhar asked about a related problem in a forum post for the article. This member wanted to know how to find the first and last values of a sequence once the gaps are detected. I've recreated Sachin's sample data and added a few more rows in the script in Listing A.
Sachin's sample script used the analytic functions FIRST_VALUE and LAST_VALUE to attempt to find the endpoints of the sequence. Although this worked for Carton A and Carton C in Sachin's sample data, Carton B's output was incorrect because it had multiple runs within the carton. Since Sachin was partitioning by carton_number, FIRST_VALUE and LAST_VALUE always returned the first and last values of the entire carton, not each run of serial_numbers within it.
This would have worked if there had been an additional column, such as run_id, to use in partitioning the data. In that case, the analytic clause
OVER (PARTITION BY carton_number ORDER BY serial_number)
would simply be rewritten as:
OVER (PARTITION BY carton_number, run_id ORDER BY serial_number)
However, the table had no such column. Instead, we have to use LEAD and LAG functions, as in the original article, to detect the gaps between sequence runs. Listing B shows one way to do it.
There are four possible cases for a given row:
- The row is the first row of a multi-row sequence.
- The row is the last row of a multi-row sequence.
- The row is a middle row, neither first nor last, of a multi-row sequence.
- The row is a single row sequence.
The code has to account for all of these cases. If it's the first row or a single row, we need to print the serial number in the "FROM" column of our report. If it's the last row or a single row, we print the serial number in the "TO" column. If it's a middle row, it can be safely ignored.
Because analytic functions cannot be nested (i.e., used as an argument to another analytic function), we have to use two layers of subquery to accomplish this. The first subquery, named aquery, uses the LAG function to determine the start of a new sequence. It does this by subtracting the current serial number from the previous row's serial number. If the difference is 1, it's a continuation of an existing sequence. If it's greater than 1, it's a new sequence. (Using ORDER BY serial_number in the analytic clause makes sure the numbers are in ascending order.)
This calculated column, named diff, is then tested in the second subquery, named bquery. The difference from the current row is compared to the difference in the following one, via the LEAD function. If this next_diff value is 1, there is at least one more row in the sequence run. If the next_diff is greater than 1, we're at the end of a sequence. Note that the default value 999 is assigned if there is no next row (i.e., we reached the end of a carton), so the row is marked as a gap.
Finally, our main query selects from bquery and tests for the four cases above. A WHERE clause knocks out the unwanted middle rows, by selecting only rows that represent the beginning or ending of a sequence. Two CASE statements, one each for the "FROM" and "TO" columns, compare the values of diff and next_diff and returns either a serial number or NULL.
The resulting output is very similar to that described in Sachin's post.
Miss a tip?
Check out the Oracle archive, and catch up on our most recent Oracle tips.