This page contains a few tricks with Oracle SQL that I have found to be useful over the years. Some of them might apply directly to other SQL dialects, or do so after only minor changes; however, I make no such guarantee in any given case. For natural reasons, the main intended audience consists of software developers using SQL on a regular basis as a complement to their main languages (Java, C, whatnot). Many things mentioned here will be elementary to an SQL specialist.
If you know a better way to achieve some of the below (always a possibility with SQL), please let me know.
This page was originally written in 2012, but went unpublished until 2024. Even the 2024 publication is a little premature, in so far as several to-dos were left in, including a “check code for correctness and format it”, and as there are plenty of other tips that could potentially be included. (I suspect that I, even back then, could have found one or two dozen tips more valuable than the discussion of DB links towards the end. I would speculate that I had simply encountered that topic shortly before the time of writing.)
As it has been several years since I last dealt with SQL (in general or specifically for Oracle) in a non-trivial manner, I find it best to publish what I have and be rid of the text for now.
While I have done some polishing (most of it a matter of the English language), I have not checked various details for correctness (I am hampered both by being rusty and by not having an Oracle DB available), and I have left the code “as is” (contrary to the aforementioned to-do; caveat lector).
I make corresponding reservations for sloppy errors by 2012 me, which 2012 me would have found and corrected during polishing and proof-reading, but which 2024 me might not have.
I make further reservations for what might have changed since the time of writing.
Someone very new to Oracle or SQL in general might want to read up on the many functions available, including e.g. substr, nvl, and nvl2; and on other syntactical niceties, e.g. the case statement (the more handy and compact, but Oracle-specific and slightly weaker, decode is used repeatedly below). Knowledge of aggregate functions is a near must (their complements, the analytical functions, are partially discussed below).
PL/SQL brings a lot of power, but is off-topic. I will likely add another article on the topic in the future.
Learning some of the specifics of SQL*Plus can also be very beneficial, but since other SQL clients (even Oracle’s own SQL-Developer) are not sufficiently compatible, it too remains off-topic.
Typically, I will not give instruction on the basics of the used functionality, but assume that the basics are already known. (If in doubt, the reader who does not know can easily find the corresponding information online.)
Using a temporary table can be a great saver of both time and effort when using variations of the same (exploratory) query or making several such queries with similar base data. For instance, when comparing two sets of data I often find myself using one or several temporary tables to keep various partial steps (the first often containing the result of a join of two tables, identifying the relevant entries and giving some minimal data).
(The “exploratory” is to distinguish such queries from queries used by production code, queries called over official views, and similar. In such cases, temporary tables are only very rarely suitable and the priority should be to find and enforce good execution plans. Note that the means to do this are also better, as e.g. the creation of a new index is a far more reasonable action than for an “exploratory” query. In a pinch, a “materialized view”, which has the advantage of staying up-to-date, is a better choice for production code.)
A particularly useful trick is to replace a select ... from ... where ... in ([long list of identifiers]) with a join on a temporary table. Normally, this is a waste of time, but consider wanting to select the identifiers not found in the original table. By using a join instead, this becomes very simple (see the following section for an example; this technique, obviously, works wonderfully on regular tables too).
To reduce cluttering in the main schema, especially when a production system is concerned, I recommend putting such tables in a separate schema—ideally one specific to the querying user.
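A minimal sketch of the list-to-join replacement, assuming sufficient rights to create a table; the table and column names (tmp_account_numbers, account_table, account_number) are purely illustrative:
create table tmp_account_numbers (account_number varchar2(20));
insert into tmp_account_numbers values ('2340923');
insert into tmp_account_numbers values ('2340925');
-- further identifiers as needed
-- identifiers from the list that have no counterpart in the main table
select t.account_number
from tmp_account_numbers t
left outer join account_table a on a.account_number = t.account_number
where a.account_number is null;
drop table tmp_account_numbers;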
Why not use a with clause rather than a temporary table? A with clause can be quite handy when there is only one eventual select, to be executed only once. Here the need to create and later drop the table is removed; and if several temporary tables would otherwise have been needed, it can be easier to keep track. However, when repeated executions are intended (usually the case when working interactively and adapting the statements) or when the temporary table is intended for several queries (very often the case), then the temporary table wins out.
Further, a temporary table can be more flexible, not being limited to a single (if possibly complex) SQL statement on existing tables (consider e.g. wanting to access data delivered in a CSV file) and allowing easier incremental combinations of data (say, selecting from table A, inspecting the results, then combining with table B, inspecting the results, and so on).
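For comparison, a minimal with-clause sketch in the same spirit; the tables and the selection criterion are purely illustrative:
-- the subquery that might otherwise have gone into a temporary table
with relevant_accounts as (
    select a.account_number
    from account_table a
    join invoice i on i.account_number = a.account_number
    where i.state = 5
)
select count(*)
from relevant_accounts;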
When the creation of a temporary table with external data is not possible on a given system (e.g. due to insufficient rights or a wish to reduce the risk of accidental interference with the official data model), selects from dual in combination with unions can be an option.
Example to find all account numbers in a given list not present in a table:
select account_numbers.account_number from (
(select '2340923' as account_number from dual) union
(select '2340925' as account_number from dual) union
...
(select '2346723' as account_number from dual)
) account_numbers
left outer join account_table on account_table.account_number = account_numbers.account_number
where account_table.account_number is null;
If a sufficiently large other table is present, the set of unions can be replaced by something like
select decode(rownum, 1, '2340923', 2, '2340925', ...) as account_number
from other_table where rownum < ...
Beware that the time needed for parsing the actual SQL statement can go into the seconds when even a few hundred entries are concerned. Further, obviously, it is not possible to add an index to this type of “table”. Correspondingly, it is best used with smaller data sets. (Indeed, there is likely to be an upper limit in the number of entries available in this manner; however, I have not yet used so large a set. The first technique works up to at least 10,000 entries in Oracle 11.)
Looking back from 2024, I am uncertain about this parsing claim (and I do not have the time to do research). Even adjusting for the performance increases in the interim, seconds sounds like a lot. It might be that I intended the full pre-execution phase, including the building of an execution plan, and expressed myself sloppily. Network delays might also be a contributor, but only rarely by that much.
Actually generating the corresponding statements is not hard. For instance, given a file with one account number per row, the Vim command
%s/.*/(select '&' as account_number from dual) union/
will take care of the issue, leaving only the “frame” for manual insertion.
Oracle provides an extensive “dictionary” that can be used to query information about the database and its contents. For instance, a select * from user_tables gives information on various tables, while select * from user_constraints gives information on constraints. (Ditto for all_tables and dba_tables, etc., which might be a better choice, depending on what information is wanted and what rights the current user has. A deeper discussion goes beyond the scope of this page. For a full overview of the dictionary, see the Oracle documentation.)
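As a small, hedged illustration (the table name is again illustrative; note that the dictionary stores object names in upper case):
select column_name, data_type, nullable
from user_tab_columns
where table_name = 'ACCOUNT_TABLE';

select constraint_name, constraint_type
from user_constraints
where table_name = 'ACCOUNT_TABLE';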
Comparisons of dates can be tricky, in particular as the default display only has a granularity of a day. (Note that Oracle differs from ANSI in that dates correspond more to timestamps.) A naive date1 = date2 might then fail even if two dates appear to be the same. By first formatting the dates with to_char, this problem is avoided (and any arbitrary granularity can be chosen, e.g. month, day, or second).
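A minimal sketch at day granularity, with illustrative table and column names:
select invoice_number
from invoice
where to_char(invoice_date, 'YYYY-MM-DD') = to_char(payment_date, 'YYYY-MM-DD');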
Oracle supports a number of set operators, of which most developers use only union. Others, e.g. minus, can be very handy. The corresponding result can often be found by some mixture of joins and selection criteria, but using a set operator is usually more elegant and easier to read, and can sometimes bring a performance benefit.
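A hedged sketch with minus, assuming (purely for illustration) that both tables carry an account_number column: account numbers present in account_table but absent from invoice.
select account_number from account_table
minus
select account_number from invoice;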
In addition to the aggregate functions (e.g. counts and sums with an optional group by ... having ...) that should be common knowledge, Oracle has analytical functions that can be used for similar tasks, but sometimes are more flexible. Here only the select clause is altered to contain e.g. sum(price) over (partition by invoiceNumber). A particular benefit is the ability to print various results together with the “normal” data of a query: an aggregate function can only occur with non-aggregated fields that the query groups by; an analytical function can appear with arbitrary fields.
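A minimal sketch, assuming a hypothetical invoice_item table: each row is returned together with the total price of its invoice.
select invoiceNumber,
       item,
       price,
       sum(price) over (partition by invoiceNumber) as invoice_total
from invoice_item;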
decode is almost magical when used correctly. Consider e.g. wishing to have a single query give the counts of all invoices, all invoices in state 2, and all invoices in state 5, per city (the meaning of these state numbers is not relevant):
select city, count(*), sum(decode(state, 2, 1, 0)), sum(decode(state, 5, 1, 0))
from invoice group by city;
Indeed, the number of tricks to be found with decode is only limited by the imagination. Even most case statements can be replaced by it. (Whether actually doing so is a good idea will depend on the circumstances.)
For those wondering, the arguments are, in order, the value to decode, arbitrary pairs of before and after values, and an optional default value. (If no default value is given, null is used.) The first use above, then, yields 1 if state = 2 and 0 otherwise. (After which the summing amounts to a count.) Much of the magic stems from the ability to use fields, function calls, whatnot, as values, not just constants.
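A small, hedged illustration of that argument structure (the state-to-text mapping is made up):
select decode(state, 2, 'open', 5, 'paid', 'other') as state_text
from invoice;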
Comparisons with a null value always yield false, leading many an unwary developer to faulty queries, and the wary to extra work.
Strictly speaking, at the time of writing, Oracle does not have a proper boolean datatype. What actually happens is that a great many expressions, including comparisons, that involve a null value always result in null, even when null is compared with itself. In a next step, a null value is treated approximately as false in a “boolean context”. Notable exceptions to this include decode, nvl, and nvl2, as discussed below. For simplicity, I will gloss over this.
Our friend decode can help with e.g.
decode(field1, field2, 1, 0) = 1
Depending on the data in the rows being compared, this statement can then evaluate as e.g.
decode('rabbit', 'hare', 1, 0) = 1 => false
decode('rabbit', null, 1, 0) = 1 => false
decode('rabbit', 'rabbit', 1, 0) = 1 => true
or
decode(null, null, 1, 0) = 1 => true
While this is not as easy to understand and read as field1 = field2 (with a different semantic, yielding false in the last example), it is at least as good as nvl(field1, X) = nvl(field2, X) (where X refers to some non-null value that is guaranteed not to be present in its own right), and there is no need for the X, the choice of which can be tricky and cannot be kept identical from case to case.
Further, comparison through nvl is less flexible, including that it is not suited for use in the “select” part of a query, and that it cannot be extended to handle less trivial comparisons. Assume e.g. that we want to display the result of a comparison using a logic of: equal (in the sense of “=”) => 1, both values null => -3, only the first null => -1, only the second null => -2, otherwise unequal => 0:
select decode(field1, field2, nvl2(field1, 1, -3), null, -1, nvl2(field2, 0, -2))
from ...
(Where nvl2(a, b, c) could, obviously, be replaced with decode(a, null, c, b). The former is the better choice stylistically, which is why I used it. The latter, however, demonstrates that I did not cheat when I used another function.)
While this is a bit hard to read, it was written off the top of my head, demonstrating the power of decode, and a more readable version can possibly be found with a little more thought. (Not to mention that the rules were unusually convoluted to begin with.)
Doing the same with nvl? No way occurs to me. Indeed, even with the slightly more powerful nvl2, I see no way (without using other functionality, notably the case statement; but with case, nvl/nvl2 would no longer be needed).
What about case? Well, the result could be better in at least some circumstances; however, it would also be harder to write, take more space, and might wreck the flow and readability of the surrounding SQL. Indeed, if the above statement is given a comment to declare the underlying intention, it will likely always be the better choice...
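For comparison, a hedged case-based equivalent of the decode comparison above, which also illustrates the extra space such a rewrite takes:
select case
         when field1 is null and field2 is null then -3
         when field1 is null then -1
         when field2 is null then -2
         when field1 = field2 then 1
         else 0
       end
from ...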
Normally, I am strongly for writing code with a focus on readability, including avoiding division/multiplication by two through a shift operator, and a certain skepticism towards the C and Java “ternary operator” (x ? y : z).
SQL, however, is an odd beast, where the rules for what makes good code overall seem to be different. This for at least two reasons:
First, optimizing the readability of one part of a query (e.g. the above) can affect the rest of the query negatively.
Second, there is a great potential for compactness (e.g. through decode) not present in most higher-level languages. Compactness, in turn, can in its own ways lead to better readability, provided that the reader is sufficiently familiar with the language and its standard idioms.
Links between databases can be very handy, allowing access to tables from a foreign database (“there”) as if they were present in the current one (“here”). However, they can be a performance disaster. My own experiences are not deep enough to give very specific advice (in particular, because I lack knowledge of the way the query optimizers make decisions in this area), but the one golden rule is to move as little data as possible over the link.
Now, this is not as easy as it sounds, because the actual evaluation of criteria often takes place “here” even when the data is “there”. In other words, if a million-row table “there” is accessed to retrieve just one row, it could be that the full million rows are still sent over the link...
Ways to avoid such waste include:
Setting up a view “there” that includes the select criteria. This has the side-effect that all the criteria in the view are evaluated “there” (while “here” need not even realize that a view is used instead of a table).
Not making repeated queries on “there” data, instead copying all relevant data in one go and then making the repeated queries “here”. Yes, it might even pay off to copy entire tables. (This, however, is rarely the optimal solution and should be reserved for quick-and-dirty work.)
The driving_site hint can specify where a particular query should be executed, which can optimize the network transports by moving a little data from “here” to “there”, restricting the data, and then moving a little data from “there” to “here”, rather than moving a lot of data from “there” to “here” and then restricting the data.
(select /*+DRIVING_SITE(table)*/ [rest of query]
)
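A hedged sketch of the last two points, assuming a database link named remote_db and a large remote table big_table (all names illustrative):
-- copy the relevant remote data in one go, then work on the local copy
create table local_copy as
select * from big_table@remote_db where relevant_flag = 'Y';

-- or let the remote site drive the execution of a join
select /*+ driving_site(b) */ l.id, b.relevant_flag
from local_table l
join big_table@remote_db b on b.id = l.id;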