SQL MODEL Clause: Charlie and the Chocolate Factory, Not Lucy and Ethyl
on November 18, 2010 at 5:44 amMy previous entry gave an example using Oracle’s SQL Model Clause. We’re given the first 2 columns, and want to know how to derive the third; the third column groups values such that “the sum of col1 is
A COL1 GRP
--- ---------- ----------
1 10 1
2 90 1
3 1 2
4 5 2
5 5 2
6 50 2
7 99 3
8 2 4
9 10 4
10 50 4
I stated that the answer was “not simply TRUNC’ing the rolling sum of COL1.”
Iudith mentzel asked why. When I first read through the problem, I thought along the same lines.
Let’s step back first. The original question by Kyes (Kais Blah) is the well-known “knapsack problem.” The wikipedia entry re-states the problem like this: “Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than a given limit and the total value is as large as possible”. Typically, the illustration is trying to fill up a backpack.
Note that in Kyes’ original question, his desired answer for the 2nd “run” or “group” (or “backpack”) is records 3-9 excluding 7. Record seven is alone in the third group. He’s trying to optimally fill each knapsack.
A COL1 GRP
--- ---------- ----------
3 1 2
4 5 2
5 5 2
6 50 2
8 2 2
9 10 2
7 99 3
We didn’t answer this original question. It’s in the realm of combinatorial optimization, and a hard problem (how far do you look ahead to find something that’ll fit better? etc. etc).
Instead, we simplified it by playing Lucy and Ethyl at the chocolate factory. We just load up the knapsacks as the records roll by, stopping when the knpapack is full enough that we can’t fit the next record.
Only the MODEL clause in SQL can solve even this simplified problem for a large number of knapsacks.
Pack to Iudith’s question. I’ll pick some nasty “edge” examples, and compare MODEL results with analytic function results. The analytic function (represented by INC_SUM and INC_GRP) just “TRUNC the rolling sum of COL1.”
WITH t as (
select 0 a, 0 col1 from dual where 0=1
union all select 1,99 from dual
union all select 2,2 from dual
union all select 3,99 from dual
union all select 4,2 from dual
union all select 5,99 from dual
union all select 6,2 from dual
union all select 7,99 from dual
union all select 8,2 from dual
union all select 9,99 from dual
union all select 10,2 from dual
)
select a,col1, ratchet, grp, inc_sum, inc_grp
from t
model
dimension by ( a)
MEASURES
( col1, 0 as ratchet, 0 as grp, 0 as inc_sum, 0 as inc_grp )
rules automatic order
(ratchet[1] = col1[CV()]
,ratchet[a>1] = CASE WHEN col1[CV()]+ratchet[CV()-1] THEN col1[CV()]+ratchet[CV()-1]
ELSE col1[CV()]
END
,grp[1] = 1
,grp[a>1] = CASE WHEN col1[CV()]+ratchet[CV()-1] THEN grp[CV()-1]
ELSE grp[CV()-1]+1
END
,inc_sum[1] = col1[CV()]
,inc_sum[a>1] = col1[CV()]+inc_sum[CV()-1]
,inc_grp[any] = trunc(inc_sum[cv()]/100)+1
)
order by a
/
A COL1 RATCHET GRP INC_SUM INC_GRP
—– ———- ———- ———- ———- ———-
1 99 99 1 99 1
2 2 2 2 101 2
3 99 99 3 200 3
4 2 2 4 202 3
5 99 99 5 301 4
6 2 2 6 303 4
7 99 99 7 402 5
8 2 2 8 404 5
9 99 99 9 503 6
10 2 2 10 505 6
The discrepancy isn’t obvious until you get to the third, fourth, etc groups/backpacks. The INC_GRP (analytic function) thinks it can cram records worth a total of 101 in the same backpack! The MODEL clause knows it’s hit the 100 threshold and moved to the next pack.
Without the MODEL clause, you were overwhelmed, like Lucy and Ethyl. Now you’re more like Charlie in the Chocolate Factory. The MODEL clause could be your Golden Ticket!
Go to Source
Recent Comments