Monday, March 19, 2012

Exatract Data, thanks.

Greetings!

I would like to extract data from the raw data below, here are the
criteria:

(1) I want accounts starting with 2 only, i.e., excluding accounts
starting with 3.
(2) I want to keep the person_id, account, and open_date for each
account starting with 2.
(3) I want to create a new variable to show each person's earliest
open_date of the accounts starting with 2.

My query below would give me person 10001's earliest open_date for
EVERYONE. Can anyone tell me what the problem is? Thanks a lot!

select person_id, account, open_date
,min_open=(select min(open_date) from test_5 where
person_id=test_5.person_id
and account like '2%')
from test_5
where account like '2%'

-Raw Data-
person_idaccount open_date
10001220000015/15/2003
10001220000026/20/2004
10001300000012/2/2002
10002220000038/12/2004
10002220000049/15/2004
10002300000022/16/2005

-Ideal Output-
person_idaccount open_dateMin_Open
10001220000015/15/20035/15/2003
10001220000026/20/20045/15/2003
10002220000038/12/20048/12/2004
10002220000049/15/20048/12/2004

-My Output-
person_idaccount open_dateMin_Open
10001220000015/15/20035/15/2003
10001220000026/20/20045/15/2003
10002220000038/12/20045/15/2003
10002220000049/15/20045/15/2003On 16 Mar 2005 12:17:29 -0800, rong.guo@.gmail.com wrote:

(snip)
>My query below would give me person 10001's earliest open_date for
>EVERYONE. Can anyone tell me what the problem is? Thanks a lot!
>select person_id, account, open_date
>,min_open=(select min(open_date) from test_5 where
>person_id=test_5.person_id
>and account like '2%')
>from test_5
>where account like '2%'

Hi rong.guo,

You didn't post CREATE TABLE and INSERT statements, so I can't test it,
but I guess that this is caused because you forgot to use aliasses to
distinguish the outer and the inner version of the table.

select person_id, account, open_date
,min_open=(select min(open_date)
from test_5 AS t2
where t2.person_id = t1.person_id
and account like '2%')
from test_5 AS t1
where account like '2%'

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)|||Thanks Hugo! It works perfectly!

I also tried Method 2, which gave me the same result as your method. I
am more used to using JOIN function. But I am wondering if Method 1 has
any advantages over Method 2 in the long term. It seems Method 1
doesn't need GROUP BY while Method 2 does, I am wondering if I have
more than 50 variables in the SELECT, will the GROUP BY in Method 2
create any potential problems? Thanks!

--Method 1--
select person_id, account, open_date
,min_open=(select min(open_date) from test_5 as X where
(X.person_id=Y.person_id
and account like '2%'))
from test_5 as Y
where account like '2%'

--Method 2--
select a.person_id, a.account, a.open_date, min(b.open_date)
from test_5 as a
join test_5 as b
on a.person_id=b.person_id
where a.account like '2%'
group by a.person_id, a.account, a.open_date|||On 17 Mar 2005 06:24:54 -0800, rong.guo@.gmail.com wrote:

>Thanks Hugo! It works perfectly!
>I also tried Method 2, which gave me the same result as your method. I
>am more used to using JOIN function. But I am wondering if Method 1 has
>any advantages over Method 2 in the long term.

Hi rong.guo,

That's hard to say from here. It depends on lots of factors. The best
way to find out which version is better in your situation, is to test
both (emptying the cache in between) and compare the execution times,
the output of SET STATISTICS IO ON and the execution plans used.

There's also a third method you can use:

SELECT a.person_id, a.account, a.open_date, b.min_open_date
FROM test_5 AS a
JOIN (SELECT person_id, MIN(open_date) AS min_open_date
FROM test_5
WHERE account LIKE '2%'
GROUP BY person_id) AS b
ON a.person_id = b.person_id
WHERE a.account LIKE '2%'

(snip)
>--Method 2--
>select a.person_id, a.account, a.open_date, min(b.open_date)
>from test_5 as a
>join test_5 as b
>on a.person_id=b.person_id
>where a.account like '2%'
>group by a.person_id, a.account, a.open_date

You'll have to add
AND b.account LIKE '2%'
to this query, otherwise it is not equivalent to query 1.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

No comments:

Post a Comment