Questions

TSQL - Select all columns for JUST DISTINCT records

+
0 Votes
Locked

TSQL - Select all columns for JUST DISTINCT records

coolaid09
Have a table that is something like this:

table: eventRecords
cols:
timestamp - datetime
eventid - varchar
user - varchar
filepath - varchar

I want to select one record for each unique "eventid", and I need for the data in all columns to be returned - not just eventid. The problem is:
"Select distinct eventid, timestamp, user, filepath from eventRecords" returns duplicate reports for eventid. Yes, I know this is the expected functionality for SQL. I've tried numerous variations to just return ONE record for each eventid, including:
Declare @mycount varchar
Set @mycount=(SELECT COUNT (DISTINCT [eventid])
FROM eventRecords)
And then plugging in @mycount as:
SELECT DISTINCT top (@mycount) eventid, timestamp, user, filepath from eventRecords

This brings back the right number of records, but there are duplicates of eventid still...

What step am I missing?
  • +
    0 Votes
    Tony Hopkinson

    The most recent one, least, the first one it bumbs into?

    e.g.

    Select * From
    (select EventID,Max([timestamp]) as MostRecentEvent from eventrecords) latestevents, eventRecords
    Where latestevents.eventid = eventrecords.eventid and latestevents.MostRecentEvent = eventrecords.[timestamp]

    Your table schema has no uniqueness property, if I was guessing eventid should be caseid or ticketid

    HtHs

    +
    0 Votes
    coolaid09

    Tony, I tried to run your example, but an error displayed that said eventid is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. So I changed the query to:

    Select * From (select eventid,Max([TimeStamp]) as MostRecentEvent
    from eventrecords group by eventid)
    latestevents, eventrecords
    Where latestevents.eventid = eventrecords.eventid
    and latestevents.MostRecentEvent = eventrecords.[timestamp]
    group by eventrecords.eventid, latestevents

    Now, the following displays:
    Invalid column name 'latestevents'.

    I'm a SQL noob - what is latestevents, and is the synatx right in your example?

    I've simplified the description of the table - there is a column for "ID" (PK, char), which is unique for each record, but not for each eventid. I did not include it because when I do, even more duplicate eventids are returned.

    Not sure what you mean when you say "if I was guessing eventid should be caseid or ticketid".

    +
    0 Votes
    Tony Hopkinson

    Select * From (select eventid,Max([TimeStamp]) as MostRecentEvent
    from eventrecords group by eventid)
    latestevents, eventrecords
    Where latestevents.eventid = eventrecords.eventid
    and latestevents.MostRecentEvent = eventrecords.[timestamp]

    Only need the first group by

    latestevents is a an internal name for the sub query in the parentheses.

    +
    0 Votes
    coolaid09

    I tried making this change - some duplicate eventids were returned. But I found a query on another forum that seems to work:

    SELECT eventid, TimeStamp, user, filepath
    FROM (SELECT eventid, TimeStamp, filepath, user,
    ROW_NUMBER() OVER(PARTITION BY eventid
    ORDER BY TimeStamp DESC)
    AS seq
    FROM eventrecords) AS T
    WHERE seq = 1 order by eventid;

    +
    0 Votes
    coolaid09

    While the dups are gone, when I compare the total count next to a COUNT DISTINCT, I'm missing 23 records...argh!!

    +
    0 Votes
    coolaid09

    Tried including a range of timestamps in the query, but I put the range toward the end of the statement. The results are more accurate with the timestamp range in the subquery.

    SELECT eventid, TimeStamp, user, filepath
    FROM (SELECT eventid, TimeStamp, filepath, user,
    ROW_NUMBER() OVER(PARTITION BY eventid
    ORDER BY TimeStamp DESC)
    AS seq
    FROM eventrecords where ((TimeStamp >= '2010-4-1 00:00:00.000')
    and (TimeStamp <= '2010-4-21 00:00:00.000'))) AS T
    WHERE seq = 1 order by eventid;

    +
    0 Votes
    Tony Hopkinson

    Never used over or partition, and I outright refuse to use row_number. If I can't describe it as a set, I change my schema not my SQL.

  • +
    0 Votes
    Tony Hopkinson

    The most recent one, least, the first one it bumbs into?

    e.g.

    Select * From
    (select EventID,Max([timestamp]) as MostRecentEvent from eventrecords) latestevents, eventRecords
    Where latestevents.eventid = eventrecords.eventid and latestevents.MostRecentEvent = eventrecords.[timestamp]

    Your table schema has no uniqueness property, if I was guessing eventid should be caseid or ticketid

    HtHs

    +
    0 Votes
    coolaid09

    Tony, I tried to run your example, but an error displayed that said eventid is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. So I changed the query to:

    Select * From (select eventid,Max([TimeStamp]) as MostRecentEvent
    from eventrecords group by eventid)
    latestevents, eventrecords
    Where latestevents.eventid = eventrecords.eventid
    and latestevents.MostRecentEvent = eventrecords.[timestamp]
    group by eventrecords.eventid, latestevents

    Now, the following displays:
    Invalid column name 'latestevents'.

    I'm a SQL noob - what is latestevents, and is the synatx right in your example?

    I've simplified the description of the table - there is a column for "ID" (PK, char), which is unique for each record, but not for each eventid. I did not include it because when I do, even more duplicate eventids are returned.

    Not sure what you mean when you say "if I was guessing eventid should be caseid or ticketid".

    +
    0 Votes
    Tony Hopkinson

    Select * From (select eventid,Max([TimeStamp]) as MostRecentEvent
    from eventrecords group by eventid)
    latestevents, eventrecords
    Where latestevents.eventid = eventrecords.eventid
    and latestevents.MostRecentEvent = eventrecords.[timestamp]

    Only need the first group by

    latestevents is a an internal name for the sub query in the parentheses.

    +
    0 Votes
    coolaid09

    I tried making this change - some duplicate eventids were returned. But I found a query on another forum that seems to work:

    SELECT eventid, TimeStamp, user, filepath
    FROM (SELECT eventid, TimeStamp, filepath, user,
    ROW_NUMBER() OVER(PARTITION BY eventid
    ORDER BY TimeStamp DESC)
    AS seq
    FROM eventrecords) AS T
    WHERE seq = 1 order by eventid;

    +
    0 Votes
    coolaid09

    While the dups are gone, when I compare the total count next to a COUNT DISTINCT, I'm missing 23 records...argh!!

    +
    0 Votes
    coolaid09

    Tried including a range of timestamps in the query, but I put the range toward the end of the statement. The results are more accurate with the timestamp range in the subquery.

    SELECT eventid, TimeStamp, user, filepath
    FROM (SELECT eventid, TimeStamp, filepath, user,
    ROW_NUMBER() OVER(PARTITION BY eventid
    ORDER BY TimeStamp DESC)
    AS seq
    FROM eventrecords where ((TimeStamp >= '2010-4-1 00:00:00.000')
    and (TimeStamp <= '2010-4-21 00:00:00.000'))) AS T
    WHERE seq = 1 order by eventid;

    +
    0 Votes
    Tony Hopkinson

    Never used over or partition, and I outright refuse to use row_number. If I can't describe it as a set, I change my schema not my SQL.