Question

  • Creator
    Topic
  • #2220027

    TSQL – Select all columns for JUST DISTINCT records

    Locked

    by coolaid09 ·

    Have a table that is something like this:

    table: eventRecords
    cols:
    timestamp – datetime
    eventid – varchar
    user – varchar
    filepath – varchar

    I want to select one record for each unique “eventid”, and I need for the data in all columns to be returned – not just eventid. The problem is:
    “Select distinct eventid, timestamp, user, filepath from eventRecords” returns duplicate reports for eventid. Yes, I know this is the expected functionality for SQL. I’ve tried numerous variations to just return ONE record for each eventid, including:
    Declare @mycount varchar
    Set @mycount=(SELECT COUNT (DISTINCT [eventid])
    FROM eventRecords)
    And then plugging in @mycount as:
    SELECT DISTINCT top (@mycount) eventid, timestamp, user, filepath from eventRecords

    This brings back the right number of records, but there are duplicates of eventid still…

    What step am I missing?

All Answers

  • Author
    Replies
    • #3032834

      Clarifications

      by coolaid09 ·

      In reply to TSQL – Select all columns for JUST DISTINCT records

      Clarifications

    • #3032761

      Your problem is which one !

      by tony hopkinson ·

      In reply to TSQL – Select all columns for JUST DISTINCT records

      The most recent one, least, the first one it bumbs into?

      e.g.

      Select * From
      (select EventID,Max([timestamp]) as MostRecentEvent from eventrecords) latestevents, eventRecords
      Where latestevents.eventid = eventrecords.eventid and latestevents.MostRecentEvent = eventrecords.[timestamp]

      Your table schema has no uniqueness property, if I was guessing eventid should be caseid or ticketid

      HtHs

      • #3032730

        Tried the example…

        by coolaid09 ·

        In reply to Your problem is which one !

        Tony, I tried to run your example, but an error displayed that said eventid is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. So I changed the query to:

        Select * From (select eventid,Max([TimeStamp]) as MostRecentEvent
        from eventrecords group by eventid)
        latestevents, eventrecords
        Where latestevents.eventid = eventrecords.eventid
        and latestevents.MostRecentEvent = eventrecords.[timestamp]
        group by eventrecords.eventid, latestevents

        Now, the following displays:
        Invalid column name ‘latestevents’.

        I’m a SQL noob – what is latestevents, and is the synatx right in your example?

        I’ve simplified the description of the table – there is a column for “ID” (PK, char), which is unique for each record, but not for each eventid. I did not include it because when I do, even more duplicate eventids are returned.

        Not sure what you mean when you say “if I was guessing eventid should be caseid or ticketid”.

        • #2828258

          oops.

          by tony hopkinson ·

          In reply to Tried the example…

          Select * From (select eventid,Max([TimeStamp]) as MostRecentEvent
          from eventrecords group by eventid)
          latestevents, eventrecords
          Where latestevents.eventid = eventrecords.eventid
          and latestevents.MostRecentEvent = eventrecords.[timestamp]

          Only need the first group by

          latestevents is a an internal name for the sub query in the parentheses.

        • #2828052

          Still some dups, but I think I’ve found something that works

          by coolaid09 ·

          In reply to oops.

          I tried making this change – some duplicate eventids were returned. But I found a query on another forum that seems to work:

          SELECT eventid, TimeStamp, user, filepath
          FROM (SELECT eventid, TimeStamp, filepath, user,
          ROW_NUMBER() OVER(PARTITION BY eventid
          ORDER BY TimeStamp DESC)
          AS seq
          FROM eventrecords) AS T
          WHERE seq = 1 order by eventid;

        • #2828035

          Hah, you know more about it than I do !

          by tony hopkinson ·

          In reply to Still some dups, but I think I’ve found something that works

          :p

        • #2828033

          Apparently not…

          by coolaid09 ·

          In reply to Hah, you know more about it than I do !

          While the dups are gone, when I compare the total count next to a COUNT DISTINCT, I’m missing 23 records…argh!!

        • #2828017

          maybe i’ve fixed it?

          by coolaid09 ·

          In reply to Apparently not…

          Tried including a range of timestamps in the query, but I put the range toward the end of the statement. The results are more accurate with the timestamp range in the subquery.

          SELECT eventid, TimeStamp, user, filepath
          FROM (SELECT eventid, TimeStamp, filepath, user,
          ROW_NUMBER() OVER(PARTITION BY eventid
          ORDER BY TimeStamp DESC)
          AS seq
          FROM eventrecords where ((TimeStamp >= ‘2010-4-1 00:00:00.000’)
          and (TimeStamp <= '2010-4-21 00:00:00.000'))) AS T WHERE seq = 1 order by eventid;

        • #2827994

          Couldn’t say mate

          by tony hopkinson ·

          In reply to Apparently not…

          Never used over or partition, and I outright refuse to use row_number. If I can’t describe it as a set, I change my schema not my SQL.

Viewing 1 reply thread