Friday, 22 February 2013

How to delete duplicate records in a table ?

For deleting duplicate records we have different ways.I have a Product_table.

Table name: product_table

product_id product_name
       1             xyz
       1             xyz
      30            abc
      30            abc
       5             ttt

Using rowid remove the duplicate records.This query will work only SQL

Syntax: delete from <table name> where rowid not in( select max(rowid) from <table_name> group by <col>);

Query: delete from product_table where rowid not in(select max(rowid) from
product_table group by product_id);

This is another way to delete duplicates. This will work in teradata.
here first create a set table with the definition  then insert the records into set table. Al ready we know the set table does not allow duplicate records. 

create set table product_table_t as (select * from product_table) with no data;

insert into product_table_t select * from product_table;

drop table product_table;

rename table product_table_t as product_table

using distinct, eliminating duplicate records

create temporary table with no data

Query : create table product_t as (select * from product_table) with no data

insert records in to table product_t with using distinct

insert into product_t select distinct * from product_table

drop the the product_table

rename table

rename table product_t as product

Please comment here if you know any other method of removing duplicates.


3 comments:

  1. No Rowid concept in teradata

    ReplyDelete
  2. Nice post, by following these, I have made one example using volatile table, you can follow that: http://pauldhip.blogspot.dk/

    ReplyDelete
  3. select deptid, max(salary) from employees group by deptid..

    ReplyDelete