Saturday, 22 August 2015

Change DB collation with removing duplicate records

Problem statement:

One of the development team is using content management package which uses SQL Server for storing its metadata. After couple of months the team came to know that the database collation is of case sensitive, so in the customized tables they are having duplicate values, which they don't want.

Requirement:

1. The requirement is to change the collation to default (case-insensitive).
2. Remove all the duplicate values in the custom tables.

Here is a small test DB of how it has done:

-- Create database with case sensitive collation

USE [master]

GO

CREATE DATABASE [collation_test] ON  PRIMARY 

( NAME = N'collation_test', FILENAME = N'C:\Program Files\Microsoft SQL Server

\MSSQL10.MSSQLSERVER\MSSQL\DATA\collation_test.mdf' , SIZE = 3072KB , FILEGROWTH = 1024KB 

)

 LOG ON 

( NAME = N'collation_test_log', FILENAME = N'C:\Program Files\Microsoft SQL Server

\MSSQL10.MSSQLSERVER\MSSQL\Log\collation_test_log.ldf' , SIZE = 1024KB , FILEGROWTH = 10%)

 COLLATE SQL_Latin1_General_CP1_CS_AS

GO

-- Create table

CREATE TABLE [dbo].[test1](

[number] [nchar](30) NOT NULL, 

[name] [nchar](300) NOT NULL

) ON [PRIMARY]

-- Insert into table

INSERT INTO dbo.test1 VALUES

('1', 'Aa'),

('1', 'aA'),

('2', 'AA'),

('1', 'aa'),

('2', 'aA'),

('2', 'Ba');



-- Check the case sensitive

  select * from test1 where name ='Aa'

-- Change the collation to case insensitive

USE [master]

GO

alter DATABASE [collation_test] 

 COLLATE SQL_Latin1_General_CP1_CI_AS

GO                                                                                                                                            

-- Apply collation changes for columns

Note: The COLLATE clause can be applied only for the char, varchar, text, nchar, nvarchar, and ntext data types.

ALTER TABLE dbo.test1 ALTER COLUMN number

            nchar(30)COLLATE Latin1_General_CI_AS NOT NULL;

ALTER TABLE dbo.test1 ALTER COLUMN name

            nchar(300)COLLATE Latin1_General_CI_AS NOT NULL;

-- Delete the duplicate values

delete delta FROM

(SELECT ROW_NUMBER() OVER (PARTITION BY number, name ORDER BY name) cnt

 FROM test1) delta

WHERE delta.cnt > 1

Solution:
The above iteration has applied for all the tables to change its collation and remove duplicates. It worked perfectly.

No comments:

Post a Comment