Sunday, 6 September 2020

If you are looking for ... Use this

 

IF YOU ARE LOOKING FOR... USE THIS
A fully managed, elastic data warehouse with security at every level of scale at no extra cost Azure SQL Data Warehouse
Fast, easy and collaborative Apache Spark-based analytics platform Azure Databricks
A fully managed cloud Hadoop and Spark service backed by 99.9% SLA for your enterprise HDInsight
A fully managed cloud service that enables you to easily build, deploy and share predictive analytics solutions Machine Learning
An on-demand, real-time stream processing service with enterprise-grade security, auditing and support Stream Analytics
A no-limits data lake built to support massively parallel analytics Data Lake Store
A fully managed on-demand pay-per-job analytics service with enterprise-grade security, auditing and support Data Lake Analytics
An enterprise-wide metadata catalogue that makes data asset discovery simple Data Catalog
A data integration service to orchestrate and automate data movement and transformation Data Factory

Saturday, 15 September 2018

Basic Hive commands

To see databases

show databases;
(first time 'default' db will be shown)

Create database

create database <database_name>;

To print database name on the hive prompt

set hive.cli.print.current.db=true;

Change database

use <database_name>;

Create table

create table <tablename> (<columnname> <datatype>, ...) row format delimited fields terminated by ',';
Eg: create table newtable(id int, name string, mobileno bigint) row format delimited fields terminated by ',';

To see tables

show tables;

To see the structure of a table

describe <tablename>;

Insert data into a table

load data local inpath '<path of the text file>' into table <tablename>;

Thursday, 12 November 2015

SQL Server Vs MongoDB



SQL Server Mongo DB
Microsoft product Mongo DB product
For commercial Open source
Works on windows Works on Windows, Linux, OS X, Solaris
Scalable vertically Scalable horizontally
SQL query language JSON query language
Database Database
Fixed Schema Dynamic Schema
Table Collection
Row document or BSON document
Column Field
Index Index
Primary key Primary key
Foreign keys No foreign keys
Joins Embedded document and linking
Triggers No triggers available
 

Saturday, 22 August 2015

Change DB collation with removing duplicate records

Problem statement:

One of the development team is using content management package which uses SQL Server for storing its metadata. After couple of months the team came to know that the database collation is of case sensitive, so in the customized tables they are having duplicate values, which they don't want.

Requirement:

1. The requirement is to change the collation to default (case-insensitive).
2. Remove all the duplicate values in the custom tables.

Here is a small test DB of how it has done:

-- Create database with case sensitive collation

USE [master]

GO

CREATE DATABASE [collation_test] ON  PRIMARY 

( NAME = N'collation_test', FILENAME = N'C:\Program Files\Microsoft SQL Server

\MSSQL10.MSSQLSERVER\MSSQL\DATA\collation_test.mdf' , SIZE = 3072KB , FILEGROWTH = 1024KB 

)

 LOG ON 

( NAME = N'collation_test_log', FILENAME = N'C:\Program Files\Microsoft SQL Server

\MSSQL10.MSSQLSERVER\MSSQL\Log\collation_test_log.ldf' , SIZE = 1024KB , FILEGROWTH = 10%)

 COLLATE SQL_Latin1_General_CP1_CS_AS

GO

-- Create table

CREATE TABLE [dbo].[test1](

[number] [nchar](30) NOT NULL, 

[name] [nchar](300) NOT NULL

) ON [PRIMARY]

-- Insert into table

INSERT INTO dbo.test1 VALUES

('1', 'Aa'),

('1', 'aA'),

('2', 'AA'),

('1', 'aa'),

('2', 'aA'),

('2', 'Ba');



-- Check the case sensitive

  select * from test1 where name ='Aa'

-- Change the collation to case insensitive

USE [master]

GO

alter DATABASE [collation_test] 

 COLLATE SQL_Latin1_General_CP1_CI_AS

GO                                                                                                                                            

-- Apply collation changes for columns

Note: The COLLATE clause can be applied only for the char, varchar, text, nchar, nvarchar, and ntext data types.

ALTER TABLE dbo.test1 ALTER COLUMN number

            nchar(30)COLLATE Latin1_General_CI_AS NOT NULL;

ALTER TABLE dbo.test1 ALTER COLUMN name

            nchar(300)COLLATE Latin1_General_CI_AS NOT NULL;

-- Delete the duplicate values

delete delta FROM

(SELECT ROW_NUMBER() OVER (PARTITION BY number, name ORDER BY name) cnt

 FROM test1) delta

WHERE delta.cnt > 1

Solution:
The above iteration has applied for all the tables to change its collation and remove duplicates. It worked perfectly.

Friday, 12 June 2015

Always On - commit comparing normal DB - Lab test (POC)


Objective:


The objective of this POC is to understand the time delay or the time to get committed for insert statement for the database which is configured with AlwaysOn synchronous and async failover.
Note: Synchronous commit for a insert/update query will first harden the transaction in the secondary AG and based on the acknowledgement the primary will commit.

Server configuration:










Test Case:


1.       We are going to create one database named Sync_POC_WithoutAG is a standalone DB which is not involved in AlwaysOn.
2.       In the second set we are going to create one database named Sync_POC which will be involved in AlwaysOn with Synchoronus failover on the secondary server with a readable copy.
3.       Both (#1 and #2) DBs, we are going to create a table with same structure and insert data.
4.       We are planning to insert the data with different record counts.
5.       The above insert will happen based on a loop and find the time difference between both.
6.       The same test has been observed for AlwaysOn AG with async mode.





Setup AlwaysOn with Sync:

 


 Setup AlwaysOn with Async:




 

Create table script:


CREATE TABLE [dbo].[Name](
       [id] [numeric](10, 0) IDENTITY(1,1) NOT NULL,
       [name] [varchar](50) NULL,
 CONSTRAINT [PK_Name] PRIMARY KEY CLUSTERED
(
       [id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO


Simulation query:


DECLARE @StartTime datetime,@EndTime datetime
SELECT @StartTime=GETDATE()

DECLARE @intLimit AS INT = 100000
DECLARE @intCounter AS INT = 1

WHILE @intCounter <= @intLimit
BEGIN

INSERT INTO Name VALUES ('Name1')
SET @intCounter = @intCounter + 1
END

SELECT @EndTime=GETDATE()
SELECT DATEDIFF(ms,@StartTime,@EndTime) AS [Duration in milliseconds]

Sample observations for standalone DB:













Sample observations for AlwaysOn Sync DB:











  

Sample observations for AlwaysOn Async DB:






  

Analysis:



No of records inserted
Normal DB (milli secs)
Sync AG DB (milli secs)
Sync AG DB (milli secs) NIC card
Async AG DB (milli secs)
100
0.106
0.203
0.203
0.103
250
0.266
0.541
0.541
0.26
500
0.573
1.071
1.071
0.516
750
0.716
1.643
1.643
0.85
1000
0.971
1.963
1.963
1.3
2500
2.446
4.963
4.963
2.94
5000
4.746
12.276
12.273
5.41
7500
7.411
15.506
15.502
8.683
10000
9.696
20.813
20.811
11.26
25000
25.161
50.411
50.41
25.813
50000
50.523
107.312
107.31
53.97
75000
74.603
155.45
155.45
82.406
100000
98.99
214.82
214.81
108.016

 Conclusion:


Time taken =  (Avg. time of Sync (or) Async AG DB / Avg. time of Normal DB)
From the above analysis, the time taken to do an insert in AlwaysOn Sync would be twice (2.125 milli-secs) the time as compared to a standalone (or) normal DB.
Same way, to do an insert in AlwaysOn Async would be bit time consuming (1.091 milli-secs) compared to the DB which is not part of AG.

Time taken for insert query:
Normal DB (without AlwaysOn enabled)                                            = 1 milli sec
AlwaysOn Async                                                                                  = 1.1 milli sec
AlwaysOn Sync                                                                                    = 2.1 milli sec
AlwaysOn Sync with dedicated NIC card                                            = 2.1 milli sec


Note: All the above observations are based on the primary DB commit.

Saturday, 24 January 2015

Automate DB Restore

To automate the database restore, create the following steps.

Step 1: Take Backup of source and destination.

BACKUP DATABASE <DB Name> TO DISK = <disk path> 

Step 2:  Script out permissions on destination database and save it or store it in a table.

Use <DB name >
GO
SET NOCOUNT ON
print 'Use ' + db_name() 
print 'go'
--Grant DB Access---

select 'if not exists (select * from dbo.sysusers where name = N''' + usu.name +''' )' + Char(13) + Char(10) +
'         EXEC sp_grantdbaccess N''' + lo.loginname+  '''' + ', N''' + usu.name +  '''' + Char(13) + Char(10) +
'GO'
from sysusers usu , master.dbo.syslogins lo
where usu.sid = lo.sid and (usu.islogin = 1 and usu.isaliased = 0 and usu.hasdbaccess = 1)
go

--Add Roles---

select 'if not exists (select * from dbo.sysusers where name = N''' + name +''' )' + Char(13) + Char(10) +
'         EXEC sp_addrole N''' + name +  '''' + Char(13) + Char(10) +
'GO'
from sysusers where uid> 0 and uid=gid and issqlrole=1
go
--Add RoleMember---

select 'exec sp_addrolemember N''' + user_name(groupuid) + ''', N''' + user_name (memberuid) + '''' + Char(13) + Char(10) +
'GO'
from sysmembers where  user_name (memberuid) <> 'dbo' order by groupuid

--Add Alias Login also---

select 'if not exists (select * from dbo.sysusers where name = N''' + a.name +''' )' + Char(13) + Char(10) +
'         EXEC sp_addalias N''' + substring(a.name , 2, len(a.name)) +  '''' + ', N''' + b.name +  '''' + Char(13) + Char(10) +
'GO'
from sysusers a , sysusers b where a.altuid = b.uid and a.isaliased=1
go
SET NOCOUNT OFF

--Add object & DB level permission also---

SELECT CASE WHEN perm.state <> 'W' THEN perm.state_desc ELSE 'GRANT' END
+ SPACE(1) + perm.permission_name + SPACE(1) + 'ON ' + QUOTENAME(USER_NAME(obj.schema_id)) + '.' + QUOTENAME(obj.name) 
+ CASE WHEN cl.column_id IS NULL THEN SPACE(0) ELSE '(' + QUOTENAME(cl.name) + ')' END
+ SPACE(1) + 'TO' + SPACE(1) + QUOTENAME(USER_NAME(usr.principal_id)) COLLATE database_default
+ CASE WHEN perm.state <> 'W' THEN SPACE(0) ELSE SPACE(1) + 'WITH GRANT OPTION' END AS '--Object Level Permissions'
FROM sys.database_permissions AS perm
INNER JOIN
sys.objects AS obj
ON perm.major_id = obj.[object_id]
INNER JOIN
sys.database_principals AS usr
ON perm.grantee_principal_id = usr.principal_id
LEFT JOIN
sys.columns AS cl
ON cl.column_id = perm.minor_id AND cl.[object_id] = perm.major_id
ORDER BY perm.permission_name ASC, perm.state_desc ASC


SELECT CASE WHEN perm.state <> 'W' THEN perm.state_desc ELSE 'GRANT' END
+ SPACE(1) + perm.permission_name + SPACE(1)
+ SPACE(1) + 'TO' + SPACE(1) + QUOTENAME(USER_NAME(usr.principal_id)) COLLATE database_default
+ CASE WHEN perm.state <> 'W' THEN SPACE(0) ELSE SPACE(1) + 'WITH GRANT OPTION' END AS '--Database Level Permissions'
FROM sys.database_permissions AS perm
INNER JOIN
sys.database_principals AS usr
ON perm.grantee_principal_id = usr.principal_id
WHERE perm.major_id = 0
ORDER BY perm.permission_name ASC, perm.state_desc ASC

Step 3: Refresh destination database from source database backup.

----Make Database to single user Mode
ALTER DATABASE <Destination DB>
SET SINGLE_USER WITH
ROLLBACK IMMEDIATE

----Restore Database
RESTORE DATABASE <Destination DB>
FROM DISK = <disk path>

Step 4: Apply Step 2 Output: saved in table on destination database.
Step 5: Run orphan user script to fix orphan users on destination database. Save the output in a different table.

Use <destination_database>. 
GO
setnocount on
select 'exec' +' '+'sp_change_users_login ''update_one'',''' + name + ''',''' + name + '''' +char(13) 
fromsysuserssu
wheresid NOT IN 
(selectsid from master..syslogins )
AND 
islogin = 1 
AND  
name NOT LIKE 'guest' and su.issqluser=1


Step 6: Run step 5 output: saved in table on the destination database 
Step 7: Change database owner to SA.

Use <destination_database>
GO
sp_changedbownersa


Sunday, 26 October 2014

Execute SSIS package using domain(windows) account through SQL job

Environment:

In my client's production environment, we use service account (Domain based) having read/write/execute access and SQL login for read only (support) access. Since the interactive login is disabled at the group policy, no users can directly connect to SSMS with servicce account.

To execute any jobs to do a push/pull of data, the user needs to execute it from a different/remote server where as to share the load of ETL. 

Problem statement:

In the above setup, user needs to run his job\packages only using the service account (Windows) as it has the read/write/execute access.

How to execute a ssis package job using windows account?

Solution:

Answer is to setup a proxy and provide the credentials in the job step.

1.      Open SQL Server Management Studio (2008 R2) and connect to the SQL server that would run the SSIS package as a SQL Server system admin or local server admin.


2.      Run the following script (provide the actual service account (AD) and its network password that would be used to connect to the database) to create a proxy.

USE msdb

CREATE CREDENTIAL UserSvc WITH IDENTITY = 'XYZ\UserAcc', SECRET = 'Domain pwd of xyz'

GO

Note: SQL Server will store the above-given network password in an encrypted way; no security issues.

EXEC dbo.sp_add_proxy @proxy_name = 'UserProxy',
      @enabled = 1,
      @description = 'Proxy to execute User SSIS as service account',
      @credential_name = 'UserSvc'
GO

EXEC dbo.sp_grant_proxy_to_subsystem
      @proxy_name = 'UserProxy',
    @subsystem_name = N'Dts'
GO

3. Assuming the above script ran without errors, open the SQL Agent job (from SQL Server Management Studio) used to run the SSIS package and edit the SSIS package step. 
Choose "UserProxy" from the Run As drop-down and click OK. See the screenshot below for reference. 


4. Edit  the connection string to include password as below:


5. By setting up above, we can execute a package using domain account.