Thursday, March 23, 2017

SQL Server on Linux. How "set-collation" works.

NOTE: This blog post has been done on SQL Server vNext CTP 1.4 on Ubuntu 16.04. The behavior of SQL Server in the later production release might be different.

In this blog post I want to show cool "set-collation" functionality in SQL vNext on Linux.
I'll do it in two parts.
The first part will show conventional T-SQL operations with collation.
In the second part I will change collation of the entire server.

Part 1.

At first will show the version:
SELECT @@Version;
GO
Microsoft SQL Server vNext (CTP1.4) - 14.0.405.198 (X64)
Mar 11 2017 01:54:12
Copyright (C) 2016 Microsoft Corporation. All rights reserved.
on Linux (Ubuntu 16.04.1 LTS)

Now show current collations in my system:
SELECT name, collation_name FROM sys.databases;
GO

Will create a new table in "LinuxTest" DB and try to insert Cyrillic text in Non-Unicode column "DefaultText" with a result:
USE LinuxTest;
GO
DROP TABLE IF EXISTS tbl_Collation_Test;
GO
CREATE TABLE tbl_Collation_Test(
 ID INT IDENTITY(1,1),
 Test_Name VARCHAR(50),
 DefaultText VARCHAR(50),
 UnicodeText NVARCHAR(50)
);
GO
INSERT INTO tbl_Collation_Test(Test_Name, DefaultText, UnicodeText)
VALUES ('Default Collation "SQL_Latin1_General_CP1_CI_AS"', 
 N'Самый простой тестовый текст',
 N'Самый простой тестовый текст');
GO
SELECT * FROM tbl_Collation_Test;
GO

As you can see we totally lost our text in non-Unicode column and text in Unicode column was perfectly preserved.

Now I will change collation of the entire "LinuxTest" DB:
GO
USE master;  
GO  
ALTER DATABASE LinuxTest COLLATE SQL_Ukrainian_Cp1251_CI_AS;  
GO  
SELECT name, collation_name FROM sys.databases;
GO

Collation of "LinuxTest" DB has changed to "SQL_Ukrainian_Cp1251_CI_AS"

Will try to insert Cyrillic text the second time:
USE LinuxTest;
GO
INSERT INTO tbl_Collation_Test(Test_Name, DefaultText, UnicodeText)
VALUES ('New Collation "SQL_Ukrainian_CP1251_CI_AS"', 
 N'Второй по простоте текст',
 N'Второй по простоте текст');
GO
SELECT * FROM tbl_Collation_Test;
GO

Despite of changed collation on the database level we still lost our non-Unicode value. 
Why is that?
Lets check collations on individual columns:
SELECT name, collation_name FROM sys.columns
WHERE object_id = OBJECT_ID('tbl_Collation_Test')
GO
As you can see, changing collation of "LinuxTest" DB does not change collation within existing "tbl_Collation_Test" table.

Will try to fix that:
ALTER TABLE tbl_Collation_Test
ALTER COLUMN DefaultText VARCHAR(50) 
COLLATE SQL_Ukrainian_CP1251_CI_AS;  
GO  

SELECT name, collation_name FROM sys.columns
WHERE object_id = OBJECT_ID('tbl_Collation_Test')
GO
As you can see "DefaultText" Non-Unicode column now is set for "SQL_Ukrainian_CP1251_CI_AS" collation.
Will try to insert Cyrillic text again:
INSERT INTO tbl_Collation_Test(Test_Name, DefaultText, UnicodeText)
VALUES ('Another try of "SQL_Ukrainian_CP1251_CI_AS"', 
 N'Третий самый простой текст',
 N'Третий самый простой текст');
GO
SELECT * FROM tbl_Collation_Test;
GO

And now that works!!!
We can save our non-Latin characters within non-Unicode column, which saves us almost 50% of space for that column!

Part 2.

Now login into Ubuntu to change the entire SQL Server collation.
The entire procedure is described in Microsoft Documentation: https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-configure-mssql-conf#collation

1. At first run "set-collation" witin "mssql-conf" configuration module:
sudo /opt/mssql/bin/mssql-conf set-collation
2. It interactively asks for a Collation I want to setup and I enter "SQL_Ukrainian_Cp1251_CS_AS" (please note that in "Part 1" I've used Case Insensitive collation and now for the sake of test I use Case Sensitive one)

3. After entering new collation name configuration utility gave me an error by saying that SQL Server is still running and I must stop it to perform that operation.

4. Interestingly, it uncovers new way to "Start/Stop" SQL Server service by using configuration utility with "stop-service" command.

5. To save on typing I stop SQL Server by conventional Linux command:
sudo systemctl stop mssql-server

After stopping the SQL Server service I repeat steps #1 & #2:

After configuration utility successfully run it asks for starting SQL Server service:

It asks to use configuration utility's "start-service" command, but I start SQL Server by conventional Linux command:
sudo systemctl start mssql-server

Now I get back to Windows and SSMS and check my server collations:
USE master;
GO
SELECT name, collation_name FROM sys.databases;
GO
As you can see, when I changed SQL Server collation, the collation has been changed not only for system databases, but for my "LinuxTest" DB too.  It switched from "SQL_Ukrainian_Cp1251_CI_AS" to "SQL_Ukrainian_Cp1251_CS_AS"!

Look deeper:
USE LinuxTest;
GO
SELECT name, collation_name FROM sys.columns
WHERE object_id = OBJECT_ID('tbl_Collation_Test')
GO
All columns in my test table also change their collations and overwrite the individual column settings!

The final result of the testing is following:
When you change database collation for a database it does not affect existing objects.
However, when you change collation of the whole server it affects ALL existing objects in ALL databases on the server.

NOTE: This blog post has been done on SQL Server vNext CTP 1.4 on Ubuntu 16.04. The behavior of SQL Server in the later production release might be different.

No comments:

Post a Comment