First tests with SQL Server vNext on Linux and Windows

The IT world is changing. The major players are changing. The old rules are evolving and the old barriers between “gardens” are disappearing. Microsoft is changing, but it is not alone. The first preview of Microsoft SQL Server vNext for Linux debuted in November 2016. On the same day, Microsoft became a platinum member of the Linux Foundation. And on the same day Google joined the .NET Foundation Technical Steering Group (Google announcement here).

SQL server vNext on Linux and Windows

I am too curious, so I immediately created a couple of VM on Azure to try the new SQL Server on Linux, Ubuntu 16.04, and on Windows Server 2016. The VMs are D2_V2 (2 core, 7 GB ram). Both virtual machines run on a standard storage account, i.e. traditional hard disks.

The setup on Linux was straightforward and fast. Starting from a running Ubuntu 16.04 and following the documentation, in less than 10 minutes I have got a running Sql Server. The installation is done through APT. Before starting, it is necessary to update the package repositories list and the a simple command installs SQL Server:

sudo apt-get install -y mssql-server

After that, another command allows choosing the SA password and setup the start-up mode of the sql service:

sudo /opt/mssql/bin/sqlservr-setup

That’s all. SQL Server is running on Linux! But bear in mind that SQL Server for Linux lacks a lot of features compared to the full version for Windows. Details here. You should also install the tools sqlcmd and bcp for Linux.

On Windows Server 2016 the installation was much longer: the iso or cab needed to be downloaded first. The well-known (old?) graphical setup wizard has many options and executes a lot of checks during the setup. It required more than 40 minutes to complete.

Connecting from a remote machine

I managed both the SQL servers directly from my PC through the internet using the SQL Management Studio (it is recommended to upgrade to the latest version. Previous version sometime hangs managing Linux machines).

To connect to the remote machines, I needed to add a security rule to allow inbound TCP traffic on port 1433 on both virtual machines. Important note: you should restrict the allowed source to your public IP. It is not a good idea to expose a sql tcp port to the full internet.

On Windows Server 2016 it is also required a similar rule on the internal firewall. On the other hand, Linux is already open to accept connections (perhaps too much open, in my opinion).

Using the latest SQL Server Management Studio, I created an empty database on both the VMs. A funny behavior: the paths from the Linux machine are always prefixed with a “C:”, so the resulting path is something like: C:\var\opt\mssql\data\mytestdb.mdf

Restore a database

I downloaded the sample AdventureWork2014 backup from Microsoft site and restored it on both VMs. The command was the same:

RESTORE DATABASE [AdventureWorks2014] 
FROM DISK = N'C:\Fabry\sql_backup\AdventureWorks2014.bak' 
WITH  FILE = 1,  
MOVE N'AdventureWorks2014_Data' 
TO N'C:\Fabry\mssql_data\AdventureWorks2014_Data.mdf',  
MOVE N'AdventureWorks2014_Log' 
TO N'C:\Fabry\mssql_data\AdventureWorks2014_Log.ldf', 
NOUNLOAD, STATS = 5
 
RESTORE DATABASE [AdventureWorks2014] 
FROM DISK = N'C:\home/Fabry/db_backup\AdventureWorks2014.bak' 
WITH  FILE = 1,
MOVE N'AdventureWorks2014_Data' 
TO N'C:\var\opt\mssql\data\AdventureWorks2014_Data.mdf',  
MOVE N'AdventureWorks2014_Log' 
TO N'C:\var\opt\mssql\data\AdventureWorks2014_Log.ldf',  
NOUNLOAD,  STATS = 5

Some performance tests

AdventureWorks2014 database is not so big for a D2_V2 virtual machine. So I decided to “fatten” it using the script from sqlskills.com. It creates two new big tables, Sales.SalesOrderHeaderEnlarged and Sales.SalesOrderDetailEnlarged.

The first run of the script was done without changing the predefined auto-growth of the database files. The resulting mdf was 1.5 GB and the LDF was 5 GB. A lot of time was consumed by allocating new disk space. This is the normal behavior of sql server and both the versions behaved the same way. I did a second run, after removing the two new tables. As expected it was faster.

  First run Second run
Windows 00:10:19 00:03:44
Linux 00:05.06 00:02:31

Linux was faster than Windows on both cases. It is surprising. I expected SQL server on Linux slower than Windows. I repeated the tests and the result are the same.

I did other tests running different queries and Linux was always faster. I repeated the test after couple of days. The VMs were shutdown, deallocated and restarted. The results were more or less the same. I am still investigating about these strange results.

  Windows Windows Linux Linux
Query Cpu time Elapsed time Cpu time Elapsed time
(0)   57000   34000
(1) 4329 2667 2919 6037
(2) 4390 2287 2344 1316
(3) 56092 38323 31683 21553
(4) 3063 1877 1713 862

(milliseconds)

Query_0

SELECT count_big(1) FROM 
    HumanResources.Department as T1, HumanResources.Department as T2,
    HumanResources.Department as T3, HumanResources.Department as T4,
    HumanResources.Department as T5, HumanResources.Department as T6,
    HumanResources.Department as T7, HumanResources.Department as T8

Query_1

CHECKPOINT
DBCC DROPCLEANBUFFERS
SELECT distinct
   sales.SalesOrderHeaderEnlarged.salesorderid,
   Production.Product.Name
   from Sales.SalesOrderHeaderEnlarged
   inner join Sales.SalesOrderDetailEnlarged 
   on Sales.SalesOrderHeaderEnlarged.SalesOrderID =
      Sales.SalesOrderDetailEnlarged.SalesOrderID
   inner join Production.Product 
   on Production.Product.ProductID = 
      Sales.SalesOrderDetailEnlarged.ProductID
   where linetotal > 20000

Query_2

...like Query_1 but without CHECKPOINT and DBCC DROPCLEANBUFFERS

Query_3

SELECT count(1)
   FROM [Sales].[SalesOrderDetail] as SOD1, 
        [Sales].[SalesOrderDetail] as SOD2
   WHERE SOD1.ProductID=SOD2.ProductID 
         AND SOD1.UnitPrice > SOD2.UnitPrice

Query_4

SELECT distinct [CarrierTrackingNumber] 
   FROM [Sales].[SalesOrderDetailEnlarged]

[UPDATE 2016-12-04]

Following the notes from Marco Russo, I started a larger session of tests using different machine size on Azure. The results are very contradictory and inconsistent, in particular using virtual machine of the DS series. Different runs in different time in the day, showed different results. I can only guess about the reasons (CPU overcommitment?). On the other hand, DSv2 series showed a more stable behavior. The table below shows the results.

    Windows   Linux   Linux VS Windows  
VM size query cpu time elapsed time cpu time elapsed time cpu time elapsed time
D2_V2 0 61594 36302 60037 34298 -2.53% -5.52%
D2_V2 1 2687 4475 2448 3126 -8.89% -30.15%
D2_V2 2 2891 1827 2304 1266 -20.30% -30.71%
D2_V2 3 33531 23454 30760 20974 -8.26% -10.57%
D2_V2 4 1984 1008 1649 830 -16.89% -17.66%
D11_V2 0 63078 37782 56302 32051 -10.74% -15.17%
D11_V2 1 3296 4276 2532 3915 -23.18% -8.44%
D11_V2 2 2796 2135 2110 1536 -24.54% -28.06%
D11_V2 3 35234 24252 29466 20013 -16.37% -17.48%
D11_V2 4 1953 982 1507 759 -22.84% -22.71%
DS2_V2 0 57593 32205 58225 33492 1.10% 4.00%
DS2_V2 1 1954 6339 2889 6016 47.85% -5.10%
DS2_V2 2 2359 1368 2343 1340 -0.68% -2.05%
DS2_V2 3 30844 21008 30984 20937 0.45% -0.34%
DS2_V2 4 1687 844 1578 795 -6.46% -5.81%
DS11_V2 0 57047 31664 60381 34344 5.84% 8.46%
DS11_V2 1 2094 6325 2818 6048 34.57% -4.38%
DS11_V2 2 2296 1274 2271 1242 -1.09% -2.51%
DS11_V2 3 30078 20419 31596 21454 5.05% 5.07%
DS11_V2 4 1672 851 1614 816 -3.47% -4.11%