The differences between Azure Synapse and SQL Server

Richard Gao
1 min readMay 22, 2022

--

  1. Recursive CTEs are not available as of now
  2. Modifying a table with a CTE is not possible (unless you have hash index)
  3. Introduced in Synapse are Distributions: Hash, Round Robin and Replicate. In general Hash is the best table if you are able to get a decent column to index on. Round robin will always do some “Shuffling” when querying data that causes it to be slower. The more unique this column the better. That or ensure that the data will have the least skew possible.

Distributions in synapse will determine how you assign data to each of the 60 distributions in your database. The column you use for this distribution should follow the following rules:

  • Is used in JOIN, GROUP BY, DISTINCT, OVER, and HAVING clauses. When two large fact tables have frequent joins, query performance improves when you distribute both tables on one of the join columns. When a table is not used in joins, consider distributing the table on a column that is frequently in the GROUP BY clause.
  • Is not used in WHERE clauses. This could narrow the query to not run on all the distributions.
  • Is not a date column. WHERE clauses often filter by date. When this happens, all the processing could run on only a few distributions. Ie. if 10 users are trying to filter on the same date, this means all of the calculations will be done by one distribution.

--

--

Richard Gao

Computer Science and Data Enthusiast | Linkedin: https://www.linkedin.com/in/richard-gao-csecon/ | Shovelling data into the AI engine