2022年3月28日 星期一

Azure Synapse Analytics

 Azure Synapse Analytics

SQL Server Performance Counter

 

sys.dm_os_performance_counters (Transact-SQL)

Run Performance Monitor

Windows Performance Monitor Overview



1.磁碟IO效能相關效能計數器

Average Disk sec/Read

Average Disk sec/Read value is below 8ms

Average Disk sec/Read is proportional to time needed for one disk rotation. For example, a disk that makes 3,600 round per minute needs 60s/3600 = 0.016 seconds, i.e. 16 milliseconds to make one rotation. Average Disk sec/Read for that disk should be a multiple of 16 milliseconds. The time added to one disk rotation is the queuing time and the time needed for data transit across the I/O bus

Value (ms)Performance
< 8   (< 0.008 sec)Excellent
8 – 12 (0.008 ~ 0.012 sec)OK
12 – 20  (0.012 ~ 0.020 sec)Fair
> 20 ( > 0.020 sec))Bad

Maximum peaks during excessive I/O operations can be up to 25 milliseconds, but values constantly higher than 20 milliseconds indicate poor performance



Average Disk sec/Write

non-cached writes are the same as for Average Disk sec/Read

Usually, the read and write speed on a disk are different. The recommended values for non-cached writes are the same as for Average Disk sec/Read. In case of cached writes, the values are very different – values higher than 4 milliseconds indicate poor performance, while the values less than 1 milliseconds indicate the best performance

Value (ms)Performance
< 1 ( < 0.001 sec)Excellent
1 – 2 (0.001 ~ 0.002 sec)OK
2 – 4 (0.002 ~ 0.004 sec)Fair
> 4 (> 0.004 sec)Bad

If the Average Disk sec/Read and Average Disk sec/Write values are constantly above the recommended values, it’s an indication of a disk bottleneck and additional analysis is required


https://www.sqlshack.com/sql-server-disk-performance-metrics-part-1-important-disk-performance-metrics/


Use the SQLIOSim utility to simulate SQL Server activity on a disk subsystem

https://docs.microsoft.com/en-US/troubleshoot/sql/tools/sqliosim-utility-simulate-activity-disk-subsystem

Understanding SQLIOSim Output

https://www.itprotoday.com/sql-server/understanding-sqliosim-output


https://docs.microsoft.com/en-us/azure-stack/hci/manage/diskspd-overview

http://sharedderrick.blogspot.com/2011/09/sqliosim-sql-server-2008.html



2.記憶體相關效能計數器

\SQLServer:Memory Manager\Target Server Memory (KB) 
\SQLServer:Memory Manager\Total Server Memory (KB) 
\SQLServer:Buffer Manager\Buffer cache hit ratio
\SQLServer:Buffer Manager\Page life expectancy


Target Server Memory(KB)

The total amount of dynamic memory the server can consume

Target Server Memory (KB) is the amount of memory that SQL Server is willing (potential) to allocate to the buffer pool under its current load


Total Server Memory(KB)

The Total Server Memory is what SQL currently has allocated

The Total Server Memory is the current amount of memory that SQL Server is using

The Total Server Memory is the current amount of memory currently assigned to SQL Server.

If this counter is still growing the server has not yet reached its steady-state, and it is still trying to populate the cache and get pages loaded into memory.  Performance will likely be somewhat slower during this time since more disk I/O is required at this stage.  This behavior is normal.  Eventually Total Server Memory should approximate Target Server Memory.


Buffer Cache Hit Ratio

Percent of page requests satisfied by data pages from the buffer pool

It gives the ratio of the data pages found and read from the SQL Server buffer cache and all data page requests. The pages that are not found in the buffer cache are read from the disk, which is significantly slower and affects performance

The recommended value for Buffer Cache Hit Ratio is over 90. When better performance is needed, the minimal acceptable value is 95. A lower value indicates a memory problem


Page Life Expectancy

Duration, in seconds, that a page resides in the buffer pool

SQL Server has more chances to find the pages in the buffer pool if they stay there longer. If the page is not in the buffer pool, it will be read from disk, which affects performance. If there’s insufficient memory, data pages are flushed from buffer cache more frequently, to free up the space for the new pages

When there’s sufficient memory on the server, pages have a high life expectancy. The normal values are above 300 seconds (5 minutes) and the trend line should be stable. It’s recommended to monitor the values over time, as frequent quick drops indicate memory issues. Also, a value drop of more than 50% is a sign for deeper investigation


SQL Server memory performance metrics – Part 4 – Buffer Cache Hit Ratio and Page Life Expectancy

https://www.sqlshack.com/sql-server-memory-performance-metrics-part-4-buffer-cache-hit-ratio-page-life-expectancy/


3.處理器CPU相關效能計數器


Processor:% Processor Time

Processor (_Total) \ % Processor Time

Sustained values > 90% on a single processor machine, or > 80% on a multiprocessor machine should be investigated

This counter monitors the amount of time the CPU spends executing a thread that is not idle. A consistent state of 80 percent to 90 percent might indicate the need to upgrade your CPU or add more processors.

For multiprocessor systems, monitor a separate instance of this counter for each processor. This value represents the sum of processor time on a specific processor. To determine the average for all processors, use the System: %Total Processor Time counter instead.

  • Processor: % Privileged Time

    Corresponds to the percentage of time the processor spends on execution of Microsoft Windows kernel commands, such as processing of SQL Server I/O requests. If this counter is consistently high when the Physical Disk counters are high, consider installing a faster or more efficient disk subsystem.

     Note

    Different disk controllers and drivers use different amounts of kernel processing time. Efficient controllers and drivers use less privileged time, leaving more processing time available for user applications, increasing overall throughput.

  • Processor: %User Time

    Corresponds to the percentage of time that the processor spends on executing user processes such as SQL Server.

  • System: Processor Queue Length

    Current Depth of the thread Scheduler Ready Queue.  

  • If the Ready threads per processor value is > 2 with some frequency this may indicate a processor bottleneck.

  • Corresponds to the number of threads waiting for processor time. A processor bottleneck develops when threads of a process require more processor cycles than are available. If more than a few processes attempt to utilize the processor's time, you might need to install a faster processor. Or, if you have a multiprocessor system, you could add a processor. 




Monitor CPU Usage

https://learn.microsoft.com/en-us/sql/relational-databases/performance-monitor/monitor-cpu-usage?view=sql-server-ver16



[SQL SERVER]SQL Server Perfmon Counters

名稱建議值說明

SQLServer: Buffer Manager: Buffer cache hit ratio

>95%
優:99.999%
從buffer讀取data page擊中率,該值越高表示資料都是從記憶體中讀取

SQLServer: Buffer Manager: Page life expectancy

>300
停留超過5分鐘
data page留在buffer的時間,停留時間越長,表示SQL Server不需要從disk讀取資料到memory,停留時間越短,表示Server可能需要增加記憶體

SQLServer: Buffer Manager: Checkpoint Pages / Sec

 將記憶體中的Dirty Page 和log page寫入disk,checkpoint頻率和 recovery interval設定有關聯,如果該值太高可能表示記憶體不足或是 recovery interval設定太高(預設0表示每分鐘處理)

SQL Server: Databases:Log Flushes/sec

 log page每秒寫入transaction log file頻率,該值太高可能是交易處理邏輯不良導致交易次數太高,如1000筆資料每一次commit該值就會增加1000,如果批次分成2個交易(每500筆commit)該值只會增加2

SQL Server: Databases:Log Growths

0transaction log file被擴大次數,每一次當transaction log file被擴大時,所有使用者活動必須被停止,直到transaction log file成長完畢,這通常表示transaction log file空間沒有被reuse或是初始值設定不良所導致,要避免使用者請求處理期間被擴大影響效能
SQLServer: SQL Statistics: Batch Requests/Sec 每一秒處理請求的數量,該值通常和硬體等級有關,也通常表示SQL Server吞吐量,基本上透過該值需和其他值互相搭配參考。
SQLServer: SQL Statistics: SQL Compilations/Sec< Batch Requests/Sec *10%每秒TSQL發生編譯次數(包含重新編譯),編譯是一個昂貴操作,基本上該值越低越好,越高可能 ad hoc查詢過多,或使用不必要的recompile提示,可能須適當改寫TSQL降低編譯次數。
SQLServer: SQL Statistics: SQL Re-Compilations/Sec<Compilations/Sec *10%每秒發生重新編譯次數,該值應該要接近0,因為重新編譯可能導致deadlock和exclusive compile locks,該值需參考Batch Requests/Sec和SQL Compilations/Sec
SQLServer: Access Methods: Page Splits / Sec< 100 Batch Requests/Sec *20%每秒發生頁面分割次數,頁面分割是昂貴操作,要減少頁面分割次數,需要查看資料表和索引設計,避免非順序資料新增或是針對高寫資料表調整索引填充因子。

SQLServer: General Statistic: Processes Block

 process blocked another process數量,理想情況下該值應該接近0,但實務上短暫封鎖是可接受的,如果該值太高你應該詳細調查是那些因素造成,以我個人經驗大多都是不良的平行處理所造成(參考SQL SERVER]找出封鎖的處理序)
SQLServer: Locks:

Lock Wait Time (ms)

<10 msthread請求lock等待時間,該值過高可能是blocking、不必要的鎖定擴大或是TSQL不佳所造成,也有可能是I/O問題(硬碟壞一顆),透過 sys.dm_os_waiting_tasks 查看鎖定資源,調整TSQL、設計正確索引或是最小化鎖定(nolock)
SQLServer: Locks: Lock Waits / Sec0thread請求lock等待時間。
整體lock時間應該要越短越好,例如應該盡量縮短交易處理時間,避免大交易處理(可拆多個小交易)
SQLServer: Locks: Avg Wait Time (ms)<500 ms每個lock請求等待時間,一般來說平均超過500ms表示有過多的封鎖,該值需參考Lock Waits/sec 取得一個平衡。
SQLServer: Locks: Deadlock/s<1每秒死結次數,該值應該等於0,該值太高表示交易處理邏輯不良,或是沒有使用正確索引導致都是full scan,需重新調整邏輯處理或是改善TSQL效能。
SQL Server: Latches : Latch Waits/sec

(Total Latch Wait Time) / (Latch Waits/ Sec) < 10

該數值表示無法立即授予Latch的請求。

Latch是SQL Serve內部用來同步資源一種資料結構,和操作系統的read/write lock類似,Latch保護資源使得資料存取井然有序,例如說當某一個session取得某個資源latch獨佔時,其他session如果也需要該資源latch時,則必須等待該latch釋放才可存取。

大方向來說,latch分兩大類,一buffer latch、另一I/O latch,buffer latch一般<1 ms,I/O latch一般<10 ms。

note:PAGELATCH_SH wait for PFS or SGAM pages in tempdb,表示tempdb有嚴重的資源競爭,透過切割tempdb datafile和啟用TF1118可獲得改善(參考[SQL SERVER]了解制式範圍和混和範圍)

SQL Server: Latches : 

Avg Latch Wait Time (ms)

<1 ms請求latch所等待時間。
SQL Server: Latches :

Total Latch Wait Time (ms)

(Total Latch Wait Time) / (Latch Waits/ Sec) < 10

該數值表示最後Latch請求的總等待時間,該值可提供其他計數器參考使用

SQL Server - Performance Counter Guidance

 

 

SQL Performance Counters

 

Object

Counter

Preferred Value

Description

SQLServer:Access Methods

Forwarded Records/sec

< 10 per 100 Batch Requests/Sec

Rows with varchar columns can experience expansion when varchar values are updated with a longer string.  In the case where the row cannot fit in the existing page, the row migrates and access to the row will traverse a pointer.  This only happens on heaps (tables without clustered indexes). Evaluate clustered index for heap tables.  In cases where clustered indexes cannot be used, drop non-clustered indexes, build a clustered index to reorg pages and rows, drop the clustered index, then recreate non-clustered indexes.

SQLServer:Access Methods

Full Scans / sec

(Index Searches/sec)/(Full Scans/sec) > 1000

This counter monitors the number of full scans on base tables or indexes. Values greater than 1 or 2 indicate that we are having table / Index page scans. If we see high CPU then we need to investigate this counter, otherwise if the full scans are on small tables we can ignore this counter.  A few of the main causes of high Full Scans/sec are
• Missing indexes
• Too many rows requested
Queries with missing indexes or too many rows requested will have a large number of logical reads and an increased CPU time.

SQLServer:Access Methods

Index Searches/sec

(Index Searches/sec)/(Full Scans/sec) > 1000

Number of index searches. Index searches are used to start range scans, single index record fetches, and to reposition within an index. Index searches are preferable to index and table scans.  For OLTP applications, optimize for more index searches and less scans (preferably, 1 full scan for every 1000 index searches). Index and table scans are expensive I/O operations.

SQLServer:Access Methods

Page Splits/sec

< 20 per 100 Batch Requests/Sec

Number of page splits per second that occur as the result of overflowing index pages. Interesting counter that can lead us to our table / index design. This value needs to be low as possible. If you find out that the number of page splits is high, consider increasing the fillfactor of your indexes. An increased fillfactor helps to reduce page splits because there is more room in data pages before it fills up and a page split has to occur.

Note that this counter also includes the new page allocations as well and doesn’t necessarily pose a problem.  The other place we can confirm the page splits that involve data or index rows moves are the fragmented indexes on page splits.

SQL Server:Buffer Manager

Buffer Cache hit ratio

> 90%

This counter indicates how often SQL Server goes to the buffer, not the hard disk, to get data. The higher this ratio, the less often SQL Server has to go to the hard disk to fetch data, and performance overall is boosted. Unlike many of the other counters available for monitoring SQL Server, this counter averages the Buffer Cache Hit Ratio from the time the last instance of SQL Server was restarted. In other words, this counter is not a real-time measurement, but an average of all the days since SQL Server was last restarted. In OLTP applications, this ratio should exceed 90-95%. If it doesn't, then you need to add more RAM to your server to increase performance. In OLAP applications, the ratio could be much less because of the nature of how OLAP works. In any case, more RAM should increase the performance of SQL Server OLAP activity.

SQL Server:Buffer Manager

Free list stalls/sec

< 2

Free list stalls/sec is the frequency with which requests for available database pages are suspended because no buffers are available. Free list stall rates of 3 or 4 per second indicate too little SQL memory available.

SQL Server:Buffer Manager

Free pages

> 640

Total number of pages on all free lists.

SQL Server:Buffer Manager

Lazy Writes/Sec

< 20

This counter tracks how many times a second that the Lazy Writer process is moving dirty pages from the buffer to disk in order to free up buffer space. Generally speaking, this should not be a high value, say more than 20 per second or so.  Ideally, it should be close to zero. If it is zero, this indicates that your SQL Server's buffer cache is plenty big and SQL Server doesn't have to free up dirty pages, instead waiting for this to occur during regular checkpoints. If this value is high, then a need for more memory is indicated.

SQL Server:Buffer Manager

Page Life Expectancy

> 300

This performance monitor counter tells you, on average, how long data pages are staying in the buffer. If this value gets below 300 seconds, this is a potential indication that your SQL Server could use more memory in order to boost performance.

SQLServer:Buffer Manager

Page lookups/sec

(Page lookups/sec) / (Batch Requests/sec) < 100

Number of requests to find a page in the buffer pool. When the ratio of page lookups to batch requests is much greater than 100, this is an indication that while query plans are looking up data in the buffer pool, these plans are inefficient. Identify queries with the highest amount of logical I/O's and tune them.

SQL Server:Buffer Manager

Page reads/sec

< 90

Number of physical database page reads issued. 80 – 90 per second is normal, anything that is above indicates indexing or memory constraint.

SQL Server:Buffer Manager

Page writes/sec

< 90

Number of physical database page writes issued. 80 – 90 per second is normal, anything more we need to check the lazy writer/sec and checkpoint counters, if these counters are also relatively high then, it’s memory constraint.

SQLServer:General Statistics

Logins/sec

< 2

> 2 per second indicates that the application is not correctly using connection pooling.

SQLServer:General Statistics

Logouts/sec

< 2

> 2 per second indicates that the application is not correctly using connection pooling.

SQLServer:General Statistics

User Connections

See Description

The number of users currently connected to the SQL Server.

 

Note: It is recommended to review this counter along with “Batch Requests/Sec”.   A surge in “user connections” may result in a surge of “Batch Requests/Sec”.  So if there is a disparity (one going up and the other staying flat or going down), then that may be a cause for concern. With a blocking problem, for example, you might see user connections, lock waits and lock wait time all increase while batch requests/sec decreases. 

SQL Server:Latches

Latch Waits/sec

(Total Latch Wait Time) / (Latch Waits/Sec) < 10

This is the number of latch requests that could not be granted immediately. In other words, these are the amount of latches, in a one second period that had to wait.

SQL Server:Latches

Total Latch Wait Time (ms)

(Total Latch Wait Time) / (Latch Waits/Sec) < 10

This is the total latch wait time (in milliseconds) for latch requests in the last second

SQL Server:Locks 

Lock Wait Time (ms)

See Description”

 

 

Total wait time (milliseconds) for locks in the last second.

 

Note: For “Lock Wait Time” it is recommended to look beyond the Avg value.  Look for any peaks that are close (or exceeds) to a wait of 60 sec.   Though this counter counts how many total milliseconds SQL Server is  waiting on locks during the last second, but the counter actually records  at the end of locking event.  So most probably the peaks represent one huge locking event.  If those events exceeds more than 60seconds then they may have extended blocking and could be an issue. In such cases, thoroughly analyze the blocking script output. Some applications are written for timing out after 60 seconds and that’s not acceptable response for those applications.

 

SQL Server:Locks 

Lock Waits/sec

0

This counter reports how many times users waited to acquire a lock over the past second.  Note that while you are actually waiting on the lock that this is not reflected in this counter—it gets incremented only when you “wake up” after waiting on the lock. If this value is nonzero then it is an indication that there is at least some level of blocking occurring.  If you combine this with the Lock Wait Time counter, you can get some idea of how long the blocking lasted.  A zero value for this counter can definitively prove out blocking as a potential cause; a nonzero value will require looking at other information to determine whether it is significant. 

SQL Server:Locks 

Number of Deadlocks/sec

< 1

The number of lock requests that resulted in a deadlock.

SQLServer:Memory Manager

Total Server Memory(KB)

See Description

The Total Server Memory is the current amount of memory that SQL Server is using.  If this counter is still growing the server has not yet reached its steady-state, and it is still trying to populate the cache and get pages loaded into memory.  Performance will likely be somewhat slower during this time since more disk I/O is required at this stage.  This behavior is normal.  Eventually Total Server Memory should approximate Target Server Memory.

SQLServer:SQL Statistics

Batch Requests/Sec

See Description

This counter measures the number of batch requests that SQL Server receives per second, and generally follows in step to how busy your server's CPUs are. Generally speaking, over 1000 batch requests per second indicates a very busy SQL Server, and could mean that if you are not already experiencing a CPU bottleneck, that you may very well soon. Of course, this is a relative number, and the bigger your hardware, the more batch requests per second SQL Server can handle. From a network bottleneck approach, a typical 100Mbs network card is only able to handle about 3000 batch requests per second. If you have a server that is this busy, you may need to have two or more network cards, or go to a 1Gbs network card.

 

Note: Sometimes low batch requests/sec can be misleading.  If there were a SQL statements/sec counter, this would be a more accurate measure of the amount of SQL Server activity.  For example, an application may call only a few stored procedures yet each stored procedure does lot of work.  In that case, we will see a low number for batch requests/sec but each stored procedure (one batch) will execute many SQL statements that drive CPU and other resources.  As a result, many counter thresholds based on the number of batch requests/sec will seem to identify issues because the batch requests on such a server are unusually low for the level of activity on the server.  

 

We cannot conclude that a SQL Server is not active simply by looking at only batch requests/sec.  Rather, you have to do more investigation before deciding there is no load on the server.  If the average number of batch requests/sec is below 5 and other counters (such as SQL Server processor utilization) confirm the absence of significant activity, then there is not enough of a load to make any recommendations or identify issues regarding scalability.

 

SQLServer:SQL Statistics

SQL Compilations/sec

< 10% of the number of Batch Requests/Sec

The number of times per second that SQL Server compilations have occurred. This value needs to be as low as possible. If you see a high value such as over 100, then it’s an indication that there are lots of adhoc queries that are running, might cause CPU usage, solution is to re-write these adhoc as stored procedure or use sp_executeSQL.

SQLServer:SQL Statistics

SQL Re-Compilations/sec

< 10% of the number of SQL Compilations/sec

This needs to be nil in our system as much as possible. A recompile can cause deadlocks and compile locks that are not compatible with any locking type.

 

 

Performance Counters for SQL Server 2005                                                                                     

( SQL Server Premier Field Engineers in Microsoft UK )

SQL Server Database Mirroring 資料庫鏡像

SQL Server Database Mirroring 資料庫鏡像