2022年5月20日 星期五

Client connection behavior when connect to Always on Listener

 Client connection behavior when connect to Always on Listener

1.單一子網路 Single Subnet Failover Cluster

Failover後IP不變,AP端預設connection timeout 15 seconds 

SqlConnection.ConnectionTimeout Property

The time (in seconds) to wait for a connection to open. The default value is 15 seconds.


2.跨子網路 Multi-Subnet Failover Cluster

當用戶端跟Microsoft Windows DNS servers發出解析請求時,DNS會一次回傳跨網段的所有IP給用戶端,但是因為只有一個IP online,所以可能會造成用戶端連線問題。

用戶端接收到所有IP後,會循序一個一個嘗試連線直到Connection timeout。

預設TCP 連線逾時嚐試是21 seconds,每21秒才會嘗試下一個IP

而.Net client預設連線逾時是15 seconds,因此,有些AP會遇到間歇性的連線逾時,因為有些AP取得第一IP是online的IP就不會逾時。

所以使用ADO.NET with .NET Framework 3.5SP1 or SQL Native Client 11.0 OLEDB連線到Listener或network name or Client Access Point(Multi-Subnet),則有50%的機會遇到time out


解決方法1 (如果沒有權限修改Cluster Resource屬性)

修改連線字串將connection timeout 設定為 30 seconds (this value results in a 20-second TCP timeout period plus a 10-second buffer).

優點:如果發生跨子網路的容錯移轉,用戶端復原時間很短。

缺點:半數的用戶端連線需要 20 秒以上


解決方法2 (如果有權限可以修改Cluster Resource屬性)

將network name of your availability group listener 的RegisterAllProvidersIP 設定為 0.

優點: 您不需要增加用戶端連線逾時值。

缺點: 如果發生跨子網容錯移轉,用戶端復原時間可能會是 15 分鐘或更長的時間,視您的 HostRecordTTL 設定和跨網站 DNS/AD 複寫排程的設定而定。

因為DNS只有註冊active online IP,failover後新節點主機會向DNS更新A record修改跨網域的IP,但用戶端會因為HostRecordTTL預設為20分鐘,所以可能會因為本機OS cached DNS entry還未超過20分鐘,與等待超過後才會重新向DNS取得跨網域的IP

PS.降低HostRecordTTL會增加網路流量也會增加DNS Server的負載(The drawback to setting the value to a lower number is how often the client OS will query the DNS server),如果只有幾十台到百台AP,修改此參數可能不會有明顯的衝擊,但若有上千台AP主機,怎會造成DNS負載過大的問題,此兩個參數都經過一些測試驗證後才能取的一個設定上的平衡


Cluster resource有兩個參數影響DNS註冊

The first parameter of interest is called RegisterAllProvidersIP. This parameter determines whether the Windows Cluster will register all of the IP addresses the AGL is dependent on, or only the one active IP address. When set to 1 (default if the AGL is created from SQL Server), the AGL clustered resource is created with all of the IP addresses the AGL is dependent on, registered in DNS. When set to 0, only the one active IP address is registered in DNS (the IP address in the subnet hosting the primary replica). (NOTE: if a Client Access Point is created using Windows Failover Cluster Manager, the RegisterAllProvidersIP parameter is set to 0 by default.)

The second parameter is called HostRecordTTL. This parameter governs how long (in seconds) before cached DNS entries on a client OS are expired, forcing the client OS to re-query the DNS server again to obtain the current IP address. By default, this value is 1200 (20 minutes). This means that after a client OS makes a call to the DNS server to resolve this name to an IP address, the client OS will cache that value for 20 minutes, only querying the DNS server again after that cached record expires. If this value is reduced to 120 or 60 for example, then the client OS cached copy will expire much more quickly.



解決方法3 (建議)

1.前端程式連線字串加上 MultiSubnetFailover=True

前提應用程式必須使用native client libraries for SQL 2012 或 .NET 4.5 libraries 或 earlier .NET libraries with hotfixes

用戶端則會一次嘗試連線所有IP,只要一個IP成功即開始連線

will receive a SYN request at the TCP layer “in parallel” (technically one immediately after the other, but not waiting for acknowledgement – so effectively “in parallel”)


  • Faster multi-subnet failover to a multi-subnet listener for an Always On Availability Group or Failover Cluster Instances.
  • Faster single subnet failover to a single subnet listener for an Always On Availability Group or Failover Cluster Instances.
    • This feature is used when connecting to a listener that has a single IP in a single subnet. This performs more aggressive TCP connection retries to speed up single subnet failovers.
  • Named instance resolution to a multi-subnet Always On Failover Cluster Instance.
    • This is to add named instance resolution support for an Always On Failover Cluster Instances with multiple subnet endpoints.


是否支援MultiSubnetFailover參數的Driver

Driver Support

The following table summarizes driver support for Always On availability groups:

DriverMulti-Subnet FailoverApplication IntentRead-Only RoutingMulti-Subnet Failover: Faster Single Subnet Endpoint FailoverMulti-Subnet Failover: Named Instance Resolution For SQL Clustered Instances
SQL Native Client 11.0 ODBCYesYesYesYesYes
SQL Native Client 11.0 OLEDBNoYesYesNoNo
ADO.NET with .NET Framework 4.0 with connectivity patch*YesYesYesYesYes
ADO.NET with .NET Framework 3.5 SP1 with connectivity patch**YesYesYesYesYes
Microsoft ODBC Driver 13.1+ for SQL ServerYesYesYesYesYes
Microsoft JDBC Driver 4.0+ for SQL ServerYesYesYesYesYes
Microsoft OLE DB Driver for SQL ServerYesYesYesYesYes

*Download the connectivity patch for ADO .NET with .NET Framework 4.0: https://support.microsoft.com/kb/2600211.

**Download the connectivity patch for ADO.NET with .NET Framework 3.5 SP1: https://support.microsoft.com/kb/2654347.

*Download the new Microsoft OLE DB Driver for SQL Server: https://aka.ms/downloadmsoledbsql.





Reference:

Configure a listener for an Always On availability group

Driver and client connectivity support for availability groups

Timeout occurs when you connect to an AlwaysOn listener in multi-subnet environment

Connection Timeouts in Multi-subnet Availability Group

Connection times out when you use AlwaysOn availability group listener with MultiSubnetFailover parameter

遇到此情境時AP主機需安裝hotfix


Timeout expired messages when connecting to SQL Server

Determine the type of timeout expired errors

From a connectivity perspective, you encounter the following timeout issues:

  • Connection timeout (15 seconds by default)
  • Query or command timeout (30 seconds by default)



沒有留言:

SQL Server Database Mirroring 資料庫鏡像

SQL Server Database Mirroring 資料庫鏡像