Friday, March 10, 2017

SPNs deleted during cluster failover

This article describes how SPNs can disappear when a clustered instance fails over. The same behavior can also apply to other services.

DESCRIPTION

A SQL Server instance attempts to register its Service Principal Names (SPNs) during its startup phase and deletes them during the shutdown sequence. Entries about these actions can be found in the ERRORLOG file. To be able to do this, the SQL Server service account needs the following two rights in Active Directory (AD):
  • Read Service Principal Name
  • Write Service Principal Name

For a SQL Server failover clustered instance, the creation and deletion of the SPN works as follows:
  1. Start up the instance on node C1N1

    a. Bring the resources that SQL Server depends on online (shared disk, IP address, network name)
    b. Bring the SQL Server resource online (sqlservr.exe process)
    c. The SQL Server process queries Active Directory (AD), checks whether the SPN already exists, and creates it if it does not (node C1N1)
  2. Failover of the instance to node C1N2

    a. The SQL Server process starts its shutdown sequence (node C1N1)
    b. During this phase AD is contacted and the SPN is deleted (node C1N1)
    c. The SQL Server process and resource are brought offline, followed by the rest of the resources: shared disk, IP address, network name (node C1N1)
    d. The failover cluster resources are brought online on node C1N2 (shared disk, IP address, network name)
    e. Bring the SQL Server resource online (sqlservr.exe process)
    f. The SQL Server process queries Active Directory (AD), checks whether the SPN already exists, and creates it if it does not (node C1N2)

CAUSE

Everything works fine when the two nodes connect to the same domain controller. Things change when the nodes are “speaking” with different domain controllers.

If node C1N1 communicates with the domain controller DC1 and the node C1N2 communicates with DC2, the following scenario can occur.
  • During phase 2.b), C1N1 deletes the SQL Server SPN on DC1
  • During phase 2.f), SQL Server on C1N2 communicates with DC2, where it might still find the SQL Server SPN (the deletion has not yet been replicated from DC1 to DC2), so it does not create the SPN again
  • Moments later, the SPN deletion is replicated from DC1 to DC2, which removes the SQL Server SPN there as well and leaves the instance without an SPN
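
The sequence above can be replayed with a small Python sketch. It is only an illustrative model, not how SQL Server or Active Directory are implemented: the `DomainController` class, the function names and the SPN value `MSSQLSvc/sqlvs1.example.com:1433` are all hypothetical, and replication is reduced to a single explicit call.

```python
from dataclasses import dataclass, field


@dataclass
class DomainController:
    """Hypothetical stand-in for a DC that holds its own copy of the SPNs."""
    name: str
    spns: set = field(default_factory=set)


def start_instance(dc, spn):
    """Startup: create the SPN only if the contacted DC does not have it yet.

    Returns True when the SPN was actually written.
    """
    if spn in dc.spns:
        return False
    dc.spns.add(spn)
    return True


def stop_instance(dc, spn):
    """Shutdown: delete the SPN on the contacted DC."""
    dc.spns.discard(spn)


def replicate_delete(target, spn):
    """Apply a pending SPN deletion to another DC (simplified replication)."""
    target.spns.discard(spn)


def simulate_failover(spn):
    dc1 = DomainController("DC1")  # contacted by node C1N1
    dc2 = DomainController("DC2")  # contacted by node C1N2

    # Step 1: the instance starts on C1N1; assume the new SPN has
    # already been replicated to DC2 before the failover begins.
    start_instance(dc1, spn)
    dc2.spns.add(spn)

    # Phase 2.b): shutdown on C1N1 deletes the SPN on DC1 only.
    stop_instance(dc1, spn)

    # Phase 2.f): startup on C1N2 asks DC2, which still has the stale SPN,
    # so nothing is registered.
    registered = start_instance(dc2, spn)

    # Moments later the deletion from DC1 arrives on DC2.
    replicate_delete(dc2, spn)

    return registered, dc1.spns, dc2.spns


if __name__ == "__main__":
    print(simulate_failover("MSSQLSvc/sqlvs1.example.com:1433"))
```

In this model the second startup never writes the SPN, and once the deletion is replicated neither DC holds it any more, which matches the symptom described in this article.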


SOLUTION


The SPN disappears as a consequence of this architecture design. The only solution is to remove the “Read Service Principal Name” and “Write Service Principal Name” privileges from the SQL Server service account and set the SPNs manually.
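
As a rough aid for the manual registration, the Python sketch below only builds the `setspn -S` command lines to run as a domain user who is allowed to write SPNs. It assumes the usual `MSSQLSvc/<FQDN>:<port>` and `MSSQLSvc/<FQDN>[:<instance>]` SPN formats; the helper name `build_setspn_commands`, the host `sqlvs1.example.com` and the account `EXAMPLE\sqlsvc` are hypothetical. For a clustered instance the host would be the FQDN of the SQL Server network name, not of a node.

```python
def build_setspn_commands(fqdn, account, port=1433, instance=None):
    """Return the setspn command lines for one SQL Server instance.

    fqdn     -- FQDN of the SQL Server network name (hypothetical value below)
    account  -- SQL Server service account, e.g. DOMAIN\\user
    port     -- TCP port the instance listens on (1433 assumed as default)
    instance -- instance name for a named instance, None for the default one
    """
    targets = [f"MSSQLSvc/{fqdn}:{port}"]
    if instance:
        targets.append(f"MSSQLSvc/{fqdn}:{instance}")
    else:
        targets.append(f"MSSQLSvc/{fqdn}")
    # -S checks for duplicate SPNs before adding the new one
    return [f"setspn -S {target} {account}" for target in targets]


if __name__ == "__main__":
    for line in build_setspn_commands("sqlvs1.example.com", r"EXAMPLE\sqlsvc"):
        print(line)
```

The registered SPNs can afterwards be checked with `setspn -L` followed by the service account name.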
