Monday, August 11, 2014

ADFS 2012 R2 Web Application Proxy servers in a Load Balanced Configuration lose trust with the ADFS farm (Event ID 422).

TL;DR: If you have a load balanced ADFS farm, make sure you have the June 2014 update rollup for Windows RT 8.1, Windows 8.1, and Windows Server 2012 R2.
_________________________________________________________ 

We recently implemented ADFS 2012 R2 (aka ADFS 3.0) in our production environment to allow our internal domain credentials to be used with an outside application provider.

We set up two Web Application Proxy servers in the DMZ with a load balancer in front of them.  A second load balancer on the internal network sits between the Web Application Proxy servers and the internal ADFS servers, so the internal farm is load balanced as well.


With this configuration, we can easily reboot any single server in the system for patches and maintenance without having to schedule downtime.  The load balancers monitor port 443, so as soon as 443 stops answering on a particular server, that server is automatically taken out of the load balancer pool.

After configuring all of this, we were fairly confident we had set everything up correctly for a highly redundant system.  Now we just needed to test.

I was testing the load balancer configuration by rebooting different ADFS servers in our farm, and the next thing I knew I could no longer reach the ADFS login page.  At first we thought the load balancers were not recognizing that the servers had come back up after a reboot, but the event logs on the Web Application Proxy servers made it evident that we had other issues.




This error (Event ID 422) was occurring on both Web Application Proxy servers at a one-minute interval.

The first thing we did to get things back up and running quickly was to run the following command (as one line) on each of the ADFS proxy servers to re-establish the trust.


Install-WebApplicationProxy -CertificateThumbprint 'abcd1234' -FederationServiceName adfs.yourdomain.com



You can find your thumbprint in the details of your ADFS certificate.  The value above is an example, not our actual thumbprint.




What this command should do is re-establish the ADFS trust and create self-signed certificates with whichever internal ADFS server the proxy is currently communicating with.  After that, a process is supposed to copy the newly created trust configuration to the other ADFS servers in the farm.

This way you don't have to create a trust with each and every ADFS server: you create a trust with one, the configuration is stored in the database, and it is copied to the other ADFS servers in the farm.  Very cool design.  However, this is where it went wrong.  Without the Device Registration feature enabled, which we did not need for what we were trying to accomplish, the distribution of the proxy trust configuration to the other servers in the farm never happens.  So the proxy has a trust with only one of the internal ADFS servers, and in a load balanced environment, where the proxy's requests can land on any server in the farm, that is a problem.
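Since a broken proxy trust shows up as a steady stream of Event ID 422 on the proxy, it is easy to catch before users notice.  The following PowerShell check is an added example written for this post, not something from the original or from Microsoft; it counts recent 422 events on the local Web Application Proxy server and probes the federation metadata endpoint through the published federation service name.  It assumes the proxy writes these events to the 'AD FS/Admin' log, that 'adfs.yourdomain.com' is a placeholder for your own federation service name, and that the metadata path is the standard /FederationMetadata/2007-06/FederationMetadata.xml.

```powershell
# Added example, not from the original post. Run on a Web Application Proxy
# server to check whether the ADFS proxy trust is healthy.
# Assumptions: proxy events land in the 'AD FS/Admin' log; the federation
# service name below is a placeholder for your own.
param(
    [string]$FederationServiceName = 'adfs.yourdomain.com',   # placeholder
    [int]$LookbackMinutes = 10,
    [int]$Threshold = 3
)

$since = (Get-Date).AddMinutes(-$LookbackMinutes)

# Event 422 is written about once a minute while the trust is broken, so a
# handful of them in a short window means the trust is gone, not a blip.
# -ErrorAction SilentlyContinue: Get-WinEvent raises an error when nothing
# matches, and an empty result is the healthy case here.
$trustEvents = @(Get-WinEvent -FilterHashtable @{
        LogName   = 'AD FS/Admin'
        Id        = 422
        StartTime = $since
    } -ErrorAction SilentlyContinue)

# Probe the federation metadata through the published name. In a load
# balanced setup this lands wherever the load balancers send it, so it tells
# you the path works right now, not which internal server answered.
$metadataUrl   = "https://$FederationServiceName/FederationMetadata/2007-06/FederationMetadata.xml"
$metadataOk    = $false
$metadataError = $null
try {
    $resp = Invoke-WebRequest -Uri $metadataUrl -UseBasicParsing -TimeoutSec 15
    $metadataOk = ($resp.StatusCode -eq 200)
} catch {
    $metadataError = $_.Exception.Message
}

# Newest event first is Get-WinEvent's default order.
$lastEvent = if ($trustEvents.Count -gt 0) { $trustEvents[0].TimeCreated } else { $null }

[pscustomobject]@{
    Server            = $env:COMPUTERNAME
    Event422Count     = $trustEvents.Count
    LastEvent422      = $lastEvent
    MetadataReachable = $metadataOk
    MetadataError     = $metadataError
    TrustHealthy      = ($trustEvents.Count -lt $Threshold -and $metadataOk)
}

if ($trustEvents.Count -ge $Threshold) {
    $msg = 'Proxy trust appears broken on {0}: {1} event(s) 422 since {2}. ' +
           'Re-establish it with Install-WebApplicationProxy, then confirm the ' +
           'federation servers have the update rollup that fixes trust replication.'
    Write-Warning ($msg -f $env:COMPUTERNAME, $trustEvents.Count, $since)
}
```

Run it from a scheduled task or your monitoring agent on each proxy rather than once; the point is that the event count is the per-server signal (each proxy only knows about its own trust), while the metadata probe only confirms that some path through the load balancers is currently answering.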

Researching this further, we found the following KB article describing our issue:


A fix was included in the July 2014 update rollup, located here:


Once we applied the rollup and rebooted, the issue never cropped up again.  I've been monitoring the event logs since then and everything is working as expected, so it looks like the July 2014 rollup took care of this issue.