Skype for Business – Niggle Notes – Login & Routing Performance

I wanted to share some notes from the field, as it were. These are the most common problems I see in deployments that were either PoCs that became production or simply the result of bad habits. I am not sure why so many fall foul of these common mistakes, but they seriously impact the performance of Skype for Business. This post collects a few that I have come across, and it will hopefully help you avoid making the same mistakes.

No or Incorrect Certificate Revocation Lists (CRLs) for Internal Certificates

This is by far the single most common problem I come across in deployments. It is mainly because many SME businesses do not fully understand what is required when deploying not only Skype for Business but also a Certificate Authority. When you deploy a CA from the Server Manager install wizard, you get a basic CA. This CA comes configured with a CRL that is LDAP based, which means that the CRLs are stored in Active Directory.

For most applications this is perhaps OK. For Skype for Business, however, it has a serious performance impact. Skype for Business doesn’t “break” with just an LDAP CRL, it just lags on certain processes such as client login. The problem is that Skype for Business only checks for web-hosted CRLs that come from HTTP (port 80) sources. It cannot check LDAP for these CRLs. When a client logs in, we can see using Fiddler that every time, even when NTLM authentication is not used, the client checks for a valid CRL distribution point.

image

Here we can see that when checking an external certificate (because I am logging in externally) the client is checking OCSP/CRL. When this works, the client logs in pretty fast. However, if the client is unable to reach the CRL, it tries several times before giving up and continuing with the login process. The result is slower login times.

This also affects Microsoft Surface Hubs when they sign in to Skype for Business.

The good news is that you can easily resolve this by configuring a web server to host the CRLs and then configuring the certificate authority to stamp this location into the issued certificates. This does mean you have to reissue all your internal Skype for Business certificates and restart services, but doing so can reduce login times by approximately 5 to 10 seconds.
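As a rough aid for auditing this, the following minimal sketch classifies the CRL distribution point (CDP) URLs of a certificate by scheme and flags certificates that carry no HTTP location. It assumes you have already read the CDP URLs out of the certificate yourself (for example from the certificate details dialog). The function names and the example URLs are hypothetical and are not taken from any real deployment.

```python
from urllib.parse import urlsplit


def http_crl_urls(cdp_urls):
    """Return only the CDP URLs that are served over plain HTTP."""
    return [url for url in cdp_urls if urlsplit(url).scheme == "http"]


def needs_http_cdp(cdp_urls):
    """True when a certificate has no HTTP CRL location at all."""
    return not http_crl_urls(cdp_urls)


# Hypothetical CDP entries, as they might appear on a certificate issued
# by a default CA (LDAP only) and by a CA with a web-hosted CRL added.
ldap_only = [
    "ldap:///CN=Example-CA,CN=CA01,CN=CDP,CN=Public%20Key%20Services,"
    "CN=Services,CN=Configuration,DC=example,DC=com"
    "?certificateRevocationList?base?objectClass=cRLDistributionPoint",
]
with_http = ldap_only + ["http://crl.example.com/Example-CA.crl"]

print(needs_http_cdp(ldap_only))   # True
print(needs_http_cdp(with_http))   # False
```

Any certificate for which `needs_http_cdp` returns `True` is a candidate for reissuing once the HTTP location has been added to the CA.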

Misconfigured _sipinternaltls DNS Records

Another issue that slows down logins is misconfiguration of the _sipinternaltls._tcp.domain.com SRV DNS record when you have a Front End DR pool. When you create these records, they are assigned a priority and weight of zero. This means that the DNS server will return them in round-robin order to anyone requesting them. The impact is that potentially 50% of your users’ clients will attempt to sign in to the wrong front end pool. When this happens, the front end pool that does not home the user redirects them to their own pool to sign in. Although this is built-in functionality, it is not optimal to rely on it, because the redirect adds a delay of 1 to 2 seconds that your users could be spared.

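The way to fix this issue is to give each SRV record a different priority. As a reader points out in the comments below, it is the priority, not the weight, that decides which record is tried first: the record with the lowest priority number wins, and the weight only distributes requests between records that share the same priority. A common approach is to work in multiples of 10, so assign the SRV record pointing to the active pool a priority of 10, and the DR pool a priority of 20.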

This ensures that the Skype for Business client always tries to connect to the active pool first, before attempting the DR pool. Differentiating these records is also required if you are deploying Snom desk phones (although Snom UC edition is no longer a supported device). If you leave the records identical, you will find that the Snom phones fail to log in.
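To make the difference between priority and weight concrete, here is a minimal sketch of the target ordering described in RFC 2782: records are grouped by priority (lowest number first), and within one priority a weighted random choice decides the order. It is only an illustration of the standard, not a claim about how the Skype for Business client is implemented internally, and the pool host names are hypothetical.

```python
import random
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    target: str


def order_srv_targets(records, rng=None):
    """Order SRV targets the way RFC 2782 describes for a client."""
    rng = rng or random.Random()
    ordered = []
    # Lower priority numbers are always tried before higher ones.
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        # RFC 2782: place zero-weight records at the start of the group.
        group.sort(key=lambda r: r.weight != 0)
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)
            running = 0
            for index, record in enumerate(group):
                running += record.weight
                if running >= pick:
                    ordered.append(group.pop(index).target)
                    break
    return ordered


# Hypothetical pools: different priorities give a fixed order.
prioritised = [
    SrvRecord(20, 0, "pool-dr.example.com"),
    SrvRecord(10, 0, "pool-primary.example.com"),
]
print(order_srv_targets(prioritised))

# Same priority, equal non-zero weights: the order varies per lookup.
balanced = [
    SrvRecord(0, 10, "pool-primary.example.com"),
    SrvRecord(0, 10, "pool-dr.example.com"),
]
print(order_srv_targets(balanced))
```

With distinct priorities the primary pool is returned first on every call, whereas records that share a priority can come back in either order.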

DNS for Multiple Global Front End Pools

As a supplement to the last point, when deploying multiple active front end pools for a global organisation, sending every client to one main front end pool for user and pool discovery is inefficient, and it penalises users whose clients send needless data over a WAN link. I have seen deployments that use split-brain DNS and replicate the zone to all domain controllers in the forest. While this works for single central forest deployments, if you have users and pools spread over different datacenters across the globe, it makes sense to ensure that these users are homed on the front end closest to them. It also makes sense to ensure that they can take the shortest route possible to access these services. The problem with a standard replicated DNS zone is that it doesn’t provide this level of configuration: a change applies to the whole organisation.

The better solution is to configure a zone that is only replicated to DNS servers within the same region as the user. This way each region can be configured to point records to the active pools that matter for their region, ensuring that priorities are given to the servers within that region.

In order to do this you need to create an application partition in DNS and enlist the required DNS servers into this partition for replication. Then configure your zone within that partition. Once done, you can have dedicated zones configured specifically for users within different regions.

Federation Signalling Traffic

Geographically dispersed Edge Server deployments have often been misunderstood. It is common for people to assume that just because you have an edge server assigned to a local front end pool that users homed on that front end pool will use the local edge for all federation traffic. This is not the case. Indeed the local edge pool will perform localised media paths in peer to peer calls, but the federation signalling traffic (SIP) will always traverse the edge pool that has been assigned the federation route, like the following diagram.

image

I have seen various poor configurations that have led to problems such as one-way federation and poor voice quality. The most notable is multiple _sipfederationtls SRV records pointing to edge pools that are not assigned the federation route. There are different opinions on this matter around resilience and disaster recovery. Personally, however, I choose to configure one SRV record that points to the Edge pool that is assigned the federation route. Other people have suggested weighting the SRV records like the _sip._tls record, but I have not seen consistent results with that configuration. The benefit of doing it my way is that you can control your traffic consistently and provide a more stable service to the end users. Don’t forget that in some global organisations it may not be possible for a client subnet in the London office to reach the internal DMZ interface of an Edge server in the Los Angeles office, for instance. With multiple SRV records you could experience issues when federated contacts try to contact you, or vice versa.

When configuring the Time to Live on the federation record, I recommend setting a small time limit of around 5 to 10 minutes (if your host supports it), that way in a complete disaster, you can update the record to the DR pool in the time it takes for you to complete failover.

Also, don’t forget to enable the local edge pool for media. If you don’t do this, then both signalling and media will go via the federation edge pool by default, causing unnecessary performance hits on WAN links and voice quality as a result.

Misunderstanding PSTN Gateway Failover Routing

Where customers have deployed DR mediation servers or gateways for voice resilience, I have seen two configurations that, in a real voice failover situation, would cause a complete or partial voice outage.

The first common misconfiguration, when two PSTN gateways have been deployed, is that both are added to the same voice route. In this configuration a call placed to the PSTN will try any gateway associated with that route. There appears to be no order to it, and it is not like round robin, so you could have calls going over a suboptimal network route, or calls failing completely, depending on your carrier configuration.

If you want to configure failover routing then you must create two voice routes. The first route should be your primary route, and the second your backup. In the voice route table the routes are processed by the PSTN usages in the order in which they appear in the table, so primary must always be above the backup route.

The second common misconfiguration is when customers deploy a DR mediation server, associate it with a PSTN gateway, and then do nothing else. If this mediation server is ever called on to make or receive calls, the calls will go nowhere, because no voice routes exist. As in the first case, additional voice routes should be configured.
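The two cases above can be illustrated with a small, simplified model of ordered route selection. This is not how Skype for Business is implemented. It only mirrors the behaviour described in this section: routes are evaluated top to bottom, and the gateway chosen inside a single route is unordered. The route names, gateway names and number pattern are all hypothetical.

```python
import random
import re
from dataclasses import dataclass


@dataclass
class VoiceRoute:
    name: str
    pattern: str     # regex the dialled number must match
    gateways: list   # gateways attached to this route


def pick_gateway(routes, dialled, reachable, rng=None):
    """Walk the routes in table order and return (route, gateway) or None."""
    rng = rng or random.Random()
    for route in routes:
        if not re.match(route.pattern, dialled):
            continue
        available = [g for g in route.gateways if g in reachable]
        if available:
            # No ordering inside a single route: any gateway may be used.
            return route.name, rng.choice(available)
    return None


primary = VoiceRoute("UK-Primary", r"^\+44\d+$", ["gw-primary"])
backup = VoiceRoute("UK-Backup", r"^\+44\d+$", ["gw-dr"])
number = "+441270212001"

# Primary above backup: normal operation, then failover.
print(pick_gateway([primary, backup], number, {"gw-primary", "gw-dr"}))
print(pick_gateway([primary, backup], number, {"gw-dr"}))

# DR gateway deployed but never given a route: calls go nowhere.
print(pick_gateway([primary], number, {"gw-dr"}))
```

In this model the backup route is only reached once the primary gateway is unreachable, and a DR gateway with no route is never selected.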

Not Removing Ext from the Caller ID

This is becoming less of a problem nowadays with the uptake of SIP, but if your users have the extension attribute in their line URI, like tel:+441270212001;ext=2001, then you may fall foul of carriers that are still unable to strip it from the Caller ID they receive. To remove this possibility and avoid long, unnecessary troubleshooting, I recommend stripping the extension from calls placed over PSTN trunks. In Skype for Business you can do this on the trunk configuration, specifically in the Calling Number Translation Rules. Here you simply match your line URI format and replace it with just the telephone number, e.g. Match ^(\+\d{12});ext=(\d{4})$ Translate to $1.
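If you want to sanity-check the pattern before putting it on a trunk, the same match can be tried offline. The sketch below uses Python's `re` module, where the replacement is written `\1` rather than `$1`. The helper name is hypothetical, and it assumes the rule sees the number without the tel: prefix, since the pattern is anchored at the leading +.

```python
import re

# Same pattern as the translation rule: 12 digits after the +,
# followed by a 4-digit extension.
EXT_RULE = re.compile(r"^(\+\d{12});ext=(\d{4})$")


def strip_extension(calling_number: str) -> str:
    """Return the number without ;ext=NNNN, or unchanged if no match."""
    return EXT_RULE.sub(r"\1", calling_number)


print(strip_extension("+441270212001;ext=2001"))  # +441270212001
print(strip_extension("+441270212001"))           # +441270212001
```

Numbers of a different length, or with a longer or shorter extension, pass through unchanged, so adjust the digit counts to your own numbering plan.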

That’s it for this post. I could go on and on, so there may be a future post with more to come.

Hope this helps you.

2 comments

  1. Mark, for _sipinternaltls, you want to utilize the priority not the weight. If two records have the same priority, then the weight will be used to distribute the requests. I.e. it is a fancier form of Round Robin. Put your primary Pool as the lower Priority number (i.e. 0) if you don’t want people connecting to the DR pool unless the primary is down. Maybe you were just referring to how to get most of the connections to the primary while still allowing connections to the DR pool though.
