Yesterday I encountered a problem in an Office 365 hybrid environment where mail suddenly began looping back and forth between the on-premises environment and Office 365 for all remote mail users. No changes had been made to the environment.
Mail was transferred successfully to Office 365 using the correct connector, but Office 365 then passed the mail back to the on-premises servers. This resulted in a mail loop, and users sending e-mail to Office 365 accounts received an NDR with the following:
servername.local #<servername.local #5.4.6 smtp;554 5.4.6 Hop count exceeded - possible mail loop>
Following a support call with Microsoft lasting around four hours, it turned out that Microsoft had made an internal change to the way it handles wildcard certificates. Changing the Office 365 inbound connector to use the SubjectAlternativeName of the wildcard certificate rather than the subject resolved our issue. The connector before and after the change:
PS C:\> Get-InboundConnector "Inbound" | fl Id,Tls*
Id                       : Inbound 2
TlsSenderCertificateName : <I>CN=COMODO RSA Organization Validation Secure Server CA, O=COMODO CA Limited, L=Salford, S=Greater Manchester, C=GB<S>CN=*.domain.co.uk, OU=PremiumSSL Wildcard, O=Organisation, STREET=Road Name, L=Location, S=County, PostalCode=Postal Code, C=GB

PS C:\> Get-InboundConnector "Inbound" | fl Id,Tls*
Id                       : Inbound 2
TlsSenderCertificateName : *.domain.co.uk
The hybrid configuration wizard had set the certificate subject on the connector automatically, and it had been working for at least the past three months.
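The post shows the connector before and after the change but not the command that made it. A minimal sketch of that change, assuming an existing Exchange Online PowerShell session and a connector that can be addressed as "Inbound" as in the output above (the connector name and the *.domain.co.uk value are placeholders to replace with your own):

```powershell
# Assumption: you are already connected to Exchange Online PowerShell.
# "Inbound" and "*.domain.co.uk" are the placeholder values from this post.

# Record the current value first so the change can be reverted if needed.
Get-InboundConnector "Inbound" | fl Id,Tls*

# Match on the wildcard name (the certificate's SubjectAlternativeName)
# instead of the full <I>issuer<S>subject string.
Set-InboundConnector "Inbound" -TlsSenderCertificateName "*.domain.co.uk"

# Confirm the new value.
Get-InboundConnector "Inbound" | fl Id,Tls*
```

It may be worth sending a test message from an on-premises mailbox to an Office 365 mailbox afterwards to confirm that the loop has stopped.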
Updated 3rd November
Microsoft have now provided the following update, though no such incident appears in the Office 365 portal (for me at least).
Current Status: Engineers have confirmed with some customers that the workaround resolves the issue. Currently, engineers are developing and testing a long-term fix for the code defect, which is expected to take an extended period of time to complete.
User Impact: Users with mailboxes hosted on-premises are receiving an error message when attempting to send email to Office 365-hosted users. As a workaround, administrators can enable IP-based inbound on-premises connectors in Office 365 to successfully send email.
Customer Impact: Your organization is affected by this event. Impact is specific to a subset of your users. Engineers have received a few isolated customer reports of this issue.
Incident Start Time: Monday, November 2, 2015, at 8:53 AM UTC
Preliminary Root Cause: A code defect caused an issue with a certificate-based connector.
Next Update by: Wednesday, November 4, 2015, at 8:00 PM UTC