Platform Stability, Reliability, and Security
EUM is dedicated to upholding a reliable and secure environment for our Customers.
This article describes the operational procedures we have put in place so that EUM Customers and partners can rely on a safe, stable, and well-performing platform wherever and whenever they use it.
In this article:
- Site reliability engineering culture
- Operation and incident management
- Handling of Customer inquiries
Site Reliability Engineering Culture
EUM has fully adopted the Site Reliability Engineering (SRE) culture and structure in our Support and Engineering teams to ensure platform stability, reliability, and security at all levels. This approach supports our goal of creating scalable and reliable software systems.
Role of the Support and Engineering Team
A key principle of SRE is applying software engineering practices to operations and infrastructure. Following this principle, we make our highly skilled and experienced engineers responsible for operational reliability.
Our engineers are specialists with broad skillsets who collectively understand the whole application stack and the underlying infrastructure. This creates a healthy team dynamic in which the support team works together with engineers to answer Customer inquiries, solve issues, analyze problems, discover trends, and prevent issues from escalating.
Because technical skills are a priority in our approach to operations, engineers at all levels are highly trained in Microsoft’s technology. The team also completes Secure Code training courses every year.
Our mission is to keep systems running and performing optimally at all times. To that end, we continuously monitor sites and services across all three regional data centres in which EUM is available to confirm that they are operating within healthy metrics.
Because we constantly track and analyze platform metrics, most potential incidents are discovered before they manifest. In addition, the data we continuously capture provides insight into how to improve the security, stability, and reliability of the platform. The support and engineering teams also use this knowledge to identify and develop product improvements.
We react to incidents at all levels. If a Customer implementation shows unhealthy metrics, we contact the Customer, if required, so that we can find a solution together. If an issue is creating broader concerns, we address it immediately.
Operation and Incident Management
To ensure timely detection of and response to potential security incidents, we provide 12/5 monitoring of data activity on the EUM platform worldwide.
Handling of Customer Inquiries
Because of our monitoring and proactive engagement with Customers, most issues are handled at an early stage, usually before the Customer has detected them.
When Customers experience issues with EUM functionality or have general questions, we have a wealth of resources and a team of support engineers ready to assist.
Complete product documentation and answers to frequently asked questions are available on our website at all hours, every day of the year. If Customers experience broken functionality or general issues, our support engineers are ready to assist with 9/5 support (EST).