Microsoft Lumos is now open source. The Python library has been in active use in select Microsoft products and is now available to the wider web and app development community. It reportedly allowed engineers to detect hundreds of changes in metrics and reject thousands of false alarms surfaced by anomaly detectors.
Lumos Reduces False-Positive Alert Rate By Over 90 Percent, Claims Microsoft
Lumos is a new methodology that incorporates existing, domain-specific anomaly detectors. Microsoft claims the Python library can reduce the false-positive alert rate by over 90 percent. In other words, developers can focus on persistent issues instead of intermittent ones that have no long-term detrimental effect. The health of online services is usually monitored by tracking Key Performance Indicator (KPI) metrics over time. Engineers analyzing regressions in these metrics need a lot of time and resources to separate harmless blips from issues that point to major problems. Left unaddressed, such problems can result in escalating operational costs and even loss of users.
Needless to add, tracking down the root cause of every KPI regression is time-consuming. Moreover, teams often spend a lot of time analyzing an issue only to find it was a mere anomaly. This is where Microsoft Lumos comes in handy. The Python library removes the need to manually establish whether a change is due to a shift in population or a product update by providing a prioritized list of the most important variables in explaining changes in the metric value. Microsoft Lumos also serves the wider purpose of understanding the difference in a metric between any two data sets. It accounts for population bias as well: by comparing a control and a treatment data set while remaining agnostic to the time series component, Lumos can investigate anomalies.
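To make the idea of a population shift between a control and a treatment data set concrete, here is a minimal Python sketch. It is not the Lumos API: the function names (`category_shares`, `population_shift`), the `platform` feature, and the sample data are all hypothetical, and total variation distance is just one simple way to quantify how far apart two populations are.

```python
from collections import Counter


def category_shares(rows, feature):
    """Return the share of rows that fall into each category of `feature`."""
    counts = Counter(row[feature] for row in rows)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def population_shift(control, treatment, feature):
    """Total variation distance between the two category distributions.

    0.0 means the populations are identical for this feature,
    1.0 means they share no categories at all.
    """
    p = category_shares(control, feature)
    q = category_shares(treatment, feature)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical data: the treatment period has far more mobile users.
control = [{"platform": "desktop"}] * 80 + [{"platform": "mobile"}] * 20
treatment = [{"platform": "desktop"}] * 50 + [{"platform": "mobile"}] * 50

shift = population_shift(control, treatment, "platform")
print(f"population shift on 'platform': {shift:.2f}")
```

A large shift on a feature suggests that a change in the metric may come from a different mix of users rather than from a product update, which is the distinction the library is described as drawing.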
How Does Microsoft Lumos Work?
Microsoft Lumos builds on the principles of A/B testing to compare pairs of data sets. The Python library begins by verifying whether the regression in the metric between the two data sets is statistically significant. It then follows up with a population bias check and bias normalization to account for any population changes between the two data sets. If there is no statistically significant regression in the metric, Lumos decides the issue isn’t worth pursuing. If the delta in the metric is statistically significant, Lumos ranks the features according to their contribution to the delta in the target metric.
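The flow described above (test for significance first, rank features only if the delta is real) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Lumos implementation: it assumes a binary metric (`success`), uses a standard two-proportion z-test, skips the bias normalization step entirely, and ranks features with a deliberately simple heuristic (how unevenly the metric delta is spread across a feature's categories). All function names and data are hypothetical.

```python
import math
from collections import defaultdict


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (successes_b / n_b - successes_a / n_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


def segment_rates(rows, feature, metric):
    """Mean of a 0/1 metric for each category of `feature`."""
    totals = defaultdict(lambda: [0, 0])  # category -> [successes, count]
    for row in rows:
        totals[row[feature]][0] += row[metric]
        totals[row[feature]][1] += 1
    return {c: s / n for c, (s, n) in totals.items()}


def rank_features(control, treatment, features, metric):
    """Heuristic ranking: features whose categories moved most unevenly come first."""
    scores = {}
    for feature in features:
        c_rates = segment_rates(control, feature, metric)
        t_rates = segment_rates(treatment, feature, metric)
        deltas = [t_rates[c] - c_rates[c] for c in set(c_rates) & set(t_rates)]
        scores[feature] = max(deltas) - min(deltas) if deltas else 0.0
    return sorted(scores, key=scores.get, reverse=True)


def analyze(control, treatment, features, metric, alpha=0.05):
    """Return ranked features, or an empty list if the delta is not significant."""
    s_c = sum(row[metric] for row in control)
    s_t = sum(row[metric] for row in treatment)
    _, p_value = two_proportion_z_test(s_c, len(control), s_t, len(treatment))
    if p_value >= alpha:
        return []  # not worth pursuing
    return rank_features(control, treatment, features, metric)


def make_rows(platform, region, successes, failures):
    """Build hypothetical call records for one platform/region segment."""
    base = {"platform": platform, "region": region}
    return [dict(base, success=1)] * successes + [dict(base, success=0)] * failures


# Hypothetical data: success rate drops on mobile only, in both regions.
control = (make_rows("desktop", "eu", 45, 5) + make_rows("desktop", "us", 45, 5)
           + make_rows("mobile", "eu", 45, 5) + make_rows("mobile", "us", 45, 5))
treatment = (make_rows("desktop", "eu", 45, 5) + make_rows("desktop", "us", 45, 5)
             + make_rows("mobile", "eu", 30, 20) + make_rows("mobile", "us", 30, 20))

ranking = analyze(control, treatment, ["region", "platform"], "success")
print(ranking)
```

In this toy data the drop is confined to mobile, so `platform` ranks ahead of `region`. A real ranking would also have to deal with correlated features, which is the multicollinearity problem Microsoft mentions.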
The Lumos Python library serves as the primary tool for scenario monitoring of hundreds of metrics. Developers and teams conducting performance analysis can use it to monitor and improve the reliability of calling, meetings, and public switched telephone network (PSTN) services at Microsoft. The library runs on Azure Databricks, the company’s Apache Spark-based big data analytics service. It has been configured to run multiple jobs that are organized by priority, complexity, and metric type. The jobs complete asynchronously: when the system detects an anomaly, a Lumos workflow is triggered, and the library then analyzes the anomaly and checks whether it is worth pursuing and addressing.
Microsoft has noted that Lumos isn’t guaranteed to catch all regressions in services. Additionally, the service will require a large number of datasets to offer reliable insights. The company is planning to include continuous metrics analysis, perform better feature ranking, and bring in feature clustering as well. These steps should address the primary challenge of multicollinearity in feature ranking.