Demonstrating the power of Azure Machine Learning by predicting the likelihood of electricity outages across a network.
Australian electricity network operators face the challenge of maintaining high service quality standards while under significant expenditure constraints.
To meet network performance standards and minimise the risk of assets failing in service, a typical transmission or distribution business invests tens to hundreds of millions of dollars annually to inspect, maintain and replace network assets. This expenditure comes under close scrutiny from the Australian Energy Regulator, which mandates that asset investments meet the criteria for prudence and efficiency. When electricity outages occur, network operators are financially impacted through SAIDI and SAIFI (average interruption duration and frequency) penalties in the regulatory framework; as such, operators are incentivised to minimise these interruptions.
While asset investments are largely driven by risk assessments, understanding and quantifying asset risk is not a trivial task; predicting the probability of an asset failing in service is notoriously difficult for several reasons:

- complex systems
- limited information
- decentralised knowledge
- unconsolidated datasets
These issues impact a network operator's ability to analyse asset risks at a granular level and to optimise investment decisions. Readify has investigated the potential for Machine Learning technologies to address the underlying complexity of this problem and deliver new insights about past, present and future risks in an electricity network.
We wanted to demonstrate the capacity of Machine Learning to address complexity and deliver insights from unpredictable systems.
The problem we chose was to make daily predictions about the risk of weather-based supply interruptions at different locations in a network, using only publicly available data. This problem demonstrates the four aforementioned challenges: complex systems, limited information, decentralised knowledge and unconsolidated datasets.
Our model is trained using Azure Machine Learning to detect patterns and trends in historical power interruptions, and to link these patterns with weather indicators and locations. Using this trained model, the current weather observations and forecast can be analysed to generate machine learning predictions of future outage risks.
Data-driven problem solving can be simplified by implementing a tested and proven process. Our process follows six key steps. Our application of these steps to the outage prediction model is explained in more detail in the following subsections.
The first step in this process is to develop an understanding of the business context for the problem; this serves as the basis for defining and refining the objectives of the analysis.
For the current example, we started off by identifying potential benefits of this type of analysis in the context of electricity asset management.
The potential benefits include:
These ideas, along with our understanding of the broader industry context, were used to inform the direction of our analysis.
Given our challenge of using only publicly available data, the data understanding phase focussed primarily on researching sources and content of public data and testing how they relate back to the business objectives.
Our analysis is underpinned by two key data sources: power interruption records reported by network operators through the Australian Energy Regulator's Regulatory Information Notice (RIN) process, and historical weather observations from the Bureau of Meteorology (BOM).
A weather-based model is heavily dependent on the geospatial location of assets; this is required to match appropriate weather observations. As such, we focussed only on networks which had publicly reported the approximate locations of incidents. For this demonstration, we have chosen data from the ACT.
With a region of geolocated assets, we investigated the availability and granularity of weather data, as well as the proximity of weather stations to network substations and interruption locations. The BOM data source was shown to be suitably granular and representative of local weather at incident locations. The figure below shows the locations of substations (blue) and weather stations (orange) in the ACT.
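The station-matching step described above can be sketched as a nearest-neighbour lookup by great-circle distance. The station names and coordinates below are illustrative assumptions for the ACT region, not the actual dataset used in the analysis.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Hypothetical BOM station coordinates (illustrative only).
stations = {
    "Canberra Airport": (-35.31, 149.20),
    "Tuggeranong": (-35.42, 149.09),
}

def nearest_station(lat, lon):
    """Return the weather station closest to a substation or incident location."""
    return min(stations, key=lambda name: haversine_km(lat, lon, *stations[name]))
```

In practice this lookup would run over every substation and incident location so that each record can be joined to observations from its nearest station.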
Further investigation of the RIN data revealed a significant volume of weather-related failures attributed to wind. Matching these events with the relevant BOM data confirmed a correlation between wind speed and failure rate. Wind was therefore chosen as a key data point for the analysis.
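The correlation check can be sketched as a Pearson correlation between daily wind observations and wind-attributed failure counts. The two series below are illustrative stand-ins, not real BOM or RIN data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative daily series: maximum wind gust (km/h) and wind-attributed failures.
gusts = [20, 35, 50, 65, 80, 95]
failures = [0, 0, 1, 2, 4, 6]

r = pearson_r(gusts, failures)  # strong positive correlation on this toy data
```

A coefficient close to 1 on the real joined dataset would support choosing wind speed as a key model feature.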
An additional source of valuable data was identified in substation demand datasets. Demand is largely driven by weather; as such, the demand data usefully reinforces the local weather data. In particular, it adds a further degree of geospatial granularity to the input dataset.
With an understanding of the contents and potential use of the chosen datasets, data preparation can begin.
The data preparation steps undertaken in this example are as follows:
The modelling component is undertaken in the Azure Machine Learning Studio (ML Studio). The prepared dataset is imported into ML Studio and pre-processed before training.
As the training dataset is relatively small and suffers a heavy class imbalance, it is augmented using the Synthetic Minority Oversampling Technique (SMOTE), an inbuilt module in ML Studio. Rather than duplicating existing observations (which would encourage overfitting), SMOTE synthesises new minority-class observations by interpolating between existing ones. This technique proved successful in improving the balance of precision and recall.
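The interpolation idea behind SMOTE can be sketched as follows. This is a simplified version that interpolates between random pairs of minority samples; the full algorithm (as used by the ML Studio module) interpolates between a sample and one of its k nearest neighbours.

```python
import random

def smote_like(minority, n_new, seed=0):
    """Generate synthetic minority-class samples by linear interpolation
    between random pairs of existing samples (a sketch of SMOTE's core idea,
    not the full k-nearest-neighbour algorithm)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)  # two distinct existing minority samples
        t = rng.random()                # interpolation factor in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

# Hypothetical minority-class feature vectors (wind gust km/h, demand p.u.):
minority = [(60.0, 1.2), (75.0, 1.5), (90.0, 1.1)]
new_samples = smote_like(minority, n_new=5)
```

Because each synthetic point lies on a segment between two real observations, it stays inside the region the minority class already occupies, rather than being a verbatim copy.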
The machine learning component was implemented with a Boosted Decision Tree algorithm, chosen for its robustness to small and imbalanced datasets, statistical outliers and feature multicollinearity. This choice was verified by comparing the performance of Boosted Decision Trees against other algorithms.
The Boosted Decision Tree model is trained on a randomised split of the data: 70% is used for training and 30% for scoring and validation. The model design and parameters are tuned based on the model's performance on the validation dataset. Once refined, the trained model is deployed as a predictive model to generate outputs from forecast inputs. The model takes a set of inputs (location, weather, demand) and uses them to predict the likelihood of a weather-related interruption to supply.
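The randomised 70/30 split described above can be sketched as follows (a minimal illustration; ML Studio performs this with its built-in Split Data module).

```python
import random

def train_validation_split(rows, train_fraction=0.7, seed=42):
    """Randomly partition records into training and validation sets."""
    shuffled = rows[:]                      # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))                  # stand-in for prepared observations
train, validation = train_validation_split(records)
```

Fixing the seed makes the split reproducible across tuning runs, so parameter changes are compared against the same held-out data.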
The performance of the trained model is evaluated within ML Studio, tested on the 30% validation dataset. The aim is to test how the model performs on 'unseen' data that was excluded from training. The evaluation involves analysis of the precision and recall of the model, as well as cross validation of the model results. This evaluation is part of an iterative process of refining model performance.
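The precision and recall metrics used in this evaluation can be computed directly from the validation labels, as in this sketch (the label series below are illustrative, not actual model outputs).

```python
def precision_recall(actual, predicted):
    """Precision and recall for binary labels (1 = interruption)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of flagged days, how many were real
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real outages, how many were flagged
    return precision, recall

# Illustrative validation labels:
actual =    [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
p, r = precision_recall(actual, predicted)
```

On an imbalanced outage dataset these two metrics matter more than raw accuracy: a model that never predicts an outage can still score high accuracy while being useless.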
The second stage of evaluation involves more broadly assessing the value of the model outputs. The business objectives serve as a point of reference against which the model results can be measured; the outputs should offer insights that assist the end user in achieving these objectives. Once again, this evaluation provides an opportunity to iterate on the model and improve the value of the outputs.
Once a model has been trained and evaluated, it is ready to be deployed. The deployment phase involves automation of an end-to-end process flow within Azure.
That includes the following services:
Automated deployment of the cloud services uses Azure Resource Manager templates. The objective is to automate the entire prediction process, from raw data to visualised insights, with no dependence on manual triggering or intervention. This is made possible by Azure Data Factory, the orchestration tool in the Azure stack, which connects the services used to ingest, process and visualise data and triggers pipeline activities on configured schedules.
Moving datasets and solution modules to the cloud brought an additional advantage to the data analytics process: as the team worked on different parts of the solution, every member had access to consistent and up-to-date versions of both data and algorithms. As a result, the final pipeline produces consistent and predictable results.
This case study demonstrates the potential for Machine Learning technology to address complex issues in the electricity context.
Working together with the network businesses, and with access to larger and more diverse datasets, we could address a number of innovative applications in the electricity industry.
Some examples include:
In collaboration with the network businesses, we are now positioned to deliver significant innovative value through Azure Machine Learning.