SKYSCRAPERS ENERGY PREDICTION:
HOW MUCH ENERGY WILL A BUILDING CONSUME?
This project was carried out as part of the TechLabs “Digital Shaper Program” in Aachen (Summer Term 2020).
INTRODUCTION
We are in the middle of ongoing, human-driven climate change, which is why we wanted to work on a project that might help slow global warming by looking at the energy consumption of buildings. The project aims to do this by visualizing the impact of energy-saving measures on overall energy consumption and emissions. The building sector is one of the major contributors to overall energy consumption in Germany: private households alone account for 30% of energy consumption, most of which is required for space heating [1]. Implementing new technologies in buildings can therefore have a large impact on energy consumption and, in turn, on CO2 emissions.
Monitoring energy savings has two key elements: forecasting future energy usage without improvements, and forecasting energy usage after a specific set of improvements has been implemented, such as the purchase and installation of a heat pump or the installation of thermal insulation. One problem arising from the rapid growth of energy markets is the lack of cost-effective, precise procedures for forecasting energy consumption. Current estimation methods are fragmented and do not scale well; some assume a specific meter type or do not work with different building types.
Our project is based on the Kaggle challenge called “Great Energy Predictor”, from which we have taken our data. In this competition, the task was to develop accurate models of metered building energy usage. The idea of the project is to predict the energy consumption of different skyscrapers from past consumption data and weather data, and to compare the predictions with measured data. Based on those numbers, companies can validate the performance of retrofits built to reduce energy consumption.
The project addresses the lack of historical energy consumption data, which is necessary to validate the quality and efficiency of retrofits in different skyscrapers. With both past and current consumption data, investors can choose the most suitable retrofit to reduce the costs and emissions of skyscrapers in the future.
The datasets came from the above-mentioned Kaggle challenge. They contain data from over a thousand buildings at different sites around the world over a three-year period, together with corresponding weather data for each site. Combining this information, energy predictions for the buildings were made.
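As a rough sketch of how these sources fit together (assuming the file names used in the Kaggle challenge: train.csv with the meter readings, building_metadata.csv, and weather_train.csv), the three tables can be joined on the building and site identifiers:

import pandas as pd

# Load the three tables provided with the Kaggle challenge
meters = pd.read_csv("train.csv", parse_dates=["timestamp"])           # meter readings
buildings = pd.read_csv("building_metadata.csv")                       # one row per building
weather = pd.read_csv("weather_train.csv", parse_dates=["timestamp"])  # hourly weather per site

# Attach building metadata to each reading, then the matching weather record
df = meters.merge(buildings, on="building_id", how="left")
df = df.merge(weather, on=["site_id", "timestamp"], how="left")
print(df.shape)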
METHODOLOGY
We started our project by analyzing and exploring the data. We continued by cleaning the given datasets, which required significant work. That sounds easy, but it took us some time, because the datasets were quite big and were based on American/British measurement units and time zones, so we needed to convert the units.
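As an illustration of the kind of conversion involved (the exact factors depend on the meter and site; the kBTU-to-kWh factor and the Fahrenheit formula below are standard conversions, not necessarily the ones every column needed):

# 1 kBTU corresponds to roughly 0.2931 kWh
KBTU_TO_KWH = 0.2931
df["meter_reading"] = df["meter_reading"] * KBTU_TO_KWH

def fahrenheit_to_celsius(temp_f):
    # Convert degrees Fahrenheit to degrees Celsius
    return (temp_f - 32.0) * 5.0 / 9.0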
The meter-reading data included many missing values and zero readings. The zero readings had to be distinguished into true and false zeros. An example of a true zero reading is a steam meter reading of zero during summer, because steam heating is turned off in the summer months. False zero readings were excluded, and rows with many missing values were deleted.
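A minimal sketch of such a filter, assuming the Kaggle meter codes (0 = electricity, 2 = steam) and the simplifying rule that zero electricity readings are always false zeros:

# Zero electricity readings are treated as false zeros and removed;
# zeros from other meters (e.g. steam in summer) are kept as true zeros
is_zero = df["meter_reading"] == 0
is_electricity = df["meter"] == 0
df = df[~(is_zero & is_electricity)]

# Drop rows in which most of the values are missing
df = df.dropna(thresh=int(df.shape[1] * 0.5))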
Furthermore, outliers were detected and removed. One site and several individual buildings had to be excluded completely due to large offsets.
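One simple way to flag such outliers (a sketch, not necessarily the exact rule we applied) is an interquartile-range criterion computed per building and meter:

# Keep only readings within k interquartile ranges of the middle 50%
def iqr_mask(group, k=3.0):
    q1, q3 = group.quantile(0.25), group.quantile(0.75)
    iqr = q3 - q1
    return group.between(q1 - k * iqr, q3 + k * iqr)

keep = df.groupby(["building_id", "meter"])["meter_reading"].transform(iqr_mask)
df = df[keep]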
Cleaning the weather data also included searching for and deleting outliers, as well as generating new values for missing timestamps through interpolation.
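The interpolation step can be sketched as follows: each site's records are re-indexed to a complete hourly range, and the resulting gaps are filled linearly (column names as in the Kaggle weather files):

import pandas as pd

full_range = pd.date_range(weather["timestamp"].min(),
                           weather["timestamp"].max(), freq="H")

filled = []
for site_id, site_df in weather.groupby("site_id"):
    site_df = (site_df.set_index("timestamp")
                      .reindex(full_range)           # insert missing timestamps
                      .interpolate(method="linear")  # fill the gaps
                      .reset_index()
                      .rename(columns={"index": "timestamp"}))
    site_df["site_id"] = site_id
    filled.append(site_df)

weather = pd.concat(filled, ignore_index=True)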
The prediction is based on the RandomForestRegressor from the scikit-learn library. Scikit-learn is a general open-source library for data analysis in Python. It is built on other Python libraries (NumPy, SciPy, and Matplotlib) and contains implementations of many popular machine-learning algorithms.
A random forest is a meta-estimator that fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and to control over-fitting. The sub-sample size is controlled by the max_samples parameter if bootstrap=True (the default); otherwise the whole dataset is used to build each tree. In this method, several trees are grown, each one trained on a different data sample drawn with replacement. In the end, the predictions are averaged to give the final result.
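A minimal training sketch with scikit-learn (the feature names here are illustrative, not our exact feature set):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Illustrative features; the real set depends on the cleaned data
df["hour"] = df["timestamp"].dt.hour
df["weekday"] = df["timestamp"].dt.weekday
features = ["square_feet", "air_temperature", "dew_temperature", "hour", "weekday"]

# Random forests cannot handle missing values, so drop incomplete rows
data = df.dropna(subset=features + ["meter_reading"])

X_train, X_test, y_train, y_test = train_test_split(
    data[features], data["meter_reading"], test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)
model.fit(X_train, y_train)
predictions = model.predict(X_test)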
We then plotted our predictions against the actual data to see how good the regressor's predictions were.
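Continuing from the sketch above, such a comparison plot can be produced with Matplotlib:

import matplotlib.pyplot as plt

# Compare predicted and measured consumption on a slice of the test data
plt.figure(figsize=(12, 4))
plt.plot(y_test.to_numpy()[:500], label="actual")
plt.plot(predictions[:500], label="predicted")
plt.xlabel("sample")
plt.ylabel("meter reading")
plt.legend()
plt.show()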
After that, we tried to optimize the regressor to get a better result. We increased the number of trees and tried to plot some learning curves to see how well our model was working, but this did not really work out. So, in the end, our prediction worked partly and would probably have given a better result with a higher number of trees (n_estimators).
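Learning curves of the kind we attempted can be generated with scikit-learn's learning_curve helper (again a sketch under the illustrative feature set from above):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

train_sizes, train_scores, val_scores = learning_curve(
    RandomForestRegressor(n_estimators=50, n_jobs=-1),
    data[features], data["meter_reading"],
    train_sizes=np.linspace(0.1, 1.0, 5), cv=3)

# Average the cross-validation folds and plot both curves
plt.plot(train_sizes, train_scores.mean(axis=1), label="training score")
plt.plot(train_sizes, val_scores.mean(axis=1), label="validation score")
plt.xlabel("training set size")
plt.ylabel("R² score (default scoring)")
plt.legend()
plt.show()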
Nevertheless, we want to say that the TechLabs journey was a very fun, challenging, and instructive experience. We all gained some basic programming skills and are interested in learning more. We definitely want to encourage interested people to register for the next program: no matter what prior experience you bring, even if you have no coding experience at all, TechLabs is a great way to get started.
TEAM
Rebecca Kunde
Laura Lippert
Hanna Reichel
Max Reichel
TechLabs Aachen e.V. reserves the right not to be responsible for the topicality, correctness, completeness or quality of the information provided. All references are made to the best of the authors’ knowledge and belief. If, contrary to expectation, a violation of copyright law should occur, please contact aachen@techlabs.org so that the corresponding item can be removed.