Introduction and Project Objective
The primary aim of my project was to create a comprehensive heat pump electricity usage dataset for grid modellers. This work is crucial for understanding and predicting energy consumption patterns, which can significantly impact grid management, energy efficiency strategies, and heat pump technology adoption.
Data Source and Initial Considerations
I used data kindly provided by Trystan Lea from OpenEnergyMonitor, a company specialising in independent heat pump monitoring. The data represents average energy consumption from all heat pumps monitored on https://heatpumpmonitor.org/, yielding an average Seasonal Coefficient of Performance (SCOP) of approximately 3.3 across all installations.
Initially, I considered extracting individual Coefficients of Performance (COP) for all heat pumps over time. However, concerns about data robustness at the time of analysis led me to forgo this approach. As the number of monitored heat pumps grows rapidly, this may soon become feasible.
Data Preprocessing and Analysis
Normalisation and Smoothing
The first step involved normalising the aggregated data to represent a single heat pump’s electricity consumption, so that the profile can be scaled to any number of installations in a model. I then applied a 4-hour centred rolling average to smooth the data and make trends easier to identify.
Note: Smoothing may cause the model to under-predict actual load. This step should be removed as more data is accumulated.
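As a rough illustration of the normalising and smoothing step, here is a minimal pandas sketch. It is not the project's actual code: the column names (`total_power_w`, `n_systems`), the 30-minute sampling interval and the synthetic values are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical aggregated series: total electrical power (W) across all
# monitored systems at 30-minute resolution, plus the number of systems.
index = pd.date_range("2023-01-01", periods=48, freq="30min")
rng = np.random.default_rng(0)
aggregate = pd.DataFrame(
    {
        "total_power_w": 50_000 + 5_000 * rng.standard_normal(len(index)),
        "n_systems": 100,
    },
    index=index,
)

# Normalise to a single "average" heat pump.
per_pump_w = aggregate["total_power_w"] / aggregate["n_systems"]

# 4-hour centred rolling mean: 8 samples at 30-minute resolution.
# min_periods=1 keeps the first and last few hours instead of returning NaN.
samples_per_window = 8
smoothed_w = per_pump_w.rolling(
    window=samples_per_window, center=True, min_periods=1
).mean()

print(smoothed_w.head())
```

A centred window averages over both past and future samples, so it does not shift peaks in time, but it does flatten them, which is why the smoothed series can under-state the actual load.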
The data prior to mid-2023 showed significantly more variance than later data, likely due to fewer systems being recorded on heatpumpmonitor.org. Consequently, I only used data from October 2022 onwards for training, with earlier data kept as a separate verification dataset.
Initial Observations
Raw data revealed interesting patterns, including extreme dips and peaks around 23:00 and 07:00-08:00, aligning with typical sleep/wake times. This highlighted the significant impact of human behaviour on energy consumption patterns, suggesting potential areas for future grid management strategies.
Weather Data Integration
I integrated weather data using Open-Meteo’s Historical Weather API, selecting a location near Banbury for typical UK temperatures. This simplified approach enabled easier error identification and correction, though it may be refined in future iterations.
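For readers who want to reproduce the weather download, the sketch below builds a request URL for Open-Meteo's historical archive endpoint using only the standard library. The coordinates are an approximate location for Banbury and the date range is arbitrary; both are my assumptions, as is the choice of hourly variables. The function `build_weather_url` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlencode

# Open-Meteo's historical (archive) endpoint.
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_weather_url(latitude, longitude, start_date, end_date):
    """Return a request URL for hourly temperature data (hypothetical helper)."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,  # ISO format, YYYY-MM-DD
        "end_date": end_date,
        "hourly": "temperature_2m,apparent_temperature",
    }
    # safe="," keeps the comma-separated variable list readable in the URL.
    return f"{ARCHIVE_URL}?{urlencode(params, safe=',')}"


# Approximate coordinates near Banbury (assumed for illustration).
url = build_weather_url(52.06, -1.34, "2022-10-01", "2022-10-07")
print(url)
```

The URL can then be fetched with any HTTP client; the response is JSON with an `hourly` object containing a `time` array alongside one array per requested variable.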
Model Development
Initial Approach: Random Forest Regressor
My first attempt used a Random Forest Regressor, which performed well for most temperatures but struggled with extremes, especially rare low temperatures. I initially created complex features like daytime heating load and solar gain functions, which I later removed as they masked deeper issues.
Discovering Limitations
A significant limitation emerged when predicting electricity use solely from apparent temperature. The model couldn’t extrapolate beyond the highest values in the training data, as evidenced by a plateau in the prediction plot at low temperatures.
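The plateau is a general property of tree-based models rather than something specific to this dataset. The sketch below shows it on synthetic data: the linear load curve, the noise level and the temperature range are all invented for the demonstration and are not taken from the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic data: load rises linearly as temperature falls.
# The model only ever sees temperatures between 0 and 15 degrees C.
temp_train = rng.uniform(0, 15, 500)
load_train = 1000 - 40 * temp_train + rng.normal(0, 20, 500)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(temp_train.reshape(-1, 1), load_train)

# Ask for predictions well below the coldest training temperature.
temp_cold = np.array([[-5.0], [-10.0]])
pred_cold = forest.predict(temp_cold)

print("prediction at -5 C :", round(pred_cold[0], 1))
print("prediction at -10 C:", round(pred_cold[1], 1))
print("true trend at -10 C:", 1000 - 40 * -10)
```

Both cold inputs fall into the same leaf of every tree, so they receive an identical prediction. That prediction is an average of training targets, so it can never exceed the largest load seen during training.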
Exploring Alternatives
To address the limitations of the Random Forest Regressor, I explored alternatives:
- Linear Regression: These models were less accurate than the Random Forest.
- Gradient Boosting Models: XGBoost, like the Random Forest, couldn’t extrapolate power usage to lower temperatures.
- Polynomial Regression: Tended to make unrealistic predictions for extreme temperatures.
- Neural Networks: Experimented with simple networks, but they didn’t outperform my chosen solution.
A Hybrid Model
To address these limitations, I developed a model combining Random Forest and Linear Regression:
- Split the dataset at the 2nd percentile of temperature.
- The coldest 2% of temperatures were predicted via a linear regression model.
- The other 98% were predicted with a Random Forest Regression.
Reasoning: Random Forest excels at capturing complex patterns but struggles with extrapolation. Linear Regression, while simpler, can extend trends beyond observed data. The 2nd-percentile threshold leaves enough data to fit both models.
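The split described above can be sketched as a small wrapper class. This is a simplified illustration under my own assumptions, not the code in the repository: the class name `HybridTemperatureModel` is hypothetical, temperature is the only feature, plain `LinearRegression` stands in for whatever regularised variant was used, and the training data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


class HybridTemperatureModel:
    """Linear model below a cold-temperature percentile, forest above it."""

    def __init__(self, cold_percentile=2.0):
        self.cold_percentile = cold_percentile
        self.linear = LinearRegression()
        self.forest = RandomForestRegressor(
            n_estimators=100, min_samples_leaf=5, random_state=0
        )

    def fit(self, temp, load):
        temp = np.asarray(temp, dtype=float)
        load = np.asarray(load, dtype=float)
        # Threshold is learned from the training temperatures.
        self.threshold_ = np.percentile(temp, self.cold_percentile)
        cold = temp <= self.threshold_
        self.linear.fit(temp[cold].reshape(-1, 1), load[cold])
        self.forest.fit(temp[~cold].reshape(-1, 1), load[~cold])
        return self

    def predict(self, temp):
        temp = np.asarray(temp, dtype=float)
        features = temp.reshape(-1, 1)
        cold = temp <= self.threshold_
        out = np.empty(len(temp))
        if cold.any():
            out[cold] = self.linear.predict(features[cold])
        if (~cold).any():
            out[~cold] = self.forest.predict(features[~cold])
        return out


# Synthetic training data (assumed): load falls linearly with temperature.
rng = np.random.default_rng(0)
temp = rng.normal(9, 5, 5000)
load = 1000 - 40 * temp + rng.normal(0, 10, 5000)

model = HybridTemperatureModel().fit(temp, load)
preds = model.predict([-10.0, -5.0, 10.0])
print(preds)
```

One thing the sketch makes visible is that the linear part is fitted on only 2% of the rows, so its slope is sensitive to noise in the cold tail, and nothing forces the two models to agree at the threshold.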
Evaluation of the Model
I employed a comprehensive evaluation approach:
- Quantitative: Used MSE and R² metrics.
- Visual: Plotted predicted vs. actual values across temperature ranges. This was to check the model behaviour in extreme temperature scenarios.
- Domain Knowledge: Assessed predictions against expected heat pump behaviour.
The hybrid model offered the best balance of performance, interpretability, and robustness, handling extreme scenarios without sacrificing accuracy in normal conditions.
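To make the quantitative part of the evaluation concrete, here is a small sketch that computes MSE and R² overall and MSE per temperature band. The band edges (0 °C and 10 °C), the helper name `evaluate` and the toy numbers are assumptions for illustration; the per-band breakdown is a numeric stand-in for the visual check, not a replacement for plotting.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def evaluate(temp_c, actual, predicted, band_edges_c=(0.0, 10.0)):
    """Overall MSE and R², plus MSE within each temperature band."""
    temp_c = np.asarray(temp_c, dtype=float)
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)

    report = {
        "mse": mean_squared_error(actual, predicted),
        "r2": r2_score(actual, predicted),
        "mse_by_band": {},
    }
    # Band 0 is below the first edge, band 1 between the edges, and so on.
    band = np.digitize(temp_c, band_edges_c)
    for b in np.unique(band):
        mask = band == b
        report["mse_by_band"][int(b)] = mean_squared_error(
            actual[mask], predicted[mask]
        )
    return report


# Toy values: the only error is in the warmest band.
report = evaluate(
    temp_c=[-5, 2, 8, 15],
    actual=[1, 2, 3, 4],
    predicted=[1, 2, 3, 5],
)
print(report)
```

A single overall score can hide a model that is accurate in mild weather and poor in the extremes, which is why splitting the error by temperature range is useful here.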
Mitigating Overfitting
My efforts to reduce overfitting included careful feature selection, limiting the number of leaves in the Random Forest, and using regularisation in the Linear Regression. Given limited data, examining training data and predictions proved more effective than relying solely on R² values.
Feature Engineering and Selection
Throughout the process, I conducted extensive feature engineering and selection, including:
- Creating time-based features (hour of day, day of week, etc.)
- Generating lagged features to capture temporal dependencies
- Calculating rolling averages and other statistical measures of weather variables
- Experimenting with interaction terms between different features
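The feature types listed above can be sketched in pandas as follows. The column name `temperature`, the hourly index, the specific lags and window length, and the `temperature × hour` interaction are all illustrative assumptions, not the feature set actually used in the project.

```python
import numpy as np
import pandas as pd


def add_features(df):
    """Add time-based, lagged, rolling and interaction features."""
    out = df.copy()

    # Time-based features from the datetime index.
    out["hour"] = out.index.hour
    out["day_of_week"] = out.index.dayofweek  # Monday = 0

    # Lagged temperature (rows are hourly, so shift(24) is one day back).
    out["temp_lag_1h"] = out["temperature"].shift(1)
    out["temp_lag_24h"] = out["temperature"].shift(24)

    # Trailing 24-hour rolling mean of temperature.
    out["temp_roll_24h_mean"] = out["temperature"].rolling(24).mean()

    # A simple interaction term.
    out["temp_x_hour"] = out["temperature"] * out["hour"]

    # Lags and rolling windows leave NaNs in the first rows; drop them.
    return out.dropna()


# Three days of synthetic hourly temperatures.
index = pd.date_range("2023-01-01", periods=72, freq="60min")
rng = np.random.default_rng(0)
weather = pd.DataFrame({"temperature": rng.normal(5, 3, len(index))}, index=index)

features = add_features(weather)
print(features.head())
```

Lagged and rolling features shorten the usable dataset by the length of the longest window, which matters when cold-weather data is already scarce.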
Model Limitations
- Uncertain performance at extreme temperatures due to limited data
- Questionable accuracy during warm periods and transitional seasons
- Potential inaccuracies due to older heat pumps in the dataset operating differently from modern standards
Future Improvements
Potential enhancements include:
- Removing initial smoothing as the dataset grows
- Developing separate models for different seasons or climate zones
- Incorporating data from a wider geographical range
- Refining the model with more data from unusually cold periods
These improvements depend on continued high-quality data collection and model refinement.
Conclusion
This project has advanced my understanding of heat pump electricity consumption patterns. Key findings include:
- The crucial role of data quality and quantity, especially for extreme weather events
- The potential of hybrid models to overcome limitations of individual machine learning techniques
- The significant impact of human behaviour on heat pump energy consumption
Despite current limitations, this work provides a foundation for future improvements as more data becomes available.
All the training data, code, and outputs are available at https://github.com/Miteas/Heat_pump_electricity_load_model
Further Work
The next step will involve using this model’s output to develop electricity grid scenarios based on heat-pump COP.
Acknowledgements
I extend my gratitude to:
- Trystan Lea and Glyn Hudson at OpenEnergyMonitor for providing the dataset
- Open-Meteo for their free, open-source weather API