This project was carried out as part of the TechLabs “Digital Shaper Program” in Münster (summer term 2021).

Abstract

As climate change causes drier and hotter summers, the responsible use of water becomes more and more important. Drawing on our Data Science skills with Python, we decided to tackle this problem by improving water conservation in private homes. Our project uses data analytics methods to help forecast the water demand of garden plants. We programmed queries that draw on global meteorological datasets and satellite measurements of soil moisture. Users get feedback for different types of plants, enabling them to water their plants only when necessary.

Water about me🌱?

One of the main challenges of our time is to interact sustainably with our natural environment. This includes, among many other important topics, the need to reduce our waste of water. Nowadays, ubiquitous access to water whenever you need it makes it easy to forget that water is a limited resource.

With our project “Water about me🌱?”, we want to address part of the topic “wasting water”. Specifically, we want to help tackle the challenge of over- or underwatering your outdoor plants by giving clear advice on whether a plant needs to be watered today. We are particularly interested in outdoor plants, as their management can be more challenging than that of indoor flowers or vegetation. The climate inside the house is rather stable and can be managed by a routine, with water use and need being the same every week. Outside, however, varying conditions such as periods of little rainfall or intense sunshine mean that every day is different and water usage needs to be evaluated anew. We want to help both the environment and you by making sustainable decisions easier.

We live in a time where everyday life confronts us with a growing number of challenges as our world becomes more and more digital. With fewer points of contact with nature, and especially with the cultivation of plants, it is more difficult to assess whether the daily water needs of garden plants are met. As a result, tasks like watering our plants can fall by the wayside. Additionally, due to climate change, the incredibly rapid growth of cities and many other factors, every flourishing ecosystem, regardless of its size, becomes more and more important. Based on the daily rainfall, we can estimate whether the daily water needs of your outdoor plants are met or not. This allows you to water your garden accurately instead of simply trusting your gut feeling. The indiscriminate use of sprinkler systems and the like thereby becomes unnecessary. Furthermore, as a side effect, interacting with the vegetation closest to the center of our lives leads to a deeper connection with and awareness of what we so urgently need to protect.

At the start of the project stage, we brainstormed various ideas, often born from the diverse backgrounds of our team members (which made the ideation phase fascinating!). The question of ‘where do we get the data from’ is always something to keep in mind when you have just one semester to acquire new skills and develop a whole project. However, we were mostly driven by our motivation to solve a problem that is very much shaped by everyday experiences. Biology and botany were fields unfamiliar to all of us, and we were eager for a new challenge. After considering a few project ideas, some more local or “Münster”-based than others, we settled on trying to improve the upkeep of private gardens or balconies.

While working on our project, we learned the valuable data science lesson that you often invest more time in data acquisition and data cleaning than in creating fancy ML models. To create a recommendation of how much water is required in the garden, we needed data on different physical properties that are more or less readily accessible for use in Python projects.

For weather properties, which include sunshine hours and daily precipitation, we relied on the Meteostat API. The service provides very quick and easy access to weather data via a purpose-made Python package, without the need for us to register.
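A minimal sketch of such a query might look as follows. It assumes the 1.x interface of the `meteostat` package (`Point`, `Daily`, and the `prcp` and `tsun` columns, the latter in minutes); the function names are hypothetical and this is not necessarily how our own code is structured.

```python
import math
from datetime import datetime


def fetch_daily_record(lat, lon, day):
    """Return Meteostat's daily record for one location and day, or None.

    Assumes the meteostat 1.x interface; needs network access.
    """
    # Imported lazily so the helpers below also work without the package.
    from meteostat import Daily, Point

    start = datetime(day.year, day.month, day.day)
    frame = Daily(Point(lat, lon), start, start).fetch()
    if frame.empty:
        return None
    return frame.iloc[0].to_dict()


def _to_float(value):
    """Convert a raw value to float, mapping missing or NaN values to None."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


def summarize_weather(record):
    """Extract rainfall (mm) and sunshine (hours) from a daily record."""
    record = record or {}
    rain_mm = _to_float(record.get("prcp"))
    sunshine_min = _to_float(record.get("tsun"))
    return {
        "rain_mm": rain_mm,
        "sunshine_h": None if sunshine_min is None else sunshine_min / 60,
    }
```

Calling `summarize_weather(fetch_daily_record(lat, lon, some_date))` would then give the two values of interest, with `None` standing in for anything the nearby weather stations did not report.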

However, this requires us to know the exact longitude and latitude of the user’s location. Asking for them would be highly inconvenient, since both numbers contain quite a few digits, and if you get Münster’s 52° latitude wrong by just one degree and enter 53°, you can already enjoy a North Sea vacation. Thankfully, there is always a Python package to handle common problems, and in this case the geopy library provides a flexible tool that recognizes all types of inputs, ranging from Münster to specifics such as Kinderhaus in Münster-Nord or regional spellings such as ‘Timbuktu’ or ‘Tombouctou’.
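As an illustration, a place name can be turned into coordinates roughly like this. The sketch assumes geopy's `Nominatim` geocoder (geopy supports several services, and the text above does not say which one we used); the function name and the `user_agent` string are placeholders.

```python
def resolve_location(query, geocode=None):
    """Turn a free-text place name into (latitude, longitude), or None.

    `geocode` is any callable that maps a query string to an object with
    `latitude` and `longitude` attributes. By default, geopy's Nominatim
    geocoder is used (an assumption; it needs network access).
    """
    if geocode is None:
        # Imported lazily so a custom geocoder works without geopy installed.
        from geopy.geocoders import Nominatim

        geocode = Nominatim(user_agent="water-about-me-example").geocode

    location = geocode(query)
    if location is None:
        # The service could not match the input to any place.
        return None
    return (location.latitude, location.longitude)
```

Returning `None` for unknown places lets the calling code ask the user again instead of failing on a typo.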

One of the best available reservoirs of water for plants is obviously the water that is already in the ground, accumulated as a mixture of previous days’ rainfall and groundwater. Accessing reliable data for this was more difficult, since some of the services providing this information distribute it only commercially to agricultural clients. The European Space Agency (“ESA”) runs the “Soil Moisture and Ocean Salinity” (SMOS) mission, which relies on satellite-based microwave measurements to estimate soil moisture content quite accurately. However, while the data repository is freely accessible, it is meant for long-term climate studies rather than for daily requests of weather information. To retrieve the data, we need to connect to their FTP server and navigate to the day’s folder. As the data is not yet aggregated but split according to the pattern of satellite overflights, we decided to simply bulk-download all the files and work with them locally. Converting the NetCDF files, a format that was new to us, into pandas is straightforward. We can then find the measurement geographically closest to the user and extract the soil moisture content, which we use in our water calculations.
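The last two steps, loading a NetCDF file and picking the closest measurement, could be sketched as below. Using `xarray` to read the file and the haversine formula to measure distance are assumptions made for illustration; the function names are hypothetical, and the FTP download itself is left out because the server paths are not given here.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def load_netcdf_as_frame(path):
    """Read one downloaded NetCDF file into a pandas DataFrame."""
    # Imported lazily; xarray is one common way to read NetCDF files.
    import xarray as xr

    with xr.open_dataset(path) as dataset:
        return dataset.to_dataframe().reset_index()


def nearest_measurement(points, lat, lon):
    """Return the (lat, lon, value) tuple closest to the given location.

    `points` is an iterable of (lat, lon, value) tuples. Entries whose
    value is None or NaN are skipped. Returns None if nothing is usable.
    """
    best, best_distance = None, math.inf
    for p_lat, p_lon, value in points:
        if value is None or value != value:  # NaN is not equal to itself
            continue
        distance = haversine_km(lat, lon, p_lat, p_lon)
        if distance < best_distance:
            best, best_distance = (p_lat, p_lon, value), distance
    return best
```

With a DataFrame from `load_netcdf_as_frame`, the tuples could come from something like `frame[[lat_col, lon_col, moisture_col]].itertuples(index=False, name=None)`, where the three column names depend on the actual SMOS product and are not fixed here.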

Obtaining reliable data on the water demand of outdoor plants turned out to be trickier than expected. One option would have been a scientific database, which includes thousands of plants identified by their Latin names. However, we wanted to limit the response options for our users, as it did not seem realistic to us that a person trying to figure out the water requirements of their plants knows how to distinguish different types of roses, for example. Therefore, we decided against such a huge data source. This left us with the challenge of finding specific water requirements for a rather unspecified object under unspecified environmental conditions, such as the kind of soil the plant is growing in. Hence, we needed to approximate the required data. First, we researched which factors to consider for the water needs of outdoor plants. Next, we collected information from several websites and clustered our data into the plant categories we later used. As the information was mainly given on a weekly or monthly basis, we also needed to calculate the daily amount of water needed. Our sources included interviews posted on the website of “WDR”, garden planning websites and garden know-how websites, such as “gartnwissn.de”.
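The conversion to daily values is simple arithmetic and can be sketched as follows. The 30-day month and the function name are simplifying assumptions for this example. Conveniently, one litre of water per square metre corresponds to one millimetre of rainfall, so the result can be compared directly with precipitation data.

```python
# Number of days per reporting period. Treating a month as 30 days is a
# simplifying assumption for this sketch.
DAYS_PER_PERIOD = {"day": 1, "week": 7, "month": 30}


def to_daily_mm(amount_l_per_m2, period):
    """Convert a water requirement per period into millimetres per day.

    1 litre per square metre equals 1 mm of rainfall, so the input unit
    (L/m²) and the output unit (mm) are numerically interchangeable.
    """
    try:
        days = DAYS_PER_PERIOD[period]
    except KeyError:
        raise ValueError(f"unknown period: {period!r}") from None
    return amount_l_per_m2 / days
```

For example, a purely illustrative requirement of 21 L/m² per week becomes `to_daily_mm(21, "week")`, i.e. 3 mm per day.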

So, finally our project “Water about me” (previously known as “Vergieß mich nicht”) came to life. Users are prompted to enter their location, for which the current day’s weather data are then gathered. If the user has a few seconds of patience, the soil moisture records are also pulled from the SMOS website (this involves a larger dataset, since the processing is done only locally). Using all available measurements, the program then calculates the water currently available to plants solely by nature. The user is then asked for the types of vegetation in their garden, and we draw on the compiled records of plants’ minimum daily water requirements to identify those at risk and compile a recommendation list. Various scenarios, such as missing data or wrong user inputs, are considered.
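The final comparison step could look roughly like this. It is a simplified sketch, not our actual implementation: the function name, the message strings and the plant values in the usage note are made up, and how rainfall and soil moisture are combined into one "available water" figure is deliberately left outside the example.

```python
def watering_advice(available_mm, requirements_mm, garden_plants):
    """Compare naturally available water with each plant's daily need.

    available_mm:    water available today in mm, or None if no data.
    requirements_mm: mapping of lower-case plant category -> daily need in mm.
    garden_plants:   plant categories as typed in by the user.
    """
    advice = {}
    for plant in garden_plants:
        key = plant.strip().lower()  # tolerate stray spaces and capitals
        if key not in requirements_mm:
            advice[plant] = "unknown plant type"
        elif available_mm is None:
            advice[plant] = "no weather data"
        elif available_mm < requirements_mm[key]:
            advice[plant] = "water today"
        else:
            advice[plant] = "no watering needed"
    return advice
```

With illustrative requirements such as `{"lawn": 3.0, "roses": 1.5}` and 2 mm of available water, the lawn would be flagged for watering while the roses would not, and an unrecognized entry is reported rather than causing an error.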

A big advantage of our approach and data sources lies in their universal coverage. Relying on Meteostat and the SMOS database allows us to theoretically query any location in the world (of course, some locations will have better data quality than others). ESA and the company behind Meteostat are also big enough that a certain data accuracy and long-term availability can be ensured.

A further enhancement of our project would be a machine-learning-based prediction algorithm. This would enable users to act ahead of time, e.g., when going on vacation for a few days. We already built the foundation for the implementation but ultimately decided that a machine learning model was out of scope for our project, for several reasons. Firstly, as part of our group is working full-time and another part is currently writing their master’s theses, all of us were limited in our time. Secondly, we were all beginners in data science. Implementing a good machine learning model would have required more skills than we had learned in our data science track. Therefore, we decided against pursuing the idea, as it would not be wise to ship a quickly but poorly built model.

To demonstrate the importance of our project and its goals, it is necessary to include a wider variety of empirical data and create a more scientific foundation. Informing our users about the importance of greenery in and around their homes needs to be an essential part of our idea, for the sake of not only humanity but also wildlife. To meet these goals, our calculations need to be based on a model that relies on exact measurements and data, and such a model still needs to be developed. Even though there is still a lot of work to do, our project is an easy way to raise awareness of the daily waste of water and to let people contribute their part to a better future. We expect to achieve both short-term satisfaction for users of the query and long-term improvement for our environment.

The team

Eric Bauer Data Science Python

Henriette Schnelle Data Science Python

Marvin Stecker Data Science Python

Lena Eichhorn Data Science Python

Mentor

Tobias Spronk
