Environmental News Organizer

Thomas Fountzoulas, Celin Tausch, Kris Otte, Lotte Skou

Introduction

Environmental problems have become a global blight, and the negative effects of human activity on the planet grow more visible day by day. Despite the significance of the problem, there are still countries in which environmental issues are marginalized. Even in our own communities, which belong to the first world, other issues dominate and often overshadow the importance of taking care of our environment every day.

Our motivation is to confront this situation and shed light on specific environmental issues in Europe. We set out to create a user-friendly interface in the form of an interactive map that displays information about the most “popular” environmental problems in each European country. Popularity is defined by how frequently an environmental issue appears in the local news stream of a country. Moreover, we provide easy access to some of the relevant newspaper articles identified. In this fashion, the map reflects the “green status” of each country, providing important information and increasing awareness.

We faced three main challenges throughout the process, summarized briefly below:

  • First, we had to decide on the geographical scope of our project, that is, the countries for which we would provide information. Although our initial aim was to highlight environmental issues globally, time and technical constraints led us to narrow the scope to European countries.
  • The second challenge was the identification and accurate translation of the keywords that define our themes. We chose to create 7 themes: air, ocean, CO2 emission, food industry, forests, fashion and transportation. For each theme, we specified keywords to help us retrieve the corresponding data. We then narrowed down the country pool further, selecting the most populated European countries together with the countries represented by members of our group. The 9 countries are: England, France, Germany, Spain, Italy, Poland, Russia, Denmark and Greece. We used online software and people from our network to help us translate the themes and keywords precisely.
  • Lastly, we had extensive discussions about which data collection method to follow. After weighing the advantages and disadvantages of each method, for reasons discussed below, we decided to work with APIs.

Methodology & Results

Early in the process, we had to decide how to mine relevant data for the project. The choice stood between data scraping and using APIs. The main advantage of APIs is that collecting and cleaning data is usually faster and less complex than with data scraping. On the other hand, data scraping allows data to be mined and tailored without restrictions, which is precisely what APIs lack. We therefore knew that it could be difficult to find an API built for the exact scope and purpose of our project. After thorough consideration, we chose the API approach. We realized that learning data scraping to a satisfactory degree would take considerable time, so it could be a long while before we had any data to work with. In addition, two group members from the Data Science track ended their TechLabs journey early on, which made mastering data scraping seem even more difficult.

As an initial step, we decided to scope down the project and focus on countries in Europe. This decision was driven by technical and time constraints. More specifically, we identified an API well suited to the purposes of our project, called the News API. A disadvantage, however, is that this API does not have global coverage. We became aware that a platform covering articles from the entire world would require additional APIs. Given the time spent on finding the News API and the effort needed to make it work, we realized that we had neither the time nor the expertise to pull this off. We therefore decided that it would be more valuable to invest all of our time and effort in fewer countries and make the platform work as well as possible within the predefined time frame. An important rationale for this approach was that we could always scale up relatively easily if we ended up with excess time.

We chose Python as our primary language for retrieving and storing our data. This was recommended by our mentors from TechLabs, largely because Python makes working with APIs easier and less complex. Considering our lack of coding experience at the time, we chose to follow the recommendation and take the easier route.

The News API

As already mentioned, the data for our project is provided by the News API. The News API allows you to search for articles from over 50,000 news sources and blogs around the world. It lets you filter your search by various request parameters, such as country, language and specific news source. It also allows rather advanced keyword searches, as illustrated in the following screenshot:

However, the available request parameters depend on which end-point is chosen. There are two main end-points. The first allows you to search the headlines of articles, and the second allows you to search the main body of articles. The possible request parameters for each end-point are illustrated by the following screenshots:

This setup posed a major challenge for our project. As the screenshots show, it is only possible to filter by country with the headline end-point, not the article end-point. Headlines alone are not adequate for finding relevant articles, since a headline contains relatively few possible keywords. To tackle this problem, we had to use the article end-point, which meant we could not select specific European countries to filter our search. We solved this with the “domains” parameter of the article end-point: we identified the most popular local news sources for each country and restricted each search to those domains. In this way, we made sure that the identified articles relate to the countries in our scope.
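
The domain-based workaround can be sketched roughly as follows. This is a minimal illustration, not our production code: it assumes the article end-point lives at `https://newsapi.org/v2/everything` and accepts the `q`, `domains` and `apiKey` query parameters. The domain names, the helper `build_request_url` and the API key are placeholders invented for this example.

```python
from urllib.parse import urlencode

# Assumed URL of the article end-point of News API v2.
EVERYTHING_URL = "https://newsapi.org/v2/everything"

# Placeholder domains for illustration only; not the sources used in the project.
DOMAINS_BY_COUNTRY = {
    "England": ["example-news.co.uk", "example-daily.co.uk"],
    "Germany": ["example-zeitung.de"],
}


def build_request_url(country, query, api_key):
    """Return a request URL restricted to one country's news domains."""
    params = {
        "q": query,
        # The API expects a comma-separated list of domains.
        "domains": ",".join(DOMAINS_BY_COUNTRY[country]),
        "apiKey": api_key,
    }
    return EVERYTHING_URL + "?" + urlencode(params)


print(build_request_url("England", "smog", "YOUR_API_KEY"))
```

Keeping the domain lists in one dictionary means that adding a country only requires a new entry, not a new request function.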

The News API also allows you to filter your search results in many different ways, illustrated in the following screenshots:

Relevant for the scope of this project is “totalResults”, which shows how many articles match each of the 7 keyword compilations (see next screenshot). This number indicates which of the 7 environmental themes are most “important” and “relevant” in each country. Based on these numbers, we rank the 7 themes from most to least important. The “title” field is used to showcase the titles of the top 5 articles for each environmental theme in each of the 9 countries, giving the user the option to do further research.
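
As a rough sketch of this ranking step, assume each parsed response is a dictionary with a `totalResults` count and an `articles` list whose entries carry a `title`. The function names and the sample data below are made up for illustration.

```python
def rank_themes(responses):
    """Sort theme names by their totalResults count, highest first."""
    return sorted(
        responses,
        key=lambda theme: responses[theme]["totalResults"],
        reverse=True,
    )


def top_titles(response, limit=5):
    """Return the titles of the first `limit` articles in a response."""
    return [article["title"] for article in response["articles"][:limit]]


# Made-up responses for one country, shaped like the JSON described above.
sample = {
    "air": {"totalResults": 12, "articles": [{"title": "Air article A"}]},
    "ocean": {
        "totalResults": 40,
        "articles": [{"title": "Ocean article A"}, {"title": "Ocean article B"}],
    },
    "forests": {"totalResults": 7, "articles": []},
}

print(rank_themes(sample))          # ['ocean', 'air', 'forests']
print(top_titles(sample["ocean"]))  # ['Ocean article A', 'Ocean article B']
```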

The News API presents the data in JSON format. An important limitation of the developer account is that it only allows you to retrieve articles that are up to one month old. This forced us to limit the scope of our project further, so that our platform only showcases the most recent trends in the environmental news stream. In addition, the free developer account allows only 500 requests to the API per month. We worked around this by creating multiple accounts, so it was not a barrier.
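
To stay inside the one-month window, a request can carry an explicit start date. The sketch below assumes that "one month" can be approximated as 30 days and that the API accepts an ISO 8601 date in its `from` parameter. The helper name is hypothetical.

```python
from datetime import date, timedelta


def earliest_from_date(today, window_days=30):
    """Return the oldest allowed start date as an ISO 8601 string."""
    return (today - timedelta(days=window_days)).isoformat()


print(earliest_from_date(date(2021, 3, 31)))  # 2021-03-01
```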

Creating the search strings for the API

We have created collections of relevant keywords for each environmental theme, which we have compiled into request strings for the News API. The following screenshot illustrates how the relevant keywords are compiled into search strings for each of the 7 environmental themes for England:
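
In code, compiling keywords into a search string could look roughly like this. The sketch assumes that keywords are combined with `OR` and that multi-word phrases are wrapped in quotes for exact matching. The keywords shown are illustrative and are not our actual lists.

```python
def build_query(keywords):
    """Join keywords with OR, quoting multi-word phrases for exact matching."""
    terms = []
    for keyword in keywords:
        keyword = keyword.strip()
        terms.append(f'"{keyword}"' if " " in keyword else keyword)
    return " OR ".join(terms)


# Illustrative keywords only; the project's real keyword lists are not reproduced here.
THEME_KEYWORDS = {
    "air": ["air pollution", "smog", "air quality"],
    "forests": ["deforestation", "wildfire"],
}

queries = {theme: build_query(words) for theme, words in THEME_KEYWORDS.items()}
print(queries["air"])  # "air pollution" OR smog OR "air quality"
```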

The next step was to translate the keywords of each theme into the 9 different languages. For this we used Google Translate, which may have affected the validity of the search results.

MongoDB Atlas

To store the retrieved data, we created a database on MongoDB Atlas. Because the database is cloud-based, the back-end team could easily access it from their own computers. After retrieving the data from the News API, we organized it into 63 arrays of key-value pairs (9 countries × 7 themes). The 63 arrays were then inserted into one big array, which was passed to the database. This process is summarized in the following screenshots:
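
A simplified sketch of this step is shown below. The field names (`country`, `theme`, `totalResults`, `titles`) and both helper functions are assumptions made for illustration, not necessarily the structure we stored. The `collection` argument is assumed to be a pymongo `Collection`, whose `insert_many` method accepts a list of documents.

```python
from itertools import product

COUNTRIES = ["England", "France", "Germany", "Spain", "Italy",
             "Poland", "Russia", "Denmark", "Greece"]
THEMES = ["air", "ocean", "co2 emission", "food industry",
          "forests", "fashion", "transportation"]


def build_documents(results):
    """Build one document per (country, theme) pair.

    `results` maps (country, theme) to a parsed API response; pairs
    without a response get an empty placeholder document.
    """
    empty = {"totalResults": 0, "articles": []}
    documents = []
    for country, theme in product(COUNTRIES, THEMES):
        response = results.get((country, theme), empty)
        documents.append({
            "country": country,
            "theme": theme,
            "totalResults": response["totalResults"],
            "titles": [a["title"] for a in response["articles"][:5]],
        })
    return documents


def store_documents(collection, documents):
    """Insert all documents in one call (assumes a pymongo Collection)."""
    collection.insert_many(documents)


print(len(build_documents({})))  # 63
```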

The following screenshot shows the structure of the documents in the database:

After inserting the data into the database, we made sure that it can be updated on an ongoing basis with new data retrieved from the News API. This code is illustrated in the following screenshot:
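
One way to express such an update, assuming documents are identified by hypothetical `country` and `theme` fields, is an upsert: the matching document is overwritten if it exists and created otherwise. The `collection` argument is assumed to be a pymongo `Collection`, and `refresh_document` is a name invented for this sketch.

```python
def refresh_document(collection, country, theme, response):
    """Overwrite the stored values for one (country, theme) pair."""
    selector = {"country": country, "theme": theme}
    new_values = {
        "$set": {
            "totalResults": response["totalResults"],
            "titles": [a["title"] for a in response["articles"][:5]],
        }
    }
    # upsert=True creates the document if no match exists yet.
    collection.update_one(selector, new_values, upsert=True)
```

Using an upsert keeps the weekly refresh idempotent: running the script twice with the same data leaves the database unchanged.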

Google Cloud Scheduler

To automate the process of first retrieving the newest data from the News API and then updating it in the database, we use the Google Cloud Scheduler service, which can be set to run the necessary scripts automatically at a certain frequency. We have chosen to run the scripts, and thereby update the data on our platform, once a week.

The main function of our back-end is to connect the front-end, which displays our results, to the database, which stores the data retrieved from the API. This was also one of the major challenges of our project journey. Our front-end was mainly coded in HTML, jQuery and JavaScript, while we used Python to get and store the data. To bridge these language differences, we used Flask, a micro web framework written in Python that was suggested to us by the TechLabs team. Before implementing the framework, we defined our RESTful routes to give structure to our back-end interface and manage requests efficiently. We then integrated the routes into the Flask framework, which allowed us to link the front-end to our data sets.
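
A minimal sketch of such a route is given below. The URL pattern, the function names and the `load_documents` callback are hypothetical and only illustrate the idea. The callback is assumed to return plain dictionaries (for example with MongoDB's `_id` field removed) so that they can be serialized to JSON.

```python
def themes_for_country(documents, country):
    """Return one country's theme documents, most frequent theme first."""
    matches = [d for d in documents if d["country"].lower() == country.lower()]
    return sorted(matches, key=lambda d: d["totalResults"], reverse=True)


def create_app(load_documents):
    """Build a Flask app; `load_documents` is a hypothetical data-access callback."""
    # Imported here so the helper above can be used without Flask installed.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/countries/<country>/themes")
    def country_themes(country):
        # The front-end can fetch this JSON with a jQuery AJAX call.
        return jsonify(themes_for_country(load_documents(), country))

    return app
```

Separating the lookup logic from the route keeps the data handling testable without starting a web server.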

Our starting point for the front-end was the map design. The easiest way to do this was to find an SVG image of Europe. An SVG file contains code for each country and works in HTML. After pasting all of the SVG code into the HTML file, we were able to alter it and adjust it to our project's purpose. To make the map interactive, we added a so-called “g class”, which divides each country into its own element. This allowed us to add a mouse hover effect to each country. The hover effect itself had to be applied and styled in CSS. Once we had a functional, interactive map, it was time to add the background, style the interface with colors, add text, etc. For this part, we mainly relied on Bootstrap.

We are satisfied with our final front-end: we managed to keep it minimalistic and to add the elements we wanted. If we had had more time, we would have liked to add other features to our application. We talked about making the countries clickable, which would take you to another page with a word cloud showing more than just the five most common topics.

Conclusion and learning points

After several ups and downs, fast progress and some setbacks, we were finally able to build a working online news organizer that shows the most relevant environmental news from nine selected European countries. Although we had to narrow down our project scope significantly, we still met our goal of creating our first web app, learned Python and successfully retrieved data via an API for the first time. Furthermore, we realized a project that highlights and raises awareness of some of the most prevalent and severe environmental problems of our time.
