ESG_Intro.jpg

Caption: Advancing environmental, social, and governance investing / Deloitte Insights

As a society, we are looking to tackle the challenges posed by climate change through both mitigation and adaptation. Two further trends are playing an increasingly important role in confronting climate change: socially responsible investing and the notion of Environmental, Social and Governance (ESG) principles. But why?

As argued by Harald Walkate, Head of ESG for Natixis Investment Managers, “the answer lies in greater clarity about what investors… want to achieve…” Today, more than ever, investors are looking to set practical targets for both financial returns and societal impact. The Deloitte Center for Financial Services (DCFS) expects client demand to drive ESG-mandated assets to comprise half of all professionally managed investments in the United States by 2025. This illustrates a real drive to bring together profit and purpose through socially responsible investing.

This blog will share some ways to identify and look up a company’s ESG score. I will explore this through two key coding tools: **web scraping** and **APIs**. A basic understanding of Python programming and web page structure is helpful but not essential.

SETUP

What is web scraping?

Web scraping is the process of automatically collecting structured data from websites. It is used to extract data from public websites that either do not offer an Application Programming Interface (API) or only expose limited data through one. For a definition of API, please go to this LINK. The web scraping process can be summarised in four steps, sketched in code just after the list.

  1. Identify the target website
  2. Use a scraper to request the page’s source code
  3. Use a locator to find the data within the code
  4. Save the data in a structured format
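
To make these steps concrete, here is a minimal, hypothetical sketch of the whole pipeline using requests and BeautifulSoup. The URL, tag, and class name are placeholders, not a real ESG data source:

```python
import csv

import requests
from bs4 import BeautifulSoup

# Step 1: identify the target website (hypothetical URL)
url = "https://www.example.com/esg-scores"

# Step 2: request the page's source code
response = requests.get(url, timeout=10)
response.raise_for_status()

# Step 3: use a locator to find the data within the code
soup = BeautifulSoup(response.text, "html.parser")
cells = soup.find_all("td", class_="esg-score")  # hypothetical tag and class

# Step 4: save the data in a structured format
with open("esg_scores.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["esg_score"])
    for cell in cells:
        writer.writerow([cell.get_text(strip=True)])
```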

The code of the page is the information you extract data from. It is written in HTML, the standard markup language for creating web pages. An HTML page consists of a HEAD and a BODY, with the data to be scraped coming from the BODY (figure 2). Please go to this LINK for further documentation.

ESG_HTML_Structure.png

Source: HTML Introduction / W3School
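
To see the structure in action, the short sketch below parses a minimal HTML page with BeautifulSoup; note that the visible content we scrape sits inside the BODY:

```python
from bs4 import BeautifulSoup

# A minimal HTML page: metadata lives in the HEAD, visible content in the BODY
html = """
<html>
  <head><title>Page Title</title></head>
  <body>
    <h1>My First Heading</h1>
    <p>My first paragraph.</p>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.head.title.text)  # -> Page Title (metadata from the HEAD)
print(soup.body.p.text)      # -> My first paragraph. (data from the BODY)
```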

The next step is to extract the HTML code. A scraper is software that helps extract all the code from a web page. The most common Python scraping tools are Selenium and BeautifulSoup. Personally, I prefer Selenium to BeautifulSoup because of its ability to extract interactive content.

Selenium scrapes content after the full page has loaded, so it can capture content rendered by JavaScript and other interactive elements. A drawback is lower efficiency, since it requires a significant amount of processing power. However, Selenium is still the go-to tool when scraping a small number of web pages.
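
As a quick preview, a minimal Selenium sketch looks like this (it assumes Google Chrome is installed; recent Selenium releases download a matching ChromeDriver automatically):

```python
from selenium import webdriver

# Launch a Chrome browser (assumes Chrome is installed locally)
driver = webdriver.Chrome()

driver.get("https://www.example.com")  # placeholder target page

# page_source holds the full HTML after JavaScript has run
html = driver.page_source
print(html[:200])

driver.quit()
```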

The details on how to use Selenium will be explained in the coming section.

ESG_Selenium_Logo.webp

The final step is to locate the element and extract the data. We need an element locator to find the element on a web page. The most convenient way is to right-click and select “Inspect” in Google Chrome to see the web page structure. We then point the cursor at the target area, which reveals the tag to extract. Take this Twitter page as an example: we target the text of the tweet and, using “Inspect”, we find the attribute class="css-901oao css-16my406 r-poiln3 r-bcqeeo r-qvutc0" that we need to search for in the page through Selenium.
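
A hedged sketch of that lookup might read as follows. The tweet URL is a placeholder, and Twitter’s auto-generated class names change over time, so the selector is only illustrative:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://twitter.com/some_user/status/123")  # placeholder URL

# A compound class value must be located with a CSS selector,
# because By.CLASS_NAME accepts only a single class name.
# Wait up to 10 seconds for the JavaScript-rendered elements to appear.
wait = WebDriverWait(driver, 10)
tweet_texts = wait.until(
    EC.presence_of_all_elements_located(
        (By.CSS_SELECTOR, ".css-901oao.css-16my406.r-poiln3.r-bcqeeo.r-qvutc0")
    )
)
for element in tweet_texts:
    print(element.text)

driver.quit()
```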