Using Web Scraping to see how companies perform on ESG - Part II

Caption: What is ESG and why is it important ? / Carbonclick

In Part I, we demonstrated using web scraping to extract stock codes from S&P index website. The next step is to input the stock codes into an Environment, Social, Governance (ESG) API to observe which industries and companies are relatively outperformed in ESG areas. Here are the steps:

Completed in Part I

Identify the website to extract companies' stock code
Use SELENIUM to make a request to get the HTML of the page
Use a div locator to extract the stock code from each index page
Save in a python list format and transform into DataFrame

In this blog, we focus on

Searching the ESG score by inputting the stocks code in an ESG API

5.1. Mapping the stock codes with stock exchange code to fulfil ESG API format

5.2. Create a dataframe with the search results from ESG API

5.3. Visualize, analyze and elaborate on the findings

5.1 Mapping the stock codes and stock exchange code to fulfil ESG API format

The ESG API requires one to input the stock codes and stock exchange code to search for the ESG score. We download information from the ticker website and map it with our dataframe. We will further explain how to use Pandas in a separate blog.

Download the mapping csv file: Here

# map the sector to industry
sector_map = ['CommIT', 'Consumer', 'Finance', 'HealthCare', 
							'Industrials', 'InfoTech', 'RealEstate']
d = {'Sector': df_sector_ticker['Sector'].unique(), 'SectorMap': sector_map}
df_sector_map = pd.DataFrame(data=d)
df_sector_ticker = df_sector_ticker.merge(df_sector_map, on='Sector', how='left')

# map the ticker to stock exchange 
# fit ESG API format that before feeding in ESG API
df_ticker_screener = pd.read_csv("StocksScreener.csv")
df_sector_ticker = df_sector_ticker.merge(df_ticker_screener, 
on='Ticker', how='left')
df_sector_ticker['APITicker'] = df_sector_ticker['StockExchange'].str.lower() + ":" + df_sector_ticker['Ticker'].str.lower()

5.2 Create dataframe with the search results from ESG API

The next step is to spilt the newly formatted stock codes into a group of 3 chunks to search for the ESG API. The ESG API will stop returning results if we feed the chunks too quickly. Therefore, we search the stock codes every 5 minutes by using sleep function. We save each company's ESG data as a json format and compile it all into a list.

# combine every 3 tickers into 1 string -> feed and search in ESG API
tickers_strs = []
t_strs = ''
for t_count, t_value in enumerate(df_sector_ticker['APITicker'].unique()):
    t_strs = t_strs + t_value.lower() + ','
    if not t_count % 3 or t_count + 1 == len(df_sector_ticker['APITicker'].unique()):
        tickers_strs.append(t_strs[:-1])
        t_strs = ''

# setting up ESG API with token
api_code = 'CREATE YOUR OWN TOKEN'
esg_json = []
for t in tickers_strs:
    url = "<https://tf689y3hbj.execute-api.us-east-1.amazonaws.com/prod/authorization/search?q=>" + t + "&token=" + api_code
    r = requests.get(url)
    for j in r.json():
        try:
            print(j['company_name'])
            esg_json.append(j)
        # not every company has data in the API
				except:
            print("Error")
            continue
    # need to sleep for 5 minutes, otherwise it will stop returing request
    time.sleep(300)

In the ESG data there are environment, social and governance scores that can provide us with a high-level sense of ESG development. A dataframe to store the score information and deliver the analysis need to then be created.

# setup ESG Score dataframe
df_ticker_info = pd.DataFrame([], columns=['Company_name', 'Ticker', 'Environment_grade', 'Social_grade', 'Governance_grade',
                                            'Total_grade', 'Total_level', 'Last_processing_date', 'Environment_score', 'Social_score', 
                                            'Governance_score', 'Total_score'])
for e in esg_json:
    new_row = {'Company_name': e['company_name'],
               'Ticker': e['stock_symbol'],
               'Environment_grade': e['environment_grade'],
               'Social_grade': e['social_grade'],
               'Governance_grade': e['social_grade'],
               'Total_grade': e['total_grade'],
               'Total_level': e['total_level'],
               'Last_processing_date': e['last_processing_date'],
               'Environment_score': e['environment_score'],
               'Social_score': e['social_score'],
               'Governance_score': e['governance_score'],
               'Total_score': e['total']}
    df_ticker_info = df_ticker_info.append(new_row, ignore_index=True)

df_sector_ticker = df_sector_ticker.merge(df_ticker_info, on='Ticker', how='left')

# remove tickers that cannot search ESG score from the API
df_sector_ticker = df_sector_ticker.dropna()
df_sector_ticker.reset_index(drop=True, inplace=True)

# convert ESG Score datatype from string to int64
df_sector_ticker['Environment_score'] = df_sector_ticker['Environment_score'].astype('int64')
df_sector_ticker['Social_score'] = df_sector_ticker['Social_score'].astype('int64')
df_sector_ticker['Governance_score'] = df_sector_ticker['Governance_score'].astype('int64')
df_sector_ticker['Total_score'] = df_sector_ticker['Total_score'].astype('int64')