Media Analysis of PSI-20 Companies

Insights from News and Public Coverage

Hugo Veríssimo

124348

data04.parquet

companies                  aliases                                            news
Banco Comercial Português  [Banco Comercial Português, BCP]                   [{'ExtractedText': 'DN   13 de Setembro de 200...
Galp Energia               [Galp Energia, GALP]                               [{'ExtractedText': 'RTP Galp reforça posição n...
EDP                        [EDP, Energias de Portugal, Electricidade de P...  [{'ExtractedText': 'DN-Sinteses Negocios 9 de ...
Sonae                      [Sonae, SON]                                       [{'ExtractedText': 'DN-Sinteses 5 de Março de ...
Mota-Engil                 [Mota-Engil, EGL]                                  [{'ExtractedText': 'RTP Lucro da Mota-Engil so...

pd.read_parquet("data04.parquet").iloc[0,1][0].keys()
dict_keys(['ExtractedText', 'linkToArchive', 'newsProbability', 'newsSource', 'tstamp'])

NER and Sentiment Analysis

NER

  • The spaCy model pt_core_news_sm was used to extract:

    • “PER” - named entity that represents a person

    • “ORG” - named entity that represents a group or organization

    • “LOC” - named entity that indicates a specific place

    • “MISC” - named entity that doesn’t fit into the other categories

    • “NOUN” - part-of-speech tag that identifies a noun in the sentence

  • The model did not filter out certain meaningless words and expressions, so specific rules were implemented to remove them.
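
The extraction and clean-up steps above can be sketched as follows. This is a minimal illustration, not the rules actually used: the noise patterns, the minimum length of 3, and the helper names extract_candidates, is_noise and filter_candidates are all assumptions. extract_candidates expects a spaCy Doc and reads only doc.ents, ent.label_ and token.pos_.

```python
import re

# Labels named on the slide: four entity types plus the NOUN part-of-speech tag.
KEPT_LABELS = {"PER", "ORG", "LOC", "MISC", "NOUN"}

# Hypothetical noise rules (assumptions, not the author's actual rules):
# clock times such as "15h30", day-month stamps such as "03 Mar", bare numbers.
NOISE_PATTERNS = [
    re.compile(r"^\d{1,2}h\d{2}$"),
    re.compile(r"^\d{1,2} [A-Za-zç]{3,9}$"),
    re.compile(r"^\d+$"),
]


def extract_candidates(doc):
    """Collect (text, label) pairs from a spaCy Doc: named entities plus nouns."""
    pairs = [(ent.text, ent.label_) for ent in doc.ents]
    pairs += [(tok.text, "NOUN") for tok in doc if tok.pos_ == "NOUN"]
    return pairs


def is_noise(text):
    """Return True for strings that are too short or match a noise pattern."""
    text = text.strip()
    return len(text) < 3 or any(p.match(text) for p in NOISE_PATTERNS)


def filter_candidates(pairs):
    """Keep pairs with a known label whose text is not flagged as noise."""
    return [
        (text.strip(), label)
        for text, label in pairs
        if label in KEPT_LABELS and not is_noise(text)
    ]
```

With spaCy and the model installed, the two steps would chain as filter_candidates(extract_candidates(nlp(text))), where nlp = spacy.load("pt_core_news_sm").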

Sentiment Analysis

  • The initial approach used pipeline from transformers. However, the models consistently failed to load, so an alternative solution was needed.

  • The solution involved using:

    • deep_translator to overcome language restrictions.

    • vaderSentiment and textblob to extract the sentiment of the news.
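
A rough sketch of how these three libraries could fit together is given below. The slides do not say how the VADER and TextBlob scores were combined, so the plain average in news_sentiment is an assumption, as are the function names make_scorers and news_sentiment and the choice of GoogleTranslator as the deep_translator backend.

```python
def make_scorers():
    """Build a translate function and two English sentiment scorers.

    Imports are local so news_sentiment can be used without these packages.
    """
    from deep_translator import GoogleTranslator
    from textblob import TextBlob
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    translator = GoogleTranslator(source="pt", target="en")
    vader = SentimentIntensityAnalyzer()

    def translate(text):
        return translator.translate(text)

    scorers = [
        lambda english: vader.polarity_scores(english)["compound"],  # in [-1, 1]
        lambda english: TextBlob(english).sentiment.polarity,        # in [-1, 1]
    ]
    return translate, scorers


def news_sentiment(text, translate, scorers):
    """Translate the text to English and average the scorer outputs.

    Averaging is an assumed combination rule, not taken from the slides.
    """
    english = translate(text)
    scores = [score(english) for score in scorers]
    return sum(scores) / len(scores)
```

Usage would look like translate, scorers = make_scorers() followed by news_sentiment(item["ExtractedText"], translate, scorers) for each news item.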

data05.parquet

companies                  aliases                                            news                                               keywords
Banco Comercial Português  [Banco Comercial Português, BCP]                   [{'ExtractedText': 'DN   13 de Setembro de 200...  {'03 Mar': {'count': 2.0, 'date': {'201503': 2...
Galp Energia               [Galp Energia, GALP]                               [{'ExtractedText': 'RTP Galp reforça posição n...  {'00h00': {'count': 7.0, 'date': {'201004': 1....
EDP                        [EDP, Energias de Portugal, Electricidade de P...  [{'ExtractedText': 'DN-Sinteses Negocios 9 de ...  {'00h00': {'count': 4.0, 'date': {'201004': No...
Sonae                      [Sonae, SON]                                       [{'ExtractedText': 'DN-Sinteses 5 de Março de ...  {'00h00': {'count': 3.0, 'date': {'201004': No...
Mota-Engil                 [Mota-Engil, EGL]                                  [{'ExtractedText': 'RTP Lucro da Mota-Engil so...  {'15h30': {'count': 2.0, 'date': {'201509': 1....

pd.read_parquet("data05.parquet")["news"].iloc[0][0].keys()
dict_keys(['ExtractedText', 'linkToArchive', 'newsNER', 'newsProbability', 'newsSentiment', 'newsSource', 'tstamp'])

pd.read_parquet("data05.parquet")["keywords"].iloc[0]['03 Mar'].keys()
dict_keys(['count', 'date', 'filter', 'news', 'sentiment', 'source', 'type', 'weight'])
 
  • count: number of mentions of the keyword

  • date: dictionary where the keys represent months (%Y%m) and the values represent the count of keyword mentions in that month

  • filter: value between 0 and 1 that represents the importance level of the keyword

  • news: list of URLs linking to news where the keyword appears

  • sentiment: average sentiment of the news that mention the keyword

  • source: dictionary where the keys are news sources and the values represent the count of news from that source that mention the keyword

  • type: set of values indicating the categories that the keyword belongs to: PER, ORG, LOC, MISC and/or NOUN

  • weight: metric that balances factors such as the total number of mentions, the probability that each item is actually a news article, and the importance of the keyword’s category
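
To make the field definitions above concrete, here is a minimal sketch of how such per-keyword records could be aggregated from the news items. It is not the author's pipeline: it assumes that tstamp starts with YYYYMM, that newsSentiment is a single float, and that newsNER maps each keyword to one label. It also counts one mention per article. The filter and weight fields are left out because their formulas are not given.

```python
from collections import Counter, defaultdict


def build_keyword_records(news_items):
    """Aggregate count, date, source, news, type and sentiment per keyword.

    Assumed item shape (hypothetical): 'tstamp' begins with YYYYMM,
    'newsSentiment' is a float, 'newsNER' is a dict of keyword -> label.
    """
    records = defaultdict(lambda: {
        "count": 0,
        "date": Counter(),
        "source": Counter(),
        "news": [],
        "type": set(),
        "_sentiments": [],
    })
    for item in news_items:
        month = str(item["tstamp"])[:6]  # e.g. "201503"
        for keyword, label in item["newsNER"].items():
            rec = records[keyword]
            rec["count"] += 1  # simplification: one mention per article
            rec["date"][month] += 1
            rec["source"][item["newsSource"]] += 1
            rec["news"].append(item["linkToArchive"])
            rec["type"].add(label)
            rec["_sentiments"].append(item["newsSentiment"])

    result = {}
    for keyword, rec in records.items():
        sentiments = rec.pop("_sentiments")
        rec["sentiment"] = sum(sentiments) / len(sentiments)
        rec["date"] = dict(rec["date"])
        rec["source"] = dict(rec["source"])
        result[keyword] = rec
    return result
```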

Visualizations

News Sources

news by source

The search was limited to 33 websites. Most of them only have news from after 2020, and some return no results at all.

Word Cloud

wordcloud bcp

The word cloud shows the company’s core business areas and the key individuals associated with them.

wordcloud galp

Stock Price and News Analysis

Stock prices retrieved from the Alpha Vantage API can be combined with the keywords extracted from the news to relate news coverage to market movements.
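
As an illustration of the retrieval step, the sketch below requests monthly prices from Alpha Vantage and keeps the closing price per month, keyed as YYYYMM so it lines up with the keyword date dictionaries. The slides do not state which endpoint or granularity was used, so TIME_SERIES_MONTHLY is an assumption, and the helper names are hypothetical. A personal API key is required.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.alphavantage.co/query"


def monthly_url(symbol, api_key):
    """Build the request URL for the monthly time series of one symbol."""
    params = {
        "function": "TIME_SERIES_MONTHLY",  # assumed granularity
        "symbol": symbol,
        "apikey": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def parse_monthly_close(payload):
    """Map 'YYYYMM' -> closing price from a TIME_SERIES_MONTHLY response."""
    series = payload.get("Monthly Time Series", {})
    return {
        day[:4] + day[5:7]: float(values["4. close"])
        for day, values in series.items()
    }


def fetch_monthly_close(symbol, api_key):
    """Download and parse the monthly closing prices (needs network access)."""
    with urlopen(monthly_url(symbol, api_key), timeout=30) as response:
        return parse_monthly_close(json.load(response))
```

For example, fetch_monthly_close("GALP.LS", api_key) would return a dictionary such as {"201503": ...} if the symbol is available for the key in use.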

Image 1 Image 2
Visualizations representing data from Galp Energia (GALP.LS).
  • The correlation between news and stock prices was analyzed by comparing news metrics (news volume, logarithm of volume, volume changes, and sentiment) against stock prices and price-change measures.

  • Correlation coefficients show no significant link between news and stock prices, possibly due to the low trading volume in the Portuguese stock market.
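
The comparison described above can be sketched with a plain Pearson coefficient over month-aligned series. The exact metric definitions are not given in the slides, so the use of log(1 + volume), first differences for volume changes, and relative changes for prices are assumptions, as are the function names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def pct_change(values):
    """Relative change between consecutive values."""
    return [(b - a) / a for a, b in zip(values, values[1:])]


def news_price_correlations(volume, sentiment, price):
    """Correlate news metrics with price measures for month-aligned lists."""
    log_volume = [math.log1p(v) for v in volume]          # assumed log transform
    volume_change = [b - a for a, b in zip(volume, volume[1:])]
    price_change = pct_change(price)
    return {
        "volume vs price": pearson(volume, price),
        "log volume vs price": pearson(log_volume, price),
        "volume change vs price change": pearson(volume_change, price_change),
        "sentiment vs price change": pearson(sentiment[1:], price_change),
    }
```

Coefficients close to zero across all four pairs would match the finding reported on this slide.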

Keywords Interaction Analysis

matrix corr keywords

To analyze how the keywords of different companies relate to each other, a correlation matrix was created. It has 49 787 136 cells, i.e. 7 056 × 7 056 keywords.
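
One way such a matrix could be built is from the monthly date counts of each keyword, as sketched below. Whether the original matrix was computed on monthly counts is an assumption, and the function names are hypothetical. Missing or None counts are treated as zero. At the scale reported above, a vectorized routine such as numpy.corrcoef would be far more practical than these pure-Python loops.

```python
import math


def _pearson(xs, ys):
    """Pearson correlation of two equal-length lists (nan if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def monthly_vector(date_counts, months):
    """Turn a {YYYYMM: count or None} dict into a dense list over all months."""
    return [float(date_counts.get(month) or 0) for month in months]


def correlation_matrix(keyword_dates):
    """Return (keyword names, symmetric matrix of pairwise correlations)."""
    months = sorted({m for counts in keyword_dates.values() for m in counts})
    names = list(keyword_dates)
    vectors = [monthly_vector(keyword_dates[name], months) for name in names]
    size = len(names)
    matrix = [[1.0] * size for _ in range(size)]  # diagonal fixed at 1.0
    for i in range(size):
        for j in range(i + 1, size):
            r = _pearson(vectors[i], vectors[j])
            matrix[i][j] = matrix[j][i] = r
    return names, matrix
```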

The relationships below stand out for their high correlation and relevance.

Comp. 1  Keyword 1       Keyword 2                      Comp. 2
BCP      Sonangol        petrolífera                    BCP
BCP      João Rendeiro   Tribunal da Relação de Lisboa  BCP
BCP      HNA             CESE                           GLP
GLP      Ganhos da Galp  Adolfo Mesquita Nunes          GLP
GLP      jogos           seleção                        GLP
GLP      Setgás          Setgás                         EDP
GLP      Petrobrás       Petrobrás                      EGL
EDP      Cajastur        Hidrocantábrico                EDP
SON      Portucel        Suzano                         SON
EGL      EGF             privatização                   EGL

Company Relationship Map

Choose a company: BCP, Galp, EDP, Sonae, Mota-Engil