dict_keys(['ExtractedText', 'linkToArchive', 'newsProbability', 'newsSource', 'tstamp'])
Insights from News and Public Coverage
124348
companies | aliases | news |
---|---|---|
Banco Comercial Português | [Banco Comercial Português, BCP] | [{'ExtractedText': 'DN 13 de Setembro de 200... |
Galp Energia | [Galp Energia, GALP] | [{'ExtractedText': 'RTP Galp reforça posição n... |
EDP | [EDP, Energias de Portugal, Electricidade de P... | [{'ExtractedText': 'DN-Sinteses Negocios 9 de ... |
Sonae | [Sonae, SON] | [{'ExtractedText': 'DN-Sinteses 5 de Março de ... |
Mota-Engil | [Mota-Engil, EGL] | [{'ExtractedText': 'RTP Lucro da Mota-Engil so... |
The model `pt_core_news_sm` from `spacy` was used to extract:
“PER” - named entity that represents a person
“ORG” - named entity that represents a group or organization
“LOC” - named entity that indicates a specific place
“MISC” - named entity that doesn’t fit into the other categories
“NOUN” - part-of-speech tag that identifies a noun in the sentence
The model also did not filter out certain meaningless words and expressions, so specific rules had to be implemented to address this.
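The extraction and filtering steps can be sketched as follows. This is a minimal illustration, not the original implementation: the source does not list the actual rules, so the patterns in `NOISE_PATTERNS` and the helper names `is_noise` and `extract_keywords` are hypothetical. To keep the sketch runnable without the model installed, the function takes plain `(text, label)` pairs rather than a spaCy `Doc`.

```python
import re

# Entity labels named in the text above.
ENTITY_LABELS = {"PER", "ORG", "LOC", "MISC"}

# Hypothetical rules: the source does not say which expressions were removed.
NOISE_PATTERNS = [
    re.compile(r"\d{1,2}h\d{2}"),        # clock times such as "00h00"
    re.compile(r"\d{1,2} [A-Za-z]{3}"),  # short dates such as "03 Mar"
    re.compile(r"\W*"),                  # empty or punctuation-only strings
]


def is_noise(keyword):
    """Return True when the whole keyword matches one of the noise rules."""
    return any(p.fullmatch(keyword) for p in NOISE_PATTERNS)


def extract_keywords(entities, tokens):
    """Map each keyword to the set of categories it was tagged with.

    entities: iterable of (text, entity_label) pairs
    tokens:   iterable of (text, part_of_speech) pairs
    """
    keywords = {}
    for text, label in entities:
        if label in ENTITY_LABELS:
            keywords.setdefault(text.strip(), set()).add(label)
    for text, pos in tokens:
        if pos == "NOUN":
            keywords.setdefault(text.strip(), set()).add("NOUN")
    # Apply the rule-based filter after tagging.
    return {k: v for k, v in keywords.items() if not is_noise(k)}
```

With spaCy available, the pairs would come from `doc = spacy.load("pt_core_news_sm")(text)`, as `[(e.text, e.label_) for e in doc.ents]` and `[(t.text, t.pos_) for t in doc]`.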
The initial approach used `pipeline` from `transformers`. However, the models consistently failed to load, so an alternative solution was explored to bypass the issue.
The solution involved using:
`deep_translator` to overcome language restrictions.
`vaderSentiment` and `textblob` to extract the sentiment of the news.
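A minimal sketch of how these three libraries could fit together is shown below. The source does not say how the two sentiment scores were combined, so the plain average used here is an assumption, and the function names `make_translator`, `make_scorers` and `news_sentiment` are hypothetical. The third-party imports sit inside the factory functions so that the core logic can be exercised without the packages or network access.

```python
from statistics import mean


def make_translator(source="pt", target="en"):
    """Return a text -> text function backed by deep_translator (needs network)."""
    from deep_translator import GoogleTranslator
    return GoogleTranslator(source=source, target=target).translate


def make_scorers():
    """Return scoring functions that each map English text to a value in [-1, 1]."""
    from textblob import TextBlob
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    vader = SentimentIntensityAnalyzer()
    return [
        lambda text: vader.polarity_scores(text)["compound"],
        lambda text: TextBlob(text).sentiment.polarity,
    ]


def news_sentiment(text, translate, scorers):
    """Translate the text to English, then average the scores.

    Averaging is an assumption; the source does not state the combination rule.
    """
    english = translate(text)
    return mean(score(english) for score in scorers)
```

A call would then look like `news_sentiment(item["ExtractedText"], make_translator(), make_scorers())`. Translating first matters because both sentiment tools are built around English text.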
companies | aliases | news | keywords |
---|---|---|---|
Banco Comercial Português | [Banco Comercial Português, BCP] | [{'ExtractedText': 'DN 13 de Setembro de 200... | {'03 Mar': {'count': 2.0, 'date': {'201503': 2... |
Galp Energia | [Galp Energia, GALP] | [{'ExtractedText': 'RTP Galp reforça posição n... | {'00h00': {'count': 7.0, 'date': {'201004': 1.... |
EDP | [EDP, Energias de Portugal, Electricidade de P... | [{'ExtractedText': 'DN-Sinteses Negocios 9 de ... | {'00h00': {'count': 4.0, 'date': {'201004': No... |
Sonae | [Sonae, SON] | [{'ExtractedText': 'DN-Sinteses 5 de Março de ... | {'00h00': {'count': 3.0, 'date': {'201004': No... |
Mota-Engil | [Mota-Engil, EGL] | [{'ExtractedText': 'RTP Lucro da Mota-Engil so... | {'15h30': {'count': 2.0, 'date': {'201509': 1.... |
dict_keys(['ExtractedText', 'linkToArchive', 'newsNER', 'newsProbability', 'newsSentiment', 'newsSource', 'tstamp'])
dict_keys(['count', 'date', 'filter', 'news', 'sentiment', 'source', 'type', 'weight'])
`count`: number of mentions of the keyword
`date`: dictionary where the keys represent months (`%Y%m`) and the values represent the count of keyword mentions in that month
`filter`: value between 0 and 1 that represents the importance level of the keyword
`news`: list of URLs linking to news where the keyword appears
`sentiment`: average sentiment of the news that mention the keyword
`source`: dictionary where the keys are news sources and the values represent the count of news from that source that mention the keyword
`type`: set of values indicating the categories that the keyword belongs to: PER, ORG, LOC, MISC and/or NOUN
`weight`: metric that balances factors such as the total number of mentions, the probability of the news being news and the importance of the keyword’s category
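The way such a record could be assembled from the news items is sketched below. Several details are assumptions rather than facts from the source: `tstamp` is taken to be a string starting with `YYYYMMDD`, `newsNER` is taken to map each keyword to its categories, `newsSentiment` is taken to be a single number, and each article counts as one mention. The `filter` and `weight` fields are left out because their formulas are not given.

```python
from collections import defaultdict


def aggregate_keywords(news_items):
    """Build one record per keyword from a list of news item dictionaries."""
    records = {}
    sentiments = defaultdict(list)
    for item in news_items:
        month = item["tstamp"][:6]  # assumed "YYYYMMDD..." string, giving %Y%m
        for keyword, types in item["newsNER"].items():  # assumed keyword -> categories
            rec = records.setdefault(
                keyword,
                {"count": 0, "date": {}, "news": [], "source": {}, "type": set()},
            )
            rec["count"] += 1  # one mention per article (assumption)
            rec["date"][month] = rec["date"].get(month, 0) + 1
            rec["news"].append(item["linkToArchive"])
            src = item["newsSource"]
            rec["source"][src] = rec["source"].get(src, 0) + 1
            rec["type"] |= set(types)
            sentiments[keyword].append(item["newsSentiment"])
    # Average sentiment over the articles that mention each keyword.
    for keyword, rec in records.items():
        scores = sentiments[keyword]
        rec["sentiment"] = sum(scores) / len(scores)
    return records
```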
The search was limited to 33 websites. Even so, most of them only have news from after 2020, and others return no results at all.
A word cloud offers valuable insights into the company’s core business areas and highlights the key individuals associated with them.
Combining stock prices retrieved from the Alpha Vantage API with an analysis of the keywords extracted from the news can uncover valuable insights into market movements.
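A minimal sketch of the price retrieval, using only the standard library, is given below. The source does not say which Alpha Vantage endpoint or ticker symbols were used. The monthly series is chosen here because the keyword counts are also keyed by month, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.alphavantage.co/query"


def build_url(symbol, api_key):
    """Build the request URL for the monthly time series of one symbol."""
    params = {"function": "TIME_SERIES_MONTHLY", "symbol": symbol, "apikey": api_key}
    return f"{BASE_URL}?{urlencode(params)}"


def monthly_closes(payload):
    """Map each month (%Y%m) to its closing price from a decoded JSON response."""
    series = payload.get("Monthly Time Series", {})
    return {
        day[:7].replace("-", ""): float(values["4. close"])
        for day, values in sorted(series.items())
    }


def fetch_monthly_closes(symbol, api_key):
    """Download and parse the series (needs network access and a valid API key)."""
    with urlopen(build_url(symbol, api_key)) as response:
        return monthly_closes(json.load(response))
```

Keying the prices by `%Y%m` lets them be joined directly with the `date` dictionary of each keyword.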
The correlation between news and stock prices was analyzed by comparing news-side metrics (news volume, logarithm of volume, volume changes, and sentiment) against stock prices and price change measures.
Correlation coefficients show no significant link between news and stock prices, possibly due to the low trading volume in the Portuguese stock market.
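The comparison described above can be sketched with a plain Pearson coefficient. The exact metric definitions are not given in the source, so the choices below are assumptions: `log1p` for the logarithm of volume (so that months with zero news are defined), a simple difference for volume change, and relative change for the price measure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")
    return cov / (spread_x * spread_y)


def news_price_correlations(volume, sentiment, close):
    """Correlate monthly news metrics with the month-over-month price change.

    All inputs are equal-length lists with one value per month, in time order.
    """
    price_change = [b / a - 1 for a, b in zip(close, close[1:])]
    features = {
        "volume": volume[1:],
        "log_volume": [math.log1p(v) for v in volume[1:]],
        "volume_change": [b - a for a, b in zip(volume, volume[1:])],
        "sentiment": sentiment[1:],
    }
    return {name: pearson(values, price_change) for name, values in features.items()}
```

The first month is dropped from each metric so that every value lines up with a price change.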
To analyze how the keywords of different companies relate to each other, a correlation matrix was created, resulting in 49 787 136 cells, which corresponds to a square matrix of 7 056 × 7 056 keywords.
The table below lists some of the standout relationships, selected for their high correlation and relevance.
Comp. 1 | Keyword 1 | Keyword 2 | Comp. 2 |
---|---|---|---|
BCP | Sonangol | petrolífera | BCP |
BCP | João Rendeiro | Tribunal da Relação de Lisboa | BCP |
BCP | HNA | CESE | GLP |
GLP | Ganhos da Galp | Adolfo Mesquita Nunes | GLP |
GLP | jogos | seleção | GLP |
GLP | Setgás | Setgás | EDP |
GLP | Petrobrás | Petrobrás | EGL |
EDP | Cajastur | Hidrocantábrico | EDP |
SON | Portucel | Suzano | SON |
EGL | EGF | privatização | EGL |