Data Extraction for preselected commodities portfolio
Author
Rodrigo Hermont Ozon, Érick Oliveira Rodrigues
Published
October 7, 2024
Abstract
This short document shares the time series extraction and the construction of two basic features: price returns and their conditional variances…
Code
# Check if 'adjclose' exists in the returned data
if isinstance(portfolio_prices, pd.DataFrame) and 'adjclose' in portfolio_prices.columns:
    portfolio_prices = portfolio_prices[['adjclose']].reset_index()
else:
    raise KeyError("The data fetched does not contain 'adjclose'. Check Yahoo Finance for availability.")

# Pivot the DataFrame to have a similar format to yfinance output
portfolio_prices = portfolio_prices.pivot(index='date', columns='symbol', values='adjclose')
portfolio_prices.index = pd.to_datetime(portfolio_prices.index)
portfolio_prices.dropna(inplace=True)

# Ensure the index is properly named
portfolio_prices.index.name = "Date"

# Renaming columns for better readability
portfolio_prices.columns = [
    "corn_fut",
    "wheat_fut",
    "KCWheat_fut",
    "rice_fut",
    "Feeder_Cattle",
    "soymeal_fut",
    "soyF_fut",
    "soybeans_fut",
]
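The reshaping step above can be illustrated on a small synthetic table. This is only a sketch: the tickers `AAA` and `BBB`, the prices, and the `(symbol, date)` index layout are made-up assumptions about the shape of the downloaded history, not the actual portfolio data.

```python
import pandas as pd

# Synthetic long-format prices indexed by (symbol, date).
# Tickers and values are hypothetical, for illustration only.
index = pd.MultiIndex.from_product(
    [["AAA", "BBB"], ["2024-01-02", "2024-01-03", "2024-01-04"]],
    names=["symbol", "date"],
)
long_prices = pd.DataFrame(
    {"adjclose": [10.0, 10.5, 10.2, 20.0, None, 20.4]}, index=index
)

# Long -> wide: one row per date, one column per symbol
wide = (
    long_prices[["adjclose"]]
    .reset_index()
    .pivot(index="date", columns="symbol", values="adjclose")
)
wide.index = pd.to_datetime(wide.index)
wide.index.name = "Date"
wide = wide.dropna()  # keep only dates where every symbol has a price

# Renaming through a mapping does not depend on the column order
renamed = wide.rename(columns={"AAA": "first_fut", "BBB": "second_fut"})
print(renamed)
```

Because `pivot` orders the columns by symbol, a positional assignment to `.columns` only gives the intended labels when the list follows that same order; a mapping-based rename is one way to avoid relying on it.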
The price time series, shown side by side (data in levels):
from plotnine import (aes, element_rect, element_text, facet_wrap, geom_line,
                      ggplot, labs, theme, theme_minimal)

# Reshape the log-returns to long format for plotting
portfolio_log_returns_long = portfolio_log_returns.reset_index().melt(
    id_vars='Date', var_name='Commodity', value_name='Log Return')

def plot_log_returns_with_ggplot(data, title, ylabel, background='white', fig_height=10, fig_width=10):
    # Build the plot with plotnine (ggplot)
    p = (
        ggplot(data, aes(x='Date', y='Log Return', color='Commodity'))
        + geom_line()
        + facet_wrap('~Commodity', ncol=1, scales='free_y')  # one panel stacked above the other
        + labs(title=title, x='Date', y=ylabel)
        + theme_minimal()  # minimalist theme with a white background
        + theme(
            figure_size=(fig_width, fig_height),  # figure width and height
            panel_background=element_rect(fill=background),  # backgrounds are rectangles
            plot_background=element_rect(fill=background),
            axis_text_x=element_text(rotation=45, hjust=1),
            subplots_adjust={'wspace': 0.25, 'hspace': 0.5}  # spacing between panels
        )
    )
    return p

p_log_returns = plot_log_returns_with_ggplot(
    portfolio_log_returns_long, 'Log Returns of Commodities Over Time', 'Log Return',
    background='white', fig_height=12, fig_width=8)

# Display the plot
p_log_returns
As a risk measure, we use the conditional variances (volatilities), which better capture the day-to-day variation of the price log-returns.
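For reference, log-returns such as those used here can be obtained from a table of prices in levels as the first difference of the log prices. The sketch below uses a tiny made-up price table; the values are hypothetical, and it is an assumption that the document's `portfolio_log_returns` is built in an equivalent way.

```python
import numpy as np
import pandas as pd

# Hypothetical prices in levels, one column per commodity
prices = pd.DataFrame(
    {"corn_fut": [100.0, 102.0, 101.0], "wheat_fut": [50.0, 49.0, 49.5]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)
prices.index.name = "Date"

# r_t = ln(P_t) - ln(P_{t-1}); the first row has no predecessor and is dropped
log_returns = np.log(prices).diff().dropna()
print(log_returns)
```

Log-returns are additive over time, which is one reason they are a common input for volatility models.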
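The GARCH(1,1) model with an asymmetric Student-t distribution is not directly available in most Python libraries. However, we can still use a GARCH(1,1) model with a standard Student-t distribution to estimate the conditional variance. The GARCH(1,1) model is represented as follows:

\[ r_t = \mu + \epsilon_t, \qquad \epsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \]

where \(r_t\) is the log-return at time \(t\), \(\mu\) is a constant mean, and: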
\(\epsilon_t\) is the error term, modeled as conditional on past information.
\(\sigma_t^2\) is the conditional variance at time \(t\).
\(\omega, \alpha, \beta\) are the parameters to be estimated, with \(\omega > 0, \alpha \geq 0, \beta \geq 0\).
\(z_t\) follows a Student-t distribution with \(\nu\) degrees of freedom to capture the heavy tails observed in financial returns.
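To make the variance recursion concrete, the following minimal sketch evaluates \(\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2\) for given parameters. The function name `garch11_variance`, the parameter values, and the choice to start the recursion at the unconditional variance \(\omega / (1 - \alpha - \beta)\) are illustrative assumptions; an estimation library may initialize the recursion differently and estimates the parameters by maximum likelihood rather than taking them as given.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances implied by the GARCH(1,1) recursion.

    Assumption: the recursion starts at the unconditional variance
    omega / (1 - alpha - beta), which requires alpha + beta < 1.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0")
    if alpha + beta >= 1:
        raise ValueError("need alpha + beta < 1 for a finite unconditional variance")

    sigma2 = [omega / (1.0 - alpha - beta)]
    # Each new variance uses the previous residual and the previous variance
    for eps in residuals[:-1]:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2


# Hypothetical residuals and parameters, for illustration only
example_residuals = [0.01, -0.02, 0.015]
example_sigma2 = garch11_variance(example_residuals, omega=1e-5, alpha=0.1, beta=0.8)
print(example_sigma2)
```

A large residual at time \(t-1\) raises the variance at time \(t\) through \(\alpha\), while \(\beta\) controls how slowly that effect decays.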
Code
from arch import arch_model

# Initialize an empty DataFrame to store conditional variances
cond_variances = pd.DataFrame(index=portfolio_log_returns.index, columns=portfolio_log_returns.columns)

# Loop through each commodity's log-returns and fit a GARCH(1,1) model
for col in portfolio_log_returns.columns:
    # Fit a GARCH(1,1) model with a Student-t distribution for each series of log returns
    model = arch_model(portfolio_log_returns[col], vol='Garch', p=1, q=1, dist='t')
    res = model.fit(disp='off')
    # conditional_volatility is the conditional standard deviation,
    # so it is squared to store the conditional variance
    cond_variances[col] = res.conditional_volatility ** 2

# Show the first few rows of the conditional variances DataFrame
cond_variances.head()
from plotnine import (aes, element_rect, element_text, facet_wrap, geom_line,
                      ggplot, labs, theme, theme_minimal)

# Reshape the conditional variances to long format for plotting
cond_variances_long = cond_variances.reset_index().melt(
    id_vars='Date', var_name='Commodity', value_name='Conditional Variance')

# Build the plot with a white or transparent background and an adjustable figure size
def plot_cond_variances_with_ggplot(data, title, ylabel, background='white', fig_height=10, fig_width=10):
    # Build the plot with plotnine (ggplot)
    p = (
        ggplot(data, aes(x='Date', y='Conditional Variance', color='Commodity'))
        + geom_line()
        + facet_wrap('~Commodity', ncol=1, scales='free_y')  # one panel stacked above the other
        + labs(title=title, x='Date', y=ylabel)
        + theme_minimal()  # minimalist theme with a white background
        + theme(
            figure_size=(fig_width, fig_height),  # figure width and height
            panel_background=element_rect(fill=background),  # backgrounds are rectangles
            plot_background=element_rect(fill=background),
            axis_text_x=element_text(rotation=45, hjust=1),
            subplots_adjust={'wspace': 0.25, 'hspace': 0.5}  # spacing between panels
        )
    )
    return p

# Example usage for the commodities' conditional variances
p_cond_variances = plot_cond_variances_with_ggplot(
    cond_variances_long, 'Conditional Variances Over Time (GARCH(1,1))', 'Conditional Variance',
    background='white', fig_height=12, fig_width=8)
p_cond_variances
library(tibble)     # as_tibble()
library(lubridate)  # ymd()
library(TTR)        # ROC()

portfolioPrices_df <- as_tibble(portfolioPrices, rownames = "date")
portfolioPrices_df$date <- ymd(portfolioPrices_df$date)

# Log-returns (ROC's "continuous" type), dropping the leading NA row
portfolioReturs_df <- na.omit(ROC(portfolioPrices, type = "continuous")) |>
  as_tibble(rownames = "date")
portfolioReturs_df$date <- ymd(portfolioReturs_df$date)

colnames(portfolioReturs_df) <- c(
  "date",
  "ret_corn_fut",
  "ret_wheat_fut",
  "ret_KCWheat_fut",
  "ret_rice_fut",
  "ret_Feeder_Cattle",
  "ret_soymeal_fut",
  "ret_soyF_fut",
  "ret_soybeans_fut"
)

# Remove any column whose name is NA
portfolioReturs_df <- portfolioReturs_df[, !is.na(colnames(portfolioReturs_df))]

# Check the column names again to make sure they are correct
colnames(portfolioReturs_df)
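In R, the GARCH(1,1) model is estimated with the rugarch package. The model is:

\[ r_t = \mu + \epsilon_t, \qquad \epsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2 \]

where \(r_t\) is the return at time \(t\), \(\mu\) is a constant mean, and:
\(\epsilon_t\) is the error term, modeled as conditional on past information.
\(\sigma_t^2\) is the conditional variance at time \(t\).
\(\omega, \alpha, \beta\) are the parameters to be estimated, with \(\omega > 0, \alpha \geq 0, \beta \geq 0\).
\(z_t\) follows an asymmetric Student-t distribution with \(\nu\) degrees of freedom to better capture the heavy tails and skewness observed in financial returns.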
Code
# Load necessary packages
library(rugarch)
library(dplyr)  # %>%, mutate()
library(tidyr)  # pivot_longer()

# Define the GARCH(1,1) model specification with Student-t distribution
spec <- ugarchspec(
  variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
  mean.model = list(armaOrder = c(0, 0), include.mean = TRUE),
  # "std" is the symmetric Student-t; rugarch's skewed variant is "sstd"
  distribution.model = "std"
)

# Estimate the model for each asset in the portfolio and extract conditional variances
garch_models <- list()
conditional_variances <- list()

for (i in colnames(portfolioReturs)) {
  garch_models[[i]] <- ugarchfit(spec, data = portfolioReturs[[i]])
  # sigma() returns the conditional standard deviation, hence the square
  conditional_variances[[i]] <- sigma(garch_models[[i]])^2
}

# Convert conditional variances list to a data frame
conditional_variances_df <- do.call(cbind, conditional_variances) %>%
  as.data.frame() %>%
  mutate(time = seq_along(conditional_variances[[1]]))

colnames(conditional_variances_df) <- c(
  "cond_var_corn_fut",
  "cond_var_wheat_fut",
  "cond_var_KCWheat_fut",
  "cond_var_rice_fut",
  "cond_var_Feeder_Cattle",
  "cond_var_soymeal_fut",
  "cond_var_soyF_fut",
  "cond_var_soybeans_fut",
  "time"
)

# Reshape data for plotting
conditional_variances_long <- conditional_variances_df %>%
  pivot_longer(!time, names_to = "Variables", values_to = "Value")
And the plot of the conditional variance (risk):
Code
library(timetk)   # plot_time_series()
library(ggplot2)  # theme(), element_rect()

conditional_variances_long |>
  group_by(Variables) |>
  plot_time_series(
    time, Value,
    .interactive = FALSE,  # set to TRUE for an interactive plot
    .facet_ncol = 2,
    .smooth = FALSE
  ) +
  theme(strip.background = element_rect(fill = "white", colour = "white"))
Code
# Total time taken to compile this Quarto document
end_time = datetime.now()
time_diff = end_time - start_time
print(f"Total Quarto document compiling time: {time_diff}")
Total Quarto document compiling time: 0:00:56.102477