Automating Commodity‑Market Intelligence with ChatGPT Plus and R:

Building a Daily News Agent for Corn, Soybean & Other Agricultural Assets

Authors

Rodrigo Hermont Ozon and Ricardo Vianna

Abstract

This article documents how, with the help of ChatGPT Plus (USD 20/month) as a coding co‑pilot, we designed and implemented an agent that fetches, filters, and summarises the latest news on corn, soybean, and other commodities that compose the first author’s PhD research portfolio. The workflow is built entirely in R and Quarto: it pulls headlines from a public news API, uses OpenAI’s chat‑completion endpoint for abstractive summarisation, and renders a self‑contained HTML report that is automatically updated every morning at 07:00. The step‑by‑step recipe—covering API interaction, prompt engineering, scheduling with cronR, and reproducible publishing—should serve as a template for researchers and practitioners who need continuous, low‑maintenance market intelligence.



1  Introduction

Keeping track of news that can move commodity prices is critical for risk management, forecasting, and portfolio re‑balancing. Yet manually scanning dozens of sources is time‑consuming and error‑prone. With the advent of large language models (LLMs) delivered as Software‑as‑a‑Service—such as ChatGPT Plus at USD 20 per month—we can now prototype intelligent news agents in hours, not weeks. This paper recounts the exact conversation‑driven workflow we followed inside ChatGPT to:

  1. Specify functional requirements (sources, frequency, output format).

  2. Generate and refine R code that hits a news API, parses JSON, and stores headlines in tidy form.

  3. Craft prompts that steer OpenAI’s API to summarise clusters of related articles in plain English.

  4. Schedule the pipeline on a headless Linux box so a fresh HTML report lands in our inbox—and on our Quarto site—every morning at 07:00.

All code is reproducible and free of proprietary dependencies except for the API keys themselves.

2  System Overview

The daily pipeline is sketched below (Mermaid flowchart):

flowchart LR
  A[cronR 07:00] --> B[R script<br/>fetch_news.R]
  B --> C[News API<br/>JSON endpoint]
  C --> B
  B --> D[openai::create_chat_completion()]
  D --> B
  B --> E[Quarto render<br/>daily_report.qmd]
  E --> F[HTML file<br/>/docs/news.html]
  F --> G[GitHub Pages<br/>& e‑mail alert]

The pipeline has three external touchpoints:

  • News API – We use https://newsapi.org for simplicity; any REST aggregator that supports keyword search and ISO 8601 dates will work.

  • OpenAI API – Handles abstractive summarisation so that end‑users read concise bullets instead of raw headlines.

  • GitHub Pages (optional) – Hosts the rendered HTML; an e‑mail task runs in parallel to push the file as an attachment (a minimal sketch follows below).
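
The e‑mail leg is not prescribed by the pipeline itself; a minimal sketch with the blastula package (the addresses, SMTP host, port, and the SMTP_PASSWORD environment variable are placeholders to adapt) could look like this:

Code
library(blastula)

email <- compose_email(body = md("Today's commodity news digest is attached.")) |>
  add_attachment(file = "docs/news.html")

smtp_send(
  email,
  to          = "you@example.com",
  from        = "agent@example.com",
  subject     = "Daily Commodity News Digest",
  credentials = creds_envvar(
    user        = "agent@example.com",
    pass_envvar = "SMTP_PASSWORD",   # password read from the environment
    host        = "smtp.example.com",
    port        = 465,
    use_ssl     = TRUE
  )
)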

3  Implementation Details

3.1  Environment & Secrets

Create a project‑level .Renviron (not committed to VCS):

Code
NEWSAPI_KEY="YOUR_NEWSAPI_KEY"
OPENAI_API_KEY="YOUR_OPENAI_KEY"

Then restart R or call readRenviron("~/.Renviron").
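
As a sanity check (our own guard, not part of either API), fail fast when a key is missing:

Code
# Fail fast if either key is absent from the environment
required <- c("NEWSAPI_KEY", "OPENAI_API_KEY")
missing  <- required[!nzchar(Sys.getenv(required))]
if (length(missing) > 0)
  stop("Missing environment variable(s): ", paste(missing, collapse = ", "))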

3.2  Helper Functions

Code
library(httr2)     # next‑gen HTTP for R
library(jsonlite)
library(dplyr)
library(tidyr)
library(lubridate)
library(purrr)
library(glue)
library(openai)    # install.packages("openai") from CRAN

# Small wrapper ----------------------------------------------------------
news_endpoint <- "https://newsapi.org/v2/everything"

fetch_news <- function(keyword,
                       from   = Sys.Date() - 1,
                       to     = Sys.Date(),
                       page_size = 100) {

  resp <- request(news_endpoint) |>
    req_url_query(
      q           = keyword,
      language    = "en",
      sortBy      = "publishedAt",
      from        = as.character(from),
      to          = as.character(to),
      pageSize    = page_size,
      apiKey      = Sys.getenv("NEWSAPI_KEY")
    ) |>
    req_perform()

  out <- resp |>
    resp_body_json(simplifyVector = TRUE) |>  # simplify so "articles" arrives as a data frame
    purrr::pluck("articles") |>
    jsonlite::flatten() |>                    # un-nest source into source.id / source.name
    tibble::as_tibble() |>
    mutate(
      keyword     = keyword,
      publishedAt = ymd_hms(publishedAt, tz = "UTC")
    )
  out
}

# Summarise a tibble of articles with OpenAI ------------------------------
summarise_cluster <- function(df, model = "gpt-4o-mini") {

  prompt <- glue("
You are a financial analyst. Produce a 5‑bullet summary of the following
{nrow(df)} news headlines about **{unique(df$keyword)}** published in the last 24 h.
Focus on market‑moving information (prices, policy, weather, supply‑demand).
Write in clear, jargon‑free English. 120 words max.

HEADLINES:
{paste0('- ', df$title, collapse = '\n')}
")

  res <- openai::create_chat_completion(
    model  = model,
    messages = list(
      list(role = "system",
           content = "You are an expert commodity market analyst."),
      list(role = "user", content = prompt)
    ),
    temperature = 0.3
  )

  # `choices` comes back as a data frame; the reply text sits in message.content
  summary <- res$choices$message.content[1]
  tibble(keyword = unique(df$keyword), summary = summary)
}
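
Before wiring up the scheduler, a quick interactive smoke test (corn is just an illustrative keyword) confirms the two helpers round‑trip:

Code
# Smoke test: fetch yesterday's corn headlines and summarise them
corn_news <- fetch_news("corn")
nrow(corn_news)                         # how many articles came back?
summarise_cluster(head(corn_news, 20))  # 5-bullet digest of the first 20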

3.3  Daily Driver Script (fetch_news.R)

Code
library(dplyr)
library(purrr)
source("functions.R")   # helpers above

commodities <- c("corn", "soybean", "soybean meal", "soybean oil",
                 "wheat", "coffee", "cotton")

# 1 Pull raw headlines -----------------------------------------------------
news_raw <- map_dfr(commodities, fetch_news)

# 2 Deduplicate: keep one row per unique URL --------------------------------
news_clean <- news_raw |>
  distinct(url, .keep_all = TRUE)

# 3 Generate bullet summaries via OpenAI -----------------------------------
news_summaries <- news_clean |>
  group_split(keyword) |>
  map_dfr(summarise_cluster)

# 4 Persist to disk for the Quarto doc -------------------------------------
dir.create("data", showWarnings = FALSE)
saveRDS(news_clean,     file = "data/news_headlines.rds")
saveRDS(news_summaries, file = "data/news_summaries.rds")
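
If one keyword returns no articles or the request errors, map_dfr() aborts the whole run. An optional hardening step (our addition, not in the original driver) wraps the fetcher in purrr::possibly():

Code
# Optional hardening: skip keywords whose request fails
safe_fetch <- purrr::possibly(fetch_news, otherwise = NULL)
news_raw   <- map_dfr(commodities, safe_fetch)  # NULL results are dropped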

3.4  Reporting with Quarto (daily_report.qmd)

Code
---
title: "Daily Commodity News Digest"
format:
  html:
    self-contained: true
    toc: false
    theme: cosmo
    code-fold: true
---

```{r setup}
#| message: false
library(dplyr)
library(gt)
library(glue)

news_clean     <- readRDS("data/news_headlines.rds")
news_summaries <- readRDS("data/news_summaries.rds")
```

```{r summaries}
news_summaries |>
  mutate(summary = gsub("\n", " ", summary, fixed = TRUE)) |>
  knitr::kable()
```

```{r headlines}
news_clean |>
  select(publishedAt, source.name, title, url) |>
  arrange(desc(publishedAt)) |>
  gt::gt() |>
  gt::fmt_datetime(columns = publishedAt)
```

The report file above is deliberately minimal; the code chunks stay folded behind Quarto’s code‑fold UI so the landing page loads fast.
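
While polishing the layout, the quarto R package (assumed to be installed alongside the Quarto CLI) gives a live‑reloading preview from the console:

Code
# Live preview while editing; re-renders on save
quarto::quarto_preview("daily_report.qmd")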

3.5  Scheduling with cronR

Code

library(cronR)

cmd <- cron_rscript("fetch_news.R")

cron_add(command = cmd,
         frequency = 'daily',
         at        = "07:00",
         description = "Fetch & summarise commodity news")

The same cron entry can also trigger quarto render daily_report.qmd, as sketched below. If you host your site on GitHub Pages, push the rendered HTML to the docs/ folder and commit; GitHub will redeploy automatically.
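
One way to chain the render step is a thin wrapper script (run_daily.R is a hypothetical name) that cronR calls instead of fetch_news.R:

Code
# run_daily.R -- fetch, summarise, then render the report
source("fetch_news.R")
quarto::quarto_render("daily_report.qmd")

# Then point the scheduling snippet above at the wrapper:
# cron_add(cron_rscript("run_daily.R"), frequency = "daily", at = "07:00",
#          description = "Commodity news: fetch + render")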

4  Results

For 15 June 2025 the agent produced the following high‑level summary (example):

Corn

  • USDA trimmed 2024/25 US corn output by 2 Mt on flooding in Iowa
  • China booked a 165 kt US cargo as domestic prices spike
  • Brazil’s 2nd‑crop harvest reaches 38 %; dryness persists in Mato Grosso
  • CME December futures up 2 % to USD 4.71/bu
  • UN FAO projects stable global ending stocks at 319 Mt

The corresponding HTML report (45 kB, self‑contained) was rendered in ~1.3 s on an 8‑year‑old laptop.

5  Discussion

Low code debt. ChatGPT wrote roughly 80 % of the R code required; we mainly refactored style and added edge‑case handling.

Cost. With the gpt‑4o‑mini model the OpenAI bill is ≈ USD 0.03 per day—trivial compared with Bloomberg or Refinitiv.

Latency. End‑to‑end pipeline averages 5–7 seconds.

Extensibility. Swap NewsAPI for GDELT 2.0 or AlphaSense by editing two lines in fetch_news(); a sketch follows this list.

Limitations. Summaries inherit the biases of both the underlying sources and the LLM; validation against raw headlines remains crucial.
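
To make the “two lines” concrete, here is a hedged sketch against GDELT’s DOC 2.0 API; GDELT’s response fields differ from NewsAPI’s, so the downstream mutate() would need a matching tweak:

Code
library(httr2)

# Hypothetical GDELT variant: swap the endpoint and the query parameters
news_endpoint <- "https://api.gdeltproject.org/api/v2/doc/doc"

resp <- request(news_endpoint) |>
  req_url_query(
    query      = "soybean",
    mode       = "ArtList",
    format     = "json",
    maxrecords = 100,
    timespan   = "24h"   # GDELT uses a rolling window instead of from/to dates
  ) |>
  req_perform()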

6  Conclusion

Leveraging ChatGPT Plus as a pair‑programmer allowed us to bootstrap a reliable, fully automated commodity news agent in less than a day. The resulting workflow—cron → R → OpenAI → Quarto—delivers timely, actionable intelligence to support the first author’s doctoral research in multi‑asset optimisation without incurring enterprise‑grade data costs.

References

Wickham, H. httr2: Perform HTTP Requests and Process the Responses. R package version 0.2.

Wijffels, J. cronR: Schedule R Scripts and Processes with the cron Job Scheduler. R package.

OpenAI. API Reference. https://platform.openai.com/docs

NewsAPI Ltd. NewsAPI v2 Documentation. https://newsapi.org/docs