Project Overview

Using AI to predict the outcomes of NBA games.

This project aims to streamline NBA game prediction by investing in advanced AI prediction models rather than in extensive data collection and management. My previous project, NBA Betting, tried to build a comprehensive feature set for predicting NBA games from many data sources. That approach benefited from a variety of industry-derived metrics, but the cost and complexity of managing the data collection were too high. This project instead relies on a small core data set, such as play-by-play data, and leverages deep learning and GenAI to predict game outcomes.

Current State

The project is in active development with a complete data collection pipeline and basic prediction engines. Recent infrastructure cleanup removed unnecessary complexity (Airflow orchestration, Wandb experiment tracking) to focus on the core GenAI prediction engine development.

The current system supports the 2023-2024 through 2025-2026 seasons with a complete PBP → GameStates → PlayerBox/TeamBox → Features → Predictions pipeline, where PBP is play-by-play data. The default installation includes only the current season (2025-2026); historical data is available separately. The web app provides a simple interface for displaying games with current scores and predictions.
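To make the first pipeline step concrete, here is a minimal sketch of turning play-by-play events into running game states. The `PlayEvent` and `GameState` classes and all of their fields are hypothetical stand-ins chosen for illustration; they are not the project's actual schema or the NBA API's field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayEvent:
    """Hypothetical, simplified play-by-play record (not the real schema)."""
    period: int
    seconds_left_in_period: int
    team: str    # "home" or "away"
    points: int  # points scored on this play (0 for non-scoring plays)


@dataclass(frozen=True)
class GameState:
    """Hypothetical snapshot of the score after one play."""
    period: int
    seconds_left_in_period: int
    home_score: int
    away_score: int


def build_game_states(events):
    """Accumulate scores play by play and emit one GameState per event."""
    states = []
    home = away = 0
    for event in events:
        if event.team == "home":
            home += event.points
        elif event.team == "away":
            away += event.points
        states.append(
            GameState(event.period, event.seconds_left_in_period, home, away)
        )
    return states


# Made-up events, for illustration only.
events = [
    PlayEvent(1, 700, "home", 2),
    PlayEvent(1, 680, "away", 3),
    PlayEvent(1, 650, "home", 0),  # a missed shot
    PlayEvent(1, 630, "home", 3),
]
states = build_game_states(events)
print(states[-1])
```

The last state in the list carries the current score, which is the kind of value the later pipeline stages and the web app would need.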

Project Flowchart

The project is built around a few key components:

Future Goals

Foundational Model Outline
  1. Data Sourcing: Focus on a minimal number of data sources that fundamentally describe basketball. Currently, we use play-by-play data from the NBA API. In the future, incorporating video and tracking data would be interesting, though these require considerably more resources and access.
  2. Prediction Engine: This is the core of the project and the current development focus. The current prediction engine options will be replaced with an engine based on deep learning (DL) and GenAI, which should reduce data parsing and feature engineering while scaling to more complex outcomes, including individual player performance.
  3. Data Storage: Future data storage will more seamlessly integrate with the prediction engine. The storage requirements will combine the current SQL-based data used for the API and web app with more advanced vector-based storage for RAG-based GenAI models.
  4. Web App: This is the project's front end, displaying the games for the selected date along with current scores and predictions. The interface will remain simple while usability is gradually improved. A separate GenAI chat will be added in the future to allow users to interact with the prediction engine and modify individual predictions based on their preferences.

Guiding Principles

[Image: Project Guiding Principles]

Web App

[Images: Web App Home Page, Web App Game Details]

Prediction Engines

Currently, a few basic prediction engines predict the outcomes of NBA games. They serve as placeholders for the more advanced DL and GenAI engines planned for the future. The current engines use ML models to make pre-game predictions of the home and away scores, and these predictions are then used to calculate the home team's win percentage and margin. Once a game has started, updated predictions combine the current game score, the time remaining, and the pre-game predictions.
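As a rough illustration of how an in-game update could work, the sketch below projects the final score as the current score plus the pre-game predicted score scaled by the fraction of regulation time remaining, then maps the projected margin to a win probability with a normal CDF. The blending rule, the `margin_std` value of 12 points, and every name here are assumptions made for illustration; the project's actual formula is not described above.

```python
from dataclasses import dataclass
from math import erf, sqrt

REGULATION_SECONDS = 48 * 60  # four 12-minute quarters


@dataclass
class Prediction:
    home_score: float
    away_score: float

    @property
    def home_margin(self) -> float:
        return self.home_score - self.away_score


def home_win_probability(margin: float, margin_std: float = 12.0) -> float:
    """Map a predicted home margin to a win probability.

    Assumption: the final margin is normally distributed around the
    prediction with standard deviation `margin_std` (an illustrative value).
    """
    return 0.5 * (1.0 + erf(margin / (margin_std * sqrt(2.0))))


def update_prediction(pregame, current_home, current_away, seconds_remaining):
    """Blend the live score with the pre-game prediction.

    Projected final score = points already scored
    + pre-game predicted score * fraction of regulation time remaining.
    """
    fraction_left = min(max(seconds_remaining / REGULATION_SECONDS, 0.0), 1.0)
    return Prediction(
        home_score=current_home + fraction_left * pregame.home_score,
        away_score=current_away + fraction_left * pregame.away_score,
    )


# Made-up numbers: pre-game prediction 112-108, home leads 60-52 at halftime.
pregame = Prediction(home_score=112.0, away_score=108.0)
live = update_prediction(pregame, 60, 52, seconds_remaining=24 * 60)
print(live.home_score, live.away_score, home_win_probability(live.home_margin))
```

With this rule the projection equals the pre-game prediction at tip-off and converges to the actual score as time runs out, which matches the behavior described above without claiming to reproduce it.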

Current Prediction Engines

Performance Metrics

The current metrics are based on pre-game predictions for the home and away team scores, along with downstream metrics such as win percentage and margin. These simple predictors currently outperform the baseline predictor.

A more challenging baseline based on the Vegas spread will be added once the DL and GenAI models are implemented.
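For readers who want to reproduce this kind of comparison, here is a minimal sketch of pre-game metrics: mean absolute error (MAE) for home score, away score, and margin, plus the share of games where the predicted winner was correct. The function name, the metric selection, and the sample scores are assumptions for illustration; they are not the project's evaluation code or results.

```python
from statistics import mean


def evaluate(predicted, actual):
    """Score pre-game predictions against final results.

    Both arguments are lists of (home_score, away_score) tuples,
    one tuple per game, in the same order.
    """
    pairs = list(zip(predicted, actual))
    return {
        "home_mae": mean([abs(p[0] - a[0]) for p, a in pairs]),
        "away_mae": mean([abs(p[1] - a[1]) for p, a in pairs]),
        "margin_mae": mean(
            [abs((p[0] - p[1]) - (a[0] - a[1])) for p, a in pairs]
        ),
        # 1.0 when the predicted winner matches the actual winner, else 0.0.
        "win_accuracy": mean(
            [1.0 if (p[0] > p[1]) == (a[0] > a[1]) else 0.0 for p, a in pairs]
        ),
    }


# Made-up scores, for illustration only.
actual = [(110, 100), (95, 105), (120, 118)]
predicted = [(108, 104), (100, 102), (115, 119)]
metrics = evaluate(predicted, actual)
print(metrics)
```

Running the same function on a baseline's predictions and on a model's predictions gives directly comparable numbers, with lower MAE and higher win accuracy being better.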

[Image: Prediction Engine Performance Metrics]

Quick Start

Requirements

Installation

Clone the repository and run the automated setup:

git clone https://github.com/NBA-Betting/NBA_AI.git
cd NBA_AI
python setup.py

The setup script will:

  1. Create a virtual environment
  2. Install all dependencies
  3. Download the database and trained models from GitHub Releases
  4. Create your .env configuration file
  5. Verify the installation

Running the Web App

# Activate the virtual environment
source venv/bin/activate

# Start the web app
python start_app.py

Visit http://localhost:5000 to view games and predictions.

Command Line Options

# Use a specific predictor
python start_app.py --predictor=Tree

# Enable debug mode
python start_app.py --debug

# Set log level
python start_app.py --log_level=DEBUG

Available predictors: Baseline, Linear, Tree (default), MLP*, Ensemble*

*Requires PyTorch; uncomment it in requirements.txt

Development Status

This project is in active development.

The core data pipeline and prediction engines are functional. The focus is now on building advanced DL/GenAI prediction engines using play-by-play data.

Disclaimer

This is a personal side project provided "as is" with no guarantees of quality, functionality, or ongoing maintenance. I've vibe-coded much of this release, and while I'll try to address issues, I can't promise timely responses or fixes.

For production or commercial use: Consider using SportsRadar, the official NBA data partner. Their API would greatly simplify data management compared to scraping the NBA Stats API. I use this approach only because I can't justify the cost for a personal project.

Historical Data

The default setup downloads only the current season (2025-2026, ~1,300 games). A development database with 3 seasons (2023-2024 through 2025-2026, ~4,100 games total) is available from GitHub Releases.

To use it:

  1. Download NBA_AI_dev.zip from the latest release
  2. Extract to data/NBA_AI_dev.sqlite
  3. Update your .env:
DATABASE_PATH=data/NBA_AI_dev.sqlite
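
To confirm that the new path is picked up, a small standalone check like the following can help. It is a sketch that parses the .env file by hand and lists the tables in the SQLite file; how the project itself loads .env is not stated here, and the helper names `read_env_file` and `list_tables` are hypothetical.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def list_tables(db_path):
    """Return the table names in an existing SQLite database file."""
    # Check first: sqlite3.connect would silently create an empty
    # database if the path were mistyped.
    if not Path(db_path).is_file():
        raise FileNotFoundError(f"No database found at {db_path}")
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [row[0] for row in rows]
```

From the repository root, `list_tables(read_env_file()["DATABASE_PATH"])` should return a non-empty list if the archive was extracted to the expected location.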

Usage Notes

Technical Notes