Using AI to predict the outcomes of NBA games.
This project streamlines NBA game prediction by putting the effort into advanced AI prediction models rather than extensive data collection and management. My previous project, NBA Betting, tried to build a comprehensive feature set for predicting NBA games through extensive data collection. That approach benefited from various industry-derived metrics, but the cost and complexity of managing the data collection were too high. This project instead starts from a core data set, such as play-by-play data, and leverages deep learning and GenAI to predict game outcomes.
The project is in active development with a complete data collection pipeline and basic prediction engines. Recent infrastructure cleanup removed unnecessary complexity (Airflow orchestration, Wandb experiment tracking) to focus on the core GenAI prediction engine development.
The current system supports the 2023-2024 through 2025-2026 seasons with a complete PBP → GameStates → PlayerBox/TeamBox → Features → Predictions pipeline (PBP is play-by-play data). The default installation includes only the current season (2025-2026); historical data is available separately. The web app provides a simple interface for displaying games with current scores and predictions.
The project is built around a few key components:
database_update_manager.py: The main module that orchestrates the entire process.
schedule.py: Fetches the schedule from the NBA API and updates the database.
players.py: Fetches and updates player reference data.
nba_official_injuries.py: Fetches injury reports from NBA's official injury report PDFs.
betting.py: Fetches betting lines (spreads/totals) from ESPN API and Covers.com.
pbp.py: Fetches play-by-play data for games and updates the database.
game_states.py: Parses play-by-play data to generate game states and updates the database.
boxscores.py: Fetches traditional boxscore stats (PlayerBox and TeamBox).
prior_states.py: Determines prior final game states for teams.
features.py: Uses prior final game states to generate features for the prediction engine.
prediction_manager.py: Generates predictions for games using the chosen prediction engine.
games.py: Fetches game data from the database, manages prediction updating and data formatting.
api.py: Defines the API endpoints.
start_app.py: The main entry point for the web app found in the root directory.
app.py: The main module that defines the Flask app and routes.
game_data_processor.py: Formats game data from the API for the web app.
templates/: Contains the HTML templates for the web app.
static/: Contains the CSS and JavaScript files for the web app.
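To make the step from play-by-play data to game states more concrete, here is a minimal sketch of folding an ordered list of plays into running score snapshots. The event fields (`period`, `clock`, `team`, `points`), the `GameState` shape, and the `build_game_states` helper are all hypothetical and chosen only for illustration; the project's actual schema in game_states.py may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameState:
    """Score snapshot after a single play (hypothetical shape)."""
    period: int
    clock: str
    home_score: int
    away_score: int


def build_game_states(plays, home_team):
    """Fold an ordered list of play dicts into cumulative game states.

    Each play is assumed to carry 'period', 'clock', and optionally
    'team' and 'points'. Plays without a team (e.g. timeouts) leave
    the score unchanged.
    """
    home, away = 0, 0
    states = []
    for play in plays:
        points = play.get("points", 0)
        team = play.get("team")
        if team == home_team:
            home += points
        elif team is not None:
            away += points
        states.append(GameState(play["period"], play["clock"], home, away))
    return states
```

In this sketch, the last element of the returned list plays the role of the "final game state" that later steps would consume.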
Currently, a few basic prediction engines are used to predict the outcomes of NBA games. They serve as placeholders for the more advanced DL and GenAI engines that will be implemented in the future. The current engines make pre-game predictions of the home and away scores using ML models, and these predictions are then used to calculate the win percentage and margin for the home team. Once a game has started, updated predictions are based on a combination of the current game score, the time remaining, and the pre-game predictions.
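The text does not spell out how these inputs are combined, so the following is only a sketch of one plausible scheme. It projects a final score as the current score plus the pre-game prediction scaled by the fraction of regulation time remaining, and it turns a projected margin into a home win probability with a normal approximation. The function names, the 48-minute regulation length, and the 12-point margin standard deviation are illustrative assumptions, not the project's actual formulas.

```python
import math

REGULATION_MINUTES = 48.0  # assumption: overtime is ignored in this sketch


def _fraction_left(minutes_remaining):
    """Share of regulation time still to play, clamped to [0, 1]."""
    return max(0.0, min(1.0, minutes_remaining / REGULATION_MINUTES))


def blend_score(current_score, pregame_score, minutes_remaining):
    """Points scored so far plus the pre-game pace applied to the time left."""
    return current_score + pregame_score * _fraction_left(minutes_remaining)


def home_win_probability(home_score, away_score, minutes_remaining, margin_sd=12.0):
    """Normal approximation whose uncertainty shrinks as time runs out.

    margin_sd is an assumed standard deviation of the full-game margin.
    """
    margin = home_score - away_score
    sd = margin_sd * math.sqrt(_fraction_left(minutes_remaining))
    if sd == 0.0:
        # No time left: the result is decided (a tie is treated as a coin flip).
        return 1.0 if margin > 0 else 0.0 if margin < 0 else 0.5
    return 0.5 * (1.0 + math.erf(margin / (sd * math.sqrt(2.0))))
```

Before tip-off (48 minutes remaining, no points scored), `blend_score` returns the pre-game prediction unchanged; at the final buzzer it returns the actual score.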
The current metrics are based on pre-game predictions for the home and away team scores, along with downstream metrics such as win percentage and margin. These simple predictors currently outperform the baseline predictor.
A more challenging baseline based on the Vegas spread will be added once the DL and GenAI models are implemented.
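As a rough illustration of what a comparison against a spread-based baseline could look like, the sketch below computes the mean absolute error of predicted home margins for both a model and the Vegas spread. The helper names are hypothetical, and the code assumes the usual convention that a home spread of -5.5 means the home team is favored by 5.5 points; it is not the project's evaluation code.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def spread_to_home_margin(home_spread):
    """Convert a home-team spread into an implied home margin.

    Assumes the convention that negative spreads mark the favorite,
    so a spread of -5.5 implies a home margin of +5.5.
    """
    return -home_spread


def compare_to_spread(model_margins, home_spreads, actual_margins):
    """Return the margin MAE for the model and for the spread baseline."""
    vegas_margins = [spread_to_home_margin(s) for s in home_spreads]
    return {
        "model_mae": mean_absolute_error(model_margins, actual_margins),
        "vegas_mae": mean_absolute_error(vegas_margins, actual_margins),
    }
```

A model would count as beating this baseline only if its `model_mae` comes out lower than `vegas_mae` on the same set of games.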
Clone the repository and run the automated setup:
git clone https://github.com/NBA-Betting/NBA_AI.git
cd NBA_AI
python setup.py
The setup script will:
.env configuration file
# Activate the virtual environment
source venv/bin/activate
# Start the web app
python start_app.py
Visit http://localhost:5000 to view games and predictions.
# Use a specific predictor
python start_app.py --predictor=Tree
# Enable debug mode
python start_app.py --debug
# Set log level
python start_app.py --log_level=DEBUG
Available predictors: Baseline, Linear, Tree (default), MLP*, Ensemble*
* Requires PyTorch; uncomment it in requirements.txt.
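For readers curious how flags like these can be wired up, here is a minimal `argparse` sketch. It is not the contents of start_app.py: the `build_parser` helper, the `INFO` default log level, and the list of accepted log levels are assumptions, while the flag names, predictor names, and the `Tree` default come from the options listed above.

```python
import argparse

PREDICTORS = ["Baseline", "Linear", "Tree", "MLP", "Ensemble"]
LOG_LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def build_parser():
    """Build a command-line parser mirroring the documented flags (sketch)."""
    parser = argparse.ArgumentParser(description="Start the web app (sketch).")
    parser.add_argument(
        "--predictor",
        choices=PREDICTORS,
        default="Tree",
        help="Prediction engine to use.",
    )
    parser.add_argument(
        "--debug",
        action="store_true",
        help="Enable debug mode.",
    )
    parser.add_argument(
        "--log_level",
        choices=LOG_LEVELS,
        default="INFO",  # assumed default; the real one is not documented here
        help="Logging verbosity.",
    )
    return parser
```

Using `choices` means an unknown predictor name fails fast with a usage message instead of surfacing later as a lookup error.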
This project is in active development.
The core data pipeline and prediction engines are functional. The focus is now on building advanced DL/GenAI prediction engines using play-by-play data.
This is a personal side project provided "as is" with no guarantees of quality, functionality, or ongoing maintenance. I've vibe-coded much of this release, and while I'll try to address issues, I can't promise timely responses or fixes.
For production or commercial use: Consider using SportsRadar, the official NBA data partner. Their API would greatly simplify data management compared to scraping the NBA Stats API. I use this approach only because I can't justify the cost for a personal project.
The default setup downloads only the current season (2025-2026, ~1,300 games). A development database with 3 seasons (2023-2024 through 2025-2026, ~4,100 games total) is available from GitHub Releases.
To use it:
Download NBA_AI_dev.zip from the latest release.
Extract it so the database is at data/NBA_AI_dev.sqlite.
Update .env: DATABASE_PATH=data/NBA_AI_dev.sqlite
Update valid_seasons in config.yaml.
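After switching databases, a quick sanity check is to open whatever `DATABASE_PATH` points at and list its tables. The sketch below uses only the standard library and assumes the variable is already present in the process environment (for example, exported in the shell or loaded from .env by the application). The `open_database` and `list_tables` helpers are hypothetical and not part of the project.

```python
import os
import sqlite3


def open_database(env=None):
    """Open the SQLite database named by DATABASE_PATH.

    env defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    path = env.get("DATABASE_PATH")
    if not path:
        raise RuntimeError("DATABASE_PATH is not set")
    return sqlite3.connect(path)


def list_tables(conn):
    """Return the names of all tables in the connected database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

If `list_tables` returns an empty list, the path most likely points at a new, empty file rather than the extracted development database, since `sqlite3.connect` creates a missing file silently.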