Skip to content

SQLite Integration - Complete ✅

Summary

All data is now stored in SQLite database as the primary source, with JSON files maintained as backup.

Database Location

models/neural_network/training/stock_data.db (1.2GB)

Tables

1. current_stock_data (NEW)

  • Purpose: Current stock data from yfinance
  • Records: 435 stocks
  • Fields: ticker, prices, financial metrics, JSON blobs for complex data
  • Updated by: scripts/data_fetcher.py

2. Existing Tables

  • assets: Asset registry
  • snapshots: Historical snapshots for training
  • forward_returns: Training labels
  • nn_predictions: Neural network predictions
  • valuation_predictions: Classic model valuations

Updated Scripts

Data Fetcher (scripts/data_fetcher.py)

  • Writes to: SQLite (primary) + JSON (backup)
  • New method: save_to_sqlite()
  • Behavior: Atomic writes to both storages

Stock Data Reader (src/invest/data/stock_data_reader.py)

  • New module: Unified interface for reading stock data
  • Methods:
  • get_stock_data(ticker) - Get single stock
  • get_all_tickers() - List all tickers
  • get_stocks_by_sector(sector) - Filter by sector
  • get_stock_count() - Total count
  • Format: Returns dict matching JSON cache format for compatibility

Prediction Script (scripts/run_multi_horizon_predictions.py)

  • Updated: load_stock_cache() now reads from SQLite first
  • Fallback: Uses JSON if SQLite fails
  • Transparent: No changes needed to rest of code

Migration Script

scripts/migrate_json_to_sqlite.py - Migrated 435 stocks from JSON to SQLite - No errors during migration - Can be re-run to sync changes

Testing

# Test data fetcher
uv run python scripts/test_sqlite_integration.py

# Test reader
uv run python scripts/test_sqlite_reader.py

# Run predictions (uses SQLite automatically)
uv run python scripts/run_multi_horizon_predictions.py

Backward Compatibility

  • ✅ JSON files still written as backup
  • ✅ JSON fallback in reader if SQLite fails
  • ✅ Existing code works without changes
  • ✅ Dashboard still reads from dashboard_data.json

Next Steps (Optional)

  1. Update dashboard generator to read from SQLite
  2. Add --no-json-backup flag to data_fetcher
  3. Eventually remove JSON files after testing period
  4. Add SQLite optimization (VACUUM, indexes)

Benefits

  • Single source of truth: All data in one database
  • Better queries: Can filter by sector, date, metrics
  • Atomic updates: No partial writes
  • Reduced disk usage: No duplicate JSON + DB storage (eventually)
  • Faster reads: Indexed database queries
  • Easier backup: Single file to backup

File Sizes

  • SQLite DB: 1.2GB
  • JSON cache: ~17MB (436 files × ~40KB each)
  • Dashboard JSON: 2.3MB

Schema

See docs/sqlite_integration_plan.md for detailed schema design.