# SQLite Integration - Complete ✅

## Summary
All data is now stored in the SQLite database as the primary source, with JSON files maintained as a backup.
## Database Location

`models/neural_network/training/stock_data.db` (1.2 GB)
## Tables

### 1. `current_stock_data` (NEW)

- Purpose: Current stock data from yfinance
- Records: 435 stocks
- Fields: ticker, prices, financial metrics, JSON blobs for complex data (see the schema sketch after this list)
- Updated by: `scripts/data_fetcher.py`
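
The table keeps simple metrics as scalar columns and stores the more complex yfinance payloads as JSON text. A minimal schema sketch, assuming hypothetical column names beyond `ticker`; the real layout is documented in `docs/sqlite_integration_plan.md`:

```python
import sqlite3

# Hypothetical schema sketch for current_stock_data; every column except
# `ticker` is an assumption, not the actual production layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS current_stock_data (
    ticker          TEXT PRIMARY KEY,
    current_price   REAL,
    market_cap      REAL,
    trailing_pe     REAL,
    sector          TEXT,
    info_json       TEXT,   -- complex yfinance payloads serialized as JSON
    financials_json TEXT,
    updated_at      TEXT DEFAULT (datetime('now'))
);
"""

with sqlite3.connect("models/neural_network/training/stock_data.db") as conn:
    conn.executescript(SCHEMA)
```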
### 2. Existing Tables

- `assets`: Asset registry
- `snapshots`: Historical snapshots for training
- `forward_returns`: Training labels
- `nn_predictions`: Neural network predictions
- `valuation_predictions`: Classic model valuations
## Updated Scripts

### Data Fetcher (`scripts/data_fetcher.py`)

- Writes to: SQLite (primary) + JSON (backup)
- New method: `save_to_sqlite()`
- Behavior: Atomic writes to both storages (see the sketch after this list)
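
A minimal sketch of the dual write, assuming a hypothetical function signature, column names, and JSON backup directory; the actual code lives in `scripts/data_fetcher.py`:

```python
import json
import sqlite3
from pathlib import Path

DB_PATH = "models/neural_network/training/stock_data.db"
JSON_DIR = Path("data/stock_cache")  # assumed backup location, not the real path

def save_to_sqlite(ticker: str, record: dict) -> None:
    """Dual write: SQLite is the primary store, JSON is the backup copy."""
    payload = json.dumps(record)

    # SQLite write runs in a transaction: committed on success, rolled back on error.
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "INSERT INTO current_stock_data (ticker, info_json) VALUES (?, ?) "
            "ON CONFLICT(ticker) DO UPDATE SET info_json = excluded.info_json",
            (ticker, payload),
        )

    # JSON backup: write to a temp file, then atomically swap it into place.
    JSON_DIR.mkdir(parents=True, exist_ok=True)
    tmp = JSON_DIR / f"{ticker}.json.tmp"
    tmp.write_text(payload)
    tmp.replace(JSON_DIR / f"{ticker}.json")
```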
### Stock Data Reader (`src/invest/data/stock_data_reader.py`)

- New module: Unified interface for reading stock data
- Methods:
  - `get_stock_data(ticker)` - Get a single stock
  - `get_all_tickers()` - List all tickers
  - `get_stocks_by_sector(sector)` - Filter by sector
  - `get_stock_count()` - Total count
- Format: Returns a dict matching the JSON cache format for compatibility (usage example after this list)
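
A usage example; the method names come from the list above, while the `StockDataReader` class name and its no-argument constructor are assumptions:

```python
# Hypothetical usage of the reader module; the class name is assumed.
from invest.data.stock_data_reader import StockDataReader

reader = StockDataReader()                        # assumed to default to the training DB
print(reader.get_stock_count())                   # e.g. 435
aapl = reader.get_stock_data("AAPL")              # dict shaped like the old JSON cache entry
tech = reader.get_stocks_by_sector("Technology")  # subset of stocks in one sector
print(len(reader.get_all_tickers()), len(tech))
```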
### Prediction Script (`scripts/run_multi_horizon_predictions.py`)

- Updated: `load_stock_cache()` now reads from SQLite first
- Fallback: Uses JSON if SQLite fails (see the sketch after this list)
- Transparent: No changes needed to the rest of the code
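
A sketch of the SQLite-first read with JSON fallback, assuming the hypothetical reader class and cache directory from the earlier examples:

```python
import json
import logging
from pathlib import Path

from invest.data.stock_data_reader import StockDataReader  # class name assumed

JSON_DIR = Path("data/stock_cache")  # assumed location of the JSON backup files

def load_stock_cache() -> dict:
    """Return {ticker: data}; SQLite first, JSON fallback, same shape either way."""
    try:
        reader = StockDataReader()
        return {t: reader.get_stock_data(t) for t in reader.get_all_tickers()}
    except Exception as exc:  # DB missing, locked, or unreadable
        logging.warning("SQLite read failed (%s); falling back to JSON cache", exc)
        return {p.stem: json.loads(p.read_text()) for p in JSON_DIR.glob("*.json")}
```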
## Migration Script

`scripts/migrate_json_to_sqlite.py`

- Migrated 435 stocks from JSON to SQLite
- No errors during migration
- Can be re-run to sync changes (see the sketch after this list)
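
A sketch of the re-runnable sync loop, assuming the JSON cache lives in a hypothetical `data/stock_cache/` directory with one `TICKER.json` file per stock and that the upsert targets the `info_json` column from the schema sketch above:

```python
import json
import sqlite3
from pathlib import Path

DB_PATH = "models/neural_network/training/stock_data.db"
JSON_DIR = Path("data/stock_cache")  # assumed source directory for the JSON cache

def migrate() -> int:
    """Upsert every cached JSON file into current_stock_data; safe to re-run."""
    count = 0
    with sqlite3.connect(DB_PATH) as conn:
        for path in sorted(JSON_DIR.glob("*.json")):
            record = json.loads(path.read_text())
            conn.execute(
                "INSERT INTO current_stock_data (ticker, info_json) VALUES (?, ?) "
                "ON CONFLICT(ticker) DO UPDATE SET info_json = excluded.info_json",
                (path.stem, json.dumps(record)),  # file name assumed to be TICKER.json
            )
            count += 1
    return count

if __name__ == "__main__":
    print(f"Migrated {migrate()} stocks")
```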
## Testing

```bash
# Test the data fetcher
uv run python scripts/test_sqlite_integration.py

# Test the reader
uv run python scripts/test_sqlite_reader.py

# Run predictions (uses SQLite automatically)
uv run python scripts/run_multi_horizon_predictions.py
```
## Backward Compatibility

- ✅ JSON files still written as backup
- ✅ JSON fallback in the reader if SQLite fails
- ✅ Existing code works without changes
- ✅ Dashboard still reads from `dashboard_data.json`
## Next Steps (Optional)

- Update the dashboard generator to read from SQLite
- Add a `--no-json-backup` flag to `data_fetcher`
- Eventually remove JSON files after a testing period
- Add SQLite optimization (VACUUM, indexes); see the sketch after this list
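
For the last item, a sketch of what the optimization could look like using Python's built-in `sqlite3`; the index name and the `sector` column are assumptions carried over from the schema sketch above:

```python
import sqlite3

DB_PATH = "models/neural_network/training/stock_data.db"

conn = sqlite3.connect(DB_PATH)
# Speed up the per-sector queries with an index (name and column are assumptions).
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_current_stock_data_sector "
    "ON current_stock_data (sector)"
)
conn.commit()
# Reclaim free pages; VACUUM must run outside an explicit transaction.
conn.execute("VACUUM")
conn.close()
```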
## Benefits
- Single source of truth: All data in one database
- Better queries: Can filter by sector, date, metrics
- Atomic updates: No partial writes
- Reduced disk usage: No duplicate JSON + DB storage (eventually)
- Faster reads: Indexed database queries
- Easier backup: A single file to back up
## File Sizes

- SQLite DB: 1.2 GB
- JSON cache: ~17 MB (436 files × ~40 KB each)
- Dashboard JSON: 2.3 MB
## Schema

See `docs/sqlite_integration_plan.md` for the detailed schema design.