# SQLite Integration - Complete ✅

## Summary
All data is now stored in the SQLite database as the primary source, with JSON files maintained as a backup.
## Database Location

`models/neural_network/training/stock_data.db` (1.2 GB)
## Tables

### 1. `current_stock_data` (NEW)

- Purpose: Current stock data from yfinance
- Records: 435 stocks
- Fields: ticker, prices, financial metrics, JSON blobs for complex data (see the schema sketch after this list)
- Updated by: `scripts/data_fetcher.py`
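
The table keeps simple metrics as scalar columns and stores the more complex yfinance payloads as JSON text. A minimal schema sketch, assuming hypothetical column names beyond `ticker`; the real layout is documented in `docs/sqlite_integration_plan.md`:

```python
import sqlite3

# Hypothetical schema sketch for current_stock_data; every column except
# `ticker` is an assumption, not the actual production layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS current_stock_data (
    ticker          TEXT PRIMARY KEY,
    current_price   REAL,
    market_cap      REAL,
    trailing_pe     REAL,
    sector          TEXT,
    info_json       TEXT,   -- complex yfinance payloads serialized as JSON
    financials_json TEXT,
    updated_at      TEXT DEFAULT (datetime('now'))
);
"""

with sqlite3.connect("models/neural_network/training/stock_data.db") as conn:
    conn.executescript(SCHEMA)
```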
### 2. Existing Tables

- `assets`: Asset registry
- `snapshots`: Historical snapshots for training
- `forward_returns`: Training labels
- `nn_predictions`: Neural network predictions
- `valuation_predictions`: Classic model valuations
## Updated Scripts

### Data Fetcher (`scripts/data_fetcher.py`)

- Writes to: SQLite (primary) + JSON (backup)
- New method: `save_to_sqlite()`
- Behavior: Atomic writes to both storages (see the sketch after this list)
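
A minimal sketch of the dual write, assuming a hypothetical function signature, column names, and JSON backup directory; the actual code lives in `scripts/data_fetcher.py`:

```python
import json
import sqlite3
from pathlib import Path

DB_PATH = "models/neural_network/training/stock_data.db"
JSON_DIR = Path("data/stock_cache")  # assumed backup location, not the real path

def save_to_sqlite(ticker: str, record: dict) -> None:
    """Dual write: SQLite is the primary store, JSON is the backup copy."""
    payload = json.dumps(record)

    # SQLite write runs in a transaction: committed on success, rolled back on error.
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "INSERT INTO current_stock_data (ticker, info_json) VALUES (?, ?) "
            "ON CONFLICT(ticker) DO UPDATE SET info_json = excluded.info_json",
            (ticker, payload),
        )

    # JSON backup: write to a temp file, then atomically swap it into place.
    JSON_DIR.mkdir(parents=True, exist_ok=True)
    tmp = JSON_DIR / f"{ticker}.json.tmp"
    tmp.write_text(payload)
    tmp.replace(JSON_DIR / f"{ticker}.json")
```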
### Stock Data Reader (`src/invest/data/stock_data_reader.py`)

- New module: Unified interface for reading stock data
- Methods:
  - `get_stock_data(ticker)` - Get a single stock
  - `get_all_tickers()` - List all tickers
  - `get_stocks_by_sector(sector)` - Filter by sector
  - `get_stock_count()` - Total count
- Format: Returns a dict matching the JSON cache format for compatibility (usage example after this list)
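
A usage example; the method names come from the list above, while the `StockDataReader` class name and its no-argument constructor are assumptions:

```python
# Hypothetical usage of the reader module; the class name is assumed.
from invest.data.stock_data_reader import StockDataReader

reader = StockDataReader()                        # assumed to default to the training DB
print(reader.get_stock_count())                   # e.g. 435
aapl = reader.get_stock_data("AAPL")              # dict shaped like the old JSON cache entry
tech = reader.get_stocks_by_sector("Technology")  # subset of stocks in one sector
print(len(reader.get_all_tickers()), len(tech))
```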
### Prediction Script (`scripts/run_multi_horizon_predictions.py`)

- Updated: `load_stock_cache()` now reads from SQLite first
- Fallback: Uses JSON if SQLite fails (see the sketch after this list)
- Transparent: No changes needed to the rest of the code
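
A sketch of the SQLite-first read with JSON fallback, assuming the hypothetical reader class and cache directory from the earlier examples:

```python
import json
import logging
from pathlib import Path

from invest.data.stock_data_reader import StockDataReader  # class name assumed

JSON_DIR = Path("data/stock_cache")  # assumed location of the JSON backup files

def load_stock_cache() -> dict:
    """Return {ticker: data}; SQLite first, JSON fallback, same shape either way."""
    try:
        reader = StockDataReader()
        return {t: reader.get_stock_data(t) for t in reader.get_all_tickers()}
    except Exception as exc:  # DB missing, locked, or unreadable
        logging.warning("SQLite read failed (%s); falling back to JSON cache", exc)
        return {p.stem: json.loads(p.read_text()) for p in JSON_DIR.glob("*.json")}
```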
## Migration Script

`scripts/migrate_json_to_sqlite.py`

- Migrated 435 stocks from JSON to SQLite
- No errors during migration
- Can be re-run to sync changes (see the sketch after this list)
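
A sketch of the re-runnable sync loop, assuming the JSON cache lives in a hypothetical `data/stock_cache/` directory with one `TICKER.json` file per stock and that the upsert targets the `info_json` column from the schema sketch above:

```python
import json
import sqlite3
from pathlib import Path

DB_PATH = "models/neural_network/training/stock_data.db"
JSON_DIR = Path("data/stock_cache")  # assumed source directory for the JSON cache

def migrate() -> int:
    """Upsert every cached JSON file into current_stock_data; safe to re-run."""
    count = 0
    with sqlite3.connect(DB_PATH) as conn:
        for path in sorted(JSON_DIR.glob("*.json")):
            record = json.loads(path.read_text())
            conn.execute(
                "INSERT INTO current_stock_data (ticker, info_json) VALUES (?, ?) "
                "ON CONFLICT(ticker) DO UPDATE SET info_json = excluded.info_json",
                (path.stem, json.dumps(record)),  # file name assumed to be TICKER.json
            )
            count += 1
    return count

if __name__ == "__main__":
    print(f"Migrated {migrate()} stocks")
```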
## Testing

```bash
# Test the data fetcher
uv run python scripts/test_sqlite_integration.py

# Test the reader
uv run python scripts/test_sqlite_reader.py

# Run predictions (uses SQLite automatically)
uv run python scripts/run_multi_horizon_predictions.py
```
## Backward Compatibility

- ✅ JSON files still written as backup
- ✅ JSON fallback in the reader if SQLite fails
- ✅ Existing code works without changes
- ✅ Dashboard still reads from `dashboard_data.json`
## Next Steps (Optional)

- Update the dashboard generator to read from SQLite
- Add a `--no-json-backup` flag to `data_fetcher`
- Eventually remove JSON files after a testing period
- Add SQLite optimization (VACUUM, indexes); see the sketch after this list
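
For the last item, a sketch of what the optimization could look like using Python's built-in `sqlite3`; the index name and the `sector` column are assumptions carried over from the schema sketch above:

```python
import sqlite3

DB_PATH = "models/neural_network/training/stock_data.db"

conn = sqlite3.connect(DB_PATH)
# Speed up the per-sector queries with an index (name and column are assumptions).
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_current_stock_data_sector "
    "ON current_stock_data (sector)"
)
conn.commit()
# Reclaim free pages; VACUUM must run outside an explicit transaction.
conn.execute("VACUUM")
conn.close()
```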
## Benefits
- Single source of truth: All data in one database
- Better queries: Can filter by sector, date, metrics
- Atomic updates: No partial writes
- Reduced disk usage: No duplicate JSON + DB storage (eventually)
- Faster reads: Indexed database queries
- Easier backup: A single file to back up
## File Sizes

- SQLite DB: 1.2 GB
- JSON cache: ~17 MB (436 files × ~40 KB each)
- Dashboard JSON: 2.3 MB
## Schema

See `docs/sqlite_integration_plan.md` for the detailed schema design.