Proposal: Integrating taktflow-bms-ml with foxBMS POSIX vECU
Date: 2026-03-21
Status: PROPOSAL
Dependencies: foxbms-posix (BMS in NORMAL), taktflow-bms-ml (5 models trained)
The Opportunity
We have two working systems that don't talk to each other:
| foxbms-posix | taktflow-bms-ml |
|---|---|
| Real BMS firmware running on Linux | 5 trained ML models (ONNX) |
| Outputs 15+ CAN messages with battery state | Inputs: pack V/I/T time series |
| Uses coulomb counting for SOC (drifts) | SOC LSTM: 1.83% RMSE |
| No degradation awareness | SOH LSTM: 0.85% RMSE |
| No predictive capability | RUL Transformer: 16% MAPE |
| Threshold-only fault detection | Thermal CNN: F1=1.000 |
| Static plant model (constant values) | Trained on real BMW i3 driving data |
Connecting them creates something neither can do alone: a BMS with ML-augmented intelligence, testable end-to-end without hardware.
Three Integration Layers
Layer 1: ML-Driven Plant Model (replace static data with realistic behavior)
Problem now: plant_model.py sends constant 3700mV, 0A, 25C. foxBMS reaches NORMAL but nothing interesting happens — SOC stays at 50% forever, no faults, no degradation.
Solution: Reuse the SOC LSTM's training data rather than the model itself. Instead of predicting SOC from measurements, replay the BMW i3 trips through the plant model so that foxBMS is fed real driving profiles.
    BMW i3 trip CSV ──→ plant_model.py    ──→ SocketCAN ──→ foxBMS vECU
    72 real trips       encodes to            CAN           processes real
    V, I, T, SOC        foxBMS CAN format     frames        driving data
What this enables:
- foxBMS SOC changes over time (charge/discharge cycles)
- foxBMS sees realistic temperature variation
- foxBMS precharge/contactor logic exercised with real voltage profiles
- foxBMS plausibility checks tested against real cell behavior
Implementation:
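    # plant_model_replay.py (new file, ~80 lines)
    # load_trip is assumed to be importable from prepare_soc_dataset.py
    from prepare_soc_dataset import load_trip


    class TripReplay:
        """Replay a BMW i3 trip through the foxBMS CAN interface."""

        # Series cell count of the source pack, used to scale pack voltage
        # down to one cell. 96 is the BMW i3 figure; verify against the dataset.
        SOURCE_SERIES_CELLS = 96

        def __init__(self, trip_csv):
            self.data = load_trip(trip_csv)  # rows of (pack V, pack A, temp C)
            self.idx = 0
            self.num_cells = 18  # cells simulated on the foxBMS side

        def step(self):
            """Return one timestep of battery data for CAN encoding."""
            row = self.data[self.idx]
            pack_v = row[0]  # Battery Voltage [V]
            pack_i = row[1]  # Battery Current [A]
            temp = row[2]    # Battery Temperature [C]
            # Derive a per-cell voltage in mV; every simulated cell gets this value.
            # Dividing by num_cells (18) would yield roughly 20 V per cell.
            cell_v_mv = round(pack_v / self.SOURCE_SERIES_CELLS * 1000)
            self.idx = (self.idx + 1) % len(self.data)  # loop the trip
            # Units: mV per cell, mA pack current, 0.1 C temperature
            return cell_v_mv, round(pack_i * 1000), round(temp * 10)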
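The `load_trip` helper used by `TripReplay` is not shown in this proposal. A minimal sketch of what such a loader might look like, assuming hypothetical CSV column names (`voltage_v`, `current_a`, `temp_c`) and an already opened file object; the real trip files and the loader in prepare_soc_dataset.py may use different headers and a different signature:

```python
import csv
import io

# Hypothetical column names; adjust to the headers in the real trip CSVs.
COLUMNS = ("voltage_v", "current_a", "temp_c")


def load_trip(csv_file):
    """Read a trip CSV into a list of (pack_v, pack_i, temp) float tuples.

    Rows with a missing or non-numeric value are skipped rather than guessed.
    """
    rows = []
    for record in csv.DictReader(csv_file):
        try:
            rows.append(tuple(float(record[name]) for name in COLUMNS))
        except (KeyError, ValueError, TypeError):
            continue
    return rows


# Usage with an in-memory file standing in for a trip CSV
sample = io.StringIO(
    "voltage_v,current_a,temp_c\n"
    "391.4,-2.2,21.0\n"
    "390.8,,21.0\n"
    "389.9,-35.5,21.5\n"
)
trip = load_trip(sample)
```

Skipping incomplete rows keeps the replay simple, but it also shortens the time base, so a real loader may prefer interpolation.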
Effort: 1-2 days
Risk: Low (plant model changes only, no foxBMS code changes)
Value: Immediately makes every demo and test run realistic
Layer 2: ML Sidecar (run inference alongside foxBMS, compare results)
Problem: foxBMS uses coulomb counting for SOC. It works but drifts over time. We have a 1.83% RMSE LSTM but no way to use it.
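To illustrate why coulomb counting drifts, the sketch below integrates the same discharge twice, once with the true current and once with a small constant sensor offset. The capacity, offset, and sign convention (positive current charges the pack) are assumptions chosen for illustration, not foxBMS parameters:

```python
def coulomb_count(soc0_pct, currents_a, capacity_ah, dt_s=1.0):
    """Integrate current samples into SOC in percent (positive = charging)."""
    soc = soc0_pct
    for current in currents_a:
        soc += current * dt_s / 3600.0 / capacity_ah * 100.0
    return soc


capacity_ah = 60.0               # assumed pack capacity, illustrative only
true_current = [-30.0] * 3600    # one hour of 30 A discharge sampled at 1 Hz
offset_a = 0.5                   # assumed constant current sensor offset
measured = [i + offset_a for i in true_current]

soc_true = coulomb_count(80.0, true_current, capacity_ah)
soc_meas = coulomb_count(80.0, measured, capacity_ah)
drift_pct = soc_meas - soc_true  # error accumulated purely from the offset
```

Because the offset is integrated along with the signal, the error grows linearly with time and never corrects itself, which is the gap an independent ML estimate is meant to expose.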
Solution: A Python sidecar process that reads foxBMS CAN output, runs ONNX inference, and publishes ML predictions on a separate CAN ID or MQTT topic.
    foxBMS vECU ──→ CAN TX         ──→ ML Sidecar (Python)       ──→ CAN TX (new IDs)
                    0x233 pack V/I     reads SocketCAN               0x700 ML SOC
                    0x250 cell V       builds 200-step window        0x701 ML SOH
                    0x260 cell T       ONNX inference (5 models)     0x702 ML thermal risk
                    0x235 BMS SOC      every 1 second                0x703 ML RUL
What this enables:
- Side-by-side SOC comparison: foxBMS coulomb counting vs ML LSTM
- Early degradation detection (SOH drops before capacity fades visibly)
- Thermal anomaly scoring (0-1 risk level, not just threshold)
- RUL estimation for predictive maintenance
- All observable via standard CAN tools (candump, CANape)
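Reading SocketCAN from Python needs no third-party library: a classic CAN frame on a raw socket is 16 bytes (32-bit ID with flag bits, one length byte, three padding bytes, eight data bytes). A minimal sketch of packing and unpacking that layout; the helper names are hypothetical, and `open_can` only works on Linux with the interface already up:

```python
import socket
import struct

CAN_FRAME_FMT = "=IB3x8s"   # can_id, length, 3 pad bytes, 8 data bytes
CAN_FRAME_SIZE = struct.calcsize(CAN_FRAME_FMT)
CAN_EFF_MASK = 0x1FFFFFFF   # strips the EFF/RTR/ERR flag bits from can_id


def open_can(channel="vcan1"):
    """Open a raw SocketCAN socket (Linux only)."""
    sock = socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW)
    sock.bind((channel,))
    return sock


def pack_frame(can_id, data):
    """Build the 16-byte wire representation of a classic CAN frame."""
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))


def unpack_frame(raw):
    """Return (can_id, payload) with flag bits removed and padding trimmed."""
    can_id, length, data = struct.unpack(CAN_FRAME_FMT, raw)
    return can_id & CAN_EFF_MASK, data[:length]


# Round trip without touching a real bus
raw = pack_frame(0x233, b"\x01\x02\x03")
frame_id, payload = unpack_frame(raw)
```

On a live bus the receive side would be `unpack_frame(sock.recv(CAN_FRAME_SIZE))`. The python-can package listed in requirements.txt wraps the same mechanism and may be the more convenient choice.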
Implementation:
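    # ml_sidecar.py (new file, ~150 lines)
    import collections
    import time

    import numpy as np
    import onnxruntime as ort

    # Assumed helpers from the planned shared decoder, decode_foxbms_can.py
    from decode_foxbms_can import (can_send, decode_0x233, decode_0x260,
                                   encode_soc, open_can, read_can_frame)

    # Load ONNX models (three of the five shown; the rest load the same way)
    soc_model = ort.InferenceSession("taktflow-bms-ml/models/bms/soc_lstm.onnx")
    soh_model = ort.InferenceSession("taktflow-bms-ml/models/bms/soh_lstm.onnx")
    thermal_model = ort.InferenceSession("taktflow-bms-ml/models/bms/thermal_cnn.onnx")

    # Training-time normalization stats (path assumed relative to taktflow-bms-ml)
    norm_mean = np.load("taktflow-bms-ml/data/bms-processed/soc_norm_mean.npy")
    norm_std = np.load("taktflow-bms-ml/data/bms-processed/soc_norm_std.npy")

    # Sliding window buffer (200 timesteps × 5 features)
    window = collections.deque(maxlen=200)
    latest = {}                     # most recent decoded signals
    next_sample = time.monotonic()  # the window is sampled at 1 Hz

    sock = open_can("vcan1")
    # Read foxBMS CAN → extract V, I, T → build window → infer
    while True:
        frame = read_can_frame(sock)
        if frame.id == 0x233:    # Pack Values P0: voltage + current
            latest["pack_v"], latest["pack_i"] = decode_0x233(frame.data)
        elif frame.id == 0x260:  # Cell Temperatures
            latest["temp"], latest["temp_max"] = decode_0x260(frame.data)
        # Sample only once all four signals are known and one second has passed
        now = time.monotonic()
        if len(latest) < 4 or now < next_sample:
            continue
        next_sample = now + 1.0
        window.append([latest["pack_v"], latest["pack_i"],
                       latest["temp"], latest["temp_max"], 0.0])  # velocity=0 for SIL
        if len(window) == 200:
            # SOC inference
            x = np.array(window, dtype=np.float32).reshape(1, 200, 5)
            x = ((x - norm_mean) / norm_std).astype(np.float32)  # training stats
            soc_pred = soc_model.run(None, {"bms_window": x})[0][0]
            # Publish on CAN
            can_send(0x700, encode_soc(soc_pred))  # ML SOC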
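The `encode_soc` call above needs a payload layout for the new 0x700 message, which this proposal does not define. A minimal sketch under an assumed layout (unsigned 16-bit value in the first two bytes, big-endian, 0.01 % per bit, remaining bytes zero); the real layout should come from a DBC entry for the ML messages:

```python
import struct

SOC_SCALE = 0.01   # percent per bit (assumed, not taken from any DBC)


def encode_soc(soc_pct):
    """Pack an SOC percentage into an 8-byte CAN payload (hypothetical layout)."""
    clamped = min(max(float(soc_pct), 0.0), 100.0)  # keep the value in range
    raw = int(round(clamped / SOC_SCALE))
    return struct.pack(">H6x", raw)                 # 2 data bytes + 6 zero bytes


def decode_soc(payload):
    """Inverse of encode_soc, useful for dashboards and tests."""
    (raw,) = struct.unpack(">H6x", payload)
    return raw * SOC_SCALE


payload = encode_soc(73.456)
```

Clamping before scaling keeps an out-of-range model output from wrapping around in the 16-bit field; whether to clamp or to flag such values is a design choice for the sidecar.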
Input mapping (foxBMS CAN → LSTM features):
| LSTM Feature | foxBMS CAN Source | Signal |
|---|---|---|
| pack_V (V) | 0x233 Pack Values P0 | packVoltage_mV / 1000 |
| pack_I (A) | 0x233 Pack Values P0 | packCurrent_mA / 1000 |
| T_avg (C) | 0x260 Cell Temperatures | average of decoded temps |
| T_max (C) | 0x260 Cell Temperatures | max of decoded temps |
| velocity (km/h) | not available in foxBMS | set to 0 (SIL) or inject from plant |
Key detail: The LSTM was trained with normalization. The sidecar must apply the same soc_norm_mean.npy and soc_norm_std.npy from data/bms-processed/. Without normalization, predictions will be garbage.
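The normalization step is a per-feature z-score. The sketch below spells it out in plain Python so the arithmetic is visible; the statistics are made-up placeholders, and `normalize_window` is a hypothetical name (in the sidecar the same operation is a single NumPy broadcast):

```python
def normalize_window(window, mean, std, eps=1e-8):
    """Z-score each feature column of a window given as a list of rows."""
    if not (len(mean) == len(std) == len(window[0])):
        raise ValueError("normalization stats do not match feature count")
    return [
        [(value - m) / max(s, eps) for value, m, s in zip(row, mean, std)]
        for row in window
    ]


# Placeholder statistics only; the real values come from the .npy files.
# Feature order: pack_V, pack_I, T_avg, T_max, velocity
mean = [360.0, -10.0, 25.0, 27.0, 40.0]
std = [20.0, 40.0, 5.0, 5.0, 30.0]
window = [[380.0, 30.0, 25.0, 32.0, 0.0]] * 3

normalized = normalize_window(window, mean, std)
```

With these placeholder numbers, a velocity fixed at 0 normalizes to a constant negative value rather than to zero, which is one reason the missing velocity feature is listed as a risk.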
Effort: 3-5 days
Risk: Medium (needs CAN signal decoding to match the foxBMS DBC exactly)
Value: High (demonstrates ML+firmware co-simulation, directly portfolio-worthy)
Layer 3: ML-Enhanced Fault Injection (use models to generate realistic faults)
Problem: PLAN.md Phase 3 (fault injection) currently means manually setting one cell to 4.5V. That's unrealistic — real faults develop gradually.
Solution: Use the ML models to generate realistic fault progression scenarios based on patterns in the training data.
    Fault Scenario Engine           ──→ plant_model.py        ──→ foxBMS ──→ ML Sidecar
              ↓
    "thermal runaway at t=300s"         gradually increasing                 detects anomaly
    "capacity fade over 500 cycles"     temperature + voltage                at t=280s (20s early)
    "cell imbalance developing"         drop following NREL                  triggers CAN alert
                                        failure profiles
Scenarios:
| Scenario | Data Source | Plant Model Behavior | foxBMS Expected Response | ML Expected Response |
|---|---|---|---|---|
| Thermal runaway | NREL failure profiles | Cell temp ramp 2C/min → 100C | Opens contactors at 80C threshold | Thermal CNN flags the ramp before the 80C threshold (target: 20s earlier) |
| Capacity fade | SOH training data | Slowly reduce cell voltage range | No response (below threshold) | SOH LSTM tracks degradation trend |
| Cell imbalance | Imbalance CNN training data | 1 cell drifts 50mV/cycle | Balancing activates | Imbalance CNN predicts which cell |
| Sensor drift | Synthetic | IVT current offset +5A gradually | SOC drifts | SOC LSTM disagrees with BMS SOC |
| Fast charge stress | BMW i3 high-current profiles | High current + temp rise | SOF limits power | Thermal risk score rises |
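The thermal runaway row can be turned into a plant-model temperature profile with a few lines. A minimal sketch, assuming a 25 C start (the current constant plant value) and a straight-line ramp; `thermal_ramp` and `first_crossing` are hypothetical names, and a real scenario would follow the NREL failure profiles rather than a linear ramp:

```python
def thermal_ramp(start_c, rate_c_per_min, limit_c, dt_s=1.0):
    """Yield (time_s, temp_c) samples of a linear ramp, ending at limit_c."""
    if rate_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    step = 0
    while True:
        t = step * dt_s
        temp = start_c + rate_c_per_min * t / 60.0
        if temp >= limit_c:
            yield t, limit_c   # clamp the final sample and stop
            return
        yield t, temp
        step += 1


def first_crossing(profile, threshold_c):
    """Return the first time at which the profile reaches threshold_c."""
    for t, temp in profile:
        if temp >= threshold_c:
            return t
    return None


profile = list(thermal_ramp(25.0, 2.0, 100.0))
t_threshold = first_crossing(profile, 80.0)   # when foxBMS should open contactors
```

Knowing the expected threshold crossing time up front gives the test a reference point: the foxBMS reaction and the ML alert can both be timestamped relative to it.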
Effort: 1-2 weeks
Risk: High (requires careful scenario design and validation)
Value: Very high (proves ML catches faults that threshold logic misses)
Architecture Summary
           Layer 1                 Layer 2               Layer 3
        (Plant Model)            (ML Sidecar)       (Fault Injection)
             |                        |                     |
      BMW i3 trip data        foxBMS CAN output       NREL failure
      or fault scenario                                 profiles
             |                        |                     |
             v                        v                     v
    +--------------------+   +------------------+   +----------------+
    | plant_model.py     |   | ml_sidecar.py    |   | fault_engine.py|
    | (enhanced)         |   | ONNX Runtime     |   | scenario       |
    | trip replay        |   | 5 models loaded  |   | generator      |
    | fault injection    |   | reads foxBMS CAN |   | gradual faults |
    +--------+-----------+   | publishes ML CAN |   +-------+--------+
             |               +--------+---------+           |
             | 0x270,0x521            | 0x700-703           |
             v                        v                     |
    +----------------------------------------+              |
    |           SocketCAN (vcan1)            | <------------+
    +------------------+---------------------+
                       |
                       v
    +----------------------------------------+
    |         foxbms-vecu (C binary)         |
    |    BMS state machine + SOC counting    |
    |          15+ CAN TX messages           |
    +----------------------------------------+
Recommended Implementation Order
| Phase | What | Effort | Depends On | Deliverable |
|---|---|---|---|---|
| L1.1 | Trip replay plant model | 2 days | — | plant_model_replay.py |
| L1.2 | Dynamic current → SOC changes | 1 day | L1.1 | CAN 0x235 SOC changing over time |
| L2.1 | ML sidecar skeleton (CAN read + ONNX load) | 2 days | — | ml_sidecar.py |
| L2.2 | SOC LSTM inference + CAN publish | 2 days | L2.1 | CAN 0x700 with ML SOC |
| L2.3 | SOC comparison dashboard | 1 day | L1.2 + L2.2 | foxBMS SOC vs ML SOC plot |
| L2.4 | Thermal CNN + SOH LSTM inference | 2 days | L2.1 | CAN 0x701, 0x702 |
| L3.1 | Thermal runaway scenario from NREL data | 3 days | L1.1 + L2.4 | Fault detected 20s early |
| L3.2 | Cell imbalance + capacity fade scenarios | 3 days | L3.1 | Full fault test suite |
| Docker | Compose: foxbms-vecu + plant + sidecar | 1 day | L2.2 | docker-compose.yml |
Total: ~3 weeks for a student to deliver L1 + L2, ~5 weeks for all three layers.
What Makes This Valuable
For the foxBMS project
- Realistic test data instead of constant values → actual validation
- Predictive capability that foxBMS doesn't have natively
- Fault injection with realistic profiles instead of step functions
For the ML project
- Real firmware to validate against (not just offline evaluation)
- CAN-based deployment pipeline (ONNX → sidecar → CAN bus)
- Cross-validation: ML SOC vs foxBMS coulomb counting vs BMW i3 ground truth
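The three-way cross-validation reduces to computing an error metric for each estimator against the logged ground truth. A minimal sketch using RMSE; the SOC values are invented purely to show the calculation and are not results:

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(reference))


# Invented sample values in percent SOC, aligned on the same timestamps
ground_truth = [80.0, 79.0, 78.0, 77.0]   # SOC column from the trip data
ml_soc = [81.0, 79.0, 77.0, 77.0]         # decoded from CAN 0x700
bms_soc = [80.0, 80.0, 80.0, 80.0]        # decoded from CAN 0x235

ml_error = rmse(ml_soc, ground_truth)
bms_error = rmse(bms_soc, ground_truth)
```

The two CAN streams arrive at different rates, so in practice the samples need to be aligned on a common time base before a metric like this is meaningful.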
For a student thesis
- Claim to validate: "ML-augmented BMS achieves 1.83% SOC RMSE vs 5-10% coulomb counting drift"
- Claim to validate: "Thermal anomaly detected 20 seconds before threshold-based detection"
- Deliverable: Working demo — foxBMS + ML sidecar + fault injection, all on SocketCAN
- Comparison: dSPACE VEOS + MATLAB = $150k+; this setup = $0 (open source + SocketCAN)
For portfolio / interviews
- Embedded firmware + ML + CAN protocol + Docker → full-stack automotive
- Not a toy: real foxBMS (Fraunhofer), real driving data (BMW i3), production models (ONNX)
- Reproducible: clone, build, run, see CAN output in 10 minutes
Technical Risks
| Risk | Impact | Mitigation |
|---|---|---|
| LSTM velocity feature missing in SIL | SOC accuracy degrades | Retrain with 4 features (drop velocity) or set to 0 |
| Normalization stats mismatch | Predictions are random | Ship soc_norm_mean.npy + soc_norm_std.npy with sidecar |
| CAN signal decoding mismatch | Wrong input to model | Use foxBMS DBC file for both plant model and sidecar |
| ONNX Runtime on ARM (future) | Won't run on embedded | CPU ONNX Runtime works on x86 and ARM64; tested |
| 200-step window = 200 seconds at 1Hz | 3+ minutes to first prediction | Use smaller window (50 steps) with reduced accuracy, or warm-start from plant data |
| foxBMS cycle rate (1ms) vs ML inference (~10ms) | Sidecar can't keep up | Run inference at 1Hz instead of on every CAN frame (1000x fewer runs than one per 1ms cycle) |
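The last mitigation, decoupling inference from the frame rate, can be sketched as a small gate that the receive loop consults on every frame. `InferenceGate` is a hypothetical name; time is passed in by the caller so the logic can be tested without sleeping:

```python
class InferenceGate:
    """Allow at most one inference per period, driven by caller-supplied time."""

    def __init__(self, period_s=1.0):
        self.period_s = period_s
        self._next_due = None   # None until the first call

    def ready(self, now_s):
        """Return True if an inference should run at time now_s."""
        if self._next_due is None or now_s >= self._next_due:
            self._next_due = now_s + self.period_s
            return True
        return False


gate = InferenceGate(period_s=1.0)
# Simulate one frame every millisecond for three seconds
runs = sum(gate.ready(ms / 1000.0) for ms in range(3000))
```

In the sidecar the argument would be `time.monotonic()`, which is unaffected by wall-clock adjustments.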
Files to Create
| File | Location | Purpose |
|---|---|---|
| plant_model_replay.py | foxbms-posix/src/ | Trip replay from BMW i3 CSV |
| ml_sidecar.py | foxbms-posix/src/ | ONNX inference from foxBMS CAN |
| fault_engine.py | foxbms-posix/src/ | Realistic fault scenario generator |
| decode_foxbms_can.py | foxbms-posix/src/ | foxBMS CAN message decoder (shared) |
| docker-compose.yml | foxbms-posix/ | Compose: vECU + plant + sidecar |
| requirements.txt | foxbms-posix/src/ | onnxruntime, python-can, numpy |