Proposal: Integrating taktflow-bms-ml with foxBMS POSIX vECU

Date: 2026-03-21
Status: PROPOSAL
Dependencies: foxbms-posix (BMS in NORMAL), taktflow-bms-ml (5 models trained)


The Opportunity

We have two working systems that don't talk to each other:

foxbms-posix                                 | taktflow-bms-ml
---------------------------------------------|------------------------------------
Real BMS firmware running on Linux           | 5 trained ML models (ONNX)
Outputs 15+ CAN messages with battery state  | Inputs: pack V/I/T time series
Uses coulomb counting for SOC (drifts)       | SOC LSTM: 1.83% RMSE
No degradation awareness                     | SOH LSTM: 0.85% RMSE
No predictive capability                     | RUL Transformer: 16% MAPE
Threshold-only fault detection               | Thermal CNN: F1=1.000
Static plant model (constant values)         | Trained on real BMW i3 driving data

Connecting them creates something neither can do alone: a BMS with ML-augmented intelligence, testable end-to-end without hardware.


Three Integration Layers

Layer 1: ML-Driven Plant Model (replace static data with realistic behavior)

Problem now: plant_model.py sends constant 3700mV, 0A, 25C. foxBMS reaches NORMAL but nothing interesting happens — SOC stays at 50% forever, no faults, no degradation.

Solution: Reuse the data the SOC LSTM was trained on rather than the model itself. Replay the BMW i3 trips through the plant model so that foxBMS is fed real driving profiles instead of constants. No inference is needed for this layer.

BMW i3 trip CSV ──→ plant_model.py ──→ SocketCAN ──→ foxBMS vECU
  72 real trips        encodes to           CAN         processes real
  V, I, T, SOC        foxBMS CAN format    frames      driving data

What this enables:
- foxBMS SOC changes over time (charge/discharge cycles)
- foxBMS sees realistic temperature variation
- foxBMS precharge/contactor logic exercised with real voltage profiles
- foxBMS plausibility checks tested against real cell behavior

Implementation:

# plant_model_replay.py: new file, ~80 lines
# The loader is reused from prepare_soc_dataset.py; the import path is an assumption.
from prepare_soc_dataset import load_trip

class TripReplay:
    """Replay a BMW i3 trip through the foxBMS CAN interface."""

    def __init__(self, trip_csv, num_cells=18):
        self.data = load_trip(trip_csv)  # rows of [pack V, pack I, temp, ...]
        self.idx = 0
        self.num_cells = num_cells

    def step(self):
        """Return one timestep as (cell voltage mV, pack current mA, temp in 0.1 C)."""
        row = self.data[self.idx]
        pack_v = row[0]  # Battery Voltage [V]
        pack_i = row[1]  # Battery Current [A]
        temp   = row[2]  # Battery Temperature [C]
        # Derive a uniform cell voltage from the pack voltage.
        # This assumes pack_v is already scaled to a num_cells pack; a raw trip
        # voltage must first be rescaled by the recorded pack's own series cell count.
        cell_v_mv = round(pack_v / self.num_cells * 1000)
        self.idx = (self.idx + 1) % len(self.data)  # loop the trip when it ends
        return cell_v_mv, round(pack_i * 1000), round(temp * 10)
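The `load_trip` helper is referenced but not shown. A minimal sketch of what such a loader might look like follows. It assumes a comma-delimited CSV with a header row whose column names match the comments in `step()`; the real loader in prepare_soc_dataset.py may use different names, delimiters, or encodings.

```python
import csv

# Column names are assumptions taken from the comments in TripReplay.step();
# check them against the actual trip files before use.
COLUMNS = ("Battery Voltage [V]", "Battery Current [A]", "Battery Temperature [C]")

def load_trip(trip_csv, delimiter=","):
    """Return the trip as a list of [pack_v, pack_i, temp] float rows."""
    rows = []
    with open(trip_csv, newline="") as f:
        for record in csv.DictReader(f, delimiter=delimiter):
            try:
                rows.append([float(record[name]) for name in COLUMNS])
            except (TypeError, ValueError):
                continue  # skip rows with missing or non-numeric fields
    return rows
```

A missing column raises `KeyError` rather than silently producing an empty trip, which makes a header mismatch visible on the first run.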

Effort: 1-2 days
Risk: Low (plant model changes only, no foxBMS code changes)
Value: Immediately makes every demo and test run realistic


Layer 2: ML Sidecar (run inference alongside foxBMS, compare results)

Problem: foxBMS uses coulomb counting for SOC. It works but drifts over time. We have a 1.83% RMSE LSTM but no way to use it.
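To make the drift concrete: coulomb counting integrates the measured current, so a constant sensor offset accumulates linearly in the SOC estimate. The sketch below assumes a hypothetical 60 Ah pack and a discharge-positive sign convention; neither value comes from the foxBMS configuration.

```python
def coulomb_count(soc_percent, current_a, dt_s, capacity_ah):
    """One coulomb-counting step; positive current discharges the pack."""
    return soc_percent - 100.0 * current_a * dt_s / (capacity_ah * 3600.0)

def soc_after(hours, true_current_a, offset_a, capacity_ah=60.0, soc0=50.0):
    """Return (true SOC, estimated SOC) after integrating in 1 s steps."""
    true_soc = est_soc = soc0
    for _ in range(int(hours * 3600)):
        true_soc = coulomb_count(true_soc, true_current_a, 1.0, capacity_ah)
        # The estimator sees the true current plus a constant sensor offset
        est_soc = coulomb_count(est_soc, true_current_a + offset_a, 1.0, capacity_ah)
    return true_soc, est_soc

# Pack at rest for one hour, current sensor reading 0.5 A too high
true_soc, est_soc = soc_after(hours=1.0, true_current_a=0.0, offset_a=0.5)
```

Under these assumptions the estimate loses 0.5 Ah out of 60 Ah per hour, roughly 0.83 percentage points, while the true SOC does not move.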

Solution: A Python sidecar process that reads foxBMS CAN output, runs ONNX inference, and publishes ML predictions on a separate CAN ID or MQTT topic.

foxBMS vECU ──→ CAN TX ──→ ML Sidecar (Python) ──→ CAN TX (new IDs)
  0x233 pack V/I           reads SocketCAN             0x700 ML SOC
  0x250 cell V             builds 200-step window       0x701 ML SOH
  0x260 cell T             ONNX inference (5 models)    0x702 ML thermal risk
  0x235 BMS SOC            every 1 second               0x703 ML RUL

What this enables:
- Side-by-side SOC comparison: foxBMS coulomb counting vs ML LSTM
- Early degradation detection (SOH drops before capacity fades visibly)
- Thermal anomaly scoring (0-1 risk level, not just threshold)
- RUL estimation for predictive maintenance
- All observable via standard CAN tools (candump, CANape)

Implementation:

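# ml_sidecar.py: new file, ~150 lines
import collections
import time

import can  # python-can, listed in requirements.txt
import numpy as np
import onnxruntime as ort

# Placeholder helpers: decode_foxbms_can.py is the shared decoder still to be
# written, and these function names are not final.
from decode_foxbms_can import decode_0x233, decode_0x260, encode_soc

# Load the ONNX models (three of the five shown; the rest load the same way)
soc_model = ort.InferenceSession("taktflow-bms-ml/models/bms/soc_lstm.onnx")
soh_model = ort.InferenceSession("taktflow-bms-ml/models/bms/soh_lstm.onnx")
thermal_model = ort.InferenceSession("taktflow-bms-ml/models/bms/thermal_cnn.onnx")

# Training-time normalization statistics (see the note on normalization below)
norm_mean = np.load("data/bms-processed/soc_norm_mean.npy")
norm_std = np.load("data/bms-processed/soc_norm_std.npy")

# Sliding window buffer (200 timesteps × 5 features)
window = collections.deque(maxlen=200)

bus = can.Bus(channel="vcan1", interface="socketcan")
pack_v = pack_i = temp = temp_max = None  # latest decoded values
next_sample = time.monotonic()

# Read foxBMS CAN → extract V, I, T → sample at 1 Hz → infer
while True:
    frame = bus.recv(timeout=0.1)
    if frame is not None and frame.arbitration_id == 0x233:  # Pack Values P0
        pack_v, pack_i = decode_0x233(frame.data)
    elif frame is not None and frame.arbitration_id == 0x260:  # Cell Temperatures
        temp, temp_max = decode_0x260(frame.data)

    # Append one row per second, and only once every signal has been seen
    now = time.monotonic()
    if now < next_sample or None in (pack_v, pack_i, temp, temp_max):
        continue
    next_sample = now + 1.0
    window.append([pack_v, pack_i, temp, temp_max, 0.0])  # velocity=0 for SIL

    if len(window) == 200:
        # SOC inference; cast back to float32 in case the statistics are float64
        x = np.array(window, dtype=np.float32).reshape(1, 200, 5)
        x = ((x - norm_mean) / norm_std).astype(np.float32)
        soc_pred = soc_model.run(None, {"bms_window": x})[0][0]

        # Publish on CAN
        bus.send(can.Message(arbitration_id=0x700, data=encode_soc(soc_pred),
                             is_extended_id=False))  # ML SOC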
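The `encode_soc` call in the loop above is not defined anywhere in the proposal. One possible payload layout is sketched here, purely as an assumption: SOC as an unsigned 16-bit big-endian value in 0.01 % steps, followed by six padding bytes. The actual layout of the new 0x700 message would need to be agreed and recorded in a DBC file.

```python
import struct

def encode_soc(soc_percent):
    """Pack an SOC in percent into an 8-byte CAN payload (hypothetical layout)."""
    clamped = min(max(float(soc_percent), 0.0), 100.0)
    raw = round(clamped * 100)       # 0.01 % per bit, range 0..10000
    return struct.pack(">H6x", raw)  # uint16 big-endian + 6 zero pad bytes

def decode_soc(payload):
    """Inverse of encode_soc, for the receiving side or for tests."""
    (raw,) = struct.unpack_from(">H", payload)
    return raw / 100.0
```

Clamping before scaling keeps an out-of-range model output from overflowing the 16-bit field; whether clamping or an explicit "invalid" value is preferable is a design decision left open here.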

Input mapping (foxBMS CAN → LSTM features):

LSTM Feature    | foxBMS CAN Source        | Signal
----------------|--------------------------|------------------------------------
pack_V (V)      | 0x233 Pack Values P0     | packVoltage_mV / 1000
pack_I (A)      | 0x233 Pack Values P0     | packCurrent_mA / 1000
T_avg (C)       | 0x260 Cell Temperatures  | average of decoded temps
T_max (C)       | 0x260 Cell Temperatures  | max of decoded temps
velocity (km/h) | not available in foxBMS  | set to 0 (SIL) or inject from plant
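The mapping in the table above can be sketched as a single function that takes already-decoded values and returns one feature row in the order the LSTM expects. The function name is hypothetical, and the bit-level decoding of 0x233 and 0x260 is deliberately left to the DBC-based decoder.

```python
def build_feature_row(pack_voltage_mv, pack_current_ma, cell_temps_c, velocity_kmh=0.0):
    """Map decoded foxBMS values to [pack_V, pack_I, T_avg, T_max, velocity]."""
    if not cell_temps_c:
        raise ValueError("at least one cell temperature is required")
    return [
        pack_voltage_mv / 1000.0,               # mV -> V
        pack_current_ma / 1000.0,               # mA -> A
        sum(cell_temps_c) / len(cell_temps_c),  # T_avg in C
        max(cell_temps_c),                      # T_max in C
        velocity_kmh,                           # 0 in SIL unless the plant injects it
    ]

row = build_feature_row(66600, -1500, [24.0, 25.0, 26.0])
```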

Key detail: The LSTM was trained with normalization. The sidecar must apply the same soc_norm_mean.npy and soc_norm_std.npy from data/bms-processed/. Without normalization, predictions will be garbage.
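A minimal sketch of that normalization step follows. It assumes the two .npy files hold one mean and one standard deviation per feature (shape `(5,)`); the helper name is hypothetical, and synthetic statistics are used so the example runs without the data files.

```python
import numpy as np

def normalize_window(window, mean, std, eps=1e-8):
    """Apply training-time z-score normalization to a (steps, features) window.

    Returns a float32 array of shape (1, steps, features).
    """
    x = np.asarray(window, dtype=np.float32)
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    if x.shape[-1] != mean.shape[-1] or mean.shape != std.shape:
        raise ValueError("feature count of window and statistics must match")
    x = (x - mean) / np.maximum(std, eps)  # guard against a zero std
    return x[np.newaxis, ...]              # add the batch dimension

# In the sidecar the statistics would be loaded from the files named above, e.g.
#   mean = np.load("data/bms-processed/soc_norm_mean.npy")
# Synthetic values stand in for them here.
mean = np.array([360.0, 0.0, 25.0, 27.0, 0.0])
std = np.array([20.0, 50.0, 5.0, 5.0, 1.0])
window = np.tile([380.0, 50.0, 30.0, 32.0, 0.0], (200, 1))
x = normalize_window(window, mean, std)
```

Casting the statistics to float32 up front keeps the result in float32 even when the saved arrays are float64, which avoids an input-type mismatch at inference time.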

Effort: 3-5 days
Risk: Medium (needs CAN signal decoding to match foxBMS DBC exactly)
Value: High (demonstrates ML+firmware co-simulation, directly portfolio-worthy)


Layer 3: ML-Enhanced Fault Injection (use models to generate realistic faults)

Problem: PLAN.md Phase 3 (fault injection) currently means manually setting one cell to 4.5V. That's unrealistic — real faults develop gradually.

Solution: Use the ML models to generate realistic fault progression scenarios based on patterns in the training data.

Fault Scenario Engine ──→ plant_model.py ──→ foxBMS ──→ ML Sidecar
                                                            ↓
  "thermal runaway at t=300s"     gradually increasing    detects anomaly
  "capacity fade over 500 cycles" temperature + voltage   at t=280s (20s early)
  "cell imbalance developing"     drop following NREL     triggers CAN alert
                                  failure profiles

Scenarios:

Scenario           | Data Source                  | Plant Model Behavior              | foxBMS Expected Response          | ML Expected Response
-------------------|------------------------------|-----------------------------------|-----------------------------------|----------------------------------
Thermal runaway    | NREL failure profiles        | Cell temp ramp 2C/min → 100C      | Opens contactors at 80C threshold | Thermal CNN detects at 60C, ahead of the 80C threshold (about 10 min at 2C/min; the 20s lead quoted elsewhere needs a much steeper ramp)
Capacity fade      | SOH training data            | Slowly reduce cell voltage range  | No response (below threshold)     | SOH LSTM tracks degradation trend
Cell imbalance     | Imbalance CNN training data  | 1 cell drifts 50mV/cycle          | Balancing activates               | Imbalance CNN predicts which cell
Sensor drift       | Synthetic                    | IVT current offset +5A gradually  | SOC drifts                        | SOC LSTM disagrees with BMS SOC
Fast charge stress | BMW i3 high-current profiles | High current + temp rise          | SOF limits power                  | Thermal risk score rises

Effort: 1-2 weeks
Risk: High (requires careful scenario design and validation)
Value: Very high (proves ML catches faults that threshold logic misses)
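As an illustration of a gradual fault, the thermal runaway scenario can be sketched as a temperature that is a function of time, plus a helper that reports when a given threshold is crossed. The function names are hypothetical, and the linear ramp is a simplification rather than an NREL failure profile.

```python
def thermal_ramp(t_s, start_c=25.0, rate_c_per_min=2.0, max_c=100.0, onset_s=0.0):
    """Cell temperature in C at time t_s for a linear ramp starting at onset_s."""
    if t_s <= onset_s:
        return start_c
    return min(start_c + rate_c_per_min * (t_s - onset_s) / 60.0, max_c)

def crossing_time(threshold_c, start_c=25.0, rate_c_per_min=2.0, onset_s=0.0):
    """Time in seconds at which the ramp first reaches threshold_c."""
    if threshold_c <= start_c:
        return onset_s
    return onset_s + (threshold_c - start_c) / rate_c_per_min * 60.0

# Lead time between an ML detection at 60C and the 80C contactor threshold
lead_s = crossing_time(80.0) - crossing_time(60.0)
```

With the defaults, `lead_s` is 600 s: on a 2C/min ramp, 60C is reached ten minutes before 80C. A 20 s lead between those two temperatures would correspond to a ramp of about 60C/min, so the scenario definition should fix the ramp rate and the expected lead time together.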


Architecture Summary

                        Layer 1                Layer 2              Layer 3
                    (Plant Model)           (ML Sidecar)       (Fault Injection)
                         |                       |                    |
  BMW i3 trip data       |    foxBMS CAN output  |   NREL failure     |
  or fault scenario ─────+────────────────────── | ── profiles ───────+
                         |                       |                    |
                         v                       v                    v
              +--------------------+   +------------------+  +----------------+
              | plant_model.py     |   | ml_sidecar.py    |  | fault_engine.py|
              | (enhanced)         |   | ONNX Runtime     |  | scenario       |
              | trip replay        |   | 5 models loaded  |  | generator      |
              | fault injection    |   | reads foxBMS CAN |  | gradual faults |
              +--------+-----------+   | publishes ML CAN |  +-------+--------+
                       |               +--------+---------+          |
                       | 0x270,0x521           | 0x700-703           |
                       v                       v                     |
              +----------------------------------------+             |
              |           SocketCAN (vcan1)             | <-----------+
              +------------------+---------------------+
                                 |
                                 v
              +----------------------------------------+
              |        foxbms-vecu (C binary)           |
              |  BMS state machine + SOC counting       |
              |  15+ CAN TX messages                    |
              +----------------------------------------+

Phase | What                                          | Effort | Depends On  | Deliverable
------|-----------------------------------------------|--------|-------------|----------------------------------
L1.1  | Trip replay plant model                       | 2 days | -           | plant_model_replay.py
L1.2  | Dynamic current → SOC changes                 | 1 day  | L1.1        | CAN 0x235 SOC changing over time
L2.1  | ML sidecar skeleton (CAN read + ONNX load)    | 2 days | -           | ml_sidecar.py
L2.2  | SOC LSTM inference + CAN publish              | 2 days | L2.1        | CAN 0x700 with ML SOC
L2.3  | SOC comparison dashboard                      | 1 day  | L1.2 + L2.2 | foxBMS SOC vs ML SOC plot
L2.4  | Thermal CNN + SOH LSTM inference              | 2 days | L2.1        | CAN 0x701, 0x702
L3.1  | Thermal runaway scenario from NREL data       | 3 days | L1.1 + L2.4 | Fault detected 20s early
L3.2  | Cell imbalance + capacity fade scenarios      | 3 days | L3.1        | Full fault test suite
-     | Docker Compose: foxbms-vecu + plant + sidecar | 1 day  | L2.2        | docker-compose.yml

Total: ~3 weeks for a student to deliver L1 + L2, ~5 weeks for all three layers.


What Makes This Valuable

For the foxBMS project

For the ML project

For a student thesis

For portfolio / interviews


Technical Risks

Risk                                            | Impact                         | Mitigation
------------------------------------------------|--------------------------------|-----------------------------------------------------
LSTM velocity feature missing in SIL            | SOC accuracy degrades          | Retrain with 4 features (drop velocity) or set to 0
Normalization stats mismatch                    | Predictions are random         | Ship soc_norm_mean.npy + soc_norm_std.npy with sidecar
CAN signal decoding mismatch                    | Wrong input to model           | Use foxBMS DBC file for both plant model and sidecar
ONNX Runtime on ARM (future)                    | Won't run on embedded CPU      | ONNX Runtime works on x86 and ARM64; tested
200-step window = 200 seconds at 1Hz            | 3+ minutes to first prediction | Use smaller window (50 steps) with reduced accuracy, or warm-start from plant data
foxBMS cycle rate (1ms) vs ML inference (~10ms) | Sidecar can't keep up          | Run inference at 1Hz, not per CAN frame (200x reduction)
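The last mitigation in the table above, sampling at 1 Hz instead of per CAN frame, can be sketched as a small class that keeps the latest decoded values and appends one row per period. The class and signal names are hypothetical; time is passed in explicitly so the logic can be exercised without a CAN bus.

```python
import collections

class WindowSampler:
    """Hold the latest decoded signals and sample them at a fixed period."""

    def __init__(self, steps=200, period_s=1.0):
        self.window = collections.deque(maxlen=steps)
        self.period_s = period_s
        self.latest = {"pack_v": None, "pack_i": None, "t_avg": None, "t_max": None}
        self.next_sample = None

    def update(self, **signals):
        """Record newly decoded signal values (called once per CAN frame)."""
        self.latest.update(signals)

    def tick(self, now):
        """Append a row if a period has elapsed; True when that row fills the window."""
        if None in self.latest.values():
            return False  # not every signal has been seen yet
        if self.next_sample is None:
            self.next_sample = now
        if now < self.next_sample:
            return False
        self.next_sample = now + self.period_s
        v = self.latest
        self.window.append([v["pack_v"], v["pack_i"], v["t_avg"], v["t_max"], 0.0])
        return len(self.window) == self.window.maxlen

# A 1 kHz loop running for 3 s fills a 3-step window exactly once
sampler = WindowSampler(steps=3)
sampler.update(pack_v=66.6, pack_i=0.0)
sampler.update(t_avg=25.0, t_max=26.0)
ready = [sampler.tick(ms / 1000) for ms in range(3000)]
```

With the default 200 steps and a 1 s period, the first full window arrives after 200 s of data, which is the warm-up delay listed in the table.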

Files to Create

File                  | Location          | Purpose
----------------------|-------------------|-------------------------------------
plant_model_replay.py | foxbms-posix/src/ | Trip replay from BMW i3 CSV
ml_sidecar.py         | foxbms-posix/src/ | ONNX inference from foxBMS CAN
fault_engine.py       | foxbms-posix/src/ | Realistic fault scenario generator
decode_foxbms_can.py  | foxbms-posix/src/ | foxBMS CAN message decoder (shared)
docker-compose.yml    | foxbms-posix/     | Compose: vECU + plant + sidecar
requirements.txt      | foxbms-posix/src/ | onnxruntime, python-can, numpy