foxBMS POSIX vECU — Lessons Learned

2026-03-21 — Phase 3 Fault Injection

1. RSL/MOL test values must be tier-specific

Context: Generated 2,005 test cases via cross-product (signals × methods × tiers). Mistake: STUCK_AT_0 used value 0 for all tiers (MSL, RSL, MOL). 0mV is far below the undervoltage MSL (2500mV), so tests aimed at RSL/MOL tripped the MSL and raised FATAL instead of WARNING. Fix: Each tier gets a value between its own threshold and the next more severe one, e.g. OV_RSL=4210mV (between RSL 4200 and MSL 4250). Principle: Cross-product test generation must be followed by tier-aware value validation. Generic fault methods + severity tiers ≠ valid test cases without domain-specific value constraints.
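
A minimal sketch of such a tier-aware check, assuming a strict "greater than" comparison for overvoltage. The RSL and MSL thresholds are the ones quoted above; the MOL threshold and the function names are hypothetical.

```python
# Overvoltage thresholds in mV. RSL and MSL come from the notes above;
# the MOL value is a hypothetical placeholder.
OV_THRESHOLDS_MV = {"MOL": 4100, "RSL": 4200, "MSL": 4250}

# Tiers ordered from least to most severe.
TIER_ORDER = ["MOL", "RSL", "MSL"]


def triggered_tier(voltage_mv, thresholds=OV_THRESHOLDS_MV):
    """Return the most severe overvoltage tier exceeded by voltage_mv, or None."""
    hit = None
    for tier in TIER_ORDER:
        if voltage_mv > thresholds[tier]:
            hit = tier
    return hit


def validate_case(target_tier, voltage_mv):
    """True if the injected value trips exactly the tier the test targets."""
    return triggered_tier(voltage_mv) == target_tier


rsl_case_ok = validate_case("RSL", 4210)     # value from the notes
rsl_case_too_high = validate_case("RSL", 5000)  # overshoots into MSL
```

Running a check like this over every generated case before execution would reject values that land in the wrong tier, such as the 0mV case described above.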

2. subprocess.PIPE blocks vECU when buffer fills

Context: Test runner started foxbms-vecu with stdout=subprocess.PIPE to capture logs. Mistake: foxBMS generates heavy stderr trace output (~50KB/s) and the runner never read from the pipe. PIPE buffer is 64KB. Filled in ~4s → vECU blocked on write → BMS stuck in PRECHARGE forever. Fix: Use subprocess.DEVNULL instead, so logs go to /dev/null. If logs are needed, redirect to a file, never to PIPE. Principle: Never use PIPE for long-running high-output processes unless something is actively reading the pipe.
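
A minimal sketch of the file-redirect variant. The helper name and log path are hypothetical, and a small Python child process stands in for foxbms-vecu so the example runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_log(cmd, log_path, timeout_s=30):
    """Run cmd with stdout and stderr redirected to a file, never to a pipe."""
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            return proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
            raise


# Stand-in for the vECU: a child that writes about 200 KB to stderr,
# which is more than the 64KB pipe buffer mentioned above.
NOISY_CHILD = [
    sys.executable,
    "-c",
    "import sys\nfor _ in range(2000): sys.stderr.write('x' * 99 + '\\n')",
]

log_file = Path(tempfile.mkdtemp()) / "vecu.log"
exit_code = run_with_log(NOISY_CHILD, log_file)
log_size = log_file.stat().st_size
```

Because the child writes to a regular file, it cannot block on a full pipe buffer however much it logs. For a long-running vECU the runner would keep the Popen object and stop it at teardown instead of waiting for it to exit.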

3. foxBMS default cell config is NOT NMC

Context: foxBMS ships with cell thresholds for a low-voltage chemistry (2500mV nominal, 1500-2800mV range). Mistake: Assumed foxBMS defaults match NMC. Plant model sent 3700mV → immediate overvoltage fault. Fix: Patch battery_cell_cfg.h for NMC (3700mV nominal, 2500-4250mV range). Principle: Always verify cell chemistry config matches the simulated battery. Read the actual #define values, don't assume.
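
A sketch of a pre-flight check that reads the thresholds from the header text instead of assuming them. The macro names and the sample header below are assumptions for illustration; check the real battery_cell_cfg.h for the actual identifiers.

```python
import re

# Sample header text using the default values quoted in the notes.
# The macro names are assumed, not taken from the notes.
DEFAULT_HEADER = """
#define BC_VOLTAGE_MAX_MSL_mV (2800)
#define BC_VOLTAGE_MIN_MSL_mV (1500)
#define BC_VOLTAGE_NOMINAL_mV (2500)
"""

# Matches "#define NAME (123)" or "#define NAME 123", with optional u/l suffix.
DEFINE_RE = re.compile(
    r"^[ \t]*#define[ \t]+(\w+)[ \t]+\(?[ \t]*(-?\d+)[uUlL]*[ \t]*\)?[ \t]*$",
    re.MULTILINE,
)


def read_int_defines(header_text):
    """Return {macro_name: int_value} for simple integer #defines."""
    return {name: int(value) for name, value in DEFINE_RE.findall(header_text)}


def plant_voltage_ok(voltage_mv, defines):
    """True if the plant voltage lies inside the configured MSL window."""
    low = defines["BC_VOLTAGE_MIN_MSL_mV"]
    high = defines["BC_VOLTAGE_MAX_MSL_mV"]
    return low <= voltage_mv <= high


defaults = read_int_defines(DEFAULT_HEADER)
nmc_plant_ok = plant_voltage_ok(3700, defaults)  # NMC voltage vs. default config
```

With the default values, a 3700mV plant voltage falls outside the window, which is the mismatch described above.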

4. IVT current sign convention is configurable

Context: Plant model sent negative current for discharge (common IVT convention). Mistake: foxBMS has BS_POSITIVE_DISCHARGE_CURRENT = true — positive = discharge. Our negative current was interpreted as charge → overcurrent charge fault. Fix: Check BS_POSITIVE_DISCHARGE_CURRENT in battery_system_cfg.h and match plant model sign. Principle: Sign conventions are config-dependent. Read the config. Don't assume "negative = discharge" is universal.
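
A sketch of matching the plant model to the configured convention. It assumes the plant model internally treats positive current as discharge and that the macro appears as a plain true/false #define; both the helper names and the sample header line are hypothetical.

```python
import re


def read_bool_define(header_text, name):
    """Return the value of a '#define NAME (true|false)' line as a bool."""
    pattern = rf"#define\s+{re.escape(name)}\s+\(?\s*(true|false)\s*\)?"
    match = re.search(pattern, header_text)
    if match is None:
        raise KeyError(name)
    return match.group(1) == "true"


def to_bms_current_ma(discharge_current_ma, positive_discharge_current):
    """Convert a plant current (positive = discharge) to the BMS sign convention."""
    if positive_discharge_current:
        return discharge_current_ma
    return -discharge_current_ma


# Assumed formatting of the config line in battery_system_cfg.h.
SAMPLE_CFG = "#define BS_POSITIVE_DISCHARGE_CURRENT (true)"

positive_discharge = read_bool_define(SAMPLE_CFG, "BS_POSITIVE_DISCHARGE_CURRENT")
bms_current = to_bms_current_ma(5000, positive_discharge)  # 5 A discharge
```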

5. DIAG grace period must be time-based, not call-based

Context: Added startup grace period using a call counter (8000 calls ≈ 8s). Mistake: DIAG_Handler is called from 1ms + 10ms + 100ms tasks. 8000 calls ≈ 1.5s, not 8s. Fix: Use OS_GetTickCount() for real elapsed time. Principle: Call count ≠ wall time when function is called from multiple periodic tasks.
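
The actual fix lives in the C code, but the logic can be sketched in Python with an injectable tick source standing in for OS_GetTickCount(). The class names are hypothetical, and the sketch assumes the tick counts milliseconds.

```python
GRACE_PERIOD_MS = 8000


class StartupGrace:
    """Grace period based on elapsed ticks rather than on call count."""

    def __init__(self, get_tick_ms, grace_ms=GRACE_PERIOD_MS):
        self._get_tick_ms = get_tick_ms
        self._start_ms = get_tick_ms()
        self._grace_ms = grace_ms

    def active(self):
        """True while fewer than grace_ms ticks have elapsed since start."""
        return (self._get_tick_ms() - self._start_ms) < self._grace_ms


class FakeClock:
    """Test double for the OS tick counter."""

    def __init__(self):
        self.now_ms = 0

    def __call__(self):
        return self.now_ms


clock = FakeClock()
grace = StartupGrace(clock)

# 8000 calls arriving within the first 1.5 s do not end the grace period.
clock.now_ms = 1500
still_active = all(grace.active() for _ in range(8000))

# Only elapsed time ends it.
clock.now_ms = 8000
expired = not grace.active()
```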

6. SIL override command format has active byte

Context: Test runner packed override as [cmd, idx, value_BE]. Mistake: sil_process_command expects [cmd, idx, active, value_LE]. Missing active byte, wrong endianness. Fix: Pack as struct.pack("<BBBi", cmd, idx, 1, value). Principle: Always read the receiver code to verify the wire format. Don't guess.
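
A sketch of the corrected packing with a round-trip check. The format string is the one from the fix above; the command code and index values are hypothetical.

```python
import struct

# cmd (uint8), idx (uint8), active (uint8), value (int32), little-endian, no padding.
OVERRIDE_FORMAT = "<BBBi"


def pack_override(cmd, idx, value, active=True):
    """Build the 7-byte override frame."""
    return struct.pack(OVERRIDE_FORMAT, cmd, idx, 1 if active else 0, value)


def unpack_override(frame):
    """Decode a frame the same way the receiver is expected to."""
    cmd, idx, active, value = struct.unpack(OVERRIDE_FORMAT, frame)
    return cmd, idx, bool(active), value


# Hypothetical command code 0x01 and signal index 3.
frame = pack_override(cmd=0x01, idx=3, value=4210)
decoded = unpack_override(frame)
```

With the "<" prefix, struct uses standard sizes and no alignment padding, so the frame is exactly 7 bytes. Decoding it in a unit test is a cheap way to catch a missing field or the wrong byte order before it reaches the vECU.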

7. Contactor feedback should not echo command

Context: SPS stub set feedback = currentSet (command echoed as feedback). Mistake: No independent feedback path → welding/stuck-open impossible to simulate. Fix: Contactor feedback reads from SIL override table first, then falls back to SPS simulation. Re-enabled DIAG IDs 51-53 for contactor feedback mismatch. Principle: Feedback must be independent of command. In SIL, use override table. In HIL, use real GPIO.
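
The lookup order can be sketched as follows. This is a Python model of the idea, not the stub's C code; the table layout and names are hypothetical.

```python
def contactor_feedback(idx, overrides, simulated):
    """Return the override value if one is active for idx, else the simulated one."""
    entry = overrides.get(idx)
    if entry is not None and entry["active"]:
        return entry["value"]
    return simulated[idx]


# Simulated feedback for two contactors, both commanded open (0).
simulated_feedback = {0: 0, 1: 0}

# Welded contactor 0: commanded open, but feedback is forced to closed (1).
override_table = {0: {"active": True, "value": 1}}

welded = contactor_feedback(0, override_table, simulated_feedback)
normal = contactor_feedback(1, override_table, simulated_feedback)
```

Because the override is consulted first, feedback can disagree with the command, which is what a feedback-mismatch diagnostic needs in order to fire.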

8. 3D arrays in foxBMS database structs

Context: Patch injected pCV->cellVoltage_mV[s][c] = v; Mistake: Array is 3D: [BS_NR_OF_STRINGS][BS_NR_OF_MODULES_PER_STRING][BS_NR_OF_CELL_BLOCKS_PER_MODULE]. Fix: Use pCV->cellVoltage_mV[s][m][c] with module loop. Principle: Always check struct definition before writing array access. foxBMS uses 3D arrays even with 1 module.
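
A small Python model of the same [string][module][cell] layout, to show why the module index cannot be dropped even with one module. The dimension names mirror the foxBMS macros; the sizes are hypothetical.

```python
# Hypothetical sizes; the real values come from the battery system config.
BS_NR_OF_STRINGS = 1
BS_NR_OF_MODULES_PER_STRING = 1
BS_NR_OF_CELL_BLOCKS_PER_MODULE = 4


def make_cell_voltages(value_mv):
    """Build a [string][module][cell] table filled with value_mv."""
    return [
        [
            [value_mv for _ in range(BS_NR_OF_CELL_BLOCKS_PER_MODULE)]
            for _ in range(BS_NR_OF_MODULES_PER_STRING)
        ]
        for _ in range(BS_NR_OF_STRINGS)
    ]


def set_all(table, value_mv):
    """Write value_mv to every cell, looping over all three dimensions."""
    for s in range(BS_NR_OF_STRINGS):
        for m in range(BS_NR_OF_MODULES_PER_STRING):
            for c in range(BS_NR_OF_CELL_BLOCKS_PER_MODULE):
                table[s][m][c] = value_mv


voltages = make_cell_voltages(3700)
set_all(voltages, 4210)

# Two indices select a whole module row, not a single cell.
module_row = voltages[0][0]
```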

2026-03-21 — Test Execution Time

9. vECU restart dominates test time

Context: Each FATAL fault test triggers ERROR → must restart vECU (~8s startup). Mistake: Ran all tests sequentially with a restart between each. 200 tests × 8s = 1,600s ≈ 27 min of restart overhead. Principle: Sort tests to minimize restarts. Run all WARNING tests first (no restart needed), then all FATAL tests batched together.
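
A sketch of the ordering and the resulting restart overhead. It assumes a restart is needed only after a FATAL test; the 150/50 split between WARNING and FATAL tests is hypothetical.

```python
STARTUP_S = 8  # vECU startup time from the notes


def plan(tests):
    """Order WARNING tests before FATAL tests and count the vECU starts needed.

    Assumes one initial start, plus one restart after every FATAL test
    that is followed by another test.
    """
    # sorted() is stable, so tests keep their relative order within each group.
    ordered = sorted(tests, key=lambda t: t["severity"] == "FATAL")
    starts = 1 + sum(1 for t in ordered[:-1] if t["severity"] == "FATAL")
    return ordered, starts


# 200 interleaved tests: every fourth one is FATAL (50 FATAL, 150 WARNING).
tests = [
    {"name": f"t{i}", "severity": "FATAL" if i % 4 == 0 else "WARNING"}
    for i in range(200)
]

ordered, starts = plan(tests)
baseline_overhead_s = len(tests) * STARTUP_S  # restart before every test
planned_overhead_s = starts * STARTUP_S
```

Under these assumptions the overhead drops from 1,600s to 400s. The actual saving depends on the share of WARNING tests in the suite.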