Sim EMS: Implementation Roadmap (Python + FastAPI + PostgreSQL + React)
0) Executive Summary
Sim EMS is an Energy Management System (EMS) that combines:
- Real-time edge control (deterministic cycle, state machines, safety-first control)
- Enterprise analytics (multi-tenancy, hierarchy, reporting, billing, carbon)
- Predictive intelligence (forecasting + optimization via MPC)
This README defines:
- A phased roadmap from MVP to production
- The Python-first architecture for both Edge and Backend
- The algorithms (control, estimation, forecasting, optimization)
- The design patterns and standards used across the system
1) Canonical Tech Stack
1.1 Edge (On-Prem / Device)
Language: Python 3.11+
Runtime: Linux (Raspberry Pi / IPC), systemd
Control Loop: asyncio-based fixed-period cycle (deterministic execution order; cycle timing held within configured bounds)
Recommended libraries:
- Async: asyncio, anyio
- Messaging (optional): paho-mqtt, pyzmq
- Modbus: pymodbus
- CAN: python-can, cantools
- Local storage: sqlite3 (local config/state), optional duckdb
- Validation: pydantic
- Logging: Python logging with JSON formatter
1.2 Backend (Cloud / Data Platform)
API: FastAPI (Python 3.11+)
DB: PostgreSQL 15+ (TimescaleDB extension recommended for time-series)
Caching/Queue (optional): Redis + Celery/RQ
Recommended libraries:
- FastAPI + uvicorn
- Auth: python-jose, passlib, OIDC integration (enterprise)
- DB: SQLAlchemy + alembic (migrations)
- Async DB: asyncpg (or SQLAlchemy async)
- Observability: prometheus-client, OpenTelemetry (optional)
1.3 UI (Web)
Frontend: React 18 + TypeScript
Visualization: Apache ECharts (or Recharts)
State: Redux Toolkit (or React Query + Zustand)
1.4 Integration Fabric (Recommended)
- Telemetry ingestion: MQTT/HTTPS/WebSocket
- Command path: HTTPS + MQTT/WebSocket to edge
- Schema: versioned JSON payloads + strict validation
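As an illustration of "versioned JSON payloads + strict validation", the sketch below rejects unknown versions, unknown fields, and wrongly typed values. The field names (schema_version, device_id, points, channel, value, ts) and the single supported version are assumptions made for this example, not a defined Sim EMS contract. It uses only the standard library to stay self-contained; the stack above lists pydantic for this job.

```python
import json
from dataclasses import dataclass
from datetime import datetime

SUPPORTED_VERSIONS = (1,)  # assumption: only schema v1 exists so far


@dataclass(frozen=True)
class Point:
    channel: str
    value: float
    ts: datetime


def parse_telemetry(raw: str) -> list[Point]:
    """Validate one telemetry message; raise ValueError on any violation."""
    msg = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(msg, dict):
        raise ValueError("payload must be a JSON object")
    # Strict: no missing and no extra top-level fields.
    if set(msg) != {"schema_version", "device_id", "points"}:
        raise ValueError(f"unexpected or missing fields: {sorted(msg)}")
    version = msg["schema_version"]
    if type(version) is not int or version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")
    if not isinstance(msg["device_id"], str) or not isinstance(msg["points"], list):
        raise ValueError("device_id must be a string and points a list")
    points = []
    for p in msg["points"]:
        if not isinstance(p, dict) or set(p) != {"channel", "value", "ts"}:
            raise ValueError(f"malformed point: {p!r}")
        if not isinstance(p["channel"], str) or not isinstance(p["ts"], str):
            raise ValueError(f"malformed point: {p!r}")
        value = p["value"]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric value: {value!r}")
        # fromisoformat raises ValueError on a bad timestamp string.
        points.append(Point(p["channel"], float(value), datetime.fromisoformat(p["ts"])))
    return points
```

Every rejection path raises ValueError, so an ingestion endpoint can map that single exception type to a 4xx response.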
2) System Architecture (Python-First)
┌──────────────────────────────────────────────────────────────────────────┐
│                            BACKEND (FastAPI)                             │
│  • Auth/RBAC • Tenants • Org Hierarchy • Device Registry                 │
│  • Ingestion APIs • Reports • Billing • Carbon • Optimization Schedules  │
│  • WebSocket for Live Data • Admin APIs                                  │
└────────────────────────────────────┬─────────────────────────────────────┘
                                     │ TLS
                                     │
┌────────────────────────────────────▼─────────────────────────────────────┐
│                            DATA (PostgreSQL)                             │
│  • Metadata tables • Timeseries hypertables (TimescaleDB optional)       │
│  • RLS policies (multi-tenant) • Audit logs                              │
└────────────────────────────────────┬─────────────────────────────────────┘
                                     │
┌────────────────────────────────────▼─────────────────────────────────────┐
│                        EDGE (Python Control Core)                        │
│  • IPO cycle + Process Image + Scheduler                                 │
│  • Controllers (Strategy plugins)                                        │
│  • Bridges (Modbus/CAN/MQTT/HTTP/OCPP)                                   │
│  • Local buffer • Offline-first                                          │
└────────────────────────────────────┬─────────────────────────────────────┘
                                     │
┌────────────────────────────────────▼─────────────────────────────────────┐
│                             PHYSICAL DEVICES                             │
└──────────────────────────────────────────────────────────────────────────┘
3) Design Patterns (What We Use and Why)
3.1 Edge Control Patterns
- IPO Cycle (Input→Process→Output)
  - Ensures predictable read/compute/write sequencing.
- Process Image (Immutable Snapshot)
  - Inputs write to next_value; at cycle start, the snapshot swaps into value.
  - Controllers read only from the frozen snapshot.
- Priority Scheduler
  - Controllers run in strict order: Manual override → Safety → Grid → Optimization → Aux.
- State Machine (Hierarchical FSM)
  - Top-level states: INIT, STANDBY, RUN, FAULT, SHUTDOWN.
  - Control-mode states: PEAK_SHAVING, SELF_CONSUMPTION, DROOP, BACKUP, ISLAND.
- Strategy Pattern (Controllers)
  - Each control strategy implements a common interface (Controller.execute(ctx)).
- Adapter Pattern (Device Drivers)
  - Protocol-specific mapping to standardized device interfaces (ESS, Meter, PV, EVCS).
- Circuit Breaker + Retry/Backoff (Bridges)
  - Prevents cascading failures from unstable devices.
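The process image and priority scheduler can be sketched in a few lines. This is a minimal illustration, not the Sim EMS implementation: controllers are plain callables rather than Controller.execute(ctx) objects, and conflicts are resolved "first writer wins" (a higher-priority controller claims a setpoint and later ones cannot overwrite it). That conflict rule is an assumption; the roadmap only fixes the execution order.

```python
from enum import IntEnum
from types import MappingProxyType


class Priority(IntEnum):
    MANUAL = 0
    SAFETY = 1
    GRID = 2
    OPTIMIZATION = 3
    AUX = 4


class ProcessImage:
    def __init__(self):
        self._next = {}                    # written by bridges (the next_value side)
        self.value = MappingProxyType({})  # frozen snapshot read by controllers

    def write_input(self, channel, value):
        self._next[channel] = value

    def swap(self):
        # Copy, then wrap read-only: controllers cannot mutate the snapshot.
        self.value = MappingProxyType(dict(self._next))
        return self.value


def run_cycle(image, controllers):
    """One IPO cycle: freeze inputs, run controllers by priority, return setpoints."""
    snapshot = image.swap()
    setpoints = {}
    for _prio, controller in sorted(controllers, key=lambda pc: pc[0]):
        for channel, value in controller(snapshot).items():
            setpoints.setdefault(channel, value)  # assumed rule: first writer wins
    return setpoints


# Two hypothetical controllers for illustration.
def safety(snapshot):
    # Block discharge when SOC is below 10 %.
    return {"ess_kw": 0.0} if snapshot["soc"] < 10.0 else {}


def optimizer(snapshot):
    return {"ess_kw": 50.0}
```

With soc = 5.0 the safety controller claims ess_kw first and the optimizer's request is ignored; with soc = 60.0 safety stays silent and the optimizer's 50.0 kW goes through.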
3.2 Backend Patterns
- API Gateway (FastAPI)
  - Consolidates authentication, authorization, and routing.
- Domain-Driven Boundaries
  - Separate domains: telemetry, billing, carbon, reporting, optimization.
- Event-Driven Processing (Optional)
  - Ingestion triggers aggregation jobs, anomaly checks, report scheduling.
- Multi-Tenant Isolation
  - Recommended: PostgreSQL Row-Level Security (RLS) + tenant_id everywhere.
- CQRS (Optional at scale)
  - Separate write model (ingestion) from read model (dashboards/reports).
4) Algorithms (Control, Estimation, Forecasting, Optimization)
This section lists the canonical algorithms Sim EMS uses. Every algorithm must be:
- bounded-time on edge (or moved to backend)
- observable (metrics)
- testable (scenario replay)
4.1 Control Algorithms (Edge)
- Peak Shaving (Reactive + Hysteresis + Ramp Limit)
  - Objective: keep grid import below threshold to reduce demand charges.
  - Key elements:
    - hysteresis band to avoid oscillation
    - ramp-rate limiting to reduce stress
- Self-Consumption Balancing (Grid-Zero / Export-Limited)
  - Objective: maximize onsite PV usage by charging/discharging BESS.
- Frequency Droop Control (Primary Frequency Support)
  - Objective: respond to frequency deviations.
  - Enhancements:
    - deadband
    - SOC reserve window
- Virtual Inertia (Optional)
  - Adds dF/dt component for fast frequency response.
- Reactive Power / Power Factor Control
  - Objective: maintain PF targets or voltage support where supported.
- Backup Reserve + Island Mode Sequencing
  - Maintains SOC floor for backup.
  - Island sequences: detect grid loss, isolate, stabilize, resynchronize.
- PID / Ramp Tracking (Optional)
  - For smooth setpoint tracking where inverter dynamics require it.
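Two of the controllers above, sketched under stated assumptions. PeakShaver engages above the threshold, releases only below threshold minus hysteresis, and ramp-limits the setpoint; while active it shaves down to the release level, which is one possible reading of "hysteresis band". droop_power_kw applies a deadband and an SOC reserve window. Sign convention (positive = discharge), parameter names, and default values are all illustrative, not Sim EMS settings.

```python
from dataclasses import dataclass


@dataclass
class PeakShaver:
    threshold_kw: float        # engage when site load exceeds this
    hysteresis_kw: float       # release only below threshold - hysteresis
    ramp_kw_per_cycle: float   # max setpoint change per control cycle
    max_discharge_kw: float    # inverter discharge limit
    active: bool = False
    setpoint_kw: float = 0.0   # positive = discharge (assumed convention)

    def step(self, site_load_kw: float) -> float:
        """site_load_kw: site demand before any battery contribution."""
        release_kw = self.threshold_kw - self.hysteresis_kw
        if site_load_kw > self.threshold_kw:
            self.active = True
        elif site_load_kw < release_kw:
            self.active = False
        # While active, shave down to the release level (margin below threshold).
        target = 0.0
        if self.active:
            target = min(max(site_load_kw - release_kw, 0.0), self.max_discharge_kw)
        # Ramp limit: move toward the target by at most ramp_kw_per_cycle.
        change = max(-self.ramp_kw_per_cycle,
                     min(self.ramp_kw_per_cycle, target - self.setpoint_kw))
        self.setpoint_kw += change
        return self.setpoint_kw


def droop_power_kw(freq_hz, nominal_hz=50.0, deadband_hz=0.02, droop=0.05,
                   rated_kw=100.0, soc=50.0, soc_min=20.0, soc_max=80.0):
    """Droop response: under-frequency -> discharge (positive), over-frequency -> charge."""
    dev = freq_hz - nominal_hz
    if abs(dev) <= deadband_hz:
        return 0.0
    # Remove the deadband so the response starts from zero at its edge.
    dev -= deadband_hz if dev > 0 else -deadband_hz
    # A droop of 5 % means full rated power at a 5 % frequency deviation.
    power = -dev / (droop * nominal_hz) * rated_kw
    power = max(-rated_kw, min(rated_kw, power))
    # SOC reserve window: no discharge below soc_min, no charge above soc_max.
    if power > 0 and soc <= soc_min:
        return 0.0
    if power < 0 and soc >= soc_max:
        return 0.0
    return power
```

Feeding the same 95 kW load to a PeakShaver with a 100 kW threshold and 10 kW hysteresis yields a nonzero setpoint while it is active and zero once it has released, which is the chatter-avoidance the hysteresis band is there for.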
4.2 Battery State Estimation (Edge + Backend)
- SOC by Coulomb Counting
  - Integrate current over time; drift corrected periodically.
- OCV Correction (Rest-State Recalibration)
  - Only valid when current ~0 and battery at rest.
- Extended Kalman Filter (EKF) + Equivalent Circuit Model (ECM)
  - Fuses current + voltage + temperature to reduce drift.
- SOH Estimation (Backend-first)
  - Physics-informed ML (e.g., XGBoost) using:
    - cycle count (DoD-weighted)
    - calendar aging features
    - temperature stress
    - C-rate stress
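Coulomb counting with rest-state OCV recalibration can be sketched as below. The OCV table holds made-up illustrative values (not from any datasheet), and the rest criteria (current below 0.5 A for 30 minutes) are assumptions chosen for the example. The EKF/ECM variant is not shown.

```python
from bisect import bisect_left

# Hypothetical (open-circuit voltage, SOC %) pairs; illustrative values only.
OCV_TABLE = [(3.0, 0.0), (3.3, 20.0), (3.6, 50.0), (3.9, 80.0), (4.2, 100.0)]


def soc_from_ocv(voltage_v: float) -> float:
    """Linear interpolation in OCV_TABLE, clamped at both ends."""
    volts = [v for v, _ in OCV_TABLE]
    if voltage_v <= volts[0]:
        return OCV_TABLE[0][1]
    if voltage_v >= volts[-1]:
        return OCV_TABLE[-1][1]
    i = bisect_left(volts, voltage_v)
    (v0, s0), (v1, s1) = OCV_TABLE[i - 1], OCV_TABLE[i]
    return s0 + (s1 - s0) * (voltage_v - v0) / (v1 - v0)


class SocEstimator:
    def __init__(self, capacity_ah, soc_pct, rest_current_a=0.5, rest_seconds=1800.0):
        self.capacity_ah = capacity_ah
        self.soc_pct = soc_pct
        self.rest_current_a = rest_current_a  # |I| below this counts as "at rest"
        self.rest_seconds = rest_seconds      # required rest time before OCV is trusted
        self._rest_s = 0.0

    def update(self, current_a, voltage_v, dt_s):
        """current_a > 0 means discharge (assumed sign convention)."""
        # Coulomb counting: integrate current over the time step.
        self.soc_pct -= 100.0 * current_a * dt_s / 3600.0 / self.capacity_ah
        self.soc_pct = min(100.0, max(0.0, self.soc_pct))
        # Track how long the cell has been at rest.
        if abs(current_a) <= self.rest_current_a:
            self._rest_s += dt_s
        else:
            self._rest_s = 0.0
        # OCV correction is only applied after a sufficiently long rest.
        if self._rest_s >= self.rest_seconds:
            self.soc_pct = soc_from_ocv(voltage_v)
        return self.soc_pct
```

Between rest periods the estimate drifts with current-sensor error; each qualifying rest snaps it back to the OCV-derived value.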
4.3 Forecasting (Backend)
- Persistence Baseline
  - Very short horizon forecast.
- Similarity Model (kNN)
  - Finds similar historical days based on time/season/weather features.
- LSTM (24h Load/PV Forecast)
  - Learns non-linear time dependencies.
- Ensemble
  - Weighted combination of persistence + kNN + LSTM.
- Price Ingestion + Prediction
  - Day-ahead market ingestion; predictive model optional.
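The persistence baseline and the weighted ensemble are simple enough to show directly. In this sketch the weights are supplied by the caller; how Sim EMS would choose them (for example from recent forecast error) is not specified in this roadmap, and the model names are placeholders.

```python
def persistence_forecast(history: list[float], horizon: int) -> list[float]:
    """Persistence baseline: repeat the last observed value over the horizon."""
    if not history:
        raise ValueError("history must not be empty")
    return [history[-1]] * horizon


def ensemble(forecasts: dict[str, list[float]], weights: dict[str, float]) -> list[float]:
    """Weighted average of several forecasts; weights are normalized internally."""
    if not forecasts:
        raise ValueError("at least one forecast is required")
    horizon = len(next(iter(forecasts.values())))
    if any(len(f) != horizon for f in forecasts.values()):
        raise ValueError("all forecasts must cover the same horizon")
    total = sum(weights[name] for name in forecasts)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(weights[name] * series[t] for name, series in forecasts.items()) / total
        for t in range(horizon)
    ]
```

For example, combining a persistence forecast of [10, 10] (weight 1) with a kNN forecast of [20, 30] (weight 3) gives [17.5, 25.0].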
4.4 Optimization (Backend + Edge Execution)
- Model Predictive Control (MPC)
  - Horizon: 24h, step: 15 min.
  - Objective options:
    - minimize cost
    - reduce demand charges
    - reduce carbon
    - reduce battery degradation
- Real-Time Constraint Arbitration (Edge)
  - Applies hard constraints and caps schedule tracking.
  - Optional linear solver:
    - resolves conflicting requests from controllers
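The edge-side arbitration step can be as small as a clamp: the schedule proposes a power setpoint and hard constraints cap it. The sketch below assumes positive = discharge and uses hypothetical limit names; it does not include the optional linear solver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_charge_kw: float     # magnitude of the charge limit
    max_discharge_kw: float  # magnitude of the discharge limit
    soc_min: float           # % below which discharge is blocked
    soc_max: float           # % above which charge is blocked


def arbitrate(schedule_kw: float, soc_pct: float, limits: Limits) -> float:
    """Cap a scheduled setpoint (positive = discharge) to the hard constraints."""
    lo, hi = -limits.max_charge_kw, limits.max_discharge_kw
    if soc_pct <= limits.soc_min:
        hi = 0.0  # no discharge at or below the SOC floor
    if soc_pct >= limits.soc_max:
        lo = 0.0  # no charge at or above the SOC ceiling
    return max(lo, min(hi, schedule_kw))
```

Because the clamp runs on the edge every cycle, a stale or infeasible MPC schedule can degrade tracking but cannot push the setpoint outside the configured limits.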
5) Data Model (PostgreSQL)
5.1 Core Tables (Conceptual)
- tenants
- users, roles, permissions
- org_nodes (enterprise/site/building/floor/space/equipment)
- devices (type, vendor, protocol, config)
- channels (device points: unit, scaling, metadata)
- measurements (time-series; Timescale hypertable recommended)
- events (alarms, faults, operator actions)
- tariffs, billing_runs, invoices
- carbon_factors, emissions
- commands (backend → edge), command_acks
- audit_log (immutable)
5.2 Multi-Tenant Strategy
- Add tenant_id to every row that is tenant-owned.
- Use PostgreSQL RLS policies to enforce isolation.
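A sketch of what the RLS setup could look like, written as a Python helper that emits the PostgreSQL DDL (for use in a migration, for instance). The policy name tenant_isolation, the session setting app.tenant_id, and the uuid type for tenant_id are assumptions for this example.

```python
def rls_statements(table: str, setting: str = "app.tenant_id") -> list[str]:
    """Return DDL that restricts a table to rows of the current tenant.

    Assumes the table has a uuid tenant_id column and that the application
    sets the session variable named by `setting` for each request.
    """
    if not table.isidentifier():
        # Guard against injecting SQL through the table name.
        raise ValueError(f"invalid table name: {table!r}")
    predicate = f"tenant_id = current_setting('{setting}')::uuid"
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # FORCE applies the policy to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING ({predicate}) WITH CHECK ({predicate});",
    ]
```

At request time the backend would set the tenant inside the transaction, for example with SELECT set_config('app.tenant_id', '<tenant uuid>', true), so the setting is scoped to that transaction only.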
6) Roadmap (Phases, Deliverables, Acceptance Criteria)
Time estimates are indicative. You can compress/expand based on team size.
Phase 0: Repo & Foundations (Weeks 1–2)
Deliverables:
- Repo structure (edge/backend/ui)
- Local dev stack (docker compose: postgres + backend + ui)
- CI pipeline (lint/test/build)
Acceptance:
- backend starts, connects to Postgres
- ui builds and loads
- health endpoints and basic logging exist
Phase 1: Edge MVP + Telemetry (Weeks 3–6)
Scope:
- Edge IPO cycle + Process Image + channel model
- One bridge: Modbus TCP
- One device mapping: Meter + ESS minimal
- Local buffer + offline-first queue
- Backend ingestion API + time-series storage
Acceptance:
- Stable cycle timing within configured bounds
- Telemetry stored in PostgreSQL and visible in UI
- Basic alarms for stale data
Phase 2: Safety + Core Controllers (Weeks 7–10)
Scope:
- Safety controllers: SOC limits, temp derating, voltage protection
- Peak shaving controller (hysteresis + ramp)
- Manual override path
- Fault state machine and recovery procedures
Acceptance:
- Safety limits always override optimization
- Faults latch and require explicit clear
- Setpoints never violate configured hard constraints
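The "faults latch and require explicit clear" criterion can be expressed as a small state machine over the top-level states. The transition table below is an assumption made for illustration (the roadmap names the states but not the allowed transitions), as is the rule that a fault clears to STANDBY.

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    STANDBY = auto()
    RUN = auto()
    FAULT = auto()
    SHUTDOWN = auto()


# Assumed transitions for normal requests; FAULT is entered only via trip().
ALLOWED = {
    State.INIT: {State.STANDBY},
    State.STANDBY: {State.RUN, State.SHUTDOWN},
    State.RUN: {State.STANDBY, State.SHUTDOWN},
    State.FAULT: {State.SHUTDOWN},
    State.SHUTDOWN: set(),
}


class TopLevelFsm:
    def __init__(self):
        self.state = State.INIT
        self.fault_reason = None

    def request(self, target: State) -> bool:
        """Normal transition request; refused while a fault is latched."""
        if target not in ALLOWED[self.state]:
            return False
        self.state = target
        return True

    def trip(self, reason: str) -> None:
        """Latch a fault from any state except SHUTDOWN."""
        if self.state is not State.SHUTDOWN:
            self.state = State.FAULT
            self.fault_reason = reason

    def clear_fault(self, condition_healthy: bool) -> bool:
        """Explicit operator clear; only succeeds once the cause is gone."""
        if self.state is State.FAULT and condition_healthy:
            self.state = State.STANDBY
            self.fault_reason = None
            return True
        return False
```

A request for RUN while in FAULT returns False, so neither an optimizer nor a schedule can restart the system; only clear_fault with a healthy condition does.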
Phase 3: Enterprise Core (Weeks 11–16)
Scope:
- Multi-tenancy + RBAC
- Org hierarchy
- Device registry + provisioning workflow
- Reporting basics (daily/weekly energy summaries)
Acceptance:
- Tenant isolation verified
- Role restrictions enforced
- Reports reproducible with audit trails
Phase 4: Billing + Carbon (Weeks 17–22)
Scope:
- Tariff engine (TOU + demand charge baseline)
- Invoices/export
- Carbon factors + emissions reports
Acceptance:
- Billing results match test fixtures
- Carbon reports are traceable and consistent
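A minimal sketch of a TOU + demand-charge calculation, to make "billing results match test fixtures" concrete. It assumes 15-minute average-power intervals, a single daily peak window (16:00 to 20:59), non-negative import, and a demand charge on the highest interval in the billing period; real tariffs vary. Decimal is used so fixture comparisons are exact.

```python
from datetime import datetime
from decimal import Decimal

PEAK_HOURS = range(16, 21)  # assumption: 16:00-20:59 local time is the peak window


def compute_bill(intervals, peak_rate, offpeak_rate, demand_rate):
    """Compute energy and demand charges.

    intervals: iterable of (start: datetime, avg_kw: Decimal), 15-minute slots.
    peak_rate / offpeak_rate: currency per kWh.
    demand_rate: currency per kW of the highest interval in the period.
    """
    energy_cost = Decimal("0")
    peak_kw = Decimal("0")
    for start, avg_kw in intervals:
        kwh = avg_kw * Decimal("0.25")  # 15 minutes = 0.25 h
        rate = peak_rate if start.hour in PEAK_HOURS else offpeak_rate
        energy_cost += kwh * rate
        peak_kw = max(peak_kw, avg_kw)
    demand_cost = peak_kw * demand_rate
    return {"energy": energy_cost, "demand": demand_cost,
            "total": energy_cost + demand_cost}
```

A golden test then pins a small interval set to an expected invoice total, and any tariff-engine change that alters the result fails the fixture.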
Phase 5: Forecasting + MPC (Weeks 23–30)
Scope:
- Forecasting service (baseline + kNN + LSTM)
- MPC schedule generation (CVXPY)
- Edge schedule tracker controller
Acceptance:
- MPC produces feasible schedules under constraints
- Edge follows schedule but never violates safety
- Drift monitoring for forecasts
Phase 6: EVCS + Advanced Grid Services (Weeks 31–38)
Scope:
- OCPP integration
- Smart charging + surplus charging
- Droop + virtual inertia options
Acceptance:
- EVCS cluster stays within site power cap
- Grid services respond within configured time
Phase 7: Production Hardening (Weeks 39+)
Scope:
- Full observability (metrics, tracing)
- Security hardening (cert auth, secrets mgmt)
- Load tests + failure injection
- Upgrade/rollback strategy
Acceptance:
- SLOs defined and measured
- Documented operational runbooks
- Disaster recovery and backup tested
7) Testing & Validation Strategy
Edge
- Unit tests for controllers
- Scenario replay tests (recorded telemetry → deterministic outputs)
- Hardware-in-the-loop (HIL) where possible
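A scenario replay test feeds recorded frames through a freshly constructed controller and compares the outputs to a stored golden sequence. The sketch below uses a hypothetical ramp-limiter controller and an inline recording; in practice the frames and golden outputs would come from files.

```python
import json


def replay(controller_factory, frames):
    """Run recorded frames through a fresh controller and collect its outputs."""
    controller = controller_factory()  # fresh state for every replay
    return [controller(frame) for frame in frames]


def make_ramp_limiter(max_step=10.0):
    """Hypothetical controller: follow target_kw by at most max_step per frame."""
    last = 0.0

    def step(frame):
        nonlocal last
        last += max(-max_step, min(max_step, frame["target_kw"] - last))
        return last

    return step


# Inline stand-ins for a recorded telemetry file and its golden outputs.
RECORDED = json.loads(
    '[{"target_kw": 25.0}, {"target_kw": 25.0}, {"target_kw": 25.0}, {"target_kw": 0.0}]'
)
GOLDEN = [10.0, 20.0, 25.0, 15.0]
```

Determinism here means two things: replay(make_ramp_limiter, RECORDED) equals GOLDEN, and running it twice yields identical results because no state leaks between runs.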
Backend
- Contract tests for ingestion payloads
- Billing fixtures and golden tests
- Multi-tenant security tests (RLS)
UI
- Component tests
- End-to-end flows for provisioning, dashboards, billing
8) "Add New Feature" Rules (Engineering Governance)
Every new feature must define:
- type (controller/bridge/backend/ml/ui)
- owner and acceptance criteria
- safety impact analysis
- test plan
- observability (metrics + logs)
- fallback behavior
9) Next Actions
Suggested follow-up artifacts:
- a matching folder scaffold (edge/backend/ui) aligned to this roadmap
- an initial API contract (OpenAPI) for ingestion + commands
- a minimal database schema (Alembic migrations)