SimTestLab EMS - System Architecture
Overview
SimTestLab EMS uses a microservices architecture with clear separation of concerns:
- Django: User management, authentication, RBAC, and admin operations
- Next.js: Data visualization, dashboards, and user-facing application
- FastAPI: Real-time WebSocket connections and telemetry streaming
- Edge Controllers: Device control logic and telemetry collection
Key Design Principles: Separation of concerns, admin-first design, real-time performance, independent scalability, and Docker-based development
Technology Stack
Backend Services
| Service | Technology | Port | Purpose |
|---|---|---|---|
| User Management | Django 6.0 + DRF | 8000 | Authentication, users, organizations, permissions |
| WebSocket Service | FastAPI + uvicorn | 8001 | Real-time data streaming, WebSocket connections |
| Edge Controller | TBD (Python) | TBD | Device control, telemetry collection |
Frontend
| Component | Technology | Port | Purpose |
|---|---|---|---|
| Visualization App | Next.js 16 + React 19 | 3000 | Dashboards, charts, user interface |
Infrastructure
| Service | Technology | Port | Purpose |
|---|---|---|---|
| Database | PostgreSQL 16 | 5432 | User data, organizations, sites, metadata |
| Cache/PubSub | Redis 7 | 6379 | Session storage, caching, real-time message broker |
| Time-Series DB | TimescaleDB (optional) | 5433 | Historical telemetry data (future) |
Development Tools
- Docker Compose: Local development orchestration
- TypeScript: Type safety in frontend
- OpenAPI: API documentation and client generation
- SWR: Frontend data fetching and caching
Architecture Diagram
┌──────────────────────────────────────────────────────────────────────┐
│                        Docker Compose Network                        │
│                                                                      │
│  ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐  │
│  │ Next.js          │   │ Django           │   │ FastAPI          │  │
│  │ Frontend         │   │ Backend          │   │ WebSocket        │  │
│  │ Port: 3000       │   │ Port: 8000       │   │ Port: 8001       │  │
│  │                  │   │                  │   │                  │  │
│  │ • Dashboards     │   │ • User Auth      │   │ • WebSockets     │  │
│  │ • Data Viz       │◄─►│ • RBAC           │◄──│ • Streaming      │  │
│  │ • Charts         │   │ • Admin Panel    │   │ • Real-time      │  │
│  │ • User Profile   │   │ • JWT Tokens     │   │ • PubSub         │  │
│  │                  │   │ • Organizations  │   │                  │  │
│  └────────┬─────────┘   └────────┬─────────┘   └────────┬─────────┘  │
│           │                      │                      │            │
│           │ REST API             │ JWT Validation       │            │
│           └──────────────────────┤ (on WS connect)      │            │
│                                  │                      │            │
│                    ┌─────────────▼───────┐  ┌───────────▼─────────┐  │
│                    │ PostgreSQL          │  │ Redis               │  │
│                    │ Port: 5432          │  │ Port: 6379          │  │
│                    │                     │  │                     │  │
│                    │ • Users             │  │ • Permission Cache  │  │
│                    │ • Organizations     │  │ • PubSub Channels   │  │
│                    │ • Sites             │  │ • WebSocket State   │  │
│                    │ • Roles/Permissions │  │ • Telemetry Stream  │  │
│                    └─────────────────────┘  └─────────────────────┘  │
│                                                        ▲             │
└────────────────────────────────────────────────────────┼─────────────┘
                                                         │
                                           Publishes telemetry to Redis
                                                         │
                                            ┌────────────┴────────────┐
                                            │ Edge Controllers        │
                                            │ (Future Development)    │
                                            │                         │
                                            │ • Device Control        │
                                            │ • Telemetry Collection  │
                                            │ • Local Processing      │
                                            └─────────────────────────┘
Service Responsibilities
1. Django Backend (User Management)
Primary Responsibility: User authentication, authorization, and administrative operations
Core Features:
- User authentication (JWT-based)
- Role-based access control (RBAC)
- Multi-tenant organization management
- Site hierarchy and access control
- Django Admin panel for operations
- Audit logging and security policies
Key Endpoints: Authentication, user management, organizations, sites, permissions
Data Models: User, Organization, Site, Role, SiteAccess, AuditLog
Communication with FastAPI:
Django provides authentication services to FastAPI but does NOT handle telemetry streaming:
- JWT Validation Endpoint:
  - FastAPI calls Django's /api/validate-token when a WebSocket connects
  - Django validates the JWT signature, checks expiry, and verifies user status
  - Returns user permissions and accessible sites list
  - This is the only synchronous communication between the services
- Permission Change Events:
  - When an admin modifies user roles/permissions in Django Admin
  - Django publishes an event to Redis: PUBLISH user:456:permission_changed
  - FastAPI listens to these events and invalidates its permission cache
  - Asynchronous, event-driven (no direct HTTP call)
What Django Does NOT Do:
- ❌ Handle WebSocket connections (that's FastAPI's job)
- ❌ Stream telemetry data (that flows through Redis → FastAPI)
- ❌ Validate permissions for every message (FastAPI uses cached permissions)
- ❌ Store telemetry data (that goes to TimescaleDB in future, or stays in Redis)
Django's Focused Role: User/org/site management, authentication, and admin operations only
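The validation step described above (signature, expiry, user status, then permissions and accessible sites) can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: it assumes HS256-signed tokens, a `sub` claim carrying the user ID, and an in-memory `active_users` dict standing in for the Django user tables. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_token(claims: dict, secret: bytes) -> str:
    """Build an HS256 token (only used here to produce sample input)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def validate_token(token: str, secret: bytes, active_users: dict,
                   now: Optional[float] = None) -> Optional[dict]:
    """Return the user's role and accessible sites, or None if the token is rejected."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError:  # wrong segment count, bad base64, or bad JSON
        return None
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":  # assumption: only HS256 is accepted
        return None

    # 1. Signature check (constant-time comparison).
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None

    # 2. Expiry check.
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return None

    # 3. User status check, then return permissions and accessible sites.
    user = active_users.get(claims.get("sub"))
    if user is None or not user["is_active"]:
        return None
    return {"user_id": claims["sub"], "role": user["role"], "accessible_sites": user["sites"]}
```

In a real deployment a maintained JWT library would normally replace the hand-rolled parsing; the sketch only makes the three checks explicit.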
2. FastAPI WebSocket Service (Real-time)
Primary Responsibility: Real-time data streaming and WebSocket connections
Core Features:
- WebSocket connections for live telemetry
- Redis PubSub subscription and broadcasting
- JWT validation and permission caching
- Low-latency data delivery
Communication with Django:
FastAPI communicates with Django only during initial WebSocket connection, not during data streaming:
1. Initial Connection (ONE-TIME Django Call):
   - Client connects to FastAPI WebSocket with JWT token
   - FastAPI makes HTTP GET request to Django: /api/validate-token
   - Django validates JWT signature, expiry, and returns user permissions
   - FastAPI caches {user_id: accessible_sites, role, permissions} in Redis (5-min TTL)
   - Connection accepted/rejected based on site access
2. Ongoing Streaming (NO Django):
   - Edge Controller → Redis: PUBLISH telemetry:site_123 {data}
   - FastAPI receives via Redis PubSub subscription
   - FastAPI checks cached permissions from Redis (not Django)
   - FastAPI broadcasts to authorized WebSocket clients
   - Django is NOT in this critical path
3. Permission Updates (EVENT-DRIVEN):
   - Admin changes user role in Django Admin
   - Django → Redis: PUBLISH user:456:permission_changed
   - FastAPI receives event, invalidates cache, notifies client
   - Client reconnects (goes back to step 1)
Why This Separation?
- Performance: No HTTP overhead during streaming (10,000+ msg/sec possible)
- Scalability: FastAPI handles WebSockets independently without Django bottleneck
- Resilience: Django can restart without dropping WebSocket connections
- Industry Standard: Pattern used by OpenEMS, MyEMS, AWS IoT Core
Key Principle: Event-driven communication via Redis, not request-driven. FastAPI validates once, then streams independently based on cached permissions
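The "validate once, then stream from cache" idea can be illustrated with a small TTL cache. This is a sketch under stated assumptions: an in-memory dict stands in for Redis, the 5-minute TTL comes from the text above, and the class and function names (`PermissionCache`, `can_receive`) are hypothetical.

```python
import time
from typing import Callable, Optional


class PermissionCache:
    """In-memory stand-in for the Redis permission cache (default 5-minute TTL)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict = {}  # user_id -> (expires_at, permissions)

    def put(self, user_id, permissions: dict) -> None:
        """Store the result of the one-time validation call."""
        self._entries[user_id] = (self._clock() + self._ttl, permissions)

    def get(self, user_id) -> Optional[dict]:
        """Return cached permissions, or None if missing or expired."""
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        expires_at, permissions = entry
        if self._clock() >= expires_at:
            del self._entries[user_id]
            return None
        return permissions

    def invalidate(self, user_id) -> None:
        """Called when a permission-changed event arrives for this user."""
        self._entries.pop(user_id, None)


def can_receive(cache: PermissionCache, user_id, site_id: str) -> bool:
    """Per-message check that never touches Django."""
    permissions = cache.get(user_id)
    return permissions is not None and site_id in permissions["accessible_sites"]
```

A cache miss (expired or invalidated entry) is treated as "not authorized" here; whether the service should instead re-validate against Django at that point is a design decision the text leaves open.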
3. Next.js Frontend (Visualization)
Primary Responsibility: User interface, data visualization, and user experience
Core Features:
- Real-time dashboards with Recharts
- Multi-site hierarchy navigation
- Alert management and historical data analysis
- Clean Architecture (Domain/Application/Infrastructure/Presentation layers)
Integration:
- REST API calls to Django (authentication, user data, historical queries)
- WebSocket connections to FastAPI (real-time telemetry)
- JWT token management with automatic refresh
4. Edge Controllers (Future Development)
Primary Responsibility: Device control and telemetry collection at the edge
Core Features: Device communication (Modbus, BACnet, MQTT), local control logic, telemetry aggregation, offline buffering, Redis publishing
Architecture Decisions: Separate from Django for performance, independent deployment (on-site or cloud), autonomous operation during cloud outages, periodic metadata sync with Django
User Roles & Access Control
User Types
1. Application Users
Purpose: End users who consume the EMS application
Roles:
a) Viewer
- View dashboards and reports
- See sites they have access to
- Cannot modify settings
- Read-only access
b) Operator
- All Viewer permissions
- Acknowledge alerts
- Export reports
- Control devices (future)
- Cannot manage users
c) Site Manager
- All Operator permissions
- Manage site-specific settings
- Add/remove operators for their sites
- Configure site parameters
- Cannot access other sites
d) Organization Admin
- All Site Manager permissions
- Access all sites in their organization
- Manage organization users
- View organization-wide analytics
- Cannot access other organizations
2. Admin/Developer Users
Purpose: Maintain and manage the application itself
Roles:
a) Platform Admin (Django Superuser)
- Full access to Django Admin
- Create/delete organizations
- Assign any role to any user
- View all audit logs
- System-wide configuration
- Database access (via admin panel)
b) Developer
- Access to development tools
- API documentation
- Test data generation
- Staging environment access
- Cannot access production user data (unless also Platform Admin)
Permission Matrix
| Action | Viewer | Operator | Site Mgr | Org Admin | Platform Admin |
|---|---|---|---|---|---|
| View dashboards | ✅ | ✅ | ✅ | ✅ | ✅ |
| Export reports | ❌ | ✅ | ✅ | ✅ | ✅ |
| Acknowledge alerts | ❌ | ✅ | ✅ | ✅ | ✅ |
| Control devices | ❌ | ✅* | ✅ | ✅ | ✅ |
| Manage site settings | ❌ | ❌ | ✅ | ✅ | ✅ |
| Add site users | ❌ | ❌ | ✅ | ✅ | ✅ |
| Manage org users | ❌ | ❌ | ❌ | ✅ | ✅ |
| Access other orgs | ❌ | ❌ | ❌ | ❌ | ✅ |
| Django Admin access | ❌ | ❌ | ❌ | ❌ | ✅ |
*Operators may have limited control permissions based on site configuration
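Because each application role is described as including all permissions of the role below it, the role definitions can be modelled as an ordered enum plus a minimum role per action. The sketch below is illustrative only: the action identifiers are hypothetical names derived from the role descriptions, and the per-site limits on operator device control are not modelled.

```python
from enum import IntEnum


class Role(IntEnum):
    """Roles ordered so that a higher value includes all lower-role permissions."""
    VIEWER = 1
    OPERATOR = 2
    SITE_MANAGER = 3
    ORG_ADMIN = 4
    PLATFORM_ADMIN = 5


# Minimum role required for each action (hypothetical action identifiers).
MIN_ROLE = {
    "view_dashboards": Role.VIEWER,
    "export_reports": Role.OPERATOR,
    "acknowledge_alerts": Role.OPERATOR,
    "control_devices": Role.OPERATOR,  # may be further limited by site configuration
    "manage_site_settings": Role.SITE_MANAGER,
    "add_site_users": Role.SITE_MANAGER,
    "manage_org_users": Role.ORG_ADMIN,
    "access_other_orgs": Role.PLATFORM_ADMIN,
    "django_admin_access": Role.PLATFORM_ADMIN,
}


def is_allowed(role: Role, action: str) -> bool:
    """True if the role meets the minimum level for the action."""
    return role >= MIN_ROLE[action]
```

For example, `is_allowed(Role.OPERATOR, "export_reports")` is true while `is_allowed(Role.SITE_MANAGER, "manage_org_users")` is false. Django's built-in groups and permissions would be the natural home for this in the backend; the enum only makes the hierarchy explicit.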
Multi-Tenant Architecture
Organization Hierarchy:
Platform (SimTestLab EMS)
└── Organization (e.g., "Acme Corp")
    ├── Site 1 (e.g., "Manufacturing Plant A")
    │   ├── Building 1
    │   │   ├── HVAC System
    │   │   └── Lighting
    │   └── Building 2
    └── Site 2 (e.g., "Warehouse B")
        └── Solar Array
Access Control:
- Users belong to one or more organizations
- Site access is explicitly granted (not inherited)
- Organization admins can only manage their org's users
- Platform admins can see everything (via Django Admin)
Data Isolation:
- Database-level: WHERE organization_id = user.organization_id
- API-level: Django middleware enforces organization boundaries
- WebSocket-level: FastAPI validates site access before streaming
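The isolation rules above combine two checks: the organization boundary and the explicit site grant. A minimal sketch, using plain dataclasses in place of Django models and assuming a single `organization_id` per user as in the database-level filter shown; all type and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Site:
    id: int
    organization_id: int
    name: str


@dataclass(frozen=True)
class User:
    id: int
    organization_id: int
    granted_site_ids: FrozenSet[int]  # explicit grants; nothing is inherited


def visible_sites(user: User, sites: List[Site]) -> List[Site]:
    """Apply the organization filter first, then the explicit site grant."""
    return [
        site
        for site in sites
        if site.organization_id == user.organization_id
        and site.id in user.granted_site_ids
    ]
```

Applying the organization filter in addition to the grant check means a stale or mistaken grant pointing at another organization's site still returns nothing.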
Real-time Telemetry Flow (Critical Path)
Edge → Redis → FastAPI → Frontend
- Edge Controller: Collects sensor data (Modbus, BACnet, MQTT) → Publishes to Redis channel telemetry:site_123
- FastAPI Background Worker: Subscribes to Redis channels (telemetry:*, alerts:*, user:*:permission_changed) → Receives message → Checks cached permissions → Broadcasts to authorized WebSocket clients
- Next.js Frontend: Opens WebSocket with JWT → FastAPI validates (one-time Django call) → Caches permissions in Redis (5-min TTL) → Streams data to client → Updates charts in real-time
Key Advantage: Django is NOT in the critical path after initial auth. Edge data flows directly through Redis → FastAPI → Client without Django overhead
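The fan-out step in this flow (receive a message on a telemetry channel, check cached site access, deliver to authorized clients) can be sketched without any network code. Lists stand in for WebSocket send queues, and the `Broadcaster` class and its method names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

TELEMETRY_PREFIX = "telemetry:"


def site_id_from_channel(channel: str) -> str:
    """Extract 'site_123' from 'telemetry:site_123'."""
    if not channel.startswith(TELEMETRY_PREFIX):
        raise ValueError(f"not a telemetry channel: {channel!r}")
    return channel[len(TELEMETRY_PREFIX):]


class Broadcaster:
    """Routes telemetry messages to connections whose cached sites include the source site."""

    def __init__(self) -> None:
        # connection_id -> (accessible sites cached at connect time, outgoing messages)
        self._clients: Dict[str, Tuple[FrozenSet[str], List[dict]]] = {}

    def connect(self, connection_id: str, accessible_sites: Iterable[str]) -> None:
        self._clients[connection_id] = (frozenset(accessible_sites), [])

    def disconnect(self, connection_id: str) -> None:
        self._clients.pop(connection_id, None)

    def outbox(self, connection_id: str) -> List[dict]:
        return self._clients[connection_id][1]

    def on_message(self, channel: str, payload: dict) -> int:
        """Deliver payload to every authorized connection; return the delivery count."""
        site_id = site_id_from_channel(channel)
        delivered = 0
        for sites, outbox in self._clients.values():
            if site_id in sites:
                outbox.append(payload)
                delivered += 1
        return delivered
```

In the real service the outbox would be an awaitable WebSocket send and `on_message` would be driven by the Redis subscription loop; the authorization check itself stays a set lookup.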
Permission Update Flow (Event-Driven)
- Django Admin: Admin updates user role → Django saves to PostgreSQL → Invalidates Redis cache → Publishes event to Redis: user:456:permission_changed
- FastAPI: Receives Redis event → Finds active WebSocket connections for user → Sends control message to client
- Next.js: Receives control message → Refetches permissions or reconnects WebSocket
Advantage: Instant updates (sub-second), no polling overhead, graceful user experience
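On the FastAPI side, handling such an event amounts to parsing the user ID out of the channel name, dropping that user's cached permissions, and producing a control message for each of the user's open connections. The sketch below uses plain dicts in place of Redis and the WebSocket registry; the control message shape is an assumption, since the document does not define it.

```python
import re
from typing import Dict, List, Optional, Tuple

_EVENT_PATTERN = re.compile(r"^user:(\d+):permission_changed$")

# Hypothetical control message; the actual wire format is not specified in this document.
CONTROL_MESSAGE = {"type": "permissions_changed", "action": "reconnect"}


def user_id_from_event(channel: str) -> Optional[int]:
    """Return 456 for 'user:456:permission_changed', or None for other channels."""
    match = _EVENT_PATTERN.match(channel)
    return int(match.group(1)) if match else None


def handle_permission_event(
    channel: str,
    connections_by_user: Dict[int, List[str]],
    cached_permissions: Dict[int, dict],
) -> List[Tuple[str, dict]]:
    """Invalidate the user's cache entry and return (connection_id, message) pairs to send."""
    user_id = user_id_from_event(channel)
    if user_id is None:
        return []
    cached_permissions.pop(user_id, None)
    return [
        (connection_id, dict(CONTROL_MESSAGE))
        for connection_id in connections_by_user.get(user_id, [])
    ]
```

Returning the messages instead of sending them keeps the function free of I/O, so the routing logic can be tested without a running WebSocket server.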
Historical Data Query Flow
- Next.js: User requests historical data → GET /api/telemetry/historical with JWT
- Django: Validates JWT → Checks site access permissions → Queries TimescaleDB/PostgreSQL → Returns aggregated data
- Next.js: Caches response (SWR/React Query) → Renders charts
Why REST API (not WebSocket): One-time bulk fetch, HTTP caching, simpler for static historical data
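The "returns aggregated data" step typically means downsampling raw readings into fixed time buckets before sending them to the chart. The sketch below shows one way to do that in plain Python, in the spirit of TimescaleDB's `time_bucket` function; the function name and the choice of the mean as the aggregate are assumptions, and a production query would more likely perform this inside the database.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def aggregate_by_bucket(
    samples: Iterable[Tuple[float, float]], bucket_seconds: int
) -> List[Tuple[int, float]]:
    """Average (unix_timestamp, value) samples into fixed-width buckets.

    Returns (bucket_start, mean_value) pairs sorted by bucket start.
    """
    if bucket_seconds <= 0:
        raise ValueError("bucket_seconds must be positive")
    totals = defaultdict(lambda: [0.0, 0])  # bucket_start -> [sum, count]
    for timestamp, value in samples:
        bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
        accumulator = totals[bucket_start]
        accumulator[0] += value
        accumulator[1] += 1
    return [
        (bucket_start, total / count)
        for bucket_start, (total, count) in sorted(totals.items())
    ]
```

With 5-second edge readings, a 60-second bucket reduces the payload by roughly a factor of twelve, which is part of why a one-time REST fetch suits historical charts.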
Edge Controller Connection Flow
- Startup: Edge loads config → Authenticates with Django using API key → Caches site_id
- Data Collection: Reads devices (Modbus, BACnet, MQTT) every 5s → Publishes to Redis telemetry:site_123 → Optionally writes to TimescaleDB for historical storage
- Error Handling: Buffers locally on Redis failure, continues on Django failure (autonomous operation)
Deployment Models:
- Cloud-Hosted: Gateway → VPN → Cloud Edge Controller → Redis (easier setup)
- On-Premises: Edge on Raspberry Pi/Industrial PC on-site → Local Redis → Cloud Redis replication (more resilient, offline operation)
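The "buffers locally on Redis failure" behaviour can be sketched as a bounded queue wrapped around the publish call. Since the Edge Controller is still future work, everything here is hypothetical: the `BufferedPublisher` name, the injected `publish` callable (assumed to raise `ConnectionError` when Redis is unreachable), and the policy of dropping the oldest readings when the buffer is full.

```python
from collections import deque
from typing import Callable


class BufferedPublisher:
    """Queues readings while the broker is unreachable and replays them in order."""

    def __init__(self, publish: Callable[[str, str], None], max_buffered: int = 1000):
        self._publish = publish  # assumed to raise ConnectionError on failure
        # Bounded buffer: when full, appending drops the oldest reading.
        self._buffer: deque = deque(maxlen=max_buffered)

    @property
    def pending(self) -> int:
        return len(self._buffer)

    def send(self, channel: str, message: str) -> bool:
        """Queue the reading, then try to drain. Returns True if nothing is left pending."""
        self._buffer.append((channel, message))
        return self.flush()

    def flush(self) -> bool:
        """Publish buffered readings oldest-first; stop at the first failure."""
        while self._buffer:
            channel, message = self._buffer[0]
            try:
                self._publish(channel, message)
            except ConnectionError:
                return False
            self._buffer.popleft()  # remove only after a successful publish
        return True
```

Redis Pub/Sub does not persist messages, so replayed readings reach only the subscribers connected at replay time; including the original timestamp in each reading lets consumers tell late data from live data.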
Production Deployment (Kubernetes)
Architecture: Load Balancer → Next.js (3 replicas) + Django (2 replicas) → FastAPI WebSocket (2 replicas) → PostgreSQL (primary + replicas) + Redis Cluster
Scaling: Django scales on API load, FastAPI scales on WebSocket connections, Next.js scales on user traffic
Pros of This Architecture
1. Leverages Django's Strengths
- Django Admin: Out-of-the-box powerful admin panel for user/org management
- Permissions Framework: Built-in RBAC with groups and permissions
- Battle-tested Auth: Industry-standard authentication and security
- Audit Trail: Easy to add logging for compliance (who changed what, when)
2. Real-time Performance
- FastAPI WebSockets: Native async support, lower latency than Django Channels
- Redis PubSub: Instant message broadcasting to all connected clients
- No Polling: Eliminates constant HTTP requests, reduces server load
- Sub-second Updates: Telemetry appears on dashboards immediately
3. Clear Separation of Concerns
- Django: User management only (not burdened with telemetry streaming)
- FastAPI: Real-time only (not burdened with user CRUD)
- Next.js: Visualization only (not burdened with business logic)
- Clean Boundaries: Teams can work independently on each service
4. Scalability
- Independent Scaling: Scale WebSocket service separately from API service
- Service Isolation: Django crash doesn't affect WebSocket connections
- Database Optimization: PostgreSQL for users, TimescaleDB for telemetry (future)
- Horizontal Scaling: Add more containers/pods as needed
5. Developer Experience
- Docker Compose: docker-compose up starts entire stack
- Same Environment: Eliminates "works on my machine" issues
- Hot Reload: All services support live code updates during development
- OpenAPI Docs: Auto-generated API documentation (FastAPI + drf-spectacular)
6. Modern Tech Stack
- TypeScript: Type safety reduces bugs in complex EMS logic
- React Ecosystem: Rich component libraries (Recharts, Radix UI)
- Clean Architecture: Domain-driven design in frontend
- Industry Standard: Docker/Kubernetes deployment path
7. Enterprise Features
- Multi-tenancy: Organization/site hierarchy with data isolation
- RBAC: Granular permissions for different user roles
- Audit Logging: Track all changes for compliance
- Security: JWT tokens, CORS, SQL injection protection
Technology Decisions
Why Django?
- Admin Panel: Best-in-class admin interface for free
- ORM: Powerful database abstraction
- Security: Built-in protection against common vulnerabilities
- Ecosystem: Mature, well-documented, huge community
- Stability: Production-ready, used by Instagram, Pinterest, NASA
Why FastAPI (instead of Django Channels)?
- Performance: Roughly 3-5x faster than Django for async workloads (workload-dependent; benchmark before relying on this figure)
- Simplicity: Native WebSocket support, no extra layers
- Documentation: Auto-generated OpenAPI/Swagger docs
- Modern: Built on Starlette and Pydantic (type-safe)
- Learning Curve: Easier than Django Channels for WebSocket-only use case
Why Next.js?
- React Ecosystem: Best component libraries and tools
- Performance: Server-side rendering, static generation
- Developer Experience: Hot reload, TypeScript support
- API Routes: Can handle some backend logic if needed
- Production Ready: Used by Netflix, TikTok, Twitch
Why Redis?
- Speed: In-memory data structure store (sub-millisecond latency)
- PubSub: Built-in message broker for real-time updates
- Caching: Reduce database load for frequently accessed data
- Session Store: Fast session storage for Django
- Simple: Easy to set up and maintain
Why PostgreSQL?
- ACID Compliance: Data integrity for critical user/org data
- JSON Support: Can store semi-structured data if needed
- Extensions: PostGIS for location data, TimescaleDB for time-series
- Reliability: Industry standard, battle-tested
- Django Support: First-class ORM support