Chapter 12: Operations & Maintenance
O&M requirements, daily monitoring, preventive maintenance, and systematic troubleshooting
12.1 O&M Requirements
Effective operations and maintenance of an underground parking surveillance system requires a structured program that covers six areas: health monitoring cycles, physical inspection schedules, patch management, spare parts inventory, access control reviews, and defined service level agreements (SLAs). Each area has a defined cycle, responsible party, and documentation requirement. The O&M program must be established and funded before system handover. Retrofitting an O&M program after handover is significantly more expensive and less effective than designing it in from the start.
| O&M Item | Requirement | Typical Cycle | Responsible Party | Documentation |
|---|---|---|---|---|
| Health Check | Monitor video loss, bitrate anomalies, NTP offset, RAID status, disk SMART, PoE power, uplink utilization, storage capacity | Daily (automated monitoring with alerting) | O&M team / NOC | Daily health report; alert log; trend charts |
| Physical Inspection | Inspect seals and cable glands for degradation; check for fogging or condensation; inspect for signs of vandalism; verify desiccant pack condition; check cabinet RH and temperature | Weekly for entrances; monthly for interior cameras and cabinets | Site technician | Inspection checklist; photo record of defects; work order for repairs |
| Patch Management | Apply firmware updates to cameras and network devices; update VMS software; test in staging environment before production deployment; maintain rollback plan | Monthly for critical security patches; quarterly for feature updates | System administrator | Patch log; pre/post firmware version record; rollback procedure |
| Spare Parts Inventory | Maintain on-site spares: at least 1 LPR camera, 1 dome camera, 1 PoE switch (matching installed model), 2 HDDs (matching installed model), cable glands, sealant, desiccant packs | Audit quarterly; replenish within 30 days of use | O&M team / procurement | Spare parts inventory log; replenishment records |
| Access Review | Review all VMS accounts, roles, and permissions; disable inactive accounts; verify MFA is active on all admin accounts; review audit logs for anomalies | Monthly | Security administrator | Access review report; account change log; audit log summary |
| SLA | Entrance LPR cameras: 4-hour restoration SLA (critical); interior cameras: 24-hour restoration SLA (major); storage: 8-hour restoration SLA (critical); VMS: 4-hour restoration SLA (critical) | Defined at contract; measured monthly | O&M team / service provider | Monthly SLA compliance report; incident log with resolution times |
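The monthly SLA compliance report in the last row can be derived directly from the incident log. The following is a minimal sketch, assuming each incident records an asset class, an opening time, and a restoration time. The `Incident` type, the asset-class keys, and the function names are hypothetical; only the restoration targets come from the table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Restoration targets in hours, taken from the SLA row above.
# The dictionary keys are hypothetical labels for the four asset classes.
SLA_HOURS = {
    "entrance_lpr": 4,
    "interior_camera": 24,
    "storage": 8,
    "vms": 4,
}


@dataclass
class Incident:
    asset_class: str    # must be a key of SLA_HOURS
    opened: datetime    # when the fault was detected
    restored: datetime  # when service was restored


def within_sla(incident: Incident) -> bool:
    """True if the incident was restored within its SLA target."""
    limit = timedelta(hours=SLA_HOURS[incident.asset_class])
    return incident.restored - incident.opened <= limit


def monthly_compliance(incidents: list[Incident]) -> dict[str, float]:
    """Fraction of incidents restored within SLA, per asset class."""
    totals: dict[str, int] = {}
    met: dict[str, int] = {}
    for inc in incidents:
        totals[inc.asset_class] = totals.get(inc.asset_class, 0) + 1
        met[inc.asset_class] = met.get(inc.asset_class, 0) + int(within_sla(inc))
    return {cls: met[cls] / totals[cls] for cls in totals}
```

Asset classes with no incidents in the month are simply absent from the result, which keeps "no incidents" distinguishable from "100% compliant".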
12.2 Daily Monitoring
Daily monitoring is the foundation of proactive O&M. The monitoring system must be configured to generate alarms at three severity levels (Critical, Major, and Minor), with a defined response procedure for each level. The alarm grading and linkage system ensures that the right person is notified at the right time and that every alarm generates a documented response. An alarm that is generated but not acted upon is worse than no alarm: it creates a false sense of security while the underlying problem continues to develop.
| Alarm Grade | Examples | Response Procedure | Target Response Time | Documentation |
|---|---|---|---|---|
| Critical | Entrance LPR camera offline; core switch down; storage full; VMS service failure; UPS on battery | Notify on-call engineer immediately; create priority ticket; display alert on monitoring wall; escalate if not acknowledged within 15 minutes | Acknowledge: 15 min; Restore: per SLA (4h for entrance LPR) | Incident ticket; root cause analysis; corrective action record |
| Major | Multiple cameras offline in one zone; storage utilization >85%; uplink utilization >80%; PoE budget >90%; RAID degraded (1 disk failed) | Create scheduled maintenance ticket; notify site manager; schedule repair within 24 hours; monitor for escalation to Critical | Acknowledge: 1h; Repair scheduled: 24h | Maintenance ticket; repair record; post-repair verification |
| Minor | Single camera blurry or slightly misaligned; NTP offset 1–5 seconds; single disk SMART warning; cabinet temperature 35–40°C | Add to batch repair list; schedule repair at next routine maintenance visit; monitor for escalation | Acknowledge: 4h; Repair: next maintenance cycle | Batch repair list; maintenance record |
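The acknowledgement targets and the Critical escalation rule in the table above can be expressed as a small check that a monitoring job runs periodically. This is a sketch under stated assumptions: the `Alarm` type, the `Grade` enum, and the function names are hypothetical, and the notification mechanism itself is out of scope. Only the time limits come from the table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Grade(Enum):
    CRITICAL = "critical"
    MAJOR = "major"
    MINOR = "minor"


# Acknowledgement targets from the alarm grade table.
ACK_TARGET = {
    Grade.CRITICAL: timedelta(minutes=15),
    Grade.MAJOR: timedelta(hours=1),
    Grade.MINOR: timedelta(hours=4),
}


@dataclass
class Alarm:
    source: str                            # e.g. a camera or switch name
    grade: Grade
    raised: datetime
    acknowledged: Optional[datetime] = None


def ack_overdue(alarm: Alarm, now: datetime) -> bool:
    """True if the alarm is still unacknowledged past its target."""
    if alarm.acknowledged is not None:
        return False
    return now - alarm.raised > ACK_TARGET[alarm.grade]


def needs_escalation(alarm: Alarm, now: datetime) -> bool:
    """Only Critical alarms escalate when acknowledgement is overdue."""
    return alarm.grade is Grade.CRITICAL and ack_overdue(alarm, now)
```

Overdue Major and Minor alarms are still reported by `ack_overdue`, so they can appear in the daily health report without paging the on-call engineer.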
The following monitoring parameters must be configured in the VMS and network management system. Each parameter has a defined polling interval, a warning threshold, and a critical threshold. Parameters marked as critical must have redundant monitoring paths: if the primary monitoring system fails, a secondary alert mechanism (such as an SNMP trap to a secondary receiver) must be active.
| Parameter | Polling Interval | Warning Threshold | Critical Threshold | Monitoring Method |
|---|---|---|---|---|
| Camera online status | 30 seconds | — | Camera offline >2 min | VMS heartbeat |
| Recording bitrate per camera | 1 minute | <50% or >150% of baseline | <10% of baseline (recording gap) | VMS analytics |
| NTP offset | 5 minutes | >1 second | >5 seconds | VMS NTP monitor |
| Storage utilization | 5 minutes | >80% | >95% | VMS storage monitor |
| RAID health | 1 minute | 1 disk failed (degraded) | 2 disks failed (critical) | Storage controller SNMP |
| Uplink utilization | 1 minute | >70% | >90% | SNMP from switch |
| PoE budget utilization | 1 minute | >80% | >95% | SNMP from PoE switch |
| Cabinet temperature | 1 minute | >35°C | >40°C | Temperature sensor / SNMP |
| Cabinet humidity | 5 minutes | >60% RH | >75% RH | RH sensor / SNMP |
| UPS status | 1 minute | Battery <80% health | On battery / battery <50% health | UPS SNMP |
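The numeric thresholds in the table above lend themselves to a simple table-driven classifier. The sketch below is illustrative only: the metric keys and function names are hypothetical, and it covers only the parameters whose thresholds are simple upper limits, plus the recording bitrate rule, which is relative to a per-camera baseline.

```python
# (warning, critical) upper limits taken from the table above.
# The metric names are hypothetical keys, not VMS or SNMP identifiers.
THRESHOLDS = {
    "ntp_offset_s": (1.0, 5.0),        # pass the absolute offset
    "storage_util_pct": (80.0, 95.0),
    "uplink_util_pct": (70.0, 90.0),
    "poe_budget_pct": (80.0, 95.0),
    "cabinet_temp_c": (35.0, 40.0),
    "cabinet_rh_pct": (60.0, 75.0),
}


def classify(metric: str, value: float) -> str:
    """Return 'ok', 'warning', or 'critical' for an upper-limit metric."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"


def classify_bitrate(current_kbps: float, baseline_kbps: float) -> str:
    """Apply the recording bitrate rule relative to a camera's baseline."""
    if baseline_kbps <= 0:
        raise ValueError("baseline must be positive")
    ratio = current_kbps / baseline_kbps
    if ratio < 0.10:                    # effectively a recording gap
        return "critical"
    if ratio < 0.50 or ratio > 1.50:    # outside 50-150% of baseline
        return "warning"
    return "ok"
```

For example, `classify("uplink_util_pct", 75.0)` returns `"warning"`, and `classify_bitrate(300, 4000)` returns `"critical"`.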
12.3 Preventive Maintenance
Preventive maintenance is the most cost-effective way to extend system life and prevent unplanned outages. The following twelve preventive maintenance activities are listed with their frequency and priority, and each must be documented in the maintenance log. Activities marked as high priority must not be deferred: deferring them creates a predictable failure mode that will result in an unplanned outage at the worst possible time.
| # | Maintenance Activity | Frequency | Priority | Why It Matters |
|---|---|---|---|---|
| 1 | Clean camera lenses and housings; remove dust and spider webs from IR illuminator arrays | Monthly | High | Dust on lens reduces image sharpness; spider webs trigger false motion alarms |
| 2 | Check all seals and cable glands for degradation; retighten loose glands | Monthly | High | Degraded seals allow moisture ingress → connector corrosion → link failure |
| 3 | Replace desiccant packs in junction boxes and cabinets; check color indicator | Every 6–12 months | High | Saturated desiccant allows condensation → fogged lens → unusable images |
| 4 | Review cabinet RH trend charts; investigate any upward trend before threshold is reached | Monthly | High | Upward RH trend indicates seal failure before it causes visible damage |
| 5 | Tighten all terminal block connections and grounding ring lugs | Annually | Medium | Thermal cycling loosens terminals → high-resistance connection → surge damage |
| 6 | Check PoE power headroom on all switches; verify peak load is below 80% of budget | Monthly | High | PoE overload causes camera reboots and recording gaps |
| 7 | Review uplink utilization trend; verify sustained utilization is below 70% | Monthly | High | Uplink congestion causes packet loss → recording gaps → evidence gaps |
| 8 | Verify RAID health and run disk SMART checks; replace any disk with SMART warnings | Monthly | High | Ignoring SMART warnings leads to disk failure during RAID rebuild → data loss |
| 9 | Test NTP offset on all cameras and VMS; verify offset within ±1 second | Monthly | High | Time drift invalidates evidence timeline; must be caught before a dispute arises |
| 10 | Run quarterly worst-case lighting test: drive-through with headlights on at night | Quarterly | High | Exposure profiles drift over time; quarterly test catches degradation before an incident |
| 11 | Verify backup and export integrity: export a test clip and verify hash matches VMS record | Monthly | Medium | Export failure discovered during an incident is too late; verify proactively |
| 12 | Review VMS and network device audit logs for unauthorized access or configuration changes | Monthly | High | Unauthorized configuration changes can disable recording without triggering an alarm |
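Activity 11 (export integrity) is straightforward to automate. The sketch below assumes the VMS records a SHA-256 digest for each export as a hexadecimal string; the source does not name the hash algorithm, so SHA-256 is an assumption, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large video clips do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def export_matches_record(path: Path, recorded_hex: str) -> bool:
    """Compare the exported clip's hash with the digest the VMS recorded.

    Assumes the recorded digest is SHA-256 in hex; surrounding whitespace
    and letter case are normalised before comparison.
    """
    return sha256_of_file(path) == recorded_hex.strip().lower()
```

A mismatch should be logged as a failed check and investigated as a storage health issue rather than silently retried.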
12.4 Troubleshooting & Repair
The table below documents the ten most common failure scenarios encountered in underground parking surveillance systems during the operational phase. Each scenario follows the same diagnostic sequence: identify the symptom, determine the likely cause, locate the fault, isolate it from the rest of the system, restore service, and conduct a postmortem to prevent recurrence. The postmortem is the most frequently skipped step and the most valuable for preventing repeat failures.
| Symptom | Likely Cause | Locate | Isolate | Restore | Postmortem |
|---|---|---|---|---|---|
| Plate unreadable at night | Headlight glare; wrong exposure profile; WDR disabled | Compare test clips at different times; check exposure profile in VMS | Disable automatic exposure; switch to manual profile for testing | Adjust shutter speed and WDR; retest with drive-through | Update zone exposure profile template; add to quarterly test checklist |
| Random camera reboot | PoE overload; startup surge exceeds switch budget | Check PoE switch logs for power events; measure peak PoE draw | Move affected cameras to less-loaded switch ports | Upgrade switch PoE budget; redistribute cameras | Review PoE power budget calculation; update design standard |
| Fogged image in morning | Condensation inside housing; saturated desiccant | Check RH logs for overnight trend; inspect junction box seal | Dry housing; remove saturated desiccant pack | Reseal junction box; install new desiccant; add RH sensor threshold alert | Investigate seal failure cause; update desiccant replacement schedule |
| Playback slow or fails | Storage write latency high; disk SMART warning; too many concurrent clients | Check disk SMART metrics; measure write latency; count concurrent VMS sessions | Limit concurrent playback clients; isolate affected storage volume | Replace failing disk; rebuild RAID; add cache or expand storage | Update capacity plan; add write latency monitoring alert |
| Gaps on many cameras simultaneously | Uplink congestion; network storm; switch failure | Check uplink utilization on affected switch; check for broadcast storm | Enable storm control; cap per-camera bitrate temporarily | Upgrade uplink; enable QoS; replace failed switch | Update bandwidth model; verify storm control is enabled on all switches |
| Time mismatch between cameras | NTP server unreachable; firewall blocking UDP 123; NTP not configured | Check NTP offset in VMS; ping NTP server from camera network | Point cameras to backup NTP server temporarily | Fix NTP reachability; resync all cameras; verify offset within ±1 second | Add NTP monitoring alert; add backup NTP server to configuration standard |
| VMS login anomaly / unauthorized access | Credential leak; shared account; no MFA | Review VMS audit log for login events; check IP addresses of login attempts | Disable compromised account immediately; block suspicious IP | Rotate all credentials; enable MFA; review RBAC assignments | Security hardening review; add login anomaly detection rule |
| Export fails hash verification | Disk corruption; RAID rebuild interrupted; UPS failure during write | Check storage health; review UPS event log for power events | Move export to backup storage volume | Rebuild RAID; verify UPS graceful shutdown script; re-export from backup | Verify UPS shutdown sequence; add export hash verification to monthly checklist |
| Entire zone goes black | Fiber cut; switch failure; power outage to distribution cabinet | Check link status on floor switch; check power to distribution cabinet | Fail over to B fiber path; verify B path is operational | Repair fiber; replace switch; restore power; verify recording resumes | Verify A/B path physical separation; update fiber route map |
| Frequent false motion alarms | Analytics rules mis-tuned; sensitivity too high; lighting changes triggering detection | Review event statistics; identify cameras with highest false alarm rate | Disable analytics on affected cameras temporarily | Retune detection rules; adjust sensitivity; add exclusion zones for lighting fixtures | Update analytics tuning standard; add false alarm rate to monthly monitoring metrics |
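Because the postmortem is the step most often skipped, one option is to have the ticketing workflow refuse to close an incident until all six steps of the diagnostic sequence are recorded in order. The following is a minimal sketch of that idea; the `TroubleshootingRecord` class and its methods are hypothetical and not part of any particular ticketing product.

```python
from dataclasses import dataclass, field
from typing import Optional

# The six-step diagnostic sequence used in this section, in order.
STEPS = ("symptom", "likely_cause", "locate", "isolate", "restore", "postmortem")


@dataclass
class TroubleshootingRecord:
    notes: dict[str, str] = field(default_factory=dict)

    def next_step(self) -> Optional[str]:
        """Return the first step not yet recorded, or None if all are done."""
        for step in STEPS:
            if step not in self.notes:
                return step
        return None

    def record(self, step: str, note: str) -> None:
        """Record a step; steps must be completed in sequence."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected step {expected!r}, got {step!r}")
        self.notes[step] = note

    def can_close(self) -> bool:
        """An incident may close only after the postmortem is recorded."""
        return self.next_step() is None
```

With this structure, a record that stops at "restore" reports `can_close()` as false, which makes a missing postmortem visible in the incident log instead of going unnoticed.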