Site Reliability Engineer
Location: Onsite – Kanata, Ontario
About Our Client
Imagine a startup delivering real-time data insights that empower businesses to make smarter, faster decisions. Backed by one of the world’s top tech groups, they blend cutting-edge technology with deep expertise to help companies stay agile and ahead of the curve. With the strength of a powerhouse behind them, they drive innovation and create transformative solutions for today’s dynamic markets.
Edge Signal provides a full-fledged edge computing platform powering computer-vision applications across Retail, Hospitality and Warehousing. They run entirely on AWS, ingesting and analyzing data from massive fleets of on-premises devices, with monitoring in Datadog.
We’re looking for an experienced Site Reliability Engineer to keep their cloud and edge infrastructure running flawlessly—and to help their customers get up and running smoothly.
This position is based at their head office in Kanata, Ottawa, reporting to the Director of Technology.
What You’ll Do
Operations
Ensure highly available, fault-tolerant AWS services (auto-scaling, disaster recovery, capacity planning).
Build and maintain Datadog dashboards, monitors and alerts for cloud resources and edge devices; author runbooks and automation scripts for incident response.
Develop tooling to provision, update and health-check thousands of edge devices; ingest device telemetry into Datadog for unified observability.
Automate routine ops tasks (onboarding steps, incident remediation) using shell, Python, etc.
Onboarding
Lead customer installations by configuring IP cameras, NVRs, and Edge Signal agents on-site.
Guide network, security and firmware setups to ensure seamless data flow from device to cloud.
Support
Compliance
Manage AWS IAM (users, roles, policies, SSO) and enforce security best practices.
Monitor and optimize AWS spend: set budgets, report usage and recommend cost-saving strategies.
Integrate secrets management, vulnerability scanning and other compliance controls.