JOB SUMMARY
What you’ll do
- Design and evolve our multi-cloud infrastructure (Azure, AWS and GCP).
- Maintain platform reliability by building scalable systems and by automating and optimising our technology stack.
- Champion infrastructure-as-code practices across the organisation, using IaC tools (Pulumi) to manage the lifecycle of underlying Azure resources and to deploy applications on top of them.
- Build and maintain an internal developer platform that enables engineers to ship faster and with greater confidence.
- Define and own SLOs and standards for CI/CD, environment management, containerisation and observability.
- Continuously seek and implement automation opportunities, using modern SRE tools to automate operational tasks, speed up delivery of applications and services, and improve team efficiency and SRE practices across our squads.
- Monitor, detect and address system issues by creating strategies and designing systems that troubleshoot automatically.
- Influence architectural decisions across the broader engineering team. Write RFCs, participate in design reviews and mentor software engineers at all levels.
- Partner with the Head of Technology on roadmap and strategy.
- Work closely with the Cybersecurity squad to create a comprehensive Disaster Recovery strategy and perform periodic DR drills.
- Lead incident response and post-mortems with a blameless, systems-thinking mindset. Eliminate toil through automation and champion operational maturity across the engineering organisation.
- Embed security across the entire software delivery lifecycle rather than treating it as an afterthought, driving a zero-trust architecture.
- Manage external vendor relationships and ongoing subscriptions related to the platform.
You will have built strong working relationships across engineering squads and operations teams, and will be firmly established as the go-to Cloud Engineer across the business.
Success measures include fewer incidents, minimal infrastructure-related downtime, and demonstrable improvements to platform stability and delivery speed through automation, testing and sound cloud resource management.
Some of the key projects you will have delivered include:
- Completed a thorough audit of our cloud infrastructure and presented a prioritised action plan to the Head of Technology, with early wins already in flight.
- Taken ownership of our IaC lifecycle, defined SLOs for critical services, and delivered at least one automation uplift that has measurably reduced toil or improved deployment confidence.
- Co-developed a Disaster Recovery strategy with the cybersecurity squad and actively contributed to architectural decisions through RFCs or design reviews.
You think in systems, not just services, and when something needs doing twice, you’ve already automated it. Security is a craft to you, not a compliance checkbox, and that mindset runs through everything you ship.
You own incidents end-to-end without ego, and you’re not afraid to push back when a decision puts reliability or security at risk. Above all, you are a coach at heart, and thrive when sharing your knowledge with others.
You take a proactive approach and stay up to date with technology and trends across all cloud platforms.
- 5+ years of hands-on cloud engineering experience across at least two major providers (AWS, GCP or Azure).
- Strong understanding of the design and maintenance of CI/CD templating and deployment practices (GitHub Actions, Azure DevOps).
- Strong experience with and a deep understanding of Azure. Azure Expert Certification desirable.
- Experience working within a fintech or financial services environment is ideal.
- Strong expertise with container and orchestration tools such as Docker, container registries, Kubernetes (EKS, GKE, AKS) and Helm.
- SRE expertise: SLOs, error budgets, chaos engineering, mature on-call practices, blameless post-mortems.
- Observability stack experience, including distributed tracing, log aggregation, metrics and alerting (Datadog, Grafana/Prometheus).
- Strong networking fundamentals: VPNs, private endpoints, DNS, CDN and load balancing.
- Mastery of infrastructure-as-code tooling (Pulumi) and GitOps at scale.
- Strong security engineering foundations: zero-trust networking, IAM, secrets management, SIEM and vulnerability scanning.
- Knowledge that spans both development and operations, including coding, infrastructure management and engineering.
Are you interested in this position?
Apply by clicking on the “Apply Now” button below!