Senior DevOps Engineer

Full Time
  • August 29, 2026

    JOB DETAILS

    Requirements
    • 5+ years of hands-on experience in DevOps, SRE, or Cloud Engineering
    • Extensive expertise in AWS cloud platforms and services
    • Practical experience with Kubernetes and containerisation technologies
    • Strong scripting and automation skills with Bash, Python, or Go
    • In-depth knowledge of CI/CD tools including Jenkins, GitHub Actions, GitLab CI/CD, and Argo CD
    • Solid experience with Infrastructure as Code tools including Terraform and CloudFormation
    • Comprehensive understanding of Linux administration and networking fundamentals
    • Experience implementing security best practices including IAM, SSL/TLS, and compliance frameworks such as SOC 2, ISO 27001, and GDPR
    • Proficiency in monitoring and logging tools such as the ELK Stack, Prometheus, Grafana, or Datadog
    • Exceptional problem-solving skills and the ability to operate in a fast-moving, ambiguous environment
    • Strong communication and collaboration skills to work effectively across cross-functional teams, including client stakeholders
    Responsibilities
    • Architect, build, and continuously enhance CI/CD pipelines to automate and accelerate software delivery across the team
    • Lead the management and optimisation of cloud infrastructure (AWS), ensuring scalability, security, and reliability while championing best practices
    • Design, implement, and maintain Infrastructure as Code (IaC) with tools such as Terraform and CloudFormation, enabling the team to deploy with confidence and agility
    • Proactively monitor, troubleshoot, and enhance system performance, availability, and security, ensuring operational excellence across client environments
    • Drive the adoption of containerisation and orchestration technologies like Docker and Kubernetes to enable scalable, high-performance solutions
    • Improve system observability by implementing advanced logging, monitoring, and alerting with tools such as Prometheus, Grafana, Datadog, CloudWatch, and the ELK Stack
    • Lead the implementation of security best practices, including IAM, secrets management, and vulnerability assessments
    • Collaborate closely with developers to continuously optimise build, deployment, and scaling strategies for seamless integration and continuous delivery
    • Automate key operational tasks and apply SRE principles to enhance system reliability, uptime, and overall performance
    • Take ownership of incident response and lead root cause analysis for production issues, ensuring swift resolution and ongoing improvement
    • Practise LLMOps: implement prompt versioning, model evaluation pipelines, and controlled promotion gates before anything reaches production
    • Instrument beyond standard metrics: design observability for token costs, inference latency, retrieval quality, and model drift detection
    • Build agentic resilience: implement rate limiting, circuit breakers, and graceful fallbacks for non-deterministic LLM APIs
    • Own inference cost engineering: design throughput management, caching strategy, and cost-per-query alerting to keep AI systems economically viable at scale
    • Design AI-native CI/CD pipelines with evaluation harnesses and golden dataset regression tests baked in before any model or prompt change reaches production
    Desired Qualifications
    • Familiarity with serverless architectures such as AWS Lambda
    • Experience with database performance tuning and scaling techniques
    • Relevant DevOps certifications for AWS, Azure, or GCP
    • Prior experience supporting AI or ML workloads in production environments
    • Familiarity with LLM observability tooling such as LangSmith, Weave, or similar

    Are you interested in this position?

    Apply by clicking on the “Apply Now” button below!
