Senior Site Reliability Engineer
Windsor, Berkshire

New Energy Platform is a new business unit within Centrica building the future energy supply platform for our UK customers.

As a key part of the Technology function, we are creating a Site Reliability Engineering (SRE) team. The SRE team will work with our squad-based engineering teams, the global security and networks teams in Centrica DTS, and key vendors to drive the reliability agenda.

The team will improve the experience for our existing customer base and enable the growth agenda by ensuring our levels of service consistently meet customer expectations.

Role accountabilities:

  • Deep understanding of SRE philosophy, technologies, platforms and tools, SLA management, incident resolution, and automation
  • Focus on system reliability, performance, and supportability by balancing feature development velocity and reliability with well-defined SLOs
  • Improve CI/CD pipelines to increase development squads' velocity and confidence while automating provisioning, quality controls, security auditing and maintenance
  • Establish, manage and optimise our monitoring solutions to achieve observability
  • Support squads with best practices in monitoring and improving alert thresholds
  • Design monitoring systems that prioritize the customer perspective and experience
  • Contribute to architectural and design principles to drive reliability, scalability and reusability for a large-scale distributed platform
  • Work with development squads to implement automation opportunities to drive down toil and reduce technical debt
  • Carry out end-to-end stability inspections to take a holistic view of system health and proactively mitigate customer impacts
  • Firefight stability problems with business teams and engage in troubleshooting, service capacity planning and demand forecasting, platform performance analysis and system tuning
  • Conduct post-incident reviews and trend analysis and own the learning loop back to the development squads
  • Provide reports on system health built around the service level indicators (SLIs)
  • Be flexible to participate in rotating on-call duties and timely post-mortems of production incidents
Competencies, Experience and Qualifications:

Experience and Qualifications

  • BS degree in Computer Science or related technical field involving coding or systems engineering
  • Significant experience working in an SRE or DevOps team supporting a scaled production platform
  • Certification(s) in cloud architecture and/or AWS
  • Experience of implementing, maintaining and optimising a CI/CD pipeline
  • Real-world coding experience, whether in traditional compiled languages, scripting languages, or both
  • Experience working with cloud computing and familiarity with Infrastructure as Code

Technical competencies

  • Working knowledge of contemporary monitoring and analytics tooling and best practice
  • Working knowledge of automation tooling and best practice
  • Excellent investigative and diagnostic abilities, with strong problem-solving skills and the ability to take courageous decisions, often with limited time and information, in order to restore service
  • Strong technical knowledge across cloud, infrastructure and application domains
  • Some familiarity with Go, Python, Node.js (TypeScript/JavaScript) and scripting languages (Bash, etc.)
  • Experience using Terraform
  • SecDevOps: integrating secure development practices and controls into the development/deployment process
  • Capability for continual improvement and rapid uptake of new technology skills
  • Ability to engage, build and sustain stakeholder relationships and influence decisions
  • Excellent oral and written communication skills, including the ability to explain technology solutions in business terms and clearly communicate to both technical and non-technical staff
  • Able to lead virtual teams and hold peers to agreed standards of delivery and performance
  • Strong analytical skills to bring out key information from multiple data sources to drive superior operational performance
  • Calm under pressure and takes the lead during complex situations
  • Desirable: experience of the Azure DevOps product