The Senior DevOps Engineer, Toolchain is a key member of our core platform team, working closely with the platform Manager, Agile Toolchain and Processes to develop and operate ERT's software development toolchain service supporting multiple teams across the company.
The successful candidate will be responsible for:
1) applying engineering solutions to manage specific self-hosted, cloud-based tools within our Agile Toolchain service offering, ensuring a reliable end-user service experience;
2) developing workflow integrations and automation that combine multiple software tools into a cohesive development experience that meets the needs of our software teams, eliminates repetitive tasks and improves system resiliency.
DevOps and SRE practices are foundational to this role, including software engineering, toolchains, pipelines, runners, containers and container orchestration, automation, infrastructure as code, continuous integration and continuous deployment, application performance monitoring and change management.
As a Senior DevOps Engineer, Toolchain, your main responsibilities will be to:
Develop and maintain a platform of integrated systems and tools to support Agile and DevOps practices across a diverse set of workflows used by multiple development teams across the company
Participate in architecture and software development activities
Translate loosely defined requirements into solutions
Run our toolchain infrastructure using Ansible, Terraform and Kubernetes
Plan infrastructure growth
Provide direct and responsive support for availability incidents and other urgent analytic, development or operational needs
Debug production issues across services, at all component levels
Use our toolchain service offering for your day-to-day activities and work to continually improve it
Document all actions and work to define repeatable actions that can be automated
Design and build reusable templates to accomplish specific workflow use cases
Continually improve our monitoring, metrics and automated response capabilities
Use coding languages or scripting methodologies to solve problems with custom workflows
Collaborate with team members to tackle complex technological infrastructure, security, and development problems
Perform incremental testing actions on code, processes and deployments to identify ways to streamline execution and minimize errors
The duties and responsibilities listed in this job description represent the major responsibilities of the position. Other duties and responsibilities may be assigned, as required.
ERT reserves the right to amend or change this job description to meet the needs of ERT. This job description and any attachments do not constitute or represent a contract.
A minimum of 2 years of hands-on staging and production experience in each of the following areas:
Administering and deploying development CI / CD tools such as GitLab or Jenkins
Infrastructure and configuration management solutions such as Puppet, Terraform or Helm
Container orchestration services, especially Kubernetes
Systems administration scripting languages such as Python, Bash, or PowerShell
Supporting Windows and Linux operating system environments
In-depth operational expertise in the following areas:
Cloud architecture best practices around operational excellence, security, reliability, performance efficiency, and cost optimization
Best practices and IT operations for dynamic, always-up, always-available services
Other Qualifications:
Demonstrated systems perspective when analyzing problems, thinking about the overall operation, failure modes and how to address these problems proactively
A strong sense of the importance of documentation, and of not having to learn things twice
Ability to work in an agile product team environment and balance a diverse set of stakeholder requests
Excellent oral and written communication skills with an ability to break down complex technical systems to help business partners understand the value
Strong technical collaboration and communication skills as well as the ability to drive cultural change and adoption of best practices through community participation
Ability to collaborate with other teams across the company, defining technology roadmaps, sharing experiences and lessons learned for continual improvement
Excellent problem-solving and troubleshooting skills
Process-oriented with great documentation skills
A good understanding of how different tools work together inside a toolchain
Basic understanding of development, release and deployment processes used by R&D teams