What's your new role about?
Are you passionate about technology and excited by the idea of leading technical incident management? If you thrive in a fast-paced, tech-first environment and are ready to take a hands-on technical approach to a 24x7, 365 platform, then this is a great opportunity to support the platform availability and technical monitoring of DAZN’s global cloud-based sports streaming platform!
Join DAZN’s growing Technical Incident Management team to drive the evolution of the DAZN Critical Incidents Technical Response Group, handling all technical incident types.
You will take ownership of Incident Management as the key decision-maker, with the authority to direct the problem resolution path for the fastest restoration of any service.
While managing critical incidents and restoring impacted services, you’ll apply the right technical resources and act as technical lead on major incident calls.
You’ll be involved in dynamic, varied work, ranging from determining client impact and agreeing on resolution actions to managing the technical communication channel and collaborating with other Incident Managers.
You will be passionate about delivering a Major Incident Management process of the highest quality and integrity, and will act as the interface to other Technology and Development stakeholders.
Plus, you’ll have a unique opportunity to interact with suppliers!
You’ll be joining a growing team that is constantly looking for ways to evolve our technology with cutting-edge solutions like AWS, ECS and Lambda, using languages ranging from Node.js to Java and PHP. We love innovation and out-of-the-box thinking, so if you are looking for a chance to really push technical boundaries and work with cutting-edge technology, then DAZN is the place to be!
HERE’S A BREAKDOWN OF WHAT YOU’LL DO (NOT ALL OF IT, JUST THE MOST IMPORTANT STUFF)
Line management responsibility for a team of Technical Incident Managers and Senior Engineers
Technically lead all aspects of critical incidents (S1-S3), focused on the fastest service restoration: the recovery bridge, team communication channels, and sync-points for the sub-technical teams leading investigations (including third-party vendors and DAZN engineering teams)
Be responsible for the quality and integrity of the Major Incident Management process, and act as the interface with OPS Incident Managers, Support teams, and DAZN Development / Engineering teams
Support and lead technical incidents that require the team’s deep technical and problem-resolution skills; this may include working across regions with other TIMs, Engineering teams, Vendors and Suppliers to support 24x7 coverage
Partner with other Support, Dev and Engineering teams to resolve difficult or unique system issues that team members are not equipped to handle
Provide recommendations on troubleshooting and other technology improvements to quickly resolve incidents, ensuring infrastructure and application stability
Assume leadership during an S1 incident, directing the TIM team as they work towards service restoration; lead S1-S3 technical incident bridge calls, determine which SMEs are needed, identify the problem, and release / de-escalate after diagnosis
Build strong internal and external relationships with technical teams, customers and third parties
Ensure the TIM team meets the resolution targets defined in the SLA while also driving down mean time to resolution (MTTR)
Be flexible and willing to support a 24x7 global operation through off-hours support or on-call availability
DO YOU HAVE THESE ESSENTIALS?
Demonstrated leadership and team management skills, including managing a senior technical team
Strategic and tactical thinking, with strong quantitative and analytical skills under pressure
Working knowledge of ITIL incident, problem, and change management components
The ability to co-ordinate technical, incident and supplier side teams to ensure that all incidents are accurately prioritised and effectively managed
Experience with systems in cloud-based environments and with applications built on a microservices architecture
Extensive experience of managing major incidents especially those that have a significant cross service impact, including how to influence technical teams not under your direct control
The ability to identify early indications of major incidents not progressing well and the skill to engage the right teams to get them on track
Excellent written and oral communication skills, with a special focus on customer / client-level interaction
Practical experience with incident / outage and crisis management
Experience with monitoring tools like Checkmk, Nagios, PagerDuty, Datadog, New Relic or similar
Working knowledge of physical IT infrastructures such as Enterprise Server Platforms and related IT architectures and equipment
Exposure to working with log aggregation tools like ELK, Logz.io or similar
NOT ESSENTIAL BUT GREAT IF YOU ALSO HAVE
Broadcast industry experience and understanding of broadcast systems / technologies
Knowledge of ticketing systems such as ServiceNow or JIRA
Configuration management experience with Puppet, Ansible and/or Terraform