Trainline is an innovative tech business with a mission to make travel as simple, seamless and affordable as possible. We’re proud to be Europe’s leading independent train and coach platform and rank among the highest-rated travel and ticketing apps globally. Today, we offer our customers travel to thousands of destinations in and across 36 countries in Europe and beyond.
That’s more than £2.3 billion in ticket sales annually and over 60 million visits to our apps and websites each month. At peak times, over 185 people per minute are booking trains!
The Platform Operations team are responsible for the overall availability, performance and reliability of the entire Trainline platform. We are a growing company that loves new technology. We run a diverse platform that is 100% hosted on AWS, utilising the best of what it has to offer. Coupled with our own tooling, this allows us to embrace Continuous Delivery, DevOps and cloud environments to their full potential.
You will often find members of our leadership team as well as our development community speaking at meetups and conferences.
The Platform Operations team are the special forces team at the forefront of what is going on with the platform. They provide first-responder cover for critical incident response, change coordination, system administration, and monitoring and alerting on a 24x7x365 basis.
We have a relaxed culture but are very serious about what we do and how we do it. We want people who can thrive in a high-pressure environment where personal leadership and initiative are valued and rewarded.
What you'll be working on...
You will be heavily involved in major incidents relating to Production, External Test and Staging environments. This spans everything from the initial event and participating in the rapid response through to service restoration and identifying follow-up preventative measures.
You will be part of the team that owns all monitoring tools, ensuring that they keep up with our rate of change. This covers everything from BAU alerting to using this tooling to report on and improve SLAs and OLAs by holding teams accountable for their service quality.
You will take ownership of and provide priority support to Retailing and Fulfilment systems, ranging from critical incidents to proactively working on preventative measures by learning about and questioning the status of the platform, and taking ownership to ensure that issues are not forgotten after they are resolved.
You will work with DevOps Engineers in product-aligned teams to ensure applications are understood and that Continuous Delivery activities are carried out in a safe and timely manner. There is trust, but we still need eyes on the prize.
You will use your own experience and learning to provide a fresh approach to troubleshooting and processes. We want you to think outside the box, coming up with innovative and unique solutions and raising the bar each time.
You will participate in an on-call schedule to ensure that our systems are supported at all times. You will have the freedom to suggest and push for engineering solutions to failures, taking pride in every call-out that is solved by automation rather than human intervention.
You will have a professional approach to these interactions that builds confidence in your abilities and those of your team.
What you'll bring...
Proven experience of being part of a team that managed operational environments and was on the hook for availability, reliability and performance
Experience of being part of a support team in a high-pressure, fast-moving environment alongside Incident, Change and Service Desk management
Ability to see and act upon potential issues, whether they are technical changes, processes or procedures
A solid background in technology operations, with demonstrable ability in a range of technologies
Very high energy and enthusiasm, with a passion for delivering awesome service
Excellent interpersonal, relationship-building and influencing skills
Highly customer focused
Analytical approach to decision making and problem resolution, with experience of juggling multiple tasks and priorities
Technical Ability / Experience
Enterprise Technology : Experience with highly available, high-transaction websites and applications within microservices architecture, clustered systems, N+1 architecture, automated deployments, disaster recovery and business continuity
Operating Systems : Linux, Microsoft Windows Server (including Active Directory, DNS, DHCP, IIS)
AWS : EC2, S3, Lambda, VPC, CloudWatch, Terraform
Automation and Scripting : TeamCity, Puppet, Consul, PowerShell, Selenium, GitHub
Monitoring : HP BSM, SCOM, New Relic Insights / APM, InfluxDB, Elasticsearch, Kibana, Sensu, Grafana
Desirable Ability / Experience
Production experience with frontend web services including Apache, IIS and NGINX
Experience with e-commerce and website operations (WebOps)
Experience with monitoring & web analytics tools
Understanding of networking, TCP/IP, firewalls, NAT instances, NGINX load balancing and traffic management
Firm grasp of security and its importance within a cloud environment (PCI-DSS / SecOps)
Understanding of database technologies such as Oracle, MS SQL, DynamoDB
Understanding of DevOps and Agile methodologies
We value open expression at Trainline. We believe it’s the diversity of experience, backgrounds and perspectives of our employees that makes us who we are.
We encourage everybody to play a part in changing the way people travel across the world.