Do you want to be a Site Reliability Engineer who builds and manages scalable, self-healing, globally distributed systems?
Our Site Reliability Engineers make sure users are always connected to great local businesses by keeping Yelp fast and available as we continue to scale.
No matter how many times we get searched, scraped, scanned, spammed, pinged, paged, or queried, we gotta keep our cool and keep the site and the apps running smoothly.
We work for both the Yelp end users and the Yelp developers, implementing critical parts of the core architecture and supporting developers as they do the same.
We get to take on exciting challenges that you can only find at the kind of scale that serves over 100 million users per month.
Spinning up infrastructure should always be a git commit and a code review away: automation and self-service are at the core of what we do.
We're looking for people with a passion for all things related to distributed systems, serving queries fast, uptime, scaling, and solving hard problems with the right tools.
We have fun working on these challenges and are looking for others who do, too!
Where You Come In:
Work closely with developers in supporting new features and services
Analyze solutions and implement best practices for our database cluster and its components
Build cluster management tooling for the Cassandra Kubernetes Operator
Develop and maintain easy, intuitive API (REST / GraphQL) interfaces to our databases that keep developers moving fast
Work on observability of relevant database metrics and troubleshoot site issues using industry-leading tools like Splunk and Prometheus
Support and administer Cassandra clusters, as well as the stacks they run on, through automation
Design new systems, tests, and procedures
Participate in our daytime on-call rotation, acting as a point of contact for automated systems and highlighting availability issues when they can't be automatically resolved
What It Takes to Succeed:
An experienced software engineer with a strong interest in distributed systems and database technologies (such as Cassandra or another NoSQL database)
Fluency in Python, Java, Golang, or a similar language; familiarity with more than one is a plus
Knowledge of best practices related to security, performance, high availability and disaster recovery
Proficiency in Kubernetes
Mastery of Linux
Expertise in configuration management (e.g., Puppet, Ansible, Chef)
Experience with public cloud platforms and related tooling (e.g., Terraform, AWS CloudFormation)
What You'll Get:
Full responsibility for projects from day one, an awesome team, and a dynamic work environment
Competitive salary with equity in the company, a pension scheme, and an optional employee stock purchase program
25 days of paid holiday initially, rising to 29 with service, plus a 1-day floating holiday every year
Private health insurance, including dental and vision
Flexible working hours and meeting-free Wednesdays
Regular 3-day Hackathons and weekly learning groups, always with interesting topics
£60 per month toward any exercise of your choice
Yelp values diversity. We’re proud to be an equal opportunity employer and consider qualified applicants without regard to Age, Disability, Gender Reassignment, Marriage or Civil Partnership, Pregnancy and Maternity, Race, Religion or Belief, Sex.