We create the data products & technology that make advertising work better for people.
Choreograph, an affiliate of GroupM, is a global data products and technology company, purpose-built for an era that demands a new approach to data management, usage, and brand growth.
Data is the fuel that powers growth. The companies that best leverage data create unbeatable advantages over their competitors while simultaneously connecting with customers more effectively.
Our goal is to help future-focused businesses use their data in ways that meet savvy customers' expectations while building trust and understanding.
We are over 700 strong in 17 markets around the world and offer a modular product suite, empowering marketers to drive sustainable, data-enabled growth.
Position Overview :
We have an exciting opportunity for an experienced big data engineer with Python and Hadoop expertise to join us, focused on building technology products for some of the world's leading media agencies.
We are looking to extend an existing big data processing capability, already very successfully servicing one of our largest product stacks, and build the new innovations required by our evolving analytics product suite.
You will work in a small squad, as part of a larger team, architecting, extending, maintaining and tuning our big data capabilities.
You will work hands-on with our data pipeline application, in collaboration with local and offshore development teams.
You will be accountable for the technical design (with support from the lead architect) and implementation of the application, end to end.
You will be responsible for the code quality, CI/CD, and software development lifecycle (SDLC) of the application.
You will proactively monitor system performance and implement tuning.
You will work with both product and engineering teams to ensure that our systems are fit for purpose.
You will accurately estimate and implement feature work to a high standard, meeting both functional and non-functional requirements, on time.
You will manage technical debt, making the right calls between pragmatic delivery and compromised implementation patterns.
You will contribute to technical project meetings, technical reviews and delivery activities.
You have a proven track record of :
4+ years' experience operating in the big data stack (Hive, BigQuery, Hadoop, HDFS; Presto preferred), building data-intensive applications running on distributed infrastructure
5+ years' experience building Python applications
Designing and managing successful delivery of cloud-native data pipeline applications at scale
Experience in the SDLC for a major project
Discerning the nuances of the business context that should shape system design, and applying the right solution to the problem
Designing a shared-service used by multiple consumers
Designing interfaces with particular awareness of whether functionality should be inside or outside a service's boundary
Skills :
You are :
Proficient in SQL (MySQL and PostgreSQL databases)
Capable of reading and writing basic Python applications
Experienced in data pipeline development
Experienced primarily with GCP (Dataproc, BigQuery) and secondarily with AWS (EMR, Athena)
Focused on delivery
Experienced in the ELT paradigm and have worked with a range of tools
You know :
Modern big data architecture
Microservice architecture patterns
API implementation patterns and interfaces
Big data file formats (Avro, ORC, Parquet) and their different features
Ideally, you :
Have experience within the advertising industry and its associated third-party datasets