Jr. Data Engineer
Brentford, United Kingdom

At GlaxoSmithKline we have created a world-leading data and computational environment to enable large-scale scientific experiments that exploit GSK’s unique access to data.

Our focus is on bringing data, analytics & science together into solutions for our scientists to develop medicines for patients.

This data and computational environment supports GSK R&D across a broad range of pharmaceutical areas including genetics, functional genomics, clinical, biopharma and others.

The Data & Compute Delivery (DCD) Data Engineering team is a crucial component of the environment and is responsible for delivering the data pipelines that populate and maintain data for scientific use in HPCs, Cloud and the R&D Information Platform (RDIP).

We are looking for a passionate and enthusiastic individual who will contribute to the strategy for data movement in a variety of scientific areas by working closely with the people involved in the generation, handling and consumption of such data, including Data & Computational Science (DCS), R&D Tech, different vendors and the larger R&D organization.

The data engineer needs to be able to apply technologies in a DataOps environment to solve big data problems and to develop innovative big data solutions based on defined business requirements.

The successful candidate must be able to learn and work independently, lead or assist with pipeline development efforts and collaborate effectively with co-workers.

This role will provide YOU the opportunity to lead key activities to progress YOUR career. These responsibilities include some of the following:

Participate in data teams supporting the implementation of pipelines that underpin R&D strategy and conceptual data flows

Partner with principal data engineers and metadata leads to translate conceptual data models into physical databases / tables optimized for data analytics in RDIP using established environments and tools

Assist with the design, build, test and maintenance of data acquisition and processing pipelines, including but not limited to the creation / maintenance of appropriate artifacts

Ensure the preservation of data integrity from source to target state, including but not limited to the acquisition of appropriate metadata and the incorporation of appropriate QC checks into the pipelines

Support the use and growth of the Data Engineering DataOps environment including development and maintenance of related DataOps / DevOps infrastructure

Provide Tier 3 support for production pipelines

Support DCS and broader R&D in self-service / exploratory efforts

Work with R&D and Tech to support DataOps enhancements, and onboard these tools or enhancements

Ensure the quality, consistency and availability of guidance documentation for end users of the tools to support high-quality outputs

Support GxP readiness as it relates to the data pipelines and address associated gaps

Why you?

Basic Qualifications:

We are looking for professionals with these required skills to achieve our goals:

Computer Science, Bioinformatics, or related degree; 1+ years of experience in big data technologies, data movement, data wrangling or DataOps / DevOps systems and tools

Experience with data movement and data pipelines

Experience with Big Data technologies (ideally the Cloudera stack, including HDFS, Hive, Impala and Spark), Cloud-based offerings (Microsoft Azure, GCP, AWS, etc.), and corresponding tools.

Preferred Qualifications:

If you have the following characteristics, it would be a plus:

Proven ability to contribute to development projects.

Strong interpersonal skills and effective communication of complex concepts to stakeholders with a wide range of expertise.

Familiarity with open source software, bioinformatics tools and languages such as SQL, R, Perl, Python, Java, and ETL tools.

Experience with data movement and management in the Pharmaceutical industry or related scientific fields.

Background and experience in LIMS systems, Next Generation Sequencing (NGS) workflows, Cloud computing and HPC systems.

Understanding of diverse omic data types, including RNA-Seq, DNA-Seq, ChIP-Seq, WES, WGS, ATAC-seq, microbiome, proteomic and metabolomic data, from different sources.

Familiarity with data mining, machine learning and artificial intelligence techniques.
