Computational Protein Scientist (gn) - ML & Protein Design @ Biotech Venture, Cambridge (UK)

  • On-site
    • Cambridge, United Kingdom
  • Portfolio Company

Job description

About DropCode

DropCode is building the data engine for protein function. Starting with enzymes, we use our patented droplet microfluidics platform to capture exponentially more data on protein function than conventional methods. The platform links genotype to phenotype at per-droplet resolution, making every droplet a micro test tube. This data fuels machine learning models that learn in ever greater detail how sequence determines function. Our wedge is enzyme engineering for biocatalysis and industrial biotechnology, but our ambition is to make DropCode the definitive platform for protein function prediction.

We are Cambridge PhDs with deep expertise across microfluidics, biochemistry, machine learning, optics, and engineering. We believe the language of biology is machine learning, and that the fastest path to transformative models is not just better AI but better inputs.

The Role

We are looking for an exceptional computational scientist to lead our machine learning and protein modelling efforts. You will own the sequence–function modelling stack end to end: from processing large-scale functional datasets generated in our microfluidic runs, to training and deploying generative and predictive models that drive the next round of experiments. You will work in a tight loop with the biology and engineering teams, turning quantitative phenotypic data into closed-loop active learning systems that continuously improve our models.

This is a foundational role. You will be building the ML infrastructure from the ground up, and your architectural choices will shape DropCode for years.

What You'll Do

  • Design and train sequence–function models on deep mutational scanning datasets and high-throughput screening outputs from our microfluidics platform

  • Develop and iterate generative models (transformers, diffusion models, or equivalent) for enzyme sequence design and optimisation

  • Build closed-loop active learning pipelines that couple ML predictions with experimental design, shortening the design–build–test–learn cycle

  • Model protein fitness landscapes, including epistatic interactions, to navigate high-dimensional sequence space intelligently

  • Partner with the biology team to define the data collection strategy and ensure experimental outputs are ML-ready

  • Establish best practices for model evaluation, benchmarking, and uncertainty quantification in the context of functional prediction

  • Own and grow the computational stack as the team scales

What We're Looking For

  • Demonstrated contribution to a meaningful breakthrough in protein design or sequence–function modelling

  • Proven hands-on experience with protein language models or generative models applied to biological sequences

  • Deep familiarity with deep mutational scanning, large-scale functional datasets, or comparable high-throughput data modalities

  • Strong understanding of fitness landscape theory and epistasis in the context of sequence optimisation

  • Experience building active learning or Bayesian optimisation systems that integrate ML with experimental feedback

  • Excitement at the prospect of working with large volumes of proprietary, quantitative functional data unavailable anywhere else

  • Comfort operating in the ambiguity of early-stage R&D and motivation to build foundational infrastructure

  • PhD in machine learning, computational biology, biophysics, or a related field (or equivalent depth of experience)

Who You Are

You are frustrated by the slow, artisanal nature of current biological engineering and believe the field needs a step-change in data scale and quality. You think quantitatively, treat every experiment as a data point for a model, and have strong opinions about what it takes to build the best protein design systems in the world. You thrive in collaborative, fast-moving environments where the pace is set by scientific urgency, not process.
