PyData Tel Aviv 2022

Live Coding: Breaking the Privacy-Utility Trade-Off with Deming Regression
12-13, 12:00–12:30 (Asia/Jerusalem), Track 1

Imagine you’re conducting a salary survey so you can train a model that predicts salaries. Cool, right? Not if you don’t handle user privacy… How can we make sure the collected data can’t be used to identify the respondents, while still being able to train our model properly?

In this session, we’ll have our cake and eat it too: we’ll use a lesser-known model called Deming regression to handle our anonymized data, and it will perform about as well as a model trained on the private data! And it will all be live coded, starting from an empty Jupyter notebook. Join the fun ;)
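The abstract doesn’t spell out the anonymization scheme, so what follows is a minimal sketch under our own assumptions, not the speakers’ notebook. It assumes anonymization means adding Gaussian noise with a known standard deviation to a feature (here a hypothetical "years of experience" column), and that the ratio of salary noise variance to anonymization noise variance is known. Ordinary least squares on the noisy feature underestimates the slope (attenuation bias). Deming regression, which models errors in both variables, corrects for this. The function names, coefficients, and noise levels are all hypothetical.

```python
import numpy as np


def deming_fit(x, y, delta):
    """Fit y = b0 + b1 * x with Deming regression.

    delta is the ASSUMED-known ratio var(error in y) / var(error in x).
    """
    x_mean, y_mean = x.mean(), y.mean()
    s_xx = np.var(x, ddof=1)
    s_yy = np.var(y, ddof=1)
    s_xy = np.cov(x, y, ddof=1)[0, 1]
    diff = s_yy - delta * s_xx
    # Closed-form Deming slope, then intercept through the means
    b1 = (diff + np.sqrt(diff ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    b0 = y_mean - b1 * x_mean
    return b0, b1


def ols_fit(x, y):
    """Plain least squares; np.polyfit returns [slope, intercept]."""
    b1, b0 = np.polyfit(x, y, 1)
    return b0, b1


rng = np.random.default_rng(0)
n = 100_000
true_b0, true_b1 = 8_000.0, 1_500.0  # hypothetical: salary = b0 + b1 * years
sigma_y, sigma_x = 2_000.0, 3.0      # hypothetical noise levels

years = rng.uniform(0, 30, n)
salary = true_b0 + true_b1 * years + rng.normal(0, sigma_y, n)

# Hypothetical anonymization: publish years with added Gaussian noise
years_anon = years + rng.normal(0, sigma_x, n)

ols_b0, ols_b1 = ols_fit(years_anon, salary)
dem_b0, dem_b1 = deming_fit(years_anon, salary, delta=sigma_y ** 2 / sigma_x ** 2)

print(f"OLS slope on anonymized data:    {ols_b1:.0f}")
print(f"Deming slope on anonymized data: {dem_b1:.0f}")
```

With these settings the feature’s variance is 75 (uniform on 0 to 30) and the added noise variance is 9, so the OLS slope should be pulled toward roughly 1500 × 75/84 ≈ 1340, while the Deming slope should land close to the true 1500.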

Over 13 years of experience as a software engineer and algorithm developer in various domains, including NLP, recommender systems, vision, and cybersecurity.

I am passionate about good-quality code, interesting ideas and sophisticated algorithms. I love encountering elegant equations while trying to solve real-life problems.

Seasoned Data Scientist with an emphasis on NLP, classical ML, visualization and experimentation.

Driven by a great passion for the field, I am inspired by counterintuitive insights and inferences made by smart algorithms. In my talks, I try to convey my typical spirit and enthusiasm while delivering crisp takeaways.