PyData Tel Aviv 2022

Data Maps: Creating a better model is easy if you ‘just’ use the right data
12-13, 15:45–16:15 (Asia/Jerusalem), Track 1

When gathering data to train an ML model, the common belief is ‘the more the merrier’. In reality, though, individual data samples can have very different effects on the learning process. How can we automatically measure each sample's contribution to learning, and what can we do with that measurement?


In this session we will get familiar with the concept of Data Maps. We will use this concept to analyze an important and largely ignored source of information: the model’s behavior on individual instances during training. I will also share how we can use Data Maps to develop a smart sampling mechanism that yields models with higher performance and better generalization, and how you too can use it to enhance your models.
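As a rough illustration of what "the model's behavior on individual instances during training" can mean in practice, the sketch below computes three per-sample statistics from predictions recorded at every epoch. It follows one common formulation of Data Maps (mean and standard deviation of the probability assigned to the true label, plus the fraction of epochs with a correct prediction); the talk's exact definitions are not given in this abstract, so the function name `data_map_stats`, the array layout, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def data_map_stats(probs, labels):
    """Per-sample training-dynamics statistics (hypothetical helper).

    probs:  array of shape (n_epochs, n_samples, n_classes) holding the
            predicted class probabilities recorded after each epoch.
    labels: integer array of shape (n_samples,) with the true class ids.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n_samples = labels.shape[0]

    # Probability given to the true label at each epoch: (n_epochs, n_samples)
    true_probs = probs[:, np.arange(n_samples), labels]

    confidence = true_probs.mean(axis=0)    # mean over epochs
    variability = true_probs.std(axis=0)    # spread over epochs
    # Fraction of epochs in which the predicted class was the true one
    correctness = (probs.argmax(axis=2) == labels).mean(axis=0)
    return confidence, variability, correctness

# Toy example (made-up numbers): 3 epochs, 3 samples, 2 classes
labels = np.array([0, 1, 0])
probs = np.array([
    [[0.90, 0.10], [0.20, 0.80], [0.60, 0.40]],
    [[0.95, 0.05], [0.10, 0.90], [0.30, 0.70]],
    [[0.99, 0.01], [0.30, 0.70], [0.70, 0.30]],
])
confidence, variability, correctness = data_map_stats(probs, labels)
```

Plotting variability against confidence for every sample gives the map itself. In this formulation, samples with high confidence and low variability are usually read as easy to learn, samples with low confidence and low variability as hard to learn or possibly mislabeled, and samples with high variability as ambiguous. A sampling scheme could then weight these regions differently, though the specific mechanism presented in the talk is not described here.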

Racheli is a Data Scientist at Riskified with 5 years of experience in the Fraud Detection domain. She has led multiple projects on automating and optimizing ML solutions, and is currently working on data quality assessment processes.
Racheli is a visualization enthusiast and loves the way a good chart can tell a story and make complex concepts easy to understand. Outside of her professional life, Racheli enjoys traveling, camping and yoga.