PyData Tel Aviv 2022

Life, Death, and Shopping
12-13, 15:00–15:30 (Asia/Jerusalem), Track 1

A step-by-step introduction to purchase prediction, also applicable to survival analysis and churn prediction, including an implementation in PySpark.


In survival analysis, a model succeeds when it predicts death correctly. But the same approach can also predict an engine failure, abandonment, or even a purchase.
In purchase prediction, survival analysis, and churn prediction, the data is usually labeled, either directly or artificially by a set of rules, such as treating 30 days of inactivity as churn. But the data structure differs from that of classical machine learning, and the data handling and modeling differ accordingly.
In this lecture, we will cover the data structures and aggregations for such analysis, focusing on time aggregations using PySpark, and what NLP has to do with any of it.
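
To make the rule-based labeling and the time aggregation mentioned above concrete, here is a minimal sketch in plain Python. It is a stand-in for illustration, not the PySpark implementation shown in the talk. The event log, the function names `label_churn` and `count_events_in_window`, and the reference date are all hypothetical; only the 30-day inactivity rule comes from the abstract.

```python
from datetime import date, timedelta

# Hypothetical event log of (user_id, event_date) pairs; values are illustrative only.
events = [
    ("u1", date(2022, 1, 3)),
    ("u1", date(2022, 2, 20)),
    ("u2", date(2022, 1, 10)),
    ("u2", date(2022, 1, 12)),
]

def label_churn(events, as_of, inactivity_days=30):
    """Label a user as churned if their last event is older than the cutoff date."""
    last_seen = {}
    for user_id, event_date in events:
        if user_id not in last_seen or event_date > last_seen[user_id]:
            last_seen[user_id] = event_date
    cutoff = as_of - timedelta(days=inactivity_days)
    return {user_id: seen < cutoff for user_id, seen in last_seen.items()}

def count_events_in_window(events, as_of, window_days):
    """Count each user's events in the trailing window (as_of - window_days, as_of]."""
    start = as_of - timedelta(days=window_days)
    counts = {}
    for user_id, event_date in events:
        if start < event_date <= as_of:
            counts[user_id] = counts.get(user_id, 0) + 1
    return counts

# Assumed reference date for the example.
labels = label_churn(events, as_of=date(2022, 3, 1))
recent = count_events_in_window(events, as_of=date(2022, 3, 1), window_days=30)
```

With this toy data, `u2` has no activity in the 30 days before the reference date and is labeled as churned, while `u1` is not. In practice the same trailing-window counts would likely be computed at several window lengths and used as model features.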

Dina Bavli is a Data Scientist with experience in NLP, graph theory, NetworkX, churn prediction, and a growing interest in ASR (automatic speech recognition).
Her Master's thesis deals with classifying and characterizing persuasion. She is a former teaching assistant for ML and an international public speaker.
She is a data science content writer for workshops, meetups, and online courses, and an official author of the Towards Data Science and Better Programming publications.
Dina is passionate about data, sharing knowledge, and contributing to society. Whenever she can't find a sufficient tutorial, she creates one.