Tal Erez Hauer
Tal is a Machine Learning Scientist at PayPal, working on horizontal data science infrastructure and solutions. Her current work focuses on clustering solutions for big-data applications, serving multiple business applications in the risk and credit domains. Tal's past work includes advanced sequence processing solutions, graph-based applications, and applied research in the cyber domain. She holds a BSc in Industrial Engineering from Ben Gurion University.
![The speaker's profile picture](/media/avatars/%D7%98%D7%9C_%D7%AA%D7%9E%D7%95%D7%A0%D7%94_%D7%A7%D7%9C%D7%95%D7%96%D7%90%D7%A4_oAS8tO9.jpeg)
Sessions
I was about to give up on my DBSCAN clustering solution when I found out how long it would take to train on 400 million records. The density-based clustering algorithm was exactly what we needed at PayPal to solve a few unsupervised anomaly-detection problems, but with a worst-case runtime of O(n^2), it just seemed impossible.
This talk introduces how we re-implemented DBSCAN for big data by parallelizing it with a graph algorithm, and walks through our solution, which enables clustering of 400M records in a few hours.
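
For readers unfamiliar with the idea, one common way to express DBSCAN as a graph problem is to treat core points as nodes, connect core points that lie within `eps` of each other, and read clusters off as connected components. The sketch below illustrates that general formulation only; it is not the implementation presented in the talk, and the function name `dbscan_via_components` and its parameters are assumptions made for illustration. It runs on a single machine with a brute-force neighbor search, so it still costs O(n^2); the appeal of the formulation is that the connected-components step is a standard graph primitive that distributed engines can parallelize.

```python
from math import dist


def dbscan_via_components(points, eps, min_pts):
    """Illustrative DBSCAN expressed as connected components over core points.

    Returns one label per point: 0, 1, 2, ... for clusters and -1 for noise.
    """
    n = len(points)

    # Brute-force eps-neighborhoods (each point counts as its own neighbor).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbors]

    # Union-find over core points: an edge joins two core points within eps.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    for i in range(n):
        if core[i]:
            for j in neighbors[i]:
                if core[j]:
                    union(i, j)

    # Each connected component of core points becomes one cluster.
    labels = [-1] * n
    cluster_ids = {}
    for i in range(n):
        if core[i]:
            root = find(i)
            labels[i] = cluster_ids.setdefault(root, len(cluster_ids))

    # Border points join the cluster of any core neighbor; the rest stay noise.
    for i in range(n):
        if not core[i]:
            for j in neighbors[i]:
                if core[j]:
                    labels[i] = labels[j]
                    break

    return labels
```

Calling `dbscan_via_components(points, eps=1.5, min_pts=3)` on two tight groups of three points plus one distant point assigns the groups to two clusters and marks the distant point as noise (`-1`).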