PyData Tel Aviv 2022

Keeping sensitive data safe using recommendation systems
12-13, 11:15–11:45 (Asia/Jerusalem), Track 2

What if I told you everyday recommendation systems can be used to detect unwanted behaviors? Now, what if I told you they can also be harnessed to prevent internal security violations in organizations? Well, it’s happening. Kind of neat, right?


In our everyday lives, whenever we hear about recommendation systems, it’s usually in the context of consumerism: whether via e-commerce, social media or other advertising media, recommendation systems let us consume many kinds of personalized content. In my session, I’ll invite you to look at that exact context through a different lens, by showing how recommendation-system methods can be used to detect unwanted behaviors, such as security anomalies within organizations.

In many cases, organizations store their data in a relational database. Naturally, teams and groups with the same areas of responsibility in the organization access the same groups of tables, since they usually work on the same types of projects.

As some of this data may be sensitive, there’s a clear need to ensure that only authorized users have access to it, in order to prevent potential security incidents and any misuse of the data. Therefore, it’s crucial for organizations to be able to detect any behavior that differs fundamentally from their users’ routine data-access patterns.

My proposed solution applies recommendation-system methods to detect anomalies in users’ access patterns to tables. The data tables are the 'products', and the users 'consume' them. However, in this case we take a counter-intuitive approach and look for products (= tables) that the user dislikes (= shouldn’t access).
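To make the users-as-consumers, tables-as-products mapping concrete, here is a minimal sketch of the input a recommender would see: a user × table matrix of access counts built from a raw access log. The log format, the user names and the table names are all hypothetical; the talk does not describe how its access data is collected.

```python
from collections import Counter

# Hypothetical access log: one (user, table) pair per query. Names are made up.
access_log = [
    ("alice", "payments"), ("alice", "payments"), ("alice", "refunds"),
    ("bob", "payments"), ("bob", "refunds"), ("bob", "refunds"),
    ("carol", "hr_salaries"), ("carol", "hr_salaries"),
]


def build_access_matrix(log):
    """Turn raw (user, table) events into a dense user x table count matrix.

    Rows play the role of 'users' and columns the role of 'products' in a
    recommender; each cell counts how often that user queried that table.
    """
    counts = Counter(log)
    users = sorted({u for u, _ in log})
    tables = sorted({t for _, t in log})
    matrix = [[counts[(u, t)] for t in tables] for u in users]
    return users, tables, matrix


users, tables, matrix = build_access_matrix(access_log)
# users  -> ['alice', 'bob', 'carol']
# tables -> ['hr_salaries', 'payments', 'refunds']
# matrix -> [[0, 2, 1], [0, 1, 2], [2, 0, 0]]
```

A real deployment would more likely use a sparse matrix, since most users touch only a small fraction of the tables.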

Since we count how many times each user accesses each table, the access patterns are implicit feedback. To learn the usage patterns, we use the Alternating Least Squares (ALS) algorithm for implicit collaborative filtering. This way, the model can estimate how likely each user is to access a specific data table and detect whether a given access is an anomaly.
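The sketch below illustrates the idea with a small, dense NumPy implementation of implicit-feedback ALS in the style of Hu, Koren & Volinsky (2008). It is an illustrative sketch, not the speaker's implementation. Access counts become a binary preference P with confidence C = 1 + alpha * counts, and ALS alternately solves a weighted ridge regression for each user and for each table. The hyperparameters (`factors`, `alpha`, `reg`), the anomaly `threshold` of 0.5, the helper `flag_accesses` and the toy matrix are all assumptions chosen for illustration. Note that the raw scores behave like a likelihood but are not calibrated probabilities. In practice, a sparse implementation such as the open-source `implicit` library would usually be preferable.

```python
import numpy as np


def implicit_als(counts, factors=2, alpha=10.0, reg=0.1, iterations=15, seed=0):
    """Dense implicit-feedback ALS (Hu, Koren & Volinsky style).

    counts: (n_users, n_tables) array of raw access counts.
    Returns user factors X and table factors Y such that X @ Y.T approximates the
    binary preference matrix P, with squared errors weighted by C = 1 + alpha * counts.
    """
    rng = np.random.default_rng(seed)
    n_users, n_tables = counts.shape
    P = (counts > 0).astype(float)  # preference: did the user ever touch the table?
    C = 1.0 + alpha * counts        # confidence grows with the number of accesses
    X = rng.normal(scale=0.1, size=(n_users, factors))
    Y = rng.normal(scale=0.1, size=(n_tables, factors))
    ridge = reg * np.eye(factors)
    for _ in range(iterations):
        # Fix Y and solve one weighted ridge regression per user.
        for u in range(n_users):
            weighted = Y.T * C[u]            # Y^T C^u, shape (factors, n_tables)
            X[u] = np.linalg.solve(weighted @ Y + ridge, weighted @ P[u])
        # Fix X and solve one weighted ridge regression per table.
        for i in range(n_tables):
            weighted = X.T * C[:, i]         # X^T C^i, shape (factors, n_users)
            Y[i] = np.linalg.solve(weighted @ X + ridge, weighted @ P[:, i])
    return X, Y


def flag_accesses(X, Y, events, threshold=0.5):
    """Hypothetical helper: return (user, table) events scored below threshold."""
    scores = X @ Y.T
    return [(u, i) for u, i in events if scores[u, i] < threshold]


# Toy data: users 0-2 work with tables 0-1, users 3-4 work with tables 2-3.
counts = np.array([
    [5, 3, 0, 0],
    [4, 6, 0, 0],
    [6, 2, 0, 0],
    [0, 0, 7, 4],
    [0, 0, 5, 5],
], dtype=float)

X, Y = implicit_als(counts)
new_events = [(0, 1), (0, 2)]  # a routine access and a cross-team access
flagged = flag_accesses(X, Y, new_events)
```

On this toy data, the cross-team access `(0, 2)` gets a score near zero and is flagged, while the routine access `(0, 1)` scores near one. This "dislike" direction is exactly the counter-intuitive use of the recommender described above.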

Now, you’re probably asking yourself: ‘How can such a concept be implemented as a long-lasting solution to protect against data abuse within MY organization?’ I guess you’ll have to join my talk to find out.

Notes: The idea is to use recommendation systems for anomaly detection. We achieved an AUC higher than 90%.
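The abstract reports the AUC without describing the evaluation protocol. For readers less familiar with the metric, here is a minimal sketch of what it measures: ROC AUC is the probability that a randomly chosen anomalous access receives a higher anomaly score than a randomly chosen normal one, with ties counted as half. The scores below are toy numbers, not results from the talk. They could be, for example, 1 minus the predicted preference, but that choice is an assumption. For large sets, `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
def roc_auc(scores_normal, scores_anomalous):
    """ROC AUC via pairwise comparison (the Mann-Whitney U formulation).

    Higher scores are assumed to mean 'more anomalous'.
    """
    wins = 0.0
    for a in scores_anomalous:
        for n in scores_normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_anomalous) * len(scores_normal))


# Toy anomaly scores (hypothetical, not data from the talk).
normal_scores = [0.1, 0.2, 0.3, 0.4]
anomalous_scores = [0.35, 0.9]
auc = roc_auc(normal_scores, anomalous_scores)  # 7 of 8 pairs ranked correctly -> 0.875
```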

Liron has been a Data Scientist at PayPal for more than 5 years. She works in the cyber security threat oversight team within the information security organization. Her focus is finding machine learning solutions for information security problems in general, and for insider fraud threats in particular.

Liron holds an MSc in software and information systems engineering. In her thesis, she researched user verification on mobile devices using sequences of touch gestures. The full paper based on this thesis was published at the PAKDD 2018 conference, and she also published an extended abstract of this research at the UMAP 2017 conference.