Which algorithm is best to detect anomalies within a data set of 5k+ user-login events?

Question

asked Oct 5, 2021 in Machine Learning by ML_newbie (120 points)

I am trying to build an unsupervised ML model to detect anomalies within 5000+ users' login data. I selected 5 features contained in each user-login event (e.g. IP, hour of day, day of week, device_id, OS). I am looking for the best algorithm to use. I am considering using a density function to estimate the probability of each feature value and to decide whether an event is an outlier. The problem is that feature values are only meaningful relative to a specific user. For example, you cannot compare login IPs across users; a login IP is only meaningful in the context of the user it belongs to.
Ultimately, I want to detect events that represent a change in a user's login behavior, such as a different IP, day, hour, device_id, or OS, where the more features that have changed, the higher the probability that the event is an outlier.
At this point, I am not sure how to build a model from data that contains multiple users, because I don't know how to separate the data so that the model is trained per user and finds anomalies within each individual user's features.
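
One way the per-user idea could look, as a rough sketch rather than a settled approach: keep a separate table of value counts for every user and feature, then score a new event by summing the negative log of each smoothed per-user probability, so that every additional changed feature pushes the score up. The event layout (a dict with a `user` key plus the five features), the function names, and the smoothing constant `alpha` are all assumptions made for illustration, and the features are treated as independent, which is a simplification.

```python
from collections import Counter, defaultdict
import math

# Hypothetical field names for the five features described above.
FEATURES = ["ip", "hour", "day_of_week", "device_id", "os"]

def fit_user_profiles(events):
    """Count how often each user has produced each value of each feature."""
    profiles = defaultdict(lambda: {f: Counter() for f in FEATURES})
    for event in events:
        for f in FEATURES:
            profiles[event["user"]][f][event[f]] += 1
    return profiles

def score_event(profiles, event, alpha=1.0):
    """Return (anomaly score, number of never-seen values) for one event.

    The score is the sum of -log(p) over features, where p is the
    Laplace-smoothed frequency of the value in that user's own history.
    """
    profile = profiles.get(event["user"])
    if profile is None:
        # No history for this user, so nothing to compare against.
        return float("inf"), len(FEATURES)
    score, unseen = 0.0, 0
    for f in FEATURES:
        counts = profile[f]
        total = sum(counts.values())
        seen = counts.get(event[f], 0)
        # The extra "+ 1" category reserves probability mass for unseen values.
        p = (seen + alpha) / (total + alpha * (len(counts) + 1))
        score += -math.log(p)
        unseen += int(seen == 0)
    return score, unseen
```

Because the counts are keyed by user, an IP is only ever compared with the same user's earlier IPs. Events could then be ranked by score, or flagged when the count of never-seen values passes a threshold chosen by inspection.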

I also don't have any labeled data to use for testing. Should I fabricate some?
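
If fabricating test data turns out to be reasonable, one possible way to do it is to copy real events and overwrite a chosen number of features with values taken from a different user, labeling the result as anomalous. The sketch below assumes the same hypothetical event layout (a dict with a `user` key plus the five features); the function name, the `label` field, and the parameters `n`, `k`, and `seed` are illustrative only.

```python
import random

# Hypothetical field names for the five features described above.
FEATURES = ["ip", "hour", "day_of_week", "device_id", "os"]

def inject_anomalies(events, n, k, seed=0):
    """Create n synthetic anomalous events, each with k features borrowed
    from a randomly chosen event of a different user."""
    if len({e["user"] for e in events}) < 2:
        raise ValueError("need events from at least two users")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        fake = dict(rng.choice(events))  # copy, so the real event is untouched
        donors = [e for e in events if e["user"] != fake["user"]]
        donor = rng.choice(donors)
        for f in rng.sample(FEATURES, k):
            # The donor value may coincide with the user's own value
            # (e.g. the same OS), so fewer than k features may really change.
            fake[f] = donor[f]
        fake["label"] = 1  # 1 marks a fabricated anomaly
        synthetic.append(fake)
    return synthetic
```

Varying `k` from 1 to 5 would give a way to check whether a detector's score rises as more features change, though fabricated anomalies only approximate real ones.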

Any advice greatly appreciated.

Thank you!
