asked in Machine Learning by (120 points)  
I am trying to build an unsupervised ML model to detect anomalies in login data for 5000+ users. I selected 5 features from each user-login event (e.g. IP, hour of day, day of week, device_id, OS), and I am looking for the best algorithm to use. I am considering using a density function to estimate the probability of each feature value and decide whether an event is an outlier. The problem is that feature values are only meaningful for the specific user. For example, you cannot compare login IPs across users; a login IP only makes sense in the context of that user's own history.
Ultimately, I want to detect events that represent a change in a user's login behavior, such as a different IP, day, hour, device_id, or OS, where the more features that have changed, the higher the probability that the event is an outlier.
At this point, I am not sure how to build a model from data that contains multiple users, because I don't know how to separate the user data so that the model is trained per user and finds anomalies within each individual user's features.
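
To make the per-user idea concrete, here is a minimal sketch of one possible approach, not a recommended algorithm. It keeps a separate frequency table per user and per feature, then scores an event as the sum of negative log-probabilities, so each additional unfamiliar feature raises the score. The field names (`user`, `ip`, `hour`, `day_of_week`, `device_id`, `os`), the function names, the sample values, and the add-alpha smoothing are all assumptions made for illustration. Treating hour of day as a plain category is also a simplification.

```python
import math
from collections import Counter, defaultdict

# Hypothetical field names for the five login features.
FEATURES = ["ip", "hour", "day_of_week", "device_id", "os"]

def fit_user_profiles(events):
    """Count how often each feature value appears, separately for each user."""
    profiles = defaultdict(lambda: {f: Counter() for f in FEATURES})
    for e in events:
        for f in FEATURES:
            profiles[e["user"]][f][e[f]] += 1
    return profiles

def anomaly_score(profiles, event, alpha=1.0):
    """Sum of -log P(value | user, feature) with add-alpha smoothing.

    Returns None for a user with no history, since there is nothing
    to compare against.
    """
    user_profile = profiles.get(event["user"])
    if user_profile is None:
        return None
    score = 0.0
    for f in FEATURES:
        counts = user_profile[f]
        total = sum(counts.values())
        n_values = len(counts) + 1  # one extra slot for "never seen before"
        p = (counts[event[f]] + alpha) / (total + alpha * n_values)
        score += -math.log(p)
    return score

# Made-up history: two users who always log in the same way.
history = [
    {"user": "alice", "ip": "10.0.0.1", "hour": 9, "day_of_week": "Mon",
     "device_id": "laptop-1", "os": "macOS"}
    for _ in range(20)
] + [
    {"user": "bob", "ip": "10.0.0.2", "hour": 22, "day_of_week": "Sat",
     "device_id": "desktop-7", "os": "Windows"}
    for _ in range(20)
]

profiles = fit_user_profiles(history)
usual = dict(history[0])
one_change = dict(usual, ip="203.0.113.7")
two_changes = dict(one_change, device_id="phone-9")

for name, event in [("usual", usual), ("one change", one_change),
                    ("two changes", two_changes)]:
    print(name, round(anomaly_score(profiles, event), 2))
```

Because every lookup goes through `profiles[user]`, an IP that is common for one user never lowers the score for another user. A threshold on the score would still have to be chosen, and users with very little history will produce noisy scores.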

I also don't have any labeled data to use for testing. Should I fabricate some?
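
If fabricating test data turns out to be reasonable, one way it could be done is sketched below: copy a real event and overwrite a few of its features with values taken from a different user, then label the copy as an anomaly. The function name `inject_anomalies`, its parameters, and the field names are hypothetical. The sketch also assumes that the original events can be treated as normal (label 0), which is itself unverified without real labels.

```python
import random

# Hypothetical field names for the five login features.
FEATURES = ["ip", "hour", "day_of_week", "device_id", "os"]

def inject_anomalies(events, n_anomalies, n_changed=2, seed=0):
    """Return (event, label) pairs; label 1 marks a fabricated anomaly.

    Needs events from at least two users. If the donor happens to share
    a value with the base event (for example the same OS), fewer than
    n_changed features will actually differ.
    """
    rng = random.Random(seed)
    labeled = [(dict(e), 0) for e in events]  # originals assumed normal
    for _ in range(n_anomalies):
        base = rng.choice(events)
        donors = [e for e in events if e["user"] != base["user"]]
        donor = rng.choice(donors)
        fake = dict(base)  # copy, so the original event is not modified
        for f in rng.sample(FEATURES, n_changed):
            fake[f] = donor[f]
        labeled.append((fake, 1))
    return labeled

# Made-up events for two users.
events = [
    {"user": "alice", "ip": "10.0.0.1", "hour": 9, "day_of_week": "Mon",
     "device_id": "laptop-1", "os": "macOS"},
    {"user": "bob", "ip": "10.0.0.2", "hour": 22, "day_of_week": "Sat",
     "device_id": "desktop-7", "os": "Windows"},
]

labeled = inject_anomalies(events, n_anomalies=3, n_changed=2)
for event, label in labeled:
    print(label, event)
```

Varying `n_changed` from 1 to 5 would give a rough check of whether a detector's score rises as more features change, though synthetic anomalies like these may not resemble real account takeovers.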

Any advice is greatly appreciated.

Thank you!
  
