Question regarding precision@n and roc(@n?) #120

Open
Henlam opened this issue Jul 2, 2019 · 12 comments

@Henlam Henlam commented Jul 2, 2019

Hello,

First and foremost, thank you for building this wrapper; it is of great use to me and many others.

I have a question regarding the evaluation:
Most outlier detection evaluation setups set the ranking cutoff n equal to the number of outliers (i.e., the contamination), and so did I in my experiments.

My thoughts concerning the ROC and AUC scores were:

  1. Don't we have to rank the outlier scores from highest to lowest and evaluate the ROC only on the top n points, i.e., don't we need a ROC@n curve?
  2. Why do people use ROC and AUC for outlier detection problems, which by nature are heavily skewed and imbalanced? Hitting a lot of true negatives is easy and guaranteed if the algorithm knows that there are only n outliers.

In my case the precision@n of my chosen algorithms lies in the range 0.2-0.4 because it is a difficult dataset. However, the ROC AUC score is quite high at the same time.
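
For concreteness, by precision@n I mean the fraction of true outliers among the n highest-scoring points, computed roughly like this (a minimal NumPy sketch; y_true and scores are made-up arrays):

import numpy as np

def precision_at_n(y_true, scores, n=None):
    # fraction of true outliers among the top-n highest-scoring points
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    if n is None:
        n = int(y_true.sum())  # n = number of true outliers (the contamination)
    top_n = np.argsort(scores)[::-1][:n]  # indices of the n highest scores
    return y_true[top_n].mean()

# toy example: 2 of the top-3 scored points are true outliers -> 0.667
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
scores = np.array([.9, .8, .1, .85, .3, .2, .2, .1])
print(precision_at_n(y_true, scores))

(PyOD itself ships a similar helper, precision_n_scores in pyod.utils.utility, if I am reading the source correctly.)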

I would appreciate any thoughts on this since I am fairly new to the topic and might not grasp the intuition of the ROC curve for this task.

Best regards

Hlam


@yzhao062 yzhao062 commented Jul 4, 2019

You are correct; ROC @ n samples is also a popular choice. I will put this on my todo list. I do not know of any particular reason why people usually report the full ROC, but reporting ROC @ n is not uncommon :)

One thought is that ROC @ n is a point evaluation, while the full ROC considers the whole picture.
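
Not exactly ROC @ n, but scikit-learn's partial AUC (the max_fpr argument of roc_auc_score) captures a similar "early part of the ranking only" idea and may be a useful stopgap. A quick sketch on synthetic scores (made-up data, not from any real detector):

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = np.r_[np.ones(50), np.zeros(950)]                        # 5% outliers
scores = np.r_[rng.normal(2, 1, 50), rng.normal(0, 1, 950)]  # outliers score higher on average

print(roc_auc_score(y, scores))                # full-picture AUC
print(roc_auc_score(y, scores, max_fpr=0.05))  # restricted to the early part of the ranking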


@Henlam Henlam commented Jul 5, 2019

Thank you very much for your answer.

If the full ROC considers the whole picture, does a high ROC in this case just mean that on average the ground-truth outliers are ranked ahead of most inlier points, but not necessarily in the top n?
I.e., would ROC = 0 mean the true outliers are at the bottom of the outlier score ranking?
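
To test my own intuition, I constructed a toy case (made-up scores) where the detector ranks every true outlier just below the top n; the AUC stays high while precision@n is zero:

import numpy as np
from sklearn.metrics import roc_auc_score

y = np.r_[np.ones(5), np.zeros(95)]  # 5 true outliers, 95 inliers
scores = np.r_[np.full(5, 5.0),      # outliers: high scores, but not the highest
               np.full(5, 10.0),     # 5 inliers sneak into the very top
               np.zeros(90)]         # remaining inliers: lowest scores

print(roc_auc_score(y, scores))      # ~0.947: each outlier outranks 90 of 95 inliers
# precision@5 = 0: none of the top-5 points is a true outlier

And AUC = 0 would indeed require every true outlier to rank below every inlier.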


@evanmiller29 evanmiller29 commented Jul 23, 2019

@yzhao062 - Do you need some help updating the documentation? I'm happy to open a pull request for this.


@yzhao062 yzhao062 commented Jul 26, 2019

@evanmiller29 Sorry for the delay. A PR is always welcome :)


@yzhao062 yzhao062 commented Jul 26, 2019

@Henlam I think the most relevant paper for this topic is:

Hope this helps


@evanmiller29 evanmiller29 commented Aug 5, 2019

Thanks for passing along the papers; I'm having a read at the moment. I'm not 100% an outlier detection person (more general ML), but I'm keen to be involved in the project. Are you OK with that?


@firmai firmai commented Dec 5, 2019

This metric always gives the same ROC regardless of the KNN contamination level:


from pyod.utils.data import evaluate_print
# evaluate and print the results
print("\nOn Training Data:")
evaluate_print(clf_name, isfraud, y_train_scores)

This metric constantly changes depending on the KNN contamination level. Is this normal?


from sklearn import metrics

# evaluate and print the results
print("\nOn Training Data:")
print("Roc Auc score",round(metrics.roc_auc_score(isfraud, y_train_pred),2))

@yzhao062 yzhao062 commented Dec 5, 2019

> This metric always gives the same ROC regardless of the KNN contamination level: [...] This metric constantly changes depending on the KNN contamination level. Is this normal?

See this one: #144
ROC evaluates the ranking, not the labels.

If y_train_pred is the predicted scores, then this is normal. If y_train_pred is the predicted labels, then it is weird.
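
To make the difference concrete (toy arrays): passing binary labels collapses the whole ranking into a single threshold, so the "AUC" degenerates to one operating point, (TPR + TNR) / 2:

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 1, 0, 0, 0, 0])
scores = np.array([.9, .6, .7, .4, .3, .1])  # full ranking information
labels = (scores >= .65).astype(int)         # thresholded by some contamination

print(roc_auc_score(y_true, scores))  # 0.875: uses the whole ranking
print(roc_auc_score(y_true, labels))  # 0.625: a single point, (TPR + TNR) / 2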


@firmai firmai commented Dec 5, 2019

Indeed, it is quite weird; it is the predicted labels.

y_train_pred = clf.labels_ # binary labels (0: inliers, 1: outliers)

My reproducible code:

import pandas as pd
df = pd.read_csv('https://github.com/firmai/random-assets/blob/master/fraud.csv?raw=true').iloc[:,1:]

df = df.drop(columns=["nameOrig","nameDest"])
# one hot encoding
df = pd.get_dummies(df,prefix=['type'])
isfraud = df.pop("isFraud")
isflaggedfraud = df.pop("isFlaggedFraud")

from pyod.models.knn import KNN   # kNN detector

# train kNN detector
clf_name = 'KNN'
clf = KNN(contamination=0.0756)
clf.fit(df)

# get the prediction label and outlier scores of the training data
y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
y_train_scores = clf.decision_scores_  # raw outlier scores

from pyod.utils.data import evaluate_print
from sklearn import metrics

# evaluate with the raw outlier scores -- the standard usage
print("\nOn Training Data (scores):")
evaluate_print(clf_name, isfraud, y_train_scores)

# evaluate with the binary labels instead of the scores
print("\nOn Training Data (labels):")
evaluate_print(clf_name, isfraud, y_train_pred)

# sklearn ROC AUC, also computed on the binary labels
print("\nOn Training Data:")
print("Roc Auc score", round(metrics.roc_auc_score(isfraud, y_train_pred), 2))



@yzhao062 yzhao062 commented Dec 5, 2019

I ran the code. The reason is that you only get 122 true outliers among ~100,000 samples, so you need to make the contamination small enough (fewer than 122 predicted outliers) to see a difference; otherwise the extra flagged points are misclassified anyway.

However, you should not use ROC to evaluate labels; use the scores.
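
A quick check on synthetic data (hypothetical numbers, not your fraud dataset) shows why: AUC on the raw scores is invariant to the contamination threshold, while AUC on labels reduces to (TPR + TNR) / 2 and therefore moves as the threshold moves:

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y = np.r_[np.ones(122), np.zeros(9878)]  # 122 true outliers in 10,000 samples
scores = np.r_[rng.normal(3, 1, 122), rng.normal(0, 1, 9878)]

for contamination in (0.001, 0.01, 0.0756):
    thresh = np.quantile(scores, 1 - contamination)  # flag the top fraction as outliers
    labels = (scores > thresh).astype(int)
    print(contamination,
          roc_auc_score(y, scores),  # constant: the ranking never changes
          roc_auc_score(y, labels))  # varies with the threshold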


@raoarisa raoarisa commented Feb 10, 2020

Why is precision used to decide on the model? An outlier should not be detected as an inlier, since that would be the costliest error, so the false negative rate should be low; i.e., type 2 error should be taken into account when deciding on the model.


@yzhao062 yzhao062 commented Feb 10, 2020

> Why is precision used to decide on the model? An outlier should not be detected as an inlier, since that would be the costliest error, so the false negative rate should be low; i.e., type 2 error should be taken into account when deciding on the model.

I think what is actually used is precision @ rank n (or precision @ rank k), which is still slightly different from plain precision.
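
The distinction in code (toy arrays): plain precision is computed at whatever threshold the model's contamination implies, while precision @ rank n fixes the cutoff at the top n scores, with n the number of true outliers:

import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
scores = np.array([.9, .7, .2, .8, .6, .5, .4, .3, .2, .1])

# plain precision at the model's own threshold (here: the top 5 are flagged)
labels = (scores >= np.sort(scores)[-5]).astype(int)
print(precision_score(y_true, labels))  # 0.4 = TP / (TP + FP) at that threshold

# precision @ rank n, with n = number of true outliers (3 here)
n = int(y_true.sum())
top_n = np.argsort(scores)[::-1][:n]
print(y_true[top_n].mean())             # 0.667: fraction of true outliers in the top 3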
