Unable to replicate performance #4

Open
deklanw opened this issue Dec 19, 2020 · 3 comments

@deklanw

deklanw commented Dec 19, 2020

I've attempted a PyTorch reimplementation for the recsys framework RecBole (see RUCAIBox/RecBole#594), so that it's convenient to compare against other algorithms, etc.

I replicated your experiment almost exactly, afaict: MovieLens100k, 70-20-10 split, early stopping on Recall@20. The only difference I see is that I didn't remove users with few interactions, as the paper says to:

> For this dataset, we maintain users with at least 5 interactions.
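
For concreteness, that filtering is just a per-user interaction-count threshold. A minimal pandas sketch (the file path and column names are placeholders, not RecBole's actual field names):

```python
# Illustrative sketch of the "at least 5 interactions per user" filtering.
# File path and column names are placeholders, not RecBole's field names.
import pandas as pd

inter = pd.read_csv("ml-100k.inter", sep="\t")  # columns assumed: user_id, item_id, ...
user_counts = inter.groupby("user_id")["item_id"].transform("size")
inter_5core = inter[user_counts >= 5].reset_index(drop=True)  # keep users with >= 5 interactions
```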

I used HyperOpt to do a search over the hyperparameter ranges specified in the paper (with an added option for dropout probability between 0.1 and 0.5), limited to 50 trials.
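
The search itself was a standard Hyperopt `fmin`, roughly along these lines (the ranges shown are illustrative stand-ins for the paper's, and `train_and_eval` is an illustrative helper, not the exact RecBole hook I used):

```python
# Sketch of the Hyperopt search. train_and_eval(params) is an illustrative
# helper that trains the model once (with early stopping) and returns
# validation Recall@20; the ranges below are stand-ins, not the paper's exact ones.
from hyperopt import fmin, tpe, hp, Trials

space = {
    "learning_rate": hp.loguniform("learning_rate", -7, -3),
    "reg_weight": hp.loguniform("reg_weight", -12, -4),
    "embedding_size": hp.choice("embedding_size", [32, 64, 128]),
    "n_layers": hp.choice("n_layers", [1, 2, 3, 4]),
    "dropout_prob": hp.uniform("dropout_prob", 0.1, 0.5),
}

def objective(params):
    recall_at_20 = train_and_eval(params)
    return -recall_at_20  # fmin minimizes, so negate the metric we want to maximize

best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=Trials())
print(best)
```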

DGCF results:

best params:  {'dropout_prob': 0.24266119278104079, 'embedding_size': 128, 'learning_rate': 0.0016153742760160951, 'n_layers': 2, 'reg_weight': 2.031773354290135e-05}

'test_result': {'recall@20': 0.3248, 'mrr@20': 0.5986, 'ndcg@20': 0.3795, 'hit@20': 0.9618, 'precision@20': 0.2608}

I did the same for LightGCN

LightGCN results:

best params:  {'embedding_size': 128, 'learning_rate': 0.002856632032475591, 'n_layers': 2, 'reg_weight': 1.43923729841778e-05}

'test_result': {'recall@20': 0.3336, 'mrr@20': 0.6135, 'ndcg@20': 0.3868, 'hit@20': 0.9724, 'precision@20': 0.2629}

These figures are quite different from those in your paper (the NDCG especially), and notably LightGCN wins on every metric.

Is there anything not written in the paper that I might be missing in my implementation?

And, btw, are you applying node dropout to LightGCN (even though it wasn't part of the original algorithm, afaik)?

Thanks for any help!

@JimLiu96
Owner

Thanks for sharing your results. I didn't run the experiment with HyperOpt. Do your results stay stable under these params? From my experience, the reg weight for the best performance should be around 1e-2.

@deklanw
Author

deklanw commented Dec 22, 2020

I realized that I wasn't disabling the dropout during evaluation. I fixed it (a sketch of the fix is below, after the results) and ran Hyperopt for 100 iterations this time, to be extra sure. The results are about the same:

DGCF

best params:  {'dropout_prob': 0.023892894735004354, 'embedding_size': 64, 'learning_rate': 0.006279775923826556, 'n_layers': 3, 'reg_weight': 6.334030498942448e-05}

'test_result': {'recall@20': 0.3254, 'mrr@20': 0.5948, 'ndcg@20': 0.3778, 'hit@20': 0.965, 'precision@20': 0.259}

LightGCN

best params:  {'embedding_size': 128, 'learning_rate': 0.004966963171170461, 'n_layers': 4, 'reg_weight': 0.0013284118691326246}

'test_result': {'recall@20': 0.3339, 'mrr@20': 0.6074, 'ndcg@20': 0.3866, 'hit@20': 0.9714, 'precision@20': 0.2655}
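
The dropout fix mentioned above is just the standard PyTorch train/eval mode switch; a minimal sketch (`evaluate` and `compute_recall_at_20` are illustrative names):

```python
# Sketch of the fix: dropout is only active in training mode, so the model
# has to be switched to eval mode before computing validation/test metrics.
# compute_recall_at_20 is an illustrative name, not RecBole API.
import torch

def evaluate(model, data_loader):
    model.eval()                  # disables dropout
    with torch.no_grad():         # no gradients needed for metric computation
        recall = compute_recall_at_20(model, data_loader)
    model.train()                 # restore training mode for the next epoch
    return recall
```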

> I didn't run the experiment with HyperOpt. Do your results stay stable under these params? From my experience, the reg weight for the best performance should be around 1e-2.

I'm using Hyperopt (instead of a naive grid search) to speed up the hyperparameter search. A reg weight of around 1e-2 was tested during Hyperopt's search.

It's possible there's some other mistake in my implementation; I'm just not sure what it could be.

@JimLiu96
Owner

Actually, I use early stopping to control the training. Also, from my experience, the results are unstable on the ml100k dataset; I'm not sure how HyperOpt handles that. The reported results are the best ones for all the different models.
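
By early stopping I mean the usual patience counter on validation Recall@20, roughly like this (a sketch with illustrative `train_one_epoch` / `evaluate_recall` helpers and loaders, not the exact code in this repo):

```python
# Sketch of early stopping on validation Recall@20 with a patience counter.
# train_one_epoch, evaluate_recall, and the loaders are illustrative names.
best_recall, best_state, bad_epochs, patience = 0.0, None, 0, 10

for epoch in range(max_epochs):
    train_one_epoch(model, train_loader, optimizer)
    recall = evaluate_recall(model, valid_loader)    # validation Recall@20
    if recall > best_recall:
        best_recall, bad_epochs = recall, 0
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                   # no improvement for `patience` epochs
            break

model.load_state_dict(best_state)                    # report metrics from the best checkpoint
```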
