
Variational RHN + WT (depth=10) with 517 units per layer is enough vs original 830 #17

Open · wenwei202 opened this issue Dec 27, 2017 · 3 comments


wenwei202 commented Dec 27, 2017

The homogeneity of RHNs makes it easy to learn sparse structures within them. In our recent work on ISS (https://arxiv.org/pdf/1709.05027.pdf), we find that we can reduce the "#Units/Layer" of "Variational RHN + WT" in your Table 1 from 830 to 517 without losing perplexity. This shrinks the model size from 23.5M to 11.1M parameters, which is much smaller than the model found by "Neural Architecture Search". For your interest, the results are in Table 2 of our paper.
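For intuition, here is a minimal sketch of the group-Lasso idea behind ISS (the function name, weight layout, and hyperparameter below are illustrative, not the paper's actual code): the weights touching each hidden unit are gathered into one group, and the sum of per-group L2 norms is added to the training loss, so entire units can be driven to zero and pruned, which is how the layer width can drop from 830 to 517.

```python
import torch

def iss_group_lasso(weights, lam=1e-4):
    """Illustrative group-Lasso penalty over hidden units.

    weights: list of (rows, n_hidden) tensors where column j of every
    matrix is assumed to belong to hidden unit j (a simplified layout;
    the actual ISS groups in the paper span more weight blocks).
    """
    stacked = torch.cat(weights, dim=0)      # (sum of rows, n_hidden)
    group_norms = stacked.norm(p=2, dim=0)   # one L2 norm per hidden unit
    return lam * group_norms.sum()           # units with ~0 norm can be pruned

# Hypothetical usage during training:
#   loss = task_loss + iss_group_lasso([W_h, W_t, W_c])
```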

Let us know if this is interesting to you.


jzilly (Owner) commented Dec 28, 2017

Dear wenwei202,

Does this also imply that it would be possible to keep the model size the same and improve performance instead?

Thank you for making us aware of this. I will have a look at the paper.


wenwei202 (Author) commented Dec 28, 2017

@julian121266 That is a good point. Our current finding is that we can slightly improve perplexity to 67.5/65.0 with a smaller width of 726 units, as shown in Table 2. Let me check whether we can improve further by starting from a larger model and compressing it. By the way, did you try model sizes beyond 830 units for your depth-10 RHNs? If that didn't improve performance, was it because larger models are harder to optimize?


jzilly (Owner) commented Jan 4, 2018

@wenwei202 We had similar findings. Optimization was fine; the model simply did not generalize much better. In fact, depth 8 ended up working slightly better than depth 10. Most likely the relationship is submodular, with diminishing returns for increasing depth.
A new iteration on the RHN idea was actually published half a year later: https://arxiv.org/abs/1705.08639
