
Variational RHN + WT (depth=10) with 517 units per layer is enough vs original 830 #17

Open · wenwei202 opened this issue Dec 27, 2017 · 3 comments


wenwei202 commented Dec 27, 2017

The homogeneity of RHNs makes it easy to learn sparse structures within them. In our recent work on ISS (https://arxiv.org/pdf/1709.05027.pdf), we find that we can reduce the "#Units/Layer" of "Variational RHN + WT" in your Table 1 from 830 to 517 without losing perplexity. This shrinks the model size from 23.5M to 11.1M parameters, which is much smaller than the model found by "Neural Architecture Search". For your interest, the results are in Table 2 of our paper.
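For intuition, here is a minimal sketch of the group-Lasso idea behind ISS (the function name, weight layout, and hyperparameter below are illustrative, not the paper's actual code): the weights touching each hidden unit are gathered into one group, and the sum of per-group L2 norms is added to the training loss, so entire units can be driven to zero and pruned, which is how the layer width can drop from 830 to 517.

```python
import torch

def iss_group_lasso(weights, lam=1e-4):
    """Illustrative group-Lasso penalty over hidden units.

    weights: list of (rows, n_hidden) tensors where column j of every
    matrix is assumed to belong to hidden unit j (a simplified layout;
    the actual ISS groups in the paper span more weight blocks).
    """
    stacked = torch.cat(weights, dim=0)      # (sum of rows, n_hidden)
    group_norms = stacked.norm(p=2, dim=0)   # one L2 norm per hidden unit
    return lam * group_norms.sum()           # units with ~0 norm can be pruned

# Hypothetical usage during training:
#   loss = task_loss + iss_group_lasso([W_h, W_t, W_c])
```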

Let us know if this is interesting to you.


jzilly (Owner) commented Dec 28, 2017

Dear wenwei202,

Does this also imply that it would be possible to keep the model size the same and improve performance instead?

Thank you for making us aware of this. I will have a look at the paper.


wenwei202 (Author) commented Dec 28, 2017

@julian121266 That is a good point. Our current finding is that we can slightly improve perplexity to 67.5/65.0 with a smaller width of 726 units, as shown in Table 2. Let me check whether we can improve further by starting from a larger model and compressing it. By the way, did you try model sizes beyond 830 units for your depth-10 RHNs? If that didn't improve performance, was it because larger models are harder to optimize?


jzilly (Owner) commented Jan 4, 2018

@wenwei202 We had similar findings. Optimization was fine; the model simply did not generalize much better. In fact, depth 8 ended up working slightly better than depth 10. Most likely the relationship is submodular, with diminishing returns for increasing depth.
A new iteration on the RHN idea was actually published half a year later: https://arxiv.org/abs/1705.08639
