  • Support games with more than 2 players
  • Speed/memory optimized - reaches about 3000 rollouts/sec per CPU core, i.e. about 5 sec/game during self-play (with 800 rollouts per move) on a 2019 i5 without GPU. All in all, that is a 25x to 100x speed improvement compared to the initial repo, see details here.
    • MCTS and game logic optimized with Numba; NN inference now accounts for more than 70% of self-play time, according to profiling
    • Neural network inference speed, and especially latency, improved thanks to ONNX
    • Batched MCTS for speed, without using virtual loss
    • Memory optimized with no performance impact, using zlib compression (sketch below)
  • Algorithm improvements based on Accelerating Self-Play Learning in Go
    • Playout Cap Randomization (sketch below)
  • Improved MCTS strength
  • Improved NN strength
    • Uses blocks from MobileNetV3 for a good accuracy/speed trade-off (sketch below)
    • Improved training speed using OneCycleLR and AdamW (sketch below)
    • Switched to a KL-divergence loss instead of cross-entropy (sketch below)
    • Hyperparameter optimization with Population-Based Training
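
The memory optimization above boils down to compressing each stored self-play sample before it goes into the replay history. A minimal sketch, assuming samples are (board, policy, value) numpy tuples; the function names and dtype downcasting are illustrative, not the repo's exact code:

```python
import pickle
import zlib

import numpy as np


def compress_sample(board: np.ndarray, policy: np.ndarray, value: float) -> bytes:
    """Serialize one self-play sample and compress it with zlib.

    Downcasting the arrays before compression (int8 board, float16 policy)
    is an extra assumption here, not necessarily what the repo does.
    """
    raw = pickle.dumps((board.astype(np.int8), policy.astype(np.float16), value))
    return zlib.compress(raw, 6)  # moderate compression level


def decompress_sample(blob: bytes):
    """Inverse of compress_sample, called when building a training batch."""
    board, policy, value = pickle.loads(zlib.decompress(blob))
    return board.astype(np.float32), policy.astype(np.float32), value
```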
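
Playout Cap Randomization (from the KataGo paper cited above) plays most moves with a cheap, capped search and only a random fraction with a full search, recording only the full-search positions as policy training targets. A minimal sketch; the probabilities and playout counts are illustrative, and the mcts/train_examples API in the comments is hypothetical:

```python
import random

# Illustrative values; the KataGo paper uses e.g. p=0.25 with 600/100 playouts.
FULL_SEARCH_PROB = 0.25
FULL_PLAYOUTS = 800
FAST_PLAYOUTS = 100


def choose_playout_cap():
    """Decide, per move, between a full search (recorded as a training target)
    and a fast capped search (played but not used as a policy target)."""
    if random.random() < FULL_SEARCH_PROB:
        return FULL_PLAYOUTS, True   # (num_playouts, record_for_training)
    return FAST_PLAYOUTS, False


# Usage inside a self-play loop (hypothetical mcts/train_examples API):
# num_playouts, record = choose_playout_cap()
# pi = mcts.get_action_prob(board, num_playouts=num_playouts)
# if record:
#     train_examples.append((board, pi))
```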
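
Below is a generic sketch of a MobileNetV3-style block in PyTorch (1x1 expansion, depthwise 3x3 convolution, squeeze-excitation, 1x1 projection, hard-swish activations); the channel counts and expansion factor are illustrative, and this is not the repo's exact architecture:

```python
import torch.nn as nn


class SqueezeExcite(nn.Module):
    """Channel-wise gating as used in MobileNetV3 blocks."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 8)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1),
            nn.Hardsigmoid(inplace=True),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))


class InvertedResidual(nn.Module):
    """MobileNetV3-style block: 1x1 expand -> depthwise 3x3 -> SE -> 1x1 project,
    with a residual connection (same channel count in and out, stride 1)."""

    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.Hardswish(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.Hardswish(inplace=True),
            SqueezeExcite(hidden),
            nn.Conv2d(hidden, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)
```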
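
A sketch of a training loop combining AdamW with OneCycleLR; the learning rate, weight decay, and the two-headed network interface are assumptions, and policy_value_loss is defined in the next sketch:

```python
import torch


def train(net, train_loader, num_epochs: int, max_lr: float = 1e-3):
    """Training loop sketch with AdamW + OneCycleLR (hyperparameters illustrative)."""
    optimizer = torch.optim.AdamW(net.parameters(), lr=max_lr, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=max_lr,
        steps_per_epoch=len(train_loader), epochs=num_epochs,
    )
    for _ in range(num_epochs):
        for boards, target_pis, target_vs in train_loader:
            optimizer.zero_grad()
            pi_logits, v = net(boards)  # assumed policy/value two-headed network
            loss = policy_value_loss(pi_logits, v, target_pis, target_vs)
            loss.backward()
            optimizer.step()
            scheduler.step()  # OneCycleLR steps once per batch, not per epoch
```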
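
And a sketch of the corresponding loss, using KL divergence between the MCTS visit distribution and the predicted policy instead of cross-entropy, plus an MSE value term; the value weighting is illustrative:

```python
import torch.nn.functional as F


def policy_value_loss(pi_logits, v, target_pis, target_vs, value_weight: float = 1.0):
    """KL-divergence policy loss (MCTS visit distribution vs predicted policy)
    plus an MSE value loss; the relative weighting is an assumption."""
    policy_loss = F.kl_div(
        F.log_softmax(pi_logits, dim=1), target_pis, reduction="batchmean"
    )
    value_loss = F.mse_loss(v.squeeze(-1), target_vs)
    return policy_loss + value_weight * value_loss
```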

What I tried but didn't work:

  • MCTS: advanced cpuct formula (using init and base), surprise weight, and handling Z and Q values differently during training (rather than averaging them), like this article
  • NN: SGD optimizer, ReduceLROnPlateau scheduler
  • NN architecture: Dropout, BatchNorm2D (BatchNorm works), the GARB article, standard architectures like EfficientNet, ResNet, ResNet v2, Squeeze-Excitation, Inception, ResNeXt, ...
  • Performance improvements: alternative memory allocators (TBB, tcmalloc, jemalloc, ...)

Other changes: parameters can be set on the command line (with new parameters added, like a time limit), and prints were improved (logging, tqdm, bars colored depending on the current Arena results). An Elo-like ranking is also output.

Still todo:

  • Play a fully random move in 1% of games to increase diversity
  • Multiprocessing to use several cores during self-play
  • KLD-thresholding (LeelaChessZero/lc0#721)