SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Neural Network Compression Framework for enhanced OpenVINO™ inference
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
Tutorial notebooks for hls4ml
Quantization of models: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) (a minimal QAT sketch appears after this list)
ECQx: Explainability-Driven Quantization for Low-Bit and Sparse DNNs
A Tutorial Notebook on Quantization in Machine Learning
Implementation of MedQ: Lossless ultra-low-bit neural network quantization for medical image segmentation
Quantization notebooks (adapted from and for Mobile Apps w/ Machine Learning, by Dara Varam and Lujain Khalil)
EfficientNetV2 (EfficientNetV2-B2) with INT8 and FP32 quantization (QAT and PTQ) on the CK+ dataset: fine-tuning, augmentation, handling class imbalance, etc.
A lightweight Convolutional Autoencoder for recognizing Bangla font styles, with quantization for deployment on resource-constrained IoT devices.
Training neural nets with quantized weights at arbitrarily specified bit-depths
0️⃣1️⃣🤗 BitNet-Transformers: Huggingface Transformers implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch with the Llama(2) architecture (a simplified 1-bit weight sketch appears after this list)
Quantization simulation of neural networks with PyTorch
Quantization Aware Training
A model compression and acceleration toolbox based on PyTorch.
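
The PTQ/QAT entries above share one core idea: during QAT, quantization is simulated in the forward pass while gradients bypass the rounding via a straight-through estimator, so the weights adapt to the precision they will see at inference. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; the `FakeQuant` module and the toy model are assumptions for this example and are not taken from any of the listed repositories.

```python
import torch
import torch.nn as nn

class FakeQuant(nn.Module):
    """Simulate uniform affine quantization in the forward pass; gradients
    flow through unchanged (straight-through estimator). Illustrative only."""
    def __init__(self, num_bits: int = 8):
        super().__init__()
        self.qmin, self.qmax = 0, 2 ** num_bits - 1

    def forward(self, x):
        # Per-tensor min/max calibration computed on the fly for simplicity.
        scale = (x.max() - x.min()).clamp(min=1e-8) / (self.qmax - self.qmin)
        zero_point = torch.round(-x.min() / scale)
        q = torch.clamp(torch.round(x / scale) + zero_point, self.qmin, self.qmax)
        dq = (q - zero_point) * scale              # dequantize back to float
        return x + (dq - x).detach()               # STE: identity gradient

# Toy model with fake-quantized activations between layers.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), FakeQuant(8), nn.Linear(32, 4))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()                                    # weights learn around the rounding error
opt.step()
```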
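
For the BitNet entry, the same straight-through trick applies at the extreme of 1-bit weights: each weight is binarized to ±1 and rescaled by a per-tensor factor, while the full-precision shadow weights keep receiving gradients. The sketch below is a simplified assumption of that scheme (a `BinaryLinear` layer invented for this example); it is not the paper's exact recipe nor the linked repository's code.

```python
import torch
import torch.nn as nn

class BinaryLinear(nn.Linear):
    """Linear layer with 1-bit weights: sign(W) scaled by mean |W|,
    trained with a straight-through estimator."""
    def forward(self, x):
        w = self.weight
        scale = w.abs().mean()                     # per-tensor scaling factor
        w_bin = torch.sign(w) * scale              # weights collapsed to {-scale, +scale}
        w_q = w + (w_bin - w).detach()             # STE: gradient reaches full-precision w
        return nn.functional.linear(x, w_q, self.bias)

# Drop-in replacement for nn.Linear in a toy forward pass.
layer = BinaryLinear(16, 4)
print(layer(torch.randn(2, 16)).shape)             # torch.Size([2, 4])
```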