OpenAI and MXFP4: a breakthrough in AI efficiency
OpenAI has put its weight behind an innovation that could radically change how large AI models are stored and served: the MXFP4 data format. The format promises to drastically reduce compute and memory costs, making AI faster and more accessible for companies and cloud providers.
What is MXFP4?
MXFP4 is a 4-bit data type standardized by the Open Compute Project, designed to shrink the weights of large language models. Unlike traditional formats such as BF16, MXFP4 uses a micro-scaling technique: tiny 4-bit floating-point elements share a per-block scaling factor, covering a much wider range of values than 4 bits alone could while keeping precision acceptable (a minimal quantization sketch follows the list below).
- Only 4 bits per value (an E2M1 floating-point element)
- Block quantization: every 32 values share a single 8-bit power-of-two scaling factor
- Up to 75% savings in memory and bandwidth versus 16-bit formats
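The sketch below shows MX-style block quantization in Python with NumPy. The grid of representable E2M1 magnitudes and the 32-element block size follow the OCP description; the exact scale-selection and rounding rules, and the helper names `quantize_block` and `dequantize_block`, are illustrative assumptions, not OpenAI's implementation.

```python
import numpy as np

# Positive magnitudes representable by a 4-bit E2M1 element (per the OCP MX spec)
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_MAX = 6.0
BLOCK = 32  # number of elements sharing one scaling factor

def quantize_block(x: np.ndarray):
    """Quantize one block of floats to E2M1 values plus a shared scale."""
    amax = np.abs(x).max()
    # Shared power-of-two scale chosen so the block maximum lands at or
    # below FP4_MAX (one reasonable rule; an assumption of this sketch).
    scale = 2.0 ** np.ceil(np.log2(amax / FP4_MAX)) if amax > 0 else 1.0
    mags = np.minimum(np.abs(x) / scale, FP4_MAX)
    # Round each magnitude to the nearest representable E2M1 value
    nearest = E2M1_GRID[np.abs(mags[:, None] - E2M1_GRID).argmin(axis=1)]
    return np.sign(x) * nearest, scale

def dequantize_block(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=BLOCK).astype(np.float32)
q, s = quantize_block(w)
print(f"scale={s}, max abs error={np.abs(w - dequantize_block(q, s)).max():.3f}")
```

Storing a 4-bit code per element plus one 8-bit scale per 32 elements is what yields the roughly 75% footprint reduction versus 16-bit formats.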
Why does MXFP4 matter?
Reducing weight precision cuts VRAM and bandwidth requirements, lowering infrastructure costs. According to OpenAI, about 90% of the weights in the new gpt-oss models are quantized with MXFP4, which lets the 120-billion-parameter model run on a single GPU with just 80GB of memory.
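A back-of-the-envelope calculation shows why that fits: at roughly 4.25 bits per weight (4-bit elements plus one 8-bit scale per 32-element block), the quantized weights occupy about a quarter of their BF16 footprint. The numbers below are illustrative arithmetic, not measured figures.

```python
params = 120e9  # 120 billion parameters

bf16_gb  = params * 16 / 8 / 1e9            # 16 bits/weight   -> 240 GB
mxfp4_gb = params * (4 + 8 / 32) / 8 / 1e9  # ~4.25 bits/weight -> ~64 GB

print(f"BF16 weights:  ~{bf16_gb:.0f} GB")
print(f"MXFP4 weights: ~{mxfp4_gb:.0f} GB")
```

The remaining higher-precision weights add some overhead, but per OpenAI's claim the total still fits on a single 80GB GPU.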
Additionally, token generation is up to 4 times faster than with BF16 models, because modern GPUs execute 4-bit math at higher throughput and move far fewer bytes per weight.
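The speedup is easiest to see for token-by-token generation, which on large models is usually limited by memory bandwidth rather than arithmetic: each generated token requires streaming the active weights from GPU memory. The bandwidth figure below is a hypothetical round number used only for illustration.

```python
bandwidth_gb_s = 3000.0   # hypothetical high-end GPU memory bandwidth (assumption)

weights_bf16_gb  = 240.0  # from the arithmetic above
weights_mxfp4_gb = 64.0

# Upper-bound decode rate if every token must stream all weights once
print(f"BF16:  ~{bandwidth_gb_s / weights_bf16_gb:.1f} tokens/s")
print(f"MXFP4: ~{bandwidth_gb_s / weights_mxfp4_gb:.1f} tokens/s")  # ~3.75x faster
```

Mixture-of-experts routing and batching change the absolute numbers, but the roughly 4x ratio tracks the reduction in weight footprint.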
Market impact and infrastructure
OpenAI has chosen to distribute its open-weight gpt-oss models only in MXFP4 format, pushing the industry toward the new technology. Cloud providers and enterprises can now run advanced models on cheaper, more efficient hardware.
"OpenAI’s decision to focus on MXFP4 marks a turning point for the entire sector, making AI more accessible and sustainable."
AI Expert
Limitations and future prospects
Despite its advantages, MXFP4 is not without compromises: with only 4 bits per element, precision can fall short of FP8, and other vendors such as Nvidia are developing variants like NVFP4, with smaller blocks and finer-grained scales, to further improve quality.
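The precision compromise is visible just by enumerating the element format: an E2M1 value has one sign bit, two exponent bits, and a single mantissa bit, so every number in a block must snap to one of 15 distinct values (the two zero encodings coincide). The decoder below follows the standard E2M1 layout and is a sketch for illustration.

```python
def e2m1_value(bits: int) -> float:
    """Decode a 4-bit E2M1 code: 1 sign bit, 2 exponent bits (bias 1), 1 mantissa bit."""
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exp  = (bits >> 1) & 0b11
    man  = bits & 1
    if exp == 0:                                  # subnormal: 0.M * 2^0
        return sign * 0.5 * man
    return sign * (1.0 + 0.5 * man) * 2.0 ** (exp - 1)

print(sorted({e2m1_value(b) for b in range(16)}))
# [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

FP8, by contrast, offers far more representable values per element, which is why 4-bit formats lean so heavily on well-chosen per-block scales.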
OpenAI’s direction could accelerate the adoption of these formats, making AI increasingly efficient and widespread.