Features fourth-generation Tensor Cores and a Transformer Engine with FP8 precision that provides up to 4X faster training over the prior generation for GPT-3 (175B) models, dramatically accelerating large language model development and deep learning workflows.
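The FP8 advantage is easiest to see in memory terms. A hedged back-of-envelope sketch (illustrative arithmetic only, not NVIDIA code or a measured result), assuming a 175B-parameter model like GPT-3:

```python
# Illustrative arithmetic: weight memory footprint of a 175B-parameter
# model at different precision widths. Not a benchmark.
PARAMS = 175e9  # assumed GPT-3-scale parameter count

def weight_gb(bytes_per_param: float) -> float:
    """Weight memory in gigabytes at the given precision width."""
    return PARAMS * bytes_per_param / 1e9

fp32_gb = weight_gb(4)  # 32-bit floats: 700 GB
fp16_gb = weight_gb(2)  # 16-bit floats: 350 GB
fp8_gb  = weight_gb(1)  # 8-bit floats:  175 GB
```

Halving the bytes per parameter relative to FP16 halves both the memory footprint and the data moved per operation, which is one reason FP8 training can run substantially faster.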
Massive Parallel Processing Power
The H100 PCIe features 456 fourth-generation Tensor Cores alongside 14,592 CUDA cores, facilitating high-speed data processing and delivering 26 teraFLOPS of double-precision (FP64) performance, making it ideal for the most demanding computational tasks.
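The quoted FP64 figure can be sanity-checked with standard peak-FLOPS arithmetic. A sketch under assumed H100 PCIe figures (114 SMs, 64 FP64 units per SM, roughly a 1.755 GHz boost clock), not an official derivation:

```python
# Back-of-envelope peak FP64 throughput: units x FLOPs-per-cycle x clock.
SMS = 114                  # assumed SM count for H100 PCIe
FP64_UNITS_PER_SM = 64     # assumed FP64 units per SM
BOOST_CLOCK_HZ = 1.755e9   # assumed boost clock
FLOP_PER_FMA = 2           # one fused multiply-add counts as two FLOPs

peak_fp64_tflops = (SMS * FP64_UNITS_PER_SM * FLOP_PER_FMA
                    * BOOST_CLOCK_HZ) / 1e12
# ~25.6 TFLOPS, consistent with the quoted 26 teraFLOPS after rounding
```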
Advanced High-Speed Interconnect
Fourth-generation NVLink offers 900 gigabytes per second (GB/s) of GPU-to-GPU interconnect, enabling seamless multi-GPU scaling and distributed computing for enterprise-level AI workloads and high-performance computing applications.
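To make the 900 GB/s figure concrete, here is a rough transfer-time estimate (illustrative arithmetic, not a benchmark; the model size is an assumption carried over from the GPT-3 example above):

```python
# Time to move one FP16 copy of a 175B-parameter model's gradients
# across NVLink, assuming the full 900 GB/s is achievable. Illustrative.
NVLINK_GB_PER_S = 900               # fourth-gen NVLink, per GPU
GRADIENT_GB = 175e9 * 2 / 1e9       # 175B params at 2 bytes each = 350 GB

transfer_s = GRADIENT_GB / NVLINK_GB_PER_S
# roughly 0.39 seconds per full gradient exchange
```

In practice, collective operations like all-reduce add algorithmic overhead on top of raw link bandwidth, so real exchange times are somewhat higher.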
Use Cases
Large Model Inference
Run massive models with predictable latency. Optimize for throughput, batch size, and performance per watt.
Generative AI applications for text, image, and audio.
Scaling ML infrastructure as your customer base grows.
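The throughput and performance-per-watt metrics mentioned above reduce to simple ratios. A minimal sketch with assumed, hypothetical numbers (not measured H100 results):

```python
# Illustrative inference metrics. All figures below are assumptions
# chosen for the example, not H100 measurements.
def throughput_rps(batch_size: int, step_s: float) -> float:
    """Requests served per second when each batch takes step_s seconds."""
    return batch_size / step_s

def perf_per_watt(tokens_per_s: float, watts: float) -> float:
    """Tokens per second delivered per watt of board power."""
    return tokens_per_s / watts

rps = throughput_rps(batch_size=32, step_s=0.1)   # 320 requests/s
ppw = perf_per_watt(tokens_per_s=10_000, watts=350)
```

Larger batches typically raise throughput (the step time grows more slowly than the batch size) at the cost of higher per-request latency, which is the trade-off inference deployments tune.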