28 August 2023

Going Live Today: Tesla's $300 Million AI Cluster

With its new supercomputer, Tesla is significantly enhancing its computing capabilities to train its full self-driving (FSD) technology faster than ever. This could make Tesla more competitive than other automakers and would also make the company the owner of one of the world's fastest supercomputers.

Tesla is set to launch its highly anticipated supercomputer on Monday, according to @SawyerMerritt. The machine will be used for various artificial intelligence (AI) applications, but the cluster is so powerful that it could also handle demanding high-performance computing (HPC) workloads. In fact, the Nvidia H100-based supercomputer will be one of the most powerful machines in the world.
Tesla's new cluster will employ 10,000 Nvidia H100 compute GPUs, which will offer a peak performance of 340 FP64 PFLOPS for technical computing and 39.58 INT8 ExaOPS for AI applications. That 340 FP64 PFLOPS peak is higher than the 304 FP64 PFLOPS peak of Leonardo, the world's fourth highest-performing supercomputer.
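Both peak figures are simply a per-GPU peak multiplied by 10,000. The short Python sketch below reproduces them under one assumption the article does not state: that each GPU delivers Nvidia's published H100 SXM datasheet peaks of 34 TFLOPS FP64 (without Tensor Cores) and 3,958 INT8 TOPS (Tensor Cores with structured sparsity). The constant names are illustrative only.

```python
# Back-of-the-envelope check of the cluster's quoted peak figures.
# Assumption: per-GPU peaks are Nvidia's H100 SXM datasheet numbers
# (FP64 without Tensor Cores; INT8 Tensor Core with structured sparsity).
NUM_GPUS = 10_000
H100_FP64_TFLOPS = 34        # FP64 vector peak per GPU (assumed SXM variant)
H100_INT8_TOPS = 3_958       # INT8 Tensor Core peak per GPU, with sparsity
LEONARDO_RPEAK_PFLOPS = 304  # Leonardo's FP64 peak, as quoted in the article


def cluster_peaks(num_gpus: int) -> tuple[float, float]:
    """Return (FP64 PFLOPS, INT8 exaOPS) for a cluster of identical GPUs."""
    fp64_pflops = num_gpus * H100_FP64_TFLOPS / 1_000    # tera -> peta
    int8_exaops = num_gpus * H100_INT8_TOPS / 1_000_000  # tera -> exa
    return fp64_pflops, int8_exaops


fp64, int8 = cluster_peaks(NUM_GPUS)
print(f"FP64 peak: {fp64:.0f} PFLOPS")                            # FP64 peak: 340 PFLOPS
print(f"INT8 peak: {int8:.2f} exaOPS")                            # INT8 peak: 39.58 exaOPS
print(f"Ratio vs. Leonardo: {fp64 / LEONARDO_RPEAK_PFLOPS:.2f}x")  # Ratio vs. Leonardo: 1.12x
```

These are theoretical peaks; sustained throughput on real workloads (the Rmax figure that TOP500 ranks by) is typically well below them, so the comparison with Leonardo is peak against peak.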

"Due to real-world video training, we may have the largest training datasets in the world, hot tier cache capacity beyond 200PB — orders of magnitudes more than LLMs," explained Tim Zaman, AI Infra & AI Platform Engineering Manager at Tesla.

While the new H100-based cluster is set to dramatically improve Tesla's training speed, Nvidia is struggling to meet demand for these GPUs.
  • As a result, Tesla is investing over $1 billion to develop its own supercomputer, Dojo, which is built on custom-designed, highly optimized systems-on-chip. Dojo will not only accelerate FSD training but will also handle data processing for Tesla's entire vehicle fleet.
  • Tesla is bringing its Nvidia H100 GPU cluster online alongside Dojo, a move that will give the company computing power unmatched in the automotive industry.
  • Elon Musk recently said that Tesla plans to spend over $2 billion on AI training in 2023 and another $2 billion in 2024, specifically on compute for FSD training.
  • This underscores Tesla's commitment to overcoming computational bottlenecks and should give the company a substantial advantage over its rivals.
Tesla's $300 Million AI Cluster Is Going Live Today | Tom's Hardware
_________________________________________________________________________________
2 days ago: Tesla's $300 million GPU cluster goes live in 48 hours. It's made up of 10,000 NVIDIA H100s. Each one costs around $30k.
8 hours ago: Tesla AI 10k H100 cluster, go live Monday. Due to real-world video training, we may have the largest training datasets in the world, ...
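The headline $300 million figure follows directly from the numbers quoted in the snippets: 10,000 GPUs at roughly $30k each. A minimal Python sketch of that arithmetic, assuming the ~$30k unit price applies to every card and counting GPUs only (servers, networking, storage, power, and any volume discount are excluded); the function name is hypothetical:

```python
# Rough check of the headline "$300 million" figure.
# Assumption: ~$30k per H100 for all 10,000 cards; GPUs only, no other hardware.
NUM_GPUS = 10_000
PRICE_PER_GPU_USD = 30_000


def gpu_spend(num_gpus: int, unit_price: int) -> int:
    """Total GPU hardware cost in dollars (GPUs only)."""
    return num_gpus * unit_price


total = gpu_spend(NUM_GPUS, PRICE_PER_GPU_USD)
print(f"${total:,}")                  # $300,000,000
print(f"${total / 1e6:.0f} million")  # $300 million
```

Because the full system also needs host servers, interconnect, and storage, the real cost of the cluster is likely higher than this GPU-only estimate.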
Elon Musk Just RELEASED Its AI Powerhouse: Tesla DOJO Supercomputer - YouTube
Uploaded: Jul 25, 2023
... Project Dojo will get more than $1 billion in funding from Elon Musk's Tesla by the end of 2024. To create software for self-driving cars, the Dojo supercomputer will have the capacity to ...

