Wallstreetcn
2024.04.19 01:36

Llama 3 returns like a king: can open-source models rival GPT-4 and catch up with closed source?

The release of the open-source large model Llama 3 has sparked discussion across the AI community. Meta has launched two models at different scales and plans to release more versions in the future; the largest is expected to exceed 400 billion parameters and to compete with Claude 3. Meta's CEO announced that an AI assistant based on Llama 3 now covers multiple applications and includes an image generator. Llama 3 competes directly with OpenAI's GPT-4, and Meta's insistence on the open-source route scores a point for open-source models. The official version of Llama 3 is expected in July this year; insiders revealed that researchers have not yet fine-tuned it, nor decided whether it will be multimodal. Llama 3 is hailed as one of the best-performing open-source models.

On April 18, the AI community received another piece of major news: Meta unveiled Llama 3, billed as the "most powerful open-source large model ever."

Meta has open-sourced two models at different scales, Llama 3 8B and Llama 3 70B, free for external developers to use. In the coming months, Meta will release a series of new models with capabilities such as multimodality, multilingual conversation, and longer context windows. The largest version of Llama 3 is expected to have over 400 billion parameters and aims to compete with Claude 3.
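
For developers who want to try the weights, a minimal sketch of running the 8B instruct variant through the Hugging Face transformers library might look like the following. The model ID and chat handling reflect the Hugging Face Hub release, which is gated behind Meta's license; treat them as assumptions rather than an official recipe:

```python
# Minimal sketch: text generation with Llama 3 8B Instruct via transformers.
# Assumes the gated Hugging Face repo is accessible (license accepted and
# `huggingface-cli login` done) and a GPU with enough memory.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # assumed Hub model ID
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Recent transformers versions apply the model's chat template when a
# list of role/content messages is passed to the pipeline.
messages = [{"role": "user", "content": "In one sentence, what is Llama 3?"}]
result = generator(messages, max_new_tokens=64)

# The pipeline returns the conversation with the assistant reply appended.
print(result[0]["generated_text"][-1]["content"])
```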

At the same time, Meta CEO Mark Zuckerberg announced that the Meta AI assistant, now powered by the latest Llama 3 models, is available across apps including Instagram, WhatsApp, and Facebook, and has also launched as a standalone website. It comes with an image generator that creates images from natural-language prompts.

The arrival of Llama 3 puts it in direct competition with OpenAI's GPT-4. Unlike the "not so open" OpenAI, and while the AI community continues to debate the open-source versus closed-source route, Meta is firmly advancing toward the holy grail of AGI along the open-source path, turning the tide for open-source models.

Insiders revealed that researchers have not yet begun fine-tuning the largest version of Llama 3 and have not decided whether it will be multimodal. Reports say this official version will launch in July this year.

Meta's Chief AI Scientist and Turing Award winner Yann LeCun cheered the release of Llama 3 and teased that more versions will arrive in the coming months, stating that Llama 3 8B and Llama 3 70B are currently the best-performing open-source models in their size classes. Notably, Llama 3 8B outperforms Llama 2 70B on some benchmarks.

Even Elon Musk made an appearance in the comments section, expressing his approval and anticipation for Llama 3 with a simple "Not bad." NVIDIA's senior scientist Jim Fan believes that the release of Llama 3 has gone beyond technological advancements, symbolizing the ability of open-source models to compete with top-notch proprietary models.

Based on benchmark tests shared by Jim Fan, Llama 3 400B's strength is almost on par with the top-spec Claude 3 Opus and the new version of GPT-4 Turbo, making its release a "watershed moment." Fan believes it will unleash tremendous research potential and drive the development of the entire ecosystem, potentially putting GPT-4-level models in the hands of the open-source community.

The day of the announcement happened to be the birthday of Stanford University professor and top AI expert Andrew Ng. Ng said the release of Llama 3 was the best birthday gift he had ever received, and thanked Meta.

One of the founding members of OpenAI and former AI director at Tesla, Andrej Karpathy, also praised Llama 3. As one of the pioneers in the field of large language models, Karpathy believes that Llama 3's performance is approaching that of GPT-4:

Llama 3 is a very powerful model released by Meta. It sticks to the fundamentals, spends a lot of quality time on reliable systems and data work, and explores the limits of long-duration training. I am also very excited about the 400B model, which may be the first open-source model at GPT-4 level. I think many people will ask for longer context lengths.

I also hope to see models with fewer parameters than 8B, ideally in the 0.1B to 1B range, for educational work, (unit) testing, embedded applications, and so on.

Cameron R. Wolfe, director of AI at Rebuy with a Ph.D. in deep learning, believes Llama 3 proves that the key to training excellent large language models is data quality. He detailed the data work behind Llama 3, including:

  1. 15 trillion tokens of pre-training data: seven times Llama 2's, and even more than DBRX's 12 trillion (a short tokenizer sketch below illustrates what counts as a token);

  2. More code data: pre-training included substantially more code to strengthen the model's reasoning ability.

After the release of Llama 3, Mark Zuckerberg told the media, "Our goal is not to compete with open-source models, but to surpass everyone and build the most advanced artificial intelligence." The Meta team will also release a technical report on Llama 3 in the future, disclosing more details about the model.
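
To make the "tokens" unit above concrete, here is a minimal, hedged sketch that counts tokens with the Llama 3 tokenizer via Hugging Face transformers. The model ID follows the Hugging Face Hub release and the repo is gated behind Meta's license, so treat the identifiers as assumptions:

```python
# Minimal sketch: counting tokens with the Llama 3 tokenizer.
# Assumes access to the gated Hugging Face repo (accept Meta's license,
# then `huggingface-cli login`); model ID taken from the Hub release.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

text = "Llama 3 was pre-trained on 15 trillion tokens of data."
token_ids = tokenizer.encode(text)

# A token is roughly a word fragment; pre-training scale is measured in
# how many such IDs the model consumes during training.
print(len(token_ids))
```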

The debate between open source and closed source is far from over, and GPT-4.5/5 is reportedly poised to arrive this summer. The battle among large models in the AI field is still unfolding.