Seedance 2.0 Deep Evaluation: Stable in Complex Scenarios, Even ASMR?

The launch of Seedance 2.0 has sparked heated discussion, with praise from Musk and American directors. Overseas users have shown strong interest in when it will be available globally and how to register a Chinese account. This version significantly improves the model's understanding of camera language and its multimodal input: it supports mixed inputs of images, videos, audio, and text, and delivers better visual consistency and stability. Users have shared a variety of creative videos showcasing what Seedance 2.0 can do.

What a scene! It's a phenomenal showtime～

As soon as ByteDance launched Seedance 2.0, it went viral across the internet!!!

On one side, Musk is praising it enthusiastically; on the other, American directors are exclaiming that Hollywood is doomed.

Many overseas users are even impatiently asking: When will it be available globally? How do you register a Chinese account? Waiting online, and it's urgent!

With this much buzz, we had to try it ourselves.

Look at this super popular "Cat vs. Godzilla" video I made, with the little cat jumping in and delivering a heavy punch～

Now let's take a look at this AI version of "F1 Speeding." With the tachometer soaring and the brakes screeching, it really has that Hollywood vibe:

Next, let's try Chinese kung fu versus Iron Man. The two trade close-combat moves, the sound effects are thrilling, and it all looks very stylish!

Creative netizens are also getting in on the action. Check out this one-take shot that glides from the street into the subway station and onto the train, super dreamy:

And this netizen simply uploaded a comic screenshot and had Seedance 2.0 generate an entire plot video from it. It's just too amazing!!

Honestly, Seedance 2.0 understands camera language better and is more controllable, and its reference capabilities are on another level, which makes it well suited to our everyday multi-shot work and fine-grained control～

As usual, no more chit-chat, let's test it out and see if the model can deliver, and let everyone evaluate!!!

Camera Understanding Has Improved

Complex Scenarios Can Also Be Stabilized

I don't know if you have encountered the same frustrating moments as I have:

That is, when we make AI videos day to day, once multi-shot prompts are involved, the main character's features can easily "drift."

For example, the characters in the generated video look different at the start and at the end, the scenes and camera styles are inconsistent, and we basically end up re-rolling generations over and over, like drawing gacha cards...

Ultimately, this comes down to weaknesses in the model's consistency and stability.

One of the most obvious upgrades in Seedance 2.0 is its multimodal input: we can mix four types of content (images, videos, audio, and text), and the model's visual consistency has become more stable and controllable!

Let's start with an appetizer. Shaw Brothers-style martial arts AI videos have been very popular online recently, so I directly input a photo of martial artists fighting:

In the prompt design, the male and female characters need to form a complete chain of conflict, moving through dialogue, emotional changes, and actions to a standoff. The prompt also requires the AI to keep the characters' appearances stable across multi-shot transitions and to maintain a consistent overall style and emotional tone!

As the result below shows, even though the characters' expressions and the camera angles change during large movements and multiple shot transitions, the facial features of the male and female leads remain stable, with no significant deformation and no glitches at all!

Now let's try something fun. This time, we have the Mona Lisa put on a little drama of secretly drinking Coke at the Louvre.

In the prompt design, the Mona Lisa needs to coherently complete actions like taking out and drinking a Coke while staying fixed inside the original painting, and her expressions also need to stay in sync as she speaks, which raises the stability requirements for the model significantly:

You wouldn't believe it, but the Mona Lisa's sneaky little glances and movements while secretly drinking Coke are quite on point. Her face also stays consistent while she speaks, and the action of holding the frame doesn't feel out of place. Test passed!

Let's play a frame-to-frame transition game. The Year of the Horse has arrived, which is quite fitting. I uploaded two images of running horses in completely different styles and let the horse undergo a big transformation within the same scene:

Now this is interesting... The transition goes from ink wash style to oil painting style and then to pixel art, the overall flow is quite natural, and the sound effects and transformation actions are well timed!!!

Overall, I personally feel that the model performs quite strongly in visual consistency and controllability, making it well suited to everyday multi-character and multi-angle video production.

When using it, I recommend uploading more images from different angles and with different reference elements, as this helps stabilize the overall output quality.

One-Take Shooting Is Also Possible

In this update, Seedance 2.0 has another major highlight: "One-Take Shooting."

However, this one-take shooting is a bit different from our usual understanding; it mainly emphasizes that we can provide the model with different reference images, and the model can connect these images into one video.

This time, I decided to go for a futuristic sci-fi feel, directly feeding the model three reference images of Earth cities a hundred years from now, all with a strong cyberpunk vibe and different perspectives:

In the prompt settings, the model needs to transition smoothly from image one to image two and then to image three from the same perspective, while also completing sharp turns, dives, and ascents, which demands more logic and coherence in the camera transitions:
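For anyone who would rather script this kind of prompt than type it by hand, here is a minimal Python sketch. It is only an illustration: the helper name `build_one_take_prompt`, its parameters, and the prompt wording are my own assumptions, not an official Seedance 2.0 API and not the exact prompt used in this test. The 【@Image N】 tag form is borrowed from the comic prompt later in this article.

```python
def build_one_take_prompt(num_images: int, camera_moves: list[str]) -> str:
    """Assemble a one-take prompt that chains reference images in order.

    Hypothetical helper for illustration only; it just builds a string.
    """
    if num_images < 2:
        raise ValueError("a one-take transition needs at least two reference images")

    # Reference tags follow the 【@Image N】 form seen in the article's comic prompt.
    tags = [f"【@Image {i}】" for i in range(1, num_images + 1)]
    path = ", then to ".join(tags)
    moves = ", ".join(camera_moves)

    return (
        "One continuous shot from a single drone perspective, no cuts. "
        f"Move from {path}. "
        f"Camera moves along the way: {moves}."
    )


prompt = build_one_take_prompt(3, ["sharp turn", "dive", "ascent"])
print(prompt)
```

Keeping the image order and the camera moves as separate arguments makes it easy to swap in another reference image or reorder the moves when a transition comes out too abrupt.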

As a result, I got a drone-perspective blockbuster flying through a futuristic city. I must say, the intense camera shake is quite impressive???

First, the advantages. The three reference images I provided were all faithfully reproduced in the video, and it truly is a one-take shot, no doubt about that.

However, a minor flaw (though not really a flaw) is that the transition between image one and image two is a bit too abrupt. Let me slow it down for you to see:

Ideally, I would add a narrow space as a transition between image one and image two, so that by the time the shot switches to image three, the overall coherence is better and nothing feels disconnected.

(What do you all think? Is there a better solution? Feel free to leave a comment～)

Support for Complete Plot Output

On top of consistency and stability, Seedance 2.0 has another ability: imagining the plot on its own.

It doesn't just generate footage from the first frame; it can combine multiple reference materials and prompts to produce a complete segment of "plot output."

This time, I simply had the AI create a complete plot-driven anime video from a six-panel comic:

Interpret【@Image 1】in a comic style from left to right and top to bottom, keeping the character dialogues consistent with the text in the images, adding special sound effects for scene transitions and key plot interpretations, with an overall humorous style.

emm… The images themselves are fine; all six panels were fully reproduced.

The main issue is the text: much of the lettering does not match the original comic's text, and the timing of the text appearing is out of sync with the images.

My guess is that this is because the text is not one of the main elements of the images. Compared with characters, actions, and scenes, the text in the comic clearly has a lower priority for the model.

So during multi-shot and rhythm transitions, it can easily be treated as a variable element...

Indeed, you can't have it all, and the same goes for AI. (doge)

Video Length/Sound Effects Can Also Be DIY

In addition to its basic visual and camera capabilities, Seedance 2.0 has also leveled up in video extension and sound effect editing.

First, let's talk about video extension.

Note that this extension does not simply stretch the video's running time; instead, we provide the first-frame image and clearly "mark" the required video duration in the prompt.

This time, I fed the AI a 3D-style image of a running donut, asking it to extend and generate a 10-second video while completing a whole set of continuous actions like rolling, jumping, and sliding:

Alright, the video length is spot on: when I say 10 seconds, it generates exactly 10 seconds, not a second more or less. The sound effects are super dynamic too, landing right on the beat～

But why is this donut running backwards??? (I don't quite understand)

Finally, let's talk about another multimodal capability of Seedance 2.0: sound effects.

According to the official description, the emphasis is not just on voice dubbing but on accurate timbre, so that the voice fits the character better, which is super impressive.

Let's first try a scenario that really tests sound effects: an eating broadcast (I'm drooling already; let's see if the model can accurately reproduce the sounds of chewing different foods):

As the result below shows, the AI reproduced the crunch of fried chicken, the crisp snap of cucumber, the stretch of pizza, and the fizz of cola, all 1:1. Not bad!

Let's try an ASMR scenario next. This time we'll have the AI perform the trigger sounds of different objects in the same video and see how it turns out～

Except for the first crystal collision sound being slightly off, the rest are almost all 1:1 reproductions. The textures of metal, glass, and silk are very realistic, and the layers are well-defined:

My guess is that the crystal sound is a bit off mainly because the model generated a stock sound effect directly from the prompt without considering the scene, so the result resembles metal clashing rather than crystal rubbing...

Although opinions on Seedance 2.0 vary online, my actual experience using it this time is:

If we only talk about its understanding of camera work (including but not limited to the consistency, controllability, and coherence of the footage), Seedance 2.0's performance really did exceed my expectations.

Even with very simple prompts, it can produce quite good results, making it suitable for our everyday AIGC video creation.

There are also some small issues. For example, when turning a multi-panel comic into a plot video, the model may not reproduce every panel 1:1, and the sound effects are occasionally off. Still, for some business scenarios and everyday AIGC creation, it is already sufficient and quite good, which is a pleasant surprise.

Recently, some netizens have exclaimed that Hollywood is finished, which shows how satisfied they are with the model's performance. (doge)

Currently, Seedance 2.0 is available in the Doubao App and Jimeng. Interested friends can try it out directly.

(ps: From personal testing, I suggest trying it on Doubao for the time being, as the queue to generate a video on Jimeng runs to several hours, and there's no way around it…)

Source: Quantum Bit
