
Southern Science and Technology’s Black Technology: Eliminate video characters with one click, the special effects artist’s savior is here!

PHPz | 2023-05-25 14:56

This video segmentation model from Southern University of Science and Technology can track anything in the video.

Not only can it "watch", it can also "cut": removing a person from the video is easy for it.

As for operation, all you need is a few clicks of the mouse.


After seeing the news, special effects artists felt they had found a savior, saying bluntly that this product will change the rules of the game in the CGI industry.


The model is called TAM (Track Anything Model). Does the name sound like Meta's image segmentation model SAM?

Indeed, TAM extends SAM to the video field and lights up the skill tree of dynamic object tracking.


Video segmentation models are not a new technology, but traditional segmentation models do little to reduce human work.

The training data these models use must all be manually annotated, and before use they even need to be initialized with mask parameters for the specific objects.

The emergence of SAM provides a prerequisite for solving this problem - at least the initialization data no longer needs to be obtained manually.

Of course, TAM does not simply run SAM frame by frame and stitch the results together; it also needs to build the corresponding spatiotemporal relationships.

The team integrated SAM with a memory module called XMem.

You only need to use SAM to generate initial parameters in the first frame, and XMem can guide the subsequent tracking process.
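The two-stage idea described above can be sketched in a few lines. The sketch below is a minimal illustration, not TAM's actual code: `sam_segment` and `xmem_propagate` are hypothetical stand-ins for the real SAM and XMem models, reduced to toy behavior so the orchestration logic is visible.

```python
import numpy as np

def sam_segment(frame, click_xy):
    """Stand-in for SAM: returns a binary mask around the clicked point.
    (Here it is a fixed 5x5 square; the real model predicts the object's shape.)"""
    mask = np.zeros(frame.shape[:2], dtype=bool)
    x, y = click_xy
    mask[max(0, y - 2):y + 3, max(0, x - 2):x + 3] = True
    return mask

def xmem_propagate(prev_mask, frame):
    """Stand-in for XMem: carries the previous mask into the next frame.
    (The real memory network re-localizes the object as it moves.)"""
    return prev_mask.copy()

def track(frames, click_xy):
    # One click on the first frame initializes the mask via SAM;
    # XMem then guides tracking through the remaining frames.
    masks = [sam_segment(frames[0], click_xy)]
    for frame in frames[1:]:
        masks.append(xmem_propagate(masks[-1], frame))
    return masks

video = [np.zeros((10, 10, 3)) for _ in range(4)]
masks = track(video, click_xy=(5, 5))
```

The point of the split is that the expensive, promptable segmenter runs only once, while the lightweight memory model handles every subsequent frame.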

There can be many tracking targets, as in this demo on the painting Along the River During the Qingming Festival:


Even if the scene changes, it will not affect the performance of TAM:


We tried it ourselves: TAM has an interactive user interface that is simple and friendly to operate.


As for raw capability, TAM's tracking results are indeed good:


However, the removal function's accuracy in some details still needs improvement.


From SAM to TAM

As mentioned above, TAM is built on SAM and adds memory capability to establish spatiotemporal associations.

Specifically, the first step is to initialize the model with the help of SAM's static image segmentation capabilities.

With just one click, SAM can generate the initialization mask parameters of the target object, replacing the complex initialization process in the traditional segmentation model.

With the initial parameters in hand, the team hands the rest over to XMem for training with only partial manual intervention, greatly reducing the human workload.


During this process, some manually produced predictions are compared against XMem's output.

In the actual process, as time goes by, it becomes more and more difficult for XMem to obtain accurate segmentation results.

When the output diverges too far from expectations, a re-segmentation step kicks in, which is again handled by SAM.

After SAM re-optimization, most of the output results are relatively accurate, but some still require manual adjustment.
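The check-and-fallback loop just described can be sketched as follows. This is an illustrative sketch under assumptions, not the paper's implementation: `sam_resegment` is a hypothetical stand-in for the SAM refinement pass, and the quality check is reduced to a simple IoU comparison against a reference prediction.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-union between two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

def sam_resegment(frame, hint_mask):
    """Stand-in for the SAM re-segmentation pass (here it simply returns the
    hint; the real model would re-predict the mask from prompts)."""
    return hint_mask.copy()

def propagate_with_checks(frames, xmem_masks, reference_masks, threshold=0.8):
    """Keep XMem's mask while it stays close to the reference prediction;
    fall back to SAM re-segmentation once it drifts too far."""
    out = []
    for frame, pred, ref in zip(frames, xmem_masks, reference_masks):
        if mask_iou(pred, ref) < threshold:   # quality dropped too much
            pred = sam_resegment(frame, ref)  # SAM takes over for this frame
        out.append(pred)
    return out

ref = np.zeros((8, 8), dtype=bool); ref[0:4, 0:4] = True
drifted = np.zeros((8, 8), dtype=bool); drifted[4:8, 4:8] = True
frames = [np.zeros((8, 8, 3))] * 2
out = propagate_with_checks(frames, [ref.copy(), drifted], [ref, ref])
```

In the toy run, the second frame's mask has drifted completely away from the reference, so the fallback replaces it; the first frame's mask passes the check untouched.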

That is roughly TAM's training process; the object-removal capability shown at the beginning comes from combining TAM with E2FGVI.

E2FGVI is itself a video inpainting (element-removal) tool; with TAM's precise segmentation backing it, its work becomes more targeted.

To test TAM, the team evaluated it on the DAVIS-16 and DAVIS-17 datasets.


The results look good intuitively, and the numbers bear that out.

Although TAM does not require manually set mask parameters, its two metrics, J (region similarity) and F (boundary accuracy), come very close to those of manually initialized models.
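For readers unfamiliar with the two DAVIS metrics: J is the intersection-over-union between the predicted and ground-truth masks, and F is an F-measure between their boundaries. The sketch below computes simplified versions of both (it matches boundaries exactly, whereas the official DAVIS evaluation allows a small pixel tolerance band).

```python
import numpy as np

def region_similarity_j(pred, gt):
    """J: intersection-over-union between predicted and ground-truth masks."""
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def boundary(mask):
    """Pixels of the mask that touch the background (4-neighbourhood)."""
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~interior

def boundary_accuracy_f(pred, gt):
    """F: F-measure between the two mask boundaries (exact match, no
    tolerance band, for brevity)."""
    bp, bg = boundary(pred), boundary(gt)
    hits = (bp & bg).sum()
    precision = hits / bp.sum() if bp.sum() else 1.0
    recall = hits / bg.sum() if bg.sum() else 1.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

gt = np.zeros((8, 8), dtype=bool); gt[2:6, 2:6] = True
j = region_similarity_j(gt, gt)   # perfect prediction: J = 1.0
f = boundary_accuracy_f(gt, gt)   # perfect prediction: F = 1.0
```

A prediction covering only half the object's area (and nothing outside it) would score J = 0.5, which is why the benchmark reports both: J penalizes missing area, F penalizes sloppy edges.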

On the DAVIS-2017 dataset it even slightly outperforms STM.

Among other initialization methods, SiamMask's performance is no match for TAM's;

Although another method called MiVOS performs better than TAM, it has, after all, gone through 8 rounds of interaction...


Team Profile

TAM is from the Visual Intelligence and Perception (VIP) Laboratory of Southern University of Science and Technology.

The lab's research directions include text-image-audio multimodal learning, multimodal perception, reinforcement learning, and visual defect detection.

To date, the team has published more than 30 papers and obtained 5 patents.

The team is led by Zheng Feng, an associate professor at Southern University of Science and Technology. He received his PhD from the University of Sheffield in the UK and has worked at institutions including the Institute of Advanced Studies of the Chinese Academy of Sciences and Tencent YouTu. He joined Southern University of Science and Technology in 2018 and was later promoted to associate professor.

Paper address:
https://arxiv.org/abs/2304.11968
GitHub page:
https://github.com/gaomingqi/Track-Anything
Reference link:
https://twitter.com/bilawalsidhu/status/1650710123399233536?s=20


Statement:
This article is reproduced from 51cto.com. In case of any infringement, please contact admin@php.cn for removal.