Transformer revolutionizes 3D modeling: MeshGPT's generation results stun professional modelers, netizens call it a revolutionary idea
In computer graphics, triangle meshes are the primary way to represent 3D geometric objects and the most commonly used 3D asset representation in games, film, and virtual reality. The industry typically uses triangle meshes to model the surfaces of complex objects such as buildings, vehicles, and animals, and common operations such as geometric transformation, geometry queries, rendering, and shading are likewise performed on triangle meshes.
Compared with other 3D shape representations such as point clouds or voxels, triangle meshes provide a more coherent surface representation: they are more controllable, easier to manipulate, and more compact, and they can be used directly in modern rendering pipelines, achieving higher visual quality with fewer primitives.
Previously, researchers have tried to generate 3D models using representations such as voxels, point clouds, and neural fields. These representations still need to be converted into meshes in post-processing for use in downstream applications, for example by extracting an isosurface with the Marching Cubes algorithm.
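To make this post-processing step concrete, here is a minimal sketch of isosurface extraction with scikit-image's Marching Cubes implementation (a generic illustration, not code from the MeshGPT paper; the toy sphere signed-distance volume and grid resolution are arbitrary choices):

```python
import numpy as np
from skimage import measure  # scikit-image

# Toy signed-distance volume: a sphere of radius 0.35 inside a unit cube.
res = 64
xs = np.linspace(-0.5, 0.5, res)
x, y, z = np.meshgrid(xs, xs, xs, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.35

# Marching Cubes extracts the zero isosurface as a triangle mesh. Even for this
# simple shape it produces thousands of small, near-uniform triangles -- the
# over-dense tessellation the article describes.
verts, faces, normals, values = measure.marching_cubes(sdf, level=0.0)
print(f"{len(verts)} vertices, {len(faces)} triangles")
```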
Unfortunately, this approach yields overly dense, overly detailed meshes that often exhibit bumpy artifacts caused by over-smoothing and isosurfacing, as shown in the following image:
By comparison, meshes created by professional 3D modelers are more compact, maintaining sharp detail with far fewer triangles.
Many researchers have long hoped to solve the task of automatically generating triangle meshes to further simplify the process of creating 3D assets.
In a recent paper, researchers proposed a new solution: MeshGPT, which directly generates the mesh representation as a set of triangles.
The paper link can be found at: https://nihalsid.github.io/mesh-gpt/static/MeshGPT.pdf
Inspired by Transformer-based language generation models, they adopt a direct sequence-generation approach that synthesizes a triangle mesh as a sequence of triangles.
Following the text-generation paradigm, the researchers first learn a vocabulary of triangles, in which each triangle is encoded as a quantized latent embedding. To encourage the learned triangle embeddings to preserve local geometric and topological features, they employ a graph convolutional encoder. The triangle embeddings are then decoded by a ResNet decoder, which processes the token sequence representing the triangles to produce the triangles' vertex coordinates. Finally, the researchers train a GPT-style architecture on the learned vocabulary to autoregressively generate the sequence of triangles that represents a mesh, yielding meshes with clear edges and high fidelity.
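To make the "mesh as a sequence" view concrete, the sketch below serializes a mesh into an ordered list of triangles that a sequence model could consume (the lowest-corner sort used here is a common convention assumed for illustration, not necessarily the exact ordering rule in the paper):

```python
import numpy as np

def mesh_to_triangle_sequence(vertices: np.ndarray, faces: np.ndarray) -> np.ndarray:
    """Serialize a mesh into an ordered sequence of triangles.

    vertices: (V, 3) float coordinates; faces: (F, 3) integer vertex indices.
    Returns an (F, 9) array with one row of 9 coordinates per triangle.
    """
    tris = vertices[faces]                    # (F, 3, 3) corner coordinates per face
    # Sort triangles by their lowest corner (z, then y, then x) so that the
    # sequence order is deterministic -- an assumed convention for this sketch.
    keys = tris.min(axis=1)                   # (F, 3) lowest corner per triangle
    order = np.lexsort((keys[:, 0], keys[:, 1], keys[:, 2]))
    return tris[order].reshape(len(faces), 9)

# Toy example: two triangles forming a unit square.
verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=np.float32)
faces = np.array([[0, 1, 2], [0, 2, 3]])
print(mesh_to_triangle_sequence(verts, faces).shape)  # (2, 9)
```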
Experiments across multiple categories of the ShapeNet dataset show that MeshGPT significantly improves the quality of generated 3D meshes compared with existing techniques: shape coverage improves by 9% on average, and FID scores improve by 30 points.
On social media platforms, MeshGPT has also sparked heated discussions:
One commenter said: "This is the truly revolutionary idea."
One netizen pointed out that the highlight of this method is that it overcomes the biggest obstacle facing other 3D modeling approaches: editability.
Some boldly predicted that perhaps all the problems left unsolved since the 1990s could find new inspiration in the Transformer:
Users working in 3D and film production also expressed concerns about their careers:
However, some pointed out that, judging from the generation examples in the paper, the method has not yet reached the stage of large-scale application: a professional modeler could create these meshes in less than 5 minutes.
The same commenter suggested that the next step might be to have an LLM control the generation of 3D seeds and to add an image model to the autoregressive part of the architecture; at that point, the production of 3D assets for games and other scenarios could be automated at scale.
Next, let’s take a look at the research details of the MeshGPT paper.
Inspired by progress in large language models, the researchers developed a sequence-based method that autoregressively generates triangle meshes as sequences of triangles. This method produces clean, coherent, and compact meshes with sharp edges and high fidelity.
The researchers first learn a vocabulary of geometric embeddings from a large collection of 3D object meshes so that triangles can be encoded and decoded. Then, on top of this learned vocabulary, a Transformer for mesh generation is trained autoregressively to predict codebook indices.
To learn the triangle vocabulary, the researchers use a graph convolutional encoder that operates on the mesh's triangles and their neighborhoods, extracting rich geometric features that capture the intricate details of 3D shapes. These features are quantized into codebook embeddings via residual quantization, effectively reducing the sequence length of the mesh representation. After ordering, the embeddings are decoded by a 1D ResNet guided by a reconstruction loss. This stage lays the foundation for the subsequent Transformer training.
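The residual quantization step can be sketched as follows: a face feature is quantized in several successive stages, each stage quantizing the residual left over by the previous codebooks, so one feature maps to a short stack of codebook indices. The nearest-neighbor lookup, codebook sizes, and number of stages below are simplified assumptions for illustration:

```python
import numpy as np

def residual_quantize(features, codebooks):
    """Residual quantization of face embeddings.

    features:  (N, D) array of per-face embeddings.
    codebooks: list of (K, D) arrays, one codebook per stage.
    Returns (N, num_stages) integer indices and the (N, D) reconstruction.
    """
    residual = features.copy()
    recon = np.zeros_like(features)
    indices = []
    for codebook in codebooks:
        # Nearest codebook entry for the current residual.
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
        idx = dists.argmin(axis=1)
        chosen = codebook[idx]
        indices.append(idx)
        recon += chosen
        residual -= chosen            # the next stage quantizes what is left over
    return np.stack(indices, axis=1), recon

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16)).astype(np.float32)
books = [rng.normal(size=(32, 16)).astype(np.float32) for _ in range(3)]  # 3 stages
codes, recon = residual_quantize(feats, books)
print(codes.shape, float(np.abs(feats - recon).mean()))  # (8, 3) plus the residual error
```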
Next, the researchers use these quantized geometric embeddings to train a GPT-style decoder-only Transformer: the triangles of a mesh are turned into a sequence of geometric embeddings, and the Transformer is trained to predict the codebook index of the next embedding in the sequence.
After training, the Transformer can be sampled autoregressively to predict embedding sequences, which are then decoded to generate novel and diverse mesh structures with efficient, irregular triangulation similar to meshes created by human artists.
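Conceptually, generation then reduces to the familiar autoregressive sampling loop, sketched below; `model`, the start and stop token IDs, and the sequence length cap are placeholders for the trained MeshGPT components rather than the authors' actual code:

```python
import torch

@torch.no_grad()
def sample_mesh_tokens(model, start_token: int, stop_token: int,
                       max_len: int = 2048, temperature: float = 1.0):
    """Autoregressively sample a sequence of codebook indices for one mesh.

    `model` stands in for any decoder-only Transformer that maps a (1, T)
    index sequence to (1, T, vocab_size) logits.
    """
    seq = torch.tensor([[start_token]])
    for _ in range(max_len):
        logits = model(seq)[:, -1, :] / temperature   # logits for the next token
        probs = torch.softmax(logits, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)
        seq = torch.cat([seq, next_tok], dim=1)
        if next_tok.item() == stop_token:             # end-of-mesh token
            break
    # The sampled indices would then be looked up in the codebook and decoded
    # back into triangle vertex coordinates by the ResNet decoder.
    return seq[0, 1:]
```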
MeshGPT processes mesh faces with a graph convolutional encoder, using geometric neighborhood information to capture strong features that represent the details of complex 3D shapes. These features are then quantized into codebook embeddings with residual quantization, which ensures better reconstruction quality than simple vector quantization. Guided by the reconstruction loss, MeshGPT orders and decodes the quantized embeddings with a ResNet.
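A minimal sketch of the face-level graph convolution idea: each triangle starts from simple geometric features and repeatedly aggregates information from the faces it shares an edge with. This is plain PyTorch message passing under assumed inputs (a dense face-adjacency matrix and 9-D flattened corner coordinates); the encoder in the paper uses richer features and a different architecture:

```python
import torch
import torch.nn as nn

class FaceGraphConv(nn.Module):
    """One round of message passing over the face-adjacency graph of a mesh."""
    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.lin_self = nn.Linear(dim_in, dim_out)
        self.lin_neigh = nn.Linear(dim_in, dim_out)

    def forward(self, x, adj):
        # x:   (F, dim_in) per-face features (e.g. corner coordinates, normal, area)
        # adj: (F, F) 0/1 adjacency matrix marking faces that share an edge
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = (adj @ x) / deg                      # mean over adjacent faces
        return torch.relu(self.lin_self(x) + self.lin_neigh(neigh))

# Toy usage: two adjacent triangles with 9-D features (flattened corner coordinates).
x = torch.randn(2, 9)
adj = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
h = FaceGraphConv(9, 64)(x, adj)
h = FaceGraphConv(64, 64)(h, adj)   # stack rounds to widen the receptive field
print(h.shape)  # torch.Size([2, 64])
```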
The Transformer then generates mesh sequences as token indices drawn from the pre-trained codebook vocabulary. During training, the encoder extracts features from the mesh faces and quantizes them into a set of face embeddings. These embeddings are flattened into a sequence, marked with start and end tokens, and then fed into the GPT-style Transformer described above. The decoder is optimized with a cross-entropy loss to predict the codebook index that follows each embedding.
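The training objective described here is standard next-token prediction over codebook indices; the sketch below shows the generic recipe, with `transformer`, the special-token IDs, and the padding handling as assumed placeholders rather than the authors' training code:

```python
import torch
import torch.nn.functional as F

def next_index_loss(transformer, token_ids, start_id: int, stop_id: int, pad_id: int):
    """Cross-entropy loss for predicting the next codebook index at every position.

    token_ids: (B, T) codebook indices for a batch of meshes, already padded.
    `transformer` maps a (B, T) index tensor to (B, T, vocab_size) logits.
    """
    B = token_ids.size(0)
    start = torch.full((B, 1), start_id, dtype=token_ids.dtype)
    stop = torch.full((B, 1), stop_id, dtype=token_ids.dtype)
    seq = torch.cat([start, token_ids, stop], dim=1)   # <start> tokens <stop>

    inputs, targets = seq[:, :-1], seq[:, 1:]          # shift by one position
    logits = transformer(inputs)                       # (B, T+1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=pad_id,                           # do not score padding positions
    )
```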
The study conducted comparative experiments between MeshGPT and common mesh generation methods, including Polygen, AtlasNet, and BSPNet.
Additionally, the study compared MeshGPT with the neural field-based SOTA method GET3D.
As shown in Figures 6 and 7 and Table 1, MeshGPT outperforms the baseline methods in all four categories, generating sharp, compact meshes with finer geometric details.
Specifically, compared with Polygen, MeshGPT generates shapes with more complex details, while Polygen is more prone to accumulating errors during inference; AtlasNet often suffers from folding artifacts, resulting in lower diversity and shape quality; BSPNet, which uses planar BSP trees, tends to produce blocky shapes with unusual triangulation patterns; and GET3D produces good high-level shape structure but uses too many triangles and yields imperfect planar surfaces.
As shown in Table 2, the study also had users evaluate the quality of the generated meshes: MeshGPT significantly outperforms AtlasNet, Polygen, and BSPNet in both shape and triangulation quality. Compared with GET3D, most users preferred MeshGPT's shape quality (68%) and triangulation quality (73%).
Novel shapes. As shown in Figure 8, MeshGPT is able to generate novel shapes beyond the training dataset, confirming that the model does more than simply retrieve existing shapes.
Shape completion. As shown in Figure 9 below, MeshGPT can also infer multiple possible completions of a given partial shape, generating several shape hypotheses.
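Shape completion fits naturally into the same sequence interface: the tokens of the given partial shape act as the prompt, and sampling is simply run several times to obtain distinct hypotheses. The sketch below reuses the same kind of hypothetical sampling loop as above, with `model` and the token IDs again standing in for the trained components:

```python
import torch

@torch.no_grad()
def complete_shape(model, partial_tokens, stop_token: int,
                   num_hypotheses: int = 4, max_new: int = 1024):
    """Sample several completions of a partial mesh token sequence.

    partial_tokens: 1-D LongTensor of codebook indices for the known part.
    """
    completions = []
    for _ in range(num_hypotheses):
        seq = partial_tokens.clone().unsqueeze(0)          # (1, T) prompt
        for _ in range(max_new):
            logits = model(seq)[:, -1, :]
            next_tok = torch.multinomial(torch.softmax(logits, dim=-1), 1)
            seq = torch.cat([seq, next_tok], dim=1)
            if next_tok.item() == stop_token:              # end-of-mesh token
                break
        completions.append(seq[0])                         # each decodes to one full mesh
    return completions
```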