


Transformer revolutionizes 3D modeling: MeshGPT's generation results alarm professional modelers, and netizens call it a "revolutionary idea"
In computer graphics, triangle meshes are the primary way to represent 3D geometry, and they are the most widely used 3D asset format in games, films, and virtual reality interfaces. The industry typically uses triangle meshes to model the surfaces of complex objects such as buildings, vehicles, and animals. Common geometric transformations, geometry detection, rendering, and shading operations are likewise performed on triangle meshes.
Compared with other 3D shape representations such as point clouds or voxels, triangle meshes provide a more coherent surface representation: they are more controllable, easier to manipulate, and more compact, and they can be used directly in modern rendering pipelines, achieving higher visual quality with fewer primitives.
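As a small illustration of that compactness, the sketch below stores a unit cube as an indexed triangle mesh: a list of vertices plus a list of faces that refer to vertices by index. The variable names and the particular way each square side is split into two triangles are assumptions chosen for the example.

```python
from itertools import product
import math

# Eight corner vertices; the index of (x, y, z) is 4*x + 2*y + z.
vertices = [tuple(float(c) for c in p) for p in product((0, 1), repeat=3)]

# Twelve triangles, two per side of the cube, given as vertex indices.
faces = [
    (0, 1, 3), (0, 3, 2),  # x = 0
    (4, 6, 7), (4, 7, 5),  # x = 1
    (0, 4, 5), (0, 5, 1),  # y = 0
    (2, 3, 7), (2, 7, 6),  # y = 1
    (0, 2, 6), (0, 6, 4),  # z = 0
    (1, 5, 7), (1, 7, 3),  # z = 1
]


def triangle_area(a, b, c):
    """Area of a 3D triangle: half the norm of the edge cross product."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))


surface_area = sum(triangle_area(*(vertices[i] for i in face)) for face in faces)
```

Eight vertices and twelve triangles describe the whole surface exactly, whatever resolution it is later rendered at.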
Previously, researchers have tried to generate 3D models using representations such as voxels, point clouds, and neural fields. These representations must then be converted into meshes through post-processing before they can be used in downstream applications, for example by extracting an isosurface with the Marching Cubes algorithm.
Unfortunately, this approach produces overly dense, over-tessellated meshes that often show over-smoothing and bumpy artifacts caused by isosurface extraction, as shown in the following image:
By comparison, meshes built by professional 3D modelers are more compact in representation while retaining sharp detail with fewer triangles.
Many researchers have long hoped to solve the task of automatically generating triangle meshes to further simplify the process of creating 3D assets.
In a recent paper, researchers proposed a new solution: MeshGPT, which directly generates the mesh representation as a set of triangles.
The paper link can be found at: https://nihalsid.github.io/mesh-gpt/static/MeshGPT.pdf
Inspired by Transformer-based language generation models, they adopted a direct sequence generation approach that synthesizes a triangle mesh as a sequence of triangles.
Following the paradigm of text generation, the researchers first learn a vocabulary of triangles, in which triangles are encoded as latent quantized embeddings. To encourage the learned triangle embeddings to preserve local geometric and topological features, they employ a graph convolutional encoder. The triangle embeddings are then decoded by a ResNet decoder, which processes the sequence of tokens representing the triangles to generate their vertex coordinates. Finally, the researchers train a GPT-based architecture on the learned vocabulary to autoregressively generate the sequence of triangles that represents a mesh, producing results with sharp edges and high fidelity.
Experiments across multiple categories of the ShapeNet dataset show that MeshGPT significantly improves the quality of generated 3D meshes compared with existing techniques: shape coverage improves by 9% on average, and FID scores improve by 30 points.
On social media platforms, MeshGPT has also sparked heated discussions:
One commenter said: "This is a truly revolutionary idea."
One netizen pointed out that the highlight of this method is that it overcomes the biggest obstacle facing other 3D modeling approaches: editability.
Some people boldly predict that perhaps every problem left unsolved since the 1990s could draw inspiration from the Transformer:
Some users working in 3D and film production also expressed concern about their careers:
However, others pointed out that, judging by the generated examples shown in the paper, the method is not yet ready for large-scale application. A professional modeler can create these meshes in less than 5 minutes.
The same commenter suggested that the next step might be to have an LLM control the generation of the 3D seeds and to add an image model to the autoregressive part of the architecture. Once that step is reached, the production of 3D assets for games and other scenarios could be automated at scale.
Next, let’s take a look at the research details of the MeshGPT paper.
Overview of Method
Inspired by the progress of large language models, the researchers developed a sequence-based method that autoregressively generates a triangle mesh as a sequence of triangles. This method produces clean, coherent, and compact meshes with sharp edges and high fidelity.
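To make "a mesh as a sequence of triangles" concrete, here is a minimal sketch of one way to serialize a mesh into a fixed order. The ordering convention (sort vertices by z, then y, then x, and sort faces by their vertex indices) and the helper name `mesh_to_sequence` are illustrative assumptions, not details stated in this article.

```python
def mesh_to_sequence(vertices, faces):
    """Serialize an indexed triangle mesh into an ordered list of triangles.

    Assumed convention (illustrative only): vertices are ordered by (z, y, x),
    each face is rotated so its lowest vertex index comes first, and faces are
    then sorted by their vertex indices.
    """
    # Rank vertices by (z, y, x) and remember old index -> new index.
    order = sorted(range(len(vertices)),
                   key=lambda i: (vertices[i][2], vertices[i][1], vertices[i][0]))
    remap = {old: new for new, old in enumerate(order)}
    sorted_vertices = [vertices[i] for i in order]

    canonical_faces = []
    for face in faces:
        idx = [remap[i] for i in face]
        k = idx.index(min(idx))
        # Rotate (do not reverse) so the winding order is preserved.
        canonical_faces.append(tuple(idx[k:] + idx[:k]))
    canonical_faces.sort()

    # Each sequence element is one triangle, given by its three corners.
    return [tuple(sorted_vertices[i] for i in face) for face in canonical_faces]


# Two triangles forming a unit square in the z = 0 plane.
vertices = [(1.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 3, 1), (1, 2, 0)]
sequence = mesh_to_sequence(vertices, faces)
```

With a canonical order like this, the same mesh always maps to the same sequence regardless of how its faces were listed in the input, which is what a sequence model needs.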
The researchers first learn a vocabulary of geometric embeddings from a large collection of 3D object meshes so that triangles can be encoded and decoded. Then, on top of this learned vocabulary, a Transformer for mesh generation is trained autoregressively to predict codebook indices.
To learn the triangle vocabulary, the researchers use a graph convolutional encoder that operates on the triangles of a mesh and their neighborhoods to extract rich geometric features that capture the intricate details of 3D shapes. These features are quantized into codebook embeddings through residual quantization, which effectively reduces the sequence length of the mesh representation. After being ordered, the embeddings are decoded by a one-dimensional ResNet guided by a reconstruction loss. This stage lays the foundation for the subsequent Transformer training.
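The idea of an encoder that looks at each triangle together with its neighborhood can be illustrated with a face-adjacency graph. The sketch below is a toy under stated assumptions: two faces count as neighbors when they share an edge, and a per-face feature is simply averaged over the neighborhood. That averaging stands in for a learned graph convolution layer; no trained weights are involved, and the function names are hypothetical.

```python
from collections import defaultdict


def face_adjacency(faces):
    """Map each face index to the indices of faces that share an edge with it."""
    edge_to_faces = defaultdict(list)
    for f, (a, b, c) in enumerate(faces):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_to_faces[frozenset((u, v))].append(f)

    neighbors = {f: set() for f in range(len(faces))}
    for shared in edge_to_faces.values():
        for f in shared:
            neighbors[f].update(g for g in shared if g != f)
    return neighbors


def aggregate(features, neighbors):
    """One untrained message-passing step: mean over a face and its neighbors."""
    out = []
    for f, feat in enumerate(features):
        group = [feat] + [features[g] for g in sorted(neighbors[f])]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


# A strip of three triangles: face 0 - face 1 - face 2.
faces = [(0, 1, 2), (1, 3, 2), (2, 3, 4)]
neighbors = face_adjacency(faces)
features = [[0.0], [3.0], [6.0]]  # hypothetical one-dimensional face features
smoothed = aggregate(features, neighbors)
```

After one step each face's feature already reflects its immediate neighbors; stacking several such steps widens the neighborhood that influences each face.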
Next, the researchers use these quantized geometric embeddings to train a decoder-only Transformer similar to GPT. They extract the sequence of geometric embeddings from the triangles of each mesh and train the Transformer to predict the codebook index of the next embedding in the sequence.
After training, the Transformer can be sampled autoregressively to predict sequences of embeddings, which are then decoded to generate novel and diverse meshes with efficient, irregular triangulations similar to those of human-crafted meshes.
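Autoregressive sampling can be sketched as a simple loop: ask the model for a distribution over the next token, draw one token, append it, and repeat until an end token appears. In the sketch below, `TOY_MODEL` is a hypothetical stand-in for a trained Transformer (a fixed table that looks only at the previous token), and the token ids, probabilities, and `max_len` limit are all assumptions made for illustration.

```python
import random

START, END = "<s>", "</s>"

# Hypothetical stand-in for a trained Transformer: next-token probabilities
# that depend only on the previous token (a real model conditions on the
# whole prefix).
TOY_MODEL = {
    START: {0: 0.7, 1: 0.3},
    0: {1: 0.6, 2: 0.3, END: 0.1},
    1: {0: 0.2, 2: 0.5, END: 0.3},
    2: {0: 0.3, 1: 0.2, END: 0.5},
}


def next_token_distribution(prefix):
    return TOY_MODEL[prefix[-1]]


def sample_sequence(rng, max_len=32):
    """Sample codebook indices one at a time until END or `max_len` tokens."""
    prefix = [START]
    while len(prefix) - 1 < max_len:
        dist = next_token_distribution(prefix)
        tokens = list(dist)
        token = rng.choices(tokens, weights=[dist[t] for t in tokens])[0]
        if token == END:
            break
        prefix.append(token)
    return prefix[1:]  # drop START; these indices would go to the decoder


indices = sample_sequence(random.Random(0))
```

Because each draw is random, different seeds give different index sequences, which is where the diversity of generated meshes comes from in this kind of model.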
MeshGPT uses a graph convolutional encoder to process mesh faces, drawing on geometric neighborhood information to extract strong features that capture the intricate details of 3D shapes. These features are then quantized into codebook embeddings using residual quantization, which gives better reconstruction quality than plain vector quantization. Guided by a reconstruction loss, MeshGPT orders the quantized embeddings and decodes them with a ResNet.
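The difference between plain vector quantization and residual quantization can be seen in a few lines. In this minimal sketch, plain VQ picks the single nearest codebook entry, while residual quantization adds a second stage that encodes whatever error the first stage left behind. The codebooks here are small, hand-written, and hypothetical; in a trained model they would be learned.

```python
def nearest(codebook, vector):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vector))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def residual_quantize(vector, codebooks):
    """Quantize `vector` in stages; each stage encodes the previous residual."""
    residual = list(vector)
    indices = []
    reconstruction = [0.0] * len(vector)
    for codebook in codebooks:
        i = nearest(codebook, residual)
        indices.append(i)
        reconstruction = [r + c for r, c in zip(reconstruction, codebook[i])]
        residual = [x - c for x, c in zip(residual, codebook[i])]
    return indices, reconstruction


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical codebooks: a coarse one, then a finer one for the residual.
coarse = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fine = [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [0.25, 0.25],
        [-0.25, 0.0], [0.0, -0.25], [-0.25, -0.25]]

x = [0.8, 0.3]
vq_indices, vq_recon = residual_quantize(x, [coarse])        # plain VQ
rq_indices, rq_recon = residual_quantize(x, [coarse, fine])  # two-stage RQ
```

Here the single-stage reconstruction is `[1.0, 0.0]`, while the two-stage one is `[1.0, 0.25]`, which is closer to `x`. Each input is then represented by a short list of indices, one per stage.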
The study uses a Transformer to generate a mesh as a sequence of token indices drawn from the pre-trained codebook vocabulary. During training, the graph encoder extracts features from the mesh faces and quantizes them into a set of face embeddings. These embeddings are flattened, wrapped with start and end tokens, and fed into the GPT-style Transformer described above. This decoder is optimized with a cross-entropy loss to predict the codebook index that follows each embedding.
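The training setup described here follows the usual next-token recipe, which the sketch below spells out in plain Python. The token ids, the choice of two codes per face, and the helper names are assumptions made for the example; the "model" is just a vector of uniform logits, so the loss shown is that of a model that has learned nothing yet.

```python
import math


def build_training_pairs(face_codes, start_id, end_id):
    """Flatten per-face code indices and wrap them with start/end tokens.

    Returns (inputs, targets), where targets[t] is the token that follows
    inputs[t]; this is the usual next-token setup.
    """
    flat = [code for face in face_codes for code in face]
    tokens = [start_id] + flat + [end_id]
    return tokens[:-1], tokens[1:]


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Hypothetical setup: 4 codebook entries (ids 0-3), plus start = 4 and end = 5.
face_codes = [[2, 0], [1, 3]]  # two faces, two codes per face (assumed)
inputs, targets = build_training_pairs(face_codes, start_id=4, end_id=5)

uniform_logits = [0.0] * 6  # an untrained model that knows nothing
loss = sum(cross_entropy(uniform_logits, t) for t in targets) / len(targets)
```

With six possible tokens and uniform predictions, the average loss equals `log(6)`; training drives it down by placing more probability on the index that actually comes next.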
Experimental results
The study compares MeshGPT experimentally with common mesh generation methods, including:
- Polygen, which generates polygonal meshes by first generating vertices and then generating faces conditioned on the vertices;
- BSPNet, which represents the mesh through convex decomposition;
- AtlasNet, which represents the 3D mesh as the deformation of multiple 2D planes.
Additionally, the study compared MeshGPT with the neural field-based SOTA method GET3D.
As shown in Figure 6, Figure 7, and Table 1, MeshGPT outperforms the baseline methods in all four categories. MeshGPT generates sharp, compact meshes with finer geometric details.
Specifically, compared with Polygen, MeshGPT generates shapes with more complex details, whereas Polygen is more prone to accumulating errors during inference. AtlasNet often suffers from folding artifacts, resulting in lower diversity and lower shape quality. BSPNet, which uses planar BSP trees, tends to produce blocky shapes with unusual triangulation patterns. GET3D produces good high-level shape structure but uses too many triangles and yields imperfect planar surfaces.
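For readers unfamiliar with the shape coverage figure quoted earlier, the sketch below shows how a coverage score of this kind is commonly computed: each generated shape is matched to its nearest reference shape, and coverage is the fraction of reference shapes matched at least once. Using Chamfer distance between point sets as the distance is an assumption here, as are the toy 2D data; this is not taken from the paper's evaluation code.

```python
def chamfer(a, b):
    """Symmetric Chamfer distance between two point sets (mean squared L2)."""
    def sq(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    def one_way(src, dst):
        return sum(min(sq(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def coverage(generated, reference):
    """Fraction of reference shapes that are the nearest match of some sample."""
    matched = set()
    for g in generated:
        nearest = min(range(len(reference)),
                      key=lambda j: chamfer(g, reference[j]))
        matched.add(nearest)
    return len(matched) / len(reference)


# Toy 2D "shapes" as point sets; real evaluations sample points on 3D surfaces.
reference = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 5.0), (1.0, 5.0)],
    [(9.0, 9.0), (9.0, 8.0)],
]
generated = [
    [(0.1, 0.0), (0.9, 0.0)],  # close to reference 0
    [(0.0, 0.2), (1.0, 0.1)],  # also close to reference 0
    [(0.0, 4.8), (1.1, 5.0)],  # close to reference 1
]
score = coverage(generated, reference)
```

In this toy case two of the three reference shapes are matched, so the score is 2/3. A generator that keeps producing near-copies of a few shapes scores low, which is why the metric is read as a measure of diversity.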
As shown in Table 2, the study also asked users to evaluate the quality of the meshes generated by MeshGPT, and MeshGPT significantly outperformed AtlasNet, Polygen, and BSPNet in both shape quality and triangulation quality. Compared with GET3D, most users preferred MeshGPT's shape quality (68%) and triangulation quality (73%).
Shape novelty. As shown in Figure 8, MeshGPT can generate novel shapes that go beyond the training dataset, indicating that the model does more than retrieve existing shapes.
Shape completion. As shown in Figure 9, MeshGPT can also infer multiple possible completions of a given partial shape, generating multiple shape hypotheses.