search
HomeTechnology peripheralsAIUse vision to prompt! Shen Xiangyang showed off the new model of IDEA Research Institute, which requires no training or fine-tuning and can be used out of the box.

What kind of experience will it bring when using visual prompts?

Just draw a random outline in the picture and the same category will be marked immediately!

Use vision to prompt! Shen Xiangyang showed off the new model of IDEA Research Institute, which requires no training or fine-tuning and can be used out of the box.

Even the grain-counting step is difficult for GPT-4V to handle. You only need to manually pull the box to find all the rice grains.

Use vision to prompt! Shen Xiangyang showed off the new model of IDEA Research Institute, which requires no training or fine-tuning and can be used out of the box.

There is a new target detection paradigm!

At the just-concluded IDEA Annual Conference, Shen Xiangyang, founding chairman of the IDEA Research Institute and foreign academician of the National Academy of Engineering, presented the latest research results -

Based on the Visual Prompt model The content of T-Rex needs to be rewritten

Use vision to prompt! Shen Xiangyang showed off the new model of IDEA Research Institute, which requires no training or fine-tuning and can be used out of the box.

The entire interactive process is ready to use out of the box and can be completed in just a few steps.

Previously, Meta’s open source SAM segmented all models, which directly ushered in the GPT-3 moment in the CV field. However, it was still based on the text prompt paradigm, which would be more difficult to deal with some complex and rare scenarios.

Now you can easily solve the problem by exchanging pictures for pictures.

In addition, the entire conference is also full of useful information, such as Think-on-Graph knowledge-driven large model, developer platform MoonBit, AI scientific research artifact ReadPaper update 2.0, SPU confidential computing co-processor , controllable portrait video generation platform HiveNet, etc.

Finally, Shen Xiangyang also shared the project on which he spent the most time in the past few years: Low-altitude Economy.

I believe that when the low-altitude economy is relatively mature, there will be 100,000 drones in the sky of Shenzhen every day, and millions of drones taking off every day

Use vision to make prompts

In addition to the basic single-round prompt function, T-Rex also supports three advanced modes

  • Multi-round positive mode

This is similar to multiple rounds of dialogue, which can produce more accurate results and avoid missed detections

  • Positive and negative example mode

It is suitable for scenarios where visual cues are ambiguous and cause false detections.

Cross-graph mode allows you to redesign and layout charts to easily visualize data and information

By using one reference chart to detect other images

Use vision to prompt! Shen Xiangyang showed off the new model of IDEA Research Institute, which requires no training or fine-tuning and can be used out of the box.

According to reports, T-Rex is not restricted by predefined categories and can use visual examples to specify detection targets, thereby solving the problem that certain objects are difficult to fully express in words and improving prompt efficiency. Especially in the case of complex components in some industrial scenarios, the effect is particularly obvious

Use vision to prompt! Shen Xiangyang showed off the new model of IDEA Research Institute, which requires no training or fine-tuning and can be used out of the box.

In addition, by interacting with users, it can also be quickly evaluated at any time Test results and perform error correction, etc.

The composition of T-Rex mainly includes three components: image encoder, prompt encoder and frame decoder

Use vision to prompt! Shen Xiangyang showed off the new model of IDEA Research Institute, which requires no training or fine-tuning and can be used out of the box.

This work comes from IDEA Research Institute Computer Vision and Robotics Research Center.

The team’s previously open source target detection model DINO is the first DETR model to rank first in the COCO target detection list; it has become a hit on Github (it has received 11K stars so far) Grounding DINO, a zero-sample detector, and Grounded SAM, which can detect and segment everything. For more technical details, please click on the link at the end of the article.

The whole conference is full of useful information

In addition, several research results were also shared at the IDEA conference.

For exampleThink-on-Graph knowledge-driven large model, simply speaking, it combines the large model with the knowledge graph.

Large models are good at intention understanding and autonomous learning, while knowledge graphs are better at logical chain reasoning because of their structured knowledge storage methods.

Think-on-Graph drives the large model agent to "think" on the knowledge graph, and gradually searches and infers the optimal answer (search and reason step by step on the associated entities of the knowledge graph). In every step of reasoning, the large model is personally involved and learns from each other's strengths and weaknesses with the knowledge graph.

Use vision to prompt! Shen Xiangyang showed off the new model of IDEA Research Institute, which requires no training or fine-tuning and can be used out of the box.

MoonBit is a developer platform powered by Wasm and designed for cloud computing and edge computing.

The system not only provides universal programming language design, but also integrates modules such as compilers, build systems, integrated development environments (IDEs), and deployment tools to improve development experience and efficiency

Use vision to prompt! Shen Xiangyang showed off the new model of IDEA Research Institute, which requires no training or fine-tuning and can be used out of the box.

The previously released scientific research artifact ReadPaper has also been updated to 2.0. New functions such as reading copilot and polishing copilot were demonstrated at the press conference.

Use vision to prompt! Shen Xiangyang showed off the new model of IDEA Research Institute, which requires no training or fine-tuning and can be used out of the box.

At the end of the press conference, Shen Xiangyang released the "White Paper on Low-altitude Economic Development (2.0) - Fully Digital Solution". In his Smart Integrated Lower Airspace System, SILAS), a new concept of Temporal Spatial Process was proposed.

T-Rex link:
https://trex-counting.github.io/

The above is the detailed content of Use vision to prompt! Shen Xiangyang showed off the new model of IDEA Research Institute, which requires no training or fine-tuning and can be used out of the box.. For more information, please follow other related articles on the PHP Chinese website!

Statement
This article is reproduced at:51CTO.COM. If there is any infringement, please contact admin@php.cn delete
Gemma Scope: Google's Microscope for Peering into AI's Thought ProcessGemma Scope: Google's Microscope for Peering into AI's Thought ProcessApr 17, 2025 am 11:55 AM

Exploring the Inner Workings of Language Models with Gemma Scope Understanding the complexities of AI language models is a significant challenge. Google's release of Gemma Scope, a comprehensive toolkit, offers researchers a powerful way to delve in

Who Is a Business Intelligence Analyst and How To Become One?Who Is a Business Intelligence Analyst and How To Become One?Apr 17, 2025 am 11:44 AM

Unlocking Business Success: A Guide to Becoming a Business Intelligence Analyst Imagine transforming raw data into actionable insights that drive organizational growth. This is the power of a Business Intelligence (BI) Analyst – a crucial role in gu

How to Add a Column in SQL? - Analytics VidhyaHow to Add a Column in SQL? - Analytics VidhyaApr 17, 2025 am 11:43 AM

SQL's ALTER TABLE Statement: Dynamically Adding Columns to Your Database In data management, SQL's adaptability is crucial. Need to adjust your database structure on the fly? The ALTER TABLE statement is your solution. This guide details adding colu

Business Analyst vs. Data AnalystBusiness Analyst vs. Data AnalystApr 17, 2025 am 11:38 AM

Introduction Imagine a bustling office where two professionals collaborate on a critical project. The business analyst focuses on the company's objectives, identifying areas for improvement, and ensuring strategic alignment with market trends. Simu

What are COUNT and COUNTA in Excel? - Analytics VidhyaWhat are COUNT and COUNTA in Excel? - Analytics VidhyaApr 17, 2025 am 11:34 AM

Excel data counting and analysis: detailed explanation of COUNT and COUNTA functions Accurate data counting and analysis are critical in Excel, especially when working with large data sets. Excel provides a variety of functions to achieve this, with the COUNT and COUNTA functions being key tools for counting the number of cells under different conditions. Although both functions are used to count cells, their design targets are targeted at different data types. Let's dig into the specific details of COUNT and COUNTA functions, highlight their unique features and differences, and learn how to apply them in data analysis. Overview of key points Understand COUNT and COU

Chrome is Here With AI: Experiencing Something New Everyday!!Chrome is Here With AI: Experiencing Something New Everyday!!Apr 17, 2025 am 11:29 AM

Google Chrome's AI Revolution: A Personalized and Efficient Browsing Experience Artificial Intelligence (AI) is rapidly transforming our daily lives, and Google Chrome is leading the charge in the web browsing arena. This article explores the exciti

AI's Human Side: Wellbeing And The Quadruple Bottom LineAI's Human Side: Wellbeing And The Quadruple Bottom LineApr 17, 2025 am 11:28 AM

Reimagining Impact: The Quadruple Bottom Line For too long, the conversation has been dominated by a narrow view of AI’s impact, primarily focused on the bottom line of profit. However, a more holistic approach recognizes the interconnectedness of bu

5 Game-Changing Quantum Computing Use Cases You Should Know About5 Game-Changing Quantum Computing Use Cases You Should Know AboutApr 17, 2025 am 11:24 AM

Things are moving steadily towards that point. The investment pouring into quantum service providers and startups shows that industry understands its significance. And a growing number of real-world use cases are emerging to demonstrate its value out

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
1 months agoBy尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
1 months agoBy尊渡假赌尊渡假赌尊渡假赌
Will R.E.P.O. Have Crossplay?
1 months agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

MinGW - Minimalist GNU for Windows

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

WebStorm Mac version

WebStorm Mac version

Useful JavaScript development tools

Dreamweaver Mac version

Dreamweaver Mac version

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)