Google StyleDrop surpasses Midjourney in terms of controllability, and former GitHub CTO uses AI to subvert programming
The AI Venture Capital Weekly Report released by Alpha Commune focuses on the new trends in artificial intelligence represented by large language models and generative AI. Alpha Commune hopes to discover and invest in extraordinary entrepreneurs (AlphaFounders), believing that extraordinary entrepreneurs are a huge driving force in technology, business and society, and they guide the direction of the venture capital ecosystem.
This week, we observed the following new trends in the field of AI:
1. AI visual generation and multi-modality are progressing rapidly: Google StyleDrop has set a new state of the art for style consistency and controllability, and PandaGPT, launched by Cambridge and Tencent, unifies six modalities.
2. AI programming capabilities have become the focus of breakthroughs: Google launched the new DIDACT programming framework, Baidu's Comate programming assistant debuted, and the former CTO of GitHub founded a startup to build a trillion-parameter large model for programming.
3. Various new alignment methods want to subvert RLHF: Direct preference optimization (DPO) simplifies the preference learning pipeline, and Stanford and Google DeepMind have developed simpler and more effective value alignment methods.
4. New artificial intelligence research makes sorting algorithms up to 70% faster: Google DeepMind's AlphaDev sped up a C++ sorting routine that runs trillions of times a day by up to 70%.
5. Many startups are tackling the AI computing power problem: two Harvard dropouts built a dedicated chip for large language model inference that improves performance per dollar by 140 times, and CoreWeave, which provides cloud computing capacity for generative AI, has raised over US$400 million within one month.
Google's latest StyleDrop can be called a rival to Midjourney. From a single reference image it can deconstruct and reproduce any complex art style, including abstract works and logos in different styles. Compared with previous SOTA models, StyleDrop excels in style consistency and text alignment, providing a more controllable painting process and enabling fine-grained work that was previously unimaginable.
StyleDrop is built on Muse, a state-of-the-art text-to-image synthesis model based on a masked generative image Transformer. Muse contains two synthesis modules, one for base image generation and one for super-resolution; each consists of a text encoder T, a transformer G, a sampler S, an image encoder E, and a decoder D.
StyleDrop's training involves two key steps. The first is parameter-efficient fine-tuning: by fine-tuning the parameters of the generative vision Transformer on a given reference image, the model learns to generate images in a similar style. The second is iterative training with feedback: through repeated training rounds, the generated images are progressively optimized for style consistency and text alignment.
Two sentences from Google DeepMind's Hassabis set the computing world alight: "AlphaDev discovered a new, faster sorting algorithm, and we have open sourced it into the main C++ library for developers to use. This is just the beginning of AI-driven progress in code efficiency."
Based on the AlphaZero model, AlphaDev recasts the sorting problem as a single-player "assembly game". By searching a huge space of possible instruction combinations, it discovered sorting algorithms faster than existing ones, improving the speed of C++ sorting routines that run trillions of times a day by up to 70%. The research paper has been published in the authoritative scientific journal Nature, and the result has been merged into LLVM's libc++ standard library and open sourced.
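AlphaDev's gains come from shaving instructions off small, fixed-size sorting routines inside the library's sort implementation. As an illustrative sketch only (not AlphaDev's actual assembly-level change), a fixed three-element sorting network of the kind it optimized looks like this:

```python
from itertools import permutations

def sort3(a, b, c):
    """Sort three values with a fixed 3-comparator sorting network.

    Routines like this run with no loops or data-dependent branching
    structure, which is why removing even one instruction matters at
    the scale of trillions of daily calls.
    """
    if a > b: a, b = b, a   # comparator on positions (0, 1)
    if b > c: b, c = c, b   # comparator on positions (1, 2)
    if a > b: a, b = b, a   # comparator on positions (0, 1)
    return a, b, c

# The network sorts every input ordering of three elements.
for p in permutations([1, 2, 3]):
    assert sort3(*p) == (1, 2, 3)
```

The fixed comparator sequence is what makes such routines amenable to instruction-level search: correctness can be verified exhaustively over all input orderings.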
Daniel J. Mankowitz, one of AlphaDev's lead authors, said the technology will have an important impact on programming and the digital society, save billions of people time and energy, and is expected to optimize the entire computing ecosystem.
A technology called Neuralangelo from Nvidia and Johns Hopkins University can automatically generate detailed 3D models from ordinary videos. It uses an SDF-based neural rendering reconstruction and multi-resolution hash encoding architecture to generate 3D structures without depth data. Currently, related papers have been selected for CVPR 2023.
In the paper, Neuralangelo was evaluated on the DTU and Tanks and Temples datasets, where it performed accurately in 3D detail generation and image reconstruction. Compared with previous SOTA models such as NeuS and NeuralWarp, Neuralangelo shows excellent results on both datasets.
In order to allow large language models to understand and interact with video content, researchers from DAMO Academy proposed Video-LLaMA, a large-scale model with audio-visual capabilities. This model can perceive and understand video and audio signals, understand user instructions, and complete complex tasks such as audio and video description and question and answer.
However, this model still has limitations such as limited perceptual capabilities, difficulty in processing long videos, and inherent hallucinations in language models. DAMO Academy said it is building high-quality audio-video-text data sets to improve perception capabilities.
Recently, researchers from Cambridge, NAIST and Tencent AI Lab launched a cross-modal language model called PandaGPT. PandaGPT combines the modality alignment capability of ImageBind with the generation capability of Vicuna, achieving instruction understanding and following across six modalities. The model demonstrates understanding of different modalities, including image/video-based Q&A, creative writing, and visual-auditory reasoning; it can process image, video, text, audio, depth, thermal and IMU data and naturally combine their semantics.
By fine-tuning the LLaMA model, researchers at the National University of Singapore developed Goat, a 7-billion-parameter model dedicated to arithmetic that significantly outperforms GPT-4 on arithmetic tasks. Goat excels on the BIG-bench arithmetic subtasks, with accuracy exceeding Bloom, OPT, GPT-NeoX and others; zero-shot Goat-7B even surpasses PaLM-540B after few-shot learning.
Goat achieves near-perfect accuracy on large-number addition and subtraction by fine-tuning on a synthetic arithmetic dataset, surpassing other pre-trained language models. For the more challenging multiplication and division tasks, the researchers proposed a classify-and-decompose method that improves arithmetic performance by breaking a task into learnable subtasks. This work offers useful exploration and inspiration for the progress of language models on arithmetic tasks.
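The decomposition idea can be sketched as follows. This is an illustrative example of splitting multi-digit multiplication into single-digit partial products; the exact chain-of-thought format and subtask split used in the Goat paper may differ.

```python
def decompose_multiply(a, b):
    """Decompose a * b into partial products, one per digit of b.

    Each subtask (multiply by one digit, shift by a power of ten,
    then sum) is far easier for a language model to learn than the
    full multi-digit multiplication in one step.
    """
    steps = []
    total = 0
    for i, d in enumerate(reversed([int(ch) for ch in str(b)])):
        partial = a * d * 10**i          # single-digit partial product
        steps.append(f"{a} * {d * 10**i} = {partial}")
        total += partial                  # accumulate the running sum
    return steps, total

steps, total = decompose_multiply(123, 45)
# steps: ["123 * 5 = 615", "123 * 40 = 4920"]; total: 5535
assert total == 123 * 45
```

Expressing the intermediate steps explicitly is what lets a fine-tuned model supervise each subtask rather than the opaque end-to-end answer.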
7. iFlytek Spark Cognitive Large Model V1.5 is released, with multiple rounds of dialogue and mathematical abilities further upgraded
On June 9, iFlytek Spark Cognitive Large Model V1.5 was released. This version makes breakthroughs in open-ended question answering, upgrades multi-turn dialogue and mathematical abilities, and improves text generation, language understanding, and logical reasoning. In addition, iFlytek is bringing the Spark Cognitive Model to mobile with the release of its Spark app.
According to the plan, iFlytek will carry out three rounds of iterative upgrades this year, with the goal of benchmarking against ChatGPT by October 24. After the June 9 release, the next upgrade is scheduled for August 15, focusing mainly on code capabilities and multi-modal interaction.
Google recently announced a framework called DIDACT, which uses AI technology to enhance software engineering and assist developers in writing and modifying code in real time.
The model of the DIDACT framework is multi-modal in nature and can predict the next editing operation based on the developer's historical operations. This capability allows the model to better understand the developer's intent and provide accurate recommendations. The model can also complete more complex tasks, such as starting from a blank file and continuously predicting subsequent edit operations until a complete code file is generated.
DIDACT tools include annotation parsing, build fixing, and hint prediction, each integrated at different stages of the development workflow. Records of these tools' interactions with developers are used as training data to help models predict developer actions during software engineering tasks.
9. Baidu launches Comate, a code writing assistant based on large models, and improves the inference performance of Wenxin Yiyan's high-performance mode by 50 times
Recently, Baidu Smart Cloud launched the Comate intelligent coding recommendation tool and officially opened invitation-based testing. Comate is similar to code writing assistants such as GitHub Copilot, but uses more Chinese comments and development documentation as training data. During coding, Comate infers likely next inputs based on what the developer is currently writing. According to Baidu, Comate has first been integrated across all of Baidu's business lines with good results: in the core R&D departments, 50% of code can be generated by Comate.
In addition, Baidu stated that Wenxin Yiyan's inference performance has improved by 10 times. Meanwhile, based on the complete tool chain provided by the Wenxin Qianfan large model platform, in high-frequency, core enterprise scenarios, Wenxin Yiyan's high-performance mode "Wenxin Yiyan-Turbo" has improved inference service performance by 50 times.
A study led by Jeff Clune, a former senior member of the OpenAI research team, found that the performance and safety of artificial intelligence agents can be improved by letting them imitate human thinking and actions. The research uses a dataset of thoughts spoken by humans as they act, allowing an agent to learn the ability to think and combine it with modeled behavior. This approach is called "thought cloning", where upper-level components generate ideas and lower-level components perform actions.
The researchers used millions of hours of thought data collected from YouTube videos and text recordings for training. Experimental results show that the "thought cloning" method outperforms the traditional behavior cloning method and performs better in out-of-distribution tasks. This research is of great significance to the development of artificial intelligence, improving the intelligence level and safety of agents and making them easier to understand and control.
The paper "ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs", published by ByteDance, NVIDIA and the University of California, Riverside, won the Best Paper Award at IPDPS 2023.
ByteTransformer is a GPU-based Transformer inference library developed by ByteDance. It is an efficient Transformer implementation that achieves high performance on BERT-style transformers through a series of optimizations. For variable-length text input, ByteTransformer achieves an average speedup of more than 50% over other Transformer implementations in experiments. It is suitable for accelerating natural language processing tasks and improving the efficiency of model training and inference.
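Much of the variable-length advantage comes from not spending compute on padding tokens. As a back-of-the-envelope sketch (assuming the common baseline of padding every sequence in a batch to the longest one, and ignoring attention's quadratic term):

```python
def padding_overhead(seq_lens):
    """Fraction of per-token compute wasted on pad tokens when a
    batch of variable-length sequences is padded to its longest
    member -- the inefficiency a padding-free kernel avoids."""
    padded_tokens = len(seq_lens) * max(seq_lens)  # batch * max_len
    real_tokens = sum(seq_lens)                    # useful work only
    return 1 - real_tokens / padded_tokens

# Uniform lengths waste nothing; skewed lengths waste a lot.
print(padding_overhead([128, 128, 128]))  # 0.0
print(padding_overhead([16, 32, 512]))    # ~0.635
```

With realistic length distributions the wasted fraction is often large, which is consistent with the article's reported average speedups of over 50% on variable-length input.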
RLHF (Reinforcement Learning from Human Feedback) is currently a popular method for aligning large models with humans, and it gives models impressive dialogue and coding capabilities. But the RLHF pipeline is much more complex than supervised learning: it involves training multiple language models and sampling from the language model policy inside the training loop, incurring a large computational cost.
Recently, Stanford University and other institutions proposed Direct Preference Optimization (DPO). The research shows that the RL-based objective used by existing methods can be optimized exactly with a simple binary cross-entropy objective, simplifying the preference learning pipeline. In other words, it is entirely possible to directly optimize language models to adhere to human preferences without explicit reward models or reinforcement learning.
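For a single preference pair, that binary cross-entropy objective can be sketched as follows. The function and argument names here are my own; the formulation follows the DPO paper, where each argument is a summed log-probability of a response under the trainable policy or the frozen reference model, and `beta` controls the strength of the implicit KL constraint.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    logits = beta * (log-ratio of chosen - log-ratio of rejected);
    the loss is -log sigmoid(logits), i.e. ordinary binary
    cross-entropy -- no reward model, no RL loop.
    """
    logits = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy equals the reference, the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # ~0.6931
```

The loss falls as the policy raises the chosen response's probability relative to the rejected one, which is exactly the supervised-learning-style update that replaces the RLHF sampling loop.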
An important step in the development of language models is to make their behavior consistent with human social values, also known as value alignment. The current mainstream method is RLHF.
However, this approach has several problems. First, the rewards produced by the proxy reward model are easily hacked, leading to responses that do not meet expectations. Second, the reward model and the generative model must interact continuously, making training time-consuming and inefficient. Third, the reward model itself does not correspond exactly to how humans think.
A recent study from Dartmouth, Stanford, Google DeepMind and other institutions shows that combining high-quality data built from social games with a simple, efficient alignment algorithm may be the key to achieving value alignment. The researchers proposed a method for alignment training on multi-agent game data. They built a virtual social model called Sandbox, in which social agents learn to leave a good impression by responding in line with social norms. By learning from the sandbox's interaction history, they derived a stable alignment algorithm. Experiments verify that alignment-trained models generate socially normative responses more readily. The stable alignment algorithm matches RLHF in performance and training stability while offering a simpler, more effective path to value alignment.
1. Poolside, founded by the former GitHub CTO, received US$26 million in seed round financing
Recently, Poolside received US$26 million in seed financing led by Redpoint Ventures. Poolside's goal is to unleash human potential by pursuing AGI (artificial general intelligence) through software creation, based on a core belief: the path to AGI should be achieved by building specific capabilities rather than general methods.
Jason Warner, the founder of Poolside, previously served as the managing director of Redpoint Ventures and earlier served as the CTO of GitHub. His team was responsible for developing GitHub Copilot. He co-founded Poolside with serial entrepreneur Eiso Kant, directly targeting OpenAI.
Poolside is building a powerful next-generation foundation model and infrastructure, possibly a trillion-parameter model focused on software and code. With this model's capabilities, artists, doctors, scientists and educators could build software and products with very low barriers to entry, 1,000 times faster than today; creating software would become feasible and ubiquitous for everyone.
2. UpdateAI, an AI-powered customer success platform, received $2.3 million in early investment from IdealabX, Zoom Ventures, and a16z
UpdateAI is a customer success platform provider that recently received US$2.3 million in financing led by IdealabX.
UpdateAI simplifies the tedious work of customer calls, allowing customer success managers to focus on delivering scalable customer insights. The platform integrates with Zoom Meetings and leverages ChatGPT to generate smart meeting summaries that provide a concise meeting overview and automate post-call tasks like sending follow-up emails to customers.
UpdateAI's co-founder and CEO Josh Schachter is a serial founder with a varied background. Before founding UpdateAI, he had two prior startups, several stints as a product manager at large companies, and a directorship at Boston Consulting Group, giving him a deep understanding of business needs.
This round of financing was led by IdealabX, with participation from Zoom Ventures and a16z. UpdateAI had previously raised US$1.7 million, and this round brings its total funding to US$4 million.
3. CoreWeave, which focuses on providing cloud computing capabilities for generative AI, received another US$200 million in strategic financing within a month
CoreWeave is a startup focused on AI cloud computing. Its investor Magnetar Capital, after previously leading a US$221 million Series B, has now led a US$200 million strategic round. CoreWeave is currently a unicorn valued at around US$2 billion.
CoreWeave offers more than a dozen SKUs of NVIDIA GPU cloud services, including H100, A100, A40 and RTX A6000, for a variety of use cases including artificial intelligence and machine learning, visual effects and rendering, batch processing and pixel streaming.
CoreWeave was founded by Michael Intrator, Brian Venturo, and Brannin McBee, who initially focused on cryptocurrency applications and have since pivoted to general-purpose computing and generative AI technologies such as text-generation AI models.
In CoreWeave’s previous $221 million Series B financing, in addition to the lead investor Magnetar Capital, there were also investors such as NVIDIA, former GitHub CEO Nat Friedman, and former Apple executive Daniel Gross.
4. Workflow automation engine 8Flow.ai received US$6.6 million in seed round financing
Recently, 8Flow.ai received US$6.6 million in seed round financing led by Caffeinated Capital. Institutions such as BoxGroup and Liquid2 and individual investors such as former GitHub CEO Nat Friedman and Howie Liu also participated.
The company launched a self-learning workflow automation engine for enterprises that integrates with tools such as Zendesk, ServiceNow and Salesforce Service Cloud to assist agents in completing daily tasks. In the future, the company plans to use all this data to train machine learning models to generate AI workflows tailored to each user's needs.
8Flow.ai’s product currently exists in the form of a Chrome browser extension that automatically copies and pastes relevant data from one program to another. The tool automatically learns common steps for each agent and renders them into actions that can be triggered with a single click.
8Flow.ai founder Boaz Hecht was the co-founder and CEO of SkyGiraffe, and later served as vice president of the ServiceNow platform, responsible for mobile and artificial intelligence chat robot products.
5. Hyro, a conversational artificial intelligence platform in the medical field, received US$20 million in Series B financing led by Macquarie Capital
Recently, Hyro, a conversational artificial intelligence platform in the medical field, received US$20 million in Series B financing led by Macquarie Capital.
Hyro was co-founded by two Cornell University alumni, Israel Krush and Rom Cohen. Israel Krush is a serial entrepreneur with rich industry experience.
Hyro leverages unique natural language processing and knowledge graph technologies to build a plug-and-play conversational interface for healthcare systems, covering 85% of the daily tasks of a typical medical department. Hyro can perform client maintenance without training data and updates internal information in real time. The platform's built-in AI assistant fits into the medical department's existing workflow, helping centralize communications, improve services, and reduce operating costs.
It is reported that Hyro’s ARR has increased by more than 100% year-on-year, and major customers include Mercy Health, Baptist Health, Intermountain Healthcare, etc.
6. Predibase, a commercial low-code machine learning platform, completed US$12.2 million in Series A financing
Predibase is a commercial low-code machine learning platform for developers. It helps users without machine learning skills quickly and easily build, iterate, and deploy complex AI applications. Recently, Predibase received $12.2 million in Series A financing led by Felicis.
On Predibase's platform, users only need to define what they want through the platform's built-in AI model, and the platform completes the rest automatically. Novice users can choose the recommended model architecture, while expert users can fine-tune every model parameter to their needs, greatly shortening the time to deploy AI applications.
Predibase’s founder and CEO Piero Molino has a background at the intersection of industry and academia. He has had professional experience at IBM and Uber, and has also worked as a research scientist at Stanford University.
7. Beehive AI, an AI analysis platform for unstructured customer data, received US$5.1 million in seed round financing
Beehive AI is the world's first AI platform specifically designed to analyze unstructured customer data. It recently received US$5.1 million in seed round financing led by Valley Capital Partners.
Beehive AI is an end-to-end, customizable enterprise AI platform for consumer research with unprecedented accuracy, relevance and scale. By analyzing unstructured open data, combined with quantitative data, Beehive AI helps companies extract new insights, helping them better understand and serve their customers.
The platform allows customers to upload their existing data collected on any platform or launch an AI-designed survey asking open-ended questions to obtain rich and nuanced feedback from customers. It then performs custom analysis on the data and allows customers to explore insights using intuitive programmable dashboards.
8. Etched.ai, a chip design and developer dedicated to large language model inference, received US$5.36 million in seed round financing
Etched.ai designs and develops dedicated chips for large language model inference. Recently, it received US$5.36 million in seed financing led by Primary Venture Partners, with participation from former eBay CEO Devin Wenig and others. The company is currently valued at approximately US$34 million.
Etched.ai, founded by Harvard dropouts Gavin Uberti and Chris Zhu, has designed a more specialized, lower-power chip for running generative AI models. They hope to bring the chip to market in the third quarter of 2024 and plan to sell to major cloud service providers.
The founders of Etched.ai say simulations show their chip offers a 140x improvement in performance per dollar compared to traditional GPUs.
9. Using artificial intelligence to improve the cost-effectiveness of cloud computing, Antimetal received US$4.3 million in seed round financing
Recently, Antimetal, which is committed to developing AI technology to improve the cost-effectiveness of cloud computing, completed a US$4.3 million seed round of financing led by Framework Ventures.
Antimetal uses proprietary machine learning models to optimize cloud computing deployments, starting with AWS, the most mainstream cloud service, and expanding to other platforms such as Google Cloud and Microsoft Azure in the future.
The company develops online algorithms and uses artificial intelligence to study market dynamics, then integrates, schedules, and resells these cloud computing resources. Selling idle AWS resources takes businesses an average of 90 days, but with Antimetal the deal closes about three times faster.
The company’s founder and CEO, Matthew Parkhurst, worked in SaaS companies for a long time before starting his own business and has more than 7 years of industry experience.
10. AI medical imaging startup Hypervision Surgical received £6.5 million in seed round financing
Hypervision Surgical recently received £6.5 million in seed round financing from HERAN Partners, Redalpine and ZEISS Ventures.
Hypervision Surgical is a spin-out from King’s College London, founded by a team of clinicians, medical imaging and artificial intelligence experts. Its goal is to equip clinicians with advanced computer-assisted tissue analysis to improve surgical precision and patient safety, reducing patient morbidity and healthcare costs in the surgical specialty.
Currently, the company is developing medical imaging for surgical procedures by combining AI hyperspectral imaging and edge computing. With this technology, surgeons can rely on precise measurements and tissue property information to differentiate between healthy and unhealthy tissue during complex oncology surgeries.
Martin Frost, a core member of the company team, was the founder and former CEO of CMR Surgical, a surgical robot company. Company CEO Michael Ebner graduated from King's College London and was elected to the Royal Academy of Engineering.
This article was compiled by Alpha Commune from multiple information sources and was written with the assistance of ChatGPT.