AI在线

NVIDIA and HKU Collaborate to Launch New Visual Attention Mechanism, Boosting High-Resolution Generation Speed by Over 84 Times!


Recently, The University of Hong Kong and NVIDIA jointly developed a new visual attention mechanism called Generalized Spatial Propagation Network (GSPN), which has made significant breakthroughs in high-resolution image generation.

Although traditional self-attention has achieved strong results in both natural language processing and computer vision, it faces two challenges when handling high-resolution images: heavy computational overhead and loss of spatial structure. Its computational complexity is O(N²) in the number of tokens, making long contexts very expensive to process. In addition, flattening a two-dimensional image into a one-dimensional sequence discards spatial relationships between pixels.
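To see where the quadratic cost comes from, here is a minimal NumPy sketch of standard self-attention (all shapes and variable names are illustrative, not from the GSPN codebase). The N×N score matrix is the O(N²) term:

```python
import numpy as np

def naive_self_attention(x, wq, wk, wv):
    """Standard self-attention over a sequence of N tokens.

    The score matrix is N x N, so both memory and compute
    grow quadratically with the sequence length N.
    """
    q, k, v = x @ wq, x @ wk, x @ wv            # each (N, d)
    scores = q @ k.T / np.sqrt(k.shape[-1])     # (N, N) -- the O(N^2) term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                          # (N, d)

# A modest 64x64 image flattened to N = 4096 tokens already needs
# a 4096 x 4096 score matrix (~16.8M entries) -- and flattening
# has thrown away the 2D neighborhood structure.
rng = np.random.default_rng(0)
n, d = 4096, 8
x = rng.standard_normal((n, d))
w = [rng.standard_normal((d, d)) for _ in range(3)]
out = naive_self_attention(x, *w)
print(out.shape)
```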


To address these issues, GSPN adopts an innovative two-dimensional linear propagation method combined with a "stability-context condition," reducing the effective sequence length for propagation to the √N level while preserving the spatial coherence of the image. This new mechanism significantly improves computational efficiency and sets performance records on multiple visual tasks.

The core technology of GSPN has two parts: two-dimensional linear propagation and the stability-context condition. By scanning the image row by row or column by column, GSPN processes two-dimensional inputs efficiently; compared with traditional attention, it reduces the parameter count while keeping information propagation intact. The stability-context condition, in turn, guarantees that the linear system remains stable and reliable during long-distance propagation.
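As a rough illustration of the idea (not NVIDIA's implementation; the three-neighbor connectivity and the row-normalized weights are assumptions based on the article's description of line scanning plus a stability condition), a row-by-row linear propagation can be sketched as:

```python
import numpy as np

def row_scan_propagation(img, raw_w):
    """Propagate a hidden state down an H x W image, row by row.

    Each pixel in row i receives input from its three nearest
    neighbors in row i-1. Constraining the three weights to be
    non-negative and sum to 1 keeps the linear system stable over
    long-distance propagation -- the role a stability-context
    condition plays in GSPN.
    """
    h, _ = img.shape
    # Non-negative weights normalized to sum to 1 over the 3 neighbors.
    weights = np.abs(raw_w)
    weights /= weights.sum(axis=-1, keepdims=True) + 1e-8  # (H, W, 3)

    state = np.zeros_like(img)
    state[0] = img[0]
    for i in range(1, h):
        prev = state[i - 1]
        left = np.roll(prev, 1)    # up-left neighbor
        right = np.roll(prev, -1)  # up-right neighbor
        neighbors = np.stack([left, prev, right], axis=-1)   # (W, 3)
        state[i] = (weights[i] * neighbors).sum(axis=-1) + img[i]
    return state

rng = np.random.default_rng(1)
img = rng.standard_normal((32, 32))
raw_w = rng.standard_normal((32, 32, 3))
out = row_scan_propagation(img, raw_w)
print(out.shape)
```

Note that each scan touches every pixel exactly once, and only the H ≈ √N rows must be processed sequentially, which is where the √N-level sequential cost comes from.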

In experiments, GSPN demonstrated outstanding performance. In image classification, it achieved 82.2% Top-1 accuracy at 5.3 GFLOPs, surpassing many existing models. In image generation, it sped up 256×256 generation by 1.5×. Most notably, in text-to-image generation GSPN produced images at 16K×8K resolution while cutting inference time by more than 84×, showcasing its great potential in practical applications.

In summary, GSPN, through its unique design concepts and structures, has significantly improved computational efficiency while maintaining spatial coherence, opening up new possibilities for future multimodal models and real-time visual applications.

Project homepage: https://whj363636.github.io/GSPN/

Code: https://github.com/NVlabs/GSPN

Key points:

🌟 GSPN boosts the generation speed of high-resolution images by over 84 times through an innovative two-dimensional linear propagation mechanism.

💡 This mechanism solves the problems of computational complexity and loss of spatial structure in traditional self-attention mechanisms when handling high-resolution images.

🚀 GSPN sets performance records in multiple visual tasks, providing new directions for future applications.
