AI在线

NVIDIA and HKU Collaborate to Launch New Visual Attention Mechanism, Boosting High-Resolution Generation Speed by Over 84 Times!

Recently, The University of Hong Kong and NVIDIA jointly developed a new visual attention mechanism, the Generalized Spatial Propagation Network (GSPN), which achieves significant breakthroughs in high-resolution image generation.

Although traditional self-attention mechanisms have achieved strong results in natural language processing and computer vision, they face two challenges when handling high-resolution images: huge computational overhead and loss of spatial structure. Self-attention has O(N²) computational complexity in the number of tokens, making long contexts very expensive to process. In addition, flattening a two-dimensional image into a one-dimensional sequence discards the spatial relationships between pixels.
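The scale of both problems is easy to see with a little arithmetic. The sketch below is illustrative (the function names are ours, not from the paper): row-major flattening pushes vertically adjacent pixels far apart in the token sequence, and the pairwise attention matrix grows quadratically with pixel count.

```python
def flat_index(row, col, width):
    # Row-major flattening: horizontally adjacent pixels stay 1 apart,
    # but vertically adjacent pixels end up `width` positions apart.
    return row * width + col

def attention_pairs(height, width):
    # Self-attention scores every token pair: O(N^2) in the pixel count.
    n = height * width
    return n * n

w = 256
print(flat_index(1, 10, w) - flat_index(0, 10, w))  # vertical neighbors: 256 apart
print(f"{attention_pairs(256, 256):.2e}")           # ~4.29e+09 pairs
print(f"{attention_pairs(16384, 8192):.2e}")        # ~1.80e+16 pairs at 16K x 8K
```

At a 16K×8K resolution the attention matrix would have on the order of 10^16 entries, which is why the quadratic approach breaks down long before reaching such sizes.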

image.png

To address these issues, GSPN adopts an innovative two-dimensional linear propagation method combined with a "stability-context condition," reducing the number of sequential propagation steps to the √N level (a row-by-row scan over an N-pixel image) while preserving the spatial coherence of the image. The new mechanism significantly improves computational efficiency and sets performance records on multiple visual tasks.
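A rough way to see the efficiency gain (an illustrative sketch, not the paper's exact cost model): a line scan over an H×W image needs only H sequential steps, each updating an entire row in parallel, so a square N-pixel image takes about √N steps instead of computing all N² attention pairs.

```python
import math

def cost_summary(height, width):
    # Illustrative comparison, not the paper's exact cost model:
    # self-attention does O(N^2) pairwise work, while a GSPN-style
    # line scan runs `height` sequential steps, each over a full row.
    n = height * width
    return {
        "tokens": n,
        "attention_pairs": n * n,  # O(N^2)
        "scan_steps": height,      # ~sqrt(N) for a square image
    }

summary = cost_summary(1024, 1024)
print(summary["scan_steps"], math.isqrt(summary["tokens"]))  # 1024 1024
```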

The core technology of GSPN has two major parts: two-dimensional linear propagation and the stability-context condition. By scanning row by row or column by column, GSPN processes two-dimensional images efficiently. Compared with traditional attention mechanisms, GSPN not only reduces the number of parameters but also maintains the integrity of information propagation. The researchers further proposed the stability-context condition to keep the system stable and reliable during long-distance propagation.
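The row-by-row scan can be sketched in a few lines of NumPy. This is a minimal, simplified version under assumptions we are making for illustration: each pixel aggregates its three upper neighbors with fixed scalar weights, and the non-negative weights are normalized to sum to 1, a hedged stand-in for the stability-context condition (the actual method learns per-pixel propagation weights).

```python
import numpy as np

def row_scan(x, w_left=0.3, w_center=0.4, w_right=0.3):
    """Top-to-bottom 2D linear propagation (one of several scan
    directions). Each hidden row mixes the three upper neighbors of
    every pixel, then adds the input row. Normalizing the non-negative
    weights so they sum to 1 keeps the propagation stable over long
    distances (no exploding or vanishing signal) -- a simplified
    stand-in for the stability-context condition."""
    H, W = x.shape
    h = np.zeros_like(x, dtype=float)
    h[0] = x[0]
    s = w_left + w_center + w_right             # normalizer
    for i in range(1, H):                       # ~sqrt(N) sequential steps
        prev = h[i - 1]
        mix = (w_left * np.roll(prev, 1)        # above-left neighbor
               + w_center * prev                # directly above
               + w_right * np.roll(prev, -1))   # above-right neighbor
        h[i] = mix / s + x[i]
    return h

h = row_scan(np.ones((8, 8)))
print(h[-1])  # values grow linearly with depth (stable), not exponentially
```

Note the design point the normalization illustrates: because the mixing weights sum to 1, repeated propagation is a sequence of averages, so the signal magnitude stays bounded by the accumulated inputs no matter how many rows it crosses.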

In experiments, GSPN demonstrated outstanding performance. In image classification, GSPN achieved 82.2% Top-1 accuracy at 5.3 GFLOPs, surpassing many existing models. In image generation at 256×256 resolution, GSPN increased generation speed by 1.5 times. In text-to-image generation in particular, GSPN could quickly produce images at 16K×8K resolution, accelerating inference by more than 84 times and showcasing its great potential in practical applications.

In summary, GSPN, through its unique design concepts and structures, has significantly improved computational efficiency while maintaining spatial coherence, opening up new possibilities for future multimodal models and real-time visual applications.

Project homepage: https://whj363636.github.io/GSPN/

Code: https://github.com/NVlabs/GSPN

Key points:

🌟 GSPN boosts the generation speed of high-resolution images by over 84 times through an innovative two-dimensional linear propagation mechanism.

💡 This mechanism solves the problems of computational complexity and loss of spatial structure in traditional self-attention mechanisms when handling high-resolution images.

🚀 GSPN sets performance records in multiple visual tasks, providing new directions for future applications.
