Tapping Into the Power: Graphics Performance Analyzers

Creative enthusiasts working with entertainment media such as HD video need highly responsive tools that sustain the creative flow. After all, waiting for an effect to render or suffering through herky-jerky video playback are sure ways to squelch inspiration.

So tapping into the full performance potential of today’s desktops, laptops, tablet PCs and mobile computing architectures, and eliminating latency-inducing bottlenecks, are essential for media application developers. These are typically daunting, time- and resource-intensive tasks.

Fortunately, a number of powerful developer tools can help streamline the process of analyzing and optimizing media and other graphics-intensive applications. For example, Graphics Performance Analyzers (GPA) allow developers to increase the parallelization of their code, readily identify and eliminate hotspots and bottlenecks, and accelerate media encoding, decoding, preprocessing and transcoding operations across a variety of platforms, including legacy processors and the current second-generation processor family.

Making an Impact With Performance

Optimization is a critical part of the product development workflow, especially for media application developers. For example, ArcSoft, a leading developer of applications for video editing, conversion and sharing, devotes 50 percent of its development cycle to the optimization process. Why is optimization so important? It all boils down to performance.

“Today’s users don’t want to wait for effects to render or videos to load,” says Yanlong Sun, ArcSoft’s deputy general manager of video and home entertainment. “Tapping into the performance of processor architecture through fine-tuning and optimization means that users don’t need to wait.”

Optimization is also a top priority for Corel, one of the world’s top software companies. “Platform optimization is fundamental to our development,” explains Jan Piros, senior strategic product manager at Corel. “A significant amount of our effort goes into this because the gains made can be felt throughout many of our features. It’s an effort whose impact is multiplied throughout the software and is of great benefit to the user.”

With each new generation of processor, more cores are added to a single piece of silicon. To make use of all that processing power, software developers tune and optimize their code for multicore, multithreaded operations. This allows the software to utilize all available cores and threads on a system, helping boost performance in the process.
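As a rough illustration of the multicore approach described above, the following minimal Python sketch splits a batch of frames into one chunk per available core and hands the chunks to a worker pool. The names `process_frame`, `split_into_chunks`, `process_chunk` and `process_frames_parallel` are hypothetical, and the per-frame work is a stand-in for a real effect. A thread pool is used only to keep the sketch self-contained; in CPython, CPU-bound pure-Python work would need separate processes or native code that releases the global interpreter lock to run truly in parallel.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def process_frame(frame):
    """Stand-in for real per-frame work (hypothetical): brighten each pixel."""
    return [min(pixel + 10, 255) for pixel in frame]


def split_into_chunks(items, chunk_count):
    """Split items into at most chunk_count contiguous, near-equal chunks."""
    chunk_count = max(1, min(chunk_count, len(items)))
    base, extra = divmod(len(items), chunk_count)
    chunks, start = [], 0
    for i in range(chunk_count):
        # The first `extra` chunks each take one additional item.
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Process every frame in one chunk on a single worker."""
    return [process_frame(frame) for frame in chunk]


def process_frames_parallel(frames, workers=None):
    """Process frames on a pool sized to the number of available cores."""
    workers = workers or os.cpu_count() or 1
    chunks = split_into_chunks(frames, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        processed_chunks = list(pool.map(process_chunk, chunks))
    return [frame for chunk in processed_chunks for frame in chunk]
```

Chunking by core count keeps scheduling overhead low when each frame is cheap to process; a real media pipeline would tune chunk size against its actual per-frame cost.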

Getting the Numbers

Zeroing in on the exact cause of any particular latency, when hundreds of modules and millions of lines of code are involved, is like trying to find the proverbial needle in a haystack. Discovering bottlenecks and analyzing CPU and graphics workloads at the system, task and intra-frame levels can save developers a significant amount of time while developing and optimizing their applications.

GPA provides developers with a suite of analysis tools for visualizing and optimizing applications efficiently from the system level all the way down to individual elements, such as draw calls within a single video frame. In addition, the standalone GPA Frame Analyzer tool lets developers experiment and see the performance gains available from optimizations without changing source code.
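The underlying idea of locating a bottleneck by measuring where the time goes can be shown with a minimal Python sketch. This is a generic illustration that does not use GPA or any of its interfaces; the pipeline stages and the `time_stages` and `find_bottleneck` helpers are hypothetical names chosen for the example.

```python
import time


def time_stages(stages, frame):
    """Run each (name, function) stage in order and record its elapsed seconds."""
    timings = {}
    data = frame
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        timings[name] = time.perf_counter() - start
    return data, timings


def find_bottleneck(timings):
    """Return (stage name, share of total time) for the slowest stage.

    `timings` must be a non-empty mapping of stage name to seconds.
    """
    total = sum(timings.values())
    name = max(timings, key=timings.get)
    share = timings[name] / total if total > 0 else 0.0
    return name, share


# Hypothetical pipeline stages; real decode, effect and encode work would go here.
def decode(frame):
    return list(frame)


def apply_effect(frame):
    return [min(pixel * 2, 255) for pixel in frame]


def encode(frame):
    return bytes(frame)


PIPELINE = [("decode", decode), ("effect", apply_effect), ("encode", encode)]
```

Calling `time_stages(PIPELINE, frame)` over many frames and passing the accumulated timings to `find_bottleneck` points at the stage worth optimizing first. A dedicated analyzer goes much further, but the principle of measuring before tuning is the same.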

Case Study: ArcSoft

ArcSoft, a leading developer of multimedia imaging technologies and applications for desktop and embedded platforms, creates software for smartphones, feature phones, tablets, PCs, smart TVs and cameras. The company treats optimization as a crucial part of its development cycle.

GPA was instrumental in allowing ArcSoft to parallelize the core engine used in both ShowBiz and MediaConverter. “Parallel tasking gives our users the ability to simultaneously output finished content to, say, YouTube and a handheld device format,” says Sun. “GPA gave us a frame-by-frame GPU analysis to help us improve our decode and encode pipelines. Multicore, multithreaded processor technology significantly reduces the conversion time. The user can now convert four or more files concurrently while leaving the processor free for other tasks.”

Case Study: Corel

Corel, one of the world’s top software companies with more than 100 million active users in more than 75 countries, develops innovative products that are easy to learn and use. Corel VideoStudio Pro X4, its flagship video-editing software, offers video makers of all skill levels a comprehensive set of video-editing tools, along with plug-ins for rock-steady video stabilization and broadcast-quality titles, animations and graphics.

In developing VideoStudio Pro X4, Corel engineers used GPA to achieve optimal load balancing between CPU and GPU media-processing pipelines. “The decode/encode functions allowed us to achieve very fast transcoding speed, as well as fast read-back between video and system memory,” says Chung-Tao Chu, director of development at Corel.
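To make the idea of balancing work between CPU and GPU pipelines concrete, here is a minimal Python sketch of one simple strategy: a greedy heuristic that assigns each task to whichever pipeline would finish it soonest. This is not Corel's method and is not guaranteed to be optimal; the `balance_load` function, the task names and the cost estimates in milliseconds are all hypothetical.

```python
def balance_load(tasks):
    """Greedily assign each task to the pipeline that would finish it soonest.

    tasks: list of (name, cpu_cost, gpu_cost) tuples with estimated costs.
    Returns (assignments, finish_times), where assignments maps each task
    name to "cpu" or "gpu" and finish_times holds each pipeline's total load.
    """
    finish = {"cpu": 0.0, "gpu": 0.0}
    assignments = {}
    for name, cpu_cost, gpu_cost in tasks:
        cpu_done = finish["cpu"] + cpu_cost
        gpu_done = finish["gpu"] + gpu_cost
        # Ties go to the CPU; any consistent tie-break would work.
        target = "cpu" if cpu_done <= gpu_done else "gpu"
        finish[target] = min(cpu_done, gpu_done)
        assignments[name] = target
    return assignments, finish


# Hypothetical cost estimates in milliseconds: (task, CPU cost, GPU cost).
EXAMPLE_TASKS = [
    ("decode", 8, 2),
    ("stabilize", 6, 5),
    ("titles", 3, 9),
    ("encode", 10, 4),
]
```

With these example costs, `balance_load(EXAMPLE_TASKS)` sends decode and encode to the GPU and the other two tasks to the CPU. In practice the cost estimates would come from measurement, which is where a profiling tool earns its keep.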

GPA helped Corel engineers identify bottlenecks and hotspots by analyzing modules related to a single feature or feature set instead of having to look at the entire VideoStudio Pro code base. Once identified, the bottlenecks were eliminated, resulting in code optimized for performance and multicore scalability. “It lets us deliver a video editor with a smooth and responsive creative experience that really wasn’t possible with previous-generation chips,” says Piros.

Corel’s new MotionStudio 3D is an easy-to-use 3D and motion-graphics application for creating titles and graphics for video. “MotionStudio is very graphics-intensive,” says Chu. “Looking ahead to future releases, we can absolutely see where GPA will help optimize our very complex and computing-intensive graphics.”