<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	
	>
<channel>
	<title>Comments on: Nvidia&#8217;s discloses its DP performance limitations</title>
	<atom:link href="http://www.vrworld.com/2009/02/11/nvidias-discloses-its-dp-performance-limitations/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.vrworld.com/2009/02/11/nvidias-discloses-its-dp-performance-limitations/</link>
	<description></description>
	<lastBuildDate>Thu, 09 Apr 2015 17:57:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.1.1</generator>
	<item>
		<title>By: Gipsel</title>
		<link>http://www.vrworld.com/2009/02/11/nvidias-discloses-its-dp-performance-limitations/#comment-10339</link>
		<dc:creator><![CDATA[Gipsel]]></dc:creator>
		<pubDate>Thu, 12 Feb 2009 17:16:04 +0000</pubDate>
		<guid isPermaLink="false">http://theovalich.wordpress.com/?p=1058#comment-10339</guid>
		<description><![CDATA[You are completely right that Nvidia&#039;s GT200 double-precision performance is lacking compared to ATI&#039;s current offerings. An already quite old HD3850 has about the same double-precision performance as a GTX280. An HD4870 is almost three times faster!

But depending on the problem, CPUs aren&#039;t as bad as you indicate. It is much the same as with GPUs: some algorithms fit the architecture and some do not. Generally speaking, the probability that a given problem tanks on a GPU (obtained performance &lt;10% of peak) is a lot higher than on a CPU.

Just as an example, I have implemented an embarrassingly parallel (perfectly suited to GPUs) algorithm using double precision for CPUs as well as for ATI GPUs. It runs at about 40% (Core2, K10) to 60% (Core i7, HT helps a lot here) of theoretical peak performance on CPUs and 62% of theoretical peak on ATI GPUs (150 GFlops sustained on an HD4870; peak would be 240 GFlops).
But because it does not have a 50:50 ratio of multiplications and additions (or MADs for ATI), that is actually quite close to the maximum obtainable performance for this algorithm. There are virtually no stalls on ATI GPUs; it really executes one instruction per unit per clock cycle more than 95% of the time. The Core i7 is actually not that far behind. The hyperthreading-like execution of the parallel tasks on the GPU really helps to avoid stalls and boosts utilization in such cases (the same is true for the i7).

And in case you are wondering which algorithm I am talking about, it&#039;s Milkyway@home.]]></description>
		<content:encoded><![CDATA[<p>You are completely right that Nvidia&#8217;s GT200 double-precision performance is lacking compared to ATI&#8217;s current offerings. An already quite old HD3850 has about the same double-precision performance as a GTX280. An HD4870 is almost three times faster!</p>
<p>But depending on the problem, CPUs aren&#8217;t as bad as you indicate. It is much the same as with GPUs: some algorithms fit the architecture and some do not. Generally speaking, the probability that a given problem tanks on a GPU (obtained performance &lt;10% of peak) is a lot higher than on a CPU.</p>
<p>Just as an example, I have implemented an embarrassingly parallel (perfectly suited to GPUs) algorithm using double precision for CPUs as well as for ATI GPUs. It runs at about 40% (Core2, K10) to 60% (Core i7, HT helps a lot here) of theoretical peak performance on CPUs and 62% of theoretical peak on ATI GPUs (150 GFlops sustained on an HD4870; peak would be 240 GFlops).<br />
But because it does not have a 50:50 ratio of multiplications and additions (or MADs for ATI), that is actually quite close to the maximum obtainable performance for this algorithm. There are virtually no stalls on ATI GPUs; it really executes one instruction per unit per clock cycle more than 95% of the time. The Core i7 is actually not that far behind. The hyperthreading-like execution of the parallel tasks on the GPU really helps to avoid stalls and boosts utilization in such cases (the same is true for the i7).</p>
<p>And in case you are wondering which algorithm I am talking about, it&#8217;s Milkyway@home.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
