
PS4's GDDR5 RAM Latency Not A Problem But DDR3 May Have Better CAS Latency - Kursk Dev

But it's really hard to tell as bandwidth also plays a role, says Jujubee's Michal Stepien.

showtimefolks3240d ago

It's like each developer says something different. But I guess they know more than us, since they're actually developers and I am just a gamer.

When PS4 was announced, I was told by some on this site that GDDR5 RAM would help PS4 more in the future.

thejigisup3240d ago

It's helping more now, and it definitely will later.

showtimefolks3240d ago

That's what I was told by a few of the smart tech guys on this site. Mark Cerny got feedback from developers.

Either way, both PS4 and Xbox One are awesome. So is the Wii U.

I am not trying to hate on any console

jebabcock3239d ago (Edited 3239d ago )

Sounds like the developer was basically saying that CAS latency has no noticeable impact for them due to the architecture of the system. He then gives a nod to bandwidth as a factor in performance between the two types of memory.

Digging a little deeper, we have to understand that most developers don't even have a way to effectively gauge this. It's not quite so easy to measure specifically the impact of CAS latency vs other latencies or other aspects of particular types of memory. It is easy to gauge the overall performance. I think this is what he was trying to imply by bringing up the bandwidth. There are a lot of pieces that come together, including the system architecture and APIs, that impact this performance. The result most devs see is a combination of all those factors. Often it's shots in the dark and pure speculation about the root causes of problems in software. It is also dependent on code decisions by the developers themselves. There are almost always many ways to accomplish any task, and certain ways may require completely different toolsets.

Most developers like working with a single peg. If a developer has a square peg, he wants all the systems to have a square hole. When they don't, it's the system's fault. Those making circular pegs, well... you get the picture. I can promise you that despite having similar architectures, the X1 and PS4 require completely different pegs.

my two cents anyway...

hay3239d ago (Edited 3239d ago )

CAS latency is how many cycles (or nanoseconds) the memory makes you wait for data after it has been requested.

Let's say modern DDR3 takes 11 cycles to retrieve data and runs at 1.06 billion cycles per second per memory die, while PS4's GDDR5 clocks 2.7 billion cycles per second with an estimated 15 cycles to retrieve data. We can see that GDDR5 still has the edge in this context, even when we exclude the fact that GDDR5 has an additional processor for task organization.

It means that DDR3 does have better CAS latency in cycles, but with GDDR5's faster clock it really is doing more in less time. Which is a mixture of a win and a loss.
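
To put rough numbers on it (using the estimates above, not official spec-sheet timings), the cycles-to-nanoseconds math looks like this in C:

#include <stdio.h>

/* Rough CAS comparison using the figures quoted above (estimates, not spec sheets):
 *   DDR3:  11 cycles at ~1.066 GHz command clock
 *   GDDR5: ~15 cycles (estimated) at ~2.7 GHz */
static double cas_ns(double cycles, double clock_hz)
{
    return cycles / clock_hz * 1e9;   /* cycles -> nanoseconds */
}

int main(void)
{
    printf("DDR3  CAS: %.1f ns\n", cas_ns(11.0, 1.066e9));  /* ~10.3 ns */
    printf("GDDR5 CAS: %.1f ns\n", cas_ns(15.0, 2.7e9));    /* ~5.6 ns  */
    return 0;
}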

iloveallgames3239d ago

@hay

Except the X1 DDR3 RAM is 11 cycles at 2.133 billion cycles per second.

The PS4 GDDR5 RAM is "200+" cycles at 2.7 billion cycles per second, at least that's what Naughty Dog's Jason Gregory presented at Semana Informatica.

http://www.gamepur.com/news...
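
For what it's worth, plugging those figures into the same cycles-to-nanoseconds arithmetic as the sketch above gives roughly 11 / 2.133 GHz ≈ 5.2 ns for the DDR3 and 200 / 2.7 GHz ≈ 74 ns for the GDDR5, though the two numbers may not be measuring the same thing: CAS is normally counted in memory command-clock cycles (around 1.066 GHz for DDR3-2133, since 2.133 GT/s is the data rate), and the "200+" figure from that presentation may describe the full CPU-to-memory round trip rather than CAS alone.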

rainslacker3239d ago

Seems most say the same thing on this subject. It's the contextualization that changes between sites.

Eonjay3240d ago (Edited 3240d ago )

This is really easy to resolve. There are lots of tests showing cards that use DDR3 vs GDDR5 memory. DDR3 has never won. The latency is real, but the reason DDR3 never overcomes GDDR5 is that the latter has an overwhelming bandwidth advantage. Also, the latency itself is so small that it is practically irrelevant.

jhoward5853240d ago (Edited 3240d ago )

You are correct. I've seen quite a few DDR3 vs GDDR5 benchmark tests and GDDR5 always wins.

But I wonder whether the result would be the same once you add DX12 to the equation.

Also, I'd really like to see a new benchmark on the PS4 and X1 once Sony gets its new API up and running on the machine.

And, if my memory serves me correctly, I recall GDDR5 makes the CPU constantly have to wait for the memory to load data (back and forth).

donthate3240d ago

"You are correct. I've seen quite of few of DDR3 vs GDDR5 benchmark test and GDDR5 always wins."

I don't think I have seen a benchmark between DDR3 and GDDR5 in relation to CPU use?

That is where DDR3's advantage shows, whereas all the comparisons are done on the GPU, where GDDR5 has the advantage.

Even then it also depends on how the developer is using their resources, but in the end these minor differences result in mostly unnoticeable effects.

"And, If my memory serve me correctly I recall GDDR5 makes the CPU constantly have to wait for the memory to load data(bask n forth)"

That is the definition of (CAS) latency: there is a wait from the time data is requested to the time it is received, and yes, GDDR5 does have higher latency.

XBLSkull3239d ago

Problem is you haven't seen a benchmark test with GDDR5 vs DDR3+ESRAM, so you can't really treat a PC benchmark as an equivalent of X1 vs PS4.

AndrewLB3240d ago

The thing is, the PS4 doesn't have the processing power to come even remotely close to utilizing that amount of bandwidth, especially since it's only dealing with 1080p resolution and usually post-process anti-aliasing. Graphics cards like the GTX 680 have similar memory bandwidth but have twice the processing power to actually use that bandwidth.

And FYI, part of the reason Assassin's Creed Unity was CPU bound on PS4 was its higher memory latency. You didn't see the Xbone have problems with huge crowds of people.

Regardless, the PS4 is substantially more powerful than the Xbone in every other way, so that will more than make up for that one deficiency.

@jhoward585
Those benchmarks are typically memory-bandwidth specific, so yes... GDDR5 would perform better. A few years back, a few video card manufacturers made two versions of a card using the identical GPU, but one had GDDR5 and the other GDDR3. Running at identical clock speeds, the only advantages the GDDR5 versions had were at resolutions higher than 1080p or when hardware anti-aliasing was being processed.

And what API are you talking about? If it's Vulkan you speak of, then don't expect much. PS4's API is lower level than Vulkan or Mantle.

jhoward5853240d ago (Edited 3240d ago )

"And what API are you talking about? If it's Vulkan you speak of, then don't expect much. PS4's API is lower level than Vulkan or Mantle."

That is the point... Both Vulkan and Mantle are multithreaded, depending on how things get processed throughout the hardware components.

The PS4's low-level API can do some multithreading, but it's nowhere near what Vulkan, Mantle, and DX12 can do.

All three (Vulkan, Mantle, and DX12) depend heavily on the CPU to send draw calls to the GPU for processing graphics on screen.

That being said, this has led me to wonder: if the PS4 is having these CPU hiccups now, what will happen when Sony uses something similar to DX12 on the PS4?

Remember, DX12 is an API that uses a lot of CPU resources to run properly/efficiently.

Tempest3173239d ago

Just to point out... GDDR3 and DDR3 are two different things. The last GPU that was made with DDR was a very long time ago.

rainslacker3239d ago (Edited 3239d ago )

Latency isn't that big of a deal when the developer has access to the memory controller. Outside of system controlled memory, higher bandwidth would always be preferable.

@jhoward

Yes, the results would be the same. Hardware latency and bandwidth aren't affected by an API; they are hard-wired into the hardware itself. Data management may be handled differently on different APIs, but for console programming that kind of thing is typically managed by the game engine or through hard-coded pointers to access the data. Bandwidth won't be affected either way, only how that bandwidth is used once the data gets where it's going.

jhoward5853239d ago (Edited 3239d ago )

"Data management may be handled differently on different APIs, but for console programming that kind of thing is typically managed by the game engine or through hard-coded pointers to access the data. Bandwidth won't be affected either way, only how that bandwidth is used once the data gets where it's going."

I pretty much know that the game engine plays a big part in how things are managed throughout the hardware.

However, I think the cause of this CAS latency is GDDR5's prefetch buffer.

GDDR5 RAM is really just basic RAM that has been sped up to achieve high bandwidth.

I could be wrong about that...

Technically, from past experience with a lot of hardware, I just know that when hardware uses (or has) a cheat mechanism, that hardware will normally have performance issues here and there. Some are big, and some are small.

And yes, I do consider GDDR5 RAM to be a cheat mechanism of some sort.

rainslacker3239d ago

Maybe I should expand. I've actually said the same thing about latency not being much of an issue when the dev has access to the memory controller on several occasions, and I think it may be too broad a statement to use the way you're talking about.

To understand what I'm talking about, it's first important to know what latency is in regard to memory (for those that don't know; not directed at you).

Unlike network latency, which is the total time it takes for data to travel from one point to another, memory latency is the measurement of the delay between when the memory controller tells a memory module that it needs data and the point at which that data is available on its way out of the chip. For all intents and purposes, most of that time is used by the memory controller to look up the memory address of the data.

Since in the typical PC setup the OS does not allow the user (programs) to determine where to put data within memory, the memory controller handles that placement for the user. This is why low latency is important for general-purpose computing or server applications, where near-instant access to data is needed for the millions of requests that memory receives per second.

However, in a console setup, parts of memory allow the user to program the memory controller, so data can be stored where the dev determines it should fit. In some cases this management may be given to the memory controller, and often is, but in other cases, for things that require quicker access, that data can be assigned to memory directly from the program. In some cases this can be done by an advanced game engine, and in other cases it can be done through the low-level APIs. The areas that get assigned are usually protected to prevent the memory controller from writing to those areas when that particular data is not required to be resident.

Since the developer is assigning the data within memory, the memory controller can be bypassed, and the user can go directly to the memory module with no lookup needed, thus reducing the overhead of the lookup. The time it takes for that data to go from the module to the output is still the same, but in modern memory architectures that is pretty negligible.

So basically, devs can remove the largest part of latency, which is the lookup function of the memory controller, and simply access memory directly to get the data they need.
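
Just to illustrate the software side of what I mean, here's a rough sketch in C with made-up names (it is not how either console's memory system literally works): reserve a block once, place data into it at load time, and keep raw pointers so the hot path is a direct load with no per-access lookup or allocation.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Minimal sketch (hypothetical names): a fixed arena is reserved once,
 * assets are placed into it at load time, and the hot path just
 * dereferences a raw pointer -- no per-access table lookup or allocation.
 * On a PC this is ordinary heap memory; on a console the same idea is
 * applied to memory the title controls. */
#define ARENA_SIZE (16u * 1024u * 1024u)

typedef struct {
    uint8_t *base;    /* start of the reserved block   */
    size_t   used;    /* bump-allocator high-water mark */
} Arena;

static void *arena_place(Arena *a, const void *src, size_t n)
{
    void *dst = a->base + a->used;   /* address decided at load time */
    memcpy(dst, src, n);
    a->used += n;
    return dst;                      /* caller keeps this raw pointer */
}

int main(void)
{
    Arena a = { malloc(ARENA_SIZE), 0 };

    const float verts[] = { 0.f, 1.f, 2.f };
    float *mesh = arena_place(&a, verts, sizeof verts);  /* placed once */

    float sum = mesh[0] + mesh[1] + mesh[2];  /* hot path: direct loads only */
    (void)sum;

    free(a.base);
    return 0;
}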

rainslacker3239d ago

To be fair, I don't actually know what the data speed is from the point at which a memory module gets a request for data and then gets it to the pins itself, but since these system busses are pretty fast, I'd imagine that it's going to be roughly the same between them, and nowhere near as different as the latency of the standard chips themselves.

Anyhow, while I know it's the subject of discussion, latency and bandwidth are not the only things to consider when looking at how memory performs. They are the two most measurable factors for benchmarking, which is why they are the most talked about topics.

For instance, one of the early leaks from when the Xbox was still Durango talked about virtual addressing of the system memory. This would allow sets of data to occupy non-contiguous pages of memory, which could then be addressed without penalty as if they were a single page. This reduces latency regardless of control of the memory controller, since the memory controller effectively addresses several things at once; even though the lookup latency still exists, the output is actually increased because multiple things are paged at once.

Whether reads and writes can happen at the same time is another big factor in memory speed. I believe both systems allow for asynchronous read/write functions. With GDDR5, the actual write functions would be faster due to the higher bandwidth, but the read functions may slow down due to the memory controller. On the other hand, with higher bandwidth, it's possible reads and writes could still be faster since the data can be moved faster.

GDDR5 is basic DDR3 SDRAM at the core memory module, but it retrieves the data differently. It allows multiple (2, I think) memory pages to be opened at once, which is not common on DDR3, although I assume it is on the X1 due to the rumor I listed above. The prefetch itself is just there to reduce the latency inherent in using a memory controller, and it is appropriate for parallel data such as graphics because the next set of data will be known. GDDR5's biggest difference is that it can be paired with any number of 32-bit I/O registers to achieve higher bandwidth, whereas DDR3 is typically paired with a single or dual 64-bit controller operated by the CPU bus.
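
For reference, peak bandwidth is basically bus width times effective transfer rate. A quick sketch using the commonly cited bus figures for the two consoles (my assumptions, not numbers from the article):

#include <stdio.h>

/* Peak bandwidth = (bus width in bytes) x (effective transfer rate).
 * The figures below are the commonly cited ones, used here as assumptions:
 *   PS4 GDDR5: 256-bit bus at 5.5 GT/s   -> ~176 GB/s
 *   X1  DDR3 : 256-bit bus at 2.133 GT/s -> ~68 GB/s (plus the ESRAM pool) */
static double peak_gb_per_s(double bus_bits, double gtransfers_per_s)
{
    return bus_bits / 8.0 * gtransfers_per_s;
}

int main(void)
{
    printf("PS4 GDDR5: %.1f GB/s\n", peak_gb_per_s(256.0, 5.5));
    printf("X1  DDR3 : %.1f GB/s\n", peak_gb_per_s(256.0, 2.133));
    return 0;
}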

Given the unique nature of the X1's setup, and the fact that I think not all the details about how it's set up have been released, I can't say exactly whether DDR3 can match the performance of GDDR5 in things like bandwidth. From what I understand, the setup does mitigate some of the shortcomings of the standard PC DDR3 setup so it can handle graphics operations better, but the bandwidth itself isn't going to increase because of it.

I wouldn't consider GDDR5 to be a cheat mechanism in this case. Given that the memory controller is programmable by the dev, the higher bandwidth is preferred for performance's sake, and at the end of the day, performance is what devs want most. They don't care if they have to cheat to get it, because in programming such things, there is no real cheating. If it can be done on the hardware without hacking things that may cause failure, it's a perfectly acceptable practice to perform such operations.

Other than that, data management itself is not a product of the memory or the controller. It's handled at a system/API level within the processors themselves. I think you are trying to refer to I/O in that first comment, but I was not referring to that in my original comment.

rainslacker3239d ago

About pointers: pointers are a way to address memory within a program; however, pointers will still use a memory controller. Basically, the program tells the controller that it needs the data at a certain address, as opposed to telling it that it needs a particular set of data registered to a particular variable or whatever. This removes the need for a memory table lookup, and it is faster. In a game it's much the same way, except that the program can "bypass" the controller and get that data directly from memory.

Also, I'm not positive, but I'd imagine the prefetch buffer comes after the point where latency would be introduced. It's basically just a cache for the next set of data that the user is likely to request, which can be pretty easy to determine in a graphics scenario, which is parallel. This, however, could also be programmed by the user so the prefetch gets something before it's needed, pretty much eliminating latency for that set of data, since the prefetch would already be poised to send the data along. Since I don't deal with prefetches I'll have to look that up to verify, but it makes the most sense from a logic standpoint.
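
If it helps, the general "fetch it before you need it" idea can at least be sketched on the CPU side with a software prefetch hint. This uses a GCC/Clang builtin and is purely illustrative; it is not GDDR5's internal prefetch buffer, which is a hardware feature.

#include <stddef.h>
#include <stdio.h>

/* Illustrative only: ask the CPU to start pulling data[i + 16] toward the
 * caches while we're still working on data[i], so it's (hopefully) already
 * close by when the loop gets there. */
static float sum_with_prefetch(const float *data, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        if (i + 16 < n)
            __builtin_prefetch(&data[i + 16]);  /* hint; safe for the CPU to ignore */
        sum += data[i];
    }
    return sum;
}

int main(void)
{
    float samples[64];
    for (size_t i = 0; i < 64; ++i)
        samples[i] = (float)i;
    printf("sum = %.1f\n", sum_with_prefetch(samples, 64));
    return 0;
}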

jhoward5853239d ago (Edited 3239d ago )

@rainslacker

"I wouldn't consider GDDR5 to be a cheat mechanism in this case."

Well, I'm not trying to downplay GDDR5, because it's a very good RAM.

However, I still think GDDR5 RAM uses a cheat mechanism of some sort, because GDDR5 (in its original design) wasn't built with high bandwidth in mind.

They've simply added a feature that wasn't fully intended for DDR3.

The RAM to look out for is HBM, because it was built from the ground up to achieve higher bandwidth with less power than DDR4 and GDDR5.

Bottom line: GDDR5 RAM works wonders when it's used in the right way.

Again, GDDR5 is still good RAM.

https://en.wikipedia.org/wi...

rainslacker3239d ago

I/O transfer is I/O transfer. The only real difference is the size of the data paths, and that kind of thing isn't really set by the memory hardware itself, but by the controller for that memory.

The amount of data that can be transferred could be enhanced or limited based on design, but if the memory can handle the kind of throughput it does then it doesn't really matter.

DDR3 was built to be scalable to new technologies without the need to implement new core designs, which is why single-channel RAM can be stuck into dual-channel RAM sockets: the memory controller decides how it will run. In fact, most of the dual-channel stuff doesn't even really mean the memory is any better; it's more for packaging reasons, as each stick will be its own device, but the controller will see it as one and address it as one even though there are actually two memory modules on the stick itself. One of the core features of the DDR3 design was less reliance on individual memory devices (sticks of RAM) to handle the control of their data, giving more control to the system's memory controller to handle such things. This eliminated a lot of the problems with prior RAM modules, which would see hugely varying results from system to system, as cheaping out on the memory controller for each chip was pretty common.

Anyhow, to put it simply (which I rarely do), the RAM will handle data the way the memory controller tells it to. The RAM itself is just a storage mechanism that interfaces with its own I/O mechanisms. It's limited by physics in what it can achieve, but that doesn't mean it can't achieve more than the original design intended.

All the other stuff, like prefetch or internal buffers, are just extensions of the original design, much like the 486 processor eventually got an L2 cache, or the 386 had floating-point co-processors offered as an option. They aren't the core of the device, just extras to make it more efficient or faster somehow from the viewpoint of the CPU/GPU. The GPU and CPU in a typical PC have no idea how that data is stored on a RAM chip, so whether it comes from an extension, a cache, or whatever, it's all the same to the processor.

Going back to the original topic though, this changes due to the programmer's ability to control memory directly, so those extensions can make a huge difference if they are utilized for specific purposes that may sit outside the typical way they were designed to be used. DX12 isn't going to allow that level of access though. It would cause nothing but problems for a regular PC, and consoles already have low-level APIs to handle such things. On the X1, that API could be part of the Xbox version of DX12, but I don't know the specifics there.

I do agree with you that HBM is beastly and will reduce the second-biggest bottleneck facing today's computers, the first being hard drive access, which even SSDs don't solve as of yet. Six SSDs in RAID-0, though, mean never waiting for stuff to happen... particularly with a caching controller card. :)

jhoward5853239d ago (Edited 3239d ago )

"DDR3 was built to be scalable to new technologies without the need to implement new core designs."

Well, that pretty much goes for just about every piece of hardware on the market, scalable or not.
Nothing is ever impossible. With enough time, any manufacturer can cheat their way into a system's hardware if they feel like it, in order to save money.

One thing I do know about hardware is that it is defined (in most cases) by the level (or amount) of electrical current it can send back and forth through the transistors on a die. No matter what added technology is implemented on top of DDR3's design, it still won't change the amount (level) of electrical current sent through the pathways to pass through every single logic gate. Once the physical hardware has been built, there's no way it can change unless the manufacturer makes big changes to the hardware's original design.

Not to mention, most hardware designs on a die must exceed 99% in error checking or they will not work with other existing hardware. This is something manufacturers do to keep everything consistent processing-wise.

In the case of GDDR5, every single tweak that was made to its original design was stretched as far as it could go, based on the electrical current it can send throughout its entire circuitry.

Once the physical hardware has been designed, it won't ever give you performance on the same scale as hardware that was designed from the ground up.

Yo Mama3240d ago (Edited 3240d ago )

GamingBolt is becoming just as much of a laughingstock site as DualShockers.

Rimeskeem3240d ago

Haven't seen you in a while

thejigisup3240d ago

That one bubble is easy to miss sometimes

DanielGearSolid3240d ago

DualShockers has actually improved. Greatly improved, IMO.

raggy-rocket3239d ago

Pointing out DualShockers is biased is like me pointing out that the Xbox magazine is biased. It's called DualShockers; I think it's allowed to lean in a certain direction.
