Killing pixels now and then...

Athonline


Let's talk: Technical.

Every time I read an article about next-gen hardware, the computer scientist inside me vomits. I hate how all these “video games journalists”, usually under-educated and doing little to no research, compare hardware and software without having a single clue what they are talking about.
This blog post will be technical, maybe not as much as a Systems Architecture 101 module at university, but I will try to keep it grounded in how things actually work. If you want one of those “It has 8GB, thus can store many things at once” articles, please stop reading. If you don’t like it when someone “bashes” your favourite console, please stop reading. If you want to learn a few things about how hardware really works and what really matters in it, continue…

Architecture, what is it?
First, a quick introduction to x86: established by Intel in 1978 as a 16-bit instruction set architecture and “re-invented” (again by Intel) in 1985 with 32-bit support, x86 is the dominant system architecture. The architecture underwent another major change in 2003, when AMD released “x86-64”, or as AMD calls it: the AMD64 architecture. AMD64 is also known in the Windows world as x64 (other OSes use different names).
The idea behind AMD64 is simple: 32-bit CPUs were starting to run out of virtual address space – which, slightly incorrectly and for non-technical readers’ convenience, I will just call RAM. However, creating an entirely new architecture, as Intel was planning to do, would break legacy code support. AMD instead built an x86 processor with eight additional general-purpose registers and widened the existing general-purpose registers, along with the arithmetic, logical and memory-to-register/register-to-memory instructions, from 32-bit to 64-bit. This kept 16/32-bit code support while adding the ability to run 64-bit code in “long mode”. Intel eventually implemented AMD’s solution and x86-64 became the de facto standard.
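A quick way to see the address-space difference in practice: the size of a pointer doubles under long mode. Here is a minimal standard C++ sketch (nothing console-specific, just an illustration of 32-bit versus 64-bit builds):

#include <iostream>

int main() {
    // On a 32-bit (x86) build a pointer is 4 bytes, which caps the virtual
    // address space at 4GB; on an x86-64 build it is 8 bytes.
    std::cout << "pointer size: " << sizeof(void*) << " bytes\n";
    if (sizeof(void*) == 4)
        std::cout << "32-bit build: at most 4GB of virtual address space\n";
    else
        std::cout << "64-bit build: far more than 4GB addressable\n";
    return 0;
}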
Over the years we saw changes such as the memory controller being integrated into the CPU and, most recently, even the GPU. AMD innovated with its “APUs” (Accelerated Processing Units), combining CPU and GPU on a single die and enabling features such as GP-GPU computing. The Heterogeneous System Architecture (HSA) used by AMD’s APUs allows the GPU to access low-level hardware in the same way the CPU does. The most important HSA feature for the next generation is the unified address space, which lets the CPU and GPU share the same memory pool. I will get back to this later.
As both the PS4 and the XBOne use an AMD APU based on the Jaguar microarchitecture, they implement HSA while retaining x86-64 (or x86 for short) for the CPU – or, more precisely, an x86 instruction set for the CPU.
Thanks to HSA and having both the CPU and GPU on the same die, the bottlenecks of the more “traditional” PC arrangement of discrete CPU, memory and GPU are reduced, if not eliminated. I do not believe we can directly compare a PC with either console in terms of “raw power”, on paper at least. Near-instant communication between CPU and GPU, letting the GPU access all the “goodies” the CPU can, is going to make game engine developers feel like kids in a candy store. In the PC world, nVidia pushes GP-GPU computing with its CUDA framework, while Intel – which doesn’t want CPUs to become obsolete – tries to keep work on the CPU and limit the GPU to graphics and physics processing. AMD APUs simply offer the best of both worlds. Sure, there are disadvantages, especially in the modularity of the system, but for what it’s worth: amazing job, AMD!

Memory, the marketing hype and the real advantages
“It has 8GB!!!! GDDR5 RAM!!!” That line dominated PS4 articles after the reveal of the console, and numerous sites desperate for traffic started talking about how great it is or how it compares with the XBOne’s DDR3 RAM. Then articles about how the PS4 reserves 3.5GB of RAM for the OS, or how the XBOne’s eSRAM will provide similar performance, spawned like Zerg hatchlings all over the net. On top of that, you get PC gamers pointing out that their PCs have had 8GB for years now.
Let’s start with capacity. As discussed above, x86-64 supports more than 4GB of RAM, and 8GB has indeed been the PC “standard” for some years now – my current laptop has 16GB! However, I am running virtual machines, remote connections and other memory-hungry tasks. Is there really a benefit for gaming from 8GB or more? Yes and no.
PCs have two types of memory: the well-known RAM and vRAM. RAM is the temporary memory pool for the CPU – often called system memory – and vRAM is the memory pool for the GPU. vRAM is much simpler than RAM, as it essentially only holds shader programs, vertex buffers, index buffers and textures (the largest of the four). The one thing RAM can’t do is feed data directly into the GPU, i.e. serve as a memory pool for the GPU. This is why you load graphics-related assets into RAM and then COPY them to vRAM. That copy is one of the biggest performance bottlenecks in PC gaming. Not only does copying take time, but temporarily – assuming the developer does proper garbage collection of unused resources – the same data sits in both RAM and vRAM. A bigger RAM pool means you can keep not just the data the GPU currently needs, but also data it may need soon or data it accesses frequently. This matters because GPU memory tends to be much smaller, around 1GB. Currently 1-2GB of vRAM is plenty, as the GPU pulls and processes data fast enough that it rarely needs to keep data around for long (a few ms). A rule of thumb: if you play at extreme resolutions (say 3-4 monitors at 1080p each), you need bigger textures and thus more vRAM. More vRAM also lets devs be a bit more “loose”.

A quick, much simplified and slightly inaccurate “data flow” of how a texture/shader is processed (a rough code sketch follows the list):
1) The game asks the shading framework (e.g. OpenGL) for the texture.
2) The framework requests it from its drivers. The drivers sit at OS level.
3) The OS asks the CPU; the CPU responds instantly and asks the HDD.
4) The HDD reads the texture and sends it to the CPU; the CPU allocates it in RAM.
5) The GPU alerts the CPU that it wants the texture, and the CPU copies it from RAM to vRAM.
6) The GPU retrieves the data from vRAM and processes it.
7) The processed data is output to the user, and the game is notified that it was output.
8) The game asks the CPU to delete the data from RAM.
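To make steps 4, 5 and 8 concrete, here is a rough C++/OpenGL sketch of the RAM-to-vRAM copy. It is only an illustration: loadTextureFromDisk is a hypothetical helper standing in for the disk read, not a real OpenGL call.

#include <GL/gl.h>
#include <vector>

// Hypothetical helper: reads the image from the HDD into system RAM (steps 3-4).
std::vector<unsigned char> loadTextureFromDisk(const char* path, int& w, int& h);

GLuint uploadTexture(const char* path) {
    int w = 0, h = 0;
    // The pixel data now lives in RAM, owned by the CPU.
    std::vector<unsigned char> pixels = loadTextureFromDisk(path, w, h);

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    // Step 5: the driver copies the pixels from RAM into vRAM.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());
    // Step 8: once the copy is done the RAM copy is no longer needed;
    // 'pixels' goes out of scope here and its memory is freed.
    return tex;
}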

Now you can hopefully see why games such as The Last of Us had amazing graphics while the PS3 only had 256MB of system RAM… and why Bethesda are terrible devs when it comes to porting a game from one console to another… ;-)

In a PC, 8GB is “needed” for Full HD because the OS, frameworks, etc. all run in the background. In a console, things such as the shading framework are embedded in the OS. Even if the PS4 OS reserves 3.5GB, the remainder is more than plenty for games – as long as the devs don’t rush their code (too much).
How about the speed of the RAM? I recall that a few years ago I had to make a choice for my desktop: DDR2 or DDR3. I went with the former. Not only was it cheaper, but back then the performance was identical – if not, in some cases, better – on the DDR2. Why? DDR2 at the time had a lower CAS latency. CAS latency is, in simple terms, the delay between the moment the memory controller asks the RAM for some data and the moment the data is available to be retrieved. Nowadays the effective CAS latency of DDR3 is as low as, if not lower than, DDR2’s. Still, in the real world a 1-2 FPS gain is not a deal breaker, is it?
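A rough worked example, assuming typical retail modules of that era (DDR2-800 at CL5 versus DDR3-1333 at CL9, nothing console-specific), shows why the older memory could come out ahead on latency:

#include <iostream>

int main() {
    // True CAS delay in nanoseconds = CAS cycles * 2000 / data rate in MT/s.
    double ddr2 = 5.0 * 2000.0 / 800.0;   // DDR2-800, CL5  -> 12.5 ns
    double ddr3 = 9.0 * 2000.0 / 1333.0;  // DDR3-1333, CL9 -> ~13.5 ns
    std::cout << "DDR2-800  CL5: " << ddr2 << " ns\n";
    std::cout << "DDR3-1333 CL9: " << ddr3 << " ns\n";
    return 0;
}

The higher clock of later DDR3 eventually made up for the extra cycles, which is why that gap has since closed.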
What about GDDR5 compared to DDR3? GDDR5 is actually based on DDR3, with an 8n prefetch buffer and a much higher-bandwidth interface. The G in GDDR5 stands for Graphics, as this type of memory is used almost exclusively on graphics cards.
Wait, exclusively for graphics cards?! Then how is the PS4 using it? As I said above, the PS4 uses an APU with HSA. This allows it to bypass the biggest bottleneck in computer graphics: copying textures from RAM to vRAM. The PS4, like the XBOne, uses ONE memory pool for both system and video memory. The idea behind HSA’s approach to memory is simple: why copy data that may be MBs – if not GBs – in size, when you can just pass a pointer that says where the data is? It is a simple approach that unfortunately can’t be implemented in PCs yet, partly because Intel keeps acting like a child and pushing an architecture it shamelessly copied from AMD. This one key point in the way the XBOne and PS4 work gives them a huge performance boost.
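A purely conceptual C++ sketch of the difference (this is not real console or HSA API code, just an illustration of passing a pointer instead of copying):

#include <cstdint>
#include <cstring>
#include <vector>

struct Texture { std::vector<uint8_t> pixels; };

// Split pools (classic PC): the whole texture is duplicated into vRAM.
void submitCopy(const Texture& t, uint8_t* vramDest) {
    // Potentially hundreds of MB cross the bus, and for a while the same
    // data exists twice: once in RAM and once in vRAM.
    std::memcpy(vramDest, t.pixels.data(), t.pixels.size());
}

// Unified address space (HSA-style): the GPU can read the memory the CPU
// wrote, so only a pointer (a few bytes) changes hands.
const uint8_t* submitShared(const Texture& t) {
    return t.pixels.data();  // no copy, no duplicate allocation
}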
Hey! All that sounds cool, but you didn’t answer: is GDDR5 worth it? Yes, it is. That bandwidth, alongside the HSA approach to memory, will do marvels in the hands of proper developers!
What about the XBOne? Its performance in this area is harder to guess. The eSRAM approach could “theoretically” give similar or greater performance than the PS4. In practice, however, it all depends on the XBOne’s OS – the way it handles data in the eSRAM – and on whether devs actually use DirectX 11.2. Personally I believe the XBOne is a little behind the PS4 in this area, at least for multi-plats.

Programming languages matter more than you know
The x86 architecture was heavily marketed by both Sony and Microsoft. The media went even further, making the whole process of porting sound as simple as pressing a “Save As” button. Is it? No. Windows, Linux and Mac computers all use x86 processors, but do you see the same performance and instant porting of apps from one platform to another? No. Why?
Each of those OSes handles hardware resources slightly differently, but also gives programmers access to different OS-level resources, different compilers and sometimes different Integrated Development Environments (IDEs) – the software developers use to write, compile and test code.
For example, since the mid-’90s Microsoft has used the Visual Studio IDE to lure developers into Windows application development. With Visual Studio, MS shipped its own compilers (Visual C/C++) with direct access to Windows’ core resources, locking developers into the Windows platform – a really smart move in my opinion. Long story short, the ease of MS’ tools locked applications into Windows. In 2000 Microsoft introduced C#, a programming language similar to C++ with a Java-like skin on top. C# didn’t attract much attention from game developers until around 2004-05, when the XNA Framework was built on top of it. C#, alongside the effective XNA Framework, made game development for the 360 attractive, and multiple indies started using it to develop and port games to the 360. Some studios went further, sharing Visual C# libraries between the PC and 360 versions of their games. Unfortunately MS discontinued XNA in favour of another, to-be-announced framework… Its primary graphics rendering framework is now DirectX 11 (with exclusive 11.2 support).
Sony, on the other hand, since the Cell was co-developed with IBM, let you use either its own SDK (from what I’ve heard: awful) or IBM’s Cell SDK. The latter, as far as I am aware, was based around OpenCL. OpenCL is an open framework, originally developed by Apple, for parallel programming. This lack of familiarity with the SDKs and frameworks, alongside other factors, made the first two years a nightmare for devs and gave PS owners lots of crappy ports. The PS4, however, promises to change all that; Sony shipped its SDKs early and, best of all, supports OpenGL 4.0 AND DirectX 11 (but not 11.2) style programming within its “PlayStation Shader Language”. While Sony has once again created a new shader language for its graphics rendering, it is built on top of the two most-used shader ecosystems. OpenGL is essentially the bread and butter of any computer science student with a 3D graphics or games development module, and DirectX is one of those few Microsoft industry standards that are actually standards. This versatility in PS4 programming will definitely attract indies, especially since indies like using open, multi-plat standards like OpenGL 4.0.
What about performance? Here the XBOne is the winner. DirectX 11.2 allows devs who actually implement it to achieve amazing results using far fewer resources. For more info have a look at this:
http://www.youtube.com/watc...
The XBOne’s victory is a small one; in multiplat games, where devs will be reusing code and are less likely to invest in implementing 11.2, both consoles will be about equal. Moreover, between DirectX 11.0/11.1 and OpenGL 4.0, OpenGL performs a tad better.

Final thoughts
The purpose of this article was to shed some light on how the new consoles actually work and why we shouldn’t compare them to PCs. Unfortunately we have no real idea of their performance yet, especially once devs such as Naughty Dog create optimised games for them. At the moment their sheer power, even in the hands of inexperienced devs, will produce results almost equal to a high-end PC.
As for which console is the best, again we can’t judge from pure hardware comparisons. The performance differences will mainly be decided by how much time devs dedicate to each console. Out of the box, the PS4 has a small advantage in multiplats; but even that advantage won’t be visible until much later.
I will close this blog entry by recommending people get whatever console their friends are getting, or the one with the franchises they prefer. I personally just got a new high-end laptop, as I prefer PC gaming for multiplats, and will most likely get an XBOne for Halo (most of my friends play Halo) this year and a PS4 whenever The Order: 1886 comes out.
If you have any technical questions or want me to clarify something, please just ask! Apologies for any poor use of the English language; it is only my second language.

Gimmemorebubblez3897d ago

I just want to check, for my blog: no currently released PC has hUMA?

Athonline3897d ago

Hello, the new APUs from AMD have a limited implementation of hUMA. It isn't exactly the same as the consoles', but nevertheless a great performance boost. Check out Tom's Hardware; they have a nice article about it.

MusicComposer3897d ago

Hmmm, you didn't touch on the difference in GPU core count at all, which I'd recommend writing about. According to the technical documents, which turned out to match what has been announced, the PS4 has 1,152 cores and the X1 only 768. That's 50% more raw power, to put it another way, and easily one of the biggest factors in game performance.

That provides a huge advantage for the PS4 regardless of any of the other technical aspects you mentioned. Definitely more than the memory difference or the language the games are written in.

Athonline3897d ago

@MusicComposer

Counting cores and the like to determine performance is the biggest mistake you can make in computer science, and it's one of the first things clarified in any Data Structures and Algorithms university module. Instead we use notations such as Big-O.

Examples that prove the above:
-When quad-core CPUs came out, they actually performed worse than dual-core CPUs.
-SLI/Crossfire scaling isn't ideal (i.e. not a 2x multiplier) and is heavily driver- and game-dependent.

Below is some low-level shader code showing how data is loaded into 8 shader cores in parallel (at the same time):

<VEC8_diffuseShader>:
VEC8_sample vec_r0, vec_v4, t0, vec_s0
VEC8_mul    vec_r3, vec_v0, cb0[0]
VEC8_madd   vec_r3, vec_v1, cb0[1], vec_r3
VEC8_madd   vec_r3, vec_v2, cb0[2], vec_r3
VEC8_clmp   vec_r3, vec_r3, l(0.0), l(1.0)
VEC8_mul    vec_o0, vec_r0, vec_r3
VEC8_mul    vec_o1, vec_r1, vec_r3
VEC8_mul    vec_o2, vec_r2, vec_r3
VEC8_mov    vec_o3, l(1.0)

From the above low-level code you can see that you have to manually load each texture into a shader for processing.

In a similar way, in Java we can use threads. Threading is Java's way of providing a "pseudo" approach to parallel programming. We use:
new Thread(this).start();
to start a thread (assuming the class implements Runnable), and then code such as:
SwingUtilities.invokeLater(new Runnable() {
    public void run() { /* update the GUI here */ }
});
to synchronise work back onto the GUI thread.

As you can see, when we (devs) have to use parallel programming, especially for drawing things on screen, there are multiple steps and things we have to "force" our code to do; it is all a bit more manual.

As a result of the above, I strongly believe devs will just port games from one console to the other without really optimising at a low level to "actively" use that extra power.

If you want, bring me an old Pentium 4 PC and a modern Core i7 machine. I can show you, using a simple password-generation program, that the first machine can actually outperform the latter, once you take code optimisation into account and use C/C++ over, for example, Matlab. Do not underestimate programming languages and techniques.

As I said in my conclusion: in the short run, I doubt there will be any difference in performance within the first years. In an ideal world devs wouldn't be lazy and would port games by optimising them for each machine. However, from my experience in the IT industry and from talking with devs (both in gaming and other industries): devs are the laziest people. We use DRY (Don't Repeat Yourself) as our manifesto, trying to copy-paste code from one platform to another or reference code from one library in another, just to avoid re-writing it.

I will update the article tomorrow with a better-written explanation of what I said above, and try to explain how shader cores work in GPUs.

wishingW3L3897d ago (Edited 3897d ago )

A bit off-topic as an answer to your question, but hUMA is irrelevant for PC gaming anyway, because PC APUs come with really crappy GPUs and CPUs. They are nothing like the PS4's or XB1's, which have big GPUs and two CPU clusters for a grand total of 8 cores, versus only 4 on the PC parts and a GPU that can hardly compete with an Intel HD. Not to mention that they share crap DDR3 memory, instead of GDDR5 or DDR3 with eSRAM to boost graphics performance.

A couple of APUs with huma are mentioned here: http://www.tomshardware.com...

Gimmemorebubblez3897d ago

A couple of PC AMD Kaveri (?) APUs are coming out next year for consumer PCs.

LostDjinn3897d ago

I thought the Xbone ran two RAM pools. One being DDR3 and the other being eSRAM.

You also claim DX11.2's tiled resources are a win for MS, when PRT (AMD's hardware implementation thereof) was first exposed via an OpenGL extension. I'm a little confused. How is it a win for MS when the OpenGL API is the most efficient (by design) way to use the hardware? TR is little more than MS attempting to "borrow" what already exists in an open API in order to compete.

Athonline3897d ago (Edited 3897d ago )

The eSRAM isn't exactly a separate memory pool. Think of it like an mSATA cache drive paired with an HDD. The eSRAM sits between the main memory pool and the APU, allowing quick access to frequently used data. To the user it is as if the eSRAM isn't there. That is why I said it will depend on how effective the XBOne's OS is at placing resources in the eSRAM, or on whether devs actually get the low-level access to do that allocation themselves.

The AMD hardware is optimised for both DX and OpenGL. That is one of the reasons Sony included both DX- and OpenGL-style support within its shading language.

In OpenGL there are two routes that provide TR-style functionality:
-MegaTexture, the virtual texturing approach used in Rage.
-AMD's sparse texture extension, which can be found here: http://www.opengl.org/regis...

Indeed, the idea behind TR was borrowed from the above two. However, in DX11.2 this feature is provided natively within the framework; as far as I know, TR is also on the OpenGL 5.0 feature list for native support.

Devs working on a multiplat are more likely to develop in DX, as they will be able to reuse code between the PS4 and XBOne versions. In that case they get the option to use 11.2 TR, while on the PS4 they will have to convert that code to OpenGL...

Personally, looking at both frameworks, I found the Windows DX11.2 TR implementation a bit better than OpenGL 4.2's, with extra libraries for TR and more likely to attract people to actually use it.

Again, as I said above: at the moment we can only speak theoretically about the performance of both machines. In terms of programming, it will all come down to whether devs use tool X or Y.

Just bear in mind that both consoles offer a modified version of DX and/or OpenGL, with their own APIs and feature sets integrated into them.

Off the record: talking with devs who actually work in the UK gaming industry, I was told that the PS4 offers more low-level access, but the XBOne's DX library is much friendlier and more robust to work with... unfortunately they couldn't comment further or show me pieces of code.

PS: Nice to have a proper discussion with someone, instead of a fanboy war... :)

LostDjinn3897d ago (Edited 3897d ago )

"Nice to have a proper discussion with someone :)" - Back at you.

Have you had any hands-on time with the Xbone? (Coding time, not playing time.) The problem as I see it is that DDR3 is read or write, not both at the same time, whereas MS have used dual porting on the eSRAM to allow simultaneous read/write.

Now if they (MS) use the eSRAM as a simple cache for both CPU and GPU functions to cut down on fetch times (and try to leverage it that way against the DDR3 lack of bandwidth) they're going to run into problems. If they use it (the eSRAM) as a hardware back-buffer (G-buffer) like they did in the 360, they're going to run into problems.

I'll use TR as a point as we're both familiar with it.

TR uses a fairly small amount of physical RAM. On the video you linked they comment about using only 16 meg. Sounds great. The trick is that it uses bandwidth to compensate (you only stream what you need for the frame in question).

Now if you've used the eSRAM as a cache all texture data would be considered new and would need to be loaded into the cache or you'd need to forgo the cache and hence negate its bandwidth.

If you used it (the eSRAM) as your Back-buffer you'd be reliant on the bandwidth of the DDR3 (or worse the HDD) when it came to streaming as you can't write faster than you can read. It would however allow for much cheaper AA solutions and post-processing provided it didn't exceed the 32 meg physical pool (for everything they need to do).

It's going to reach the point where workarounds or down-sampling are going to be needed no matter which way they've leveraged the eSRAM.

I just wanted your take on how it's shaping up now.

Edit: Sorry if I seem to be peppering you. It's just that most of the time here, specs equate to little more than arbitrary numbers fanboys throw at each other. It's good to know that once in a while that's not the case.

Athonline3897d ago

@LostDjinn

Thanks :)

I see what you mean. In my blog, I tried to separate hardware from software in order not to confuse people.

As for the problems you mentioned: as I said, it will depend on how much access devs have, or on how good the OS is.

In my opinion the best approach is to load static environments; static environments won't change as much as, say, the main character's model. This can reduce the bandwidth issues. Loading static assets can work both as a buffer and as a cache.

For the main, non-static models it won't really matter. As the GPU draws them every frame anyway, you can use blur or other techniques to hide any graphical imperfections/artifacts.

Of course, in the end you hit physical limitations, and to get the best out of the hardware you need devs who know what they are doing...

LostDjinn3897d ago

NP.

Yeah, I sort of thought we'd end up in the same place, and sure enough we did. It's those damn laws of physics that keep messing with things. Someone needs to discover magic so we can get around them.

Anyway, cheers for the chat. See ya 'round (I hope).

Nicaragua3897d ago

This is a great blog.

Thanks for taking the time to write it, and for the very detailed explanations in the comments above.

fsfsxii3897d ago

I'm pretty sure none of these fanboys know squat about the technical stuff in either system; well, they're not developers.
The media just throws them a bone and they fight over it. That's the fanboy mentality over on N4G.

kewlkat0073897d ago (Edited 3897d ago )

Wow, the first blog I've ever read all the way through on here...

I'm no programmer, but I feel like optimising code to take advantage of the hardware is what separates the different tiers of devs on any platform.

You already know a talented dev studio is going to have a polished game that makes use of the given hardware, given the right financial backing, support and talent. That is why we have devs like Naughty Dog, Crytek, etc.

That is why I never tend to compare first-party titles "technically" across different systems, because not all devs are created equal within their respective platforms. One dev can push the platform while another might not be on the same level. Again, financial backing, support and talent go a long way.

After reading your blog, I have a better understanding of what these consoles are pushing for and how the "tools" being used will be just as "important" as the hardware and the devs that use them.
