hardware news

FX 8150, aka Bulldozer, the new 8-core processor from AMD


1_Introduction

Five years after the arrival of its first Phenom processors based on the Stars architecture, AMD is preparing to renew its x86 processor lineup. Admittedly, the Stars architecture was revised along the way with the arrival of Phenom II, but that revision brought nothing fundamentally new and, above all, nothing capable of challenging Intel's near-total dominance, particularly in performance.

This autumn, and not without some delay, AMD is finally launching its new architecture, codenamed Bulldozer. AMD is also reviving the FX brand for the occasion: the first processors based on the Bulldozer architecture are named FX, a name that will bring back memories for Athlon 64 fans. But whereas the Athlon FX processors of that era were single-core, the new FX models that AMD is offering at the end of 2011 have 8 cores, no less!

Does the return of the FX brand also mean a return to performance? How do the new FX processors stack up against Intel's current lineup? We answer these questions in the following pages.

2_Bulldozer architecture: make way for modules

2.1_A rethought architecture built from modules


With Phenom and Phenom II, AMD offered us the K10 architecture, codenamed Stars. Although relatively efficient in certain areas, it struggled to stand comparison with Intel's lineup. The gap became even more pronounced with the appearance of the Sandy Bridge processors, the famous second-generation Core. The arrival of Bulldozer marks AMD's first genuinely new micro-architecture since Phenom, or even earlier. Some would argue, in fact, that the K10 architecture is a derivative of the Athlon 64's K8... We will not go into these debates, preferring to give you an overview of this architecture's features.

AMD's objectives in developing Bulldozer include energy efficiency, a modular architecture, and favoring high operating frequencies over per-clock efficiency, a rather risky gamble, as we shall see.
The big news in the Bulldozer architecture is above all CMT (Cluster Multithreading). It is a rather interesting approach, quite different from what we usually see, especially at Intel where SMT (Simultaneous Multithreading) reigns supreme. With Bulldozer, AMD calls into question, in a sense, the very notion of an x86 execution core. One of Bulldozer's basic concepts is what AMD calls a module. A Bulldozer module consists of two x86 execution cores. Two true cores? Not quite... The two cores share a number of resources, such as the part responsible for fetching (that is, loading), as well as the instruction decoding unit and the unit in charge of floating-point calculations. The second-level cache is also shared between the two cores of a module, which is clearly the least innovative element here.

The Bulldozer module concept

But why share units between the cores of a module rather than duplicating them? The reason is simple: to save transistors, of course, and therefore die area, but also to optimize the processor's power consumption. It is generally recognized that in x86 designs the decoding unit is quite power-hungry. Sharing it between two cores improves this energy parameter. More pragmatically, other units are rarely, if ever, used continuously at full capacity. This is particularly true of the unit dedicated to floating-point calculations. Pooling it between two cores is therefore quite sensible, since it is then genuinely put to use. Each Bulldozer module therefore has two cores and can run two threads simultaneously.

Note the flexibility this design brings: if only one thread is running within the module, it has access to all the shared resources. Added to this is a degree of modularity for AMD. At launch, the chipmaker will offer 6-core and 8-core variants of its FX processors: in the first case the processor consists of three Bulldozer modules, in the second it carries a total of four. Furthermore, this design is supposed to make it easier to raise frequencies, in line with the objectives stated above. We remember that AMD has had some difficulties in this area in recent years.
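The relationship between modules, cores and shared units described above can be tabulated with a small sketch. The per-module counts follow the text (two integer cores, one shared fetch and decode front-end, one shared floating-point unit, one shared L2 cache); the names `ModuleLayout` and `describe` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleLayout:
    """Per-module resource counts as described in the article."""
    integer_cores: int = 2      # two x86 execution cores per module
    shared_front_ends: int = 1  # fetch and decode shared by both cores
    shared_fpus: int = 1        # one floating-point unit for both cores
    shared_l2_caches: int = 1   # L2 cache shared inside the module


def describe(modules: int, layout: ModuleLayout = ModuleLayout()) -> dict:
    """Return the resource totals for a processor built from `modules` modules."""
    return {
        "modules": modules,
        "cores": modules * layout.integer_cores,
        "front_ends": modules * layout.shared_front_ends,
        "fpus": modules * layout.shared_fpus,
        "l2_caches": modules * layout.shared_l2_caches,
    }


# The two launch variants mentioned in the text: three and four modules.
for module_count in (3, 4):
    print(describe(module_count))
```

Under these counts, the four-module part exposes eight cores to the operating system while containing only four decoders and four floating-point units.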




The 8 cores of the FX 8150 as seen by Windows


2.2_Beyond the modules!

Modules are not the only new feature of the Bulldozer architecture, far from it. The front-end has also changed. This block, which keeps the execution units constantly supplied with instructions, is now shared between the cores of a module. It has therefore been revised accordingly to sustain throughput. Here AMD's engineers focused most of their attention on branch handling. Recall that a branch is nothing more than a jump in the code: execution moves from one place in the code to another if a condition is met. The idea is to predict branches in order to speed up the prefetching of the instructions or data needed for the code to continue smoothly.

Thus, the prediction and fetch pipelines are now decoupled, and AMD has introduced a number of mechanisms that are well known... at Intel. In no particular order: handling of direct and indirect branches with a two-level buffer, loop detection, and a trace cache that stores already-decoded micro-instructions, as on Nehalem. As for the dedicated decoding unit, it can handle up to four instructions per cycle on Bulldozer, versus three on the previous Phenom.
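To make the idea of branch prediction concrete, here is a minimal sketch of a textbook two-bit saturating counter. It is not AMD's predictor, whose internals the article does not describe; the class name `TwoBitPredictor`, the helper `accuracy` and the loop pattern are assumptions chosen purely for illustration.

```python
class TwoBitPredictor:
    """Textbook two-bit saturating counter (states 0..3)."""

    def __init__(self) -> None:
        # States 0 and 1 predict "not taken", states 2 and 3 predict "taken".
        self.state = 2  # start in "weakly taken"

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes: list) -> float:
    """Fraction of branch outcomes the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    hits = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            hits += 1
        predictor.update(taken)
    return hits / len(outcomes)


# A loop branch: taken nine times, then not taken once, repeated ten times.
loop_branch = ([True] * 9 + [False]) * 10
print(f"hit rate: {accuracy(loop_branch):.0%}")
```

With this pattern the counter only misses the loop exit, so it is right nine times out of ten; real predictors combine far more history than this single counter.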

Read more on Clubic.com: FX 8150, aka Bulldozer, the new 8-core processor from AMD: Bulldozer architecture: make way for modules
Computer and high tech




Focus on front-end

As for the caches, each Bulldozer module has a 64 KB first-level instruction cache, 2-way associative and shared between the cores. Each core also has its own 16 KB first-level data cache. This L1D cache is 4-way associative. On the performance side, this cache is said to have been optimized with, among other things, techniques that predict which way of the cache holds the requested data.
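The size and associativity figures above determine how many sets each cache has. The sketch below applies the usual relation sets = size / (ways x line size). The 64-byte line size is an assumption, since the article does not state it, and `cache_sets` is a hypothetical helper name.

```python
def cache_sets(size_kb: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets in a set-associative cache.

    line_bytes defaults to 64, which is an assumption not taken from the article.
    """
    size_bytes = size_kb * 1024
    return size_bytes // (ways * line_bytes)


# Figures quoted in the text for one Bulldozer module.
l1i_sets = cache_sets(size_kb=64, ways=2)  # shared instruction cache
l1d_sets = cache_sets(size_kb=16, ways=4)  # per-core data cache

print(f"L1I: {l1i_sets} sets, L1D: {l1d_sets} sets")
```

Under the 64-byte assumption this gives 512 sets for the instruction cache and 64 sets for each data cache.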

Note in passing the arrival of a fusion mechanism. Like Intel (again!), Bulldozer can decode several instructions as a single one: AMD calls this branch fusion.

3_An architecture with plenty of heart, heading for DDR3-1866
3.1_Focus on the cores!

Turning to the x86 execution cores in each Bulldozer module, we find a 4-stage pipeline (versus 3 previously). It is split into two ALUs (arithmetic and logic units) on one side and two AGUs (address generation units) on the other. Compared with Phenom, whose 3-stage pipeline was made up of three ALUs, a Bulldozer execution core can therefore execute only two integer instructions per clock cycle, versus three for a Phenom core. This is a theoretical disadvantage that takes into account neither the second execution core nor power consumption. Recall that a Bulldozer module with its two execution cores is supposed to be markedly less power-hungry than two Phenom cores.

As for the register file, it benefits from a significant improvement with the arrival of a PRF (physical register file). Here again, Intel's Sandy Bridge architecture recently familiarized us with this notion. The PRF is a kind of index that links the registers used by the out-of-order execution engine to their entries in the buffer. The primary advantage here is the ability to store larger entries.

Our two execution cores share a central unit responsible for 128-bit operations. This unit consists of two 128-bit pipelines that can be combined into a single 256-bit unit; we will come back to this. Being of the FMAC type, these pipelines can perform a multiplication and an addition on floating-point numbers in a single pass, with no rounding during the calculation, for maximum precision. Once unified into 256 bits, the two units can process AVX instructions at a rate of one instruction per clock cycle. The move to a PRF mentioned above is of course justified by the arrival of AVX support.
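The benefit of a fused multiply-add with no intermediate rounding can be shown numerically. The sketch below emulates the fused operation in software with exact rational arithmetic and a single final rounding; it does not use any hardware FMA instruction, and the function names and operand values are assumptions chosen so that the two results differ.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to a double (emulated FMA)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c with two roundings, as two separate instructions would."""
    product = a * b      # first rounding
    return product + c   # second rounding


# Operands picked so that the exact product, 1 - 2**-60, is not representable
# as a double and rounds to 1.0.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print("separate:", separate_multiply_add(a, b, c))
print("fused:   ", fused_multiply_add(a, b, c))
```

With these operands the two-step version returns 0.0, because the small term is lost when the product is rounded, while the single-rounding version keeps it and returns -2**-60.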

The shared FPU of a Bulldozer module

While on the subject of instructions, it is worth adding that, besides AVX, AMD adds support for the SSE 4.1 and 4.2 instructions. As if that were not enough, it also adds its own instructions, poetically named XOP, not to mention FMA4 and CVT16. These instructions complement AVX and come from the SSE5 project, announced by AMD some time ago but never brought to market. FMA4 (fused multiply-add) is an instruction that performs a multiplication and an addition in a single pass and stores the result in an additional, separate register. Intel, for its part, uses FMA3, where the result is placed in one of the registers already used as an operand. On this point, Bulldozer's successor will support FMA3, joining the Intel camp. These instructions could bring significant gains in high-performance applications (scientific computing and the like). The trouble is that compilers are unlikely to take advantage of them, all the more so since AMD has already announced that it will eventually abandon FMA4 in favor of FMA3...
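The difference between the four-operand and three-operand forms is only about where the result lands. The sketch below models a register file as a dictionary; the register names, the functions `fma4` and `fma3`, and the choice of which source gets overwritten are assumptions for illustration (real FMA3 encodings come in several variants that differ on that point). Plain Python arithmetic is used, so this models operand handling only, not single-rounding behaviour.

```python
def fma4(regs: dict, dest: str, a: str, b: str, c: str) -> dict:
    """Four-operand form: dest = a*b + c, all three sources left untouched."""
    out = dict(regs)
    out[dest] = regs[a] * regs[b] + regs[c]
    return out


def fma3(regs: dict, a: str, b: str, c: str) -> dict:
    """Three-operand form: the result overwrites the first source register."""
    out = dict(regs)
    out[a] = regs[a] * regs[b] + regs[c]
    return out


# Hypothetical register contents.
regs = {"r0": 2.0, "r1": 3.0, "r2": 4.0, "r3": 0.0}

after_fma4 = fma4(regs, "r3", "r0", "r1", "r2")
after_fma3 = fma3(regs, "r0", "r1", "r2")

print("FMA4:", after_fma4)  # r3 receives 10.0, r0 still holds 2.0
print("FMA3:", after_fma3)  # r0 is overwritten with 10.0
```

The three-operand form saves an operand in the encoding at the price of destroying one input, which a compiler must copy first if the value is still needed.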

Instructions supported by Bulldozer

Finally, note the support for AES instructions, which provide hardware acceleration for encryption and decryption using the standard of the same name.

What about memory?
We mentioned the arrangement of the first-level caches a little earlier. Of course, the Bulldozer architecture is not limited to this single cache level. AMD also provides a set-associative second-level cache. Organized in 16 ways, it is shared between the cores, and its size rises to 2 MB per module.

A third cache level is also provided, with a maximum of 8 MB shared between all the modules making up the processor. Also set-associative, the L3 cache has 64 ways. The third-level cache is non-inclusive and holds the data evicted from the L2 cache. In total, therefore, an FX processor based on the Bulldozer architecture can have up to 16 MB of cache! That is no small amount. As for cache latency, we measured 3 cycles for the L1 and 18 for the L2, which is very good, and 65 for the L3. As feared, this last value is quite high, all the more so since the L3 cache latency of a second-generation Core is 57 cycles.
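The 16 MB total follows from the per-module L2 size and the shared L3. A minimal sketch of that arithmetic, with `total_cache_mb` as a hypothetical helper name and the article's figures as defaults:

```python
def total_cache_mb(modules: int, l2_per_module_mb: int = 2, l3_mb: int = 8) -> int:
    """L2 + L3 capacity in MB; the L1 caches are left out, as in the article's total."""
    return modules * l2_per_module_mb + l3_mb


# Four-module (8-core) FX processor with the maximum 8 MB of L3.
print(total_cache_mb(modules=4), "MB")

# L3 latency gap quoted in the text, in cycles.
bulldozer_l3_cycles = 65
second_gen_core_l3_cycles = 57
print(bulldozer_l3_cycles - second_gen_core_l3_cycles, "cycles slower")
```

Four modules contribute 8 MB of L2, which together with the 8 MB of L3 gives the 16 MB quoted.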

Beyond the caches, the Bulldozer architecture of course integrates the memory controller. Here it is a dual-channel DDR3 controller, each channel being 64 bits wide with 8 additional bits for error correction. The controller supports relatively high frequencies. With two DIMMs installed, DDR3-1866 can be used, which in practice runs at 933 MHz. Note that with four DIMMs the maximum necessarily drops to 800 MHz, that is DDR3-1600. AMD also highlights support for a 1.25-volt supply voltage for DDR3.
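The memory figures above translate into a theoretical peak bandwidth. The sketch below uses the standard relation for DDR memory: transfers per second times bytes per transfer times number of channels, with the clock being half the transfer rate. The function names are hypothetical, and the results are theoretical peaks, not measurements from the article.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: float, channels: int = 2,
                        data_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second).

    Only the 64 data bits per channel count; the 8 ECC bits carry no payload.
    """
    bytes_per_transfer = data_bits // 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


def io_clock_mhz(transfer_rate_mt_s: float) -> float:
    """DDR transfers data twice per clock, so the clock is half the transfer rate."""
    return transfer_rate_mt_s / 2


for name, rate in (("DDR3-1866", 1866), ("DDR3-1600", 1600)):
    print(f"{name}: {io_clock_mhz(rate):.0f} MHz clock, "
          f"{peak_bandwidth_gb_s(rate):.1f} GB/s peak over two channels")
```

On paper, dropping from DDR3-1866 with two DIMMs to DDR3-1600 with four costs a little over 4 GB/s of peak bandwidth.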

DDR3-1866 and FX 8150

Mini DisplayPort Cables / HDMI declared illegal

HDMI Org, the organization that oversees the standard of the same name, wants Mini DisplayPort to HDMI cables withdrawn from sale because they are not licensed and are therefore illegal, a new blow that threatens some companies.

"The HDMI specification defines an HDMI cable as having HDMI connectors at both ends only. Anything else is not a licensed use of the specification and is therefore not allowed," says HDMI Org in a statement, which explains that, since male-to-male Mini DisplayPort/HDMI cables have never been tested and licensed, they are illegal and should be removed from the market.

Cables with a DisplayPort connector on one side and an HDMI socket on the other, for their part, are allowed, insofar as a properly terminated HDMI cable can be plugged into them.

This is a new potential blow to companies such as Apple and Toshiba, which use this type of cable with their products. HDMI Org "recognizes that there may be a need in the market for this type of product. However, for the moment, there is no way to produce these cables in a lawful manner." Let us hope that the brands using this cable will try to launch a testing phase to legalize it.

More on Clubic.com: Mini DisplayPort Cables / HDMI declared illegal
IT and high tech

Consumption, Overclocking and Conclusion

Consumption
We naturally sought to measure the power consumption of the various processors in this review. For this we use a wattmeter that reads the consumption of the entire machine, directly at the wall socket. We proceed in two steps: one measurement at idle, one under load with Prime95.






At idle, the consumption of our processors calls for no special comment. Two processors stand out from the pack, however: the Core i7 2600K, with a system consumption under 120 watts, and the Phenom II X6 1100T, with the same consumption. Under load, the story is different: the Core i7 990X system peaks at over 330 watts, a few watts more than the Core i7 980X. Nevertheless, under load the Phenom II X6 1100T is less power-hungry... as is the Core i7 2600K!

Overclocking
We could not finish this test without trying overclocking. On that front, we were pleasantly surprised. Not only did we reach 4 GHz in no time, but we also reached 4.2 and 4.4 GHz without any problem, and stably. So we raised our ambitions and aimed for 5 GHz. Goal achieved: we ran the Core i7 990X at 5 GHz with a 167 MHz system bus and a 30x multiplier. Although the system booted without trouble, we had to adjust some voltages manually so that Windows 7 would load completely without a blue (or black) screen. There was no way, however, to reach a stable 5.4 GHz... despite several attempts!
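The 5 GHz figure is simply the product of the system bus frequency and the multiplier quoted above. A minimal sketch of that arithmetic, with `core_clock_mhz` as a hypothetical helper name:

```python
def core_clock_mhz(bus_mhz: float, multiplier: float) -> float:
    """Core frequency in MHz = system bus frequency x multiplier."""
    return bus_mhz * multiplier


# Settings quoted in the text for the 5 GHz run.
overclocked = core_clock_mhz(bus_mhz=167, multiplier=30)
print(f"{overclocked:.0f} MHz, or about {overclocked / 1000:.2f} GHz")
```

The quoted settings give 5010 MHz, which the article rounds to 5 GHz.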





Various tests including an overclocking at 5 GHz

Conclusion
Delayed many times, then finally launched in haste, the Core i7 990X is the fastest Intel processor of all, no doubt about that. It is also a good way for Intel to polish an image somewhat tarnished by the Sandy Bridge bug affair.

While Intel's new LGA-1155 platform has suffered a delayed start, the fault of a bug found in the 6-series chipsets after they went on sale, Intel seems to be extending the life of its high-end LGA-1366 platform. Socket LGA-1366 was thought to be doomed, yet Intel is in fact reviving the platform and giving it new life... Beyond the launch of the Core i7 990X, the arrival of new X58-based motherboards from several manufacturers is an unmistakable sign! Even Intel is joining in by giving the Smackover a successor, the DX58SO2.
Returning briefly to the Core i7 990X: although it is the fastest processor of the moment, Intel has not pushed itself, since this new model is a mere speed bump. With no new die revision, it makes do with a 133 MHz increase in nominal frequency. That is not much, but it lets Intel hold the ground until AMD's new Bulldozer chips are released. As for Intel's first six-core processors based on the Sandy Bridge architecture, their release has reportedly been postponed, according to the latest news... which may explain this. We still find in our tests that, for gaming, the second-generation Core processors (that is, Sandy Bridge) do better than the new Core i7 990X.

A final word on the price of the Core i7 990X, which is inevitably indecent: it will cost a little under 1000 euros to acquire the beast. But this newcomer does have one merit: it mechanically pushes down the prices of the Core i7 970 and 960, and by significant amounts! Might that ultimately be its only appeal?