Hardware

The 80's supercomputer that's sitting in your lap

By now we're all used to the idea that each generation of CPU is faster and does more than the last, but just how fast? The computer you're probably using to read this blog is faster than an 80's supercomputer.

-----------------------------------------------------------------------------------------------------

Just this year, IBM achieved a major breakthrough in computing speed when the Roadrunner broke the Petaflop barrier. A nearly unbelievable speed today, it's only a start. By the end of the next decade, scientists hope to have a supercomputer capable of exceeding the Exaflop barrier: 1 quintillion floating-point operations per second.

The question, of course, is how that compares to the desktop. Based on where we are now, what can we expect to have on a typical desktop or laptop in twenty years? Perhaps an indication lies in just how far we've come in the last twenty.

Supercomputers 80's style

Twenty years ago, as the 80's drew to a close, the company that came to mind when you said "supercomputer" wasn't IBM; it was Cray. For most of the decade, the Cray X-MP series dominated the field. Then, in 1988, Cray introduced its newest and fastest computer, the successor to the X-MP line: the Cray Y-MP.

The Cray Y-MP could host up to eight processors, each capable of 333 MegaFlops. Combined, the Cray Y-MP could sustain a speed of over 2 GFlops. The CPUs ran at a blazing 167MHz (later models in the line pushed past 200MHz) and could address memory in both 24-bit and 32-bit modes. The operating system of choice was a Cray-developed version of Unix.
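If you're wondering where the 333 MegaFlops figure comes from, here's the back-of-envelope peak-rate arithmetic as a quick C sketch. It assumes the usual vector-machine rating of one add and one multiply retired per clock; that assumption, and the rounding, are mine, not Cray's spec sheet:

    /* Peak-rate arithmetic for the Y-MP figures above: a sketch, not vendor data.
       Assumes 2 floating-point results per clock (add pipe + multiply pipe). */
    #include <stdio.h>

    int main(void) {
        const double clock_mhz = 167.0;      /* Y-MP clock rate */
        const double flops_per_clock = 2.0;  /* assumed: one add + one multiply */
        const int max_cpus = 8;              /* maximum configuration */

        double per_cpu_mflops = clock_mhz * flops_per_clock;        /* ~333 MFlops */
        double machine_gflops = max_cpus * per_cpu_mflops / 1000.0; /* ~2.7 GFlops */
        printf("per CPU: %.0f MFlops, full machine: %.2f GFlops\n",
               per_cpu_mflops, machine_gflops);
        return 0;
    }

Eight CPUs at roughly 333 MFlops apiece is where the "over 2 GFlops" figure above comes from.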

On the desktop side, Intel introduced the 80486 just the next year, though it took until 1992 for the fastest desktop processor of that line, the 486DX2/66, to arrive. That chip had a 33MHz system bus and a 66MHz internal clock. It was a 32-bit processor as well, and it too could run Unix.

As slow as a 486 may seem today, the speed difference between the 486 and the Cray was vast. Check this chart from Netlib.org:

SYSTEM            OS/COMPILER                CPU/FPU     FPU(MHz)  SCALAR_MFLOPS  REF
----------------  -------------------------  ----------  --------  -------------  ---
Cray Y-MP C90     NOTE 001, C90/16256        ----------  240       67.16          9
Cray Y-MP C90     cc -O3 (4ns clock)         ----------  250       60.39          45
AMI 80486DX2/66   NOTE 024, MS DOS 5.0       80486DX2    66.7      2.96           19
MAC Quadra 800    Think C 5.0.4              68040       33.3      2.96           40
486DX2/66 EISA    NOTE 025, IBM OS/2 2.0     80486DX2    66.7      2.87           36
486DX2/66         NOTE 026, SCO UNIX 3.2.2   80486DX2    66.7      0              32

Note that I tossed in a Mac Quadra 800 running a Motorola 68040 for good measure. Netlib uses scalar MFlops as a measurement, which is slightly different from the Linpack-based speed ratings used by Top500.org, which tracks the fastest computers on the planet. It was, however, the only benchmark featuring older machines that I could find. Even so, you can see the chasm between what was available on a top-rated desktop as opposed to a supercomputer: a factor of more than 20:1.

Fast forward twenty years

Today, that poor Cray Y-MP, which would have set you back over $20 million, would barely be able to run Windows Vista. Even a low-end laptop today has more power than that machine. For example, check out the Mobile CPU charts on Tom's Hardware Guide. The AMD Turion TL-59 CPU in my HP Pavilion runs at about 11GFlops. The Core 2 Duo found in a MacBook Pro will pull around 16GFlops.

The gap between supercomputers and desktops has actually widened well beyond the roughly 20:1 ratio of twenty years ago. Where you can pull about 25GFlops from a quad-core desktop and up to 88GFlops from a quad-core Xeon server, the Roadrunner, as mentioned above, pulls 1 Petaflop, or 1,000,000 GFlops. Against that 25GFlops desktop, the gap is 40,000:1.

However, it's still conceivable that in twenty years we'll have Petaflop machines sitting in laptops, or in whatever device ultimately ends up being called the common PC. As it is, Intel is working on an 80-core CPU that will break the Teraflop barrier for PCs. It may ship by 2011 or 2012.

Just think how well Vista will run on that!

17 comments
jojomonkeyboy

When comparing the performance of your average desktop to the Crays of the early 1990s or even 80s, you must take into consideration not the "peak mflop" ratings but the sustained mflop ratings. Peak mflops mean little if the machine never reaches such performance levels. The Army actually analyzed the cost/benefit of having a cluster of P4 2.8ghz machines versus going with a Cray solution. They discovered that although the P4 2.8ghz had a high peak rating of 5.6gflops, in practice it reached only 3.4% of its peak performance due to bandwidth limitations! Please see the results of the Army's study right here: https://cug.org/5-publications/proceedings_attendee_lists/2003CD/S03_Proceedings/Pages/Authors/Muzio_slides.pdf Needless to say, their conclusion was that the Cray solution was more cost effective and easier to program and maintain.

NASA's high performance computing lab also did a comparison between their old Cray X-MP/12 (one processor, 2 megawords of memory) and a dual Pentium II 366 running Windows NT. They had to redesign the space shuttle's solid rocket boosters back in the late 80s after the Challenger disaster, and the Cray X-MP was used to model air flow and stresses on the new design. Some years later the code was ported to a Windows NT workstation and the simulation rerun for comparison. The result: a single-processor Cray X-MP computed the simulation in 6.1 hours versus 17.9 hours on the dual Pentium II.

The Cray X-MP could have up to four processors with an aggregate bandwidth of over 10gb/sec to main memory; this kind of SUSTAINED bandwidth between CPU (not GPU) and main memory was not matched on the desktop until about 4 years ago. The Pentium IIs had either a 66mhz or 100mhz bus speed, so we are talking a maximum bandwidth of only 800mb/sec (528mb/sec with the 66mhz bus) and around 330mb/sec sustained (remember, PCs use DRAM and the Crays mostly used very expensive SRAM memory). The importance of bandwidth to real-world number crunching performance can be seen in the STREAM benchmark. Please go to http://www.streambench.org/ to see exactly what I mean.

In 1990 the Cray C90 was the baddest supercomputer on the planet, and at $30 million fully configured it was also by far the costliest. Here's a photo of it: http://www.cisl.ucar.edu/zine/96/fall/images/c90.gif. The Cray C90 could have up to 16 processors, with 16gb of memory, and could achieve a maximum performance of around 16 gflops. "Well gee, my cheapo Phenom X6 can do well over 16 gflops because that's what it says on my SiSoft Sandra score, so I have a Cray C90 sitting under my desk blah blah..." You are completely wrong if you think this. The SiSoft Sandra benchmark tests everything in cache, which is easy for the CPU to access. Real-world problems, the kind that Crays are built to solve, can't fit into a little 4mb cache, and thus we come to sustained bandwidth problems. The C90 can fetch 5 words per clock cycle (for each processor) from main memory and has a real-world bandwidth of 105gb/sec; compare this to a relatively modern quad-core Core i7 2600 that gets a measly 12gb/sec sustained bandwidth. "But the Core i7 2600 is clocked much higher than the C90, which only operates at 244mhz per processor." Ahhh, but if the data is not available for the processor to operate on, then it just sits there, wasting cycles, waiting for the memory controller to deliver data to it.

Without getting into too much detail (if you want a lot of detail, read my analysis of the Cray 1A versus the Pentium II below), the real-world mflops of the C90, working on data sets too large for a typical PC's small cache, works out to roughly 8.6 gflops, while the Intel Core i7 2600 will achieve only about 1 gflops sustained on problems out of cache. So far there are no desktops, and won't be for quite a few years, that come EVEN close to the real-world sustained bandwidth (and thus sustained performance) of a C90. Now, for problems that do fit into the tiny cache and can be mostly pre-fetched, of course the desktop will be superior to the old Crays.

Here is a rough comparison I made between a Cray 1A and a Pentium II 400. Read on only if you want to be bored to death: The Cray 1A had a clock cycle time of 12.5ns, or an operational frequency of 80mhz. It had three vector functional units and three floating point units that were shared between vector and scalar operands, in addition to four scalar units. For floating point operations it could perform 2 adds and a multiply per clock cycle. It had a maximum memory configuration of 1 megaword, or 8 megabytes, at 50ns access time, interleaved into 16 banks. This interleaving had the effect of allowing a maximum bandwidth of 320 million words per second into the instruction buffers, or 2560mb/sec. Bandwidth to the 8 vector registers of the Cray 1A could occur at a maximum rate of 640mb/sec. The Cray 1A possessed up to eight disk controllers, each with one to four disks, and each disk having a capacity of 2.424x10^9 bits, for a maximum total hard disk capacity of 9.7 gigabytes. There were also 12 input/output channels for peripheral devices and the master control unit. It cost over $7 million in 1976 dollars and weighed in at 10,500 lbs with a power requirement of 115 kilowatts.

So how does this beast compare with my old clunker of a PC with 384mb of SD100 RAM and a P2 400mhz CPU? Well, let's take a simple triad operation, with V representing a vector register and S representing a scalar register: S*V0[i] + V1[i] = V2[i]. Without getting into too much detail, this equation requires 24 bytes of data per iteration. There are two floating point operations going on here: the multiplication of the scalar value with the vector, then the addition of the second vector. Thus, assuming a problem too large to just loop in the Cray 1A registers, and a bandwidth of 640mb/sec, the maximum performance of a Cray 1A would equal (640/24) * 2 = 53 mflops on large problems containing data which could not be reused. This figure correlates well with the reported performance of the Cray 1A on real-world problems: http://www.ecmwf.int/services/computing/overview/supercomputer_history.html. True bandwidth on a Cray 1A would also have to take into account bank conflicts plus access latency, so about 533mb/sec sustained is a more realistic figure. On smaller problems with reusable data, the Cray 1A could achieve up to 240 mflops by utilizing two addition functional units and one multiplication functional unit simultaneously through a process called chaining. So you see, the Cray 1A could be severely bandwidth limited when dealing with larger heterogeneous data sets.

My Pentium II 400 has 512kb of L2 cache, 384 megabytes of SD100 RAM, and a 160gb 7200rpm hard drive. Theoretically it can achieve a maximum of 400 mflops when operating on data contained in its L1 cache, although benchmarks like BLAS place its maximum performance at 240 mflops for double precision operations, which is what we are interested in here. Interestingly, this is about the same as what a Cray 1A can do on small vectorizable code. However, once we get out to problem sizes of 128kb or 256kb or even 512kb, my Pentium II would beat the Cray 1A even at its greatest strength, double precision floating point operations, due to the bandwidth advantage of the L2 cache over the Cray's memory. At 1600mb/sec bandwidth, my computer can do up to 133 mflops for problems under 512kb in size but greater than the L1 cache. Once we get beyond 512 kilobytes the situation shifts, as data would then need to be transferred from the SD100 RAM. The theoretical bandwidth of SD100 RAM is 800mb/sec, still greater than the Cray 1A, but here we run into some issues. The Cray 1A had memory comprised of much more expensive SRAM, while my memory is el crapo DRAM, which requires refresh cycles. With these taken into account, my DRAM actually has a theoretical maximum bandwidth of about 533mb/sec and a real-world maximum sustained bandwidth of a little over 300mb/sec. This means that for problems out of cache, my Pentium II gets slowed to a measly 315/12 = 26 mflops. In this special situation where the problem is vectorizable, the Cray 1A is still faster than my Pentium II. Not bad for a computer that is 30 years old.

Once we get to problems greater than 8 megabytes, the advantage shifts completely back to my Pentium II, as the Cray 1A must then stream data from its hard disks (which were slower than Ultra ATA/100) while my computer can go right on fetching data from RAM. The Cray 1A could not realize its full potential, as it was hampered by bandwidth and memory size issues, yet in certain situations it could outperform a desktop computer from 1998. Solid state disks, more memory ports, and larger memories were utilized in the subsequent Cray X-MP to address these problems. A desktop like the Core 2 Duo E6700 can do over 12 gigaflops, BUT only on problems that are small and fit into its cache. Once the data gets out of cache, today's modern computers get their butts kicked by the old-school Crays from the 80s. Just visit http://www.streambench.org/ to see what I mean.
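[To make the triad arithmetic above concrete, here is a minimal STREAM-style sketch in C. The array size, trial count, and the 24-bytes-per-element accounting are illustrative assumptions following the commenter's math, not the official STREAM source:]

    /* Triad microbenchmark sketch: measures sustained bandwidth and the FLOPS
       it allows on an out-of-cache problem. POSIX clock_gettime assumed
       (link with -lrt on older systems). Uses ~0.75 GB of RAM. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (32 * 1024 * 1024)  /* 32M doubles per array: far bigger than any cache */
    #define TRIALS 5

    int main(void) {
        double *v0 = malloc(N * sizeof *v0);
        double *v1 = malloc(N * sizeof *v1);
        double *v2 = malloc(N * sizeof *v2);
        if (!v0 || !v1 || !v2) { fprintf(stderr, "alloc failed\n"); return 1; }

        const double s = 3.0;
        for (size_t i = 0; i < N; i++) { v0[i] = 1.0; v1[i] = 2.0; v2[i] = 0.0; }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int t = 0; t < TRIALS; t++)
            for (size_t i = 0; i < N; i++)
                v2[i] = s * v0[i] + v1[i];  /* 2 flops, 24 bytes of traffic per element */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs  = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        double bytes = (double)TRIALS * N * 24.0;  /* two 8-byte reads + one 8-byte write */
        double flops = (double)TRIALS * N * 2.0;   /* one multiply + one add per element */
        printf("sustained: %.0f MB/s -> %.0f MFLOPS\n",
               bytes / secs / 1e6, flops / secs / 1e6);

        fprintf(stderr, "checksum: %g\n", v2[N / 2]);  /* keep the loop from being optimized out */
        free(v0); free(v1); free(v2);
        return 0;
    }

The printed MFLOPS figure is just the commenter's formula, (bandwidth / 24 bytes) * 2 flops, measured instead of assumed: on a data set that doesn't fit in cache, the FLOPS you see track memory bandwidth, not the CPU's clock rate.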

Pjones

Vista will still run like a dog.

digitrog

... and if Micro$oft is still around then, it will find a way to slow down the processor with bloat-ware ... ;p

namtupdj

For merely main-CPU-intensive applications, one's laptop MAY be more powerful. That is, it could be, as long as one doesn't consider the results achieved via those supercomputers' parallel programming for multiple CPUs (more than two), along with sophisticated operating system and scientific application "tweaks" (again, allowing more and better use of multiple CPUs than laptop OSes do!). Effective parallel processing programs for your laptop's multicore chips are still hard to come by.

BUT when one considers DATA-intensive as well as CPU-intensive applications, the laptop is woefully outclassed by even 80's machines, including "plain ole" IBM (and Burroughs) mainframes. Nor are main-CPU processing power numbers really applicable for such comparisons. When one adds up all the processing power in the I/O peripherals (disk controllers, etc.), those supercomputers, and again even the vanilla mainframes, vastly outclassed today's laptops.

Generally, such comparisons are hardly "apples to apples," whether in total system compute and data handling power or in actual real-world performance, which again must be assessed based on specific applications. This is sort of like comparing the "power" of a Corvette to that of a Mack truck only on the basis of speed, without considering carrying load and capacity, transmission gearing, etc. Yet this facile and largely wrong and misleading comparison continues to be made for its "gee whiz" factor.

John Sheesley - TechRepublic Pro

In 1988, the fastest computer on the planet could just barely break the Gigaflop barrier. The Cray Y-MP would hit 2.3 Gigaflops if you bought the 8-CPU version and ran it flat out, all for $20 million. Today, a $700 laptop is 5 times as fast, as I point out in Classics Rock: http://blogs.techrepublic.com.com/classic-tech/?p=189 IBM now makes a Petaflop supercomputer. How soon do you think something that fast will be common on the desktop? And what could it possibly do for the average user that we can't do today?

John Sheesley - TechRepublic Pro

Although I will admit you can't directly compare the power of a CPU between a desktop and a mainframe and wind up with the same performance comparison, because PCs traditionally are more I/O-bound than mainframes, I would suspect that modern laptops, and especially desktops, have buses with wider bandwidth than an 80's mainframe. Coupled with a transactional OS such as Unix or even BSD rather than Windows, I have no doubt that a laptop of today could outperform, say, an IBM 4381 from 1985. That, of course, is purely conjecture. I'd have to do some research to confirm it.

w2ktechman

"And what could it possibly do for the average user that we can't do today? " Run Windows v.9 Edited -- that is Home Ed.

Neon Samurai

With the current crop of graphics cards and physics processing slowly gaining traction, we'll be looking at virtual reality. Think Crysis graphics updated to truly realistic, plus full immersion in four dimensions and five senses.

LarryD4

Yes, the PC of today has a lot more computing power than the Cray of the 80s. But if you look at the type of number crunching it did, you might be surprised how ill-equipped the PCs of today would be to do it. I'm not trying to argue that the Cray of the 80s could still be faster. But there is a reason why the mainframe industry hasn't gone away: they are built for specific "number crunching" processing, not the broad, GUI-based, user-friendly processing required for a home computer. But it's still cool to think that my trusty laptop, if I took it back to 1981, could be put in one of those giant rooms that housed the Cray and replace the whole thing.

ByteBin-20472379147970077837000261110898

By the time things get that fast, desktop computers as you know them will probably be obsolete. Everyone's going mobile. Especially if WiMAX takes off, you won't need your desktop, because you'll have a computer in your car or wherever you want to go. You won't be in one place long enough. The "desktop" is also changing from hulking towers to a smaller, more streamlined look that is easier to lug around to, say, a LAN party. Many folks are trying to save space (I'm one of them). I put my tower desktop system in the closet, as I've found that over the past year I've been using my Intel Core 2 Duo-based laptop running Vista. Laptops will get smaller too. So it'll be people buying super-fast laptops that can do more than today's desktops.

What I'm excited about is the possibility of more development of artificial intelligence on the smaller, more portable "home" machines. It's an area I'm fascinated with, and with more computing power, neural networks may be easier to actually run and experiment with. Another thing is that it'll make 3D, games, music composing, and video editing much easier. Faster speed = more opportunity for creativity for the rest of us. This should open up some very interesting things in the future.

robert.dammers

It is easy to forget that disk drives have been out-performing Moore's Law since the 1980s. My desktop's USB drives vastly outperform the 3380 drives I had on block-multiplexed channels on my 4341 when I do a disk-to-disk copy, even though they go through the same hub (the 3380s had a throughput of some 3Mbytes per second on the channel, and a capacity of some 2.5GB each in total, I think). Even the storage on my pocket-sized Linux NAS box, accessed via 100Mb Ethernet, is faster than that, and I have 1TB of storage on a device costing £160 in total!

In fact, I play a game with myself, wondering when during my career our total computing capacity worldwide at work exceeded what I now have above my desk: my old 2.8GHz AMD box with 1GB of RAM, 2TB of storage, and another 2TB for backup (I work for Royal Dutch/Shell). I think it was some time during the early 1980s, when I was a very new and shiny systems programmer. By contrast, in the 1990s we used graphical workstations for visualising oil reservoirs underground with the same processor found in my sons' very old N64.

I think the real contrast is not between Unix and Windows, but between lean systems that have only the usability and manageability features they actually need, rather than dumping everything in. If you ran a mainframe emulator (Google "SIMH") on Windows, with a mainframe operating system and old-fashioned applications (hacked in assembler, or using efficient shared libraries) running on it, you might find that it was still pretty slippy.

LarryD4

Well sure! If you're talking Unix or BSD I wholeheartedly agree, but then again, when you talk about a "laptop" compared to a mainframe, most people would look at their PC laptop or the MacBook as a comparison. It would be a neat challenge to actually try to get it done! I had a job working mainframes back in the late 80's, and the company had some Xerox Sigma 9's they bought from the government. When a major abend occurred that halted the system, it would play the National Anthem. Next time I want something more soothing, like Barry White! :)

Neon Samurai

At the end of the BBS days, I was experimenting with running the board off a RAM drive. Boot DOS, create the ramdrive from within config.sys, extract the ZIP archive onto the ramdrive to restore the system, and run your restored programs from there. It was freakishly fast once you got everything uncompressed to the ramdrive. Windows replaced DOS and its ramdrive.sys (forget the exact name), and I forgot about it until much later, when I saw an ad for a ramdrive board: six or eight RAM slots waiting to offer a DDR-fast "drive". The thing even had its own power adapter so you wouldn't lose the ramdrive between reboots.
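[For anyone who never played with this trick: the MS-DOS driver was indeed RAMDRIVE.SYS, and a minimal CONFIG.SYS along these lines would set it up at boot. The paths, the 4MB size, and the drive letter and archive in the PKUNZIP step are illustrative assumptions, not the commenter's actual setup:]

    DEVICE=C:\DOS\HIMEM.SYS
    DEVICE=C:\DOS\RAMDRIVE.SYS 4096 /E
    REM 4096 = a 4MB RAM disk; /E places it in extended memory (needs HIMEM.SYS).
    REM The RAM disk gets the next free drive letter; AUTOEXEC.BAT can then
    REM restore the board onto it, e.g.:  PKUNZIP BBS.ZIP D:\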

digitrog

A friend of mine bought a couple of the SD-card-to-IDE adapters and loaded one with an install of a Windows OS [instead of using a mechanical HDD]. Even on an older PC, the OS boots up in just a couple of seconds, not the usual half minute or more. So going "solid state" can also make a huge difference in speed. I can remember one of the computing magazines, when the Pentium II was quite new, doing a radical experiment: they loaded the equivalent of just 32MB of [Pentium-quality] cache RAM onto an old 386 as normal RAM, and they had that 386 actually outperforming the Pentium II. It certainly makes one think of the wasted potential in those older machines.

mforman

Ah! My first computer game was the Radio Shack version of that very game. I bought one of the original TRS-80 Model 1's (4K RAM, non-extended BASIC), and that was my first game in "machine language". Oooooooo... Instead of Klingons, it was "Jovians", to avoid licensing issues I guess, lol.

LarryD4

Back in the early days, the mainframes had tons of Easter eggs. One of the most common was the classic Star Trek game, which used a two-dimensional array that you had to navigate, via text of course. And no matter what mainframe shop I worked in, I was always able to find the Snoopy ASCII calendar.

mforman

I remember attending an IBM network training seminar in Chicago in the mid-80s, and one of the instructors was an old OS systems programmer from the very early mainframe days, when most of the OSes were custom-written along with the applications. He was trying to troubleshoot a problem from some programmer's very early work but kept getting an OS error message: "Shut 'er down, Clancy, she's pumpin' mud!"
