Linux Online Interviews

Interview with Alex Vrenios

Michael J. Jordan, Linux Online Staff

May 14, 2002

Alex Vrenios

This week we interview Alex Vrenios (seen here with our Linux Online mug on his desk), an expert in the configuration and deployment of Linux clusters and author of the book Linux Cluster Architecture, which is being released next month (June 2002).

Mr. Vrenios is founder and chief scientist at the Distributed Systems Research Lab, a technical consulting group specializing in performance measurement and analysis of distributed computing architectures. He holds both a BS and an MS in Computer Science, and is the author of several articles along with his upcoming book. He is a member of the ACM and a Senior member of the IEEE.

He started his career working with communications and data transmission software on IBM mainframes. He made a switch to a networked C/Unix environment, became fascinated with inter-process communication and distributed systems, and never looked back. His other interests include astronomy, photography, amateur radio, hiking, and touring on a bright red Honda Pacific Coast motorcycle.

We thank him for taking time out of his busy schedule to answer our questions about a topic that is getting more and more press every day within the Linux community.


Linux Online:    I just wanted to preface this question with a personal comment and that is, my knowledge of cluster technology is somewhat limited and that's why I wanted to do this interview with you. It's something that we're talking about more and more in the Linux community, so let's start off with the 64,000 dollar question: What's a cluster?

Alex Vrenios:   I certainly agree that clustering is a very hot topic. There is a lot of unexploited parallelism in most applications. The scientific community has huge amounts of data, with many chunks that can be processed independently of the others. Data servers, including Web servers, are transaction processors that can process huge numbers of transactions independently of others. The computer manufacturer's answer is the multiprocessor, a computer with several CPUs or processor chips inside it, all sharing the same physical memory. This is a very expensive solution, however, that not everyone can afford.

The cluster computer is essentially a bunch of ordinary computers tied together through networking. They run custom software that makes them all work together as a team, toward a common goal. Computer scientists call this a distributed system, one that presents a single-system image to the user. That means the user sees nothing to suggest that there is more than one computer system responding to his or her queries.
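The idea of independent chunks of work being handed to cooperating workers can be sketched in a few lines of Python. This is only an illustration and not anything from the interview or the book: threads on a single machine stand in for cluster nodes, and the names `process_chunk`, `split`, and `run_on_workers` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Work done on one independent chunk; here just a sum of squares."""
    return sum(x * x for x in chunk)


def split(data, parts):
    """Split data into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def run_on_workers(data, workers=4):
    """Hand each chunk to a worker and combine the partial results.

    The caller sees a single answer and nothing about how many workers
    produced it, which loosely mirrors the single-system image.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, split(data, workers)))
    return sum(partials)
```

On a real cluster the workers would be separate computers reached over the network, and the cost of moving each chunk to its worker would matter as much as the computation itself.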

Linux Online:    What's special about that?

Alex Vrenios:   Most interesting (to me) about cluster systems is that the personality of these systems - their behavior - depends on the architecture of their software instead of their hardware. A data service may run transaction server software, a scientific research center may run a distributed numerical algorithm, and both of these may include load-balancing features to get the most out of the cluster as a whole, and fail-over or even fault-tolerance processing to make sure the system degrades gracefully in response to failures instead of crashing. The same bunch of computers on the same network can become a transaction server, a supercomputer, or whatever your fertile mind can conjure up. Software is the determining factor in system behavior, and the hardware is its infrastructure. This is a departure from the specialized computer architectures of the past several decades, made possible by the cluster computer concept, the availability of powerful, low-cost personal computers, and a sophisticated, low-cost (even free) operating system like Linux.

Linux Online:    How did you get interested in and start working with cluster technology?

Alex Vrenios:   I've been a part-time computer science graduate student most of my adult life. I am still awestruck by the brilliance of some of our contemporary thinkers. I heard about the imminent failure of the supercomputer market about the same time that I heard about the Beowulf project. Leading-edge technology being bested by a bunch of off-the-shelf PCs got my attention. I took a course in distributed operating systems and followed it up with a seminar course about the more general applications. Here was this wide-open field of interest that was so sparsely populated that just about everyone was doing original work. I could not resist the opportunities.

Linux Online:    You've just finished up your book on the subject Linux Cluster Architecture. Was putting your experience down on paper more difficult than configuring a cluster?

Alex Vrenios:   Configuring my first cluster was hard because I lacked the wide range of skills necessary to do so. Once I had a handle on all that had to be done, I wrote a tutorial, a set of presentation slides. I carefully followed my own instructions and built a four-node cluster, just to work the bugs out of my methodology, and it worked! It was just after that experience that I realized there might be enough material for a book.

Once the material was organized into a presentation, it was a short step to writing a book proposal. The book itself was a huge effort for me, lasting almost nine months. I am still not completely in touch with all the emotions of having finished it. It will probably hit me full force when I hold the book in my hands.

Linux Online:    Why is Linux so well suited to doing this?

Alex Vrenios:   UNIX-like operating systems are easier to network than the mainframe systems I've worked with, or even some of the popular desktop PC operating systems. Linux seems to have just enough sophistication to let me harness the power of a local network without getting bogged down in too many unrelated configuration issues. I am a big fan of Linux, and have been for several years now. It keeps getting better with every release.

Linux Online:    The name Beowulf is used a lot in association with cluster technology. Was this the first cluster project?

Alex Vrenios:   No. The first cluster I know about was designed by Kai Li, back in the 1980s, as part of his doctoral work at Yale. Local area networks were becoming popular, and a lot of research was being done on optimizing virtual memory systems. Dr. Li had the idea of servicing page faults by reading virtual memory pages over the network into the physical memory of the PC that requested them. If you designed a program to run on a shared memory multiprocessor with virtual memory support, it should run on this network of PCs. Li's system became known as a Distributed Shared Memory system, or DSM. There is a collection of papers about this subject in the form of a book, titled Distributed Shared Memory, by J. Protic, published by the IEEE Computer Society Press, Los Alamitos, CA, 1998.
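The page-fault idea described above can be illustrated with a toy Python model. This is a deliberately simplified sketch, not Li's actual design: it assumes each page lives on exactly one node at a time and simply migrates to whichever node touches it, whereas a real DSM also has to manage multiple copies and keep them coherent. The class and method names (`Node`, `ToyDSM`, `_fetch`) are hypothetical.

```python
class Node:
    """One PC on the network, with its own local physical memory."""

    def __init__(self, name):
        self.name = name
        self.memory = {}   # page number -> page contents held locally
        self.faults = 0    # how many page faults this node has taken


class ToyDSM:
    """Single-owner, migrate-on-fault model of distributed shared memory."""

    def __init__(self, node_names):
        self.nodes = {n: Node(n) for n in node_names}
        self.owner = {}    # page number -> name of the node holding it

    def write(self, node_name, page, value):
        self._fetch(node_name, page)
        self.nodes[node_name].memory[page] = value

    def read(self, node_name, page):
        self._fetch(node_name, page)
        return self.nodes[node_name].memory[page]

    def _fetch(self, node_name, page):
        """Service a page fault by bringing the page to the requesting node."""
        node = self.nodes[node_name]
        if page in node.memory:
            return                      # page is already local: no fault
        node.faults += 1
        holder = self.owner.get(page)
        if holder is None:
            node.memory[page] = None    # first touch: allocate a fresh page
        else:
            # Stand-in for a network transfer from the current holder.
            node.memory[page] = self.nodes[holder].memory.pop(page)
        self.owner[page] = node_name
```

A program written against `read` and `write` never names the node that holds a page, which is the sense in which code written for shared memory could run on a network of PCs.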

Research on variations of this new concept exploded, lasting maybe ten years before computer scientists settled on the cluster computer architecture as its heir-apparent. There is a lot of work published in this area and your readers can get a good overview in a book by Andrew Tanenbaum, one of my personal favorites, titled Distributed Operating Systems, published by PTR Prentice Hall, Englewood Cliffs, NJ, 1995.

NASA's Beowulf project got started about a decade later and was based on spin-offs from Li's seminal work. The Pile-of-PCs (POP), Cluster of Workstations (COW), and Network of Workstations (NOW) projects preceded Beowulf, according to a 1997 paper in IEEE Aerospace, "Beowulf: Harnessing the Power of Parallelism," authored by Ridge, Becker, and Merkey at NASA, and Sterling at JPL. Interested readers might try to locate a copy of the January 1998 issue of the Linux Journal, and look at http://www.beowulf.org for further information.

You probably remember the special edition of Red Hat 5.0 called Extreme Linux. This was essentially a CD-ROM with the NASA code, introduced back in 1998. I expect it's a collector's item today. http://www.redhat.com still had some information on this product the last time I looked. There is an Extreme Linux developers group, too, but I don't recall who is organizing it.

Linux Online:    We hear a lot about Hollywood starting to use Linux for their big animation projects like Shrek, Lord of the Rings and other popular movies. You have to have a lot of processing power to do that stuff. Are they using clusters?

Alex Vrenios:   I honestly don't know what Hollywood is using for their animation processing. They don't lack for funding these days, so I would venture to say that they don't use the dated PCs that I do. They are probably using high-powered clusters for just the reasons you cite.

Linux Online:    All of those amazing results you get from a Google search, that's clustering at work too, isn't it?

Alex Vrenios:   Absolutely. My understanding of the PC farms used by search engines, and by database-backed Web sites in general, is that they exploit the parallelism of the astronomical number of searches they receive as well as that of the database storage architecture itself. A network of processors, each with its own physical memory space, is perfect for such an application.

Linux Online:    Are there other people doing important things with clusters?

Alex Vrenios:   I know about NCAR, the National Center for Atmospheric Research, Los Alamos National Laboratory, Caltech, JPL, Oregon State, Princeton, some genome work, and I heard recently that the CIA is using them, too. But if I tell you about that, well...

Linux Online:    Yeah, I see what you mean. On to less classified topics... From what I understand, power is measured in Gflops. What is a Gflop, and what kind of numbers would give me bragging rights if I were the owner of a cluster?

Alex Vrenios:   Gflops, or gigaflops, is a measure of billions of floating-point operations per second sustained (as opposed to in a burst) by a floating-point processor.
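The definition above reduces to simple arithmetic: floating-point operations performed, divided by elapsed seconds, divided by one billion. The Python sketch below shows that calculation along with a crude timing loop. The function names `gflops` and `measure_sustained` are hypothetical, and the loop is dominated by interpreter overhead, so it illustrates the bookkeeping rather than serving as a real benchmark.

```python
import time


def gflops(operations, seconds):
    """Billions of floating-point operations per second."""
    return operations / seconds / 1e9


def measure_sustained(n=200_000):
    """Time n iterations of one multiply plus one add, then report Gflops.

    Counting two operations per iteration is an assumption of this sketch;
    real benchmarks count the operations of a defined numerical workload.
    """
    x = 0.0
    start = time.perf_counter()
    for _ in range(n):
        x = x * 0.5 + 1.0
    elapsed = time.perf_counter() - start
    return gflops(2 * n, elapsed)
```

For example, `gflops(2e9, 1.0)` evaluates to 2.0: two billion operations sustained over one second is 2 Gflops.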

As background information, the "power" of a computer is measured in several ways, many of which amount to no more than advertising hype. SPEC, an acronym for the Standard Performance Evaluation Corporation, proposed the SPECmark, a repeatable benchmark to measure computer power. This was later broken into the SPECint and SPECfp by running just the integer or just the floating-point benchmark programs on the system under test, respectively. Today SPEC has a number of benchmarks available for your use, and a considerable results database on their Web site, so you can compare before you buy.

http://www.specbench.org has further details.

Let me point out that the Beowulf challenge is a price-performance ratio; not just a cheap system, nor just a fast system, but a system that achieves a high performance rating at an extraordinarily low cost: a ten-to-one ratio is not uncommon. I spent about $2000 on my first cluster (eight nodes) and less than $500 on a more modern 4-node system. I don't run supercomputing software, but I'll venture to say that my 4-node cluster would perform as well as a high-end PC costing $5K. That's where the bragging rights come into play, in my opinion.
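The arithmetic behind these figures can be written out in Python. The dollar amounts and node counts come from the paragraph above; the function names are hypothetical, and treating "less than $500" as exactly $500 is an assumption that makes the resulting ratio a lower bound.

```python
def cost_per_node(total_cost, nodes):
    """Average hardware cost of one node in the cluster."""
    return total_cost / nodes


def price_ratio(reference_cost, cluster_cost):
    """How many times cheaper the cluster is than a reference system
    that is assumed to deliver the same performance."""
    return reference_cost / cluster_cost


first_cluster = cost_per_node(2000, 8)   # about $2000 for eight nodes
second_cluster = cost_per_node(500, 4)   # under $500 for four nodes
ratio = price_ratio(5000, 500)           # $5K high-end PC vs. the 4-node cluster
```

These work out to $250 per node for the first cluster, at most $125 per node for the second, and a price ratio of at least ten to one against the $5K PC.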

As to the biggest Linux cluster, I think there is a company making a system called KiloCluster (or something like that), with the potential for a Beowulf cluster with thousands of Linux nodes.

Linux Online:    And Linux brings this cost of a Gflop down considerably, isn't that right?

Alex Vrenios:   According to a talk on Beowulf-class clusters, given two years ago by the Beowulf pioneer Dr. Thomas Sterling, one system costing under $50K achieved more than a Gflops sustained operation in 1996, and a 120-processor system achieved a 10 Gflops rating the next year, winning the Gordon Bell Award for price-performance two years in a row.

Linux Online:    Due to all of these things we've discussed here, do you think the Linux cluster will make the mainframe obsolete over time?

Alex Vrenios:   No; at least I hope not. Don't forget that I grew up in this business with mainframes. I was a student programmer on campus and a part-time operator on weekends at a local company, where I went on to work full-time after graduation. Mainframes are near and dear to my heart.

A computer system that fills an entire room has a mystique about it. There is a barely perceptible roar in the background from the air conditioning system, not unlike that on the Starship Enterprise. Once each month, on the Saturday morning after preventive maintenance, we had to cold-start the system. When you hit the Power-On button, all the glass doors on the tape drives came down, the big disk drives spun up to speed, and the hundred or so neon console panel lights flashed their initialization dance. This was not some science fiction movie; this was a state-of-the-art, multi-million-dollar computer, and it came to life because I told it to. One cannot have such experiences and believe in the end of the mainframe.

Linux Online:    Thanks for taking the time to answer our questions about this important field in Linux.

Alex Vrenios:   My pleasure, Mike. Thanks for having me here!



Just a reminder that if you're interested in cluster technology and you want more in-depth information, check out Linux Cluster Architecture, being released soon (June 2002).




Copyright Linux Online Inc.
Compilation ©1994-2008 Linux Online, Inc.
All rights reserved.