EUNIS97, Grenoble (France) 9-11 September 1997

Ref: 032401

Center - A distributed computing center of the future?

1,2 Ludek Matyska, 2 Eva Hladká

Introduction

The role of computer centers at universities has undergone a dramatic reshaping in the past decade. The computer center is no longer the single "computer aware" place at the university; it is becoming a coordinating body responsible for the computer-related infrastructure. New roles are also emerging, however, and in this paper we discuss the potential that may be gained by merging the services of individual computer centers.

The extremely fast proliferation of personal computers led to the belief that computers were becoming a tool not too different from the other ordinary tools of everyday life. The information society of tomorrow began to look like a kind of paradise where everybody uses his or her computer to connect to sources of information and to ease any work to be done. Computer centers started to look like an obsolete notion, and many universities considered reducing or even closing them. In the Central and Eastern Europe of the nineties, the situation was even more dramatic due to the very fast changes there.

But, as with any other complex and sophisticated tool, a computer is not easy to use without a lot of training and experience. The situation started to change with the emergence of local area networks and their interconnection with the Internet. While at the beginning it was easy to join a few computers into a LAN, the interconnection of LANs called for new expertise and, as such, for some kind of centralized control over its deployment. More importantly, new services were sought, and the vital role of computer centers reemerged.

Contemporary Role of Computer Centers

As contemporary computer centers are no longer the sole owners of computing technology at the universities, they have to focus their attention on services that are most efficiently provided from a center. Individual users usually have their own personal computers on their desks (computers whose raw computing power, memory and disk capacity exceed those of the large computers of the past), but these computers must still be connected to the network. Building and maintaining the infrastructure is thus one of the indispensable new roles.

Another important role is related to reliability and robustness. While individual users can back up their data, only a tiny fraction of them actually do so on a regular basis. It is much easier, more convenient and cheaper to provide such a service from a central place. It is also much more reliable, as there is usually more than one device allocated (or allocatable) for this task. Another point is disk capacity. A failure of the individual disk in a personal computer usually means that the computer will be out of service for some noticeable time. Computer centers, on the other hand, usually build their (large) disk capacities using some kind of RAID, where a single disk failure may not even be noticed by the end users. In general, all the services provided by computer centers are (or may be) backed up in some way, and the redundancy needed is achieved substantially more cheaply at this level.
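
To illustrate why a single disk failure can stay invisible to end users, the following minimal sketch shows the XOR parity idea used by parity-based RAID levels. It is an assumption that the centers use such a level (the text only says "some kind of RAID"), and the block contents and the helper name xor_blocks are purely illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, each imagined to live on a different disk (illustrative data).
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block would be stored on a fourth disk.
parity = xor_blocks(data)

# Simulate the loss of the second disk and rebuild its block from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields the missing block again, so reads can continue while the failed disk is replaced.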

Last but not least come the information services used and/or provided by the university. University management is becoming more distributed, with responsibility for decisions delegated to lower levels of the managerial hierarchy. However, the responsibility for data correctness calls for some centralized supervision. Information technology, when properly used, makes it possible to take the best of both worlds: the data are kept centrally, at the computer center, while access is provided in a distributed way. A similar situation holds for information provided by the university (e.g., through the web). While the information may be collected, and even prepared (edited, formatted and the like), in a distributed fashion across many parts of the university, it may then be stored on a single server managed by the computer center.

As we have seen, there are still at least three roles where the computer centers have their irreplaceable responsibilities:

  1. The infrastructure.
  2. The reliability.
  3. Information services.

Computer centers are not, however, independent entities in the networked world of today. The increased mobility of researchers and students, coupled with the growing number of people using the services of more than one computer center, needs to be supported by a kind of convergence of the individual computing centers. It may not be surprising that it is again the "power" users, i.e., users of high performance computers who are always looking for ways to increase the computing power at their disposal, who are the first to ask for similar (if not identical) computing environments. However, these users will very quickly be followed by others, and it is vital for the computer centers to be well prepared before the main wave hits them.

The Center

The three-year Center project was launched last year as a part of the TEN-34 CZ activities of the Czech Republic. Its main goal is to connect the largest computing centers of the Czech universities, namely the West Bohemia University in Pilsen, the Czech Technical University and Charles University in Prague, and the Technical University and Masaryk University in Brno, into one virtual computing center. The primary target of this pilot project, led by Masaryk University and supervised by the first author of this article, is the group of academic users of high performance computers at the respective sites, but it is in no way limited to them. The primary goal is to create a large virtual computer with a uniform user interface. This virtual computer spans a large geographical area (the distance between Pilsen and Brno is more than 250 km). The interface is understood in the broadest sense, i.e., as encompassing all the provided services. The Center is also built as an open center, to which more computers may be connected and in which new partners may become involved. This places very strong constraints on what may be done and how.

A truly heterogeneous virtual computer is being built, whose nodes are the computers of the individual centers. Three POWER Challenges from SGI, a large AlphaServer from Digital and a 19-processor IBM SP2 are to be connected into one whole. From the user's point of view, the result of the project will be seen as just one large Computer. Users will be allowed to log in to any node while having immediate access to all the Center resources. This means that the user of a program (service) need not even be aware of (or care) which particular node runs her program, more or less in the same way as users of a parallel computer do not care which particular node they are using.

Administration

As might be predicted, the political and administrative problems are the harder ones. We have already identified some places where common agreement is necessary:

Technical side

The whole Center project is not possible without a reliable, high performance network between its individual nodes. The sites are currently connected to the TEN-34 CZ backbone, an ATM academic network running at 34 Mbps. All the involved computers have direct access to this ATM backbone, which means that virtual channels may be created among them. Both the IP over ATM and the LAN Emulation modes of the underlying ATM network will be used to create a kind of dedicated routes through which the Computer nodes communicate. An ATM metropolitan area network running at 155 Mbps is currently available in Prague and in Brno, which opens the possibility of connecting a subset of nodes at a higher speed than the backbone alone allows.
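
To give a feeling for the difference between the two link speeds, the following back-of-the-envelope sketch computes idealized transfer times. The 1 GB data set is a hypothetical example, and the calculation ignores ATM cell overhead, protocol overhead and competing traffic, so real transfers would take longer.

```python
def transfer_seconds(size_bytes, link_mbps):
    """Idealized transfer time: payload bits divided by the nominal link rate."""
    return size_bytes * 8 / (link_mbps * 1_000_000)


size = 1_000_000_000  # hypothetical data set of 1 GB (10**9 bytes)

backbone = transfer_seconds(size, 34)   # 34 Mbps backbone: roughly 235 s
metro = transfer_seconds(size, 155)     # 155 Mbps metropolitan network: roughly 52 s

print(f"backbone: {backbone:.0f} s, metropolitan network: {metro:.0f} s")
```

Under these assumptions the metropolitan network moves the same data about 4.5 times faster, which is why connecting nodes within Prague or within Brno at the higher speed is attractive.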

A distributed file system is provided on top of the network connection. After considering all the possibilities, we chose the AFS distributed file system from Transarc [2] as the primary filesystem of the Center. The main reasons were:

An AFS multilicense covering all the universities involved in the project was purchased, and each university (computer center) established its own AFS cell. There are, however, some peculiarities and problems connected with the use of AFS that have consequences for the Center implementation. Overall, we found AFS to be a valuable tool for read-only filesystems (parts of the operating systems and the application software) but of only limited use for read/write filesystems (like the user directories). AFS is definitely not the right choice when high local I/O throughput is required (e.g., ab initio calculations). AFS is therefore used in the Center to store the read-only directories with application programs and the shared parts of user home directories. Users have the option either to have their entire home directories stored in AFS or to have (small) local filesystems at each node and use AFS as a shared data repository. AFS is also complemented by the local native filesystems, which are made available through (limited) use of NFS.

The use of AFS naturally led to the adoption of Kerberos for user authentication [3]. We are currently using a Kerberos 4 implementation (from KTH, Sweden), as Kerberos 5 is again available in the USA only. To allow an easy and smooth path for future expansion, each computing center runs its own Kerberos realm, and we use interrealm authentication to move the tickets around. We had to modify many standard programs (like login, telnet, telnetd, ...) to make the interrealm crossing as smooth as possible and especially to eliminate any need for users to know precisely where the realm borders lie. While this was quite successful, we discovered that the interaction of Kerberos 4 with AFS's own authentication mechanism is not ideal and that users sometimes have to re-enter their passwords to have access to all their resources.

Load Sharing Facility (LSF [1]) from Platform Computing, Inc. was chosen as the job queuing and load balancing tool for the whole Center. LoadLeveler is used on the IBM SP2, and a gateway is to be developed to connect the two systems. Again, each computing center runs its own LSF cluster, with intercluster communication established to allow proper load balancing between the individual computing nodes. The use of AFS and Kerberos led to a problem whose best solution we are still seeking: how to ensure that the proper authority is given to users' jobs when they finally leave the job queue and/or when they run for a very long time (days or even weeks).
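
The credential problem can be made concrete with a small sketch. It only models the timing: a ticket obtained when the job is submitted has a fixed lifetime, while the job may wait in the queue and then run for days. The function names and the 10-hour lifetime are hypothetical illustrations, not values or interfaces taken from Kerberos, AFS or LSF.

```python
from datetime import datetime, timedelta


def credentials_valid_at(issued, lifetime, moment):
    """True if a ticket issued at `issued` with `lifetime` is still valid at `moment`."""
    return issued <= moment < issued + lifetime


def job_outlives_ticket(submitted, queue_wait, run_time, lifetime):
    """True if a ticket obtained at submission expires before the job finishes."""
    finish = submitted + queue_wait + run_time
    return not credentials_valid_at(submitted, lifetime, finish)


submitted = datetime(1997, 9, 9, 8, 0)
lifetime = timedelta(hours=10)  # hypothetical ticket lifetime

# A short job finishes while the ticket is still valid.
short_job = job_outlives_ticket(submitted, timedelta(hours=1), timedelta(hours=2), lifetime)

# A job that waits six hours and then runs for three days outlives its ticket.
long_job = job_outlives_ticket(submitted, timedelta(hours=6), timedelta(days=3), lifetime)
```

In this model short_job is False and long_job is True: the long job would lose access to its files in AFS partway through unless its credentials are renewed or reissued on its behalf, which is exactly the open question described above.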

Not every node of the Center offers the same set of application programs. Transparent access allows users to run them without knowing where they actually run. The queuing system is aware of the location of all major programs and reroutes individual requests to the nodes where they may be (best) served. There is, however, no such support for interactive programs.
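
The rerouting idea can be sketched as follows. This is not the LSF algorithm or its interface; it is a minimal illustration that assumes each node advertises the programs it offers and a single load figure, and that "best served" simply means the least loaded eligible node. All node names, program names and load values are hypothetical.

```python
def route_job(program, nodes):
    """Return the name of the least loaded node offering `program`, or None."""
    candidates = [node for node in nodes if program in node["programs"]]
    if not candidates:
        return None  # no node in the virtual computer offers this program
    return min(candidates, key=lambda node: node["load"])["name"]


# Hypothetical snapshot of the virtual computer.
nodes = [
    {"name": "node-a", "programs": {"chem_pkg", "cfd_pkg"}, "load": 0.9},
    {"name": "node-b", "programs": {"chem_pkg"}, "load": 0.3},
    {"name": "node-c", "programs": {"stat_pkg"}, "load": 0.1},
]

print(route_job("chem_pkg", nodes))  # node-b: offers the program and is least loaded
print(route_job("cfd_pkg", nodes))   # node-a: the only node offering the program
```

A request for a program that no node offers returns None here; a real queuing system would more likely hold or reject such a job.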

Conclusion

While the Center project is just in its first phase (the project started in September 1996), we have already identified several major advantages of the Center over the individual centers:
  1. It simplifies access to the centralized services of different nodes. It also allows "personalized" environments, including access to personal files, to be shared between sites.
  2. It increases the utilization of the individual computers and of the available software licenses; it is no longer necessary to buy everything for every site.
  3. It provides much higher reliability at much lower cost; users at an individual university may continue to work even if "their" node fails.
The Computer, which is scheduled to be put into full experimental operation at the end of 1997, will be used both as a large distributed computer and as a testbed for the unified user interface of the computer centers of the major Czech universities.

References

[1] URL: http://www.platform.com
[2] URL: http://www.transarc.com
[3] URL: http://web.mit.edu/kerberos/www/


1 Faculty of Informatics,
2 Institute of Computer Science
Masaryk University, Botanická 68a, 602 00 Brno, Czech Republic
E-mail: ludek@ics.muni.cz, eva@fi.muni.cz

Copyright EUNIS 1997 Y.E.