CSC495/591J Distributed Systems


Though distributed systems have been around for more than 20 years, only now is the discipline getting the attention and popularity it deserves. The widespread use of the Internet and World Wide Web technologies has made distributed systems a necessity in our everyday lives. For example, whenever you access a data file in your personal directory from a workstation or search for information on the Web, you are more than likely running a distributed system.

The primary objective of this course is to examine the state of the art and practice in distributed computing and to provide students with hands-on experience in developing distributed protocols. This course covers the fundamental principles of building distributed applications, including, but not limited to:

o Interprocess Communication (message passing, shared memory)
-- Message Passing (TCP/IP socket programming)
-- Shared Memory (threads)
-- Remote Procedure Calls (RPC)
-- Group Communications (ISIS)

o Synchronization
-- Clock synchronization
-- Mutual Exclusion
-- Resource Allocation
-- Distributed Agreement
-- Deadlock Handling

o Client/Server Model
-- File Systems
-- NFS (stateless)
-- AFS (stateful)

o Distributed Objects (CORBA)

o Security and Authentication
-- DES
-- Kerberos

o Fault Tolerance, Recovery and Replication


Late Homework Policy


OK, folks, change of plan. I have decided to be a little more lenient about late homework. You may turn in one HW up to ONE WEEK late free of charge. You pick which HW you turn in late without penalty (let's call it your *free* HW), and you can have _at most one_ free HW during the course. The normal deduction rule applies to all other late work: 5% of the homework score is deducted for each day it is late. The same rule applies to the free HW once it is more than one week late, in which case 5% times the homework score times (the number of days late - 7) is deducted. I will not accept any work that is more than two weeks late (it will get no credit). Because of this policy, I will not post HW solutions until two weeks after the due date.
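
To make the arithmetic concrete, here is a small C sketch of one reading of the rule above. It is not an official grading script: `late_hw_points` and its parameters are hypothetical names, and it assumes the 5% per day is taken from the raw score and that "more than two weeks" means more than 14 days.

```c
#include <assert.h>

/* Hypothetical helper: points earned on a homework with raw score
 * `score` turned in `days_late` days after the due date.  `is_free`
 * is nonzero for the one HW the student designates as free. */
double late_hw_points(double score, int days_late, int is_free)
{
    assert(score >= 0.0 && days_late >= 0);

    if (days_late > 14)          /* more than two weeks late: no credit */
        return 0.0;

    int penalized_days = days_late;
    if (is_free)                 /* free HW: the first 7 days cost nothing */
        penalized_days = days_late > 7 ? days_late - 7 : 0;

    /* deduct 5% of the homework score per penalized day */
    return score - 0.05 * score * penalized_days;
}
```

For example, a 100-point free HW turned in 10 days late loses 5% times 100 times (10 - 7) = 15 points, the same as an ordinary HW turned in 3 days late.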


Course Syllabus

hw1 and solution


TCP/IP Network Programming

example tcp/udp programs
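
The course's example programs are not reproduced here. As a hedged illustration of one point every TCP program must handle: TCP delivers a byte stream, not messages, so a single send() or recv() may transfer fewer bytes than requested and the caller has to loop. `send_all` and `recv_all` are hypothetical helper names; send() and recv() are the standard POSIX socket calls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Write exactly len bytes to a connected stream socket.
 * send() may accept fewer bytes than asked, so keep calling it.
 * Returns 0 on success, -1 on error (errno is set by send). */
int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)  /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes.  Returns 0 on success, -1 on error or if
 * the peer closed the connection before len bytes arrived. */
int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)              /* error, or orderly shutdown (n == 0) */
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The same helpers work on a socket returned by connect() or accept(). UDP is different: each recvfrom() returns (at most) one whole datagram, so there is no reassembly loop, but the receive buffer must be large enough or the datagram is truncated.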

hw2

Lecture 4 Slides (8/27/97)

hw3 (due 9/5/97)

Multithreads

pthread example program (9/3/97)
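
A minimal pthreads sketch of the classic shared-counter problem: several threads increment one variable, and the mutex makes each read-modify-write of `counter++` atomic so no updates are lost. `run_counter`, `worker`, and the 64-thread limit are hypothetical choices for illustration; the pthread calls are standard POSIX. Compile with `-pthread`.

```c
#include <assert.h>
#include <pthread.h>

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker adds `iterations` to the shared counter, one at a time. */
static void *worker(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                       /* critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Start nthreads workers (at most 64), wait for all of them, and
 * return the final counter value, or -1 if a thread could not start. */
long run_counter(int nthreads, long iterations)
{
    pthread_t tids[64];
    int created = 0;
    assert(nthreads > 0 && nthreads <= 64);

    counter = 0;
    for (; created < nthreads; created++)
        if (pthread_create(&tids[created], NULL, worker, &iterations) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);     /* wait for each worker to finish */
    return created == nthreads ? counter : -1;
}
```

Passing `&iterations` to every thread is safe here only because `run_counter` joins all workers before its local variable goes out of scope. Removing the lock/unlock pair typically makes the final count come out short.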

hw4 (due 9/12/97) and solution

Multithreaded Programming Guide (188 pages): PostScript (1.3 MB), PDF (0.4 MB)

More about pthreads (basic calls, FAQs)

hw5 (due 9/24/97)

Extra-credit HW for those who like a challenge, or who want to make up for credit lost on homework or the midterm (due 10/17/97)

Remote Procedure Call

remote procedure call example programs
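
An RPC client stub marshals the procedure number and arguments into a message in an agreed, machine-independent format, and the server stub unmarshals them before calling the real procedure. The sketch below hand-rolls this for a hypothetical `add(a, b)` procedure using 32-bit big-endian (network byte order) integers. `PROC_ADD`, `marshal_add`, and `unmarshal_add` are illustrative names only; this is not the Sun RPC/XDR interface that rpcgen generates, although XDR also encodes integers as 4-byte big-endian values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl, ntohl */

/* Hypothetical wire format for a call to add(a, b):
 *   [procedure number][a][b], each a 32-bit big-endian integer. */
#define PROC_ADD     1u
#define CALL_MSG_LEN 12

/* Client-stub side: pack the call into buf (CALL_MSG_LEN bytes). */
void marshal_add(uint8_t *buf, int32_t a, int32_t b)
{
    uint32_t words[3] = { htonl(PROC_ADD), htonl((uint32_t)a), htonl((uint32_t)b) };
    assert(sizeof words == CALL_MSG_LEN);
    memcpy(buf, words, sizeof words);
}

/* Server-stub side: unpack the arguments.
 * Returns -1 if the message names an unknown procedure. */
int unmarshal_add(const uint8_t *buf, int32_t *a, int32_t *b)
{
    uint32_t words[3];
    memcpy(words, buf, sizeof words);
    if (ntohl(words[0]) != PROC_ADD)
        return -1;
    *a = (int32_t)ntohl(words[1]);
    *b = (int32_t)ntohl(words[2]);
    return 0;
}
```

Because both sides agree on byte order, a little-endian client and a big-endian server decode the same values; that agreement is exactly what a stub generator automates.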

hw6 (due 10/15/97)

file example programs

IP Multicasting Information

IP multicast API information. This describes the multicast API (application programming interface) for sockets. It comes from Steve Deering's original multicast release README (1989) but is still valid.

IP Multicast Routing (by Semeria and Maufer).

IP Multicast Example Program (by Steve Deering)
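
As a sketch of the socket interface described in the README: a receiver joins a group by passing a `struct ip_mreq` (group address plus local interface) to `setsockopt()` with `IP_ADD_MEMBERSHIP` at level `IPPROTO_IP`. `is_multicast_addr` and `join_group` are hypothetical helpers. The `_DEFAULT_SOURCE` define is an assumption, included so that glibc still exposes `struct ip_mreq` if the compiler runs in a strict standards mode.

```c
#define _DEFAULT_SOURCE 1   /* assumption: exposes struct ip_mreq on glibc */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Nonzero if the dotted-quad string is a class D (224.0.0.0/4) address. */
int is_multicast_addr(const char *group)
{
    struct in_addr a;
    assert(group != NULL);
    if (inet_pton(AF_INET, group, &a) != 1)
        return 0;
    return (ntohl(a.s_addr) & 0xF0000000u) == 0xE0000000u;
}

/* Join `group` on the default interface for the UDP socket fd.
 * Returns 0 on success, -1 on a non-multicast address or if setsockopt
 * fails (for example, when there is no multicast-capable interface). */
int join_group(int fd, const char *group)
{
    struct ip_mreq mreq;
    if (!is_multicast_addr(group))
        return -1;
    memset(&mreq, 0, sizeof mreq);
    inet_pton(AF_INET, group, &mreq.imr_multiaddr);
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);   /* let the kernel pick */
    if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof mreq) != 0)
        return -1;
    return 0;
}
```

A receiver typically also bind()s the socket to the group's UDP port before calling recvfrom(). A sender needs no membership: it simply sends to the group address, optionally setting `IP_MULTICAST_TTL` to limit how far the datagrams travel.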

Group Communication

Please refer to your class notes.

CORBA

Please refer to your class notes.

Distributed File Systems

Lecture Notes I (PostScript file)
Lecture Notes II (PostScript file)

Synchronization

Clock Synchronization (PostScript file)
Mutual Exclusion (PostScript file)
Resource Allocation (PostScript file); also read the other slides handed out in class

Atomic Transactions

Serializability and Concurrency (PostScript file)
More concurrency control (PostScript file)
For more information about atomic transactions, see Bernstein et al. (on reserve in the library).
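
Two-phase locking (2PL) is one standard concurrency-control technique that guarantees conflict-serializable schedules; Bernstein et al. treat it in detail. The sketch shows only its two rules: read locks are compatible with each other while a write lock conflicts with everything, and a transaction may not acquire any lock after it has released one. All names are hypothetical, and there is no lock table, waiting, or deadlock handling.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { LOCK_READ, LOCK_WRITE } lock_mode;

/* Two locks on the same item conflict unless both are read locks. */
bool locks_conflict(lock_mode held, lock_mode requested)
{
    return held == LOCK_WRITE || requested == LOCK_WRITE;
}

/* Per-transaction 2PL state: a growing phase (acquire only) followed
 * by a shrinking phase (release only). */
typedef struct {
    int  locks_held;
    bool shrinking;     /* set once the transaction releases any lock */
} txn2pl;

/* Returns false if acquiring a lock now would violate 2PL. */
bool txn_acquire(txn2pl *t)
{
    if (t->shrinking)
        return false;
    t->locks_held++;
    return true;
}

void txn_release(txn2pl *t)
{
    assert(t->locks_held > 0);
    t->locks_held--;
    t->shrinking = true;  /* from now on, no new locks */
}
```

Holding every lock until commit (strict 2PL) is the common refinement, since it also prevents other transactions from reading uncommitted data.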

Fault Tolerance

How failures occur (PostScript file) (11/19/97)
Atomic Commitment (PostScript file, stolen from Maurice Herlihy). For more information about atomic commitment, see Bernstein et al. (on reserve in the library).
Replication (PostScript file)
Agreement (PostScript file) (11/24/97)
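
Atomic commitment is usually introduced with the two-phase commit protocol. In phase 1 the coordinator asks every participant to vote; in phase 2 it decides COMMIT only if every vote is yes, and ABORT otherwise. The sketch shows only the coordinator's decision rule. The type and function names are hypothetical, and messaging, stable-storage logging, and recovery are omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { VOTE_YES, VOTE_NO, VOTE_TIMEOUT } vote_t;
typedef enum { DECIDE_COMMIT, DECIDE_ABORT } decision_t;

/* Phase-2 decision of a two-phase commit coordinator: commit only if
 * every participant voted yes in phase 1.  A missing vote (timeout)
 * is treated as a no, which is safe because no participant has been
 * told to commit yet. */
decision_t tpc_decide(const vote_t *votes, size_t n)
{
    assert(votes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (votes[i] != VOTE_YES)
            return DECIDE_ABORT;
    return DECIDE_COMMIT;
}
```

Treating a timeout as a no vote is safe in phase 1. The protocol's well-known weakness comes later: a participant that has voted yes must block until it learns the coordinator's decision, even if the coordinator crashes.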


CSC495/591J References (required readings)

V. Jacobson, ``Congestion Avoidance and Control,'' Computer Communication Review, vol. 18, no. 4, pp. 314--329, Aug. 1988.

R. Stevens, ``TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms,'' Internet-Draft (draft-stevens-tcpca-spec-01.txt from ftp.isi.edu).

J. Saltzer, D. Reed, and D. Clark, ``End-to-End Arguments in System Design,'' ACM Transactions on Computer Systems, vol. 2, no. 4, Nov. 1984, pp. 277--288.

T. Anderson, B. Bershad, E. Lazowska, and H. Levy, ``Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism,'' ACM Transactions on Computer Systems, vol. 10, no. 1, Feb. 1992, pp. 53--79.

Ken Birman, ``The Process Group Approach to Reliable Distributed Computing,'' Communications of the ACM, vol. 36, no. 12, Dec. 1993, pp. 37--53.

Leslie Lamport, ``Time, Clocks, and the Ordering of Events in a Distributed System,'' Communications of the ACM, vol. 21, no. 7, July 1978.

D. Cheriton and D. Skeen, ``Understanding the Limitations of Causally and Totally Ordered Communication,'' Proceedings of the 1993 ACM Symposium on Operating Systems Principles, Asheville, NC, Dec 1993.

Ken Birman, ``A Response to Cheriton and Skeen's Criticism of Causal and Totally Ordered Communication,'' Operating Systems Review, Oct. 1993, pp. 11--21.

Barbara Liskov, Liuba Shrira, and John Wroclawski, ``Efficient At-Most-Once Messages Based on Synchronized Clocks,'' ACM Transactions on Computer Systems, vol. 9, no. 2, May 1991, pp. 125--142.

CSC495/591J Suggested Reading (where you can find more information)

Levy and Silberschatz, ``Distributed File Systems: Concepts and Examples,'' ACM Computing Surveys, vol. 22, pp. 321--374, Dec. 1990.

Satyanarayanan, ``Distributed File Systems,'' Chapter 14 of Distributed Systems, edited by Sape Mullender.

Bernstein, Hadzilacos, and Goodman, ``Concurrency Control and Recovery in Database Systems,'' Addison-Wesley, ISBN 0-201-10715-5. Good reading on atomic transactions and concurrency control.