Algorithms

We have seen that before a computer can perform a task, it must be given an algorithm telling it precisely what to do; consequently, the study of algorithms is the cornerstone of computer science. In this chapter we introduce many of the fundamental concepts of this study, including the issues of algorithm discovery and representation as well as the major control concepts of iteration and recursion. In so doing we also present a few well-known algorithms for searching and sorting. We begin by reviewing the concept of an algorithm.

The Concept of an Algorithm

In the introductory chapter we informally defined an algorithm as a set of steps that define how a task is performed. In this section we look more closely at this fundamental concept.

An Informal Review

We have encountered a multitude of algorithms in our study. We have found algorithms for converting numeric representations from one form to another, detecting and correcting errors in data, compressing and decompressing data files, controlling multiprogramming in a multitasking environment, and many more. Moreover, we have seen that the machine cycle followed by a CPU is nothing more than the following simple algorithm:
As long as the halt instruction has not been executed, continue to execute the following steps:
a. Fetch an instruction.
b. Decode the instruction.
c. Execute the instruction.
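The machine cycle above can be transcribed directly as a loop. The following sketch uses a tiny invented instruction set (LOAD, ADD, HALT) and a single accumulator; these are assumptions for illustration, not the instruction set of any real CPU.

```python
# A minimal sketch of the machine cycle as a loop. The instruction set,
# memory layout, and register structure here are invented for illustration.

def run(program):
    """Execute a tiny program: each instruction is an (opcode, operand) pair."""
    acc = 0   # a single accumulator register
    pc = 0    # program counter
    while True:
        instruction = program[pc]       # a. fetch an instruction
        opcode, operand = instruction   # b. decode the instruction
        pc += 1
        if opcode == "HALT":            # loop ends when halt is executed
            break
        elif opcode == "LOAD":          # c. execute the instruction
            acc = operand
        elif opcode == "ADD":
            acc += operand
    return acc

print(run([("LOAD", 5), ("ADD", 3), ("HALT", None)]))  # → 8
```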
As demonstrated by the algorithm describing a magic trick, algorithms are not restricted to technical activities. Indeed, they underlie even such mundane activities as shelling peas:
Obtain a basket of unshelled peas and an empty bowl.
As long as there are unshelled peas in the basket continue to execute the following steps:
a. Take a pea from the basket.
b. Break open the pea pod.
c. Dump the peas from the pod into the bowl.
d. Discard the pod.
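The pea-shelling algorithm has exactly the shape of a while loop, and transcribing it as one makes the correspondence plain. Representing a pod simply as a list of peas is an assumption made for illustration.

```python
# The pea-shelling algorithm, transcribed directly as a loop.
# Each pod is modeled as a list of peas (an assumption for illustration).

def shell_peas(basket):
    bowl = []                  # obtain an empty bowl
    while basket:              # as long as there are unshelled peas in the basket
        pod = basket.pop()     # a. take a pea pod from the basket
        peas = list(pod)       # b. break open the pea pod
        bowl.extend(peas)      # c. dump the peas from the pod into the bowl
                               # d. discard the pod (it simply goes out of scope)
    return bowl

print(shell_peas([[1, 2, 3], [4, 5]]))  # → [4, 5, 1, 2, 3]
```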
In fact, many researchers believe that every activity of the human mind, including imagination, creativity, and decision making, is actually the result of algorithm execution—a conjecture we will revisit in our study of artificial intelligence.
But before we proceed further, let us consider the formal definition of an algorithm.

The Formal Definition of an Algorithm

Informal, loosely defined concepts are acceptable and common in everyday life, but a science must be based on well-defined terminology. Consider, then, the formal definition of an algorithm: an algorithm is an ordered set of unambiguous, executable steps that defines a terminating process.
Note that the definition requires that the set of steps in an algorithm be ordered. This means that the steps in an algorithm must have a well-established structure in terms of the order of their execution. This does not mean, however, that the steps must be executed in a sequence consisting of a first step, followed by a second, and so on. Some algorithms, known as parallel algorithms, contain more than one sequence of steps, each designed to be executed by different processors in a multiprocessor machine. In such cases the overall algorithm does not possess a single thread of steps that conforms to the first-step, second-step scenario. Instead, the algorithm’s structure is that of multiple threads that
branch and reconnect as different processors perform different parts of the overall task. (We will revisit this concept later.) Other examples include algorithms executed by circuits such as the flip-flop, in which each gate performs a single step of the overall algorithm. Here the steps are ordered by cause and effect, as the action of each gate propagates throughout the circuit.
Next, consider the requirement that an algorithm must consist of executable steps. To appreciate this condition, consider the instruction
Make a list of all the positive integers
which would be impossible to perform because there are infinitely many positive integers. Thus any set of instructions involving this instruction would not be
an algorithm. Computer scientists use the term effective to capture the concept of being executable. That is, to say that a step is effective means that it is doable. Another requirement imposed by the definition is that the steps in an algorithm be unambiguous. This means that during execution of an
algorithm, the information in the state of the process must be sufficient to determine uniquely and completely the actions required by each step. In other words, the execution of each step in an algorithm does not require creative skills.
Rather, it requires only the ability to follow directions. Finally, the definition requires that an algorithm define a terminating
process, which means that the execution of an algorithm must lead to an end. The origin of this requirement is in theoretical computer science, where the goal is to answer such questions as “What are the ultimate limitations of algorithms and machines?” Here computer science seeks to distinguish between problems whose answers can be obtained algorithmically and problems whose
answers lie beyond the capabilities of algorithmic systems. In this context, a line is drawn between processes that culminate with an answer and those that merely proceed forever without producing a result.
There are, however, meaningful applications for nonterminating processes, including monitoring the vital signs of a hospital patient and maintaining an aircraft’s altitude in flight. Some would argue that these applications involve merely the repetition of algorithms, each of which reaches an end and then automatically repeats. Others would counter that such arguments are simply attempts to cling to an overly restrictive formal definition. In any case, the result is that the term algorithm is often used in applied, or informal, settings in reference to sets of steps that do not necessarily define terminating processes. An example is the long-division “algorithm,” which does not define a terminating process for dividing 1 by 3. Technically, such instances represent misuses of the term.

World Wide Web

In this section we focus on an Internet application by which multimedia information is disseminated over the Internet. It is based on the concept of hypertext, a term that originally referred to text documents that contained links, called hyperlinks, to other documents. Today, hypertext has been expanded to encompass images, audio, and video, and because of this expanded scope it is sometimes
referred to as hypermedia.
When using a GUI, the reader of a hypertext document can follow the hyperlinks associated with it by pointing and clicking with the mouse. For example, suppose the sentence “The orchestra’s performance of ‘Bolero’ by Maurice Ravel was outstanding” appeared in a hypertext document and the name Maurice Ravel was linked to another document—perhaps giving information about the
composer. A reader could choose to view that associated material by pointing to the name Maurice Ravel with the mouse and clicking the mouse button. Moreover, if the proper hyperlinks are installed, the reader might listen to an audio recording of the concert by clicking on the name Bolero.
In this manner, a reader of hypertext documents can explore related documents or follow a train of thought from document to document. As portions of various documents are linked to other documents, an intertwined web of related information is formed. When implemented on a computer network, the documents within such a web can reside on different machines, forming a networkwide
web. The web that has evolved on the Internet spans the entire globe and is known as the World Wide Web (also referred to as WWW, W3, or the Web). A hypertext document on the World Wide Web is often called a Web page.
A collection of closely related Web pages is called a Web site. The World Wide Web had its origins in the work of Tim Berners-Lee, who realized the potential of combining the linked-document concept with Internet technology and produced the first software for implementing the WWW in December of 1990.

Web Implementation

Software packages that allow users to access hypertext on the Internet fall into one of two categories: packages that play the role of clients, and packages that play the role of servers. A client package resides on the user’s computer and is charged with the tasks of obtaining materials requested by the user and presenting these materials to the user in an organized manner. It is the client that
provides the user interface that allows a user to browse within the Web. Hence the client is often referred to as a browser, or sometimes as a Web browser.
The server package (often called a Web server) resides on a computer containing hypertext documents to be accessed. Its task is to provide access to the documents under its control as requested by clients. In summary, a user gains access to hypertext documents by means of a browser residing on the user’s computer. This browser, playing the role of a client, obtains the documents by soliciting the services of the Web servers scattered throughout the Internet. Hypertext documents are normally transferred between browsers and Web servers using a protocol known as the Hypertext Transfer Protocol (HTTP).
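The exchange between browser and Web server under HTTP can be made visible by writing out the text of a request by hand. The sketch below builds a minimal HTTP/1.1 GET request; a real browser constructs such a request (and much more) behind the scenes before sending it to the server.

```python
# A sketch of the request a browser sends under HTTP. The host and path
# used in the example are from the URL discussed in this section.

def http_get_request(host, path):
    """Return the text of a minimal HTTP/1.1 GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"   # method, document, protocol version
        f"Host: {host}\r\n"          # which server the request is addressed to
        f"Connection: close\r\n"     # ask the server to close the connection afterward
        f"\r\n"                      # blank line ends the request headers
    )

print(http_get_request("ssenterprise.aw.com",
                       "/authors/Shakespeare/Julius_Caesar.html"))
```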
In order to locate and retrieve documents on the World Wide Web, each document is given a unique address called a Uniform Resource Locator (URL). Each URL contains the information needed by a browser to contact the proper server and request the desired document. Thus to view a Web page, a person first provides his or her browser with the URL of the desired document and then
instructs the browser to retrieve and display the document.
A typical URL is presented in Figure 4.8. It consists of four segments: the protocol to use to communicate with the server controlling access to the document, the mnemonic address of the machine containing the server, the directory path needed for the server to find the directory containing the document, and the name of the document itself. In short, the URL in Figure 4.8
tells a browser to contact the Web server on the computer known as ssenterprise.aw.com using the protocol HTTP and to retrieve the document named Julius_Caesar.html found within the subdirectory Shakespeare within the directory called authors.
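The four URL segments just described can be separated mechanically. The following sketch uses Python's standard urllib to pull apart the URL from Figure 4.8; note that the directory path and document name together form the path component.

```python
# Splitting the example URL into the segments described in the text.

from urllib.parse import urlparse

url = "http://ssenterprise.aw.com/authors/Shakespeare/Julius_Caesar.html"
parts = urlparse(url)

print(parts.scheme)   # protocol to use: http
print(parts.netloc)   # mnemonic address of the machine: ssenterprise.aw.com
print(parts.path)     # directory path and document name:
                      #   /authors/Shakespeare/Julius_Caesar.html
```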
Sometimes a URL might not explicitly contain all the segments shown in Figure 4.8. For example, if the server does not need to follow a directory path to reach the document, no directory path will appear in the URL. Moreover,
sometimes a URL will consist of only a protocol and the mnemonic address of a
computer. In these cases, the Web server at that computer will return a predetermined document, typically called a home page, that usually describes the information available at that Web site. Such shortened URLs provide a simple means of contacting organizations. For example, the URL http://www. google.com will lead to the home page of Google, which contains hyperlinks to
the services, products, and documents relating to the company.
To further simplify locating Web sites, many browsers assume that the HTTP protocol should be used if no protocol is identified. These browsers correctly retrieve the Google home page when given the “URL” consisting merely of www.google.com.
HTML
A traditional hypertext document is similar to a text file because its text is encoded character by character using a system such as ASCII or Unicode. The distinction is that a hypertext document also contains special symbols, called tags, that describe how the document should appear on a display screen, what multimedia resources (such as images) should accompany the document, and which
items within the document are linked to other documents. This system of tags is known as Hypertext Markup Language (HTML). Thus, it is in terms of HTML that an author of a Web page describes the
information that a browser needs in order to present the page on the user’s screen and to find any related documents referenced by the current page. The process is analogous to adding typesetting directions to a plain typed text (perhaps using a red pen) so that a typesetter will know how the material should appear in its final form. In the case of hypertext, the red markings are replaced
by HTML tags, and a browser ultimately plays the role of the typesetter, reading the HTML tags to learn how the text is to be presented on the computer screen.
Consider the HTML-encoded version (called the source version) of an extremely simple Web page. The tags are delineated by the symbols < and >. The HTML source document consists of two sections: a head (surrounded by the <head> and </head> tags) and a body (surrounded by the <body> and </body> tags). The distinction between the head and body of a Web page is similar to that of the head and body of an interoffice memo. In both cases, the head contains preliminary information about the document (date, subject,
etc. in the case of a memo). The body contains the meat of the document, which in the case of a Web page is the material to be presented on the computer screen when the page is displayed.
The head of the Web page contains only the title of the document (surrounded by the <title> and </title> tags). This title is only for documentation purposes; it is not part of the page that is to be displayed on the computer screen. The material that is displayed on the screen is contained in the body of the document.
The first entry in the body of the document is a level-one heading (surrounded by the <h1> and </h1> tags) containing the text “My Web Page.” Being a level-one heading means that the browser should display this text prominently on the screen. The next entry in the body is a paragraph of text (surrounded by the <p> and </p> tags) containing the text “Click here for another page.” A browser reading this source would present the page accordingly on a computer screen.
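Assembling the pieces just described, the source of such a page might look as follows. The title text here is a placeholder; the heading and paragraph contents are those described above.

```html
<html>
  <head>
    <title>demonstration page</title>
  </head>
  <body>
    <h1>My Web Page</h1>
    <p>Click here for another page.</p>
  </body>
</html>
```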

The Internet

Browser: A program used on a computer connected to the Internet that provides access to the World Wide Web. The browser translates the documents stored on the World Wide Web into a format you can read on your screen.
Download: To transfer a file from a remote computer to your computer through a modem and a telephone line, cable, or wirelessly.
E-mail (Electronic Mail): Electronic mail messages you send over the Internet from one computer to another.
FAQ (Frequently Asked Questions): On a Web site, a list of questions commonly asked by users and the answers to those questions to assist people using the site.
Internet: A worldwide system of linked computers that allows users to send and receive e-mail and documents from one computer to another.
URL (Uniform Resource Locator): The address of a Web page on the World Wide Web.

Web (World Wide Web or WWW): A subset of the Internet that allows people to view documents called Web pages using a browser. You can click items called links on a Web page to open another Web page anywhere on the Web identified by the link.
The most notable example of an internet is the Internet (note the uppercase I ), which originated from research projects going back to the early 1960s. The goal was to develop the ability to link a variety of computer networks so that they could function as a connected system that would not be disrupted by local disasters. Much of this work was sponsored by the U.S. government through the Defense Advanced Research Projects Agency (DARPA—pronounced “DAR–pa”). Over the years, the development of the Internet shifted from a government-sponsored project to an academic research project, and today it is largely a commercial undertaking that links a worldwide combination of LANs, MANs, and WANs
involving millions of computers.
Internet Architecture
As we have already mentioned, the Internet is a collection of connected networks. In general, these networks are constructed and maintained by organizations called Internet Service Providers (ISPs). It is also customary to use the term ISP in reference to the networks themselves. Thus, we will speak of connecting to an ISP, when what we really mean is connecting to the network provided by an ISP.
The system of networks operated by the ISPs can be classified in a hierarchy according to the role they play in the overall Internet structure. At the top of this hierarchy are relatively few tier-1 ISPs that consist of very high-speed, high-capacity, international WANs. These networks are thought of as the backbone of the Internet. They are typically operated by large companies that are in the communications business. An example would be a company that
originated as a traditional telephone company and has expanded its scope into providing other communication services.
Connecting to the tier-1 ISPs are the tier-2 ISPs that tend to be more regional in scope and less potent in their capabilities. (The distinction between the tier-1 and tier-2 ISPs is often a matter of opinion.) Again, these networks tend to be operated by companies in the communications business. Tier-1 and tier-2 ISPs are essentially networks of routers that collectively provide the Internet’s communication infrastructure. As such, they can be thought of as the core of the Internet. Access to this core is usually provided by an intermediary called an access ISP. An access ISP is essentially an independent internet,
sometimes called an intranet, operated by a single authority that is in the business of supplying Internet access to individual users. Examples include companies such as AOL, Microsoft, and local cable and telephone companies that charge for their service as well as organizations such as universities or corporations that take it upon themselves to provide Internet access to individuals within their organizations.
The devices that individual users connect to the access ISPs are known as end systems or hosts. These end systems are not necessarily computers in the traditional sense. They range over a multitude of devices including telephones, video cameras, automobiles, and home appliances. After all, the Internet is essentially a communications system, and thus any device that would benefit from communicating with other devices is a potential end system.

Networking And the Internet

The need to share information and resources among different computers has led to linked computer systems, called networks, in which computers are connected so that data can be transferred from machine to machine. In these networks, computer users can exchange messages and share resources—such as printing capabilities, software packages, and data storage facilities—that are scattered throughout the system. The underlying software required to support such applications has grown from simple utility packages into an expanding system of network software that provides a sophisticated network-wide infrastructure. In a sense, network software is evolving into a network-wide operating system. In this chapter we will explore this expanding field of computer science.

Network Fundamentals

We begin our study of networks by introducing a variety of basic networking concepts.
Network Classifications
A computer network is often classified as being either a local area network (LAN), a metropolitan area network (MAN), or a wide area network (WAN). A LAN normally consists of a collection of computers in a single building or building complex. For example, the computers on a university campus or those in a manufacturing plant might be connected by a LAN. A MAN is a network of intermediate size, such as one spanning a local community. A WAN links machines over a greater distance—perhaps in neighboring cities or on opposite sides of the world.
Another means of classifying networks is based on whether the network’s internal operation is based on designs that are in the public domain or on innovations owned and controlled by a particular entity such as an individual or a corporation. A network of the former type is called an open network; a network of the latter type is called a closed, or sometimes a proprietary, network. Open network designs are freely circulated and often grow in popularity to the point that they ultimately prevail over proprietary approaches whose applications are restricted by license fees and contract conditions.
The Internet is an open system. In particular, communication throughout the Internet is governed by an open collection of standards known as the TCP/IP protocol suite, which is the subject of Section 4.4. Anyone is free to use these standards without paying fees or signing license agreements. In contrast, a company such as Novell Inc. might develop proprietary systems for which it chooses to maintain ownership rights, allowing the company to draw income from selling or leasing these products. Still another way of classifying networks is based on the topology of the network, which refers to the pattern in which the machines are connected. Two of the more popular topologies are the bus, in which the machines are all connected to a common communication line called a bus, and the star, in which one machine serves as a central focal point to which all the others are connected. The bus topology was popularized in the 1990s when it was implemented under a set of standards known as Ethernet, and Ethernet networks remain one of the most popular networking systems in use today.
The star topology has roots as far back as the 1970s. It evolved from the paradigm of a large central computer serving many users. As the simple terminals employed by these users grew into small computers themselves, a star network emerged. Today, the star configuration is popular in wireless networks where communication is conducted by means of radio broadcast and the central machine, called the access point (AP), serves as a focal point around which all communication is coordinated.
The difference between a bus network and a star network is not always obvious from the physical arrangement of equipment. The distinction is whether the machines in the network envision themselves as communicating directly with each other over a common bus or indirectly through an intermediary central machine. For instance, a bus network might not appear as a long bus
from which computers are connected over short links. Instead, it may have a very short bus with long links to the individual machines, meaning that the network would look more like a star. Indeed,
sometimes a bus network is created by running links from each computer to a central location where they are connected to a device called a hub. This hub is little more than a very short bus. All it does is relay any signal it receives (with perhaps some amplification) back out to all the machines connected to it. The result is a network that looks like a star network although it operates like a bus network.
Protocols
For a network to function reliably, it is important to establish rules by which activities are conducted. Such rules are called protocols. By developing and adopting protocol standards, vendors are able to build products for network applications that are compatible with products from other vendors. Thus, the development of protocol standards is an indispensable process in the development of networking
technologies. As an introduction to the protocol concept, let us consider the problem of coordinating the transmission of messages among computers in a network. Without rules governing this communication, all the computers might insist on transmitting messages at the same time or fail to assist other machines when that assistance is required.
In a bus network based on the Ethernet standards, the right to transmit messages is controlled by the protocol known as Carrier Sense, Multiple Access with Collision Detection (CSMA/CD). This protocol dictates that each message be broadcast to all the machines on the bus (Figure 4.2). Each machine monitors all the messages but keeps only those addressed to itself. To transmit a message, a machine waits until the bus is silent, and at this time it begins transmitting while continuing to monitor the bus. If another machine also begins transmitting, both machines detect the clash and pause for a brief, independently random period of time before trying to transmit again. The result is a system similar to that used by a small group of people in a conversation. If two people start to talk at once, they both stop. The difference is that people might go through a series such as, “I’m sorry, what were you going to say?”, “No, no. You go first,” whereas under the CSMA/CD protocol each machine merely tries again later. 
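The retry rule at the heart of CSMA/CD can be sketched as a toy simulation: after a collision, each machine independently picks a random waiting period before its next attempt. Time here is just an abstract counter, and the backoff range is an invented parameter; real Ethernet hardware measures these intervals in microseconds and adjusts the range after repeated collisions.

```python
# A toy simulation of the CSMA/CD retry rule: colliding machines each wait
# an independently random period before trying to transmit again.

import random

def next_attempt_times(collision_time, machines=2, max_backoff=10):
    """After colliding at `collision_time`, pick each machine's retry time."""
    return [collision_time + random.randint(1, max_backoff)
            for _ in range(machines)]

random.seed(1)
retries = next_attempt_times(collision_time=0)
while retries[0] == retries[1]:     # both picked the same time: collide again
    retries = next_attempt_times(collision_time=max(retries))
print("machines transmit at times:", retries)
```

Because the waiting periods are chosen independently, the chance that the machines collide repeatedly shrinks with every retry.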
Note that CSMA/CD is not compatible with wireless star networks in which all machines communicate through a central AP. This is because a machine may be unable to detect that its transmissions are colliding with those of another. For example, the machine may not hear the other because its own signal drowns out that of the other machine. Another cause might be that the signals from the different machines are blocked from each other by objects or distance even though
they can all communicate with the central AP (a condition known as the hidden terminal problem). The result is that wireless networks adopt the policy of trying to avoid collisions rather than trying to detect them. Such policies are classified as Carrier Sense, Multiple Access with Collision Avoidance
(CSMA/CA), many of which are standardized by IEEE within the protocols defined in IEEE 802.11 and commonly referred to as WiFi. We emphasize that collision avoidance protocols are designed to avoid collisions and may not eliminate them completely. When collisions do occur, messages must be retransmitted.
The most common approach to collision avoidance is based on giving advantage to machines that have already been waiting for an opportunity to transmit. The protocol used is similar to Ethernet’s CSMA/CD. The basic difference is that when a machine first needs to transmit a message and finds the communication channel silent, it does not start transmitting immediately. Instead, it waits for a
short period of time and then starts transmitting only if the channel has remained silent throughout that period. If a busy channel is experienced during this process, the machine waits for a randomly determined period before trying again. Once this period is exhausted, the machine is allowed to claim a silent channel without hesitation. This means that collisions between “newcomers” and those that have already been waiting are avoided because a “newcomer” is not allowed to claim a silent channel until any machine that has been waiting is given the opportunity to start.

Operating Systems

An operating system is the software that controls the overall operation of a computer. It provides the means by which a user can store and retrieve files, provides the interface by which a user can request the execution of programs, and provides the environment necessary to execute the programs requested. Perhaps the best known example of an operating system is Windows, which is provided in numerous versions by Microsoft and widely used in the PC arena. Another well-established example is UNIX, which is a popular choice for larger computer systems as well as PCs. In fact, UNIX is the core of two other popular operating systems: Mac OS, which is the operating system provided by Apple for its range of Mac machines, and Solaris, which was developed by Sun Microsystems (now owned by Oracle). Still another example of an operating system found on both large and small machines is Linux, which was originally developed noncommercially by computer enthusiasts and is now available through many commercial sources, including IBM. For casual computer users, the differences between operating systems are largely cosmetic. For computing professionals, different operating systems can represent major changes in the tools they work with or the philosophy they follow in disseminating and maintaining their work. Nevertheless, at their core all mainstream operating systems address the same kinds of problems that computing experts have faced for more than half a century.

The History of Operating Systems

Today’s operating systems are large, complex software packages that have grown from humble beginnings. The computers of the 1940s and 1950s were not very flexible or efficient. Machines occupied entire rooms. Program execution required significant preparation of equipment in terms of mounting magnetic tapes, placing punched cards in card readers, setting switches, and so on. The execution of each program, called a job, was handled as an isolated activity—the machine was prepared for executing the program, the program was executed, and then all the tapes, punched cards, etc. had to be retrieved before the next program preparation could begin. When several users needed to share a machine, sign-up sheets were provided so that users could reserve the machine for blocks of time. During the time period allocated to a user, the machine was totally under that user’s control. The session usually began with program setup, followed by short periods of program execution. It was often completed in a hurried effort to do just one more thing (“It will only take a minute”) while the next user was impatiently starting to set up.
In such an environment, operating systems began as systems for simplifying program setup and for streamlining the transition between jobs. One early development was the separation of users and equipment, which eliminated the physical transition of people in and out of the computer room. For this purpose a computer operator was hired to operate the machine. Anyone wanting a program run was required to submit it, along with any required data and special directions about the program’s requirements, to the operator and return later for the results. The operator, in turn, loaded these materials into the machine’s mass storage where a program called the operating system could read and execute them one at a time. This was the beginning of batch processing—the execution of jobs by collecting them in a single batch, then executing them without further interaction with the user.
In batch processing systems, the jobs residing in mass storage wait for execution in a job queue. A queue is a storage organization in which objects (in this case, jobs) are ordered in first-in, first-out (abbreviated FIFO and pronounced “FI-foe”) fashion. That is, the objects are removed from the queue in the order in which they arrived. In reality, most job queues do not rigorously follow the FIFO structure, since most operating systems provide for consideration of job priorities. As a result, a job waiting in the job queue can be bumped by a higher-priority job.
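A job queue that honors priorities, as described above, can be sketched with Python's heapq module. Jobs normally leave in FIFO order, but a higher-priority job bumps those already waiting; the arrival counter breaks ties so that equal-priority jobs still leave first-in, first-out. The job names and priority scheme are invented for illustration.

```python
# A sketch of a priority-aware job queue: FIFO among equal priorities,
# but a higher-priority job (smaller number) jumps ahead of waiting jobs.

import heapq
import itertools

class JobQueue:
    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()   # tie-breaker preserves FIFO order

    def submit(self, name, priority=1):     # smaller number = higher priority
        heapq.heappush(self._heap, (priority, next(self._arrival), name))

    def next_job(self):
        priority, _, name = heapq.heappop(self._heap)
        return name

queue = JobQueue()
queue.submit("payroll")
queue.submit("report")
queue.submit("emergency-backup", priority=0)   # bumps the jobs already waiting
print(queue.next_job())   # → emergency-backup
print(queue.next_job())   # → payroll
print(queue.next_job())   # → report
```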
In early batch processing systems, each job was accompanied by a set of instructions explaining the steps required to prepare the machine for that particular job. These instructions were encoded, using a system known as a job control language (JCL), and stored with the job in the job queue. When the job was selected for execution, the operating system printed these instructions at a printer where they could be read and followed by the computer operator. This communication between the operating system and the computer operator is still seen today, as witnessed by PC operating systems that report such errors as “disk drive not accessible” and “printer not responding.”
A major drawback to using a computer operator as an intermediary between a computer and its users is that the users have no interaction with their jobs once they are submitted to the operator. This approach is acceptable for some applications, such as payroll processing, in which the data and all processing decisions are established in advance. However, it is not acceptable when the user must interact with a program during its execution. Examples include reservation systems in which reservations and cancellations must be reported as they occur; word processing systems in which documents are developed in a dynamic write and rewrite manner; and computer games in which interaction with the machine is the central feature of the game. To accommodate these needs, new operating systems were developed that allowed a program being executed to carry on a dialogue with the user through remote terminals—a feature known as interactive processing. (A terminal consisted of little more than an electronic typewriter by which the user could type input and read the computer’s response that was printed on paper. Today terminals have evolved into more sophisticated devices called workstations and even into complete PCs that can function as stand-alone computers when desired.) Paramount to successful interactive processing is that the actions of the computer be sufficiently fast to coordinate with the needs of the user rather than forcing the user to conform to the machine’s timetable. (The task of processing payroll can be scheduled to conform to the amount of time required by the computer, but using a word processor would be frustrating if the machine did not respond promptly as characters are typed.) In a sense, the computer is forced to execute tasks under a deadline, a process that became known as real-time processing in which the actions performed are said to occur in real-time. 
That is, to say that a computer performs a task in real time means that the computer performs the task in accordance with deadlines in its (external real-world) environment.
If interactive systems had been required to serve only one user at a time, real-time processing would have been no problem. But computers in the 1960s and 1970s were expensive, so each machine had to serve more than one user. In turn, it was common for several users, working at remote terminals, to seek interactive service from a machine at the same time, and real-time considerations presented obstacles. If the operating system insisted on executing only one job at a time, only one user would receive satisfactory real-time service.
The solution to this problem was to design operating systems that provided service to multiple users at the same time: a feature called time-sharing. One means of implementing time-sharing is to apply the technique called multiprogramming in which time is divided into intervals and then the execution
of each job is restricted to only one interval at a time. At the end of each interval, the current job is temporarily set aside and another is allowed to execute during the next interval. By rapidly shuffling the jobs back and forth in this manner, the illusion of several jobs executing simultaneously is created. Depending on the types of jobs being executed, early time-sharing systems were able to provide acceptable real-time processing to as many as 30 users simultaneously. Today, multiprogramming techniques are used in single-user as well as multiuser systems, although in the former the result is usually called multitasking. That is, time-sharing refers to multiple users sharing access to a common computer, whereas multitasking refers to one user executing numerous tasks simultaneously.
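The interval-shuffling just described can be sketched as a round-robin loop: each job runs for one interval, is set aside, and the next job takes its turn. Jobs here are reduced to counters of remaining intervals; a real operating system would save and restore complete process state at every switch.

```python
# A sketch of multiprogramming by time slices: each job gets one interval,
# then is set aside so the next job can run.

from collections import deque

def time_share(jobs):
    """jobs: dict of name -> intervals of work needed. Returns finish order."""
    ready = deque(jobs.items())
    finished = []
    while ready:
        name, remaining = ready.popleft()    # give the next job one interval
        remaining -= 1
        if remaining == 0:
            finished.append(name)            # job completed
        else:
            ready.append((name, remaining))  # set aside until its next turn
    return finished

print(time_share({"editor": 2, "compiler": 3, "mail": 1}))
# → ['mail', 'editor', 'compiler']
```

Because every job gets an interval in turn, short jobs finish quickly and no job is starved, which is what creates the illusion of simultaneous execution for interactive users.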
