
Software Design Using C++



Files (Streams)



Introduction


Files are used to store data in a relatively permanent form, on floppy disk, hard disk, tape or other form of secondary storage. Files can hold huge amounts of data if need be. Ordinary variables (even records and arrays) are kept in main memory which is temporary and rather limited in size. The following is a comparison of the two types of storage:
  • Main memory
    • Made up of RAM chips.
    • Used to hold a program when it is running, including the values of its variables (whether integer, char, an array, etc.)
    • Can only hold relatively small amounts of data.
    • Is temporary (as soon as the program is done or the power goes out all of these values are gone).
    • Gives fast access to the data (all electronic).
  • Secondary memory
    • Usually a disk drive (or magnetic tape).
    • Used to hold files (where a file can contain data, a program, text, etc.)
    • Can hold rather large amounts of data.
    • Is fairly permanent. (A file remains even if the power goes out. It will last until you erase it, as long as the disk isn't damaged, at least.)
    • Access to the data is considerably slower (due to moving parts).

Types of File Access:

  • Sequential access. With this type of file access one must read the data in order, much like with a tape, whether the data is really stored on tape or not.
  • Random access (or direct access). This type of file access lets you jump to any location in the file, then to any other, etc., all in a reasonable amount of time.

Types of Files


We can also talk about the type of file. The two basic types are text and binary. A text file consists of readable characters separated into lines by newline characters. On Windows PCs, the end of a line is actually represented by the two-character sequence of carriage return (ASCII 13) followed by line feed (ASCII 10); both numbers are given here in decimal form, not octal or hexadecimal. On UNIX and Linux systems, the end of a line is represented by a single character, line feed. A binary file stores data to disk in the same form in which it is represented in main memory. Thus numbers are not converted to readable characters as with a text file. If you ever try to edit a binary file containing numbers you will see that the numbers appear as nonsense characters. Not having to translate numbers into a readable form makes binary files somewhat more efficient. Binary files also do not normally use anything to separate the data into lines. Such a file is just a stream of bytes with nothing in particular to separate components.

Text Files:


This type of file is very common. A text file is essentially a stream of characters, with some special character (or characters) to mark the end of each line. Note that your source code files are themselves text files. Also, you can create text files using almost any editor. Word processors store documents as binary files, but can often also do a Save As to save a document as a text file, should you wish to do so. Input from the keyboard works much the same as input from a text file, and output to the screen works much the same as output to a text file. For example, you can use cout << endl; to advance to the next line on the screen, though underneath what it really does is output an end-of-line marker and flush the stream.

A Program to Read a Text File

Look at readtext.cpp. No matter how you have come up with a text file, this program should be able to read it. However, it does assume that the lines contain fewer than 120 characters each. Note that nowhere in the code is there an indication that it is reading a text file. That is because text files are the default type of file. Also, note that we need to include the fstream header in a program that manipulates files.

The first important thing that is done is to open the stream InFile as shown below. FileName is a string containing the external name of the file, that is, the name it is known by on disk. The ios::in indicates that we are opening the file for input. If the open should fail, we write out an error message. Note that it is common usage to write error messages to the cerr stream and not to cout. That is because most operating systems (Linux and Windows among them) allow the normal output of a program to be redirected to a file separately from its error output. Thus, if cout were redirected to a file, error messages written to cerr would still show up on the screen.


fstream InFile;

InFile.open(FileName, ios::in);
if (InFile.fail())
   {
   cerr << "Could not open file " << FileName << endl;
   exit(1);
   }

The main function then calls the DisplayFile function to do the work of displaying the file's contents on screen. Note that the file stream must be passed as a reference parameter. See below for a simplified version of the DisplayFile function.


/* Given:   InFile   A file stream already opened for input.
   Assumes: That InFile is associated with a text file.
   Task:    To display on screen the contents of the file.
   Return:  InFile   The modified file stream.
*/
void DisplayFile(fstream & InFile)
   {
   StringType Line;

   InFile.getline(Line, MaxString);

   while (! InFile.fail())
      {
      cout << Line << endl;
      InFile.getline(Line, MaxString);
      }
   }

You can see that the getline function can be used to read a string from the InFile stream in much the same way as it can be used to read a string from cin. This function was used before in the Array of Characters section of the Introductory Topics portion of these web pages, as well as in the Character Arrays section of the Intermediate Topics portion of these web pages. Note that the fail function can be used to tell us when reading from InFile fails, probably because we have reached the end of the file.
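
The same read-until-failure loop can also be written with std::string, which removes the fixed limit on line length. The following is only a sketch of that alternative, not the code in readtext.cpp; the function name ReadLines and the choice to return the lines in a vector are assumptions made for illustration.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every line of the named text file into a vector.
// Returns an empty vector if the file cannot be opened.
std::vector<std::string> ReadLines(const std::string & FileName)
   {
   std::vector<std::string> Lines;
   std::ifstream InFile(FileName);   // an ifstream is opened for input by default
   std::string Line;

   // std::getline returns the stream, which tests as false once a read fails
   // (normally because the end of the file has been reached).
   while (std::getline(InFile, Line))
      Lines.push_back(Line);

   return Lines;   // InFile is closed when it goes out of scope
   }
```

A caller could then print each entry of the returned vector with cout, just as DisplayFile prints each Line.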

The last important thing that is done is to close the InFile stream. Files should always be closed when you are finished using them. Since keeping a file open consumes some system resources, get into the habit of closing a file as soon as you are done with it. The close indicates that the program is finished accessing the file at least for the moment. (The program could open it again later if needed.)


InFile.close();

A Program to Create a Text File

The maketext.cpp example shows one way to do this. This program again handles the data one line at a time. The file stream is called OutFile, and the program begins by asking the user for the name of the file and then opening OutFile as an output file, using the user-supplied filename. The ios::out is what indicates that we want an output file. If the open fails, we write an error message and exit from the program as shown below.


fstream OutFile;

OutFile.open(FileName, ios::out);
if (OutFile.fail())
   {
   cerr << "Could not create file " << FileName << endl;
   exit(1);
   }

The WriteToFile function is used to write data to the OutFile stream. Note that just as we use << in writing to cout, here we use it to write to OutFile. The complete code for the function is shown below.


void WriteToFile(fstream & OutFile)
   {
   StringType Line;

   cout << "Enter a line of text (or press CTRL z to quit):" << endl;
   cin.getline(Line, MaxString);

   while (! cin.fail())
      {
      OutFile << Line << endl;
      cout << "Enter a line of text (or press CTRL z to quit):" << endl;
      cin.getline(Line, MaxString);
      }
   }

The last important thing that the program does is to close the file. It is even more important to close an output file than an input file. Forgetting to close an output file may result in loss of data. This is because output is normally "buffered", both by the stream library and by the operating system. Thus data that has been written to a file may actually not be in the file yet, but rather only be in a buffer in main memory. If the program ends without the file being closed, the data in the buffer may not get flushed out and written to the file. (In standard C++ an fstream object does close its file when the object is destroyed, but a program that ends through exit, as in the error handling above, or that crashes skips that step for local variables.) If only a small amount of data has been written to a file and the close operation is left out, the file may end up having size zero because none of it really got sent to the file yet.

Since we now have two programs to manipulate text files, one to create a text file, and one to read a text file, it would be helpful to summarize overall what these programs do. The flow of data in the two related programs can be shown in a diagram called a data flow diagram. The following is a data flow diagram for our two programs:

[data flow diagram]

A source or destination for data is shown in a rectangle. In the above diagram these are the users of the programs. A file is shown as a rectangle with one side open. Programs are shown as ovals. The flow of data is shown using arrows to indicate the direction of data flow. These arrows can even be labelled to indicate what data is flowing in the given direction. In the above case, we see two-way data flow between the user and the maketext program, for example. This is because the program prompts the user for items such as the name of the file and the user responds by supplying the data. There is a one-way flow of data from the maketext program to the text file; nothing is read into the maketext program from the file. Although this diagram was rather simple, data flow diagrams can be very helpful in more complex situations.

Creating a Formatted Text File with Numbers in It

We can also write numbers to a text file. When you do so, the numbers are translated to readable text. For example, the number 236 is translated to the character 2, followed by the character 3, followed by the character 6. This translation takes a little extra time, but it results in a file that is easily readable. See the makeform.cpp program as an example.
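
To make the translation concrete, the sketch below writes the same integer once as text and once in binary form and reports the resulting file sizes. The function names TextFileSize and BinaryFileSize are hypothetical and are not part of makeform.cpp.

```cpp
#include <fstream>
#include <string>

// Writes Num to FileName as readable text and returns the file size in bytes.
long TextFileSize(const std::string & FileName, int Num)
   {
   std::ofstream OutFile(FileName);
   OutFile << Num;              // 236 becomes the characters '2', '3', '6'
   OutFile.close();

   // Opening with ios::ate places the file position at the end of the file,
   // so tellg reports the size.
   std::ifstream InFile(FileName, std::ios::binary | std::ios::ate);
   std::streamoff Size = InFile.tellg();
   return static_cast<long>(Size);
   }

// Writes Num to FileName in binary form and returns the file size in bytes.
long BinaryFileSize(const std::string & FileName, int Num)
   {
   std::ofstream OutFile(FileName, std::ios::binary);
   OutFile.write(reinterpret_cast<const char *>(&Num), sizeof(Num));
   OutFile.close();

   std::ifstream InFile(FileName, std::ios::binary | std::ios::ate);
   std::streamoff Size = InFile.tellg();
   return static_cast<long>(Size);
   }
```

For 236 the text file holds one byte per digit, while the binary file always holds sizeof(int) bytes no matter how many digits the number has.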

Opening the file proceeds exactly as before. The WriteToFile function gets data from the user and writes it to the file. The essential lines for the entry of one set of data and the writing of it to OutFile are shown below. After reading the price (a number) we must read in the newline character so that the next input operation (for a string) doesn't see that newline and think that the empty string has been entered. Writing the Price to the file works just the same as writing ProductName to the file, though the former is a number and the latter a string.


cout << "Enter a product name (or press CTRL z to quit):" << endl;
cin.getline(ProductName, MaxString);
cout << "Enter the price of this product:" << endl;
cin >> Price;
cin.get(); // read in the newline and discard it
OutFile << ProductName << ' ' << Price << endl;

Reading a Formatted Text File with Numbers in It

To read the data in the file created by the previous program, you read the data in much the same way that it was written. The details can be found in the readform.cpp program. The essential steps in reading one set of product data are shown below. Note that after reading the price from the file, the code reads the newline character that follows it. With the >> operator this is a precaution rather than a necessity, since >> skips leading whitespace; it would matter if the product name were instead read with getline, which would take the leftover newline as representing an empty string in the stream.


InFile >> ProductName;
InFile >> Price;
InFile.get(); // read the newline and discard it

Note that the first line above assumes that the product name does not contain any spaces. (If we wanted to allow spaces it would make for a much more complex program.) Also, in the code (not shown here) to output ProductName, it is assumed that the product name is at most 40 characters in length.
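
As a rough sketch of the whole reading loop, the code below reads name and price pairs until a read fails. It is not the code from readform.cpp: the ProductType record and the ReadProducts function are hypothetical, and std::string is used so that no maximum name length has to be assumed. Names still may not contain spaces.

```cpp
#include <fstream>
#include <string>
#include <vector>

struct ProductType      // hypothetical record, for illustration only
   {
   std::string Name;
   double Price;
   };

// Reads "name price" pairs from a text file until a read fails.
std::vector<ProductType> ReadProducts(const std::string & FileName)
   {
   std::vector<ProductType> Products;
   std::ifstream InFile(FileName);
   ProductType Item;

   // Each >> skips leading whitespace, including the newline left behind
   // after the previous price, so the loop stops cleanly at end of file.
   while (InFile >> Item.Name >> Item.Price)
      Products.push_back(Item);

   return Products;
   }
```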


Binary Files:



A Program to Create a Binary File of Integers

Recall that a binary file is faster for storing and retrieving non-text data. Numbers are not stored in readable form. Let's look at the example program makeint.cpp to see how to create such a file. This particular one will contain only integers, though it is possible to write out a mixture of integers, floats, text, etc.

One of the first things to notice is the inclusion of the fstream header file. This will be used in all of the file-related example programs.

In the main function you see that OutFile is set up as an fstream variable. The open command and associated error checking look much like that for text files. However, the open uses ios::binary to specify a binary file. Note the use of the vertical bar (pipe symbol) to do a bitwise or of the two constants ios::out and ios::binary.


fstream OutFile;

OutFile.open("int.dat", ios::out | ios::binary);
if (OutFile.fail())
   {
   cerr << "Could not create file int.dat" << endl;
   exit(1);
   }

The essential steps in writing to the binary file stream are to determine the size of the item to be written out and then to actually write it out. The WriteToFile function contains these key items, as shown below:


IntSize = sizeof(Num);
OutFile.write(reinterpret_cast <char *> (&Num), IntSize);

The sizeof operator is a very useful one. It figures out for you the size (in bytes) of a variable (or type name). This saves you a lot of work in trying to look up such details. Also, the size of integers, floats, etc. can vary between different types of computer systems.

The write function is the function to use to write data to a binary file. So, this section is definitely different from what we used to write data to a text file. The write function takes two parameters. The first is the address of where the data is that we want to write out. The ampersand is used here to get the address of the Num variable. Note that if you have a whole array to write out, an array name is an address, a pointer, so no ampersand would be used. The second parameter is the number of bytes of data to be written out.

One ugly technicality of the write function is that it expects its first parameter to have type "pointer to a character", which is written as char *. Since we really have a pointer to an int we must cast the variable to the other type. The code above uses the reinterpret_cast to do this. Just use this cast every time that you use write. If you have an older compiler you may need to use a C-style cast like this:


OutFile.write((char *) &Num, IntSize);
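
The remark about arrays can be sketched as follows. This is an illustration only, not part of makeint.cpp; the function name WriteIntArray and its parameters are assumptions.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Writes Count integers from NumArray to a binary file with a single write call.
// Returns true if the stream is still in a good state afterwards.
bool WriteIntArray(const std::string & FileName, const int NumArray[], std::size_t Count)
   {
   std::ofstream OutFile(FileName, std::ios::out | std::ios::binary);
   if (OutFile.fail())
      return false;

   // NumArray is already an address, so no ampersand is needed here.
   // The byte count is the number of items times the size of one item.
   OutFile.write(reinterpret_cast<const char *>(NumArray),
      static_cast<std::streamsize>(Count * sizeof(int)));
   OutFile.close();
   return ! OutFile.fail();
   }
```

The resulting file should contain exactly Count * sizeof(int) bytes, with nothing separating one integer from the next.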

A Program to Read a Binary File of Integers

Look at the readint.cpp program to see how to read in the data that we wrote out to a binary file by using the above program. It is very similar in structure. Note that the open statement again uses ios::binary to indicate that we are using a binary file.


InFile.open("int.dat", ios::in | ios::binary);

The sizeof operator is of course used to figure out the number of bytes that we want to read in. In this case it is the size of an int. The read function is then used to read from the binary file. The first parameter is the address of where to put the data that is read. The second is the number of bytes to be read. Thus the parameters look exactly like the parameters for the write function. The first parameter even needs the same cast.


IntSize = sizeof(Num);
InFile.read(reinterpret_cast <char *> (&Num), IntSize);
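
Put into a loop, the read call might be used as in the following sketch, which collects every integer in the file. The function name ReadInts is hypothetical, and readint.cpp itself may be organized differently.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Reads integers from a binary file until a read fails (normally at end of file).
std::vector<int> ReadInts(const std::string & FileName)
   {
   std::vector<int> Numbers;
   std::ifstream InFile(FileName, std::ios::in | std::ios::binary);
   int Num;

   // read returns the stream, which tests as false when fewer than
   // sizeof(Num) bytes could be read.
   while (InFile.read(reinterpret_cast<char *>(&Num), sizeof(Num)))
      Numbers.push_back(Num);

   return Numbers;
   }
```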

A Program to Create a Binary File of Records

Let's use employee.h and employee.cpp that we used in the section on records to set up and deal with employee records. Recall that employee.h sets up a record type called EmployeeType with fields called FirstName, LastName, ID, and WageRate. The functions ReadEmployee, PrintEmployee, and EmpCompare are also provided via these two files.

Then look at the makeemp.cpp program. It is very similar to our makeint.cpp program that we examined above. Note that the open command specifies that we are creating a binary file.

The section of code that repeatedly writes to the file is shown below. It uses the ReadEmployee function to get a record of employee data from the user. The write function is then used to output the record to the file. Note that write has the same idiosyncrasies as we saw before: The first parameter must be a pointer, so in this case we give it the address of the employee record. The first parameter must be cast to type char *. Also, the second parameter has to be the number of bytes to write out, computed as usual using the sizeof operator.


RecordSize = sizeof(Employee);
Result = ReadEmployee(Employee);

while (Result == OKFlag)
   {
   OutFile.write(reinterpret_cast <char *> (&Employee), RecordSize);
   Result = ReadEmployee(Employee);
   }

A Program to Read a Binary File of Records

The counterpart to the previous program is reademp.cpp, a program to read the emp.dat binary file of records that the previous program produced and to display this employee data on the screen. The emp.dat file would have to be in the current directory for our program to be able to find it and open it.

This program begins by opening the emp.dat file as a binary file. It then uses essentially the following code to read a record at a time from the file and to display its data on the screen. Minor details are left out here to make things clearer. Note how similar this is to the readint.cpp program.


RecordSize = sizeof(Employee);
InFile.read(reinterpret_cast <char *> (&Employee), RecordSize);

while (! InFile.fail())
   {
   PrintEmployee(Employee);
   InFile.read(reinterpret_cast <char *> (&Employee), RecordSize);
   }
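
The write and read halves can be tried together in one small round trip. The sketch below uses a hypothetical DemoEmployee record as a stand-in for EmployeeType, whose exact definition is in employee.h and is not reproduced here. The field sizes are assumptions. Note that this technique only suits records made of plain data such as numbers and character arrays; a record holding a std::string could not be written this way.

```cpp
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

struct DemoEmployee     // hypothetical stand-in for EmployeeType
   {
   char FirstName[16];
   char LastName[16];
   int ID;
   float WageRate;
   };

// Builds a record with every byte (including padding) set to a known value.
DemoEmployee MakeEmployee(const char * First, const char * Last, int ID, float Wage)
   {
   DemoEmployee E;
   std::memset(&E, 0, sizeof(E));
   std::strncpy(E.FirstName, First, sizeof(E.FirstName) - 1);
   std::strncpy(E.LastName, Last, sizeof(E.LastName) - 1);
   E.ID = ID;
   E.WageRate = Wage;
   return E;
   }

// Writes each record to a binary file, one write call per record.
bool WriteEmployees(const std::string & FileName, const std::vector<DemoEmployee> & Staff)
   {
   std::ofstream OutFile(FileName, std::ios::out | std::ios::binary);
   for (const DemoEmployee & E : Staff)
      OutFile.write(reinterpret_cast<const char *>(&E), sizeof(E));
   OutFile.close();
   return ! OutFile.fail();
   }

// Reads records back until a read fails (normally at end of file).
std::vector<DemoEmployee> ReadEmployees(const std::string & FileName)
   {
   std::vector<DemoEmployee> Staff;
   std::ifstream InFile(FileName, std::ios::in | std::ios::binary);
   DemoEmployee E;
   while (InFile.read(reinterpret_cast<char *>(&E), sizeof(E)))
      Staff.push_back(E);
   return Staff;
   }
```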

A Program to Sort a Binary File of Records

Writing a good sort program for files is beyond the scope of what we can do here. For one thing, files can be very long. The only type of sorting that we are familiar with so far is the sorting of arrays, and arrays are of a fixed (and not too huge) size. To learn how to sort long files, look up the topic of external sorting. For our purposes here, we will assume that the file to be sorted is fairly short. Thus we can read the records from the file into an array, sort the array, and then write the records from the sorted array out to the file.

See the sortemp.cpp program. It sorts the emp.dat file that the previous two programs dealt with. This project also uses emparray.h and emparray.cpp to provide us with EmpArrayType as a type name for an array of 50 employee records as well as the SelectionSort function that we will use to sort the array of records. When compiling this project you also need employee.h and employee.cpp.

The main function begins by opening the emp.dat file for input as a binary file. The program has a new LoadArray function to load the array with records read from this file. Since our project also includes the old LoadArray function that reads records from the keyboard, you might wonder if there would be a conflict. In this case there is not. Even though the two functions have the same name, the parameter lists are different. That is enough so that the compiler can see these functions as distinct. (If you ever want to have two or more functions by the same name, just be certain that the number of parameters is different and/or that the types vary for at least one of the parameters.)

The code for the new LoadArray function is shown below. Of course, it uses the read function to read each record from the file. The loop is controlled by the fail function, which you will recall returns true if end of file is reached or if an error occurred which prevented the read operation from succeeding. Of course we also check to make sure that we don't run off of the end of the array. When the loop ends, we return a code to indicate how the loop ended. Compare this new LoadArray function with the old one which is in emparray.cpp. You will see a lot of similarities. Of course, the old one read data from the keyboard, not from a file.


EmpCount = 0;
RecordSize = sizeof(Employee);
InFile.read(reinterpret_cast <char *> (&Employee), RecordSize);

while ((! InFile.fail()) && (EmpCount < EmpMax))
   {
   EmpArray[EmpCount] = Employee;
   EmpCount++;
   InFile.read(reinterpret_cast <char *> (&Employee), RecordSize);
   }

if (InFile.fail())
   return OKFlag;
else   // array ran out of room
   return TooMuchDataFlag;

When control returns to the main function, the data file is closed. If the return code from LoadArray shows that all was fine, we proceed to sort the array via the SelectionSort function. This SelectionSort function is similar to that used earlier to sort an array of integers (see select.cpp) but has been modified to handle an array of records. The new function is shown below. It is located in the emparray.cpp file, while the EmpCompare function that it uses is found in the employee.cpp file.


/* Given:  EmpArray   The array of employee records to be sorted.
           Count      The number of items in EmpArray.
   Task:   To sort EmpArray into ascending order using selection sort,
           basing the order on the EmpCompare function.
   Return: EmpArray   The sorted array.
*/
void SelectionSort(EmpArrayType EmpArray, int Count)
   {
   int i, k, MinIndex;
   EmployeeType Min;

   for (i = 0; i < Count - 1; i++)
      {
      // Find the minimum from index i to Count - 1.
      // Assume it's the first item until we know better.
      Min = EmpArray[i];
      MinIndex = i;
      for (k = i + 1; k < Count; k++)
         if (EmpCompare(EmpArray[k], Min) < 0)  // Found a better min.
            {
            Min = EmpArray[k];
            MinIndex = k;
            }

      if (MinIndex != i)   // swap EmpArray[i] and the minimum
         {
         EmpArray[MinIndex] = EmpArray[i];
         EmpArray[i] = Min;
         }
      }
   }

In the above function note that Min has been changed so that it is now a record. The if test to compare two items now uses the EmpCompare function since it can handle the comparing of two records. Those are the main changes.

Next, we again open the emp.dat file, but this time it is opened for output, not input. (Notice that we use a new fstream variable NewFile. Some compilers may let you reuse the old fstream variable EmpFile as long as you have already closed it.) We write the records from the array to the file in the usual way and then close the file. The line used to write out each record is shown below. The first parameter is the address of the record in EmpArray at index k.


OutFile.write(reinterpret_cast <char *> (&EmpArray[k]), RecordSize);

Comparison of Binary and Text Files

It is instructive to write programs to solve a file-related problem, first by using a binary file and then by using a text file. Let's imagine that we have a series of parts data, where the data on each part consists of an ID number, the number in stock, the price, and a description (a string). When using a binary file we write whole records of parts data to the file at once. When using a text file, we write out separately each of the four pieces of data about a given part. Remember that the text file will be readable by an editor, but the numbers in the binary file will not be readable in this way.

We can set up a type called PartType for a record of parts data as shown in parts.h. The programs to create the data files will differ in how they open the file and in how they write to the file. One will specify that it opens the file as a binary file, but the other will not (and will thus be a text file by default). For the binary file we will use write to write to the file, whereas for the text file we will use the usual output operator and will output each of the four pieces of parts data separately.

Similarly, we can compare programs to read these two files. Again, they differ in how they open the file and in how they read from the file. In particular, with the binary file we will use the read function to read a whole record, but with the text file we will read each of the four pieces of parts data from the file separately, using the usual stream operator (or the getline function when reading the string).

Random File Access



A Program to Modify a Binary File of Records



A Software Engineering Example

Suppose that we want another program in our suite of programs dealing with the emp.dat binary file of employee records. This time we want a program that will let us modify the data contained in the records. This is useful for fixing typos and the like. Let's use a software engineering approach to this problem, both to illustrate the software development life cycle and to assist us with this problem.

Analysis

Let's first get a good sense of the inputs, the processing, and the outputs. Suppose that in order to modify a record, the user is expected to input the last name and first name of the employee. Then the user is prompted whether or not to change the ID or wage rate for that employee record. The user can change as many records as desired, indicating that it's time to stop by pressing CTRL z instead of entering a name. (Remember that we would use CTRL d instead under Linux.) Of course, all changed records are written out to the same spot they were at in the emp.dat file. (Getting to any particular spot is what random access is all about.)

Design

Let's think about the data first. Obviously, we have the file of records to deal with. Since we want to allow the user to search for individual employee records (and then possibly make changes), we might want to read the data into an array of records. We know how to search an array of records. This leads to a few complications, however. One is that an array has a limit on how many records it can hold. The second is that since we probably want to use binary search (which is faster than sequential) we will need to have the records in order. We could either have this program sort the array, or assume that the user has already sorted the file, which might not be a safe assumption.

Because of these complications, let's look for an alternative. Since random access lets us jump to any location we want in a file, a file with random access behaves a lot like an array. Thus we should be able to imitate the binary search algorithm on the file itself and not use an array at all. This would simplify things. Since file access is slow this will mean that searching will be a bit slower, but if the file is not huge this should not be a problem as binary search only does a few probes to find an item in short and medium-sized arrays or files.
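
The array-like behavior can be sketched with a binary file of integers: to get the item at a given index, seek to index times the item size and read one item. The function name ReadIntAt is hypothetical and is used only for this illustration.

```cpp
#include <fstream>

// Reads the integer stored at position Index (counting from 0) in a binary
// file of ints. Returns true and fills in Num on success, false otherwise.
bool ReadIntAt(std::fstream & File, long Index, int & Num)
   {
   File.clear();   // a previous failed read would otherwise block the seek

   // Jump directly to the byte offset where item number Index begins.
   File.seekg(static_cast<std::streamoff>(Index) *
      static_cast<std::streamoff>(sizeof(int)), std::ios::beg);
   File.read(reinterpret_cast<char *>(&Num), sizeof(Num));
   return ! File.fail();
   }
```

Calling this with indices 3, then 0, then 4 reads those items in that order without touching the ones in between, which is exactly what a binary search on a file requires.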

We might at this point draw a data flow diagram showing the flow of data among the various programs associated with the emp.dat file. The diagram below shows that the flow of data between the user and each program is two-way (since each program at least asks the user to press a key to continue). Some of the programs only have one-way data flow between the program and the file, but others have two-way data flow. Our new program, here named modemp, obviously has to both read data from the file and write updated data to it.

[data flow diagram]

Next, let's design the functions in a top-down fashion. At first glance we might try for a main function with three functions under it: one to do the binary search, one to print an individual record on the screen so that the user can see what we have, and one to allow the user to make changes to the data. However, since the binary search will need to compare records to see if the names match, we will need the EmpCompare function from the employee.cpp file that we have used before. Also, this file contains a PrintEmployee function that does exactly what we want in printing an employee record, so we will use it too. Reusing code is a great idea, as it can save lots of work! Since we want to allow repeated lookups, the loop that allows this might best be placed in another function, perhaps called ProcessFile. This leads to the following structure chart:

[structure chart]

Of our new functions, the first one we come to below main is ProcessFile. Let's assume that it gets (via a parameter) the file stream, already properly opened. Since we will want to read from the file to get the correct employee record and then maybe to write an updated record to the file, we want the file open for both input and output, which is possible in C++. The main task of this function is, of course, to allow repeated lookups and modifications of an employee record. This leads to the following documentation for this function:


/* Given:   EmpFile   A binary file stream already opened for input and output.
   Task:    To allow the user to repeatedly look up an employee in
            EmpFile by name.  If the lookup succeeds, the info on the
            employee is displayed on the screen and the user is given
            a chance to modify the ID and WageRate for the employee.
   Return:  EmpFile   The modified file stream.
*/
void ProcessFile(fstream & EmpFile)

The SearchFile function is to imitate our binary search in the data file. It will need the file as a parameter and will need to be given the name of the employee to look up. Let's assume that we pass in an employee record containing the first and last name of the employee to look up. We don't care what is in the other fields of the record sent into the function. The function should, however, return this record with all of the fields filled in if there is a match. In the function name we can return true or false to indicate if a match was found. We also need to somehow return the location where a match was found. This is typically done as the number of bytes into the file where the matching record begins. This number can then be used elsewhere in the program if we want to write updated employee data to this location. The resulting documentation for this function then looks something like the following. Note that "seeking" refers to moving to a given position in the file.


/* Given:   EmpFile     A file stream already opened for input and output.
            Employee    An employee record containing the last name and
                        first name to search for.
   Assumes: That EmpFile is in ascending order.
   Task:    To do a binary search in EmpFile for Employee.
   Return:  EmpFile     The file stream (which can be modified by reading
                        and seeking in the file in that the file position
                        pointer may be moved).
            Employee    If found, this parameter will contain the complete
                        record for the person looked up.
            Location    The location of the Employee record in the file
                        (as the number of bytes into EmpFile).
            SearchFile  In the function name, true is returned if Employee
                        was located, false otherwise.
*/
bool SearchFile(fstream & EmpFile, EmployeeType & Employee, long & Location)
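
One possible way to fill in such a function is sketched below. It is not the author's SearchFile: the DemoEmployee record, the DemoCompare ordering (last name, then first name), and the name DemoSearchFile are all assumptions standing in for EmployeeType, EmpCompare, and the real implementation.

```cpp
#include <cstring>
#include <fstream>

struct DemoEmployee     // hypothetical stand-in for EmployeeType
   {
   char FirstName[16];
   char LastName[16];
   int ID;
   float WageRate;
   };

// Assumed ordering: by last name, then by first name.
int DemoCompare(const DemoEmployee & A, const DemoEmployee & B)
   {
   int Result = std::strcmp(A.LastName, B.LastName);
   return (Result != 0) ? Result : std::strcmp(A.FirstName, B.FirstName);
   }

// Binary search directly on the file; assumes the records are in ascending order.
bool DemoSearchFile(std::fstream & EmpFile, DemoEmployee & Employee, long & Location)
   {
   const long RecordSize = static_cast<long>(sizeof(DemoEmployee));

   // Find the number of records by seeking to the end of the file.
   EmpFile.clear();
   EmpFile.seekg(0, std::ios::end);
   std::streamoff FileSize = EmpFile.tellg();
   if (FileSize < 0)
      return false;

   long Low = 0;
   long High = static_cast<long>(FileSize) / RecordSize - 1;

   while (Low <= High)
      {
      long Mid = Low + (High - Low) / 2;
      DemoEmployee Probe;

      // Jump to record number Mid and read it.
      EmpFile.seekg(Mid * RecordSize, std::ios::beg);
      EmpFile.read(reinterpret_cast<char *>(&Probe), RecordSize);
      if (EmpFile.fail())
         return false;

      int Diff = DemoCompare(Employee, Probe);
      if (Diff == 0)
         {
         Employee = Probe;               // send back the complete record
         Location = Mid * RecordSize;    // byte offset of the match
         return true;
         }
      else if (Diff < 0)
         High = Mid - 1;
      else
         Low = Mid + 1;
      }

   return false;
   }
```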

Finally, let's design the Modify function. It needs to be given (via parameters) the file stream, the employee record that we may wish to change, and its location in the file (as the number of bytes into the file where it is found). The function's task is to ask the user whether or not to change the data in the ID and WageRate fields, to get this data, and then to write it out to the correct location for this record in the file. We thus get documentation for this function as follows:


/* Given:   Employee  An employee record.
            EmpFile   A file stream, open for input and output.
            Location  The offset in EmpFile at which Employee can be found.
   Task:    To allow the user to change the ID or WageRate in the Employee
            record, if desired, with the modified data being written to EmpFile.
   Return:  Employee  The (possibly) modified employee record.
            EmpFile   The modified file stream for the file of records.
*/
void Modify(EmployeeType & Employee, fstream & EmpFile, long Location)
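
The file-writing step at the heart of such a function might look like the sketch below, which leaves out all of the prompting. The DemoRecord type and the RewriteRecord function are hypothetical; a real Modify would use EmployeeType and would first ask the user for the new values.

```cpp
#include <fstream>

struct DemoRecord    // hypothetical fixed-size record
   {
   int ID;
   float WageRate;
   };

// Overwrites the record that begins Location bytes into the file.
// Assumes EmpFile was opened with ios::in | ios::out | ios::binary.
bool RewriteRecord(std::fstream & EmpFile, long Location, const DemoRecord & Rec)
   {
   EmpFile.clear();                          // reset any earlier failure
   EmpFile.seekp(Location, std::ios::beg);   // move the write position
   EmpFile.write(reinterpret_cast<const char *>(&Rec), sizeof(Rec));
   EmpFile.flush();                          // push the change out of the buffer
   return ! EmpFile.fail();
   }
```

Because the new record has the same size as the old one, the records before and after it in the file are left untouched. Opening the stream for both input and output also matters here, since opening with ios::out alone would normally erase the existing contents.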

You might ask why we haven't allowed the user to change the FirstName or LastName fields. Changing either of these might mess up the ascending order of the data in the file. Since our binary search depends on this ordering, the modify program might fail to work after changing a name! An alternative approach would be to skip the sorting and binary search. We could just process the file sequentially, reading each record and changing it and writing it back to the file as needed.

At this point we might write out in pseudocode the algorithm for one or more of the functions. Let's do this for the ProcessFile function. Since the SearchFile function essentially follows the well-known binary search algorithm, we probably don't need to write out pseudocode for it. Also, the Modify function sounds simple enough to write without doing any pseudocode first. Here is the pseudocode for the ProcessFile function:


void ProcessFile(fstream & EmpFile)
   {
   set up needed local variables

   ask the user for the employee's last name
   read this into the LastName field of Employee

   while (no input failure)
      {
      ask the user for the employee's first name
      read this into the FirstName field of Employee

      if (SearchFile function finds Employee in EmpFile)
         {
         print the data in the Employee record
         call Modify on Employee and EmpFile using Location given by SearchFile
         }
      else
         print a "not found" message

      ask the user for the employee's last name
      read this into the LastName field of Employee
      }
   }

Prototyping

Next we might construct a quick prototype that we could let users try out. Our first prototype might not even access the file at all. The main function would simply call ProcessFile which would contain the loop that prompts the user to repeatedly enter the first and last names for the employee to look for. We could use a stub for the SearchFile function. The stub doesn't do any searching at all, but simply sends back some hard-coded employee data. This data is surely incorrect, but the user will be able to see what things look like on the screen. PrintEmployee can then be used to print this fake data, and Modify can be used to ask the user a series of questions about whether to change the ID or WageRate fields. No code to write changes to the file would be present, however.
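A stub of the kind described above could be as small as the following sketch. The DemoEmployee layout and the hard-coded values are assumptions made up for illustration; the real record type is EmployeeType from employee.h, and the name SearchFileStub is hypothetical.

```cpp
#include <fstream>
using namespace std;

// Hypothetical record layout; the real EmployeeType is defined in employee.h.
struct DemoEmployee
   {
   char FirstName[16];
   char LastName[16];
   int ID;
   float WageRate;
   };

// Stub: performs no search at all. It always reports success and sends
// back made-up data so that the screens can be tried out by users.
bool SearchFileStub(fstream & EmpFile, DemoEmployee & Employee, long & Location)
   {
   (void) EmpFile;              // the file is deliberately not touched
   Employee.ID = 1234;          // hard-coded, fake ID number
   Employee.WageRate = 9.50f;   // hard-coded, fake wage rate
   Location = 0L;               // pretend the record is first in the file
   return true;
   }
```

Because the stub keeps the same parameter list as the planned SearchFile function, the real function can later be dropped in without changing the code that calls it.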

We might then create a second prototype which adds more functionality. It could really open the file and search for desired records, but maybe not yet allow the user to actually change any data. In a large project, one might use a sequence of prototypes that gradually approach the desired finished product.

Coding

We now code the complete program. This can be found in the modemp.cpp file. (Note that we use employee.h and employee.cpp too.) Since we have never used random access before, pay particular attention to how this is done. First, however, we begin with the main function, where we open our file for both input and output, since we need to read a record and maybe write out an updated record, read another record and maybe write out an updated one, etc. Since we go back and forth between reading and writing it does not make sense to keep opening and closing the file. (For one thing, opening a file may be somewhat time-consuming.) Instead we just open the file for both input and output and leave it open for the duration of the program.


EmpFile.open("emp.dat", ios::in | ios::out | ios::binary);
if (EmpFile.fail())
   {
   cerr << "Could not open file emp.dat" << endl;
   exit(1);
   }

We would probably next look at the ProcessFile function. Since we wrote pseudocode for it earlier, it is now rather easy to write it out in C++. Note the use of a variable of type long, an integer type that is guaranteed to be at least 32 bits wide and may be wider than an int. A long is what one normally uses to keep track of one's position in a file, since the number of bytes in a file can be a rather large number. Nothing further will be said here about the coding of this function.

Next, let's look at the Modify function. We write it to fit the documentation that we already wrote above. Recall that it receives Location as the number of bytes to go into the file to get to the relevant employee record, a copy of which is being passed via the Employee parameter.


void Modify(EmployeeType & Employee, fstream & EmpFile, long Location)
   {
   char Choice;
   bool Modified = false;

   cout << endl << "Do you wish to modify the ID number (y/n)? ";
   cin >> Choice;
   if ((Choice == 'y') || (Choice == 'Y'))
      {
      Modified = true;
      cout << "Enter the corrected ID number: ";
      cin >> Employee.ID;
      }

   // similarly ask about modifying the WageRate field (details not shown)

   if (Modified)
      {
      EmpFile.seekp(Location, ios::beg);
      EmpFile.write(reinterpret_cast <char *> (&Employee), sizeof(Employee));
      }
   }

Prompting for the updated data is straightforward. When the boolean Modified flag is true, we write the updated record out to the file; when it is false, there is no sense in wasting time writing unchanged data. To write, we first move the file position pointer, the one used to keep track of where to write in the file, to the correct location. Then we write out the record using the same write function that we always use with our binary files.

Some form of "seek" command is used to move a file position pointer. To move the pointer that keeps track of where to write, we use seekp. The letter 'p' is a reminder that we plan to "put" some data into the file. (There is also a seekg version, used to move the pointer that keeps track of where we will read from the file. The 'g' is a reminder that we want to "get" data from the file.) In the two-parameter form used here, the first parameter to either version is the number of bytes to move, and the second is a constant indicating where to start from. Typically one uses ios::beg, which means that we move the given number of bytes into the file from its beginning.
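As a minimal sketch of the two seek functions, the helpers below read or write a record by its record number. DemoRecord, GetRecord, and PutRecord are hypothetical names invented for this illustration; they do not appear in modemp.cpp.

```cpp
#include <cassert>
#include <fstream>
using namespace std;

// Hypothetical record type, standing in for the tutorial's EmployeeType.
struct DemoRecord
   {
   char LastName[16];
   int ID;
   float WageRate;
   };

// Reads record number RecordNum (numbered from 0) into Rec.
// Returns true if the read succeeded.
bool GetRecord(fstream & File, long RecordNum, DemoRecord & Rec)
   {
   assert(RecordNum >= 0L);
   long Location = RecordNum * static_cast <long> (sizeof(DemoRecord));
   File.seekg(Location, ios::beg);   // 'g': position for the next "get"
   File.read(reinterpret_cast <char *> (&Rec), sizeof(Rec));
   return ! File.fail();
   }

// Writes Rec over record number RecordNum (numbered from 0).
// Returns true if the write succeeded.
bool PutRecord(fstream & File, long RecordNum, const DemoRecord & Rec)
   {
   assert(RecordNum >= 0L);
   long Location = RecordNum * static_cast <long> (sizeof(DemoRecord));
   File.seekp(Location, ios::beg);   // 'p': position for the next "put"
   File.write(reinterpret_cast <const char *> (&Rec), sizeof(Rec));
   return ! File.fail();
   }
```

Note that both helpers seek before every transfer. With a file stream, the reading and writing positions are normally tied to a single underlying position in the file, so it is safest never to assume that one has stayed put after the other was used.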

Finally, let's look at the coding of the SearchFile function. This is the one that imitates a binary search within the file to try to find a record containing the first and last names in the Employee record. Of course, we already have an EmpCompare function to tell us if two records have names that match. In such a case EmpCompare returns 0. It returns a -1 if the employee record given as its first parameter is alphabetically less than the employee record given as its second parameter. See employee.cpp if you wish to look at the details.


bool SearchFile(fstream & EmpFile, EmployeeType & Employee, long & Location)
   {
   EmployeeType EmployeeTemp;
   bool Found;
   int CmpResult;
   long Mid, Low, High, RecordSize;

   Found = false;
   Low = 0L;
   // Go to the end of the file:
   EmpFile.seekg(0L, ios::end);
   RecordSize = sizeof(EmployeeTemp);
   // Find the number of records and subtract 1 to get high index:
   High = EmpFile.tellg() / RecordSize - 1L;

   while ((! Found) && (Low <= High))
      {
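      Mid = (Low + High) / 2;
      // Convert the record number into a byte offset and read that record:
      Location = Mid * RecordSize;
      EmpFile.seekg(Location, ios::beg);
      EmpFile.read(reinterpret_cast <char *> (&EmployeeTemp), RecordSize);
      // EmpCompare gives 0 for matching names, -1 if Employee comes first:
      CmpResult = EmpCompare(Employee, EmployeeTemp);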

      if (CmpResult == 0)
         {
         Employee = EmployeeTemp;
         Found = true;
         }
      else if (CmpResult < 0)
         High = Mid - 1L;
      else
         Low = Mid + 1L;
      }

   return Found;
   }

In the code above, note that any variable used to keep track of a position within the file is a long. Once again this is because the number of bytes in a file can be huge, so that an ordinary int might overflow. Variables Low, High, and Mid are used to hold record numbers, where the file's records are numbered 0, 1, 2, etc. Even these numbers could get to be rather large, so we use type long for them as well. Note that variable Low starts at value 0L, where the L suffix marks the literal as being of type long.

Finding the proper value to use for variable High is somewhat tricky. We use seekg with an offset of 0 and the ios::end constant, which asks for the position 0 bytes from the end of the file. In other words, this moves the file position pointer (for reading) to the end of the file. We then use the tellg function to report how many bytes into the file we now are, which is the size of the file in bytes. If we divide this number by the number of bytes in an employee record, the result is the number of records in the file. Finally, we subtract 1 (since numbering begins at 0) to get the initial value for the High record number.
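The record-counting step can be isolated in a small helper, sketched below. CountRecords is a hypothetical name used only for this illustration, and the check for a negative result from tellg is an added precaution that the tutorial's own code does not make.

```cpp
#include <cassert>
#include <fstream>
using namespace std;

// Returns the number of complete records of RecordSize bytes in File,
// or -1 if the position in the file could not be determined.
long CountRecords(fstream & File, long RecordSize)
   {
   assert(RecordSize > 0L);

   File.seekg(0L, ios::end);            // move 0 bytes from the end of the file
   streamoff NumBytes = File.tellg();   // offset from the start = size in bytes
   if (NumBytes < 0)
      return -1L;                       // tellg reports -1 on failure

   return static_cast <long> (NumBytes / RecordSize);
   }
```

With this helper, the starting value of High would be CountRecords(EmpFile, RecordSize) - 1L. For an empty file that gives High = -1, so a loop that tests Low <= High never runs and the search correctly reports that nothing was found.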

The loop used in the binary search should look familiar. The computation for finding Mid is the same as always. However, once Mid is known we need to read in record number Mid so that we can compare it to the record we are looking for. To get to the correct location we use seekg with a first parameter computed as Location = Mid * RecordSize. This gives us the number of bytes by which to move into the file to get to the desired record. We then use our usual read function to read the record and then call upon EmpCompare to compare the record just read with the Employee record. The rest of the code is much like that found in ordinary binary search.

After coding, one would then proceed with testing and debugging. Special cases that should be tested include finding and modifying the first and last records in the file. We won't go into this step further here. There is also the maintenance step once the software is put into actual use. Finally, there is the documentation, a lot of which has been accumulated during the above discussion. We have a description of the software requirements, a data flow diagram, a structure chart, documentation for each function, etc.
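One way to exercise the special cases mentioned above is a small self-checking test, sketched here under simplifying assumptions. DemoRecord, FindRecord, and CheckSpecialCases are all hypothetical: the record holds only a last name and an ID, and strcmp takes the place of EmpCompare.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
using namespace std;

// Hypothetical record type, much simpler than the tutorial's EmployeeType.
struct DemoRecord
   {
   char LastName[16];
   int ID;
   };

// Binary search of a file of DemoRecords sorted by LastName.
// Returns the record number of Target, or -1 if it is not present.
long FindRecord(fstream & File, const char * Target)
   {
   DemoRecord Temp;
   const long RecordSize = sizeof(DemoRecord);
   long Low = 0L, High, Mid;

   File.seekg(0L, ios::end);
   streamoff NumBytes = File.tellg();
   High = static_cast <long> (NumBytes / RecordSize) - 1L;

   while (Low <= High)
      {
      Mid = Low + (High - Low) / 2;
      File.seekg(Mid * RecordSize, ios::beg);
      File.read(reinterpret_cast <char *> (&Temp), RecordSize);
      int CmpResult = strcmp(Target, Temp.LastName);

      if (CmpResult == 0)
         return Mid;
      else if (CmpResult < 0)
         High = Mid - 1L;
      else
         Low = Mid + 1L;
      }

   return -1L;
   }

// Builds a small sorted file named FileName and checks the special cases.
void CheckSpecialCases(const char * FileName)
   {
   DemoRecord Recs[4] = {{"Adams", 1}, {"Baker", 2}, {"Clark", 3}, {"Davis", 4}};
   ofstream Out(FileName, ios::binary | ios::trunc);
   Out.write(reinterpret_cast <char *> (Recs), sizeof(Recs));
   Out.close();

   fstream File(FileName, ios::in | ios::out | ios::binary);
   assert(! File.fail());
   assert(FindRecord(File, "Adams") == 0L);    // first record in the file
   assert(FindRecord(File, "Davis") == 3L);    // last record in the file
   assert(FindRecord(File, "Aaron") == -1L);   // would sort before the first
   assert(FindRecord(File, "Evans") == -1L);   // would sort after the last
   assert(FindRecord(File, "Bond") == -1L);    // falls between two records
   }
```

A call such as CheckSpecialCases("search_check.dat") runs silently when every check passes and stops the program at the first failed assert. The file name here is arbitrary.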


Author: Br. David Carlson with contributions by Br. Isidore Minerd
Last updated: March 16, 2023