Software Design Using C++
Files (Streams)
Introduction
Files are used to store data in a relatively permanent form,
on floppy disk, hard disk, tape or other form of secondary storage.
Files can hold huge amounts of data if need be. Ordinary variables (even records
and arrays) are kept in main memory which is temporary and rather
limited in size. The following is a comparison of the two types of storage:
- Main memory
  - Made up of RAM chips.
  - Used to hold a program when it is running, including the
    values of its variables (whether integer, char, an array, etc.)
  - Can only hold relatively small amounts of data.
  - Is temporary (as soon as the program is done or the power
    goes out all of these values are gone).
  - Gives fast access to the data (all electronic).
- Secondary memory
  - Usually a disk drive (or magnetic tape).
  - Used to hold files (where a file can contain data, a program, text, etc.)
  - Can hold rather large amounts of data.
  - Is fairly permanent. (A file remains even if the power goes
    out. It will last until you erase it, as long as the disk
    isn't damaged, at least.)
  - Access to the data is considerably slower (due to moving parts).
Types of File Access:
- Sequential access. With this type of file access one
  must read the data in order, much like with a tape, whether the data
  is really stored on tape or not.
- Random access (or direct access).
  This type of file access lets you jump to any location in the file,
  then to any other, etc., all in a reasonable amount of time.
Types of Files
We can also talk about the type of file. The two basic types are text
and binary. A text file consists of readable characters separated into
lines by newline characters. (On Windows PCs, a newline is
actually stored as the two-character sequence carriage return
(ASCII 13) followed by line feed (ASCII 10); both numbers are given here
in decimal form, not octal or hexadecimal. On UNIX systems, a newline
is typically stored as a single character, line feed.) A binary file stores
data to disk in the same form in which it is represented in main memory.
Thus numbers are not converted to readable characters as with a text file.
If you ever try to edit a binary file containing numbers you will see that
the numbers appear as nonsense characters. Not having to translate numbers
into a readable form makes binary files somewhat more efficient. Binary
files also do not normally use anything to separate the data into lines.
Such a file is just a stream of data with nothing in particular to separate components.
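As a rough illustration of the difference, the following sketch counts how many bytes a number occupies in each form. The helper names TextSize and BinarySize are made up for this illustration, and string streams stand in for real files so that the sizes can be inspected directly.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Number of bytes used when Num is written as readable text with <<.
std::size_t TextSize(int Num)
{
    std::ostringstream Out;
    Out << Num;                       // 236 becomes the characters '2', '3', '6'
    return Out.str().size();
}

// Number of bytes used when Num is written in its in-memory form with write.
std::size_t BinarySize(int Num)
{
    std::ostringstream Out(std::ios::out | std::ios::binary);
    Out.write(reinterpret_cast<const char *>(&Num), sizeof(Num));
    return Out.str().size();
}
```

TextSize(236) is 3, one byte per digit, while BinarySize(236) is always sizeof(int), no matter how many digits the number has.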
Text Files:
This type of file is very common. A text file is essentially a stream of
characters, with some special character (or characters) to mark the end of
each line. Note that your source code files are themselves text files.
Also, you can create text files using almost any editor. Word processors
store documents as binary files, but can often also do a Save As
to save a document as a text file, should you wish to do so. Input from the keyboard
works much the same as input from a text file, and output to the screen works much
the same as output to a text file. For example, you can use
cout << endl; to advance to the next line on the screen, though
underneath what it really does is to output an end of line marker.
A Program to Read a Text File
Look at readtext.cpp.
No matter how you have come up with a text file, this program should be
able to read it. However, it does assume that the lines contain fewer than
120 characters each. Note that nowhere in the code is there an indication
that it is reading a text file. That is because text files are the default
type of file. Also, note that we need to include the fstream
header in a program that manipulates files.
The first important thing that is done is to open the stream InFile
as shown below. FileName is a string containing the
external name of the file, that is, the name it is known by on disk.
The ios::in indicates that we are opening the file for input.
If the open should fail, we write out an error message. Note that it is common
usage to write error messages to the cerr stream and not to
cout . That is because most operating systems (such as Linux) allow
the output written to cout to be redirected to a file separately from cerr .
Thus, if cout were redirected to a file, error messages written to
cerr would still show up on the screen.
    fstream InFile;
    InFile.open(FileName, ios::in);
    if (InFile.fail())
    {
        cerr << "Could not open file " << FileName << endl;
        exit(1);
    }
The main function then calls the DisplayFile function to do the work of
displaying the file's contents on screen. Note that the file stream must
be passed as a reference parameter. See below for a simplified version of
the DisplayFile function.
    /* Given:   InFile   A file stream already opened for input.
       Assumes: That InFile is associated with a text file.
       Task:    To display on screen the contents of the file.
       Return:  InFile   The modified file stream.
    */
    void DisplayFile(fstream & InFile)
    {
        StringType Line;
        InFile.getline(Line, MaxString);
        while (! InFile.fail())
        {
            cout << Line << endl;
            InFile.getline(Line, MaxString);
        }
    }
You can see that the getline function can be used to read a string
from the InFile stream in much the same way as it can be used to read a
string from cin . This function was used before in the
Array of Characters section
of the Introductory Topics portion of these web pages, as well as in the
Character Arrays section of the
Intermediate Topics portion of these web pages. Note that the fail
function can be used to tell us when reading from InFile
fails, probably because we have reached the end of the file.
The last important thing that is done is to close the InFile stream.
Files should always be closed when you are finished using them. Since keeping a
file open consumes some system resources, get into the habit of closing a
file as soon as you are done with it. The close indicates that the program
is finished accessing the file at least for the moment.
(The program could open it again later if needed.)
A Program to Create a Text File
The maketext.cpp example shows one way to do this.
This program again handles the data one line at a time. The file stream is
called OutFile , and the program begins by asking the user for the
name of the file and then opening OutFile as an output file, using
the user-supplied filename. The ios::out is what indicates that
we want an output file. If the open fails, we write an error message and exit
from the program as shown below.
    fstream OutFile;
    OutFile.open(FileName, ios::out);
    if (OutFile.fail())
    {
        cerr << "Could not create file " << FileName << endl;
        exit(1);
    }
The WriteToFile function is used to write data to the OutFile
stream. Note that just as we use << in writing to cout,
here we use it to write to OutFile .
The complete code for the function is shown below.
    void WriteToFile(fstream & OutFile)
    {
        StringType Line;
        cout << "Enter a line of text (or press CTRL z to quit):" << endl;
        cin.getline(Line, MaxString);
        while (! cin.fail())
        {
            OutFile << Line << endl;
            cout << "Enter a line of text (or press CTRL z to quit):" << endl;
            cin.getline(Line, MaxString);
        }
    }
The last important thing that the program does is to close the file. It is
even more important to close an output file than an input file. Forgetting
to close an output file may result in loss of data. This is because output
is "buffered", both by the stream library and by many operating systems. Thus data that
has been written to a file may actually not be in the file yet, but rather
only be in a buffer in main memory. If the program ends without the file being closed
(for example, through a call to exit, which does not clean up local stream variables),
the data in the buffer may not get flushed out and written to the file. If only a
small amount of data has been written to a file and the close operation is left out,
the file may end up having size zero because none of it really got sent to the file yet.
Since we now have two programs to manipulate text files, one to create a
text file, and one to read a text file, it would be helpful to summarize
overall what these programs do. The flow of data in the two related
programs can be shown in a diagram called a data flow diagram.
The following is a data flow diagram for our two programs:
A source or destination for data is shown in a rectangle. In the above
diagram these are the users of the programs. A file is shown as a rectangle
with one side open. Programs are shown as ovals. The flow of data is shown
using arrows to indicate the direction of data flow. These arrows can even
be labelled to indicate what data is flowing in the given direction. In the
above case, we see two-way data flow between the user and the maketext program,
for example. This is because the program prompts the user for items such
as the name of the file and the user responds by supplying the data. There
is a one-way flow of data from the maketext program to the text file;
nothing is read into the maketext program from the file. Although this diagram
was rather simple, data flow diagrams can be very helpful in more complex situations.
Creating a Formatted Text File with Numbers in It
We can also write numbers to a text file. When you do so, the numbers
are translated to readable text. For example, the number 236 is translated
to the character 2, followed by the character 3, followed by the character 6.
This translation takes a little extra time, but it results in a file that is easily
readable. See the makeform.cpp program as an example.
Opening the file proceeds exactly as before. The WriteToFile
function gets data from the user and writes it to the file. The essential
lines for the entry of one set of data and the writing of it to OutFile
are shown below. After reading the price (a number) we must read in the
newline character so that the next input operation (for a string) doesn't
see that newline and think that the empty string has been entered. Writing
the Price to the file works just the same as writing ProductName
to the file, though the former is a number and the latter a string.
    cout << "Enter a product name (or press CTRL z to quit):" << endl;
    cin.getline(ProductName, MaxString);
    cout << "Enter the price of this product:" << endl;
    cin >> Price;
    cin.get();   // read in the newline and discard it
    OutFile << ProductName << ' ' << Price << endl;
Reading a Formatted Text File with Numbers in It
To read the data in the file created by the previous program, you read the
data in much the same way that it was written. The details can be found
in the readform.cpp program. The essential steps
in reading one set of product data are shown below. Note that after
reading the price from the file, one needs to read the newline character so
that the next time a product name is read, the newline isn't taken as
representing an empty string in the stream.
    InFile >> ProductName;
    InFile >> Price;
    InFile.get();   // read the newline and discard it
Note that the first line above assumes that the product name does not contain
any spaces. (If we wanted to allow spaces it would make for a much more
complex program.) Also, in the code (not shown here) to output ProductName ,
it is assumed that the product name is at most 40 characters in length.
Binary Files:
A Program to Create a Binary File of Integers
Recall that a binary file is faster for storing and retrieving non-text
data. Numbers are not stored in readable form. Let's look at the
example program makeint.cpp to see how to
create such a file. This particular one will contain only integers,
though it is possible to write out a mixture of integers, floats, text, etc.
One of the first things to notice is the inclusion of the fstream
header file. This will be used in all of the file-related example programs.
In the main function you see that OutFile is set up as an
fstream variable. The open command and associated error
checking look much like that for text files. However, the open uses
ios::binary to specify a binary file. Note the use of the
vertical bar (pipe symbol) to do a bitwise or of the two
constants ios::out and ios::binary .
    fstream OutFile;
    OutFile.open("int.dat", ios::out | ios::binary);
    if (OutFile.fail())
    {
        cerr << "Could not create file int.dat" << endl;
        exit(1);
    }
The essential steps in writing to the binary file stream are to determine
the size of the item to be written out and then to actually write it out.
The WriteToFile function contains these key items, as shown below:
    IntSize = sizeof(Num);
    OutFile.write(reinterpret_cast <char *> (&Num), IntSize);
The sizeof operator is a very useful one. It figures out
for you the size (in bytes) of a variable (or type name). This saves you
a lot of work in trying to look up such details. Also, the size of integers,
floats, etc. can vary between different types of computer systems.
The write function is the function to use to write data
to a binary file. So, this section is definitely different from what we
used to write data to a text file. The write function
takes two parameters. The first is the address of where the data is that
we want to write out. The ampersand is used here to get the address of the
Num variable. Note that if you have a whole array to write out,
an array name is an address, a pointer, so no ampersand would be used.
The second parameter is the number of bytes of data to be written out.
One ugly technicality of the write function is that it expects
its first parameter to have type "pointer to a character", which is
written as char * . Since we really have a pointer to an
int we must cast the variable to the other type.
The code above uses the reinterpret_cast to do this.
Just use this cast every time that you use write . If you
have an older compiler you may need to use a C-style cast like this:
    OutFile.write((char *) &Num, IntSize);
A Program to Read a Binary File of Integers
Look at the readint.cpp program to see how to
read in the data that we wrote out to a binary file by using the above program.
It is very similar in structure. Note that the open statement again uses
ios::binary to indicate that we are using a binary file.
    InFile.open("int.dat", ios::in | ios::binary);
The sizeof operator is of course used to figure out the
number of bytes that we want to read in. In this case it is the size of
an int . The read function is then used to read
from the binary file. The first parameter is the address of where to put
the data that is read. The second is the number of bytes to be read.
Thus the parameters look exactly like the parameters for the
write function. The first parameter even needs the same cast.
    IntSize = sizeof(Num);
    InFile.read(reinterpret_cast <char *> (&Num), IntSize);
A Program to Create a Binary File of Records
Let's use employee.h and
employee.cpp that we used in the section
on records to set up and deal with employee records. Recall that employee.h sets up a
record type called EmployeeType with fields called FirstName ,
LastName , ID , and WageRate . The functions
ReadEmployee , PrintEmployee , and EmpCompare
are also provided via these two files.
Then look at the makeemp.cpp program. It is very
similar to our makeint.cpp program that we examined
above. Note that the open command specifies that we are creating a binary file.
The section of code that repeatedly writes to the file is shown below.
It uses the ReadEmployee function to get a record of employee
data from the user. The write function is then used to output
the record to the file. Note that write has the same
idiosyncrasies as we saw before: The first parameter must be a pointer,
so in this case we give it the address of the employee record. The first
parameter must be cast to type char * . Also, the second
parameter has to be the number of bytes to write out, computed as usual
using the sizeof operator.
    RecordSize = sizeof(Employee);
    Result = ReadEmployee(Employee);
    while (Result == OKFlag)
    {
        OutFile.write(reinterpret_cast <char *> (&Employee), RecordSize);
        Result = ReadEmployee(Employee);
    }
A Program to Read a Binary File of Records
The counterpart to the previous program is
reademp.cpp, a program to read the emp.dat
binary file of records that the previous program produced and to display this
employee data on the screen. The emp.dat file would have to be in
the current directory for our program to be able to find it and open it.
This program begins by opening the emp.dat file as a binary file.
It then uses essentially the following code to read a record at a time from
the file and to display its data on the screen. Minor details are left out
here to make things clearer. Note how similar this is to the
readint.cpp program.
    RecordSize = sizeof(Employee);
    InFile.read(reinterpret_cast <char *> (&Employee), RecordSize);
    while (! InFile.fail())
    {
        PrintEmployee(Employee);
        InFile.read(reinterpret_cast <char *> (&Employee), RecordSize);
    }
A Program to Sort a Binary File of Records
Writing a good sort program for files is beyond the scope of what we can do
here. For one thing, files can be very long. The only type of sorting that
we are familiar with so far is the sorting of arrays, and arrays are of a
fixed (and not too huge) size. To learn how to sort long files, look up
the topic of external sorting.
For our purposes here, we will assume that the file to be sorted is fairly short.
Thus we can read the records from the file into an array, sort the array, and
then write the records from the sorted array out to the file.
See the sortemp.cpp program. It sorts the
emp.dat file that the previous two programs dealt with.
This project also uses emparray.h and
emparray.cpp to provide us
with EmpArrayType as a type name for an array of 50 employee
records as well as the SelectionSort function that we will
use to sort the array of records. When compiling this project you also
need employee.h and
employee.cpp.
The main function begins by opening the emp.dat file for
input as a binary file. The program has a new LoadArray
function to load the array with records read from this file. Since our
project also includes the old LoadArray function that reads
records from the keyboard, you might wonder if there would be a conflict.
In this case there is not. Even though the two functions have the same
name, the parameter lists are different. That is enough so that the compiler
can see these functions as distinct. (If you ever want to have two or
more functions by the same name, just be certain that the number of parameters
is different and/or that the types vary for at least one of the parameters.)
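As a minimal illustration of this rule, the sketch below defines three functions that share one name. Describe is a hypothetical name chosen for this example; it is not part of the sortemp project.

```cpp
#include <string>

// Same name, one parameter of type int.
std::string Describe(int)
{
    return "one int";
}

// Same name and parameter count, but the parameter type differs.
std::string Describe(double)
{
    return "one double";
}

// Same name, but a different number of parameters.
std::string Describe(int, int)
{
    return "two ints";
}
```

The compiler picks the version whose parameter list matches the call, so Describe(3), Describe(3.5), and Describe(1, 2) each reach a different function.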
The code for the new LoadArray function is shown below.
Of course, it uses the read function to read each record
from the file. The loop is controlled by the fail function,
which you will recall returns true if end of file is reached or if an
error occurred which prevented the read operation from succeeding. Of
course we also check to make sure that we don't run off of the end of
the array. When the loop ends, we return a code to indicate how the
loop ended. Compare this new LoadArray function with the
old one which is in emparray.cpp. You
will see a lot of similarities. Of course, the old one read data from the
keyboard, not from a file.
    EmpCount = 0;
    RecordSize = sizeof(Employee);
    InFile.read(reinterpret_cast <char *> (&Employee), RecordSize);
    while ((! InFile.fail()) && (EmpCount < EmpMax))
    {
        EmpArray[EmpCount] = Employee;
        EmpCount++;
        InFile.read(reinterpret_cast <char *> (&Employee), RecordSize);
    }
    if (InFile.fail())
        return OKFlag;
    else   // array ran out of room
        return TooMuchDataFlag;
When control returns to the main function, the data file is
closed. If the return code from LoadArray shows that all was
fine, we proceed to sort the array via the SelectionSort
function. This SelectionSort function is similar to that
used earlier to sort an array of integers (see
select.cpp) but has been modified to handle
an array of records. The new function is shown below. It is located in the
emparray.cpp file, while the
EmpCompare function that it uses is found in the
employee.cpp file.
    /* Given:  EmpArray   The array of employee records to be sorted.
               Count      The number of items in EmpArray.
       Task:   To sort EmpArray into ascending order using selection sort,
               basing the order on the EmpCompare function.
       Return: EmpArray   The sorted array.
    */
    void SelectionSort(EmpArrayType EmpArray, int Count)
    {
        int i, k, MinIndex;
        EmployeeType Min;
        for (i = 0; i < Count - 1; i++)
        {
            // Find the minimum from index i to Count - 1.
            // Assume it's the first item until we know better.
            Min = EmpArray[i];
            MinIndex = i;
            for (k = i + 1; k < Count; k++)
                if (EmpCompare(EmpArray[k], Min) < 0)   // Found a better min.
                {
                    Min = EmpArray[k];
                    MinIndex = k;
                }
            if (MinIndex != i)   // swap EmpArray[i] and the minimum
            {
                EmpArray[MinIndex] = EmpArray[i];
                EmpArray[i] = Min;
            }
        }
    }
In the above function note that Min has been changed so that
it is now a record. The if test to compare two items now uses
the EmpCompare function since it can handle the comparing of two
records. Those are the main changes.
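For comparison, the same ordering can be obtained from the standard library's sort function instead of a hand-written selection sort. This is a swapped-in technique, not what sortemp.cpp does. In the sketch below, DemoEmployee and DemoCompare are hypothetical stand-ins for EmployeeType and EmpCompare, and it is an assumption that records are ordered by last name and then by first name.

```cpp
#include <algorithm>
#include <cstring>
#include <vector>

struct DemoEmployee        // hypothetical stand-in for EmployeeType
{
    char FirstName[16];
    char LastName[16];
};

// Returns a negative number, zero, or a positive number, comparing
// by last name first and then by first name (an assumed ordering).
int DemoCompare(const DemoEmployee & A, const DemoEmployee & B)
{
    int Diff = std::strcmp(A.LastName, B.LastName);
    if (Diff != 0)
        return Diff;
    return std::strcmp(A.FirstName, B.FirstName);
}

// std::sort wants a true/false answer to "does A come before B?".
bool ComesBefore(const DemoEmployee & A, const DemoEmployee & B)
{
    return DemoCompare(A, B) < 0;
}

void SortEmployees(std::vector<DemoEmployee> & Emps)
{
    std::sort(Emps.begin(), Emps.end(), ComesBefore);
}
```

As with SelectionSort, only the comparison function needs to know about the fields of the record.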
Next, we again open the emp.dat file, but this time it is opened
for output, not input. (Notice that we use a new fstream variable
NewFile . You could instead reuse the old
fstream variable EmpFile once it has been closed, though with some
compilers you must first call its clear function to reset the end-of-file
condition left by the earlier reads.) We write the records from the array to the file
in the usual way and then close the file. The line used to write out
each record is shown below. The first parameter is the address
of the record in EmpArray at index k .
    OutFile.write(reinterpret_cast <char *> (&EmpArray[k]), RecordSize);
Comparison of Binary and Text Files
It is instructive to write programs to solve a file-related problem, first by
using a binary file and then by using a text file. Let's imagine that
we have a series of parts data, where the data on each part consists of an
ID number, the number in stock, the price, and a description (a string).
When using a binary file we write whole records of parts data to the file
at once. When using a text file, we write out separately each of
the four pieces of data about a given part. Remember that the text file
will be readable by an editor, but the numbers in the binary file will not
be readable in this way.
We can set up a type called PartType for a record of parts data
as shown in parts.h. The programs to create the
data files will differ in how they open the file and in how they write to
the file. One will specify that it opens the file as a binary file, but the
other will not (and will thus be a text file by default). For the binary file we
will use write to write to the file, whereas for the text
file we will use the usual output operator and will output each of the four
pieces of parts data separately.
Similarly, we can compare programs to read these two files. Again, they differ
in how they open the file and in how they read from the file. In particular,
with the binary file we will use the read function to read a whole
record, but with the text file we will read each of the four pieces of parts
data from the file separately, using the usual stream operator (or the
getline function when reading the string).
Random File Access
A Program to Modify a Binary File of Records
A Software Engineering Example
Suppose that we want another program in our suite of programs dealing with the
emp.dat binary file of employee records. This time we want a program
that will let us modify the data contained in the records. This is useful
for fixing typos and the like. Let's use a software engineering approach
to this problem, both to illustrate the
software development life cycle
and to assist us with this problem.
Analysis
Let's first get a good sense of the inputs, the processing, and the outputs.
Suppose that in order to modify a record, the user is expected to input the
last name and first name of the employee. Then the user is prompted whether
or not to change the ID or wage rate for that employee record. The user can change as
many records as desired, indicating that it's time to stop by pressing
CTRL z instead of entering a name. (Remember that we would use
CTRL d instead under Linux.) Of course, all changed records are written out to the
same spot they were at in the emp.dat file. (Getting to any
particular spot is what random access is all about.)
Design
Let's think about the data first. Obviously, we have the file of records
to deal with. Since we want to allow the user to search for individual
employee records (and then possibly make changes), we might want to read
the data into an array of records. We know how to search an array of records.
This leads to a few complications, however. One is that an array has a limit
on how many records it can hold. The second is that since we probably want
to use binary search (which is faster than sequential) we will need to have
the records in order. We could either have this program sort the array, or
assume that the user has already sorted the file, which might not be a safe assumption.
Because of these complications, let's look for an alternative. Since
random access lets us jump to any location we want in a file, a file with random access behaves
a lot like an array. Thus we should be able to imitate the binary search
algorithm on the file itself and not use an array at all. This would simplify
things. Since file access is slow this will mean that searching will be a bit slower,
but if the file is not huge this should not be a problem as binary search only does a
few probes to find an item in short and medium-sized arrays or files.
We might at this point draw a data flow diagram showing the flow of data
among the various programs associated with the emp.dat file.
The diagram below shows that the flow of data between the user and each
program is two-way (since each program at least asks the user to press a
key to continue). Some of the programs only have one-way data flow between
the program and the file, but others have two-way data flow. Our new program,
here named modemp , obviously has to both read data from the file
and write updated data to it.
Next, let's design the functions in a top-down fashion. At first glance we
might try for a main function with three functions under it: one to do the
binary search, one to print an individual record on the screen so that the
user can see what we have, and one to allow the user to make changes to the
data. However, since the binary search will need to compare records to see
if the names match, we will need the EmpCompare function from
the employee.cpp file that we have used
before. Also, this file contains a PrintEmployee function that does
exactly what we want in printing an employee record, so we will use it too.
Reusing code is a great idea, as it can save lots of work!
Since we want to allow repeated lookups, the loop that allows this might
best be placed in another function, perhaps called ProcessFile .
This leads to the following structure chart:
Of our new functions, the first one we come to below main
is ProcessFile . Let's assume that it gets (via a parameter)
the file stream, already properly opened. Since we will want to read from
the file to get the correct employee record and then maybe to write an
updated record to the file, we want the file open for both input
and output, which is possible in C++. The main task of this function is,
of course, to allow repeated lookups and modifications of an employee
record. This leads to the following documentation for this function:
    /* Given:  EmpFile   A binary file stream already opened for input and output.
       Task:   To allow the user to repeatedly look up an employee in
               EmpFile by name. If the lookup succeeds, the info on the
               employee is displayed on the screen and the user is given
               a chance to modify the ID and WageRate for the employee.
       Return: EmpFile   The modified file stream.
    */
    void ProcessFile(fstream & EmpFile)
The SearchFile function is to imitate our binary search in the
data file. It will need the file as a parameter and will need to be given
the name of the employee to look up. Let's assume that we pass in an employee
record containing the first and last name of the employee to look up. We
don't care what is in the other fields of the record sent into the function.
The function should, however, return this record with all of the fields filled
in if there is a match. In the function name we can return true
or false to indicate if a match was found. We also need to
somehow return the location where a match was found. This is typically done
as the number of bytes into the file where the matching record begins. This
number can then be used elsewhere in the program if we want to write updated
employee data to this location. The resulting documentation for this function
then looks something like the following. Note that "seeking" refers to
moving to a given position in the file.
    /* Given:   EmpFile     A file stream already opened for input and output.
                Employee    An employee record containing the last name and
                            first name to search for.
       Assumes: That EmpFile is in ascending order.
       Task:    To do a binary search in EmpFile for Employee.
       Return:  EmpFile     The file stream (which can be modified by reading
                            and seeking in the file in that the file position
                            pointer may be moved).
                Employee    If found, this parameter will contain the complete
                            record for the person looked up.
                Location    The location of the Employee record in the file
                            (as the number of bytes into EmpFile).
                SearchFile  In the function name, true is returned if Employee
                            was located, false otherwise.
    */
    bool SearchFile(fstream & EmpFile, EmployeeType & Employee, long & Location)
Finally, let's design the Modify function. It needs to be given (via
parameters) the file stream, the employee record that we may wish to change,
and its location in the file (as the number of bytes into the file where it
is found). The function's task is to ask the user whether or not to change
the data in the ID and WageRate fields, to get this data, and then to write it out
to the correct location for this record in the file. We thus get documentation
for this function as follows:
    /* Given:  Employee   An employee record.
               EmpFile    A file stream, open for input and output.
               Location   The offset in EmpFile at which Employee can be found.
       Task:   To allow the user to change the ID or WageRate in the Employee
               record, if desired, with the modified data being written to EmpFile.
       Return: Employee   The (possibly) modified employee record.
               EmpFile    The modified file stream for the file of records.
    */
    void Modify(EmployeeType & Employee, fstream & EmpFile, long Location)
You might ask why we haven't allowed the user to change the FirstName or
LastName fields. Changing either of these might mess up the ascending order
of the data in the file. Since our binary search depends on this ordering,
the modify program might fail to work after changing a name! An alternative approach
would be to skip the sorting and binary search. We could just process the file
sequentially, reading each record and changing it and writing it back to the file as needed.
At this point we might write out in pseudocode the algorithm for one
or more of the functions. Let's do this for the ProcessFile
function. Since the SearchFile function essentially follows
the well-known binary search algorithm, we probably don't need to write out
pseudocode for it. Also, the Modify function sounds simple enough
to write without doing any pseudocode first. Here is the pseudocode for
the ProcessFile function:
    void ProcessFile(fstream & EmpFile)
    {
        set up needed local variables
        ask the user for the employee's last name
        read this into the LastName field of Employee
        while (no input failure)
        {
            ask the user for the employee's first name
            read this into the FirstName field of Employee
            if (SearchFile function finds Employee in EmpFile)
            {
                print the data in the Employee record
                call Modify on Employee and EmpFile using Location given by SearchFile
            }
            else
                print a "not found" message
            ask the user for the employee's last name
            read this into the LastName field of Employee
        }
    }
Prototyping
Next we might construct a quick prototype that we could let users
try out. Our first prototype might not even access the file at all. The
main function would simply call ProcessFile which
would contain the loop that prompts the user to repeatedly enter the first
and last names for the employee to look for. We could use a stub
for the SearchFile function. The stub doesn't do any searching at
all, but simply sends back some hard-coded employee data. This data is
surely incorrect, but the user will be able to see what things look like
on the screen. PrintEmployee can then be used to print this
fake data, and Modify can be used to ask the user a series of
questions about whether to change the ID or WageRate fields. No code to write
changes to the file would be present, however.
We might then create a second prototype which adds more functionality.
It could really open the file and search for desired records, but maybe
not yet allow the user to actually change any data. In a large project, one
might use a sequence of prototypes that gradually approach the desired finished product.
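The stub idea for the first prototype can be sketched as follows. This is only an illustration: the EmployeeType layout shown here is hypothetical (the real one is in employee.h, which is not reproduced in this section), and the fake ID and wage values are arbitrary.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>

// Hypothetical record layout: the real EmployeeType is declared in employee.h.
// Field names follow the text; the sizes and types are assumptions.
struct EmployeeType
{
    char FirstName[16];
    char LastName[16];
    int ID;
    float WageRate;
};

// Stub for the first prototype. It never touches EmpFile; it fills in
// made-up values and always claims the employee was found at offset 0.
bool SearchFile(std::fstream & /* EmpFile */, EmployeeType & Employee, long & Location)
{
    // ProcessFile is expected to have read a last name before calling the stub.
    assert(Employee.LastName[0] != '\0');

    Employee.ID = 9999;          // fake data, not read from any file
    Employee.WageRate = 12.50f;  // fake data, not read from any file
    Location = 0L;
    return true;
}
```

Because the stub has the same parameter list as the real SearchFile, ProcessFile can be written and tried out against it, and the real function can be dropped in later without changing the caller.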
Coding
We now code the complete program. This can be found in the modemp.cpp file.
(Note that we use employee.h and employee.cpp too.) Since we have never used
random access before, pay particular attention to how this is done. First, however,
we begin with the main function, where we open our file for
both input and output, since we need to read a record and maybe write out an
updated record, read another record and maybe write out an updated one, etc.
Since we go back and forth between reading and writing it does not make
sense to keep opening and closing the file. (For one thing, opening a
file may be somewhat time-consuming.) Instead we just open the file
for both input and output and leave it open for the duration of the program.
EmpFile.open("emp.dat", ios::in | ios::out | ios::binary);
if (EmpFile.fail())
{
   cerr << "Could not open file emp.dat" << endl;
   exit(1);
}
|
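One detail worth knowing about this open mode: ios::in | ios::out opens an existing file for update and does not create a missing one, so the open fails if the file is absent. The following minimal sketch wraps the same open call in a helper. The name OpenForUpdate is hypothetical and is not part of modemp.cpp.

```cpp
#include <cassert>
#include <fstream>

// Hypothetical helper (not part of modemp.cpp): tries to open an existing
// binary file for both reading and writing. Returns true on success.
bool OpenForUpdate(std::fstream & File, const char * FileName)
{
    assert(FileName != nullptr);

    File.clear();  // forget any error state left from an earlier attempt
    File.open(FileName, std::ios::in | std::ios::out | std::ios::binary);

    // With in | out the file must already exist; a missing file sets failbit.
    return ! File.fail();
}
```

A caller would test the return value in place of the EmpFile.fail() check and report the error in whatever way suits the program.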
We would probably next look at the ProcessFile function.
Since we wrote pseudocode for it earlier, it is now rather easy to write
it out in C++. Note the use of a long variable. A long is an integer type that
is at least 32 bits wide, and on some systems it is wider than an int. This program
uses a long to keep track of a position in a file, since the number of bytes in a file
can be rather large. (The stream library also supplies its own types for file positions
and offsets, streampos and streamoff.)
Nothing further will be said here about the coding of this function.
Next, let's look at the Modify function. We write it to
fit the documentation that we already wrote above. Recall that it receives
Location as the number of bytes to go into the file to get to the relevant
employee record, a copy of which is being passed via the Employee parameter.
void Modify(EmployeeType & Employee, fstream & EmpFile, long Location)
{
   char Choice;
   bool Modified = false;

   cout << endl << "Do you wish to modify the ID number (y/n)? ";
   cin >> Choice;
   if ((Choice == 'y') || (Choice == 'Y'))
   {
      Modified = true;
      cout << "Enter the corrected ID number: ";
      cin >> Employee.ID;
   }

   // similarly ask about modifying the WageRate field (details not shown)

   if (Modified)
   {
      EmpFile.seekp(Location, ios::beg);
      EmpFile.write(reinterpret_cast <char *> (&Employee), sizeof(Employee));
   }
}
|
How to prompt for the updated data should be obvious. When the boolean
Modified flag is true, we want to write
the updated data to the file. If this flag is false, there
is no sense in wasting time writing unchanged data to the file.
The basic idea is that we have to move the file position pointer, the one
used to keep track of where to write to the file, to the correct location.
Then we write out the record using the familiar write function
that we always use with our binary files. Some form of "seek" command is
used to move the file position pointer. To set the position for writing,
we use the seekp version.
The letter 'p' is a reminder that we plan to "put" some data into the file.
(There is also a seekg version used to set the position from which we are
going to read. The 'g' is a reminder that we want to "get" data from the file.
For a file stream the two calls actually move one shared file position, but
using seekp before a write and seekg before a read keeps the intent clear.)
The first parameter to either version of seek is the number of bytes to move.
The second parameter is a constant that indicates where
we are starting from. Typically one uses ios::beg, which means
that we move a certain number of bytes into the file from the beginning.
The other choices are ios::cur (the current position) and ios::end (the end of the file).
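The seek-then-write and seek-then-read pattern can be tried in isolation with a small sketch like the one below. RecordType, WriteRecord, and ReadRecord are hypothetical names introduced only for illustration. The functions take a std::iostream reference, which is an assumption made so that they accept an fstream like EmpFile and also an in-memory stringstream for quick experiments.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

// Hypothetical fixed-size record used only in this sketch.
struct RecordType
{
    int ID;
    float WageRate;
};

// Overwrites record number Index (records are numbered 0, 1, 2, ...).
bool WriteRecord(std::iostream & Stream, long Index, const RecordType & Record)
{
    assert(Index >= 0);
    long Location = Index * static_cast<long>(sizeof(RecordType));

    Stream.seekp(Location, std::ios::beg);   // position for "put"
    Stream.write(reinterpret_cast<const char *>(&Record), sizeof(RecordType));
    return ! Stream.fail();
}

// Reads record number Index into Record.
bool ReadRecord(std::iostream & Stream, long Index, RecordType & Record)
{
    assert(Index >= 0);
    long Location = Index * static_cast<long>(sizeof(RecordType));

    Stream.seekg(Location, std::ios::beg);   // position for "get"
    Stream.read(reinterpret_cast<char *>(&Record), sizeof(RecordType));
    return ! Stream.fail();
}
```

Since each function seeks before it transfers data, reads and writes can be mixed in any order, which is the situation the modify program is in.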
Finally, let's look at the coding of the SearchFile function.
This is the one that carries out a binary search within the file to try to
find a record containing the first and last names in the Employee
record. We already have an EmpCompare function to tell
us if two records have names that match. In such a case EmpCompare
returns 0. It returns -1 if the employee record given as its first parameter
is alphabetically less than the employee record given as its second parameter.
SearchFile only tests whether the result is zero, negative, or anything else.
See employee.cpp if you wish to look at the details.
bool SearchFile(fstream & EmpFile, EmployeeType & Employee, long & Location)
{
   EmployeeType EmployeeTemp;
   bool Found;
   int CmpResult;
   long Mid, Low, High, RecordSize;

   Found = false;
   Low = 0L;
   // Go to the end of the file:
   EmpFile.seekg(0L, ios::end);
   RecordSize = sizeof(EmployeeTemp);
   // Find the number of records and subtract 1 to get high index:
   High = EmpFile.tellg() / RecordSize - 1L;

   while ((! Found) && (Low <= High))
   {
      Mid = (Low + High) / 2;
      Location = Mid * RecordSize;
      EmpFile.seekg(Location, ios::beg);
      EmpFile.read(reinterpret_cast <char *> (&EmployeeTemp), RecordSize);
      CmpResult = EmpCompare(Employee, EmployeeTemp);
      if (CmpResult == 0)
      {
         Employee = EmployeeTemp;
         Found = true;
      }
      else if (CmpResult < 0)
         High = Mid - 1L;
      else
         Low = Mid + 1L;
   }

   return Found;
}
|
In the code above, note that any variable used to keep track of a position
within the file is a long. Once again this is because the
number of bytes in a file can be huge, so that an ordinary int
might overflow. (On systems where a long is no wider than an int,
a long gives no extra range, and very large files need a wider type.)
Variables Low, High, and
Mid are used to hold record numbers, where the file's
records are numbered 0, 1, 2, etc. Even these numbers could get to be
rather large, so we use type long for them. Note that variable
Low starts at value 0L, where the L
is used to indicate a constant of type long.
Finding the proper value to use for variable High is
somewhat tricky. We use seekg with an offset of 0 and the ios::end
constant, which indicates the end of the file. This moves the file position
pointer (for reading) to the end of the file. We then use the tellg function to
report how many bytes into the file we now are. If we divide this number
by the number of bytes in an employee record, the result is the number
of records in the file (assuming that the file holds only whole records).
Finally, we subtract 1 (since numbering begins
at 0) to get the initial value for the High record number.
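The record-count computation can be pulled out into a small function, as in this sketch. CountRecords is a hypothetical name, not something from modemp.cpp, and the function takes a std::istream reference (an assumption) so that it works on an fstream as well as on an in-memory stream.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

// Hypothetical helper: returns the number of whole records of RecordSize bytes
// in the stream, or -1 if the position cannot be determined.
long CountRecords(std::istream & Stream, long RecordSize)
{
    assert(RecordSize > 0);

    Stream.seekg(0L, std::ios::end);   // 0 bytes from the end of the data
    std::streamoff Size = static_cast<std::streamoff>(Stream.tellg());
    if (Size < 0)
        return -1L;                    // tellg reports failure as -1

    // Integer division discards any partial record at the end.
    return static_cast<long>(Size / RecordSize);
}
```

The initial High index for the binary search is then CountRecords(EmpFile, RecordSize) - 1L. For an empty file this gives -1, so a loop guarded by Low <= High never runs.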
The loop used in the binary search should look familiar. The computation
for finding Mid is the same as always. However, once
Mid is known we need to read in record number Mid
so that we can compare it to the record we are looking for. To get to the
correct location we use seekg with a first parameter computed
as Location = Mid * RecordSize. This gives us the number of
bytes by which to move into the file to get to the desired record. We
then use our usual read function to read the record and then
call upon EmpCompare to compare the record just read with the
Employee record. The rest of the code is much like that found
in ordinary binary search.
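To experiment with this technique without the employee files, a self-contained variant along the following lines may help. Everything named here is hypothetical: PersonType stands in for EmployeeType, strcmp on the last name stands in for EmpCompare, and AppendPerson exists only to build test data. The sketch also adds two things that the SearchFile code above does not have: a check that each read succeeded, and a midpoint computed as Low + (High - Low) / 2, which gives the same value without forming the sum Low + High.

```cpp
#include <cassert>
#include <cstring>
#include <iostream>
#include <sstream>

// Hypothetical record; the real EmployeeType is defined in employee.h.
struct PersonType
{
    char LastName[16];
    int ID;
};

// Test-data helper: appends one record at the end of the stream.
void AppendPerson(std::iostream & Stream, const char * LastName, int ID)
{
    PersonType Person = {};   // zero-fill so unused name bytes are defined
    assert(std::strlen(LastName) < sizeof(Person.LastName));
    std::strcpy(Person.LastName, LastName);
    Person.ID = ID;
    Stream.seekp(0L, std::ios::end);
    Stream.write(reinterpret_cast<const char *>(&Person), sizeof(Person));
}

// Binary search over records sorted by LastName. On success the matching
// record is copied into Target and its byte offset is stored in Location.
bool SearchStream(std::iostream & Stream, PersonType & Target, long & Location)
{
    const long RecordSize = static_cast<long>(sizeof(PersonType));

    Stream.clear();
    Stream.seekg(0L, std::ios::end);
    long Low = 0L;
    long High = static_cast<long>(
        static_cast<std::streamoff>(Stream.tellg()) / RecordSize) - 1L;

    while (Low <= High)
    {
        long Mid = Low + (High - Low) / 2;
        PersonType Temp;
        Stream.seekg(Mid * RecordSize, std::ios::beg);
        Stream.read(reinterpret_cast<char *>(&Temp), RecordSize);
        if (Stream.fail())
            return false;     // stop rather than compare unread data

        int CmpResult = std::strcmp(Target.LastName, Temp.LastName);
        if (CmpResult == 0)
        {
            Target = Temp;
            Location = Mid * RecordSize;
            return true;
        }
        else if (CmpResult < 0)
            High = Mid - 1L;
        else
            Low = Mid + 1L;
    }
    return false;
}
```

Searching for the first and the last names in a small sorted stream exercises both ends of the Low and High range, and searching for a name that is absent checks that the loop terminates.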
After coding, one would then proceed with testing and debugging. Special
cases that should be tested include finding and modifying the first and last
records in the file. We won't go into this step further here. There is also
the maintenance step once the software is put into actual use. Finally, there
is the documentation, a lot of which has been accumulated during the above
discussion. We have a description of the software requirements, a
data flow diagram, a structure chart, documentation for each function, etc.