
Software Design Using C++

Professional Programming: Issues and Tools

Introduction

Are we writing professional programs yet? Unfortunately, the answer is probably no, even if you have worked through all of the Software Design Using C++ web pages! This is partly because there is so much to learn. Most people learn how to write good-quality, professional programs through years of experience. Having a mentor and working with others on teams both help to expand one's ability to write good software. Courses in software engineering, systems analysis and design, user interface design, etc. are also quite useful.

Another reason that we are not yet writing professional programs is that these web pages usually took the easiest approach since the main goal was to teach programming to beginners. Even when these web pages got into more difficult material, an attempt was made to present it in as clear a manner as possible. The easiest method of doing something is often the best, but that is not always the case. In the following material we will look at some issues that have been largely overlooked in these web pages so far (such as security concerns) and will mention some of the tools used by professional programmers to make their jobs easier.

Security Issues

Security is probably not an issue with a small program that you write for your own use and which is not even accessible by others. Consider, in contrast, software that is run on a company server each time that someone on the Internet clicks on a particular link on a company web page. If that software has a security flaw, it may then be possible for malicious users to do things on this server that they should not be able to do. For example, they might be able to read data that is confidential (such as credit card numbers), they might be able to change data on the server, they might be able to crash the server, and they might even be able to gain administrative (root) access to the server, thus giving themselves complete access to everything on this server! There are many types of common security flaws in software. We primarily discuss one well-known type, buffer overflow, below.

Example Security Issues

Before looking at the main example, buffer overflow, let's have a brief look at several other security concerns. One is how to protect confidential data such as passwords and credit card numbers. This is commonly done with some form of encryption. A related issue is how to authenticate users, that is, how to verify that they really are who they say they are. This, too, uses encryption. The study of encryption is beyond what can be done in these web pages. The references below provide some further information. Detailed books and courses entirely on encryption are now becoming available as well. The short answer for the programmer who needs to use encryption is generally to use some encryption technology that is already well-tested. (This might involve licensing the technology. Note that Microsoft has built a cryptography API into many of its products. To use it, you start by including the wincrypt.h header. See Microsoft's documentation for further details.) Encryption is difficult to do well. Too many people have thrown together their own encryption scheme only to have huge flaws discovered in it after it was put into use. As of this writing, for example, there is a fair amount of concern about the security holes in a particular form of wireless networking. Obviously the creators of this software did not intend it to have such flaws. Security and encryption are simply hard to do well.

Another source of bugs and potential security flaws is failing to check if a system call or a call to some similar function generates an error. Many system calls return a value that indicates that an error happened. In object-oriented C++ an exception might be thrown instead. Other ways of indicating an error condition are also sometimes used. The programmer should check to see if function calls fail in some way and design the program to handle this. Sure, most system calls work fine nearly all of the time, but it could be that rare failure that leads to a difficult-to-find bug or security flaw. As an example, consider the use of new to dynamically create an array, an object, or whatever. Always plan for the new operation failing. In standard C++ a failed new throws a bad_alloc exception; only the nothrow form, new (nothrow), returns a null pointer instead (as did some older compilers), so find out which behavior applies to your code. Although there will almost always be room to allocate space for a new item, consider what your software should do in the rare case that the new operation fails. Make sure that your software handles this case by recovering from the error if possible and failing gracefully if recovery is not possible.

There are many other security issues that a programmer should consider when creating professional software. These include the filtering of bad input (hackers love to supply bad input to your programs to try to get a result that you did not intend), setting permissions on files so that the least amount of access necessary is given, etc. Filtering all but acceptable characters is especially important in a web application that receives data from the user. This filtering must be done on the server end, as the potential attacker has complete control over what happens at the client end. (Thus using JavaScript to filter the data at the user's browser is not the way to handle this.) Only accept the characters that you wish to allow and limit the length of the data to what your program can reasonably handle. Another issue that often occurs is where to place a temporary data file. Don't place it in a "world-writable" location as an attacker may be able to replace the temporary file with something else and thus get your application to do something that you never intended. See the references below for further information. As Jay Heiser said in the March 2002 issue of Information Security: "The only way to avoid costly mistakes is first to assume that there may be important things you need to learn, and then have your work repeatedly tested by security experts. One thing mature programmers know is that they don't know everything." Next, let's look at our main example.

Buffer Overflow

This typically involves the misuse of the run-time stack. Suppose that the software in question is the software mentioned above that is run each time an Internet user clicks a particular link on the company web page. This web page contains a form where the user supplies some data. When the user clicks the submit button this data is processed by your software on the server. The malicious hacker, of course, submits some rather unorthodox data. Suppose further that a function in your software has a local variable, a character array named buffer, that is used to hold a copy of some or all of the data that the user submits. The hacker's data is deliberately longer than will fit in your buffer variable so that it overflows and overwrites the rest of the stack frame for your function. In particular, it overwrites the return address used when the function ends. The hacker tries to arrange it so that the new return address sends the computer to a location in the stack that was just overwritten, a location that contains code that the hacker desires to run on your server. The hacker's code might create a backdoor that allows that person easy access to your server! Your server has just been taken over.

It might sound difficult to arrange things so that the fake return address points to the desired piece of hacker code, but a bunch of NO-OPs in front of the code make the attack more likely to work. (A NO-OP is a machine instruction that does nothing. Control simply proceeds to the next instruction.) As long as the fake return address points to one of the NO-OPs or the first instruction of the hacker code, the system will end up executing the hacker code that installs the backdoor. Here is a picture of the run-time stack, both before and after the hacker's data has overflowed the local variable buffer. Note that the stack grows from high to low addresses on most computers, but when a string is copied into a variable the data goes into one memory location after another, from low to high addresses. Thus the malicious data supplied by the hacker is put in the buffer variable, then overflows into local variable c, and then into the return address location (and possibly further). The bad return address may be repeated several times in a row to get a better chance of having that bad address land on the return address location in the stack frame. You might think that having the run-time stack grow from low addresses to high might get rid of the buffer overflow problem, but no, there are still ways to manipulate things to get a buffer overflow.

[buffer overflow in run-time stack]

What aspect of coding makes buffer overflow possible? The usual culprit is strcpy, a function that these web pages have used extensively whenever we needed to copy a character array type of string. The strcpy function is simple to understand and use, but it does not do any bounds checking. It just copies characters from the source string to the destination until it reaches the null character ('\0', called NULL in these web pages) that ends the source string. It will overflow the destination if the destination variable (buffer) is not large enough to hold the data. Note that the strcat function has the same problem. However, the strncpy function can be a reasonable alternative to strcpy (if used wisely) as it only copies up to a fixed number of characters. (If you decide to try strncpy, read the documentation on it carefully. Make sure that strncpy is copying no more data than will fit in your buffer. Since strncpy does not append a NULL to the destination string when the source is too long, you may have to do so manually. Be sure that you have saved space in the buffer for this NULL as well.) Let's look at some of the details of this:


// Suppose we need to copy user input, which we suppose is in TempStr, into local variable buffer.
// Using strcpy as follows would allow overflow if the data is longer than char array buffer can hold.
strcpy(buffer, TempStr);

// Instead, replace the use of strcpy with something like this.
// Note: sizeof(buffer) gives the array size only if buffer is declared as an array
// (such as char buffer[80]), not if it is a pointer.
size_t Num;
Num = sizeof(buffer) - 1;
strncpy(buffer, TempStr, Num);   // Only copies up to Num characters (quits early if NULL found).
buffer[Num] = '\0';              // If data in TempStr was too long, no NULL char was added, so we
                                 // add one manually.  It won't hurt if there is a NULL before this.

The above method will truncate the data if it does not all fit into buffer. Another approach is to dynamically allocate the character array buffer so that it is big enough. Then buffer won't overflow and the data all fits. Below is an outline of this. It would be best to add to this code a check to see that length is reasonable. Further, you probably do not want to allocate a huge buffer. To fix that you could cut length down to some fixed maximum if it is too big.


size_t length;
char * buffer;

length = strlen(TempStr);
// Plain new throws a bad_alloc exception on failure.  The nothrow form (from the <new> header)
// returns NULL instead, which is what the check below relies on.
buffer = new (nothrow) char[length + 1];   // Save room for the NULL at the end of the string.
if (buffer == NULL)
   {
   cerr << "Insufficient space to allocate for buffer" << endl;
   exit(1);
   }

// Since buffer has enough room, strcpy should be OK, but let's be paranoid and use strncpy:
strncpy(buffer, TempStr, length);
buffer[length] = '\0';

// Do whatever you wish with buffer...

// Once buffer is no longer needed, reclaim the space:
delete [] buffer;

Thus the basic solution (assuming dynamic allocation of buffers is not used) is to avoid the use of strcpy and similar functions that lack bounds checking in any software where a buffer overflow by a malicious user would be problematic. This includes not just web software, but also software on a server that internal users normally access. A malicious internal user could log into the server as normal and then use a buffer overflow attack to gain administrative (root) access to this server. This user has just gained total access to the server. One should also be careful when reading data into a character array that the data cannot overflow the array.

Perhaps the best solution to the problem of how to copy strings without allowing buffer overflows is to use an object of the STL string class, using assignment with the = operator (or a copy constructor) to copy the string objects. Because a string object manages its own storage and allocates more space as needed, the copy itself cannot overflow a buffer. However, you can still get into trouble by incrementing an iterator so that it is off the end of the string, or by using [] with an index that is out of range. If you use some other string class, be sure to check that the function or operator used to copy such strings does not allow buffer overflow to occur. For the character array type of string, if you need to use that scheme, you can write your own string copy function. The following is an example of this approach.

#include <iostream>
#include <cctype>
using namespace std;

const int StrMax = 20;  // whatever maximum you want

typedef char StringType[StrMax];   // use this type for your character array strings
typedef char StringType2[80];  // just added for testing purposes, do not use!


/* Given:  Source       The character array type of string to be copied.
   Task:   To copy Source to the Destination string, being sure not to copy more
           characters than will fit and rejecting any characters other than
           alphanumeric characters, NULL, space, plus sign, minus sign, and period.
   Return: Destination  Containing a copy of the data from the Source string,
                        truncated if need be.  If a character other than those
                        listed above is found, the empty string is returned in
                        Destination.
           In the function name, false is returned if a character was found other
           than those allowed in the list above.  False is also returned if the data
           would not fit in Destination.  Otherwise, true is returned.
*/
bool MyStringCopy(StringType Source, StringType Destination)
   {
   char *p;
   char ch;
   int k;
   bool Flag;

   p = Source;
   Flag = false;

   for (k = 0; k < StrMax; k++)
      {
      ch = *p;
      if (ch == '\0')
         {
         Destination[k] = '\0';
         Flag = true;
         break;
         }
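      // isalnum needs a value in the unsigned char range, so convert ch before the call.
      else if (isalnum(static_cast<unsigned char>(ch)) || (ch == ' ') || (ch == '+') || (ch == '-') || (ch == '.'))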
         {
         if (k == StrMax - 1)  // out of space
            {
            Destination[k] = '\0';
            break;
            }
         else
            {
            Destination[k] = ch;
            p++;
            }
         }
      else   // bad data, return empty string
         {
         Destination[0] = '\0';
         break;
         }
      }

   // just to be sure there's a NULL at the end of the string:
   Destination[StrMax - 1] = '\0';
   return Flag;
   }


int main(void)
   {
   bool r;
   StringType s;
   StringType s1 = "hello world";
   StringType s2 = "hello, world";
   StringType s3 = "hello world.";
   StringType s4 = "hello world!";
   StringType s5 = "0123456789012345678";   // exactly 19 characters should fit
   StringType2 s6 = "01234567890123456789";  // 20 characters should not fit

   r = MyStringCopy(s1, s);
   if (r)
      cout << "Flag is true" << endl;
   else
      cout << "Flag is false" << endl;
   cout << s << endl << endl;

   r = MyStringCopy(s2, s);
   if (r)
      cout << "Flag is true" << endl;
   else
      cout << "Flag is false" << endl;
   cout << s << endl << endl;

   r = MyStringCopy(s3, s);
   if (r)
      cout << "Flag is true" << endl;
   else
      cout << "Flag is false" << endl;
   cout << s << endl << endl;

   r = MyStringCopy(s4, s);
   if (r)
      cout << "Flag is true" << endl;
   else
      cout << "Flag is false" << endl;
   cout << s << endl << endl;

   r = MyStringCopy(s5, s);
   if (r)
      cout << "Flag is true" << endl;
   else
      cout << "Flag is false" << endl;
   cout << s << endl << endl;

   r = MyStringCopy(s6, s);
   if (r)
      cout << "Flag is true" << endl;
   else
      cout << "Flag is false" << endl;
   cout << s << endl << endl;

   return 0;
   }

If you write your own string copy function, be very sure that it works correctly in all possible cases. Note that the above MyStringCopy function carefully checks to be sure that the data will fit into the Destination string. It also does some filtering of the data, quitting if it finds an unwanted character. Adjust this to suit your needs. Below is the output from the test program for this function. Note that the string containing a comma and the string containing an exclamation point are rejected since these characters were not wanted. Also notice that the string of 19 characters is accepted, but the string of 20 characters is not: it is truncated to 19 characters and false is returned, since 20 characters plus the NULL to mark the end of the string will not fit in an array of 20 characters.


Flag is true
hello world

Flag is false


Flag is true
hello world.

Flag is false


Flag is true
0123456789012345678

Flag is false
0123456789012345678

There is a lot more information available about buffer overflow problems. See the first two books in the references below for more on this topic. Note that running off the end of any array is problematic, even in cases that don't open your software up to a buffer overflow attack. If you write data beyond either end of the array you may overwrite something important. This can lead to software that sometimes crashes or produces incorrect results. Because of this problem, the STL's vector class is often used instead of arrays. If you use the at function to reference the item at a given index, it will throw an out_of_range exception if the index is out of bounds. However, if you use [] with an out-of-bounds index, no check is made; the behavior is undefined, and the program may crash or silently corrupt data. Also, with vectors and other containers in the STL one often uses iterators to access individual items or a range of items in these containers. It is up to the programmer to be sure not to use an iterator to access a location in memory outside of the container on which the iterator is being used. As Nicolai M. Josuttis states on p. 203 of his reference book, "In this respect, iterators are just as unsafe as ordinary pointers." (Recall that if you follow a pointer that has been incremented to point beyond the end of something or set to point to some random location, it often results in a runtime error that crashes your program. Both arrays and pointers are somewhat dangerous to use unless one is very careful with them.)

If you program for the Windows platform, you should consider using Microsoft's strsafe.lib and strsafe.h. These provide a library of safe string-handling functions. This library is provided as part of Visual C++ .NET 2003 and can also be obtained by downloading the Windows Core SDK from the SDK update site. For more information on these functions, go to MSDN and click on Library, User Interface Design and Development, Windows Management, Windows User Interface, Resources, Strings, String Overviews, Using the Strsafe.h Functions. Also refer to the article by Richard Grimes mentioned below in the references section.

Privacy Issues

There are many places where privacy issues can be of concern in software development. The ACM's Professional Standards (currently consisting of the General ACM Code of Ethics and Professional Conduct as well as the Software Engineering Code of Ethics and Professional Practice) provide helpful guidelines on this. These documents talk of the responsibility of computing professionals to protect the privacy and integrity of data about people. This data should be accurate, should be subject to correction by the individuals affected, and should be retained only for a reasonable period of time. Data should not be used for purposes beyond those for which it was collected, unless the permission of the affected individuals is obtained. Read these ACM documents for further details about privacy issues and other ethical concerns.

Memory Leaks

Even professionals sometimes have problems with memory leaks. A memory leak occurs when a program dynamically allocates memory space for an object, array, or variable of some other type, but fails to free up the space once it is no longer needed. Some commercial software has suffered from this problem. The result for users is that a long-running program uses more and more memory, and on some older systems the leaked memory is not recovered until the computer is rebooted. The main way to prevent this problem is to discipline oneself to check that every time space is allocated it is always deallocated appropriately. Typically this involves the use of the delete operation (or delete [] for space that was allocated as an array).

Threads

Many modern programs are written using threads. A thread is a portion of a program that can be scheduled separately to run on the CPU. Suppose that a program needs to carry out two main tasks at a certain point. Each task could be carried out by a different thread. If one thread gets held up (perhaps waiting for some data to be read in from disk), the other thread might still be able to go forward. In a server environment, software that serves multiple users might use a different thread to handle each user. Threaded programming is known to be rather difficult. Consult a good reference book or search the Internet for online materials. Currently, the ThreadMentor home page and Iona Technologies' JThreads/C++ are sources of information and software for this topic. You can also search the MSDN library for information on threads.

Tools

The following is by no means a complete list of tools used by professional programmers. Rather, this partial list is intended to give some indication of the types of tools that are used. Good tools can be used to improve both the quantity and quality of work, whether that work is done by an individual or by a team. No particular endorsement of tools mentioned by name is implied.

- Debuggers are a commonly used tool. Many compilers come with one. The Visual C++ Debugger was briefly explained in these web pages. For the g++ compiler under Linux, the gdb debugger is typically used. There is also a graphical debugger named ddd for g++ under Linux. It is currently available from http://www.gnu.org/software/software.html.
- There are tools to look for certain security problems in one's source code. Of course, none of these tools can possibly find all security problems. Some of these tools are free.
- There are also tools to help in finding memory leaks. An Internet search will find a number of these. The following is a sample of what you can find.
  - Memory Supervisor System.
  - Memory Leaks. Article by Randy Charles Morin with code to find memory leaks.
  - In Visual C++ you can also use Help to search for _CrtMemState. This will give you more information on checking for memory leaks.
- Version control systems are used to help in managing the multiple versions of the many files that compose a large project. At this writing, RCS and CVS are available via http://www.gnu.org/software/software.html. An Internet search would likely find others for various platforms.
- Some software companies produce large suites of software development tools. One such company is Rational. Their suite includes project management, requirements management, visual modeling, use-case management, data modeling, run-time analysis, system testing, and other functionality.

References

Build Security In
Sponsored by Dept of Homeland Security, National Cyber Security Division.
Counter Hack: A Step-by-Step Guide to Computer Attacks and Effective Defenses. Ed Skoudis. Prentice Hall PTR (2002).
CWE/SANS TOP 25 Most Dangerous Programming Errors (with resources on how to avoid these).
Data & Computer Communications, sixth edition. William Stallings. Prentice Hall (2000). See especially chapter 18 for security issues and encryption.
Exploiting Software: How to Break Code. Greg Hoglund and Gary McGraw. Addison-Wesley Professional (2004).
Hacking Exposed: Network Security Secrets and Solutions, third edition. Stuart McClure, Joel Scambray, George Kurtz. Osborne/McGraw-Hill (2001).
How to avoid dangling pointers: Tiny programming errors leave serious security vulnerabilities
The Microsoft Security Development Lifecycle describes a process for creating secure software.
Microsoft Security & Privacy Page
This has security information for developers, IT professionals, home users, etc.
MSDN Security Developer Center
Has many hints on secure coding.
"Preventing Buffer Overruns in C++", Richard Grimes, Dr. Dobb's Journal, 29(1), January 2004, pp. 49-52.
Secure Coding: Principles & Practices. Mark G. Graff and Kenneth R. van Wyk. O'Reilly (2003).
Also see the companion Secure Coding Website.
Secure Programming for Linux and Unix HOWTO
By David A. Wheeler.
SANS Software Security Institute
Offers secure programming skills testing and free practice tests.
What Do We Mean By Memory Leaks and Buffer Overflows in a Web Application?
Writing Secure Code, 2nd edition. Michael Howard and David LeBlanc. Microsoft Press (2002).

