Thursday, September 27, 2007

Understanding Scope

All variables have a finite lifetime when your program executes. They come into existence from the point at which you declare them and then, at some point, they disappear - at the latest, when your program terminates. How long a particular variable lasts is determined by a property called its storage duration. There are three different kinds of storage duration that a variable can have:
automatic storage duration
static storage duration
dynamic storage duration
Which of these a variable will have depends on how you create it. We will defer discussion of variables with dynamic storage duration until Chapter 3, but we can look into the characteristics of the other two in this chapter.
Another property that variables have is scope. The scope of a variable is simply that part of your program in which the variable name is valid. Within a variable's scope, you can legally refer to it, to set its value or use it in an expression. Outside of the scope of a variable, you cannot refer to its name - any attempt to do so will cause a compiler error. Note that a variable may still exist outside of its scope, even though you cannot refer to it by name. We will see examples of this situation a little later in this discussion.
All of the variables that we have declared up to now have had automatic storage duration, and are therefore called automatic variables. Let's take a closer look at these first.
Automatic Variables
The variables that we have declared so far have been declared within a block - that is, within the extent of a pair of curly braces. These are called automatic variables and are said to have local scope or block scope. An automatic variable is 'in scope' from the point at which it is declared until the end of the block containing its declaration.
An automatic variable is 'born' when it is declared and automatically ceases to exist at the end of the block containing the declaration. This will be at the closing brace matching the first opening brace that precedes the declaration of the variable. Every time the block of statements containing a declaration for an automatic variable is executed, the variable is created anew, and if you specified an initial value for the automatic variable, it will be reinitialized each time it is created.
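There is a keyword, auto, which you can use to specify automatic variables, but it is rarely used since it is implied by default. (This describes C++ as it stood when this was written; from the C++11 standard onward, auto instead asks the compiler to deduce a variable's type and can no longer be used as a storage specifier.) Let's put together an example of what we've discussed so far.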
Try It Out - Automatic Variables
We can demonstrate the effect of scope on automatic variables with the following example:
// EX1_06.CPP
// Demonstrating variable scope
#include <iostream>
using namespace std;
int main()
{                                   // Function scope starts here
   int count1 = 10;
   int count3 = 50;
   cout << endl
        << "Value of outer count1 = " << count1
        << endl;
   {                                // New scope starts here...
      int count1 = 20;              // This hides the outer count1
      int count2 = 30;
      cout << "Value of inner count1 = " << count1
           << endl;
      count1 += 3;                  // This affects the inner count1
      count3 += count2;
   }                                // ...and ends here
   cout << "Value of outer count1 = " << count1
        << endl
        << "Value of outer count3 = " << count3
        << endl;
   // cout << count2 << endl;       // uncomment to get an error
   return 0;
}                                   // Function scope ends here

How It Works
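The output from this example will be:
   Value of outer count1 = 10
   Value of inner count1 = 20
   Value of outer count1 = 10
   Value of outer count3 = 80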
The first two statements declare and define two integer variables, count1 and count3, with initial values of 10 and 50 respectively. Both these variables exist from this point to the closing brace at the end of the program. The scope of these variables also extends to the closing brace at the end of main().
Remember that the lifetime and scope of a variable are two different things. It's important not to get these two ideas confused.
Following the variable definitions, the value of count1 is output to produce the first of the lines shown above.
There is then a second curly brace which starts a new block. Two variables, count1 and count2, are defined within this block, with values 20 and 30 respectively. The count1 declared here is different from the first count1. The first count1 still exists, but its name is masked by the second count1. Any use of the name count1 following the declaration within the inner block refers to the count1 declared within that block.
The variable name count1 has been duplicated here only to illustrate what happens. Although this code is legal, it isn't a good approach to programming in general. It's confusing, and it's very easy to hide variables defined in an outer scope accidentally.
The second output line shows that within the inner block, we are using the count1 in the inner scope - that is, inside the innermost braces:
   cout << "Value of inner count1 = " << count1
        << endl;
Had we still been using the outer count1, then this would display the value 10. The variable count1 is then incremented by the statement:
   count1 += 3;                     // This affects the inner count1
The increment applies to the variable in the inner scope, since the outer one is still hidden. However, count3, which was defined in the outer scope, is incremented in the next statement without any problem:
   count3 += count2;
This shows that the variables which were declared at the beginning of the outer scope are accessible from within the inner scope. (Note that if count3 had been declared after the second of the inner pair of braces, then it would still be within the outer scope, but in that case count3 would not yet have been declared at the point of the above statement, so the program would not compile.)
After the brace ending the inner scope, count2 and the inner count1 cease to exist. The variables count1 and count3 are still there in the outer scope and the values displayed show that count3 was indeed incremented in the inner scope.
If you uncomment the line,
   // cout << count2 << endl;       // uncomment to get an error
then the program will no longer compile correctly, because it attempts to output a non-existent variable. You will get an error message something like:
C:\Program Files\MyProjects\Ex1_06\Ex1_06.cpp(29): error C2065: 'count2': undeclared identifier
This is because count2 is out of scope at this point.
Positioning Variable Declarations
You have great flexibility in where you place the declarations for your variables. The most important aspect to consider is what scope the variables need to have. Beyond that, you should generally place a declaration close to where the variable is to be first used in a program. You should write your programs with a view to making them as easy as possible for another programmer to understand, and declaring a variable at its first point of use can be helpful in achieving that.
It is possible to place declarations for variables outside of all of the functions that make up a program. Let's look at what effect that has on the variables concerned.
Global Variables
Variables that are declared outside of all blocks and classes (we will discuss classes later in the book) are called globals and have global scope (which is also called global namespace scope or file scope). This means that they are accessible throughout all the functions in the file, following the point at which they are declared. If you declare them at the very top of your program, they will be accessible from anywhere in the file.
Globals also have static storage duration by default. Global variables with static storage duration will exist from the start of execution of the program, until execution of the program ends. If you do not specify an initial value for a global variable, it will be initialized with 0 by default. Initialization of global variables takes place before the execution of main() begins, so they are always ready to be used within any code that is within the variable's scope.
The illustration below shows the contents of a source file, Example.cpp, and the arrows indicate the scope of each of the variables.
The variable value1 appearing at the beginning of the file is declared at global scope, as is value4, which appears after the function main(). The scope of each global variable extends from the point at which it is defined to the end of the file. Even though value4 exists when execution starts, it cannot be referred to in main() because main() is not within the variable's scope. For main() to use value4, you would need to move its declaration to the beginning of the file. Both value1 and value4 will be initialized with 0 by default, which is not the case for the automatic variables. Note that the local variable called value1 in function() hides the global variable of the same name.
Since global variables continue to exist for as long as the program is running, you might ask the question, 'Why not make all variables global and avoid this messing about with local variables that disappear?' This sounds very attractive at first, but as with the Sirens of mythology, there are serious side effects which completely outweigh any advantages you may gain.
Real programs are generally composed of a large number of statements, a significant number of functions and a great many variables. Declaring all variables at the global scope greatly magnifies the possibility of accidental erroneous modification of a variable, as well as making the job of naming them sensibly quite intractable. They will also occupy memory for the duration of program execution. By keeping variables local to a function or a block, you can be sure they have almost complete protection from external effects, they will only exist and occupy memory from the point at which they are defined to the end of the enclosing block, and the whole development process becomes much easier to manage.
If you take a look at ClassView for any of the examples that you have created so far, and extend the class tree for the project by clicking on the +, you will see an entry called Globals. If you extend this, you will see a list of everything in your program that has global scope. This will include all the global functions, as well as any global variables that you have declared.
Try It Out - The Scope Resolution Operator
As we have seen, a global variable can be hidden by a local variable with the same name. However, it's still possible to get at the global variable using the scope resolution operator, ::. We can demonstrate how this works with a revised version of the last example:
// EX1_07.CPP
// Demonstrating variable scope
#include <iostream>
using namespace std;
int count1 = 100;                   // Global version of count1
int main()
{                                   // Function scope starts here
   int count1 = 10;
   int count3 = 50;
   cout << endl
        << "Value of outer count1 = " << count1
        << endl;
   cout << "Value of global count1 = " << ::count1   // From outer block
        << endl;
   {                                // New scope starts here...
      int count1 = 20;              // This hides the outer count1
      int count2 = 30;
      cout << "Value of inner count1 = " << count1
           << endl;
      cout << "Value of global count1 = " << ::count1   // From inner block
           << endl;
      count1 += 3;                  // This affects the inner count1
      count3 += count2;
   }                                // ...and ends here.
   cout << "Value of outer count1 = " << count1
        << endl
        << "Value of outer count3 = " << count3
        << endl;
   // cout << count2 << endl;       // uncomment to get an error

   return 0;
}                                   // Function scope ends here

How It Works
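If you compile and run this example, you'll get the following output:
   Value of outer count1 = 10
   Value of global count1 = 100
   Value of inner count1 = 20
   Value of global count1 = 100
   Value of outer count1 = 10
   Value of outer count3 = 80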
The changes we have made to the previous example are the global declaration of count1 and the two output statements that use ::count1; we just need to discuss the effects of those. The declaration of count1 prior to the definition of the function main() is global, so in principle it is available anywhere within the function main(). This global variable is initialized with the value of 100:
   int count1 = 100;                // Global version of count1
However, we have two other variables called count1, which are defined within main(), so throughout the program the global count1 is hidden by the local count1 variables. The first new output statement is:
   cout << "Value of global count1 = " << ::count1   // From outer block
        << endl;
This uses the scope resolution operator (::) to make it clear to the compiler that we want to reference the global variable count1, not the local one. You can see that this works from the value displayed in the output.
In the inner block, the global count1 is hidden behind two variables called count1: the inner count1 and the outer count1. The scope resolution operator still does its stuff within the inner block, as you can see from the output generated by the statement we have added there:
   cout << "Value of global count1 = " << ::count1   // From inner block
        << endl;
This outputs the value 100, as before - the long arm of the scope resolution operator used in this fashion always reaches a global variable.
We mentioned namespaces earlier in this chapter, when discussing the namespace std - we accessed the namespace std by employing the using directive. Alternatively, we can access a namespace by using the scope resolution operator - for example, we can write std::endl to access the end-of-line manipulator in the standard library. In the example above, we are using the scope resolution operator to search the global namespace for the variable count1. Because no namespace is specified in front of the operator, the compiler knows that it must search the global namespace for the name that follows it.
We'll be seeing a lot more of this operator when we get to talking about object-oriented programming, in which context it is used extensively. We'll also talk further about namespaces, including how to create your own, in Chapter 5.
Static Variables
It's conceivable that you might want to have a variable that's defined and accessible locally, but which also continues to exist after exiting the block in which it is declared. In other words, you need to declare a variable within a block scope, but to give it static storage duration. The static specifier provides you with the means of doing this, and the need for this will become more apparent when we come to deal with functions in Chapter 4.
In fact, a static variable will continue to exist for the life of a program even though it is declared within a block and only available from within that block (or its sub-blocks). It still has block scope, but it has static storage duration. To declare a static integer variable called count, you would write:
   static int count;
If you don't provide an initial value for a static variable when you declare it, then it will be initialized for you. The variable count declared here will be initialized with 0. The default initial value for a static variable is always 0, converted to the type applicable to the variable. Remember that this is not the case with automatic variables. If you don't initialize your automatic variables, they will contain junk values left over from the program that last used the memory they occupy.
