Thursday, September 27, 2007

Data Types in C++

The sort of information that a variable can hold is determined by its data type. All data and variables in your program must be of some defined type. C++ provides you with a range of standard data types, specified by particular keywords. We have already seen the keyword int for defining integer variables. As part of the object-oriented aspects of the language, you can also create your own data types, as we shall see later. For the moment, let's take a look at the elementary numerical data types that C++ provides.
Integer Variables
As we have said, integer variables are variables that can only have values that are whole numbers. The number of players in a football team is an integer, at least at the beginning of the game. We already know that you can declare integer variables using the keyword int. In Visual C++, variables of type int occupy 4 bytes in memory and can take both positive and negative values.
The upper and lower limits for the values of a variable of type int correspond to the maximum and minimum signed binary numbers which can be represented by 32 bits (4 bytes). The upper limit for a variable of type int is 2³¹-1 (2,147,483,647), and the lower limit is -2³¹ (-2,147,483,648).
In Visual C++, the keyword short also defines an integer variable, this time occupying two bytes. The keyword short is equivalent to short int.
C++ also provides another integer type, long, which can also be written as long int. For example, we can write the statement:
long bigNumber = 1000000L, largeValue = 0L;
where we declare the variables bigNumber and largeValue with initial values 1000000 and 0 respectively. The letter L appended to the end of the values specifies that they are long integers. You can also use the small letter l for the same purpose, but it has the disadvantage that it is easily confused with the numeral 1.
We don't include commas when writing large numeric values in a program.
In Visual C++ 6.0, integer variables declared as long occupy 4 bytes, the same as variables declared as int, so the two types have the same range of values.
With other C++ compilers, long and long int may not be the same as int, so if you expect your programs to be compiled in other environments, don't assume that long and int are equivalent. For truly portable code, you should not even assume that an int is 4 bytes (for example, under older 16-bit versions of Visual C++ an int was 2 bytes).
The char Data Type
The char data type serves a dual purpose. It specifies a one-byte variable that you can use to store integers, or to store a single ASCII (American Standard Code for Information Interchange) character. We can declare a char variable with this statement:
char letter = 'A';
This declares the variable letter and initializes it with the constant 'A'. Note that we specify a value which is a single character between single quotes, rather than the double quotes which we used previously for defining a string of characters to be displayed. A string of characters is a series of values of type char, which are grouped together into a single entity called an array. We will discuss arrays and how strings are handled in C++ in Chapter 3.
Because the character 'A' is represented in ASCII by the decimal value 65, we could have written this:
char letter = 65; // Equivalent to A
to produce the same result as the previous statement. In Visual C++, where char is a signed type by default, the range of integers that can be stored in a variable of type char is from -128 to 127.
We can also use hexadecimal constants to initialize char variables (and other integer types). A hexadecimal number is written using the standard representation for hexadecimal digits: 0 to 9, and A to F (or a to f) for digits with values from 10 to 15. It's also preceded by 0x (or 0X) to distinguish it from a decimal value. Thus, to get exactly the same result again, we could rewrite the last statement as follows:
char letter = 0x41; // Equivalent to A
Don't write decimal integer values with a leading zero. The compiler will interpret such values as octal (base 8), so a value written as 065 will be equivalent to 53 in normal decimal notation.
Integer Type Modifiers
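Variables of the integral types int, short or long, which we have just discussed, contain signed values by default. That is, they can store both positive and negative values. This is because the default type modifier for these types is the modifier signed. So, wherever we wrote int, short, or long, we could have written signed int, signed short, or signed long respectively. The type char is a special case: whether a plain char is signed depends on the compiler (it is signed by default in Visual C++), and char, signed char and unsigned char are treated as three distinct types.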
If you are sure that you don't need to store negative values in a variable (for example, if you were recording the number of miles you drive in a week), then you can specify a variable as unsigned:
unsigned long mileage = 0UL;
Here, with the 4-byte long of Visual C++, the minimum value that can be stored in the variable mileage is zero, and the maximum value is 4,294,967,295 (that's 2³²-1). Compare this to the range of -2,147,483,648 to 2,147,483,647 for a signed long. The bit which is used in a signed variable to determine the sign is used in an unsigned variable as part of the numeric value instead. Consequently, an unsigned variable has a larger range of positive values, but it can't take a negative value. Note how a U (or u) is appended to unsigned constants. In the above example, we have also appended L to indicate that the constant is long. You can use either upper or lower case for U and L, and the sequence is unimportant, but it's a good idea to adopt a consistent way of specifying such values.
Of course, both signed and unsigned are keywords, so you can't use them as variable names.
Floating Point Variables
Values which aren't integral are stored as floating point numbers. A floating point number can be expressed as a decimal value such as 112.5, or with an exponent such as 1.125E2 where the decimal part is multiplied by the power of 10 specified after the E (for Exponent). Our example is, therefore, 1.125×10², which is 112.5.
A floating point constant must contain a decimal point, or an exponent, or both. If you write neither, you have an integer.
You can specify a floating point variable using the keyword double, as in this statement:
double in_to_mm = 25.4;
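A double variable occupies 8 bytes of memory and stores values accurate to about 15 decimal digits. The range of values stored is much wider than that indicated by the 15 digits accuracy, extending in magnitude from about 2.2×10⁻³⁰⁸ to 1.7×10³⁰⁸, and values can be either positive or negative. Note that floating point values are often approximations: zero itself can be stored exactly, but many decimal fractions such as 0.1 cannot be represented exactly in binary, so a calculation that should produce exactly zero may give a tiny nonzero result instead. This can occasionally cause some strange errors in calculations.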
If you don't need 15 digits precision, and you don't need the massive range of values provided by double variables, you can opt to use the keyword float to declare floating point variables occupying 4 bytes. For example, the statement:
float pi = 3.14159f;
defines a variable pi with the initial value 3.14159. The f at the end of the constant specifies it to be a float type. Without the f, the constant would have been of type double. Variables declared as float have about 7 decimal digits of precision and can have values in magnitude from about 1.2×10⁻³⁸ to 3.4×10³⁸, positive and negative.
You can find a complete summary of the various data types in the MSDN online documentation, provided with Visual C++ 6.0.
Logical Variables
Logical variables can only have two values: a value called true or a value called false. The type for a logical variable is bool, named after George Boole, who developed Boolean algebra. Variables of type bool are used to store the results of tests which can be either true or false, such as whether one value is equal to another. You can declare a variable of type bool with the statement:
bool testResult;
Of course, you can also initialize them when you declare them:
bool colorIsRed = true;
You will find that the values TRUE and FALSE are used quite extensively with variables of numeric type, and particularly of type int. This is a hangover from the time before variables of type bool were implemented in C++. The symbols TRUE and FALSE commonly represented the integers 1 and 0 respectively, which generally worked in the same way as the bool values true and false. Note that TRUE and FALSE are not keywords in C++, and they are not bool values; they are integer constants, typically defined as macros in header files such as those of the Windows API.
Variables with Specific Sets of Values
You will sometimes be faced with the need for variables that have a limited set of possible values which can be usefully referred to by labels: the days of the week, for example, or months of the year. There is a specific facility in C++ to handle this situation, called an enumeration. Let's take one of the examples we have just mentioned, a variable that can assume values corresponding to days of the week. We can define this as follows:
enum Week {Mon, Tues, Wed, Thurs, Fri, Sat, Sun} this_week;
This declares an enumeration type called Week and the variable this_week, which is an instance of the enumeration type Week that can only assume the values specified between the braces. If you try to assign anything other than one of the set of values specified to this_week, it will cause an error. The symbolic names listed between the braces are known as enumerators. In fact, each of the names of the days will be automatically defined as representing a fixed integer value. The first name in the list, Mon, will have the value 0, Tues will be 1, and so on. By default, each successive enumerator is one larger than the value of the previous one. If you would prefer the implicit numbering to start at a value other than zero, you can just write:
enum Week {Mon = 1, Tues, Wed, Thurs, Fri, Sat, Sun} this_week;
and they will be equivalent to 1 through 7. The enumerators don't even need to have unique values. You could define Mon and Tues as both having the value 1, for example, with the statement:
enum Week {Mon = 1, Tues = 1, Wed, Thurs, Fri, Sat, Sun} this_week;
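In Visual C++, an enumeration type is stored in the same way as an int, so the variable this_week will occupy four bytes, as will all variables which are of an enumeration type. (The C++ standard leaves the exact size of an enumeration type to the compiler.)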
Having defined the form of an enumeration, you can define another variable thus:
enum Week next_week;
This defines a variable next_week as an enumeration that can assume the values previously specified. You can also omit the keyword enum in declaring a variable so, instead of the previous statement, you could write:
Week next_week;
If you wish, you can assign specific values to all the enumerators. For example, we could define this enumeration:
enum Punctuation {Comma=',', Exclamation='!', Question='?'} things;
Here we have defined the possible values for the variable things as the numerical equivalents of the appropriate symbols. The ASCII codes for these symbols are 44, 33 and 63 respectively in decimal. As you can see, the values assigned don't have to be in ascending order. If you don't specify all the values explicitly, values continue to be assigned incrementing by 1 from the last specified value, as in our second Week example.
You can omit the enumeration type name if you don't need to define other variables of this type later. For example:
enum {Mon, Tues, Wed, Thurs, Fri, Sat, Sun} thisWeek, nextWeek, lastWeek;
Here we have three variables declared that can assume values from Mon to Sun. Since the enumeration type has no name, we cannot refer to it. Note that you cannot define other variables for this enumeration at all, since you would not be permitted to repeat the definition. Doing so would imply that you were redefining values for Mon to Sun, and this isn't allowed.
Defining Your Own Data Types
The typedef keyword enables you to define your own data type specifier. Using typedef, you could define the type name BigOnes as equivalent to the standard long int type with the declaration:
typedef long int BigOnes; // Defining BigOnes as a type name
This defines BigOnes as an alternative type specifier for long int, so you could declare a variable mynum as long int with the declaration:
BigOnes mynum = 0L; // Define a long int variable
There's no difference between this declaration and the one using the built-in type name. You could equally well use:
long int mynum = 0L; // Define a long int variable
for exactly the same result. In fact, if you define your own type name such as BigOnes, you can use both type specifiers within the same program for declaring different variables that will end up as having the same type.
Since typedef only defines a synonym for an existing type, it may appear to be a bit superficial. We will see later that it can fulfill a very useful role in enabling us to simplify more complex declarations than we have met so far. We will also see later (in chapter 6) that classes provide us with a means of defining completely new data types.
