Thursday, September 27, 2007

Variable Types and Casting

Calculations in C++ can only be carried out between values of the same type. When you write an expression involving variables or constants of different types, for each operation to be performed the compiler has to convert the type of one of the operands to match that of the other. This conversion process is called casting. For example, if you want to add a double value to an integer, the integer value is first converted to double, after which the addition is carried out. Of course, the variable which contains the value to be cast is itself not changed. The compiler will store the converted value in a temporary memory location which will be discarded when the calculation is finished.
There are rules governing the selection of the operand to be converted in any operation. Any expression to be calculated can be broken down into a series of operations between two operands. For example, the expression 2*3-4+5 amounts to the series 2*3 resulting in 6, 6-4 resulting in 2, and finally 2+5 resulting in 7. Thus, the rules for casting operands where necessary only need to be defined in terms of decisions about pairs of operands. So, for any pair of operands of different types, the following rules are checked in the order that they are written. When one applies, that rule is used.
Rules for Casting Operands
1. If either operand is of type long double, the other is converted to long double.
2. If either operand is of type double, the other is converted to double.
3. If either operand is of type float, the other is converted to float.
4. Any operand of type char, signed char, unsigned char, short, or unsigned short is converted to type int.
5. An enumeration type is converted to the first of int, unsigned int, long, or unsigned long that accommodates the range of the enumerators.
6. If either operand is of type unsigned long, the other is converted to unsigned long.
7. If one operand is of type long and the other is of type unsigned int, the unsigned int is converted to long if long can represent every unsigned int value; otherwise (for example, where both types are 32 bits, as with Visual C++), both operands are converted to type unsigned long.
8. If either operand is of type long, the other is converted to type long.
9. If either operand is of type unsigned int, the other is converted to unsigned int.
We could try these rules on a hypothetical expression to see how they work. Let's suppose that we have a sequence of variable declarations as follows:
double value = 31.0;
int count = 16;
float many = 2.0f;
char num = 4;
Let's also suppose that we have the following rather arbitrary arithmetic statement:
value = (value - count)*(count - num)/many + num/many;
We can now work out what casts the compiler will apply. The first operation is to calculate (value - count). Rule 1 doesn't apply but Rule 2 does, so the value of count is converted to double and the double result 15.0 is calculated. Next (count - num) must be evaluated, and here the first rule in sequence which applies is Rule 4, so num is converted from char to int, and the result 12 is produced as a value of type int. The next calculation is the product of the first two results, a double 15.0 and an int 12. Rule 2 applies here and the 12 is converted to 12.0 as double, and the double result 180.0 is produced. This result now has to be divided by many, so Rule 2 applies again and the value of many is converted to double before generating the double result 90.0. The expression num/many is calculated next, and here Rule 3 applies to produce the float value 2.0f after converting the type of num from char to float. Lastly, the double value 90.0 is added to the float value 2.0f for which Rule 2 applies, so after converting the 2.0f to 2.0 as double, the final result of 92.0 is stored in value as a double.
In spite of the last paragraph reading a bit like The Auctioneer's Song, I hope you get the general idea.
Casts in Assignment Statements
As we saw in example Ex1_04.cpp earlier in this chapter, you can cause an implicit cast by writing an expression on the right-hand side of an assignment that is of a different type from the variable on the left-hand side. This can cause values to be changed and information to be lost. For instance, if you assign a float or double value to an int or a long variable, the fractional part of the float or double will be lost and just the integer part will be stored. (If the floating-point value is outside the range of the integer type, things are worse still: the result is undefined.)
For example, after executing the following code fragment,
int number = 0;
float decimal = 2.5f;
number = decimal;
the value of number will be 2. Note the f at the end of the constant 2.5. This indicates to the compiler that this constant is single precision floating point. Without the f, the default would have been double. Any constant containing a decimal point is floating point. If you don't want it to be double precision, you need to append the f. A capital F would do the job just as well.
Explicit Casts
With mixed expressions involving the basic types, your compiler automatically arranges casting where necessary, but you can also force a conversion from one type to another by using an explicit cast. To cast the value of an expression to a given type, you write the cast in the form:
static_cast<the_type_to_convert_to>(expression)
The keyword static_cast reflects the fact that the cast is checked statically - that is, when your program is compiled. Later, when we get to deal with classes, we will meet dynamic casts, where the conversion is checked dynamically - that is, when the program is executing. The effect of the cast is to convert the value that results from evaluating expression to the type that you specify between the angled brackets. The expression can be anything from a single variable to a complex expression involving lots of nested parentheses.
Here's a specific example of the use of static_cast<>():
double value1 = 10.5;
double value2 = 15.5;
int whole_number = static_cast<int>(value1) + static_cast<int>(value2);
The initializing value for the variable whole_number is the sum of the integral parts of value1 and value2, so they are each explicitly cast to type int. The variable whole_number will therefore have the initial value 25. The casts do not affect the values stored in value1 and value2, which will remain as 10.5 and 15.5 respectively. The values 10 and 15 produced by the casts are just stored temporarily for use in the calculation and then discarded. Although both casts cause a loss of information in the calculation, the compiler will always assume that you know what you are doing when you explicitly specify a cast.
Also, as we saw in Ex1_04.cpp, when an assignment involves different types you can always make it clear that you know the conversion is necessary by making the cast explicit:
strips_per_roll = static_cast<int>(rolllength/height); //Get number of strips in a roll
You can write an explicit cast for any standard type, but you should be conscious of the possibility of losing information. If you cast a float or double value to long, for example, you will lose the fractional part of the value converted, so if the value started out strictly between -1.0 and 1.0, the result will be 0. If you cast double to float, you will lose accuracy because a float variable has only about 7 significant digits of precision, whereas double variables maintain about 15. Even casting between integer types provides the potential for losing data, depending on the values involved. For example, the value of an integer of type long can exceed the maximum that you can store in a variable of type short, so casting from a long value to a short may lose information.
In general, you should avoid casting as far as possible. If you find that you need a lot of casts in your program, the overall design of your program may well be at fault. You need to look at the structure of the program and the ways in which you have chosen data types to see whether you can eliminate, or at least reduce, the number of casts in your program.
C++ provides three other types of explicit cast, although these are not so common as the static_cast<>(). These types are:
reinterpret_cast<>() - this handles casts between unrelated types, for example, an integer to a pointer
const_cast<>() - this removes the const qualifier
dynamic_cast<>() - this allows you to cast between pointers (and references) to polymorphic classes in an inheritance hierarchy. Don't worry about what these terms mean at the moment; we'll discuss them in more detail in Chapter 8.
Old-Style Casts
Prior to the introduction of the new style casts into C++, an explicit cast of the result of an expression to another type was written as:
(the_type_to_convert_to)expression
The result of expression is cast to the type between the parentheses. For example, the statement to calculate strips_per_roll in our previous example could be written:
strips_per_roll = (int)(rolllength/height); //Get number of strips in a roll
Essentially, there are four different kinds of casts, and the single old-style syntax can do the work of static_cast, const_cast, and reinterpret_cast (though it never performs the run-time check of dynamic_cast). Because of this, code using the old-style casts is more error prone - it is not always clear what you intended, and you may not get the result you expected. Although you will still see the old style of casting used extensively (it's still part of the language), I strongly recommend that you stick to using only the new casts in your code.
The Bitwise Operators
The bitwise operators treat their operands as a series of individual bits rather than a numerical value. They only work with integer variables or constants as operands, so only integer types such as char, short, int, and long (signed or unsigned) can be used. They are useful in programming hardware devices, where the status of a device is often represented as a series of individual flags (that is, each bit of a byte may signify the status of a different aspect of the device), or for any situation where you might want to pack a set of on-off flags into a single variable. You will see them in action when we look at input/output in detail, where single bits are used to control various options in the way data is handled.
There are six bitwise operators:
&   Bitwise AND
|   Bitwise OR
^   Bitwise Exclusive OR
~   Bitwise NOT
>>  Shift right
<<  Shift left
Let's take a look at how each of them works.
The Bitwise AND
The bitwise AND, &, is a binary operator that combines corresponding bits in its operands. If both corresponding bits are 1, the result is a 1 bit, and if either or both operand bits are 0, the result is a 0 bit.
The effect of a particular binary operator is often shown using what is called a truth table. This shows, for various possible combinations of operands, what the result is. The truth table for & is as follows:
Bitwise AND   0   1
0             0   0
1             0   1
For each row and column combination, the result of & combining the two is the entry at the intersection of the row and column. Let's see how this works in an example:
char Letter1 = 'A', Letter2 = 'Z', Result = 0;
Result = Letter1 & Letter2;
We need to look at the bit patterns to see what happens. The letters 'A' and 'Z' correspond to hexadecimal values 0x41 and 0x5A respectively (see Appendix B for ASCII codes). The bitwise AND operates on these two values like this:
Letter1   0100 0001
Letter2   0101 1010
ANDed together produce:
Result    0100 0000
You can confirm this by looking at how corresponding bits combine with & in the truth table. After the assignment, Result will have the value 0x40, which corresponds to the character '@'.
Because the & produces zero if either bit is zero, we can use this operator to make sure that unwanted bits are zero in a variable. We achieve this by creating what is called a 'mask' and combining with the original variable using &. We create the mask by putting 1 where we want to keep a bit, and 0 where we want to set a bit to zero. The result will be 0s where the mask bit is 0, and the same value as the original bit in the variable where the mask is 1. Suppose we have a char variable, Letter, where, for the purposes of illustration, we want to eliminate the high order 4 bits, but keep the low order 4 bits. (Remember that a char variable occupies one byte, which is eight bits.) This is easily done by setting up a mask as 0x0F and combining it with the letter using & like this:
Letter = Letter & 0x0F;
or, more concisely:
Letter &= 0x0F;
If Letter started out as 0x41, it would end up as 0x01 as a result of either of these statements. In binary, the operation is:
Letter    0100 0001
mask      0000 1111
ANDed together produce:
Letter    0000 0001
The 0 bits in the mask cause corresponding bits in Letter to be set to 0, and the 1 bits in the mask cause corresponding bits to be kept.
Similarly, you can use a mask of 0xF0 to keep the 4 high order bits, and zero the 4 low order bits. Therefore, this statement,
Letter &= 0xF0;
will result in the value of Letter being changed from 0x41 to 0x40.
The Bitwise OR
The bitwise OR, |, sometimes called the inclusive OR, combines corresponding bits such that the result is a 1 if either operand bit is a 1, and 0 if both operand bits are 0. The truth table for the bitwise OR is:
Bitwise OR    0   1
0             0   1
1             1   1
We can exercise this with an example of how we could set individual flags packed into an integer variable. Let's suppose that we have a variable called style, of type short, which contains 16 individual 1-bit flags, and that we are interested in setting individual flags in it. One way of doing this is by defining values that we can combine with the OR operator to set particular bits on. To use in setting the rightmost bit, we can define:
short VREDRAW=0x01;
For use in setting the second-to-rightmost bit, we could define the variable HREDRAW as:
short HREDRAW=0x02;
So we could set the rightmost two bits in the variable style to 1 with the statement:
style = HREDRAW | VREDRAW;
The effect of this statement is:
HREDRAW   0000 0000 0000 0010
VREDRAW   0000 0000 0000 0001
ORed together produce:
style     0000 0000 0000 0011
Because the OR operation results in 1 if either of two bits is a 1, ORing the two variables together produces a result with both bits set on.
A very common requirement is to be able to set flags in a variable without altering any of the others which may have been set elsewhere. We can do this quite easily with a statement such as:
style |= HREDRAW | VREDRAW;
This statement will set the two rightmost bits of the variable style to 1, leaving the others at whatever they were before the execution of this statement.
The Bitwise Exclusive OR
The exclusive OR, ^, is so called because it operates similarly to the inclusive OR but produces 0 when both operand bits are 1. Therefore, its truth table is as follows:
Bitwise EOR   0   1
0             0   1
1             1   0
Using the same variable values that we used with the AND, we can look at the result of the following statement:
Result = Letter1 ^ Letter2;
This operation can be represented as:
Letter1   0100 0001
Letter2   0101 1010
EORed together produce:
Result    0001 1011
The variable Result is set to 0x1B, or 27 in decimal notation.
The ^ operator has a rather surprising property. Suppose that we have two char variables, first with the value 'A', and last with the value 'Z', corresponding to binary values 0100 0001 and 0101 1010. If we write the statements,
first ^= last; // Result first is 0001 1011
last ^= first; // Result last is 0100 0001
first ^= last; // Result first is 0101 1010
the result of these is that first and last have exchanged values without using any intermediate memory location. This works with any integer values, provided first and last are two distinct variables: if both names refer to the same object, the first statement sets it to zero and the value is lost.
The Bitwise NOT
The bitwise NOT, ~, takes a single operand for which it inverts the bits: 1 becomes 0, and 0 becomes 1. Thus, if we execute the statement,
Result = ~Letter1;
and Letter1 is 0100 0001, then the variable Result will have the bit pattern 1011 1110, which is 0xBE. Read as an unsigned value this is 190 in decimal; but if char is signed, as it is by default with Visual C++, Result will actually hold -66.
The Bitwise Shift Operators
These operators shift the value of an integer variable a specified number of bits to the left or right. The operator >> is for shifts to the right, while << is the operator for shifts to the left. Bits that 'fall off' either end of the variable are lost. Let's look at the effect of shifting a 2-byte variable left and right.
We declare and initialize a variable called number with the statement:
unsigned short number = 16387U;
As we saw in the last chapter, we should write unsigned literals with a letter U or u appended to the number. We can shift the contents of this variable with the statement:
number <<= 2; // Shift left two bit positions
The left operand of the shift operator is the value to be shifted, and the number of bit positions that the value is to be shifted is specified by the right operand. In binary, the operation looks like this:
number          0100 0000 0000 0011   (16,387)
after <<= 2     0000 0000 0000 1100   (12)
As you can see, shifting the value 16,387 two positions to the left produces the value 12. The rather drastic change in the value is the result of losing the high order 1 bit off the left end.
We can also shift the value to the right. Let's reset the value of number to its initial value of 16,387. Then we can write:
number >>= 2; // Shift right two bit positions
This shifts the value 16,387 two positions to the right, storing the value 4,096 in the variable number:
number          0100 0000 0000 0011   (16,387)
after >>= 2     0001 0000 0000 0000   (4,096)
Shifting right two bits is effectively dividing the value by 4 (without remainder); the two low order 1 bits fall off the right end.
As long as bits are not lost, shifting n bits to the left is equivalent to multiplying the value by 2, n times. In other words, it is equivalent to multiplying by 2ⁿ. Similarly, shifting an unsigned or non-negative value right n bits is equivalent to dividing by 2ⁿ and discarding the remainder. But beware: as we saw with the left shift of the variable number, if significant bits are lost, the result is nothing like what you would expect. However, this is no different from the multiply operation. If you multiplied the two-byte number by four and stored the result back in number, you would get the same result, so shifting left and multiplying are still equivalent. The problem arises because the true result of the multiplication is outside the range of a two-byte integer.
You might imagine that confusion could arise with the operators that we have been using for input and output. The compiler always works out the meaning from the context, but that need not be the meaning you intended, so you need to be careful. For example, if you want to output the result of shifting a variable number left by two bits, you could write:
cout << (number << 2);
Here, the parentheses are essential. Without them, the compiler interprets the second << as another stream operator, so the output is the value of number followed by the digit 2, which is not the result that you intended.
In the main, the right shift operation is similar to the left shift. For example, if the variable number has the value 24, and we execute the statement,
number >>= 2;
it will result in number having the value 6, effectively dividing by 4. However, the right shift treats negative values of signed integer types (those whose sign bit, the leftmost bit, is 1) in a special way: the sign bit is propagated to the right. Visual C++ and other mainstream compilers behave this way, although the C++ standard left it implementation-defined until C++20, which made it a requirement. For example, let's declare and initialize a variable number, of type signed char (plain char is unsigned on some platforms), with the value -104 in decimal:
signed char number = -104; // Binary representation is 1001 1000
Now we can shift it right 2 bits with the operation:
number >>= 2; // Result 1110 0110
The decimal value of the result is -26, as the sign bit is repeated. With unsigned integer types there is no sign bit, so zeros are always shifted in from the left.
These shift operations can be faster than the regular multiply or divide operations on some computers - on an Intel 80486, for example, a multiply is slower than a shift left by at least a factor of 3. However, you should only use them in this way if you are sure you are not going to lose bits that you can ill afford to be without.
