Low-Level Routines Are Error-Prone

C++ Primer 4/e has a warning in the section "The IO Library Revisited": ‘In general, we advocate using the higher-level abstractions provided by the library. The IO operations that return int are a good example of why.

It is a common programming error to assign the return from get or one of the other int returning functions to a char rather than an int. Doing so is an error but an error the compiler will not detect. Instead, what happens depends on the machine and on the input data. For example, on a machine in which chars are implemented as unsigned chars, this loop will run forever:

    char ch;    // Using a char here invites disaster!

    // return from cin.get is converted from int to char and then compared to an int
    while ((ch = cin.get()) != EOF)
        cout.put(ch);

The problem is that when get returns EOF, that value will be converted to an unsigned char value. That converted value is no longer equal to the integral value of EOF, and the loop will continue forever.

At least that error is likely to be caught in testing. On machines for which chars are implemented as signed chars, we can’t say with confidence what the behavior of the loop might be. What happens when an out-of-bounds value is assigned to a signed value is up to the compiler. On many machines, this loop will appear to work, unless a character in the input matches the EOF value. While such characters are unlikely in ordinary data, presumably low-level IO is necessary only when reading binary values that do not map directly to ordinary characters and numeric values. For example, on our machine, if the input contains a character whose value is '\377' then the loop terminates prematurely. '\377' is the value on our machine to which -1 converts when used as a signed char. If the input has this value, then it will be treated as the (premature) end-of-file indicator.

Such bugs do not happen when reading and writing typed values. If you can use the more type-safe, higher-level operations supported by the library, do so.’

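The Chinese edition puts it this way: ‘In general, we advocate using the high-level abstractions provided by the standard library. The IO operations that return int serve as an example of why.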

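Assigning the return value of get(), or of any other function that returns int, to a char variable is a common programming error. Doing so is an error, yet the compiler cannot detect it. What results depends on the machine itself and on the input data. If chars are implemented as unsigned chars on some machine, the loop below will run forever: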

    char ch;    // Using char here invites disaster!
    // the return value of cin.get is converted from int to char, then compared with an int
    while ((ch = cin.get()) != EOF)
        cout.put(ch);

The problem is that when get() returns EOF, its value is converted to unsigned char, so it no longer equals the integer value of EOF, and the loop runs forever.

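This error is likely to be encountered in testing. Also, if your machine implements char as signed char, there is no way to be sure what the loop above will do. What happens when an out-of-range value is assigned to a signed value is decided by the compiler. On many machines the loop seems to work correctly, unless the input stream contains a character equal to the EOF value. Since such characters are unlikely to appear in ordinary data, one can infer that low-level IO is necessary only in the following case: the binary values being read do not map directly to ordinary characters and numeric values. For example, on my machine, if the input data contains a character whose value is '\377', the loop above ends abnormally. On my machine '\377' is the value that -1 converts to when treated as a signed char. If the input data contains this value, it will be treated (abnormally) as end-of-file.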

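Such errors do not occur when reading and writing typed values. In any case, if you can use the safer, higher-level operations provided by the standard library, please use them.’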
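The quoted passage describes the bug without showing the repaired loop, so here is a minimal sketch of one. It keeps the result of get() in an int, which lets EOF stay distinct from every character value. The helper names copy_chars, copy_string and check_roundtrip are my own, and passing streams as parameters instead of using cin and cout directly is only an assumption made to keep the example easy to check.

```cpp
#include <cassert>
#include <cstdio>    // EOF
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies every character from in to out.
// The result of get() stays in an int, so EOF (a negative int) cannot
// be confused with any character value, '\377' included.
void copy_chars(std::istream& in, std::ostream& out)
{
    int ch;  // int, not char
    while ((ch = in.get()) != EOF)
        out.put(static_cast<char>(ch));
}

// Hypothetical helper: runs copy_chars over an in-memory string.
std::string copy_string(const std::string& input)
{
    std::istringstream in(input);
    std::ostringstream out;
    copy_chars(in, out);
    return out.str();
}

// Usage check: a '\377' byte passes through instead of ending the loop.
void check_roundtrip()
{
    const std::string data("a\377b", 3);
    assert(copy_string(data) == data);
}
```

The conversion to char happens only when writing, after the comparison with EOF has already been made on the full int.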

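People used to the C way of writing code probably do not find it easy to get used to the standard library, but one still has to get used to it.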
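For anyone coming from C, the higher-level operations the warning recommends might look roughly like the sketch below. Both loops test the state of the stream rather than comparing a character against EOF, so there is no sentinel value for the data to collide with. The function names read_lines and sum_ints are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole stream line by line.
// std::getline returns the stream, and the loop stops once the
// stream reports failure; no character is ever compared with EOF.
std::vector<std::string> read_lines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}

// Hypothetical helper: reads typed values (ints) with operator>>
// and adds them up. The loop ends when extraction fails.
long sum_ints(std::istream& in)
{
    long total = 0;
    int value;
    while (in >> value)
        total += value;
    return total;
}
```

A line containing the byte '\377' is read like any other line here, because the end of input is signalled through the stream state and not through a return value.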