引子:C 语言中`int main()`和 `void main()` 有何区别?
前言
文章基于官方文档汇总而成,引用地址会放在文章末尾,若有修改意见欢迎评论指出~
渣翻译请见谅,如果翻译有定义性错误,还请评论指出,谢谢 !
刚开始上课的时候,难免会接触到int main()
和void main()
,当初我只是初步认识到了void main()
是一种 不标准 的写法,但是今天进一步认识到void main()
算是一种Undefined behavior(未定义行为)。
main函数原型&&终止行为
在C和C++中流传着很多版本的main
函数原型,不同的书里也有不同的写法。今天我从几种标准(C89/99/11以及C++98/03/11/14)的角度来寻找一下什么是“标准行为”以及在主函数中return后发生了什么。
比较常见的是下面几种:
void main()
main()
int main()
int main(void)
int main(int argc,char *argv[])
void main()
首先,从标准角度(所有版本)来说,void main()
肯定是错的,没有任何标准(C89/99/11以及C++98/03/11/14)中允许过这种写法。
但是我在APUE里看到了一种把主函数写为void main()
的原因,不知道是不是有人从这个角度说的然后就以讹传讹了。
The problem is that these compilers don’t know that an exit from main is the same as a return. One way around these warnings, which become annoying after a while, is to use return instead of exit from main. But doing this prevents us from using the UNIX System’s grep utility to locate all calls to exit from a program. Another solution is to declare main as returning void, instead of int, and continue calling exit. This gets rid of the compiler warning but doesn’t look right (especially in a programming text), and can generate other compiler warnings, since the return type of main is supposed to be a signed integer.
还有一种可能是从嵌入式来的,没有操作系统,入口点是硬件实现,返回任何东西都没意义。
main()
在K&R C与C89里,函数没有显式声明返回类型,则默认是int
:
C89对函数定义的语法(Syntax)描述如下(注意declaration-specifiers的opt下标符号):
$${declaration\textrm{-}specifiers*{opt}}\hspace{2mm}{declarator\hspace{2mm}declaration\textrm{-}list*{opt}}\hspace{2mm}{compound\textrm{-}statement}$$
C89中declaration-specifiers在Syntax上为:
- storage-class-specifier
- type-specifier
- type-qualifier
表明在C89中函数的return type
可以省略。
K&R C里的描述如下:
Various parts may be absent; a minimal function is
dummy() {}
which does nothing and returns nothing. A do-nothing function like this is sometimes useful as a place holder during program development. If the return type is omitted,
int
is assumed.
所以说:
func(){}
// 等价于
int func(){}
但是这种方式在C99之后就被废除掉了(注意declaration-specifiers没有opt下标了):
declaration-specifiers declarator declaration-list(opt) compound-statement综上,在C89中,函数的返回类型可以省略,但默认为int
,即
主函数声明
main()
隐式是int main()
。
int main()
int main()
和int main(void)
在C语言中是有区别的:
int main()
// 不等价于
int main(void)
在C语言中参数列表为空(即不提供参数列表也不为void),表示不提供参数数量和参数类型信息:
int func(){
print("func()\n");
return 0;
}
int main(void){
func(1,2,3,4);// call func();
}
The empty list in a function declarator that is not part of a definition of that function specifies that no information about the number or types of theparameters is supplied.
C99/11 Standard
在C99/11标准中,明确定义了对于标准的main函数的两个原型:
The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters:
程序启动时调用的函数名为main,这种实现声明了这个函数没有原型。 它应该用int的返回类型来定义,并且不带参数:
int main(void) { /* ... */ }
or with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared):
或者带有两个参数(这里称为argc和argv,尽管使用任何名称,因为它们是本地声明的函数):
int main(int argc, char *argv[]) { /* ... */ }
or equivalent;or in some other implementation-defined manner.
If they are declared, the parameters to the main function shall obey the following constraints:
或者等价;或者以某种其他编译器相关实现的方式。
如果被声明,主函数的参数应该遵守以下约束条件:
- The value of argc shall be nonnegative.
- argv[argc] shall be a null pointer.
- If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup. The intent is to supply to the program information determined prior to program startup from elsewhere in the hosted environment. If the host environment is not capable of supplying strings with letters in both uppercase and lowercase, the implementation shall ensure that the strings are received in lowercase.
- If the value of argc is greater than zero, the string pointed to by argv[0]
represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment. If the value of argc is greater than one, the strings pointed to by argv[1] through argv[argc-1]
represent the program parameters. - The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.
- argc的值应该是非负的。
- argv[argc] 应该是一个空指针。
- 如果argc的值大于零,则argv [0]至argv [argc-1]包含的数组成员应该包含指向字符串的指针,这些字符串在程序启动之前由系统环境给定已被定义的值。 其目的是向程序提供在程序启动之前从托管环境中的其他地方确定的信息。 如果系统环境不能提供大小写字母的字符串,则这类实现应确保字符串以小写形式接收。
- 如果argc的值大于零,则argv [0]指向的字符串代表程序名称; 如果程序名称在主机环境中不可用,那么argv [0][0]应为空字符。 如果argc的值大于1,argv [1]至argv [argc-1]指向的字符串则代表程序参数。
- 参数argc和argv以及argv数组所指向的字符串应该可以被程序修改,并在程序启动和程序终止之间保留它们的最后存储的值。
C++ Strandard
由于C和C++中对于函数参数列表的规则并不一致(C++中参数列表为空代表不接收任何参数)。所以C++中main的原型和ISO C也并不太一样:
- a function of () returning int and
- a function of (int, pointer to pointer to char) returning int
main 返回的值
main必须要有返回值的原因是:在C和C++中使用return-statement都是将return的值作为参数来调用exit
/std::exit
来终止程序。
If status is zero or EXIT_SUCCESS, an implementation-defined form of the status successful termination is returned.
ISO C99/11:
If the return type of the main function is a type compatible with int, a return from the initial call to the main function is equivalent to calling the exit function with the value returned by the main function as its argument;reaching the } that terminates the main function returns a value of 0. If the return type is not compatible with int, the termination status returned to the host environment is unspecified.
如果main函数的返回类型是与int兼容的类型,则从初始调用返回到主函数相当于以main函数返回的值为参数来调用exit函数;到达终止 }主函数返回值 0.如果返回类型与int不兼容,则返回到主机环境的终止状态是未指定的。
Forward references: definition of terms (ISO/IEC 9899:201x 7.1.1), the exit function (ISO/IEC 9899:201x 7.22.4.4).
注意:
ISO/IEC 9899:201x -6.9.1 #12
If the } that terminates a function is reached, and the value of the function call is used by
the caller, the behavior is undefined.如果 }到达终止函数,并且函数调用的值被使用的行为是未定义的。
所以函数结尾不写return是undefined behavior
*在Xcode9.0环境下,不写return的函数无法通过编译*
ISO C++11/14:
A return statement in main has the effect of leaving the main function (destroying any objects with automatic storage duration) and calling std::exit with the return value as the argument. If control reaches the end of main without encountering a return statement, the effect is that of executing
main中的return语句具有离开main函数的功能(销毁具有自动存储持续时间的任何对象)并以返回值作为参数调用 std :: exit。 如果控制到达主函数的末尾而没有遇到return语句,那么就是执行:
return 0;
待补充
exit
#include <stdlib.h>
void exit(int status);
The exit function causes normal program termination to occur. If more than one call to the exit function is executed by a program, the behavior is undefined.
.exit 函数使正常的程序终止执行。如果一个程序执行多个对 exit 函数的调用,其行为将是不确定的。
- First, all functions registered by the atexit function are called, in the reverse order of their registration,except that a function is called after any previously registered functions that had already been called at the time it was registered. If, during the call to any such function, a call to the longjmp function is made that would terminate the call to the registered function, the behavior is undefined.
- Next, all open streams with unwritten buffered data are flushed, all open streams are closed, and all files created by the tmpfile function are removed.
- Finally, control is returned to the host environment. If the value of status is zero or EXIT_SUCCESS, an implementation-defined form of the status successful termination is returned. If the value of status is EXIT_FAILURE, an implementation-defined form of the status unsuccessful termination is returned. Otherwise the status returned is implementation-defined.
- 首先,按照相反的顺序调用 atexit 函数所注册的所有函数,除非在注册时调用了之前已经调用的任何已经注册的函数。 如果在调用任何这样的函数期间调用 longjmp 函数来终止对已注册函数的调用,则其行为是不确定的。
- 接下来,所有开放的未写入的缓冲区数据流被擦除,并关闭所有开放的数据流,并移除由 tmpfile 函数创建的所有文件。
- 最后,控制权返回到主机环境。 如果状态值为zero或EXIT_SUCCESS,则返回状态 successful termination的自定义形式。 如果状态值为EXIT_FAILURE,则返回状态不成功终止的实现定义形式。 否则,返回的状态是实现其定义的。
The exit function cannot return to its caller.
_Exit
#include <stdlib.h>
void _Exit(int status);
The _Exit
function causes normal program termination to occur and control to be returned to the host environment. No functions registered by the atexit function or signal handlers registered by the signal function are called. The status returned to the host environment is determined in the same way as for the exit function (7.20.4.3).Whether open streams with unwritten buffered data are flushed, open streams are closed,or temporary files are removed is implementation-defined.
The _Exit
function cannot return to its caller.
_Exit
函数使正常的程序终止发生,然后控制返回到主机环境。 没有被 atexit 函数注册的函数或者被信号函数注册的信号处理程序将会被调用。 返回到主机环境的状态的确定方式与退出函数(7.20.4.3)的方式相同。无论开放且具有未写入缓冲区的数据流是否被擦除,开放的流是否是关闭的,或者临时文件是否被移除,都是实现其定义的。
_Exit
函数不能返回给它的调用者。
Undefined behavior
定义
在计算机程序设计中,未定义行为(英语:undefined behavior)是指行为不可预测的计算机代码。这是一些编程语言的一个特点,最有名的是在C语言中。在这些语言中,为了简化标准,并给予实现一定的灵活性,标准特别地规定某些操作的结果是未定义的,这意味着程序员不能预测会发生什么事。
例如,在C语言中,在任何自动对象被初始化之前,通过非字符类型的左值表达式读取这个变量存储的值会产生未定义行为,除以零或访问数组定义的界限之外的元素(参见缓冲区溢出)也会产生未定义行为。在一般情况下,之后的任何行为是不确定的;甚至只要程序的执行存在未定义行为,在引起未定义行为操作发生之前也可能不要求保证程序的行为可预测(如ISO C++)。特别地,标准从来没有要求编译器判断未定义行为,因此,如果程序调用未定义行为,可能会成功编译,甚至一开始运行时没有错误,只会在另一个系统上,甚至是在另一个日期运行失败。当一个未定义行为的实例发生时,正如语言标准所说,“什么事情都可能发生”,也许什么都没有发生。
和未指定行为(unspecified behavior)不同,未定义行为强调基于不可移植或错误的程序构造,或使用错误的数据。一个符合标准的实现可以在假定未定义行为永远不发生(除了显式使用不严格遵守标准的扩展)的基础上进行优化,可能导致原本存在未定义行为(例如有符号数溢出)的程序经过优化后显示出更加明显的错误(例如死循环)。因此,这种未定义行为一般应被视为bug。
Example in C and C++
Attempting to modify a string literal causes undefined behavior
尝试修改字符串字面量会产生未定义行为:
char * p = "CFC";// valid C, ill-formed in C++11, deprecated C++98/C++03
//C++11中错误,C++98/C++03不推荐使用
p[0] = 'W'; // undefined behavior 未定义行为
防止这一点的方法之一是将它定义为数组而不是指针:
char p[] = "CFC"; /* 正确 */
p[0] = 'W';
Integer division by zero results in undefined behavior
除以零会导致未定义行为。
return x/0; // undefined behavior
Certain pointer operations may result in undefined behavior:
某些指针操作可能导致未定义行为:
int arr[4] = {0, 1, 2, 3};
int *p = arr + 5; // undefined behavior for indexing out of bounds
p = 0;
int a = *p; // undefined behavior for dereferencing a null pointer
In C and C++, the comparison of pointers to objects is only strictly defined if the pointers point to members of the same object, or elements of the same array. Example:
在C和C ++中,如果指针指向相同对象的成员,或者指向同一个数组的元素,那么只能严格定义指向对象的指针。样例:
int main(void)
{
int a = 0;
int b = 0;
return &a < &b; /* undefined behavior */
}
Reaching the end of a value-returning function (other than main()) without a return statement results in undefined behavior if the value of the function call is used by the caller:
到达返回数值的函数(除main函数以外)的结尾,而没有一个return语句,会导致未定义行为:
int f(){
} /* undefined behavior if the value of the function call is used*/
Modifying an object between two sequence points more than once produces undefined behavior. It is worth mentioning that there are considerable changes in what causes undefined behavior in relation to sequence points as of C++11. The following example will however cause undefined behavior in both C++ and C.
在两个连续点之间多次修改对象会产生未定义的行为。值得一提的是,与C ++ 11相比,未定义的行为与顺序点的关系有所改变。下面的例子然而会导致C ++和C 中未定义的行为
i = i++ + 1; // undefined behavior
When modifying an object between two sequence points, reading the value of the object for any other purpose than determining the value to be stored is also undefined behavior.
在两个连续点之间修改对象时,除了确定要存储的值之外,读取对象的值也是未定义的行为。
a[i] = i++; // undefined behavior
printf("%d %d\n", ++n, power(2, n)); // also undefined behavior
Benefits
Documenting an operation as undefined behavior allows compilers to assume that this operation will never happen in a conforming program. This gives the compiler more information about the code and this information can lead to more optimization opportunities.
将记录为未定义的行为允许编译器假定这个行为永远不会在符合的程序中执行。这给了编译器更多关于代码的信息,这些信息可以有更多机会去优化代码。
C语言的一个例子:
An example for the C language:
int foo(unsigned char x){
int value = 2147483600; /* assuming 32 bit int */
value += x;
if (value < 2147483600)
bar();
return value;
}
The value of x
cannot be negative and, given that signed integer overflow is undefined behavior in C, the compiler can assume that at the line of the if check value >= 2147483600
. Thus the if
and the call to the function bar
can be ignored by the compiler since the if
has no side effects and its condition will never be satisfied. The code above is therefore semantically equivalent to:
x
不能为负值,并且考虑到有符号整数溢出在C中是未定义的行为,编译器可以假设在if检查value >= 2147483600
。因此if
对函数的调用bar
可以被编译器忽略,因为if
没有带来任何影响,它的条件永远不会被满足。上面的代码因此在语义上等同于:
int foo(unsigned char x){
int value = 2147483600;
value += x;
return value;
}
Had the compiler been forced to assume that signed integer overflow has wraparound behavior, then the transformation above would not have been legal.
Such optimizations become hard to spot by humans when the code is more complex and other optimizations, like inlining, take place.
Another benefit from allowing signed integer overflow to be undefined is that it makes it possible to store and manipulate a variable’s value in a processor register that is larger than the size of the variable in the source code. For example, if the type of a variable as specified in the source code is narrower than the native register width (such as “int on a 64-bit machine, a common scenario), then the compiler can safely use a signed 64-bit integer for the variable in the machine code it produces, without changing the defined behavior of the code. If the behavior of a 32-bit integer under overflow conditions was depended upon by the program, then a compiler would have to insert additional logic when compiling for a 64-bit machine, because the overflow behavior of most machine code instructions depends on the register width.
A further important benefit of undefined signed integer overflow is that it enables, though does not require, such erroneous overflows to be detected at compile-time or by static program analysis, or by run-time checks such as the Clang and GCC sanitizers and valgrind; if such overflow was defined with a valid semantics such as wrap-around then compile-time checks would not be possible.
如果编译器被迫假定有符号整数溢出具有环绕行为,那么上面的转换就不合法了。
当代码更复杂时,人们难以发现这样的优化,并作出其他优化,如内联。
允许有符号整数溢出未定义的另一个好处是可以在处理器寄存器中存储和操作变量的值,该寄存器的值大于源代码中变量的大小。例如,如果源代码中指定的变量类型比本地寄存器宽度窄(例如64位机器上的“ int ” ,这是最常见的情况),那么编译器可以安全地使用带符号的 64 位寄存器 机器代码中产生的整数变量,而不会改变代码的定义行为。如果在溢出条件下32位整数的行为被程序所依赖,那么在编译64位机器时,编译器将不得不考虑额外的逻辑,因为大多数机器代码指令的溢出行为取决于寄存器宽度.
未定义的有符号整数溢出的另一个重要好处是,它可以在编译时或通过静态程序分析或通过运行时检查(如Clang和GCC sanitizers和valgrind来检测这种溢出的错误; 如果这样的溢出是用一个有效的语义来定义的,比如环绕行为,那么编译时检查是不可能的。
Risks
C and C++ standards have several forms of undefined behavior throughout, which offers increased liberty in compiler implementations and compile-time checks at the expense of undefined run-time behavior if present. In particular, there is an appendix section dedicated to a non-exhaustive listing of common sources of undefined behavior in C. Moreoever, compilers are not required to diagnose code that relies on undefined behavior, due to current static analysis limitations. Hence, it is common for programmers, even experienced ones, to unintentionally rely on undefined behavior either by mistake, or simply because they are not well-versed in the rules of the language that can span over hundreds of pages. This can result in bugs that are exposed when optimizations are enabled on the compiler, or when a compiler of a different vendor or version is used. Testing or fuzzing with dynamic undefined behavior checks enabled, e.g. the Clang sanitizers, can help to catch undefined behavior not diagnosed by the compiler or static analyzers.
In scenarios where security is critical, undefined behavior can lead to security vulnerabilities in software. When GCC’s developers changed their compiler in 2008 such that it omitted certain overflow checks that relied on undefined behavior, CERT issued a warning against the newer versions of the compiler. Linux Weekly News pointed out that the same behavior was observed in PathScale C, Microsoft Visual C++ 2005 and several other compilers; the warning was later amended to warn about various compilers.
C和C ++标准在整个制定过程中都有几种不确定的行为,如果存在未定义的运行行为,则会增加编译器执行和编译时检查的自由度。特别是有一个附录部分,专门用于列举C中不确定行为的常见来源。不过,由于当前的静态分析限制,编译器不需要诊断依赖于未定义行为的代码。因此,甚至经验丰富的程序员通常会无意中依赖未定义的行为,或者是因为他们不熟悉可能长达数百页的语言规则。这可能会导致在编译器上启用优化时或在使用不同发行商或不同版本的编译器时公开的bug。启用动态未定义行为检查(例如Clang sanitizers)的测试或模糊测试可以帮助捕获未经编译器或静态分析器诊断的未定义行为。
在安全性至关重要的情况下,未定义的行为会导致软件中出现安全漏洞。GCC的开发人员在2008年修改了他们的编译器,忽略了某些依赖于未定义行为的溢出检查时,CERT会对新版本的编译器发出警告。 Linux Weekly News 指出,同样的行为存在于 PathScale C, Microsoft Visual C++ 2005和其他一些编译器; 警告规则在后来已经被修订,用来警告各种编译器.
References
- ISO/IEC (1999). ISO/IEC 9899:1999(E): Programming Languages – C §6.5 Expressions para. 2
- ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages – C++ §2.13.4 String literals [lex.string] para. 2
- ISO/IEC (2003). ISO/IEC 14882:2003(E): Programming Languages – C++ §5.6 Multiplicative operators [expr.mul] para. 4
- ISO/IEC (2007). ISO/IEC 9899:2007(E): Programming Languages – C §6.9 External definitions para. 1
- ISO/IEC (2011) .ISO/IEC 9899:201x(E): Programming Languages – C § 5.1.2.2.1Program startup para. 1,2
- ANSI X3.159-1989 Programming Language C, footnote 26
- https://imzlp.me/posts/15272/
- “Vulnerability Note VU#162289 — gcc silently discards some wraparound checks”. Vulnerability Notes Database. CERT. 4 April 2008. Archived from the original on 9 April 2014.
- “Order of evaluation – cppreference.com”. en.cppreference.com. Retrieved 2016-08-09.
- https://zh.wikipedia.org/wiki/%E6%9C%AA%E5%AE%9A%E4%B9%89%E8%A1%8C%E4%B8%BA
- https://en.wikipedia.org/wiki/Undefined_behavior#Examples_in_C_and_C++
- Undefined behavior can result in time travel. 27 Jun 2014 [2015-03-09].
- Lattner, Chris. What Every C Programmer Should Know About Undefined Behavior. LLVM Project Blog. LLVM.org. May 13, 2011 [May 24, 2011].
- The Jargon File on “nasal demons”,未定义行为的一个可能后果。