In the last tutorial, we built our hello_world.c using the command line, revealing some of the steps and mechanisms behind make. Before growing your code with fancy functions and header files, it is also important to understand the process of compilation. Transforming your source code (hello_world.c) into an executable (hello_world_exc) actually takes four steps: Preprocessing, Compiling, Assembling and Linking. All four of them happen invisibly behind the scenes every time you compile. If you want to see them, try this:
gcc -v -Wall -g -o hello_world_exc hello_world.c
Notice the new flag -v, which stands for "verbose", indicating that you want to see everything.
If you tried the previous command, your screen should now be bombarded with information. That is because even the simple hello_world.c is not that trivial, especially for your machine. But don't fret, just read on!
1) Preprocessing (from .c to .i)
In summary, there are three tasks in this step: Comment Stripping, Text Substitution and File Inclusion. Firstly, remember the non-programmatic text, or comments, that we put after //. Comments are only useful to human coders, so the preprocessor strips them out, hence Comment Stripping. Secondly, lines starting with the special character # are Preprocessor Directives. #include means File Inclusion, which in our case pulls in the header file of the Standard Input Output library (stdio.h), which in turn includes other header files too. Thirdly, #define means Text Substitution: every later use of the defined name is replaced by its definition.
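To watch all three tasks at work on something a bit richer than hello_world.c, here is a small sketch. The file name macro_demo.c, the macros PI and SQUARE, and the two functions are hypothetical, made up for illustration. Run gcc -E macro_demo.c -o macro_demo.i and look at the end of macro_demo.i: the comments should be gone, stdio.h should be expanded in place, and each PI and SQUARE(...) should be replaced by its definition.

```c
/* macro_demo.c: a hypothetical file for experimenting with gcc -E */
#include <stdio.h>              /* File Inclusion: stdio.h is pasted in here */

#define PI 3.14159              /* Text Substitution: PI becomes 3.14159 */
#define SQUARE(x) ((x) * (x))   /* a macro that takes an argument */

/* Comment Stripping: this comment will not appear in macro_demo.i */
double circle_area(double r)
{
    /* after preprocessing this reads roughly: return 3.14159 * ((r) * (r)); */
    return PI * SQUARE(r);
}

void print_circle_area(double r)
{
    printf("radius %.1f -> area %.2f\n", r, circle_area(r));
}
```

The file has no main, so it is meant for gcc -E (or gcc -c) rather than for building an executable. The extra parentheses in SQUARE are a common precaution: SQUARE(r + 1) expands to ((r + 1) * (r + 1)) instead of r + 1 * r + 1.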
To see the preprocessed file, try:
gcc -E hello_world.c -o hello_world.i
If you peek inside this file, you will see something similar to this:
# 1 "hello_world.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "hello_world.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 367 "/usr/include/features.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 1 3 4
# 410 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/wordsize.h" 1 3 4
# 411 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 2 3 4
# 368 "/usr/include/features.h" 2 3 4
# 391 "/usr/include/features.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/gnu/stubs.h" 1 3 4
# 10 "/usr/include/x86_64-linux-gnu/gnu/stubs.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/gnu/stubs-64.h" 1 3 4
# 11 "/usr/include/x86_64-linux-gnu/gnu/stubs.h" 2 3 4
# 392 "/usr/include/features.h" 2 3 4
# 28 "/usr/include/stdio.h" 2 3 4
.........
extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));
# 942 "/usr/include/stdio.h" 3 4
# 2 "hello_world.c" 2
# 2 "hello_world.c"
int main(){
printf("Hello world! \n");
return 0;
}
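Here, the lines starting with # are linemarkers of the form # linenum "filename" flags. Each one tells the compiler which file, and which line of that file, the following text came from, so the whole file is basically a series of files including each other. For example, line 1 of hello_world.c includes /usr/include/stdio.h (markers for its lines 1 and 27), whose line 27 in turn includes /usr/include/features.h (markers for its lines 1 and 367), and so on. The trailing flag 1 marks the start of a newly included file, 2 marks the return to a file after an inclusion, and 3 marks text that comes from a system header.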
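These linemarkers are what let the compiler report errors against the original file and line, even though it only ever reads the single .i file. C exposes the same bookkeeping through the standard #line directive and the predefined macros __FILE__ and __LINE__. Here is a minimal sketch. The file name pretend.c and the function names are made up for illustration.

```c
#include <stdio.h>

/* From here on, the preprocessor pretends to be reading line 100 of a
   (made-up) file called "pretend.c"; gcc -E turns this directive into a
   linemarker much like the ones in the listing above. */
#line 100 "pretend.c"
int marked_line(void) { return __LINE__; }          /* expands to 100 */
const char *marked_file(void) { return __FILE__; }  /* expands to "pretend.c" */

void print_origin(void)
{
    /* any warning or error in this function would also be blamed on pretend.c */
    printf("%s:%d\n", __FILE__, __LINE__);
}
```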
2) Compiling (from .i to .s)
The next step is to compile the preprocessed file into assembly code, which is specific to the target processor and system.
To see this file, try:
gcc -S hello_world.i -o hello_world.s
If you take a peek into hello_world.s, you will see assembly-language instructions like these:
.file "hello_world.c"
.section .rodata
.LC0:
.string "Hello world! "
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $.LC0, %edi
call puts
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
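Notice that the listing calls puts rather than printf, and the string has lost its \n. GCC typically makes this substitution when printf receives a constant string with no % conversions that ends in a newline and its return value is unused, because puts prints the same text and appends the newline itself. You can check this on your own machine with the sketch below. The file and function names are hypothetical, and the exact assembly varies with GCC version and flags. Compile it with gcc -S puts_demo.c and search puts_demo.s for the call instructions, which may appear as puts@PLT or printf@PLT.

```c
/* puts_demo.c: hypothetical file for comparing calls in gcc -S output */
#include <stdio.h>

/* Constant string, no '%', ends in '\n', result ignored:
   GCC usually emits "call puts" here, just like in hello_world.s. */
void say_hello(void)
{
    printf("Hello world! \n");
}

/* Contains a %d conversion and its result is returned,
   so the call stays "call printf". */
int print_number(int n)
{
    return printf("%d\n", n);  /* printf returns the number of characters written */
}
```

If say_hello instead returned the result of printf, the substitution should no longer happen, because printf and puts return different values.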
3) Assembling (from .s to .o)
We are pretty close now. The next step is to convert the assembly language into machine code with the assembler (as). If your code uses functions defined elsewhere, such as puts from the C library, the assembler leaves their addresses blank, to be filled in later by the linker in the next step. To obtain the machine code, invoke this command:
as hello_world.s -o hello_world.o
or go directly from the source hello_world.c using gcc:
gcc -c hello_world.c -o hello_world.o
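You can see the blanks the assembler leaves by asking nm, the binutils symbol lister, for the symbols in an object file. Here is a minimal sketch with a hypothetical greet.c. After gcc -c greet.c -o greet.o, running nm greet.o on Linux should list greet with type T (code defined in this file) and printf with type U (undefined here, left for the linker). The same command on hello_world.o should show main as T and puts as U.

```c
/* greet.c: hypothetical file; build it with gcc -c greet.c -o greet.o */
#include <stdio.h>

/* greet is defined here, so greet.o contains its machine code.
   printf lives in the C library, so greet.o only records a
   placeholder for it, to be filled in by the linker. */
int greet(const char *name)
{
    return printf("Hello, %s!\n", name);  /* returns the number of characters printed */
}
```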
4) Linking (from .o to executable)
The idea of linking several pieces of machine code together is quite simple. However, the actual command can be quite involved if you have to do it manually:
ld -dynamic-linker /lib/ld-linux.so.2 /usr/lib/crt1.o ... /lib/a/b/d.o ... hello_world.o
Fortunately, gcc can do this automatically for us:
gcc hello_world.o
This produces the default a.out, which runs exactly like our previously compiled hello_world_exc. If you would rather keep that name for your executable, you can link with this command:
gcc hello_world.o -o hello_world_exc
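To see the linker join separately compiled pieces, you can split a program across two files. Below is a hypothetical counter.c that defines a function. Another hypothetical file, say ids_main.c, would only declare int next_id(void); and call it. Compile each one with gcc -c, then run gcc ids_main.o counter.o -o ids. The linker connects the call in ids_main.o to the definition in counter.o. If you leave counter.o out of that last command, it should fail with an error like undefined reference to `next_id'.

```c
/* counter.c: hypothetical file, compiled on its own with gcc -c counter.c */

/* static keeps counter private to counter.o; only next_id is
   visible to the linker. */
static int counter = 0;

/* Other object files only need the declaration "int next_id(void);";
   the linker resolves their calls to this definition. */
int next_id(void)
{
    return ++counter;
}
```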
Summary
In summary, the process of compilation comprises four steps: Preprocessing, Compiling, Assembling and Linking. I would like to attach here a beautiful graph by Prof. Chua Hock-Chuan, Nanyang Technological University, Singapore.
Note that the output in the graph is an .exe file, because the graph was originally made to illustrate the process on Windows. On Linux, the executable does not have this extension.
Reference
http://codingfreak.blogspot.com/2008/02/compilation-process-in-gcc.html
https://www3.ntu.edu.sg/home/ehchua/programming/cpp/gcc_make.html