Hey guys, most of you might have wondered how a C/C++ program actually gets transformed into executable code. Well, that involves a number of steps and a few components, viz.
1. The source code – The textual version of the code, written by the programmer. It includes the C or C++ statements, along with the preprocessor statements (those starting with a '#'). The source code has the extension .c or .cpp (and some others like .C, ...).
2. The Preprocessor – The source code is fed to the preprocessor, which replaces the # statements with the text they stand for (the contents of included headers, expanded macros) to form the expanded source code. Generally it has the extension *.i
3. The Compiler – The *.i file is fed to the compiler, which checks the code for errors in syntax and types, and reports them if any. Remember that the compiler never looks for the bodies of functions that live in other code; a declaration is enough for it. It produces the assembly code (*.s or *.asm).
4. The Assembler – The assembler converts the assembly code to “relocatable object code”, having the extension *.o or *.obj
5. The Linker – The final and one of the most important blocks of the entire procedure. It checks for the dependencies and resolves them, and hence combines two or more object codes into the final executable code, the *.exe (*.out on some OSes) file.
This entire conversion from source code to executable code is called the build process.
Now let us take a very simple example. Here we write a small C program named “sample.c”.
We will be using the Linux operating system, as it has simple and useful tools to show the step-by-step process.
Here you may get stuck at two points:
- We have not used any preprocessor statements like #include, etc. This is just to keep the code and description simple. So remember that we can skip preprocessing here. Hence we could even save our file as sample.i, which tells gcc that it does not need to be preprocessed.
- There is no definition (body) of the function “func(int)”. This is to show the work of the compiler and the linker separately. It will become clear below.
Now compile the code using the following command:
gcc -c sample.c
Remember, the command “gcc -o sample.out sample.c” will do the entire build process. But gcc with the -c option only compiles and assembles the code, without linking it, to produce the sample.o file (this is the object code).
Voilà! Compilation was successful. Though we know that the function “func” has no body, our code is still syntactically correct, and the declaration of “func” is all the compiler needs. Hence it was compiled properly.
Now type the following command in your terminal:
nm sample.o
The nm command lists the symbols (the names of functions and variables) in the object code, which is supposed to be fed to the linker.
In the output, 'func' is listed with the letter 'U' in front of it. 'U' means undefined: the object code refers to 'func', but its body is not in this file. This shows that 'func' is an unresolved dependency.
Now try this:
gcc -o sample.out sample.c
This will do the entire build process.
As expected, you will get an “undefined reference” error for 'func', reported by ld.
ld is the GNU linker. When it tries to find the body of 'func' inside our object code and the standard C library, it fails, and hence generates this error.
Hence it is the linker which searches for and resolves the dependencies of the functions in our code.
Now let us come to a standard question: why do we use #include, and how does it work?
Let us take #include <stdio.h>
Have you guys ever opened the file stdio.h?
If you open it, you will find that it contains only the declarations (prototypes) of printf(), scanf() and so on. It does not contain the bodies of those functions.
The compiler actually needs only the prototype, so as to check whether we are supplying correct arguments to the function or not. The compiler never looks for the body of the printf() function.
Hence, for the prototype to be included in our code, we use the #include <stdio.h> statement.
The body of the function is in object code format, in the C library (here it is the GNU C library, glibc).
At link time, the linker searches for the body of the printf() function in the standard C library, which is already in object code form. This library is installed along with the compiler toolchain.
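Thus the linker, when it finds the object code for printf(), combines it with the object code of our C program to generate the final executable file. (With a shared C library, which is the usual default on Linux, the linker only records the reference, and the body is brought in when the program is loaded.)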
Now, what are the loader and the debugger?
The loader loads the executable code from secondary storage into RAM.
The debugger is generally a feature included with the IDE (standalone debuggers such as gdb exist too), and lets the user insert breakpoints in the code. The program is then run under the debugger and pauses at each breakpoint, so that we can inspect its state and step through it. This facilitates removing any user-made errors (bugs) from the code, and hence the name debugger.
Final interesting point:
Try the following command in your terminal:
objdump --disassemble sample.o
The “OBJect DUMP” command with the disassemble option translates the object code back into assembly language and shows it to you. [Remember the *.asm or *.s file?]
Isn't it interesting?