Profiling C code

How do you profile code? How do you count the cycles it executes? How do you measure its execution time?

          Profiling is essential to understanding how a program behaves. Profiling tools let us evaluate how well code performs on a given architecture, identify bottlenecks, and examine instruction scheduling or the performance of an algorithm.

         Many profiling tools are available online, but which one should you use?

         I used RDTSC and clock_gettime() to profile C code. This post explains how to profile C code with both: the RDTSC instruction and the clock_gettime() function.

What is RDTSC (Read Time-Stamp Counter)?

         RDTSC is an instruction that loads the current value of the processor's time-stamp counter into the EDX:EAX registers. The time-stamp counter is held in a 64-bit MSR (model-specific register). The high-order 32 bits of the MSR are loaded into the EDX register, and the low-order 32 bits are loaded into the EAX register. The processor monotonically increments the time-stamp counter MSR every clock cycle and resets it to 0 whenever the processor is reset (on many recent processors the counter instead advances at a constant rate that is independent of the core's current frequency).

Intel processors expose the time-stamp counter to programmers through the RDTSC instruction. Because the counter keeps a running count of the cycles that occur on the processor, reading it before and after a piece of code tells you how many cycles elapsed in between.
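As a small illustration of how the two 32-bit halves form one 64-bit value, here is a minimal sketch in plain C. The function name combine_tsc is hypothetical, and the halves are passed in as ordinary integers, so nothing here actually executes RDTSC.

```c
#include <stdint.h>

/* Hypothetical helper: combine the two 32-bit halves that RDTSC leaves in
 * EDX (high) and EAX (low) into a single 64-bit counter value. */
uint64_t combine_tsc(uint32_t high, uint32_t low)
{
    /* Widen to 64 bits before shifting: shifting a 32-bit value by 32
     * positions is undefined behaviour in C. */
    return ((uint64_t)high << 32) | low;
}
```

With high = 1 and low = 0 the result is 4294967296, which is 2^32.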

Code execution time: clock_gettime()?

         RDTSC gives you the number of executed cycles, but how do you get the execution time? The clock() function declared in time.h has too coarse a resolution for code profiling. I suggest using clock_gettime(), which reports time in nanoseconds (the resolution you actually get depends on the system).
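To check what resolution a clock offers on a particular system, POSIX provides clock_getres(). The sketch below is one way to query it. The helper name clock_resolution_ns is hypothetical, and the value it reports will vary between systems.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* ask for the POSIX clock functions */
#endif
#include <time.h>

/* Hypothetical helper: return the resolution of the given clock in
 * nanoseconds, or -1 if the clock cannot be queried. */
long long clock_resolution_ns(clockid_t clk_id)
{
    struct timespec res;

    if (clock_getres(clk_id, &res) != 0)
        return -1;

    return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}
```

Calling clock_resolution_ns(CLOCK_REALTIME) returns the step size of that clock in nanoseconds. For comparison, clock() counts in units of 1/CLOCKS_PER_SEC seconds.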

        The prototype of the function is:
 int clock_gettime(clockid_t clk_id, struct timespec *tp);

       Here clk_id selects which of the clocks offered by the system to read. I used CLOCK_REALTIME, which is the system-wide real-time clock.
       The current time of the chosen clock is stored in the struct that tp points to.

        The timespec struct is defined as follows:

        struct timespec {
            time_t tv_sec;   /* seconds */
            long   tv_nsec;  /* nanoseconds */
        };
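Subtracting two timespec values takes a little care, because tv_nsec of the later reading can be smaller than tv_nsec of the earlier one. A minimal sketch of one way to do it, converting both fields to a single nanosecond count (the name timespec_diff_ns is hypothetical):

```c
#include <time.h>

/* Hypothetical helper: elapsed time from start to end in nanoseconds.
 * A negative nanosecond difference is absorbed by the seconds term,
 * so no explicit borrow step is needed. */
long long timespec_diff_ns(struct timespec start, struct timespec end)
{
    long long sec  = (long long)end.tv_sec  - (long long)start.tv_sec;
    long long nsec = (long long)end.tv_nsec - (long long)start.tv_nsec;

    return sec * 1000000000LL + nsec;
}
```

For a start of 1 s + 900000000 ns and an end of 2 s + 100000000 ns, the function returns 200000000, that is 0.2 seconds.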

C code for profiling

/*
 * Sakshama Ghoslya,
 * Hyderabad, India
 *
 * wnNrExeTimeX86Intel.c
 */

/***************** HOW TO USE THIS FILE TO PROFILE YOUR CODE *****************
 *   Make sure you run your code for multiple iterations (usually 1000+) to get
 *   better results. Iter = number of iterations. (Keep default = 1)
 *   Call wnNrExeTimeX86Intel(0, Iter) just before your code
 *   Call wnNrExeTimeX86Intel(1, Iter) just after your code
 *****************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* RDTSC loads the current value of the 64-bit time-stamp counter into the
 * registers EDX:EAX. The TSC (time-stamp counter) is incremented every CPU
 * tick (1/CPU_HZ). */
unsigned long long int getCycles(void)
{
    unsigned int low;
    unsigned int high;

    /* "=a" receives EAX (low 32 bits), "=d" receives EDX (high 32 bits) */
    __asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high));

    return ((unsigned long long int)high << 32) | low;
}

static unsigned long long int startClock, endClock;
unsigned long long int executedCycles;
static struct timespec startTime, endTime;

void wnNrExeTimeX86Intel(char flag, int Iter)
{
    if (flag == 0)
    {
        /* Get the current time */
        clock_gettime(CLOCK_REALTIME, &startTime);

        /* Get the current cycle count */
        startClock = getCycles();
    }
    else if (flag == 1)
    {
        /* Get the current time */
        clock_gettime(CLOCK_REALTIME, &endTime);

        /* Get the current cycle count */
        endClock = getCycles();

        /* The number of iterations must be positive */
        if (Iter <= 0)
        {
            Iter = 1;
            printf("\nWarning: Number of iterations should be positive. Results may not be correct");
        }

        /* Total executed cycles */
        executedCycles = endClock - startClock;

        /* Execution time in milliseconds:
         * (time in seconds: tv_sec) * 1000 + (time in nanoseconds: tv_nsec) / 1000000 */
        double exeTime = ((endTime.tv_sec - startTime.tv_sec) * 1000 + (endTime.tv_nsec - startTime.tv_nsec) / (1.0 * 1000000));

        /* Execution time in nanoseconds */
        double exeTimeNano = exeTime * 1000000;

        /* Average over the number of iterations */
        executedCycles = executedCycles/Iter;
        exeTimeNano = exeTimeNano/Iter;

        double exeTimeMicro = exeTimeNano/1000;

        /* Print executed cycles (per iteration) */
        printf("\nProgram executed cycles = %llu cycles", executedCycles);

        /* Print execution time (per iteration) */
        printf("\nProgram execution time  = %0.0f nano secs | %0.3f micro secs\n", exeTimeNano, exeTimeMicro);

        /* Print CPU clock frequency: cycles per nanosecond is GHz */
        printf("\nSystem clock frequency = %f GHz", executedCycles / exeTimeNano);

        /* Print CPU clock period */
        printf("\nSystem clock duration = %f Nano secs\n", exeTimeNano / executedCycles);
    }
}

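         If you divide the number of executed cycles by the execution time, you get an estimate of the clock frequency of your CPU. With the time in nanoseconds the result is in GHz, because one cycle per nanosecond is 10^9 cycles per second.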
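The arithmetic can be sketched as a small function. The name cycles_to_ghz and the guard against a zero elapsed time are assumptions added for illustration; they are not part of the listing above.

```c
/* Hypothetical helper: estimate the counter frequency in GHz from a cycle
 * count and an elapsed time in nanoseconds. Cycles per nanosecond equals
 * GHz. Returns 0.0 if no time has elapsed, to avoid dividing by zero. */
double cycles_to_ghz(unsigned long long cycles, double elapsed_ns)
{
    if (elapsed_ns <= 0.0)
        return 0.0;

    return (double)cycles / elapsed_ns;
}
```

For example, 3000000 cycles measured over 1000000 nanoseconds (1 millisecond) gives 3.0 GHz.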


  • Run enough instructions to get accurate results.
    (If the piece of code is small, run it in a loop.)
  • Close the other programs running on your machine before profiling the code.
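The first tip can be sketched as follows: run the code many times and divide the total by the number of iterations. This is a minimal sketch under a few assumptions. The names small_work and average_ns_per_call are hypothetical, and it reads CLOCK_MONOTONIC rather than the CLOCK_REALTIME used above, since a monotonic clock is not affected if the system time is adjusted during the measurement.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* ask for the POSIX clock functions */
#endif
#include <time.h>

/* Hypothetical workload, used only for illustration. The volatile
 * qualifier keeps the compiler from optimising the work away. */
static volatile unsigned long sink;

void small_work(void)
{
    sink += 1;
}

/* Hypothetical helper: call fn() iters times and return the average time
 * per call in nanoseconds. The loop overhead is included in the result. */
double average_ns_per_call(void (*fn)(void), int iters)
{
    struct timespec start, end;

    if (iters <= 0)
        iters = 1;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iters; i++)
        fn();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double total_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                    + (double)(end.tv_nsec - start.tv_nsec);

    return total_ns / iters;
}
```

A call such as average_ns_per_call(small_work, 100000) spreads the fixed cost of the two clock reads over 100000 calls, so the per-call figure is much less noisy than timing a single call.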
