Profiling a C code

How to profile a code? How to count the executed cycles? How to check the execution time of a code?

          Profiling a code is extremely important to understand program behaviour. We need code profiling tools to evaluate how well code will perform on a architecture, to identify critical issues and/or to find out instruction scheduling, performance of an algorithm.

         There are so many soft wares available online to profile a code: https://en.wikipedia.org/wiki/List_of_performance_analysis_tools, but which one to use?

         I used RDTSC and clock_gettime() to profile a C code. This post will explain you profiling a C code using both the instructions.


What is RDTSC (Read Time-Stamp Counter)?


         RDTSC is a time step counter which loads the current value of the processor's time-stamp counter into the EDX:EAX registers. The time-stamp counter is contained in a 64-bit MSR. The high-order 32 bits of the MSR are loaded into the EDX register, and the low-order 32 bits are loaded into the EAX register. The processor monotonically increments the time-stamp counter MSR every clock cycle and resets it to 0 whenever the processor is reset.


Intel processors allow the programmers to access a time-stamp counter. The time-stamp counter keeps an accurate count of every cycle that occurs on the processor. To access this counter, we use the RDTSC instruction. This instruction loads the high-order 32 bits of the register into EDX, and the low-order 32 bits into EAX.


Code execution time: clock_gettime()?

         RDTSC will give you number of executed cycles, now how will you get the executing time? ‘clock()’ function defined in ‘time.h’ provides time in low resolution which is not suitable for code profiling. I would suggest to use ‘clock_gettime()’ which provides time in nanosecond resolution.

        The prototype of the function is:
 int clock_gettime(clockid_t clk_id, struct timespec *tp)

       Here clk_id allows to select specific clock offered by the system. I used CLOCK_REALTIME which is a system-wide real time clock.
       The current clock time, for the chosen clock is stored in the struct provided by the *tp pointer.

        The timespec struct is defined as follows:


                                          struct timespec 
                                         {
                                                           time_t tv_sec;  /* seconds */
                                                                   long tv_nsec;   /* nanoseconds */
                                      };


C Code for profiling?





/*
 *     Sakshama Ghoslya,
 *   Hyderabad, India
 */


/* Function to calculate execution cycles and execution time */

#include<stdio.h>
#include<time.h>


/*
 RDTSC is a current time-stamp counter variable ,which is a 64-bit variable, into registers (edx:eax).
 TSC(time stamp counter) is incremented every cpu tick (1/CPU_HZ)
*/

unsigned long long int getCycles()
{
    unsigned int low;
unsigned int high;
     __asm__ __volatile__ ("rdtsc" : "=a" (low), "=d" (high));
    
return ((unsigned long long int)high << 32) | low;
}


int main()
{
unsigned long long int startClock, endClock, executedCycles;

struct timespec startTime, endTime;

long int index; 

/* To get the current time */
clock_gettime(CLOCK_REALTIME, &startTime);

/* to get the current cycles */
startClock = getCycles();           

/*********************************************************************
* *
* *
* Put you code here *
*  Execution time and cycles will be calculated for this code    *
* *
* *
**********************************************************************/


/* To get the current time */
clock_gettime(CLOCK_REALTIME, &endTime);

/* to get the current cycles */
endClock = getCycles();

/* Total executed cycles */
executedCycles = endClock - startClock;

/* Execution time in milli seconds */
/* 
* Time in milli seconds = (time in seconds: tv_sec)*1000 + (time in nano seconds: tv_nsec)/1000000 
*/
double executionTime = ((endTime.tv_sec - startTime.tv_sec)*1000 + (endTime.tv_nsec - startTime.tv_nsec)/(1.0*1000000)); 

/* Execution time in nano seconds */
executionTime = executionTime*1000000;

/* Print total executed cycles */
printf("\nExecuted Cycles = %llu cycles", executedCycles);

/* print total execution time */
printf("\nExecution Time  = %0.0f nano seconds\n", executionTime);

/* Print CPU clock frequency */
printf("\nClock Frequency = %f GHz", executedCycles/executionTime);

/* Print CPU clock duration */
printf("\nClock Duration = %f nano seconds\n", executionTime/executedCycles);

return 0;
}




         If you divide the ‘number of executed cycles’ by ‘execution time’ you will get ‘clock frequency’ of your CPU.

Note:


  • Make sure you are running sufficient number of instructions to get more accurate results.
    (If you have small piece of code then run it in a loop) 
  • Close all the programs running in your processor before profiling the code.



No comments:

Post a Comment