Thread Optimization Checks : Code Prominence [Archive]

OpenRCE_adityaks

February 7th, 2008, 12:50

Terminology:

Profiling: Traversing a program to check its run-time behavior. Profiling is divided into two parts, based on the size of the code segments analyzed and the interdependency between those segments.

Macroprofiling: Performing run-time checks on complex code segments. This type of profiling deals with large bodies of code and the complexity of the calls between them.

Microprofiling: Performing run-time checks on a single line of code or a short code segment.

Throughput: The number of instructions executed by the processor per unit time.

Latency: The time interval required to complete one production cycle.

Profiling measures run-time usage and CPU utilization to determine the resulting effect on the system.
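As a rough illustration of the two definitions above, throughput and latency can be computed from a count and an elapsed time. The function names `throughput` and `average_latency` are hypothetical, and the inputs are assumed to come from some external measurement. This is only a sketch, not how any particular profiler derives its figures.

```c
#include <stdio.h>

/* Throughput: instructions executed per second.
 * Returns 0.0 when no time has elapsed, to avoid dividing by zero. */
double throughput(unsigned long long instructions, double seconds)
{
    return seconds > 0.0 ? (double)instructions / seconds : 0.0;
}

/* Average latency: seconds needed to complete one cycle (one unit of work).
 * Returns 0.0 when no cycle was completed. */
double average_latency(double seconds, unsigned long long cycles)
{
    return cycles > 0 ? seconds / (double)cycles : 0.0;
}
```

For example, 1000 instructions in 2.0 seconds gives a throughput of 500 instructions per second, and 4 cycles completed in 2.0 seconds gives an average latency of 0.5 seconds.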

Analytical View.
This entry deals strictly with thread optimization checks. Once optimization is on the table, profiling the code is the logical next step. For small segments of code (a single line of execution), microprofiling is used; for larger bodies of code, macroprofiling is applied. When a process is initialized it starts with one thread, and further threads are created as the executing code requests them. Every function that is defined and called runs in the context of one of these threads. Instruction usage plays a crucial role in profiling: the more complex the code, the more time is needed to work through that complexity, which means latency is high. This in turn consumes CPU time.
Let's go through the thread entry structure, which makes the objects in use concrete:

Code:

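typedef struct tagTHREADENTRY32
{
    DWORD dwSize;              /* size of this structure, in bytes */
    DWORD cntUsage;            /* no longer used, always zero */
    DWORD th32ThreadID;        /* thread identifier */
    DWORD th32OwnerProcessID;  /* identifier of the process that owns the thread */
    LONG  tpBasePri;           /* base priority level assigned to the thread */
    LONG  tpDeltaPri;          /* no longer used, always zero */
    DWORD dwFlags;             /* no longer used, always zero */
} THREADENTRY32, *PTHREADENTRY32;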



The thread priority levels defined by the Win32 API are:



    THREAD_PRIORITY_IDLE
    THREAD_PRIORITY_LOWEST
    THREAD_PRIORITY_BELOW_NORMAL
    THREAD_PRIORITY_NORMAL
    THREAD_PRIORITY_ABOVE_NORMAL
    THREAD_PRIORITY_HIGHEST
    THREAD_PRIORITY_TIME_CRITICAL



BOOL WINAPI Thread32First(
    HANDLE hSnapshot,
    LPTHREADENTRY32 lpte
);

BOOL WINAPI Thread32Next(
    HANDLE hSnapshot,
    LPTHREADENTRY32 lpte
);



The thread entry structure is utilized extensively.
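A minimal sketch of how the structure and the two functions above fit together: take a thread snapshot with `CreateToolhelp32Snapshot`, then walk it with `Thread32First` and `Thread32Next`. The function name `count_current_process_threads` is hypothetical. The non-Windows branch is only an assumption added so that the file compiles elsewhere; it does no real work.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#include <tlhelp32.h>
#endif

/* Counts the threads owned by the calling process by walking a Toolhelp
 * snapshot. Returns -1 if the snapshot fails or the platform is not Windows. */
int count_current_process_threads(void)
{
#ifdef _WIN32
    DWORD pid = GetCurrentProcessId();
    int count = 0;
    THREADENTRY32 te;

    /* TH32CS_SNAPTHREAD captures every thread in the system, so the
     * entries are filtered by owner process ID below. */
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    if (snap == INVALID_HANDLE_VALUE)
        return -1;

    te.dwSize = sizeof(te);   /* must be set before calling Thread32First */
    if (!Thread32First(snap, &te)) {
        CloseHandle(snap);
        return -1;
    }

    do {
        if (te.th32OwnerProcessID == pid) {
            printf("thread %lu, base priority %ld\n",
                   (unsigned long)te.th32ThreadID, (long)te.tpBasePri);
            ++count;
        }
    } while (Thread32Next(snap, &te));

    CloseHandle(snap);
    return count;
#else
    return -1;   /* Toolhelp API is Windows-only */
#endif
}
```

On Windows the function should report at least one thread, since the caller itself is running on one.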

The code segments directly reflect the work a thread performs when it executes in memory. Let's look at the code profiling model.

http://www.secniche.org/code_profiling.gif

This model presents the run-time aspects of the code profiling process. Its main purpose is to point out which characteristics to look for when code snippets are profiled, so a lot depends on the type of code that is executing. Code that involves nested loops, pointer references, and deep interdependencies is harder to profile: latency will be high and throughput low. Here is an example of a more complex code segment:

Code:



#include <windows.h>
#include <tchar.h>
#include <stdio.h>

void printError( TCHAR* msg )
{
    DWORD eNum;
    TCHAR sysMsg[256];
    TCHAR* p;

    eNum = GetLastError( );
    FormatMessage( FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
                   NULL, eNum,
                   MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), // Default language
                   sysMsg, 256, NULL );

    // Trim the end of the line and terminate it with a null
    p = sysMsg;
    while( ( *p > 31 ) || ( *p == 9 ) )   // skip printable characters and tabs
        ++p;
    do { *p-- = 0; } while( ( p >= sysMsg ) &&
                            ( ( *p == '.' ) || ( *p < 33 ) ) );   // strip trailing '.', spaces and control characters

    // Display the message; _tprintf matches TCHAR strings in ANSI and Unicode builds
    _tprintf( TEXT("\n  WARNING: %s failed with error %lu (%s)"), msg, eNum, sysMsg );
}
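The kind of segment described earlier (nested loops, pointer references) can be microprofiled by timing it directly. The sketch below assumes a hypothetical workload, `nested_sum`, and measures it with the standard C `clock()` function. A real profiler such as VTune samples far more than this, so treat it only as an illustration of the idea.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical workload: a nested loop that walks a row-major matrix
 * through a pointer and sums its elements. */
unsigned long nested_sum(const int *data, int rows, int cols)
{
    unsigned long total = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            total += (unsigned long)data[r * cols + c];
    return total;
}

/* Times a single call to nested_sum. Stores the sum in *result and
 * returns the processor time consumed, in seconds. */
double time_nested_sum(const int *data, int rows, int cols,
                       unsigned long *result)
{
    clock_t start = clock();
    *result = nested_sum(data, rows, cols);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For very short segments the measured time may be zero at the resolution of `clock()`, so in practice the segment would be repeated many times and the total divided by the repetition count.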

The THREADENTRY32 structure describes one thread in the system at the moment the snapshot was taken. The Thread32First function retrieves information about the first thread recorded in the snapshot. Let's look at the microprofiling model:

http://www.secniche.org/micro_profiling.gif

So this shows the general point of view. To get practical, I am going to thread-profile a Windows binary that executes serially, with Intel VTune Profiler used for the analysis. The binary I am going to analyze is Boo32.exe, and its usage is shown below:

Code:

 

E:\tools>BOO32.EXE
BOO32 -- Simple Win32 boot sector read/write utility.
Copyright (c) 1998 Data Fellows.

usage: boo32 [-r | -w] filename [drive]

filename: boot sector image file (512 bytes).
drive:    a letter and a colon (e.g. "A:" for boot sector,
or a decimal number for MBR (e.g. "0" for the first physical hard drive).
Default is "A:".

-r        read the sector to the image file
-w        write the sector from the image file (this is the default)

I ran boo32.exe from the console and fed it into Intel VTune Thread Profiler. Let's see the view.

http://www.secniche.org/boo32.gif

The profiler shows the serial nature of the binary: it simply runs, with no specific operation being carried out. Next I ran pslist.exe, which enumerates the processes on a system with the help of thread generation.

Code:



E:\tools>pslist

Process information for KNOCK:

Name                Pid Pri Thd  Hnd   Priv        CPU Time    Elapsed Time
Idle                  0   0   1    0      0    83:10:18.734     0:00:00.000
System                4   8  54  293      0     1:05:27.281     0:00:00.000
smss                400  11   3   21    216     0:00:00.015    96:06:18.265
csrss               508  13  14  451   2300     0:08:46.531    96:06:14.000
winlogon            540  13  23  574   8388     0:00:18.359    96:06:09.062
services            596   9  16  269   2220     0:01:03.750    96:05:58.125
lsass               608   9  14  387   3328     0:00:17.828    96:05:57.812
svchost             792   8  17  213   3524     0:00:00.968    96:05:53.484
svchost             868   8  12  306   2368     0:00:05.812    96:05:52.203
svchost             948   8  46 1166  20652     0:00:49.765    96:05:51.828
svchost            1056   8  11  168   1828     0:00:56.671    96:05:50.906
explorer           1332   8  16  899  44580     0:22:53.359    96:05:42.796
googletalk         1680   8  20  526  33124     0:27:11.593    96:04:51.312
IEXPLORE           1388   8  14  688  83844     0:43:19.843    79:25:23.968
svchost            1196   8   8  138   2720     0:00:00.484    79:14:05.296
spoolsv             588   8  10  132   3688     0:00:00.796    30:14:08.375
acrotray           1896   8   2   31    984     0:00:00.078    24:35:06.546
Opera              1356   8   9  237  50260     0:02:15.187     2:39:50.093
winamp             1108   8  14  268  14088     0:00:23.218     1:55:54.109
console            3476   8   2   31   2812     0:00:03.296     1:01:36.875
cmd                4012   8   1   30   2152     0:00:00.078     1:01:36.390
dexplore           1032   8   5  292  10424     0:00:11.250     0:49:13.796
notepad            4084   8   1   30   1176     0:00:00.265     0:33:25.953
VTuneEnv           2872   8   9  510  50208     0:02:05.359     0:16:29.453
vtunecca           1560  13   8  268  10004     0:00:00.203     0:16:24.625
wmiapsrv           2904   8   3  151   1664     0:00:01.125     0:16:21.718
mspaint            1144   8   5  142   7032     0:00:01.359     0:06:55.750
wuauclt            3804   8   7  172   6784     0:00:00.281     0:02:55.296
cmd                3392   8   1   31   2152     0:00:00.078     0:00:09.125
pslist             1036  13   2   87    896     0:00:00.062     0:00:00.031
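To compare thread counts across processes, a row of the listing above can be split into its leading columns (Name, Pid, Pri, Thd, Hnd, Priv). The sketch below is an assumption about how one might do that with `sscanf`. The `PsListRow` type and `parse_pslist_row` function are hypothetical and not part of pslist, and process names containing spaces are not handled.

```c
#include <stdio.h>

/* Leading columns of one pslist row, in the order they are printed. */
typedef struct {
    char          name[32];
    unsigned      pid;
    unsigned      pri;
    unsigned      threads;
    unsigned      handles;
    unsigned long priv;
} PsListRow;

/* Parses the first six columns of a pslist row.
 * Returns 1 on success, 0 if the line does not match (for example the
 * header line, whose second column is not a number). */
int parse_pslist_row(const char *line, PsListRow *row)
{
    int n = sscanf(line, "%31s %u %u %u %u %lu",
                   row->name, &row->pid, &row->pri,
                   &row->threads, &row->handles, &row->priv);
    return n == 6;
}
```

Applied to the `svchost` row with Pid 948, this yields a thread count of 46, the second highest in the listing after System with 54.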

Let's see the thread profile for optimization checks:

http://www.secniche.org/pslist_profile.gif

So one can see the limit checks; the blue code indicates over-utilization. Let's look at the summary to check whether critical sections (mutexes, semaphores, etc.) are used.

http://www.secniche.org/pslist_sum.gif

After looking at the summary, things are somewhat clearer for the blue code. The examples are deliberately general, to show the changes that occur with the resulting threads. With a complex system the results will be different, but with optimization and profiling the code can be brought under control step by step.

More opinions are required.

Regards
0kn0ck

https://www.openrce.org/blog/view/1050/Thread_Optimization_Checks_:_Code_Prominence