
ways to optimize a fastcall function masm/poasm


BanMe
December 23rd, 2010, 13:02
Code:

SVC_HEAP_INIT equ 0
SVC_HEAP_WRITE equ 1
SVC_HEAP_READ equ 2
SVC_HEAP_CLOSE equ 3

HEAP_INIT struct
ServerHeapBase PVOID ?
ServerHeapPart PVOID ?
ServerHeapIndex DWORD ?
HEAP_INIT ends

HEAP_READ struct
HeapOffset DWORD ?
Data DWORD ?
HEAP_READ ends

HEAP_DATA struct
Init HEAP_INIT <>
Read HEAP_READ <>
HEAP_DATA ends

PHEAP_DATA TYPEDEF PTR HEAP_DATA

NT_ERR macro
.if Eax
Int 3
.endif
endm

CHECK_RETURN macro
.if !Eax
Int 3
.endif
endm

.code
comment -
HeapInit: initializes the server heap that I use to store data such as handles for later use. ecx = service, edx = PHEAP_DATA.
HeapWrite: takes 3 parameters. eax = the value to write, ecx = the service called, edx = PTR HEAP_DATA.
HeapRead: ecx = service, edx = PHEAP_DATA, which carries 2 parameters: HeapOffset and a pointer to the area to read the data into.
HeapClose: same parameters as HeapInit.
-
HeapManager PROC FASTCALL Service:DWORD,Params:PHEAP_DATA
test ecx,ecx
jz HeapInit
dec ecx
jz HeapWrite
dec ecx
jz HeapRead
dec ecx
jz HeapClose
mov eax,0
jmp HeapReturn
HeapInit:
;assume edx:PSTACK_DATA
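push edx ;save Params across the call (edx is volatile)
xor eax,eax ;eax supplies the NULL/0 arguments below, so clear it first
push eax ;Parameters = NULL
push eax ;Lock = NULL
push 01024h ;CommitSize
push eax ;ReserveSize = 0
push eax ;HeapBase = NULL
push HEAP_ZERO_MEMORY ;Flags
call RtlCreateHeap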
CHECK_RETURN
pop edx
mov HEAP_DATA.Init.ServerHeapBase[edx],eax
push edx
xor ecx,ecx
push 01024h
push ecx
push eax
call RtlAllocateHeap
CHECK_RETURN
pop edx
mov HEAP_DATA.Init.ServerHeapPart[edx],eax
xor ecx,ecx
mov HEAP_DATA.Init.ServerHeapIndex[edx],ecx
mov eax,ecx
jmp HeapReturn
HeapWrite:
mov ecx,HEAP_DATA.Init.ServerHeapPart[edx]
push edx
mov edx,HEAP_DATA.Init.ServerHeapIndex[edx]
mov [ecx + edx*4],eax
pop edx
inc HEAP_DATA.Init.ServerHeapIndex[edx]
xor ecx,ecx
mov eax,ecx
jmp HeapReturn
HeapRead:
mov ecx,HEAP_DATA.Read.Data[edx]
mov eax,HEAP_DATA.Init.ServerHeapPart[edx]
add eax,HEAP_DATA.Read.HeapOffset[edx]
mov eax,[eax]
mov [ecx],eax
xor ecx,ecx
mov eax,ecx
jmp HeapReturn
HeapClose:
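push edx ;save Params
push HEAP_DATA.Init.ServerHeapPart[edx] ;BaseAddress
push 0 ;Flags (MEM_RELEASE is a VirtualFree flag, not a heap flag)
push HEAP_DATA.Init.ServerHeapBase[edx] ;HeapHandle
call RtlFreeHeap
mov edx,[esp] ;reload Params, the call may have clobbered edx
push HEAP_DATA.Init.ServerHeapBase[edx]
call RtlDestroyHeap
xor ecx,ecx
mov eax,ecx
pop edx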
HeapReturn:
ret
HeapManager endp


So I have this code, and I've worked out most of the bugs. It is assembled as fastcall with PoAsm, and polink links the resulting PoAsm obj with the MASM obj..

I've been trying to think of ways to minimize the use of API while still achieving a desirable result.. To that end, I can see that the internal APIs used in this function (RtlCreateHeap and RtlAllocateHeap) could be replaced by the stack, the base could be stored, restored and passed back, and unsafe 'writes' through ambiguous registers could be avoided. This could optimize it even further.. Any thoughts or opinions?
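
As a rough illustration of the idea above (same service numbers, but the heap replaced by a caller-supplied buffer that could just as well live on the stack), here is a minimal C sketch. The names `store_t`, `store_service` and `STORE_SLOTS` are hypothetical and do not come from the posted code; the bounds checks are an addition that the assembly version does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Service numbers mirror SVC_HEAP_INIT..SVC_HEAP_CLOSE from the post. */
enum { SVC_INIT = 0, SVC_WRITE = 1, SVC_READ = 2, SVC_CLOSE = 3 };
enum { STORE_SLOTS = 64 };          /* hypothetical capacity, in dwords */

typedef struct {
    uint32_t *part;   /* plays the role of ServerHeapPart  */
    uint32_t  index;  /* plays the role of ServerHeapIndex */
} store_t;

/* Returns 0 on success and nonzero on failure.
   'backing' is caller-owned storage of STORE_SLOTS dwords (for example a
   local array), so no allocation API is involved. */
static int store_service(int service, store_t *s, uint32_t *backing,
                         uint32_t slot, uint32_t *value)
{
    assert(s != NULL);
    switch (service) {
    case SVC_INIT:
        s->part = backing;
        s->index = 0;
        return 0;
    case SVC_WRITE:
        if (s->index >= STORE_SLOTS)
            return 1;                       /* buffer full */
        s->part[s->index++] = *value;       /* append, then bump the index */
        return 0;
    case SVC_READ:
        if (slot >= s->index)
            return 1;                       /* nothing written there yet */
        *value = s->part[slot];
        return 0;
    case SVC_CLOSE:
        s->part = NULL;
        s->index = 0;
        return 0;
    default:
        return 1;                           /* unknown service */
    }
}
```

A caller would declare `uint32_t buf[STORE_SLOTS]` as a local, call `SVC_INIT` once, and then append and read values; the buffer is only valid while that stack frame is alive, which is exactly the sharing problem discussed in the replies.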

BanMe

Indy
December 23rd, 2010, 15:18
I would not use the heap. Use the kernel api provided as services.

BanMe
December 23rd, 2010, 22:57
The whole point of my exercise is finding a way to do it without API..
The 'start' vector I've thought of is the obvious reuse of how compilers set up local vars (sub esp, size of locals).. This then leaves the tasks of 'sharing, reading from and writing to' this 'memory blob'. The TLS data segment, being modifiable if 'present', seems like a valid option for this.. it reduces my write operation to a simple push instruction xD This also led me to some masm code by Kayaker, as well as evaluator's stumbling description. So this exercise is successful so far.. but there is no outcome or test code yet.. just "theories and research".

Maximus
December 24th, 2010, 04:54
i am not sure i understand what you mean... however:

* if you use up to (8k - the stack already used in the current page), you can allocate the space you need on your stack with a simple sub/add.
* if you need up to 1 MB of space, simply loop downwards and touch each 4k page of the stack until you reach the desired size. Then use the variable.

you could also make your application 'require' more than 1 MB of stack, just to be safe and ensure you'll have enough space.
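
The "touch each 4k page, descending" loop can be sketched in portable C on an ordinary buffer. This only simulates the access pattern; on a real Windows stack the touches are what move the guard page down one page at a time. `probe_descending` and `PROBE_PAGE` are hypothetical names, and the 4096-byte page size is an assumption for x86.

```c
#include <assert.h>
#include <stddef.h>

enum { PROBE_PAGE = 4096 };   /* assumed x86 page size */

/* Touch the region [low, low + size) from the highest address downwards,
   one access per PROBE_PAGE bytes, and finally the lowest byte.
   Touch points are never more than one page apart and include both ends,
   so every page overlapping the region is hit even if 'low' is not
   page-aligned. Returns the number of touches performed.
   Note: the touch is a write of 0, so use it on fresh storage only. */
static size_t probe_descending(volatile unsigned char *low, size_t size)
{
    size_t touched = 0;
    size_t off;

    if (size == 0)
        return 0;
    assert(low != NULL);

    off = size - 1;                 /* start at the top byte */
    for (;;) {
        low[off] = 0;
        touched++;
        if (off < PROBE_PAGE)
            break;                  /* next step would go below 'low' */
        off -= PROBE_PAGE;
    }
    if (off != 0) {                 /* make sure the lowest page is hit too */
        low[0] = 0;
        touched++;
    }
    return touched;
}
```

For the "4 pages is more than enough" case, a 16 KB region results in five touches (four page-stride touches from the top, plus the lowest byte).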

BanMe
December 24th, 2010, 21:06
yes you do get it.. :}

I understand your first suggestion, but the latter confuses me.. i.e. why do I need to 'touch' them? Is this a reaction to some automatic stack enlargement based on the current need for stack space?

Also, I need very little space; 4 pages is more than enough.

Dynamic TLS data, to make this private stack memory available across 'certain' threads, seems to be one viable path..

But how about other methods, like a loadable module (no entry) using the FARSTACK option and setting ss on function call from a created thread (this is almost too controlled)? There's got to be more to this one as well, but I haven't had the time to get too far into it.

Indy
December 25th, 2010, 04:53
I do not quite understand the problem.

If you need to use a global buffer, i.e. TLS, you can use the following methods:
o Store a pointer to the buffer in the TEB.
o Store a pointer at the bottom of the stack. The beginning of the stack is not used.
o Store pointers in the hyperspace in the modules. This is free memory left over by the allocation granularity.
o Allocate TLS directly on the various notifications.

Maximus
December 25th, 2010, 16:01
yep, all indy proposal are good, - just a question (google translator problem i guess...)

-->hyperspace in the modules<-- are you referring to the slack space of the last page of every module mapped in memory?

If you can allocate a buffer in the main stack, then just place a pointer to its base in the PEB, so you can retrieve it from any thread by dereferencing the PEB from the current TEB (maybe ImageSubsystemMinorVersion could do the job, i do not think it is checked a lot by the system itself).

Synchronization on write can be done pretty easily: use a LOCK CMPXCHG [baseaddressofbuffer],reg and loop (using PAUSE and not NOP!) until you get exclusive access to the buffer. Then use the same method to release the buffer lock.
On read, just don't bother with synchronization, UNLESS you need to read more than 4 bytes of the buffer and they need to be 'in sync': you need a read lock in that case, as a write might happen between your reads of 2 different dwords.
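
The compare-and-exchange lock described above can be sketched with C11 atomics, which compilers typically lower to LOCK CMPXCHG on x86. This is a minimal sketch, not a production lock: `spin_t`, `spin_try_lock`, `spin_lock` and `spin_unlock` are hypothetical names, the lock word holds an owner id (0 meaning free), and the PAUSE hint is only indicated by a comment because there is no portable C spelling for it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int owner;   /* 0 = free, otherwise the id of the holder */
} spin_t;

/* One attempt: atomically replace 0 with 'id'. True if we got the lock. */
static bool spin_try_lock(spin_t *l, int id)
{
    int expected = 0;
    assert(id != 0);
    return atomic_compare_exchange_strong(&l->owner, &expected, id);
}

/* Loop until the exchange succeeds. */
static void spin_lock(spin_t *l, int id)
{
    while (!spin_try_lock(l, id)) {
        /* an x86 build would issue PAUSE here rather than NOP */
    }
}

/* Release with the same primitive: replace 'id' with 0.
   Returns false if the caller was not the holder. */
static bool spin_unlock(spin_t *l, int id)
{
    int expected = id;
    return atomic_compare_exchange_strong(&l->owner, &expected, 0);
}
```

A writer would bracket its multi-dword update with `spin_lock` and `spin_unlock`; a reader that needs two or more dwords to be consistent would take the same lock, matching the "read lock" case in the post.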

@indy: bottom of the stack? do you mean manually allocating a buffer there, or touching all the pages until we get there? I don't see other ways to reach the bottom of the stack

@banme: windows places a guard page over the next stack page; when it triggers, the stack is expanded. When you allocate more than 4kb of local storage in a function, compilers insert a function call that 'touches' all the stack pages, so that the function won't fault at random. Just make a function with 5kb of locals, open it in ida and you'll see the call.

edit----
meanwhile, totally OT - Merry Christmas to all!!

Indy
December 25th, 2010, 16:16
Maximus
Quote:
yep, all indy proposal are good, - just a question (google translator problem i guess...)

The problem is not the translation; it is in the description.

Quote:
-->hyperspace in the modules<-- are you referring to the slack space of the last page of every module mapped in memory?

Yes.

Quote:
bottom of the stack?

Code:
$GET_ENVIRONMENT macro Reg32
mov Reg32,fs:[TEB.Tib.StackBase]
mov Reg32,dword ptr [Reg32 - 4]
endm


Quote:
use a LOCK CMPXCHG [baseaddressofbuffer],reg

This is unnecessary. Access to four-byte aligned data is always atomic. If it matters, prefixes also cause problems with morphing.

The mechanism for expanding the stack is simple: the last page is marked as a guard page, and on access to it the memory manager allocates another.

Maximus
December 25th, 2010, 17:36
aah you're right, i was thinking of the 8-byte version initially (that's why i suggested the lock on reads >4 bytes).
---edit
rethinking, the LOCK is needed anyway: (1) if you are running on a multi-processor server, you need the LOCK signal asserted. (2) xchg is implicitly locked, but cmpxchg is not: in case of exactly parallel cmpxchg execution, i think you'll end up with read(1)read(2)write(2)write(1) (no lock, so this is allowed), thus you need to lock the memory anyway.
---

For the stack, mov reg,[ESP-1000] is way simpler. Why do you suggest fetching the current base from the TIB?

BanMe
December 26th, 2010, 21:12
*old code*

BanMe
December 27th, 2010, 23:46
TlsGetSet
I wanted a thread local storage engine that uses a segment to store data in.
I also wanted to do it without API of any sort, only the PE and asm (masm/poasm).
This engine has to be fastcall, so all parameters are passed in registers (no ebp pushes..).
I don't know why I want to do this, but I do. So maybe someone will enjoy it..

So I wrote it.. and a simple test.. soon to be test(s)..
Code:

.486
.MODEL FLAT, STDCALL
OPTION CASEMAP:NONE
INCLUDE \masm32\include\windows.inc
INCLUDE \masm32\include\ntdll.inc
INCLUDELIB \masm32\lib\ntdll.lib


OPTION DOTNAME
.TLS SEGMENT DWORD FLAT PUBLIC 'TLS'
_tls_start LABEL DWORD
_tls_data DWORD 128 dup(0)
_tls_end LABEL DWORD
.TLS ENDS
OPTION NODOTNAME

.data
SVC_TLS_DATA_WRITE equ 0
SVC_TLS_DATA_READ equ 1
align 4

CallbackStub PROTO :DWORD,:DWORD,:PVOID
Callbacks DWORD CallbackStub,0,0;little extra space..
_tls_index DWORD 0
TLS_ENGINE struct
ReadData DWORD 0
ReadIndex DWORD 0
TlsIndex DWORD 0
TlsData DWORD 0
TLS_ENGINE ends

PTLS_ENGINE TYPEDEF PTR TLS_ENGINE

PUBLIC _tls_used
_tls_used IMAGE_TLS_DIRECTORY < _tls_start, _tls_end, _tls_index, Callbacks, 0, 0 >

ServerTls TLS_ENGINE <0,0,_tls_index,_tls_data>

.data?

@DynamicTlsSvc@8 PROTO SYSCALL
DynamicTlsSvc equ <@DynamicTlsSvc@8>

.code
start:
test_1 proc
mov eax,LdrLoadDll
lea edx,ServerTls
xor ecx,ecx
call DynamicTlsSvc;test write LdrLoadDll address to TLS_STACK..
inc ecx
call DynamicTlsSvc
;eax should equal LdrLoadDll
ret
test_1 endp
CallbackStub proc hInstance:DWORD,fwReason:DWORD,Context:PVOID;Callback
mov eax,1
Ret
CallbackStub endp

end start


Code:

.486
.model flat
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\w2k\ntdll.inc
includelib \masm32\lib\ntdll.lib


SVC_TLS_DATA_WRITE equ 0
SVC_TLS_DATA_READ equ 1
TLS_ENGINE struct
ReadData DWORD 0
ReadIndex DWORD 0
TlsIndex DWORD 0
TlsData DWORD 0
TLS_ENGINE ends

PTLS_ENGINE TYPEDEF PTR TLS_ENGINE
.code
DynamicTlsSvc PROC FASTCALL Service:DWORD,Params:PTLS_ENGINE
test ecx,ecx
jz TlsWrite
dec ecx
jz TlsRead
mov eax,1
jmp TlsReturn
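TlsWrite:
push ebx
mov ebx,TLS_ENGINE.TlsIndex[edx] ;ebx = pointer to the index variable
mov ecx,[ebx] ;ecx = current index
shl ecx,2
add ecx,TLS_ENGINE.TlsData[edx] ;ecx = address of slot [index]
mov [ecx],eax
inc dword ptr [ebx] ;advance the index itself, not the pointer stored in TLS_ENGINE
pop ebx
xor ecx,ecx
mov eax,ecx
jmp TlsReturn
TlsRead:
push ebx
mov ecx,TLS_ENGINE.TlsData[edx]
mov eax,TLS_ENGINE.ReadIndex[edx]
mov ebx,TLS_ENGINE.TlsIndex[edx]
mov ebx,[ebx]
cmp eax,ebx
jae TlsReadDone ;index out of range: skip the load, but still restore ebx
mov eax,[ecx+eax*4]
TlsReadDone:
pop ebx
xor ecx,ecx
TlsReturn:
ret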
DynamicTlsSvc endp


build.bat

Code:

\masm32\bin\poasm /AIA32 TlsSvc.asm
\masm32\bin\ml /c /coff testreadwrite.asm
\masm32\bin\polink /SUBSYSTEM:WINDOWS testreadwrite.obj TlsSvc.obj
cmd /k
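
The test file above reaches the PoAsm routine through the name `@DynamicTlsSvc@8`, which follows the usual 32-bit fastcall decoration: a leading `@`, the function name, another `@`, and the number of argument bytes in decimal (two DWORD-sized arguments give 8). A small C sketch of that rule, with the helper name `fastcall_decorate` being hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "@<name>@<arg_bytes>" into 'out'.
   Returns the number of characters written (excluding the terminator),
   or -1 if 'cap' is too small to hold the whole decorated name. */
static int fastcall_decorate(char *out, size_t cap,
                             const char *name, unsigned arg_bytes)
{
    int n;
    assert(out != NULL && name != NULL);
    n = snprintf(out, cap, "@%s@%u", name, arg_bytes);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

Calling it with `"DynamicTlsSvc"` and 8 gives the same string that the `PROTO SYSCALL` line in the test file spells out by hand.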

Indy
December 29th, 2010, 17:15
Dynamic TLS initialization is possible (loading itself into a remote process, etc.). For this we must find some variables/entries, such as LdrpInitializeTls and LdrpAllocateTls.

BanMe
December 29th, 2010, 20:04
Well, I got to thinking, with that *twinkling* in the back of my mind that you are correct.. but only in the context of 'aggressive' behaviors. Self utilization is more my goal, and maybe a more thorough understanding of the PE model. You are correct that we need some 'functionalities' of the Ldr for TLS to work appropriately, but I bet these can be overcome, maybe.. I could use the example of a PE filled out in code *not memory* to construct a dll in memory and point the entrypoint to the CallbackStub.. xD I don't know, I'm just thinking here.. *side note* this can also be done by setting the Thread Local Storage Pointer in the thread, but that adds code; I want the system to do it for me.

vague references..
vortex's LoadPeToMem
http://www.masm32.com/board/?PHPSESSID=d09ac44c52c0d9029f89c9a198243842&topic=6920.0

HandCrafted PE..
http://www.masm32.com/board/?PHPSESSID=4fa8e6b45fd2aaf36f85623277d3f35d&topic=12240.0

Ricnar said something interesting on this ..

Quote:
[Originally Posted by Ricardo Narvaja;78695]Sorry for my bad english

the pointer of tls is in the header, if different of zero, go and look the address pointed by this and execute the code in this address, and you can change out of the header this pointer, and redirect the execution.
But if the tls pointer in the header is zero, the header is not writable, and we cannot change this pointer, and 99% of the programs have this pointer with value zero in the header, in this cases, tls will be not executed at all, and we cannot change this value of zero in the header, only this is valid for programs users of tls with value different of zero, not for all, i think, correct me if i'm wrong.

ricnar


ricnar was referring to TLS of the procedure sort (TLS callbacks), so I investigated a little more..

The TlsAlloc routine uses the TlsBitmap and the PEB lock.. yet I have yet to find further references to what checks this bitmap o0.. somewhere in the loader... still looking.

Indy
December 29th, 2010, 23:57
Quote:
vague references..
vortex's LoadPeToMem
http://www.masm32.com/board/?PHPSESSID=d09ac44c52c0d9029f89c9a198243842&topic=6920.0

A very bad way of loading. Absolutely no compatibility with the system. It is necessary to trace the native loader and emulate the sections.


BanMe
December 30th, 2010, 01:28
I do agree that it is a bad method; it's just a handy loader reference. The real idea on my palette is to incorporate the PE headers of a dll as a mostly static object, and use this as a wrapper to install directly into the PEB or map to self. If this works, it could be used as a trigger for changing dynamic TLS procedures, and the change shouldn't be restricted to the innards of the TLS procedure. Again, I'm just thinking forward and variantly.


Found some food for thought..

http://www.zanshu.com/ebook/210_04app/

http://www.zanshu.com/ebook/210_04app/HTML/ch21b.htm

To provide "compatibility with the system" I'm going to add an init that initializes the TlsBitmap, and change '_tls_data' to a direct pointer to the system's TLS storage. I am not sure why my .tls section is not loaded into the ThreadLocalStoragePointer, but I do intend to find out.. Maybe more on __declspec(thread) data is what I need.. :d This comes after a test..

Code:

test_3 proc
invoke TlsAlloc
mov eax,LdrLoadDll
lea edx,ServerTls
xor ecx,ecx
call DynamicTlsSvc
invoke TlsGetValue,1
;eax does not contain LdrLoadDll.. in my opinion it should..so how is my next goal.
ret
test_3 endp

BanMe
January 16th, 2011, 14:40
Ok, I am posting now because I would like to refine my 'tactical brute force' methodology above into something that relies only on the PE to inform the loader of a per-thread TLS data array.

Reading the PE 8.2 spec wasn't much help: maybe 6 paragraphs describe it and the Windows functionality, and it isn't easy to distill the info I want out of them, so I reread it and tried to look further.

I like this quote from the authors of the pecoff_v8.doc..

[Originally Posted by PECOFF documentation Concepts of: Section]
"All the raw data in a section must be loaded contiguously. In addition, an image file can contain a number of sections, such as .tls or .reloc, which have special purposes."



[Originally Posted by partially paraphrased from PECOFF documentation: TLS section _TLS_START VA to template.]
But they also have a few things where one cannot do without the other, as noted in the docs:
the first entry in a TLS section, if using PE TLS, should be a pointer to a template,
and this pointer should have a base relocation in the .reloc section.


I didn't quite follow it yet. So I reread the PE docs and started searching again for more information about __declspec(thread).. I went over the materials again and again, and that's when I found the .tls$ directive, which was pretty interesting, but even more so was the example: the third-person view I needed, in some obscure code that looks like IDL, or is IDL. I am not familiar with it, but I see the structure of the activity involved in it. The code noted below as the example is the linker's point of view that I needed..

I will go no further in recounting my travels in looking for information and putting it together to expand my knowledge; soon I will put together some code... something fun.

So, in short, to explain my way of thinking: instead of allocating memory with an API, I choose to rely on the loader and on the code that is already going to be executed to load the PE into memory. This reduces the coding necessary to accomplish tasks such as 'the analysis of data on a per-thread basis'.

windoz pecoff.doc, download it,read terms and agree..
http://www.microsoft.com/whdc/system/platform/firmware/pecoff.mspx

the example..
http://msdn.microsoft.com/en-us/library/aa227038%28v=vs.60%29.aspx

I thought I had it.. but the PE still shows 0s.. does anyone have a working sample of this? Belay that.. got it.

The bits add up..



TLS Initialization. When using thread local variables declared with __declspec(thread), the compiler puts them in a section named .tls. When the system sees a new thread starting, it allocates memory from the process heap to hold the thread local variables for the thread. This memory is initialized from the values in the .tls section. The system also puts a pointer to the allocated memory in the TLS array, pointed to by FS:[2Ch] (on the x86 architecture). The presence of thread local storage (TLS) data in an executable is indicated by a nonzero IMAGE_DIRECTORY_ENTRY_TLS entry in the DataDirectory. If nonzero, the entry points to an IMAGE_TLS_DIRECTORY structure, shown in Figure 11. It's important to note that the addresses in the IMAGE_TLS_DIRECTORY structure are virtual addresses, not RVAs. Thus, they will get modified by base relocations if the executable doesn't load at its preferred load address. Also, the IMAGE_TLS_DIRECTORY itself is not in the .tls section; it resides in the .rdata section.
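
The per-thread step described in that passage (allocate a block, copy the template from the .tls section, store the pointer in the thread's TLS array at the module's index) can be modelled in a few lines of portable C. This is a simplified stand-in, not the loader's actual code: `tls_dir_t` and `tls_thread_init` are hypothetical names, real pointers replace the 32-bit virtual addresses of IMAGE_TLS_DIRECTORY, and `malloc` stands in for the process heap. The zero-fill tail follows the SizeOfZeroFill field as the PE specification describes it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for IMAGE_TLS_DIRECTORY (callbacks omitted). */
typedef struct {
    const unsigned char *start;     /* StartAddressOfRawData: template begin */
    const unsigned char *end;       /* EndAddressOfRawData: template end     */
    uint32_t            *index;     /* AddressOfIndex: slot number lives here */
    uint32_t             zero_fill; /* SizeOfZeroFill                         */
} tls_dir_t;

/* Model of what happens once per new thread:
   allocate a private block, copy the template into it, zero the tail,
   and publish the block in that thread's slot array (the array that
   FS:[2Ch] points to on x86). Returns the block, or NULL on failure. */
static void *tls_thread_init(const tls_dir_t *d, void **thread_slots)
{
    size_t raw;
    unsigned char *block;

    assert(d != NULL && thread_slots != NULL && d->end >= d->start);
    raw = (size_t)(d->end - d->start);

    block = malloc(raw + d->zero_fill);
    if (block == NULL)
        return NULL;
    memcpy(block, d->start, raw);            /* initialised part  */
    memset(block + raw, 0, d->zero_fill);    /* zero-filled part  */

    thread_slots[*d->index] = block;
    return block;
}
```

Two calls with two different slot arrays give two independent copies of the template, which is the property the thread relies on: each thread sees the initial values from the .tls section, and writes made by one thread do not affect the others or the template.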

BanMe
January 17th, 2011, 21:14
Presenting my alpha code.. in masm and poasm, that mimics __declspec(thread). I am somewhat satisfied with this small venture into the PE, and there is so much more.. This is the outcome of research I did on 2 fields of a PE, and many other researchers had already posted materials; without them and their 'bits' I wouldn't have been able to figure this out.. :}

Refined my approach, see code below..

Thanks and references to all those I noted along the way..

regards BanMe

BanMe
January 18th, 2011, 21:58
So I've seen no interest in this, as it can be done from higher-level languages quite easily; but then, if you take that route, we don't really know what goes on behind all the layers. If it's implementable in C++, it's implementable in asm. So I tried it. Now this requires some explaining.

What I was trying to accomplish:
See if the TlsArray can be set up before 'runtime'. This can be done. Let's take a look at my PE to show you what I mean.

Code:

10000080 50 45 00 00 ASCII "PE" ; PE signature (PE)
10000084 4C01 DW 014C ; Machine = IMAGE_FILE_MACHINE_I386
10000086 0500 DW 0005 ; NumberOfSections = 5
....
100000A8 00100000 DD 00001000 ; AddressOfEntryPoint = 1000
100000AC 00100000 DD 00001000 ; BaseOfCode = 1000
100000B0 00200000 DD 00002000 ; BaseOfData = 2000
100000B4 00000010 DD 10000000 ; ImageBase = 10000000
100000B8 00100000 DD 00001000 ; SectionAlignment = 1000
100000BC 00020000 DD 00000200 ; FileAlignment = 200
100000E0 00001000 DD 00100000 ; SizeOfStackReserve = 100000 (1048576.)
100000E4 00100000 DD 00001000 ; SizeOfStackCommit = 1000 (4096.)
100000E8 00001000 DD 00100000 ; SizeOfHeapReserve = 100000 (1048576.)
100000EC 00100000 DD 00001000 ; SizeOfHeapCommit = 1000 (4096.)
...
//these are the fields that I have been trying to modify.
10000140 20200000 DD 00002020 ; TLS Table address = 2020
10000144 18000000 DD 00000018 ; TLS Table size = 18 (24.)

Let's check out what I got there.
The values at 10002000 ~

Code:

10002000 00 00 00 00 04 00 00 00 .......
10002008 00 00 00 00 0C 00 00 00 ........
10002010 00 00 00 00 14 00 00 00 .......
10002018 00 00 00 00 1C 00 00 00 .......
10002020 00 20 00 10 20 50 00 10 . .. P.. ;first dword: ptr to my template; second dword: strange, why does this one point to the relocation section?
10002028 88 30 00 10 00 00 00 00 .0......
10002030 00 10 .

So _tls_used is used to describe the location of the template for the data you want.
Code:

_tls_used IMAGE_TLS_DIRECTORY <_reloc_table_start,_reloc_table_end,_tls_index, 0, 4096, 0 >


So that means _reloc_table_start = 10002000; for each thread that is initialized, my 'data' will be loaded into it. Now to see if I can modify that data.. I need to hold the base address of each table and the number of entries..

This has come at the cost of having to redo my TLS read/write code in order to use the tables in the 'prescribed fashion'. But give some to gain some.
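
The dump at 10002020 can be read back into the first four IMAGE_TLS_DIRECTORY fields by decoding little-endian dwords. The C sketch below does only that; `le32`, `tls_dir_head_t` and `decode_tls_head` are hypothetical helper names, and only the 16 bytes that the dump shows in full are decoded.

```c
#include <assert.h>
#include <stdint.h>

/* Read one little-endian 32-bit value from a byte buffer. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* First four fields of a 32-bit IMAGE_TLS_DIRECTORY, all virtual addresses. */
typedef struct {
    uint32_t start_va;      /* StartAddressOfRawData */
    uint32_t end_va;        /* EndAddressOfRawData   */
    uint32_t index_va;      /* AddressOfIndex        */
    uint32_t callbacks_va;  /* AddressOfCallBacks    */
} tls_dir_head_t;

static tls_dir_head_t decode_tls_head(const unsigned char *p)
{
    tls_dir_head_t h;
    assert(p != 0);
    h.start_va     = le32(p);
    h.end_va       = le32(p + 4);
    h.index_va     = le32(p + 8);
    h.callbacks_va = le32(p + 12);
    return h;
}
```

Fed the bytes `00 20 00 10 20 50 00 10 88 30 00 10 00 00 00 00`, this gives a start of 10002000, an end of 10005020, an index address of 10003088 and no callbacks. Read that way, the second dword is simply `_reloc_table_end`, the end of the template range, which would explain why it points into the relocation section.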

neerm
January 22nd, 2011, 03:35
Nice, I was looking for different ways without API and found this very useful. Some points are a little bit confusing; maybe I am not understanding them properly. I am making a list of the problems and will post it once I am done. Good to hear that your hard work brought success.

BanMe
January 22nd, 2011, 09:26
Ty neerm, I hope to complete a small write-up about TLS data today, which I hope clears up some of your confusion. I am not the greatest technical writer nor teacher... But I have some ideas on how I can improve, namely writing drafts, starting simple and working my way up to how I implemented it.


Regards BanMe

BanMe
January 22nd, 2011, 22:38
the basic setup of TLS DATA.
Code:

.386
.model flat,STDCALL
option casemap:none
option DOTNAME
include \masm32\include\windows.inc
PUBLIC _tls_used
.data
__tls_index DWORD 0

.TLS SEGMENT DWORD FLAT PUBLIC 'TLS'
__tls_start:
_tls_data DWORD 1234h
__tls_end:
.TLS ENDS

.rdata SEGMENT READONLY DWORD FLAT PUBLIC 'DATA'
_tls_used IMAGE_TLS_DIRECTORY <__tls_start,__tls_end,__tls_index,0,0,0>
.rdata ENDS ;close the segment before switching to .code
.code
Start:
mov eax,1
Ret
end Start

BanMe
January 23rd, 2011, 14:16
XSL Cross module section reading with a PE..TLS.. and the windows LDR functions..

So I tried modifying the IMAGE_TLS_DIRECTORY after compiling the above 'basic' TLS data code into an EXE.
Then I loaded it into olly and modified the addresses of _tls_start to 400000 (DOS_HEADER) and _tls_end to 40019c (end of PE header). After a quick save and reload, I began to assemble in olly the code for digging further into this. First I read fs:[2c], and the value there was 00142A18, so I followed it in the dump; it holds another address, 00142A38, and at that address was the MZ header, or the signature thereof.

Code:

00142A38 00905A4D


So I then did a little more searching around that area and found what looked like my edited values 'stored' somewhere else and quite easily accessible for 'runtime modification', say if you wanted to create a thread with the MZ and PE headers in the stack at the time the thread is created.


Code:

001429E8 00400000 tlsbase.00400000
001429EC 0040109C tlsbase.0040109C
001429F0 00403000 tlsbase.00403000


You should notice that the fs:[2c] address - 30 = 1429e8,
and that 38 - 18 = 20h.. this is for later use.. I think.. (the math should be 'self' evident in the code..)

My next idea is to see if I can modify it to load a section from another module, namely ntdll, as it should be there before everything else... I will post the test app here for anyone interested in doing their own tests.

The below is not the 'test' app.. lol. This is a weaponization of this 'knowledge' in the most simple regard.

Code:

public _***!_up_end
_***!_up_start:
push ebp
mov ebp,esp
mov eax,fs:[2c]
cmp eax,0
je *code*
sub eax,30
lea ecx,_***!_up_start
mov [eax],ecx
lea ecx,_***!_up_end
mov [eax+4],ecx
*code here should create a thread*
mov eax,fs:[2c]
mov [eax],0
pop ebp
_***!_up_end:
ret


Attached is my pop loop, a very small attempt at understanding polymorphism o0.. the above is my step 2.. lol

regards BanMe

BanMe
January 24th, 2011, 13:56
So, as with the prior explanation, we will be modifying _tls_start and _tls_end, this time to point at a section of ntdll's.. my choice is the reloc section of ntdll.. So now off to olly (hopefully you don't have ASLR): go to the memory view and locate the starting and ending addresses of ntdll's relocation section.

On my system the reloc section of ntdll is 2eac in length,
starting at 7c9af000
and ending at 7c9b1EAC,

with an initial dword of 00001000 and a subsequent one of 00000088.

lol it worked.
Code:

$ ==> >00001000
$+4 >00000088
$+8 >31EE31C0
$+C >330D32FF
$+10 >332A331F
$+14 >33623357
$+18 >33BF3374
...
$+2E9C >3040303C
$+2EA0 >30683064
$+2EA4 >3070306C
$+2EA8 >307C3078
$+2EAC >ABABABAB
$+2EB0 >ABABABAB
$+2EB4 >FEEEFEEE
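
The first two dwords of the copied data match the layout of a PE base relocation block header: a page RVA (00001000) followed by the block size in bytes (00000088), after which come 16-bit entries whose top 4 bits are the relocation type and whose low 12 bits are the offset within the page. A minimal C sketch of that decoding, with `reloc_hdr_t`, `reloc_entry_t`, `reloc_entry_count` and `reloc_split` being hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* 8-byte header that starts every base relocation block. */
typedef struct {
    uint32_t page_rva;    /* RVA of the 4 KB page the entries apply to */
    uint32_t block_size;  /* size of the whole block, header included  */
} reloc_hdr_t;

typedef struct {
    unsigned type;        /* top 4 bits; 3 is IMAGE_REL_BASED_HIGHLOW  */
    unsigned offset;      /* low 12 bits: offset inside the page       */
} reloc_entry_t;

/* Number of 16-bit entries that follow the header. */
static unsigned reloc_entry_count(reloc_hdr_t h)
{
    assert(h.block_size >= 8);
    return (h.block_size - 8) / 2;
}

/* Split one 16-bit entry into its type and page offset. */
static reloc_entry_t reloc_split(uint16_t e)
{
    reloc_entry_t r;
    r.type = (unsigned)(e >> 12);
    r.offset = (unsigned)(e & 0x0FFFu);
    return r;
}
```

With the values from the dump, the header 00001000 / 00000088 describes 64 entries for the page at RVA 1000, and the dword 31EE31C0 at $+8 holds the two entries 31C0 and 31EE, both of type 3 with page offsets 1C0 and 1EE.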

Indy
January 24th, 2011, 18:00
Code:
PUBLIC _tls_used
.data
Tls ULONG 12345H

TlsId ULONG ?

TlsList PVOID TlsCallback
PVOID NULL

_tls_used IMAGE_TLS_DIRECTORY <Tls, Tls + 4, TlsId, TlsList, 0, 0>

.code
TlsCallback proc DllHandle:HANDLE, Reason:ULONG, Reserved:PVOID
nop
ret
TlsCallback endp

Ep proc
lea eax,MessageBoxA
ret
Ep endp


The linker has another interesting symbol: _load_config_used

BanMe
January 24th, 2011, 19:02
Glad to see you pop your head in.. funny, when I play with a little 'vx' you show up and say look here.. study this.. I thank you for that.

Indy
January 25th, 2011, 09:22
You have a very low entry level, but the desire is great. So you learn in public, but to anyone other than you it is not interesting. Make yourself a blog and write that stuff there.

BanMe
January 25th, 2011, 09:49
Tired I have gotten of doing things in private, with no response. I had to get out, be that to the wilderness around me or to the wildness within me.

I am glad that I 'give off my desire to learn' enough to be taken note of..
I don't know everything, but I will hunt what I want down even the darkest of corridors.