Hi,

I have allocated 4 bytes of uninitialized memory in the .bss section and assigned data that is more than 4 bytes. My question is: what is the maximum number of bytes that can be assigned? I looked at the ELF dump, and from what I see, the first 52 bytes are the header, then some bytes are occupied by the .text section, then the .data section, and finally the .bss section with some null bytes. Does that mean the last address shown in the dump marks the end (i.e. the size) of the .bss section?

.intel_syntax noprefix
.section .bss
    .lcomm  buffer, 4

.section .data
    output:
        .ascii "This is the output"

.section .text
    .globl _start
    _start:
        mov    edi, offset buffer    # destination: the 4-byte buffer in .bss
        mov    esi, offset output    # source: the 18-byte string in .data
        mov    ecx, 18               # copy 18 bytes (deliberately overflows the buffer)
        rep    movsb

        print:
            mov    eax, 4            # sys_write
            mov    ebx, 1            # fd 1 = stdout
            mov    ecx, offset buffer
            mov    edx, 18           # write 18 bytes
            int    0x80

        exit:
            mov    eax, 1            # sys_exit
            mov    ebx, 0            # exit status 0
            int    0x80
Posted on 2009-11-22 22:14:42 by uglyhunK
You're probably good to the end of the page. Very Bad Practice to count on it, though (IMO). What do you lose by allocating the memory you need?

Use sys_brk (with ebx=0) to find the top of valid memory. My experience has been that it'll return the "exact" value in some kernels, and "rounded up" to the top of the page in others, but that you can, in fact, read/write up to the top of the page in either case. I see no advantage in using "shaky" memory, though.
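
For reference, a minimal sketch of that query (NASM syntax, 32-bit int 0x80 interface) - call number 45 is sys_brk, and passing zero in ebx just reports the current break without moving it:

    mov     eax, 45     ; sys_brk
    xor     ebx, ebx    ; ebx = 0 -> don't move the break, just report it
    int     0x80        ; eax = current break (end of .bss, modulo the
                        ;   rounding mentioned above)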

Best,
Frank

Posted on 2009-11-23 12:11:28 by fbkotler
Thanks for the info. I did some reading about dynamic memory allocation using system calls. One of the articles says that using malloc() (the C function) is faster than using brk for memory allocation. I find it a little surprising. What would you suggest?
Posted on 2009-11-23 23:19:09 by uglyhunK

Thanks for the info. I did some reading about dynamic memory allocation using system calls. One of the articles says that using malloc() (the C function) is faster than using brk for memory allocation. I find it a little surprising. What would you suggest?
Why is that surprising? brk() involves a system call, and thus a ring3->ring0->ring3 transition - and likely dealing with page tables and other "costly things". malloc(), on the other hand, allocates chunks of memory (via sbrk or mmap, depending on implementation) and can then stay in ring3 code for several allocations.
Posted on 2009-11-24 04:17:04 by f0dder
This is what the man page says about "brk" system call

"Avoid  using  brk() and sbrk(): the malloc(3) memory allocation package is the portable and comfortable way of allocating memory.
...........
...........
On Linux, sbrk() is implemented as a library  function  that  uses  the brk()  system  call,  and does some internal bookkeeping so that it can
return the old break value. "


- Other than saying malloc is portable and comfortable, there is nothing else.
- Secondly, even sbrk uses the brk system call, and mmap is a system call too. So there is a ring3->ring0->ring3 transition in both instances.

Now this confuses me even more. It would be helpful if you could throw some more light on this.

Thanks


Posted on 2009-11-24 09:21:33 by uglyhunK
In the case of (s)brk and mmap you'll have a ring transition for each call, possibly page table manipulations, and probably a very coarse allocation granularity (dunno about brk, but mmap will give you at least 4kb granularity).

malloc, on the other hand, is implemented as ring3 code in libc, and does its own bookkeeping to avoid all those ring transitions and coarse allocation granularities... along with supporting discarding memory blocks with free() :)

Unless you have very specific speed/size requirements and are going to write your own heap allocator on top of mmap, I'd suggest just going for libc's malloc... can't imagine seeing a *u*x system that doesn't have libc installed.
Posted on 2009-11-24 11:51:40 by f0dder
Hmmm... well... if you want "portable and comfortable", libc is the way to go, no doubt. If you do it "the way the book says", you'll use libc exclusively - no int 80h. At this point, might as well call it from C, too, and forget about assembly language. If the goal is "to write a program for Linux", there isn't much point in asm. Unless, of course, you "like" assembly language and "wanna" do it that way - which is where I stand. :)

Seems to me that an initial call to malloc must do a ring transition, and fiddle with the page tables, too - that's "why we're here". Subsequent calls can just manipulate a "big block", and should be faster... until malloc needs another "big block", at which point it needs to do the ring transitions and meddle with page tables again. So I would guess the speed depends on both "how many" and "how big".

The granularity would be 4k for sys_brk as well as sys_mmap2. Or possibly worse? X86 allows 1M pages, as I recall - dunno if it's ever used... Malloc will let us ask for just one byte, but returns memory aligned to 16 bytes, so the "practical" granularity would be 16 bytes, I guess. It "uses" a lot more memory than that, of course.

A "trick" I use to see "how libc does it" is to assemble a minimal program using the library, and run it with "strace myprog". A single call to malloc results in 3 "sys_brk"s (we could have done it in two!). I wondered how much more I could malloc without incurring another system call...


; nasm -f elf myprog.asm -Ox
; ld -o myprog myprog.o -I/lib/ld-linux.so.2 -lc

global _start
extern malloc

section .text
_start:
    nop

;    push  21D34h ; 3 sys_brk
;    push 21D35h ; 4 sys_brk
;    push 21D44h ; "
    push 21D45h ; 3 sys_brk + sys_mmap2 (!!!)
    call malloc
    add esp, 4
bp1:
    push 1
    call malloc
    add esp, 4
bp2:   
    mov eax, 1      ; sys_exit
    xor ebx, ebx    ; exit status 0 (ebx otherwise holds whatever malloc left there)
    int 80h


The numbers were determined by experimentation (and will probably differ on another system). I have no idea what it means!

In the particular situation uglyhunK mentions - a buffer at the end of .bss that we wish to enlarge - there might be an advantage to doing sys_brk ourselves, in that we can get contiguous memory following our buffer. Malloc will return memory "wherever it feels like"; sys_mmap2 is more flexible - it might be able to get contiguous memory if we ask for it (haven't tried it) - but it defaults to giving us memory from 0x40000000 and up.
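
Something along these lines (just a sketch, NASM syntax, untested) would grow the break so that the bytes immediately following the end of .bss become usable:

    mov     eax, 45            ; sys_brk - where is the break now?
    xor     ebx, ebx
    int     0x80
    mov     esi, eax           ; esi = old break (start of the new space)
    lea     ebx, [eax + 1000h] ; ask for 1000h more bytes
    mov     eax, 45            ; sys_brk - move the break up
    int     0x80
    cmp     eax, ebx           ; on failure the kernel hands back the old break
    jb      no_mem             ; "no_mem" = whatever error handling you like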

My "man 3 malloc" includes the text "This is a really bad bug." I don't think you're going to evade the bug in question by doing sys_brk or sys_mmap2. So you've got a number of imperfect options. Which way would you "like" to do it? :)

Best,
Frank

Posted on 2009-11-25 05:03:59 by fbkotler

In the case of (s)brk and mmap you'll have a ring transition for each call, possibly page table manipulations, and probably a very coarse allocation granularity (dunno about brk, but mmap will give you at least 4kb granularity).


sys_brk extends the .bss section with 1-byte granularity; however, the system itself is going to extend in page-sized chunks and set your space within this allocated area, whereas sys_mmap performs a full page allocation. With sys_brk, if you haven't reached the end of a page then the system doesn't allocate anything for you; it just gives you r/w permission to the memory. Also, 4kb is just the default - the page size can be set by the user and will vary from system to system.


malloc, on the other hand, is implemented as ring3 code in libc, and does its own bookkeeping to avoid all those ring transitions and coarse allocation granularities... along with supporting discarding memory blocks with free() :)


That's not really the biggest reason to use libc's malloc. Allocations with sys_brk occur within the current process's allocated space. Using sys_brk to grab 256 bytes is effectively the same as using:

SECTION .bss
myMem: RESB 256


The difference is that it happens dynamically. sys_mmap, however, actually extends the process memory; unfortunately, on some Linux systems (not sure if it's all of them, as I only use the SELinux kernel) sys_mmap requires root privileges to perform this extended allocation. So, like you said, unless you are planning on writing your own allocator you probably won't have any use for sys_mmap in user-mode.


Unless you have very specific speed/size requirements and are going to write your own heap allocator on top of mmap, I'd suggest just going for libc's malloc... can't imagine seeing a *u*x system that doesn't have libc installed.


Very true.

With sys_brk, you are either going to simulate what sys_mmap does by allocating against page boundaries, or you are going to waste ring-transition calls by repeatedly requesting memory in smaller chunks only to have the system move a pointer and possibly change some attributes. Another downside to sys_brk is that, since it "allocates" within page blocks, there is no guarantee that the memory will appear on a page boundary, meaning that if you later pass that memory to a routine like sys_mprotect, all hell will break loose.
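
If you really do need to hand such memory to sys_mprotect, the usual dodge (sketch only, NASM syntax, assuming 4kb pages and a hypothetical "my_ptr" variable holding the brk-obtained address) is to round the address down to a page boundary first - bearing in mind that you then change the protection of everything else sharing that page:

    mov     ebx, [my_ptr]       ; my_ptr = address we got back from sys_brk
    and     ebx, 0FFFFF000h     ; round down to a 4kb page boundary
    mov     eax, 125            ; sys_mprotect
    mov     ecx, 1000h          ; length (rounded up to whole pages by the kernel)
    mov     edx, 7              ; PROT_READ | PROT_WRITE | PROT_EXEC
    int     0x80                ; 0 on success, negative errno in eax otherwise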

One of my recent projects was a port of ATC to JWASM/Linux (it ended poorly, as JWASM doesn't like ATC's symbol scoping - it works with simple classes, but loses the symbols needed for inheritance) in which I wrote a sys_brk-based memory allocator. In the end I realized the best approach would be simply to use malloc(), as it avoids the problems I was running into dealing with modifying page protections.
Posted on 2009-11-25 10:55:23 by Synfire
Frank: I'm not saying you should always stick to malloc(), just that sbrk()/mmap() isn't a good choice for arbitrary-sized dynamic allocations, and should only be used for big chunks of memory. Hence
Unless you have very specific speed/size requirements and are going to write your own heap allocator on top of mmap, I'd suggest just going for libc's malloc...
:)

If you have a "normal" application with a whole bunch of alloc/free calls, chances are your libc routines will have better code than what you'd write yourself, unless you spend a fair amount of time on a heap system. Of course there's also a fair chance that memory allocation isn't going to be a problem for you at all, neither in CPU cycles spent nor in bytes wasted because of granularity :)

As for granularity, x86 pages can be 4kb or 4mb, or 2mb in PAE mode, and iirc AMD also introduced 1GB pages? But granularity in mmap() could be coarser than that - not sure what Linux does, but Windows VirtualAlloc (and mmap) allocates at 64kb-granular addresses (well, not guaranteed to be 64kb; you should call GetSystemInfo() and check dwAllocationGranularity, but it's been 64kb everywhere I've looked).
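
For completeness, an anonymous allocation straight from sys_mmap2 looks roughly like this (sketch only, NASM syntax, 32-bit int 0x80 interface, flag values as in the usual Linux headers):

    mov     eax, 192        ; sys_mmap2
    xor     ebx, ebx        ; addr = NULL, let the kernel pick
    mov     ecx, 10000h     ; length - rounded up to whole pages anyway
    mov     edx, 3          ; PROT_READ | PROT_WRITE
    mov     esi, 22h        ; MAP_PRIVATE | MAP_ANONYMOUS
    mov     edi, -1         ; no backing file
    xor     ebp, ebp        ; offset, counted in pages for sys_mmap2
    int     0x80            ; eax = address on success, small negative errno on failure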
Posted on 2009-11-25 11:04:20 by f0dder
The difference is that it happens dynamically. sys_mmap, however, actually extends the process memory; unfortunately, on some Linux systems (not sure if it's all of them, as I only use the SELinux kernel) sys_mmap requires root privileges to perform this extended allocation. So, like you said, unless you are planning on writing your own allocator you probably won't have any use for sys_mmap in user-mode.
Wtf? mmap() requiring root privileges to allocate memory? O_o - that can't be true, unless it's some specific form that requires root (like, requesting specific addresses).
Posted on 2009-11-25 11:07:06 by f0dder
f0dder,
That's SELinux for ya. From what I was told, when getting aggravated about it during the ATC port, it's because sys_mmap requests to extend the memory for a process which is "un-safe" and should only be done through super-user. The response I got was to write my allocator as a shared object (kinda like a DLL) which can be loaded from non super-user apps to handle their allocations. I just reverted to malloc().
Posted on 2009-11-25 12:24:44 by Synfire
This sounds really silly - mmap() requests to extend the memory for a process, but sbrk() doesn't? O_o - sounds like a wanky decision. Expanding program memory being a dangerous operation? Bullshit!

Trying to allocate a really huge chunk of memory when process quotas are in effect is another thing entirely, though, and would be a valid reason...
Posted on 2009-11-26 09:59:42 by f0dder