How the Linux kernel maps memory for ELF programs

tcll · 2026-01-26 11:53:40

So to keep this simple, I figured I'd ask someone who knows a few things about the Linux kernel :P

This isn't exactly about SliTaz, but I feel the info would still be important to share publicly :)

I'm trying to map an ELF to a buffer for an IDE, kinda similar-ish to IDA Pro

The problem I'm having is that what's documented hardly covers this aspect, and what IS covered isn't consistent with a bunch of ELF files I've tried. Never mind that the Linux source isn't readable in this area (too much abstraction). Heck, ImHex's template doesn't even get this far, while my WIP template for HexEdit5 makes it to the programs, but the entry point may or may not map properly.

Many places will tell you to use the sections, but Linux only cares about programs (the program headers), AND I have a few ELFs that don't even have sections. Plus I'd prefer to do things correctly and just use programs, as sections are just tacked-on optional information used mainly by the toolchain (linker, debugger).

Though what's funny is, the ELFs giving me problems are the ones WITH sections, even though they run just fine in Linux XD

So yeah, the real issue is, well 3 parts actually:

1: allocating the proper size of the virtual buffer

2: copying the program to the proper offset in the virtual buffer

3: mapping the entry point appropriately (some can just be used as is, others need to be subtracted, and others don't map properly at all (more info needed))
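
For reference, the three parts above can be sketched roughly like this. It's a minimal Python sketch under stated assumptions, not the kernel's actual code: the program headers are assumed to be already parsed, only PT_LOAD entries contribute to the image, and the page size is assumed to be 4 KiB. `Phdr` and `build_image` are made-up names for illustration. The idea is that the buffer spans from the lowest PT_LOAD `p_vaddr` (rounded down to a page) to the highest `p_vaddr + p_memsz` (rounded up), each segment's `p_filesz` bytes land at `p_vaddr - base`, and the entry point maps to `e_entry - base`. That would be consistent with some entry points working as is (lowest address 0, as is typical for PIE) and others needing a subtraction (fixed link address such as 0x400000).

```python
from typing import NamedTuple, Sequence, Tuple

PT_LOAD = 1
PAGE = 0x1000  # assumption: 4 KiB pages


class Phdr(NamedTuple):
    """Hypothetical container for the program header fields used here."""
    p_type: int
    p_offset: int
    p_vaddr: int
    p_filesz: int
    p_memsz: int


def build_image(file_bytes: bytes, phdrs: Sequence[Phdr],
                e_entry: int) -> Tuple[int, bytearray, int]:
    """Return (base address, zero-filled image, entry offset into image)."""
    loads = [p for p in phdrs if p.p_type == PT_LOAD and p.p_memsz > 0]
    if not loads:
        raise ValueError("no PT_LOAD segments")

    # 1: size of the virtual buffer = page-aligned span of all PT_LOADs
    base = min(p.p_vaddr for p in loads) & ~(PAGE - 1)
    end = max(p.p_vaddr + p.p_memsz for p in loads)
    end = (end + PAGE - 1) & ~(PAGE - 1)
    image = bytearray(end - base)

    # 2: copy p_filesz bytes of each segment to (p_vaddr - base);
    #    bytes between p_filesz and p_memsz stay zero (e.g. .bss)
    for p in loads:
        chunk = file_bytes[p.p_offset:p.p_offset + p.p_filesz]
        if len(chunk) != p.p_filesz:
            raise ValueError("segment extends past end of file")
        start = p.p_vaddr - base
        image[start:start + len(chunk)] = chunk

    # 3: the entry point is a virtual address, so rebase it the same way
    return base, image, e_entry - base
```

With a non-PIE executable whose first PT_LOAD sits at 0x400000 and whose `e_entry` is 0x401004, the entry offset into the buffer would come out as 0x1004.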

This is just focusing on applications as libraries are a future interest and afaik don't have an entry point.

A lot of ELF info programs (objdump, readelf, and the like) will just print the entry point as is without telling you that number may not map as you expect.

Could I get some info to make everything work consistently? This is one of the many areas nobody really wants to tell you anything about (much like ALSA, Xorg, KVM, namespaces, iptables, and many more).

tcll · 2026-05-29 23:49:07

Disclaimer: So some might frown upon me for this, but I promise I'm not using it in any distasteful way...

I've gotten better help from a local AI (ran with llama-server loading a gguf) than I have from the internet.

(Don't discredit my efforts just because I'm using AI please, I'm more than aware AI is 99% incorrect, and haven't trusted it since the birth of ML)

The solution to my problem, which search engines refused to show results for until I already knew what to look for, was the PT_DYNAMIC segment. With the AI's help (and online validation, because AI is NOT to be taken at face value), I was able to get to the symbol table via DT_SYMTAB.
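
To illustrate the PT_DYNAMIC to DT_SYMTAB step, here's a rough Python sketch. It assumes a little-endian ELF64 file, where each dynamic entry is a 16-byte pair (`d_tag`, `d_val`) and the array ends at DT_NULL. The tag numbers are the standard ones from elf.h; `parse_dynamic` and `vaddr_to_offset` are made-up helper names. One thing to watch: DT_SYMTAB holds a virtual address, not a file offset, so when reading from the file it has to be translated through the PT_LOAD entries first.

```python
import struct

DT_NULL, DT_HASH, DT_STRTAB, DT_SYMTAB, DT_SYMENT = 0, 4, 5, 6, 11
DT_GNU_HASH = 0x6FFFFEF5


def parse_dynamic(data: bytes, offset: int, size: int) -> dict:
    """Read Elf64_Dyn entries from the PT_DYNAMIC segment until DT_NULL.

    offset/size are the segment's p_offset and p_filesz.
    Returns {d_tag: d_val}, keeping the first value seen for each tag.
    """
    tags = {}
    for pos in range(offset, offset + size - 15, 16):
        d_tag, d_val = struct.unpack_from("<qQ", data, pos)
        if d_tag == DT_NULL:
            break
        tags.setdefault(d_tag, d_val)
    return tags


def vaddr_to_offset(vaddr, loads):
    """Translate a virtual address to a file offset.

    loads: iterable of (p_offset, p_vaddr, p_filesz), one per PT_LOAD.
    Returns None if the address is not backed by file data.
    """
    for p_offset, p_vaddr, p_filesz in loads:
        if p_vaddr <= vaddr < p_vaddr + p_filesz:
            return p_offset + (vaddr - p_vaddr)
    return None
```

If the file has already been copied into a virtual buffer, the translation step reduces to subtracting the buffer's base address from the DT_SYMTAB value.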

Allegedly I'm supposed to use DT_GNU_HASH to actually parse the symbols, but this is yet another area the internet has no (or inaccessible) information about. Instead there are thousands of pages talking about the .gnu.hash section as if it's actually used, and a bunch of sources that are either too abstract to fully understand or don't give enough detail.

How am I supposed to validate an AI isn't hallucinating if the internet isn't even willing to provide information about a particular topic?

(Never mind the fact that the information that IS provided isn't about how ELF files actually work. Not even Oracle understands how ELF files actually work, and they designed their own architecture.)

Would anyone actually know anything about the data at DT_GNU_HASH?

(How the bloom filter, buckets, and chain are actually used (in code), rather than just calculating how many symbols there are)
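
For what it's worth, here is how the lookup is commonly described, as a hedged Python sketch rather than glibc's actual code. It assumes a little-endian ELF64 table: four 32-bit header words (`nbuckets`, `symoffset`, `bloom_size`, `bloom_shift`), then `bloom_size` 64-bit bloom words, then `nbuckets` 32-bit buckets, then the 32-bit chain (on ELF32 the bloom words are 32-bit). As far as I know this is the same layout the section-based write-ups describe for .gnu.hash, only reached through DT_GNU_HASH. `gnu_lookup` and the `symbol_name` callback (which would read the name of symbol `i` through DT_SYMTAB and DT_STRTAB) are made-up names.

```python
import struct


def gnu_hash(name: str) -> int:
    """The GNU hash function: h = h * 33 + c, starting at 5381, mod 2**32."""
    h = 5381
    for byte in name.encode():
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def gnu_lookup(data: bytes, off: int, name: str, symbol_name):
    """Return the symbol table index for name, or None if not present.

    off is the file (or buffer) offset of the table DT_GNU_HASH points to.
    symbol_name(i) is a hypothetical callback returning symbol i's name.
    """
    nbuckets, symoffset, bloom_size, bloom_shift = struct.unpack_from("<4I", data, off)
    bloom_off = off + 16
    bucket_off = bloom_off + 8 * bloom_size
    chain_off = bucket_off + 4 * nbuckets
    h1 = gnu_hash(name)

    # Bloom filter: two bits of one 64-bit word must both be set,
    # otherwise the symbol is definitely not in the table.
    (word,) = struct.unpack_from("<Q", data, bloom_off + 8 * ((h1 // 64) % bloom_size))
    mask = (1 << (h1 % 64)) | (1 << ((h1 >> bloom_shift) % 64))
    if word & mask != mask:
        return None

    # Bucket: index of the first symbol in this hash chain (0 = empty).
    (idx,) = struct.unpack_from("<I", data, bucket_off + 4 * (h1 % nbuckets))
    if idx < symoffset:
        return None

    # Chain: entry for symbol idx lives at chain[idx - symoffset]; it holds
    # the symbol's hash with bit 0 replaced by an "end of chain" flag.
    while True:
        (h2,) = struct.unpack_from("<I", data, chain_off + 4 * (idx - symoffset))
        if (h1 | 1) == (h2 | 1) and symbol_name(idx) == name:
            return idx
        if h2 & 1:
            return None
        idx += 1
```

The bloom filter only ever answers "definitely absent" or "maybe present", so a tool that just wants to list symbols can skip it; the buckets and chain do the actual work.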

tcll · 2026-06-01 16:44:42

It's pretty sad when AI is better help for actually solving things than the internet where people can't explain things for whatever reason >:(

(People are supposed to be better than AI, not the other way around)

For actual proof of the previous post, AI helped me get this working (I wrote the code, AI only validated the idea):

[c]flag = chainvalue & 1[/c]

[c]symbolcount = maxbucketvalue(16) + flaggedchainiterations(2)[/c]

(I'm the first to put this in a hex editor template apparently thanks to the help of F-ing AI)

[attachment=54004,3853]

tcll · 2026-06-03 02:25:19

Looks like that's actually incorrect, which isn't the AI's fault. It's GNU's fault for not documenting GNU_HASH, and the fault of the shoddy sources I got the related info from for counting the number of symbols, which is also incorrect...

https://sourceware.org/legacy-ml/binutils/2006-10/msg00377.html

This is yet another source by someone who doesn't know how ELF files work, as they tell you about sections, which, again, the loader doesn't use.

(I'm getting really sick of finding nothing but misinformation) >:(

The reason you're only given SYMTAB and SYMENT is because you're only supposed to resolve symbol names to an index via HASH or GNU_HASH (thank you AI for telling me what everyone else refused to)

I hate AI, but FFS people, learn how ELF files work dangit! >.<

But yeah, the above image is incorrect because it finds more symbols than actually exist, shown here:

[attachment=54020,3858]
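
For comparison, this is the counting approach usually described for GNU_HASH, as a hedged Python sketch (little-endian ELF64 layout assumed, `gnu_symbol_count` is a made-up name). It takes the largest bucket value, walks the chain from there until an entry has bit 0 set, and returns that last symbol index plus one. A detail that's easy to trip over is that the chain is indexed by symbol index minus `symoffset`, not by the symbol index itself. Symbols below `symoffset` are not hashed at all, so a table with only empty buckets yields `symoffset`. With the older DT_HASH table none of this is needed, since its `nchain` field equals the number of symbol table entries.

```python
import struct


def gnu_symbol_count(data: bytes, off: int) -> int:
    """Estimate the dynamic symbol count from the table at DT_GNU_HASH."""
    nbuckets, symoffset, bloom_size, _shift = struct.unpack_from("<4I", data, off)
    bucket_off = off + 16 + 8 * bloom_size
    chain_off = bucket_off + 4 * nbuckets
    buckets = struct.unpack_from("<%dI" % nbuckets, data, bucket_off)

    # The highest bucket value starts the chain that ends the symbol table.
    last = max(buckets, default=0)
    if last < symoffset:
        return symoffset  # no hashed symbols at all

    # Walk that chain until the "end of chain" flag (bit 0) is set.
    while True:
        (h,) = struct.unpack_from("<I", data, chain_off + 4 * (last - symoffset))
        if h & 1:
            return last + 1  # indices run 0..last
        last += 1
```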

SliTaz Forum
