Wednesday, March 25, 2009

Linux virtuall address space.





Today I spotted a question about ability to read process data after process freeds the memory it has used. (It was on a one of the best C programming related forums I know, at CBoard )

So I started writing an answer, which accidentaly became lenghty. After I first read it I felt that there's no hope it could clarify a thing to anyone. But after a few beers, I started to feel pretty damn proud about that babbling. So I decided to copy it to here too. So enjoy.


About the reading memory after freeing it etc. I'll try to shed some light on this topic, and I hope if the more experienced guys spot a problem in my explanation, they will further teach both me and you

I will be talking in linux point of view, but I assume that most modern systems do it in somewhat similar manner. Of course there's some realtime systems (like OSE), in which all memory is available to all brogram blocks (Eg. processes), but I assume you're talking about some desktop system.

So when we simplify enough we can say following. . Most modern OSes do map the real physical memory into virtual memory addresses, and give to each process their own virtual address space. Actually, in most modern computer systems, we have a hardware memory management unit which handles conversions from virtual addresses to physical memory and vice versa. So basically, at user space (which is basically all applications that aren't either part of the kernel, or a kernel module) it is not possible to access straight to hardware (memory or other hardware). Hardware is invisible for userspace processes, it can only be accessed via drivers written in kernel space. (drivers offer interface for user space applications). [[There is a workaround, but let's not mess with it now]].

So process cannot directly access to RAM.

When we launch a process, certain virtual address space is given to it (and certain physical ran corresponding to it, but the process has no means to get direct HW addresses). The process has no access outside of this sandbox. When we launch another process, it will have it's own virtual address space, and it's own corresponding physical ram. If process is running out of ram, kernel has means to increase it.

If another process tries to access outside of it's sandbox, Eg. requests reading from / writing to location which is not in it's address space, MMU will wake up the kernel, which will check out what is happening. If request was illegal, kernel will generate signal SIGSEGV, and send it to your program. => segmentation fault.

However physical ram is not cleaned, and the amount assigned to each process may change. There's also possibility of swapping, where some memory pages are written temporarily to hard disk, to free up some ram for other purposes.

So basically, it indeed is possible that there is leftowers of data written by your program in the memory. But there should be no traces where in the physical ram your data was, so knowing which bytes belonged to what program - not to mention what were these bytes used for - is quite close to impossible. I say quite close because I've never considered that, and I do not have any accurate information how possible it could be.

If I go a little further off topic to see if I have understood it correctly (I believe there is far more experienced fellows amongst us - who will correct me if I go wrong), there is at least following benefits of having physical addresses hidden:

1. Security. No user space process has access to go and mess the HW as they want - they need to use drivers, which should be written in such a manner that hugest hazards are not possible...

2. Per process memory mapping. If memory would be accessible between processes... Oh joy. If you have ever attempted to build huge and complicated multithreaded software you know what I mean. It really is a mess. Use of one uninitialized pointer may crash anything anywhere, or just make generally bizarre things to happen.

I guess steps 1 and 2 could be implemented even if the applications could access ram directly - but I am not sure how easy / difficult it would be.

3. Shared libraries. Shared libraries are loaded to memory during runtime. When each process has their own virtual address space, shared libraries can be downloaded to physical memory only once. The same physical memory block containing the library can then be mapped for each process' address space.

Finally, the inter process communication often uses "creature" called shared memory. processes can request a shared memory block from kernel, which will then map some physical memory chunk to more than one process' virtual address space. Note however that virtual addresses in different processes need not to be the same, even though they point at same physical ram. Hence shared memory is usually accessed according to the offsets from the start of the memory pool.

However, note that when you quit your program, the shared memory is not automatically freed. (thanks to brewbuck for confirmation ). It will stay accessible (if it was created to be accessible).

And finally the [[]] which I wrote at the beginning. It is possible to write a driver, which can map some real hardware addresses to shared memory, which can then furthermore be mapped by a user space process. That way the HW can be accessed directly by a user space application.

No comments:

Post a Comment