It also explains why, when I increased the array size to 100k elements, only the virtual memory consumption increased, since I did not access or initialize the array elements. When I initialize the array in a for-loop, the memory consumption also shows up in RES. |
Yeah, "virtual" memory usually is only actually allocated in the "physical" RAM when it is accessed/initialized for the first time. But even if you access/initialize the whole array in a long loop, then it is possible that "RES" will
not grow linearly, because the parts of the array that you accessed
least recently may get swapped out from "physical" RAM; they'd be swapped in again, if you access them again at a later time.
Suppose that processes A and B need access to the same library. I assume the library is then stored in the SHR of both processes. When A starts to "actively use" the library, it is loaded into A's RES. When B then starts actively using the library, is it loaded into B's RES? Or can B use the RES of A? |
Shared library code only needs to be held in the "physical" RAM once, even if it is used by multiple processes. The exact same "physical" memory pages will simply be mapped into the "virtual" address space of any process that needs them. This does not violate process isolation, because those "shared" pages will be either "read only" or at least "copy on write" (COW). Also, the "shared" pages do not have to be loaded into the "physical" RAM separately for each process; if one process already caused them to be loaded into the "physical" RAM, then they'll be readily available there for all other processes too! However, I'm not exactly sure whether those "shared" pages are counted separately in the "RES" shown for each individual process, but I think that in the global "physical" memory usage they are counted only once for the whole system.
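A small fork()-based sketch can illustrate the same COW mechanism (it uses an anonymous buffer rather than actual library pages, and the 256 MiB size is arbitrary): after the fork, top typically shows the buffer in the RES of both PIDs, yet physically it exists only once until the child writes to it.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t size = (size_t)256 << 20;  /* 256 MiB, arbitrary demo size */
        char *buf = malloc(size);
        if (!buf) { perror("malloc"); return 1; }
        memset(buf, 1, size);           /* fault the pages in: parent RES ~256 MiB */

        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return 1; }
        if (pid == 0) {
            /* Child: the pages are shared copy-on-write with the parent.
               Both PIDs now show them as resident, but RAM holds them only once. */
            sleep(15);
            memset(buf, 2, size);       /* writing triggers COW: now two physical copies */
            sleep(15);
            _exit(0);
        }
        waitpid(pid, NULL, 0);          /* parent just waits for the child */
        free(buf);
        return 0;
    }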
Maybe we can say: The "RES" shown for a specific process is the fraction of that process' "VIRT" which - at this very moment - is held in the physical RAM; but it is possible for the "VIRT" of multiple processes to
partly overlap (because of shared memory pages), and therefore their "RES" may in fact
partly overlap too.
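On Linux, those two numbers can also be read directly by a process itself from /proc/self/status (VmSize corresponds to VIRT, VmRSS to RES); a minimal sketch:

    #include <stdio.h>
    #include <string.h>

    /* Print the current process's VmSize (~ VIRT) and VmRSS (~ RES)
       as reported by the Linux kernel in /proc/self/status. */
    static void print_vm_usage(void)
    {
        FILE *f = fopen("/proc/self/status", "r");
        if (!f) { perror("fopen"); return; }

        char line[256];
        while (fgets(line, sizeof line, f)) {
            if (strncmp(line, "VmSize:", 7) == 0 || strncmp(line, "VmRSS:", 6) == 0)
                fputs(line, stdout);
        }
        fclose(f);
    }

    int main(void)
    {
        print_vm_usage();
        return 0;
    }

Keep the caveat above in mind, though: because of shared pages, summing VmRSS over several processes may overcount the physical memory that is actually in use.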
If we are talking about N MPI processes that follow the same algorithm, use the same libraries and so on, I would assume that this metric would be the best for my aim, no? |
I still think that, for a program that is not just a "minimal" demo program but actually processes large amounts of data, the memory occupied by the program code (and required libraries) quickly becomes negligible compared to the memory occupied by the data. Even more so when you start N instances of the same program, because, for the reasons discussed before, the program code and the shared libraries need to be held in "physical" RAM only once, regardless of how many instances of the program you start. But for the data, each instance of the program will obviously allocate its own separate memory buffers (those are independent for each process!), which means that the "physical" memory occupied by data will increase proportionally with the number of instances – up to the point where it is no longer possible to hold all data of all processes in the "physical" RAM at the same time, and therefore some of the data needs to be swapped out to the disk...
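If you want to verify this for your MPI case, a sketch along these lines (the 512 MiB per-rank buffer is an arbitrary placeholder for your real data) makes the scaling visible when you run it with different numbers of ranks and watch the processes in top: the code and library pages are shared, but the touched data buffers add up rank by rank.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Each rank allocates and initializes its own, independent data buffer:
           this part of the memory consumption scales with the number of ranks. */
        const size_t size = (size_t)512 << 20;   /* 512 MiB per rank, arbitrary */
        char *data = malloc(size);
        if (!data) MPI_Abort(MPI_COMM_WORLD, 1);
        memset(data, 0, size);

        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0)
            printf("%d ranks, ~%zu MiB of data each - watch RES per rank in top\n",
                   nprocs, size >> 20);
        sleep(30);                               /* time to inspect the processes */

        free(data);
        MPI_Finalize();
        return 0;
    }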