
I just came across this good question at SO: Why are global and static variables initialized to their default values? But I don't like any of the answers and I actually think it remains unanswered: why do C and C++ perform zero initialization of variables with static storage duration?

C++ surely does it because C does it. And C likely does it simply because that's how ancient 1970s Unix did it, as specified by the ABI or executable format? The Wikipedia article about .bss suggests that the origin of the name goes back even further, to pre-Unix mainframe computers, though not necessarily as a zero-initialized section until Unix, or...?

At the same time, C does not perform zero initialization of stack or heap memory (automatic or allocated storage). The reason why is supposedly performance, which is also one reason why malloc has not been declared obsolete in favor of calloc.

But can't the same be said about zero initialization of .bss? Doesn't it lead to slower program start-up, and wouldn't skipping it give us faster programs? Yet it has been there from the start.

As someone who usually works with embedded microcontroller systems, I know it is very common for the CRT to come with a "fast boot-up" or "non-standard boot-up" option that skips .bss and .data initialization, in explicit violation of the C standard, just to start up faster. Hosted/PC systems don't really have that option but always perform .bss initialization - supposedly inexpensive, but still overhead.

Does anyone know why .bss zero initialization was introduced and made mandatory when Unix and C were created?

  • 2
    C just assigned pointers. Remember this is portable assembly. Commented Mar 10 at 15:49
  • 8
    Speculation: Stack and heap can reuse allocations, bss can't. Generally memory is supplied zeroed by the os to avoid data leaks between users. So bss gets allocated from fresh zeroed memory, whilst stack and heap can either come from fresh zeroed memory or a prior allocation. Commented Mar 10 at 16:10
  • 5
    Because the .bss is memory from the operating system, it is important for security that it not contain stale information from other processes. Since the OS must clean the memory, it might as well clean it to a known state (all zeros).
    – Chris Dodd
    Commented 2 days ago
  • 1
    If .bss wasn't initialized to zero, what would you have it initialized to? (Posted simultaneously with the more seriously-worded comment from @ChrisDodd)
    – dave
    Commented 2 days ago
  • 1
    bss versus stack isn't really an apples-to-apples comparison. bss memory gets initialized just once per run of the program. Stack would have to get re-initialized on every repeated call to a given function. Likewise heap would have to get re-initialized every time memory is freed and malloc'ed again. Commented 2 days ago

4 Answers

23

The UNIX PROGRAMMER’S MANUAL (Seventh Edition, Volume 2B) says in the chapter Assembler Reference Manual on page 2 (my emphasis):

The data segment is available for placing data or instructions which will be modified during execution. Anything which may go in the text segment may be put into the data segment. In programs with write-protected, sharable text segments, data segment contains the initialized but variable parts of a program. [...]

The bss segment may not contain any explicitly initialized code or data. [...] The bss segment is actually an extension of the data segment and begins immediately after it. At the start of execution of a program, the bss segment is set to 0. [...] The advantage in using the bss segment for storage that starts off empty is that the initialization information need not be stored in the output file.

So there you have the reason: The data and bss segment are two sides of the same coin (holding initialized data in read-write storage), with the initialization values of the data segment stored in the executable file, while the initialization values of the bss segment are zero, and therefore don't need to be stored.

6
  • Yeah, and Unix was very sensitive about disk space. We also have the complexity of "sparse files", because not saving zeros was a good thing. Commented 2 days ago
  • 1
    That doesn't explain why the BSS is zeroed by the kernel. The reason that's necessary is to avoid leaking data from the kernel (e.g. pagecache) and/or other users' processes. Another reason it's useful is that most programs want some zero-init static storage, so they would have to include startup code to zero at least some of the BSS, only leaving garbage in the part of the BSS used for things that are always written before being read, like I/O buffers. Commented yesterday
  • @PeterCordes Interesting. That would imply that heap and stack are purged -- i.e., nulled or whatever -- before they are assigned to a process (lest they contain data left over from other processes). Is that so? I always thought e.g. dynamically allocated memory contains whatever it contains. Commented yesterday
  • @Peter-ReinstateMonica: Yes, fresh pages from the kernel are always zero-filled. Stale garbage in dynamically allocated memory comes from user-space free-lists (malloc/free and new/delete are library functions), not from memory returned to the kernel and re-allocated with a system call. See also Where do the values of uninitialized variables come from, in practice on real CPUs? Commented yesterday
  • 1
    I agree that ur-Unix was unlikely to care about info leaks, and sparse files didn't exist. I think it boils down to (1) no need to store uninit'd variables on a small and slow disk, so let's not store them, and (2) zero the bss memory on load because it was more useful than not zeroing the memory.
    – dave
    Commented 19 hours ago
14

Is the real question here why C doesn't have syntax for arrays in static storage that are allowed to be truly uninitialized, with random garbage from the previous use being allowed?

Probably because early Unix didn't have a way to get such memory from the kernel: BSS space was always zero-initialized (see dirkt's answer quoting an early Unix man page). This is useful, and avoids duplicating zeroing code into each executable for static storage you do want zero-initialized, e.g. to hold static int staticvar = 0; in a hypothetical C where static int staticvar; could have a non-zero value.

Zeroing fresh pages from the kernel (including the BSS) is also necessary for security on a multi-user system: if a fresh process's BSS had whatever was previously left in that RAM, you'd be leaking kernel data or potentially data from another user's process. (Or pagecache pages from files you don't have permission to read.)

The only real use-case I can see for a hypothetical language feature like char iobuffer[8192] __attribute__((uninitialized)); would be a single-user or embedded system that didn't attempt to do any security, allowing faster process startup by having two parts to the BSS: zeroed and non-zeroed. That's extra complexity beyond just a zeroed BSS for all the zero-init static storage.

Such storage would only be safely usable in programs where the first access was a write, since trap representations exist for integers. For an input buffer, you'd have to code carefully to avoid reading bytes left unwritten by a partial read. But C is happy to provide you with lots of ways to write unsafe code so that's not a clear reason.

But a feature like that would require extra syntax so that's a strong reason not to do it in a small "simple" language like early C. Unless they omitted any guarantee about the initial value of default-initialized static storage, like static int var; not being equivalent to static int var = 0;.

(John Doty commented: K&R first revision (1978) states: "In the absence of explicit initialization, external and static variables are guaranteed to be initialized to zero". So C has required zero-init of static storage since forever, so in terms of C the only question is why. In terms of implementation details, it's a question of whether the OS or user-space does the zeroing. On DOS, I think you did get old values in memory so your startup code would have to zero a BSS if you want that.)


Semi-related: Linux 2.6.33 added support for mmap(MAP_UNINITIALIZED), available only if built with CONFIG_MMAP_ALLOW_UNINITIALIZED, for dynamic allocation in that use-case.

The ELF file format implements a BSS by having the virtual size of a segment be larger than the disk/file size of the segment; the file-backed part holds the .data section and other read/write non-zero init things, the no-backing part is the BSS. AFAIK, other systems are similar, and would need a separate type of segment to request non-zeroed anonymous private pages.

10
  • In a simple embedded application, if you don't have an OS, your startup code is responsible for zeroing BSS before calling main().
    – John Doty
    Commented 2 days ago
  • @JohnDoty: Yes, given the way the actual C standard is written. Some of this answer is talking about the hypothetical alternatives that the OP is asking about, but did I accidentally imply otherwise for the actual C standards? (Genuine question; I'd like to avoid a misleading implication if there is one, as well as be technically correct.) Commented 2 days ago
  • 1
    Yeah, C always had the requirement since it was designed for Unix, which always had it while C existed. The question is why it (.bss) was there in Unix to begin with, in the early 1970s.
    – Lundin
    Commented 2 days ago
  • 1
    @Lundin: Oh, right, and Unix predates C. But it was always a multi-user OS, so I assume security was the reason they couldn't just leak pages into new processes, either as part of the BSS / brk, stack, or mmap. (err, MAP_ANONYMOUS didn't exist originally so you'd have to mmap("/dev/zero") and then of course you get zeroed pages.) I know things weren't as hardened as they are now against exploits, but that would be a gaping hole that didn't even require an exploit. e.g. want to read /etc/shadow? Just log in so its page is hot in the pagecache, then run a process that uses a lot of BSS. Commented 2 days ago
  • 1
    Second Ed. Unix zeroed bss on exec - see a.out description in manual. There was no explicit bss in the First Ed. though there was some provision for extending data beyond that which was contained in the file.
    – dave
    Commented 17 hours ago
5

The point of the bss section was to save space.

If you don't have it, then every variable has to be explicitly initialized with a zero value, the binaries take more space on disk, and a larger binary takes more time to load.

Also, to avoid having to explicitly initialize variables to zero, you can gather all uninitialized variables into a single section and have a small piece of code zero it at startup, instead of loading the zeroes from disk into memory.

And uninitialized variables are bad: either you can rely on them being zero, or, if you can't, you need to initialize them yourself to a known value before using them.

So who clears the bss section is implementation dependent. If the OS is aware of executable sections, it can do it. In some cases it is not the job of the OS to clear the bss but of the program itself, at startup before calling the main() function. The point still stands: data known to be uninitialized is not stored in the binary file; space for it is merely reserved, and the memory area of these variables is set to zero by the OS or by the program itself.

12
  • 1
    On any decent paged system, the linker can assign demand-zero pages, in which case it is the job of the OS.
    – dave
    Commented Mar 10 at 16:48
  • Additionally, early Unix C compilers treated 'int foo' at the file level like FORTRAN COMMON (before strict def/ref semantics became more widespread). Given multiple files with that declaration, it's unclear who gets to 'own' the storage to initialise it.
    – dave
    Commented Mar 10 at 16:52
  • 3
    "it only became mandatory to clear the bss section in ISO C99" is wrong. The Wikipedia article is poorly written but not exactly incorrect. It says that (in any C version) zero-initializing globals and statics isn't the same as initializing them to zero bytes (pointers are initialized to null, which may not be all zero bits), which is true, and it says that a certain section number of C99 mandates that, which is probably true (the section number may be different in other versions of the standard). No version of C says anything about a BSS section.
    – benrg
    Commented Mar 10 at 17:08
  • 6
    K&R first revision (1978) states: "In the absence of explicit initialization, external and static variables are guaranteed to be initialized to zero". So the requirement predates C99 by over 20 years.
    – John Doty
    Commented 2 days ago
  • 1
    @Joshua: On ARM systems following CMSIS conventions, both requirements can be satisfied by having a SystemInit() function which is invoked before RAM initialization; if that function returns, startup code will then initialize RAM and call main(), but if SystemInit() determines that the system is being "warm started" it can call main() without first initializing RAM; main() can then use a static object to decide how to proceed.
    – supercat
    Commented yesterday
0

Since we're talking about early Unix, the fount of minimalism in implementation, I will offer my opinion in a minimal answer:

Zeroing bss memory is low-cost and more useful than not zeroing bss memory, so the decision is easy.

Remember, we're talking about a system where a press of the return key took ~100 ms to get to the computer; another half a millisecond to zero a 256-word bss is not going to be seen as huge overhead, especially in view of what you saved by not reading it off the disk.

Think small.
