How does (struct _IO_FILE *)->_IO_read_base get set?

Apologies for the probably weird question title. I didn’t want it to look like a dupe with a title like “How does C file I/O work at the low level?”. I want it to be obvious that my question is specific.

Anyway, when a file is opened with fopen in C, the call returns a FILE *, which in glibc is a struct _IO_FILE *.

FILE *f = fopen("hello.txt", "r");
printf("Fileno: %i\n", f->_fileno); // 3

I’ve looked at libio.h and gdb’s tab-completion output, and have confirmed that the contents of a struct _IO_FILE are as follows:

struct _IO_FILE {
  int _flags;
  char* _IO_read_ptr;
  char* _IO_read_end;
  char* _IO_read_base; // <-- file contents
  char* _IO_write_base;
  char* _IO_write_ptr;
  char* _IO_write_end;
  char* _IO_buf_base;
  char* _IO_buf_end;
  char *_IO_save_base;
  char *_IO_backup_base;
  char *_IO_save_end;
  struct _IO_marker *_markers;
  struct _IO_FILE *_chain;
  int _fileno;
  int _flags2;
  __off_t _old_offset;
  unsigned short _cur_column;
  signed char _vtable_offset;
  char _shortbuf[1];
  _IO_lock_t *_lock;
  __off64_t _offset;
  void *__pad1;
  void *__pad2;
  void *__pad3;
  void *__pad4;
  size_t __pad5;
  int _mode;
  char _unused2[...];
};

I’ve prodded at every one of them in gdb and noticed that f->_IO_read_base is 0x0 at first. It becomes a pointer to a proper string, which contains the entire contents of the file, only after fgetc() (or a similar function) has been called at least once. After some gruelling and extensive searching of the glibc codebase, I seem to have tracked it down to a function called __uflow.

So my question is, how does _IO_read_base get initialized? Where does it get the contents from? How does it acquire said contents? When does _IO_read_base transform from a null pointer to a string? How would I go about doing this using only the struct itself and some system calls? I want to understand how this works at the low level.
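To make the "only the struct itself and some system calls" part concrete, here is a minimal sketch of a lazily filled read buffer built on open(2) and read(2). Everything in it is hypothetical: struct mini_file, the mini_* functions, and the 4096-byte buffer size are illustrative stand-ins, not glibc's fields or code, and glibc may well obtain and size its buffer differently.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for the read-side fields of struct _IO_FILE. */
struct mini_file {
    int fd;
    char *read_base;   /* start of the buffer; NULL until the first read */
    char *read_ptr;    /* next unread byte */
    char *read_end;    /* one past the last valid byte */
    size_t buf_size;   /* assumed size, chosen arbitrarily */
};

/* Open the file but do not allocate or fill the buffer yet. */
int mini_open(struct mini_file *mf, const char *path)
{
    assert(mf != NULL);
    mf->fd = open(path, O_RDONLY);
    mf->read_base = mf->read_ptr = mf->read_end = NULL;
    mf->buf_size = 4096;
    return mf->fd < 0 ? -1 : 0;
}

/* Refill step: allocate the buffer on first use, then read(2) into it.
   Returns the next byte, or -1 on end of file or error. */
int mini_underflow(struct mini_file *mf)
{
    if (mf->read_base == NULL) {
        mf->read_base = malloc(mf->buf_size);
        if (mf->read_base == NULL)
            return -1;
    }
    ssize_t n = read(mf->fd, mf->read_base, mf->buf_size);
    if (n <= 0)
        return -1;
    mf->read_ptr = mf->read_base;
    mf->read_end = mf->read_base + n;
    return (unsigned char)*mf->read_ptr++;
}

/* Fast path: serve from the buffer; fall back to the refill step. */
int mini_getc(struct mini_file *mf)
{
    if (mf->read_base != NULL && mf->read_ptr < mf->read_end)
        return (unsigned char)*mf->read_ptr++;
    return mini_underflow(mf);
}

/* Release the buffer and the descriptor. */
void mini_close(struct mini_file *mf)
{
    free(mf->read_base);
    mf->read_base = mf->read_ptr = mf->read_end = NULL;
    if (mf->fd >= 0)
        close(mf->fd);
    mf->fd = -1;
}
```

With this layout, read_base stays NULL after mini_open and only becomes a valid pointer inside the first mini_getc call, which mirrors the null-then-string transition seen in gdb. For a file smaller than the buffer, a single read(2) would pull in the whole file, which would explain seeing the entire contents behind the pointer.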

(gdb) print fp->_IO_read_base 
$3 = 0x0
(gdb) n
434    in genops.c
< a few more times ... >
_IO_getc (fp=0x602010) at getc.c:38
38    getc.c: No such file or directory.
(gdb) print fp->_IO_read_base 
$4 = 0x7ffff7ff4000 "#include <stdio.h> ..."
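The same observation can be reproduced without gdb. The sketch below is glibc-specific, because it peeks at the private _IO_read_base field through the FILE *. The helper names read_buffer_present and demo_lazy_fill are made up for illustration, and the file path is whatever the caller passes in.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if the stream's read buffer pointer is set, 0 if it is NULL.
   glibc-specific: _IO_read_base is a private field of struct _IO_FILE. */
int read_buffer_present(FILE *f)
{
    return f->_IO_read_base != NULL;
}

/* Hypothetical helper: writes a small file, reopens it for reading, and
   records the buffer state before and after the first fgetc().
   Returns the character read, or -1 if the file could not be opened. */
int demo_lazy_fill(const char *path, int *before, int *after)
{
    assert(before != NULL && after != NULL);

    FILE *w = fopen(path, "w");
    if (w == NULL)
        return -1;
    fputs("hello\n", w);
    fclose(w);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    *before = read_buffer_present(f);   /* state right after fopen */
    int c = fgetc(f);                   /* first read triggers the fill */
    *after = read_buffer_present(f);    /* state after the first read */
    fclose(f);
    return c;
}
```

Calling demo_lazy_fill with a writable path and printing before and after should show the pointer going from unset to set across the single fgetc() call, assuming the program is built against glibc.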

You can see where it transforms. Somewhere in genops.c. Presumably __uflow(). But its source doesn’t answer any questions:

int
__uflow (fp)
     _IO_FILE *fp;
{
#if defined _LIBC || defined _GLIBCPP_USE_WCHAR_T
  if (_IO_vtable_offset (fp) == 0 && _IO_fwide (fp, -1) != -1)
    return EOF;
#endif

  if (fp->_mode == 0)
    _IO_fwide (fp, -1);
  if (_IO_in_put_mode (fp))
    if (_IO_switch_to_get_mode (fp) == EOF)
      return EOF;
  if (fp->_IO_read_ptr < fp->_IO_read_end)
    return *(unsigned char *) fp->_IO_read_ptr++;
  if (_IO_in_backup (fp))
    {
      _IO_switch_to_main_get_area (fp);
      if (fp->_IO_read_ptr < fp->_IO_read_end)
        return *(unsigned char *) fp->_IO_read_ptr++;
    }
  if (_IO_have_markers (fp))
    {
      if (save_for_backup (fp, fp->_IO_read_end))
        return EOF;
    }
  else if (_IO_have_backup (fp))
    _IO_free_backup_area (fp);
  return _IO_UFLOW (fp);
}
libc_hidden_def (__uflow)

Stepping through each call in gdb, every single if check fails, so I can only assume that it falls through to return _IO_UFLOW (fp). The funny thing is that _IO_UFLOW appears to be a macro wrapper around __uflow, so it looks like the function is calling itself. Yet it’s not recursing infinitely. Why?
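One way a macro that seems to name the same function can avoid recursing is if it expands to an indirect call through a per-stream table of function pointers, where the table slot merely shares its name with the public function. The sketch below shows that pattern with entirely hypothetical names (struct stream, struct jump_table, STREAM_UFLOW, file_uflow). It is an assumption about the general technique, not a copy of glibc's macros.

```c
#include <assert.h>
#include <stddef.h>

struct stream;

/* Hypothetical jump table: the slot is called uflow, just like the
   public function further down. */
struct jump_table {
    int (*uflow)(struct stream *);
};

struct stream {
    const struct jump_table *vtable;
    int calls;   /* counts how often the backend ran */
};

/* Reads like "call uflow", but expands to an indirect call through
   the stream's table, not a call to the function named uflow. */
#define STREAM_UFLOW(s) ((s)->vtable->uflow(s))

/* Backend implementation that gets stored in the table slot. */
int file_uflow(struct stream *s)
{
    s->calls++;
    return 'x';
}

/* Public entry point: generic checks first, then dispatch. */
int uflow(struct stream *s)
{
    assert(s != NULL && s->vtable != NULL);
    return STREAM_UFLOW(s);   /* reaches file_uflow, so no recursion */
}

/* Table instance for the hypothetical "file" stream type. */
const struct jump_table file_jumps = { .uflow = file_uflow };
```

If _IO_UFLOW works along these lines, the real work would happen in whatever function the stream's table points to, which could live in a different source file than genops.c.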

And with that, I’ve hit a dead end, as I still can’t find any explanation of how fp->_IO_read_base gets filled in. All I know is that it happens “somewhere” in genops.c.

Source: unix
