Huy's Notes


#os #system-design #engineering

A process is the unit of work in a computer system. It is managed by the [Operating System].

Anatomy of a Process

A process is a much wider concept than a program: the program is only one part of a process. A process includes:

  • The current value of Program Counter (PC)
  • The contents of the processor's registers
  • Value of the variables
  • The process stack (tracked by the stack pointer, SP), which contains temporary data such as subroutine parameters, return addresses, and temporary variables
  • A data section that contains global variables

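A child process inherits most of the resources of its parent process (for example, open file descriptors) and receives a copy of the parent's memory. On UNIX-like systems this is what fork() does, and the copy is often made lazily (copy-on-write) rather than all at once.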
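To make the "copy of the parent's memory" part concrete, here is a minimal sketch assuming a UNIX-like system where Python's os.fork() is available. The function name fork_and_modify and the pipe used for reporting back are hypothetical choices for illustration only.

```python
import os

def fork_and_modify():
    """Fork a child that changes a variable; return (parent_value, child_value)."""
    counter = [0]                      # lives in the parent's memory
    read_fd, write_fd = os.pipe()      # channel for the child to report back

    pid = os.fork()
    if pid == 0:
        # Child: it works on its own copy of `counter`.
        try:
            os.close(read_fd)
            counter[0] += 100
            os.write(write_fd, str(counter[0]).encode())
        finally:
            os._exit(0)                # leave without running the parent's cleanup code

    # Parent: read the value the child saw, then reap the child.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return counter[0], child_value
```

The child sees the variable as it was at the time of the fork and changes only its own copy, so the parent's value stays the same.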

In a multiprogramming environment, the CPU switches back and forth between processes. On each switch it has to save the state of the outgoing process and load the state of the incoming one. This is called a context switch.
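The save-then-load idea behind a context switch can be sketched with a toy model. Nothing here is a real kernel structure: SavedContext, ToyCPU, and the two-register layout are assumptions made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class SavedContext:
    """Hypothetical per-process save area (a tiny stand-in for part of a PCB)."""
    pc: int = 0
    registers: dict = field(default_factory=lambda: {"r0": 0, "r1": 0})

class ToyCPU:
    """Hypothetical CPU with a program counter and two registers."""
    def __init__(self):
        self.pc = 0
        self.registers = {"r0": 0, "r1": 0}

def context_switch(cpu, outgoing, incoming):
    # 1. Save the state of the process that is leaving the CPU.
    outgoing.pc = cpu.pc
    outgoing.registers = dict(cpu.registers)
    # 2. Load the state of the process that is about to run.
    cpu.pc = incoming.pc
    cpu.registers = dict(incoming.registers)
```

For example, if process A is at pc 42 when it is switched out for process B, switching back from B to A restores pc 42 and A's register values, so A resumes where it left off.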

Process Control Block

A process is represented by a data structure called the Process Control Block (PCB), or process descriptor. It contains the following information:

  • The current state of the process
  • The unique ID of the process
  • The pointer to its parent process
  • Pointers to its child processes, if any exist
  • The priority of the process
  • Pointers to locate the memory of the process
  • The register save area
  • The processor it is running on

In Linux, a process is represented by a struct called task_struct, which is defined in <linux/sched.h>.
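As a rough illustration of the fields listed above, a PCB could be modeled like this. This is a sketch only: the class, its field names, and the spawn() helper are hypothetical and do not mirror the layout of Linux's task_struct.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ProcessControlBlock:
    """Hypothetical PCB with one field per item in the list above."""
    pid: int                                   # unique ID of the process
    state: str = "new"                         # current state
    # Excluded from repr/compare to avoid walking the parent/child cycle.
    parent: Optional["ProcessControlBlock"] = field(
        default=None, repr=False, compare=False)
    children: list = field(default_factory=list)
    priority: int = 0
    memory_pointers: dict = field(default_factory=dict)     # where its memory lives
    register_save_area: dict = field(default_factory=dict)  # filled on context switch
    cpu: Optional[int] = None                  # processor it is running on, if any

def spawn(parent, pid):
    """Create a child PCB and link it to its parent in both directions."""
    child = ProcessControlBlock(pid=pid, parent=parent)
    parent.children.append(child)
    return child
```

Calling spawn(ProcessControlBlock(pid=1), 2) gives a child whose parent pointer leads back to pid 1, while pid 1 lists the child among its children.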

States of a Process

There are 5 main states of a process, plus intermediate (suspended) states between them. The list below describes every state and the transitions between them.

  • New: When the OS loads the program into main memory, a new process is created.
  • Ready: The new process is then immediately put into the Ready state, where it waits to be assigned a CPU. A ready process can also be moved to secondary memory when resources are scarce (see Suspend Ready).
  • Running: Once the [process scheduler] picks a process, a CPU is assigned to it and execution begins. The number of processes that can run at the same time depends on the number of CPUs available in the system.
  • Blocked or Waiting: Depending on the scheduler or on the intrinsic behavior of the process (for example, when it needs to wait for a resource or for user input), it can be sent to the Blocked (Waiting) state. The CPU is then assigned to other processes.
  • Completion or Termination: The process has finished its execution, and all of its context (PCB) is deleted.
  • Suspend Ready: If main memory is full and a higher-priority process arrives for execution, the OS has to make room for it, so lower-priority ready processes are moved to secondary memory and stay in this state until main memory is available.
  • Suspend Wait: Blocked processes that are waiting for a resource can also be moved to secondary memory. They stay in this state until main memory becomes available and their wait is over.
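The states above can be sketched as a small state machine. The transition table is an assumption based on the descriptions in the list (it follows the common textbook seven-state model), and the names State, ALLOWED, and transition are made up for this example.

```python
from enum import Enum, auto

class State(Enum):
    NEW = auto()
    READY = auto()
    RUNNING = auto()
    BLOCKED = auto()
    TERMINATED = auto()
    SUSPEND_READY = auto()
    SUSPEND_WAIT = auto()

# Assumed transitions: which states can be reached directly from each state.
ALLOWED = {
    State.NEW: {State.READY},
    State.READY: {State.RUNNING, State.SUSPEND_READY},
    State.RUNNING: {State.READY, State.BLOCKED, State.TERMINATED},
    State.BLOCKED: {State.READY, State.SUSPEND_WAIT},
    State.SUSPEND_READY: {State.READY},
    State.SUSPEND_WAIT: {State.BLOCKED, State.SUSPEND_READY},
    State.TERMINATED: set(),
}

def transition(current, target):
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

With this table, a new process has to pass through Ready before it can run, and a terminated process cannot move anywhere.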

Process Termination

A process terminates when it finishes its execution or is forced to terminate. When this happens, its resources are returned to the system, and its PCB is erased and returned to the free memory pool.

A process can terminate for several reasons:

  • Normal Exit: The process finishes its execution and calls the exit() [system call].
  • Abnormal Termination: One of the following happens:
    • Error Exit (voluntary): An error occurs and the process handles it by exiting (like calling panic!() in Rust).
    • Fatal Error (involuntary): An error occurs that causes the program to abort.
    • Killed by signal: The process is terminated by a signal, typically sent by the parent process or by the user.

The wait() system call returns the ID of a terminated child process, so the parent can tell which child has terminated and when.
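Here is a minimal sketch of how a parent can tell these termination reasons apart, assuming a UNIX-like system and Python's os module (os.waitpid is the Python wrapper around the wait family of calls). The helper name spawn_and_wait is hypothetical.

```python
import os
import signal

def spawn_and_wait(child_body):
    """Run child_body() in a forked child; return how the child terminated."""
    pid = os.fork()
    if pid == 0:
        try:
            child_body()
            os._exit(0)            # normal exit
        except BaseException:
            os._exit(1)            # error exit

    # Parent: block until this child terminates, then decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("other", status)
```

A child that returns normally is reported as exited with status 0, a child that calls os._exit(3) as exited with status 3, and a child that sends SIGKILL to itself as terminated by that signal.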

Termination Signals

A program must be aware of these termination signals and know how to handle them, for various reasons. The following are some common signals:

  • SIGTERM: The polite way to ask a program to terminate. This signal can be blocked, handled, or ignored. The kill command sends it by default.

  • SIGINT: The "program interrupt" signal, sent when the user types the INTR character (Ctrl + C).

  • SIGQUIT: Similar to SIGINT, except that it is triggered by the QUIT character (Ctrl + \) and produces a core dump when it terminates the process, like an error signal.

    If a program creates temporary files, it should not delete them when handling this signal, so the user can examine them afterwards.

  • SIGKILL: Causes immediate termination. It cannot be blocked, handled, or ignored, so it is always fatal. This is what kill -9 sends. Do not use SIGKILL before trying other methods such as SIGTERM or Ctrl + C.

  • SIGHUP: Reports that the user's terminal has been disconnected.
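The difference between a signal that can be handled (SIGTERM) and one that cannot (SIGKILL) can be checked with a short sketch. It assumes a UNIX-like system and code running in the main thread, since Python only allows installing handlers there. The helper names demo_sigterm and can_install_handler are hypothetical.

```python
import os
import signal
import time

def demo_sigterm():
    """Install a SIGTERM handler, send SIGTERM to ourselves, return what arrived."""
    received = []

    def on_sigterm(signum, frame):
        received.append(signum)        # a real program would clean up and exit here

    previous = signal.signal(signal.SIGTERM, on_sigterm)
    try:
        os.kill(os.getpid(), signal.SIGTERM)
        deadline = time.monotonic() + 2.0
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)           # give the handler a chance to run
    finally:
        # Restore whatever was installed before (fall back to the default action).
        signal.signal(signal.SIGTERM,
                      previous if previous is not None else signal.SIG_DFL)
    return received

def can_install_handler(signum):
    """Return True if the OS lets us install a handler for signum."""
    try:
        previous = signal.signal(signum, lambda s, f: None)
    except (OSError, ValueError):
        return False                   # e.g. SIGKILL: the kernel refuses
    signal.signal(signum, previous if previous is not None else signal.SIG_DFL)
    return True
```

Because the handler replaces the default action, the process survives its own SIGTERM. The attempt to install a handler for SIGKILL is rejected by the operating system.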

Don't kill the parent process

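A child process that terminates becomes a zombie if its parent does not collect its exit status. (If the parent itself terminates first, the orphaned children are normally adopted by another process, usually init, which collects their status for them.) Why do zombies exist?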

When a process terminates, the OS releases most of its resources and information but still keeps some data, including the termination status, because the parent process might be interested in it.

When a child process terminates, the parent process receives a SIGCHLD signal, so it can collect this data (via the wait() system call). Once the data has been collected, the operating system releases those last bits of information and removes the pid from the process table. If the parent process fails to do so, the system has to keep this data in the process table indefinitely.

A terminated process whose data has not been collected is called a zombie process.

So the best practice is: a program that spawns child processes must handle their termination properly, to prevent them from becoming zombies.
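One common way to follow this practice is to reap children from a SIGCHLD handler. The sketch below assumes a UNIX-like system and code running in the main thread. The names reaped, reap_children, and spawn_short_lived_children are hypothetical, and the 2-second deadline is an arbitrary choice for the demo.

```python
import os
import signal
import time

reaped = {}   # pid -> raw wait status of every child collected so far

def reap_children(signum=None, frame=None):
    """Collect every child that has already terminated, without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return                     # no children left at all
        if pid == 0:
            return                     # children exist, but none has terminated yet
        reaped[pid] = status

def spawn_short_lived_children(count):
    """Fork `count` children that exit at once; wait until all are reaped."""
    previous = signal.signal(signal.SIGCHLD, reap_children)
    pids = []
    try:
        for code in range(count):
            pid = os.fork()
            if pid == 0:
                os._exit(code)         # child: terminate immediately
            pids.append(pid)
        deadline = time.monotonic() + 2.0
        while not all(p in reaped for p in pids) and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)
    return pids
```

The handler loops because several children can terminate before it runs, and pending SIGCHLD signals are not queued one per child. Note that waitpid(-1, ...) collects any child of the process, which may not be what a larger program wants.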

Process Creation

To create a process from a program on disk, the OS must load the program's code and static data into the memory that has been allocated to that process. Some OSes do this eagerly, while some modern OSes do it lazily.

Then the OS initializes the program's stack, allocates some memory for the program's heap, and does some I/O-related setup. For example, on UNIX systems each process starts with 3 open file descriptors, for STDIN, STDOUT, and STDERR.

The process starts to run when the OS jumps to the main() routine of the program. At this point, the OS transfers control of the CPU to that process.
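From user space on a UNIX-like system, the usual way to ask the OS for all of this is fork() followed by exec(). The sketch below shows that pattern with Python's os module. The helper name run_program and the 127 "exec failed" exit code are assumptions for this example (127 follows a common shell convention).

```python
import os
import sys

def run_program(argv):
    """Start the program argv[0] in a new process and return its exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            # Replace the child's memory image with the program found on disk.
            # On success this call never returns.
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)              # only reached if exec failed

    # Parent: wait for the new process and report how it ended.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
```

For example, run_program([sys.executable, "-c", "import sys; sys.exit(7)"]) starts a fresh Python interpreter as a separate process and returns its exit code.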

