Write a C Program to simulate Multi-Level Feedback Queue (MLFQ) CPU Scheduling
• Rule 1: If Priority(A) > Priority(B), A runs (B doesn’t).
• Rule 2: If Priority(A) = Priority(B), A & B run in a round-robin fashion using the time slice (quantum length) of the given queue.
• Rule 3: When a job enters the system, it is placed at the highest priority (the topmost queue).
• Rule 4: Once a job uses up its time allotment at a given level (regardless of how many times it has given up the CPU), its priority is reduced (i.e., it moves down one queue).
• Rule 5: After some time period S, move all the jobs in the system to the topmost queue.
Ask the user to enter the number of queues
Ask the user to enter the number of processes
Ask the user to enter the CPU time slice of each queue
Ask the user to enter the time period after which all jobs return to the topmost queue
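A run of the finished program might look like the following (the prompts match the code below; the numeric values are only an example):

**********Multi-level feedback queue scheduling algorithm**********
Please enter the number of ready queues?
3
Please enter the CPU time slice of each ready queue separately?
2 4 8
Please enter the number of processes?
2
Please enter the process name, the time required, and the type?
name    time    type
P1 10 0
P2 3 1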
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define PERIOD 500
typedef struct node /*Process node information*/
{
    char name[20];     /*The name of the process*/
    int queue;         /*The queue the process is currently in*/
    int type;          /*The type of process: 0 = CPU task, 1 = I/O task*/
    int round;         /*The CPU time slice allocated to the process*/
    int cputime;       /*CPU time consumed so far*/
    int needtime;      /*The time the process still needs to execute*/
    char state;        /*Process state: W = ready, R = running, F = finished*/
    int count;         /*Time units used toward the current time allotment*/
    struct node *next; /*Next process node*/
} PCB;
typedef struct Queue /*Multi-level ready queue node information*/
{
    PCB *LinkPCB;       /*Head of the process list in this ready queue*/
    int prio;           /*The priority of this ready queue*/
    int round;          /*The time slice allocated by this ready queue*/
    struct Queue *next; /*Pointer to the next ready queue*/
} ReadyQueue;
PCB *run = NULL, *finish = NULL; /*The running process and the completion queue*/
ReadyQueue *Head = NULL;         /*The first (highest-priority) ready queue*/
int num;                         /*Number of processes*/
int ReadyNum;                    /*Number of ready queues*/
int period_count = 0;            /*Time units elapsed in the current boost period*/
void Output();                               /*Process information output function*/
void InsertFinish(PCB *in);                  /*Insert the process at the end of the completion queue*/
void InsertPrio(ReadyQueue *in);             /*Insert a ready queue by priority; the smaller the value, the lower the priority*/
void PrioCreate();                           /*Create the ready queues from user input*/
void GetFirst(ReadyQueue *queue);            /*Take the head process of a given ready queue*/
void InsertLast(PCB *in, ReadyQueue *queue); /*Insert the process at the end of a ready queue*/
void ProcessCreate();                        /*Process creation function*/
void RoundRun(ReadyQueue *timechip);         /*Round-robin scheduling on the last queue*/
void MultiDispatch();                        /*Multi-level scheduling: execute one time unit at a time*/
void MoveToHead();                           /*Rule 5: move all jobs to the topmost queue after a period*/
int main(void)
{
    PrioCreate();    /*Create the ready queues*/
    ProcessCreate(); /*Create the processes*/
    MultiDispatch(); /*Run the scheduler*/
    Output();        /*Output the final scheduling state*/
    return 0;
}
void Output() /*Process information output function*/
{
    ReadyQueue *print = Head;
    PCB *p;
    printf("Process name\tQueue\tCPU time\tTime required\tState\n");
    while (print)
    {
        p = print->LinkPCB;
        while (p)
        {
            printf("%s\t%d\t%d\t%d\t\t%c\n", p->name, p->queue, p->cputime, p->needtime, p->state);
            p = p->next;
        }
        print = print->next;
    }
    p = finish;
    while (p != NULL)
    {
        printf("%s\t%d\t%d\t%d\t\t%c\n", p->name, p->queue, p->cputime, p->needtime, p->state);
        p = p->next;
    }
    p = run;
    while (p != NULL)
    {
        printf("%s\t%d\t%d\t%d\t\t%c\n", p->name, p->queue, p->cputime, p->needtime, p->state);
        p = p->next;
    }
}
void InsertFinish(PCB *in) /*Insert the process at the end of the completion queue*/
{
    PCB *fst;
    fst = finish;
    if (finish == NULL)
    {
        in->next = finish;
        finish = in;
    }
    else
    {
        while (fst->next != NULL)
        {
            fst = fst->next;
        }
        in->next = fst->next;
        fst->next = in;
    }
}
void InsertPrio(ReadyQueue *in) /*Insert a ready queue by priority; the smaller the value, the lower the priority*/
{
    ReadyQueue *fst, *nxt;
    fst = nxt = Head;
    if (Head == NULL) /*If there is no queue yet, this is the first element*/
    {
        in->next = Head;
        Head = in;
    }
    else /*Find the appropriate position and insert*/
    {
        if (in->prio >= fst->prio) /*If its priority is at least the head queue's, insert at the head*/
        {
            in->next = Head;
            Head = in;
        }
        else
        {
            while (fst != NULL && in->prio <= fst->prio) /*Walk until the first queue with lower priority*/
            {
                nxt = fst;
                fst = fst->next;
            }
            in->next = fst; /*Insert before fst (at the tail when fst is NULL)*/
            nxt->next = in;
        }
    }
}
void PrioCreate() /*Create ready queue input function*/
{
    ReadyQueue *tmp;
    int i;
    printf("**********Multi-level feedback queue scheduling algorithm**********\n");
    printf("Please enter the number of ready queues?\n");
    scanf("%d", &ReadyNum);
    printf("Please enter the CPU time slice of each ready queue separately?\n");
    for (i = 0; i < ReadyNum; i++)
    {
        if ((tmp = (ReadyQueue *)malloc(sizeof(ReadyQueue))) == NULL)
        {
            perror("malloc");
            exit(1);
        }
        scanf("%d", &(tmp->round)); /*The CPU time slice allocated to each process in this ready queue*/
        tmp->prio = 500 - tmp->round; /*Set its priority: the larger the time slice, the lower the priority*/
        tmp->LinkPCB = NULL; /*The process list of this queue starts empty*/
        tmp->next = NULL;
        InsertPrio(tmp); /*Link the ready queues from high to low priority*/
    }
}
void GetFirst(ReadyQueue *queue) /*Take the head process of a given ready queue*/
{
    run = queue->LinkPCB;
    if (queue->LinkPCB != NULL)
    {
        run->state = 'R';
        queue->LinkPCB = queue->LinkPCB->next;
        run->next = NULL;
    }
}
void InsertLast(PCB *in, ReadyQueue *queue) /*Insert the process at the end of the ready queue*/
{
    PCB *fst;
    fst = queue->LinkPCB;
    if (queue->LinkPCB == NULL)
    {
        in->next = queue->LinkPCB;
        queue->LinkPCB = in;
    }
    else
    {
        while (fst->next != NULL)
        {
            fst = fst->next;
        }
        in->next = fst->next;
        fst->next = in;
    }
}
void ProcessCreate() /*Process creation function*/
{
    PCB *tmp;
    int i;
    printf("Please enter the number of processes?\n");
    scanf("%d", &num);
    printf("Please enter the process name, the time required, and the type?\nname\ttime\ttype\n");
    for (i = 0; i < num; i++)
    {
        if ((tmp = (PCB *)malloc(sizeof(PCB))) == NULL)
        {
            perror("malloc");
            exit(1);
        }
        scanf("%s", tmp->name);
        getchar(); /*Absorb the whitespace after the name*/
        scanf("%d", &(tmp->needtime));
        getchar(); /*Absorb the whitespace after the time*/
        scanf("%d", &(tmp->type));
        tmp->cputime = 0;
        tmp->state = 'W';
        /* tmp->prio = 50 - tmp->needtime; unused: the more time needed, the lower the priority */
        tmp->queue = 0;
        tmp->round = Head->round; /*Start with the topmost queue's time slice*/
        tmp->count = 0;
        InsertLast(tmp, Head); /*Rule 3: every new process enters the topmost queue*/
    }
}
void RoundRun(ReadyQueue *timechip) /*Round-robin scheduling on the given (last) queue*/
{
    int flag = 1;
    GetFirst(timechip);
    while (run != NULL)
    {
        while (flag && period_count < PERIOD)
        {
            period_count++;
            run->count++;
            run->cputime++;
            run->needtime--;
            if (run->needtime == 0) /*The process is completed*/
            {
                run->state = 'F';
                InsertFinish(run);
                flag = 0;
            }
            else if (run->count == timechip->round) /*Time slice runs out*/
            {
                run->state = 'W';
                run->count = 0; /*Clear the counter for the next slice*/
                InsertLast(run, timechip);
                flag = 0;
            }
        }
        flag = 1;
        if (period_count >= PERIOD)
        {
            MoveToHead(); /*Rule 5: boost everything back to the top*/
            MultiDispatch();
            return;
        }
        GetFirst(timechip);
    }
}
void MultiDispatch() /*Multi-level scheduling: execute one time unit at a time*/
{
    int flag = 1;
    ReadyQueue *point;
    point = Head;
    GetFirst(point); /*Take the first process of the topmost queue*/
    while (run != NULL)
    {
        Output();
        while (flag && period_count < PERIOD)
        {
            period_count++;
            run->count++;
            run->cputime++;
            run->needtime--;
            if (run->needtime == 0) /*The process is completed*/
            {
                run->state = 'F';
                InsertFinish(run);
                flag = 0;
            }
            else if (run->count == run->round) /*Time allotment runs out*/
            {
                run->state = 'W';
                run->count = 0; /*Clear the counter for the next level*/
                if (point->next != NULL)
                {
                    run->round = point->next->round; /*Take the next queue's time slice*/
                    run->queue++;                    /*Rule 4: demote by one queue*/
                    InsertLast(run, point->next);    /*Insert into the next ready queue*/
                    flag = 0;
                }
                else
                {
                    InsertLast(run, point); /*Back to the tail of the last queue*/
                    RoundRun(point);        /*Last ready queue: fall back to round-robin*/
                    break;
                }
            }
            else if (run->type == 1) /*I/O task gives up the CPU early*/
            {
                run->state = 'W';
                InsertLast(run, point);
                flag = 0;
            }
        }
        flag = 1;
        if (period_count >= PERIOD) /*Rule 5: boost all jobs after a set period*/
        {
            MoveToHead();
            MultiDispatch();
            return;
        }
        point = Head; /*Restart from the topmost non-empty queue*/
        while (point != NULL && point->LinkPCB == NULL)
            point = point->next;
        if (point == NULL)
            return; /*All ready queues are empty*/
        if (point->next == NULL)
        {
            RoundRun(point); /*Only the last queue remains*/
            break;
        }
        GetFirst(point);
    }
}
void MoveToHead() /*Move all jobs to the topmost queue after a period ends*/
{
    ReadyQueue *other_queue;
    PCB *p;
    printf("-----------before move to head-----------\n");
    Output();
    printf("-----------move to head-----------\n");
    if (run != NULL && run->state == 'R') /*Put the interrupted running process back*/
    {
        run->state = 'W';
        run->queue = 0;
        run->round = Head->round;
        run->count = 0;
        InsertLast(run, Head);
    }
    run = NULL;
    other_queue = Head->next;
    while (other_queue)
    {
        while (other_queue->LinkPCB)
        {
            p = other_queue->LinkPCB;
            other_queue->LinkPCB = other_queue->LinkPCB->next;
            p->next = NULL;
            p->queue = 0;
            p->round = Head->round; /*Fresh time slice at the top queue*/
            p->count = 0;
            InsertLast(p, Head);
        }
        other_queue = other_queue->next;
    }
    period_count = 0;
    Output();
    printf("-----------move to head done-----------\n");
    sleep(10); /*Pause so the boost output can be read*/
}
Group Name: Nikkei
Group Members: Nicholas Condray, Khrystyna Kodatska
// Description: In this code, the multi-level feedback queue scheduling algorithm is simulated. There are a total of five queues and three processes.
// The processes eventually end up on the last queue after the first process executes and all processes move down. This results in a round-robin (RR) on only one queue.
// The time slice for each queue is always set to 1, so a process with a burst time of 20 takes 20 time units to complete.
// Once all processes have been completed, the counter is reset and the process jumps to the top queue.
// There are two implementations of the scheduler. The first is a global counter that keeps track of the time units; once it reaches 20, all jobs that might still be running are moved to the highest-priority queue.
// The second is to move the jobs from the last queue back on top: all jobs in the last queue are moved to the top queue and the process repeats.
// If all queues are empty, the loop exits.
#include <stdio.h>
#include <stdlib.h>
#define MAX_PROCESSES 100
#define MAX_QUEUES 10
typedef struct {
    int id;
    int burst_time;
    int priority;
} process;

typedef struct {
    int size;
    int front;
    int rear;
    process* processes[MAX_PROCESSES];
} queue;

queue* create_queue();
void enqueue(queue* q, process* p);
process* dequeue(queue* q);
int main()
{
    int n, i;
    process* processes[MAX_PROCESSES];
    queue* queues[MAX_QUEUES];
    scanf("%d", &n); /*Number of processes*/
    for (i = 0; i < n; i++) {
        processes[i] = (process*)malloc(sizeof(process));
        processes[i]->id = i;
        scanf("%d %d", &processes[i]->burst_time, &processes[i]->priority);
    }
    for (i = 0; i < MAX_QUEUES; i++)
        queues[i] = create_queue();
    for (i = 0; i < n; i++)
        enqueue(queues[processes[i]->priority], processes[i]);
    while (1) {
        int active = 0; /*Did any queue hold a runnable process this pass?*/
        for (i = 0; i < MAX_QUEUES; i++) {
            if (queues[i]->size == 0)
                continue;
            active = 1;
            process* p = dequeue(queues[i]);
            printf("Process %d running on queue %d\n", p->id, i);
            p->burst_time--;
            if (p->burst_time == 0) {
                printf("Process %d completed\n", p->id);
                free(p);
            } else {
                if (i == MAX_QUEUES - 1)
                    enqueue(queues[0], p); /*Last queue wraps back to the top*/
                else
                    enqueue(queues[i + 1], p);
            }
        }
        if (!active)
            break; /*All queues are empty: every process has completed*/
    }
    return 0;
}

queue* create_queue()
{
    queue* q = (queue*)malloc(sizeof(queue));
    q->size = 0;
    q->front = 0;
    q->rear = 0;
    return q;
}

void enqueue(queue* q, process* p) /*Append at the rear of the circular buffer*/
{
    q->processes[q->rear] = p;
    q->rear = (q->rear + 1) % MAX_PROCESSES;
    q->size++;
}

process* dequeue(queue* q) /*Remove from the front of the circular buffer*/
{
    process* p = q->processes[q->front];
    q->front = (q->front + 1) % MAX_PROCESSES;
    q->size--;
    return p;
}
8 Scheduling: The Multi-Level Feedback Queue
In this chapter, we’ll tackle the problem of developing one of the most
well-known approaches to scheduling, known as the Multi-level Feedback Queue (MLFQ). The Multi-level Feedback Queue (MLFQ) scheduler was first described by Corbato et al. in 1962 [C+62] in a system
known as the Compatible Time-Sharing System (CTSS), and this work,
along with later work on Multics, led the ACM to award Corbato its
highest honor, the Turing Award. The scheduler has subsequently been
refined throughout the years to the implementations you will encounter
in some modern systems.
The fundamental problem MLFQ tries to address is two-fold. First, it
would like to optimize turnaround time, which, as we saw in the previous
note, is done by running shorter jobs first; unfortunately, the OS doesn’t
generally know how long a job will run for, exactly the knowledge that
algorithms like SJF (or STCF) require. Second, MLFQ would like to make
a system feel responsive to interactive users (i.e., users sitting and staring
at the screen, waiting for a process to finish), and thus minimize response
time; unfortunately, algorithms like Round Robin reduce response time
but are terrible for turnaround time. Thus, our problem: given that we
in general do not know anything about a process, how can we build a
scheduler to achieve these goals? How can the scheduler learn, as the
system runs, the characteristics of the jobs it is running, and thus make
better scheduling decisions?
THE CRUX: HOW TO SCHEDULE WITHOUT PERFECT KNOWLEDGE?
How can we design a scheduler that both minimizes response time for
interactive jobs while also minimizing turnaround time without a priori
knowledge of job length?
TIP: LEARN FROM HISTORY
The multi-level feedback queue is an excellent example of a system that
learns from the past to predict the future. Such approaches are common in operating systems (and many other places in Computer Science,
including hardware branch predictors and caching algorithms). Such
approaches work when jobs have phases of behavior and are thus predictable; of course, one must be careful with such techniques, as they can
easily be wrong and drive a system to make worse decisions than they
would have with no knowledge at all.
8.1 MLFQ: Basic Rules
To build such a scheduler, in this chapter we will describe the basic
algorithms behind a multi-level feedback queue; although the specifics of
many implemented MLFQs differ [E95], most approaches are similar.
In our treatment, the MLFQ has a number of distinct queues, each
assigned a different priority level. At any given time, a job that is ready
to run is on a single queue. MLFQ uses priorities to decide which job
should run at a given time: a job with higher priority (i.e., a job on a
higher queue) is chosen to run.
Of course, more than one job may be on a given queue, and thus have
the same priority. In this case, we will just use round-robin scheduling
among those jobs.
Thus, we arrive at the first two basic rules for MLFQ:
• Rule 1: If Priority(A) > Priority(B), A runs (B doesn’t).
• Rule 2: If Priority(A) = Priority(B), A & B run in RR.
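To make these two rules concrete, here is a minimal sketch in C (the job/jobqueue types, NLEVELS, and helper names are all illustrative inventions, not from any particular scheduler): scan the queues from highest priority downward, and rotate within the first non-empty one.

#include <stddef.h>

/* Illustrative types and helpers (not from any real kernel) */
typedef struct job {
    struct job *next;
} job;

typedef struct {
    job *head, *tail;
} jobqueue;

#define NLEVELS 8
jobqueue level[NLEVELS]; /* level[0] is the highest-priority queue */

void enqueue_job(jobqueue *q, job *j) /* append at the tail */
{
    j->next = NULL;
    if (q->tail)
        q->tail->next = j;
    else
        q->head = j;
    q->tail = j;
}

/* Rule 1: the highest non-empty queue wins. Rule 2: within that queue,
   take the front job, run it for the queue's quantum, then re-enqueue it
   at the tail (round-robin). */
job *pick_next(void)
{
    for (int q = 0; q < NLEVELS; q++) {
        if (level[q].head) {
            job *j = level[q].head;
            level[q].head = j->next;
            if (level[q].head == NULL)
                level[q].tail = NULL;
            j->next = NULL;
            return j;
        }
    }
    return NULL; /* nothing runnable */
}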
The key to MLFQ scheduling therefore lies in how the scheduler sets
priorities. Rather than giving a fixed priority to each job, MLFQ varies
the priority of a job based on its observed behavior. If, for example, a job
repeatedly relinquishes the CPU while waiting for input from the keyboard, MLFQ will keep its priority high, as this is how an interactive
process might behave. If, instead, a job uses the CPU intensively for long
periods of time, MLFQ will reduce its priority. In this way, MLFQ will try
to learn about processes as they run, and thus use the history of the job to
predict its future behavior.
If we were to put forth a picture of what the queues might look like at
a given instant, we might see something like the following (Figure 8.1).
In the figure, two jobs (A and B) are at the highest priority level, while job
C is in the middle and Job D is at the lowest priority. Given our current
knowledge of how MLFQ works, the scheduler would just alternate time
slices between A and B because they are the highest priority jobs in the
system; poor jobs C and D would never even get to run — an outrage!
Figure 8.1: MLFQ Example [figure: queues Q8 (high priority) down to Q1 (low priority); jobs A and B on Q8, C on Q4, D on Q1]
Of course, just showing a static snapshot of some queues does not really give you an idea of how MLFQ works. What we need is to understand how job priority changes over time. And that, in a surprise only
to those who are reading a chapter from this book for the first time, is
exactly what we will do next.
8.2 Attempt #1: How To Change Priority
We now must decide how MLFQ is going to change the priority level
of a job (and thus which queue it is on) over the lifetime of a job. To do
this, we must keep in mind our workload: a mix of interactive jobs that
are short-running (and may frequently relinquish the CPU), and some
longer-running “CPU-bound” jobs that need a lot of CPU time but where
response time isn’t important. Here is our first attempt at a priority-adjustment algorithm:
• Rule 3: When a job enters the system, it is placed at the highest
priority (the topmost queue).
• Rule 4a: If a job uses up an entire time slice while running, its priority is reduced (i.e., it moves down one queue).
• Rule 4b: If a job gives up the CPU before the time slice is up, it stays
at the same priority level.
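Continuing the illustrative sketch from the previous section (same hypothetical level[] array and enqueue_job helper), these three rules might translate into handlers like the following; the function names are invented for illustration.

/* Rule 3: a new job enters at the top */
void on_arrival(job *j)
{
    enqueue_job(&level[0], j);
}

/* Rule 4a: the job used an entire time slice: demote one level
   (the bottom queue simply absorbs it) */
void on_slice_expired(job *j, int q)
{
    enqueue_job(&level[q + 1 < NLEVELS ? q + 1 : q], j);
}

/* Rule 4b: the job gave up the CPU early (e.g., for I/O): same level */
void on_early_yield(job *j, int q)
{
    enqueue_job(&level[q], j);
}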
Example 1: A Single Long-Running Job
Let’s look at some examples. First, we’ll look at what happens when there
has been a long running job in the system. Figure 8.2 shows what happens
to this job over time in a three-queue scheduler.
Figure 8.2: Long-running Job Over Time [figure: queues Q2/Q1/Q0, time 0–200 ms]
As you can see in the example, the job enters at the highest priority
(Q2). After a single time-slice of 10 ms, the scheduler reduces the job’s
priority by one, and thus the job is on Q1. After running at Q1 for a time
slice, the job is finally lowered to the lowest priority in the system (Q0),
where it remains. Pretty simple, no?
Example 2: Along Came A Short Job
Now let’s look at a more complicated example, and hopefully see how
MLFQ tries to approximate SJF. In this example, there are two jobs: A,
which is a long-running CPU-intensive job, and B, which is a short-running
interactive job. Assume A has been running for some time, and then B arrives. What will happen? Will MLFQ approximate SJF for B?
Figure 8.3 plots the results of this scenario. A (shown in black) is running along in the lowest-priority queue (as would any long-running CPU-intensive job); B (shown in gray) arrives at time T = 100, and thus is inserted into the highest queue; as its run-time is short (only 20 ms), B completes before reaching the bottom queue, in two time slices; then A resumes running (at low priority).
Figure 8.3: Along Came An Interactive Job [figure: queues Q2/Q1/Q0, time 0–200 ms]
Figure 8.4: A Mixed I/O-intensive and CPU-intensive Workload [figure: queues Q2/Q1/Q0, time 0–200 ms]
From this example, you can hopefully understand one of the major
goals of the algorithm: because it doesn’t know whether a job will be a
short job or a long-running job, it first assumes it might be a short job, thus
giving the job high priority. If it actually is a short job, it will run quickly
and complete; if it is not a short job, it will slowly move down the queues,
and thus soon prove itself to be a long-running more batch-like process.
In this manner, MLFQ approximates SJF.
Example 3: What About I/O?
Let’s now look at an example with some I/O. As Rule 4b states above, if a
process gives up the processor before using up its time slice, we keep it at
the same priority level. The intent of this rule is simple: if an interactive
job, for example, is doing a lot of I/O (say by waiting for user input from
the keyboard or mouse), it will relinquish the CPU before its time slice is
complete; in such case, we don’t wish to penalize the job and thus simply
keep it at the same level.
Figure 8.4 shows an example of how this works, with an interactive job
B (shown in gray) that needs the CPU only for 1 ms before performing an
I/O competing for the CPU with a long-running batch job A (shown in
black). The MLFQ approach keeps B at the highest priority because B
keeps releasing the CPU; if B is an interactive job, MLFQ further achieves
its goal of running interactive jobs quickly.
Problems With Our Current MLFQ
We thus have a basic MLFQ. It seems to do a fairly good job, sharing the
CPU fairly between long-running jobs, and letting short or I/O-intensive
interactive jobs run quickly. Unfortunately, the approach we have developed thus far contains serious flaws. Can you think of any?
(This is where you pause and think as deviously as you can)
Figure 8.5: Without (Left) and With (Right) Priority Boost [figure: two panels, queues Q2/Q1/Q0, time 0–200 ms each]
First, there is the problem of starvation: if there are “too many” interactive jobs in the system, they will combine to consume all CPU time,
and thus long-running jobs will never receive any CPU time (they starve).
We’d like to make some progress on these jobs even in this scenario.
Second, a smart user could rewrite their program to game the scheduler. Gaming the scheduler generally refers to the idea of doing something sneaky to trick the scheduler into giving you more than your fair
share of the resource. The algorithm we have described is susceptible to
the following attack: before the time slice is over, issue an I/O operation
(to some file you don’t care about) and thus relinquish the CPU; doing so
allows you to remain in the same queue, and thus gain a higher percentage of CPU time. When done right (e.g., by running for 99% of a time slice
before relinquishing the CPU), a job could nearly monopolize the CPU.
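As a sketch of the shape such an attack could take (everything here is illustrative: the 10 ms quantum is assumed, /dev/null stands in for the throwaway file, and whether this particular read actually yields the CPU depends on the system):

#include <stdio.h>
#include <time.h>

/* Busy-wait for roughly ms milliseconds (illustrative spin loop) */
static void spin_ms(double ms)
{
    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while ((now.tv_sec - start.tv_sec) * 1e3 +
             (now.tv_nsec - start.tv_nsec) / 1e6 < ms);
}

int main(void)
{
    char c;
    FILE *junk = fopen("/dev/null", "r"); /* a file we don't care about */
    if (junk == NULL)
        return 1;
    for (;;) {
        spin_ms(9.9);          /* compute for ~99% of an assumed 10 ms slice */
        fread(&c, 1, 1, junk); /* relinquish the CPU just before it expires */
    }
}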
Finally, a program may change its behavior over time; what was CPU-bound may transition to a phase of interactivity. With our current approach, such a job would be out of luck and not be treated like the other interactive jobs in the system.
TIP: SCHEDULING MUST BE SECURE FROM ATTACK
You might think that a scheduling policy, whether inside the OS itself
(as discussed herein), or in a broader context (e.g., in a distributed storage system’s I/O request handling [Y+18]), is not a security concern, but
in increasingly many cases, it is exactly that. Consider the modern datacenter, in which users from around the world share CPUs, memories,
networks, and storage systems; without care in policy design and enforcement, a single user may be able to adversely harm others and gain
advantage for itself. Thus, scheduling policy forms an important part of
the security of a system, and should be carefully constructed.
Figure 8.6: Without (Left) and With (Right) Gaming Tolerance [figure: two panels, queues Q2/Q1/Q0, time 0–300 ms each]
8.3 Attempt #2: The Priority Boost
Let’s try to change the rules and see if we can avoid the problem of
starvation. What could we do in order to guarantee that CPU-bound jobs
will make some progress (even if it is not much)?
The simple idea here is to periodically boost the priority of all the jobs in the system. There are many ways to achieve this, but let’s just do something simple: throw them all in the topmost queue; hence, a new rule:
• Rule 5: After some time period S, move all the jobs in the system
to the topmost queue.
Our new rule solves two problems at once. First, processes are guaranteed not to starve: by sitting in the top queue, a job will share the CPU
with other high-priority jobs in a round-robin fashion, and thus eventually receive service. Second, if a CPU-bound job has become interactive,
the scheduler treats it properly once it has received the priority boost.
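A minimal sketch of Rule 5, again in terms of the hypothetical level[] queues and enqueue_job helper from the earlier sketches (the timer that decides when S has elapsed is left out):

/* Rule 5: every S time units, splice all lower queues onto the top one */
void priority_boost(void)
{
    for (int q = 1; q < NLEVELS; q++) {
        job *j;
        while ((j = level[q].head) != NULL) {
            level[q].head = j->next;
            enqueue_job(&level[0], j);
        }
        level[q].tail = NULL;
    }
}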
Let’s see an example. In this scenario, we just show the behavior of
a long-running job when competing for the CPU with two short-running
interactive jobs. Two graphs are shown in Figure 8.5 (page 6). On the left,
there is no priority boost, and thus the long-running job gets starved once
the two short jobs arrive; on the right, there is a priority boost every 50
ms (which is likely too small of a value, but used here for the example),
and thus we at least guarantee that the long-running job will make some
progress, getting boosted to the highest priority every 50 ms and thus
getting to run periodically.
Of course, the addition of the time period S leads to the obvious question: what should S be set to? John Ousterhout, a well-regarded systems
researcher [O11], used to call such values in systems voo-doo constants,
because they seemed to require some form of black magic to set them correctly. Unfortunately, S has that flavor. If it is set too high, long-running
jobs could starve; too low, and interactive jobs may not get a proper share
of the CPU.
Figure 8.7: Lower Priority, Longer Quanta [figure: queues Q2/Q1/Q0, time 0–300 ms]
8.4 Attempt #3: Better Accounting
We now have one more problem to solve: how to prevent gaming of
our scheduler? The real culprits here, as you might have guessed, are Rules 4a and 4b, which let a job retain its priority by relinquishing the
CPU before the time slice expires. So what should we do?
The solution here is to perform better accounting of CPU time at each
level of the MLFQ. Instead of forgetting how much of a time slice a process used at a given level, the scheduler should keep track; once a process
has used its allotment, it is demoted to the next priority queue. Whether
it uses the time slice in one long burst or many small ones does not matter.
We thus rewrite Rules 4a and 4b to the following single rule:
• Rule 4: Once a job uses up its time allotment at a given level (regardless of how many times it has given up the CPU), its priority is
reduced (i.e., it moves down one queue).
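Here is a hedged sketch of the accounting Rule 4 implies, reusing NLEVELS from the earlier sketches; the field and parameter names are illustrative, not taken from any real scheduler:

/* Track total CPU used at the current level, not per burst */
typedef struct {
    int level;   /* current queue (0 = highest priority) */
    int used_ms; /* CPU consumed at this level, summed across all bursts */
} accounting;

void charge(accounting *a, int ran_ms, int allotment_ms)
{
    a->used_ms += ran_ms;             /* I/O-bound bursts are counted too */
    if (a->used_ms >= allotment_ms) { /* allotment exhausted: demote */
        if (a->level + 1 < NLEVELS)
            a->level++;
        a->used_ms = 0;               /* fresh allotment at the new level */
    }
}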
Let’s look at an example. Figure 8.6 (page 7) shows what happens
when a workload tries to game the scheduler with the old Rules 4a and 4b
(on the left) as well the new anti-gaming Rule 4. Without any protection
from gaming, a process can issue an I/O just before a time slice ends and
thus dominate CPU time. With such protections in place, regardless of
the I/O behavior of the process, it slowly moves down the queues, and
thus cannot gain an unfair share of the CPU.
8.5 Tuning MLFQ And Other Issues
A few other issues arise with MLFQ scheduling. One big question is
how to parameterize such a scheduler. For example, how many queues
should there be? How big should the time slice be per queue? How often
should priority be boosted in order to avoid starvation and account for
changes in behavior? There are no easy answers to these questions, and
thus only some experience with workloads and subsequent tuning of the
scheduler will lead to a satisfactory balance.
TIP: AVOID VOO-DOO CONSTANTS (OUSTERHOUT’S LAW)
Avoiding voo-doo constants is a good idea whenever possible. Unfortunately, as in the example above, it is often difficult. One could try to
make the system learn a good value, but that too is not straightforward.
The frequent result: a configuration file filled with default parameter values that a seasoned administrator can tweak when something isn’t quite
working correctly. As you can imagine, these are often left unmodified,
and thus we are left to hope that the defaults work well in the field. This
tip brought to you by our old OS professor, John Ousterhout, and hence
we call it Ousterhout’s Law.
For example, most MLFQ variants allow for varying time-slice length
across different queues. The high-priority queues are usually given short
time slices; they are comprised of interactive jobs, after all, and thus
quickly alternating between them makes sense (e.g., 10 or fewer milliseconds). The low-priority queues, in contrast, contain long-running jobs
that are CPU-bound; hence, longer time slices work well (e.g., 100s of
ms). Figure 8.7 (page 8) shows an example in which two jobs run for 20
ms at the highest queue (with a 10-ms time slice), 40 ms in the middle
(20-ms time slice), and with a 40-ms time slice at the lowest.
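Expressed as a table, the Figure 8.7 configuration might be encoded like this (the array form is just one convenient representation):

/* Per-queue quanta from Figure 8.7; index 0 is the highest-priority queue */
int quantum_ms[3] = { 10, 20, 40 };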
The Solaris MLFQ implementation — the Time-Sharing scheduling
class, or TS — is particularly easy to configure; it provides a set of tables
that determine exactly how the priority of a process is altered throughout its lifetime, how long each time slice is, and how often to boost the
priority of a job [AD00]; an administrator can muck with this table in order to make the scheduler behave in different ways. Default values for
the table are 60 queues, with slowly increasing time-slice lengths from
20 milliseconds (highest priority) to a few hundred milliseconds (lowest),
and priorities boosted around every 1 second or so.
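To make the table idea concrete, here is a rough sketch shaped like a Solaris TS dispatch-table row; the field names follow the ts_dptbl convention as best we can tell, and the numeric values are invented for illustration, not the Solaris defaults.

typedef struct {
    int ts_quantum; /* time slice for jobs at this priority */
    int ts_tqexp;   /* new priority when the quantum expires (demotion) */
    int ts_slpret;  /* new priority on return from sleep/I/O (boost) */
} ts_entry;

ts_entry ts_table[] = {
    /* quantum  tqexp  slpret */
    {  200,      0,     50 }, /* priority 0 (lowest): long slice */
    {  120,      5,     55 }, /* ...intermediate rows elided... */
    {   20,     49,     59 }, /* priority 59 (highest): short slice */
};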
Other MLFQ schedulers don’t use a table or the exact rules described
in this chapter; rather they adjust priorities using mathematical formulae. For example, the FreeBSD scheduler (version 4.3) uses a formula to
calculate the current priority level of a job, basing it on how much CPU
the process has used [LM+89]; in addition, usage is decayed over time,
providing the desired priority boost in a different manner than described
herein. See Epema’s paper for an excellent overview of such decay-usage
algorithms and their properties [E95].
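As a hedged sketch of the decay-usage idea (the shape loosely follows descriptions of the 4.3BSD scheduler; the constants and names are illustrative rather than the exact kernel code):

#define PUSER 50 /* base user priority (illustrative) */

/* Higher value = worse priority: recent CPU usage pushes a job down */
int user_priority(int recent_cpu, int nice)
{
    return PUSER + recent_cpu / 4 + 2 * nice;
}

/* Called periodically: usage decays, so priority drifts back up over
   time, providing the boost in a different manner than Rule 5 */
int decay(int recent_cpu, int load)
{
    return (2 * load * recent_cpu) / (2 * load + 1);
}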
Finally, many schedulers have a few other features that you might encounter. For example, some schedulers reserve the highest priority levels
for operating system work; thus typical user jobs can never obtain the
highest levels of priority in the system. Some systems also allow some
user advice to help set priorities; for example, by using the command-line
utility nice you can increase or decrease the priority of a job (somewhat)
and thus increase or decrease its chances of running at any given time.
See the man page for more.
TIP: USE ADVICE WHERE POSSIBLE
As the operating system rarely knows what is best for each and every
process of the system, it is often useful to provide interfaces to allow users
or administrators to provide some hints to the OS. We often call such
hints advice, as the OS need not necessarily pay attention to it, but rather
might take the advice into account in order to make a better decision.
Such hints are useful in many parts of the OS, including the scheduler
(e.g., with nice), memory manager (e.g., madvise), and file system (e.g.,
informed prefetching and caching [P+95]).
8.6 MLFQ: Summary
We have described a scheduling approach known as the Multi-Level
Feedback Queue (MLFQ). Hopefully you can now see why it is called
that: it has multiple levels of queues, and uses feedback to determine the
priority of a given job. History is its guide: pay attention to how jobs
behave over time and treat them accordingly.
The refined set of MLFQ rules, spread throughout the chapter, are reproduced here for your viewing pleasure:
• Rule 1: If Priority(A) > Priority(B), A runs (B doesn’t).
• Rule 2: If Priority(A) = Priority(B), A & B run in round-robin fashion using the time slice (quantum length) of the given queue.
• Rule 3: When a job enters the system, it is placed at the highest
priority (the topmost queue).
• Rule 4: Once a job uses up its time allotment at a given level (regardless of how many times it has given up the CPU), its priority is
reduced (i.e., it moves down one queue).
• Rule 5: After some time period S, move all the jobs in the system
to the topmost queue.
MLFQ is interesting for the following reason: instead of demanding
a priori knowledge of the nature of a job, it observes the execution of a
job and prioritizes it accordingly. In this way, it manages to achieve the
best of both worlds: it can deliver excellent overall performance (similar
to SJF/STCF) for short-running interactive jobs, and is fair and makes
progress for long-running CPU-intensive workloads. For this reason,
many systems, including BSD UNIX derivatives [LM+89, B86], Solaris
[M06], and Windows NT and subsequent Windows operating systems
[CS97] use a form of MLFQ as their base scheduler.
References
[AD00] “Multilevel Feedback Queue Scheduling in Solaris” by Andrea Arpaci-Dusseau. Available: http://www.ostep.org/Citations/notes-solaris.pdf. A great short set of notes by one of the
authors on the details of the Solaris scheduler. OK, we are probably biased in this description, but the
notes are pretty darn good.
[B86] “The Design of the UNIX Operating System” by M.J. Bach. Prentice-Hall, 1986. One of the classic old books on how a real UNIX operating system is built; a definite must-read for kernel hackers.
[C+62] “An Experimental Time-Sharing System” by F. J. Corbato, M. M. Daggett, R. C. Daley.
IFIPS 1962. A bit hard to read, but the source of many of the first ideas in multi-level feedback scheduling. Much of this later went into Multics, which one could argue was the most influential operating
system of all time.
[CS97] “Inside Windows NT” by Helen Custer and David A. Solomon. Microsoft Press, 1997.
The NT book, if you want to learn about something other than UNIX. Of course, why would you? OK,
we’re kidding; you might actually work for Microsoft some day you know.
[E95] “An Analysis of Decay-Usage Scheduling in Multiprocessors” by D.H.J. Epema. SIGMETRICS ’95. A nice paper on the state of the art of scheduling back in the mid 1990s, including a
good overview of the basic approach behind decay-usage schedulers.
[LM+89] “The Design and Implementation of the 4.3BSD UNIX Operating System” by S.J. Leffler, M.K. McKusick, M.J. Karels, J.S. Quarterman. Addison-Wesley, 1989. Another OS classic,
written by four of the main people behind BSD. The later versions of this book, while more up to date,
don’t quite match the beauty of this one.
[M06] “Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture” by Richard McDougall. Prentice-Hall, 2006. A good book about Solaris and how it works.
[O11] “John Ousterhout’s Home Page” by John Ousterhout. www.stanford.edu/~ouster/.
The home page of the famous Professor Ousterhout. The two co-authors of this book had the pleasure of
taking graduate operating systems from Ousterhout while in graduate school; indeed, this is where the
two co-authors got to know each other, eventually leading to marriage, kids, and even this book. Thus,
you really can blame Ousterhout for this entire mess you’re in.
[P+95] “Informed Prefetching and Caching” by R.H. Patterson, G.A. Gibson, E. Ginting, D.
Stodolsky, J. Zelenka. SOSP ’95, Copper Mountain, Colorado, October 1995. A fun paper about
some very cool ideas in file systems, including how applications can give the OS advice about what files
it is accessing and how it plans to access them.
[Y+18] “Principled Schedulability Analysis for Distributed Storage Systems using Thread Architecture Models” by Suli Yang, Jing Liu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau. OSDI ’18, San Diego, California. A recent work of our group that demonstrates the
difficulty of scheduling I/O requests within modern distributed storage systems such as Hive/HDFS,
Cassandra, MongoDB, and Riak. Without care, a single user might be able to monopolize system resources.
Homework (Simulation)
This program, mlfq.py, allows you to see how the MLFQ scheduler
presented in this chapter behaves. See the README for details.
Questions
1. Run a few randomly-generated problems with just two jobs and
two queues; compute the MLFQ execution trace for each. Make
your life easier by limiting the length of each job and turning off
I/Os.
2. How would you run the scheduler to reproduce each of the examples in the chapter?
3. How would you configure the scheduler parameters to behave just
like a round-robin scheduler?
4. Craft a workload with two jobs and scheduler parameters so that
one job takes advantage of the older Rules 4a and 4b (turned on
with the -S flag) to game the scheduler and obtain 99% of the CPU
over a particular time interval.
5. Given a system with a quantum length of 10 ms in its highest queue,
how often would you have to boost jobs back to the highest priority
level (with the -B flag) in order to guarantee that a single long-running (and potentially-starving) job gets at least 5% of the CPU?
6. One question that arises in scheduling is which end of a queue to
add a job that just finished I/O; the -I flag changes this behavior
for this scheduling simulator. Play around with some workloads
and see if you can see the effect of this flag.