[Solution] Task Scheduling and Data Assignment Codeforces Solution
With the rapid development of chip hardware computing capabilities, hardware architectures are evolving towards more complex multi-core and heterogeneous designs. In this problem, your goal is to design an efficient task scheduling algorithm together with a memory allocation policy for a software-hardware co-optimization problem that arises widely in operating systems, distributed systems, and compilers.
Description
There are $n$ tasks that need to be scheduled on $m$ machines with varying processing powers. There are also $k$ disks with varying transmission speeds and storage capacities for storing the data generated by each task after its execution.
Task schedule. A task schedule is an assignment of tasks to machines and disks. More specifically, for each task $i$, a task schedule specifies three parameters $(t_i, a_i, d_i)$, meaning that task $i$ starts at time $t_i$ on machine $a_i$ and stores its data to disk $d_i$. The earliest starting time of any task is 0.
Task running process. Here are more details about the running process of a task once the task schedule is fixed.
- The processing power of machine $j$ is $p_j$. Each task $i$ has a size of $s_i$, which means it needs $\lceil s_i / p_j \rceil$ units of time to execute when it is run on machine $j$. Here, $\lceil \cdot \rceil$ means rounding up.
- Each disk $l$ has a read/write transmission rate of $r_l$. When a task $i$ finishes its execution, it will generate a piece of data of size $o_i$ that needs to be stored on some disk. This storing step will take $\lceil o_i / r_l \rceil$ units of time if the data is stored on disk $l$; as before, the time is rounded up.
- Before a task $i$ starts its execution, it needs to read the output data of some other tasks. We let $D(i)$ denote the set of tasks whose output data are needed by task $i$. Reading takes a total time of $\sum_{j \in D(i)} \lceil o_j / r_{d_j} \rceil$, and this must be done before task $i$ starts its execution. Note that when calculating the total time, each term is rounded up individually and then summed.
Summarizing the above process, when task $i$ starts, it goes through three phases: reading data from other tasks, executing the task, and storing data. For convenience, we denote the starting times of the three phases as $t_i^{\mathrm{read}}$, $t_i^{\mathrm{exec}}$, and $t_i^{\mathrm{store}}$, and the finishing time of the last phase as $t_i^{\mathrm{end}}$. These values are determined as follows (a code sketch follows this list).
- $t_i^{\mathrm{read}} = t_i$, and $t_i^{\mathrm{exec}} = t_i^{\mathrm{read}} + \sum_{j \in D(i)} \lceil o_j / r_{d_j} \rceil$.
- $t_i^{\mathrm{store}} = t_i^{\mathrm{exec}} + \lceil s_i / p_{a_i} \rceil$.
- $t_i^{\mathrm{end}} = t_i^{\mathrm{store}} + \lceil o_i / r_{d_i} \rceil$.
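As a concrete illustration of these rules, here is a small sketch that computes the four phase times of a task from a given schedule. The array names (t, a, d, s, o, p, r, D) follow the notation reconstructed above and are assumptions, not part of any official interface.

```cpp
#include <bits/stdc++.h>
using namespace std;

long long ceilDiv(long long x, long long y) { return (x + y - 1) / y; }

// Phase times of task i under the rules above (all arrays 1-indexed, names assumed):
// t[i] = chosen start time, a[i] = machine, d[i] = disk, s[i] = task size,
// o[i] = output data size, p[j] = machine power, r[l] = disk rate,
// D[i] = tasks whose output data task i must read.
// Returns {t_read, t_exec, t_store, t_end}.
array<long long, 4> phaseTimes(int i, const vector<long long>& t, const vector<int>& a,
                               const vector<int>& d, const vector<long long>& s,
                               const vector<long long>& o, const vector<long long>& p,
                               const vector<long long>& r, const vector<vector<int>>& D) {
    long long tRead = t[i];
    long long readCost = 0;
    for (int j : D[i])                                  // each term rounded up separately
        readCost += ceilDiv(o[j], r[d[j]]);
    long long tExec  = tRead + readCost;                // execution starts after all reads
    long long tStore = tExec + ceilDiv(s[i], p[a[i]]);  // ceil(s_i / p_{a_i}) to execute
    long long tEnd   = tStore + ceilDiv(o[i], r[d[i]]); // ceil(o_i / r_{d_i}) to store
    return {tRead, tExec, tStore, tEnd};
}
```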
In addition, a feasible task schedule should satisfy the following requirements (a checker sketch follows this list):
- No preemption: Each machine executes only one task or transfers one piece of data at a time, and no interruption is allowed during a task's lifecycle. In other words, for each pair of tasks $i$ and $j$ such that $a_i = a_j$, the two intervals $[t_i^{\mathrm{read}}, t_i^{\mathrm{end}})$ and $[t_j^{\mathrm{read}}, t_j^{\mathrm{end}})$ cannot have any overlap.
- Task dependencies: Each task $i$ has a list of task-dependent tasks, denoted by $T(i)$. For each task $j \in T(i)$, task $i$ cannot start until $j$ finishes its execution. That is, it must satisfy $t_i^{\mathrm{read}} \ge t_j^{\mathrm{store}}$. Note that if $i$ is scheduled on the same machine as $j$, then it still needs to wait for $j$ to complete storing its data.
- Data dependencies: For each task $j \in D(i)$, task $i$ cannot start until $j$ finishes storing its data. That is, it must satisfy $t_i^{\mathrm{read}} \ge t_j^{\mathrm{end}}$.
- Affinity of machines: Each task $i$ has a list of affinitive machines, denoted by $A(i)$, that it can be assigned to. Each task must be scheduled on one of its affinitive machines. That is, it must satisfy $a_i \in A(i)$.
- Disk capacity: Each disk $l$ has a storage capacity of $c_l$. The total size of all data stored on the same disk cannot exceed that disk's capacity. That is, for each disk $l$ it must satisfy $\sum_{i : d_i = l} o_i \le c_l$.
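These requirements can be verified mechanically. Below is a minimal feasibility check under the notation above; it is a sketch rather than the judge's actual checker, and all container names are assumptions.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sketch of a feasibility check for a schedule.  All names are assumptions:
// tRead/tStore/tEnd[i]: phase times computed as in the previous sketch,
// a[i], d[i]: assigned machine and disk, A[i]: affinitive machines,
// T[i]/D[i]: task/data dependencies, o[i]: output sizes, c[l]: disk capacities.
bool feasible(int n, const vector<long long>& tRead, const vector<long long>& tStore,
              const vector<long long>& tEnd, const vector<int>& a, const vector<int>& d,
              const vector<vector<int>>& A, const vector<vector<int>>& T,
              const vector<vector<int>>& D, const vector<long long>& o,
              const vector<long long>& c) {
    // Affinity: a[i] must be one of task i's affinitive machines.
    for (int i = 1; i <= n; i++)
        if (find(A[i].begin(), A[i].end(), a[i]) == A[i].end()) return false;
    // Task dependencies: i may start only after j finishes executing.
    for (int i = 1; i <= n; i++)
        for (int j : T[i]) if (tRead[i] < tStore[j]) return false;
    // Data dependencies: i may start only after j finishes storing its data.
    for (int i = 1; i <= n; i++)
        for (int j : D[i]) if (tRead[i] < tEnd[j]) return false;
    // Disk capacity: the total data placed on each disk must fit.
    vector<long long> used(c.size(), 0);
    for (int i = 1; i <= n; i++) used[d[i]] += o[i];
    for (size_t l = 1; l < c.size(); l++) if (used[l] > c[l]) return false;
    // No preemption: lifecycles on the same machine must not overlap.
    for (int i = 1; i <= n; i++)
        for (int j = i + 1; j <= n; j++)
            if (a[i] == a[j] && tRead[i] < tEnd[j] && tRead[j] < tEnd[i]) return false;
    return true;
}
```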
Given a feasible schedule, the makespan of this schedule is the finishing time of the last task. In other words, $\mathrm{makespan} = \max_{1 \le i \le n} t_i^{\mathrm{end}}$.
Objectives
Your objective is to find a feasible task schedule that minimizes the makespan.
Input
The first line contains an integer $n$ representing the number of tasks. In the following $n$ lines, each line describes a task: the first three integers represent the task id $i$, the task size $s_i$, and the size $o_i$ of its output data. All task ids are distinct integers between $1$ and $n$. The fourth integer is the number of affinitive machines for this task, and the remaining integers on the line are the IDs of these affinitive machines; they are distinct as well.
Next, a line contains an integer $m$ representing the number of machines. In the following $m$ lines, each line has two integers representing a machine id $j$ and its power $p_j$. All machine ids are distinct integers between $1$ and $m$.
Next again, a line contains an integer $k$ representing the number of disks. In the following $k$ lines, each line has three integers representing a disk id $l$, its speed $r_l$, and its capacity $c_l$. All disk ids are distinct integers between $1$ and $k$.
The remaining part describes the two types of dependencies between tasks.
The first line of this part is an integer representing the number of data dependencies. In the following lines, each line contains two integers $i$ and $j$, which means task $i$ is data-dependent on task $j$ (i.e., $j \in D(i)$).
Next, there is a line with an integer representing the number of task dependencies. In the following lines, each line contains two integers $i$ and $j$, which means task $i$ is task-dependent on task $j$ (i.e., $j \in T(i)$).
All numbers on the same line are separated by whitespaces.
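For reference, here is a minimal parsing sketch for this input format, using the notation above; the variable names are my own and the I/O style is only one reasonable choice.

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;                                         // number of tasks
    scanf("%d", &n);
    vector<long long> s(n + 1), o(n + 1);          // task sizes and output data sizes
    vector<vector<int>> A(n + 1);                  // affinitive machines per task
    for (int q = 0; q < n; q++) {
        int id, cnt; long long sz, od;
        scanf("%d %lld %lld %d", &id, &sz, &od, &cnt);
        s[id] = sz; o[id] = od;
        A[id].resize(cnt);
        for (int &x : A[id]) scanf("%d", &x);
    }
    int m; scanf("%d", &m);
    vector<long long> p(m + 1);                    // machine powers
    for (int q = 0; q < m; q++) { int id; long long pw; scanf("%d %lld", &id, &pw); p[id] = pw; }
    int k; scanf("%d", &k);
    vector<long long> r(k + 1), c(k + 1);          // disk rates and capacities
    for (int q = 0; q < k; q++) { int id; long long sp, cap; scanf("%d %lld %lld", &id, &sp, &cap); r[id] = sp; c[id] = cap; }
    vector<vector<int>> D(n + 1), T(n + 1);        // data and task dependencies
    int nd; scanf("%d", &nd);
    while (nd--) { int i, j; scanf("%d %d", &i, &j); D[i].push_back(j); }  // i is data-dependent on j
    int nt; scanf("%d", &nt);
    while (nt--) { int i, j; scanf("%d %d", &i, &j); T[i].push_back(j); }  // i is task-dependent on j
    // ... scheduling logic would go here ...
    return 0;
}
```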
Variable constraints
- number of tasks $n$:
- number of machines $m$:
- number of disks $k$:
- size of a task $s_i$:
- size of the output data $o_i$:
- computing power of a machine $p_j$:
- capacity of a disk $c_l$:
- rate of a disk $r_l$:
- all IDs are consecutive and start from one.
- the dependency graph, which includes both types of dependency edges, is acyclic.
Output
You should output $n$ lines describing the task schedule for all tasks. The $i$-th line should contain four integers $i$, $t_i$, $a_i$, and $d_i$, separated by whitespace, meaning that task $i$ will start at time $t_i$ on machine $a_i$ and store its data on disk $d_i$.
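A correspondingly minimal sketch for emitting the schedule, assuming hypothetical arrays t, a, and d filled by your scheduling logic:

```cpp
#include <bits/stdc++.h>

// Print the schedule: one line per task with its id, start time, machine, and disk.
// t, a, d are hypothetical 1-indexed arrays produced by the scheduling logic.
void printSchedule(int n, const std::vector<long long>& t,
                   const std::vector<int>& a, const std::vector<int>& d) {
    for (int i = 1; i <= n; i++)
        std::printf("%d %lld %d %d\n", i, t[i], a[i], d[i]);
}
```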
Scoring
For each test, the score is your makespan divided by the lower bound, where the lower bound is computed from a formula in which $\mathrm{out}(i)$ means the number of edges that go out from task $i$ in the dependency DAG (in other words, the number of tasks that depend on task $i$). We guarantee that all the cases have feasible solutions, since there is a disk whose capacity is greater than or equal to the sum of $o_i$ over all tasks $i$. If you output an invalid solution, you get zero. Here, the makespan stands for the latest end time over all tasks, given that the earliest start time of any task is zero. The final score is the sum of the scores over all tests.
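As an illustration of this out-degree definition, here is a small sketch that computes it from the two dependency lists of the parsing sketch above; whether duplicate edges are merged is an assumption.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sketch: out[j] = number of outgoing edges of task j in the dependency DAG,
// i.e. how many tasks list j as a (data or task) dependency.  D and T are the
// hypothetical containers from the parsing sketch; if a pair appears in both
// lists it is counted twice here, which may differ from the judge's definition.
vector<int> outDegree(int n, const vector<vector<int>>& D, const vector<vector<int>>& T) {
    vector<int> out(n + 1, 0);
    for (int i = 1; i <= n; i++) {
        for (int j : D[i]) out[j]++;   // edge j -> i (task i reads j's output)
        for (int j : T[i]) out[j]++;   // edge j -> i (task i waits for j to execute)
    }
    return out;
}
```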
Example
At time , task  is scheduled on machine  and stores its output to disk ; it costs  and  units of time in the executing and writing phases, respectively, so it finishes at time . Note that the time cost of the reading phase is zero, since this task is not data-dependent on any task.
At time , task  is scheduled on machine , reads its data from disk , and stores its output to disk . In total it costs  units of time and finishes at time .
At the same time, task  is scheduled on machine  and reads its data from disk . It costs  and finishes at . Note that although the two tasks run on the same machine, this task still needs to read the other task's output data from the disk.
At time , task  starts on machine  and reads the data of tasks  and , which costs  units of time. So the total time cost is  and it finishes at time .
At time , task  starts on machine ; it costs  units of time to read data and another  to execute, and it does not output any data. So, it finishes at time .
At time , task  starts on machine  and finishes at time .
So, the final makespan is  and this is a feasible solution. Let us verify the feasibility of the above solution.
Dependency constraints: In the solution,  starts after ,  starts after , and so on.
Affinity of machines: In this case, the only affinity restriction is that task 6 can only run on machine 1, and in this solution it does.
No preemption: it is obvious.
Disk capacity: (1) the tasks storing data on disk 1 are , and their total data size is ; (2) the tasks storing data on disk 2 are , and their total data size is .