[Solution] Task Scheduling and Data Assignment Codeforces Solution
With the rapid development of chip hardware computing capabilities, hardware architectures are evolving towards more complex multi-core and heterogeneous designs. In this problem, your goal is to design an efficient task scheduling algorithm together with a memory allocation policy for a software-hardware co-optimization problem that arises widely in operating systems, distributed systems, and compilers.
Description
There are $n$ tasks that need to be scheduled on $m$ machines with varying processing powers. There are also $k$ disks with varying transmission speeds and storage capacities for storing the data generated by each task after its execution.
Task schedule. A task schedule is an assignment of tasks to machines and disks. More specifically, for each task $i$, a task schedule specifies three parameters $(t_i, a_i, d_i)$, meaning that task $i$ starts at time $t_i$ on machine $a_i$ and stores its data to disk $d_i$. The earliest starting time of any task is 0.
Task running process. Here are more details about the running process of a task once the task schedule is fixed.
- The processing power of machine $j$ is $p_j$. Each task $i$ has a size of $s_i$, which means it needs $\lceil s_i / p_j \rceil$ units of time to execute when it is run on machine $j$. Here, $\lceil \cdot \rceil$ means rounding up.
- Each disk $l$ has a read/write transmission rate of $r_l$. When a task $i$ finishes its execution, it will generate a piece of data of size $o_i$ that needs to be stored on some disk. This storing step will take $\lceil o_i / r_l \rceil$ units of time if the data is stored on disk $l$; as before, the time is rounded up.
- Before a task $i$ starts its execution, it needs to read the output data of some other tasks. We let $D(i)$ denote the set of tasks whose output data are needed by task $i$. Reading takes a total time of $\sum_{j \in D(i)} \lceil o_j / r_{d_j} \rceil$, and this must be done before task $i$ starts its execution. Note that when calculating the total time, each term is rounded up individually and then summed.
Summarizing the above process, when task $i$ starts, it goes through three phases: reading data from other tasks, executing the task, and storing data. For convenience, we denote the starting times of the three phases as $t_i^{\mathrm{read}}$, $t_i^{\mathrm{exec}}$, and $t_i^{\mathrm{store}}$, and the finishing time of the last phase as $t_i^{\mathrm{end}}$. These values are determined as follows (a code sketch follows this list).
- $t_i^{\mathrm{read}} = t_i$, and $t_i^{\mathrm{exec}} = t_i^{\mathrm{read}} + \sum_{j \in D(i)} \lceil o_j / r_{d_j} \rceil$.
- $t_i^{\mathrm{store}} = t_i^{\mathrm{exec}} + \lceil s_i / p_{a_i} \rceil$.
- $t_i^{\mathrm{end}} = t_i^{\mathrm{store}} + \lceil o_i / r_{d_i} \rceil$.
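As a concrete illustration of these rules, here is a small sketch that computes the four phase times of a task from a given schedule. The array names (t, a, d, s, o, p, r, D) follow the notation reconstructed above and are assumptions, not part of any official interface.

```cpp
#include <bits/stdc++.h>
using namespace std;

long long ceilDiv(long long x, long long y) { return (x + y - 1) / y; }

// Phase times of task i under the rules above (all arrays 1-indexed, names assumed):
// t[i] = chosen start time, a[i] = machine, d[i] = disk, s[i] = task size,
// o[i] = output data size, p[j] = machine power, r[l] = disk rate,
// D[i] = tasks whose output data task i must read.
// Returns {t_read, t_exec, t_store, t_end}.
array<long long, 4> phaseTimes(int i, const vector<long long>& t, const vector<int>& a,
                               const vector<int>& d, const vector<long long>& s,
                               const vector<long long>& o, const vector<long long>& p,
                               const vector<long long>& r, const vector<vector<int>>& D) {
    long long tRead = t[i];
    long long readCost = 0;
    for (int j : D[i])                                  // each term rounded up separately
        readCost += ceilDiv(o[j], r[d[j]]);
    long long tExec  = tRead + readCost;                // execution starts after all reads
    long long tStore = tExec + ceilDiv(s[i], p[a[i]]);  // ceil(s_i / p_{a_i}) to execute
    long long tEnd   = tStore + ceilDiv(o[i], r[d[i]]); // ceil(o_i / r_{d_i}) to store
    return {tRead, tExec, tStore, tEnd};
}
```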
In addition, a feasible task schedule should satisfy the following requirements (a checker sketch follows this list):
- No preemption: Each machine executes only one task or transfers one piece of data at a time, and no interruption is allowed during a task's lifecycle. In other words, for each pair of tasks $i$ and $j$ such that $a_i = a_j$, the two intervals $[t_i^{\mathrm{read}}, t_i^{\mathrm{end}})$ and $[t_j^{\mathrm{read}}, t_j^{\mathrm{end}})$ cannot have any overlap.
- Task dependencies: Each task $i$ has a list of task-dependent tasks, denoted by $T(i)$. For each task $j \in T(i)$, task $i$ cannot start until $j$ finishes its execution. That is, it must satisfy $t_i^{\mathrm{read}} \ge t_j^{\mathrm{store}}$. Note that if $i$ is scheduled on the same machine as $j$, then it still needs to wait for $j$ to complete storing its data.
- Data dependencies: For each task $j \in D(i)$, task $i$ cannot start until $j$ finishes storing its data. That is, it must satisfy $t_i^{\mathrm{read}} \ge t_j^{\mathrm{end}}$.
- Affinity of machines: Each task $i$ has a list of affinitive machines, denoted by $A(i)$, that it can be assigned to. Each task must be scheduled on one of its affinitive machines. That is, it must satisfy $a_i \in A(i)$.
- Disk capacity: Each disk $l$ has a storage capacity of $c_l$. The total size of all data stored on the same disk cannot exceed that disk's capacity. That is, for each disk $l$ it must satisfy $\sum_{i : d_i = l} o_i \le c_l$.
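These requirements can be verified mechanically. Below is a minimal feasibility check under the notation above; it is a sketch rather than the judge's actual checker, and all container names are assumptions.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sketch of a feasibility check for a schedule.  All names are assumptions:
// tRead/tStore/tEnd[i]: phase times computed as in the previous sketch,
// a[i], d[i]: assigned machine and disk, A[i]: affinitive machines,
// T[i]/D[i]: task/data dependencies, o[i]: output sizes, c[l]: disk capacities.
bool feasible(int n, const vector<long long>& tRead, const vector<long long>& tStore,
              const vector<long long>& tEnd, const vector<int>& a, const vector<int>& d,
              const vector<vector<int>>& A, const vector<vector<int>>& T,
              const vector<vector<int>>& D, const vector<long long>& o,
              const vector<long long>& c) {
    // Affinity: a[i] must be one of task i's affinitive machines.
    for (int i = 1; i <= n; i++)
        if (find(A[i].begin(), A[i].end(), a[i]) == A[i].end()) return false;
    // Task dependencies: i may start only after j finishes executing.
    for (int i = 1; i <= n; i++)
        for (int j : T[i]) if (tRead[i] < tStore[j]) return false;
    // Data dependencies: i may start only after j finishes storing its data.
    for (int i = 1; i <= n; i++)
        for (int j : D[i]) if (tRead[i] < tEnd[j]) return false;
    // Disk capacity: the total data placed on each disk must fit.
    vector<long long> used(c.size(), 0);
    for (int i = 1; i <= n; i++) used[d[i]] += o[i];
    for (size_t l = 1; l < c.size(); l++) if (used[l] > c[l]) return false;
    // No preemption: lifecycles on the same machine must not overlap.
    for (int i = 1; i <= n; i++)
        for (int j = i + 1; j <= n; j++)
            if (a[i] == a[j] && tRead[i] < tEnd[j] && tRead[j] < tEnd[i]) return false;
    return true;
}
```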
Given a feasible schedule, the makespan of this schedule is the finishing time of the last task. In other words, $\mathrm{makespan} = \max_{1 \le i \le n} t_i^{\mathrm{end}}$.
Objectives
Your objective is to find a feasible task schedule that minimizes the makespan.
Input
The first line contains an integer $n$ representing the number of tasks. In the following $n$ lines, each line describes a task: the first three integers represent the task id $i$, the task size $s_i$, and the size $o_i$ of its output data. All task ids are distinct integers between $1$ and $n$. The fourth integer is the number of affinitive machines for this task, and the remaining integers on the line are the IDs of these affinitive machines; they are distinct as well.
Next, a line contains an integer $m$ representing the number of machines. In the following $m$ lines, each line has two integers representing a machine id $j$ and its power $p_j$. All machine ids are distinct integers between $1$ and $m$.
Next again, a line contains an integer $k$ representing the number of disks. In the following $k$ lines, each line has three integers representing a disk id $l$, its speed $r_l$, and its capacity $c_l$. All disk ids are distinct integers between $1$ and $k$.
The remaining part describes the two types of dependencies between tasks.
The first line of this part is an integer representing the number of data dependencies. In the following lines, each line contains two integers $i$ and $j$, which means task $i$ is data-dependent on task $j$ (i.e., $j \in D(i)$).
Next, there is a line with an integer representing the number of task dependencies. In the following lines, each line contains two integers $i$ and $j$, which means task $i$ is task-dependent on task $j$ (i.e., $j \in T(i)$).
All numbers on the same line are separated by whitespaces.
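For reference, here is a minimal parsing sketch for this input format, using the notation above; the variable names are my own and the I/O style is only one reasonable choice.

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;                                         // number of tasks
    scanf("%d", &n);
    vector<long long> s(n + 1), o(n + 1);          // task sizes and output data sizes
    vector<vector<int>> A(n + 1);                  // affinitive machines per task
    for (int q = 0; q < n; q++) {
        int id, cnt; long long sz, od;
        scanf("%d %lld %lld %d", &id, &sz, &od, &cnt);
        s[id] = sz; o[id] = od;
        A[id].resize(cnt);
        for (int &x : A[id]) scanf("%d", &x);
    }
    int m; scanf("%d", &m);
    vector<long long> p(m + 1);                    // machine powers
    for (int q = 0; q < m; q++) { int id; long long pw; scanf("%d %lld", &id, &pw); p[id] = pw; }
    int k; scanf("%d", &k);
    vector<long long> r(k + 1), c(k + 1);          // disk rates and capacities
    for (int q = 0; q < k; q++) { int id; long long sp, cap; scanf("%d %lld %lld", &id, &sp, &cap); r[id] = sp; c[id] = cap; }
    vector<vector<int>> D(n + 1), T(n + 1);        // data and task dependencies
    int nd; scanf("%d", &nd);
    while (nd--) { int i, j; scanf("%d %d", &i, &j); D[i].push_back(j); }  // i is data-dependent on j
    int nt; scanf("%d", &nt);
    while (nt--) { int i, j; scanf("%d %d", &i, &j); T[i].push_back(j); }  // i is task-dependent on j
    // ... scheduling logic would go here ...
    return 0;
}
```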
Variable constraints
- number of tasks $n$:
- number of machines $m$:
- number of disks $k$:
- size of a task $s_i$:
- size of the output data $o_i$:
- computing power of a machine $p_j$:
- capacity of a disk $c_l$:
- rate of a disk $r_l$:
- all IDs are consecutive and start from one.
- the dependency graph, which includes both types of dependency edges, is acyclic.
Output
You should output $n$ lines describing the task schedule for all tasks. The $i$-th line should contain four integers $i$, $t_i$, $a_i$, and $d_i$, separated by whitespace, meaning that task $i$ will start at time $t_i$ on machine $a_i$ and store its data on disk $d_i$.
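A correspondingly minimal sketch for emitting the schedule, assuming hypothetical arrays t, a, and d filled by your scheduling logic:

```cpp
#include <bits/stdc++.h>

// Print the schedule: one line per task with its id, start time, machine, and disk.
// t, a, d are hypothetical 1-indexed arrays produced by the scheduling logic.
void printSchedule(int n, const std::vector<long long>& t,
                   const std::vector<int>& a, const std::vector<int>& d) {
    for (int i = 1; i <= n; i++)
        std::printf("%d %lld %d %d\n", i, t[i], a[i], d[i]);
}
```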
Scoring
For each test, the score is your makespan divided by the lower bound, where the lower bound is computed from a formula in which $\mathrm{out}(i)$ means the number of edges that go out from task $i$ in the dependency DAG (in other words, the number of tasks that depend on task $i$). We guarantee that all the cases have feasible solutions, since there is a disk whose capacity is greater than or equal to the sum of $o_i$ over all tasks $i$. If you output an invalid solution, you get zero. Here, the makespan stands for the latest end time over all tasks, given that the earliest start time of any task is zero. The final score is the sum of the scores over all tests.
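As an illustration of this out-degree definition, here is a small sketch that computes it from the two dependency lists of the parsing sketch above; whether duplicate edges are merged is an assumption.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sketch: out[j] = number of outgoing edges of task j in the dependency DAG,
// i.e. how many tasks list j as a (data or task) dependency.  D and T are the
// hypothetical containers from the parsing sketch; if a pair appears in both
// lists it is counted twice here, which may differ from the judge's definition.
vector<int> outDegree(int n, const vector<vector<int>>& D, const vector<vector<int>>& T) {
    vector<int> out(n + 1, 0);
    for (int i = 1; i <= n; i++) {
        for (int j : D[i]) out[j]++;   // edge j -> i (task i reads j's output)
        for (int j : T[i]) out[j]++;   // edge j -> i (task i waits for j to execute)
    }
    return out;
}
```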
Example
At time , task  is scheduled on machine  and stores its output to disk ; it costs  and  units of time in the executing and writing phases, respectively, so it finishes at time . Note that the time cost of the reading phase is zero, since this task is not data-dependent on any task.
At time , task  is scheduled on machine , reads its data from disk , and stores its output to disk . In total it costs  units of time and finishes at time .
At the same time, task  is scheduled on machine  and reads its data from disk . It costs  and finishes at . Note that although the two tasks run on the same machine, this task still needs to read the other task's output data from the disk.
At time , task  starts on machine  and reads the data of tasks  and , which costs  units of time. So the total time cost is  and it finishes at time .
At time , task  starts on machine ; it costs  units of time to read data and another  to execute, and it does not output any data. So, it finishes at time .
At time , task  starts on machine  and finishes at time .
So, the final makespan is  and this is a feasible solution. Let us verify the feasibility of the above solution.
Dependency constraints: In the solution,  starts after ,  starts after , and so on.
Affinity of machines: In this case, the only affinity restriction is that task 6 can only run on machine 1, and in this solution it does.
No preemption: it is obvious.
Disk capacity: (1) the tasks storing data on disk 1 are , and their total data size is ; (2) the tasks storing data on disk 2 are , and their total data size is .