2.1 A Taxonomy of Parallel Computing
Beowulf-class systems — ensembles of PCs (e.g., Intel Pentium 4) integrated with commercial COTS local area networks (e.g., Fast Ethernet) or system area networks (e.g., Myrinet) that run widely available low-cost or no-cost software for managing system resources and coordinating parallel execution. Such systems exhibit exceptional price/performance for many applications.
Superclusters — clusters of clusters, still within a local area such as a shared machine room or in separate buildings on the same industrial or academic campus, usually integrated by the institution's infrastructure backbone wide area network. Although usually within the same internet domain, the clusters may be under separate ownership and administrative responsibilities. Nonetheless, organizations are striving to determine ways to enjoy the potential opportunities of partnering multiple local clusters to realize very large scale computing at least part of the time.
3.3.2.6 Distribute Job Tasks Across Allocated Resources
With the resources selected, Maui then maps job tasks to the actual resources.
This distribution of tasks is typically based on simple task-distribution algorithms such as round-robin or max blocking, but it can also incorporate library-specific patterns (e.g., for MPI or PVM) used to minimize interprocess communication overhead.
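The round-robin distribution mentioned above can be sketched as a simple cycling assignment of tasks to the allocated nodes. This is an illustrative sketch, not Maui's actual implementation; the node names are hypothetical.

```python
def round_robin_map(tasks, nodes):
    """Assign each task to a node by cycling through the allocated node list."""
    return {task: nodes[i % len(nodes)] for i, task in enumerate(tasks)}

# Hypothetical 5-task job mapped onto 2 allocated nodes:
mapping = round_robin_map(range(5), ["node01", "node02"])
# task 0 -> node01, task 1 -> node02, task 2 -> node01, ...
```

A library-aware distribution would instead group communicating ranks onto the same node to reduce interprocess communication overhead, rather than cycling blindly.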
8.3 Node Set Overview
While backfill improves the scheduler’s performance, this is only half the battle.
The efficiency of a cluster, in terms of actual work accomplished, is a function of both scheduling performance and individual job efficiency.
In many clusters, job efficiency can vary from node to node as well as with the node mix allocated.
Most parallel jobs written using popular message-passing libraries such as MPI or PVM do not internally load balance their workload and thus run only as fast as the slowest node allocated.
Consequently, these jobs run most effectively on homogeneous sets of nodes. However, while many clusters start out as homogeneous, they quickly evolve as new generations of compute nodes are integrated into the system.
Research has shown that this integration, while improving scheduling performance due to increased scheduler selection, can actually decrease average job efficiency.
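The slowest-node effect described above can be made concrete with a toy model: since such jobs do not internally load balance, the job's pace is the minimum of its allocated nodes' speeds. The relative node speeds below are hypothetical, chosen only to illustrate why mixing node generations can hurt efficiency.

```python
def job_speed(node_speeds):
    """A job that does not internally load balance runs at the pace of
    its slowest allocated node."""
    return min(node_speeds)

# Hypothetical relative speeds: old-generation nodes run at 1.0,
# new-generation nodes at 2.0.
homogeneous_new = job_speed([2.0, 2.0])   # all-new allocation
mixed = job_speed([1.0, 2.0, 2.0])        # mixed allocation: new nodes wait on the old one
```

Here the mixed allocation wastes half of each new node's capacity, even though the scheduler had more nodes to choose from.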
The Maui Scheduler can be thought of as a policy engine which allows sites control over when, where, and how resources such as processors, memory, and disk are allocated to jobs.
In addition to this control, it also provides mechanisms which help to intelligently optimize the use of these resources, monitor system performance, help diagnose problems, and generally manage the system.
Running multi-site MPI jobs with Maui and MPICH

Two things need to happen in order to run multi-site MPI jobs:
Nodes must be reserved and jobs must be run in a coordinated manner.
Jobs must be started such that they are set to communicate with each other using MPI calls.
The meta scheduling interface to the Maui Scheduler can be used to reserve nodes and start jobs across distributed sites.
MPICH can be used to enable separate jobs to communicate with each other using MPI.
Flow:
A job is submitted to the meta scheduler.
The meta scheduler communicates with separate Maui schedulers to determine node availability.
The meta scheduler starts an individual job at each site. Each job consists solely of an MPICH ch_p4 server process running on each of the job’s nodes.
The meta scheduler creates an MPICH proc group file, with hostname and executable information for each site. An MPICH job is started using the proc group file.
One MPICH process runs on the submitting host.
This process communicates through ch_p4 servers to start MPICH processes on all nodes specified.
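The proc group file created in the flow above follows the MPICH ch_p4 format: one line per host of the form `hostname n_procs executable [login]`, where the first line names the submitting host and its count gives the number of additional processes to start there (0 means only the master process, as described above). A sketch for two sites — hostnames, paths, and the login are hypothetical:

```
local            0
node01.siteA     1  /home/user/bin/myapp  user
node01.siteB     1  /home/user/bin/myapp  user
```

The job is then typically launched from the submitting host with `mpirun -p4pg <procgroup file> <executable>`, after which the master process contacts the ch_p4 servers on each listed node.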
Setup:
A user account must be created at each site. The job executable and data must be created at each site.
MPICH must be installed at each site, and on the submitting host.
It should be configured to use the ch_p4 device. The executable path must be added to the ~/.server_apps file for the user at each site.
The submitting host and user must be added to the .rhosts file for each site.
A meta job specification, detailing the username and executable name for each site, should be created.
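The two per-site authorization files mentioned above might look as follows; the hostname, username, and executable path are hypothetical. The `~/.server_apps` file lists the executables the ch_p4 server is allowed to start for this user:

```
/home/user/bin/myapp
```

The `.rhosts` file grants the submitting host and user access at each site:

```
submit.example.org user
```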
TORQUE for job submission
Maui for job scheduling
1 Overview
1.1 Queue Structure
1.2 Node-Queue Matrix
2 Job Submission
2.1 Example Interactive Job
2.2 Example Script
3 Job Control
4 Examples
4.1 Serpent2
4.2 Nuclear data
4.3 MCNPX
4.4 MCNP5
4.4.1 MPI Only
4.4.2 OpenMP Only
4.4.3 MPI and OpenMP
4.5 MCNP6.1
4.6 MCNP6.2
4.7 MCNP: Delete unneeded runtapes
4.8 Scale
4.9 Advantg
5 FAQ
5.1 How can I set up a unique temporary directory for my job?
5.2 I’m not getting error/output files!
5.3 How can I request different CPU counts on different nodes/How can I use multiple queues?
5.4 How can I submit a job to a specific node?
5.5 How can I ensure that I have enough local disk space (in /tmp)?
5.6 I messed up my node allocation request! How do I fix it?
5.7 Where are my jobs?
5.8 Admin stuff