Slovenska bibliografija

NUK

napredno iskanje

domov > napredno iskanje > rezultati > izpis

Zapis SUTRS

VRSTA GRADIVA	analitična raven (sestavni del), tekstovno gradivo, tiskano, 1.01 - izvirni znanstveni članek
DRŽAVA IZIDA	Slovenija
LETO IZIDA	2003
JEZIK BESEDILA/IZVIRNIKA	slovenski
PISAVA	latinica
AVTOR	Kvas, Aleksander - avtor
ODGOVORNOST	Ojsteršek, Milan - avtor
NASLOV	Izboljšava algoritmov statičnega razvrščanja znestrogim proženjem vozlišč programskega grafa
V PUBLIKACIJI	Elektrotehniški vestnik. - ISSN 0013-5852. -ǂLetn. ǂ70, ǂšt. ǂ3 (2003), str. 143-148.
KRATKA VSEBINA	Za doseganje visoke zmogljivosti paralelnega računalnika je zelo pomembno učinkovito razvrščanje paralelnih programov naprocesne enote. Najpogosteje uporabljana oblika, v katero pretvorimo programe, ki se bodo izvajali na večprocesorskem računalniku, so programskigrafi. Vozlišča programskih grafov razvrstimo na procesne enote paralelnegaračunalnika s pomočjo algoritma za razvrščanje. Opazimo lahko, da vsa vozlišča ne potrebujejo vseh vhodnih operandov ob začetku njihovega izvajanja. Zato lahko takšna vozlišča razvrstimo tako, da izvajanje začnejonekoliko prej. Takšno proženje vozlišč imenujemo nestrogo proženje. Pri nestrogem proženju dobijo vozlišča manjkajoče operande kasneje, med izvajanjem. V članku so ovrednoteni izboljšani algoritmi CPM, VL in DSH. Glavna izboljšava je uporaba nestrogega proženja grobo razčlenjenih vozliščprogramskega grafa. Ugotovili smo, da je bil pri vseh izboljšanih algoritmih dosežen krajši ali enak čas izvajanja, in sicer od 0% do 68%. //Obtaining maximum performance from parallel distributed memory machines with a large number of processors depends on partitioning parallel programsinto modules and scheduling these modules for the shortest possible execution time. The scheduling problem is known to be NP-hard [2] and heuristic algorithms have been proposed to obtain optimal and sub optimal solutions [1 ,3, 10, 11]. In this paper, we present three efficient static algorithms for scheduling modules to the processing units of a parallel computer system. They are based on CPM [14], VL [5] and DSH [6,1] scheduling algorithms. We improved them with nonstrict trigerring of coarsegrained program graph nodes. The partitioning algorithm [12,9] partitions an application into tasks with an appropriate grain size and represents them in the form of a program graph (PG). If the sum of communication delays for transferring operands between nodes is greater than the sum of their execution times, it is possible to achieve faster execution time by joining them in a larger program grain. It is interesting that joining small grains into larger ones enables the processing element to start execution of a new grain without all input operands. We call this kind of execution a nonstrict triggering of program graph nodes. An example of a nonstrictly triggered node is shown in Fig. 1. We introduced two new attributes for each communication arc. ▫$C_s$▫ represents the operand's relative sending time. It defines the time from the moment when the operandis sent to the end of a node execution. ▫$C_r$▫ represents the operand's relative receiving time. It defines the time from the beginning of a program graph node execution to the moment when this operand must be present to avoid delaying the node's execution. We present our model of a macro dataflow computer (MMDC, Fig. 2) which supports nonstrict triggering of coarse grained program graph nodes. The MMDC is a loosely coupled multi processor with processing elements (PE) and an interconnection network. Every PE (Fig. 3) contains an input queue, local multiport memory, memory processor, scheduling processor and node execution processor. Improved scheduling algorithms were evaluated by simulations of the program graphs execution on thc MMDC. The original CPM algorithm computes the priority weight (the length of the exiting path) of each node without using any communication delay time for transferring operands between nodes. We defined a new priority weight. It is computed by using the execution time of a node and communication delay times ▫$C_s$▫ and ▫$C_r$▫ of its input and output operands. The length of the exiting path for node ▫$n_i$▫ ▫$(CP(n_i$▫)) is computed according to Eq. (1). We improved the VL algorithm optimisation phase that rearranges nodes by considering communication delay times ▫$C_s$▫ and ▫$C_R$▫. Original equations [5] that are used to compare the communication and execution time between nodes contain simple communication time delays. If we use nonstrict triggering, we can change the communication delays according to Eq. (2). In equations we then use ▫$C_u$▫ instead of C. The program graph and Gantt charts produced by the original and improved algorithms were used as simulation inputs. A comparison of ratios between the total execution times of the program graph with the usage of the original CPM algorithm and improved scheduling algorithms is depicted in Fig. 4. It is shown that all the improved scheduling algorithms outperform the original CPM algorithm. Improvements range from 35% to 68%. From Fig. 4 we can see that the improved DSH algorithm gives the best improvement but has the highest computational complexity (▫$O(n^6▫$)). Execution times ratios for other twoalgorithms are shown in Figs. 5 and 6. We can see that all the improved algorithms generate shorter or equal execution times than original algorithms (▫$T_{cpmo}/T_{cpmi}$▫, ▫$T_{vlo}/T_{vli}$▫, ▫$T_{dsh}/T_{dshi}$▫). The VL algorithm gave better results than the improved CPM algorithm, but it has a higher computational complexity than the CPM algorithm. We observed that improvements of our algorithms are better if the level of granularity is lower than one. If the level of granularity is greater than one, then improvements are only few percents. In that case we use the improved CPM algorithm because it has the lowest computation complexity ▫$(O(n^2$▫).
OSTALI NASLOVI	Improvement of efficiency of static program graph scheduling with nonstrict triggering
PREDMETNE OZNAKE	// paralelni računalniki // večprocesorski računalniški sistemi // al

izvedba, lastnina in pravice: NUK 2010