P-GRADE Portal
An introduction without tears
0 Preface
i. Release history
Release notes to Version 2.5
New features
- Version 2.5 achieves the long-expected major advance in the evolution of the P-GRADE Portal: it supports true exploitation of the Grid by enabling the automated mass execution of workflows in the framework of a Parameter_Study. This is a handy method for every scientist who wants to apply the "define once, use everywhere" principle to accelerate research that investigates how results depend on a large domain of predefined input parameters. Several new concepts have been introduced, such as the PS_Input port, the PS_Output_port and the e-Workflow, together with the new user convenience jobs Generator and Collector.
- Workflow submission has become even more comfortable and flexible through two new general features:
  - The user can suspend (and resume) the execution of a workflow. This is an often requested feature when, for example, the selected Grid resource seems too slow to perform the needed job and a rescue would be advisable.
  - From now on the user does not need to wait until the last job of his/her workflow terminates to get partial results. The results of terminated jobs can be downloaded immediately.
Known bugs
- In some cases the statistics of the PS-Workflow (see the section Statistics of Figure 14.22) may show wrong information about the number of e-Workflows in the Finished and Init states: instead of being counted as Finished, some workflows may be erroneously added to the sum of e-Workflows in the Init state. However, this error has no influence on the execution or on the expected result of the PS-Workflow.
- We have experienced randomly occurring conflicts between the P-GRADE Portal Server and Internet Explorer 7. In the observed cases the P-GRADE Portal displays an exception, but no data loss occurs. The exception can be removed by clicking on some other button in the user interface. We recommend using Internet Explorer 6 or Firefox.
Release notes to Version 2.4.1
- Possibility to store the data of the end users in reliable databases: it has turned out that the default Hibernate function (HSQL) supported by GridSphere is error prone, and in some cases the logging data of the users were lost after the Portal was restarted. Therefore, from this release on, the Portal administrator can define and set up an external database for storing the log information.
- The data transfer load of the information system has been substantially reduced: the BDII server is queried for data only on user request.
- A new job submission strategy has been introduced that observes the current load of the portal server and therefore ensures a tolerable response time for the user.
- The portal server's own data resource handling has been reconsidered. In connection with this, the redundant storage of workflow results has been eliminated and a more accurate quota handling has been implemented.
- A bug has been fixed that occurred during the concurrent uploading and downloading of proxy certificates. This failure typically occurred in guided practice sessions, when many users executed the same command within a short time.
- Automatic VOMS extension of certificates has been introduced.
- Jobs can be submitted to VOs via the gLite infrastructure as well.
Release notes to Version 2.4:
New features and improvement of services:
Revision of remote file handling: user option for non-automatic copy to the worker node (see managed copy).
Revision of rescue handling: the new functionality covers all types of resources, including submissions to a Broker.
Enhancement of the verbosity level, localization and accuracy in forwarding errors that occur in the grid infrastructure.
Protection of the Portal server by a changeable limit on the number of jobs submitted and observed at one time.
Revision of MPI job handling: a totally new middleware ensures (and, in defined circumstances, guarantees) the success of MPI job submissions.
Bug fix:
Total revision of the low-level script layer
Solving the memory leak problem of the visualization
Known bugs:
B.1
The LDAP server sometimes delivers hosts to the information system that reference a common cluster under different hostnames within a given site. As the information system has no additional knowledge to unify these clusters, the aggregated data gained from the component CEs sometimes show a multiple of the real values.
B.2
The sites of the selected VO in the overview window of the Information System display even those jobs that do not belong to the selected VO.
B.3
If several Workflow Editor windows are open on the user's desktop, the "old" windows tend to become zombies (insensitive to user commands and losing their connection to the server).
Release notes to Version 2.3:
New features:
Extended, user-individual quota handling
Full archive facility for generated workflows (See chapter XIII_Workflow_archive service)
Release notes to Version 2.2:
New features:
Separation of external and internal file name references in the input/output ports of the jobs (see No more restriction on file references)
Connecting the Portal to the EGEE Grid and, in this case, exploiting the Broker service of the EGEE Grid for the jobs of a workflow directed to this grid (see chapter X_Connection_to_the_EGEE_Grids)
Fault tolerant behaviour of workflows (see chapter XI_Rescuing_the_workflow)
Welcome menu to change the default settings of personal user data (see chapter XII._Welcome_Menu)
Release notes to Version 2.1:
New features:
This documentation includes the new features of Version 2.1, highlighted in the chapters VI_Multi-GRID_support, VII Information System, VIII_Handling_of_remote_files and IX User quotas.
Deleted features:
The Copy and Paste operations of the Workflow Editor, considered unimportant and error prone, have been deleted.
Bug fixes:
Edited workflows in a transient (incomplete) state can be stored in the PORTAL and retrieved for further editing.
ii. Introduction
The mission of the P-GRADE Portal is to give user-friendly access to Grid resources, a technology in rapid evolution.
This evolution is "mapped" in the Portal, which offers general low-level solutions for simple Globus Grids and high-level solutions for modern, sophisticated Grids like EGEE. Throughout this paper you will find descriptions of general low-level solutions as well as special considerations referring only to the EGEE Grid.
As the P-GRADE Portal is a multi-Grid portal, able to connect Grids of different kinds, a substantial effort has been made to keep the functionalities of the Portal as orthogonal as possible. However, at some points the different aspects, conditions and possibilities of the EGEE Grid had to be mentioned within the general text.
I The aim
The P-GRADE Portal offers a comfortable method of handling workflows from any connection point of the World Wide Web.
The P-GRADE Portal covers several Cluster and Grid related technologies (GLOBUS2, GLOBUS3, Condor, Condor-G, Condor DAGMan, PVM, MPI) to meet the needs of users who intend to access remote computational resources, and it hides the difficulties of activating them.
If you are impatient with details, or if you are a hardened GLOBUS professional with bad nerves, you can get a head start with chapter IV, where the usage of the Portal is explained through a comprehensive example.
A Workflow is a bundle of jobs that you want to edit, launch and observe on remote computer resources where access rights have been granted to you by so-called certificates.
Technically, a Workflow is a directed acyclic graph (DAG) where each node has a computing resource and a program (job) to be launched on that resource; the edges of the graph are the "information pipelines" (streams) which connect the input and output points (ports) of the individual jobs. (See Figure_1.)
Jobs are executable (sequential or parallel) applications represented by their binary code.
A node is a wrapper of a job containing the references to its executable code, to its I/O connections and to its resource. (See Figure 16 for an outer and Figure_18 for an inner view of a node.)
The input connection points of the nodes (we will use the term port interchangeably with the term "point" for input and output connection points) that are not connected to any output point of any other node represent the input files of the whole Workflow. The output points of the nodes that do not serve as inputs to any other node represent the output files of the Workflow.
(Note that any internal pipeline (stream) can be marked as either volatile or permanent; in the latter case the data flowing through it will be regarded and recorded as an output_file of the Workflow, see Figure_24.)
The task of the Workflow is to generate OUTPUT files from the INPUT ones.
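The rule for deciding which ports stand for the INPUT and OUTPUT files of the Workflow can be illustrated with a short Python sketch. It is not the Portal's code: representing ports as (job, port) pairs, streams as (source output port, target input port) pairs, and the `permanent` set are assumptions made for this illustration, and the job names borrow the "Budapest", "Paris", "London" example given in this section.

```python
from typing import AbstractSet, Set, Tuple

Port = Tuple[str, str]    # (job name, port name): hypothetical representation
Edge = Tuple[Port, Port]  # (source output port, target input port)


def workflow_files(input_ports: AbstractSet[Port],
                   output_ports: AbstractSet[Port],
                   edges: AbstractSet[Edge],
                   permanent: AbstractSet[Port] = frozenset()) -> Tuple[Set[Port], Set[Port]]:
    """Return (workflow INPUT ports, workflow OUTPUT ports) of a workflow graph."""
    fed_inputs = {target for _, target in edges}    # inputs fed by another node
    used_outputs = {source for source, _ in edges}  # outputs feeding another node
    # Unconnected input ports stand for the input files of the whole Workflow.
    inputs = set(input_ports) - fed_inputs
    # Output ports feeding no other node stand for its output files,
    # plus internal streams that the user marked as permanent.
    outputs = (set(output_ports) - used_outputs) | (set(permanent) & used_outputs)
    return inputs, outputs


jobs_in = {("Budapest", "in0"), ("Paris", "in0"), ("London", "in0"), ("London", "in1")}
jobs_out = {("Budapest", "out0"), ("Paris", "out0"), ("London", "out0")}
streams = {(("Budapest", "out0"), ("London", "in0")),
           (("Paris", "out0"), ("London", "in1"))}

inputs, outputs = workflow_files(jobs_in, jobs_out, streams)
# inputs  == {("Budapest", "in0"), ("Paris", "in0")}
# outputs == {("London", "out0")}
```

Marking the Budapest to London stream as permanent would add ("Budapest", "out0") to the outputs without changing the inputs.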
There are several subtle points to emphasize:
- Two jobs of separate nodes can be executed in parallel if they are independent, i.e. neither plays a role in generating the inputs of the other, AND all their inputs (represented by the edges of the graph) are available: either they have been prepared by the successful termination of the preceding jobs in the calling sequence defined by the directed graph, or they have been defined as original input files of the whole Workflow. Simply put, the structure of the directed graph determines the order of execution.
- You must separately assign computing resources (from the list of available ones, see Setting_the_resources) to the individual jobs, so you can instruct the workflow to execute at the same time, for example, the job "Budapest" at a site in Hungary and the job "Paris" at a site in France, and to gather their outputs in a third job "London" at a site in Britain.
- The resources must be part of a Grid implementing the GLOBUS2 middleware and must accept the credentials that the users of the P-GRADE Portal have obtained.
Figure 1
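The execution-order rule in the list above can be sketched in Python as a level-by-level topological sort (Kahn's algorithm). This is an illustrative assumption, not the Portal's scheduler: jobs are grouped into "waves" whose members have all their inputs available and may therefore run in parallel, the edges are hypothetical (producer, consumer) job pairs, and every job is assumed to terminate successfully.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def execution_waves(jobs: Set[str], edges: Set[Tuple[str, str]]) -> List[List[str]]:
    """Group jobs into waves; jobs within one wave are independent.

    edges holds (producer, consumer) pairs: the consumer needs an output of the producer.
    Raises ValueError if the graph contains a cycle, i.e. it is not a DAG.
    """
    pending: Dict[str, int] = {job: 0 for job in jobs}  # producers not yet finished
    consumers: Dict[str, List[str]] = defaultdict(list)
    for producer, consumer in edges:
        pending[consumer] += 1
        consumers[producer].append(consumer)

    ready = sorted(job for job, count in pending.items() if count == 0)
    waves: List[List[str]] = []
    scheduled = 0
    while ready:
        waves.append(ready)
        scheduled += len(ready)
        next_ready: List[str] = []
        for job in ready:  # assume each job of the wave terminates successfully
            for consumer in consumers[job]:
                pending[consumer] -= 1
                if pending[consumer] == 0:  # all inputs of the consumer are now available
                    next_ready.append(consumer)
        ready = sorted(next_ready)
    if scheduled != len(jobs):
        raise ValueError("workflow graph contains a cycle")
    return waves


waves = execution_waves({"Budapest", "Paris", "London"},
                        {("Budapest", "London"), ("Paris", "London")})
# waves == [["Budapest", "Paris"], ["London"]]
```

Grouping into waves is a simplification chosen for readability; an executor can start each job as soon as its own inputs are ready, without waiting for the rest of its wave. The final check shows why the graph must be acyclic: jobs on a cycle never become ready.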
II The Players of the PORTAL infrastructure and their identifications
1. The Players
Now let us summarize the main actors participating in the handling of the workflows (see Figure_2).
1.1 The user's desktop machine
You need an Internet-connected desktop machine with a browser that is able to access the WWW.
Please note that the user works with two different user interfaces in parallel when using the P-GRADE Portal:
1.2 The Portal server
There is a remote Portal server which you can access by a browser. This server is used to store your code, program data (first of all local input_files), the graphs of the workflows, the list of the defined resources and the living short-term (proxy) certificates.
From here you can download your workflows to edit them, from here you can launch your workflows, and the results can be downloaded from here as well.
The data stored in the Portal server on behalf of a single user are restricted by the user quotas.
1.3 The set of remote resources (the GRID)
The most important part of the infrastructure is the set of remote computational resources (generally computer clusters) where the jobs may actually run.
The resources are subordinated to Grids. See more details in paragraph Setting: Defining the resources.
Complex Grids may subdivide the set of users and the resources accessible by them into virtual organizations (VOs). However, this mapping may be overlapping: a user and a resource may belong to more than one virtual organization of the Grid.
In these grids the access right represented by a user certificate may be associated with one (or more) virtual organization(s) and not with the whole Grid.
The EGEE Grid requires that the user be registered in one VO. There is a general rule that a user must belong to just one VO.
The registration procedure and policy are VO dependent and not covered in this paper.
Resources are abstractions associated with sites, which perform the task of a given resource.
In the EGEE Grid a site may serve resources belonging to different virtual organizations. These resources are not only computational resources (see Computing Element) but storage resources (see Storage Element) as well. The resources of a site may be shared by different virtual organizations. However, user access to a resource is granted only with a valid VO membership reference.
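The access rule just stated (a resource may be shared by several VOs, but a user reaches it only through a valid VO membership) amounts to a set intersection. In this Python sketch the `Resource` class, the host name and the VO name `othervo` are hypothetical; `gilda` is the VO mentioned later in this document.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Resource:
    """A computing or storage resource of a site (hypothetical model)."""
    host: str
    vos: Set[str] = field(default_factory=set)  # VOs sharing this resource


def may_access(user_vos: Set[str], resource: Resource) -> bool:
    """A valid membership in at least one VO served by the resource is required."""
    return bool(set(user_vos) & resource.vos)


ce = Resource("ce01.example.org", {"gilda", "othervo"})
print(may_access({"gilda"}, ce))  # True
print(may_access(set(), ce))      # False: no VO membership, no access
```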
Basically, the default resources are set by the system administrator in a static way. These data may be inherited by common users and can be extended or changed at will.
Therefore these settings may not correspond to the actual state of the Grid. The portlet Information_System is used to gain actual data about the Grid.
For the time being there is only a restricted facility in the P-GRADE Portal allowing the automatic setting of resources found by the Information System (see the button Load resources from MDS2 in Figure 12b).
1.4 The Certificate Server (MyProxy)
Finally we mention an administrative player, the Certificate server, which is a repository of "certificates".
A certificate is a virtual identity card granting access to a set of resources. Certificates must be signed by a trusted Certificate Authority (CA).
To understand the importance of this last player, note the following: these players (the user, the Portal_server, the resources) are connected through an unreliable channel, the Internet; therefore they have to build secure connections to identify themselves and to have sufficient protection from unjustified access. These rather complicated tasks are executed with the help of certificates, which act like identity cards granting access to an expensive resource only for a limited amount of time.
Your previously obtained personal certificate, containing your personal data (distinguished_name, your public_key, the expiration date of the public key, the name of the CA) as unencoded open data, must have been issued and "signed" by a trusted Certificate_Authority to identify you.
The distinguished name contains the family and given name, the organisation unit and the organisation of the user, introduced by standard prefixes (CN=, OU=, O=).
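To illustrate the prefixes, the Python sketch below splits a distinguished name into its fields. It assumes the slash-separated notation used, for example, by Globus command-line tools (such as `/O=Grid/OU=Physics/CN=Jane Doe`); the sample name is invented, and values that themselves contain a `/` are not handled.

```python
from typing import Dict, List


def parse_dn(dn: str) -> Dict[str, List[str]]:
    """Split a slash-separated DN into prefix -> list of values.

    Values are lists because a prefix such as OU= may occur more than once.
    """
    fields: Dict[str, List[str]] = {}
    for part in dn.strip("/").split("/"):
        key, _, value = part.partition("=")  # split at the first '=' only
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


fields = parse_dn("/O=Grid/OU=Physics/OU=Budapest/CN=Jane Doe")
# fields["CN"] == ["Jane Doe"], fields["OU"] == ["Physics", "Budapest"]
```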
The public key (PK) is the binary code with the help of which a message previously encoded by the secret key (SK) can be decoded:
message = decode( PK, encode( SK, message ) )
Each agent (the users and the Certificate_Authority) publishes its own public_key and hides its own secret_key.
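The relation message = decode( PK, encode( SK, message ) ) can be checked numerically with textbook RSA. The Python sketch below uses the classic toy key p = 61, q = 53 (so n = 3233, e = 17, d = 2753). Such small numbers are completely insecure, and real certificates use far larger keys together with hashing and padding, so this only illustrates that the two keys invert each other. It needs Python 3.8 or later for the modular inverse `pow(e, -1, m)`.

```python
# Textbook RSA with a toy key: insecure, for illustrating the formula only.
P, Q = 61, 53
N = P * Q                          # 3233, part of both keys
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # secret exponent (2753), Python 3.8+

PK = (E, N)  # published
SK = (D, N)  # kept secret


def encode(secret_key, message: int) -> int:
    """Encode an integer message (0 <= message < N) with the secret key."""
    d, n = secret_key
    return pow(message, d, n)


def decode(public_key, cipher: int) -> int:
    """Decode with the public key; inverts encode() for the matching key pair."""
    e, n = public_key
    return pow(cipher, e, n)


message = 65
assert decode(PK, encode(SK, message)) == message
```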
Technically, the signature is an additional text in your certificate file containing the open_data processed in three steps:
The MyProxy Certificate Server stores the public key of the Certificate_Authority (in the form of a special certificate) and is therefore able to decipher your public key, vouch for you and represent you against a third party, which in our case is a remote resource.
The representation happens by issuing a short-term, so-called proxy certificate signed by the "MyProxy" Certificate server.
This representation is needed because the resources do not accept the personal certificates directly.
This delegation method has four advantages over the direct use of personal certificates:
- The user needs to go through the rather complicated process of handling the personal certificate very seldom: generally the first time he/she uses the P-GRADE Portal, when the long-term certificate expires, or when he/she obtains other certificates granting permissions to additional resources (see more at Uploading_a_personal_certificate).
- The proxy certificate is a short-term one, securing both the user and the administrator of the used resources against the consequences of infinite loops resulting from program errors, and against unwanted tampering or intrusion by a third party (see how to obtain a short-term proxy).
- The resources need not know the CAs of the users' personal certificates, just the single CA of the proxy server. The duty of maintaining the list of acceptable CAs can be delegated to the administrator of the MyProxy server.
- The generation of a proxy certificate, issued by a secure machine, does not need the "pass phrase" by which a person proves his or her right to the private key file; this pass phrase is needed only during the personal certificate upload process.
2. The identifications
To handle the agents of the P-GRADE Portal environment there are four different kinds of identification of interest from the viewpoint of the users:
2.1 User against the Portal Server
2.2 User against own "userkey" file
2.3 User against the Certificate Server
The third kind of identification is associated with a certificate account of your personal certificate on the certificate server MyProxy.
You use this identification if
2.4 User against the Virtual Organization
III. Overview of the operation of the PORTAL
A possible full operation cycle is the following scenario:
0. Preparation:
Notes:
- Users may receive an account from the Administrator of the Portal (see the separate document "Short Administrator's Manual of the P-GRADE Portal").
Since Release 2 of the P-GRADE portal the account includes a user quota.
- Before starting a P-GRADE Portal session the user must have already compiled and, hopefully, tested all the executable jobs intended to form a workflow. In addition, the user must also have the references to the necessary input files.
The executables of the jobs must be in the local file system of the user of the portal. Here is a very important point to be emphasized:
In obsolete versions of the P-GRADE Portal the jobs of a workflow could contain only local file references, with the consequence that the Portal_server had to copy and store these files on the host where the Portal_server runs. Now this limitation is lifted for the input files: the user can define an input_file by its proper URL when the referenced resource infrastructure supports this feature. This enables late binding, i.e. the Portal_server does not need to store and relay the files defined this way, just their addresses. See more details in VIII_Handling_of_remote_files.
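The distinction between local references (copied to and relayed by the Portal_server) and URL references (bound late, only the address is stored) can be sketched in Python with the standard `urllib.parse` module. The set of scheme names treated as remote is a hypothetical placeholder: which protocols are actually accepted depends on the referenced Grid infrastructure.

```python
from urllib.parse import urlparse

# Hypothetical list of schemes that the Grid side can fetch by itself.
REMOTE_SCHEMES = {"gsiftp", "lfn", "http", "https"}


def needs_portal_copy(reference: str) -> bool:
    """True: local file, the Portal_server must store and relay a copy.
    False: URL reference, only the address is kept (late binding)."""
    return urlparse(reference).scheme.lower() not in REMOTE_SCHEMES


for ref in ["/home/user/input.dat", "gsiftp://se.example.org/data/input.dat"]:
    print(ref, "->", "copy via Portal" if needs_portal_copy(ref) else "late binding")
```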
0.2 Users of the EGEE grid
Beyond everything described in the previous point, EGEE users must be members of virtual organisations. There is a general rule that a user must belong to just one VO.
Generally a user certificate is required for VO membership registration. This certificate must be trusted by the Grid the VO belongs to.
SPECIAL WARNING to the users of the virtual organisation Gilda, and to the users of other VOs requiring certificates with VOMS extension:
The EGEE Grid community is in transition from using the simple Grid certificate to using certificates that include VO-specific extensions (VOMS). This enables more reliable and secure access to more than one VO with one certificate.
However, the VOMS-related extension of the MyProxy service has not been finished yet, and the API interface to the MyProxy service is error prone.
The immediate consequence is that the Certificate/upload functionality (see Figure 2 and Section 1 Uploading_a_personal_certificate) cannot be executed within the Portal for the time being.
The suggested workaround is to issue the following command on a UIF machine belonging to the given VO, where the valid certificate of the user is already installed:
myproxy-init --voms <VO> -s <Host_of_MyProxy_Server> -p <Port_of_MyProxy_Server> -l <Proposed_user_account_name_on_MyProxyServer>
Example:
./myproxy-init --voms gilda -s grid001.ct.infn.it -p 7512 -l myGildaCert
where "myproxy-init" must be the special updated command (written by the Gilda people) that accepts the "--voms" parameter without complaining.
Please note that the command prompts:
- the first time for the passphrase of the certificate (belonging to the secret key file, see chapter 2.2)
- the next time for a password of the new certificate account <Proposed_user_account_name_on_MyProxyServer> (see chapter 2.3)
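If the command has to be issued repeatedly, for instance from a script on the UIF machine, its argument list can be assembled programmatically. The Python sketch below only reproduces the options and example values shown above (`--voms`, `-s`, `-p`, `-l`); the helper name `myproxy_init_command` is hypothetical, and running the command stays interactive because of the two prompts just described. `shlex.join` requires Python 3.8 or later.

```python
import shlex
from typing import List


def myproxy_init_command(vo: str, host: str, port: int, account: str,
                         executable: str = "./myproxy-init") -> List[str]:
    """Build the argument list of the VOMS-aware myproxy-init call shown above."""
    return [executable, "--voms", vo, "-s", host, "-p", str(port), "-l", account]


cmd = myproxy_init_command("gilda", "grid001.ct.infn.it", 7512, "myGildaCert")
print(shlex.join(cmd))
# prints: ./myproxy-init --voms gilda -s grid001.ct.infn.it -p 7512 -l myGildaCert
# To actually run it (interactive prompts follow): subprocess.run(cmd, check=True)
```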
Figure 2
1. Uploading a personal certificate
By "Certificate/upload" (Figure_2) the user sends a personal certificate to the Certificate_server and establishes a certificate account.
This step happens rather seldom, because the expiration time of personal certificates is fairly long.
The uploading process is a rather complicated transaction, started from Figure 5 and explained in detail in Chapter IV 2.1.
The upload creates a certificate account for the certificate, and the user must remember its name and password for the subsequent proxy generations (see also Chapter II 2.3).
2. Receiving a short-term (proxy) certificate
3. Setting: Defining the resources
By filling in a simple table of the Portal_server the user can define the URL and the access method of the basic services of the remote resources where the jobs may run (see also Figure 12a and Figure 12b). See the detailed steps of the definition at resource definition.
If the selected GRID has an information system, the information system may automatically explore the possible sites and services. See: VII_Information_System
The user need not bother with the definition (finding) of resources and with connecting them to the jobs in the special case when she or he has access to an EGEE-like Grid, because in this case the Broker service does this task. See more details in Chapter X_Connection_to_the_EGEE_Grids_and_the usage of the Broker.
4. Defining a workflow
The user can create new workflows, and load and archive existing ones.
Please note that the creation process is done with a SEPARATE program in a different window (the Workflow_Editor), which is downloaded from the Portal_server and runs on your desktop. This has two consequences:
- If you have created or modified a Workflow application you must use the "EDITOR/save|upload" (Figure_2) command to send it from the temporary storage of your desktop to its final storage on the Portal_server. Similarly you may use the EDITOR/Upload (Figure_2) command to send local input_files and the code of the executables to the Portal_server.
There is only a slight difference between save and upload: save refers only to the skeleton file defining the Workflow, while upload refers to the local files mentioned in a Workflow description. You need not bother with this: the system prompts you to upload the referenced files when there is no copy of them on the Portal_server.
In the other direction you may use the EDITOR/open command to download an existing Workflow from the Portal_server in order to view or modify it.
- As the Portal_server (connected to your desktop by HTTP) and the involved portlet Workflow/Workflow_Manager work asynchronously with the Workflow_Editor running on your desktop, you need to hit the Refresh button (visible, for example, in Figure 35) whenever you want to see the current state of the Portal_server. (For new users it may be a little confusing that a newly saved workflow does not appear in the list of executable workflows until they click on "Refresh".)
- A previously uploaded Workflow can be saved from the Portal_server to the user's desktop machine by the Workflow/Storage/Download command (Figure_2). See chapter XIII_Workflow_archivation for details.
- From an archive on the user's desktop machine the user can reload a workflow into the Portal_server by the Workflow/Upload command (Figure_2). See chapter XIII_Workflow_archive for details.
There is another important and recommended way of defining the workflows:
They can be imported from the P-GRADE development tool (P-GRADE). This way has some advantages over editing the Workflow manually in the P-GRADE Portal:
- The code of the workflow (the component nodes with all input_file mappings and the graph of the workflow) can be developed and tested with the debugging and graphic monitoring facilities of P-GRADE.
- To facilitate graphic monitoring of the running application, the source code can be instrumented automatically in P-GRADE, a process that inserts trace-sending code conforming to the common graphic monitoring standards of P-GRADE and of the current P-GRADE Portal.
In the Workflow EDITOR program you use the menu item Workflow/Import workflow (see 4.1.1.2_Import_process) to open the file browser "Import Workflow", in which you can search for the needed workflow files, distinguished by the name extension ".wrk".
To learn more about P-GRADE please consult P-GRADE.
4.1 Short introduction to the Workflow Editor
You have already learned that the Workflow_Editor is a separate graphic program, which can be started from the Workflow/Workflow_Manager portlet by the button Workflow Editor of the Workflow tab, and which runs on the desktop of the user.
In short, the Workflow_Editor can create, modify and save a workflow. Chapter IV contains a rather long introduction to the use of the Workflow_Editor; here is only a short summary of its most important menu items.
4.1.1 Workflow creation
Workflow creation is the process by which we define a new workflow on the P-GRADE Portal. The creation may be an interactive building process or an import process:
4.1.1.1 Interactive building process
A new workflow can be created within a newly opened window (as you see in Figure_14) or within an existing copy (see Figure 32) of the Workflow Editor program.
With the menu item Workflow/New you may create a new, empty workflow.
By the subsequent application of Workflow/New job and Workflow/New port you can build the proper parts of the graph of the workflow (see Figure_15).
4.1.1.2 Import process
A whole workflow previously built and tested by the application P-GRADE can be imported from the desktop machine, with all of its dependent parts, by the menu item Workflow/Import Workflow (see Figure32). The selected menu item opens a file browser in which you can select a workflow file with the file type extension wrk, which will then be uploaded to the Portal_server.
This workflow will behave exactly the same way as the workflows you have manipulated manually. However, in most cases you only need to check the destination resources of the component jobs.
4.1.2 Workflow saving
A newly created workflow has no name. It must be saved by the menu item Workflow/Save as (see EDITOR/Save|Upload on Figure_2). This command has two effects: it uploads the workflow with its user-defined name to the workflow repository of the Portal_server, and it puts the workflow in the launch list of the portlet Workflow Manager. After any modification of the workflow (see 4.1.3_Workflow_modification_) the menu item Workflow/Save has the same effects.
If the saving process finds that any of the files referenced in the description of the saved workflow have not yet been uploaded to the Portal_server (or are not valid, see later), it prompts the user to enable the start of the automatic upload process. Therefore the menu item Workflow/Upload is seldom issued manually (see Figure_32).
4.1.3 Workflow modification
Any part of a saved (or recently created) workflow can be modified:
of the ports (see details at Figure 21 and at the subsequent Learning Notes on Port Properties).
To make these changes the user needs access to the workflow (i.e. needs to download it from the Portal_server to the desktop) through the menu item Workflow/Open.
The user selects the needed Workflow from the Workflow Repository of the Portal_server.
If the user changes any file reference during the modification process (even if he/she then restores the previous text), a hidden marking records the event and the previous file reference is invalidated, with the consequence that after the subsequent Workflow/Save command the user is prompted to enable the needed upload. In short, the system automatically maintains data consistency between the definition environment (desktop machine) and the Portal_server, and the user is exempted from the duty of deleting obsolete files from the Portal_server (see details at Figure 33 and the subsequent Learning_Notes_on_Upload_File).
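The bookkeeping just described can be modelled with a small Python class. This is a hypothetical sketch, not the Workflow Editor's code: each file reference is keyed by a port name chosen for the example, any edit clears the "valid copy on the server" mark (even if the same text is restored), and saving returns the list of files the user would be prompted to upload.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class WorkflowFiles:
    """Hypothetical model of the save/upload consistency tracking."""
    references: Dict[str, str] = field(default_factory=dict)  # port -> local file path
    uploaded: Set[str] = field(default_factory=set)           # ports with a valid server copy

    def edit_reference(self, port: str, path: str) -> None:
        # Any edit invalidates the server copy, even if the old text is restored.
        self.references[port] = path
        self.uploaded.discard(port)

    def save(self) -> List[str]:
        """Return the ports whose files must be uploaded (the user is prompted for these)."""
        return sorted(p for p in self.references if p not in self.uploaded)

    def upload(self, ports: List[str]) -> None:
        self.uploaded.update(ports)


wf = WorkflowFiles()
wf.edit_reference("Budapest.in0", "/home/user/a.dat")
wf.upload(wf.save())                                   # first save: upload prompted
wf.edit_reference("Budapest.in0", "/home/user/a.dat")  # same text typed again
pending = wf.save()                                    # still prompts for Budapest.in0
```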
The actual modification steps are discussed in detail through examples in Chapter IV, paragraph 4._Building_your_workflow_.
4.2 Workflow deletion
5. Starting a workflow
Using the Workflow/Workflow Manager/Submit (Figure_2) command you can submit the prepared workflow to the GRID, i.e. you can let it run. Of course, the following conditions must be fulfilled; they are checked by the system partly at creation time and partly at load time:
- The input_files and the executable codes must be available (see the note in point 0._Preparation).
- The resources (processors, clusters where the jobs belonging to the nodes should run) must be defined.
- The selected short-term (proxy) certificate must allow enough time, and all of the requested resources must accept this certificate.
See more details in paragraph Run_time_user_actions.
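The conditions listed above can be pictured as a pre-submission check. The Python sketch below is an assumption-based illustration: the function name, its parameters and the idea of comparing the proxy's expiry time with an estimated run time are hypothetical, and the Portal performs its own checks partly at creation time and partly at load time.

```python
from datetime import datetime, timedelta
from typing import List, Set


def submission_problems(missing_files: Set[str],
                        jobs_without_resource: Set[str],
                        rejecting_resources: Set[str],
                        proxy_expires: datetime,
                        estimated_runtime: timedelta,
                        now: datetime) -> List[str]:
    """Return readable reasons why a workflow should not be submitted yet (empty if none)."""
    problems: List[str] = []
    if missing_files:
        problems.append("missing input files or executables: " + ", ".join(sorted(missing_files)))
    if jobs_without_resource:
        problems.append("jobs without a defined resource: " + ", ".join(sorted(jobs_without_resource)))
    if rejecting_resources:
        problems.append("resources not accepting the proxy: " + ", ".join(sorted(rejecting_resources)))
    if now + estimated_runtime > proxy_expires:
        problems.append("proxy certificate expires before the estimated end of the run")
    return problems


now = datetime(2008, 1, 1, 12, 0)
ok = submission_problems(set(), set(), set(), now + timedelta(hours=12), timedelta(hours=2), now)
too_long = submission_problems(set(), {"London"}, set(), now + timedelta(hours=12),
                               timedelta(hours=13), now)
```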
6. Observing the progress of a workflow
If the submission was successful and the jobs begin running, you can follow the progress in three different ways:
6.1 Progress info from the Workflow Manager
First of all, the elements of the Workflow list of the Workflow Manager (Figure_38) inform you about the state of the whole workflow (column Status) and about the eventual results (column Output). The elements of the column View have a tree structure, and their roots are the Details buttons (Figure_39).
In the detailed mode a sub-list describes the state of each job composing the selected workflow.
Size shows the size of the storage needed by the Workflow on the host of the server.
Quota shows the percentage of the quota permitted for the user.
The label of the column Quota includes information about the full size of the quota (in the case of Figure 38 it is 1 MB), and the last line of the Workflow list summarizes the percentage and the size of the occupied storage.
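The Size and Quota columns are related by simple arithmetic: a workflow's percentage is its size divided by the user's quota, and the last line sums the sizes. A Python sketch, assuming sizes in bytes and reading the 1 MB quota of Figure 38 as 2**20 bytes (the Portal's exact unit convention is not stated here); the workflow names are invented.

```python
from typing import Dict, Tuple

MB = 1024 * 1024  # assumption: "1 MB" taken as 2**20 bytes


def quota_usage(sizes: Dict[str, int], quota: int) -> Tuple[Dict[str, float], int, float]:
    """Return (percentage per workflow, total bytes used, total percentage)."""
    per_workflow = {name: 100.0 * size / quota for name, size in sizes.items()}
    total = sum(sizes.values())
    return per_workflow, total, 100.0 * total / quota


per_wf, used, used_pct = quota_usage({"wf_a": 262144, "wf_b": 524288}, 1 * MB)
# per_wf == {"wf_a": 25.0, "wf_b": 50.0}, used == 786432, used_pct == 75.0
```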
6.1.1 Detailed view (Figure_39)
In this list each line corresponds to a component job. The line contains the following fields:
- Workflow: name of the workflow, inherited from the root menu
- Gridname: name of the Grid (or of the virtual organization) where the job runs
- Job: name of the current component, as the user defined it in the text field "Name" of the job definition window <jobname>properties (see Figure 17, Figure 18)
- Hostname: host where the job runs
- Status: status information must be distinguished between Workflow status and Job level status.
  The possible Job states, with their colors and in their natural sequence (when applicable), are:
    init (white)
    submitted (orange): only in case of brokering (since Release 2.2)
    wait (blue): only in case of brokering (since Release 2.2)
    scheduled (magenta): only in case of brokering (since Release 2.2)
    running (red)
    finished (green)
    error (blue)
  The possible Workflow states are:
    init (white in overview window, green in detailed view): the workflow is uploaded to the Server
    submitted (orange in overview window, white in detailed view): on user action, while no job is in Running state
    running (red in overview window, white in detailed view): when the first job enters the Running state
    finished (green in overview window, white in detailed view): when the last job has terminated successfully
    error (blue in overview window, white in detailed view): on error in one job, when no further job can run
    rescue (blue in overview window, white in detailed view): on error in one job, when no further job can run (since Release 2.2, see rescue)
    aborted (red in overview window, white in detailed view): on user action
- Logs: buttons Out and/or Error to read the eventual files written by the system to "stdout" and "stderr", respectively
- Output: a green button indicates that the application terminated successfully and the result can be downloaded
- Visualization: eventual buttons Visualize and All to call the graphic monitoring for the whole workflow, for the proper job, or for each possible part
- Action: this array of buttons is inherited from the root menu and will be discussed in paragraph 8_Run_time_user_actions_
The user should return to the root menu by hitting the button
Back
6.2 Progress info from the Workflow Editor
If the graph of the running workflow is selected and visible in the Workflow Editor, you can follow the progress through the changing colors of the nodes of the jobs being executed.
You can start the Workflow Editor in two different ways to see the progress:
- either with the Attach button of that workflow in the workflow list of the Workflow Manager
- or by hitting the button Workflow Editor of the Workflow Manager and opening (Open) the workflow to be observed.
In accordance with the convention discussed in 6.1.1_Detailed_view_, the following colors are used:
orange: the job waits for user submission
white: the job is submitted and waits to run
red: the job is running
green: the job is finished
blue: the job has been aborted either by the system or by the user
Note that since Release 2.2 there is an additional job state for jobs running under the control of a Broker of the EGEE Grid:
magenta: the job is scheduled by the broker
Note also that since Release 2.2 a new color indicates that the workflow failed but can be restarted from a natural checkpoint formed by the jobs that finished successfully:
saddle brown: the job is in "rescue" state
6.3 Progress info by Monitoring and Visualization
The third method is graphical monitoring. It is discussed in detail in chapter V Monitoring_and_Visualization.
6.4 Suspend the run of the Workflow (Smart abort)
A new feature of Release 2.5 of the P-GRADE Portal is that the run of a submitted workflow can be suspended, which puts the workflow into the rescuable state. Thus the Rescue state can be produced not only by the system but by the user as well. This feature is very useful if a job hangs on an unfortunately selected resource while the other jobs of the workflow have already produced results worth protecting from recalculation, which would be necessary on a resubmission following an abort. See the new button Suspend on Figure 40a.
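A rescued workflow restarts from the checkpoint formed by the jobs that already finished successfully (see 6.2). The minimal C sketch below shows that selection. It assumes a hypothetical, simplified per-job status array, because the Portal's real bookkeeping is not described in this manual. The `status` type and `jobs_to_resubmit` are illustrative names only.

```c
#include <stddef.h>

/* Hypothetical simplified job status used only for this sketch. */
typedef enum { ST_INIT, ST_RUNNING, ST_FINISHED, ST_ERROR } status;

/* Writes into out[] the indices of the jobs that must run again after a
   rescue (every job that has not finished successfully) and returns how
   many there are. Finished jobs keep their results and are skipped. */
size_t jobs_to_resubmit(const status *jobs, size_t n, size_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (jobs[i] != ST_FINISHED)
            out[count++] = i;
    }
    return count;
}
```

For a four-job workflow where jobs 0 and 2 finished, job 1 hung and job 3 failed, only jobs 1 and 3 would be resubmitted. This is exactly the recalculation that suspending, instead of aborting, avoids.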
7. Fetching the result
The last step is fetching the result. The Portal_server puts the results in a zip file. This is a compressed directory hierarchy that follows the structure of the workflow graph: a subdirectory is generated for each node, where the associated permanent local output_files are stored. The user can fetch the zip file with the standard download manager of the browser.
This step is marked in Figure_2 as "Workflow/Workflow Manager/output".
Please note that the remote output files will not be retrieved to the portal server; they can be accessed by methods beyond the control of the Portal. See Figure 8.1.
Please note that the download manager is part of the browser, and it is the responsibility of the administrator of the user's web site to set it up properly.
7.1 Fetching partial results
The local permanent output files of the terminated jobs can be downloaded individually, even before the whole workflow has terminated.
Before Release 2.5 of the P-GRADE Portal only the whole set of local outputs could be downloaded.
Please note that partial download is not available for the eWorkflows of PS tasks.
The little green triangle buttons in the rows of the respective jobs, in the column Output (see Figure 40a), indicate the available outputs and accept the download requests.
8. Run time user actions
The current version of the Portal maintains two different lists of workflows:
- The tab Workflow/Storage lists all workflows of the user. See chapter XIII_Workflow_archivation.
- The tab Workflow/Workflow Manager lists only the active workflows of the user.
The reason for distinguishing between active and inactive workflows is the following:
Polling the state of active workflows is expensive because of heavy network traffic. Therefore the user may get much slower responses if the number of active workflows (and their complexity) is high.
Both lists are updated as a consequence of the Save or Save as command of the Workflow Editor, but the Delete command may have different consequences:
- The Delete command of Workflow/Storage irrevocably deletes the file system representation of the definition of the selected workflow from the Portal server. See chapter XIII_Workflow_archivation.
- The Delete command of Workflow/Workflow Manager has the option to delete the selected workflow only from the list of active workflows, i.e. from the Workflow Manager.
The user can use the buttons of the Actions column (Figure_38) in the row belonging to each element of the Workflow/Workflow Manager list
- to start or stop the application with the toggle button Submit/Abort,
- to Delete the whole workflow from the Portal_server (each file belonging to the definition and to the running state of that workflow) or only from the list of active workflows,
- to show the workflow in the Workflow_Editor by hitting the button Attach,
- to suspend the run with the button Suspend, which appears in the Actions group starting from Release 2.5 of the P-GRADE Portal. See the next chapter 8.1_Suspend_the_run_of_the_Workflow.
The button Delete all deletes all workflows of the user from the Portal_server or from the list of active workflows.
8.1 Suspend the run of the Workflow (Smart abort)
A new feature of Release 2.5 of the P-GRADE Portal is that the run of a submitted workflow can be suspended, which puts the workflow into the rescuable state. Thus the Rescue state can be produced not only by the system but by the user as well. This feature is very useful if a job hangs on an unfortunately selected resource while the other jobs of the workflow have already produced results worth protecting from recalculation, which would be necessary on a resubmission following an abort, the only option available until now. See the new button Suspend on Figure_40a.
IV The detailed operation of the PORTAL by an example
During the tour you will build, start and observe a small test workflow.
After the general preparation you will find the description of the workflow to build and submit after Figure 14.
1 Login
The user can reach the PORTAL through a proper URL. For example:
There you should find, depending on the browser, something like this:
Figure 3
Figure 4
From here you can launch the activities shown in Figure_2.
Let us begin with the Certificate Manager by selecting the tab Certificates.
2. Certificates: Setting access rights to resources
Figure 5
The user can Upload a personal certificate to, or Download a temporary proxy_certificate from, the MyProxy certificate server with the help of the Certificate Manager, opened by the Certificates tab.
The very first action can be to fill the Certificate Server (the so-called MyProxy server) with the existing personal certificates of the user.
Please select Upload:
2.1 Upload detailed
In this process the user creates a certificate account.
Please remember that, for the time being, the Upload step must be skipped and done in a different way (outside the scope of the P-GRADE Portal) when your Virtual Organisation uses the VOMS extension to the certificates. See WOMS_Warning.
Figure 6
The first screen of the upload process requires your file named userkey.pem, containing your secret_key (see Figure_6). You can search for it in your local directory system using the Browse tool.
Fill in the input field and accept it with OK. The next panel requires the password of your secret key file, as Figure_7 shows (see also User identification against_own_userkey_file):
Figure 7
Upon OK the certificate file will be requested. This certificate entitles you to use certain resources for a limited amount of time (see Figure_8).
Figure 8
Upon OK you will see the window depicted in Figure_9, where you must select an existing certificate_server (MyProxy) by hostname and port, and must define an account (login name and password) on it where your certificate will be stored (see also User identification_against_the_Certificate_Server).
This certificate account stores just one certificate, so this "login" is actually the user name of the given certificate. The default host and port of the Certificate Server are given by the system. You also see an additional input field, the lifetime: this will be an upper limit for the short-term proxy certificates you may request by a subsequent download. You must hit Upload to perform the operation.
Figure 9
The system acknowledges the end of the successful upload process: Figure 9a.
Next you may generate a short-term proxy_certificate. To get it you can use the Download button of the Certificate List menu, which is open in this state (see Figure 9a). You will get the Download menu:
2.2 Download detailed:
Figure 10
The proxy certificate will be generated from the personal certificate by filling in a form, and will be downloaded to the Portal_server upon hitting the button Download.
The parameters of the form are the following:
The fields login name and password refer to the account of the previously uploaded certificate.
You can overwrite the default value of lifetime with the required expiration time of the short-term proxy certificate. However, the actual lifetime of the downloaded proxy will not exceed the value you defined during the Upload of your certificate.
If you find this limit too short, please repeat the upload process.
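The lifetime rule above is simply a minimum of two values. The C sketch below assumes both lifetimes are given in hours; that unit is an assumption for illustration, and the Portal form may use another. The function name is hypothetical.

```c
/* Effective lifetime of a downloaded proxy: the requested value,
   capped by the upper limit defined when the certificate was uploaded.
   Both values are assumed to be in hours for this sketch. */
unsigned effective_proxy_lifetime(unsigned requested, unsigned upload_limit)
{
    return requested < upload_limit ? requested : upload_limit;
}
```

For example, requesting 100 hours against a 24-hour upload limit yields a 24-hour proxy. To obtain more, the upload has to be repeated with a larger lifetime.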
Upon Download, in the new release (2) of the Portal, the user gets a message (Figure 10a) indicating that the downloaded short-term certificate can be associated with a GRID.
Figure 10a
A GRID is an administrative community of certain resources. With the same certificate the user can reach all resources of a given simple GRID or of a given virtual_organization of a complex grid.
Resources are subordinated to grids, i.e. each resource must belong to a given GRID (or to a virtual organization of a complex Grid).
In the new release of the Portal infrastructure the different jobs of the same workflow can submit programs to resources of different available grids. Therefore the user must have methods to maintain different certificates for different Grids (or virtual_organizations). See more details in VI_Multi-GRID_support.
This association is introduced by hitting the button Set for Grid. The actual selection can be performed on the subsequent frame (Figure 10b).
After selecting one available name (which may refer to a simple grid, to a virtual_organization or to a virtual organization with broker support) from the check list labelled Select GRID, hitting the button OK (Figure 10b) closes the frame and returns to the panel showing the downloaded short-term (proxy) certificates:
Figure 11
As you can see by comparing it with Figure 5, there is just one certificate on the list in our case, and its usage time is restricted by the lifetime value you have chosen.
Important notice:
In the [Actions] column there may be a fourth button, Use this, at each unselected proxy, if the list has more than one element belonging to the same GRID (or virtual organization).
With this button you can select the certificate you want to apply to your subsequent job submissions directed to the respective GRID (or virtual organization).
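The rule behind Use this can be modeled as follows: for each GRID (or virtual organization), at most one associated proxy is the one applied to submissions. The C sketch below is hypothetical. The `proxy` struct and the function names are illustrative and do not reflect the Portal's internals.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record of a downloaded proxy and its grid association. */
typedef struct {
    const char *grid;   /* GRID or virtual organization name */
    int in_use;         /* 1 if selected for submissions to this grid */
} proxy;

/* "Use this": marks proxies[chosen] as the one used for its grid and
   clears the flag on every other proxy associated with the same grid. */
void use_this(proxy *proxies, size_t n, size_t chosen)
{
    const char *grid = proxies[chosen].grid;
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(proxies[i].grid, grid) == 0)
            proxies[i].in_use = (i == chosen);
    }
}

/* Returns the index of the proxy used for a grid, or n if there is none. */
size_t proxy_for_grid(const proxy *proxies, size_t n, const char *grid)
{
    for (size_t i = 0; i < n; ++i) {
        if (proxies[i].in_use && strcmp(proxies[i].grid, grid) == 0)
            return i;
    }
    return n;
}
```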
With the help of the button Set for Grid the GRID (or virtual organization) association can be changed at any time (see Figure 10b).
With the help of the button Details the user can go back to a frame similar to that of Figure 10b, but without the possibility to set a GRID (or virtual organization).
Having defined the certificates, you need resource(s) where your jobs may run.
The main menu button Settings helps you to define them. Please note that these data are stored on your Portal_server and not on the Certificate Server.
3. Settings: Defining the resources
Resources belonging to different virtual organizations (or even to different grids) may be used within the same workflow. See the management of the grids in the Chapter VI_Multi-GRID_support.
Please note that the Name in the settings table (see Figure 12a) means virtual_organization, even in cases when the grid consists of just one virtual_organization.
Hitting the tab Settings, the list of virtual organizations available for the user will be displayed:
Figure 12a
To see the details of a line, hit its Resources button: a new frame opens, listing the resources belonging to the selected virtual_organization.
Figure 12b
New elements can be added to the resource listing of Figure12b in three different ways:
- By manual definition: filling in the two component input fields of the Contact_string, the URL and the Job_manager, and acknowledging with the button Add
- By inheriting the manual settings of the Administrator of the P-GRADE Portal: hitting the button Load default
- By letting the Information system discover the resources automatically: hitting the button Load resources from MDS2
Note that the usage of this intelligent method is GRID dependent: it requires a running and accessible information system of MDS2 kind.
The Contact_string defines the entry point of a resource (cluster) in the form of the URL of the leading host of the cluster, followed by the symbolic scheduler name, the Job manager, which classifies the demands of the job against the cluster.
More precisely, the contact string defines a program queue belonging generally to a cluster of hosts, whose elements may execute the jobs added to the queue.
In EGEE-like grids the resource identified by a contact string is called a Computing Element (CE). The same cluster, identified by its leading host, can serve several CEs.
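As an illustration, a contact string such as ce.polgrid.pl:2119/jobmanager-lcgpbs-voce (from the listing in 3.1) splits into the leading host, the port and the job manager. The C sketch below assumes the layout host[:port]/jobmanager seen in the examples of this manual. `parse_contact` and the `contact` struct are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Parsed form of a contact string "host[:port]/jobmanager".
   The layout is assumed from the examples in this manual. */
typedef struct {
    char host[256];
    int port;              /* 0 if the string carries no port */
    char jobmanager[256];
} contact;

/* Returns 0 on success, -1 if the string does not match the assumed form. */
int parse_contact(const char *s, contact *c)
{
    const char *slash = strchr(s, '/');
    if (slash == NULL || slash == s)
        return -1;
    /* A ':' before the '/' separates the host from the port. */
    const char *colon = memchr(s, ':', (size_t)(slash - s));
    const char *host_end = colon ? colon : slash;
    size_t host_len = (size_t)(host_end - s);
    size_t jm_len = strlen(slash + 1);
    if (host_len == 0 || host_len >= sizeof c->host ||
        jm_len == 0 || jm_len >= sizeof c->jobmanager)
        return -1;
    memcpy(c->host, s, host_len);
    c->host[host_len] = '\0';
    c->port = colon ? atoi(colon + 1) : 0; /* atoi stops at the '/' */
    memcpy(c->jobmanager, slash + 1, jm_len + 1);
    return 0;
}
```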
You may remove a resource from your list at any time with Delete.
When adding a resource manually, please make sure that the GRAM and the GridFTP servers are listening on the same host. If you would like to add a resource where these two services are not described by the same Contact_string, please contact your portal administrator; otherwise you will experience submission problems with your newly defined resource.
If the NAME in Figure 12a is of the virtual organization with broker support kind (for example "hungrid_LCG_2_BROKER"), then the window that opens upon hitting the button "Resources" is not modifiable, i.e. it has no significance and is there just for historical reasons.
The contact string is a general term, which has been slightly extended by the EGEE. Therefore the usage of EGEE resources needs special consideration:
3.1 Direct use of resources in the EGEE
The user can explore the available Computing Elements in two ways:
- either by using the LCG-2_information_system (see Figures 7.9, 7.10)
- or by issuing the following command from a remote terminal connected to a User Interface machine of the EGEE:
lcg-infosites --vo <virtual_organization> ce
For example, in case of the virtual_organization "voce":
skurut4.cesnet.cz$ lcg-infosites --vo voce ce
****************************************************************
These are the related data for voce: (in terms of queues and CPUs)
****************************************************************
#CPU   Free   Total Jobs   Running   Waiting   ComputingElement
----------------------------------------------------------
   9      9            0         0         0   ce.grid.tuke.sk:2119/jobmanager-pbs-voce
 166    166            0         0         0   ce.polgrid.pl:2119/jobmanager-lcgpbs-voce
  94     14           95        80        15   grid109.kfki.hu:2119/jobmanager-lcgcondor-long
  36     32            0         0         0   ares02.cyf-kr.edu.pl:2119/jobmanager-lcgpbs-voce
  78     65            0         0         0   zeus02.cyf-kr.edu.pl:2119/jobmanager-lcgpbs-voce
  46     41            0         0         0   skurut17.cesnet.cz:2119/jobmanager-lcgpbs-voce
 176    176            0         0         0   ce.egee.man.poznan.pl:2119/jobmanager-lcgpbs-voce
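A listing like the one above can also be processed mechanically. The C sketch below assumes the six whitespace-separated columns shown (#CPU, Free, Total Jobs, Running, Waiting, ComputingElement). It parses one data row with sscanf and picks the CE with the most free CPUs. That selection criterion is only an illustration, not a Portal feature, and the names are hypothetical.

```c
#include <stdio.h>

/* One data row of the `lcg-infosites --vo <vo> ce` listing shown above. */
typedef struct {
    int cpu, free_cpu, total_jobs, running, waiting;
    char ce[256];
} ce_row;

/* Parses a data row; returns 1 on success and 0 for header, banner or
   separator lines, which do not start with five integers. */
int parse_ce_row(const char *line, ce_row *r)
{
    return sscanf(line, "%d %d %d %d %d %255s", &r->cpu, &r->free_cpu,
                  &r->total_jobs, &r->running, &r->waiting, r->ce) == 6;
}

/* Returns the index of the row with the most free CPUs, or -1 if n == 0. */
int most_free(const ce_row *rows, int n)
{
    int best = -1;
    for (int i = 0; i < n; ++i)
        if (best < 0 || rows[i].free_cpu > rows[best].free_cpu)
            best = i;
    return best;
}
```

In the grid109.kfki.hu row, Running (80) plus Waiting (15) equals Total Jobs (95), which confirms the column reading used here.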
Two new features can be observed compared to the traditional Globus resources:
- The newest type of job_managers contains a subordinated queue as an extension: the "-" separated last part of the job manager (in the example "voce" and "long") refers to the subordinated queue.
- At some places the traditional job managers are substituted by "lcg" proprietary ones (for example "lcgpbs" and "lcgcondor" in the listing above).
Warning:
When the job manager is an "lcg" proprietary one, for example "lcgpbs" instead of the common "pbs", we strongly discourage the direct usage of resources of this kind in the Job properties window (see Figure_18, where the job manager is "fork" and therefore the usage is correct).
Instead of the direct use, please select a virtual organization with Broker support as Grid (see Figure 10.1) and apply the following constraint in the tab Rank & Requirement of the JDL description:
other.GlueCEInfoHostname == "<Host_of_Computing_Element>"
Example:
Selecting from the table above, for example, the Computing Element "ce.polgrid.pl:2119/jobmanager-lcgpbs-voce", the host is "ce.polgrid.pl".
The setting can be seen in Figure 10.9.
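Both the check for an "lcg" job manager and the resulting constraint can be derived mechanically from a contact string. The C sketch below assumes the host:port/jobmanager-<scheduler>[-<queue>] layout of the listing, and its function names are hypothetical. The attribute spelling follows this manual. The host is written as a double-quoted string literal, as JDL (ClassAd) expressions require for string values.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the job manager of a contact string such as
   "ce.polgrid.pl:2119/jobmanager-lcgpbs-voce" is an "lcg" proprietary one
   (assumed layout: host:port/jobmanager-<scheduler>[-<queue>]). */
int uses_lcg_jobmanager(const char *contact)
{
    const char *jm = strchr(contact, '/');
    return jm != NULL && strncmp(jm + 1, "jobmanager-lcg", 14) == 0;
}

/* Writes the constraint other.GlueCEInfoHostname == "<host>" for the CE
   into buf. Returns 0 on success, -1 if no host is found or buf is too
   small. */
int ce_requirement(const char *contact, char *buf, size_t size)
{
    size_t host_len = strcspn(contact, ":/"); /* host ends at ':' or '/' */
    if (host_len == 0)
        return -1;
    int n = snprintf(buf, size, "other.GlueCEInfoHostname == \"%.*s\"",
                     (int)host_len, contact);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```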
4. Workflow Editor: Building your workflow
Now it is time to make our own workflows. We select the tab Workflow of the main menu:
Figure 13
By selecting the button Workflow Editor, an independent Java program, the Workflow_Editor, will start.
Note: your browser must have the JRE 1.5.2 Java plug-in (or higher) in order to let this program start.
The first time the Workflow Editor is loaded during the portal session, some messages regarding possible risks of using the program will be displayed. The Workflow Editor, however, is harmless and should be allowed to run. (The Webstart technology involved notifies the user that the downloaded program may access the local file system, and prompts the user to trust or dismiss the source of the certificate.)
If you accept, the following window will appear:
Figure 14
Next you will build a simple workflow containing two jobs, described as follows:
Example definition:
For simplicity both jobs have an identical structure and use the same executable program (in real life the executables are usually different).
This executable is a simple sequential program written in C (in our example Cell.c). It reads two integer numbers from two different text files, which the program opens as "INPUT1" and "INPUT2" respectively, and writes the product of the two numbers into an output text file opened as "OUTPUT".
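The source of Cell.c is not reproduced in this section. The following minimal C sketch has the described behavior and was written here for illustration. The function name `cell_compute` is hypothetical. The names passed to fopen are exactly the Internal File Names that the ports of the job must declare.

```c
#include <stdio.h>

/* Reads one integer from each of in1 and in2 and writes their product
   to out. Returns 0 on success, 1 on any I/O or format error. */
int cell_compute(const char *in1, const char *in2, const char *out)
{
    long a, b;
    FILE *f = fopen(in1, "r");           /* Internal File Name INPUT1 */
    if (f == NULL)
        return 1;
    int ok = fscanf(f, "%ld", &a) == 1;
    fclose(f);
    if (!ok || (f = fopen(in2, "r")) == NULL)   /* INPUT2 */
        return 1;
    ok = fscanf(f, "%ld", &b) == 1;
    fclose(f);
    if (!ok || (f = fopen(out, "w")) == NULL)   /* OUTPUT */
        return 1;
    fprintf(f, "%ld\n", a * b);
    return fclose(f) == 0 ? 0 : 1;
}

/* In the job itself, main would simply do:
   return cell_compute("INPUT1", "INPUT2", "OUTPUT"); */
```

Because the file names are fixed in the code, the workflow, not the program, decides which real files appear under these names. This is what the port definitions below arrange.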
You will build the connections to your first job "Cascade1" in such a way that it receives both of its input_files from the local file system, as <path1>/I1 and <path2>/I2 respectively.
Its output "OUTPUT" will be generated somewhere in the GRID and will serve as the "INPUT1" of the second job, "Cascade1.2". The local file <path2>/I2 will be used as the "INPUT2" of this job.
Finally the result of the whole workflow, the "OUTPUT" of "Cascade1.2", will be generated in the GRID and replicated to the Portal_server, where it can be downloaded by the user.
As a preparation, you must have the executable code ("Cell.exe") and the input_files <path1>/I1 and <path2>/I2 stored in your local file system.
This knowledge is enough to build the workflow as follows. Let us define the first job first:
Figure 15
Hitting the marked icon New job, a new job will appear:
Figure 16
Double clicking on the job allows its properties to be edited.
(An alternative to double clicking is the right mouse click, available on all graphic elements, which triggers a popup menu of possible operations.)
Figure 17
Learning Notes on Job Properties:
In this menu the user defines the code of the job to run (Job Executable) and the call conditions of the job: where to let it run (Grid, Resource), with what kind of arguments, and under what conditions of the resources. Arguments can be command line Attributes, processed by the code internally and possibly influencing the running of the job, the Monitor flag to request graphical observation of the running job, and some hints about the kind (JobType) and size (ProcessNumber) of the resources needed to perform the job.
Details:
Name is given by the system as default. The user can change it, but the name of the job must be different from all other job names.
- The value of "Name" will be used to generate different file names in the internal representation, therefore the value is restricted to alphanumerical characters and the underscore character.
- Changing the Name after a closed editing session may have a disturbing consequence: the path to the Job Executable uploaded in the previous session will be lost, and therefore the system prompts the user to redefine the path.
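The Name rules above (alphanumerical characters and underscore only, unique within the workflow) can be checked as in the following C sketch. The function names are hypothetical, and the Workflow Editor's own check is not documented here. Treating an empty name as invalid is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name is non-empty and consists only of letters, digits
   (as classified by isalnum in the current locale) and '_'. */
int job_name_chars_ok(const char *name)
{
    if (*name == '\0')
        return 0;
    for (const char *p = name; *p; ++p)
        if (!isalnum((unsigned char)*p) && *p != '_')
            return 0;
    return 1;
}

/* Returns 1 if name is valid and differs from all n existing job names. */
int job_name_ok(const char *name, const char *const *existing, size_t n)
{
    if (!job_name_chars_ok(name))
        return 0;
    for (size_t i = 0; i < n; ++i)
        if (strcmp(name, existing[i]) == 0)
            return 0;
    return 1;
}
```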
JobType is the kind of the code referenced by Job Executable to be started on the resource. It can be traditional sequential (SEQ) or parallel (MPI, PVM). In case of MPI the resource must be informed about the number of hosts needed by the program (Process Number).
Important notice: if the user wants to submit MPI jobs with Broker support, a special JDL requirement must be entered (see Chapter X 2.9_Important_notice_to_MPI_submission).
Job Executable defines the path of the executable code to be uploaded from the local file system. After a successful upload and a subsequent download of the workflow, the input field will show only the name of the executable instead of the whole path. Any change of this input field instructs the system to upload a new executable, defined by an absolute path, from the local file system.
The search for such a file is supported by the File Browser.
Instrument is a message field set by the system only when the executable code contains special message-sending instructions for real-time monitoring, i.e. when the code is instrumented.
Process Number has significance only if the Job Executable is of MPI type. It notifies the resource about the number of needed hosts.
Attributes may be filled in just like the command line parameters of traditional C programs.
Grid defines a GRID or a virtual organization, i.e. the high-level administrative domain where the job must be submitted. Changing the Grid changes the subordinated list of selectable Resources as well.
Monitor flag can be set by the user only when Instrument is set. The setting enables job-level monitoring.
Please note that setting (changing) the monitor flag may have an unexpected effect on the selected Resource (and on the selectable resources):
If the previous state of Monitor was "not set" and the selected Resource cannot be monitored, then the new monitoring request will hide all resources lacking the monitoring infrastructure, and the current value in the field Resource will be replaced by the first monitorable element of the list of resources.
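The resource replacement described above is a filter followed by a fallback. The C sketch below is hypothetical: the `resource` struct and `apply_monitor_flag` are illustrative names. The behavior when no resource is monitorable (selection set to -1) is an assumption, since the manual does not cover that case.

```c
#include <stddef.h>

/* Hypothetical resource entry as seen by the Job properties window. */
typedef struct {
    const char *contact;   /* contact string */
    int monitorable;       /* 1 if the monitoring infrastructure exists */
} resource;

/* Applies the behavior described above when Monitor is switched on:
   visible[] receives the indices of the monitorable resources, and a
   selected resource that cannot be monitored is replaced by the first
   monitorable one. Returns the number of visible resources. */
size_t apply_monitor_flag(const resource *res, size_t n, int *selected,
                          size_t *visible)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (res[i].monitorable)
            visible[count++] = i;
    if (*selected < 0 || (size_t)*selected >= n || !res[*selected].monitorable)
        *selected = count > 0 ? (int)visible[0] : -1; /* -1: assumption */
    return count;
}
```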
Resource defines the resource where, and under what kind of assumptions, the Job Executable will run within the defined Grid. The selected resource may change implicitly as a consequence of changing the Grid and Monitor settings.
If the Grid is of the virtual organization with broker support kind (for example hungrid_LCG_2_BROKER), then the value of "Resource" has no significance and is there just for historical reasons.
Warning:
If the virtual_organization defined in Grid is EGEE-like and the Job_manager of the Contact_string defined as Resource is an LCG proprietary one, please consider the warning in chapter 3.1_Direct_use_of_resources_in_the_EGEE and use a virtual organization with broker support as Grid, with a resource constraint in the JDL.
JDL is a new feature available from Release 2.2. It gives access to the JDL editor. This editor is applicable only if the virtual_organization selected in Grid is of EGEE-compliant type, i.e. the Resource will be determined by the system from matching characteristics (set by the user with the help of the JDL Editor) instead of by direct assignment. The usage and effect of the JDL Editor will be discussed in Chapter X_Connection_to_the_EGEE_Grids.
There is an alternative method to set the resource-dependent features of the job: they can be handled centrally, starting from the main menu Workflow Properties.
After filling in the needed fields the window may look like this:
Figure 18
Hitting Ok you will return to the main window:
Figure 19
Now you can define the I/O ports of this job. Hitting the port icon you may add a new port to the selected job.
An alternative method to define a new port is the menu item New port in the main menu Workflow (see Figure 32).
The selected state of the job is indicated by the red frame around the job:
Figure 20
Double clicking on the port icon (or selecting the Properties item of the popup menu triggered by the right mouse click, as Figure23 shows) makes the port properties definable in a pop-up window:
Figure 21
With the help of the port properties window the user defines the direction, kind, name and file association of the port.
Learning notes on Port Properties:
A port connects an input or an output file opened by the job with the environment. From the point of view of the respective port, this environment can be an external file or another port of a different job.
The external file reference (defined in the field File) will be mapped to the name (defined in the field Internal File Name) used by the author of the executable to open the given file.
Notice that there is no longer any restriction on the usage of filenames:
In the versions preceding Release 2.2 of the P-GRADE Portal the Port property field Internal File Name (see Figure 21) did not exist, and hence the user had the additional restriction of applying external file references "ending" as the job executable expects them, i.e. the value corresponding to this field was generated from the "/"-separated "tail" of the File field, which used to have the form
[[protocol]<directory>]<FileName_applicable_as_InternalFileName_as_well>
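The pre-2.2 rule, where the name opened by the executable was the "/"-separated tail of File, can be reproduced in a few lines of C. The function name is hypothetical and the example paths are illustrative.

```c
#include <string.h>

/* Returns the "/"-separated tail of a File field value, i.e. the name an
   executable had to open before Internal File Name existed (pre-2.2).
   For a value without '/', the whole string is returned. */
const char *legacy_internal_name(const char *file_field)
{
    const char *slash = strrchr(file_field, '/');
    return slash ? slash + 1 : file_field;
}
```

Under this rule, the local file <path1>/I1 of the example would have had to be renamed so that its tail was INPUT1. The Internal File Name field removes this restriction.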
A port can be either an Input (In) or an Output (Out) port.
Input:
If the Type of the port is Input AND the port will NOT be connected to any Output port of other jobs, then the port must refer to a genuine input file.
The genuine input file must be defined as a full path in the File field, in the form [<protocol>]<path>, where <protocol> can be defined only in case of Remote files (see VIII_Handling_of_remote_files).
If the Input port will not be connected to a genuine input file, i.e. it will be connected to the output port of a different job, then the File field must not (and cannot) be filled.
In both (local and remote) cases the user-defined input field <name> after the label Internal File Name must correspond to the fopen(<name>, "r") instruction within the code of the job.
The set flag managed copy means that the system automatically delivers the input file to the working directory where the associated job will run. This is the default case.
However, when the user wants to handle remote files and the input file is located on a Storage Element, the file may be too big to be copied. In such cases the user may decide (by clearing the flag) to take over the responsibility of reading the Grid file. Note that in this case the executable of the job must be prepared by the user to open and read the Grid files using the GFAL API of the EGEE.
Output:
If the Type of the port is Output AND File Type is NOT Remote, then the File field must not (and cannot) be written.
Please note that there is NO symmetry between genuine input and output files: genuine local output files (i.e. those referenced by Output ports not connected to any Input ports of different jobs) are stored on the PORTAL Server and will be downloaded to the local environment of the user by an interactive command after the completion of the workflow run.
The other case is when the Output File is Remote. In this case the file referenced by the string after the label File will be stored according to the full path of the form [<protocol>]<path> (see VIII_Handling_of_remote_files).
In both (local and remote) cases the user-defined input field <name> after the label Internal File Name must correspond to the fopen(<name>, "w") instruction within the code of the job.
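The Input and Output rules above decide whether the File field of a port may be filled. The following C sketch expresses that decision as a predicate. The enum and function names are hypothetical.

```c
/* Hypothetical description of a port for this sketch. */
typedef enum { PORT_IN, PORT_OUT } port_dir;
typedef enum { FILE_LOCAL, FILE_REMOTE } file_type;

/* Returns 1 if the File field must be filled and 0 if it must stay empty:
   - an input port not connected to another job's output refers to a
     genuine input file, so File is required;
   - an input port fed by another job's output has no File of its own;
   - an output port has a File only when its File Type is Remote. */
int file_field_required(port_dir dir, int connected, file_type type)
{
    if (dir == PORT_IN)
        return !connected;
    return type == FILE_REMOTE;
}
```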
Details:
Port name: given automatically by the system. There is little reason for the user to change it. The field will be used internally to generate subdirectories; hence it must contain only alphanumerical characters.
Type: selector indicating either a reading or a writing access, matching the fopen(<name>, "{r|w}") instruction in the code of the job.
File Type: has significance only in case of genuine files. The default setting is Local.
If the setting is Remote, then the Input file will not be uploaded to the Portal Server during the Save/Upload phase that terminates the definition of the workflow in the Workflow Editor. Instead, just the reference to the remote file will be stored, and the file transfers will be organized by the run-time system.
A file defined to be Remote on an output port forces all connected input ports to be Remote with identical File names.
File: reference to a genuine local or remote file [<protocol>]<path>.
Please note that this field is not definable if the port is an input one connected to the output of another port, or if the port is designed to be a local output port.
The search for such a file in the case FileType = Local is supported by the File Browser.
Internal File Name: internal reference to a file, used by the author of the corresponding job in an fopen(...) instruction.
File storage type: this selector can be activated only in case of Output files. Its default setting is Permanent for the genuine Output files and Volatile for the "channel" files connected to other ports.
If a channel file is reset as Permanent, its data will not be discarded after each connected job has read its content, but "added to the output of the workflow". In the case FileType = Local this means that the file will be preserved for downloading as if it were a genuine Output file.
Resetting to Volatile forces the File Type to change to Local, because a temporary file on a dedicated Remote storage device is undesirable.
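The defaults and the forced change described above can be sketched in C as follows. All type and function names are hypothetical. "Kept for download" encodes the rule that Permanent Local outputs are preserved on the Portal server for download.

```c
/* Hypothetical output-port settings for this sketch. */
typedef enum { STORE_PERMANENT, STORE_VOLATILE } storage;
typedef enum { FT_LOCAL, FT_REMOTE } ftype;

typedef struct {
    int is_channel;   /* 1 if connected to input ports of other jobs */
    storage store;
    ftype type;
} out_port;

/* Default storage type: Permanent for genuine outputs,
   Volatile for channel files. */
storage default_storage(int is_channel)
{
    return is_channel ? STORE_VOLATILE : STORE_PERMANENT;
}

/* Changes the storage type; resetting to Volatile also forces the
   File Type back to Local, as described above. */
void set_storage(out_port *p, storage s)
{
    p->store = s;
    if (s == STORE_VOLATILE)
        p->type = FT_LOCAL;
}

/* A file is preserved for download when it is Permanent and Local. */
int kept_for_download(const out_port *p)
{
    return p->store == STORE_PERMANENT && p->type == FT_LOCAL;
}
```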
Please select a proper input file with the help of the File Browser and fill in the Internal File Name according to the convention required by the example program cell.c:
Figure 22
The Port name is set automatically; however, the user may redefine it.
Hitting OK you return to the original editor to define other ports.
By repeating the proper steps (basically to define the location of INPUT2 for port "1"; a similar window is shown in Figure 30) you arrive at changing the properties of port "2".
Learning Notes on Port Editing:
In this case hitting the right mouse button (seen in Figure23) offers three possibilities:
Properties: to define the Port properties
Delete: to delete the port
Fix (Unfix): toggle to glue (or release) the relative graphic representation of the port along the sides of the square representing the job (this operation has no significance from the point of view of the semantics of the workflow)
Figure 23
Selecting Properties by left click you get the Port properties popup window, where you may select "Out" as Type and enter "OUTPUT" as Internal File Name:
Figure 24
Please remember that in this (Local) case of Output you must not define the File, even if this port is intended as the source of the genuine output of the whole workflow. The reason is that the workflow, upon successful termination of the submitted tasks, will not return individual files. Instead it packs the Permanent Local output_files into a compressed file tree reflecting the structure of the workflow, and you can download it by the standard method of your browser, as you will see in Figure 39.
Please note that you will be able to identify this file by its Internal File Name.
If the user wants to reduce the storage load of the files produced by the job, then unneeded files can be marked as Volatile instead of the default Permanent.
Hitting Ok completes the definition of our first job, and the icon New job can be selected.
Figure 25
Let us define the job properties as before, create the ports for the new job similarly to Figure19, and select the first (0) one. Here the File will not be defined, because this port will be connected to the output port of the other job:
Figure 26
Hitting Ok, we receive the following warning message (Figure 27):
Figure 27
The simplest way is to answer No and proceed with the port connection.
After closing the Port properties window, the Editor looks like this:
Figure 28
Now we connect the output port "2" of "Cascade1" with the input port "0" of the second job.
Pressing the middle mouse button on the output port, holding it down, dragging it to the proper input port, and releasing it defines the desired connection.
Editing Notes:
No rubber-band line is shown during dragging.
Clicking on the arrow connecting two ports changes its color from blue to red, and the connection becomes selected.
The selected connection can be deleted with the corresponding icons of the menu bar (cut or delete). It can also be given a graphical attribute that influences its color in the simulation regime, when the Workflow Editor is used to display the runtime state of the submitted workflow: right-clicking on the arrow opens a popup menu where the toggle item Switch to {ONLINE|OFFLINE} can be selected. This minor coloring feature does not change the semantics of the workflow.
Figure 29
Now we edit the second input port (Port name 1) of the new job:
Figure 30
Then let us define the output port for the second job:
Figure 31
Confirming the change with Ok completes the editing phase. We just need to save our work on the Portal Server:
In the main menu, let us select the Save as operation:
Figure 32
At this point the Workflow Editor checks the correctness of the workflow.
Learning Note
If errors are found (mostly bad references to local files to be uploaded, or missing resources), a warning message lists them. Even in this case the user may decide to save the workflow. However, the workflow is then marked as incomplete for the Workflow Manager: it cannot be submitted, only stored for later modification.
This modification is started with the Open menu command (see Figure 32), which supplies a list of the user's workflows stored on the PORTAL Server. The selected workflow is downloaded, and the editing can be completed.
When a new workflow is saved (with Save as, or at the first use of Save), a popup dialog (Figure 32a) prompts the user for the name of the workflow. The name must consist of alphanumeric characters and must differ from the names of the workflows already stored on the PORTAL Server.
Figure 32a
Let us name the workflow "WF1".
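The naming rule above can be expressed as a small check. This is a minimal sketch, not the Portal's own code: the function name `is_valid_workflow_name` is hypothetical, and we assume that "alphanumeric" means ASCII letters and digits only and that names are compared case-sensitively.

```python
from __future__ import annotations


def is_valid_workflow_name(name: str, stored_names: set[str]) -> bool:
    """Hypothetical check mirroring the Save as naming rule (assumptions above)."""
    # Rule 1: only alphanumeric characters (assumed ASCII-only here).
    if not name or not (name.isascii() and name.isalnum()):
        return False
    # Rule 2: must differ from every workflow already stored on the Portal Server.
    return name not in stored_names
```

For example, "WF1" is accepted on a fresh account, while "WF 1" (space) or a second "WF1" would be rejected.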
In a subsequent step the system automatically proposes to issue the Upload command, which transfers the referenced executable code(s) and input file(s) from the client's desktop to the Portal Server:
Learning notes on file upload:
The Upload proposal happens in the following cases:
- New references to local input files or code files have been detected.
- The user has modified a Job Properties/Job Executable field (Figure 18) or a Port Properties/File field (Figure 22).
If the user refuses the suggestion, the workflow remains incomplete.
The Upload command can also be issued manually at any later time; see the menu item Upload Files... in Figure 32.
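The two trigger conditions can be summarized as a simple predicate. This is a sketch under stated assumptions: the `EditSession` record and the field labels are hypothetical stand-ins for whatever state the Workflow Editor actually keeps.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical labels for the two kinds of fields that trigger an upload proposal.
UPLOAD_TRIGGER_FIELDS = {"Job Properties/Job Executable", "Port Properties/File"}


@dataclass
class EditSession:
    """Hypothetical record of what changed since the last upload."""
    new_local_references: list[str] = field(default_factory=list)
    modified_fields: set[str] = field(default_factory=set)


def upload_proposed(session: EditSession) -> bool:
    # Case 1: new references to local input or code files were detected.
    if session.new_local_references:
        return True
    # Case 2: an executable or port File field was modified.
    return bool(session.modified_fields & UPLOAD_TRIGGER_FIELDS)
```

Other edits (for example toggling the Monitor flag) do not, under this model, cause an upload proposal.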
Figure 33
Select Yes, and the system starts the upload, indicated by an Upload popup window containing a progress bar.
Upon termination the message Finished is shown, and the system waits for the user to press the Close button:
Figure 34
Having executed the editing steps above, we have finished creating our new workflow WF1 and can leave the Workflow Editor to return to the Workflow Manager. Its page (see Figure 35) must be refreshed to show our new workflow WF1, which is now ready to run.
Before running it, let us check the associations of jobs and resources. This can be done either job by job in the job properties, or centrally with the Workflow Properties menu command, new in Release 2 (see Figure 32). It opens the following table:
Figure 34a.
If a change is needed, it can be performed in six steps:
- Select the proper Grid.
- Select the required resource from the list of loaded resources. (Remember that the resources are Grid dependent.)
- Mark the left side of the line(s) belonging to the required job(s).
- Confirm the changes with the Set selected button.
- Leave the window with Ok.
- Save the workflow.
This table can be used in a similar way to control the monitoring of the jobs. If the code belonging to a job is not instrumented, the association is refused.
5 Submitting the workflow
Figure 35
The Submit command activates the workflow:
Figure 36
6 Observing the progress of the workflow
As a side effect of submission, the Submit button changes to Abort. A subsequent Attach command reopens the Workflow Editor in a new role: the progress of the workflow can be followed through the changing colors of the jobs:
Figure 37
In this state the first job has received control, and the second one is waiting for the first to terminate.
After a while, clicking the Refresh button of the Workflow Manager window shows the successful termination of the workflow.
The Portal Server then collects the referenced files and packs them into one compressed downloadable file:
Figure 38
7 Fetching the result
Clicking the green button in the [ Output ] column starts the download manager of the browser, which copies the result file to the user's desktop. Note that the user chooses the destination folder of the workflow result with the download manager, in a browser-dependent way.
Pressing the Details button in the [ View ] column opens a window that provides important information:

Figure 39
Besides the verbose state of the constituent jobs in the Status column, you can get the graphical rendering of the time-process communication diagrams at the two levels (workflow and job) by pressing the buttons in the [ Visualization ] column. You can also see the messages the jobs sent to the standard output (by pressing the Out button) and/or to the standard error channel (by pressing the Err button, which is not visible in Figure 40 because the jobs did not produce error messages). These buttons are placed in the [ Logs ] column.
When the Visualize button is hit, the visualization is performed by an independent program called Prove, which works on the corresponding trace file of the workflow. Job-level visualization is available only if two conditions hold:
- The original executables have been built with the necessary instrumentation library, which incorporates trace-sending instructions.
- The Mercury monitor infrastructure, which receives and collects these traces, has been installed at the resource where the job was running.
As you can see, this was not the case in our simple example.
You will find a more comprehensive view of the monitoring possibilities in Chapter V Monitoring and Visualization.
Finally, we show the window displaying the content of the standard output after hitting the Out button of Cascade 1.2:
Figure 40
Figure 40a New Features: Suspend and partial result download
V Monitoring and Visualization
1 Introduction
Graphic monitoring means generating, collecting, and graphically rendering runtime data that inform the user about the state and progress of the submitted workflow. In a parallel environment the dynamic conditions that trigger the execution of individual program parts are of special importance: they help the user pinpoint design flaws and temporarily missing resources. Therefore the "time-space" diagram has been selected as the basic tool for rendering the behaviour of the interacting program parts graphically. This is discussed in Section 3, which details working with the graphic tool Prove running on the user's desktop.
We use the common term "program parts" for graphic monitoring in two entirely different contexts:
- at the level of the whole workflow, to distinguish the participating jobs;
- at the level of individual jobs, to distinguish the parallel processes they may run.
For example, the upper part of Figure 43 shows the monitoring of the whole workflow, while the lower part is a detailed view of the progress of its job "cummu".
1.1 Availability of monitoring
On the one hand, high-level (workflow) monitoring is an inherent property of the job submission technique used ("Globus/Dagman").
On the other hand, job-level monitoring, which is needed above all when the job includes parallel processes, can only be performed if all of the following conditions hold:
- The source code of the respective processes has been extended with special instructions at the proper places to send monitoring messages. (This preparation is generally referred to as "instrumentation" of the source code.) The code can be instrumented either with the P-GRADE application (see Import process) or by "manual" use of the grm library. Separate documentation on using the grm library will be available soon.
- A special infrastructure (the Mercury monitoring service [1]) is deployed on the remote resource where the submitted job runs, to listen for and gather these monitoring messages.
- The user has enabled monitoring by setting the Monitor flag in the Job properties window of the Workflow Editor (Figure 18).
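Because all three conditions must hold at once, it can help to check them as a list and report whichever is missing. This is a hypothetical helper for illustration only; the Portal does not expose such a function, and the record fields simply name the three conditions above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class JobMonitoringSetup:
    """Hypothetical summary of one job's monitoring preconditions."""
    code_instrumented: bool  # built with P-GRADE or the grm library
    mercury_deployed: bool   # Mercury monitoring service runs on the remote resource
    monitor_flag_set: bool   # Monitor flag set in the Job properties window


def missing_monitoring_conditions(job: JobMonitoringSetup) -> list[str]:
    """Return the unmet conditions; job-level monitoring needs an empty list."""
    missing = []
    if not job.code_instrumented:
        missing.append("source code is not instrumented")
    if not job.mercury_deployed:
        missing.append("Mercury monitoring service is not deployed")
    if not job.monitor_flag_set:
        missing.append("Monitor flag is not set")
    return missing
```

The jobs of WF1, for instance, would fail the first check, which is why their job-level Visualize buttons are missing.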
2. Life cycle of monitoring data
2.1 The source of data
As Figure 2 shows, the workflow results, including monitoring data (in our terminology "the trace file"), primarily arrive from the remote resources at the Portal Server.
The instrumentation may actually produce a huge amount of data. As each job is associated with a dedicated resource, there is a separate trace file for each job.
2.2 The transport
The trace file is collected autonomously and incrementally, in packages, as a result of two possible events that are basically independent of the user's activity:
- The local temporary buffer holding the current portion of the trace file on a host of the remote resource is full.
- The respective job has terminated.
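The two flush events can be modeled with a small buffer class. This is a sketch under assumptions: `TraceBuffer`, its `capacity` (counted in events rather than bytes), and the `shipped` list standing in for packages sent to the Portal Server are all hypothetical; the real Mercury transport is not shown.

```python
from __future__ import annotations


class TraceBuffer:
    """Hypothetical model of the remote-side buffer for one job's trace file."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pending: list[str] = []
        self.shipped: list[list[str]] = []  # packages already sent to the Portal Server

    def _ship(self) -> None:
        # Send the current portion as one package, then start a fresh buffer.
        if self.pending:
            self.shipped.append(self.pending)
            self.pending = []

    def record(self, event: str) -> None:
        self.pending.append(event)
        if len(self.pending) >= self.capacity:  # event 1: buffer is full
            self._ship()

    def job_terminated(self) -> None:
        self._ship()  # event 2: the job has terminated
```

Neither event is triggered by the user, which is why trace data can reach Prove with a noticeable delay.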
2.3 The processing
These data need to be stored, filtered, and processed. The Portal Server does the bulk of this work. On user demand it prepares the "image file", which is very small compared to the trace file, and forwards it to the Prove application running on the user's desktop.
Why should the user know all these technical nuances?
First, to understand the cause of the delays that this double buffering imposes on the graphic rendering system. Almost as important, to understand that in certain cases the user should help reduce the load of the Portal Server by issuing the "forget events" command of Prove, which instructs the Portal Server to truncate the corresponding trace file, releasing the data about events that arrived before a certain time. (There is only a limited storage quota for each user on the Portal Server, which is a precious shared resource.)
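The effect of "forget events" on a trace can be illustrated as a truncation by time. This is a minimal sketch assuming the trace is a time-sorted list of `(timestamp, event)` pairs; the function name and data layout are hypothetical, not the Portal Server's storage format.

```python
from __future__ import annotations

from bisect import bisect_left


def forget_events(trace: list[tuple[float, str]], cutoff: float) -> list[tuple[float, str]]:
    """Drop all events that arrived strictly before `cutoff` (trace sorted by time)."""
    timestamps = [t for t, _ in trace]
    # bisect_left finds the first event with timestamp >= cutoff.
    return trace[bisect_left(timestamps, cutoff):]
```

Everything kept after the cutoff remains available for rendering; the released part no longer counts against the user's quota.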
2.4 The destination: the visualization interface
The Prove program can be started from the "detailed" view of the Workflow Manager, as Figure 39 indicates.
Note: In the following, a new workflow application is used as an example to demonstrate the full palette of monitoring options. This fairly complicated workflow has been prepared so that all of its component jobs contain instrumented code. It is called ForecastWmin and runs a weather forecast program (see Figure 41).
Figure 41
In this case the detailed view of this workflow in the Workflow Manager indicates the possibility of job-level monitoring by showing the appropriate buttons (Figure 42).
Compare it with Figure 39 of the workflow application WF1, where the buttons for job-level monitoring are missing because the jobs of that application have not been prepared for monitoring.
Figure 42
Figure 42 shows the detailed view of the workflow ForecastWmin in an intermediate state. The running and/or finished jobs can be visualized by Prove, which opens independent windows when the respective Visualize buttons are hit. Prove can be opened for the high-level view of the workflow as well (the Visualize button in the first line of the workflow, the one containing its name).
The All button "packs" all visualization windows together, starting from the high-level view, as Figure 43 indicates:
Figure 43
Warning:
If the number of elements along the vertical axis (hosts/jobs) is high, certain alphanumeric texts may not be displayed due to the low resolution. In that case, please increase the size (especially the Height) of the applet.
3 The Prove program
As indicated earlier, Prove visualizes time-space diagrams.
The program parts are represented by colored bars placed as rows of a coordinate system, where the horizontal axis denotes the common time and the discrete vertical axis is labeled with the names of the program parts, which may be jobs or processes depending on the context from which the current Prove instance was called.
- Green indicates that a program part is waiting for an event to read.
- Black indicates the "working" state.
- Gray indicates that the program is blocked, waiting to send an event to another program that is not yet ready to accept it.
The endpoints of the arrows between bars indicate the times of sending and receiving events, respectively. These arrows should generally be blue. Exceptional red lines indicate bad trace files, unsynchronized clocks, or lost monitor information. You are kindly encouraged to report them to our Portal maintenance team.
3.1 User activities
User activities may affect the trace file generation and its graphical rendering.
3.1.1 Truncate trace files
The only activity affecting trace file generation is the menu command Trace/Forget events (described in more detail in Section 2.3).
Figure 44
The menu command Trace/Collect is not used at present; it is reserved for forcing the remote resource to update the trace file.
3.1.2 Visualization activities
Visual rendering activities include filtering, attributing, and time-scale zooming of the program parts.
3.1.2.1 Filtering
The menu item View/Filter serves to reduce the set of program parts to be shown.
You can select the program parts of interest with the associated toggle marks. The selection takes effect when you select the "Show changes" item, as Figure 45 shows. Note that "delta_m", not visible in Figure 45, has been selected too.
Filtering can be regarded as a kind of "vertical zooming".
Figure 45
The result of the selection is shown in Figure 46:
Figure 46
3.1.2.2 Change state/statistics
Selecting the statistical regime instead of the default setting (which shows the time-dependent states of the program parts) retrieves color-coded statistics of the occurrence frequency of the individual event types.
This operation is started by selecting the menu item Info/Statistics/Event (see Figure 47):
Figure 47
The result can be seen in Figure 48:
Figure 48
You can restore the original settings by selecting the menu item Info/Statistics/Communication (Figure 49).
3.1.2.3 Sorting the program parts vertically
You can change the order in which the program parts appear along the vertical axis. Figure 49 shows the path to the proper menu item in the list Info/Sort/{Sort by communication | Sort by name | Sort by hostname}:
Figure 49
Figure 50 shows the new image:
Figure 50
3.1.2.4 Zooming in the time scale
One of the most important means of investigating events is zooming in the time scale. Zooming works in a stack-like way and uses no special buttons of the window, only the mouse buttons. The rules are very simple:
- A left click near the horizontal time line defines one range delimiter. Releasing the dragged mouse at a new position defines the other delimiter. The window is refreshed automatically, magnifying the whole image by the ratio of the length of the original range to the length of the new one.
- This can be repeated any number of times, and the previous range selection can be revoked by clicking the right mouse button.
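The stack-like behaviour and the magnification ratio can be sketched as follows. This is a hypothetical model, not Prove's implementation: `TimeZoom` and its methods are illustrative names, and we assume that a right click never removes the original, unzoomed range.

```python
from __future__ import annotations


class TimeZoom:
    """Hypothetical model of Prove's stack-like time-axis zoom."""

    def __init__(self, start: float, end: float):
        self.stack = [(start, end)]  # bottom entry: the full, unzoomed range

    @property
    def visible(self) -> tuple[float, float]:
        return self.stack[-1]

    def select(self, a: float, b: float) -> float:
        """Left-drag from a to b: push the new range and return this step's magnification."""
        lo, hi = sorted((a, b))
        if hi == lo:
            raise ValueError("empty time range")
        old_lo, old_hi = self.visible
        self.stack.append((lo, hi))
        # Ratio of the original range length to the new range length.
        return (old_hi - old_lo) / (hi - lo)

    def revoke(self) -> None:
        """Right click: return to the previous range (the full range is kept)."""
        if len(self.stack) > 1:
            self.stack.pop()

    def total_magnification(self) -> float:
        full_lo, full_hi = self.stack[0]
        lo, hi = self.visible
        return (full_hi - full_lo) / (hi - lo)
```

For a 0-100 time axis, selecting 20-40 magnifies 5 times; selecting 25-30 inside that magnifies a further 4 times, 20 times in total, and two right clicks return to the full range.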
Figure 51 shows the state immediately after the range selection (the short horizontal line toward the right side of the calibrated time scale), and Figure 52 shows the state after the zoom has been executed.
Figure 51
Figure 52
Any zoomed image (Figure 52, Figure 53) contains an active ruler, with which the whole original time range can be swept over. However, this operation can be prohibitively slow: as discussed in Section 2.3, the desktop part of Prove must send a request to the Portal Server for each new image, which is downloaded with a delay depending on the network. Therefore the sweeping is not as smooth as it would be with a traditional local program.
Figure 53 shows the image after a repeated zoom.
Figure 53
VI Multi-GRID support
From version 2.1 on, P-GRADE Portal users can execute their applications in several Grids, each of which may consist of one or more Virtual Organizations (VOs).
If a Grid consists of several VOs, the user should have a certificate for the Grid, and this certificate should be registered with those VOs the user would like to access.
For each of these VOs the user must have a valid certificate, which will be used to authenticate the user at the resources of that particular VO.
To use this multi-Grid support, the following steps have to be taken:
- The portal administrator has to set up the list of VOs and may define a set of default resources. These resources appear on demand in the resource list of every common user.
- Each user can then set up his or her own resource list for the VO.
- The jobs of the workflow can then be allocated to any resource of any VO, so different jobs of the very same workflow can be executed on different resources belonging to different VOs or even different Grids.
- Before execution, the user has to download a short-term proxy certificate for each VO involved in the workflow.
Important notice for EGEE users:
The Portal provides multi-Grid, multi-VO support independently of the underlying infrastructures. However, certain Grids may impose restrictions.
EGEE restriction:
The VO defined by the user when selecting a Virtual Organization with Broker support may contradict the VO permission of the resource selected by that Broker.
This unpleasant situation can only occur if two conditions are fulfilled:
- The user has registered with more than one VO. (EGEE declares that a user should be a member of just one VO; however, EGEE does not prohibit multiple membership.)
- The sets of resources belonging to these different VOs include a common site "S", and the broker selects exactly this site to execute the user's job.
Let us see the situation in detail:
- The user, already a member of VO1, registers with VO2. As site "S" also belongs to VO2, the user is mapped as a VO2 member in the grid-map file of site "S".
- The user submits a job to a VO1 broker, which accepts him/her as a VO1 user and makes the proper VO1 setting in the JDL description (Figure 10.2).
- The local security system on site "S" finds a VO1 job and, from the delivered proxy certificate (including the distinguished name of the user), determines a contradicting VO2 membership from the grid-map file.
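The conflict can be reduced to a comparison between the VO the job declares and the VO the site associates with the user's distinguished name. This is a deliberately simplified sketch: a real grid-map file maps a DN to a local account rather than directly to a VO name, and the DN and the function `vo_conflict` below are hypothetical.

```python
from __future__ import annotations


def vo_conflict(job_vo: str, user_dn: str, site_grid_map: dict[str, str]) -> bool:
    """True if the site associates the user's DN with a different VO than the job declares."""
    mapped_vo = site_grid_map.get(user_dn)
    return mapped_vo is not None and mapped_vo != job_vo


# Simplified grid map of site "S" after the user registered with VO2 (hypothetical DN).
GRID_MAP_S = {"/C=HU/O=Example/CN=Jane User": "VO2"}
```

A job submitted through the VO1 broker with this DN is then reported as conflicting at site "S", while the same job would pass if the broker chose a site where the DN is not mapped to another VO.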
Let us now look at the four setup steps listed above in more detail.
1. Setting up VOs of Grids and default resources (by the portal administrator)
The Grids, the VOs, and the resource lists of the VOs can be edited in the Settings tab of the portal.
Only the root user has the privilege to set up and modify the list of Grids and VOs. This means that he/she has to set up at least one Grid (or VO) and, advisably, one default resource for it.
Figure 6.1 shows the Grid configurations window as edited by the root user. The root user adds a new VO with 'Add new' and deletes existing ones with 'Delete'.
Note that in the case of Grids composed of several VOs, the input field "Name" refers to the VO, and the input field "Grid" refers to the Grid as a hub over the several VOs.
The distinction is necessary because the resources belong to the VO, while the information system access defined here refers to the superimposed Grid.
In short, the string defined as "Grid" may appear only at the top of the hierarchy, when the user selects a Grid as the root for information retrieval (see The Information system).
If we want to define a VO, for example "HUNGRID" of the "EGEE" Grid, then together with the VO "HUNGRID" we may define the access to the whole "EGEE" Grid.
Having defined HUNGRID as part of the EGEE Grid, the whole information system of the EGEE Grid becomes visible (see Figure 7.6).
When a Grid is not really subdivided into VOs, it is regarded as consisting of a single VO, and, as in the multi-VO case, the name of this VO is required as "Name".
Filling in the field "Grid" is not obligatory; if it is left empty, its value is inherited from "Name". This suits the needs of user groups who work with simple Grids and do not want to distinguish between the concepts of VO and Grid.
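The inheritance rule for the "Grid" field can be written down in a few lines. The `GridConfig` record below is a hypothetical illustration of the two input fields, not the Portal's configuration class; "MyGrid" is an invented example name.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GridConfig:
    """Hypothetical view of one row of the Grid configurations window."""
    name: str       # "Name": the VO
    grid: str = ""  # "Grid": the superimposed Grid, optional

    def effective_grid(self) -> str:
        # An empty "Grid" field inherits the value of "Name".
        return self.grid or self.name
```

So `GridConfig("HUNGRID", "EGEE")` is placed under the EGEE Grid, while a single-VO Grid such as `GridConfig("MyGrid")` acts as its own Grid.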

Figure 6.1 Grid configurations list window
The administrator can also set up an information system for the Grid of the VO, if one is available. Currently MDS2 and LCG2 type information systems are supported. The configuration of the information system is then used by the Information System portlet. If there is no information system, just choose 'N/A'.
Please note that in the case of LCG2 the information system refers to the whole Grid, not just to one virtual organization.
For both MDS2 and LCG2, the host, port, and base-dn have to be defined for contacting the information system, as shown in Figure 6.2.
For the MDS2 type you also have to refer to an existing MyProxy server account (see "login" and "password" in Figure 10, where "login" of Figure 10 corresponds to "Username" of Figure 6.2).
The other fields of the MyProxy server account ("hostname" and "port") refer to the MyProxy server itself; they are defined during the installation of the P-GRADE Portal in the configuration file "PGradePortal.properties".
The system automatically downloads a proxy certificate from this account and uses it to authenticate itself against the information source when querying the job-manager list of the Grid.

Figure 6.2 Defining the Information System for the Grid (MDS2)
A default resource list can also be set up by the portal administrator. This user interface can be reached by clicking the 'Resources' button in the Grid Configuration window (Figure 6.1).
The resource list window can be seen in Figure 6.3.
Figure 6.3 Defining the DEFAULT resource list for the Grids
The portal administrator defines a default list, which is then available to all users for setting up their own resource lists. Resources can be added with 'Add' and deleted with 'Delete'. For each resource, the URL (for example "n99.hpcc.sztaki.hu") and a job manager (for example "jobmanager-fork") have to be provided.
A special case of VO definition is a virtual organization with broker support, for example "hungrid_LCG_2_BROKER" in Figure 6.1.
In this case no information system is defined. For historical reasons the Resources window contains just one list element in this case, mostly "default.jobmanager". It is set by the administrator and may not be altered by a common user.
This value is not used in Release 2.2.
2. Setting up the resource list for a VO (any regular user)
Any regular user can define his or her own resource list for each of the available Grids.
Compare Figure 6.1 and Figure 12a: as you can see, users cannot edit the Grid list itself; they can only edit the resource list of each Grid by clicking its 'Resources' button.
The resource list window of a user for a particular VO can be seen in Figure 12b. The user can add and delete resources just like the portal administrator, with 'Add' and 'Delete'. The default resources defined by the administrator can be loaded with 'Load default'.
If an MDS2 type information system is defined for the Grid, it can also provide resource configurations, which can be loaded with the 'Load resources from MDS2' button.
3. Allocating the workflow (any regular user)
The workflow and its jobs can be allocated in the Workflow Editor (WE). For any job, any VO and any resource in that VO can be set. Figure 34a shows the Workflow properties window of the WE, which can be opened from the Workflow menu or with the Ctrl+W hotkey.
The resource of a job can also be set in the job properties window, which opens when you click on the job. In the job properties window the VO (Grid) is selected in the field labeled Grid. This window can be seen in Figure 18.
4. Supplying a certificate for each virtual organization before execution (any regular user)
In the multi-Grid environment users have to provide a certificate for each virtual organization: they have to map a valid certificate to every virtual organization on whose resources they want to execute their application. The whole certificate management takes place in the Certificate tab of the portal, just as before. Right after download, users are offered to map the certificate to any of the Grids, as can be seen in Figure 10a.
Clicking 'Set for Grid' leads to the interface in Figure 10b. The details of the certificate, such as the issuer, subject, and time left, are displayed, and the desired Grid can be selected.
Clicking 'OK' in this window returns the user to the certificate list, which can be seen in Figure 6.4.
The column 'Set for Grids' lists the names of all valid virtual organizations associated with the respective certificate.
Each certificate can be assigned to any number of virtual organizations, but only one certificate can be set for a given virtual organization at any time.
Figure 6.4 The certificate mapping window with the Grid mappings
In this window you can also modify mappings with the 'Set for Grid' function, which leads to the certificate-mapping window already seen in Figure 10b.
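The mapping rule (many VOs per certificate, one certificate per VO) is naturally a dictionary keyed by VO. This is a sketch, not the Portal's code: `CertificateMappings` is hypothetical, and we assume that setting a certificate for an already mapped VO replaces the earlier mapping rather than being refused.

```python
from __future__ import annotations

from typing import Optional


class CertificateMappings:
    """Hypothetical model: any number of VOs per certificate, one certificate per VO."""

    def __init__(self) -> None:
        self._cert_for_vo: dict[str, str] = {}  # keyed by VO, so each VO has one certificate

    def set_for_grid(self, cert_id: str, vo: str) -> None:
        # Assumed behaviour: a new mapping for a VO replaces the previous one.
        self._cert_for_vo[vo] = cert_id

    def vos_of(self, cert_id: str) -> list[str]:
        # Roughly what the 'Set for Grids' column would list for this certificate.
        return sorted(vo for vo, c in self._cert_for_vo.items() if c == cert_id)

    def cert_for(self, vo: str) -> Optional[str]:
        return self._cert_for_vo.get(vo)
```

Keying by VO makes the "one certificate per VO" constraint hold by construction, while nothing limits how many keys point to the same certificate.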
VII Information System
The P-Grade Portal can handle the available Grid-dependent information systems.
Two kinds of information systems are recognized in the P-Grade Portal: the MDS-2 and the LCG-2 information system.
Configuring Grid access (including specifying an information system for a Grid) is a task of the portal administrator. See Setting up VOs of Grids and default resources.
1. MDS-2 information system
The MDS-2 information system of the portal has two functions: getting the list of resources available in the Grid, and getting detailed information about individual resources.
1.1 View of available resources in the Grid
When the user clicks on the Information system tab, the MDS Monitor module of the portal (label MDS Monitor) is activated by default. There are two modules under the "Information System" tab: the MDS Monitor and the LCG Monitor. When Information system is selected again later, the last visited module is activated.
If the portal administrator has not yet specified a Grid with an MDS-2 information system, the following message is shown in the portal window (see Figure 7.1).
Figure 7.1
If one or more Grids with MDS-2 information systems have already been defined in the portal, the following screen (Figure 7.2) appears after the MDS Monitor label is selected.
Figure 7.2
The user can select a Grid with the combo box in the upper left part of the portal window to see its available resources. After selecting a Grid, the user must click the View button right next to the Grid combo box to see the available resources.
If the server (called a GIIS server) from which the portal gets this information, or the service running on it, is not available, the following message is shown (see Figure 7.3).
Figure 7.3
1.2 View of detailed information about a resource
To get detailed information about a resource, the user should click on the appropriate resource in the resource list (see Figure 7.2). The page with the detailed information about a resource can be seen in Figure 7.4.
Figure 7.4
Figure 7.4 shows that the detailed information that MDS-2 provides on every resource can be divided into a static and a dynamic part.
If a piece of information (e.g. CPU Model in Figure 7.4) is not available from the MDS at that moment, the text Not Available (N/A) is displayed for that attribute.
2. LCG-2 information system
Introduction
To understand what kind of information the EGEE information system infrastructure provides, and how it is related, please see the following chart, where a site is the name for a collection of resources that are closely related geographically and organizationally.
Sites are generally clusters and make up the hardware infrastructure for computing elements and storage elements.
The figure shows that the resources of a given site may be shared among several Virtual Organizations, more precisely among the computing elements and storage elements of the VOs.
It can be seen that the separate BDII servers, which collect information and are associated with different Virtual Organizations, "see" different views and fractions of the same Grid.
A BDII server may even "show" sites with which its own VO is not associated. In the example above, the BDII of VO c "sees" Site k.
The BDII servers work in "pull" mode and have a general refresh rate of 2 minutes. However, the accuracy of data in a distributed system is not guaranteed.
In the P-GRADE Portal there are two possible queries of the VO-dependent BDII servers. The user must be aware that the Select Grid list box (see Figures 7.5 and 7.6) selects just a BDII server associated with a VO:
Note:
The name "Grid" in this command is justified by the fact that, generally, BDII servers (at least in the case of the EGEE federation) list almost all sites of "foreign" VOs as well; see the example of the BDII of VO c and Site k in the preceding paragraph. However, this behaviour is not guaranteed. Therefore, as a rule of thumb, please select in the Select Grid list box the BDII server that belongs to the VO you are interested in when setting Select VO:
- If the user selects the option All in the Select VO list box (see Figure 7.6), then, following the logic of the BDII server, all sites observed by the BDII are selected, and the true values of the dynamic load of the sites (number of running and waiting jobs) are displayed.
- If the user selects a dedicated VO in the Select VO list box (see Figures 7.7 and 7.8), then the BDII server returns only those sites associated with the requested VO; moreover, the displayed dynamic load values refer only to the jobs submitted under the "flag" of the requested VO, so no sound conclusions can be drawn about the full dynamic load of the sites in question.
The suggested usage of the Select VO command is the following: first select a dedicated VO to find all the sites of the requested VO, then select All to see the realistic load of a particular site.
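The difference between the two queries can be illustrated with invented per-VO job counts. This is a hypothetical model, not the BDII or GLUE schema: `Site` and `site_view` are illustrative names, and a VO is assumed to be associated with a site exactly when it has an entry in that site's counts.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Site:
    name: str
    # Hypothetical per-VO job counts; a key's presence means the VO is associated with the site.
    running: dict[str, int] = field(default_factory=dict)
    waiting: dict[str, int] = field(default_factory=dict)


def site_view(sites: list[Site], vo: Optional[str] = None) -> list[tuple[str, int, int]]:
    """vo=None models 'All'; otherwise only that VO's sites and that VO's jobs."""
    rows = []
    for s in sites:
        if vo is None:
            # All: every observed site with its full dynamic load.
            rows.append((s.name, sum(s.running.values()), sum(s.waiting.values())))
        elif vo in s.running or vo in s.waiting:
            # Dedicated VO: only that VO's share of the load is visible.
            rows.append((s.name, s.running.get(vo, 0), s.waiting.get(vo, 0)))
    return rows
```

With a site running 3 VO1 jobs and 10 VO2 jobs, the VO1 view reports 3 running jobs while All reports 13, which is why All should be used to judge the real load.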
2.1 View of available sites in a Grid
When the user clicks on the LCG Monitor label of the Information system tab, the LCG Monitor module is activated.
If the portal administrator has not yet specified a Grid with an LCG information system, the following message is shown in the portal window (see Figure 7.5).

Figure 7.5
If one or more Grids with LCG information systems have already been defined in the portal, the following screen (Figure 7.6) appears after the user clicks on the LCG Monitor label.

Figure 7.6
The user can select a Grid with the combo box toward the upper part of the portal window to see its available sites. After selecting a Grid, the user must click the View button right next to the Grid combo box to see the available sites. By default, the sites belonging to the first Grid in the Grid list are displayed on this page.
Each site in an LCG type grid is built up from Computing Elements (CE) and Storage Elements (SE). More precisely, a site is rather a geographic notion: there can be one or more clusters inside a site, and a cluster can be fed by one or more queues, each called a Computing Element.
The site list page shows the basic information about the CEs and SEs. By default, the information shown for each site is the aggregation of all the CE and SE resources found at that site.
If the server from which the portal gets this information (called a BDII server), or the service running on that server, is not available, the message "Cannot contact the BDII server" is displayed.
2.1.1 Selecting a Virtual Organization
The users of LCG type grids must belong to one or more virtual organizations (VOs). The CEs and SEs are associated with VOs as well, and a CE may belong to more than one VO. This means that if a CE or SE is associated with a VO, only those users who belong to that VO can access the resource.
The user can filter the sites associated with a specified VO using the combo box found under the grid combo box in the upper part of the portal window (Figure 7.7). See bug report.

Figure 7.7
After clicking the View button next to the combo box, the sites that belong to the selected VO are displayed (Figure 7.8).

Figure 7.8
Selecting a specified VO means the following:
- The user can see the list of those sites which belong to the selected VO.
- When the user clicks on a site name, the detailed information displays only those CEs and SEs which belong to the selected VO.
Important remark: see bug report B.1 when interpreting the values of the columns Total, Free, Running and Waiting.
2.2 View of detailed information about a site of a Grid
If the user would like to get detailed information about a site, he/she should click on the name of the site in the site list (see Figure 7.6). The page with the detailed information about a resource is shown in Figure 7.9.

Figure 7.9
As can be seen in this figure, the selected VO is All. This means that all CEs and SEs found at that site are displayed.
If the user selects a VO on the site list page, only those CEs and SEs which belong to the selected VO are displayed in the detailed view. Accordingly, Figure 7.10, which shows the site IFCA-LCG-2 with VO dteam, displays only a limited number of CEs and SEs.

Figure 7.10
VIII Handling of remote files
1 General aspects of remote files
The P-GRADE Portal supports the handling of remote files. A remote place is a location within a given virtual organization which is different from the local file system of the user's desktop, and access to it is controlled by grid certificates.
Since version 2.1 of the P-GRADE Portal, input files can be sent to a job not only from the local file system of the user's desktop but from trusted remote places as well. Similarly, the output files of a job can be sent to remote storage places.
The next figure explains the differences between the handling of local and remote files:
Figure 8.1
Life cycles of local and remote files
- Please note that the remote input files referenced in the graph description of a workflow are not uploaded together with the local input files when the user, after the editing phase, uploads the workflow to the P-GRADE Portal Server (Step 1 of the life cycle of the workflow).
- When an input file is needed at the site of the Executing Resource for the submission of Job i (Step i2), the file is copied from the P-GRADE Portal Server if it originated as a local file of the User Desktop machine, and from the remote location if it has been defined as a remote file.
- Similarly, after the termination of Job i (Step i3), a generated output file is copied to the P-GRADE Portal Server if it has been defined as local (i.e. downloadable to the User Desktop), and to the respective remote site otherwise.
- After the termination of the whole workflow (Step 4), the compressed bundle of the local output files can be downloaded to the User Desktop machine.
On the other hand, any communication between the remote site(s) and the desktop machine is outside the scope of the P-GRADE Portal.
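The copy rules in the list above amount to a simple routing decision per file. The following Python sketch models that decision. The class PortFile, its fields and the label "portal-server" are hypothetical illustrations and are not part of the P-GRADE Portal.

```python
from dataclasses import dataclass
from typing import List

PORTAL_SERVER = "portal-server"  # hypothetical label for the P-GRADE Portal Server


@dataclass
class PortFile:
    """Hypothetical model of a file attached to a job port."""
    internal_name: str
    is_remote: bool = False
    remote_location: str = ""  # e.g. a gsiftp URL or an lfn: name


def exchange_place(f: PortFile) -> str:
    """Place the executing resource exchanges the file with.

    Step i2 copies an input file from here, and step i3 copies an output
    file to here. Local files go through the Portal Server; remote ones do not.
    """
    return f.remote_location if f.is_remote else PORTAL_SERVER


def desktop_bundle(outputs: List[PortFile]) -> List[str]:
    """Step 4: only local output files end up in the downloadable bundle."""
    return [f.internal_name for f in outputs if not f.is_remote]


outputs = [
    PortFile("result.dat"),
    PortFile("big.dat", True, "lfn:/grid/voce/alice/big.dat"),
]
print(exchange_place(outputs[1]))  # the remote location, not the portal
print(desktop_bundle(outputs))
```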
2. Different kinds of remote file usage
Remote files can be handled by several protocols, stored by different means and referenced at several levels, in a Grid (and VO) dependent way.
From the point of view of the user there are two basically different ways to use remote files:
1. Low level usage supported by the Globus middleware.
2. High level usage generally supported by the EGEE infrastructure [4].
2.1 Low level usage (Globus)
2.1.1 Protocol
To access a file at a remote place, a transfer protocol is needed. The protocol is explicitly or implicitly part of the URL describing the location of the file. Mostly the protocol gsiftp is used, in which case the user is identified against the remote host by his/her actual certificate.
2.1.2 File reference
The file is referenced by a URL consisting of the concatenation of the host name and the storage path of the file on that host.
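The following Python sketch illustrates this concatenation. It builds such a reference with the gsiftp scheme and then splits it again with the standard urllib.parse module. The host se.example.org and the path are placeholders, and the sketch assumes that the storage path is absolute.

```python
from urllib.parse import urlsplit


def gsiftp_url(host: str, storage_path: str) -> str:
    """Concatenate host name and absolute storage path into a gsiftp URL."""
    if not storage_path.startswith("/"):
        raise ValueError("the storage path must be absolute")
    return f"gsiftp://{host}{storage_path}"


url = gsiftp_url("se.example.org", "/home/alice/input.dat")
parts = urlsplit(url)
print(url)
print(parts.scheme, parts.hostname, parts.path)
```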
2.1.3 File Storage
The remote files are stored as common files of a host. A special file, the GridMap file, contains entries which associate the so called distinguished name part of the user certificates with user accounts known on that system. In this way the system can control the access permissions of file operations. The GridMap file is maintained by the local administrator of that host.
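A GridMap file is plain text in which each entry pairs a quoted distinguished name with a local account name. The following Python sketch parses a simplified version of this format (one account per entry, lines starting with # treated as comments) to show how a DN is mapped to an account. Real GridMap files may list several comma-separated accounts per entry, which this sketch does not handle. The DNs and account names below are invented.

```python
import shlex
from typing import Dict, Optional


def parse_gridmap(text: str) -> Dict[str, str]:
    """Map certificate distinguished names to local account names."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = shlex.split(line)  # honours the quotes around the DN
        if len(tokens) >= 2:
            mapping[tokens[0]] = tokens[1]
    return mapping


def local_account(mapping: Dict[str, str], dn: str) -> Optional[str]:
    """Return the account a DN is mapped to, or None if access is denied."""
    return mapping.get(dn)


GRIDMAP = '''
# example entries (invented)
"/C=HU/O=Example/CN=Alice Example" alice
"/C=HU/O=Example/CN=Bob Example" bob
'''

accounts = parse_gridmap(GRIDMAP)
print(local_account(accounts, "/C=HU/O=Example/CN=Alice Example"))
print(local_account(accounts, "/C=HU/O=Other/CN=Mallory"))
```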
2.1.4 Example
The system uses this information in the arguments of the automatically generated globus-url-copy instructions.
Figure 8.2
Low level access to a remote input file
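globus-url-copy takes a source URL and a destination URL. Local files are given as file:// URLs. The following Python sketch assembles such a command line for copying a remote input file to a local working directory. It only builds the argument list and does not run the command, because running it would require a valid proxy certificate. The host and paths are placeholders, and the exact arguments the portal generates may differ.

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def stage_in_command(host: str, remote_path: str, local_path: str) -> List[str]:
    """Build a globus-url-copy argument list: remote gsiftp source, local file destination."""
    local = PurePosixPath(local_path)
    if not remote_path.startswith("/") or not local.is_absolute():
        raise ValueError("both paths must be absolute")
    source = f"gsiftp://{host}{remote_path}"
    destination = f"file://{local}"  # file:// + /abs/path gives file:///abs/path
    return ["globus-url-copy", source, destination]


cmd = stage_in_command("se.example.org", "/home/alice/input.dat", "/tmp/job42/input.dat")
print(shlex.join(cmd))  # could be passed to subprocess.run(cmd, check=True)
```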
2.2 High level usage (only within EGEE with broker support)
This chapter covers only the most important remote file related features of LCG-like grids (for example EGEE).
2.2.1 Protocol
The protocol is of low importance, because the JDL job submission system and the associated internal services of the P-GRADE Portal hide the protocol from the user. In this case the job submission is performed with Broker support (see Connection to the EGEE Grids and the usage of the Broker).
2.2.2 File reference
Within the P-GRADE Portal, high level remote files can be referenced by symbolic names resolved by File Catalogues. File Catalogues map the symbolic names to Grid files. Grid files cannot be modified after creation. They may exist in several replicas linked by a common grid-wide unique identifier (GUID), and the replicas are stored in Storage Elements.
There are several File Catalogue standards. The actual type of the File Catalogue is defined by the administrator of the respective virtual organization.
A reference to a file catalogue entry (a symbolic name) begins with the prefix "lfn:" (abbreviation of logical file name), but the syntax following this prefix depends on the type of the File Catalogue.
Two types of File Catalogues have been tested:
- LFC, used for example in the VOs hungrid, seegrid, gilda and voce;
- RMC, formerly used in the VO voce.
In both cases the user is strongly advised to define the environment variable "LCG_GFAL_INFOSYS", because the catalogues are accessible via the information system.
This environment variable is mostly defined by the system administrator of the UIF machine, i.e. the machine where the P-GRADE Portal server runs.
However, the worker nodes (CEs) where the actual jobs run may lack this setting. In that case the operations on remote files will fail.
The user should then put this setting manually into the JDL part of the Job Properties window (see Figure 10.8).
The value of this setting may differ between VOs. Please check it on the UIF machine with the command
set | grep LCG_GFAL_INFOSYS
Typical values at the time of writing of this manual:
lcg-bdii.cern.ch:2170 for the VO voce
bdii.phy.bg.ac.yu:2170 for the VO seegrid
2.2.2.1 LFC file catalogue
The file name here has a fixed hierarchical form:
/grid/<VO>/<Username>/[<LFC_Catalog_Directory_Name>/]...<fileName>
where each <LFC_Catalog_Directory_Name> must refer to an existing catalogue directory previously created by the proper LFC commands [4]. See Figure 8.3 as an example.
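The hierarchical form above can be assembled mechanically. The Python sketch below builds a logical file name with the lfn: prefix from its components. The VO, user and directory names are placeholders. The sketch checks only that no component is empty or contains a slash. It does not check that the catalogue directories actually exist.

```python
from typing import Sequence


def lfc_lfn(vo: str, user: str, file_name: str, catalog_dirs: Sequence[str] = ()) -> str:
    """Build lfn:/grid/<VO>/<Username>/[<dir>/]...<fileName>."""
    components = [vo, user, *catalog_dirs, file_name]
    for part in components:
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return "lfn:/grid/" + "/".join(components)


print(lfc_lfn("voce", "alice", "input.dat"))
print(lfc_lfn("voce", "alice", "input_1.dat", ["ps_inputs"]))
```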
Two environment variables must be set when the LFC catalogue is used:
- LCG_CATALOG_TYPE must be set to "lfc", and
- LFC_HOST defines the host of the LFC catalogue.
It is the responsibility of the user to set these environment variables properly in the JDL description (see Figure 10.8).
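In JDL, such settings go into the Environment attribute, which is a list of "NAME=value" strings. The following Python sketch renders that attribute from a dictionary. The LFC host name is a placeholder, and the LCG_GFAL_INFOSYS value is one of the typical values quoted earlier for the VO voce.

```python
from typing import Dict


def jdl_environment(env: Dict[str, str]) -> str:
    """Render a JDL Environment attribute from NAME -> value pairs."""
    items = ", ".join(f'"{name}={value}"' for name, value in env.items())
    return f"Environment = {{{items}}};"


line = jdl_environment({
    "LCG_CATALOG_TYPE": "lfc",
    "LFC_HOST": "lfc.example.org",  # placeholder; ask your VO for the real host
    "LCG_GFAL_INFOSYS": "lcg-bdii.cern.ch:2170",
})
print(line)
```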
2.2.2.2 RMC file catalogue
In this case the name is not hierarchical but a plain string, for example:
MyTestFile_25_Nov_2005
No user setting of environment variables is required.
2.2.3 File Storage
In EGEE the remote (grid) files are stored in so called Storage Elements. Local administrators of the sites belonging to a common virtual organization may have different policies regarding the usage of the local Storage Element.
Within the P-GRADE Portal the user can instruct the system to store the generated output file on a certain Storage Element. This is an option of the JDL description and can be modified in the Workflow Editor. See the input field Output SE in Figure 10.6.
The user can explore the available Storage Elements in two ways:
2.2.4 Example
Figure 8.3
High level file definition used by the LFC catalogue
IX User quotas
For the safe overall operation of the Portal server, Release 2 of the P-GRADE Portal introduces the notion of user quotas and manages their administration.
A user quota is a predefined amount of storage available to a user on the host machine acting as the server of the P-GRADE Portal (see "Portal server" in Figure 2).
The amount of the user quota (defined in MB) is set centrally by the system administrator of the P-GRADE Portal. The administrator can set a different amount of storage for each user and can reset it at any time.
See the pane Quota per portal user on the Settings tab, which defines a common default value. See also the pane User Quota, which lists the users with their quota limits and where the administrator can define individual values.
Please remember that these panes are visible and editable only by the administrator (user root).
Figure 9.1
Note:
A user quota may, in the possible (though improbable) case, become exhausted because the administrator has decreased it. The user then gets the same warning messages as if he/she had exceeded the limit. No user data will be lost, but the user will be forced to free enough space.
The quota is the highest amount of the valuable common storage resource which can be allocated by a user, directly or indirectly:
- Direct usage is usage over which the user has direct control. It may involve the files uploaded when workflows are saved (input files, executable code and graph description) and the output files generated by the runs of the workflows.
- Indirect usage may involve the trace files generated during job level monitoring.
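The two kinds of usage listed above add up against a single limit. The following Python sketch illustrates that accounting under stated assumptions: the Usage fields and the function name are hypothetical, quota megabytes are taken as 1024*1024 bytes, and the real portal may measure usage differently.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: quota megabytes are binary megabytes


@dataclass
class Usage:
    """Hypothetical per-user storage usage on the Portal server, in bytes."""
    uploaded: int  # direct: input files, executables, graph descriptions
    outputs: int   # direct: output files produced by workflow runs
    traces: int    # indirect: trace files from job level monitoring

    def total(self) -> int:
        return self.uploaded + self.outputs + self.traces


def quota_exceeded(usage: Usage, quota_mb: int) -> bool:
    """True if the combined direct and indirect usage exceeds the quota."""
    return usage.total() > quota_mb * MB


u = Usage(uploaded=60 * MB, outputs=30 * MB, traces=20 * MB)
print(quota_exceeded(u, 100))  # 110 MB used against a 100 MB quota
```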
The quota management does not guarantee the availability of the defined amount. Its only purpose is to prevent excessive usage and/or malicious exhaustion of the common storage resources. In short, it protects the system against the user, but not the user against the system.
If the quota is exhausted, the user receives a warning message.
Suggested user actions:
- The simplest thing the user can do is to delete obsolete workflows from the server (see button Delete on Figure 38). Alternatively, the user can use the Workflow Archive Service to save the workflows on the desktop machine, or clear parts of them.
- If the workflow is important, the user may experiment by substituting the referenced input files with shorter ones and/or by performing runs that produce shorter or no output files.
- A tricky method is to start and immediately abort a workflow. In this case the system deletes the residues of the previous runs.
X Connection to the EGEE Grids and the usage of the Broker
1. General rules to submit individual jobs of a workflow by the Broker of the EGEE
Since Release 2.2 of the P-GRADE Portal, the user can submit one or more jobs of a workflow with broker support into an EGEE-like Grid [4]. However, this freedom comes with an installation restriction: the Portal Server (see Figure 2) must be set up on a so called "UIF machine" belonging to the EGEE-like Grid to be reached.
From the point of view of the user, the main differences in usage between a traditional low level Globus Grid and an EGEE-like Grid are the following:
- The user lets the Broker service of the EGEE-like Grid choose the optimal resource, in this case the Computing Element where the given job should be submitted.
- The input of the Broker is the parameter set corresponding to the rules of the Job Description Language (JDL) [3]. The JDL description defines the job and may store user hints in order to select an optimal resource (see Figure 10.9, which shows a restriction on the host of the required resource). The user can edit the JDL description. However, in the current implementation of the P-GRADE Portal, large parts of the JDL description are inherited from the corresponding parts of the job description. These entries can be altered only by changing the corresponding parts in the Workflow Editor. Figures 10.3, 10.5 and 10.6 show the corresponding windows together: on the left-hand sides you see the primarily editable Port properties windows, and on the right-hand sides their mappings in the JDL description. Similarly, the editable values of the Job Properties (Figure 10.1) are mapped into the JDL tabs Job (Figure 10.2) and Sandbox (Figure 10.3).
- The user can use the services of the classical Storage Elements to handle remote files. The remote files must be referenced by logical names (see Chapter VIII 2.2.2 File reference). In this case the remote files are Grid files, and the user is not expected to provide special executable code able to read Grid files directly. Therefore, in the default case (when the flag managed_copy is set), the connector infrastructure of the P-GRADE Portal copies the referenced remote input files as temporary local files onto the worker node where the job actually runs. With this method the executables are much more portable, and it is easy to create and test them in a local environment. However, by unsetting the flag managed_copy the user can indicate that the supplied code reads the Grid files directly, so that the resource consuming copy step can be skipped in such cases. The port properties parts of Figures 10.5 and 10.6 demonstrate the usage of Grid files. Please note that the logical file names are marked by the prefix lfn:
The system recognizes a virtual organization with broker support if the following two conditions hold for the Name defined in the window "GRID configurations" (see Figure 6.1) and selected as Grid in the window "Job properties" (Figure 10.1):
- The prefix of the name must be the name of a virtual organization. This value is copied into the JDL description automatically (see Figure 10.2).
- The postfix of the name should be "_LCG2_BROKER".
In this case the button JDL Editor... of the Job Properties window becomes active (see Figure 10.1), and the Resource information has no significance.
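The naming rule above can be checked mechanically. The following Python sketch returns the VO name encoded in a Grid name, or None when the name does not denote a brokered VO. The function name and the known_vos argument are hypothetical, and the portal's own check may differ in details such as case sensitivity.

```python
from typing import Iterable, Optional

BROKER_SUFFIX = "_LCG2_BROKER"


def broker_vo(grid_name: str, known_vos: Iterable[str]) -> Optional[str]:
    """Return the VO prefix of a brokered Grid name, else None."""
    if not grid_name.endswith(BROKER_SUFFIX):
        return None
    vo = grid_name[: -len(BROKER_SUFFIX)]
    return vo if vo in set(known_vos) else None


vos = ["voce", "seegrid", "hungrid"]
print(broker_vo("voce_LCG2_BROKER", vos))   # brokered: JDL Editor enabled
print(broker_vo("local_globus_grid", vos))  # invented non-brokered name
```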
For a more detailed description of the JDL language, please consult [3].
2. JDL Editor details
2.1 Opening the JDL Editor
Figure 10.1
2.2 Setting retry count
Figure 10.2
In this window only the Retry count (the highest number of repetitions in case of errors) can be defined.
In this tab and in all subsequent tabs of the JDL Editor, the button View opens a separate window showing the whole JDL file to be generated.
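The generated JDL file is a list of "Attribute = value;" lines. The following Python sketch renders a small attribute set, including RetryCount, in JDL syntax, to make the mapping between the editor tabs and that file concrete. The attribute names (Executable, StdOutput, StdError, InputSandbox, OutputSandbox, RetryCount, Requirements) are standard JDL attributes, but the values are invented. The file the portal actually generates also contains system entries, such as the wrapper script and proxy files, which are omitted here.

```python
from typing import Any, Dict


class Expr(str):
    """A JDL value emitted verbatim (e.g. a Requirements expression)."""


def jdl_value(value: Any) -> str:
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    return '"' + str(value).replace('"', '\\"') + '"'


def render_jdl(attributes: Dict[str, Any]) -> str:
    return "\n".join(f"{name} = {jdl_value(v)};" for name, v in attributes.items())


jdl = render_jdl({
    "Executable": "count.sh",
    "StdOutput": "stdout.txt",
    "StdError": "stderr.txt",
    "InputSandbox": ["count.sh", "input.dat"],
    "OutputSandbox": ["stdout.txt", "stderr.txt"],
    "RetryCount": 3,
    "Requirements": Expr('other.GlueCEInfoLRMSType == "PBS"'),
})
print(jdl)
```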
2.3 Checking the Sandbox
Figure 10.3
The local files of the ports and the executable of the job are copied into the proper Sandboxes. Please observe how the Internal File Name on the left-hand side of Figure 10.3, and the Executable of the Job Properties window (Figure 10.1), are mapped to the right-hand side of Figure 10.3.
Several system files (an envelope shell script, info.tar.gz, x509up...) are needed to copy any remote input files to the executing machine and to start the executable of the job.
Please remember that brokering, and listing the remote input files in the Input Data tab of the JDL (see Figure 10.5), do not by themselves ensure that the executable program on the worker node of the CE can access the remote input files. Therefore the automatic copy mechanism of the P-GRADE Portal infrastructure is used (see Remote input file handling).
2.4 Setting Ranks&Requirements
Figure 10.4
The Rank and Requirements fields can be filled in according to the rules of the JDL. From the point of view of the portal server they are free text. The broker checks the syntax, and any errors are returned at run time on the standard error output channel.
2.5 Checking Input Data
Figure 10.5
2.6 Setting optional Storage Element in Output Data
Figure 10.6
If the job has a proper remote output reference, the system delivers the output automatically to the proper destination.
The user can define a destination Storage Element in the Output SE text field. If this field is not defined, a default "near" Storage Element is used.
2.7 Setting the Environment Variables possibly needed on the Worker Nodes of the Computing Element
Figure 10.7
The next window shows a typical setting to reach the LFC catalogue on the worker node:
Figure 10.8
2.8 Example of "misuse": directing a job to a dedicated site
Figure 10.9
2.9 Important notice on MPI submission
Because of a well known problem of the LCG information system, MPI submission currently needs the following user-entered extension of the requirements in the Rank&Requirement tab of the JDL:
(other.GlueCEInfoLRMSType == "PBS") || (other.GlueCEInfoLRMSType == "LSF")
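If the Requirements field already contains a user condition (for example a host restriction as in Figure 10.9), the extension above has to be combined with it. In JDL expressions && binds more tightly than ||, so the disjunction must be parenthesized as a whole. The following Python sketch does this string composition. The function name and the example CE identifier are hypothetical.

```python
MPI_EXTENSION = '(other.GlueCEInfoLRMSType == "PBS") || (other.GlueCEInfoLRMSType == "LSF")'


def with_mpi_extension(user_requirements: str = "") -> str:
    """AND the MPI workaround onto an optional existing Requirements expression."""
    extension = f"({MPI_EXTENSION})"  # keep the || group together
    user_requirements = user_requirements.strip()
    if not user_requirements:
        return extension
    return f"({user_requirements}) && {extension}"


host_rule = 'other.GlueCEUniqueID == "ce.example.org:2119/jobmanager-lcgpbs-voce"'
print("Requirements = " + with_mpi_extension(host_rule) + ";")
```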
XI Rescuing the workflow
The execution of a workflow may fail for many reasons. In general, however, this means that some part of the workflow has already completed, and only the remaining part has to be executed to complete the workflow. In such cases the user saves time and CPU time by examining what might have gone wrong, making modifications (such as reallocating the failed job to a proper resource), and then resubmitting only the non-finished jobs of the workflow. This mechanism is called rescuing and is supported in the P-GRADE Portal from Release 2.2. Currently, before rescuing a workflow, the user can modify the resources of a job in the Workflow Editor or adjust the certificate belonging to the resource in the Certificates tab of the portal.
The general assumption is that the code of the workflow is tested. It is also assumed that the genuine input files, and especially any remote input files, do not change between the time the error is detected and the time the failed jobs are restarted. In short, rescuing may help to overcome difficulties arising from broken resources and invalid certificates.
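The rule is that only non-finished jobs are resubmitted, while each job still waits for its predecessors. This rule can be expressed as a small graph computation. The following Python sketch is a hypothetical illustration and is not the portal's workflow engine. Given the state of each job and the jobs each one depends on, it returns the jobs to be rerun in an order that respects the dependencies. The state name "FINISHED", the job "Sum" and the dependency graph are assumptions.

```python
from typing import Dict, List


def jobs_to_rescue(states: Dict[str, str], deps: Dict[str, List[str]]) -> List[str]:
    """Order the non-finished jobs so that every job follows its predecessors."""
    done = {job for job, state in states.items() if state == "FINISHED"}
    remaining = {job for job in states if job not in done}
    order: List[str] = []
    while remaining:
        # jobs whose predecessors are all finished (or already scheduled)
        ready = sorted(j for j in remaining if all(d in done for d in deps.get(j, [])))
        if not ready:
            raise ValueError("dependency cycle or unknown predecessor")
        for job in ready:
            order.append(job)
            done.add(job)
            remaining.discard(job)
    return order


states = {"Count1": "FINISHED", "Count2": "FINISHED", "Count3": "ERROR", "Sum": "INIT"}
deps = {"Count3": ["Count1"], "Sum": ["Count2", "Count3"]}
print(jobs_to_rescue(states, deps))
```

Finished jobs such as Count1 and Count2 never appear in the result, which mirrors the behaviour described in the step-by-step guide.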
Please read the following step-by-step guide to get familiar with the Rescue function as a portal user.
- Workflow status: rescue
Figure 11.1
The submitted job "Count3" of the workflow "demo-RESCUE" has failed for some reason, and the workflow status has changed to rescue. This means that the user may modify the workflow and then attempt to let it run further by pressing the button Rescue.
Please note that the execution of the workflow stops only when there are no more independent jobs to be executed.
- Read the log for possible reasons
Figure 11.2
The user reads the error log belonging to the failed job and identifies an authentication problem at the given resource. He decides to launch the Workflow Editor, in which he can reallocate the job to a working resource, as shown in the following step.
- Modify the workflow: reallocating the failed job
Figure 11.3
The user reaches the workflow (by the button Attach, Figure 11.1), which is now in Rescue mode (the stopped job is painted blue). He opens the job properties window of the problematic Count3 job:
Figure 11.4
Then the user changes the resource in the job properties window to a properly working one.
Figure 11.5
Finally, the user saves the modification with the menu item Save resources of the Workflow menu, which stores the modification on the server side.
- Rescuing the workflow
Figure 11.6
In this state the "continue" button Rescue appears in the Workflow Manager window. When the user clicks Rescue, the previously failed job "Count3" starts running on the new resource.
The already finished jobs Count1 and Count2 will not be resubmitted!
Figure 11.7
- Workflow finished
Figure 11.8
By modifying the resource, the user could rescue the workflow. The workflow then completed successfully by executing only the non-finished jobs and preserving the results of the finished jobs from the first attempt.
XII. Welcome Menu
Since Release 2.2 of the P-GRADE Portal, a new Welcome portlet greets the logged-in user.
In this menu the user can customize the portal and can alter his/her role, personal data and, above all, the original password received from the system administrator.
Figure 12.1 Welcome menu
XIII Workflow archive service
An existing workflow can be saved from the Workflow Manager list of the Portal Server and stored in the local file system of the user's Desktop Machine. Later it can be uploaded from there in the reverse direction. See Figure 2 (arrows Workflow/Storage/Download Workflow/Upload) for an overview and Figure 13.1 for the actual usage:
Figure 13.1
1 Saving the definition of a workflow and clearing the temporary parts:
Clicking on the operation Storage (Figure 13.1) opens the storage list showing the workflows that can be saved:
Figure 13.2
Three parts of a workflow can be handled independently:
- Under the column Workflow the definition part of a workflow is accessible. Download selects a workflow and opens the Download Manager of the browser. With it the user can define a destination in the local file system and download the definition of the selected workflow as a compressed file. The saved workflow can be retrieved later from the local file system (see paragraph 2. Uploading the definition of a workflow to modify / resubmit or uploading the content of a trace file for visualization).
Please note that the workflow is saved in its current state, i.e. together with any temporary files. If you do not need these, please apply set init: set init is an auxiliary operation which discards the temporary files generated during previous workflow submissions.
Both cases (with and without set init) have their own merits:
Saving the workflow in its current state facilitates the subsequent investigation of a spoiled run by an expert (for example, to distinguish between user, portal and Grid related errors in complicated cases).
Saving the workflow after bringing it to the init state minimizes the information needed to save the definition of the workflow. This option is suggested if the user wants to migrate the workflow to a different user or to a different portal, or wants to save it in order to resubmit or edit it in the future.
- The operations under the column Trace are optional and depend on the existence of the trace file. Trace files may be of substantial size, so they can be Downloaded or Deleted separately.
- Under the column Output there is no Download option, because this functionality is available under the Workflow / Workflow Manager tab. Here only the output of a workflow can be Deleted from the Portal server machine.
Please note that the button Delete in the fourth column ALL is visible only if the workflow is inactive, i.e. the workflow is not in the Workflow / Workflow Manager list.
2. Uploading the definition of a workflow to modify / resubmit or uploading the content of a trace file for visualization:
Clicking on the operation Upload (Figure 13.1) opens a set of file browsers. With these the user defines the paths of the saved files in the desktop environment to be uploaded to the Portal Server:
Figure 13.3
The input field Workflow archive must refer to one of the compressed files previously stored by the Storage / Workflow Download operation (see paragraph 1 Saving the definition of a workflow).
Demo Workflows are prefabricated example/test applications to be uploaded. See more details in the next section.
Important notice:
The result of a successful Upload from a Workflow archive will not be visible immediately in the Workflow / Workflow Manager list. However, it appears both in the Storage list and in the Open list of the Workflow Editor.
Therefore, following a successful Upload, the user should
- enter the Workflow Editor in the tab Workflow / Workflow Manager (Figure 13),
- open the workflow list in the Workflow Editor and select the requested workflow,
- save it on the server (Figure 32),
- (and hit the Refresh button on the Workflow / Workflow Manager tab).
(See the arrows EDITOR/Open and EDITOR/Save|Upload of Figure 2 for an overview.)
3. Uploading of the demo applications
The Demo Workflows section of Figure 13.3 shows the available prefabricated demo applications. These generally test the P-GRADE Portal and the current environment (certificates, settings and the Grid). The names and numbers of the displayed test applications may differ from those shown in Figure 13.3, and the portal Administrator may reset them.
The user can select either one application (by the radio button, confirmed by the OK button) or all the available Demo Workflow applications (by the Upload all button).
The selected applications will appear in the Workflow Manager list only after the user has manually modified them in the Workflow Editor. However, it is not guaranteed that the application will be associated with the proper resources and can be submitted immediately.
Inexperienced Portal users are advised to follow these steps:
1. Select an application by the radio button and confirm the upload with OK.
2. Check the success of the Upload by reading the Message line.
3. Check the existence of a valid proxy certificate in the Certificate tab.
4. Check the existence of the required Grids/resources in the Setting tab.
5. Check the association of the Grids with the selected valid proxy certificate in the Certificate tab.
6. Select the tab Workflow / Workflow Manager.
7. Select the button Workflow Editor.
8. Use the menu item Open in the Workflow Editor (WE) window to access and download the demo application.
9. In the WE graph that appears, open each job of the application, select one of the resources defined/checked in step 4, and confirm the changes by OK.
10. Save the workflow by SaveAs..
11. Submit the workflow with the proper button of the tab Workflow / Workflow Manager.
3.1 The Equation Solver application
This application solves the n-dimensional (in our example 5-dimensional) system of linear equations
A * x = B
See details in [5].
Figure 13.3 contains four versions of the common workflow. They are prepared for two different virtual organisations, and for each VO there is a version with direct (static) and a version with dynamic (Broker associated) resource reservation.
The expected result x (an approximation of the vector [1,2,3,4,5]) can be read out most simply by hitting the Out button of the Logs column. This button is in the line of the job Multip_B in the detailed view of the submitted workflow within the Workflow Manager portlet.
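Readers who want to reproduce the expected result locally can use the following Python sketch. It solves a 5-dimensional system A * x = B by Gaussian elimination with partial pivoting. The matrix A is invented for illustration, because the demo's own data is not given here. B is constructed so that the exact solution is [1,2,3,4,5]. This sketch is not the algorithm of the demo workflow.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


n = 5
a = [[10.0 if i == j else 1.0 for j in range(n)] for i in range(n)]  # invented A
x_true = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [sum(a[i][j] * x_true[j] for j in range(n)) for i in range(n)]
print(solve(a, b))
```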
XIV Parameter Study - Mass Workflow Processing
1 Introduction
Users frequently exploit the services of a computational grid to solve problems where sets of inputs must be applied to a single algorithm.
The scenario is called a Parameter Study when
- the algorithm is independent of the input, i.e. the same code representing the algorithm can be applied to any member of the input set, and
- the outputs, equal in cardinality to the input set, are evaluated/elaborated in a later phase (possibly by a different algorithm).
The inputs of these exploring/searching tasks need not be the members of a single set representing one of the possible characteristics of a feature. They may come from several sets with different kinds of features as well. In this case all combinations of the actual characteristics of the different features must be studied.
Consider, for example, two independent features, set1 and set2, where the members of set1 are {c11, c12, c13} and the members of set2 are {c21, c22}. The combinations of the possible actual characteristics then compose a new set:
$$\mathrm{set1} \times \mathrm{set2} = \{\{c_{11},c_{21}\},\{c_{11},c_{22}\},\{c_{12},c_{21}\},\{c_{12},c_{22}\},\{c_{13},c_{21}\},\{c_{13},c_{22}\}\}$$
Its cardinality is the product of the cardinalities of the base sets (Cartesian product), in our case 3*2 = 6.
The members of this combination must be applied one by one to the algorithm, which in our case yields 6 independent runs, each with two parametrized input values.
We will use the term PS Set (or Parameter Study Set) for each of the independent feature sets.
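The enumeration of all combinations is what Python's itertools.product computes. The sketch below enumerates the runs of the example together with the member indices of each PS Set. The function name ps_runs is hypothetical. The order of the runs, with the last set varying fastest, is the order of the listing above and not necessarily the order in which the portal submits the runs.

```python
from itertools import product
from typing import List, Sequence, Tuple


def ps_runs(*ps_sets: Sequence[str]) -> List[Tuple[Tuple[int, ...], Tuple[str, ...]]]:
    """Return (member indices, member values) for every combination of the PS Sets."""
    runs = []
    for combo in product(*(list(enumerate(s)) for s in ps_sets)):
        indices = tuple(i for i, _ in combo)
        values = tuple(v for _, v in combo)
        runs.append((indices, values))
    return runs


set1 = ["c11", "c12", "c13"]
set2 = ["c21", "c22"]
runs = ps_runs(set1, set2)
for indices, values in runs:
    print(indices, values)
print(len(runs))  # |set1| * |set2|
```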
2. Basic principles
Workflows created with the help of the P-GRADE Portal are ideally suited to represent the mentioned algorithm, because the load of the executions can be distributed in the Grid. In the simplest approach, a tested P-GRADE workflow is regarded as a black box, and the members of the combined inputs are "pumped" into it. To do this efficiently, the user should submit the jobs belonging to the parametric workflows with the assistance of the Broker whenever possible.
Workflows defined together with their PS Set(s) are called Parameter Study Workflows or PS Workflows.
The subsequent Chapter 3 describes the properties of the basic parameter study and introduces the idea of the PS Input Port.
Chapter 4 deals with the advanced parameter study. It introduces two specialized job types, Generator and Collector, and two new port types associated with the new jobs: PS Output Port and Collector PS Input Port.
3. Basic Parameter Study: Implementation
3.1 Preparation of parameters and results
The black box principle is used, so only the interface must be defined slightly differently compared to a "normal" workflow: the definition of the parameterized input files and the placement of the output of an executed workflow.
The cardinality and size of the inputs (and of the outputs) may be considerable. Therefore the implementation decision was that these files must be stored remotely, for example in Storage Elements if an EGEE-like VO is involved.
Hence, the obligatory convention is that each independent feature set (PS Set) must be represented as a subdirectory (PS Subdirectory) within the scope of the selected remote storage system. The files found in this subdirectory are regarded as the members of the respective PS Set.
A similar convention holds for the results. They are placed in a user defined subdirectory as independent compressed files. Each result file is identified by an automatically generated file name extension containing the indices of the current member(s) of the PS Set(s) which determined that run of the workflow.
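The exact format of the generated extension is not specified here. The following Python sketch therefore uses a purely hypothetical convention, "_<i1>_<i2>..." appended to a base name, only to show how the member indices of a run can be encoded into a result name and recovered from it. Do not rely on this format when looking for real result files.

```python
from typing import Tuple


def result_name(base: str, indices: Tuple[int, ...]) -> str:
    """Append one "_<index>" per PS Set (hypothetical naming convention)."""
    return base + "".join(f"_{i}" for i in indices)


def indices_of(name: str, base: str) -> Tuple[int, ...]:
    """Recover the PS Set member indices from a result name."""
    if not name.startswith(base):
        raise ValueError(f"{name!r} does not start with {base!r}")
    suffix = name[len(base):]
    return tuple(int(part) for part in suffix.split("_")[1:])


name = result_name("output.tgz", (2, 1))  # member 2 of set1, member 1 of set2
print(name)
print(indices_of(name, "output.tgz"))
```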
It follows that it is the responsibility of the user that:
- The input subdirectories (often represented by Grid File Catalogues) exist before the submission of the Parameter Study;
- The input member files exist before the submission of the Parameter Study, and their number does not change during the whole processing, so as not to spoil the indexing system that controls the process;
- The input member files of a PS Set are identical in structure, because they will be processed by common code;
- A PS_Subdirectory contains neither files serving other purposes nor directory entries.
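Part of the checklist above can be verified before submission. The following Python sketch assumes, hypothetically, that the contents of a PS_Subdirectory are available as a local directory (for example a local copy prepared before uploading); real PS_Subdirectories live on remote storage (a Grid File Catalogue or a GridFTP URL), which this sketch does not access. The function name check_ps_subdirectory is illustrative only.

```python
from pathlib import Path


def check_ps_subdirectory(path: str) -> list[str]:
    """Return the sorted member file names of a (local copy of a) PS_Subdirectory.

    Raises ValueError if the directory is missing, has no member files,
    or contains a nested directory, which the PS conventions forbid.
    """
    root = Path(path)
    if not root.is_dir():
        raise ValueError(f"PS_Subdirectory {path!r} does not exist")
    members = []
    for entry in root.iterdir():
        if entry.is_dir():
            raise ValueError(f"directory entry {entry.name!r} is not allowed")
        members.append(entry.name)
    if not members:
        raise ValueError(f"PS_Subdirectory {path!r} has no member files")
    return sorted(members)
```

Recording the length of the returned list before submission and comparing it later is one simple way to notice that the number of members has changed, which would spoil the indexing.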
To preserve resources, all parts of a successfully processed member workflow of a PS are cleared from the P-GRADE Portal server. Consequently, so that no information is lost, the result of a single workflow (containing only the Permanent Local Output files in the case of "simple" workflows) is extended with any log messages of the jobs (and of the execution engine) that were directed to the standard output or to the standard error channel.
3.1.1 Input connection
A PS_Set can be connected to any job by turning a "common" input Port into a so-called PS Input Port. The PS Input Port feature can be selected with the toggle "Switch to PS", accessible by right-clicking the icon of the input Port (see Figure 14.1). Note that the PS Input Port is indicated by a dark green color.
Figure 14.1 Selecting input Port as PS Port
The definition of a PS_Input Port differs from the definition of a common input port only in that, in the former case, the subdirectory of the remote files representing the PS_Set must be given in the field Directory instead of in the input field File of the latter (see Figure 14.2).
Figure 14.2 PS Input Port definition
It should be clear by now that, during the processing of the Parameter Study, each subsequent input file within the PS_Subdirectory will be copied, under the name given by Internal File Name, to the local working directory of the execution system, and referenced by that name in the Open statement of the executable of the respective job.
The actual syntax of the field Directory depends on the kind of the remote file. It can be
- one of the several File Catalogue formats if high level (EGEE-like) remote file handling is used (as Figure 14.2 shows),
- or a common URL if low level (Globus 2-like) remote file handling is used.
Please remember that, in connection with high level grid file catalogues, the proper environment variables must be set in the JDL description of the respective Job, as Figure 10.8 indicates.
3.1.2 Result connection
A new submenu item PS properties within the Workflow Editor (see Figure 14.3) opens the window where the placement of the result must be defined (see Figure 14.4).
Figure 14.3 PS result window selection
In this window the Grid (or Virtual Organization in the case of an EGEE-like grid) and, within it, the Output Directory containing the results must be defined. See Figure 14.4.
Figure 14.4 Result container definition
The rules governing the syntax of the Output Directory are the same as those for the PS Port Directory. If the user has not created the Output Directory given in the input field beforehand, the Portal tries to create it automatically.
If the directory refers to an LCG_2-like Grid catalogue, then the LCG Catalog Type and the URL of the LFC Host must be defined as well (see 2.2.2 File reference).
3.2 Submitting and observing a Parameter Study task
Based on the presence of defined PS_port(s), the system recognizes a saved and stored workflow as a PS_Workflow. In the Workflow Manager list these workflows are distinguished by the button PS_Details (see Figure 14.5).
Figure 14.5 PS_Workflow in Workflow Manager List
Hitting the button Submit starts the one-by-one execution of the members of the PS workflow: the system calculates the Cartesian (Descartes) product that determines the total number of submissions, associates the proper input files with the next member workflow item (the so-called element Workflow, or eWorkflow) and tries to submit it.
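The Cartesian product mentioned above can be illustrated with Python's itertools.product. In this sketch each PS Input Port (the port names A and B are hypothetical) maps to the member files of its PS_Set, and each combination becomes the input assignment of one eWorkflow. Sorting ports and members to obtain stable indices is an assumption of the sketch, not documented Portal behaviour.

```python
from itertools import product


def enumerate_eworkflows(ps_sets: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one {port: member file} mapping per eWorkflow.

    ps_sets maps each PS Input Port name to the member files of its PS_Set.
    """
    ports = sorted(ps_sets)                        # fixed port order
    members = [sorted(ps_sets[p]) for p in ports]  # fixed member order -> stable indices
    return [dict(zip(ports, combo)) for combo in product(*members)]


ps_sets = {"A": ["a.1", "a.2", "a.3"], "B": ["b.1", "b.2"]}
ewfs = enumerate_eworkflows(ps_sets)
print(len(ewfs))  # 6 = 3 * 2 submissions in total
print(ewfs[0])    # {'A': 'a.1', 'B': 'b.1'}
```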
To avoid overloading the Portal Server and the Grid infrastructure, there is an upper limit on the number of eWorkflows that can be "living" in parallel (eWorkflow buffer). If an eWorkflow terminates, or fails without hope of being resubmitted by the manual rescue operation, it will be cleared from the eWorkflow_buffer (and from the P-GRADE Portal server altogether), and the system automatically submits the next eWorkflow. This process continues until the whole Parameter Study task terminates or the eWorkflow_buffer is filled with eWorkflows that need user interaction.
Figure 14.6 The eWorkflow Buffer
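The refill behaviour of the eWorkflow buffer can be sketched as a small simulation. The simplifying assumption is that at every step the oldest living eWorkflow terminates; eWorkflows waiting for a rescue, which would stay in the buffer, are not modelled. The buffer size is a parameter because its actual value in the Portal is not given here, and the function name is hypothetical.

```python
from collections import deque


def buffer_snapshots(total: int, buffer_size: int) -> list[list[int]]:
    """Contents of the eWorkflow buffer after each refill (eWorkflows numbered from 1)."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    waiting = deque(range(1, total + 1))  # the Portal's virtual queue (Init)
    living = deque()                      # the eWorkflow buffer
    snapshots = []
    while waiting or living:
        # Refill: submit waiting eWorkflows until the buffer is full.
        while waiting and len(living) < buffer_size:
            living.append(waiting.popleft())
        snapshots.append(list(living))
        # Simplification: the oldest eWorkflow terminates and is cleared.
        living.popleft()
    return snapshots


print(buffer_snapshots(5, 2))  # [[1, 2], [2, 3], [3, 4], [4, 5], [5]]
```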
The new Statistics bar informs the user about the current state of the whole PS Task:
- Total is the size of the Cartesian product, i.e. the static number of independent eWorkflows to be executed within the framework of the PS Task.
- Init is the number of eWorkflows waiting for submission in the virtual queue of the P-GRADE Portal.
- Submitted is the number of eWorkflows currently being processed that do not expect user interaction (apart from a possible Abort).
- Error is the number of eWorkflows that failed without the possibility of rescue.
- Rescue is the number of eWorkflows that expect user interaction before the computation can continue.
- Finished is the number of eWorkflows that terminated properly.
It follows from the definitions above that these categories (not counting the first) are mutually exclusive states of an eWorkflow, and the equation
Total = Init + Submitted + Error + Rescue + Finished
holds.
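The invariant can be expressed as a small Python check. The class and field names are hypothetical and merely mirror the labels of the Statistics bar.

```python
from dataclasses import dataclass


@dataclass
class PSStatistics:
    """Counters shown in the Statistics bar of a PS Task."""
    total: int
    init: int = 0
    submitted: int = 0
    error: int = 0
    rescue: int = 0
    finished: int = 0

    def is_consistent(self) -> bool:
        # The five states are mutually exclusive, so they must add up to Total.
        return self.total == (self.init + self.submitted + self.error
                              + self.rescue + self.finished)


print(PSStatistics(total=6, init=2, submitted=2, finished=2).is_consistent())  # True
```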
Note:
A special case occurs if the eWorkflow terminates properly but the Grid infrastructure is unable to copy the compressed file representing the result to the destination remote storage. In this case the button "Error" appears in the column "Log". Hitting the button displays a text message containing the error report about the respective eWorkflow(s). These eWorkflow(s) will be counted as "Finished" in the Statistics, but they will not be cleared from the P-GRADE Portal as in the normal case. Consequently, after the termination of the whole Parameter Study task, the result(s) of any remaining eWorkflow(s) can be downloaded from the Portal server as a single compressed file, similarly to a common workflow having local output results. This situation is indicated by the traditional green triangle in the OUTPUT column of the Workflow Manager window.
The button Suspend has increased importance in the case of an eWorkflow: the probability that a job will be assigned to a bad or overloaded resource increases rapidly with the cardinality of the PS parameters (and with the complexity of the Workflow).
Hitting the button Details, following the black box principle, leads to the traditional detailed view of the eWorkflow being processed:
Figure 14.7 An eWorkflow in detailed view
There is a slight difference between Figure 14.7 and Figure 40a: some action buttons are missing here, because the state of a single eWorkflow cannot be graphically animated in the Workflow Editor, and deletion is not permitted here, to prevent the user from inadvertently killing the whole PS_Workflow task. The Abort instruction refers only to the eWorkflow.
3.3 Result Evaluation
3.3.1 Results of PS workflows
After the successful run of the whole PS workflow task, the results of the eWorkflows can be fetched from the defined subdirectory.
Example:
The results can be listed (using the definition of Figure 14.4 for the placement) by the following IUF machine command:
lfc-ls /grid/seegrid/hermann/EQU/OUTPUT
Ax_EQUAL_B_PS.1.zip
Ax_EQUAL_B_PS.2.zip
Ax_EQUAL_B_PS.3.zip
Ax_EQUAL_B_PS.4.zip
Ax_EQUAL_B_PS.5.zip
Ax_EQUAL_B_PS.6.zip
3.3.2 Results of remote output files defined in PS workflows
A word of explanation is needed if some of the output files of the original Workflow are remote files. No special measure has to be taken by the user in this case; however, the user must know that the system generates a unique output file for each eWorkflow. The prefix of the names of these files will be the same as the File defined in the respective port properties window, and the postfix of each name will be the name of the eWorkflow (Workflow name + instance number).
Example:
The job Multp_B of the workflow Ax_EQU_B_from_A_GEN_Collector (Figure 14.8) has one output port defining a remote file with the name
"lfn:/grid/seegrid/hermann/PS/EQU_AGEN_11_10/Multip_B_10/out".
After the execution of the whole Parameter Study task, where there are two PS parameters (as you can see from the generated items in Figure 14.13), the IUF machine terminal command
lfc-ls /grid/seegrid/hermann/PS/EQU_AGEN_11_10/Multip_B_10
will list the following files:
out.Ax_EQU_B_from_A_GEN_Collector.1
out.Ax_EQU_B_from_A_GEN_Collector.2
As you can see, out is the prefix, Ax_EQU_B_from_A_GEN_Collector is the name of the workflow, and 1 and 2 are the instance numbers denoting the respective runs.
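The naming rule can be summarized in a few lines of Python. The function name is hypothetical, and the dot separators and the numbering from 1 are read off the example listing rather than from a formal specification.

```python
def remote_output_names(prefix: str, workflow: str, instances: int) -> list[str]:
    """Per-eWorkflow names of one remote output file: <prefix>.<workflow>.<instance>."""
    # Instance numbers start at 1, as in the example listing.
    return [f"{prefix}.{workflow}.{i}" for i in range(1, instances + 1)]


print(remote_output_names("out", "Ax_EQU_B_from_A_GEN_Collector", 2))
# ['out.Ax_EQU_B_from_A_GEN_Collector.1', 'out.Ax_EQU_B_from_A_GEN_Collector.2']
```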
4. Advanced Parameter Study: Generators and Collectors
Figure 14.8 Generator (Job 5) and Collector (Job Collector)
4.1 Overview
As a Parameter Study (PS) executes the same operation over a (usually large) set of inputs and produces a (usually large) set of outputs, the obvious question arises: "How can the inputs of a PS be generated, and how can the outputs produced by a PS be evaluated?" These tasks can be done manually. However, for a wide class of problems the user can be supported in tackling them in an automated way. This is done by introducing two new types of jobs with the following features:
The Generator job generates a set of input files and puts them on a remote storage represented by a PS_Output_port. These files are used by the subsequent Parameter Study jobs of the workflow. Therefore, the Generator job must have a new kind of output port, the PS Output port, which is discussed in detail later.
The Collector job processes the set of outputs produced by the preceding Parameter Study jobs as a single unit. Therefore Collector jobs start to run only after every Parameter Study job of the whole PS (i.e. each eWorkflow) has terminated.
A Collector job can be connected to a Parameter Study job by a special connection. The source of such a connection is a remote output port of a Parameter Study job, where every job within a PS workflow counts as a Parameter Study job except the Generator(s) and Collector(s). The destination of the connection is a special PS Input port, called the Collector PS Input Port, linked to the Collector job. Both ports represent the same directory on a remote Grid storage.
4.2 Overall Semantics
The overall execution of a PS Graph is divided into three subsequent steps:
- If there is any job of Generator type, it will be executed first, before the Parameter Study Task, and its results will be stored as the respective PS Output Port describes. (A consequence is that Generator job(s) must be root element(s) of the directed acyclic workflow graph, i.e. no other job can be defined as a predecessor of a Generator job.)
- The whole Parameter Study Task (represented by the PS jobs whose kind is neither Generator nor Collector) will then be executed for each member of the Parameter Set.
- If there are Collector jobs, they will be executed once over the set(s) of output files represented by one or more remote directories. (A consequence is that Collector job(s) must be leaves of the directed acyclic workflow graph, i.e. no other job can be defined as a successor of a Collector job.)
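The structural consequences of these three steps can be checked mechanically. A minimal sketch, assuming the workflow graph is given as a list of (predecessor, successor) job-name pairs and each job carries a kind label; COLL matches the Collector type name used in this manual, while GEN and PS are hypothetical placeholders, as are the job names.

```python
def check_ps_graph(edges: list[tuple[str, str]], kinds: dict[str, str]) -> list[str]:
    """Return the violations of the Generator/Collector placement rules.

    edges: (predecessor, successor) job-name pairs of the workflow graph.
    kinds: maps each job name to "GEN", "COLL" or "PS" (hypothetical labels).
    """
    problems = []
    for src, dst in edges:
        if kinds[dst] == "GEN":
            problems.append(f"{src} -> {dst}: a Generator must be a root")
        if kinds[src] == "COLL":
            problems.append(f"{src} -> {dst}: a Collector must be a leaf")
    return problems


kinds = {"Gen": "GEN", "Job1": "PS", "Coll": "COLL"}
print(check_ps_graph([("Gen", "Job1"), ("Job1", "Coll")], kinds))  # []
print(check_ps_graph([("Coll", "Gen")], kinds))
# ['Coll -> Gen: a Generator must be a root', 'Coll -> Gen: a Collector must be a leaf']
```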
4.3 Generator Job detailed
Note:
In this chapter two different applications will be used for demonstration: the advanced version of Ax_EQUAL_B_PS (see Figure 14.2), called Ax_EQU_B_PS_A_GEN (see Figure 14.8), and a simple one, A_GEN_EXAMPLE.
There are two kinds of generators:
- The general type is characterized by free semantics, i.e. the binary executable of the component is provided by the user (as in the case of "traditional" jobs), and it is the responsibility of the user-provided executable to generate the set of outputs according to the conventions determined by the PS Output port.
- In the other type the output set is generated by generator code that is part of the Portal. These generator components can be controlled by the user (via key and parameter values). See 4.3.1 Auto Generator.
In both cases one and only one PS Output port defines the name, storage and delivery conventions for the generated files.
Figure 14.9 PS Output port
The PS Output port defines 3 properties:
- Directory: the subdirectory on the remote storage where the element files will be stored. The naming conventions are the same as for PS Input Ports, i.e. LFC Grid file catalog entries and Globus GridFTP URLs are allowed.
- Internal file name: the prefix part of the names of the files to be generated by the job executable. The postfix part of the file names must be different for each file and must start with a dot (".") separator character. Example: if the Internal File Name is "OUTPUT" and the number of parameters is 2, then the names of the files generated by a Generator component will be { OUTPUT.1, OUTPUT.2 }.
- Managed copy: this option is significant only if the executable of the generator is given by the user (i.e. the Generator job is not an Auto Generator job). If the option is turned on, the system assumes that the files will be generated by the generator binary executable on the worker node where the generator runs. In this case the system takes over the responsibility of copying the generated files to the remote destination defined by the PS output port Directory. If the "managed copy" option is turned off, the user's executable is responsible for generating the Grid Files and copying them into the destination Directory on a remote storage. In this case the naming convention for the "Internal file name" can be overruled, as the system is then not directly involved with the generated files and is therefore not sensitive to their names.
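When managed copy is on, the naming rule for the Internal file name can be checked before the files are copied. The sketch below is illustrative: the function name is hypothetical, and any validation the Portal itself performs is not described here.

```python
def check_generated_names(prefix: str, names: list[str]) -> None:
    """Raise ValueError unless every name is <prefix>.<postfix> with distinct postfixes."""
    seen = set()
    for name in names:
        # The postfix must follow the prefix, start with "." and not be empty.
        if not name.startswith(prefix + ".") or len(name) == len(prefix) + 1:
            raise ValueError(f"{name!r} does not match {prefix}.<postfix>")
        postfix = name[len(prefix):]  # includes the leading dot, e.g. ".1"
        if postfix in seen:
            raise ValueError(f"postfix {postfix!r} is used twice")
        seen.add(postfix)


check_generated_names("OUTPUT", ["OUTPUT.1", "OUTPUT.2"])  # passes silently
```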
Let it be emphasized that the PS_Output Ports of Generators must be connected to PS_Input Ports of PS jobs, and the "Directory" values of the connected ports must be the same. (One port inherits this value from the other when the two ports are connected.)
In the Workflow Editor an existing Job can be redefined as a Generator (or Collector) if and only if there is at least one defined PS Input Port in the Workflow:
Figure 14.10 How to define a Generator (or Collector)
4.3.1 Auto Generator
The Auto Generator (AG) is a special convenience job, designed so that a user can create and modify a whole set of parameter files using a built-in macro processor.
An Auto Generator job can be defined from an existing (general) Generator Job:
Figure 14.11 How to define an Auto Generator Job from a common Generator
The AG has the following features distinguishing it from other Job wrappers:
- It has no input port and just one predefined PS Output Port.
- The defined file set is created on the Portal Server machine and not on a remote resource.
- Its semantics is predefined by the macro processor and can be controlled by user parameters determining the content and number of the files to be created.
The Auto Generator is recommended first of all for legacy applications, which may require complicated input files with format restrictions, and/or input data structures whose internal coherence must be ensured.
4.3.1.1 Formal definition
The formal description of the generation process (assuming ASCII files) is the following. The base of the generation is a template called the Input file text. The Input file text is an arbitrary sequence of final strings and keys.
Final strings will be copied into the result files without any change. Keys must be associated with non-empty finite sets. The elements of these finite sets will be enumerated and substituted in turn in place of the keys within the Input_file_text, so that a new file is created for each substitution combination. Within a given output file, however, every occurrence of a certain key is replaced by the same value.
Example: let the Input_file_text be aXbYcX, where X={2,3} and Y={6,7} are the keys and a, b, c are the final strings. The four generated files will contain the following strings:
"a2b6c2", "a3b6c3", "a2b7c2", "a3b7c3".
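The substitution rule can be sketched in Python with itertools.product and a regular expression. In the Portal the keys are marked by the Left and Right Parametric key delimiters; the delimiters "<" and ">" used here are an assumption chosen for illustration, so the template of the example is written as a<X>b<Y>c<X>. The function name is hypothetical.

```python
import re
from itertools import product


def expand_template(template: str, keys: dict[str, list[str]],
                    left: str = "<", right: str = ">") -> list[str]:
    """Return one expanded text per combination of key values.

    keys maps each key name to its non-empty finite set of values (as strings).
    Every occurrence of a key within one output gets the same value.
    """
    pattern = re.compile(re.escape(left) + r"(\w+)" + re.escape(right))
    names = list(keys)
    outputs = []
    for combo in product(*(keys[n] for n in names)):
        binding = dict(zip(names, combo))
        # A key without a value set raises KeyError here.
        outputs.append(pattern.sub(lambda m, b=binding: b[m.group(1)], template))
    return outputs


print(expand_template("a<X>b<Y>c<X>", {"X": ["2", "3"], "Y": ["6", "7"]}))
# ['a2b6c2', 'a2b7c2', 'a3b6c3', 'a3b7c3']
```

The order of the outputs (first key varying slowest) is a property of itertools.product; the set of generated strings is the same four strings as in the example.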
4.3.1.2 Representation of the macro
The macro representation consists of two logical parts. The actual representation of the Input file text is an editable input text window, which can be found in the "Job properties" window of the Auto_Generator job:
Figure 14.12 Input file text definition (Application Ax_EQU_B_PS_A_GEN)
The template defined by the Input file text:
- can be edited, or
- can be uploaded from the local file system with the help of the file browser opened by the button Load from File...
The names of keys are separated from the final_strings by the editable Left and Right Parametric key delimiters.
Hitting the button Parse parses the content, and the keys found are listed under Keys:.
Double-clicking a member of the Keys list opens a new window where the elements of the set represented by the given key can be defined:
Figure 14.13 Definition of a finite set associated with a key (Application Ax_EQU_B_PS_A_GEN)
The set can be defined in the Value Definition pane by one of the following methods:
- Enumerating the members (selecting the radio button Set).
- Reading the members from a given local input file, where the button Browse... opens the File browser to assist the search (selecting the radio button Set from local File). In both cases (Set and Set from local File) there is no type restriction on the values, and the values are delimited by the Separator value.
- Range can be applied only to numbers, and the elements are produced with the semantics of a classical DO loop.
- Random uses a built-in random generator, for which the Seed value, the size of the set (Cases) and the lower and upper bounds of the generated numbers (From:, To:) should be defined.
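The four value-definition methods can be mimicked in Python as below. This is a sketch under assumptions: the default separator, integer-only ranges and floating-point random values are choices made for illustration, the function names are hypothetical, and the Portal's built-in random generator is not Python's, so the same Seed will not reproduce the Portal's values.

```python
import random


def parse_set(text: str, separator: str = ",") -> list[str]:
    """Set / Set from local File: split on the Separator, no type restriction."""
    return [v.strip() for v in text.split(separator) if v.strip()]


def do_range(start: int, stop: int, step: int) -> list[int]:
    """Range: the values of a classical DO loop, DO i = start, stop, step (integers)."""
    if step == 0:
        raise ValueError("step must not be zero")
    count = max(0, (stop - start + step) // step)  # DO loop trip count
    return [start + i * step for i in range(count)]


def random_set(seed: int, cases: int, low: float, high: float) -> list[float]:
    """Random: `cases` values between low and high; the same seed gives the same set."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.uniform(low, high) for _ in range(cases)]


print(parse_set("1; 2;3", ";"))  # ['1', '2', '3']
print(do_range(1, 10, 3))        # [1, 4, 7, 10]
print(do_range(10, 1, -3))       # [10, 7, 4, 1]
```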
The key values are generated upon hitting the button Generation, and the user can visually check the defined set in the table Generated items.
The generated elements are represented as ASCII characters and, as inputs of possible legacy applications, can be made to conform to a more restrictive format:
The common length of the strings representing the values must be defined if the toggle Free format is not set. In this case the toggle Left aligned determines whether the padding spaces fill the right or the left side of the string representing the given value.
For REAL numbers there are further format conversion possibilities, making them readable for programs expecting inputs in C, FORTRAN or Java format conventions.
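The padding rule can be sketched as follows. How the Portal handles a value longer than the common length is not stated here, so raising an error is an assumption of this sketch; the REAL-number conversions are not modelled, and the function name is hypothetical.

```python
from typing import Optional


def format_value(value: str, width: Optional[int] = None, left_aligned: bool = True) -> str:
    """Render a generated value as a fixed-length field.

    width=None corresponds to the Free format toggle being set.
    """
    if width is None:
        return value
    if len(value) > width:
        # Assumption: refuse rather than silently truncate.
        raise ValueError(f"{value!r} does not fit into {width} characters")
    # Left aligned: padding spaces go to the right; otherwise to the left.
    return value.ljust(width) if left_aligned else value.rjust(width)


print(repr(format_value("2.5", 6)))         # '2.5   '
print(repr(format_value("2.5", 6, False)))  # '   2.5'
```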
Example-Part 2:
Figure 14.14 Auto Generator (Application A_GEN_EXAMPLE)
As there is one PS_Output_port for the output generation, there will be just one set of files.
If the Grid is of the LCG brokerable type, then a Storage Element and the Environment variables must be associated with the Remote File defined in the PS Output port. This can be done with the button Attributes Editor (see Figure 14.12). The Attributes Editor opens a new window with two tabs, one for the definition of the Storage Element (see Figure 14.15) and one for the definition of the host of the Grid File Catalog (see Figure 14.16).
Please note that the definition of the Output SE is obligatory if an LCG_2-like Grid File Catalogue based directory is defined.
Figure 14.15 Storage definition for Auto Generator Outputs (Application Ax_EQU_B_PS_A_GEN)

Figure 14.16 Host definition for Grid File Catalogue (Application Ax_EQU_B_PS_A_GEN)
4.3.2 Common Generator
The properties of a common Generator are the same as those of a general job. A Common Generator may have any number of common input ports as well. The only restriction is that the job must have exactly one output port, and it must be a PS Output Port.
4.3.3 Result of the generation - by an example
The generation of the output files, as mentioned in Chapter 4.2, is started as the first step following the submission of the PS Workflow. See the pane Jobs in generator phase of Figure 14.17. This pane appears in the PS workflow details list only if at least one Generator job has been defined.
Figure 14.17 PS Workflow Details in submission state after successful generation (Application A_GEN_EXAMPLE)
Example-Part 3:
Let us suppose that the Internal File Name is "OUTPUT", as Figure 14.9 shows. Then the file generation of the previous example will look as the following table shows. The names of the generated files can be seen by hitting the button Out in the line of the job AgenEx (see Figure 14.17).
| File names in the catalogue /grid/seegrid/hermann/PS/EQU_AGEN_/ | Content |
| OUTPUT.0.0 | a2b6c2 |
| OUTPUT.0.1 | a2b7c2 |
| OUTPUT.1.0 | a3b6c3 |
| OUTPUT.1.1 | a3b7c3 |
Table: Generated example files
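The names in the table follow a regular pattern: the Internal File Name followed by one 0-based index per key, the first index belonging to X and the second to Y. The sketch below reproduces the names under the assumption that this pattern holds in general, with the keys taken in the order of the template; the function name is hypothetical.

```python
from itertools import product


def generated_file_names(prefix: str, key_sizes: list[int]) -> list[str]:
    """Names <prefix>.<i>.<j>... with one 0-based index per key, last index fastest."""
    return [prefix + "".join(f".{i}" for i in idx)
            for idx in product(*(range(n) for n in key_sizes))]


print(generated_file_names("OUTPUT", [2, 2]))
# ['OUTPUT.0.0', 'OUTPUT.0.1', 'OUTPUT.1.0', 'OUTPUT.1.1']
```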
4.4 Collector Job detailed
Figure 14.18 A job of collector (COLL) type
See Figure 14.8 for the details of the job named Collector, which has the type name COLL. This type can be selected by right-clicking a job icon in a Workflow Editor window, as Figure 14.10 shows.
The semantics of a Collector_job is determined by the binary Job Executable provided by the user for this component, similarly to the Common Generator. It is the task of the user-defined executable (see the example MatrixDemoWithCollector.exe in Figure 14.18) to enumerate, open, read and evaluate each input file defined by the Collector PS Input port(s).
Important notice:
If the user connects the input port of a Collector job to an output port of a PS_job in the Workflow_Editor, the input port automatically changes into a Collector_PS_Input_Port. Its color is a dimmed light green, differing from the green of the common input ports.
Remember that the connected output port of the PS job must refer to a remote file!
As section 3.3.2 Results of remote output files explains, in this case the output port to which the Collector PS Input Port is connected implicitly defines a grid file subdirectory where the results of the PS are gathered. See the details of a Collector PS Input port in the next figure:
Figure 14.19 Collector PS Input port
If the toggle "managed copy" is set, the P-GRADE Portal will automatically copy the remote files into the working directory of the machine executing the Collector job. The generated names of these local files are structured as follows: the common prefix of the names is defined by the Internal File Name, and the postfix of each name is inherited from the postfix part of the name of the corresponding remote file (workflow name + instance number, see 3.3.2 Results of remote output files).
Example
Using the settings of Figure 14.19, and remembering 3.3.2 Results of remote output files, the user can expect the following input files to be available for opening in his/her local working directory:
INPUT1.Ax_EQU_B_from_A_GEN_Collector.1
INPUT1.Ax_EQU_B_from_A_GEN_Collector.2
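With managed copy set, the mapping from remote to local names amounts to swapping the prefix. A sketch, assuming the remote prefix (out in the example of section 3.3.2) and the Internal File Name (INPUT1) are known; the function name is hypothetical.

```python
def collector_local_name(remote_name: str, remote_prefix: str, internal_name: str) -> str:
    """Replace the remote prefix by the Internal File Name and keep the postfix."""
    if not remote_name.startswith(remote_prefix + "."):
        raise ValueError(f"{remote_name!r} does not start with {remote_prefix}.")
    return internal_name + remote_name[len(remote_prefix):]


for n in ("out.Ax_EQU_B_from_A_GEN_Collector.1", "out.Ax_EQU_B_from_A_GEN_Collector.2"):
    print(collector_local_name(n, "out", "INPUT1"))
# INPUT1.Ax_EQU_B_from_A_GEN_Collector.1
# INPUT1.Ax_EQU_B_from_A_GEN_Collector.2
```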
If the toggle managed copy is not set, it is the responsibility of the Job Executable of the Collector job to read the grid files.
4.5 Short case study
Let us suppose that the user has defined a complex PS Workflow consisting of several Generators (in our case a user-defined one and an Auto Generator) and several Collectors, as Figure 14.20 shows:
Figure 14.20 Complex PS Workflow with more Generators and Collectors
After the termination of the Generators, the 3 main parts of the Workflow Manager can be observed in the snapshot in Figure 14.21, which renders the Detailed view of the PS_Workflow:
- Jobs in generator phase lists the Generator jobs with their states.
- eWorkflow list shows the submit pool with the generated element Workflows that are currently running; they can be manipulated as long as they have not left the pool (either by successful termination or by user Abort). This list is headed by the Statistics discussed earlier in Chapter 3.2.
- The members of Jobs in collector are inactive at the moment, as the number of eWFs in the states Finished + Error has not yet reached the value of Total.
Figure 14.21 Intermediate State: the eWorkflows are running
Figure 14.22 shows the state when all of the Collectors have terminated:
Figure 14.22 Terminated PS Workflow
Note that the eWorkflow list is empty, as all the eWorkflows have been processed.
5. PS Persistency
During a long PS-WF experiment (theoretically it may last weeks or months), the portal may have to be stopped and restarted by the Portal administrator. The PS has been designed so that even in this case the most important user results are not lost: each output of the generated eWorkflows lands on a Storage Element, so the terminated eWF-s are preserved, and the user can resume the execution of the PS via the Submit button. Only the results of the eWF-s that were running at the moment of the shutdown are lost. This limited damage is repaired as well, because these eWF-s are resubmitted upon the resume operation.
In short, in case of a Portal breakdown the P-GRADE Portal guarantees eWF-level checkpointing instead of the more fine-grained job-level one.
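The restart behaviour can be summarized as a state mapping. The sketch assumes that the running eWorkflows are those in the Submitted state and that they go back to Init to be resubmitted on resume; how the Portal treats the Error and Rescue states across a restart is not described here, so the sketch simply keeps them. The function name and the eWorkflow identifiers are hypothetical.

```python
def states_after_restart(states: dict[str, str]) -> dict[str, str]:
    """eWorkflow states after a Portal restart under eWF-level checkpointing.

    Finished results are already on a Storage Element and are kept; an
    eWorkflow that was running is lost and returns to the queue, to be
    resubmitted when the user hits Submit again.
    """
    return {ewf: ("Init" if state == "Submitted" else state)
            for ewf, state in states.items()}


print(states_after_restart({"wf.1": "Finished", "wf.2": "Submitted", "wf.3": "Init"}))
# {'wf.1': 'Finished', 'wf.2': 'Init', 'wf.3': 'Init'}
```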