P-GRADE Portal

Version 2.5, February 15, 2007

An introduction without tears
 
 

Contents

0 Preface
i. Release history
ii. Introduction

I The aim

II The Players of the PORTAL infrastructure and their identifications
   1 The Players
      1.1 The user's desktop machine
      1.2 The Portal server
      1.3 The set of remote resources (the GRID)
      1.4 The Certificate Server (MyProxy)
   2 The identifications
      2.1 User against the Portal Server
      2.2 User against own userkey file
      2.3 User against the Certificate Server
      2.4 User against the Virtual Organization

III Overview of the operation of the PORTAL
   0 Preparation
   1 Uploading a personal certificate
   2 Receiving a short term - proxy - certificate
   3 Settings: Defining the resources
   4 Defining a workflow
      4.1 Short introduction in the Workflow Editor
         4.1.1 Workflow creation
            4.1.1.1 Interactive building process
            4.1.1.2 Import process
         4.1.2 Workflow saving
         4.1.3 Workflow modification
      4.2 Workflow deletion
   5 Starting a workflow
   6 Observing the progress of a workflow
      6.1 Progress info from the Workflow Manager
         6.1.1 Detailed view
      6.2 Progress info from the Workflow Editor
      6.3 Progress info by Monitoring and Visualization
   7 Fetching the result
      7.1 Fetching partial results
   8 Run time user actions
      8.1 Suspend the run of the Workflow

IV The detailed operation of the PORTAL by an example
   1 Login
   2 Certificates: Setting access rights to resources
   3 Settings: Defining the resources
      3.1 Direct use of resources in the EGEE
   4 Workflow Editor: Building your workflow
   5 Submitting the workflow
   6 Observing the progress of the workflow
   7 Fetching the result

 
V Monitoring and Visualization
   1 Introduction
      1.1 Availability of monitoring
   2 Life cycle of monitoring data
      2.1 The source of data
      2.2 The transport
      2.3 The elaboration
      2.4 The frame of destination: The visualization interface
   3 The Prove program
      3.1 User activities
         3.1.1 Truncate trace files
         3.1.2 Visualization activities
            3.1.2.1 Filtering
            3.1.2.2 Change state statistics
            3.1.2.3 Sorting the program parts
            3.1.2.4 Zooming in the time scale

VI Multi-GRID support
   1 Setting up VOs of Grids and default resources (by portal administrator)
   2 Setting up the resource list for a VO (any regular user)
   3 Allocating the workflow (any regular user)
   4 Supplying certificate for each VO before execution (any regular user)

VII Information System
   1 MDS-2 information system
      1.1 View of available resources in the Grid
      1.2 View of detailed information about a resource
   2 LCG-2 information system
      2.1 View of available sites in the Grid
         2.1.1 Selecting a Virtual Organization
      2.2 View of detailed information about a site of a Grid

VIII Handling of remote files
   1 General aspects of remote files
   2 Different kinds of remote files
      2.1 Low level usage (Globus)
         2.1.1 Protocol
         2.1.2 File reference
         2.1.3 File Storage
         2.1.4 Example
      2.2 High level usage (only within the EGEE)
         2.2.1 Protocol
         2.2.2 File reference
            2.2.2.1 LFC file catalogue
            2.2.2.2 RMC file catalogue
         2.2.3 File Storage (Storage Element)
         2.2.4 Example
                              
                  

IX User Quotas

X Connection to the EGEE Grids and the usage of the Broker
   1 General rules to submit individual jobs by the broker of EGEE
   2 JDL Editor details
      2.1 Opening the JDL Editor
      2.2 Setting retry count
      2.3 Checking the Sandbox
      2.4 Setting Ranks & Requirements
      2.5 Checking Input Data
      2.6 Setting optional Storage Element in Output Data
      2.7 Setting the Environment Variables
      2.8 Example of "misuse": Direct a job to a dedicated resource
      2.9 Important notice to MPI submission


XI Rescuing the workflow

XII Welcome Menu

XIII Workflow archive service
   1 Saving the definition of a workflow and clearing the temporary parts
   2 Uploading the definition of a workflow to modify / resubmit, or uploading the content of a trace file for visualization
   3 Uploading of the demo applications
      3.1 The Equation Solver application


XIV Parameter Study - Mass Workflow Generation
   1 Introduction
   2 Basic principles
   3 Basic Parameter Study: Implementation
      3.1 Preparation of parameters and results
         3.1.1 Input connection
         3.1.2 Result connection
      3.2 Submitting and observing a Parameter Study task
      3.3 Result Evaluation
         3.3.1 Results of PS workflows
         3.3.2 Results of remote output files
   4 Advanced Parameter Study: Generators and Collectors
      4.1 Overview
      4.2 Overall Semantics
      4.3 Generator Job detailed
         4.3.1 Auto Generator
            4.3.1.1 Formal definition
            4.3.1.2 Representation of the macro
         4.3.2 Common Generator
         4.3.3 Result of the generation
      4.4 Collector Job detailed
      4.5 Short case study
   5 PS Persistency

XV References

0 Preface

i. Release history


Release notes to Version 2.5

New features


  1. Version 2.5 brings the long-expected major advance in the evolution of the P-GRADE Portal: it exploits the Grid properly by automating the mass execution of workflows in the framework of the Parameter Study. This is a handy method for any scientist who wants to apply the "define once, use everywhere" principle to speed up research into how results depend on a large domain of predefined input parameters. Several new concepts have been introduced, such as the PS Input port, the PS Output port and the e-Workflow, together with the new convenience jobs, the Generator and the Collector.
  2. Workflow submission has become more comfortable and flexible thanks to two new general features:
    • The user can suspend (and resume) the execution of a workflow. This is often needed when, for example, the selected Grid resource seems too slow to perform the job and a rescue would be advisable.
    • The user no longer needs to wait until the last job of the workflow terminates to get partial results.
      The results of terminated jobs can be downloaded immediately.

Known bugs


  1. In some cases the statistics of the PS-Workflow (see the section Statistics of Figure 14.22) may report wrong numbers of e-Workflows in the Finished and Init states: instead of being counted as Finished, some workflows may be erroneously added to the number of e-Workflows in the Init state. However, this error has no influence on the execution or on the expected result of the PS-Workflow.
  2. We have experienced randomly occurring conflicts between the P-GRADE Portal Server and Internet Explorer 7. In the observed cases the P-GRADE Portal displays an exception, but no data loss occurs. The exception can be dismissed by clicking on some other button in the user interface. We recommend using Internet Explorer 6 or Firefox.


Release notes to Version 2.4.1

  1. Possibility to store the data of the end users in reliable databases: it has turned out that the default Hibernate storage (HSQL) supported by GridSphere is error prone, and in some cases the logging data of the users was lost after a restart of the Portal. Therefore, from this release on, the Portal administrator can define and set up an external database for storing the log information.
  2. The data transfer load of the information system has been substantially reduced. The BDII server is queried for data on user request.
  3. A new job submission strategy has been introduced which observes the current load of the portal server and thereby ensures a tolerable response time for the user.
  4. The portal server's handling of its own data resources has been reconsidered. In connection with this, the redundant storage of workflow results has been removed and more accurate quota handling has been implemented.
  5. A bug occurring during the concurrent uploading and downloading of proxy certificates has been fixed. This failure typically occurred in guided practice sessions, when many users execute the same command within a short time.
  6. Automatic VOMS extension of certificates has been introduced.
  7. Jobs can be submitted to VOs via the gLite infrastructure as well.

 



Release notes to Version 2.4:

New features and improvement of services:

Revision of remote file handling: user option for non-automatic copy to the worker node. (See managed copy)

Revision of rescue handling: the new functionality covers all types of resources, including submissions to a Broker.

Enhanced verbosity, localization and accuracy in forwarding errors that occur in the grid infrastructure.

Protection of the Portal server by the introduction of a changeable limit on the number of jobs submitted and observed at one time.

Revision of MPI job handling: a totally new middleware ensures (and under defined circumstances guarantees) the success of submissions of MPI jobs.

Bug fixes:


Total revision of low level script layer

Solving the memory leak problem of the visualization

Known bugs:

B.1
The LDAP server sometimes delivers to the information system hosts that reference a common cluster under different hostnames within a given site. As the information system has no additional knowledge with which to unify these clusters, the aggregated data gathered from the component CEs sometimes shows a multiple of the real values.
B.2
In the overview window of the Information System, the sites of the selected VO also display jobs that do not belong to the selected VO.
B.3
If several Workflow Editor windows exist on the user's desktop, the "old" windows tend to become zombies (insensitive to user commands and losing connection to the server).

Release notes to Version 2.3:

New features:

Extended, per-user quota handling

Full archive facility for generated workflows (See chapter XIII Workflow archive service)


Release notes to Version 2.2:

 New features:

Separation of external and internal file name references in the input/output ports of the jobs (See No more restriction on file references)

Connecting the Portal to the EGEE Grid and, in this case, exploiting the Broker service of the EGEE Grid for the jobs of a workflow directed to this grid.
(See chapter X Connection to the EGEE Grids)


Fault tolerant behaviour of workflows (See chapter XI Rescuing the workflow).

Welcome menu to change the default settings of personal user data (See chapter XII Welcome Menu)

Release notes to Version 2.1:

New features:

This documentation includes the new features of Version 2.1, highlighted in chapters VI Multi-GRID support, VII Information System, VIII Handling of remote files and IX User Quotas.

Deleted features:

The Copy and Paste operations of the Workflow Editor, considered unimportant and error prone, have been removed.

Bug fixes:

Edited workflows in transient (incomplete) state can be stored in the PORTAL and retrieved for further editing.




  ii. Introduction

The mission of the P-GRADE Portal is to give user-friendly access to Grid resources, a technology in rapid evolution.
This evolution is mirrored in the Portal, which offers general low level solutions for simple Globus Grids and high level solutions for modern, sophisticated Grids like the EGEE.
Throughout this paper you will find descriptions of general low level solutions and special considerations that apply only to the EGEE Grid.
As the P-GRADE Portal is a Multi-Grid portal, able to connect Grids of different kinds, substantial effort has been made to keep the functionalities of the Portal as orthogonal as possible.
However, at some points the particular aspects, conditions and possibilities of the EGEE grid had to be mentioned within the general text.
 

I  The aim

 
The  P-GRADE Portal  offers a comfortable method of handling workflows from any connection point of the World Wide Web. 

The P-GRADE Portal covers several Cluster and GRID related technologies (GLOBUS2, GLOBUS3, Condor, CondorG, CondorDAGMAN, PVM, MPI) to meet the needs of users who intend to access remote computational resources, and it hides the difficulties of activating them.

 
If you are impatient with details, or if you are a hardened GLOBUS professional with bad nerves, you can get a head start with Chapter IV, where the usage of the Portal is explained by a comprehensive example.
 
 
 
A Workflow is a bundle of jobs that you want to edit, launch and observe on remote computing resources, access to which has been granted to you by so-called certificates.
Technically a Workflow is a directed acyclic graph (DAG) in which each node has a computing resource and a program (job) to be launched on that resource; the edges of the graph are the "information pipelines" (streams) which connect the input and output points (ports) of the individual jobs. (See Figure 1)

Jobs are executable (sequential or parallel) applications represented by their binary code.
 


A node is a wrapper of a job, containing references to its executable code, to its I/O connections and to its resource.
(See Figure 16 for an outer and Figure 18 for an inner view of a node)

The input connection points of the nodes that are not connected to the output point of any other node represent the input files of the whole Workflow (we use the term "port" interchangeably with "point" for input and output connection points). The output points of the nodes that do not serve as inputs to any other node represent the output files of the Workflow.
(Note that any internal pipeline (stream) can be marked as either volatile or permanent; in the latter case the data flowing through it will be regarded and recorded as an output file of the Workflow, see Figure 24)
 
The task of the Workflow is to generate OUTPUT files from the INPUT ones.
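
The rules above for finding the input and output files of a Workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the port labels and the job names A and B are hypothetical and are not taken from the Portal's own code, and the sketch does not check that the graph is acyclic.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Port:
    job: str    # name of the node (job) that owns the port
    name: str   # hypothetical port label
    kind: str   # "in" or "out"


@dataclass
class Workflow:
    ports: set = field(default_factory=set)
    # maps (output port, input port) -> True if the pipeline is permanent
    pipes: dict = field(default_factory=dict)

    def add_port(self, job, name, kind):
        port = Port(job, name, kind)
        self.ports.add(port)
        return port

    def connect(self, src, dst, permanent=False):
        if src.kind != "out" or dst.kind != "in":
            raise ValueError("a pipeline runs from an output port to an input port")
        self.pipes[(src, dst)] = permanent

    def input_files(self):
        # input ports that no pipeline feeds are inputs of the whole Workflow
        fed = {dst for (_, dst) in self.pipes}
        return sorted(p for p in self.ports if p.kind == "in" and p not in fed)

    def output_files(self):
        # unconnected output ports, plus sources of permanent pipelines
        consumed = {src for (src, _) in self.pipes}
        permanent = {src for (src, _), keep in self.pipes.items() if keep}
        return sorted(p for p in self.ports
                      if p.kind == "out" and (p not in consumed or p in permanent))


wf = Workflow()
a_in = wf.add_port("A", "0", "in")
a_out = wf.add_port("A", "1", "out")
b_in = wf.add_port("B", "0", "in")
b_out = wf.add_port("B", "1", "out")
wf.connect(a_out, b_in)   # volatile pipeline from job A to job B

print([f"{p.job}:{p.name}" for p in wf.input_files()])
print([f"{p.job}:{p.name}" for p in wf.output_files()])
```

With the volatile pipeline shown, only port 0 of job A counts as a Workflow input and only port 1 of job B as a Workflow output; marking the pipeline as permanent would add port 1 of job A to the outputs.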
 
There are several subtle points to emphasize:
 
In Figure 1 you see the input and output connection points of a node as little green and gray squares. Green indicates input ports, gray indicates output ports.
A port maps the external references (input file, output file, pipeline) to the internal I/O representation of the job.
(Port usage is detailed, for both input and output ports, in Chapter IV The detailed operation of the PORTAL by an example)
At present there is a limitation: no more than 16 ports can be associated with a node.

 

Figure 1
 

II The Players of the PORTAL infrastructure and their identifications

 

1. The Players

 
Now let us summarize the main actors participating in the handling of the workflows (see Figure 2).

1.1 The user's desktop machine

You need an Internet-connected desktop machine with a browser which is able to access the WWW.
Please note that the user works with two different user interfaces in parallel when using the P-GRADE Portal:
 

1.2  The Portal server

There is a remote Portal server which you can access with a browser.
This server is used to store your code, program data (first of all local input files), the graphs of the workflows, the list of the defined resources and the living short term (proxy) certificates. From here you can download your workflows for editing, launch them, and download their results.
The amount of data stored on the Portal server on behalf of a single user is restricted by the user quotas.
 

1.3  The set of remote resources ( the GRID)

The most important part of the infrastructure is the set of remote computational resources (generally computer clusters) where the jobs actually run.
The resources are grouped under Grids. See more details in paragraph Settings: Defining the resources

Complex Grids may subdivide the set of users and the resources accessible to them into virtual organizations (VO). However, this mapping may overlap:
a user and a resource may belong to more than one virtual organization of the Grid.
In these grids the access right represented by a user certificate may be associated with one (or more) virtual organization(s) and not with the whole Grid.

 The EGEE Grid requires that the user be registered  at one VO. 
There is a general rule that a user must belong to  just one  VO.

The registration procedure and policy are VO dependent and are not covered in this paper.

Resources are abstractions; each is associated with a site, which performs the task of the given resource.

In the EGEE Grid a site may serve resources belonging to different virtual organizations. These resources are not only computational resources (see Computing Element) but storage resources (see Storage Element) as well. The resources of a site may be shared by different virtual organizations. However, a user can access a resource only with a valid VO membership.

The default resources are set statically by the system administrator. These settings may be inherited by ordinary users, who can extend or change them at will.
Therefore these settings may not correspond to the actual state of the Grid. The Information System portlet is used to obtain current data about the Grid.
For the time being the P-GRADE Portal has only a restricted facility for automatically setting resources found by the Information System. (See the button Load resources from MDS2 in Figure 12b)

1.4 The Certificate Server (MyProxy)

Finally we mention an administrative player, the Certificate server, which is a repository of "certificates".
A certificate is a virtual identity card granting access to a set of resources. Certificates must be signed by a trusted Certificate Authority (CA).
 
To understand the importance of this last player, here is a short note:
 
These players (the user, the Portal server, the resources) are connected through an unreliable channel, the Internet. Therefore they have to build secure connections to identify themselves and to be sufficiently protected from unauthorized access. These rather complicated tasks are carried out with the help of certificates, which work like identity cards that grant access to an expensive resource only for a limited amount of time.

Your previously obtained personal certificate contains your personal data (distinguished name, your public key, the expiration date of the public key, the name of the CA) as unencoded open data. To identify you, it must have been issued and "signed" by a trusted Certificate Authority.

The distinguished name contains the family and given name, the organisation unit and the organisation of the user, each introduced by a standard prefix (CN=, OU=, O=).
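
As an illustration of the prefix notation, the following Python sketch splits a distinguished name into its fields. The function name and the sample name are hypothetical, and the parsing is deliberately naive: it assumes that neither commas nor slashes occur inside a value.

```python
def parse_dn(dn):
    """Split a distinguished name into (prefix, value) pairs.

    Accepts both the comma separated form ("O=..., OU=..., CN=...")
    and the slash separated form ("/O=.../OU=.../CN=...").
    """
    pairs = []
    for part in dn.replace("/", ",").split(","):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        pairs.append((key.strip().upper(), value.strip()))
    return pairs


# made-up sample data, not a real certificate subject
example = "O=Example Institute, OU=Grid Lab, CN=Jane Doe"
fields = dict(parse_dn(example))
print(fields["CN"])
```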
The public key (PK) is the binary code with which a message previously encoded with the secret key (SK) can be decoded:
                               message = decode ( PK, encode( SK , message ) )
Each agent (the users and the Certificate Authority) publishes its own public key and hides its own secret key.
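
The relation between the two keys can be tried out with a toy example. The sketch below assumes textbook RSA with deliberately tiny numbers (modulus 3233 = 61 * 53); real certificates use far larger keys and additional padding, so this only demonstrates the identity above and is not how the Portal or MyProxy is implemented.

```python
# Toy key pair (textbook RSA, assumed for illustration only)
N = 3233    # modulus, 61 * 53
PK = 17     # public key exponent
SK = 2753   # secret key exponent, chosen so that PK * SK = 1 (mod 3120)


def encode(key, message):
    # modular exponentiation with the given key
    return pow(message, key, N)


def decode(key, cipher):
    # the same operation, applied with the other key of the pair
    return pow(cipher, key, N)


message = 65
assert decode(PK, encode(SK, message)) == message
```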

Technically the signature is additional text in your certificate file, produced from the open data in three steps:
  1. A checksum is generated from the open data by a well-known hash function.
  2. The result is encoded with your public key.
  3. The result is encoded with the secret key of the Certificate Authority.

The MyProxy Certificate Server stores the public key of the Certificate Authority, in the form of a special certificate. Therefore this server is able to decipher your public key, vouch for you and represent you to a third party, which in our case is a remote resource.
The representation happens by issuing a short term, so-called proxy certificate signed by the "MyProxy" Certificate server.
This representation is needed because the resources do not directly accept personal certificates.
 
This delegation method has four advantages over the direct use of personal certificates:
 
 

2. The identifications

 
To handle the agents of the P-GRADE Portal environment, there are four different kinds of identification of interest from the users' viewpoint:
 

2.1 User against the Portal Server

This first kind of identification is required when you want to access the Portal server via the Internet, i.e. it is your account on the Portal server. (See Figure 3)

For how to obtain a user account, see Chapter III 0 Preparation.

2.2 User against own "userkey" file

This second kind of identification is associated with the secret key file (userkey.pem) belonging to your personal certificate.
It is needed when you upload your long term personal certificate to the MyProxy server. (See Figure 7)

2.3 User against the Certificate Server 

The third kind of identification is associated with the certificate account of your personal certificate on the MyProxy certificate server.
You use this identification if

2.4 User against the Virtual Organization

Users of the EGEE grid must be members of a virtual organization (VO).
Registration with a VO, a one-time administrative step, requires a valid user certificate.
The registration procedure and policy are VO dependent and are not covered in this paper.

 

III. Overview of the operation of the PORTAL

 
A possible full operation cycle is the following scenario:
 

0. Preparation:

0.1 Users of general simple grids


Using the portal login name and password of the account, and a valid personal certificate (consisting of two files, see more details later at Figure 6 and Figure 8), the user enters the Portal server.
See more details in Chapter IV 1 Login
Notes:




0.2 Users of the EGEE grid

Beyond everything described in the previous point, EGEE users must be members of virtual organisations.
There is a general rule that a user must belong to just one VO.
Generally a user certificate is required for the VO membership registration. This certificate must be trusted by the Grid to which the VO belongs.


SPECIAL WARNING to the users of the virtual organisation Gilda, and to the users of other VOs requiring certificates with VOMS extension:
The EGEE Grid community is in transition from using the simple Grid certificate to using certificates that include VO specific extensions (VOMS).
This enables more reliable and secure access to more than one VO with one certificate.
However, the VOMS related extension of the MyProxy service has not been finished yet, and the API of the MyProxy service is error prone.
The interim consequence is that the Certificate/upload functionality (See Figure 2 and Section 1 Uploading a personal certificate) cannot be executed within the Portal for the time being.
The suggested workaround is to issue the following command on a UIF machine belonging to the given VO, where the valid certificate of the user is already installed:

myproxy-init --voms  <VO> -s <Host_of_MyProxy_Server> -p <Port_of_MyProxy_Server> -l <Proposed_user_account_name_on_MyProxyServer>

Example:

./myproxy-init --voms  gilda -s grid001.ct.infn.it -p 7512 -l myGildaCert

where "myproxy-init" must be the special updated version of the command (written by the Gilda team) that accepts the "--voms" parameter.

Please note that the command prompts:

After this one-time workaround the Certificate/download functionality (See Figure 2 and Section 2 Receiving a short term - proxy - certificate) can be executed in the traditional way within the Portal framework.
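
For users who prefer to script the one-time step, the myproxy-init command line shown above can be assembled as follows. This is a minimal sketch: the helper function is hypothetical, it uses only the options quoted in this section, and it merely builds and prints the command line without running it.

```python
import shlex


def myproxy_init_command(vo, host, port, account):
    """Build the argument list for the VOMS-aware myproxy-init call."""
    return ["myproxy-init", "--voms", vo,
            "-s", host, "-p", str(port), "-l", account]


# values taken from the Gilda example above
cmd = myproxy_init_command("gilda", "grid001.ct.infn.it", 7512, "myGildaCert")
print(shlex.join(cmd))
```

Keeping the arguments in a list rather than a single string avoids quoting problems if an account name ever contains unusual characters.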



Figure 2 shows the activities that the P-GRADE Portal allows you to perform:
 





Figure 2
 

1. Uploading a personal certificate

With "Certificate/upload" (Figure 2) the user sends a personal certificate to the Certificate server and establishes a certificate account.

This step is rarely needed, because the expiration time of personal certificates is fairly long.
The uploading process is a rather complicated transaction, started from Figure 5 and explained in detail in Chapter IV 2.1.
The upload creates a certificate account for the certificate, and the user must remember its name and password for the subsequent proxy generations.
(See also Chapter II 2.3)

2. Receiving a short term - proxy - certificate.

With the operation "Certificate/download" (Figure 2) the user accesses the Certificate server and reads the certificate account to load a short-lived proxy of the valid personal certificate into the Portal server. The user must do this whenever he/she intends to submit a job and the current proxy certificate has no time left. For security and economic reasons the lifetime of this type of certificate is generally limited to one week. Please note that the resources where you want to submit your jobs "see" and accept only the proxy certificate which you have downloaded from the "MyProxy" Certificate server and selected for the subsequent submission.

The downloading process is started from Figure 5 and detailed in Chapter IV 2.2.
The user must reference the certificate account of the uploaded personal certificate. (See also Chapter II 2.3)
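
The rule "download a new proxy when no time is left on the current one" can be expressed as a small check. The sketch below is an assumption-laden illustration: the function names are hypothetical, the download time and lifetime are supplied by the caller rather than read from a real proxy file, and the one-week lifetime is the typical limit mentioned above.

```python
from datetime import datetime, timedelta


def proxy_time_left(downloaded_at, lifetime, now):
    """Remaining validity of a proxy, never negative."""
    return max(downloaded_at + lifetime - now, timedelta(0))


def needs_new_proxy(downloaded_at, lifetime, now, margin=timedelta(0)):
    """True if the proxy has expired or expires within the safety margin."""
    return proxy_time_left(downloaded_at, lifetime, now) <= margin


downloaded = datetime(2007, 2, 15, 12, 0)
one_week = timedelta(days=7)

print(proxy_time_left(downloaded, one_week, downloaded + timedelta(days=2)))
print(needs_new_proxy(downloaded, one_week, downloaded + timedelta(days=8)))
```

A non-zero margin is useful when a workflow is expected to run for a long time, so that the proxy does not expire in the middle of the run.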


 

3. Setting: Defining the resources

By filling in a simple table on the Portal server the user can define the URL of the remote resources where the jobs may run and the way to access their basic services. (See also Figure 12a and Figure 12b) See the detailed steps at resource definition.
If the selected GRID has an information system, it may automatically explore the available sites and services. See: VII Information System
The user need not bother with defining (finding) resources and connecting them to the jobs in the special case where she or he has access to an EGEE-like Grid, because there the Broker service does this task. See more details in Chapter X Connection to the EGEE Grids and the usage of the Broker

Notice:
In connection with the direct use of resources of EGEE Grids please read Chapter IV: 3.1 Direct use of resources in the EGEE

4. Defining a workflow

The user can create new workflows, and load and archive existing ones.
Please note that the creation process is done with a SEPARATE program, in a different window (the Workflow Editor), which is downloaded from the Portal server and runs on your desktop. This has two consequences:

There is an important, recommended alternative way of defining workflows:
 
They can be imported from the P-GRADE development tool (P-GRADE). This way has some advantages over manually editing the Workflow in the P-GRADE Portal:

In the Workflow Editor program you use the menu item Workflow/Import workflow (See 4.1.1.2 Import process) to open the file browser "Import Workflow", in which you can search for the needed workflow files, distinguished by the name extension ".wrk".
To learn more about P-GRADE please consult P-GRADE
 

4.1 Short introduction in the Workflow Editor

You have already learned that the Workflow Editor is a separate graphical program which can be started from the Workflow/Workflow Manager portlet with the button Workflow Editor of the Workflow tab, and that it runs on the desktop of the user.
In short, the Workflow Editor can create, modify and save a workflow. You will find a rather long introduction to the use of the Workflow Editor in Chapter IV. Here is only a short summary of its most important menu items.
 
 4.1.1  Workflow creation 
 
Workflow creation is the process by which we define a new workflow on the P-GRADE Portal. The creation may be an interactive building process or an import process:

4.1.1.1 Interactive building process
A new workflow can be created within a newly opened window (as you see in Figure 14) or within an existing copy (see Figure 32) of the Workflow Editor program.
With the menu item Workflow/New you create a new, empty workflow.
By subsequently applying Workflow/New job and Workflow/New port you can build the parts of the workflow graph. (See Figure 15)
 
4.1.1.2 Import process
 
A whole workflow previously built and tested with the application P-GRADE can be imported from the desktop machine, with all of its dependent parts, by the menu item Workflow/Import Workflow. (See Figure 32) The selected menu item opens a file browser in which you select a workflow file with the extension wrk, which will be uploaded to the Portal server. This workflow will behave in just the same way as the workflows you have built manually. In most cases you only need to check the destination resources of the component jobs.
 
 4.1.2 Workflow saving
 
A newly created workflow has no name. It must be saved with the menu item Workflow/Save as. (See EDITOR/Save|Upload in Figure 2) This command has two effects: it uploads the workflow under its user defined name to the workflow repository of the Portal server, and it puts the workflow in the launch list of the Workflow Manager portlet. After any modification of the workflow (see 4.1.3 Workflow modification) the menu item Workflow/Save has the same effects. If the saving process finds that any of the files referenced in the description of the saved workflow has not yet been uploaded to the Portal server (or is not valid, see later), it prompts the user to allow the automatic upload process to start. Therefore the menu item Workflow/Upload seldom needs to be issued manually. (See Figure 32)
 
 4.1.3 Workflow modification
 
Any part of a saved (or recently created) workflow can be modified:
add or delete a job (see details at Figure 15),
add or delete a port (see details at Figure 19 and at Figure 23),
add or delete a connection between two ports (method described between Figure 28 and Figure 29),
change any attribute of the job (see details at Figure 17 and at the subsequent Learning Notes on Job Properties) and
of the ports (see details at Figure 21 and at the subsequent Learning Notes on Port Properties).
To make these changes the user needs to access the workflow (i.e. to download it from the Portal server to the desktop) with the menu item Workflow/Open. The user selects the needed Workflow from the Workflow Repository of the Portal server.
If the user changes any file reference during the modification process (even if he/she restores the previous text), a hidden mark records the event and the previous file reference is invalidated, with the consequence that after the subsequent Workflow/Save command the user will be prompted to allow the needed upload. In short, the system automatically maintains data consistency between the definition environment (desktop machine) and the Portal server, and the user is relieved of the duty to delete obsolete files from the Portal server. (See details at Figure 33 and the subsequent Learning Notes on Upload File)
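
The hidden marking described above behaves like a "dirty flag" on each file reference. The following Python sketch illustrates the idea only; the class, the function and the file names are hypothetical and do not reflect the Workflow Editor's actual implementation.

```python
class FileReference:
    """A file reference of a port, with a hidden validity mark."""

    def __init__(self, path):
        self._path = path
        self.uploaded = False   # True once the Portal server holds a valid copy

    @property
    def path(self):
        return self._path

    @path.setter
    def path(self, new_path):
        # Any edit invalidates the server copy, even if the old text is restored.
        self._path = new_path
        self.uploaded = False


def save(references):
    """Return the paths that must be uploaded and mark them as uploaded."""
    pending = [ref for ref in references if not ref.uploaded]
    for ref in pending:
        ref.uploaded = True
    return [ref.path for ref in pending]


refs = [FileReference("input_a.dat"), FileReference("input_b.dat")]
first_save = save(refs)            # both files are new, both are uploaded
second_save = save(refs)           # nothing changed, nothing to upload
refs[0].path = "input_a.dat"       # same text retyped, still invalidated
third_save = save(refs)
print(first_save, second_save, third_save)
```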
 
The actual modification steps are discussed in detail via examples in Chapter IV, in paragraph 4 Building your workflow
 

4.2 Workflow deletion

 
There is no way to delete a workflow directly with Workflow Editor commands.
The reason is that the Workflow Editor runs on the desktop, while the workflow is stored in the Workflow Repository of the remote Portal server under the control of

 
You will find more useful notes about Delete in paragraph 8 Run time user actions


 
 

5. Starting a workflow

Using the Workflow/Workflow Manager/Submit command (Figure 2) you can submit the prepared workflow to the GRID, i.e. you let it run. Of course the following conditions, which the system checks partly at creation time and partly at load time, must be fulfilled:
See more details in paragraph Run time user actions
 

6. Observing the progress of a workflow

If the submission was successful and the jobs begin running you can follow the progress in three different ways:
 

6.1 Progress info from the  Workflow Manager

First of all, the entries of the Workflow list of the Workflow Manager (Figure_38) inform you about the state of the whole workflow (column Status) and about any available results (column Output). The entries of the column View have a tree structure whose roots are the Details buttons (Figure_39). In the detailed mode a sub-list describes the state of each job of the selected workflow.
Size shows the amount of storage the workflow needs on the host of the server.
Quota shows this amount as a percentage of the quota permitted for the user.
The label of the column Quota also states the full size of the quota (1 MB in the case of Figure 38),
and the last line of the Workflow list summarizes the percentage and the size of the occupied storage.
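
The relation between Size and Quota is plain arithmetic. The sketch below assumes that the 1 MB quota of Figure 38 means 1024 * 1024 bytes and uses invented workflow names and sizes; the Portal's own rounding and units may differ.

```python
def quota_percent(size_bytes, quota_bytes):
    """Return the share of the quota that is occupied, in percent."""
    if quota_bytes <= 0:
        raise ValueError("quota must be positive")
    return 100.0 * size_bytes / quota_bytes


QUOTA = 1024 * 1024  # assumption: 1 MB counted as binary megabyte

# Hypothetical workflows and their sizes in bytes.
sizes = {"wf_a": 262144, "wf_b": 131072}

for name, size in sizes.items():
    print(name, size, quota_percent(size, QUOTA))

# Summary line, like the last line of the Workflow list.
total = sum(sizes.values())
print("total", total, quota_percent(total, QUOTA))
```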
 
 
6.1.1 Detailed view (Figure_39)
In this list each line corresponds to a component job. The line contains the following fields:
    - Workflow         name of the workflow, inherited from the root menu

    - Gridname         name of the Grid (or of the virtual organization) where the job runs.
                       Gridname is a new feature of Version 2.1. See more details in multi-GRID_support.
    - Job              name of the current component, as the user defined it in the "Name" text field of the job definition window <jobname>properties
                       (see Figure 17, Figure 18)

    - Hostname         host where the job runs
    - Status           Workflow status and job level status must be distinguished.
                       The possible job states, with their colors and in their natural sequence (when applicable), are:

                           init         (white)
                           submitted    (orange)     only in case of brokering (since Release 2.2)
                           wait         (blue)       only in case of brokering (since Release 2.2)
                           scheduled    (magenta)    only in case of brokering (since Release 2.2)
                           running      (red)
                           finished     (green)
                           error        (blue)

                       The possible workflow states are:

                           init         (white in overview window, green in detailed view)      the workflow is uploaded to the server
                           submitted    (orange in overview window, white in detailed view)     on user action, while no job is in running state
                           running      (red in overview window, white in detailed view)        when the first job enters running state
                           finished     (green in overview window, white in detailed view)      when the last job has terminated successfully
                           error        (blue in overview window, white in detailed view)       on error in one job, with no further jobs able to run
                           rescue       (blue in overview window, white in detailed view)       on error in one job, with no further jobs able to run (since Release 2.2; see rescue)
                           aborted      (red in overview window, white in detailed view)        on user action
    - Logs             buttons Out and/or Error to read any files
                       written by the system on "stdout" and "stderr" respectively
    - Output           a green button indicates that the application terminated successfully and the result can be downloaded
    - Visualization    optional buttons Visualize and All to call the graphic
                       monitoring for the whole workflow, for the given job, or
                       for each possible part
    - Action           this array of buttons is inherited from the root menu and
                       will be discussed in paragraph 8_Run_time_user_actions_
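
For readers who want to script against the detailed view, the job states and colors listed above can be kept in a small lookup table. This is a sketch only: the names JOB_STATES, job_color and state_sequence are hypothetical and not part of the Portal.

```python
# (state, color, only_with_broker) in the natural sequence given above.
JOB_STATES = [
    ("init", "white", False),
    ("submitted", "orange", True),
    ("wait", "blue", True),
    ("scheduled", "magenta", True),
    ("running", "red", False),
    ("finished", "green", False),
    ("error", "blue", False),
]


def job_color(state):
    """Return the color shown for a job state in the detailed view."""
    for name, color, _ in JOB_STATES:
        if name == state:
            return color
    raise KeyError(state)


def state_sequence(brokered):
    """List the states a job can show, with or without broker support."""
    return [name for name, _, broker_only in JOB_STATES
            if brokered or not broker_only]


print(state_sequence(brokered=False))
print(job_color("scheduled"))
```

Note that blue is shared by wait and error, so a script cannot infer the state from the color alone.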
 
The user can return to the root menu by hitting the Back button.
 
 

6.2 Progress info from the Workflow Editor

If the graph of the running workflow is selected and visible in the Workflow Editor, you can follow the progress through the changing colors of the nodes of the jobs being executed.
You can start the Workflow Editor in two different ways to see the progress:
- either by the Attach button of that workflow in the Workflow list of the
  Workflow Manager
- or by hitting the button Workflow Editor of the Workflow Manager and opening
  the workflow to be observed.
 In accordance with the convention discussed in 6.1.1_Detailed_view_ the following colors are
 used:
                                  orange              The job  waits for user submission
                                  white                The job  is submitted and waits to run         
                                  red                   The job  is running 
                                  green                The job  is  finished
                                  blue                  The job  has been aborted either by the system or by the user

Note that since Release 2.2 a job has an additional state when it runs under the control of a Broker of the EGEE Grid:

                                  magenta    The job is scheduled by the broker

Note that since Release 2.2 a new color indicates that the workflow has failed but can be restarted from a natural checkpoint formed by the jobs that finished successfully:

                                  saddle brown    The job is in "rescue" state

 


                                  
 
 

6.3 Progress info by Monitoring and Visualization

The third method is graphical monitoring. It is discussed in detail in Chapter V Monitoring_and_Visualization.
 

 

7. Fetching the result

The last step is fetching the result. The Portal_server puts the results in a zip file. This is a compressed directory hierarchy which follows the structure of the workflow graph. A subdirectory will be generated for each node, where the  associated permanent local  output_files are stored. The user can fetch the zip file by the standard download manager of the browser used. 
This step is marked in Figure_2 as "Workflow/Workflow Manager/output".

Please note that remote output files are not retrieved to the Portal server; they can be accessed only by methods beyond the control of the Portal. See Figure 8.1

Please note that the download manager is part of the browser, and it is the responsibility of the administrator of the user's site to set it up properly.
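
Once the zip file is on the desktop, its per-job structure can be inspected with a few lines of Python. The sketch below assumes the simplest layout consistent with the description above, namely one top-level directory per job; the real archive may contain further levels. The function name outputs_by_job is hypothetical, and the demo archive is built in memory to stand in for a downloaded file.

```python
import io
import zipfile
from collections import defaultdict


def outputs_by_job(zip_bytes):
    """Group the files of a result archive by their top-level directory."""
    grouped = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip pure directory entries
            job, _, rest = name.partition("/")
            grouped[job].append(rest)
    return dict(grouped)


# Stand-in for the downloaded file: one job with one permanent output.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Cascade1.2/OUTPUT", "48\n")
demo = buffer.getvalue()

print(outputs_by_job(demo))
```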


 7.1 Fetching partial results

The local permanent output files of the terminated jobs can be downloaded individually even before the whole Workflow has terminated.
Before Release 2.5 of the P-GRADE Portal only the whole set of local outputs could be downloaded.
Please note that partial download is not available for the eWorkflows of the PS Tasks.
The little green triangle buttons in the rows of the respective jobs, in the column Output (see Figure 40a), indicate the available outputs and accept the download requests.

8. Run time user actions

The current version of the Portal program maintains two different lists of workflows:
The reason for distinguishing between active and inactive workflows is the following:
Polling the state of active workflows is expensive because of the heavy network traffic. Therefore the user may get much slower responses if the number of active workflows (and their complexity) is high.
Both lists are updated as a consequence of the Save or Save as command of the Workflow Editor, but the Delete command may have different consequences:

The user can use the buttons of the Actions column (Figure_38) in the row belonging to each element of the Workflow/Workflow Manager list
 

 
The button Delete all deletes all workflows of the user from the Portal_server or from the list of active workflows.

8.1 Suspend the run of the Workflow (Smart abort)


A new feature of Release 2.5 of the P-GRADE Portal is that the run of a submitted workflow can be suspended, which puts the workflow into the rescuable state. So the Rescue state can be generated not only by the system but by the user as well. This feature is very useful if a job hangs on an unfortunately selected resource while the other jobs of the workflow have already produced results worth saving from recalculation; such a recalculation would be necessary in a resubmission following an abort, which until now was the only possible action. See the new Suspend button in Figure_40a.
 
 
  
 
 
 

IV The detailed operation of the PORTAL by an example

 
During the tour you will build, start and observe a small test workflow.
The general preparation comes first; the description of the workflow to build and submit follows Figure 14.
 

1 Login

 
 
The user can reach the PORTAL through the appropriate URL. For example:

 
 http://fn1.hpcc.sztaki.hu:9080/gridsphere/gridsphere

There you should find - depending on the browser - something like this:

 
Figure 3
 
At this point you log on to the Portal_server (see also User identification against_the_Portal_Server_).

After a successful login the user is automatically directed to the new Welcome menu, which offers the possibility to change personal data. (See Figure 12.1)
By selecting the tab Workflow the user can reach the basic services of the Workflow Manager:
 



Figure 4
 
From here you can launch the activities shown in Figure_2
Let us begin with the Certificate Manager by selecting the tab Certificates.
 
 

2. Certificates: Setting access rights to resources

 
 
Figure 5
 
With the help of the Certificate Manager, opened by the Certificates tab, the user can Upload a personal certificate to the MyProxy certificate server or Download a temporary proxy_certificate from it.
The very first action can be to fill the Certificate Server (the so-called MyProxy server) with the existing personal certificates of the user.
Please select Upload:

2.1 Upload detailed

In this process the user creates a certificate account.

Please remember that, for the time being, the Upload step must be skipped and done in a different way (outside the scope of the P-GRADE Portal) if your Virtual Organisation uses the WOMS extension to the certificates. See WOMS_Warning

 
 

Figure 6
 
 
 
The first screen of the upload process asks for your file named userkey.pem, which contains your secret_key (see Figure_6).
You can search for it in your local directory system using the Browse tool.
Fill in the input field and confirm it with OK. The next panel asks for the password of your secret key file, as Figure_7
shows (see also User identification against_own_userkey_file):
 
 
 
Figure 7
 
Upon OK the certificate file will be requested. This  certificate entitles you to use certain resources for a limited amount of time.
(See Figure_8)
 


Figure 8
 
 
Upon OK you will see the window depicted in Figure_9, where you must select an existing certificate_server (MyProxy) by hostname and port, and must define an account (login name and password) on it where your certificate will be stored.
(See also User identification_against_the_Certificate_Server)
This Certificate account stores just one certificate, so this "login" is actually the user name of the given certificate. The default host and port of the Certificate Server are given by the system. You see here an additional input field, the lifetime; it is an upper limit for the short term proxy certificates you may request by a subsequent download. You must hit Upload to perform the operation.
 
                                                                                                                              


                                                                                                                                       
  
Figure 9
 
  The system acknowledges the end of the successful upload process:



Figure 9a.
 
 
Next you may generate a short term proxy_certificate. To get it, use the Download button of the Certificate List menu, which is open in this state (see Figure 9a).
You will get the Download menu:

2.2 Download detailed:

 




Figure 10

 
The proxy certificate is generated from the personal certificate by filling in a form, and is downloaded to the Portal_server upon hitting the button Download.
The parameters of the form are the following:

The fields login name and password refer to the account of the previously uploaded certificate.
You can overwrite the default value of lifetime with the required expiration time of the short term proxy certificate. However, the actual lifetime of the downloaded proxy will not
exceed the value you defined during the Upload of your certificate.
If you find this limit too short, please repeat the upload process.
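
The lifetime rule amounts to taking the smaller of two values. A minimal sketch, assuming both lifetimes are expressed in hours (the unit and the function name are assumptions made for illustration):

```python
def effective_proxy_lifetime(requested_hours, upload_limit_hours):
    """The proxy never lives longer than the limit set during Upload."""
    return min(requested_hours, upload_limit_hours)


# A request above the upload limit is cut back to the limit.
print(effective_proxy_lifetime(100, 24))
# A request below the limit is granted as asked.
print(effective_proxy_lifetime(12, 24))
```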



Upon Download, in the new (2) release of the Portal the user gets a message (Figure 10a) indicating that the downloaded short term certificate can be associated with a GRID.




Figure 10a

A GRID is an administrative community of certain resources. With the same certificate the user can reach all resources of a given simple GRID or of a given virtual_organization of a complex grid.
Resources are subordinated to grids, i.e. each resource must belong to a given GRID (or to a virtual organization of a complex Grid).
In the new release of the Portal infrastructure the different jobs of the same workflow can submit programs to resources of different available grids.
Therefore the user must have methods to maintain different certificates for different Grids (or virtual_organizations).
See more details in VI_Multi-GRID_support.
This association is made by hitting the button Set for Grid.

The actual selection  can be performed on the subsequent frame (Figure 10b) .


 



Figure 10b.

Select one of the available names (which may refer to a simple grid, to a virtual_organization or to a virtual organization with broker support) from the check list
labelled Select GRID. Hitting the button OK (Figure 10b) then closes the frame and returns to the panel showing the downloaded short term (proxy) certificates:


 
Figure 11
 
As you can see by comparing it with Figure 5, in our case there is just one certificate on the list, and the time of usage is restricted by the lifetime value you selected.

          Important notice:
In the [Actions] column a fourth button, Use this, may appear at each unselected proxy if the list has more than one element belonging to the same GRID (or virtual organization).
With this button you can select the certificate you want to apply to your subsequent job submissions directed to the respective GRID (or virtual organization).
With the help of the button Set for Grid the GRID (or virtual organization) association can be changed at any time (see Figure 10b).
With the help of the button Details the user can go back to a frame similar to that of Figure 10b, but without the possibility to set a GRID (or virtual organization).

Having defined the certificates you need resource(s) where your jobs may run.
The main menu button Settings helps you to define them. Please note that these data are stored on your Portal_server and not on the Certificate Server.
 
 

3. Settings: Defining the resources

          
In the new version of the P-GRADE Portal infrastructure the resources are subordinated to virtual organizations, which are disjoint administrative communities of the grids.

          However, resources belonging to different virtual organizations (or even to different grids) may be used within the same workflow.
          See the management of the grids in Chapter VI_Multi-GRID_support.
Please note that the Name in the table Settings (see Figure 12a) means virtual_organization even in cases when the grid consists of just one virtual_organization.


Hitting the tab Settings displays the list of virtual organizations available to the user:

 

Figure 12a

To see the details of a line, hit its Resources button; it opens a new frame listing the resources that belong to the selected virtual_organization.



 


 
Figure 12b
 
            New elements can be added to the resource listing of Figure 12b in three different ways:
          The Contact_string defines the entry point of a resource (cluster) in the form of a URL of the leading Host of the cluster, followed by the symbolic name of the scheduler,
          the Job manager, which classifies the demands of the job against the cluster.
          More precisely, the contact string defines a program queue that generally belongs to a cluster of hosts, any of which may execute a job added to the queue.
          In EGEE-like grids a resource identified by a contact string is called a Computing Element (CE).
          The same cluster, identified by the leading Host, can serve several CEs.
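
The parts of a contact string can be separated mechanically. The sketch below assumes the host:port/jobmanager-name shape of the Computing Element names listed in section 3.1 (for example grid109.kfki.hu:2119/jobmanager-lcgcondor-long); contact strings without an explicit port are not handled. The names ContactString and parse_contact_string are hypothetical.

```python
from typing import NamedTuple


class ContactString(NamedTuple):
    host: str         # leading Host of the cluster
    port: int
    job_manager: str  # symbolic scheduler (queue) name


def parse_contact_string(text):
    """Split 'host:port/jobmanager' into its three parts."""
    endpoint, slash, job_manager = text.partition("/")
    if not slash or not job_manager:
        raise ValueError("missing job manager part: " + text)
    host, colon, port = endpoint.partition(":")
    if not colon or not host:
        raise ValueError("missing host or port: " + text)
    return ContactString(host, int(port), job_manager)


ce = parse_contact_string("grid109.kfki.hu:2119/jobmanager-lcgcondor-long")
print(ce.host, ce.port, ce.job_manager)
```

Two contact strings with the same host but different job manager parts would describe two CEs served by the same cluster.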
 
          You may remove a resource from your list at any time by Delete.

When adding a resource manually, please make sure that the GRAM and the GridFTP servers are listening on the same host. If you would like to add a resource where these two services are not described by the same Contact_string, please contact your portal administrator; otherwise you will experience submission problems with your newly defined resource.

           If the NAME of Figure 12a is of the kind virtual organization with broker support, for example "hungrid_LCG_2_BROKER", then the window
           opened by hitting the button "Resources" is not modifiable, i.e. it has no significance and is there just for historical reasons.

           The contact string is a general term, which has been slightly extended by the EGEE. Therefore the usage of EGEE resources needs special consideration:

3.1 Direct  use  of  resources in the EGEE.


The user can explore the available Computing Elements in two ways:
  For example, in the case of the virtual_organization "voce":

skurut4.cesnet.cz$ lcg-infosites --vo voce ce

****************************************************************
These are the related data for voce: (in terms of queues and CPUs)
****************************************************************

#CPU    Free    Total Jobs      Running Waiting ComputingElement
----------------------------------------------------------
   9       9       0              0        0    ce.grid.tuke.sk:2119/jobmanager-pbs-voce
 166     166       0              0        0    ce.polgrid.pl:2119/jobmanager-lcgpbs-voce
  94      14      95             80       15    grid109.kfki.hu:2119/jobmanager-lcgcondor-long
  36      32       0              0        0    ares02.cyf-kr.edu.pl:2119/jobmanager-lcgpbs-voce
  78      65       0              0        0    zeus02.cyf-kr.edu.pl:2119/jobmanager-lcgpbs-voce
  46      41       0              0        0    skurut17.cesnet.cz:2119/jobmanager-lcgpbs-voce
 176     176       0              0        0    ce.egee.man.poznan.pl:2119/jobmanager-lcgpbs-voce

Compared with traditional Globus resources, two new features can be observed:

4. Workflow Editor: Building your workflow

 
Now it is time to make our own workflows. We select the tab Workflow of the main menu:
  
 
 

Figure 13
 
Selecting the button Workflow Editor starts the Workflow_Editor, an independent Java program.
 
Note: Your browser must have the JRE 1.5.2 Java plug-in (or higher) in order to let this program start.
The first time the Workflow Editor is loaded during the portal session, some messages about the possible risk of using the program will be displayed.
The Workflow Editor, however, is harmless and should be allowed to run.

(The involved Webstart technology notifies the user that the downloaded program may access the local file system and prompts the user to trust or dismiss the source of the certificate )

 
If you accept, the following window will appear:
 
 
 
Figure 14
          
                
 Next you will build a simple workflow containing two  jobs  described as follows:
       Example definition:
 
For simplicity both jobs have an identical structure and use the same executable program (in real life the executables are usually different):
This executable is a simple sequential program written in C (in our example "Cell.c") that reads two integer numbers from two different text files, which the program opens as "INPUT1" and "INPUT2" respectively, and writes the product of the two numbers to an output text file opened as "OUTPUT".
 
You will build the connections of your first job "Cascade1" in such a way that it receives both of its input_files from the local file system, as <path1>/I1 and <path2>/I2 respectively.
 Its output "OUTPUT" will be generated somewhere in the GRID and will serve as the "INPUT1" of the second job, "Cascade1.2".
The local file <path2>/I2 will be used as the "INPUT2" of this job.
Finally the result of the whole workflow, the "OUTPUT" of "Cascade1.2", will be generated in the GRID and replicated to the Portal_server, where the user can download it.
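
The data flow of this example can be tried out locally before any Grid resource is involved. The Python sketch below is only a stand-in: the real executable is a C program, the directory layout is invented for the demonstration, and file copies replace the transfers that the Portal organizes. The function names cell and run_example are hypothetical.

```python
import os
import shutil
import tempfile


def cell(workdir):
    """Stand-in for the Cell executable: OUTPUT = INPUT1 * INPUT2."""
    with open(os.path.join(workdir, "INPUT1")) as f:
        first = int(f.read())
    with open(os.path.join(workdir, "INPUT2")) as f:
        second = int(f.read())
    with open(os.path.join(workdir, "OUTPUT"), "w") as f:
        f.write(f"{first * second}\n")


def run_example(i1, i2):
    """Run Cascade1 and Cascade1.2 in sequence and return the final result."""
    with tempfile.TemporaryDirectory() as root:
        # Stand-ins for the local files <path1>/I1 and <path2>/I2.
        path_i1 = os.path.join(root, "I1")
        path_i2 = os.path.join(root, "I2")
        for path, value in ((path_i1, i1), (path_i2, i2)):
            with open(path, "w") as f:
                f.write(f"{value}\n")

        job1 = os.path.join(root, "Cascade1")
        job2 = os.path.join(root, "Cascade1.2")
        os.mkdir(job1)
        os.mkdir(job2)

        # Cascade1: both input ports refer to genuine local files.
        shutil.copy(path_i1, os.path.join(job1, "INPUT1"))
        shutil.copy(path_i2, os.path.join(job1, "INPUT2"))
        cell(job1)

        # Cascade1.2: INPUT1 is fed by the OUTPUT port of Cascade1.
        shutil.copy(os.path.join(job1, "OUTPUT"), os.path.join(job2, "INPUT1"))
        shutil.copy(path_i2, os.path.join(job2, "INPUT2"))
        cell(job2)

        with open(os.path.join(job2, "OUTPUT")) as f:
            return int(f.read())


print(run_example(3, 4))  # (3 * 4) * 4 = 48
```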
 
 
As a preparation, you must have the executable code ("Cell.exe") and the input_files <path1>/I1 and <path2>/I2 stored in your local file system.
 
This knowledge is enough to build the workflow as follows.  Let us define the first job  first:
 

 
Figure 15
 
After you hit the marked icon New job, a new job appears:
 
 
 
Figure 16
 
Double clicking on the  job  will allow its properties to be edited.
(An alternative to double clicking is the RIGHT mouse click on all graphic elements triggering a popup menu of possible operations.)
 
  
 
 
Figure 17
         Learning Notes on Job Properties:
 
          In this menu the user defines the code of the job to run (Job Executable) and the call conditions of the job: where to let it run (Grid, Resource),
          with what kind of arguments, and with what requirements on the resources.
          Arguments can be
                        command line Attributes, processed internally by the code and possibly influencing the run of the job,
                        the Monitor flag, requesting graphical observation of the running job,
                        and some hints about the kind (JobType) and size (ProcessNumber) of the resources needed to perform the job.

          Details:
                    
Name is given by the system by default. The user can change it, but the name of a job must be different from all other job names.
JobType is the kind of the code referenced by Job Executable that is to be started on the resource.
It can be traditional sequential (SEQ) or parallel (MPI, PVM). In the case of MPI the resource must be
informed about the number of hosts needed by the program (Process Number).

Important notice: If the user wants to submit MPI jobs with Broker support, a special JDL requirement must be entered
(see Chapter X 2.9_Important_notice_to_MPI_submission).

 Job Executable defines the path of the executable code to be uploaded from the local file system.
After a successful upload and a subsequent download of the Workflow the input field shows
only the name of the executable instead of the whole path. Any change of this input field instructs the
system to upload a new executable, defined by an absolute path, from the local file system.
The search for such a file is supported by the File Browser.

Instrument is a message field set by the system  only in the case when the executable code contains
special message sending instructions for the real time monitoring  i.e.  the code is instrumented.

 Process Number has significance only if the Job Executable is of MPI type.
 It notifies the resource about the number of hosts needed.


Attributes may be filled in just like the command line parameters of traditional C programs.

 Grid defines a GRID  or a virtual organization   i.e. the high level administrative domain where the job must be submitted.
Changing the GRID changes the subordinated list of selectable Resources as well.

Monitor flag can be set by the user only in the case when the Instrument is set.
The setting enables job level monitoring .

Please note that setting (changing) the Monitor flag may have an unexpected effect on the selected Resource (and on the selectable resources):

If the previous state of Monitor was "not set" and the currently selected Resource cannot be monitored, then the new monitoring request hides all resources lacking the monitoring infrastructure, and the current value in the field Resource is replaced by the first monitorable element of the list of resources.
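
The effect of the Monitor flag on the Resource field can be summarized as a filter followed by a fallback. The sketch below is an illustration under stated assumptions, not the editor's code: resources are modelled as (name, monitorable) pairs in display order, and the host names are invented.

```python
def apply_monitor_flag(resources, selected, monitor):
    """Return (visible resource names, selected resource) after the change.

    resources: list of (name, monitorable) pairs in display order.
    """
    if not monitor:
        return [name for name, _ in resources], selected
    # Monitoring requested: hide resources without monitoring support.
    visible = [name for name, monitorable in resources if monitorable]
    if selected not in visible:
        # The old selection is hidden; fall back to the first monitorable one.
        selected = visible[0] if visible else None
    return visible, selected


# Hypothetical resources of one Grid.
resources = [
    ("ce-a.example.org", False),
    ("ce-b.example.org", True),
    ("ce-c.example.org", True),
]
print(apply_monitor_flag(resources, "ce-a.example.org", monitor=True))
```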

Resource defines the resource where, and under what kind of assumptions, the Job Executable will run within the defined Grid.
The selected resource may change implicitly as a consequence of changing the Grid or Monitor setting.


If the Grid is of the kind virtual organization with broker support, for example hungrid_LCG_2_BROKER, then the value of
  "Resource" has no significance and is there just for historical reasons.

Warning:
          If the virtual_organization defined in Grid is of the EGEE-like kind and the Job_manager of the Contact_string defined as Resource is an LCG proprietary one,
          please consider the warning in chapter 3.1_Direct_use_of_resources_in_the_EGEE and use a virtual organization with broker support as Grid, with a resource constraint in the
           JDL.
 
JDL: A new feature available from Release 2.2.
 It indicates the JDL Editor. This editor is applicable only if the virtual_organization selected in Grid is of the EGEE-compliant type, i.e. the Resource is determined by the system from matching characteristics (set by the user with the help of the JDL Editor) instead of by direct assignment. The usage and effect of the JDL Editor are discussed in Chapter X_Connection_to_the_EGEE_Grids.


There is an alternate method to set the resource dependent features of the job. It can be handled centrally starting from the
main menu  Workflow Properties.

    
After filling the needed fields the window may look like this:
 
 
 
Figure 18
 
 
Hitting Ok you will return to the main window:

 
 
 
Figure 19
 
Now you can define the I/O ports to this job. Hitting the port icon you may add a new port to the selected job .
An alternate method to define a new port is the menu item New port in the main menu Workflow (See Figure 32)

The selected state of the  job  is visible by the red frame around the job  : 
 
 
 
Figure 20
 
    After double clicking on the port icon (or selecting the Properties item of the popup menu triggered by the right mouse click, as Figure 23 shows)
    the port properties can be defined in a pop-up window:
 
 
 
Figure 21
 

With the help of the  port properties window the user defines the direction, kind, name and file association of the port.

Learning notes on Port Properties:

A port connects an input or an output file opened by the job with the environment.
From the point of view of the given port, this environment can be either an external file or a port of a different job.
The external file reference (defined in the field File) is mapped to the name (defined in the field Internal File Name) that the author of the executable used to open the given file.

Note that file names are no longer restricted:
In the versions preceding Release 2.2 of the P-GRADE Portal the Port property field Internal File Name (see Figure 21)
did not exist, and hence the user had to choose external file references
"ending" as the job executable expects them, i.e. the value corresponding to this field
was generated from the "/" separated "tail" of the File field, which used to have the form
                               [[protocol]<directory>]<FileName_applicable_as_InternalFileName_as_well>
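
The old rule, taking the "/" separated tail of the File field, corresponds to a one-line string operation. In the sketch below the function name is hypothetical and the sample references (including the gsiftp URL) are invented for illustration.

```python
def legacy_internal_name(file_field):
    """Pre-2.2 behaviour: the part after the last '/' is the internal name."""
    return file_field.rsplit("/", 1)[-1]


# Invented sample references: a local path, a remote URL and a bare name.
for reference in ("/home/user/data/INPUT1",
                  "gsiftp://host.example.org/data/INPUT2",
                  "OUTPUT"):
    print(reference, "->", legacy_internal_name(reference))
```

With the separate Internal File Name field, the external file may now carry any name, for example <path1>/I1 mapped to INPUT1.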

A port   can be either an Input (In)  or  an Output (Out) port .

Input:
If the Type of the port is Input AND the port will NOT be connected to any Output port of other jobs, then the port must refer to a genuine input file.
The genuine input file must be defined in the field File as a full path of the form [<protocol>]<path>, where
<protocol> can be defined only in the case of Remote files (see VIII_Handling_of_remote_files).

If the Input port will not be connected to a genuine input file, i.e. it will be connected to the output port of a different job, then the File field must not (and cannot) be filled.

In both (local and remote) cases the user-defined value <name> of the input field labelled Internal File Name must correspond to the "fopen (<name>,"r")" instruction within the code of the job.

When the flag managed copy is set, the system automatically delivers the input file to the working directory where the associated job will run.
This is the default case.
However, in some cases when the user wants to handle remote files and the location of the input file is a Storage Element, the file may be too big to be copied.
In such cases the user may decide (by clearing the flag) to take over the responsibility of reading the Grid file. Note that in this case the executable of the job must be prepared by the user to open and read the Grid files using the GFAL API of the EGEE.

Output:
If the Type of the port is Output AND File Type is NOT Remote, then the File field must not (and cannot) be filled.
Please note that there is NO symmetry between genuine input and output files: genuine local output files
(i.e. those referenced by Output ports not connected to any Input port of a different job) are stored on the PORTAL Server and
 will be downloaded to the local environment of the user by an interactive command after the run of the workflow has completed.
The other case is when the Output File is Remote. In this case the file referenced by the string in the field File will be stored according to the full path of the form [<protocol>]<path>
(see VIII_Handling_of_remote_files).

In both (local and remote) cases the user-defined value <name> of the input field labelled Internal File Name must correspond to the "fopen (<name>,"w")" instruction within the code of the job.
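
The rules for the File field of Input and Output ports can be collected in one consistency check. The sketch below is a hypothetical helper written for this guide, not part of the Workflow Editor; the argument names and the message texts are assumptions.

```python
def check_port(direction, connected, file_type, file_field):
    """Return the rule violations of one port (empty list if consistent).

    direction:  "In" or "Out"
    connected:  True if the port is linked to a port of another job
    file_type:  "Local" or "Remote"
    file_field: content of the File field ("" if left empty)
    """
    problems = []
    has_file = bool(file_field)
    if direction == "In":
        if connected and has_file:
            problems.append("connected input port must leave File empty")
        if not connected and not has_file:
            problems.append("unconnected input port needs a genuine input file")
    elif direction == "Out":
        if file_type == "Remote" and not has_file:
            problems.append("remote output port needs a File reference")
        if file_type != "Remote" and has_file:
            problems.append("local output port must leave File empty")
    else:
        raise ValueError("direction must be 'In' or 'Out'")
    return problems


# A genuine local input of the example workflow is consistent.
print(check_port("In", False, "Local", "<path1>/I1"))
# A local output port with a File reference violates the rules.
print(check_port("Out", False, "Local", "/tmp/result"))
```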



Details:

Port name:
                  Given automatically by the system. There is little reason for the user to change it.
         The field is used internally to generate subdirectories, hence it must contain only alphanumeric characters.
Type:        
                    Selector indicating either read or write access, matching the corresponding "fopen(<name>,"{r/w}")" instruction in the code of the job.
File Type: 
                    Has significance only for genuine files. The default setting is Local.
If the setting is Remote, the Input file will not be uploaded to the Portal Server during the Save/Upload phase
that terminates the definition of the Workflow in the Workflow Editor. Instead, just the reference to the remote file is stored and
the file transfers are organized by the run time system.
A file defined as Remote on an output port forces all connected input ports to be Remote with identical File names.
File:            
                     Reference to a genuine local or remote file [<protocol>]<path>
                     Please note that this field cannot be defined if the port is an input port connected to the output port of another job, or if the port is designed to be
                     a local output port.

The search for such a file in the case File Type = Local is supported by the File Browser.


Internal File Name:
                     
Internal reference to a file, used by the author of the corresponding job in an "fopen(...)" instruction.
File storage type:

This selector can be activated only for Output files. Its default setting is Permanent for genuine Output files and Volatile for the
"channel" files connected to other ports.
If a channel file is reset to Permanent, its data will not be discarded after each connected job has read its
content but is "added to the output of the workflow". When File Type is set to Local, this means that the file is preserved for downloading as if it were a genuine Output file.
Resetting to Volatile forces the File Type to change to Local, because a temporary file on a dedicated Remote storage device is undesirable.
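The interplay of File Type and File storage type on an output port can be summarized in a small sketch. All type and function names below are hypothetical and only model the rules stated above: the default storage type depends on whether the port is connected, and switching to Volatile forces the File Type back to Local.

```c
#include <stdbool.h>

typedef enum { FILE_LOCAL, FILE_REMOTE } FileType;
typedef enum { STORAGE_PERMANENT, STORAGE_VOLATILE } StorageType;

/* Hypothetical model of an output port as seen by the editor. */
typedef struct {
    bool        connected;  /* true: "channel" file read by another job */
    FileType    file_type;
    StorageType storage;
} OutputPort;

/* Defaults: Local; Permanent for genuine output, Volatile for channel files. */
OutputPort output_port_init(bool connected)
{
    OutputPort p;
    p.connected = connected;
    p.file_type = FILE_LOCAL;
    p.storage   = connected ? STORAGE_VOLATILE : STORAGE_PERMANENT;
    return p;
}

/* Changing the storage type; Volatile forces File Type = Local. */
void output_port_set_storage(OutputPort *p, StorageType s)
{
    p->storage = s;
    if (s == STORAGE_VOLATILE)
        p->file_type = FILE_LOCAL;
}

/* Permanent Local files are kept for download with the workflow result. */
bool output_port_is_downloadable(const OutputPort *p)
{
    return p->storage == STORAGE_PERMANENT && p->file_type == FILE_LOCAL;
}
```

In this model a channel file becomes part of the downloadable result only after output_port_set_storage(&port, STORAGE_PERMANENT) has been applied to it.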


  Please select a proper input file with the help of the File Browser and fill in the Internal File Name according to the convention required by the example program cell.c:

 

 
 
 
Figure 22
 
The Port name is set automatically; however, the user may redefine it.
Hitting OK returns you to the original editor to define other ports.

Repeating the same steps (basically to define the location of INPUT2 for port "1"; a similar window is shown in Figure 30),
you arrive at changing the properties of port "2".

Learning Notes on Port Editing:

In this case hitting the right mouse button (seen in Figure 23) offers three possibilities:
Properties - to define the Port properties
Delete       - to delete the port
Fix (Unfix) - toggle to glue (or release) the graphic position of the port along the sides of the square representing the job
                   (This operation has no significance for the semantics of the workflow)
 
 
 
 
 
Figure 23
 
Selecting Properties by left click you get the Port properties popup window where you may select "Out" as Type and enter "OUTPUT" as Internal File Name:
 
 


Figure 24
 
 
Please remember that in this (Local) Output case you must not define the File, even if this port is intended as the source of the genuine output of the whole workflow. The reason is that the Workflow, upon successful termination of the submitted tasks, will not return individual files. Instead it packs the Permanent Local output_files into a compressed file tree reflecting the structure of the workflow, which you can download by the standard method of your browser, as you will see in Figure 39.
Please note that you will be able to identify this file by its Internal File Name.
If the user wants to reduce the storage load of the files produced by the job, any unneeded files can be marked as Volatile instead of the default Permanent.
Hitting Ok completes the definition of our first job, and the icon New job can be selected.
 
 
 
 
 
Figure 25
Let us define the job properties as before, create the ports for the new job as in Figure 19, and select the first one (0).
Here the File will not be defined, because this port will be connected to the output port of the other job:


 
Figure 26
  Hitting Ok we receive the following Warning message (Figure 27):
 
 



 
 
 
Figure 27
  The simplest way is to answer No and proceed to make the port connection.
  After closing the Port property window the Editor looks like this:
                                                                                                                                            
 
 
 
Figure 28
 
Now we connect the output (port "2") of "Cascade1" with the input port "0" of the second job.
Pressing the middle mouse button on the output port, holding it down, dragging it to the proper input port and releasing it defines the desired connection.
                          Editing Notes:

No rubber line will be seen during dragging.

Clicking on the arrow connecting the ports changes its color from blue to red, and the connection becomes selected.
The selected connector can be deleted with the proper icons of the menu bar (cut or delete), or it can be given a graphical attribute
that influences the color of the connection in the simulation regime, when the Workflow Editor is used to display the runtime state
of the submitted workflow. A right click on the arrow opens a popup menu where the toggle item Switch to {ONLINE|OFFLINE}
can be selected. This minor coloring feature does not change the semantics of the workflow being defined.
 

 
 
 
Figure 29
Now we edit the second input port (Port name 1)  of the  new job:
 

 
Figure 30
 
And let us define the output port for the second Job:
 
 
 
Figure 31


Confirming the change with Ok completes the editing phase. We just need to save our product to the Portal_server:
In the main menu let us select the operation Save as:
 

 
Figure 32


In this state the Workflow Editor checks the correctness of the workflow.

Learning Note
In case of an error (mostly bad references to the local files to upload, or missing resources) a warning message appears about the errors found.
Even in this case the user may decide to save the workflow. However, the workflow is then marked as incomplete for the Workflow Manager and cannot be submitted; it is only stored for later modification.
This modification is initiated by the Open menu command (see Figure 32), which supplies a list of the user's workflows stored on the PORTAL Server.
When a workflow is selected, it is downloaded and the editing can be completed.

When a new workflow is saved (with Save as, or at the first use of Save), a popup dialog (Figure 32a) prompts the user for the name of the Workflow.
The name must consist of alphanumeric characters and must differ from the names of the workflows already stored on the PORTAL Server.
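The two naming rules can be expressed as a short check. This is only a sketch of the stated rules, not the Portal's actual validation code: the function names are hypothetical, and two details are assumptions, namely that an empty name is rejected and that names are compared case-sensitively.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `s` is non-empty and consists only of letters and digits.
 * Rejecting the empty string is an assumption of this sketch. */
bool is_alphanumeric(const char *s)
{
    if (s == NULL || *s == '\0')
        return false;
    for (; *s != '\0'; ++s)
        if (!isalnum((unsigned char)*s))
            return false;
    return true;
}

/* True if `name` is alphanumeric and differs from every stored name.
 * The comparison is case-sensitive, which is also an assumption. */
bool workflow_name_is_acceptable(const char *name,
                                 const char *const stored[], size_t n_stored)
{
    if (!is_alphanumeric(name))
        return false;
    for (size_t i = 0; i < n_stored; ++i)
        if (strcmp(name, stored[i]) == 0)
            return false;
    return true;
}
```

The same is_alphanumeric check applies to Port names, which are subject to the same character restriction because they are used to generate subdirectories.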



Figure 32a


 
Let's name the workflow "WF1".
In a subsequent step the system automatically proposes to issue the Upload command to transfer the referenced executable code(s) and the input_file(s) from the client's desktop to the Portal_server:

Learning notes on upload file :
The Upload proposal  happens in the following cases:
                               If the user refuses the suggestion, the workflow remains incomplete.
                               The Upload command can be issued manually at any later time. See the menu item Upload Files... in Figure 32

 
 
 
Figure 33
 
Select Yes; the system then starts the uploading process, which is indicated in a pop-up window Upload containing a progress bar.
Upon termination the message Finished becomes visible and the system waits for the user to press the Close button:
 
 
 
Figure 34
 
 
 
 By executing the editing steps above we have finished the creation of our new workflow WF1 and can leave the Workflow Editor to return to our Workflow Manager.
 Its page (see Figure 35) must be refreshed with Refresh to show our new workflow WF1, which is now ready to run.

 Before running it, let us check the associations of jobs and resources.
 This can be done either step by step, visiting the properties of each job, or centrally by Workflow Properties, a new menu command of Release 2 (see Figure 32).
 It opens the following table:



Figure 34a.

  If a change is needed it can be performed in a 6-step process:
  1. Select a proper Grid
  2. Select the required  resource from the list of the loaded Resources. (Remember the resources are Grid dependent)
  3. Mark the left of the line(s) belonging to the required Job(s). 
  4. Confirm the changes with the button Set selected
  5. Leave the window by Ok
  6. Save the Workflow
This table can be used in a similar way to control the monitoring of the jobs. If the code belonging to the job is not instrumented,
the association will be refused.

 

5 Submitting the workflow

 
 
 

Figure 35
 
With the Submit command we can activate the workflow:
 
 
 
Figure 36
 
 

6 Observing the progress of the workflow

 
A side effect of submitting is that the Submit button changes to Abort. A subsequent Attach command reopens the Workflow Editor, but in a new role: the progress of the workflow can be followed by the changing of the colors:
 
  
 
Figure 37
 
In this state the first job has received control and the second is waiting for the first to terminate.
A click on the Refresh button of the Workflow Manager window may indicate the successful termination of the workflow.
The Portal_server then collects the referenced files and makes one compressed downloadable file out of them:
 
  
 

 
 Figure 38
 

7 Fetching the result

 
A click on the green button in the [ Output ] column starts the download manager of the browser, which copies the result file to the desktop of the user. Please note that the user defines the destination directory of the workflow result with the tools of the download manager, in a browser-dependent way.
 
Pressing the Details button of the [ View ] column opens a window that provides important information:
 
 
  

Figure 39

 
Beyond the verbose state of the constituent jobs in the Status column, you can get the graphical rendering of the two stages Time – Process communication diagrams (by pressing the buttons under the column [Visualization]). You can also see any messages of the jobs directed to the standard output (by pressing the button Out) and/or to the standard error (by pressing the button Err, not visible in Figure 40 as the jobs did not produce error messages) channels. These buttons are placed in the column [Logs].
 
Hitting the Visualize button starts the visualization, which is performed by an independent program called Prove working on a proper trace file of the workflow. The availability of job level visualization depends on two necessary conditions:
 
As you see this was not the case in our simple example. 
You will get a more comprehensive view of the possibilities available for monitoring in Chapter V_Monitoring_and_Visualization .
 
Finally we show the window returning the content of the standard output upon hitting the Out button of Cascade 1.2
 
 
 
 
Figure 40
 


Figure 40a New Features: Suspend and partial result download
 
 
 

V Monitoring and Visualization

 

1 Introduction

Graphic monitoring means the generating, collecting and graphic rendering of runtime data informing the user about the state and the progress of the submitted workflow. In a parallel environment the dynamic conditions triggering the run of distinct program parts are of special importance: they help the user to pinpoint design flaws and temporarily missing resources. Therefore the “time space” diagram has been selected as the basic tool to render graphically the behaviour of the interacting program parts. This will be discussed in Section 3, which details the work with the graphic tool Prove running on the desktop of the user.
 
We use the common term “program parts” with respect to graphic monitoring in two totally different contexts:
For example the upper part of Figure_43 shows the monitoring of the whole workflow, while the lower part is a detailed view of the progress of its job “cummu”.
 

1.1  Availability of monitoring

On the one hand, the possibility of high level (workflow) monitoring is a generic property of the implied job submission technique “Globus/Dagman”.
On the other hand, job level monitoring, badly needed first of all when the job includes parallel processes, can only be performed if all of the following conditions hold:
  
 
 
 

2. Life cycle of monitoring data

 

2.1 The source of data

As you see in Figure_2, the workflow results, including the monitoring data (in our terminology the “trace file”), primarily arrive from the remote resources at the Portal_server. A huge amount of data may actually be produced by the instrumentation.
As each job is associated with a dedicated resource, there is a separate trace_file for each job.
 

2.2 The transport

The trace_file is collected in an autonomous, incremental way, in packages, as a result of two possible events which are basically independent of the activity of the user:
 
- The local temporary buffer for the current portion of the trace_file on a host of the remote resource is full.
- The respective job has terminated.
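The two triggering events can be pictured with a minimal buffering sketch. Everything in it is hypothetical: the type TraceBuffer, the function names and the tiny buffer capacity are chosen for illustration, and the actual forwarding to the Portal_server is replaced by simple counters. It only shows that a package leaves the host either when the buffer fills up or when the job terminates.

```c
#include <stddef.h>
#include <string.h>

#define TRACE_BUF_CAPACITY 8   /* deliberately tiny; the real size is not documented here */

typedef struct {
    char   data[TRACE_BUF_CAPACITY];
    size_t used;            /* bytes currently waiting in the buffer */
    size_t packages_sent;   /* packages forwarded so far */
    size_t bytes_sent;      /* total bytes forwarded so far */
} TraceBuffer;

void trace_init(TraceBuffer *b) { memset(b, 0, sizeof *b); }

/* Forwards the buffered portion as one package (only counted here). */
static void trace_flush(TraceBuffer *b)
{
    if (b->used == 0)
        return;
    b->bytes_sent += b->used;
    b->packages_sent++;
    b->used = 0;
}

/* Event 1: appending trace data flushes whenever the buffer becomes full. */
void trace_append(TraceBuffer *b, const char *bytes, size_t n)
{
    while (n > 0) {
        size_t room  = TRACE_BUF_CAPACITY - b->used;
        size_t chunk = n < room ? n : room;
        memcpy(b->data + b->used, bytes, chunk);
        b->used += chunk;
        bytes   += chunk;
        n       -= chunk;
        if (b->used == TRACE_BUF_CAPACITY)
            trace_flush(b);
    }
}

/* Event 2: job termination flushes whatever is left. */
void trace_job_terminated(TraceBuffer *b) { trace_flush(b); }
```

In this model, 20 bytes of trace data produce two full packages while the job runs and a third, partial package at termination, which is why the most recent events can reach the Portal_server with some delay.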
 

2.3 The elaboration

These data need to be stored, filtered and elaborated. It is the Portal_server which does the bulk of this work. It prepares the “image file” on user demand. This “image file”, only a few bytes compared to the trace_file, is forwarded to the application program Prove running on the desktop of the user.
  
  
Why should the user know all these nasty technical nuances?
First of all, to understand the cause of the delays that the double buffering imposes on the graphic rendering system. Almost as important is to understand that in given cases the user should help to diminish the load of the Portal_server by issuing the “forget events” command of Prove, which instructs the Portal_server to truncate the corresponding trace_file, releasing the data about events that arrived before a certain time. (There is only a limited storage quota for each user on the Portal_server, which is a precious shared resource.)
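The effect of the “forget events” command can be sketched as a simple filter over time-stamped events. The structure TraceEvent, its fields and the function forget_events are hypothetical; the real trace_file format is not described in this document. The sketch only shows the principle: every event older than the chosen time is released, and the remaining events keep their order.

```c
#include <stddef.h>

/* Hypothetical in-memory form of one trace event. */
typedef struct {
    double time;   /* time stamp of the event (unit assumed: seconds) */
    int    kind;   /* event type code, opaque in this sketch */
} TraceEvent;

/* Drops all events with time < cutoff, compacting the array in place.
 * Returns the number of events that remain. */
size_t forget_events(TraceEvent *events, size_t count, double cutoff)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; ++i)
        if (events[i].time >= cutoff)
            events[kept++] = events[i];
    return kept;
}
```

After such a truncation the forgotten part of the run can no longer be visualized, so the cutoff should be chosen only after the early events have been inspected.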
 

2.4 The frame of destination: The visualization interface

The program  Prove can be started from the “detailed” view of the  “Workflow Manager” as Figure_39 indicates. 
 
Note: In the following a new workflow application is selected as an example to demonstrate the full palette of monitoring options. This fairly complicated workflow has been prepared in such a way that all of its component jobs contain instrumented code.
It is called ForecastWmin and performs a weather forecast (see Figure_41).
 
    

Figure 41
 
In this case the detailed view of this workflow in the Workflow Manager indicates the possibility of job level monitoring by showing the proper buttons (Figure_42). You can compare it with Figure_39 of the workflow application WF1, where the buttons for job level monitoring are missing because the jobs of that application have not been prepared for monitoring.
 
   
 
 
Figure 42

 
Figure 42 shows the detailed view of the workflow ForecastWmin in an intermediate state. The jobs which are running and/or finished can be visualized by the program Prove, which opens independent windows upon hitting the respective Visualize buttons. Prove can be opened for the high level view of the workflow as well (button Visualize in the first line of the workflow, which contains its name).
The button All “packs” all visualization windows together, starting from the high level view, as Figure_43 indicates:
 
 
 
  Figure 43
 
 
  Warning:
   If the number of elements along the vertical axis (hosts / jobs) is high, then certain alphanumeric texts may not be displayed due to the low resolution.
   In that case please increase the size (especially the Height) of the applet.
 
 

3 The Prove program

 
As previously indicated, the program Prove visualizes time – space diagrams.
The program_parts are represented by colored bars placed as rows of a coordinate system where the horizontal axis denotes the common time, and the discrete vertical axis is labeled with the names of the program parts, which may be jobs or processes depending on the context from which the current instance of Prove was called.
 
 
The endpoints of the arrows between bars indicate the times of sending and receiving events, respectively. These arrows should generally be blue. Exceptional red lines indicate bad trace_files, unsynchronized clocks or lost monitor information. You are kindly encouraged to report them to our Portal maintenance team.
 

3. 1 User activities 

The user activities may affect the generation of the trace_file and its graphical rendering.
 
3.1.1 Truncate trace files
The only activity concerning trace_file generation is the menu command Trace/Forget events (see chapter 2 for more detail).
 
 
 
Figure 44
 
The menu command Trace/Collect is not used at present – it is reserved for forcing the remote resource to update the  trace_file .
 
3.1.2 Visualization activities
Visual rendering activities include the filtering, attributing, and time scale zooming of  the program parts. 
 
3.1.2.1 Filtering
The menu item View/Filter serves to reduce the set of program parts shown.
You can select the program parts of interest by the associated toggle marks. The selection takes effect on selecting the “Show changes” item, as Figure_45 shows. Please note that “delta_m” (not visible in Figure_45) has been selected too.
The operation Filtering can be regarded as a kind of “vertical zooming”.
 
 
 
Figure 45
 
The result of selection is shown in Figure_46:
 
 
 
 
 
Figure 46
 
 
 
3.1.2.2 Change state/statistics
Selecting the statistical regime, instead of the default setting which shows the time-dependent states of the program parts, retrieves color-coded statistics of the occurrence frequency of distinct event types:


 
  
This operation can be started by selecting the menu item Info/Statistics/Event.  See Figure_47:
 
 
 
Figure 47
The result can be seen in Figure_48:
 
 
 
 
Figure  48
 
You can restore the original settings by selecting the menu item   Info/Statistics/Communication (Figure_49)
 
 
3.1.2.3 Sorting the program parts vertically
You can change the order of appearance of the program parts along the vertical axis. Figure_49 shows the path to the selection of the proper menu item from the list
 Info/Sort/{Sort by communication | Sort by name | Sort by hostname}
 
 
Figure 49
 
Figure_50 shows the new image:
 
 
Figure 50
3.1.2.4 Zooming in the time scale
One of the most important ways of investigating events is the zooming facility on the time scale. The zooming works in a stack-like way and uses no special buttons of the window, just the mouse buttons. The rules of selection are very simple:
 
Figure_51 shows the state immediately after the range selection (the little horizontal line toward the right side of the calibrated time scale), and Figure_52 the state after the execution of the zoom instruction.
 
 
 
Figure 51
 
 
Figure 52
 
Any zoomed image (Figure_52, Figure_53) contains an active ruler. With its help the whole original time range can be swept over. However, this operation can be prohibitively slow: as discussed in 2.3, the desktop part of the Prove program must send a request to the Portal_server for a new image, which is downloaded with a delay depending on the network. Therefore the sweeping will not be as smooth as it would be with a traditional local program.
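What "stack-like" zooming means can be shown with a small sketch. It does not reproduce the mouse handling of Prove; the types TimeRange and ZoomStack, the function names and the maximum depth are all assumptions. Each zoom pushes a narrower time range on top of the current one, and zooming out pops back to the previous range.

```c
#include <stdbool.h>
#include <stddef.h>

#define ZOOM_MAX_DEPTH 16   /* arbitrary limit chosen for this sketch */

typedef struct { double t_begin, t_end; } TimeRange;

typedef struct {
    TimeRange levels[ZOOM_MAX_DEPTH];
    size_t    depth;  /* levels[0] is the full range, levels[depth-1] the current view */
} ZoomStack;

void zoom_init(ZoomStack *z, TimeRange full)
{
    z->levels[0] = full;
    z->depth = 1;
}

TimeRange zoom_current(const ZoomStack *z)
{
    return z->levels[z->depth - 1];
}

/* Zoom in: the new range must be non-empty and lie inside the current view. */
bool zoom_in(ZoomStack *z, TimeRange r)
{
    TimeRange cur = zoom_current(z);
    if (z->depth == ZOOM_MAX_DEPTH)
        return false;
    if (!(r.t_begin < r.t_end) || r.t_begin < cur.t_begin || r.t_end > cur.t_end)
        return false;
    z->levels[z->depth++] = r;
    return true;
}

/* Zoom out: return to the previous view; the full range is never popped. */
bool zoom_out(ZoomStack *z)
{
    if (z->depth <= 1)
        return false;
    z->depth--;
    return true;
}
```

Because every change of the current range requires a new image from the Portal_server, each push or pop in such a model corresponds to one network round trip.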
 
Figure_53 shows the image after a repeated zoom.
 
 
Figure 53
 

VI Multi-GRID support


From version 2.1 of the P-GRADE Portal, users can execute their applications in several Grids, each of which may consist of one or more Virtual Organizations (VOs).
If a Grid consists of several VOs, the user should have a certificate for the Grid, and this certificate should be registered with those VOs the user would like to access.
For each of these VO-s the user has to have a valid certificate, which will be used for authenticating the user at the resources of that particular VO.
To use this multi-GRID support the following steps have to be taken:

- The portal administrator has to set up the list of VOs, and may define a set of default resources. These resources appear on demand
   in the resource list of every common user.

-  Each user can then set up his own resource list for this VO.
-  The jobs of the workflow can then be allocated to any resource of any VO, so different jobs of the very same workflow can be executed on
    different resources belonging to different VOs of even different Grids.

-   Before execution the user has to download a short term proxy certificate for each VO involved in the workflow.

Important notice for EGEE users:

The Portal ensures multi-Grid, multi-VO support independently of the underlying infrastructures.
However, certain grids may impose restrictions:

EGEE restriction:
A VO chosen by the user when selecting a Virtual Organization with Broker support may contradict the VO permission of the resource selectable by that Broker.
This unpleasant situation may only occur if two conditions are fulfilled:

 Let's see the situation in detail:

  1. The user, already a member of VO1, registers with VO2. As the site "S" also belongs to VO2, the user will be mapped as a VO2 member in the Grid map File of site "S".
  2. The user submits a job to a VO1 broker, which accepts him/her as a VO1 user and makes the proper VO1 setting in the JDL description (Figure 10.2).
  3. The local security system on site "S" finds a VO1 job and, from the delivered proxy_certificate (including the distinguished_name of the user), determines a contradicting VO2
    membership from the mentioned Grid map File.





Let's see all this in a bit more detail.


1.  Setting up VOs of Grids and default resources (by portal administrator)

 

The Grid, the VO and the resource list of the VO-s can be edited in the Settings tab of the portal.
Only the root user has the privilege to set up and modify the list of Grids and VOs.
This means that he/she has to set up at least one Grid (or VO) and, advisably, one default resource for it.
In Figure 6.1 the Grid configurations window can be seen as edited by the root user.

The root user adds a new VO by ‘Add new’, and deletes existing ones by ‘Delete’.

Note that in the case of Grids composed of several VO-s the input field "Name" refers to the VO and the input field "Grid" refers to the Grid as a hub over the several VOs.
The distinction is necessary because the resources belong to the VO, but the information system access defined here refers to the superimposed Grid.
In short, the string defined as "Grid" may appear only at the top of the hierarchy, when the user selects a Grid as the root for information retrieval (see The Information system).

If we want to define a VO, for example "HUNGRID", of the "EGEE" Grid, then together with the VO "HUNGRID" we may define the access to the whole "EGEE" Grid.
Once HUNGRID has been defined as part of the EGEE Grid, the whole information system of the EGEE Grid becomes visible (see Figure 7.6).

When the Grid is not really subdivided into VO-s, the Grid is regarded as consisting of one VO and, as in the multi-VO case, the name of this VO is required
as "Name".
Filling in the field "Grid" is not obligatory; if the input string is empty, its value is inherited from the value of "Name". This suits the needs of user groups
who use simple Grids and do not want to distinguish between the notions of VO and Grid.
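The inheritance rule for the field "Grid" can be written down in a few lines. The function name effective_grid is hypothetical and the sketch says nothing about how the Portal stores these fields; it only restates the rule that an empty "Grid" takes the value of "Name".

```c
#include <stddef.h>

/* Returns the Grid name in effect for a VO entry:
 * the "Grid" field if it was filled in, otherwise the "Name" field. */
const char *effective_grid(const char *name, const char *grid)
{
    if (grid == NULL || grid[0] == '\0')
        return name;
    return grid;
}
```

For the entry Name = "HUNGRID", Grid = "EGEE" the function yields "EGEE", while for an entry with an empty Grid field it yields the VO name itself.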





 




Figure 6.1 Grid configurations list window


The administrator can also set up an information system for the Grid of the VO, if one is available. Currently information systems of types MDS2 and LCG2 are supported. The configuration of the Information System is then used by the Information System portlet. If there is no information system, just choose ‘N/A’.
Please note that in the case of LCG2 the information system refers to the whole Grid and not just to one virtual organization.

Both for MDS2 and LCG2  the host, port, base-dn have to be defined for contacting the Information System. You can see this in Figure 6.2.
For the MDS2 type you also have to refer to an existing MyProxy server account (See the "login" and "password" of Figure10, where "login" of Figure 10 corresponds to "Username" of Figure 6.2).


The other fields of the  MyProxy Server account ("hostname" and "port") are referring to the MyProxy Server itself and they are defined during the installation of the P-GRADE Portal in the  configuration file  "PGradePortal.properties".
The system will automatically download a proxy certificate from this account, and will  use it for authenticating itself against the source of the information when querying the job-manager list for the Grid.

                                                                                                                     


Figure 6.2 Defining Information System for the Grid (MDS2)

A default resource list can also be set up by the portal administrator. This user interface can be reached by clicking the ‘Resources’ button in the Grid Configuration Window (Figure 6.1). The resource list window can be seen in Figure 6.3.


                                                                                                                      



Figure 6.3 Defining the DEFAULT resource list for the Grids

 

The portal administrator defines a default list, which will then be available for any of the users for setting up their own resource lists. Resources can be added by ‘Add’ and can be deleted by ‘Delete’. At definition the URL (for example "n99.hpcc.sztaki.hu"), and a Job manager (for example "jobmanager-fork") have to be provided.


A special case of VO definition is when we define a virtual_organization with broker support, for example "hungrid_LCG_2_BROKER" in Figure 6.1.
In this case no information system will be defined. For historical reasons the
window Resources contains in this case just one list element, mostly "default.jobmanager". It is set by the administrator and may not be altered by a common user. This value is not used in Release 2.2.

 

 

2.     Setting up the resource list for a VO (any regular user)

 

Any regular user can define his own resource list for each of the available Grids.
Let us compare Figure 6.1 and Figure 12a. As you can see, the users cannot edit the Grid list itself; they can only edit the resource list by clicking the ‘Resources’ button for each Grid.
The resource list window for any user for a particular VO can be seen in Figure 12b. The user can add and delete resources just like the portal administrator, by ‘Add’ and ‘Delete’. The default resources defined by the administrator can be loaded by ‘Load default’. If an MDS2 type information system is defined for the Grid, then it can also provide some resource configurations; these can be loaded by the ‘Load resources from MDS2’ button.


                                                                                                                    


3.      Allocating the workflow (any regular user)

The workflow and its jobs can be allocated in the Workflow Editor (WE). For any job, any VO and any resource in that VO can be set. In Figure 34a you can see the window Workflow properties of the WE, which can be opened from the Workflow menu or with the Ctrl+W hotkey.

A Resource for a job can also be set in the job properties window, which opens by clicking on the job.
The VO (Grid) can be selected in the job properties window in the field marked by the label Grid. This window can be seen in Figure 18.


4      Supplying certificate for each virtual organization before execution (any regular user)

 

In the multi-GRID environment users have to provide a certificate for each virtual organization; this means that they have to map a valid certificate to every virtual_organization on whose resources they want to execute their application. The whole certificate management takes place in the Certificate tab of the portal, just like before. Right after download, users are offered to map the certificate to any of the Grids. This can be seen in Figure 10a.

The click on ‘Set for Grid’ leads to the interface in Figure 10b . The details of the certificate such as the issuer, subject and timeleft are displayed, and the desired Grid can be selected.


By clicking ‘OK’ in this window the user gets back to the certificate list, which can be seen in Figure 6.4.
In the column named ‘Set for Grids’ the names of all valid virtual organizations associated with the respective certificate are listed.
Each certificate can be assigned to any number of virtual organizations, but only one certificate can be set for a given virtual organization at any time.
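The mapping rule (many virtual organizations per certificate, but one certificate per virtual organization) can be modelled by a table keyed by the VO name. The types, the function names and the fixed table sizes are hypothetical, and the behaviour that a new mapping replaces an existing one for the same VO is an assumption of this sketch rather than a documented detail of the ‘Set for Grid’ function.

```c
#include <stddef.h>
#include <string.h>

#define MAX_VOS  8    /* arbitrary limits chosen for this sketch */
#define NAME_LEN 64

typedef struct {
    char vo[NAME_LEN];     /* virtual organization (key, unique in the table) */
    char cert[NAME_LEN];   /* identifier of the certificate set for this VO */
} VoMapping;

typedef struct {
    VoMapping entries[MAX_VOS];
    size_t    count;
} CertTable;

/* Sets `cert` for `vo`. An existing mapping for the same VO is replaced,
 * so a VO never has more than one certificate.
 * Returns 0 on success, -1 if a name is too long or the table is full. */
int set_for_grid(CertTable *t, const char *cert, const char *vo)
{
    if (strlen(cert) >= NAME_LEN || strlen(vo) >= NAME_LEN)
        return -1;
    for (size_t i = 0; i < t->count; ++i) {
        if (strcmp(t->entries[i].vo, vo) == 0) {
            strcpy(t->entries[i].cert, cert);
            return 0;
        }
    }
    if (t->count == MAX_VOS)
        return -1;
    strcpy(t->entries[t->count].vo, vo);
    strcpy(t->entries[t->count].cert, cert);
    t->count++;
    return 0;
}

/* Returns the certificate set for `vo`, or NULL if none is set. */
const char *cert_for_vo(const CertTable *t, const char *vo)
{
    for (size_t i = 0; i < t->count; ++i)
        if (strcmp(t->entries[i].vo, vo) == 0)
            return t->entries[i].cert;
    return NULL;
}
```

Keying the table by VO rather than by certificate makes the "only one certificate per virtual organization" rule hold by construction, while the same certificate may still appear in several rows.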




Figure 6.4 The certificate mapping window with the Grid mappings

 

In this window you can also modify mappings by the ‘Set for Grid’ function, which leads to the certificate-mapping window already seen before in Figure 10b .



 

VII Information System


The P-GRADE Portal can handle the available Grid dependent information systems.
Two kinds of information systems are recognized in the P-GRADE Portal: the MDS-2 and the LCG-2 information system.

 
Configuring a Grid access (including specifying an information system for a grid) is a task of the administrator of the portal. See   Setting up VOs of Grids and default resources

 

1.      MDS-2 information system

 

The MDS-2 information system of the portal has two functions: one is getting the list of resources available in the Grid; the other is getting detailed information about individual resources.

 

1.1 View of available resources in the Grid

 

When the user clicks on the Information system tab, the MDS Monitor module of the portal is activated by default. There are two modules under the tab "Information System": the MDS Monitor and the LCG Monitor. On a subsequent selection of Information system the last visited module will be activated.

 

If the administrator of the portal has not yet specified a grid with MDS-2 information system, the following message can be seen in the portal window (see Figure 7.1).




Figure 7.1

If one or more Grids with MDS-2 information systems have already been defined in the portal the following screen
( Figure 7.2 ) can be seen after the selection of the MDS Monitor label.





Figure 7.2

The user can select a Grid  to see the available resources using the combo box  which is in the upper left part of the portal window.
Having selected a grid the user must click on the View button right next to the grid combo box to see the available resources.

If the server (called a GIIS server), or the service running on that server, from which the portal gets this information is not
 available, the following message can be seen (see Figure 7.3).




Figure 7.3

1.2 View of detailed information about a resource

 

If the user would like to get detailed information about a resource he should click on the appropriate resource in the resource list (see Figure 7.2). The page with the detailed information about a resource can be seen in  Figure 7.4.




Figure 7.4

Figure 7.4 shows that the detailed information on every resource provided by MDS-2 can be divided into a static and a dynamic part.

If any information (e.g. CPU Model in Figure 7.4) is not available from the MDS at that moment, the text Not Available (N/A) is displayed for that attribute.

 

 

2. LCG-2 information system

 Introduction

To understand what kind of information is gathered by the EGEE information system infrastructure, please see the following chart, where a site is a collection of resources which are closely related geographically and organizationally.
Sites are generally clusters and provide the hardware infrastructure for computing elements and storage elements.
The figure shows that the resources of a given site may be shared among several Virtual Organizations, or more precisely among the computing elements and storage elements of the VO-s.
 



It can be seen that the separate BDII servers, which collect information and are associated with different Virtual Organizations, "see" different views and fractions of the same Grid.
A BDII server may even "show" sites with which its own VO is not associated. In the example above, BDII c "sees" Site k.
The BDII servers work in "pull" mode and have a general refresh rate of 2 minutes. However, the accuracy of data in a distributed system is not guaranteed.
 

In the P-GRADE Portal there are two possible queries of the VO dependent BDII servers. The user must know that the Select Grid list box (see Figures 7.5 and 7.6) selects just a BDII server associated with a VO:

Note:
The name "Grid" in this command is based on the circumstance that the BDII servers generally - at least in the case of the EGEE federation - see almost all sites of "foreign" VO-s as well. See the example of the preceding paragraph (the BDII of VO c and Site k). However, this behaviour is not guaranteed. Therefore, as a rule of thumb, please select in the Select Grid list box the BDII server which belongs to the VO you are going to set in Select VO:

  1. If the user selects the option All in the list box Select VO (see Figure 7.6) then, following the logic of the BDII server, all sites observed by the BDII are selected and the true values of the dynamic load of the sites (number of running and waiting jobs) are displayed.
  2. If the user selects a dedicated VO in the list box Select VO (see Figures 7.7 and 7.8) then the BDII server returns only those sites associated with the requested VO. Moreover, the displayed dynamic load values refer only to the jobs that have been submitted under the "flag" of the requested VO; therefore no sound conclusions can be drawn about the full dynamic load of the sites in question.

The suggested usage of the Select VO command is the following:
First select a dedicated VO to find all the sites of the requested VO, and afterwards select All to see the realistic load of a particular site.




 

 

2.1 View of available sites in a Grid

 

When the user clicks on the LCG Monitor label of the Information System tab, the LCG Monitor module is activated.

 

If the administrator of the portal has not specified a grid with LCG information system yet, the following message can be seen in the portal window (see Figure 7.5).







Figure 7.5

If one or more grids with LCG information systems have already been defined in the portal the following screen ( Figure 7.6 ) can be seen after the user clicks on the LCG Monitor label.




Figure 7.6

The user can select a grid whose sites are to be listed using the combo box located toward the upper part of the portal window. After selecting a grid the user must click on the View button right next to the grid combo box to see the available sites. By default the sites belonging to the first grid in the grid list are displayed on this page.

 

Each site in an LCG type grid is built up from Computing Elements (CE) and Storage Elements (SE).
More precisely, the site is a rather geographic idea.
There can be one or more clusters inside a site.
A cluster can be fed by one or more queues, each called a Computing Element.


In the site list page the basic information about the CE-s and SE-s can be seen. By default the information for each site is the aggregation of all the CE and SE resources found at the respective site.

 

If the server (called a BDII server), or the service running on that server from which the portal gets this information, is not available, the message "Cannot contact the BDII server" can be seen.




2.1.1 Selecting a Virtual Organization

 

The users of LCG type grids must belong to one or more virtual organizations (VO). The CE-s and the SE-s are associated with VO-s as well. A CE may belong to more than one VO. This means that if a CE or SE is associated with a VO, only those users who belong to the corresponding VO can access that resource.

The user can filter the sites associated with a specified VO using the combo box located under the grid combo box in the upper part of the portal window (Figure 7.7). See bug report






Figure 7.7


After clicking the View button right next to the combo box the sites that belong to the selected VO can be seen (Figure 7.8).




Figure 7.8

Selecting a specified VO means the following:

- The user can see the list of those sites which belong to the selected VO.

- When the user clicks on a site name, the detailed information displays only those CE-s and SE-s which belong to the selected VO.

Important remark - see  bug report B.1  while interpreting the value of columns Total Free Running Waiting

 

2.2 View of detailed information about a site of a Grid

 

If the user would like to get detailed information about a site, he should click on the appropriate site name in the site list (see Figure 7.6). The page with the detailed information about a site can be seen in Figure 7.9.




Figure 7.9

As can be seen in this figure, the selected VO is All. This means that all CE-s and SE-s found at that site are displayed.
If the user selects a VO in the site list page, only those CE-s and SE-s which belong to the selected VO are displayed in the detailed view.
As can be seen in Figure 7.10, which shows the site IFCA-LCG-2 with the VO dteam, only a limited number of CE-s and SE-s is displayed.






Figure 7.10




VIII Handling of remote  files

1 General aspects of remote files

The P-GRADE Portal supports the handling of remote files.
A remote place is a location within a given virtual organization which is different from the local file system of the user's desktop and whose access is controlled by grid certificates.

Since version 2.1 of the P-GRADE Portal, input files can be sent to a job not only from the local file system of the user's desktop but from trusted remote places as well.
In a similar way, the output files of a job can be sent to remote storage places as well.

The next figure explains the differences between the handling of local and remote files:







Figure 8.1
Life cycles of local and remote files



2.Different kinds of remote file usage




Remote files can be handled by several protocols, stored by different means and can be referenced  at several levels in a Grid (and VO) dependent way.

There are two basically different ways to use remote files from the point of view of the user:
1.  Low level usage supported by  the Globus middleware.
2.  High level usage generally supported by the EGEE infrastructure.[4]

2.1 Low level usage (Globus)

2.1.1 Protocol
To access a file at a remote place a transfer protocol is needed, which is explicitly or implicitly part of the URL describing the location of the file.
 Mostly the gsiftp protocol is used; in this case the user is identified to the remote host by the actual certificate.

2.1.2 File reference
The file is referenced by a URL consisting of the concatenation of the host name and the storage path of the file on that host.
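
The construction of such a reference can be illustrated with a minimal Python sketch. The helper name gsiftp_url, the host se.example.org and the path are hypothetical and serve only to show how protocol, host name and storage path are joined; they are not part of the portal.

```python
def gsiftp_url(host, path):
    """Join a host name and an absolute storage path into a gsiftp URL.

    Illustrative helper only; the portal builds these references itself.
    """
    if not host:
        raise ValueError("host name must not be empty")
    if not path.startswith("/"):
        raise ValueError("storage path must be absolute")
    return "gsiftp://" + host + path


# Hypothetical host and path, used only for illustration.
print(gsiftp_url("se.example.org", "/home/user/input.dat"))
```

A URL of this shape is the kind of argument that a file transfer tool such as globus-url-copy expects as source or destination.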

2.1.3 File Storage
The remote files are stored as common files of a host. A special file on that host, the GridMap file, contains entries that associate the so-called distinguished name part of a user certificate with a user account known on that system. In this way the system can control the access permissions of file operations. The GridMap file is maintained by the local administrator of that host.


2.1.4 Example


 
 The system will use this information in arguments of the automatically generated  globus-url-copy instructions.

 






Figure 8.2
  Low level access to a remote input file

2.2 High level usage (only within the EGEE with broker support)


In this chapter only the most important remote file related features of the LCG like grids (for example EGEE) are covered.

2.2.1 Protocol
The protocol is of low importance, as the JDL job submission system and the associated internal services of the P-GRADE Portal hide the protocol from the user.
In that case the job submission is performed with Broker support. (See Connection to the EGEE Grids and the usage of the Broker.)
2.2.2 File reference
High level remote files can be referenced within the P-GRADE Portal by symbolic names directed to File Catalogues.
File Catalogues map the symbolic names to Grid files.
Grid files are not modifiable after creation and may exist in several replicas connected by a common grid wide unique identifier ("guuid"); the replicas are stored in Storage Elements.
 
There are several standards of File Catalogues. The actual type of the File Catalogue is defined by the administrator of the respective virtual organization.
 A reference to a file catalogue - a symbolic name - begins with the prefix "lfn:" (abbreviation of logical file name), but the syntax following this prefix depends on the type of the File Catalogue.
Two types of File Catalogues have been tested: the LFC file catalogue (2.2.2.1) and the RMC file catalogue (2.2.2.2).
In both cases the user is strongly advised to define the environment variable "LCG_GFAL_INFOSYS", as the catalogues are accessible via the information system.
This environment variable is mostly defined by the system administrator of the UIF machine, i.e. on the same machine where the P-GRADE Portal server runs.
However, it is possible that the working nodes (CE-s) where the actual jobs run lack this setting. In that case the operations relating to remote files will fail.

In that case the user should enter this setting manually in the JDL part of the Job Properties window. See Figure 10.8.
The value of this setting may differ between VO-s. Please check it on the UIF machine with the command

set | grep LCG_GFAL_INFOSYS

 

Typical  values are at the time of  writing of this manual:

lcg-bdii.cern.ch:2170                for  the  VO  voce
bdii.phy.bg.ac.yu:2170              for  the VO  seegrid

2.2.2.1 LFC file catalogue
            The file name here has a fixed hierarchical form:

                                            /grid/<VO>/<Username>/[<LFC_Catalog_Directory_Name>/]...<fileName>

             where each <LFC_Catalog_Directory_Name> must refer to an existing catalogue directory that has been defined by the proper LFC commands [4].
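
The hierarchical form above, combined with the "lfn:" prefix, can be sketched in a few lines of Python. The function name lfc_logical_name and the sample user and directory names are hypothetical; the sketch only assembles the string and does not check that the catalogue directories really exist.

```python
def lfc_logical_name(vo, username, file_name, directories=()):
    """Build an "lfn:" reference of the form /grid/<VO>/<Username>/[<dir>/]...<fileName>.

    Illustrative helper only: it does not contact any catalogue.
    """
    parts = ["grid", vo, username] + list(directories) + [file_name]
    for part in parts:
        if not part or "/" in part:
            raise ValueError("invalid path component: %r" % (part,))
    return "lfn:/" + "/".join(parts)


# Hypothetical user name and directory, used only for illustration.
print(lfc_logical_name("voce", "jsmith", "input.dat", ["experiment1"]))
```

The catalogue directories named in such a reference still have to be created beforehand with the LFC commands mentioned above.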
              
 
            See Figure 8.3 as example.
 
            In connection with the usage of LFC catalogue the special setting of two environment variables is required:


            It is very important to note that it is the responsibility of the user to set these environment variables properly in the JDL description. (See Figure 10.8.)
           

2.2.2.2 RMC file catalogue
             In this case the name is not hierarchical, but a plain string. For example:  MyTestFile_25_Nov_2005
             No user setting of environment variables is required   

        
2.2.3 File Storage
          In the EGEE the remote (grid) files are stored in so-called Storage Elements. Local administrators of the sites belonging to a common virtual organization
          may have different policies about the usage of the local Storage Element.
          The user can instruct the system within the P-GRADE Portal to store the generated output file on a certain Storage Element.
          This is an option of the JDL description, modifiable in the Workflow Editor. See the input field Output SE in Figure 10.6.
          The user can explore the available Storage Elements in two ways:
         


2.2.4. Example      





Figure 8.3
High level file definition used by the LFC catalogue



IX User quotas



For the safety of the overall operation of the Portal server, Release 2 of the P-GRADE Portal introduces the concept of user quotas and manages their administration.
A user quota is a predefined amount of the storage resources available to a user on the host machine acting as the server of the P-GRADE Portal
(see "Portal server" in Figure 2).

The amount of the user quota (defined in MB) is set  by the system administrator of the P-GRADE Portal centrally:
The administrator can set different amount of storage for each user and can reset it at any time.
See the pane Quota per portal user on tab Settings which defines a common default value
and the pane User Quota listing the users with their quota limits where the administrator can define individual values:

Please remember that this pane is visible and editable only  by the administrator (user root).



Figure 9.1

Note:
In the (probably rare) case when the user quota becomes exhausted because the administrator
has decreased it, the user gets the same warning messages as if he/she had exceeded the limit.
No user data is lost, but the user is forced to take measures to free enough space.


The quota  is the highest  amount  of the valuable common storage resource which can be allocated by a user directly or indirectly:
The user can compare the permitted and used storage quota  in the  Workflow  List  window of  the  Workflow Manager. (See Figure_36 and Figure_38 )

The quota management does not guarantee the availability of the defined amount.

The only purpose of the quota management is to prevent excessive usage and/or malevolent exhaustion of the common storage resources.
In short, it primarily protects the system against the user, not the user against the system.


If the quota is exhausted  the user receives a proper warning message.

Suggested user actions:

 

X Connection to the EGEE Grids and the usage of the Broker

1. General rules to submit individual jobs of a workflow  by the Broker of the EGEE

Since Release 2.2 of the P-GRADE Portal the user can submit one or more jobs of a workflow with broker support into an EGEE like Grid [4].
However, this freedom is coupled with the installation restriction that the Portal Server (see Figure 2) must be set up on a so-called "UIF machine" belonging to the EGEE like Grid to be reached.

The main differences in usage between a traditional low level Globus Grid and an EGEE like Grid, from the point of view of the user, are the following:


The system recognizes a virtual organization with broker support if two conditions hold for the Name defined in the window "GRID configurations" (see Figure 6.1) and selected as Grid in the window "Job properties" (Figure 10.1):
In this case the button JDL Editor... of the Job Properties window becomes active (see Figure 10.1) and the Resource information has no significance.




For a more detailed description of the usage of the JDL language please consult [3].

2. JDL Editor details

2.1 Opening the JDL Editor






Figure 10.1

2.2 Setting retry count





Figure 10.2

In this window only Retry count (the highest number of repetitions in case of eventual errors)  can be defined.
In this and in all subsequent tabs of the JDL Editor the button View opens a different window  to show  the whole JDL file to be generated.

2.3 Checking the Sandbox





Figure 10.3

Local files of the ports and the executable of the job are copied into the proper Sandboxes.
Please observe the mapping of Internal File Name from the left hand side of Figure 10.3, and of Executable from the Job Properties window (Figure 10.1), to the right hand side of Figure 10.3.
Several system files (an envelope shell script, info.tar.gz, x509up...) are needed to copy any remote input files to the executing machine and to start the executable of the job.

Please remember that brokering and the listing of remote input files in the tab Input Data of the JDL (see Figure 10.5) do not in themselves ensure access to the remote input files from the executable program on the working node of the CE. Therefore the automatic copy mechanism implemented in the P-GRADE Portal infrastructure is used. (See Remote_input_file_handling.)

2.4 Setting Ranks&Requirements





Figure 10.4

The fields Rank and Requirements can be filled in according to the rules of the JDL. From the point of view of the portal server they are free text: the syntax is checked by the broker, and any errors are returned at run time on the standard error output channel.

2.5 Checking Input Data





Figure 10.5

2.6 Setting optional Storage Element in Output Data





Figure 10.6

If the job has a proper remote output reference then the system will deliver it automatically to the proper destination.
The user can define a destination Storage Element in the text field Output SE. In the absence of this definition a default "near" one will be used.

2.7 Setting the Environment Variables eventually needed on the Working Nodes of the Computing Element






Figure 10.7
The next window shows a typical setting to reach the LFC catalogue on the worker node:
                                                                           


Figure 10.8

2.8 Example of  "misuse" : Direct a job to a dedicated site




Figure 10.9

2.9 Important notice to MPI submission

Because of a well known problem of the LCG information system, MPI submission for the time being requires the following extension of the requirements, entered by the user in the tab Ranks&Requirements of the JDL:

  (other.GlueCEInfoLRMSType == "PBS") || (other.GlueCEInfoLRMSType == "LSF")
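
If the user already has a Requirements expression of his own, the clause above has to be combined with it rather than replace it. The following Python sketch shows one way to compose the text before pasting it into the Requirements field; the helper name with_mpi_requirement and the sample expression are hypothetical, and the sketch assumes that JDL expressions are joined with the && operator.

```python
# The clause quoted above, kept verbatim.
MPI_LRMS_CLAUSE = (
    '(other.GlueCEInfoLRMSType == "PBS") || '
    '(other.GlueCEInfoLRMSType == "LSF")'
)


def with_mpi_requirement(user_requirements=""):
    """Return a Requirements text that also contains the MPI clause.

    Both operands are parenthesized so that operator precedence
    cannot change the meaning of the user's own expression.
    """
    user_requirements = user_requirements.strip()
    if not user_requirements:
        return MPI_LRMS_CLAUSE
    return "(" + user_requirements + ") && (" + MPI_LRMS_CLAUSE + ")"


# Hypothetical user expression, used only for illustration.
print(with_mpi_requirement("other.GlueCEStateFreeCPUs > 2"))
```

The resulting text is still free text for the portal; its syntax is checked only by the broker.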

XI Rescuing the workflow

The execution of a workflow may fail for many reasons. In general, however, some part of the workflow has already completed and only the remaining part has to be executed to complete the workflow. In such cases it saves time and CPU time if the user can examine what might have gone wrong, make modifications, such as reallocating the failed job to a proper resource, and then resubmit the non-finished jobs of the workflow. This mechanism is supported in the P-GRADE Portal from Release 2.2 and is called rescuing. Currently, before rescuing a workflow the user can modify the resources of a job in the Workflow Editor or can adjust the certificate belonging to the resource in the Certificates tab of the portal.

The general assumption is that the code of the workflow is tested, and that the original input files, and especially any remote input files, do not change between the detection of the error and the restart of the failed jobs. In short, rescuing may help to overcome difficulties that have arisen due to broken resources and invalid certificates.


Please read the next step-by-step guide for getting familiar with the Rescue function as a portal user.

  1. Workflow status: rescue



    Figure 11.1

    The submitted job "Count3" of the workflow "demo-RESCUE" has failed for some reason, and the workflow status has changed to rescue, which means that the user may modify the workflow and then attempt to let it run further by pressing the button Rescue.
    Please note that the execution of the workflow stops only when there are no more independent jobs to be executed.

  2. Read the log for possible reasons



    Figure 11.2

    The user reads the error log belonging to the failed job and identifies the authentication problem at the given resource. He decides to launch the Workflow Editor in which he can reallocate his job to a working resource, see this in the following step.


  3. Modify the workflow: reallocating the failed job


    Figure 11.3

    The user opens the workflow (by the button Attach, Figure 11.1), which is now in Rescue mode (the stopped job is painted blue). He opens the job properties window for the problematic Count3 job:



    Figure 11.4

    Then the user changes the resource in the job properties window to a properly working one.



    Figure 11.5



    Finally, in the Workflow menu the user saves his modification with the  menu item Save resources, which stores his modification on the server side.




  4. Rescuing the workflow



    Figure 11.6
    In the window Workflow Manager the "continue button" Rescue appears in this state. After the button Rescue is clicked, the previously failed job "Count3" starts running on the new resource. The already finished jobs Count1 and Count2 will not be resubmitted!
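
    The selection rule behind the Rescue button can be sketched in Python as follows. This is only an illustration of the principle that finished jobs are kept and everything else is resubmitted; the status strings and the function name jobs_to_resubmit are hypothetical and do not describe the internal data model of the portal.

```python
def jobs_to_resubmit(job_status):
    """Return the names of all jobs that have not finished yet.

    job_status maps a job name to a status string; the status
    values used here are illustrative only.
    """
    return [name for name, status in job_status.items() if status != "finished"]


# State of the example workflow before the Rescue button is pressed.
demo_rescue = {"Count1": "finished", "Count2": "finished", "Count3": "failed"}
print(jobs_to_resubmit(demo_rescue))
```

    For the example workflow only Count3 is selected, so the results of Count1 and Count2 are preserved.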


    Figure 11.7


     




  5. Workflow finished



    Figure 11.8


    By modifying the resource the user could rescue his workflow, which then completed successfully by executing only the non-finished jobs and preserving the results of the finished jobs from the first attempt.




XII. Welcome Menu

Since Release 2.2 of the P-GRADE Portal a new Welcome portlet greets the user after login.
In this menu the user can customize the portal and can alter his or her own role, personal data and, first of all, the original password received from the system administrator.




Figure 12.1 Welcome menu

XIII Workflow archive service

An existing workflow can be saved from the Workflow Manager list of the Portal Server, stored in the local file system belonging to the user's Desktop Machine, and subsequently uploaded from there in the reverse order. See Figure 2 (arrows Workflow/Storage/Download and Workflow/Upload) for an overview and Figure 13.1 for the actual usage:


Figure 13.1

  1 Saving the definition of  a workflow  and clearing the temporary parts:


Clicking on the operation Storage (Figure 13.1) opens the storage list showing the workflows that can be saved:




Figure 13.2


Three parts of a workflow can be handled independently:
  • Under the column Workflow the definition part of a workflow is accessible.

    Download selects a workflow and opens the Download Manager of the browser, by which the user can define a destination in the local file system in order to download the definition of the selected workflow in the form of a compressed file.
    The saved workflow can be retrieved later from the local file system.
    (See paragraph 2. Uploading the definition of a workflow to modify / resubmit or uploading the content of a trace file for visualization.)
    Please note that the workflow is saved in its current state, i.e. together with any temporary files.
    If you do not need these, please apply set init:

    set init is an auxiliary operation to discard the temporary files that have been generated during previous workflow submissions.

    Both cases - with and without set init - have their own merits:
    Saving the workflow in the state as it was facilitates the subsequent investigation of a failed run by an expert (for example, to distinguish user, portal and Grid related errors in complicated cases).
    Saving the workflow after bringing it to the init state minimizes the information needed to save the definition of the workflow. This option is suggested if the user wants to migrate the workflow to a different user or to a different portal, or wants to save it intending to resubmit or edit it in the future.

  • The operations under the column Trace are optional and depend on the existence of the trace file.
    As trace files may be of substantial size, they can be Downloaded or Deleted separately.

  • Under the column Output there is no Download option, as this functionality is available under the
    Workflow/Workflow Manager tab. Here the output of a workflow can only be Deleted from the Portal server machine.
Please note that in the fourth column, ALL, the button Delete is visible only if the workflow is inactive, i.e. the workflow is not in the Workflow / Workflow Manager list.

2. Uploading the definition of a workflow to modify / resubmit or uploading the content of a trace file for visualization:


Clicking on the operation Upload (Figure 13.1) opens a set of file browsers to define the paths of the saved files in the user's desktop environment
that are to be uploaded to the Portal Server:




Figure 13.3

The input field Workflow archive must refer to one of the compressed files that have been previously stored by the Storage / Workflow Download
operation. (See paragraph 1_Saving_the_definition_of_a_workflow)

Demo Workflows are prefabricated example/test applications to be uploaded. See more details in the next section.

Important notice:
 
The result of a successful Upload from a Workflow archive operation will not be visible immediately in the Workflow / Workflow Manager list.
However, it appears both in the Storage list and in the Open list of the Workflow Editor. Therefore, following a successful Upload, the user should

  1. enter the Workflow Editor in the tab Workflow / Workflow Manager (Figure 13),
  2. open the workflow list in the Workflow Editor and select the requested workflow,
  3. save it on the server (Figure 32),
  4. and hit the Refresh button on the Workflow / Workflow Manager tab.
(See the arrows EDITOR/Open, EDITOR/Save|Upload of Figure 2 for an overview.)


3. Uploading of the demo applications

The Demo Workflows section of Figure 13.3 shows the available prefabricated demo applications.
These generally test the P-GRADE Portal and the current environment (certificates, settings and the Grid).
The names and number of the displayed test applications may differ from those shown in Figure 13.3, and they may be reset by the portal Administrator.
The user can select either one application (by the radio button, confirmed by the OK button) or all the available Demo Workflow applications (by the Upload all button).
The selected applications will appear in the Workflow Manager list only after the user has manually modified them in the Workflow Editor.
However, it is not guaranteed that the application will be associated with the proper resources and can be submitted immediately.
The inexperienced Portal user is advised to follow these steps:
  1. Select an application by the radio button and confirm the upload with OK.
  2. Check the success of the Upload by reading the Message line.
  3. Check the existence of a valid proxy certificate in the Certificate tab.
  4. Check the existence of the required Grids/resources in the Settings tab.
  5. Check the association of Grids to the selected valid proxy certificate in the Certificate tab.
  6. Select the tab Workflow / Workflow Manager.
  7. Select the button Workflow Editor.
  8. Use the menu item Open in the WE window to access and download the demo application.
  9. In the appearing WE graph open each job of the application:
    select one of the resources that has been defined/checked in (4), and confirm the changes with OK.
  10. Save the workflow by SaveAs..
  11. Submit the workflow with the proper button of the tab Workflow / Workflow Manager.

3.1 The Equation Solver application

This application solves the n-dimensional (in our example n = 5) equation system A*x = B.
See details in [5].
Figure 13.3 contains four versions of the common workflow, prepared for two different virtual organisations and distinguishing in each the direct (static) and dynamic (Broker associated) resource reservations.
The expected result x (an approximation of the vector [1,2,3,4,5]) can be read most simply by hitting the Out button in the column Logs in the line of the job Multip_B, in the detailed view of the submitted workflow within the Workflow Manager portlet.


XIV Parameter Study - Mass Workflow Processing


1 Introduction

One of the ways of exploiting the services of a computational grid most favoured by users is to solve problems where sets of inputs must be applied to a single algorithm.

The name of the scenario is Parameter Study when
  • the algorithm is independent of the input - i.e. the same code representing the algorithm can be applied to any member of the input set -, and
  • the outputs - equal in cardinality to the input set - are evaluated/elaborated in a later phase (possibly by a different algorithm).

The inputs of these generally exploring/searching tasks need not be the members of a single set representing the possible characteristics of one feature; they may come from several sets describing different kinds of features. In this case all combinations of the actual characteristics of the different features must be studied.

For example, if we have two independent features, set1 and set2, where the members of set1 are {c11, c12, c13} and the members of set2 are {c21, c22}, then the combinations of the possible actual characteristics compose a new set {{c11,c21}, {c11,c22}, {c12,c21}, {c12,c22}, {c13,c21}, {c13,c22}}
having the cardinality computed by the multiplication of the cardinalities of the base sets (Cartesian product, also called Descartes product), in our case 3*2 = 6.
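
The same combination can be reproduced with a short Python sketch using the standard library function itertools.product. The variable names are taken from the example above; the sketch only illustrates the arithmetic and is not the mechanism used by the portal.

```python
from itertools import product

# The two independent feature sets of the example.
set1 = ["c11", "c12", "c13"]
set2 = ["c21", "c22"]

# Every member of set1 is paired with every member of set2.
combinations = list(product(set1, set2))

for index, members in enumerate(combinations):
    print(index, members)

# The number of runs is the product of the cardinalities.
print(len(combinations) == len(set1) * len(set2))
```

Each of the six pairs corresponds to one independent run of the algorithm.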

The members of this combination must be applied one by one to the algorithm, which in our case yields 6 independent runs, each with two parametrized input values.
We will use the term PS Set (or Parameter Study Set) for each of the independent feature sets.

2. Basic principles

The workflows created with the help of the P-GRADE Portal are ideally suited to represent the mentioned algorithm, because the load of the executions can be distributed in the Grid. In the simplest way we regard a tested P-GRADE workflow as a black box and "pump" into it the members of the combined inputs. To do that efficiently the user must take care to submit the jobs belonging to the parametric workflows with the assistance of the Broker whenever possible.
The workflows defined together with their PS Set(s) are called Parameter Study Workflows or PS Workflows.


The subsequent Chapter 3 describes the properties of the basic parameter study, introducing the idea of the PS Input Port.
Chapter 4 deals with the advanced parameter study, introducing two specialized job types, Generator and Collector, and two new Port types associated with the new jobs: PS Output Port and Collector PS Input Port.



3. Basic Parameter Study: Implementation

3.1 Preparation of  parameters and results

As the black box principle is used, it is only the interface - the definition of the parameterized input files and the placement of the output of an executed workflow - which must be defined slightly differently compared to a "normal" workflow.

As the cardinality and size of the inputs (and of the outputs) may be considerable, the implementation decision was that these files must be stored remotely, for example in Storage Elements if an EGEE like VO is involved.
Hence, the obligatory convention is that each independent feature set - PS Set - must be represented as a sub directory (PS Subdirectory) within the scope of the selected remote storage system, and the files found in these sub directories are regarded as the members of the respective PS Set.
A similar convention is valid for the results. They are placed in a user defined sub directory as independent compressed files, and they are identified by an automatically generated file name extension containing the indices of the current member(s) of the PS Set(s) determining the run of the workflow involved.

It follows that it is the responsibility of the user that:
  • The input sub directories (often represented by Grid File Catalogues) exist before the submission of the Parameter Study.
  • The input member files exist before the submission of the Parameter Study, and their number does not change during the whole elaboration process, so that the indexing system controlling the elaboration is not spoiled.
  • The member files of a PS_Set are identical in structure, because they will be elaborated by a common code.
  • A PS_Subdirectory contains neither files serving other purposes nor directory entries.
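The conventions listed above can be checked before a Parameter Study is submitted. The following is a minimal Python sketch, assuming that the PS Subdirectory has been mirrored to a local directory; the helper name check_ps_subdirectory and the local mirror are illustrative assumptions and not part of the Portal.

```python
from pathlib import Path
import tempfile


def check_ps_subdirectory(directory):
    """Return a list of convention violations found in a local mirror
    of a PS Subdirectory (hypothetical helper, not a Portal function)."""
    root = Path(directory)
    if not root.is_dir():
        return [f"{root} does not exist or is not a directory"]
    problems = []
    entries = sorted(root.iterdir())
    if not entries:
        problems.append("the PS_Set has no member files")
    for entry in entries:
        # Directory entries are not allowed inside a PS Subdirectory.
        if entry.is_dir():
            problems.append(f"{entry.name} is a directory entry")
    return problems


# Usage: a mirror with two member files and one stray sub directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "member_a").write_text("1 2 3\n")
    (Path(tmp) / "member_b").write_text("4 5 6\n")
    (Path(tmp) / "notes").mkdir()
    report = check_ps_subdirectory(tmp)
    print(report)
```

The sketch cannot tell whether a regular file really belongs to the PS_Set or whether the members share one structure; those checks remain with the user.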
To preserve resources, all parts of a successfully elaborated member workflow of a PS are cleared from the P-GRADE Portal server. Consequently, in order not to lose information, the result of a single workflow - which contains only the Permanent Local Output files in the case of "simple" workflows - is extended by the eventual log messages of the jobs (and of the execution engine) that were directed either to the standard output or to the standard error channel.


 

3.1.1 Input connection

A PS_Set can be connected to any job by changing a "common" input Port into a so called PS Input Port.
The PS Input Port feature can be selected with the toggle "Switch to PS", accessible by a right click on the icon of the input Port (See Figure 14.1).
Note that the PS Input Port is indicated by dark green color.



Figure 14.1 Selecting input Port as PS Port


The definition of a PS Input Port differs from the definition of a common input port in one respect only: in the former case the sub directory of the remote files representing the PS_Set must be defined in the field Directory, instead of the input field File of the latter. (See Figure 14.2)




Figure 14.2 PS Input Port definition

During the elaboration of the Parameter Study the next input file within the PS_Subdirectory is copied to the local working directory of the execution system under the name given in the field Internal File Name, and it is referenced under that name by the Open statement of the executable of the respective job.

The actual syntax of the field Directory depends on the kind of the remote file.

It can be
  • one of the several File Catalogue formats if the high level (EGEE like) remote file handling is used (as Figure 14.2 shows),

  • or a common URL if the low level (Globus 2 like) remote file handling is used.
Please remember that in connection with high level grid file catalogues the proper environment variables must be set in the JDL description of the respective Job, as Figure 10.8 indicates.

3.1.2 Result connection

A new submenu item PS properties within the Workflow Editor (See Figure 14.3) opens the window where the placement of the result  must be defined (See Figure 14.4).



Figure 14.3 PS result window selection

In this window the Grid (or the Virtual Organization in the case of an EGEE like grid) and, within that, the Output Directory containing the results must be defined. See Figure 14.4


Figure 14.4 Result container definition


The rules governing the syntax of the Output Directory are the same as in the case of the PS Port Directory.

The Portal tries to create the Output Directory named in the input field automatically if the user has not created it previously.
If the directory refers to an LCG_2 like Grid catalogue, then the LCG Catalog Type and the URL of the LFC Host must be defined as well. (See 2.2.2_File_reference)

3.2  Submitting and observing a  Parameter Study task

If at least one PS port is defined, the system recognizes a saved and stored workflow as a PS_Workflow. In the Workflow Manager list these workflows are distinguished by the button PS Details (See Figure 14.5)




Figure 14.5 PS_Workflow in Workflow Manager List


Hitting the button Submit starts the one by one execution of the members of the PS workflow:
The system calculates the Descartes (Cartesian) product of the PS_Sets, which determines the total number of submissions, associates the proper input files with the next member workflow item - the so called element Workflow (or eWorkflow) - and tries to submit it.
To avoid overloading the Portal Server and the Grid infrastructure there is an upper limit on the number of eWorkflows which can be "living" in parallel (eWorkflow buffer).
If an eWorkflow terminates, or fails without hope of being resubmitted by the manual rescue operation, it is cleared from the eWorkflow_buffer (and from the P-GRADE Portal server altogether) and the system automatically submits the next eWorkflow. This process continues until the whole Parameter Study task terminates or the eWorkflow_buffer is filled with eWorkflows that need user interaction.
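The submission scheme described above can be illustrated with a small Python sketch. It is only a model under stated assumptions: the function names, the callback run (standing in for submission and execution) and the rule that the oldest living eWorkflow always terminates first are hypothetical, and the rescue case is left out.

```python
from collections import deque
from itertools import product


def enumerate_eworkflows(ps_sets):
    """Return one tuple of input files per eWorkflow: the Cartesian
    (Descartes) product of the members of all PS_Sets."""
    return product(*ps_sets)


def run_parameter_study(ps_sets, buffer_limit, run):
    """Keep at most `buffer_limit` eWorkflows alive at the same time.
    `run` is a hypothetical callback that executes one eWorkflow."""
    pending = deque(enumerate_eworkflows(ps_sets))
    total = len(pending)
    living = deque()
    results = []
    max_living = 0
    while pending or living:
        # Fill the eWorkflow buffer up to its limit.
        while pending and len(living) < buffer_limit:
            living.append(pending.popleft())
        max_living = max(max_living, len(living))
        # The oldest living eWorkflow terminates and leaves the buffer.
        results.append(run(living.popleft()))
    return total, max_living, results


# Usage: two PS_Sets with 3 and 2 members give 3 * 2 = 6 eWorkflows.
ps_sets = [["a1", "a2", "a3"], ["b1", "b2"]]
total, max_living, results = run_parameter_study(
    ps_sets, buffer_limit=4, run=lambda inputs: "+".join(inputs))
print(total, max_living, results)
```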

Hitting the  button PS Details the user can observe and control the eWorkflow_buffer (See Figure 14.6)



Figure 14.6 The eWorkflow Buffer



The new Statistics bar informs the user about the current state of the whole PS Task:

  • Total is the size of the Descartes_product, i.e. the static number of independent eWorkflows to be executed within the framework of the PS Task.
  • Init is the number of eWorkflows waiting for submission in the virtual queue of the P-GRADE Portal.
  • Submitted is the number of eWorkflows currently being processed that expect no user interaction (apart from an eventual Abort).
  • Error is the number of eWorkflows that failed without rescue possibility.
  • Rescue is the number of eWorkflows which expect user interaction so that the computation can be continued.
  • Finished is the number of eWorkflows that terminated properly.

From the definitions above it follows that these categories (not considering the first) are mutually exclusive states of an eWorkflow, and that the equation

Total  = Init + Submitted + Error + Rescue + Finished
holds.
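The bookkeeping behind the Statistics bar can be sketched in a few lines of Python. The function name statistics and the dictionary that maps eWorkflow instance numbers to state names are assumptions made for the illustration; only the state names and the equation come from the text above.

```python
from collections import Counter

STATES = ("Init", "Submitted", "Error", "Rescue", "Finished")


def statistics(state_of, total):
    """Count the eWorkflows per state and verify the equation
    Total = Init + Submitted + Error + Rescue + Finished."""
    counts = Counter(state_of.values())
    unknown = set(counts) - set(STATES)
    if unknown:
        raise ValueError(f"unknown states: {sorted(unknown)}")
    stats = {state: counts.get(state, 0) for state in STATES}
    if sum(stats.values()) != total:
        raise ValueError("state counts do not add up to Total")
    stats["Total"] = total
    return stats


# Usage: six eWorkflows, each in exactly one state.
state_of = {1: "Finished", 2: "Finished", 3: "Submitted",
            4: "Rescue", 5: "Init", 6: "Init"}
stats = statistics(state_of, total=6)
print(stats)
```

Because every eWorkflow is stored under exactly one state, the mutual exclusion of the categories is built into the data structure and the sum can only differ from Total if an eWorkflow is missing or counted twice.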

Note:
A special case occurs if the eWorkflow terminates properly but the Grid infrastructure is unable to copy the compressed file representing the result to the destination remote storage. In this case the button "Error" appears in the column "Log". Hitting the button displays the text message containing the error report about the respective eWorkflow(s). The respective eWorkflow(s) will be accounted as "Finished" in the Statistics, but they will not be cleared from the P-GRADE Portal as in the normal case. Consequently, after the termination of the whole Parameter Study task the result(s) of the eventual remnant eWorkflow(s) can be downloaded from the Portal server as a single compressed file, similarly to the case of a common workflow having local output results. This circumstance is indicated by the traditional green triangle in the OUTPUT column of the Workflow Manager window.

The button Suspend has increased importance in the case of an eWorkflow: the probability that a job will be assigned to a bad or overloaded resource rapidly increases with the cardinality of the PS parameters (and with the complexity of the Workflow).
Hitting the button Details - following the black box principle - leads to the traditional detailed view of the eWorkflow being processed:




Figure 14.7 An eWorkflow in detailed view

There is a slight difference between Figure 14.7 and Figure_40a: some action buttons are missing here, as the state of a single eWorkflow cannot be graphically animated in the Workflow Editor, and deleting is not permitted here, to prevent the user from inadvertently killing the whole PS_Workflow task.
The Abort instruction refers  only to the eWorkflow.

3.3 Result Evaluation

3.3.1 Results of PS workflows

After the successful run of the whole PS workflow task the results of the eWorkflows can be fetched from the defined subdirectory.

Example:

The results can be listed - using the definition of Figure_14.4 for the placement - by the following IUF machine command:
 
lfc-ls /grid/seegrid/hermann/EQU/OUTPUT

Ax_EQUAL_B_PS.1.zip
Ax_EQUAL_B_PS.2.zip
Ax_EQUAL_B_PS.3.zip
Ax_EQUAL_B_PS.4.zip
Ax_EQUAL_B_PS.5.zip
Ax_EQUAL_B_PS.6.zip
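
A listing like the one above can be compared with the expected number of eWorkflows. The following Python sketch assumes the naming pattern visible in the example (workflow name, one instance index starting at 1, extension zip); the helper names parse_result_name and missing_results are hypothetical, and a PS with several PS_Sets may produce a different index pattern.

```python
import re

# Pattern assumed from the example listing: <workflow>.<index>.zip
RESULT_NAME = re.compile(r"^(?P<workflow>.+)\.(?P<index>\d+)\.zip$")


def parse_result_name(name):
    """Split a result file name into (workflow name, instance index)."""
    match = RESULT_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected result file name: {name}")
    return match.group("workflow"), int(match.group("index"))


def missing_results(listing, total):
    """Return the instance indices 1..total that have no result file."""
    present = {parse_result_name(name)[1] for name in listing}
    return [i for i in range(1, total + 1) if i not in present]


# Usage: three of six expected result files have arrived so far.
listing = ["Ax_EQUAL_B_PS.1.zip", "Ax_EQUAL_B_PS.2.zip", "Ax_EQUAL_B_PS.4.zip"]
print(parse_result_name(listing[0]))
print(missing_results(listing, total=6))
```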


3.3.2 Results of  remote output files defined in PS workflows


Special consideration is needed if some of the output files of the original Workflow are remote files. In this case no special measure has to be taken by the user. However, the user must know that the system generates a unique output file for each eWorkflow. The prefix of the names of these files is the File defined in the respective port properties window, and the postfix is the name of the eWorkflow (Workflow name + instance number).

Example:
The job Multp_B of the workflow Ax_EQU_B_from_A_GEN_Collector (Figure_14.8) has one output port defining a remote file having the name
                         "lfn:/grid/seegrid/hermann/PS/EQU_AGEN_11_10/Multip_B_10/out".
After the execution of the whole Parameter Study task, where there are two PS parameters - as the generated items of Figure_14.13 show -,
the IUF machine terminal command

lfc-ls /grid/seegrid/hermann/PS/EQU_AGEN_11_10/Multip_B_10

will list the following files:

out.Ax_EQU_B_from_A_GEN_Collector.1
out.Ax_EQU_B_from_A_GEN_Collector.2


As you can see, out is the prefix, Ax_EQU_B_from_A_GEN_Collector is the name of the workflow, and 1 and 2 are the instance numbers denoting the respective instances of the runs.
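
The naming rule of the example can be written down as a one line function. This is a minimal Python sketch; the function name eworkflow_output_names is hypothetical, and the dot as separator is taken from the listing above.

```python
def eworkflow_output_names(port_file, workflow_name, instances):
    """Build the per-eWorkflow names of a remote output file:
    <File of the port>.<workflow name>.<instance number>."""
    return [f"{port_file}.{workflow_name}.{i}" for i in instances]


# Usage: the two instances of the example above.
names = eworkflow_output_names("out", "Ax_EQU_B_from_A_GEN_Collector", [1, 2])
print(names)
```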

4. Advanced Parameter Study : Generators and Collectors




Figure 14.8 Generator (Job 5) and Collector (Job Collector)

4.1 Overview

As a Parameter Study (PS) executes the same operation over a (usually large) set of inputs and produces a (usually large) set of outputs, the obvious question arises: “How can the inputs for a PS be generated and how can the outputs produced by a PS be evaluated?” These tasks can be done manually. However, for a wide class of problems the user can be supported in tackling them in an automated way. This is done by the introduction of two new types of jobs with the following features:

The Generator job generates a set of input files and puts them on a remote storage represented by a PS_Output_port. These files are used by the subsequent Parameter Study jobs of the workflow. Therefore, the Generator job must have a new kind of output port: PS Output port. This port type is discussed later in detail.

The Collector job processes a set of outputs produced by the preceding Parameter Study jobs in a single unit. Therefore Collector jobs start to run only after every Parameter Study job of the whole PS - i.e. each eWorkflow - has terminated.
A Collector job can be connected to a Parameter Study job by a special connection. The source of such a connection is a remote output port linked to the Parameter Study job; within a PS workflow any job is a Parameter Study job, with the exception of the Generator(s) and Collector(s). The destination of the connection is a special PS Input port called Collector PS Input Port, linked to the Collector job. Both ports represent the same directory on a remote Grid storage.

 

4.2 Overall Semantics

 

The overall execution of a PS Graph is divided into three subsequent steps:

 

  1. If there is any job of Generator type, it will be executed just before step (2) and the results will be stored as the proper PS Output Port describes. (A consequence is that Generator job(s) must be the root element(s) of the directed acyclic workflow graph, i.e. no other job can be defined as a predecessor of a Generator job.)
  2. The whole Parameter Study Task (represented by the PS jobs whose kind is neither Generator nor Collector) will be executed for each member of the Parameter Set.
  3. If there are Collector jobs then they will be executed once over the set(s) of output files represented by one or more remote directories. (A consequence is that Collector job(s) must be the leaves of the directed acyclic workflow graph, i.e. no other job can be defined as a successor of a Collector job).
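
The two placement rules that follow from these steps can be checked mechanically. The Python sketch below is an illustration only: the graph representation, the kind labels "GEN" and "PS" and the function names are assumptions (only the type name COLL appears in the Portal).

```python
def check_ps_graph(job_kinds, edges):
    """job_kinds maps a job name to "GEN", "PS" or "COLL";
    edges is a list of (predecessor, successor) pairs.
    Return the list of violated placement rules."""
    problems = []
    for pred, succ in edges:
        if job_kinds[succ] == "GEN":
            problems.append(f"Generator {succ} has the predecessor {pred}")
        if job_kinds[pred] == "COLL":
            problems.append(f"Collector {pred} has the successor {succ}")
    return problems


def execution_phases(job_kinds):
    """Group the jobs into the three subsequent execution steps."""
    return [sorted(job for job, kind in job_kinds.items() if kind == phase)
            for phase in ("GEN", "PS", "COLL")]


# Usage: a generator feeding one PS job that feeds a collector.
job_kinds = {"Gen": "GEN", "Multip_B": "PS", "Collector": "COLL"}
good = check_ps_graph(job_kinds, [("Gen", "Multip_B"), ("Multip_B", "Collector")])
bad = check_ps_graph(job_kinds, [("Multip_B", "Gen"), ("Collector", "Multip_B")])
print(good, bad, execution_phases(job_kinds))
```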

4.3 Generator Job detailed.

Note:
In this chapter two different applications are used for demonstration: Ax_EQU_B_PS_A_GEN (See Figure 14.8), the advanced version of Ax_EQUAL_B_PS (See Figure 14.2), and a simple one, A_GEN_EXAMPLE.


There are two kinds of generators:

  1. The general type is characterized by free semantics, i.e. the binary executable for the component is provided by the user - as in the case of “traditional” jobs - and it is the responsibility of the user who provided the executable to generate the set of outputs according to the conventions determined by the PS Output port.
  2. The output set is generated by generator codes that are part of the Portal. These generator components can be controlled by the user (via key and parameter values).
    See 4.3.1_Auto_Generator

In both cases one and only one PS Output port defines the name, storage and delivery conventions for the generated files.



Figure 14.9 PS Output port


  The PS Output port defines 3 properties:

 

  1. Directory:

    It is the subdirectory on the remote storage where the element files will be stored. The naming conventions are the same as for PS Input Ports, i.e. LFC Grid file catalog and Globus GridFTP URLs are allowed.

  2. Internal file name:

    The Internal file name is the prefix part of the names of the files to be generated by the job executable. The postfix part of the file names must be different for each file and must start with a dot (“.”) separator character.
    Example: If the Internal File Name is “OUTPUT” and the number of parameters is 2, then the names of the files generated by a Generator component will be { OUTPUT.1,  OUTPUT.2 }

  3. Managed copy:

    This option has significance only if the executable for the generator is given by the user (i.e. the Generator job is not an Auto Generator job). If the option is turned on, the system assumes that the files will be generated by the generator binary executable on the worker node where the generator runs. In this case the system takes over the responsibility of copying the generated files to the remote destination defined by the PS output port Directory. If the “managed copy” option is turned off, then the user’s executable is responsible for generating the Grid Files and copying them into the destination Directory on a remote storage. In this case the naming convention for the “Internal file name” can be overruled: the system does not handle the generated files directly, so it is not sensitive to the file names.


Let it be emphasized that the PS_Output Ports of Generators must be connected to PS_input Ports of PS jobs and the “Directory” values of the connected ports must be the same. (One port inherits this value from another port when the two ports are connected together.)

In the Workflow Editor an existing Job can be redefined as Generator (or Collector) if and only if there is at least one defined PS Input Port in the defined Workflow:




Figure 14.10 How to define a Generator (or Collector)


 4.3.1 Auto Generator

The Auto Generator (AG) is a special convenience job, tailored in such a way that a user can create and modify a whole set of parameter files using a built in macro processor.
An Auto Generator job can be defined from an existing - general - Generator Job:




Figure 14.11 How to define an Auto Generator Job from a common Generator

The AG has the following features distinguishing it from other Job wrappers:

 

  • It has no input port and just one predefined PS Output Port.
  • The defined file set creation is executed on the Portal Server machine and not on a remote resource.
  • Its semantics is predefined by the macro processor and can be controlled by user parameters determining the content and the number of the files to be created.

The use of the Auto Generator is suggested first of all for legacy applications, where complicated input files with eventual format restrictions may be required, and/or where the input data structures must be internally coherent.

4.3.1.1 Formal definition

The formal description of the generation process - assuming ASCII files - is the following:

 The base of the generation is a template called Input file text. The Input file text is an arbitrary sequence of final strings and keys.

Final strings will be copied in the result files without any changes.

Keys must be associated with non empty finite sets.

The elements of the finite sets are enumerated and substituted within the Input_file_text in place of the keys in turn, in such a way that a new file is created for each combination of substitutions. However, within a given output file every occurrence of a certain key is replaced by the same value.

Example-Part 1:

Let the Input_file_text be aXbYcX, where X={2,3} and Y={6,7} are the keys and a, b, c are the final_strings.

The four generated files will contain the following strings:

 “a2b6c2”, “a3b6c3”, “a2b7c2”, “a3b7c3”.
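
The substitution rule can be reproduced with a short Python sketch. It is not the macro processor of the Portal: the function name expand_template and the key delimiters "<" and ">" are assumptions chosen for the illustration, and the enumeration order of the combinations is an arbitrary choice of the sketch.

```python
from itertools import product


def expand_template(template, key_sets, left="<", right=">"):
    """Create one text per combination of key values. Every occurrence
    of a key within one text is replaced by the same value."""
    keys = list(key_sets)
    texts = []
    for combination in product(*(key_sets[key] for key in keys)):
        text = template
        for key, value in zip(keys, combination):
            text = text.replace(f"{left}{key}{right}", str(value))
        texts.append(text)
    return texts


# Usage: the template aXbYcX with X = {2, 3} and Y = {6, 7}.
generated = expand_template("a<X>b<Y>c<X>", {"X": [2, 3], "Y": [6, 7]})
print(generated)
```

The number of generated texts is the product of the sizes of the key sets, here 2 * 2 = 4.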

 

4.3.1.2 Representation of the macro

The macro representation consists of two logical parts: the Input file text (the template) and the definition of the finite sets associated with its keys.

The actual representation of the Input file text is an editable input text window. It can be found in the "Job properties" window of the Auto_Generator job:



Figure 14.12 Input file text definition (Application Ax_EQU_B_PS_A_GEN)


The template defined by the Input file text:

  • can be edited, or
  • can be uploaded from the local file system with the help of the file browser opened by the button Load from File...
The names of the keys are separated from the final_strings by the editable Left and Right Parametric key delimiters.
Hitting the button Parse, the Input file text is parsed and the keys found are listed under Keys:.

Double clicking on a member of the mentioned Keys list opens a new window where the elements of the set represented by the given key can be defined:




Figure 14.13 Definition of a finite set associated with a key (Application Ax_EQU_B_PS_A_GEN)


 
The set can be defined in the Value Definition pane by one of the following methods:

  • Enumerating the members (selecting the radio button Set).
  • Reading the members from a defined local input file, where the button Browse... opens the File browser assisting the search (selecting the radio button Set from local  File). In both mentioned cases (Set and Set from local  File) there is no type restriction on the values, and the values are delimited by the Separator value.
  • Range can be applied only to numbers, and the elements are obtained by the semantics of a classical DO cycle.
  • Random uses a built in random generator, where the Seed value, the size of the set (Cases) and the lower and upper bounds of the generated number values (From:, To:) should be defined.
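
The three kinds of value definition can be sketched in Python as follows. The function names, the default separator, the inclusive upper bound of the range and the uniform distribution of the random values are assumptions of the sketch; the Portal may behave differently in these details.

```python
import random


def values_from_set(text, separator=","):
    """'Set': split the typed text at the Separator value.
    The members stay strings, there is no type restriction."""
    return [item.strip() for item in text.split(separator) if item.strip()]


def values_from_range(start, stop, step=1):
    """'Range': numbers produced like a classical DO loop
    (both bounds inclusive, a positive step is assumed)."""
    if step <= 0:
        raise ValueError("step must be positive")
    values = []
    current = start
    while current <= stop:
        values.append(current)
        current += step
    return values


def values_from_random(seed, cases, low, high):
    """'Random': `cases` reproducible numbers between low and high."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(cases)]


print(values_from_set("2, 3"))
print(values_from_range(1, 7, 3))
print(len(values_from_random(seed=42, cases=5, low=0.0, high=1.0)))
```

Fixing the Seed makes the Random set reproducible, so repeating the generation with the same settings yields the same parameter files.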

The generation of the key values is performed upon hitting the button Generation, and the user can visually check the defined set in the table Generated items.

 The generated elements are represented as ASCII characters and - as inputs of eventual legacy Applications - can be made to comply with a more restrictive format:

The common length of the strings representing the values must be defined if the toggle Free format is not set. In this case the toggle Left aligned determines whether the eventual empty spaces fill the right or the left side of the string representing the given value.

For REAL numbers there are further format conversion possibilities, making them readable for programs expecting inputs with C, FORTRAN or Java format conventions.
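
The effect of the toggles Free format and Left aligned can be illustrated with a minimal Python sketch. The function name format_value and the decision to reject values longer than the common length are assumptions; the language specific conversions of REAL numbers are not modelled.

```python
def format_value(value, width=None, left_aligned=True):
    """Render a value as text. width=None models 'Free format';
    otherwise the text is padded with spaces to the common length."""
    text = str(value)
    if width is None:
        return text
    if len(text) > width:
        raise ValueError(f"{text!r} does not fit into {width} characters")
    # Left aligned: spaces fill the right side; otherwise the left side.
    return text.ljust(width) if left_aligned else text.rjust(width)


print(repr(format_value(42)))
print(repr(format_value(42, width=6)))
print(repr(format_value(42, width=6, left_aligned=False)))
```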


Example-Part 2:

Using the example of the previous chapter, the generation may be defined as Figure 14.14 shows, and the files to be created will have the form displayed in the table Generated example files:



Figure 14.14  Auto Generator (Application A_GEN_EXAMPLE)



As there is one PS_Output_port for the output generation, there will be just one set of files.

If the Grid is of an LCG brokerable type, then a Storage Element and the Environment variables must be associated with the Remote File defined in the PS Output port. They can be defined with the button Attributes Editor (See Figure_14.12).
The Attributes Editor opens a new window with two tabs, one for the definition of the Storage Element (See Figure 14.15) and one for the definition of the host of the Grid File Catalog (See Figure 14.16).

Please note that the definition of the Output SE is obligatory if an LCG_2 like Grid File Catalogue based directory is defined.




Figure 14.15 Storage definition for Auto Generator Outputs
(Application Ax_EQU_B_PS_A_GEN)



Figure 14.16 Host definition for Grid File Catalogue
(Application Ax_EQU_B_PS_A_GEN)

4.3.2  Common Generator

The properties of a common Generator are the same as those of a general job. The Common Generator may have any number of common input ports as well. The only restriction is that the job must have just one output port, and it must be a PS Output Port.

4.3.3   Result of the generation - by an example

The generation of the output files - as mentioned in Chapter 4.2 - is started as the first step following the submission of the PS Workflow.
See the pane Jobs in generator phase of Figure 14.17. This pane appears in the list PS workflow details only if at least one Generator job has been defined.




Figure 14.17  PS Workflow details in submission state after successful generation
(Application A_GEN_EXAMPLE)


Example-Part3:

Let us suppose that the Internal File Name is “OUTPUT”, as Figure 14.9 shows. Then the file generation of the previous example will look as the following table shows. The names of the generated files can be seen by hitting the button Out in the line of the job AgenEx (See Figure 14.17).


File names in the catalogue /grid/seegrid/hermann/PS/EQU_AGEN_/     Content
OUTPUT.0.0                                                           a2b6c2
OUTPUT.0.1                                                           a2b7c2
OUTPUT.1.0                                                           a3b6c3
OUTPUT.1.1                                                           a3b7c3

Table: Generated example files
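
The file names of the table follow from the sizes of the key sets. The Python sketch below assumes what the table suggests: one zero based index per key, appended to the Internal File Name in the order of the keys, with the last index changing fastest. The function name generated_file_names is hypothetical.

```python
from itertools import product


def generated_file_names(internal_file_name, set_sizes):
    """Return the names <Internal File Name>.<i>.<j>... for every
    combination of zero based indices into the key sets."""
    index_ranges = (range(size) for size in set_sizes)
    return [".".join([internal_file_name, *map(str, indices)])
            for indices in product(*index_ranges)]


# Usage: two keys (X and Y) with two values each.
file_names = generated_file_names("OUTPUT", [2, 2])
print(file_names)
```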

4.4 Collector Job detailed:




Figure 14.18 A job of collector (COLL) type


See Figure_14.8 for the details of the job named Collector and having the type name COLL.
This type can be selected by clicking with the right mouse button on a job icon in a WE window, as Figure_14.10 shows.

The semantics of a Collector_job is determined by the binary Job Executable provided by the user for this component, similarly to the Common Generator.

It is the task of the user defined executable - see the example of Figure_14.18, MatrixDemoWithCollector.exe - to enumerate, open, read, and evaluate each input file defined by the Collector PS Input port(s).


Important notice:

 If the user connects the input port of a Collector job to an output port of a PS_job in the Workflow_Editor, then the input port automatically changes to a Collector_PS_Input_Port. Its color is a dimmed light green, differing from the green of the common input ports.

Remember that the connected output port of the PS job must refer to a remote file!
As the section 3.3.2_Results_of_remote_output_files explains, in this case the output port to which the Collector PS Input Port is connected implicitly defines a grid file subdirectory where the results of the PS are gathered.


See the details of a Collector PS Input port in the next figure:




Figure 14.19 Collector PS Input port


If the toggle “managed copy” is set, then the P-GRADE Portal automatically copies the remote files into the working directory of the machine executing the Collector job. The generated names of these local files are structured as follows:
The common prefix of the names is defined by the Internal File Name, and the postfix of the names is inherited from the postfix part of the names of the remote files (workflow name + instance number, See 3.3.2_Results_of_remote_output_files).

Example

Using the settings of Figure_14.19 - and remembering 3.3.2_Results_of_remote_output_files - the user can expect the following input files, ready to be opened, in his/her local working directory:

INPUT1.Ax_EQU_B_from_A_GEN_Collector.1
INPUT1.Ax_EQU_B_from_A_GEN_Collector.2
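
The renaming performed by the managed copy can be sketched in Python. The function name collector_local_name is hypothetical; the sketch only swaps the prefix of the remote file name for the Internal File Name and keeps the postfix.

```python
def collector_local_name(remote_name, remote_prefix, internal_file_name):
    """Replace the prefix of a remote file name by the Internal File
    Name, keeping the postfix (workflow name + instance number)."""
    if not remote_name.startswith(remote_prefix + "."):
        raise ValueError(f"{remote_name} does not start with {remote_prefix}.")
    postfix = remote_name[len(remote_prefix):]  # still starts with the dot
    return internal_file_name + postfix


# Usage: the remote files of 3.3.2 seen by a Collector port named INPUT1.
remote_files = ["out.Ax_EQU_B_from_A_GEN_Collector.1",
                "out.Ax_EQU_B_from_A_GEN_Collector.2"]
local_files = [collector_local_name(name, "out", "INPUT1")
               for name in remote_files]
print(local_files)
```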

 

If the toggle managed copy is not set, it is the responsibility of the Job Executable of the Collector job to read the grid files.

 

4.5 Short case study

Let us suppose that the user defined a complex PS Workflow consisting of several Generators (in our case a user defined one and an Auto Generator) and several Collectors, as Figure 14.20 shows:




Figure 14.20 Complex PS Workflow with more Generators and Collectors

After the termination of the Generators the 3 main parts of the Workflow Manager can be observed in the snapshot of Figure 14.21, which renders the Detailed view of the PS_Workflow:
  • Jobs in generator phase lists the Generator jobs with their states.
  • eWorkflow list shows the submit pool with the generated element Workflows which are currently running and may be manipulated as long as they do not leave the pool (either by successful termination or by user Abort). This list is headed by the Statistics discussed earlier in Chapter 3.2.
  • The members of the Jobs in collector are inactive at the moment, as the number of eWFs in the states Finished + Error has not reached the value of Total.




Figure 14.21 Intermediate State: the  eWorkflows are running



Figure 14.22 shows the state when all of the Collectors have terminated:


Figure 14.22 Terminated PS Workflow


Note that the eWorkflow list is empty as all the eWorkflows have been elaborated.

5. PS Persistency

During a long PS-WF experiment - theoretically it may last for weeks or months - it may occur that the portal must be stopped and restarted by the Portal administrator. The PS has been designed in such a way that even in this case the most important user results are not lost:
Each output of the generated eWorkflows lands on a Storage Element, so the results of the terminated eWF-s are preserved and the user can resume the execution of the PS via the Submit button. Only the results of the eWF-s which were running at the moment of the shutdown are lost.
However, this restricted damage is repaired, as these eWF-s are resubmitted upon the mentioned resume operation.
In short, the P-GRADE Portal guarantees eWF level checkpointing in case of a Portal breakdown, instead of the finer grained job level one.
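
The eWF level checkpointing can be modelled with a small Python sketch. It is an interpretation, not the implementation of the Portal: the function name states_after_restart, the eWorkflow names and the assumption that an interrupted eWorkflow falls back to the state Init are all hypothetical.

```python
def states_after_restart(state_of):
    """Model of a Portal restart: eWorkflows that were running
    ("Submitted") lose their partial work and wait for submission
    again; every other eWorkflow keeps its state."""
    restarted = {}
    for name, state in state_of.items():
        restarted[name] = "Init" if state == "Submitted" else state
    return restarted


# Usage: one finished, one running and one waiting eWorkflow.
before = {"eWF.1": "Finished", "eWF.2": "Submitted", "eWF.3": "Init"}
after = states_after_restart(before)
print(after)
```

In this model a finished eWorkflow is never executed twice, while an interrupted one is repeated from its beginning, which is the difference between eWF level and job level checkpointing.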
 

XV References   


    [1] Mercury monitor:
        http://www.lpds.sztaki.hu/mercury/

    [2] P-GRADE:
        http://www.lpds.sztaki.hu/pgrade/

    [3] Job Description language How To. December 17th, 2001
        http://server11.infn.it/workload-grid/docs/DataGrid-01-TEN-0102-0_2-Document.pdf

    [4] EGEE User Guide
        https://emds.cern.ch/file/454439//LCG-2-UserGuide.html

    [5] Equation Solver application
        http://www.lpds.sztaki.hu/pgportal/v23/includes/Equation_Solver.html