Job configuration for Galaxy 19.01 or higher

Simple configuration

The following is a simple job configuration sample that you can use to get started.

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
    </plugins>
    <destinations default="galaxycloudrunner">
        <destination id="local" runner="local"/>
        <destination id="galaxycloudrunner" runner="dynamic">
            <param id="type">python</param>
            <param id="function">cloudlaunch_pulsar_burst</param>
            <param id="rules_module">galaxycloudrunner.rules</param>
            <param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
            <!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
            <param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
            <!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
            <param id="pulsar_runner_id">pulsar</param>
            <!-- Destination to fallback to if no nodes are available -->
            <param id="fallback_destination_id">local</param>
            <!-- Pick next available server and resubmit if an unknown error occurs -->
            <resubmit condition="unknown_error and attempt &lt;= 3" destination="galaxycloudrunner" />
        </destination>
    </destinations>
    <tools>
        <tool id="upload1" destination="local"/>
    </tools>
</job_conf>

In this simple configuration, all jobs are routed to GalaxyCloudRunner by default. This works as follows:

  1. If a Pulsar node is available, the rule returns that node and the job is routed to it.
  2. If multiple Pulsar nodes are available, they are returned in round-robin order.
  3. You can add or remove Pulsar nodes at any time. However, to avoid repeatedly querying the server, the list of nodes is cached (currently for 5 minutes), so there is a short delay before GalaxyCloudRunner detects a change. This affects node addition and, in particular, removal: a newly added node may take a few minutes to be picked up, and after a node is removed, jobs may be routed to the dead node until the cache expires. We therefore recommend resubmitting failed jobs through the resubmit tag, as shown in the example. See Additional Configuration and Limitations for how to change this cache period.
  4. If no node is available, the rule returns fallback_destination_id, if specified, and the job is routed there. If no fallback_destination_id is specified, the job is re-queued until a node becomes available.
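The routing behaviour listed above can be sketched in plain Python to show how round-robin selection, the node cache and the fallback interact. This is an illustration under stated assumptions, not GalaxyCloudRunner's implementation: `CachedRoundRobin`, `route` and the `fetch_nodes` callable (standing in for the CloudLaunch API query) are hypothetical names, the node URLs are made up, and the 300-second TTL mirrors the 5-minute cache period.

```python
import time
from typing import Callable, List, Optional


class CachedRoundRobin:
    """Hypothetical sketch, not GalaxyCloudRunner's code: hand out Pulsar
    nodes in round-robin order, re-fetching the node list at most once per
    ``ttl`` seconds."""

    def __init__(self, fetch_nodes: Callable[[], List[str]], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_nodes = fetch_nodes  # stands in for the CloudLaunch API query
        self._ttl = ttl                  # 300 s mirrors the 5-minute cache period
        self._clock = clock
        self._nodes: List[str] = []
        self._fetched_at: Optional[float] = None
        self._index = 0

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._nodes = list(self._fetch_nodes())
            self._fetched_at = now

    def next_node(self) -> Optional[str]:
        self._refresh_if_stale()
        if not self._nodes:
            return None
        node = self._nodes[self._index % len(self._nodes)]
        self._index += 1
        return node


def route(picker: CachedRoundRobin, fallback_destination_id: Optional[str]) -> Optional[str]:
    """Return a node, else the fallback destination id; None means 're-queue'."""
    node = picker.next_node()
    return node if node is not None else fallback_destination_id


# Usage with a fake clock and hypothetical node URLs.
now = [0.0]
live_nodes = ["http://node-a:8913/", "http://node-b:8913/"]
picker = CachedRoundRobin(lambda: list(live_nodes), clock=lambda: now[0])
print([route(picker, "local") for _ in range(3)])  # node-a, node-b, node-a
live_nodes.clear()              # every node is removed...
print(route(picker, "local"))   # ...but the stale cache still returns node-b
now[0] = 301.0                  # once the cache expires, the change is seen
print(route(picker, "local"))   # falls back to "local"
```

The usage lines show the caveat from item 3: after every node disappears, one more job is still sent to a dead node until the cache expires, which is why a resubmit tag (or a shorter cache period) is useful.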

To burst or not to burst?

In the above example, all jobs are routed to the GalaxyCloudRunner by default. However, you will often want jobs to be routed to the remote cloud nodes only when the local queues have a backlog of jobs. To support this scenario, we recommend a configuration like the following.

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
        <plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
    </plugins>
    <destinations default="burst_if_queued">
        <destination id="local" runner="local"/>
        <destination id="drmaa" runner="drmaa"/>
        <destination id="burst_if_queued" runner="dynamic">
            <param id="type">burst</param>
            <param id="from_destination_ids">local,drmaa</param>
            <param id="to_destination_id">galaxycloudrunner</param>
            <param id="num_jobs">2</param>
            <param id="job_states">queued</param>
        </destination>
        <destination id="galaxycloudrunner" runner="dynamic">
            <param id="type">python</param>
            <param id="function">cloudlaunch_pulsar_burst</param>
            <param id="rules_module">galaxycloudrunner.rules</param>
            <param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
            <!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
            <param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
            <!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
            <param id="pulsar_runner_id">pulsar</param>
            <!-- Destination to fallback to if no nodes are available -->
            <param id="fallback_destination_id">local</param>
            <!-- Pick next available server and resubmit if an unknown error occurs -->
            <resubmit condition="unknown_error and attempt &lt;= 3" destination="galaxycloudrunner" />
        </destination>
    </destinations>
    <tools>
        <tool id="upload1" destination="local"/>
    </tools>
</job_conf>

Compared with the simple configuration, the default destination is now burst_if_queued, which uses the built-in burst rule to determine whether cloud bursting should occur. The rule counts how many jobs in the from_destination_ids destinations are in the given job_states (queued in this case) and, if that count is above num_jobs, routes the job to the to_destination_id destination (galaxycloudrunner in this case). If bursting should not occur, the job is routed to the first destination in the from_destination_ids list. This provides a simple way to scale out to Pulsar nodes only when a desired queue has a backlog of jobs. You may need to experiment with these values to find ones that suit your requirements.
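The decision made by the burst_if_queued destination can be sketched as follows. This is an illustration, not Galaxy's built-in rule (which queries Galaxy's own job database): `job_counts` is a hypothetical in-memory snapshot, `should_burst` and `route_if_queued` are made-up names, and the strict "above num_jobs" comparison follows the description above, so check the Galaxy source if the exact boundary matters to you.

```python
from typing import Dict, List, Sequence

# Hypothetical snapshot: destination id -> job state -> number of jobs.
JobCounts = Dict[str, Dict[str, int]]


def should_burst(job_counts: JobCounts, from_destination_ids: Sequence[str],
                 num_jobs: int, job_states: Sequence[str]) -> bool:
    """Sum the jobs in ``job_states`` across the source destinations and
    burst when that sum is above ``num_jobs``."""
    total = sum(job_counts.get(dest, {}).get(state, 0)
                for dest in from_destination_ids for state in job_states)
    return total > num_jobs


def route_if_queued(job_counts: JobCounts, from_destination_ids: str,
                    to_destination_id: str, num_jobs: int,
                    job_states: str = "queued") -> str:
    """Mimic the burst_if_queued destination; the id and state lists are
    comma-separated strings, as in the XML params."""
    sources: List[str] = [d.strip() for d in from_destination_ids.split(",")]
    states = [s.strip() for s in job_states.split(",")]
    if should_burst(job_counts, sources, int(num_jobs), states):
        return to_destination_id
    return sources[0]  # no bursting: use the first source destination


counts = {"local": {"queued": 1, "running": 4}, "drmaa": {"queued": 1}}
print(route_if_queued(counts, "local,drmaa", "galaxycloudrunner", 2))  # local: 2 queued
counts["drmaa"]["queued"] = 2
print(route_if_queued(counts, "local,drmaa", "galaxycloudrunner", 2))  # galaxycloudrunner: 3 queued
```

Running jobs are ignored here because job_states is queued, which matches the example configuration; passing "queued,running" would count both.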

Advanced bursting

In this final example, we show how a complex chain of rules can be used to exert fine-grained control over the job routing process.

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
        <plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
    </plugins>
    <destinations default="burst_if_queued">
        <destination id="local" runner="local"/>
        <destination id="drmaa" runner="drmaa"/>
        <destination id="burst_if_queued" runner="dynamic">
            <param id="type">burst</param>
            <param id="from_destination_ids">local,drmaa</param>
            <param id="to_destination_id">burst_if_size</param>
            <param id="num_jobs">2</param>
            <param id="job_states">queued</param>
        </destination>
        <destination id="burst_if_size" runner="dynamic">
            <param id="type">python</param>
            <param id="function">to_destination_if_size</param>
            <param id="rules_module">galaxycloudrunner.rules</param>
            <param id="max_size">1g</param>
            <param id="to_destination_id">galaxycloudrunner</param>
            <param id="fallback_destination_id">local</param>
        </destination>
        <destination id="galaxycloudrunner" runner="dynamic">
            <param id="type">python</param>
            <param id="function">cloudlaunch_pulsar_burst</param>
            <param id="rules_module">galaxycloudrunner.rules</param>
            <param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
            <!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
            <param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
            <!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
            <param id="pulsar_runner_id">pulsar</param>
            <!-- Destination to fallback to if no nodes are available -->
            <param id="fallback_destination_id">local</param>
            <!-- Pick next available server and resubmit if an unknown error occurs -->
            <resubmit condition="unknown_error and attempt &lt;= 3" destination="galaxycloudrunner" />
        </destination>
    </destinations>
    <tools>
        <tool id="upload1" destination="local"/>
    </tools>
</job_conf>

Jobs are first routed to the burst_if_queued destination, which uses the built-in burst rule to determine whether bursting should occur. If it should, the job is then routed to the burst_if_size destination, which checks the total size of the job's input files. If the total is less than max_size (1 GB here), the job is routed to the galaxycloudrunner destination. Otherwise, it is routed to fallback_destination_id, which in this case is the local destination.
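The size check performed by the burst_if_size destination can be sketched as below. The names `parse_size` and `route_by_input_size` are hypothetical, the real to_destination_if_size rule receives Galaxy job objects rather than a list of byte counts, and reading "1g" as 1024³ bytes (binary units) is an assumption of this sketch.

```python
import re
from typing import Iterable

# Binary multiples are an assumption; the real rule may parse sizes differently.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_size(value: str) -> int:
    """Parse strings such as '1g', '500m' or '1.5gb' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmgt]?)b?\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def route_by_input_size(input_sizes: Iterable[int], max_size: str,
                        to_destination_id: str, fallback_destination_id: str) -> str:
    """Send the job to ``to_destination_id`` only if its inputs total less
    than ``max_size``; otherwise use the fallback destination."""
    if sum(input_sizes) < parse_size(max_size):
        return to_destination_id
    return fallback_destination_id


mib = 1024 ** 2
print(route_by_input_size([200 * mib, 300 * mib], "1g", "galaxycloudrunner", "local"))  # galaxycloudrunner
print(route_by_input_size([2 * 1024 ** 3], "1g", "galaxycloudrunner", "local"))         # local
```

Keeping large jobs local avoids staging big input files to a remote Pulsar node, which is the motivation for chaining this check after the burst rule.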

Job configuration for Galaxy versions lower than 19.01

Simple configuration

The following is a simple job configuration sample that you can use to get started.

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
    </plugins>
    <destinations default="galaxycloudrunner">
        <destination id="local" runner="local"/>
        <destination id="galaxycloudrunner" runner="dynamic">
            <param id="type">python</param>
            <param id="function">cloudlaunch_pulsar_burst_compat</param>
            <param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
            <!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
            <param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
            <!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
            <param id="pulsar_runner_id">pulsar</param>
            <!-- Destination to fallback to if no nodes are available -->
            <param id="pulsar_fallback_destination_id">local</param>
        </destination>
    </destinations>
    <tools>
        <tool id="upload1" destination="local"/>
    </tools>
</job_conf>

In this simple configuration, all jobs are routed to GalaxyCloudRunner by default. This works as follows:

  1. If a Pulsar node is available, the rule returns that node and the job is routed to it.
  2. If multiple Pulsar nodes are available, they are returned in round-robin order.
  3. You can add or remove Pulsar nodes at any time. However, to avoid repeatedly querying the server, the list of nodes is cached (currently for 5 minutes), so there is a short delay before GalaxyCloudRunner detects a change. This affects node addition and, in particular, removal: a newly added node may take a few minutes to be picked up, and after a node is removed, jobs may be routed to the dead node until the cache expires. We would normally recommend resubmitting failed jobs through a resubmit tag, but Galaxy versions prior to 19.01 do not support resubmission for Pulsar, so you may need to set the cache period to zero to handle this scenario. See Additional Configuration and Limitations for how to change this cache period.
  4. If no node is available, the rule returns pulsar_fallback_destination_id, if specified, and the job is routed there. If pulsar_fallback_destination_id is not specified, the job is re-queued until a node becomes available.

Note that you must add the rule to Galaxy manually, as described in Configuring Galaxy versions lower than 19.01.

To burst or not to burst?

In the above example, all jobs are routed to the GalaxyCloudRunner by default. However, you will often want jobs to be routed to the remote cloud nodes only when the local queues have a backlog of jobs. To support this scenario, we recommend a configuration like the following.

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
        <plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
    </plugins>
    <destinations default="galaxycloudrunner">
        <destination id="local" runner="local"/>
        <destination id="drmaa" runner="drmaa"/>
        <destination id="galaxycloudrunner" runner="dynamic">
            <param id="type">python</param>
            <param id="function">cloudlaunch_pulsar_burst_compat</param>
            <param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
            <!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
            <param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
            <!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
            <param id="pulsar_runner_id">pulsar</param>
            <!-- Destination to fallback to if no nodes are available -->
            <param id="pulsar_fallback_destination_id">local</param>
            <param id="burst_enabled">true</param>
            <param id="burst_from_destination_ids">local,drmaa</param>
            <param id="burst_num_jobs">2</param>
            <param id="burst_job_states">queued</param>
        </destination>
    </destinations>
    <tools>
        <tool id="upload1" destination="local"/>
    </tools>
</job_conf>

Galaxy versions prior to 19.01 do not support chaining dynamic rules, so we provide a single monolithic rule that can handle both scenarios.

Note the burst_enabled flag, which activates the bursting rule. This rule determines whether cloud bursting should occur: it counts how many jobs in the burst_from_destination_ids destinations are in the given burst_job_states (queued in this case), and bursts to Pulsar only if that count is above burst_num_jobs. If bursting should not occur, the job is routed to the first destination in the burst_from_destination_ids list. This provides a simple way to scale out to Pulsar nodes only when a desired queue has a backlog of jobs. You may need to experiment with these values to find ones that suit your requirements.

Advanced bursting

In this final example, we expand this compound rule to also filter jobs by size.

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner"/>
        <plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
    </plugins>
    <destinations default="galaxycloudrunner">
        <destination id="local" runner="local"/>
        <destination id="drmaa" runner="drmaa"/>
        <destination id="galaxycloudrunner" runner="dynamic">
            <param id="type">python</param>
            <param id="function">cloudlaunch_pulsar_burst_compat</param>
            <param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
            <!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
            <param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
            <!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
            <param id="pulsar_runner_id">pulsar</param>
            <!-- Destination to fallback to if no nodes are available -->
            <param id="pulsar_fallback_destination_id">local</param>
            <param id="burst_enabled">true</param>
            <param id="burst_from_destination_ids">local,drmaa</param>
            <param id="burst_num_jobs">2</param>
            <param id="burst_job_states">queued</param>
            <param id="dest_if_size_enabled">true</param>
            <param id="dest_if_size_max_size">1g</param>
            <param id="dest_if_size_fallback_destination_id">local</param>
        </destination>
    </destinations>
    <tools>
        <tool id="upload1" destination="local"/>
    </tools>
</job_conf>

Set the dest_if_size_enabled flag to true to filter by size. The job is then routed to Pulsar only if the total size of its input files is less than dest_if_size_max_size (1 GB here). Otherwise, it is routed to dest_if_size_fallback_destination_id, which in this case is the local destination.
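Because the pre-19.01 rule folds all three decisions into one function, its overall flow can be sketched as a single function. The evaluation order (burst check, then size filter, then Pulsar node selection) is an assumption carried over from the chained 19.01 example; `compat_route` and its arguments are hypothetical, with names borrowed from the XML params, and "1g" is again read as 1024³ bytes.

```python
from typing import Optional

GIB = 1024 ** 3  # "1g" read as a binary gigabyte (assumption)


def compat_route(queued_jobs: int, input_bytes: int, pulsar_node: Optional[str], *,
                 burst_enabled: bool = True,
                 burst_from_destination_ids: str = "local,drmaa",
                 burst_num_jobs: int = 2,
                 dest_if_size_enabled: bool = True,
                 dest_if_size_max_bytes: int = GIB,
                 dest_if_size_fallback_destination_id: str = "local",
                 pulsar_fallback_destination_id: Optional[str] = "local") -> Optional[str]:
    """Hypothetical single-function version of the chained 19.01 rules.

    Defaults mirror the XML example. Returns a destination id, a Pulsar
    node, or None (re-queue until a node becomes available).
    """
    # 1. Burst check: without a large enough backlog, stay on the first source.
    if burst_enabled and queued_jobs <= burst_num_jobs:
        return burst_from_destination_ids.split(",")[0].strip()
    # 2. Size filter: inputs at or above the limit go to the size fallback.
    if dest_if_size_enabled and input_bytes >= dest_if_size_max_bytes:
        return dest_if_size_fallback_destination_id
    # 3. Pulsar selection: use an available node, else the Pulsar fallback.
    if pulsar_node is not None:
        return pulsar_node
    return pulsar_fallback_destination_id


node = "http://node-a:8913/"  # hypothetical Pulsar node
print(compat_route(1, 10, node))              # local (no backlog)
print(compat_route(5, 10, node))              # http://node-a:8913/
print(compat_route(5, 2 * GIB, node))         # local (inputs too large)
print(compat_route(5, 10, None, pulsar_fallback_destination_id=None))  # None (re-queue)
```

Turning either flag off simply skips that step, which is how the same monolithic rule covers the simple, bursting and size-filtered configurations.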