Galaxy
Galaxy is an open-source, web-based platform for data-intensive biomedical research: a universal
GUI and workflow manager for a large number of tools.
Official website: https://galaxyproject.org/main/
There are approximately 120 public servers, dozens of private academic servers, and even several
commercial servers. Many of them are highly specialised or modified, since the Galaxy project is
completely open access.
Anyone can use one of the public servers, with or without an account, but Galaxy user accounts are
simple to create (email, password and go!). The advantages of an account are increased data quotas,
more available resources, parallel jobs, and functionality that persists across sessions (i.e.,
Histories), such as naming, saving, sharing, and publishing.

Usegalaxy.org
The main Galaxy server at http://usegalaxy.org
●combines many common tools with data sources;
●has been available since 2007 for anyone to analyze their data free of charge;
●provides substantial CPU and disk space, making it possible to analyze large datasets;
●supports thousands of users and hundreds of thousands of jobs per month (see Statistics);
●is sustained by TACC hardware, using an allocation generously provided by the CyVerse project.
Notation:  History, Dataset, Tool panel, Job, Workflow, Visualization, Library


CEITEC Galaxy server
Public Galaxy is quite versatile and flexible, but only up to a point.
You can always install your own Galaxy instance. Advantages:
●(simple) installation of additional tools, visualizations, etc.
●own computational resources and personalised data storage quotas
●easy to share datasets, workflows, etc.
●adjustable access security level (from no authentication to fully restricted access)
Our Galaxy server can be found here:
https://galaxy.ceitec.muni.cz/
To log in, please use your university UCO and secondary password.

Advantages and disadvantages of command line and Galaxy

Upload data


Upload data - sources


Upload data into History


History
Size of History
Total usage across the Histories
Reload, Advanced options, Multi-history panel
Search field (looks in names, tags, comments, metadata)
History name (can be changed at any time; naming your Histories is recommended)
Multi-dataset operations, History tags, History annotation
Dataset with ID and name

History - advanced options
Many useful features:
●Copy whole History
●Share or Publish History
●Extract Workflow
●Permanently delete History or Datasets
●Export citations for the Tools used

Dataset
View content of Dataset
Edit attributes (name, datatype, access)
Delete Dataset (not permanently)
Edit tags of Dataset (very useful)
HID - History ID of Dataset (increasing)
Name of Dataset (usually the name of the uploaded file or a description of the Job result; renaming
it is strongly recommended)
Information about length, format and associated genome database
Download Dataset
Show Details of Dataset or Job
Show available Visualizations
Show Help page
Peek into Dataset (first several lines)
Adjustable Info panel
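
The relationship between a History and its Datasets can be sketched as a toy data model. This is only an illustration of the behaviour described on these slides (increasing HIDs, non-permanent delete, permanent delete from the History options), not Galaxy's actual implementation; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    hid: int               # History ID, assigned in increasing order
    name: str              # best renamed to something meaningful
    deleted: bool = False  # "Delete Dataset" only sets this flag


@dataclass
class History:
    name: str
    datasets: List[Dataset] = field(default_factory=list)
    next_hid: int = 1

    def add(self, name: str) -> Dataset:
        """Add a Dataset and give it the next free HID."""
        dataset = Dataset(hid=self.next_hid, name=name)
        self.next_hid += 1
        self.datasets.append(dataset)
        return dataset

    def _get(self, hid: int) -> Dataset:
        for dataset in self.datasets:
            if dataset.hid == hid:
                return dataset
        raise KeyError(hid)

    def delete(self, hid: int) -> None:
        """Non-permanent delete: the Dataset is hidden but can be restored."""
        self._get(hid).deleted = True

    def undelete(self, hid: int) -> None:
        self._get(hid).deleted = False

    def purge(self) -> None:
        """Permanent delete: drop every Dataset that is marked as deleted."""
        self.datasets = [d for d in self.datasets if not d.deleted]

    def visible(self) -> List[Dataset]:
        return [d for d in self.datasets if not d.deleted]
```

In this sketch a deleted Dataset keeps its HID until it is purged, and HIDs are never reused afterwards.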

Dataset content


Dataset attributes


Dataset details
Each Dataset is the result of a Job run with some Tool from the Tool panel (useful information
about the performed Job is shown here)

Tool panel
Search field of Tool panel (looks into names, descriptions, metadata)
Tool panel with many (named) groups of tools.
Each group can be unrolled into a list of tools

Command-line Tool

Example of STAR alignment on the command line - very “nice” for non-bioinformaticians
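
To give a feeling for what such a call involves, here is a minimal sketch that assembles a typical paired-end STAR alignment command. It is not the command from the slide: the index directory, file names, output prefix, and thread count are hypothetical placeholders, and only a handful of common STAR options are used. The function name star_command is also just an illustration.

```python
import shlex


def star_command(genome_dir, reads_1, reads_2, out_prefix, threads=4):
    """Build (but do not run) a STAR alignment command as an argument list."""
    return [
        "STAR",
        "--runThreadN", str(threads),                 # number of threads
        "--genomeDir", genome_dir,                    # directory with the STAR index
        "--readFilesIn", reads_1, reads_2,            # paired-end read files
        "--readFilesCommand", "zcat",                 # reads are gzip-compressed
        "--outSAMtype", "BAM", "SortedByCoordinate",  # sorted BAM output
        "--outFileNamePrefix", out_prefix,            # prefix for all output files
    ]


# Hypothetical paths, for illustration only.
cmd = star_command("star_index/", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "results/sample_")
print(shlex.join(cmd))
```

In Galaxy, the same choices are made through form fields instead of flags.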

Galaxy - STAR alignment
Galaxy Tool

EXAMPLE of STAR alignment in Galaxy - very NICE for non-bioinformaticians

Command-line vs Galaxy
Galaxy
●is built on top of bioinformatics (and other) tools, giving non-bioinformaticians access to
them
●puts a graphical interface on top of the command line, making it much more user-friendly
●allows sharing of data, workflows, results, visualizations, etc.
●makes it easy to repeat Jobs and edit their settings
●is great for small-scale analyses
●does not support all types of tools (e.g., online tools)
The command line is more flexible.
For large amounts of data or unsupported tools, it is still better to use the command line.

Job - setup


Job - execution


Job - error
Rerun Job (not only a failed one)
Report failed Job (please, do!)
Indicator of failed Job
Part of the error message

Report Job error


Data Libraries

Copy the links that start with ftp:// and end with .fastq.gz.
If a link doesn’t start with ftp://, Galaxy might have a problem uploading it.
If you are sure the link you copied is an FTP link, you can simply add ftp:// in front of it, so
that it looks (most likely) like ftp://ftp.xxx.fastq.gz, where xxx is the actual path to the file
on the FTP server.
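
The rule above can be sketched as a small helper that checks a copied link and adds the missing ftp:// prefix. The function name normalize_ftp_link and the example host are hypothetical; rejecting links with another scheme (such as http://) is an extra assumption, made so that a non-FTP link is not silently rewritten.

```python
def normalize_ftp_link(link: str) -> str:
    """Return the link with an ftp:// prefix, or raise ValueError if it does not qualify."""
    link = link.strip()
    if not link.endswith(".fastq.gz"):
        raise ValueError(f"not a .fastq.gz link: {link!r}")
    if link.startswith("ftp://"):
        return link  # already in the form Galaxy expects
    if "://" in link:
        # Some other scheme (e.g. http://): do not pretend it is FTP.
        raise ValueError(f"not an FTP link: {link!r}")
    return "ftp://" + link


# Hypothetical link copied without its scheme.
print(normalize_ftp_link("ftp.example.org/reads/sample_1.fastq.gz"))
```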

Workflow
A workflow (or pipeline) automates a multi-step analysis.
It is a set (or tree) of tools in which each tool, except the very first, takes its input from the
output of another tool.
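
This structure can be sketched as a dependency graph in which every tool lists the tools whose output it consumes. The step names below are hypothetical and the sketch only derives a valid execution order; it says nothing about how Galaxy schedules Jobs internally.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each tool maps to the set of tools it takes input from.
workflow = {
    "trim_reads": set(),               # the very first tool reads the uploaded data
    "align_reads": {"trim_reads"},
    "quality_report": {"trim_reads"},
    "count_reads": {"align_reads"},
}


def execution_order(steps):
    """Return the tools in an order where every tool runs after its inputs exist."""
    return list(TopologicalSorter(steps).static_order())


print(execution_order(workflow))
```

Several orders can be valid for a branching workflow; the only guarantee is that no tool comes before the tools it depends on.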

Workflow - build
Tools can be dragged and dropped from the Tool panel and connected together only if the output and
input Dataset formats match
Parameters, Annotation, etc.
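
The connection rule can be sketched as a simple format check between two steps. The Step class, the tool names, and the format labels are hypothetical, and the check is a plain exact match; the workflow editor itself may treat related formats more flexibly.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Step:
    name: str
    input_formats: FrozenSet[str]  # Dataset formats the tool accepts
    output_format: str             # Dataset format the tool produces


def can_connect(upstream: Step, downstream: Step) -> bool:
    """Two steps can be connected only if the output format is an accepted input format."""
    return upstream.output_format in downstream.input_formats


# Hypothetical tools for illustration.
aligner = Step("aligner", frozenset({"fastq"}), "bam")
counter = Step("read counter", frozenset({"bam", "sam"}), "tabular")

print(can_connect(aligner, counter))   # bam is accepted by the counter
print(can_connect(counter, aligner))   # tabular is not accepted by the aligner
```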

Workflow - execution


Saved Histories
Special panel (next slide)

Multi-history panel


Multi-history panel - copy Datasets by drag and drop


Visualization
The available Visualizations depend entirely on the Dataset format