Galaxy

Galaxy is an open source, web-based platform for data-intensive biomedical research: a universal GUI and workflow manager for a large number of tools.

Official website: https://galaxyproject.org/main/

There are approximately 120 public servers, dozens of private academic servers and even several commercial servers. Many of them are highly specialised or modified, which is possible because Galaxy is completely open source.

Anyone can use one of the public servers, with or without an account, and Galaxy user accounts are simple to create (email, password and go!). Advantages of having an account: increased data quotas, more available resources, parallel jobs, and extended functionality across sessions (i.e., Histories), such as naming, saving, sharing, and publishing.

Usegalaxy.org

The main Galaxy server at http://usegalaxy.org
●combines many common tools with data sources;
●has been available since 2007 for anyone to analyze their data free of charge;
●provides substantial CPU and disk space, making it possible to analyze large datasets;
●supports thousands of users and hundreds of thousands of jobs per month (see Statistics);
●is sustained by TACC hardware, using an allocation generously provided by the CyVerse project.

Terminology: History, Dataset, Tool panel, Job, Workflow, Visualization, Library

CEITEC Galaxy server

The public Galaxy is quite versatile and flexible, but only up to a point. You can always install your own Galaxy instance. Advantages:
●(simple) installation of additional tools, visualizations, etc.
●your own computational resources and personalised data storage quotas
●easy sharing of datasets, workflows, etc.
●adjustable access security level (from no authentication to fully restricted access)

Our Galaxy server can be found here: https://galaxy.ceitec.muni.cz/
To log in, please use your university UCO and secondary password.

Advantages and disadvantages of command line and Galaxy

Upload data

Upload data - sources

Upload data into History

History
●Size of the History
●Total usage across all Histories
●Reload, Advanced options, Multi-history panel
●Search field (looks in names, tags, comments, metadata)
●History name (can be changed at any time; naming Histories is recommended)
●Multi-dataset operations, History tags, History annotation
●Dataset with ID and name

History - advanced options
Lots of useful features:
●Copy the whole History
●Share or Publish the History
●Extract a Workflow
●Permanently delete the History or Datasets
●Export citations for the Tools used

Dataset
●View content of Dataset
●Edit attributes (name, datatype, access)
●Delete Dataset (not permanently)
●Edit tags of Dataset (very useful)
●HID - History ID of Dataset (increasing)
●Name of Dataset (usually the name of the uploaded file or a description of the job result; renaming is strongly recommended)
●Information about length, format and associated genome database
●Download Dataset
●Show Details of Dataset or Job
●Show available Visualizations
●Show Help page
●Peek into Dataset (first several lines)
●Adjustable Info panel

Dataset content

Dataset attributes

Dataset details
Each Dataset is the result of a Job run with a Tool from the Tool panel; the details contain useful information about the performed Job.

Tool panel
Search field of the Tool panel (looks into names, descriptions, metadata)
Tool panel with many (named) groups of tools.
Each group can be expanded into a list of tools.

Command-line Tool
Example of STAR alignment on the command line - very “nice” for non-bioinformaticians

Galaxy - STAR alignment

Galaxy Tool
Example of STAR alignment in Galaxy - very NICE for non-bioinformaticians

Command-line vs Galaxy
Galaxy
●is built on top of bioinformatic (and other) tools, giving non-bioinformaticians access to them
●puts a graphical interface on top of the command line, making it much more user-friendly
●allows sharing of data, workflows, results, visualizations, etc.
●makes it easy to repeat a Job and edit its settings
●is great for small-scale analyses
●does not support all types of tools (e.g., online tools)
The command line is more flexible.
For large amounts of data or unsupported tools it is still better to use the command line.

Job - setup

Job - execution

Job - error
●Rerun Job (not only a failed one)
●Report failed Job (please do!)
●Indicator of failed Job
●Piece of the error message

Report Job error

Data Libraries
Copy the links which start with ftp:// and end with .fastq.gz
If a link doesn’t start with ftp://, Galaxy might have problems uploading it.
If you are sure the link you copied is an FTP link, you can simply add ftp:// in front of it, so that it (most likely) looks like ftp://ftp.xxx.fastq.gz, where xxx is the actual path to the file on the FTP server.

Workflow
A workflow (or pipeline) is the automation of a multi-step analysis: a set (or tree) of tools, each taking its input from the output of another tool (except the very first one).

Workflow - build
Tools can be dragged and dropped from the Tool panel; they can be connected only if the output and input Dataset formats match.
Parameters, Annotation, etc.

Workflow - execution

Saved Histories
Special panel (next slide)

Multi-history panel

Multi-history panel - drag&drop copy Datasets

Visualization
Depends entirely on Dataset format
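
Side note on the Data Libraries slides: the FTP link rule (links should start with ftp:// and end with .fastq.gz; a bare FTP link can have ftp:// added in front) can be sketched in a few lines of Python. This is only an illustration and not part of Galaxy. The function name normalize_ftp_link and the host ftp.example.org are hypothetical, and the refusal to touch links that already carry another scheme is an added assumption.

```python
def normalize_ftp_link(link: str) -> str:
    """Return a link that starts with ftp:// and ends with .fastq.gz.

    Hypothetical helper for illustration; it only applies the rule
    from the slides and does not contact any server.
    """
    link = link.strip()
    if not link.endswith(".fastq.gz"):
        raise ValueError(f"link does not end with .fastq.gz: {link!r}")
    if link.startswith("ftp://"):
        # Already in the form expected for upload.
        return link
    if "://" in link:
        # Assumption: a link with another scheme (e.g. http://) is not
        # silently rewritten; the user should check it by hand.
        raise ValueError(f"link uses a non-FTP scheme: {link!r}")
    # Bare FTP link: add the scheme in front, as the slides suggest.
    return "ftp://" + link


if __name__ == "__main__":
    # Hypothetical example link; replace with the link you copied.
    print(normalize_ftp_link("ftp.example.org/data/sample_1.fastq.gz"))
```

The resulting links can then be pasted into the Galaxy upload dialog, one per line.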