C2110 UNIX and programming
Lesson 12 / Module 1

PS / 2020, distance form of teaching: Rev1

Petr Kulhánek
kulhanek@chemi.muni.cz
National Center for Biomolecular Research, Faculty of Science
Masaryk University, Kamenice 5, CZ-62500 Brno

AWK

http://www.gnu.org/software/gawk/gawk.html

AWK is a scripting language designed for processing text data, whether in the form of text files or streams. The language uses string data types, associative arrays (arrays indexed by string keys), and regular expressions.
(adapted from www.wikipedia.org)

Contents

> AWK
• Conditions, logical operations
• Run control (next, exit)
• Loops
• Arrays

Conditions

    if( logical_expression ) {
        command2;
        ...
    } else {
        command3;
    }

If logical_expression is true, command2 is executed. Otherwise, command3 is executed.

Example:

    if( $1 > max ) {
        max = $1;
    }

Differences from BASH:

    if command1; then
        command2
    else
        command3
    fi

Logical operators

Operators:

    ==   equal
    !=   not equal
    <    less than
    <=   less than or equal
    >    greater than
    >=   greater than or equal
    !    negation
    &&   logical and
    ||   logical or

Examples:

    j > 5
    (j > 5) && (j < 10)
    (j <= 5) || (j >= 10)

Exercise 1

1. Write a script that prints the largest and the smallest value from the third column of the matice.txt file.
2. Write a script that prints the lines of the rst.out file that contain nine words.
3. Write a script that calculates the average value of the numbers listed in the second column of the matice.txt file.

The data is in the directory: /home/kulhanek/Documents/C2110/Lesson12

Exercise 7

1. Write a script that calculates the geometric center of a molecule stored in the xyz format. The resulting coordinates will be printed to the terminal.
The file name is entered by the user as the first argument of the script. Handle the cases where the wrong number of arguments is given or the specified file does not exist. The input file is in the directory geom.

Help:
• The xyz file format contains the number of atoms on the first line, an arbitrary comment on the second line, and on each following line the element of an atom and its x, y and z coordinates.
• You can use any combination of the commands cat, wc, head and tail to discard the first two lines. Alternatively, consult the manual page of the tail command or apply conditions in awk.

Run control - next

The keyword next terminates processing of the current record. Processing continues with the next record.

Run control - exit

The keyword exit stops processing of the current record and of all remaining records and files. An END block, if present, is still executed.

Exercise 2

1. Extract the temperature profile from the file rst.out and calculate its average value. Compare the calculated value with the average value given in the file rst.out. Why do the values differ?

The data is in the directory: /home/kulhanek/Documents/C2110/Lesson12

Loops

    for(initialization; condition; change) {
        command1;
    }

Example:

    for(I=1; I <= 10; I++) {
        sum = sum + $I;
    }

Differences from BASH:

    for((I=1; I <= 10; I++)); do
        command1
    done

Exercise 3

1. Write a script that sums the values of all the numbers listed in the matice.txt file.
2. Write a script that prints the number of words that the file rst.out contains. Verify the result with the command wc.

The data is in the directory: /home/kulhanek/Documents/C2110/Lesson12

Arrays

AWK uses associative arrays. An array has a name, and its elements are accessed using a key. The key can have any value and type. The key can be the value of a variable.
Value assignment:

    my_array[key] = value;

Obtaining a value:

    value = my_array[key];

It is not recommended to use real numbers as keys!

Variable: a variable contains only one value.

    A = 5;
    print A;

Associative array: an array may contain more values, but only one for each key.

    AR[9] = 5;
    AR["a"] = 10;
    print AR[9], AR["a"];

Arrays - Examples

Examples:

    i = 5;
    my_array[i] = 15;
    print my_array[i];

    a = "word";
    my_array[a] = "value";
    print my_array["word"], my_array[5];

Practical use (the script prints the number of values in column 1 and then the values themselves):

    BEGIN {
        count = 0;
    }
    {
        data[count++] = $1;
    }
    END {
        print count;
        for(i=0; i < count; i++) {
            print data[i];
        }
    }

Exercise 4

2. The structure1.dat file contains the name of the element and the position of the atom on each line. Write a script that converts the file to the xyz format and saves it as structure1.xyz. View the converted structure in VMD. Verify the generality of the solution by converting the structure2.dat file.

Input (structure1.dat):

    C -1.8164140 3.6071310 0.6117350
    C -1.8002910 2.2769110 0.4584060
    C -0.6436270 4.3094580 -0.0124580

Output (structure1.xyz); the first line holds the number of atoms and the second line any comment:

    20
    molecule
    C -1.8164140 3.6071310 0.6117350
    C -1.8002910 2.2769110 0.4584060
    C -0.6436270 4.3094580 -0.0124580

Molecule display:

    $ module add vmd
    $ vmd structure1.xyz

The data is in the directory: /home/kulhanek/Documents/C2110/Lesson12

Self-study

Arrays, ...

Browsing the list of keys:

    for(variable in array) {
        print array[variable];
        ...
    }

The loop body is executed for each key that was used to store a value in array. The key is stored in variable.

ATTENTION: the order of the keys is not specified and thus may not correspond to the order in which the elements were inserted into the array.

Deleting the record stored under a key:

    delete array[key];
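
The key traversal and deletion described above can be sketched as a small shell function wrapping an AWK program. This is only an illustration under stated assumptions: the function name count_elements, the sample input, and the decision to delete the key "H" are made up for this example and are not part of the course material. The output is piped through sort because for(key in array) itself guarantees no particular order.

```shell
# Hypothetical helper (not from the course material): counts how often each
# value occurs in column 1 of the input, then removes the key "H".
count_elements() {
    awk '
    # skip empty lines and continue with the next record
    NF == 0 { next; }
    {
        # the element symbol in column 1 is used as the array key
        count[$1]++;
    }
    END {
        # delete the record stored under the key "H"
        delete count["H"];
        # visit every remaining key; the order is unspecified
        for (key in count) {
            print key, count[key];
        }
    }' | sort   # sorted only to make the output order predictable
}

# Made-up sample data in the layout "element x y z"
printf 'C 0.0 0.0 0.0\nH 1.0 0.0 0.0\nO 0.0 1.0 0.0\nC 0.0 0.0 1.0\n' | count_elements
```

For the sample data the function prints "C 2" and "O 1" on separate lines. The hydrogen entry is missing because its key was deleted before the loop ran.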