C2110 UNIX and Programming
11th lesson
awk

Petr Kulhánek
kulhanek@chemi.muni.cz
National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 5, CZ-62500 Brno

Test 11

Test through ROPOT in IS (ROPOT = Revision, Opinion Poll and Testing)
Student - ROPOT - C2110 - Test 2c

Time limit: 20 minutes.
Only one set of questions can be built.
Save your answers continuously.
Evaluation can be done only once.

It is allowed and recommended:
• to test commands in the terminal,
• to search in the manual pages, in your notes and in the presentations of the course,
• when in doubt, to ask the teacher.

It is not allowed:
• to communicate with another person except the teacher.

Contents

AWK
• What is AWK?
• Script structure, process of execution
• Block structure
• Variables, operations with variables
• Conditions
• Loops

AWK

http://www.gnu.org/software/gawk/gawk.html

AWK is a scripting language designed for processing text data, whether in the form of text files or streams. The language makes use of the string data type, associative arrays (arrays indexed by string keys) and regular expressions.
(adapted from www.wikipedia.org)

Text file analysis

    54.7332 295.7275 128.4090 -508.1302 -155.6037 0.0000
    51.3204 292.3619 176.5980 -494.7423 -164.7991 0.1822
    40.6154 273.9238 164.5827 -488.9232 -163.0629 0.3793
    52.5044 281.5944 153.4570 -484.6533 -168.5328 0.3528
    62.5486 294.2701 155.3607 -483.6872 -169.1747 0.0033

    Potential function:
         ntf     =       2, ntb     =       0, igb     =       5, nsnb    =      25
         ipol    =       0, gbsa    =       0, iesp    =       0
         dielc   =   1.00000, cut     = 999.00000, intdiel =   1.00000

In both parts of the file, each line is a record and each word on the line is a field of the record.

Process of script execution

    BEGIN { ... }        (1)
    { ... }              (2)
    /PATTERN/ { ... }    (3)
    END { ... }          (4)

• The BEGIN block (1) is executed (if it is part of the script) before the file is analyzed.
• A record is read from the file. By default, a record is one line of the analyzed file or stream. The record is split into fields. By default, the fields are the individual words of the record.
• Block (2) is executed for the given record.
• If the record matches the pattern, block (3) is executed.
• ... potential execution of other blocks ...
• The END block (4) is executed (if it is included in the script) after the whole file has been analyzed.

Structure of AWK script

    BEGIN { }        executed before processing the file (optional section)
    { }              executed for each record
    /PATTERN/ { }    executed for each record matching the PATTERN
    ...              any number of such sections
    END { }          executed after processing the file (optional section)

A block is enclosed in curly braces {}. All of the program blocks shown are optional. By default, a record is one line.

Example

input.txt

    54.7332 295.7275 128.4090 -508.1302 -155.6037 0.0000
    51.3204 292.3619 176.5980 -494.7423 -164.7991 0.1822
    40.6154 273.9238 164.5827 -488.9232 -163.0629 0.3793
    52.5044 281.5944 153.4570 -484.6533 -168.5328 0.3528
    62.5486 294.2701 155.3607 -483.6872 -169.1747 0.0033

script.awk (one simple block)

    {
        print $2;
    }

Launch:

    $ awk -f script.awk input.txt

or

    $ awk '{ print $2; }' input.txt

Output:

    295.7275
    292.3619
    273.9238
    281.5944
    294.2701

Block structure

    {
        # This block counts subtotal and analyzes
        # value of the fourth column
        # this is a comment
        i = i + 1;
        f = f + $2;    # here the subtotal is accumulated
        printf("Subtotal is %10.3f\n",f);
        if( $3 == 5 ) {
            k = k + $4;
        }
    }

• A comment starts with the character #.
• Commands are placed on separate lines. It is advisable to end each line with a semicolon, even though awk does not require it.
• A semicolon must be used when two or more commands are placed on one line.

Variables

Assignment to a variable:

    B = "this is text";
    C = 10.4567;
    D = A + C;

Value of a variable:

    print A + C;
    print B;

Differences from BASH:

    A=5        the assignment must not contain spaces
    echo $A    the value of a variable is obtained using $

Special variables:

    NF              number of fields in the current record
    NR              index number of the current record
    FS              field separator; by default a space or a tab
    RS              record separator; by default the newline character \n
    $0              the whole record
    $1, $2, $3 ...  individual fields of the record

The character $ gives the program access to the individual fields of the record.

Example:

    i=3;
    print $i;    # prints the value of the third column

Launching of AWK scripts

Processing of a text file: the result is printed on the screen.

Indirect launch:

    $ awk -f script.awk input.txt

Here awk is the interpreter of the script language, script.awk is the awk script and input.txt is the analyzed text file.

The analyzed data can also be sent through the standard input:

    $ awk -f script.awk < input.txt
    $ cat input.txt | awk -f script.awk
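The execution order of the blocks and the indirect launch can be tried out with the following minimal sketch. It recreates the five sample records from the Text file analysis slide as input.txt and writes an illustrative script.awk containing all four kinds of blocks. The pattern /^5/ (records beginning with the digit 5), the printed labels, the variable names and the use of a temporary working directory are assumptions made only for this illustration; they do not come from the course materials.

```shell
# Work in a scratch directory so that no existing files are overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data: five records with six fields each.
cat > input.txt <<'EOF'
54.7332 295.7275 128.4090 -508.1302 -155.6037 0.0000
51.3204 292.3619 176.5980 -494.7423 -164.7991 0.1822
40.6154 273.9238 164.5827 -488.9232 -163.0629 0.3793
52.5044 281.5944 153.4570 -484.6533 -168.5328 0.3528
62.5486 294.2701 155.3607 -483.6872 -169.1747 0.0033
EOF

# Illustrative script with all four kinds of blocks.
cat > script.awk <<'EOF'
BEGIN { print "start"; }          # (1) executed before the first record
{ n = n + 1; }                    # (2) executed for every record
/^5/ { print NR, $2; }            # (3) executed only for records starting with 5
END { print "records:", n; }      # (4) executed after the last record
EOF

# Indirect launch with the analyzed file given as an argument ...
out_file=$(awk -f script.awk input.txt)
# ... and with the data sent through the standard input.
out_stdin=$(awk -f script.awk < input.txt)

echo "$out_file"
```

Both launches print the same five lines: start, then 1 295.7275, 2 292.3619 and 4 281.5944 (the record number and second field of the three records that match the pattern), and finally records: 5.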
Direct launch:

    $ ./script.awk input.txt
    $ ./script.awk < input.txt
    $ cat file.txt | ./script.awk

script.awk

    #!/usr/bin/awk -f
    {
        i += NF;
    }
    END {
        print "Number of words:", i;
    }

The file script.awk must have the x (executable) flag set and must specify the interpreter (the first line of the script).

Exercise

1. In your home directory, create the directory awk-data.
2. Copy the files matice.txt, produkt.log and rst.out to the directory awk-data from the directory /home/kulhanek/Documents/C2110/Lesson11.
3. Write a script which will print the second column from the file matice.txt.
4. Write a script which will print the second and fourth columns from the file matice.txt.

(matice = matrix, produkt = product (chemistry))

Mathematical operations

If a variable can be interpreted as a number, the following arithmetic operators can be used:

    ++    value of the variable is increased by one    A++;
    --    value of the variable is decreased by one    A--;
    +     sums up two values                           A = 5 + 6;   A = A + 1;
    +=    adds a value to the variable                 A += 3;      A += B;
    -     subtracts two values                         A = 5 - 6;   A = A - 1;
    -=    subtracts a value from the variable          A -= 3;      A -= B;
    *     multiplies two values                        A = 5 * 6;   A = A * 1;
    *=    multiplies the variable by a value           A *= 3;      A *= B;
    /     divides two values                           A = 5 / 6;   A = A / 1;
    /=    divides the variable by a value              A /= 3;      A /= B;

Command print

The print command serves for non-formatted output of strings and numbers.

Syntax:

    print value1[,] value2[,] ...;

Examples:

    i = 5;
    k = 10.456;
    j = "value of variable i =";
    print j, i;
    print "value of variable k =", k;

If the values are separated by a comma, they are separated by a space in the output.

Exercise

1. Write a script which will sum the numbers in the second column of the file matice.txt.
2. Write a script which will print the number of lines of the file matice.txt. Verify the result using the wc command.
3. Write a script which will print the number of words in the file matice.txt.
   Verify the result using the wc command.
4. Write a script that will calculate the average value of the numbers in the second column of the file matice.txt.

Function printf

The printf command serves for formatted output of strings and numbers.

Syntax:

    printf("format", value1, value2, ...);

Example of a format:

    "Number %5d has value %03d"

• %5d: value1 is inserted at this place in the given format
• %03d: value2 is inserted at this place in the given format

Comparison with BASH:

    printf [format] [value1] [value2] ...

In BASH, printf is a command and its arguments are separated by spaces.

Conditions

    if( logical_expression ) {
        command2;
        ...
    } else {
        command3;
    }

If logical_expression is true, command2 is executed. Otherwise, command3 is executed.

Example:

    if( $1 > max ) {
        max = $1;
    }

Comparison with BASH:

    if command1; then
        command2
    else
        command3
    fi

Logical operators

Operators:

    ==    equal to
    !=    not equal to
    <     lower than
    <=    lower than or equal to
    >     greater than
    >=    greater than or equal to
    !     negation
    &&    logical and
    ||    logical or

Examples:

    j > 5
    (j > 5) && (j < 10)
    (j <= 5) || (j >= 10)

Loops

    for(initialization; condition; change) {
        command1;
        ...
    }

Example:

    for(I=1;I <= 10;I++){
        sum = sum + $I;
    }

Comparison with BASH:

    for((initialization;condition;change)); do
        command1
        ...
    done

Exercise

1. Write a script which will print the largest and smallest values of the third column in the file matice.txt.
2. Write a script which will print the lines from the file rst.out that contain nine words.
3. Write a script which will sum all values in the file matice.txt.
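The condition, the loop and printf from this lesson can be combined in one short sketch. It is not a solution of the exercises above: it works on the five sample records from the Text file analysis slide instead of the exercise files, and the variable names (data, result, sum, max) and the output format are assumptions chosen for illustration. The loop runs up to NF rather than to a fixed 10, so that it adapts to the number of fields in each record.

```shell
# Sample records (five records, six fields each).
data='54.7332 295.7275 128.4090 -508.1302 -155.6037 0.0000
51.3204 292.3619 176.5980 -494.7423 -164.7991 0.1822
40.6154 273.9238 164.5827 -488.9232 -163.0629 0.3793
52.5044 281.5944 153.4570 -484.6533 -168.5328 0.3528
62.5486 294.2701 155.3607 -483.6872 -169.1747 0.0033'

result=$(printf '%s\n' "$data" | awk '
NR == 1 {
    max = $1;                      # start with the first record
}
{
    # loop: sum all fields of the current record
    sum = 0;
    for (I = 1; I <= NF; I++) {
        sum = sum + $I;
    }
    printf("Record %d: sum = %10.4f\n", NR, sum);

    # condition: remember the largest value of the first column
    if ($1 > max) {
        max = $1;
    }
}
END {
    printf("Largest value in column 1: %.4f\n", max);
}')

echo "$result"
```

The script prints one line per record followed by a final line reporting 62.5486 as the largest value in the first column.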