4. Text parsing
Jan Dugacek
November 10, 2017

1. Overview
   • Overview
2. User's input
   • getchar()
   • scanf()
   • Exercises
3. Files
   • Reading files
   • Writing files
   • Exercises
4. Custom variable types
   • struct
   • union
   • enum
   • Combinations
   • Shortening
   • Exercises
5. Homework

Overview

• You know quite a bit about this already
• We'll study how to open files, read them and write new ones
• We'll learn a few useful tricks and how to parse files that are not human-readable
• This is mostly an exercise in pointer usage
• C++ offers many tricks to make this easier, but they are not always applicable (or they are grossly inefficient in the situation)

User's input: getchar()

    char buffer[30];
    int got;
    int i = 0;
    for (; i < 29 && (got = getchar()) != '\n' && got != EOF; i++)
        buffer[i] = got;
    buffer[i] = 0;

• Here, we first define an array for storing text, buffer, a variable to store the character just read, got, and a position iterator, i
• Then we use the function getchar() to read from the input (what is typed into the command line after the program is started)
• Input is read until a newline symbol is found, the input ends or the array is full
• got is an int rather than a char because getchar() returns the special value EOF when the input ends
• After the loop, a terminating character is set
• getchar() delivers the line only after Enter is pressed; until then, the program sleeps
• stdio.h needs to be included for this
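As a minimal sketch, the loop above can be wrapped into a reusable function. The name read_line and its return convention are hypothetical, not part of the standard library. It calls getc(in), which behaves like getchar() when in is stdin, so the same code can also be fed from a file. Unlike the loop above, it also throws away the rest of a line that does not fit into the buffer.

```c
#include <stdio.h>

/* Hypothetical helper: reads one line from `in` into `buffer`.
 * At most size - 1 characters are stored; the rest of an overlong
 * line is discarded. Returns the number of characters stored, or -1
 * if the input ended before anything could be read. */
int read_line(FILE *in, char *buffer, size_t size)
{
    int got;        /* int, not char, so that EOF can be told apart */
    size_t i = 0;

    if (size == 0)
        return -1;
    while ((got = getc(in)) != EOF && got != '\n') {
        if (i + 1 < size)
            buffer[i++] = (char)got;
        /* otherwise the line is too long and the character is dropped */
    }
    buffer[i] = '\0';
    if (got == EOF && i == 0)
        return -1;
    return (int)i;
}
```

Calling read_line(stdin, buffer, sizeof buffer) with a 30-byte buffer reads the user's input with the same limits as the loop above.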
User's input: scanf()

    char buffer[30];
    scanf("%s\n", buffer);

• scanf() is a function that parses text; it returns the number of variables that were successfully parsed
• It's much like a reverse printf()
• If the input is too long, the function writes past the end of the array (buffer overflow), which is one of the primary ways systems get hacked (reading integers or floats in this way is fine)
• To prevent it, use the function scanf_s(), which accepts an additional argument after each string that contains the maximum size allowed (Microsoft-only, unfortunately); a portable alternative is to put a maximum width into the format, such as "%29s" for a 30-byte buffer

User's input: Exercises

1. Write a program that reads one line of the user's input, writes it back and exits (assuming maximum text size 100)
2. Write a program that reads several lines of numbers (until an empty line is entered) and writes back their average
3. Write a program that reads one line of the user's input, replaces letters with capitals, writes the result and exits (assuming maximum text size 100)
4. Write a program that reads a 3x3 matrix (a space separates numbers on one line, a newline separates lines) and outputs its determinant (https://en.wikipedia.org/wiki/Rule_of_Sarrus)

Advanced:

1. Write an Angry internet poster simulator that reads input of any size, accepts a command line argument determining the percentage of words that will be capitalised and writes the result back
2. Write a program that reads a matrix of any size (a space separates numbers on one line, a newline separates lines) and outputs its determinant (https://en.wikipedia.org/wiki/Laplace_expansion)

Files: Reading files

    FILE* file = fopen("file.txt", "r");
    int number;
    fscanf(file, "%i\n", &number);
    char character = fgetc(file);
    fclose(file);

• fopen() opens a file; the first argument is the file name (or the path to the file if it is not in the same folder) and, for reading, the second argument is "r"
• fscanf() is a version of scanf() for reading files; the only difference is that the file is placed before its first argument
• Use feof() to check whether the end of the file has been reached; it becomes true only after a read has failed at the end of the file
• fscanf() is also potentially insecure and fscanf_s() might be useful if using a Microsoft compiler
• Also, fgetc() is analogous to getchar() for reading from files; it returns EOF if there is nothing left to be read
• The file has a position, so consecutive reads return parts that follow one after another (you can use rewind() to get back to the beginning of the file)
• The file should be closed with fclose()

Files: Writing files

    FILE* file = fopen("file.txt", "w");
    fprintf(file, "Hello world!\n");
    fclose(file);

• fopen() opens a file; if the second argument is "w", the file is created (if it exists, it is cleared) and ready to be written into; if it is "a", new text will be appended to its end
• fprintf() is a version of printf() for writing into files; the only difference is that the file is placed before its first argument
• Writing into a file with fprintf() is usually faster than printing to the terminal with printf(), so it can be useful to write into files if a lot of information needs to be printed (in the order of tens of megabytes)
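The reading and writing calls above can be combined into a small round trip. This is only a sketch: write_numbers and sum_numbers are hypothetical helpers. It adds two checks that the slides do not show: fopen() returns NULL when the file cannot be opened, and the loop stops as soon as fscanf() fails to parse a number, instead of asking feof() in advance.

```c
#include <stdio.h>

/* Hypothetical helper: writes the integers 1..n into `path`,
 * one per line. Returns 0 on success, -1 on failure. */
int write_numbers(const char *path, int n)
{
    FILE *file = fopen(path, "w");   /* "w" creates or clears the file */
    if (file == NULL)
        return -1;
    for (int i = 1; i <= n; i++)
        fprintf(file, "%d\n", i);
    return fclose(file) == 0 ? 0 : -1;
}

/* Hypothetical helper: reads integers from `path` until fscanf()
 * can no longer parse one and stores their sum in *sum.
 * Returns how many numbers were read, or -1 if the file cannot be opened. */
int sum_numbers(const char *path, long *sum)
{
    FILE *file = fopen(path, "r");
    int number;
    int count = 0;

    if (file == NULL)
        return -1;
    *sum = 0;
    /* %d always reads decimal; %i would also accept 0x... and octal 0... */
    while (fscanf(file, "%d", &number) == 1) {
        *sum += number;
        count++;
    }
    fclose(file);
    return count;
}
```

Testing the return value of fscanf() handles both the end of the file and malformed lines with one condition, which is why the sketch does not call feof() at all.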
Files: Exercises

1. Write a program that reads numbers from a file (one number per line) and writes the largest one
2. Write a program that writes a table of the sine function into a file (x and sin(x) separated by a tab on each line)
3. Write a program that reads one file, replaces all letters with capitals and writes the result into another file

Advanced:

1. Write a parser of simple commands that reads files and executes them and supports 26 variables (from a to z); the file name is given as a command line argument and the result is printed. It should be able to execute this:

    a=3
    b=2+a
    c=b*a
    d=c a
    return

Custom variable types: struct

    struct someStuff {
        short int index;
        char stuffType;
        float value;
    };
    struct someStuff a = {1, , 12.5};

• struct is a custom variable type, composed of other variable types (not necessarily primitive ones)
• In memory, structs are stored similarly to arrays, but the elements are named and may be of different types (with different sizes)
• They may be initialised like arrays, but unlike arrays they are copied when given as function arguments
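The last point, that a struct is copied when passed to a function, can be illustrated with a short sketch. The functions reset_copy and reset_in_place are hypothetical, and so are the values used in the usage note. The second function takes a pointer and uses the -> operator to reach a member through it.

```c
struct someStuff {
    short int index;
    char stuffType;
    float value;
};

/* Receives a copy of the struct: the caller's variable is not changed. */
void reset_copy(struct someStuff s)
{
    s.value = 0.0f;
}

/* Receives a pointer to the struct: the caller's variable is changed. */
void reset_in_place(struct someStuff *s)
{
    s->value = 0.0f;
}
```

With a variable initialised as struct someStuff a = {1, 'x', 12.5f}, calling reset_copy(a) leaves a.value at 12.5, while reset_in_place(&a) sets it to 0.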
Custom variable types: struct

• It is important to know that variables smaller than the word size (8 bytes on 64-bit architectures) are stored only at addresses divisible by their size (larger ones must be at addresses divisible by the word size). In struct someStuff, index (short int, 2 bytes) will therefore always be stored at an address divisible by 2, value (float, 4 bytes) at an address divisible by 4, and stuffType (char, 1 byte) can be stored anywhere
• Although index and stuffType occupy only bytes 0, 1 and 2, value cannot start at byte 3 because 3 is not divisible by 4, so it is stored at bytes 4-7
• The size of someStuff is 8

Custom variable types: struct

    struct parsing {
        char first[4];
        char end;
        char second[4];
        char end2;
    };
    char unparsed[] = "0245 3245";
    struct parsing parsed = *((struct parsing*)unparsed);
    parsed.end = 0;
    parsed.end2 = 0;
    printf("%i, %i\n", atoi(parsed.first), atoi(parsed.second));

• Structure parsing is a custom variable type, composed of 10 variables of type char
• Because it is identical to an array with 10 elements, we can convert an array to it
• Named members are accessed using the . operator; if we have a pointer to the struct, we use -> instead
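The fixed-width parsing trick above can be sketched as a function. The name parse_pair is hypothetical. Instead of casting the pointer, this variant copies the bytes with memcpy(), which has the same effect here. The _Static_assert (C11) states the assumption the trick rests on: a struct made only of char members has no padding, because char can be stored at any address.

```c
#include <stdlib.h>
#include <string.h>

struct parsing {
    char first[4];
    char end;
    char second[4];
    char end2;
};

/* The trick only works if the struct is exactly as large as the record. */
_Static_assert(sizeof(struct parsing) == 10,
               "struct parsing is expected to have no padding");

/* Hypothetical helper: splits a record of the form "NNNN NNNN"
 * (9 characters plus the terminator, so at least 10 readable bytes)
 * into two integers. */
void parse_pair(const char *text, int *first, int *second)
{
    struct parsing parsed;

    memcpy(&parsed, text, sizeof parsed);  /* lay the text over the struct */
    parsed.end = '\0';                     /* terminate the first number */
    parsed.end2 = '\0';                    /* terminate the second number */
    *first = atoi(parsed.first);
    *second = atoi(parsed.second);
}
```

For the text "0245 3245" the two results are 245 and 3245, because atoi() ignores the leading zero.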
Custom variable types: struct

• struct may be used only for parsing files where everything has a fixed position on its line
• However, programs often store data in formats that are not human-readable, often as a struct saved directly into a file, with values mostly stored in 4 bytes in IEEE 754 format, so the 0 character that ends strings may appear anywhere in them; in that case, functions like fscanf() are useless
• To read or write files like this, use "rb", "wb" or "ab" (depending on whether you read, write or append) as the second argument to fopen()
• struct is incredibly useful for many other things, as we shall see later

Custom variable types: union

    union someStuff {
        char raw[8];
        double number;
    };
    union someStuff a;
    for (int i = 0; i < 8; i++)
        a.raw[i] = fgetc(file);
    printf("Read %f\n", a.number);

• union is similar to struct, but its members are stored at the same place instead of one after another, which makes it convenient to access the same data as different types

Custom variable types: enum

    enum logic {
        no = 0,
        maybe = 1,
        yes = 2
    };
    enum logic logic_and(enum logic a, enum logic b) {
        if (a == yes && b == yes) return yes;
        if (a == no || b == no) return no;
        return maybe;
    }

• enum is a custom variable type whose values are named
• Using numbers instead is less readable and makes adding new values in the middle very, very troublesome
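Returning to the union slide and the "rb"/"wb" modes above, a binary round trip might look like the following sketch. The helpers write_double and read_double are hypothetical. fwrite() is a standard function for writing raw bytes that the slides have not introduced. The raw member is declared as unsigned char here, an assumption made to avoid sign problems with bytes above 127.

```c
#include <stdio.h>

_Static_assert(sizeof(double) == 8, "this sketch assumes 8-byte doubles");

union someStuff {
    unsigned char raw[8];   /* the same 8 bytes, seen as plain bytes */
    double number;          /* the same 8 bytes, seen as a double */
};

/* Hypothetical helper: stores one double in binary form. */
int write_double(const char *path, double value)
{
    FILE *file = fopen(path, "wb");
    if (file == NULL)
        return -1;
    size_t written = fwrite(&value, sizeof value, 1, file);
    fclose(file);
    return written == 1 ? 0 : -1;
}

/* Hypothetical helper: reads the 8 bytes back one by one through the union. */
int read_double(const char *path, double *value)
{
    union someStuff a;
    FILE *file = fopen(path, "rb");

    if (file == NULL)
        return -1;
    for (size_t i = 0; i < sizeof a.raw; i++) {
        int got = fgetc(file);
        if (got == EOF) {           /* the file is shorter than 8 bytes */
            fclose(file);
            return -1;
        }
        a.raw[i] = (unsigned char)got;
    }
    fclose(file);
    *value = a.number;
    return 0;
}
```

A file written this way is readable only on machines that store a double the same way, which is one reason such formats are not portable without extra care.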
Custom variable types: enum

    enum coordinate {
        x,
        y,
        z,
        coordCount
    };
    float position[coordCount];
    for (int i = 0; i < coordCount; i++)
        position[i] = 0;
    position[x] = 12;

• enum is very useful for naming elements in an array
• An extra element after the last valid one equals the number of valid ones and can be used in iteration or as an array size

Custom variable types: enum

    enum flag {
        ignoreProtection = 1,
        truncate = 2,
        backup = 4
    };
    void changeFile(char* name, unsigned int flags) {
        if (flags & ignoreProtection) { /* ... */ }
        if (flags & truncate) { /* ... */ }