Identifying aboutgrams in engineering texts

Introduction
•n-grams, bundles, chunks, clusters
•“idiom principle”
•Phraseological studies - concgramming
•Clusters -> as an alternative to keywords (KW) -> phraseology

Concgrams
•N-grams
–Contiguous, e.g. take part
•Skipgrams
–AB, A*B, e.g. take no part, take an active part
–No positional variation (AB, BA)
•“phrase-frames”
•Concgrams
–Irrespective of any constituency and/or positional variation
–Starting point for quantifying the extent of phraseology

Concgrams
•[Figure: scan712.jpg]

Aboutgrams
•Aboutness is a product of the global patternings of a text = “macrostructure”
•“aboutgrams” = word associations which are specific to a text
•Aboutness
–The most frequently occurring lexical phrases comprise a provisional aboutgram list
–Phrases that are equally or more frequent in the reference corpus are removed
–Reference corpora: specialized (HKEC) and general (BNC)

Analysis of data
•[Figure: scan711.jpg]
•[Figure: scan713.jpg]
•Intercollocation of collocates
–Establish unique words (i.e. “types”)
–Search for words associated with them -> 2-word concgrams
–2-word concgrams then become the new origin
–This process disambiguates words, and a group gives a strong sense of the content, scope and argument of the document, e.g. design [of a] [for] tall building(s): 11 instances

Conclusion
•Meaning arises from words in particular combinations
•Do not rely entirely on keywords (KW)
–With keywords alone, aboutness and phraseology are not utilized
•Create tentative aboutgrams
–Confirmed with reference to both a specialized and a general reference corpus
•Human intervention in creating concgrams
•Can be used in learning and teaching contexts with students of English to raise their awareness of the centrality of phraseology
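
The procedure outlined in these slides (find 2-word concgrams irrespective of position and constituency, then remove associations that are equally or more frequent in a reference corpus) can be sketched roughly in Python. This is only an illustrative sketch, not the software used in the study: the function names, the `span` window, the `min_freq` cut-off, and the normalization of counts by corpus size are all assumptions introduced here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def two_word_concgrams(tokens, span=5):
    """Count unordered word pairs that co-occur within `span` tokens.

    Sorting each pair folds positional variation (AB, BA) together, and the
    window admits constituency variation (AB, A*B).
    """
    counts = Counter()
    for i, origin in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + span]:
            if other != origin:
                counts[tuple(sorted((origin, other)))] += 1
    return counts


def tentative_aboutgrams(text, reference, span=5, min_freq=2):
    """Keep pairs that are relatively more frequent in `text` than in `reference`.

    Assumption: "equally or more frequent in the corpus" is interpreted as a
    comparison of counts normalized by the number of tokens.
    """
    text_tokens = tokenize(text)
    ref_tokens = tokenize(reference)
    text_counts = two_word_concgrams(text_tokens, span)
    ref_counts = two_word_concgrams(ref_tokens, span)

    kept = {}
    for pair, freq in text_counts.items():
        if freq < min_freq:
            continue  # too rare to enter the provisional list
        text_rate = freq / len(text_tokens)
        ref_rate = ref_counts[pair] / len(ref_tokens) if ref_tokens else 0.0
        if ref_rate < text_rate:  # drop pairs equally or more frequent in the reference
            kept[pair] = freq
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(kept.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy strings standing in for a single text and a reference corpus.
    sample = "tall building design and tall building design and"
    reference = "design and design and design and the"
    for pair, freq in tentative_aboutgrams(sample, reference, span=1):
        print(pair, freq)
```

In practice the surviving pairs would still need manual review, and each kept pair could serve as a new origin for a further search, in line with the intercollocation step described in the slides.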