Enriched Genic Interaction Extraction Challenge Data Format

1 File Structure

The LLL challenge training data and their linguistic information are represented as follows. The file consists of the following fields (one field per line):
- ID: unique identifier combining the PubMed ID of the abstract that contains the sentence and the position number of the sentence in that abstract
- sentence: the original sentence
- words: sequence of the words of the sentence
- agents: list of the agents of the genic interactions
- targets: list of the targets of the genic interactions
- genic_interactions: list of the genic interactions described in the sentence
- lemmas: list of the identified canonical forms (lemmas) of the words
- syntactic_relations: list of the syntactic relations in the sentence. See the Syntactic Analysis Guidelines for more information about this field.

2 Field Structure

A tab separates the elements of a field.

ID
The ID field contains the PubMed ID (PMID) of the abstract from which the sentence is extracted, followed by the number of the sentence in this abstract.
ID (tabulation) 11011148-1

SENTENCE
This field contains the sentence.
sentence (tabulation) ykuD was transcribed by SigK RNA polymerase from T4 of sporulation.

WORDS, AGENTS, TARGETS, GENIC_INTERACTIONS, LEMMAS, SYNTACTIC RELATIONS
The other fields are organised according to the following format:
Field_Name (tabulation) predicate1(argument1_1,argument1_2,...) (tabulation) predicate2(argument2_1,argument2_2,...) (tabulation) ...
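Reading a record in this layout amounts to splitting each line on tabs. The Python sketch below assumes that a field line is exactly the field name followed by its tab-separated elements, that no element itself contains a tab, and that the ID is the PMID and the sentence number joined by a hyphen, as in 11011148-1. The names read_record and split_id are illustrative only, not part of the challenge distribution.

```python
def read_record(lines):
    """Map each field name to the list of its tab-separated elements.

    Assumption: each line is 'name<TAB>element<TAB>element...' and no
    element contains a tab.
    """
    record = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines (assumed to separate records, if present)
        name, *elements = line.split("\t")
        record[name] = elements
    return record


def split_id(record_id):
    """Split an ID such as '11011148-1' into (PMID, sentence number).

    Assumption: the PMID and the sentence number are joined by a hyphen.
    """
    pmid, sentence_no = record_id.rsplit("-", 1)
    return pmid, int(sentence_no)


# Only the first three words of the example are included, for brevity.
lines = [
    "ID\t11011148-1\n",
    "sentence\tykuD was transcribed by SigK RNA polymerase from T4 of sporulation.\n",
    "words\tword(0,'ykuD',0,3)\tword(1,'was',5,7)\tword(2,'transcribed',9,19)\n",
]
record = read_record(lines)
print(split_id(record["ID"][0]))  # ('11011148', 1)
print(len(record["words"]))       # 3
```

Keeping every element as a raw string at this stage leaves predicate decoding to a separate step, so the same reader works for all eight fields.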
Example (WORDS):
words (tabulation) word(0,'ykuD',0,3) (tabulation) word(1,'was',5,7) (tabulation) word(2,'transcribed',9,19) (tabulation) word(3,'by',21,22) (tabulation) word(4,'SigK',24,27) (tabulation) word(5,'RNA',29,31) (tabulation) word(6,'polymerase',33,42) (tabulation) word(7,'from',44,47) (tabulation) word(8,'T4',49,50) (tabulation) word(9,'of',52,53) (tabulation) word(10,'sporulation',55,65)

3 Predicate Description

WORD
The predicate "word" refers to a word of the sentence and accepts four arguments:
word(id_word,'string_word',start_word,end_word)
- id_word: integer, unique word id
- string_word: string, the actual word
- start_word: integer, position of the first character of the word in the sentence (starting at 0)
- end_word: integer, position of the last character of the word in the sentence (starting at 0)

AGENT
The predicate "agent" refers to the agent of a genic interaction. It accepts one argument:
agent(id_word)
- id_word: integer, id of the word the agent refers to

TARGET
The predicate "target" refers to the target of a genic interaction. It accepts one argument:
target(id_word)
- id_word: integer, id of the word the target refers to

GENIC INTERACTION
The predicate "genic_interaction" refers to an interaction between an agent and a target:
genic_interaction(id_word1,id_word2)
- id_word1: integer, id of the word the agent refers to
- id_word2: integer, id of the word the target refers to

LEMMAS
The predicate "lemma" refers to the normalized form (lemma) of a word:
lemma(id_word,'string_lemma')
- id_word: integer, id of the word the lemma refers to
- string_lemma: string, the lemma of the word

SYNTACTIC RELATION
The predicate "relation" refers to a syntactic relation between two words of the sentence. See the Syntactic Analysis Guidelines for more information.
relation('string_relation',id_word1,id_word2)
- string_relation: string, the information contained in a syntactic relation: the function of the relation, then a colon, then the morpho-syntactic categories of the two words (e.g. 'comp_of:N-N')
- id_word1: integer, id of the first word (the head) linked by the relation
- id_word2: integer, id of the second word (the expansion) linked by the relation

4 Example

ID (tabulation) 10747015-5
sentence (tabulation) Localization of SpoIIE was shown to be dependent on the essential cell division protein FtsZ.
words (tabulation) word(0,'Localization',0,11) (tabulation) word(1,'of',13,14) (tabulation) word(2,'SpoIIE',16,21) (tabulation) word(3,'was',23,25) (tabulation) word(4,'shown',27,31) (tabulation) word(5,'to',33,34) (tabulation) word(6,'be',36,37) (tabulation) word(7,'dependent',39,47) (tabulation) word(8,'on',49,50) (tabulation) word(9,'the',52,54) (tabulation) word(10,'essential',56,64) (tabulation) word(11,'cell',66,69) (tabulation) word(12,'division',71,78) (tabulation) word(13,'protein',80,86) (tabulation) word(14,'FtsZ',88,91)
lemmas (tabulation) lemma(0,'localization') (tabulation) lemma(1,'of') (tabulation) lemma(2,'spoIIE') (tabulation) lemma(3,'be') (tabulation) lemma(4,'show') (tabulation) lemma(5,'to') (tabulation) lemma(6,'be') (tabulation) lemma(7,'dependent') (tabulation) lemma(8,'on') (tabulation) lemma(9,'the') (tabulation) lemma(10,'essential') (tabulation) lemma(11,'cell') (tabulation) lemma(12,'division') (tabulation) lemma(13,'protein') (tabulation) lemma(14,'ftsZ')
syntactic_relations (tabulation) relation('comp_of:N-N',0,2) (tabulation) relation('mod_att:N-ADJ',13,10) (tabulation) relation('mod_pred:N-ADJ',0,7) (tabulation) relation('mod_att:N-N',14,13) (tabulation) relation('mod_att:N-N',12,11) (tabulation) relation('mod_att:N-N',13,12) (tabulation) relation('comp_on:ADJ-N',7,14)
agents (tabulation) agent(14)
targets (tabulation) target(2)
genic_interactions (tabulation) genic_interaction(14,2)
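To make the predicate encoding concrete, here is a Python sketch that parses predicate terms, checks the word offsets of the example record against its sentence, and resolves genic_interaction(14,2) to its agent and target words. Since end_word is the position of the last character, a word spans the slice start_word to end_word + 1. It assumes quoted arguments never contain a single quote; parse_term and check_offsets are hypothetical helper names, not part of the challenge distribution.

```python
import re

# A predicate term such as word(0,'FtsZ',88,91) or relation('comp_of:N-N',0,2).
TERM = re.compile(r"^(\w+)\((.*)\)$")
# An argument is either a quoted string or an integer.
# Assumption: quoted arguments never contain a single quote themselves.
ARG = re.compile(r"'([^']*)'|(-?\d+)")


def parse_term(text):
    """Return (predicate_name, [args]), converting integer arguments to int."""
    match = TERM.match(text.strip())
    if match is None:
        raise ValueError(f"not a predicate term: {text!r}")
    name, body = match.groups()
    args = []
    for arg in ARG.finditer(body):
        quoted, number = arg.groups()
        args.append(int(number) if number is not None else quoted)
    return name, args


def check_offsets(sentence, word_terms):
    """Map word id -> word, checking that the inclusive offsets match the sentence."""
    words = {}
    for term in word_terms:
        _, (word_id, string, start, end) = parse_term(term)
        if sentence[start:end + 1] != string:  # end_word is inclusive
            raise ValueError(f"offset mismatch for word {word_id}: {string!r}")
        words[word_id] = string
    return words


sentence = ("Localization of SpoIIE was shown to be dependent on the "
            "essential cell division protein FtsZ.")
word_terms = [
    "word(0,'Localization',0,11)", "word(1,'of',13,14)", "word(2,'SpoIIE',16,21)",
    "word(3,'was',23,25)", "word(4,'shown',27,31)", "word(5,'to',33,34)",
    "word(6,'be',36,37)", "word(7,'dependent',39,47)", "word(8,'on',49,50)",
    "word(9,'the',52,54)", "word(10,'essential',56,64)", "word(11,'cell',66,69)",
    "word(12,'division',71,78)", "word(13,'protein',80,86)", "word(14,'FtsZ',88,91)",
]

words = check_offsets(sentence, word_terms)
_, (agent_id, target_id) = parse_term("genic_interaction(14,2)")
print(words[agent_id], "->", words[target_id])  # FtsZ -> SpoIIE

label, head, expansion = parse_term("relation('comp_of:N-N',0,2)")[1]
print(label, words[head], words[expansion])  # comp_of:N-N Localization SpoIIE
```

Arguments are matched with a regular expression rather than a full Prolog reader; that covers the predicates described here (including a word that is itself a comma, since quoted strings are matched first) but would need extending for quoted strings that contain quotes.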