By Dr. Chia-yin Hu (胡佳音), Assistant Professor, Department of Foreign Languages and Applied Linguistics, National Taipei University
Corpora and Concordancers to the Rescue
Burning the midnight oil… Deadline fast approaching… "Tick-tock, tick-tock, tick-tock," the clock is counting down. On the notebook, nothing. Two long, agonizing hours later, still nothing. As blank as the notebook always is.
Does that sound familiar? This could happen to anyone who writes, expert or novice. Writers are bound to experience writer's block, or even worse, several blocks, at many points during the writing process. Experienced writers find ways to solve the problem, but rookies will not find it easy to do so like a pro.
Now consider a different scenario.
A born genius with a head full of brilliant ideas, the enthusiastic writer still struggles with questions such as "Which word is more appropriate in this context?" and "Are there any other words more suitable than this one?" The writer wishes for a guru who could offer useful advice on the problem of limited vocabulary, but there is simply none available. Reading books and learning new vocabulary is undoubtedly a good way to enlarge one's lexicon; it is, unfortunately, neither efficient nor precise. What can this writer do?
Language corpora and concordancing tools come to the rescue.
A language corpus is a database of authentic linguistic data, and a concordancer is a query tool that allows users to search for words in raw texts and examine how the target item is used. Three resources are introduced below to illustrate how, specifically, language corpora and concordancing tools can help writers: the British Academic Written English Corpus (BAWE), Linggle, and AntConc.
The first resource, the British Academic Written English Corpus (BAWE), is a corpus of academic texts written by undergraduate and taught master's students from various disciplines. The collection can be freely downloaded from the BAWE webpage or accessed via the Sketch Engine open-access interface. Frequently occurring lexical items and collocations can be extracted and, if needed, further subclassified according to their semantic and syntactic characteristics, among other significant features. The extracted and classified lists of vocabulary and collocations can then serve as a reliable reference and learning resource.
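To make the idea of extracting frequently occurring word pairs concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sample sentences are invented stand-ins for corpus files (they are not taken from BAWE), the function names are hypothetical, and it simply counts adjacent word pairs rather than applying the statistical association measures that a tool such as Sketch Engine would use.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def top_bigrams(texts, n=5):
    """Count adjacent word pairs across all texts; return the n most frequent."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip pairs each token with the one that follows it
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)


# Hypothetical sample sentences standing in for downloaded corpus files
sample_texts = [
    "The results suggest that further research is needed.",
    "These results suggest that the model is adequate.",
    "Further research is needed to confirm the results.",
]

for (first, second), freq in top_bigrams(sample_texts, n=3):
    print(first, second, freq)
```

On a real corpus, the same loop could run over the text of each downloaded file, and pairs such as "results suggest" would surface as candidates for a personal collocation list.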
With the vocabulary and collocation lists in hand, student writers may still find it challenging to put the building blocks in place. This is especially the case for non-native speakers. Writers struggling to compose sentences may now turn to Linggle, the second resource to be introduced. Linggle is a free web-based linguistic search engine developed by the Natural Language Processing Lab of National Tsing Hua University in Taiwan. When a user submits a word or collocational pattern, Linggle returns recurring lexical bundles together with authentic examples. A number of wildcard symbols can be employed to broaden the search and find variations. For instance, the query "the study _ that" returns a list of results ranked by frequency, including "the study found that," "the study showed that," and "the study is that." In addition to counts and percentages, authentic examples extracted from the online database are provided so that users can see the lexical bundles used in context. Linggle thus serves as a quick and reliable reference for writers.
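The logic of a one-word wildcard query such as "the study _ that" can be sketched in a few lines of Python. This is not Linggle's actual implementation or API; it is a local imitation under stated assumptions, with a hypothetical function name (`fill_slot`) and invented example sentences, that ranks the words found in the `_` slot by frequency.

```python
import re
from collections import Counter


def fill_slot(pattern, sentences):
    """Rank the words that fill the single '_' slot in pattern, by frequency."""
    # Turn each query word into a regex piece; '_' captures exactly one word
    pieces = [r"(\w+)" if word == "_" else re.escape(word)
              for word in pattern.lower().split()]
    regex = re.compile(r"\b" + r"\s+".join(pieces) + r"\b")
    counts = Counter()
    for sentence in sentences:
        # With one capture group, findall returns just the slot fillers
        counts.update(regex.findall(sentence.lower()))
    return counts.most_common()


# Hypothetical sentences standing in for a large reference database
sentences = [
    "The study found that sleep improves recall.",
    "Overall, the study showed that the effect was small.",
    "The study found that the effect persisted.",
    "A limitation of the study is that the sample was small.",
]

for word, freq in fill_slot("the study _ that", sentences):
    print(f"the study {word} that: {freq}")
```

The ranking is only as good as the text behind it, which is why a service built on a very large database is more informative than a handful of sentences.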
The two online resources introduced so far attempt to collect linguistic data with balanced coverage. The databases are therefore reliable, but they are often not specific or precise enough. Users might need to pinpoint exactly which lexical items and collocations are appropriate in a designated area. Writing for a specific area requires a powerful search tool that is genre-specific and/or discipline-specific, and Voilà! Here comes AntConc, which is available online. AntConc is a free, customizable concordancer that allows users to create their own corpora. The word "free" can be understood in two senses. First, the concordancer is free of charge, just like the two resources mentioned above. More importantly, users can freely create their own databases by loading selected text files and then searching for the data they need. Commonly performed concordancing functions include word lists, frequency counts, and KWIC (keyword in context). By using AntConc, writers can easily obtain lists of the vocabulary and collocations frequently used in the designated area.
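For readers curious about what a word list and a KWIC display involve, the following Python sketch imitates both functions on a tiny text. It is a simplified illustration, not AntConc's own code: the function names and the sample text are hypothetical, and a real custom corpus would be read from the writer's selected text files.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def word_list(tokens, n=10):
    """Return the n most frequent words with their counts."""
    return Counter(tokens).most_common(n)


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Hypothetical text standing in for a self-built, discipline-specific corpus
text = (
    "The corpus was compiled from journal articles. "
    "Each article in the corpus was tagged. "
    "A corpus of this size supports frequency analysis."
)
tokens = tokenize(text)

print(word_list(tokens, n=3))
for left, keyword, right in kwic(tokens, "corpus", window=2):
    # Right-align the left context so the keyword lines up in one column
    print(f"{left:>12}  {keyword}  {right}")
```

Lining the keyword up in a single column is what makes recurring patterns on either side of it easy to spot.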
The three resources introduced above, namely BAWE, Linggle, and AntConc, demonstrate the specific kinds of assistance that corpora and concordancing tools can offer writers. Writers have the chance to create the vocabulary lists and collocational patterns that they truly need. By actually writing an article with these customized building blocks, writers can finally learn by writing. By familiarizing themselves with the genre-specific and discipline-specific lexicon, novice writers as well as experienced ones will be able to compose pieces of writing that conform to the conventions of their fields.
Anthony, L. (2014). AntConc (Version 3.4.3) [Computer Software]. Tokyo, Japan: Waseda University.
Chen, J. J., Peng, H. C., Yeh, M. C., Chen, P. Y., & Chang, J. S. (2016). Linggle Knows: A Search Engine Tells How People Write. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations (pp. 166-169).
Nesi, H., Gardner, S., & Thompson, P. (2008). British Academic Written English Corpus. Oxford Text Archive. http://hdl.handle.net/20.500.12024/2539
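In the second part, the speaker introduced several models for moving a story's plot forward. One is the ABT model, that is, the And, But, Therefore structure, which the film The Shawshank Redemption exemplifies. Another is the "hero's journey": in an ordinary world there is a less-than-perfect protagonist whose world is turned upside down by a catalytic event, or who faces an agonizing inner conflict, and so the protagonist decides to step up and act. But when the stakes are raised, the protagonist must learn a lesson in order to stop the opponent and achieve the goal. A third is the "golden story formula," which consists of seven parts: goal, obstacle, effort, result, surprise, turn, and ending.
In closing, the speaker reminded us of several key points to keep in mind when writing a story. When planning a story, focus on a single theme rather than cramming in too many, so that the theme stays focused and clear. Open strongly and avoid clichés. Do not merely assert; show, with concrete details as support. Avoid the AAA pattern (and, and, and...) and the DHY pattern (despite, however, yet), so that the story has a clear structural development and clear logical relationships, and readers do not feel that it is a mere running account, that it lacks a point, or that they cannot make sense of it. Keeping these points in mind can help us write better stories.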