Corpora4Learning
Tools & websites
This page offers information about some common corpus tools and links to resources on the web.
NB: This section focusses on the features available online. The corpora themselves (e.g. Bank of English, British National Corpus, Brown Corpus) are briefly described in the English corpora section.
- Search by word, phrase, wildcard, part of speech or a combination of these.
- KWIC concordances of variable length (concordance output restricted to 40 lines).
- Collocation sampler to retrieve a word's most significant collocates.
- Search by word, phrase, wildcard, part of speech or a combination of these.
- Sentence concordances (output restricted to 50 samples).
Also available for the BNC:
- Search for frequently co-occurring word sequences of 2 to 8 words in length (word clusters).
- Search all clusters of a particular length or clusters containing a particular word, phrase or part of speech.
- Cluster lists with frequency statistics, and KWIC concordances of the clusters.
- Search by word, phrase, wildcard, part of speech or a combination of these.
- Search in the entire corpus as well as genre-specific searches.
- Frequency statistics, collocates and KWIC concordances.
- Compare quasi-synonyms or other related words and their collocates.
- Search by word, phrase or wildcard.
- KWIC concordances of variable length.
- Search by word, phrase or wildcard.
- KWIC concordances of variable length, collocate frequencies.
- Gapped KWIC concordances as a basis for exercises.
- Search by words or phrases.
- KWIC concordances, collocate frequency.
- Morphosyntactic analysis of concordance lines.
- Search by word, phrase or wildcard
- KWIC concordances of variable length, collocate frequencies, sentence concordances
- Gapped KWIC concordances as a basis for exercises
- Collocational frameworks
- Easy access to full interview text and videos
- Browse corpus by topic index
- Online concordancer (KWIC and sentence format, search by word, phrase or wildcard)
- Ready-made concordance of all words in the whole corpus and in each interview
- Ready-made frequency lists for the whole corpus and each interview
- Browse according to specified speaker and speech event attributes (file references)
- Search by word or phrase in specified contexts (KWIC concordances)
- Search by word, phrase or wildcard
- KWIC concordances, word lists, some good advanced features
- Disadvantage: not language-specific
- Search in books by word or phrase, and then browse relevant books online.
- Search in books by word or phrase, and then browse relevant books online.
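Most of the online tools above return KWIC (key word in context) concordances, and some also offer gapped concordances as a basis for exercises. The following Python sketch illustrates the general idea only; it is not how any of the listed tools is implemented. The tokenizer, the function names and the sample text are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, width=4, gapped=False):
    """Return one concordance line per hit: left context, key word, right context.

    width  -- number of context words shown on each side
    gapped -- if True, replace the key word with underscores (exercise format)
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            node = "_" * len(tok) if gapped else tok
            # Right-align the left context so the key words line up in a column.
            lines.append(f"{left:>30}  {node}  {right}")
    return lines


# Hypothetical sample text, used only to demonstrate the output format.
sample = "The cat sat on the mat. The dog saw the cat and the cat ran."

for line in kwic(tokenize(sample), "cat", width=3):
    print(line)

for line in kwic(tokenize(sample), "cat", width=3, gapped=True):
    print(line)
```

Changing `width` corresponds to the "variable length" option that several of the tools mention.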
The archives listed below offer a variety of texts and smaller corpora for download. To search them with corpus analysis methods, you will normally need an offline text/corpus analysis tool, i.e. a concordancer. Alternatively, you may be able to carry out some simple analyses with online text analysis tools.
More than 5000 full text, audio and (streaming) video versions of public speeches, sermons, legal proceedings, lectures, debates, interviews, other recorded media events.
A digital library of Internet sites and other cultural artifacts in digital form (text, audio, video).
Free online search (concordances and a range of interesting features).
Free access to texts in different formats (meta search in a number of archives).
Free download as well as online search (concordances), wide variety of languages.
Free download (e.g. complete works of Shakespeare).
All State of the Union addresses, provided by c-span.org (transcripts, and since 1989 video clips as well).
Approx. 2,000 literary texts in html format.
This section lists a selection of simple text analysis tools that can be used online, i.e. without installation. These tools allow you to create e.g. concordances, wordlists, text profiles from your own texts or from web pages of your choice.
- KWIC concordance for each word in the text.
- See also 'phrase extractor' section to build concordance with word clusters.
- Compares the text against well-known word lists (1000/2000 most frequent English words and others).
- Highlights words of different frequency bands in different colours.
- See also 'Unique Words Text Profiler' (finds all words which occur only once in a text).
- Returns a variety of word lists.
- KWIC concordance for all words in the text/web page
- Frequency lists and other features
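The kinds of output these online tools produce (frequency lists, words that occur only once, and a comparison of a text against a reference word list) can be approximated in a few lines of Python. This is a rough sketch under stated assumptions: the tokenizer is simplistic, and the tiny `reference` set is a hypothetical stand-in for a real list such as the 1000 most frequent English words.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens):
    """Word list sorted by descending frequency, then alphabetically."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def unique_words(tokens):
    """Words that occur exactly once in the text."""
    counts = Counter(tokens)
    return sorted(word for word, n in counts.items() if n == 1)


def coverage(tokens, reference_words):
    """Share of running words that appear in a reference word list."""
    if not tokens:
        return 0.0
    known = sum(1 for tok in tokens if tok in reference_words)
    return known / len(tokens)


# Hypothetical sample text and reference list, for illustration only.
sample = "The cat sat on the mat. The dog saw the cat."
reference = {"the", "on"}

tokens = tokenize(sample)
print(frequency_list(tokens))
print(unique_words(tokens))
print(f"{coverage(tokens, reference):.0%} of running words are in the reference list")
```

A real text profiler would load several reference lists, one per frequency band, and report the coverage of each.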
This section lists software packages that are commonly referred to as concordancers. They provide a more comprehensive range of functions than the online analysis tools listed above (usually creation of concordances, alphabetical and frequency word lists, comparison of word lists and other statistical functions). Most packages can be freely downloaded but require installation.
- For Windows and Linux.
- Reads text, html, and xml files.
- Main functions: concordances, citation of search term in its co-text, collocates, word clusters, frequency lists, text profiling through keyword lists.
- For Windows.
- Main functions: concordances, collocate search, frequency lists.
- For Windows.
- Creates a complete concordance for each word in a corpus and supports its publication as a web concordance.
- Other functions: individual concordances, citation of search term in its co-text, frequency lists, text profiling through keyword lists, and a range of other statistical functions.
- For Windows.
- Different from the other packages in that it focusses on the analysis of web pages.
- For Windows.
- Very comprehensive package.
- For Windows and Mac.
- Main functions: concordances, citation of search term in context, frequency lists.
- For Windows, Linux and Mac.
- Reads text, html, Word and Open Office files.
- Web spider facility for corpus creation directly from Internet sources.
- Main functions: concordances, citation of search term in context, frequency lists.
- For Windows.
- Very comprehensive package.
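Two of the functions named for these packages, word clusters and collocates, come down to counting word sequences and counting words near a search term. The Python sketch below shows raw counts only and is an assumption-laden illustration, not the method of any package listed here. Real concordancers typically rank collocates with a statistical significance measure, which is omitted.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def clusters(tokens, n=2):
    """Count every contiguous sequence of n words (word clusters)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def collocates(tokens, node, span=2):
    """Count words occurring within `span` words to the left or right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


# Hypothetical sample text, for illustration only.
sample = "The cat sat on the mat. The dog saw the cat."
tokens = tokenize(sample)

print(clusters(tokens, n=2).most_common(3))
print(collocates(tokens, "cat", span=2).most_common(3))
```

Raising `n` towards 8 mirrors the cluster lengths offered by some online tools, although longer clusters need a much larger corpus before they recur.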
This section focusses on corpus-related resources for the learning and teaching context.
Module on Using concordance programs in the modern foreign languages classroom
by Marie-Noëlle Lamy and Hans Jørgen Klarskov Mortensen.
Module on Corpus linguistics by Tony McEnery and Andrew Wilson.
English and German online dictionary based on newspaper corpus, with frequency of occurrence, explanation, grammatical information and more
A corpus-based pedagogical grammar of English
The following websites include resources and link collections generally related to corpus linguistics.
S.Braun (at) surrey.ac.uk
updated 03/06/06