WinATA

(The Aston Text Analyser Mark2 for Windows)

Users' Guide

1. Background
WinATA Mark2 represents an extension and development of the original ATA (The Aston Text Analyser). With the passage of time and technological changes, this suffered increasingly from a number of disadvantages. These included chiefly:

* It operated under MS-DOS. The indexer could not operate with Windows running in the background, and increasingly users were becoming unfamiliar with the most elementary operations at the DOS prompt.
* It was designed as a research tool, particularly for the needs of teachers investigating natural language for pedagogic purposes, and did not lend itself so well to wider research needs or to use as a learner interface. This meant that teachers could not easily explore the pedagogic possibilities offered by a 'learner as researcher' approach to language learning.

There were also numerous requests from users for enhanced functionality not available in the original ATA. The new version is a completely reprogrammed suite of tools operating in a 32-bit environment (Windows 95 and later), designed with the needs of the teacher researcher, the language researcher and the learner researcher in mind. Each user can simply ignore the built-in functions that are not appropriate for his/her purposes.

2. Technical Requirements
WinATA will only run under a 32-bit environment, e.g. Windows 95 or later. It will not run under Windows 3.x. Most compatible machines will have sufficient power to run WinATA, but as a lower bound a 100 MHz machine with 16 MB of RAM and 25 MB of free disk space is recommended. WinATA is distributed on a CD-ROM.

3. Installing WinATA
The instructions for installing WinATA and your corpus are contained in a separate document.

4. Running ataIndex
Before you can use ataInsight, you must first create the necessary database with ataIndex. This is because a programme designed with so many functions would be too slow if it had to handle a large corpus 'on the fly'. Indexing should not take long on a reasonably fast machine: typically five or six minutes for a quarter of a million words on a 300 MHz machine. Double-clicking on C:\Program Files\WinATA\ataIndex.exe will produce a screen labelled "Corpus Administrator". At the top left corner click on "Corpus" and select "New". This produces a dialogue box as shown in the top half of Figure 1.

Figure 1. Dialogue box showing the selections made for ataIndex, and the "Jobs to do" progress chart.

Now carry out all these steps:
* Under "Project description" (top left) enter a name of your choice for this project. You may wish to enter "Film Reviews".
* Make sure the language selected (top right) is correct. (For most users only "English" will be available.)
* Select the correct drive (in our example, D:).
* Select the directory in which you have placed your corpus. In our case this will be D:\WINATA\filmdir
* If you enter, say, "*.txt" in the small centre box labelled "File filter", only files with the ".txt" extension will be shown in the list underneath.
* The files in the selected directory will now be shown in the centre box. Highlight the one(s) you want (in our case "filmrev.txt") and click on the arrow to move them to the box headed "Selected corpus files".
* Finally, click on "Index". The full screen shown in Figure 1 now appears.

5. The "Jobs to do" Progress Screen
The "Jobs to do" progress screen (lower half of Figure 1) keeps you up to date with the various stages of the indexing process. When indexing has finished you are shown the time taken (hours and minutes only), which is of interest only for large corpora of millions of words. Click OK and shut down the indexer using the "Corpus/Exit" menu option (top left corner).

6. Running ataInsight
Double-clicking on C:\Program Files\WinATA\ataInsight.exe will produce a screen labelled "Open ATA project". The main box shows the project names of all the corpora you have indexed. You can highlight any of these and click the delete button to remove it. To investigate a previously indexed corpus, highlight its project name, in our example "Film Reviews", and click on "OK". The main research screen appears as shown in Figure 2.

Figure 2. The main research screen.

7. Chief Features of the Research Screen
The illustration in Figure 2 is of a corpus of articles headed "Money Markets" taken from a CD-ROM kindly provided by The Financial Times. Chief features of the screen are as follows:
Feature: Description
Title bar (top left): The project name (in this case "Money"), the total number of tokens in the corpus (in this case 152,018) and the total number of types (in this case 7,451).
The Word Frequency List box (upper left): This shows the number of entries in the list (in this case 7,538, which is higher than the number of types because it takes account of potentially ambiguous types). The four columns in this box show:
=>The absolute (raw) frequency of the type in the corpus;
=>The relative frequency (per 10,000 tokens) of the type in the corpus;
=>The relative frequency (per 10,000 tokens) of the type in a user-defined reference corpus;
=>The type itself.
The Concordance/profile box (upper right): This box is labelled "Contexts" and shows a concordance for the type highlighted in the Word Frequency List ("under" in the example). The concordance may be converted to a summary in the form of a synoptic profile (see Section 10 below).
The Sentence box (lower right): This shows the full sentence containing the highlighted context. It is possible to cut and paste from the Sentence box.
The Word Families box (lower left): This shows a group of types associated by some syntactic or semantic criterion with the type currently highlighted in the Word Frequency List.
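The relative frequency columns are raw counts scaled to a common base of 10,000 tokens (raw × 10,000 ÷ corpus size), which is what makes a small corpus comparable with a larger reference corpus. The minimal Python sketch below builds the four columns from two token lists. The tokeniser, the function names and the tie-breaking order are hypothetical illustrations rather than WinATA's own rules, and the sketch does not reproduce WinATA's separate entries for potentially ambiguous types.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokeniser: lower-cased runs of letters, allowing one
    # internal apostrophe. WinATA's own tokenisation rules are not documented here.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def frequency_rows(tokens, reference_tokens):
    """Rows of (raw frequency, per 10,000 in corpus, per 10,000 in reference, type)."""
    counts = Counter(tokens)
    ref_counts = Counter(reference_tokens)
    n = len(tokens)
    ref_n = len(reference_tokens)
    rows = []
    for word, raw in counts.items():
        rel = raw * 10_000 / n
        # A type absent from the reference corpus gets a relative frequency of 0.
        ref_rel = ref_counts[word] * 10_000 / ref_n if ref_n else 0.0
        rows.append((raw, rel, ref_rel, word))
    # Most frequent first; ties broken alphabetically (an assumption).
    rows.sort(key=lambda row: (-row[0], row[3]))
    return rows
```

Summing the raw column gives the token total shown in the title bar, and the number of rows gives the number of types.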

These boxes, and their associated functions, are each considered in more detail below.

8. The Word Frequency List
A right mouse click in this window produces a pop-up menu showing all available options. These operate as follows:
Option: Description
Alpha sort: Present the Word Frequency List with types arranged in alphabetical order.
Numeric sort: Present the Word Frequency List with types arranged in order of frequency.
Find: This produces a dialogue box in which the user enters a string of letters to be located in the Word Frequency List. This may be a whole word or part of a word, and the user may specify that the string must start the word, end the word, or occur anywhere in it ("Containing"). Wildcards are allowed: "*" represents any number (including zero) of consecutive characters, and "?" represents any single character. The string "a*e*i*o*u" with the "Containing" option selected locates types containing the five vowels in that order. (On a corpus of texts on hip replacement, this search found the word "arteriovenous".)
Filter: This works in much the same way as Find, except that instead of locating instances of the desired string one by one, it collects them all together, e.g. all types ending in "ity" or, as in the above example, all types containing the five vowels in that order. Two further useful functions are also available under "Filter":
=>Frequency filter: this creates a list of all the types with a specified raw frequency. For example, by entering "1" the researcher can obtain a list of all the hapax legomena in the corpus.
=>Comparative filter: the Comparative option offers a choice of comparisons between the relative frequencies of types in the corpus being studied and those in the user-defined reference corpus. The possible comparisons are "greater than", "less than", "equal to" and "Not in reference".
Collocations: This produces a full list of contexts (a concordance) for the currently highlighted type in the upper right-hand box.
Synoptic Profile: This reduces the concordance for the highlighted type to an ordered word count for each word position, from four words to the left of the keyword to four words to the right.
Export: This causes the current Word Frequency List, in its present filtered or unfiltered state, to be appended to a file called wfl.out in the same directory as the corpus.
Font: This enables the user to select a preferred type face and size for the display of the Word Frequency List.
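The wildcard conventions described under Find and Filter can be sketched by translating "*" and "?" into a regular expression. The Python below is a minimal illustration, assuming the word list is held as a {type: raw frequency} dictionary. The mode names ("start", "end", "containing", "whole") are hypothetical. Treating a type that is missing from the reference corpus as having frequency 0 in the comparative filter is an assumption rather than documented WinATA behaviour.

```python
import re


def wildcard_to_regex(pattern, mode="containing"):
    """Translate "*" (any run of characters, possibly empty) and "?" (exactly
    one character) into a compiled regex; other characters match literally."""
    body = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    if mode == "start":
        body = r"\A" + body
    elif mode == "end":
        body = body + r"\Z"
    elif mode == "whole":
        body = r"\A" + body + r"\Z"
    elif mode != "containing":
        raise ValueError(f"unknown mode: {mode}")
    return re.compile(body)


def filter_types(freq, pattern, mode="containing"):
    """All types in a {type: raw frequency} table that match the pattern."""
    regex = wildcard_to_regex(pattern, mode)
    return sorted(word for word in freq if regex.search(word))


def frequency_filter(freq, raw):
    """All types with exactly the given raw frequency (1 gives the hapax legomena)."""
    return sorted(word for word, n in freq.items() if n == raw)


def comparative_filter(rel, ref_rel, comparison):
    """Compare per-10,000 frequencies with a reference corpus.
    Assumption: a type absent from the reference corpus counts as frequency 0."""
    if comparison == "not_in_reference":
        return sorted(word for word in rel if ref_rel.get(word, 0.0) == 0.0)
    tests = {
        "greater": lambda a, b: a > b,
        "less": lambda a, b: a < b,
        "equal": lambda a, b: a == b,
    }
    test = tests[comparison]
    return sorted(word for word, a in rel.items() if test(a, ref_rel.get(word, 0.0)))
```

Note that under the zero-frequency assumption a type found only in the studied corpus also passes the "greater" comparison; a tool could equally exclude such types and reserve them for "Not in reference".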

9. The Contexts Box
This box (upper right) is filled with the concordance, or synoptic concordance profile, requested as described above. A right mouse click brings up a similar menu to that for the Word Frequency List, with the following features:

Option: Description
Sort Left: This alphabetises the concordance by words to the left of the keyword.
Sort Right: This alphabetises the concordance by words to the right of the keyword.
Filter: This reduces the list to those lines containing a chosen substring in a chosen position (left or right of the keyword, or anywhere). Thus a left-hand filter for "er " ('e' and 'r' followed by a space) applied to a concordance for "than" would select all the comparatives in "-er" (plus a few oddments such as "water"). A left-hand filter using "more" would produce a list of comparatives with "more"; here oddments such as "moreover" can be avoided by specifying that the string "more" must always represent a whole word. Once again, wildcards are allowed: a search for "?n" (the letter 'n' preceded by any one character), specified as a whole word, would reduce the concordance to those lines containing one of the words 'an', 'in' or 'on'. A further useful feature is the option of preserving previous filtered lists in memory and adding new lists to them. Thus a concordance for, e.g., "doing" could be successively filtered for the various forms of the verb 'to be', to illustrate the present continuous tense. In Figure 2, the 89-line concordance for "under" has been filtered down to the 35 lines containing the type "pressure".
Export: Any list may be appended to a file named "context.out" in the same directory as the corpus. The two halves of each concordance line are separated by a tab, thus making it easy to convert the text into a table using a standard word processor.
Font & Sentence to front: These two facilities are designed for use with a further facility. Next to the number of entries at the top right corner of the box is a small button called the Maximising button. Clicking on this expands the concordance/profile box to full-screen size. If the font is then too small, it can be changed by selecting 'Font'. However, enlarging the box obscures the full sentence in its normal position; selecting 'Sentence to front' corrects this.
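The sorting, filtering and export operations of the Contexts box can be sketched as below, assuming each concordance line is held as a (left context, keyword, right context) tuple. This is a hypothetical Python illustration, not WinATA's code: context width is counted in words rather than characters, a whole-word match is taken to mean a match bounded by spaces or the edge of the text, and the exact layout of context.out beyond the tab between the two halves is an assumption. A one-off filter is simply a call to cumulative_filter with a single substring.

```python
import re


def concordance(tokens, keyword, width=5):
    """KWIC lines as (left, keyword, right); width is in words (an assumption)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def sort_left(lines):
    # Alphabetise on the nearest left word first, then the next one out.
    return sorted(lines, key=lambda line: line[0].split()[::-1])


def sort_right(lines):
    # Alphabetise on the first right word, then the second, and so on.
    return sorted(lines, key=lambda line: line[2].split())


def _pattern(substring, whole_word):
    if whole_word:
        # Wildcards stay inside one word; the match must be bounded by
        # whitespace or the edge of the text.
        body = "".join(r"\S*" if c == "*" else r"\S" if c == "?" else re.escape(c)
                       for c in substring)
        return re.compile(r"(?<!\S)" + body + r"(?!\S)")
    body = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c)
                   for c in substring)
    return re.compile(body)


def _target(line, side):
    left, kw, right = line
    # Keep the space next to the keyword so that "er " can match "cheaper ".
    if side == "left":
        return left + " "
    if side == "right":
        return " " + right
    return f"{left} {kw} {right}"


def cumulative_filter(lines, substrings, side="anywhere", whole_word=False):
    """Filter for each substring in turn, appending each new result to the
    lines already kept (previous results are preserved, not replaced)."""
    chosen, kept = set(), []
    for substring in substrings:
        pattern = _pattern(substring, whole_word)
        for i, line in enumerate(lines):
            if i not in chosen and pattern.search(_target(line, side)):
                chosen.add(i)
                kept.append(line)
    return kept


def export_lines(lines, path):
    """Append lines to a file, left half and right half separated by a tab."""
    with open(path, "a", encoding="utf-8") as out:
        for left, kw, right in lines:
            out.write(f"{left}\t{kw} {right}".rstrip() + "\n")
```

For example, on a concordance for "than", cumulative_filter(lines, ["more"], side="left", whole_word=True) keeps lines with the word "more" before the keyword while skipping lines where the only match is inside "moreover".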

10. The Synoptic Profile
This is a researcher's tool, and not intended for language learners; it is designed to be richly informative rather than user friendly. It is called a 'synoptic profile' because it shows at a glance all the types occurring at each position in a concordance, together with the number of times each occurs in that position. Thus if a synoptic profile for 'close' shows '60 to' in position -1 (immediately before the keyword) and '50 to' in position +1 (immediately after the keyword), we know we are (almost certainly) dealing with 60 cases of the verb 'to close' and 50 of the adjective in 'close to'. Figure 3 shows a full-screen example, summarising a 170-line concordance for 'out'.

Figure 3. A synoptic profile for "out". For explanation see text.

Here the eight columns represent the eight positions around the keyword, from -4 (four words to the left) to +4 (four words to the right). If, for example, one is investigating the right and left associativity of 'out', it is immediately clear (to the trained eye) that the particle bonds strongly to form 'out+of' and 'due+out', and that the phrasal verb 'take+out' is widely used, all in the context of money market reports. The researcher might next wish to return to the full-screen concordance and reduce the full list for 'out' to instances where the keyword is preceded by some form of 'take' and followed by 'shortage'. The result is shown in Figure 4.

Figure 4. A concordance for 'out', cumulatively filtered for all forms of 'take' and then for 'shortage'.
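The synoptic profile described in this section amounts to a separate frequency count for each of the eight positions around the keyword. A minimal Python sketch, assuming the corpus is available as a list of word tokens, is shown below. The function names and the column layout are illustrative only, and ties within a column are listed in order of first appearance rather than by any rule WinATA is known to use.

```python
from collections import Counter


def synoptic_profile(tokens, keyword, span=4):
    """For each position -span..-1 and +1..+span around every occurrence of
    keyword, count how often each type appears there."""
    profile = {pos: Counter() for pos in range(-span, span + 1) if pos != 0}
    for i, tok in enumerate(tokens):
        if tok != keyword:
            continue
        for pos, counts in profile.items():
            j = i + pos
            if 0 <= j < len(tokens):  # positions beyond the text edge are skipped
                counts[tokens[j]] += 1
    return profile


def format_profile(profile, rows=5, width=14):
    """Lay the profile out in columns, most frequent type at the top of each."""
    positions = sorted(profile)
    columns = [[f"{n} {w}" for w, n in profile[p].most_common(rows)]
               for p in positions]
    lines = ["".join(f"{p:+d}".ljust(width) for p in positions).rstrip()]
    for r in range(rows):
        cells = [col[r] if r < len(col) else "" for col in columns]
        if not any(cells):
            break
        lines.append("".join(c.ljust(width) for c in cells).rstrip())
    return "\n".join(lines)
```

Reading a column such as -1 for 'close' then gives exactly the kind of evidence discussed above: a large count for 'to' immediately before the keyword points to the verb, and a large count for 'to' at +1 points to 'close to'.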

See also Notes for the advanced user contained in a separate document.

© 20th April 2000 Language Studies Unit, Aston University