Technical details

This project is co-ordinated by Nick Thieberger as part of his ARC-funded Future Fellowship grant. He arranged with the National Library of Australia (NLA) to have all the microfilmed images from Section XII of the Bates papers digitised. The 24,000 images were renamed following the NLA's manuscript naming convention, and the typescripts in that collection (some 4,000 pages) were sent to be typed. The typed versions of this questionnaire-based material use tables to distinguish the words from their meanings. When the typed versions were returned, we added tags to the content.

Conal Tuohy designed the structure of the dataset according to the Text Encoding Initiative (TEI) P5 Guidelines, so that it embodies both a facsimile of the original set of manuscripts and a structured dataset that can support complex research questions.

Where possible, each language represented in a wordlist is identified, and words from that language are tagged so that searches can distinguish them from English terms. Places, language names, 'tribe' or local group names, and individual names are also tagged so that they can be searched. Each document is also geocoded so that it can be presented on the map of words.
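As a rough illustration of how tagging of this kind supports searching, the sketch below reads a small TEI fragment with Python's standard library and pulls out the language-tagged words and the tagged place names. The fragment is invented for the example: the element choices (foreign with an xml:lang attribute, placeName), the language code "x-example", and the words "wordA" and "wordB" are all assumptions, not the project's actual encoding or data.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical TEI fragment: element choices, language code and words
# are placeholders, not taken from the real dataset.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p>Recorded at <placeName>Example Station</placeName>.</p>
      <table>
        <row>
          <cell>Kangaroo</cell>
          <cell><foreign xml:lang="x-example">wordA</foreign></cell>
        </row>
        <row>
          <cell>Water</cell>
          <cell><foreign xml:lang="x-example">wordB</foreign></cell>
        </row>
      </table>
    </body>
  </text>
</TEI>"""


def tagged_words(tei_xml):
    """Return (language code, word) pairs for every language-tagged word."""
    root = ET.fromstring(tei_xml)
    pairs = []
    for el in root.iter(f"{{{TEI_NS}}}foreign"):
        lang = el.get(f"{{{XML_NS}}}lang")
        word = "".join(el.itertext()).strip()
        pairs.append((lang, word))
    return pairs


def tagged_names(tei_xml, tag):
    """Return the text of every element with the given TEI tag name."""
    root = ET.fromstring(tei_xml)
    return ["".join(el.itertext()).strip()
            for el in root.iter(f"{{{TEI_NS}}}{tag}")]


print(tagged_words(SAMPLE))
print(tagged_names(SAMPLE, "placeName"))
```

Because the English headwords in this invented fragment are left untagged, a search over the language-tagged elements never returns them, which is the kind of separation the tagging is meant to provide.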

Only words that occur in the questionnaire (around 1800 words, listed here) are presented on the map of words.
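The restriction to questionnaire words can be pictured as a simple filter over geocoded entries. The following is a minimal sketch under stated assumptions: the Entry record, the map_points function, the coordinates and the sample words are all hypothetical and are not drawn from the project's code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One recorded word with the location of its source document (hypothetical shape)."""
    english: str   # English headword
    word: str      # word in the recorded language
    lat: float     # latitude of the geocoded document
    lon: float     # longitude of the geocoded document


def map_points(entries, questionnaire):
    """Keep only entries whose English headword is in the questionnaire list."""
    allowed = {term.lower() for term in questionnaire}
    return [e for e in entries if e.english.lower() in allowed]


# Invented sample data, for illustration only.
QUESTIONNAIRE = ["Kangaroo", "Water"]
ENTRIES = [
    Entry("kangaroo", "wordA", -30.0, 120.0),
    Entry("motor car", "wordB", -31.0, 121.0),
]

for point in map_points(ENTRIES, QUESTIONNAIRE):
    print(point.english, point.word, point.lat, point.lon)
```

Matching is case-insensitive here so that a headword typed as "Kangaroo" or "kangaroo" is treated the same; whether the real dataset needs that normalisation is an assumption.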

The "Map of vocabularies" lets you look at any given vocabulary and see where it was recorded.