The logo is 'English-Māori Word Translator'. The image is 'Kopu'.

Papers, Data Sets and Documentation



The data resources and documentation below are available for download. If you use them, we ask that you give appropriate recognition and acknowledgement to the developers who supplied the data. Thank you.


A paper presented at the
5th IASTED International Conference on Signal and Image Processing
August 2003
Honolulu, Hawaii, USA

"Speech data analysis for diphone construction of a Māori online text-to-speech synthesizer"
by Mark Laws
Synthesizer.pdf, Synthesizer.ppt


Two papers presented at the
Fifth Joint Conference on Information Sciences
February 27-March 3, 2000
Trump Taj Mahal Casino and Resort
Atlantic City, NJ, USA

"Analysis of the New Zealand and Māori On-Line Translator"
by Mark Laws, Richard Kilgour and Michael Watts.
Translator.html, Translator.pdf, Translator.ppt

"Development of a Māori Database for Speech Perception and Generation"
by Mark Laws
Database.html, Database.pdf, Database.ppt


Two additional papers
A Bilingual Information System:
"The Computational Linguistic Engineering of English and Māori"
for Speech Perception and Generation
by Mark Laws
ABIS.pdf

"Management Of Otago Speech Environment"
(MOOSE)
by Mark Laws and Richard Kilgour
MOOSE.pdf


Text Datasets:
[TXT] (54k) 3000 Common English words with Māori translations.
[TXT] (38k) 2000 Common English words with Māori translations (a reduced set).
[TXT] (2k) 102 Artificial Intelligence (AI) terms (still waiting to be translated).
[TXT] (6k) 500 Most commonly used Māori words. Based on Benton (1982) and other works.
[TXT] (10k) 1300 Most frequently used Māori words.
[TXT] (8k) 1000 High frequency NZ English words from the Linguistics Department, Victoria University of Wellington.
[TXT] (52k) 6300 Māori words from the English-Māori Word Translator.
[TXT] (56k) 6800 NZ English words from the English-Māori Word Translator.
[TXT] (8k) 300 Common computing terms in English and Māori. This file is from: Te Taka Keegan and Treweek, P. (1994).
[TXT] (6k) 126 Proposed internet terminologies in English and Māori. This file has been developed by: Peter J Keegan (1999).
[TXT] (44k) 2000 NZ English words with pronunciations derived from: (7Mb) BEEP Database by Tony Robinson (1996).


Database Tables:

[TXT] (500k) English Database Table - 7000 indexed English words.
[TXT] (550k) Māori Database Table - 8300 indexed Māori words.
[TXT] (600k) Translation Database Table - 13500 Possible translations.


Otago Speech Corpus:


[DIR] Information about the entire Otago Speech Corpus.


Documents:

[PDF] (98k) 45 New Zealand English phonemes from the Otago Speech Corpus.
[PDF] (1M) Database Models for Language and Linguistic Integration (Chapter 6 from M Laws' PhD).
[PDF] (1M) Statistical Data Analysis of the English-Māori Word Translator (Chapter 7 from M Laws' PhD).
[PDF] (150k) Development of a Proto-Central-Eastern-Polynesian Speech and Language Database Translation System [PolySys:] (M Laws' Post-Doc).
[PDF] (216k) & [PPT] (4.4M) Building New Language Bridges Across Polynesia: The Development of a Multilingual Online Translation System (M Laws' Open Lecture).


Dr Mark R. Laws maintains the Translation data sets.