About the Otago Speech Corpus


Introduction:

This database contains the Otago Speech Corpus in its entirety. Additionally, an incomplete Māori speech corpus is included, containing phoneme and word examples. Incomplete segmentation data is also included, with code to segment the corpus into phonemes.

Speakers in the Otago Speech Corpus:

The speakers in the Otago Speech Corpus have been assigned numbers from one to ninety-nine. The following table shows a summary of information about them.

Speaker  Gender  Digits  Corpus  Segmented  Available
01       M       Y       N       -          -
02       M       Y       Y       Y          Y
03       F       Y       N       -          -
04       M       Y       N       -          -
05       M       Y       N       -          -
06       M       Y       N       -          -
07       F       Y       Y       Y          N
08       M       Y       N       -          -
09       M       Y       N       -          -
10       M       Y       Y       Y          N
11       M       Y       N       -          -
12       M       Y       Y       Y          Y
13       F       Y       N       -          -
14       M       Y       N       -          -
15       F       Y       N       -          -
16       F       Y       N       -          -
17       F       Y       Y       Y          Y
18       F       Y       Y       Y          Y
19       F       Y       N       -          -
20       F       Y       Y       Y          N
21       F       Y       Y       Y          N
22       M       N       Y       Y          Y
23       M       N       Y       Y          Y
24       F       N       1       N          N
25       M       N       Y       Y          N
26       M       N       Y       Y          Y
27       F       N       Y       N          N
28       M       N       Y       N          N
29       F       N       Y       N          N
30       F       N       Y       Y          N
31       M       N       Y       N          N
32       F       N       Y       Y          Y
33       F       N       Y       Y          N
34       M       N       Y       N          N
35       F       N       Y       N          N
93       M       N       Y       -          -
96       M       N       1,2,3   -          -

The first 21 speakers were students from the INFO303 class, and make up the Digit Corpus. Of these, 11 are male and 10 are female. All the Digit Corpus data is segmented and available. Of the Otago Speech Corpus proper, most of the speakers have had their speech segmented, although only 8 speakers are currently available online.
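
The counts quoted above can be checked directly against the speaker table. The following is a minimal Python sketch that transcribes the table rows into a string and tallies them. The names ROWS and parse_rows are illustrative only and are not part of the corpus distribution.

```python
# Rows transcribed from the speaker table in this document.
# Columns: speaker, gender, digits, corpus, segmented, available
ROWS = """
01 M Y N - -
02 M Y Y Y Y
03 F Y N - -
04 M Y N - -
05 M Y N - -
06 M Y N - -
07 F Y Y Y N
08 M Y N - -
09 M Y N - -
10 M Y Y Y N
11 M Y N - -
12 M Y Y Y Y
13 F Y N - -
14 M Y N - -
15 F Y N - -
16 F Y N - -
17 F Y Y Y Y
18 F Y Y Y Y
19 F Y N - -
20 F Y Y Y N
21 F Y Y Y N
22 M N Y Y Y
23 M N Y Y Y
24 F N 1 N N
25 M N Y Y N
26 M N Y Y Y
27 F N Y N N
28 M N Y N N
29 F N Y N N
30 F N Y Y N
31 M N Y N N
32 F N Y Y Y
33 F N Y Y N
34 M N Y N N
35 F N Y N N
93 M N Y - -
96 M N 1,2,3 - -
"""

COLUMNS = ["speaker", "gender", "digits", "corpus", "segmented", "available"]


def parse_rows(text):
    """Turn each non-empty line into a dict keyed by column name."""
    return [dict(zip(COLUMNS, line.split())) for line in text.strip().splitlines()]


speakers = parse_rows(ROWS)

# Speakers who recorded the Digit Corpus (Digits column is "Y").
digit_speakers = [s for s in speakers if s["digits"] == "Y"]
males = sum(s["gender"] == "M" for s in digit_speakers)
females = sum(s["gender"] == "F" for s in digit_speakers)

# Speakers whose Available column is "Y".
available = [s["speaker"] for s in speakers if s["available"] == "Y"]

print(len(digit_speakers), males, females)  # 21 11 10
print(available)  # ['02', '12', '17', '18', '22', '23', '26', '32']
```

The tallies agree with the text: 21 Digit Corpus speakers (11 male, 10 female) and 8 speakers marked as available.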

Words in the Otago Speech Corpus:

The following table shows all the words in the Otago Speech Corpus, and their phonetic representation. Only the target phoneme has been segmented.

Code  Word    Representation  Target
001   pat     p               p
002   paper   p               p
003   shop    p               p
004   bat     b               b
005   baby    b               b
006   tub     b               b
007   tart    t               t
008   letter  t               t
009   gut     t               t
010   dart    d               d
011   ladder  d               d
012   dead    d               d

013

card