Teaching the machines to speak: Māori innovation and the fight for data sovereignty

Reece Harley
Updated November 6, 2025 - 8.54pm (AWST), first published November 3, 2025 at 5.00pm (AWST)

At last week's World Indigenous Business Forum in Melbourne, Peter Lucas-Jones, Chief Executive Officer of Te Hiku Media and Chair of Te Rūnanga Nui o Te Aupōuri (the National Assembly of the Aupōuri Tribe), stood before an audience of Indigenous entrepreneurs, business leaders and technologists with a challenge that was as cultural as it was technical.

"There is no artificial intelligence without Indigenous data," he said.

"If we do not control our own data, we will lose not just our language, but our identity."

A Legacy Born of Struggle

Te Hiku Media (Te Reo Irirangi o Te Hiku o Te Ika) is no ordinary broadcaster. Founded in 1990 in the far north of Aotearoa (New Zealand), it grew out of a long struggle for Māori control over media and cultural representation.

Its creation followed years of political activism, Waitangi Tribunal claims and court cases that forced the New Zealand Government to recognise Māori language rights in broadcasting.

The organisation represents five iwi (tribes): Ngāti Kuri, Te Aupōuri, Ngāi Takoto, Te Rarawa, and Ngāti Kahu. For decades, Te Hiku Media's purpose has been to revitalise te reo rangatira (the Māori language) and ngā tikanga Māori (customs and traditions) through radio, storytelling, and innovation.

"We were born from struggle," Lucas-Jones said.

"Our grandmothers dreamed of hearing our language on the airwaves. Now we're teaching computers to speak it."

The Spark of Digital Innovation

In 2013, a pivotal meeting at Mahimaru Marae (a traditional meeting place) changed Te Hiku Media's trajectory. Elders gathered to discuss how to carry their language into the digital age. Two resolutions were passed that day: to nurture the Māori language of the region and to use new technology to ensure it would survive.

That same year, the broadcaster launched tehiku.nz, an online platform to store and share its media archive. Soon after, Te Hiku Media began experimenting with live streaming, pioneering online coverage of national events such as Waitangi Day celebrations and the 2014 return of the Polynesian voyaging canoe Hōkūle'a.

These projects built digital capability and laid the foundation for a radical new idea: could a computer learn to understand and speak Māori?

From Archives to Algorithms

By 2017, Te Hiku Media had digitised thousands of hours of archived audio: songs, speeches, and interviews with native speakers. But manually transcribing these recordings proved slow and expensive.

A new project called Kōrero Māori ("Speak Māori"), funded through New Zealand's Ka Hao Fund, set out to automate the process. Partnering with Dragonfly Data Science, a local technology company, the team created kaituhi.nz, an automatic transcription tool using a Māori speech-to-text API. It was the first of its kind.

The success of Kōrero Māori led to the development of a fully synthesised Māori voice and laid the groundwork for what would become Te Hiku Media's most ambitious project: Papa Reo.

Papa Reo: Building a Platform for the Future

Funded by Aotearoa/New Zealand's Ministry of Business, Innovation and Employment, Papa Reo is a seven-year national data science initiative, the only project of its kind not led by a university.

The vision is bold: to build a multilingual natural language processing platform that allows smaller Indigenous communities to create their own speech recognition and AI tools while retaining sovereignty over their data.

"Papa Reo will enable smaller Indigenous language communities to develop their own capabilities," Lucas-Jones explained.

"It ensures the benefits go directly to the people, not to outside corporations."

The project's first stage focuses on te reo Māori, but its methods are designed to support other minority languages such as Hawaiian and Samoan, and even to help multilingual users navigate between Māori and English.

In an era where voice assistants dominate the digital landscape, the stakes are high. Without access to the large datasets that power AI, speakers of small languages are locked out of emerging technologies.

"That further marginalises our language, and reduces our ability to participate in modern society."

Papa Reo aims to change that by creating tools that can work with smaller datasets, making AI accessible to communities that have been invisible in the digital world.

[Image: The Papa Reo project.]

The "Big Marae in the Sky"

Te Hiku Media calls its digital platform "the big marae in the sky", a virtual meeting place where people can speak, listen, and connect in their language. Every day, Māori families across New Zealand and abroad use Te Hiku's online services to listen to interviews, join livestreamed ceremonies, or learn pronunciation.

"Our babies pick up a phone first thing in the morning," said Lucas-Jones.

"Our Elders say prayers on their devices at night. We decided to meet our people where they already were; online."

By combining language revitalisation with technological empowerment, Te Hiku Media has built something few others have: a fully Indigenous-led AI ecosystem.

A New Model for Data Sovereignty

For Lucas-Jones, Indigenous AI is about agency, not automation. Big technology companies, he argues, have failed Indigenous languages precisely because they extract data without building community capability. Papa Reo reverses that approach.

"Each community must maintain control of its data," he said.

"That's how we protect authenticity and create economic opportunity."

He envisions a network of small, Indigenous-owned computing centres powered by renewable energy. These would allow communities to process language data locally, reducing reliance on expensive cloud services and creating new skilled jobs for Indigenous developers and linguists.

Saving Language, Saving the Planet

Lucas-Jones closed his address with a reminder of what is at stake. Eighty per cent of the world's remaining biodiversity exists on Indigenous land, he said, and Indigenous languages hold the ecological knowledge that sustains those environments.

"When we lose a language, we lose our way of seeing the world," he said.

"Language comes from the land-it carries our science, our stories, our philosophy. If we can teach the machines to speak our language, then we can make sure they never forget who we are."


National Indigenous Times