In the shade of a coconut palm, Chandrika tilts her smartphone screen to avoid the sun’s glare. It is early morning in Alahalli village in the southern Indian state of Karnataka, but the heat and humidity are rising fast. As Chandrika scrolls, she clicks on several audio clips in succession, demonstrating the simplicity of the app she recently started using. At each tap, the sound of her voice speaking her mother tongue emerges from the phone.
Before she started using this app, 30-year-old Chandrika (who, like many South Indians, uses the first letter of her father's name, K., instead of a last name) had just 184 rupees ($2.25) in her bank account. But in return for around six hours of work spread over several days in late April, she received 2,570 rupees ($31.30). That’s roughly the same amount she makes in a month of working as a teacher at a distant school, after the cost of the three buses it takes her to get there and back. Unlike her day job, the app doesn’t make her wait until the end of the month for payment; money lands in her bank account in just a few hours. Just by reading text aloud in her native language of Kannada, spoken by around 60 million people mostly in central and southern India, Chandrika has used this app to earn an hourly wage of about $5, nearly 20 times the Indian minimum. And in a few days, more money will arrive—a 50% bonus, awarded once the voice clips are validated as accurate.
Chandrika’s voice can fetch this sum because of the boom in artificial intelligence (AI). Right now, cutting-edge AIs—for example, large language models like ChatGPT—work best in languages like English, where text and audio data is abundant online. They work much less well in languages like Kannada, which, even though it is spoken by millions of people, is scarce on the internet. (Wikipedia has 6 million articles in English, for example, but only 30,000 in Kannada.) When they function at all, AIs in these “lower resourced” languages can be biased—by regularly assuming that doctors are men and nurses women, for example—and can struggle to understand local dialects. To create an effective English-speaking AI, it is enough to simply collect data from where it has already accumulated. But for languages like Kannada, you need to go out and find more.
This has created huge demand for datasets—collections of text or voice data—in languages spoken by some of the poorest people in the world. Part of that demand comes from tech companies seeking to build out their AI tools. Another big chunk comes from academia and governments, especially in India, where English and Hindi have long held outsize precedence in a nation of some 1.4 billion people with 22 official languages and at least 780 more indigenous ones. This rising demand means that hundreds of millions of Indians are suddenly in control of a scarce and newly valuable asset: their mother tongue.
Data work—creating or refining the raw material at the heart of AI— is not new in India. The economy that did so much to turn call centers and garment factories into engines of productivity at the end of the 20th century has quietly been doing the same with data work in the 21st. And, like its predecessors, the industry is once again dominated by labor arbitrage companies, which pay wages close to the legal minimum even as they sell data to foreign clients for a hefty mark-up. The AI data sector, worth over $2 billion globally in 2022, is projected to rise in value to $17 billion by 2030. Little of that money has flowed down to data workers in India, Kenya, and the Philippines.
These conditions may cause harms far beyond the lives of individual workers. “We’re talking about systems that are impacting our whole society, and workers who make those systems more reliable and less biased,” says Jonas Valente, an expert in digital work platforms at Oxford University’s Internet Institute. “If you have workers with basic rights who are more empowered, I believe that the outcome—the technological system—will have a better quality as well.”
In the neighboring villages of Alahalli and Chilukavadi, one Indian startup is testing a new model. Chandrika works for Karya, a nonprofit launched in 2021 in Bengaluru (formerly Bangalore) that bills itself as “the world’s first ethical data company.” Like its competitors, it sells data to big tech companies and other clients at the market rate. But instead of keeping much of that cash as profit, it covers its costs and funnels the rest toward the rural poor in India. (Karya partners with local NGOs to ensure that access to its jobs goes first to the poorest of the poor, as well as historically marginalized communities.) In addition to its $5 hourly minimum, Karya gives workers de-facto ownership of the data they create on the job, so whenever it is resold, the workers receive the proceeds on top of their past wages. It’s a model that doesn’t exist anywhere else in the industry.
“The wages that exist right now are a failure of the market,” Manu Chopra, the 27-year-old CEO of Karya, tells me. “We decided to be a nonprofit because fundamentally, you can’t solve a market failure in the market.”
The work Karya is doing also means that millions of people whose languages are marginalized online could stand to gain better access to the benefits of technology—including AI. “Most people in the villages don’t know English,” says Vinutha, a 23-year-old student who has used Karya to reduce her financial reliance on her parents. “If a computer could understand Kannada, that would be very helpful.”
The catch, if you can call it that, is that the work is supplementary. The first thing Karya tells its workers is: This is not a permanent job, but rather a way to quickly get an income boost that will allow you to go on and do other things. The maximum a worker can earn through the app is the equivalent of $1,500, roughly the average annual income in India. After that point, they make way for somebody else. Karya says it has paid out 65 million rupees (nearly $800,000) in wages to some 30,000 rural Indians up and down the country. By 2030, Chopra wants it to reach 100 million people. “I genuinely feel this is the quickest way to move millions of people out of poverty if done right,” says Chopra, who was born into poverty and won a scholarship to Stanford that changed his trajectory. “This is absolutely a social project. Wealth is power. And we want to redistribute wealth to the communities who have been left behind.”
Chopra isn’t the first tech founder to rhapsodize about the potential of AI data work to benefit the world’s poorest. Sama, an outsourcing company that has handled contracts for OpenAI’s ChatGPT and Meta’s Facebook, also marketed itself as an “ethical” way for tech companies to lift people in the Global South out of poverty. But as I reported in January, many of its ChatGPT workers in Kenya—some earning less than $2 per hour—told me they were exposed to training data that left them traumatized. The company also performed similar content moderation work for Facebook; one worker on that project told me he was fired when he campaigned for better working conditions. When asked by the BBC about low wages in 2018, Sama’s late founder argued that paying workers higher wages could disrupt local economies, causing more harm than good. Many of the data workers I’ve spoken to while reporting on this industry for the past 18 months have bristled at this logic, saying it’s a convenient narrative for companies that are getting rich off the proceeds of their labor.
There is another way, Chopra argues. “The biggest lesson I have learned over the last 5 years is that all of this is possible,” he wrote in a series of tweets in response to my January article on ChatGPT. “This is not some dream for a fictional better world. We can pay our workers 20 times the minimum wage, and still be a sustainable organization.”
It was the first I’d heard of Karya, and my immediate instinct was skepticism. Sama too had begun its life as a nonprofit focused on poverty eradication, only to transition later to a for-profit business. Could Karya really be a model for a more inclusive and ethical AI industry? Even if it were, could it scale? One thing was clear: there could be few better testing grounds for these questions than India—a country where mobile data is among the cheapest in the world, and where it is common for even poor rural villagers to have access to both a smartphone and a bank account. Then there is the potential upside: even before the pandemic some 140 million people in India survived on under $2.15 per day, according to the World Bank. For those people, cash injections of the magnitude Chopra was talking about could be life-changing.
Just 70 miles from the bustling tech metropolis of Bengaluru, past sugarcane fields and under the bright orange arcs of blossoming gulmohar trees, is the village of Chilukavadi. Inside a low concrete building, the headquarters of a local farming cooperative, a dozen men and women are gathered—all of whom have started working for Karya within the past week.
Kanakaraj S., a skinny 21-year-old, sits cross-legged on the cool concrete floor. He is studying at a nearby college, and to pay for books and transport costs he occasionally works as a casual laborer in the surrounding fields. A day’s work can earn him 350 rupees (around $4), but this kind of manual labor is becoming more unbearable as climate change makes summers here even more sweltering than usual. Working in a factory in a nearby city would mean a slightly higher wage, but also hours of daily commuting on unreliable and expensive buses or, worse, moving away from his support network to live in dormitory accommodation in the city.
At Karya, Kanakaraj can earn more in an hour than he makes in a day in the fields. “The work is good,” he says. “And easy.” Chopra says that’s a typical refrain when he meets villagers. “They’re happy we pay them well,” he says, but more importantly, “it’s that it’s not hard work. It’s not physical work.” Kanakaraj was surprised when he saw the first payment land in his bank account. “We’ve lost a lot of money from scams,” he tells me, explaining that it is common for villagers to receive SMS texts preying on their desperation, offering to multiply any deposits they make by 10. When somebody first told him about Karya he assumed it was a similar con—a common initial response, according to Chopra.
With so little in savings, local people often find themselves taking out loans to cover emergency costs. Predatory agencies tend to charge high interest rates on these loans, leaving some villagers here trapped in cycles of debt. Chandrika, for example, will use some of her Karya wages to help her family pay off a large medical loan incurred when her 25-year-old sister fell ill with low blood pressure. Despite the medical treatment, her sister died, leaving the family responsible for both an infant and a mountain of debt. “We can figure out how to repay the loan,” says Chandrika, a tear rolling down her cheek. “But we can’t bring back my sister.” Other Karya workers find themselves in similar situations. Ajay Kumar, 25, is drowning in medical debt taken out to address his mother’s severe back injury. And Shivanna N., 38, lost his right hand in a firecracker accident as a boy. While he doesn’t have debt, his disability means he struggles to make a living.
The work these villagers are doing is part of a new project that Karya is rolling out across the state of Karnataka for an Indian healthcare NGO seeking speech data about tuberculosis—a mostly curable and preventable disease that still kills around 200,000 Indians every year. The voice recordings, collected in 10 different dialects of Kannada, will help train an AI speech model to understand local people’s questions about tuberculosis, and respond with information aimed at reducing the spread of the disease. The hope is that the app, when completed, will make it easier for illiterate people to access reliable information, without shouldering the stigma that tuberculosis patients—victims of a contagious disease—often attract when they seek help in small communities. The recordings will also go up for sale on Karya’s platform as part of its Kannada dataset, on offer to the many AI companies that care less about the contents of their training data than what it encodes about the overall structure of the language. Every time it’s resold, 100% of the revenue will be distributed to the Karya workers who contributed to the dataset, apportioned by the hours they put in.
Rajamma M., a 30-year-old woman from a nearby village, previously worked as a COVID-19 surveyor for the government, going from door to door checking if people had been vaccinated. But the work dried up in January. The money from working for Karya, she says, has been welcome—but more than that, she has appreciated the opportunity to learn. “This work has given me greater awareness about tuberculosis and how people should take their medicine,” she says. “This will be helpful for my job in the future.”
Although small, Karya already has a list of high-profile clients including Microsoft, MIT, and Stanford. In February, it began work on a new project for the Bill and Melinda Gates Foundation to build voice datasets in five languages spoken by some 1 billion Indians—Marathi, Telugu, Hindi, Bengali, and Malayalam. The end goal is to build a chatbot that can answer rural Indians’ questions, in their native languages and dialects, about health care, agriculture, sanitation, banking, and career development. This technology (think of it as a ChatGPT for poverty eradication) could help share the knowledge needed to improve quality of life across vast swaths of the subcontinent.
“I think there should be a world where language is no longer a barrier to technology—so everyone can use technology irrespective of the language they speak,” says Kalika Bali, a linguist and principal researcher at Microsoft Research who is working with the Gates Foundation on the project and is an unpaid member of Karya’s oversight board. She has specifically designed the prompts workers are given to read aloud to mitigate the gender biases that often creep into datasets and thus help to avoid the “doctor” and “nurse” problem. But it’s not just about the prompts. Karya’s relatively high wages “percolate down to the quality of the data,” Bali says. “It will immediately result in better accuracy of the system’s output.” She says she typically receives data with a less than 1% error rate from Karya, “which is almost never the case with data that we build [AI] models with.”
Over the course of several days together, Chopra tells me a version of his life story that makes his path toward Karya feel simultaneously impossible and inevitable. He was born in 1996 in a basti, an informal settlement, next to a railway line in Delhi. His grandparents had arrived there as refugees from Pakistan during the partition of British India in 1947, and there the family had remained for two generations. Although his parents were well-educated, he says, they sometimes struggled to put food on the table. He could tell when his father, who ran a small factory making train parts, had had a good day at work because dinner would be the relatively expensive Maggi instant ramen, not cheap lentil dal. Every monsoon the basti’s gutters would flood, and his family would have to move in with his grandmother nearby for a few days. “I think all of us have a keen recognition of the idea that money is a cushion from reality,” Chopra says of the Karya team. “Our goal is to give that cushion to as many people as possible.”
Chopra excelled at the basti’s local school, which was run by an NGO. When he was in ninth grade he won a scholarship to a private school in Delhi, which was running a competition to give places to kids from poor backgrounds. Though he was bullied, he acknowledges that certain privileges helped open doors for him. “As difficult as my journey was, it was significantly easier than most people’s in India,” he says, “because I was born to two educated parents in an upper-caste family in a major city.”
When Chopra was 17, a woman was fatally gang-raped on a bus in Delhi, a crime that shocked India and the world. Chopra, who was discovering a love for computer science at the time and idolized Steve Jobs, set to work. He built a wristwatch-style “anti-molestation device,” which could detect an elevated heart rate and administer a weak electric shock to an attacker, intended to give the victim time to escape. The device grabbed the attention of the media and India’s former President, Dr. A.P.J. Abdul Kalam, who encouraged Chopra to apply for a scholarship at Stanford. (The only thing Chopra knew about Stanford at the time, he recalls, is that Jobs studied there. Later he discovered even that wasn’t true.) Only later did Chopra realize the naivety of trying to solve the problem of endemic sexual violence with a gadget. “Technologists are very prone to seeing a problem and building to solve it,” he says. “It’s hard to critique an 11th grade kid, but it was a very technical solution.”
As he tells it, his arrival in California was a culture shock in more ways than one. On his first night, Chopra says, each student in his dorm explained how they planned to make their first billion dollars. Somebody suggested building “Snapchat for puppies,” he recalls. Everyone there aspired to be a billionaire, he realized, except him. “Very early at Stanford, I felt alone, like I was in the wrong place,” Chopra says. Still, he had come to college as a “techno-utopian,” he says. That gradually fell away as he learned in class about how IBM had built systems to support Apartheid in South Africa, and other ways technology companies had hurt the world by chasing profit alone.
Returning to India after college, Chopra joined Microsoft Research, a subsidiary of the big tech company that gives researchers a long leash to work on difficult social problems. With his colleague Vivek Seshadri, he set out to research whether it would be possible to channel money to rural Indians using digital work. One of Chopra’s first field visits was to a center operated by an AI data company in Mumbai. The room was hot and dirty, he recalls, and full of men hunched over laptops doing image annotation work. When he asked them how much they were earning, they told him they made 30 rupees per hour, or just under $0.40. He didn’t have the heart to tell them the going rate for the data they were annotating was, conservatively, 10 times that amount. “I thought, this cannot be the only way this work can happen,” he says.
Chopra and Seshadri worked on the idea for four years at Microsoft Research, doing field studies and building a prototype app. They discovered an “overwhelming enthusiasm” for the work among India’s rural poor, according to a paper they published with four colleagues in 2019. The research confirmed Chopra and Seshadri’s suspicions that the work could be done to a high standard of accuracy even with no training, from a smartphone rather than a physical office, and without workers needing the ability to speak English—thus making it possible to reach not just city-dwellers but the poorest of the poor in rural India. In 2021 Chopra and Seshadri, with a grant from Microsoft Research, quit their jobs to spin Karya out as an independent nonprofit, joined by a third cofounder, Safiya Husain. (Microsoft holds no equity in Karya.)
Unlike many Silicon Valley rags-to-riches stories, Chopra’s trajectory, in his telling, wasn’t a result of his own hard work. “I got lucky 100 times in a row,” he says. “I’m a product of irrational compassion from nonprofits, from schools, from the government—all of these places that are supposed to help everyone, but they don’t. When I have received so much compassion, the least I can do is give back.”
Not everybody is eligible to work for Karya. Chopra says that initially he and his team opened the app up to anybody, only to realize the first hundred sign-ups were all men from a dominant-caste community. The experience taught him that “knowledge flows through the channels of power,” Chopra says. To reach the poorest communities—and marginalized castes, genders, and religions—Chopra learned early on that he had to team up with nonprofits with a grassroots presence in rural areas. Those organizations could distribute access codes on Karya’s behalf in line with income and diversity requirements. “They know for whom that money is nice to have, and for whom it is life-changing,” he says. This process also ensures more diversity in the data that workers end up generating, which can help to minimize AI bias.
Chopra defines this approach using a Hindi word—thairaav—a term from Indian classical music which he translates as a mixture between “pause” and “thoughtful impact.” It’s a concept, he says, that is missing not only from the English language, but also from the business philosophies of Silicon Valley tech companies, which often put scale and speed above all else. Thairaav, to him, means that “at every step, you are pausing and thinking: Am I doing the right thing? Is this right for the community I’m trying to serve?” That kind of thoughtfulness “is just missing from a lot of entrepreneurial ‘move fast and break things’ behavior,” he says. It’s an approach that has led Karya to flatly reject four offers so far from prospective clients to do content moderation that would require workers to view traumatizing material.
It’s compelling. But it’s also coming from a guy who says he wants to scale his app to reach 100 million Indians by 2030. Doesn’t Karya’s reliance on grassroots NGOs to onboard every new worker mean it faces a significant bottleneck? Actually, Chopra tells me, the limiting factor to Karya’s expansion isn’t finding new workers. There are millions who will jump at the chance to earn its high wages, and Karya has built a vetted network of more than 200 grassroots NGOs to onboard them. The bottleneck is the amount of available work. “What we need is large-scale awareness that most data companies are unethical,” he says, “and that there is an ethical way.” For the app to have the impact Chopra believes it can, he needs to win more clients—to persuade more tech companies, governments, and academic institutions to get their AI training data from Karya.
But it’s often in the pursuit of new clients that even companies that pride themselves on ethics can end up compromising. What’s to stop Karya doing the same? Part of the answer, Chopra says, lies in Karya’s corporate structure. Karya is registered as a nonprofit in the U.S. that controls two entities in India: one nonprofit and one for-profit. The for-profit is legally bound to donate any profits it makes (after reimbursing workers) to the nonprofit, which reinvests them. The convoluted structure, Chopra says, is necessary because Indian law prevents nonprofits from making any more than 20% of their income from the market, as opposed to philanthropic donations. Karya does take grant funding—crucially, it covers the salaries of all 24 of its full-time employees—but not enough to make an entirely nonprofit model possible. The arrangement, Chopra says, has the benefit of removing any incentive for him or his co-founders to compromise on worker salaries or well-being in return for lucrative contracts.
It’s a model that works for the moment, but could collapse if philanthropic funding dries up. “Karya is very young, and they have got a lot of good traction,” says Subhashree Dutta, a managing partner at The/Nudge Institute, an incubator that has supported Karya with a $20,000 grant. “They have the ability to stay true to their values and still attract capital. But I don’t think they have been significantly exposed to the dilemmas of taking the for-profit or not-for-profit stance.”
Over the course of two days with Karya workers in southern Karnataka, the limitations of Karya’s current system begin to come into focus. Each worker says they have completed 1,287 tasks on the app—the maximum, at the point of my visit, of the number of tasks available on the tuberculosis project. It equates to about six hours of work. The money workers can receive (just under $50 after bonuses for accuracy) is a welcome boost but won’t last long. On my trip I don’t meet any workers who have received royalties. Chopra tells me that Karya has only just amassed enough resellable data to be attractive to buyers; it has so far distributed $116,000 in royalties to around 4,000 workers, but the ones I've met are too early into their work to be among them.
I put to Chopra that it will still take much more to have a meaningful impact on these villagers’ lives. The tuberculosis project is only the beginning for these workers, he replies. They are lined up to shortly begin work on transcription tasks—part of a push by the Indian government to build AI models in several regional languages including Kannada. That, he says, will allow Karya to give “significantly more” work to the villagers in Chilukavadi. Still, the workers are a long way from the $1,500 that would mark their graduation from Karya’s system; Chopra acknowledges that, so far, not a single one of Karya’s 30,000 workers has reached that threshold. Yet their enjoyment of the work, and their desire for more, is clear: when Seshadri, now Karya’s chief technology officer, asks the room full of workers whether they would feel capable of a new task flagging inaccuracies in Kannada sentences, they erupt in excited chatter: a unanimous yes.
The villagers I speak to in Chilukavadi and Alahalli have only a limited understanding of artificial intelligence. Chopra says this can be a challenge when explaining to workers what they’re doing. The most successful approach his team has found is telling workers they are “teaching the computer to speak Kannada,” he says. Nobody here knows of ChatGPT, but villagers do know that Google Assistant (which they refer to as “OK Google”) works better when you prompt it in English than in their mother tongue. Siddaraju L., a 35-year-old unemployed father of three, says he doesn’t know what AI is, but would feel proud if a computer could speak his language. “The same respect I have for my parents, I have for my mother tongue.”
Just as India was able to leapfrog the rest of the world on 4G because it was unencumbered by existing mobile data infrastructure, the hope is that efforts like the ones Karya is enabling will help Indian-language AI projects learn from the mistakes of English AIs and begin from a far more reliable and unbiased starting point. “Until not so long ago, a speech-recognition engine for English would not even understand my English,” says Bali, the speech researcher, referring to her accent. “What is the point of AI technologies being out there if they do not cater to the users they are targeting?”