The AI industry is built on geographic and social inequality, research shows

The long arm of global inequality is particularly visible in the development of AI and machine learning systems. In a recent paper, researchers from Cornell, the Université de Montréal, the National Institute of Statistical Sciences (USA), and Princeton argue that this inequality in the AI industry concentrates profits in a few places and increases the risk of overlooking the contexts in which AI is deployed.

As AI systems become more deeply embedded in society, those who develop and deploy them stand to benefit greatly. And if these actors are concentrated in economic powers such as the US, China, and the EU, a disproportionate share of the economic benefits will flow to those regions, exacerbating inequality.

Whether or not in explicit response to this inequality, some have called for broader participation in AI development. At the same time, some have recognized how limited inclusion remains. For example, in an analysis of publications at two major machine learning conferences, NeurIPS 2020 and ICML 2020, none of the top 10 countries by publication index were in Latin America, Africa, or Southeast Asia, the co-authors of the new study note. Moreover, the full lists of the top 100 universities and top 100 companies by publication index included no universities or companies based in Africa or Latin America.

This inequality shows up partly in data collection. Previous research has found that ImageNet and OpenImages, two large, publicly available image datasets, are US- and Euro-centric, and models trained on them perform worse on images from countries in the Global South. For example, images of grooms from Ethiopia and Pakistan are classified less accurately than images of grooms from the United States. Likewise, publicly available object recognition systems fail to correctly classify many objects from the Global South, because things like “weddings” or “spices” look markedly different from one culture to another.

Labels, the annotations from which AI models learn relationships in data, also bear the mark of inequality. Amazon Mechanical Turk is a major venue for crowdsourced labeling work, yet an estimated less than 2% of Mechanical Turk workers come from the Global South, with the vast majority based in the US and India. The tasks are monotonous and poorly paid (on Samasource, another crowdsourcing platform, workers earn around $8 a day), and there are several barriers to participation: workers need a computer and a reliable internet connection, and on Amazon Mechanical Turk, US bank accounts and gift cards are the only payment methods.

As the researchers point out, ImageNet, which has been critical to recent advances in computer vision, would not have been possible without the work of data labelers. Yet the ImageNet labelers themselves earned an average of $2 an hour, and only 4% earned more than the US federal minimum wage of $7.25 an hour, which is itself far from a living wage.

“Data labeling is an integral part of the data collection pipeline, yet it is an extremely poorly paid job made up of repetitive tasks that leave no room for upward mobility,” the co-authors write. “Individuals may not need much technical skill to label data, but neither do they develop meaningful technical skills. The anonymity of platforms such as Amazon Mechanical Turk prevents the formation of social relationships between labeler and client, which could otherwise have led to further educational opportunities or better remuneration. Although data is central to today’s AI systems, data labelers receive only a disproportionately small share of the profits from building these systems.”

The co-authors also find inequality in the AI research labs set up by tech giants such as Google, Microsoft, Facebook, and others. Although these centers exist across the Global South, they tend to be concentrated in a few countries, particularly India, Brazil, Ghana, and Kenya. Positions there also often require technical expertise that local populations may lack, as reflected in the tendency of AI researchers and practitioners to study and work outside their home countries. The co-authors cite a recent report from Georgetown University’s Center for Security and Emerging Technology, which found that while 42 of the 62 major AI labs are located outside the US, 68% of their staff is based in the US.

“Even with long-term investments in regions in the Global South, the question remains whether local residents are given the opportunity to join management and contribute to important strategic decisions,” the co-authors wrote. “True inclusion requires that underrepresented voices be found at all levels of a company’s hierarchy, including senior management. Tech companies establishing themselves in these regions are uniquely positioned to offer this opportunity to local residents.”

The co-authors are encouraged by the efforts of organizations such as Khipu and Black in AI, which have brought together students, researchers, and practitioners in the AI field and have helped increase the number of Latin American and Black scientists attending and publishing at premier AI conferences. Other communities based on the African continent, such as Data Science Africa, Masakhane, and Deep Learning Indaba, have expanded their efforts through conferences, workshops, and dissertation awards, and have developed curricula for the wider African AI community.

Going forward, the co-authors say, a key component of inclusion efforts should be increasing the participation of those who have historically been excluded from AI development. Currently, they argue, data labelers are often cut off entirely from the rest of the machine learning pipeline, and workers are frequently unaware of how their labor is used and for what purpose. The co-authors say these workers should be offered educational opportunities that enable them to contribute, beyond labeling, to the models they help build.

“Little sense of fulfillment comes from simple tasks [like labeling], and by using these workers solely for the knowledge they produce, without bringing them into the fold of the product they are helping to create, a deep gap opens between the workers and the downstream product,” the co-authors wrote. “Similarly, if participation in model development becomes the norm, employers should seek to involve local residents in the ranks of management and in strategic decision-making.”

While the co-authors acknowledge that this is not an easy task, they suggest viewing AI development as a path to economic development. Rather than relying on foreign AI front-runners for domestic applications, whose revenue is often not reinvested locally, they encourage countries to build domestic AI activities focused on “high-productivity” work such as model development, deployment, and research.

“As AI continues to evolve around the world, the exclusion of those most likely to bear the brunt of algorithmic inequality will only worsen,” the co-authors wrote. “We hope that the measures we propose can help move communities in the Global South from being mere beneficiaries or subjects of AI systems to being active, engaged participants. Genuine agency over the AI systems embedded in the livelihoods of communities in the Global South will maximize the impact of those systems and pave the way for global AI inclusion.”


VentureBeat’s mission is to be a digital town square where technical decision-makers can learn about transformative technology and transact. Our site provides essential information on data technologies and strategies to help you run your business.