On two consecutive days in May 2024, the global tech giants OpenAI and Google each announced a new generation of advanced assistants capable of interacting through text, voice, and images, analyzing surrounding context, and providing real-time, human-like responses. On May 13, OpenAI unveiled the GPT-4o model, with the “o” standing for “omni,” signifying its comprehensive nature. The model can reason across voice, vision, and text in real time. The next day, on May 14, Google introduced its “Astra” project at its annual developer conference, describing it as an “AI-for-everything agent.”
These developments underscore the fierce competition to create an all-encompassing, highly capable assistant that can perform a wide range of tasks, from suggesting travel destinations to drafting long texts, reciting poetry, and composing music. These innovations mark a revolutionary change in how machines integrate into everyday life, dramatically expanding the reach and impact of artificial intelligence (AI) and bringing both opportunities and challenges closer than ever before.
A New Generation
The term “GPT” stands for “Generative Pre-trained Transformer.” It builds on the Transformer architecture, first introduced by Indian researcher Ashish Vaswani and others in 2017, which used a neural network based on the attention mechanism for machine translation tasks. OpenAI introduced the GPT models themselves the following year. The attention mechanism allows the model to weigh the importance of the other words in a sentence when encoding a specific word, helping it understand context more effectively.
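The weighting idea behind attention can be illustrated with a minimal sketch. It assumes toy two-dimensional word vectors invented purely for illustration (they do not come from any real model) and uses the scaled dot-product form of attention. The function names are hypothetical, and production Transformers add learned projections and multiple attention heads that are omitted here.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product scores of one query against every key, normalized."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

def attend(query, keys, values):
    """Blend the value vectors according to the attention weights."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Toy embeddings, assumed for illustration only.
embeddings = {
    "river": [1.0, 0.0],
    "bank": [0.8, 0.6],
}
sentence = ["river", "bank"]
vectors = [embeddings[word] for word in sentence]

# How much does "bank" attend to each word in the sentence (itself included)?
weights = attention_weights(embeddings["bank"], vectors)
context = attend(embeddings["bank"], vectors, vectors)
```

Here `weights` holds one number per word in the sentence, and `context` is the encoding of “bank” after it has been mixed with its neighbors in proportion to those weights.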
OpenAI used this technique to release the first GPT model in June 2018, with 117 million parameters. The second version, GPT-2, followed in February 2019, with 1.5 billion parameters and an advanced ability to generate coherent and contextually relevant text. Initially withheld due to concerns about misuse, the model was fully released in November 2019. Months later, in June 2020, the third version, GPT-3, was launched. GPT-3’s few-shot learning capabilities allowed it to perform a wide range of tasks with minimal instruction, forming the foundation for various applications such as programming assistants, content creation tools, and chatbots. This paved the way for the November 2022 release of ChatGPT, based on the GPT-3.5 model series, followed in March 2023 by GPT-4, which offered improved understanding, more precise and contextually appropriate responses, and better handling of complex queries and ambiguous instructions.
On May 13, 2024, OpenAI launched GPT-4o, which can process and generate text, voice, and images in real time. The model enables the creation of artistic designs, branding, caricatures, poetry, and visual novels. It can take notes, summarize, write, and translate, while also engaging with humans, understanding emotions, and responding to voice inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which approaches human response times in natural conversation.
The following day, on May 14, Google announced its “Astra” project at its developer conference. This project, described as an “agent” for daily tasks, represents a significant leap beyond Google’s existing AI systems, such as Gemini and Google Assistant. Google’s presentation included a series of videos demonstrating Astra’s capabilities, which appear similar to those of GPT-4o but with more advanced visual features, such as remembering where a user left their glasses. Although Google plans to release Astra only within a few months, giving OpenAI a first-mover advantage, Google’s platform integration, including with smart glasses such as the prototype shown in its demonstration, will likely give Astra rapid global adoption and enhance user engagement.
This trend is not exclusive to Google. Tech companies are integrating intelligent assistants into their various applications, accelerating the spread of these tools across countless computers and devices worldwide. Meta is working on integrating its Llama 3 model into Facebook, Instagram, and WhatsApp, while Microsoft is embedding its Copilot assistant into Microsoft 365, the Edge browser, Bing, and other tools. Although OpenAI does not have the same level of platform integration, it has reportedly negotiated with Apple to incorporate GPT into iOS, enhancing Siri’s cognitive capabilities and efficiency.
The widespread availability of these assistants is further supported by tech companies’ efforts to reduce the cost of running machine learning models, making the technology more accessible to developers of automated applications and thus driving broader user adoption. At its 2024 developer conference, Google introduced Gemini 1.5 Flash, a faster and cheaper model built using “distillation,” a technique in which a smaller model is trained to reproduce the behavior of a larger one.
Diverse Challenges
AI is characterized by its dual-use nature, with the potential for both beneficial and harmful applications. While promotional videos highlight these new tools’ abilities to perform tasks efficiently, analyze environmental factors, and assist users in decision-making, they also raise social, technical, and security challenges:
Social and Psychological Impacts: Advanced assistants that mimic human interaction by expressing emotions, making jokes, commenting on users’ appearance or mood, or offering consolation create a sense of companionship and trust between users and machines. These machines not only complete routine tasks such as booking vacations and managing social calendars but also entertain users and help them with matters like home décor. Unlike human companions, these AI tools don’t get angry, age, or require gifts, nor do they impose social or psychological burdens on users. This gives AI assistants unprecedented popular appeal, strengthening people’s faith in their capabilities and increasing dependence on them. At the same time, they collect vast amounts of user data, which in turn improves their accuracy and capabilities.
Autonomous Decision-Making: These assistants need a certain level of autonomy to perform tasks without constantly bothering human users. This raises questions about the relationship between machine decisions and human preferences. Google DeepMind researchers specializing in AI ethics explored this issue in an April 2024 report titled “The Ethics of Advanced AI Assistants,” highlighting the challenge of value alignment: ensuring that an assistant understands and acts on human preferences and values. Machines might misinterpret unclear instructions, or their autonomy might lead them to take actions that conflict with users’ values and interests. Further concerns arise over how these AI systems will interact with one another.
Generative Echo Chambers: The term “echo chamber” typically describes polarized online communities that reinforce shared opinions, limiting exposure to diverse views. American researchers have found that chatbots can create similar effects. In a study of 272 participants, they examined whether AI-powered search tools promote selective exposure and reduce users’ encounters with diverse opinions. The study concluded that these bots often echo users’ own opinions, creating generative echo chambers that reinforce existing biases.
Security Risks: In February 2024, Microsoft and OpenAI, the developer of ChatGPT, identified five state-affiliated threat actors that had exploited these models for malicious purposes: two linked to China and one each to Iran, Russia, and North Korea. These entities used AI for open-source intelligence gathering, translation, debugging code, and basic coding tasks. Large language models also carry risks of their own, particularly “prompt injection” attacks, in which crafted inputs manipulate a model into ignoring its original instructions.
Uneven Opportunities
AI has a disruptive effect, reshaping industries, markets, concepts, and relationships, especially when integrated with other technologies such as the Internet of Things and big data. Over the next five years, we may witness transformational changes that replace the status quo, altering how we learn, shop, and communicate. This will bring about structural changes in markets, industries, and social relationships for which the world may not be equally prepared.
Given the vast gap in AI capabilities, we are heading toward a world of deepening inequality, with a widening divide between those who have access to these advanced tools and those who don’t, as well as between those who develop these tools and those who merely use them. While advanced assistants offer opportunities for information gathering, translation, problem-solving, and creativity, access to them and the ability to use them depend on financial, technical, and knowledge-based capabilities that are unevenly distributed. Some scholars have even referred to this imbalance as “algorithmic colonialism,” reflecting the unequal relationship between the beneficiaries and the deprived.
For the past three decades, development theories have warned of the risks of the digital divide. However, the divide created by AI monopolization is larger, deeper, and growing faster. It affects not only users with varying levels of access to these advantages but also nations. The companies developing advanced machine learning models and neural networks now wield power beyond that of governments and international organizations, which resort to cooperation or sanctions to tame them. Developing countries, in particular, are largely excluded from these dynamics.