The History of AI — From Perceptrons to Autonomous Agents
Artificial intelligence wasn’t born overnight. For over 70 years, countless researchers have grappled with the question: “Can machines think?” Through failures, setbacks, and unexpected breakthroughs, today’s AI has emerged. This article tells that journey from start to finish as a single story.
1. The Beginning of Dreams: Early AI (1950s–1980s)
In 1950, British mathematician Alan Turing posed a revolutionary question in his paper “Computing Machinery and Intelligence”: “Can machines think?”1 He proposed an experiment to determine this, which became known as the Turing Test. If a person conversing with a machine couldn’t distinguish whether the counterpart was human or machine, that machine could be considered to “think.”
In 1956, John McCarthy, Marvin Minsky, and others convened the Dartmouth Conference. This was where the term “Artificial Intelligence” was first officially used. The attendees were optimistic: they believed that a carefully selected group of researchers working together for a single summer could make significant progress toward machine intelligence.
In 1958, Frank Rosenblatt introduced the Perceptron (“The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain”, 1958). The perceptron was an early artificial neural network modeled on neurons in the human brain, and the first with a learning rule for adjusting its own weights.2 The New York Times reported that the Navy had revealed the embryo of an electronic computer expected to one day “walk, talk, see, write, reproduce itself and be conscious of its existence,” and the world was thrilled.
But the dream was quickly shattered. In 1969, Minsky and Seymour Papert mathematically proved the fatal limitations of the perceptron in their book Perceptrons. Single-layer perceptrons couldn’t solve even simple problems like XOR. This single book froze enthusiasm for neural network research. Research funding dried up, and interest vanished. The first AI winter had arrived.
In the 1970s–80s, Expert Systems emerged as an alternative. These were systems that stacked thousands of “if-then” rules to mimic expert judgment. Medical diagnosis system MYCIN and chemical analysis system DENDRAL were representative examples. However, as rules became more complex, maintenance became nearly impossible, and they couldn’t handle the ambiguities of the real world. Investments declined again due to performance that fell short of expectations, leading to the second AI winter.
2. The Revival of Neural Networks (1980s–1990s)
Even during the AI winter, some people continued their research quietly. In 1986, David Rumelhart, Geoffrey Hinton, and Ronald Williams rediscovered and systematized the Backpropagation algorithm (“Learning Representations by Back-propagating Errors”, Nature, 1986). Backpropagation was a method for neural networks to learn by propagating their errors backward.3 It was like grading an exam paper and then backtracking to study the wrong answers.
With the emergence of backpropagation, Multi-Layer Perceptrons (MLP) finally became practical models. While single-layer perceptrons had only input and output layers and couldn’t even solve XOR, MLPs had one or more hidden layers between input and output layers. Hidden layer neurons nonlinearly transformed input data, enabling learning of complex patterns that single layers couldn’t handle. Mathematically, this was backed by George Cybenko’s 1989 Universal Approximation Theorem — theoretically, even a single hidden layer could approximate any continuous function.
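To make this concrete, here is a minimal sketch (in NumPy, not from any historical source) of a two-layer MLP trained with backpropagation on XOR, the very problem a single-layer perceptron cannot represent. The hidden-layer size and learning rate are arbitrary choices for illustration.

```python
import numpy as np

# A tiny MLP (2 inputs -> 4 hidden units -> 1 output) trained on XOR with
# plain gradient descent and manually coded backpropagation.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(10000):
    # Forward pass through hidden and output layers
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the output error back through each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round().ravel())  # typically converges to [0. 1. 1. 0.], the XOR truth table
```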
The MLP + backpropagation combination began being applied to various problems like pattern recognition, classification, and regression. Today’s deep learning is essentially an extension and specialization of this MLP structure. CNNs are MLPs specialized for images, RNNs are MLPs specialized for sequential data, and even the Feed-Forward layers in Transformers are essentially MLPs. The perceptron seed grew into MLP and branched out into the deep learning tree.
In 1989, Yann LeCun applied Convolutional Neural Networks (CNN) to postal code recognition (“Backpropagation Applied to Handwritten Zip Code Recognition”, 1989). CNNs were structures that swept through small pieces of images with filters to find patterns.4 Just as humans recognize handwriting by identifying stroke shapes and arrangements, CNNs learned local features of images hierarchically. LeCun’s LeNet-5 was later actually used for handwritten digit recognition on checks across the United States.
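As an illustration of what a convolutional filter does (a toy sketch, not LeCun’s actual code), the snippet below slides a small 3×3 edge filter across an image and responds only where pixel intensity changes:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution (cross-correlation): slide the filter over the image."""
    kH, kW = kernel.shape
    out_h = image.shape[0] - kH + 1
    out_w = image.shape[1] - kW + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

# A vertical-edge filter: strong response where brightness changes from left to right.
edge_filter = np.array([[1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])
image = np.zeros((8, 8))
image[:, 4:] = 1.0                      # left half dark, right half bright
print(conv2d(image, edge_filter))       # nonzero only around the vertical boundary
```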
Meanwhile, CNNs alone weren’t sufficient for handling sequential data like text or speech. Recurrent Neural Networks (RNN) were the alternative. RNNs fed the output of previous steps as input to the next step, having a kind of “memory.” However, they suffered from the vanishing gradient problem where early information disappeared as sentences got longer.
In 1997, Sepp Hochreiter and Jürgen Schmidhuber proposed LSTM (Long Short-Term Memory) to solve this problem (“Long Short-Term Memory”, Neural Computation, 1997). LSTM placed “gates” within neural networks that decided what information to remember and what to forget.5 It was like taking notes of only important things in a notebook and erasing the rest. LSTM became a key technology for various sequential data problems including speech recognition, machine translation, and music composition.
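A minimal sketch of a single LSTM step (dimensions and weights invented for illustration) shows the gate mechanism described above: the forget gate decides what to erase from the cell state, the input gate decides what to write, and the output gate decides what to expose.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps the concatenated [x; h_prev] to four gate pre-activations."""
    z = np.concatenate([x, h_prev]) @ W + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    g = np.tanh(g)                                 # candidate values to write
    c = f * c_prev + i * g                         # keep what matters, forget the rest
    h = o * np.tanh(c)                             # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 5
W = rng.normal(scale=0.1, size=(input_dim + hidden_dim, 4 * hidden_dim))
b = np.zeros(4 * hidden_dim)
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x in rng.normal(size=(10, input_dim)):         # process a sequence of 10 inputs
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)                            # (5,) (5,)
```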
3. From Words to Numbers: The Statistical NLP Era
For computers to understand human language, they first had to convert words to numbers. Initially, n-gram models were mainstream: they estimated the probability of the next word from the preceding few words by simple counting — for example, how often “school” follows “I go to.” Simple, but effective with large amounts of data. TF-IDF (Term Frequency–Inverse Document Frequency) was a statistical method for finding important words in documents and became the foundation of search engines.
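A toy TF-IDF computation (an unsmoothed textbook variant, sketched here for illustration) shows the idea: a word scores high when it is frequent in one document but rare across the corpus.

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "neural networks learn word representations",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)

def tf_idf(term, doc_tokens):
    tf = Counter(doc_tokens)[term] / len(doc_tokens)    # how frequent in this document
    df = sum(1 for d in tokenized if term in d)          # how many documents contain it
    return tf * math.log(N / df)                         # rare across the corpus -> higher weight

print(round(tf_idf("cat", tokenized[0]), 3))      # appears in 2 of 3 docs: modest weight
print(round(tf_idf("neural", tokenized[2]), 3))   # appears in only 1 doc: higher weight
```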
However, these methods couldn’t understand the “meaning” of words. They couldn’t know that “king” and “queen” were related, or that “puppy” and the Korean word “강아지” meant the same thing.
In 2013, Google’s Tomas Mikolov published Word2Vec, changing the game (“Efficient Estimation of Word Representations in Vector Space”, 2013). Word2Vec converted words into vectors (lists of numbers) with hundreds of dimensions, and amazingly, semantic relationships were expressed arithmetically in this vector space.6 Operations like “King – Man + Woman = Queen” actually worked. There were two training methods: CBOW (predicting center words from surrounding words) and Skip-gram (predicting surrounding words from center words), with Skip-gram being stronger for rare words.
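The vector arithmetic can be sketched with toy numbers (these three-dimensional vectors are invented for illustration; real Word2Vec embeddings have hundreds of dimensions learned from text):

```python
import numpy as np

# Hand-made toy embeddings: dimensions loosely for "royalty", "status", and "gender".
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.1, 0.2, 0.1]),
    "woman": np.array([0.1, 0.2, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
    "apple": np.array([0.0, 0.1, 0.4]),
    "boy":   np.array([0.2, 0.1, 0.1]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# King - Man + Woman: find the nearest remaining word to the resulting point.
target = vectors["king"] - vectors["man"] + vectors["woman"]
candidates = [w for w in vectors if w not in ("king", "man", "woman")]
print(max(candidates, key=lambda w: cosine(vectors[w], target)))   # -> queen
```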
Word2Vec transformed words from simple symbols to coordinates of meaning. Subsequent research like GloVe and FastText followed, and this “word embedding” became the foundation of modern NLP.
4. The Deep Learning Revolution (2012–)
2012 was a turning point in AI history. That fall, Alex Krizhevsky, Ilya Sutskever from the University of Toronto, and their advisor Geoffrey Hinton dominated the ImageNet competition (ILSVRC) with AlexNet (“ImageNet Classification with Deep Convolutional Neural Networks”, NIPS, 2012). AlexNet’s top-5 error rate was 15.3%, ahead of second place (26.2%) by a whopping 10.9 percentage points.7 This overwhelming gap shocked the entire computer vision community.
AlexNet’s secret lay in three things. First, GPU-powered parallel computation: training ran on two NVIDIA GTX 580 cards. Second, large-scale data: the ILSVRC subset of ImageNet, with about 1.2 million labeled training images drawn from a corpus of over 14 million. Third, preventing overfitting with a regularization technique called Dropout. Dropout randomly turned off some neurons during training, like making a team function properly even when only some members participate.
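A minimal sketch of inverted dropout (the variant most libraries use; the parameters here are illustrative): during training, a random subset of activations is zeroed and the survivors are rescaled, while at inference nothing changes.

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=np.random.default_rng(0)):
    """Inverted dropout: zero each unit with probability p during training and
    rescale the survivors so the expected activation stays the same."""
    if not training:
        return activations
    mask = (rng.random(activations.shape) >= p) / (1.0 - p)
    return activations * mask

h = np.ones(10)
print(dropout(h))                    # roughly half the units silenced, survivors scaled to 2.0
print(dropout(h, training=False))    # unchanged at inference time
```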
But AlexNet’s ability to use GPUs wasn’t coincidental. In 2006–2007, NVIDIA released CUDA, a programming platform that enabled general-purpose computation on GPUs originally designed for game graphics. The recruitment of Stanford alumnus Ian Buck in 2004 was the catalyst. While attempts to use GPUs for computation existed before CUDA, they involved the cumbersome task of forcibly converting graphics shader code. CUDA made GPU programming possible with C-like syntax, opening the path for researchers to easily parallelize matrix operations. Deep learning’s core operation, matrix multiplication, maps naturally onto hardware built to run thousands of small calculations simultaneously, and GPUs delivered training speeds tens of times faster than CPUs. Without CUDA, neither AlexNet nor the subsequent deep learning revolution would have been possible. NVIDIA became a core infrastructure company in the AI era thanks to this foresight.
Deep learning subsequently exploded. In 2015, Batch Normalization emerged, greatly improving training speed and stability (Ioffe & Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”, 2015), and ResNet successfully trained 152-layer networks, surpassing human image recognition accuracy.
Incidentally, Ilya Sutskever, who was Hinton’s student at this time, later became OpenAI’s co-founder and Chief Scientist. The threads of history connect like this.
5. Reinforcement Learning: Conquering Games (2013–2019)
While deep learning was conquering image recognition, another branch of research was quietly changing the world. Reinforcement Learning was a learning method where agents interacted with environments and maximized rewards through trial and error. It was like children learning to walk by repeatedly falling and getting up.
In 2013, DeepMind researchers published DQN (Deep Q-Network), which combined deep learning and reinforcement learning (Mnih et al., “Playing Atari with Deep Reinforcement Learning”, 2013). Learning from raw pixels alone, DQN mastered Atari 2600 games without being told the rules, and the expanded 2015 Nature version reached human-level or better performance on a large share of the 49 games tested. This achievement became the direct catalyst for Google’s acquisition of DeepMind for approximately $500 million in 2014.
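The heart of DQN is the Q-learning update; the sketch below shows it in tabular form on a made-up five-state corridor. DQN itself replaces this table with a deep network that reads raw pixels, and adds tricks such as experience replay and a target network.

```python
import numpy as np

n_states, n_actions = 5, 2               # actions: 0 = step left, 1 = step right
Q = np.zeros((n_states, n_actions))      # table of action values; DQN learns this with a neural net
alpha, gamma, eps = 0.1, 0.99, 0.3
rng = np.random.default_rng(0)

for episode in range(300):
    s = int(rng.integers(n_states - 1))                        # start anywhere but the goal
    while s != n_states - 1:                                   # rightmost state is the goal
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0             # reward only at the goal
        # Bellman update: nudge Q(s, a) toward reward plus discounted best future value.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))   # learned policy: move right in every non-terminal state
```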
In March 2016, DeepMind’s AlphaGo played historic matches against 9-dan Lee Sedol in Seoul. Go was a game known to have more possible positions than atoms in the universe. AlphaGo won 4-1, with over 200 million people watching worldwide. But the story didn’t end there. In 2017, AlphaGo Zero defeated the previous AlphaGo 100-0 in just three days of training, using only self-play without any human game records (Silver et al., “Mastering the Game of Go without Human Knowledge”, Nature, 2017). It was a moment proving that superhuman levels could be reached without human knowledge.
In December of the same year, AlphaZero generalized this approach to Go, chess, and shogi. With just hours of learning for each game, it dominated the world’s strongest engines. In 2018–2019, OpenAI’s OpenAI Five defeated world champion team OG 2-0 in the complex team strategy game Dota 2. This was a challenge of a different dimension from Atari, requiring five independent agents to cooperate in real-time and make decisions under incomplete information.
In 2019, DeepMind’s MuZero went a step further. Without even being told the game rules, it learned environmental dynamics models by itself and mastered Go, chess, shogi, and Atari games (Schrittwieser et al., “Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model”, Nature, 2020). Being able to plan and execute without knowing the rules showed the potential for extension to real-world problems.
6. Seq2Seq and Attention: The Translation Revolution (2014–2017)
After succeeding in image recognition, deep learning expanded its territory to natural language processing. In 2014, Google’s Sutskever and others proposed the Seq2Seq (Sequence-to-Sequence) model (“Sequence to Sequence Learning with Neural Networks”, 2014). It was an encoder-decoder structure that took one sequence (e.g., English sentence) as input and output another sequence (e.g., French sentence). Essentially two LSTMs connected together, it showed remarkable performance in machine translation.
However, Seq2Seq had a bottleneck. The encoder had to compress the entire input sentence into a single fixed vector, so information was lost when sentences got longer. It was like being asked to summarize a thick book in one line.
In 2015, Dzmitry Bahdanau introduced the Attention Mechanism to solve this problem (“Neural Machine Translation by Jointly Learning to Align and Translate”, ICLR, 2015). Each time the decoder generated an output word, it dynamically decided which part of the input sentence to “pay attention to.” It was like looking back at the corresponding part of the original text when translating. Performance improved dramatically.8
And in June 2017, a paper that would change AI history appeared.
7. GAN: When Machines Began Creating (2014–2021)
Let’s turn back time for a moment. While deep learning was revolutionizing “recognition” in 2014, another revolution was beginning in the realm of “generation.”
Ian Goodfellow conceived GAN (Generative Adversarial Network) from inspiration during a bar conversation (“Generative Adversarial Nets”, NeurIPS, 2014). The idea was ingenious. Two neural networks would compete against each other. The Generator created fake images while the Discriminator tried to distinguish real from fake. It was like a forger and an appraiser improving their skills against each other. Through this competition, the generator produced increasingly sophisticated images.
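The adversarial game can be sketched in a few dozen lines (a toy PyTorch example with made-up sizes, not Goodfellow’s original code): the generator learns to turn noise into samples from a simple target distribution while the discriminator tries to tell real from fake.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Target "real" data: samples from a 1-D Gaussian with mean 4 and std 1.5.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                 # noise -> fake sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())   # sample -> P(real)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(3000):
    real = torch.randn(64, 1) * 1.5 + 4.0
    fake = G(torch.rand(64, 8))

    # Discriminator step: label real samples 1 and the generator's forgeries 0.
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    loss_d.backward()
    opt_d.step()

    # Generator step: try to make the discriminator label forgeries as real.
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()

with torch.no_grad():
    samples = G(torch.rand(1000, 8))
print(samples.mean().item(), samples.std().item())   # should drift toward 4 and 1.5
```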
In 2015, Radford et al. published DCGAN, which applied CNN structures to GANs, greatly improving training stability. And in 2018, NVIDIA released StyleGAN, bringing GANs to a new dimension (Karras et al., “A Style-Based Generator Architecture for Generative Adversarial Networks”, 2018). The 1024x1024 resolution face images generated by StyleGAN were indistinguishable from real photos. A website called “This Person Does Not Exist” appeared, showing a different non-existent person’s face with each refresh, shocking the public.
But where there’s light, there are shadows. The same technology began being misused for Deepfakes. Videos with synthesized celebrity faces and fake political speeches spread. The question of how to verify the authenticity of AI-generated content became a social issue. StyleGAN2 and StyleGAN3, released in 2020–2021, brought GAN-based image generation to its peak, but a new technology would soon take over: the diffusion models we’ll discuss later.
8. Transformer: Everything Changes (2017–)
“Attention Is All You Need” (NeurIPS, 2017) by Ashish Vaswani and seven other authors from Google Brain proposed the Transformer architecture. The core idea was simple. Process sequences using only Self-Attention, without RNNs or CNNs. It was a structure where every word in a sentence simultaneously computed its relationship with every other word.9
The advantage of Transformers was parallel processing. While RNNs had to process words sequentially one by one, Transformers could process all words simultaneously. This maximized GPU performance utilization.
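In a single head of self-attention, every token produces a query, key, and value vector; each token’s query is scored against the keys of all tokens, and the softmax-weighted mix of values becomes its output. A minimal NumPy sketch with made-up dimensions:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over a sequence X of shape (n_tokens, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the sequence
    return weights @ V                                  # each output mixes all value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                             # 4 tokens with 8-dimensional embeddings
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)           # (4, 8)
```

Because the whole computation is a handful of matrix multiplications over the entire sequence, every position is processed at once, which is exactly the parallelism that made Transformers such a good fit for GPUs.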
In 2018, Google released BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al., 2018). BERT read sentences bidirectionally to understand context and achieved top performance on 11 NLP benchmarks simultaneously.
The same year, OpenAI released GPT-1 (“Improving Language Understanding by Generative Pre-Training”, Radford et al., 2018). While BERT was bidirectional, GPT was a unidirectional model that read only left-to-right, but was strong at text “generation.” Later, GPT-2 (2019) with 1.5 billion parameters generated surprisingly fluent text, and OpenAI temporarily withheld the full model release, claiming it was “too dangerous.”
In 2020, GPT-3 appeared with 175 billion parameters (“Language Models are Few-Shot Learners”, Brown et al., 2020). GPT-3 could handle translation, summarization, and even coding with just a few examples (few-shot learning) without additional fine-tuning. Scaling Laws were empirically confirmed — the moment when capabilities seemed to “emerge” as models grew larger.10
9. The LLM Era: Conversational AI (2020–)
The potential shown by GPT-3 was amazing, but raw language models often lied, generated harmful content, or provided responses that didn’t match user intentions. They were optimized for “next word prediction,” not for “helpful answers.”
In early 2022, OpenAI published InstructGPT (“Training Language Models to Follow Instructions with Human Feedback”, Ouyang et al., 2022). The key was RLHF (Reinforcement Learning from Human Feedback). Humans compared and evaluated good and bad responses, and the model learned from those signals. It was like a teacher giving feedback on student answers for correction.11
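One concrete piece of the RLHF pipeline is the reward model trained on human preference pairs; a common formulation (sketched below with invented scores) pushes the reward of the preferred response above that of the rejected one, and the language model is then fine-tuned with reinforcement learning against this reward.

```python
import torch
import torch.nn.functional as F

def preference_loss(r_chosen, r_rejected):
    """Pairwise loss for a reward model: the human-preferred response should score higher."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Hypothetical reward-model scores for two (chosen, rejected) response pairs.
r_chosen = torch.tensor([2.1, 0.3])
r_rejected = torch.tensor([1.0, 0.9])
print(preference_loss(r_chosen, r_rejected))   # small when chosen consistently outscores rejected
```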
And on November 30, 2022, ChatGPT was released to the world. This conversational AI, applying RLHF to GPT-3.5, reached 1 million users in 5 days. While AI had previously been a developer’s tool, ChatGPT became the first “AI anyone could use.” Students used it for assignments, office workers for email writing, developers for coding. The AI boom began again, and this time it was of a different dimension.
In March 2023, OpenAI launched GPT-4. It was a multimodal model that could accept not only text but also images as input. It scored around the top 10% of test takers on the bar exam and reached high percentiles on the SAT, showing that AI capabilities were approaching expert human levels.
On November 17 that year, OpenAI’s board suddenly dismissed CEO Sam Altman. The board only announced they had “lost confidence in his leadership.” Over 700 of 770 employees signed a petition demanding Altman’s return, and when Microsoft announced Altman’s recruitment, he returned as CEO in just 5 days. This incident starkly revealed tensions between AI safety and commercial interests, and the vulnerabilities in AI company governance.
10. AI That Draws Pictures (2021–2024)
If GANs opened the door to image generation possibilities, Diffusion Models flung that door wide open. The principle of diffusion models was intuitive: progressively add noise to an image until it becomes pure noise, then learn to reverse this process so that an image can be restored from noise (Ho et al., “Denoising Diffusion Probabilistic Models”, NeurIPS, 2020). It was like cleaning layers of dust from a painting buried in a sandstorm.
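The forward (noising) half of this process is a closed-form formula; a small sketch following the DDPM noise schedule (constants are illustrative) shows how an input is gradually turned into pure noise, and the model’s job is to learn the reverse direction.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # per-step noise amounts
alpha_bar = np.cumprod(1.0 - betas)         # how much of the original signal survives by step t

def q_sample(x0, t, rng=np.random.default_rng(0)):
    """Jump straight to noise level t: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = np.ones((8, 8))                         # stand-in for a clean image
print(q_sample(x0, 10).std())                # early step: still very close to the image
print(q_sample(x0, T - 1).std())             # final step: essentially pure Gaussian noise
```

A neural network is trained to predict the noise added at each step; generating an image then means starting from pure noise and applying that denoiser step by step in reverse.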
In January 2021, OpenAI released DALL-E. Based on the GPT-3 architecture, it was a 12-billion-parameter model that generated images from text descriptions. It actually created imaginary images like “an armchair shaped like an avocado.” In April 2022, DALL-E 2 switched to a diffusion-model base, and in 2023 DALL-E 3, integrated with ChatGPT, greatly improved prompt understanding.
In summer 2022, two services emerged that led the popularization of image generation AI. Midjourney specialized in artistic quality with a unique Discord-based approach, and Stable Diffusion was open-sourced by Stability AI, making it runnable on personal GPUs. Stable Diffusion’s release brought mass adoption of image generation AI, spawning tens of thousands of derivative models and services.
In September 2022, at the Colorado State Fair in the US, a Midjourney-generated work “Theatre d’Opera Spatial” won first place in the digital art category, causing major controversy. Questions like “Can AI do art?” and “Is this creation or tool use?” shook the art world.
In February 2024, OpenAI unveiled Sora, which generated high-quality videos up to one minute long from text. Using a Diffusion Transformer architecture, it achieved physically natural movements and set new standards for video generation AI. The realm of generative AI continued expanding from images to videos.
11. AI for Science: Becoming a Tool for Science (2020–2024)
AI began solving fundamental scientific problems beyond games and language. The symbolic event was AlphaFold.
Proteins are the basic units of life, but predicting a protein’s 3D structure from its amino acid sequence had been biology’s greatest unsolved problem for 50 years. It was like predicting the final shape of an origami piece from the folding instructions alone — the number of possibilities was astronomical. In 2020, DeepMind’s AlphaFold2 achieved accuracy comparable to experimental methods at the CASP14 protein structure prediction competition, essentially solving this problem (Jumper et al., “Highly Accurate Protein Structure Prediction with AlphaFold”, Nature, 2021). DeepMind subsequently predicted and released structures for nearly all known proteins — over 200 million12 — fundamentally accelerating drug development and life science research.
In October 2024, this achievement received the highest recognition. DeepMind’s CEO Demis Hassabis and John Jumper won the Nobel Prize in Chemistry for AlphaFold’s contributions. The same year, Geoffrey Hinton and John Hopfield received the Nobel Prize in Physics for their fundamental contributions to artificial neural network research. It was a historic moment when AI research was officially recognized as Nobel Prize-level scientific contribution. Machines were approaching being subjects of scientific discovery, beyond being tools for scientists.
12. The Open Source AI Revolution (2023–)
Until early 2023, cutting-edge AI models were monopolized by a few companies like OpenAI, Google, and Anthropic. Model weights weren’t publicly available and could only be accessed through APIs. But one event shook this structure.
In March 2023, the weights of Meta’s research-restricted LLaMA model leaked via 4chan. While an unintended accident, the results were revolutionary. Countless derivative models like Stanford’s Alpaca and UC Berkeley’s Vicuna explosively emerged, and LLMs that were once exclusive to big corporations opened up to individuals and startups. Taking this opportunity, Meta officially released LLaMA 2 in July 2023 with a license allowing commercial use, followed by LLaMA 3 in April 2024 and the 405B-parameter LLaMA 3.1 that July, the first open-weight model to approach GPT-4 performance.
At the same time, French startup Mistral AI gained attention with small but powerful efficient models, and Alibaba’s Qwen series emerged as China’s open-source AI representative. Chinese startup DeepSeek introduced the innovative MLA (Multi-head Latent Attention) architecture in DeepSeek V2 in 2024, and at the end of that year released DeepSeek V3, a 671B-parameter MoE model that achieved GPT-4-level performance with a reported training cost of roughly $5.5 million, shocking the industry. The performance gap between open-source and closed models was rapidly narrowing. AI was no longer Silicon Valley’s monopoly.
13. Reasoning AI (2024–2025)
Existing LLMs had one fundamental limitation. They had no “thinking time” when generating answers. When asked questions, they immediately began predicting the next token, often making mistakes on complex math problems or multi-step logical reasoning. It was like students starting to write answers immediately after seeing exam questions.
In September 2024, OpenAI released the o1 model, opening a new paradigm. o1 performed internal chain-of-thought before generating answers. It decomposed problems, formulated hypotheses, verified them, and only then provided final answers. This was called test-time compute. Unlike previous methods that invested computing only during training, investing more computation during inference could yield better results. A new axis of scaling was discovered.
In January 2025, China’s DeepSeek shocked the industry again by releasing the open-source reasoning model DeepSeek R1. R1 showed that reasoning capabilities could emerge naturally through pure reinforcement learning, achieving 97.3% on MATH-500 and 79.8% on AIME 2024, reproducing o1-level performance as open source. This was the so-called “DeepSeek Shock.” The fact that cutting-edge reasoning AI was possible with low costs shook assumptions in the US AI industry.
In April 2025, OpenAI released o3, followed by o3-pro, accelerating the development pace of reasoning models. AI was now entering the stage of “thinking” beyond simply saying what it “knows.”
14. Multimodal & Tool Use (2023–2025)
After GPT-4, AI entered the multimodal era, spanning text, images, audio, and video. GPT-4V(ision) could view and describe images, interpret graphs, or read text in photos. Google’s Gemini was a multimodal model designed from the start to integrate text, image, audio, and video processing.
A more important change was AI beginning to use tools. Previous LLMs only answered within their training knowledge, but now they could call search engines, execute code, call APIs, and read and write files. OpenAI’s Function Calling (2023) enabled models to call external functions in structured formats, transforming AI from simple text generators to acting agents.
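For illustration, here is roughly what a tool definition looks like in the OpenAI-style function-calling format; the get_weather function and its parameters are hypothetical, and field details vary between providers and API versions.

```python
# A sketch of a tool definition in the Chat Completions "tools" format (field details may vary).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                          # hypothetical application-side function
        "description": "Get the current weather for a city.",
        "parameters": {                                  # JSON Schema describing the arguments
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Seoul"},
            },
            "required": ["city"],
        },
    },
}]

# Instead of replying with free text, the model can emit a structured call such as
#   {"name": "get_weather", "arguments": "{\"city\": \"Seoul\"}"}
# The application executes the real function and returns the result to the model,
# which then continues the conversation with that information in hand.
```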
Competition intensified too. Anthropic’s Claude series emphasized safety and long context processing, while Meta’s LLaMA series was open-sourced for free use by researchers and developers worldwide. AI was no longer the monopoly of one or two companies.
15. AI’s Light and Shadow: Regulation and Safety (2023–2025)
As AI capabilities rapidly improved, warnings about its dangers grew. In February 2023, Bard, which Google hastily released to counter ChatGPT, generated incorrect information about the James Webb Space Telescope in its launch demo. The hallucination problem, where AI confidently states falsehoods, was exposed worldwide. This single mistake wiped roughly $100 billion off parent company Alphabet’s market capitalization in a day.
In March 2023, the Future of Life Institute published an open letter calling for a six-month pause on training AI systems more powerful than GPT-4. Thousands signed, including Elon Musk and Yoshua Bengio. In May, a one-sentence statement organized by the Center for AI Safety, “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war,” was signed by Hinton, Bengio, and the CEOs of OpenAI, DeepMind, and Anthropic. The field’s foremost authorities were now publicly acknowledging existential risk.
Behind this was Geoffrey Hinton’s decision. In May 2023, Hinton, known as the “Godfather of Deep Learning,” left Google to freely warn about AI risks. He warned that AI could become smarter than humans and pose uncontrollable risks.
Meanwhile, Anthropic published Constitutional AI in late 2022 (“Constitutional AI: Harmlessness from AI Feedback”, 2022). This was a methodology where AI evaluated and corrected its own outputs according to pre-established principles (constitution). It was an attempt to complement RLHF’s limitations of relying only on human feedback and became core technology for Claude models.
Regulation also began in earnest. In August 2024, the EU enacted the world’s first comprehensive AI regulation law, the EU AI Act. It classified AI systems into four risk levels and imposed transparency obligations on general-purpose AI models. The tension between AI development speed and regulatory pace would continue.
16. The Agent Era: AI Works Autonomously (2025–2026)
In November 2024, Anthropic open-sourced MCP (Model Context Protocol). MCP was a protocol that standardized how AI models access external tools and data sources. Like USB creating a universal connector for various devices, MCP became the “universal connector” between AI and external systems.
In 2025, AI evolved beyond simply answering questions to become autonomous agents that perform tasks. Anthropic’s Claude Code was an agentic coding tool that understood codebases from terminals, fixed bugs, and handled git workflows. Spotify adopted Claude Code in internal systems, building workflows where engineers could instruct AI to fix bugs via Slack during commutes and merge completed code to production before arriving at the office. They launched over 50 new features this way in 2025 alone.
OpenAI’s Codex was an agent that autonomously wrote and tested code in cloud sandboxes. OpenClaw was an autonomous agent platform operating on personal devices, handling browser control, file management, and various external service integrations through MCP protocol.
AI was now closer to “colleagues who figure things out when asked” rather than “tools that answer when questioned.” Agents decomposed complex tasks into multiple steps, selected necessary tools, verified intermediate results, and corrected their own errors. Of course, it’s not perfect. Problems like hallucination, safety, and accountability remain challenges to solve.
17. Korea’s AI: From AlphaGo to HyperCLOVA
The decisive moment when AI was imprinted on Korean public consciousness was March 2016. The matches between 9-dan Lee Sedol and AlphaGo at the Four Seasons Hotel in Seoul weren’t just Go games. With 200 million people worldwide watching, it was a battle between humanity’s top Go player and a machine. While AlphaGo won 4-1, Lee Sedol’s move 78 “divine move” in Game 4 completely escaped AlphaGo’s predictions. This single move would be long remembered as a symbol of human creativity. After these matches, the Korean government significantly expanded AI investment, and the “AlphaGo Shock” became the catalyst for Korea’s AI industry.
Korean companies also began developing their own AI models. In 2021, Naver released HyperCLOVA, a Korean-specialized hyperscale AI model. With 204 billion parameters trained on 650 billion tokens of Korean data, it was the largest Korean language AI model at the time. In 2023, the successor HyperCLOVA X was released and deployed across Naver services. LG AI Research sequentially released the EXAONE series, providing some models as open source, and Samsung Electronics released its proprietary generative AI model Samsung Gauss in 2023, deploying it in Galaxy S24 as part of an on-device AI strategy.
Government-wise, since announcing the “AI National Strategy” in 2019, systematic policies have been implemented including AI talent development, computing infrastructure expansion, and AI semiconductor development. Korea’s AI story has been one of rapidly absorbing global technology while pioneering unique paths specialized for Korean language and culture.
Looking Back
Looking back at the 70-year journey, AI’s history has been spiral rather than linear. Optimism and winters alternated, and one technology’s limitations became the starting point for the next. Perceptron’s limitations gave birth to backpropagation, RNN’s vanishing gradients created LSTM, LSTM’s sequential processing bottleneck spawned Transformers, Transformers led to GPT, GPT to ChatGPT, and ChatGPT to agents.
In the process, GANs gave machines creative abilities, and reinforcement learning became a tool for science beyond games. AlphaFold solved biology’s 50-year problem, and the open-source movement liberated AI from monopolistic control. Reasoning AI took another step toward Turing’s original question of “thinking machines.” Throughout all these developments, efforts to warn about AI risks and create regulatory frameworks have accompanied the progress.
This story isn’t over yet. Rather, the most interesting chapter is being written right now.
Footnotes
1. Turing, A. M. (1950). “Computing Machinery and Intelligence.” Mind, 59(236), 433–460.
2. Rosenblatt, F. (1958). “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain.” Psychological Review, 65(6), 386–408.
3. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). “Learning Representations by Back-propagating Errors.” Nature, 323, 533–536.
4. LeCun, Y. et al. (1989). “Backpropagation Applied to Handwritten Zip Code Recognition.” Neural Computation, 1(4), 541–551.
5. Hochreiter, S., & Schmidhuber, J. (1997). “Long Short-Term Memory.” Neural Computation, 9(8), 1735–1780.
6. Mikolov, T. et al. (2013). “Efficient Estimation of Word Representations in Vector Space.” arXiv:1301.3781.
7. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). “ImageNet Classification with Deep Convolutional Neural Networks.” Advances in Neural Information Processing Systems (NIPS).
8. Bahdanau, D., Cho, K., & Bengio, Y. (2015). “Neural Machine Translation by Jointly Learning to Align and Translate.” ICLR.
9. Vaswani, A. et al. (2017). “Attention Is All You Need.” NeurIPS.
10. Brown, T. B. et al. (2020). “Language Models are Few-Shot Learners.” NeurIPS.
11. Ouyang, L. et al. (2022). “Training Language Models to Follow Instructions with Human Feedback.” NeurIPS.
12. Jumper, J. et al. (2021). “Highly Accurate Protein Structure Prediction with AlphaFold.” Nature, 596, 583–589.