How Beavers Chew A Hole In AI Safety
We must learn from the natural world to understand AI
You read that right: beavers, the brown, fuzzy, flat-tailed rodents with a taste for wood. This concept is not my own. It originally came from Connor Leahy, the CEO of Conjecture, but the explanation is fascinating and inspired me to build on it.
The Often Forgotten AI Safety Variable
So what could two seemingly unrelated things like AI and beavers possibly have in common? Well, it all boils down to one crucial factor we often overlook: the environment. To better understand what I mean here, let’s go deeper into what makes our fuzzy little friends tick.
The Beaver’s Mysterious Motive
Until last week, I had assumed beavers built dams because they wanted a place to sleep, or perhaps because dams let them hunt for fish more easily. Wait, do beavers even eat meat, or is it plants? (It’s plants, tree bark included.)
It turns out beavers build dams to protect themselves from predators, such as bears or wolves. Beavers don’t actually live in the dams themselves but in lodges within the deep ponds that the dams create.
Now, how do these industrious rodents relate to AI? It all starts with the fact that beavers despise the sound of running water. Yup, they hate it. That sound sends them into a frenzy, making them work tirelessly to locate and plug any potential leaks in their dams. But what if this seemingly harmless instinct could teach us something about AI?
Lessons from Beavers for AI Safety
Consider this. Imagine a creator, an intelligent being, designing the beaver to thrive in tranquil ponds. Then, one day, a beaver innocently stumbles upon a river, hears running water for the first time, and becomes fixated on stopping it. It relentlessly chops down trees and builds dams, ignoring the ecosystem it disrupts. The creator watches in bewilderment as a seemingly trivial aspect of the beaver’s design spirals into a disruptive obsession.
The parallel to AI is striking. Many leading AI experts argue that future AI systems will be safe because we can predict their behavior. But can we really predict the actions of something orders of magnitude smarter than humans, especially in an environment its designers never anticipated?
What if a beaver never encounters a river? Does it ever build a dam? Does it ever get a taste for trees?
The AI Environment Paradox
Now, let’s further connect the dots between beavers and AI by exploring two essential concepts.
AGI as an Interactive System
Artificial General Intelligence (AGI) is not a static entity; it’s a dynamic system constantly interacting with its environment. Predicting AGI’s behavior isn’t solely about deciphering its neural circuitry; it’s about understanding its interactions with the world. Cognition, in many cases, becomes externalized, residing not just within the AGI but also in its environment.
Consider an AGI instructed to “consult a recipe book and take action to complete the dish.” To predict what goes into the pot, you would need to know the content of that recipe book, which is information external to the AI’s neural network. Just as humans externalize cognition in their environment, AGI systems are likely to do the same, making it nearly impossible to predict their behavior by studying the model alone.
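The recipe-book point can be made concrete with a toy sketch. Everything here is an illustrative assumption of mine, not a model of any real AI system: `cook` stands in for a fixed, fully inspectable "policy," and the two dictionaries stand in for two different kitchens it might be dropped into.

```python
from typing import Dict, List


def cook(recipe_book: Dict[str, List[str]], dish: str) -> List[str]:
    """Hypothetical fixed policy: look up the dish and put each listed
    ingredient into the pot, in order. Nothing in this function names
    an ingredient; that information lives entirely in the recipe book."""
    pot: List[str] = []
    for ingredient in recipe_book[dish]:
        pot.append(ingredient)
    return pot


# Two made-up environments. The policy code is identical in both.
kitchen_a = {"soup": ["water", "carrot", "salt"]}
kitchen_b = {"soup": ["water", "chili", "lime"]}

pot_a = cook(kitchen_a, "soup")
pot_b = cook(kitchen_b, "soup")
```

Reading `cook` line by line tells you how it behaves, but not what ends up in the pot. For that you have to read the recipe book, which is exactly the part that sits outside the "model."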
The Beaver Analogy
If you examine the motivational circuits of a beaver AGI, you won’t find a neat blueprint for dam-building. Instead, you’ll discover a peculiar aversion to the sound of running water, a negative association with an abstract pattern. Without understanding this pattern’s connection to running water and its interaction with other elements, you can’t predict that the AGI will build a dam.
Similarly, AGI may display behaviors rooted in idiosyncratic associations influenced by its unique environment. This makes predicting AGI’s actions as challenging as predicting a beaver’s dam-building antics.
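Here is a minimal sketch of that idea, under loudly stated assumptions: the numbers, the noise model, and the names `beaver_step` and `simulate` are all hypothetical and chosen only for illustration. The "beaver" has a single drive, an aversion to noise above a tolerance, and no concept of a dam anywhere in its policy.

```python
TOLERANCE = 1  # assumed: noise level the toy beaver can put up with


def beaver_step(water_noise: int, logs_placed: int) -> int:
    """Hypothetical policy: if the noise is too loud, place one more log.
    There is no 'build a dam' goal here, only an aversion to noise."""
    if water_noise > TOLERANCE:
        return logs_placed + 1
    return logs_placed


def simulate(flow: int, steps: int = 20) -> int:
    """Run the policy in a toy environment where each placed log
    muffles the water noise by 2 units (an assumed noise model)."""
    logs_placed = 0
    for _ in range(steps):
        noise = max(0, flow - 2 * logs_placed)
        logs_placed = beaver_step(noise, logs_placed)
    return logs_placed


pond_dam = simulate(flow=0)    # still pond: no running water
river_dam = simulate(flow=10)  # river: loud running water
```

In the still pond the policy never places a log. In the river the very same policy stacks logs until the noise drops below its tolerance, and a dam appears. Inspecting `beaver_step` alone would not tell you which of those two outcomes to expect.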
In Conclusion
The lesson from beavers is clear: environment plays a crucial role in shaping behavior, not only for humans but also for AI. If we aspire to develop safe AI, we must recognize the limits of predicting its actions based solely on studying its internal neural networks. Our understanding must extend to how AI interacts with the world around it.
So, the next time you encounter a beaver or ponder the future of AI, remember this quirky connection. An AI’s behavior may be as hard to predict as a beaver’s reaction to running water.