Matt N
TS Member
- Favourite Ride
- VelociCoaster (Islands of Adventure)
Hi guys. With the technology becoming increasingly prevalent, and playing a growing part in most fields, I figure it’s about time we had a thread to discuss large language models (LLMs), or so-called “generative AI”. Whether ChatGPT, Gemini, Claude, DeepSeek or another model entirely is your weapon of choice, I’m sure most people have come across generative AI at some point within the last 2 years or so, and I thought it might be fun to have a wider discussion topic about it (particularly seeing as another Alton Towers thread was derailed by GenAI talk earlier).
In here, I’m sure we could discuss anything about it. From where you think it might go to ethical dilemmas to fun things you’ve done with generative AI, I’d love to hear your opinions!
I’ll get the ball rolling with a news article I saw today in The Guardian.
Since generative AI first came about, there has been a lot of debate about whether LLMs, and their more recent sibling LRMs (Large Reasoning Models), have the ability to reason. Within the last couple of days, Apple have released a research paper all but indicating that these models cannot reliably reason: https://www.theguardian.com/comment...ar-ai-puzzle-break-down?CMP=oth_b-aplnews_d-5
In more detail, Apple found that generative AI models are very good at pattern recognition, but often fail to generalise when faced with scenarios too far removed from their training data, even when the models have been explicitly designed for reasoning problems.
As an example, generative AI models were tested on the classic Towers of Hanoi problem, a puzzle with three pegs and a number of discs where the goal is to move all the discs from the left peg onto the right one without ever stacking a larger disc on top of a smaller one. It was found that generative AI models could only just do it when 7 discs were present, attaining 80% accuracy, and that they could hardly do it at all when 8 discs were present. For context, this is a problem that has been solvable by classical AI techniques since as early as 1957, and, as the author of the Guardian article puts it, “is solvable by a bright and patient 7 year old”. It was also found that even when the generative AI models were told the solution algorithm, they still could not reliably solve the Towers of Hanoi problem, which would suggest that their thought process is not logical and intelligent like that of a human.
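For anyone wondering what "the solution algorithm" looks like, the textbook recursive approach fits in a few lines. This is only a minimal Python sketch for illustration, not code from the Apple paper or the Guardian article; the function names (`hanoi`, `is_valid`) and the peg labels are assumptions made up for this example.

```python
def hanoi(n, source="left", target="right", spare="middle"):
    """Return the moves (disc, from_peg, to_peg) that shift n discs from source to target."""
    if n == 0:
        return []
    return (
        hanoi(n - 1, source, spare, target)    # clear the n-1 smaller discs out of the way
        + [(n, source, target)]                # move the largest disc directly
        + hanoi(n - 1, spare, target, source)  # stack the smaller discs back on top
    )


def is_valid(n, moves):
    """Replay the moves and check that no larger disc ever lands on a smaller one."""
    pegs = {"left": list(range(n, 0, -1)), "middle": [], "right": []}
    for disc, src, dst in moves:
        if not pegs[src] or pegs[src][-1] != disc:
            return False  # the disc being moved is not on top of the source peg
        if pegs[dst] and pegs[dst][-1] < disc:
            return False  # would place a larger disc on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs["right"] == list(range(n, 0, -1))


for discs in (7, 8):
    moves = hanoi(discs)
    print(discs, "discs:", len(moves), "moves, valid:", is_valid(discs, moves))
```

The shortest solution takes 2^n - 1 moves, so 7 discs need 127 moves and 8 discs need 255. A program following this recipe never slips however many discs there are, which is what makes the models' drop-off between 7 and 8 discs notable.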
Now of course, it’s worth noting that many humans would have issues solving a puzzle like Hanoi, particularly with 8 discs. But as the author of the Guardian article points out, it is a definite setback for LLMs, and it would suggest that the current iteration of generative AI is not the technology that will bring about the sort of artificial general intelligence (AGI), or “the singularity”, that is capable of superseding human intelligence and solving any complex problem. These findings would suggest that the outputs of generative AI are too hit-and-miss to fully trust, and that it can’t be expected to solve any complex problem on its own with any degree of reliability.
These findings would confirm a long-held suspicion of mine about generative AI. It is undeniably very clever, but I’ve long been sceptical of talk of it being able to reason and the like. What these models effectively produce, despite the anthropomorphic-seeming quality of the “speech” output, is something that looks like a plausible solution to the given question based on the data they have been trained on. They are trained with language data that in effect says “x word is most commonly followed by y word” and such. While ChatGPT and the like are probably underpinned by mind-blowing model architectures trained on equally mind-blowing training datasets, they suffer from the same flaw as any mathematical model, in that they can only generalise within the distribution of their training data. If faced with a complicated problem that’s too far removed from their training data, they lack the ability to provide a reliable answer. I myself have had instances with ChatGPT where it has generated outputs that look plausible at first glance, but do not hold up to any scrutiny whatsoever when inspected more closely.
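To make the “x word is most commonly followed by y word” idea concrete, here is a deliberately tiny toy in Python. It is a simple word-pair counter, not how ChatGPT or any real LLM is built (those are large neural networks predicting tokens), and the names `train_bigrams` and `most_likely_next`, along with the little example corpus, are made up purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def most_likely_next(follows, word):
    """Return the most common follower of word, or None if the word was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical training text, invented for this example.
corpus = "the ride was great and the ride was smooth and the queue was long"
model = train_bigrams(corpus)

print(most_likely_next(model, "the"))      # "ride" follows "the" twice, "queue" only once
print(most_likely_next(model, "coaster"))  # never seen in training, so no answer at all
```

The second call shows the generalisation problem in miniature: the toy has nothing to offer for a word outside its training text. Real models degrade far more gracefully than this, but the underlying worry about inputs far from the training distribution is the same.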
But I’d be keen to know: what are your thoughts on Apple’s findings, and generative AI in general? I’d love to hear any thoughts about the wider topic of generative AI!
				