One of the biggest frustrations people face when using text-based AI tools built on large language models (LLMs) is that they tend to hallucinate. No, they aren’t dabbling in ’70s-era party favors! Rather, “hallucination” is the term for when an AI answers questions with inaccurate, partially accurate, or even completely fabricated information.
Rather than indicating a flaw in the tools’ design, however, this problem reveals a widespread misunderstanding of their intended application. In fact, AI hallucination can be a huge asset when used in the right way.
With proper expectations and mitigations in place, LLMs can be fantastic learning tools. The key is understanding what they are — and aren’t — built to do and which tools are best suited to help you produce your desired results.
Here’s what you need to know about LLM hallucination, what to do about it, and the real problem with using LLMs for learning purposes.
Choosing the Right AI Tools
Before we get into the specifics of LLMs and hallucination, it’s important to understand that AI tools — like any other kind of tool — are only effective when used correctly. If you try to complete a task using the wrong tool, you’re unlikely to get the results you’re looking for.
For example, you wouldn’t use a spoon to slice a loaf of bread. Instead, you’d use a knife — ideally, one made specifically for slicing bread.
In the same way, if you want an AI tool to accomplish a particular goal — whether that’s supplying reliable information on a given topic or generating new ideas — you need to use a tool that has been designed and trained for that specific purpose.
Additionally, remember that many of these tools are still in development. While they have amazing potential for use in education, business, and even daily life, they still require human involvement to monitor and fine-tune results.
For example, at ProTrainings, we are experimenting with how to use AI speech models and LLMs to translate and dub our courses into as many languages as possible. However, we don’t blindly use AI to create the original course content, specifically because of the hallucination problem. When it comes to life-saving skills training, we can’t rely on AI to generate accurate information, so all of our CPR courses undergo a rigorous review process by a board of medical professionals before they are published.
Understanding LLMs
The earliest iterations of LLMs were limited to predictive text, where a user begins typing and the tool offers suggestions for completing the sentence. Now, however, through reinforcement learning from human feedback (RLHF), LLMs have been trained not only to complete sentences but to generate full conversations.
In the words of Andrej Karpathy, formerly of Tesla and OpenAI, LLMs are “dream machines.” When prompted, they generate ideas and information based on their extensive training data. If they haven’t been trained on the requested information — or if they simply fail to access it — they will invent new information to fulfill the given prompt.
However, hallucination is a feature of LLMs, not a bug. LLMs are designed to generate written content that sounds “right.” Often, we want them to hallucinate fresh ideas we can’t or don’t have time to think of on our own. This is why so many people love to use ChatGPT when brainstorming or creating content.
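If you want to see this completion behavior for yourself, the short sketch below uses the open-source Hugging Face transformers library and the small GPT-2 model (illustrative choices, not tools mentioned in this article). Whatever prompt you give it, the model continues the text with something that sounds plausible, whether or not it is true.

```python
# A minimal sketch of LLM text completion, using the open-source
# Hugging Face transformers library and the small GPT-2 model
# (illustrative choices, not tools mentioned in this article).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The most important step in adult CPR is"
result = generator(prompt, max_new_tokens=30, num_return_sequences=1)

# The model continues the sentence with plausible-sounding text.
# It is predicting likely next words, not checking facts, so the
# output may or may not be medically accurate.
print(result[0]["generated_text"])
```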
The problem arises when people treat LLMs like search engines, blindly trusting them to provide factual information without verifying it. This approach is extremely misguided and can cause serious problems, as it did for the lawyers who recently faced sanctions for citing nonexistent cases generated by ChatGPT.
That’s not to say that LLMs never answer questions correctly or that we should avoid using AI technology altogether. Sometimes LLMs do provide factual information from valid sources, especially if they have been trained specifically to answer questions on that topic. AI tools such as Perplexity are being created for just this reason.
Instead, it’s important to understand how LLMs work and use them wisely according to their intended purpose.
Mitigating Hallucination
While we can’t — and shouldn’t — expect information gained from LLMs to be 100% accurate, there are many ways to mitigate hallucination on both the developer and consumer ends.
RAG & Fine-Tuning
When building LLM-based tools, developers may use fine-tuning to further train a model on a certain topic or context, which helps improve its accuracy when responding to relevant prompts.
For example, if the LLM is intended to answer healthcare-related questions, the developer could fine-tune it on high-quality scholarly articles and medical texts so that its answers reflect those sources rather than less reputable material or fabricated information.
Or, if the developer is building a customer service chatbot, they may fine-tune the model on examples of appropriate responses to customer inquiries.
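As a concrete illustration, supervised fine-tuning typically starts with a file of example prompts paired with the responses you want the model to imitate. The sketch below writes a couple of such pairs to a JSONL file; the field names and format are hypothetical, since each fine-tuning service defines its own schema.

```python
# A hypothetical sketch of supervised fine-tuning data: prompt/response
# pairs written to a JSONL file. The field names are illustrative only;
# every fine-tuning service defines its own schema.
import json

training_examples = [
    {
        "prompt": "What is the recommended compression rate for adult CPR?",
        "response": "Current guidelines recommend 100 to 120 compressions "
                    "per minute.",
    },
    {
        "prompt": "Should untrained bystanders give rescue breaths?",
        "response": "Untrained bystanders are generally advised to perform "
                    "hands-only CPR and call emergency services.",
    },
]

# In practice, responses like these would come from expert-reviewed
# course material, not from the model itself.
with open("fine_tune_data.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")
```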
However, fine-tuning can be cost-prohibitive and time-consuming, and the LLM will require retraining if the underlying training data is changed or if a new base model is released.
Retrieval-augmented generation (RAG) is another technique that can push LLMs to answer questions with greater accuracy. Instead of letting the LLM rely solely on its broad base of training data, the developer or user supplies a specific set of source material, such as a user manual or a company website, from which relevant passages are retrieved and added to the prompt, and the model is asked to pull its answers exclusively from that source.
RAG tends to be more popular than fine-tuning for many use cases because it is relatively easy to implement and adapts well to a changing data set.
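To make the idea concrete, here is a highly simplified sketch of RAG in plain Python: a few invented help-article passages stand in for real documentation, and simple keyword overlap stands in for the vector search a production system would use. The final call to a model is left as a comment, since that part depends on whichever LLM you choose.

```python
# A simplified sketch of retrieval-augmented generation (RAG).
# The passages below are invented for illustration, and keyword overlap
# stands in for the vector search a real system would use.

SOURCE_PASSAGES = [
    "To renew a certification, open the Certifications tab in your "
    "account and choose Renew next to the expired course.",
    "Group admins can add students from the team management page by "
    "entering each student's email address.",
    "Certificates can be downloaded as PDF files from the course "
    "summary page.",
]

def retrieve(question: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Rank passages by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question: str, context: list[str]) -> str:
    """Restrict the model to the retrieved source material."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer the question using ONLY the source material below. "
        "If the answer is not there, say you don't know.\n\n"
        f"Source material:\n{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )

question = "How do I renew an expired certification?"
prompt = build_prompt(question, retrieve(question, SOURCE_PASSAGES))

# Send `prompt` to the LLM of your choice; the instruction above keeps
# its answer anchored to the supplied source material.
print(prompt)
```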
Manual Fact-Checking
Finally, when using an LLM — especially for learning purposes — always verify that the information it provides is accurate.
- Ask if the information is real. Sometimes, challenging the LLM’s answer will prompt it to confirm that it has hallucinated or to elaborate. Keep in mind, however, that even if the LLM claims it has given factual information, that may not be the case.
- Ask for sources. Instruct the LLM to provide sources that support the information it supplies, and then manually check those sources for validity.
- Double-check your work. Don’t trust what the LLM tells you — test it. Whether you’ve requested a snippet of code or interesting facts about a historical figure, either verify that information elsewhere or don’t use it.
- Provide reference materials. If you know where the information can be found — such as on a particular website — ask the LLM to source its response directly from that location.
- Check the date. Sometimes LLMs may provide information that was accurate at the time of their training but has since become outdated. Try instructing it to reference the most recent data from a source you know is reliable.
The more context and clarification you can provide the LLM when requesting information, the more likely it is to answer accurately and the easier it will be to determine whether the information is true.
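For example, several of the tips above can be combined into a single request. The wording below is just one hypothetical way to phrase it; swap in your own topic and a source you trust.

```python
# A hypothetical prompt that combines several of the tips above:
# supplying reference material, asking for sources, and checking dates.
reference_text = "...paste an excerpt from a source you trust here..."

prompt = (
    "Using only the reference material below, answer my question. "
    "Point to the part of the reference that supports each claim, note "
    "any publication dates, and say 'not in the reference' for anything "
    "you cannot support.\n\n"
    f"Reference material:\n{reference_text}\n\n"
    "Question: What should I do first if someone suddenly collapses?"
)

# Paste the prompt into the chat tool of your choice, then check the
# cited passages yourself. The model claiming it is accurate is not proof.
print(prompt)
```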
Using AI Tools Wisely
When using LLMs, hallucination is not a problem in and of itself. The real problem occurs when people misunderstand the purpose of these tools and which tools to use for a particular task.
If you choose to invest in an AI-powered education tool for your team, make sure you have the proper mitigations in place instead of blindly trusting it to supply factual information — especially when lives are on the line.
Here at ProTrainings, we are constantly experimenting with the latest AI technology to improve the experience for our students and company admins. We’re committed to helping you understand and select the right tools for the job, whether that’s managing your team or learning how to save lives in an emergency. To stay up-to-date on the latest innovations in CPR and first aid training, follow us on LinkedIn.