What is Deep Learning (DL)?
Deep learning (DL) is a subset of machine learning (ML); it is essentially an artificial neural network (ANN) with three or more layers. These ANNs attempt to simulate the behavior of the human brain (albeit far from matching its ability), allowing them to "learn" from large amounts of data. While a neural network with a single layer can still make approximate predictions, additional hidden layers help optimize and refine for accuracy.
Deep learning drives many artificial intelligence (AI) applications and services that improve automation, performing analytical and physical tasks without human intervention. DL technology lies behind everyday products and services (such as digital assistants, medical imaging, and credit card fraud detection) as well as emerging technologies (such as self-driving cars).
Deep Learning vs. Machine Learning
If deep learning is a subset of machine learning, how do they differ? Deep learning distinguishes itself from classical machine learning by the type of data it works with and the methods by which it learns.
Classical ML algorithms leverage structured, labeled data to make predictions, meaning that specific features are defined from the input data for the model and organized into tables. This doesn't necessarily mean that they don't use unstructured data; it just means that when they do, the data generally goes through some pre-processing to organize it into a structured format.
Deep learning eliminates some of the data pre-processing that is typically involved with machine learning. DL algorithms can ingest and process unstructured data, like text and images, and they automate feature extraction, removing some of the dependency on human experts. For example, say we had a set of photos of different pets and wanted to categorize them by "cat," "dog," "hamster," and so on. DL algorithms can determine which features (e.g., ears) are most important for distinguishing each animal from the others. In classical ML, this hierarchy of features is established manually by a human expert.
Then, through the processes of gradient descent and backpropagation, the DL algorithm adjusts and fits itself for accuracy, allowing it to make predictions about a new photo of an animal with increased precision.
ML and DL models are capable of different types of learning as well, usually categorized as supervised learning, unsupervised learning, and reinforcement learning. Supervised learning utilizes labeled datasets to categorize or make predictions; this requires some kind of human intervention to label the input data correctly. In contrast, unsupervised learning doesn't require labeled datasets; instead, it detects patterns in the data, clustering items by any distinguishing characteristics. Reinforcement learning is a process in which a model learns to act in an environment, becoming more accurate based on feedback, in order to maximize a reward.
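A minimal sketch of the first two paradigms, assuming scikit-learn is installed; the Iris dataset and the specific estimators are illustrative choices, not the only options:

```python
# Supervised vs. unsupervised learning on a toy dataset (illustrative only).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: the model sees labeled data (X paired with y).
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("supervised prediction:", clf.predict(X[:1]))

# Unsupervised: the model sees only X and clusters by similarity.
km = KMeans(n_clusters=3, n_init=10).fit(X)
print("cluster assignments:", km.labels_[:5])
```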
How does Deep Learning work?
Deep learning neural networks, or artificial neural networks, attempt to mimic the human brain through a combination of data inputs, weights, and biases. These elements work together to accurately recognize, classify, and describe objects within the data.
Deep neural networks consist of multiple layers of interconnected nodes, each building upon the previous layer to refine and optimize the prediction or categorization. This progression of computations through the network is called forward propagation. The input and output layers of a deep neural network are called visible layers. The input layer is where the model ingests the data for processing, and the output layer is where the final prediction or classification is made.
Another process, called backpropagation, uses algorithms such as gradient descent to calculate errors in predictions and then adjusts the weights and biases of the function by moving backwards through the layers, in an effort to train the model. Together, forward propagation and backpropagation allow a neural network to make predictions and correct for any errors accordingly. Over time, the algorithm becomes gradually more accurate.
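A minimal sketch of both processes in plain NumPy, training a tiny two-layer network on XOR; the architecture, loss, and learning rate are assumptions chosen for brevity, not the text's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward propagation: inputs flow layer by layer to a prediction.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backpropagation: push the prediction error backwards through the
    # layers, then take a gradient-descent step on weights and biases.
    d_out = (out - y) * out * (1 - out)          # dLoss/dz2 for MSE loss
    d_h = (d_out @ W2.T) * h * (1 - h)           # dLoss/dz1
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))  # predictions approach [0, 1, 1, 0]
```

Each iteration runs one forward pass and one backpropagation step; the same loop structure underlies training for the deeper networks described above.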
The above describes the simplest type of deep neural network in the simplest terms. However, DL algorithms are incredibly complex, and there are different types of neural networks to address specific problems or datasets.
Deep Learning Hardware Requirements
Deep learning requires a tremendous amount of computing power. High-performance graphics processing units (GPUs) are ideal because they can handle a large volume of calculations across multiple cores, with copious memory available. However, managing multiple GPUs on premises can create a large demand on internal resources and be incredibly costly to scale.
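A minimal sketch, assuming PyTorch is installed, of how training code typically selects a GPU when one is present and falls back to the CPU otherwise:

```python
import torch

# Pick a CUDA-capable GPU if one is visible, else the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("training on:", device)
if device.type == "cuda":
    print("GPUs visible:", torch.cuda.device_count())

x = torch.randn(1024, 1024, device=device)  # tensors land on that device
```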
What is a Large Language Model (LLM)?
A large language model (LLM) is a deep learning algorithm that can perform a variety of natural language processing (NLP) tasks. LLMs use transformer models and are trained using massive datasets (hence "large"). This enables them to recognize, translate, predict, or generate text or other content.
LLMs are also referred to as neural networks (NNs), which are computing systems inspired by the human brain. These networks work using layered networks of nodes, much like neurons.
In addition to teaching human languages to AI applications, LLMs can also be trained to perform a variety of tasks, such as understanding protein structures and writing software code. Like the human brain, LLMs must be pre-trained and then fine-tuned so that they can solve text classification, question answering, document summarization, and text generation problems. Their problem-solving capabilities can be applied to fields like healthcare, life sciences, telecommunications, satellite communications, systems engineering, network engineering, data science, finance, and space exploration, where LLMs serve a variety of NLP applications, such as translation, chatbots, and AI assistants.
LLMs also have large numbers of parameters, which are akin to memories the model collects as it learns from training. Think of these parameters as the model's knowledge bank.
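A minimal sketch, assuming PyTorch, of what "number of parameters" means in practice; the model below is a toy stand-in, not a real LLM (which would report billions where this reports roughly a hundred thousand):

```python
import torch.nn as nn

toy_model = nn.Sequential(            # stand-in for a real LLM
    nn.Embedding(1000, 64),           # vocabulary of 1,000 tokens
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1000),
)
n_params = sum(p.numel() for p in toy_model.parameters() if p.requires_grad)
print(f"{n_params:,} trainable parameters")
```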
What is a Transformer Model?
A transformer model is the most common architecture of a large language model. It consists of an encoder and a decoder. A transformer processes data by tokenizing the input, then simultaneously conducting mathematical operations to discover relationships between tokens. This enables the computer to see the patterns a human would see if given the same query.
Transformer models work with self-attention mechanisms, which enable the model to learn more quickly than traditional architectures such as long short-term memory (LSTM) models. Self-attention is what enables the transformer to consider different parts of the sequence, or the entire context of a sentence, to generate predictions.
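A minimal sketch of the scaled dot-product self-attention at the heart of the transformer, in NumPy; the toy sizes (4 tokens, model width 8) are assumptions for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
tokens = rng.normal(size=(4, d))            # 4 token embeddings

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv

scores = Q @ K.T / np.sqrt(d)               # how strongly tokens relate
weights = np.exp(scores)                    # softmax over each row
weights /= weights.sum(axis=1, keepdims=True)

context = weights @ V                       # each output mixes all tokens
print(weights.round(2))                     # rows sum to 1
```

Each row of `weights` says how much one token attends to every other token, which is how the model considers the entire context of a sentence at once.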
Key Components of Large Language Models
Large language models are composed of multiple neural network layers: embedding layers, feedforward layers, recurrent layers, and attention layers work in tandem to process the input text and generate output content.
The embedding layer creates embeddings from the input text. This part of the LLM captures the semantic and syntactic meaning of the input, so the model can understand context.
The feedforward network (FFN) of an LLM is made up of multiple fully connected layers that transform the input embeddings. In so doing, these layers enable the model to glean higher-level abstractions; that is, to understand the user's intent with the text input.
The recurrent layer interprets the words in the input text in sequence, capturing the relationship between words in a sentence.
The attention mechanism enables the model to focus on the parts of the input text that are relevant to the task at hand, allowing it to generate the most accurate outputs.
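A minimal sketch, assuming PyTorch, wiring the layer types above into one toy block; the sizes are illustrative assumptions, and real LLMs stack many such blocks:

```python
import torch
import torch.nn as nn

vocab, d = 1000, 32
embed = nn.Embedding(vocab, d)                      # embedding layer
attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))

tokens = torch.randint(0, vocab, (1, 6))            # one 6-token input
x = embed(tokens)                                   # tokens -> vectors
x, _ = attn(x, x, x)                                # self-attention layer
x = ffn(x)                                          # feedforward layer
print(x.shape)                                      # torch.Size([1, 6, 32])
```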
What is the difference between Large Language Models and Generative AI?
Generative AI is an umbrella term that refers to AI models that can generate content, including text, code, images, video, and music. Examples of generative AI include Midjourney, DALL-E, and ChatGPT.
LLMs are a type of generative AI that are trained on text and produce textual content; all LLMs are generative AI, though not all generative AI models are LLMs. ChatGPT is a popular example of generative text AI.
There are three main kinds of LLMs:
- Generic or raw language models predict the next word based on the language in the training data. These models perform information retrieval tasks.
- Instruction-tuned language models are trained to predict responses to the instructions given in the input. This allows them to perform sentiment analysis or to generate text and code.
- Dialog-tuned language models are trained to have a dialog by predicting the next response. Think of chatbots or conversational AI.
How do Large Language Models work?
An LLM is based on a transformer model and works by receiving an input, encoding it, and then decoding it to produce an output prediction. But before an LLM can receive text input and generate an output prediction, it requires training, so that it can fulfill general functions, and fine-tuning, which enables it to perform specific tasks.
Training: LLMs are pre-trained using large textual datasets from sites like Wikipedia, GitHub, and others. These datasets consist of trillions of words, and their quality affects the language model's performance. At this stage, the LLM engages in unsupervised learning, meaning it processes the datasets fed to it without specific instructions. During this process, the LLM's algorithm can learn the meaning of words and the relationships between them. It also learns to distinguish words based on context; for example, it learns to understand whether "right" means "correct" or the opposite of "left."
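A minimal sketch of that self-supervised objective, assuming PyTorch: shift the text by one token and train the model to predict the next token. The tiny corpus and model are illustrative assumptions:

```python
import torch
import torch.nn as nn

corpus = "to be or not to be".split()
vocab = {w: i for i, w in enumerate(sorted(set(corpus)))}
ids = torch.tensor([vocab[w] for w in corpus])

model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(200):
    logits = model(ids[:-1])                             # read each token
    loss = nn.functional.cross_entropy(logits, ids[1:])  # predict the next
    opt.zero_grad(); loss.backward(); opt.step()

print(round(loss.item(), 3))  # loss falls as the model learns the text
```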
Fine-tuning: For an LLM to perform a specific task, such as translation, it must be fine-tuned to that particular activity. Fine-tuning optimizes the performance of that specific task.
Prompt-tuning fulfills a similar function to fine-tuning: it trains a model to perform a specific task through few-shot prompting or zero-shot prompting. A prompt is an instruction given to an LLM. Few-shot prompting teaches the model to predict outputs through the use of examples. For instance, in a sentiment analysis exercise, a few-shot prompt would look like this:
Customer review: This plant is so beautiful!
Customer sentiment: positive
Customer review: This plant is so hideous!
Customer sentiment: negative
The language model would understand, through the semantic meaning of "hideous," and because an opposite example was provided, that the customer sentiment in the second example is "negative."
Alternatively, zero-shot prompting does not use examples to teach the language model how to respond to inputs. Instead, it formulates the question as "The sentiment in 'This plant is so hideous' is…." It clearly indicates which task the language model should perform, but does not provide problem-solving examples.
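A minimal sketch of both prompt styles as plain strings; `call_llm` is a hypothetical placeholder for whatever client your LLM provider exposes, not a real API:

```python
# Few-shot: the prompt carries worked examples before the real input.
FEW_SHOT = """\
Customer review: This plant is so beautiful!
Customer sentiment: positive

Customer review: This plant is so hideous!
Customer sentiment: negative

Customer review: {review}
Customer sentiment:"""

# Zero-shot: the task is stated directly, with no examples.
ZERO_SHOT = "The sentiment in '{review}' is"

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real LLM client here")

review = "The leaves arrived wilted and brown."
print(FEW_SHOT.format(review=review))
print(ZERO_SHOT.format(review=review))
```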
Benefits of Large Language Models
With a broad range of applications, LLMs are exceptionally beneficial for problem-solving because they provide information in a clear, conversational style that is easy for users to understand.
- Large set of applications: They can be used for language translation, sentence completion, sentiment analysis, question answering, solving mathematical equations, and more.
- Always improving: LLM performance continually improves as more data and parameters are added; in other words, the more the model learns, the better it gets. What's more, LLMs can exhibit what is called "in-context learning": once an LLM has been pre-trained, few-shot prompting enables the model to learn from the prompt without any additional parameters. In this way, it is continually learning.
- They learn fast: When demonstrating in-context learning, LLMs learn quickly because they do not require additional weights, resources, or parameters for training. Learning is fast in the sense that it doesn't require many examples.
Limitations and Challenges of Large Language Models
LLMs might give us the impression that they understand meaning and can respond to it accurately. However, they remain a technological tool, and as such, they face a variety of challenges.
Hallucinations: A hallucination occurs when an LLM produces an output that is false or that does not match the user's intent, for example claiming that it is human, that it has emotions, or that it is in love with the user. Because LLMs predict the next syntactically correct word or phrase, they can't wholly interpret human meaning, and the result is sometimes what is referred to as a "hallucination."
Security: LLMs present important security risks when not managed or monitored properly. They can leak people's private information, participate in phishing scams, and produce spam. Users with malicious intent can reprogram AI to their ideologies or biases and contribute to the spread of misinformation. The repercussions can be devastating on a global scale.
Bias: The data used to train language models affects the outputs a given model produces. As such, if the data represents a single demographic or lacks diversity, the outputs produced by the LLM will also lack diversity.
Consent: LLMs are trained on trillions of words of data, some of which might not have been obtained consensually. When scraping data from the internet, LLM builders have been known to ignore copyright licenses, plagiarize written content, and repurpose proprietary content without getting permission from the original owners or artists. When the model produces results, there is no way to track data lineage, and often no credit is given to the creators, which can expose users to copyright infringement issues.
They might also scrape personal data, like the names of subjects or photographers from photo descriptions, which can compromise privacy. LLMs have already run into lawsuits, including a prominent one filed by Getty Images, for violating intellectual property.
Scaling: Scaling and maintaining LLMs can be difficult, time-consuming, and resource-intensive.
Deployment: Deploying LLMs requires deep learning, a transformer model, distributed software and hardware, and overall technical expertise.
What is Retrieval-Augmented Generation (RAG)?
Retrieval-augmented generation (RAG) is an AI framework for retrieving facts from an external knowledge base to ground LLMs on the most accurate, up-to-date information and to give users insight into the LLM's generative process.
General-purpose language models can be fine-tuned to achieve several common tasks, such as sentiment analysis and named entity recognition. These tasks generally don't require additional background knowledge.
For more complex and knowledge-intensive tasks, it's possible to build a language-model-based system that accesses external knowledge sources to complete tasks. This enables more factual consistency, improves the reliability of the generated responses, and helps mitigate the problem of "hallucination."
Meta AI researchers introduced RAG to address such knowledge-intensive tasks. RAG combines an information retrieval component with a text generator model. RAG can be fine-tuned, and its internal knowledge can be modified efficiently without retraining the entire model.
RAG takes an input and retrieves a set of relevant or supporting documents given a source (e.g., Wikipedia). The documents are concatenated as context with the original input prompt and fed to the text generator, which produces the final output. This makes RAG adaptive for situations where facts can evolve over time, which is very useful because an LLM's parametric knowledge is static. RAG allows language models to bypass retraining, enabling access to the latest information for generating reliable outputs via retrieval-based generation.
Lewis et al. (2021) proposed a general-purpose fine-tuning recipe for RAG. A pre-trained seq2seq model is used as the parametric memory, and a dense vector index of Wikipedia is used as the non-parametric memory (accessed using a neural pre-trained retriever).
RAG performs strongly on several benchmarks, such as Natural Questions, WebQuestions, and CuratedTrec. RAG generates responses that are more factual, specific, and diverse when tested on MS MARCO and Jeopardy questions. RAG also improves results on FEVER fact verification. This shows the potential of RAG as a viable option for enhancing the outputs of language models in knowledge-intensive tasks.
More recently, these retriever-based approaches have become more popular and are combined with popular LLMs like ChatGPT to improve capabilities and factual consistency.
LLMs can be inconsistent. Sometimes they nail the answer to questions; other times they regurgitate random facts from their training data. If they occasionally sound like they have no idea what they're saying, it's because they don't. LLMs know how words relate statistically, but not what they mean.
RAG is an AI framework for improving the quality of LLM-generated responses by grounding the model on external sources of knowledge to supplement the LLM's internal representation of information. Implementing RAG in an LLM-based question-answering system has two main benefits:
1. It ensures that the model has access to the most current, reliable facts.
2. It gives users access to the model's sources, ensuring that its claims can be checked for accuracy and ultimately trusted.
“You want to cross-reference a model’s answers with the original content so you can see what it is basing its answer on,” said Luis Lastras, Director of Language Technologies at IBM Research.
RAG has additional benefits. By grounding an LLM on a set of external, verifiable facts, the model has fewer opportunities to pull information baked into its parameters. This reduces the chances that an LLM will leak sensitive data or "hallucinate" incorrect or misleading information.
RAG also reduces the need for users to continuously train the model on new data and update its parameters as circumstances evolve. In this way, RAG can lower the computational and financial costs of running LLM-powered chatbots in an enterprise setting. IBM unveiled its AI and data platform, watsonx, which offers RAG, in May 2023.
An "Open Book" Approach to Answering Tough Questions
Underpinning all foundation models, including LLMs, is an AI architecture known as the transformer. It turns heaps of raw data into a compressed representation of its basic structure. Starting from this raw representation, a foundation model can be adapted to a variety of tasks with some additional fine-tuning on labeled, domain-specific knowledge.
But fine-tuning alone rarely gives the model the full breadth of knowledge it needs to answer highly specific questions in an ever-changing context. In a 2020 paper, Meta (then known as Facebook) came up with the RAG framework to give LLMs access to information beyond their training data. RAG allows LLMs to build on a specialized body of knowledge to answer questions in a more accurate way.
"It's the difference between an open-book and a closed-book exam," Lastras said. "In a RAG system, you are asking the model to respond to a question by browsing through the content in a book, as opposed to trying to remember facts from memory."
As the name suggests, RAG has two phases: retrieval and content generation. In the retrieval phase, algorithms search for and retrieve snippets of information relevant to the user's prompt or question. In an open-domain, consumer setting, those facts can come from indexed documents on the internet; in a closed-domain, enterprise setting, a narrower set of sources is typically used for added security and reliability.
This assortment of external knowledge is appended to the user's prompt and passed to the language model. In the generative phase, the LLM draws from the augmented prompt and its internal representation of its training data to synthesize an engaging answer tailored to the user in that instant. The answer can then be passed to a chatbot with links to its sources.
Toward Personalized and Verifiable Responses
Before LLMs, digital conversation agents followed a manual dialogue flow. They confirmed the customer's intent, fetched the requested information, and delivered an answer in a one-size-fits-all script. For straightforward queries, this manual decision-tree method worked just fine.
But it had limitations. Anticipating and scripting answers to every question a customer might conceivably ask took time; if a scenario was missed, the chatbot had no ability to improvise. Updating the scripts as policies and circumstances evolved was either impractical or impossible.
Today, LLM-powered chatbots can give customers more personalized answers without humans having to write out new scripts. And RAG allows LLMs to go one step further by greatly reducing the need to feed and retrain the model on fresh examples. Simply upload the latest documents or policies, and the model retrieves the information in open-book mode to answer the question.
IBM is currently using RAG to ground its internal customer-care chatbots on content that can be verified and trusted. This real-world scenario shows how it works: an employee, Alice, has learned that her son's school will have early dismissal on Wednesdays for the rest of the year. She wants to know whether she can take vacation in half-day increments and whether she has enough vacation days to finish the year.
To craft its response, the LLM first pulls data from Alice's human resources (HR) files to find out how much vacation she gets as a longtime employee and how many days she has left for the year. It also searches the company's policies to verify that her vacation can be taken in half-days. These facts are injected into Alice's initial query and passed to the LLM, which generates a concise, personalized answer. A chatbot delivers the response, with links to its sources.
Teaching the Model to Recognize When It Doesn't Know
Customer queries aren't always this straightforward. They can be ambiguously worded, complex, or require knowledge the model either doesn't have or can't easily parse. These are the conditions in which LLMs are prone to making things up.
"Think of the model as an overeager junior employee that blurts out an answer before checking the facts," said Lastras. "Experience teaches us to stop and say when we don't know something. But LLMs need to be explicitly trained to recognize questions they can't answer."
In a more challenging scenario taken from real life, Alice wants to know how many days of maternity leave she gets. A chatbot that does not use RAG responds cheerfully (and incorrectly): "Take as long as you want."
Maternity-leave policies are complex, in part because they vary by the state or country of the employee's home office. When the LLM failed to find a precise answer, it should have responded, "I'm sorry, I don't know," said Lastras, or asked additional questions until it could land on a question it could definitively answer. Instead, it pulled a phrase from a training set stocked with empathetic, customer-pleasing language.
With enough fine-tuning, an LLM can be trained to pause and say when it's stuck. But it may need to see thousands of examples of questions that can and can't be answered. Only then can the model learn to identify an unanswerable question and probe for more detail until it hits on a question it has the information to answer.
RAG is currently the best-known tool for grounding LLMs on the latest, verifiable information and for lowering the costs of constantly retraining and updating them. RAG depends on the ability to enrich prompts with relevant information contained in vectors, which are mathematical representations of data. Vector databases can efficiently index, store, and retrieve information for things like recommendation engines and chatbots. But RAG is imperfect, and many interesting challenges remain in getting RAG done right.
At HubBucket Inc. ("HubBucket"), we are focused on innovating at both ends of the process: retrieval (how to find and fetch the most relevant information possible to feed the LLM) and generation (how to best structure that information to get the richest responses from the LLM).
Foundation models are usually trained offline, making the model agnostic to any data created after it was trained. Additionally, foundation models are trained on very general domain corpora, making them less effective for domain-specific tasks. You can use RAG to retrieve data from outside a foundation model and augment your prompts by adding the relevant retrieved data in context.
With RAG, the external data used to augment your prompts can come from multiple data sources, such as document repositories, databases, or application programming interfaces (APIs). The first step is to convert your documents and any user queries into a compatible format for relevancy search. To make the formats compatible, the document collection, or knowledge library, and user-submitted queries are converted to numerical representations using embedding language models. Embedding is the process by which text is given a numerical representation in a vector space. RAG architectures compare the embeddings of user queries against the vectors of the knowledge library. The original user prompt is then appended with relevant context from similar documents within the knowledge library, and this augmented prompt is sent to the foundation model. You can update knowledge libraries and their relevant embeddings asynchronously.
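A minimal sketch of that flow in NumPy; `embed` is a hypothetical stand-in for a real embedding language model, and the knowledge library is a toy:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in: hash characters into a fixed-size unit vector. A real
    # system would call an embedding language model here.
    v = np.zeros(64)
    for i, ch in enumerate(text.lower()):
        v[(i + ord(ch)) % 64] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

knowledge_library = [
    "Vacation may be taken in half-day increments.",
    "GPUs handle large volumes of parallel calculations.",
    "Transformers use self-attention over token sequences.",
]
doc_vectors = np.stack([embed(d) for d in knowledge_library])

query = "Can vacation be taken in half days?"
scores = doc_vectors @ embed(query)               # cosine similarity
best = knowledge_library[int(np.argmax(scores))]  # retrieval phase

# Generation phase: the augmented prompt would go to the foundation model.
augmented_prompt = f"Context: {best}\n\nQuestion: {query}\nAnswer:"
print(augmented_prompt)
```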