This PRD delineates the development and deployment strategy for an AI-driven Compliance Term Creation Tool to transform organizations' process of generating compliance-related terms. This tool leverages cutting-edge AI technologies to address the inefficiencies inherent in manual term creation, offering a rapid and reliable solution focusing on compliance-specific applications.
Background:
Our existing dictionary solution facilitates the addition of terms to an organization's dictionary but lacks the capabilities to aid in creating those terms. This gap has been identified as a critical area for enhancement.
Purpose
The objective is to develop an AI-powered tool that streamlines the creation of compliance terms, overcoming the limitations of current manual methodologies, which are time-intensive, inconsistent, and hindered by the volume of data sources available. This tool addresses these challenges by providing a fast, accurate, and reference-rich solution for generating compliance terms.
Scope
The tool will be robust and knowledgeable enough to cover a wide range of industries, with the ability to process diverse digital format sources.
The initial release will focus on enabling users to input a term and receive a detailed definition along with pertinent references.
Future versions can include integration as a plugin for document editors like Microsoft Word and Google Docs, further enhancing its functionality to analyze documents and automatically identify and define terms not yet in the organization's compliance dictionary.
Target users
The generator is specifically designed for larger enterprises with established compliance departments. Terms can be created by compliance officers, legal advisors, or business subject-matter experts (SMEs).
Reach:
The tool will also cater to a broader audience by providing easily understandable definitions for non-compliance professionals. This accessibility broadens its utility across different organizational departments, making it a valuable tool for anyone needing clarity on compliance terms.
Confidence:
The need was validated internally with all mappers, and with AT&T.
Initial Concept and User Interaction
In its initial version, the tool will offer a user-friendly interface where users can input a compliance term and promptly receive a well-defined, accurate definition and references. The focus is on simplicity and efficiency to ensure ease of use.
Future enhancements will include seamless integration with document editing platforms for direct in-document support.
Benefits
Speed
Our (professional) mappers spend fifteen to thirty minutes on a single definition. Our solution takes seconds.
Accuracy
The tool is meant to be a support for humans. Current results with the proof-of-concept are encouragingly accurate. A production version will further benefit from benchmarks and human-in-the-loop reinforcement learning.
Quality of references
The tool will consistently reference only high-relevance sources (NIST, eCFR, PCI, …), and we control this quality of input.
Current sources in our dictionary are often nonexistent or of lower quality (wikipedia.com, techsmith.com, …).
Technology
Philosophy:
AI is advancing rapidly, and the progress in tools is unprecedented. We want to maximize existing tools before diving into research to create new tools.
Our approach leverages existing advancements in AI, particularly Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG), to process extensive text volumes and generate contextually accurate information.
RAG (Retrieval-Augmented Generation)
Loading: adding data from where it lives (PDFs, text files, websites, databases, or APIs) into our pipeline
Indexing & storing: Creating a data structure that allows for querying the data. In our proof-of-concept we created vector embeddings to accurately find contextually relevant strings. The index is then stored along with other metadata.
Query/answer: the user submits a query; the LLM then creates a response based on the content retrieved from the index.
Evaluation: objective measurement of how accurate, faithful, and fast the responses are
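To make the pipeline steps above concrete, here is a minimal, self-contained sketch. The bag-of-words "embedding" and the sample passages are stand-ins for a real embedding model and real sources such as NIST or eCFR text; none of this reflects the actual POC code.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding': word counts stand in for a real vector model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Loading: source passages (hypothetical examples standing in for real sources)
sources = [
    "Multi-factor authentication requires two or more distinct authentication factors.",
    "An audit log records events in chronological order for later review.",
]

# Indexing & storing: embed each passage once and keep it alongside the text
index = [(embed(p), p) for p in sources]

def retrieve(query, k=1):
    """Query: return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [p for _, p in ranked[:k]]

# The retrieved passage would be handed to an LLM as grounding context.
print(retrieve("what is multi-factor authentication"))
```

In the real pipeline the retrieved passages become the context an LLM uses to draft the definition; the evaluation step then scores that answer for accuracy, faithfulness, and speed.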
Proof-of-concept
The concept has been validated through a proof-of-concept, receiving positive feedback from the entire mapping team, despite limited sources and training.
Technical architecture for the POC
Live link:
The POC can be accessed here.
Cost:
The cost of the POC was 2,000 USD for development, plus some time from Prod/Design.
Development Roadmap
Next steps would entail:
Q1/Q2 2024 Going from a proof-of-concept to a production-ready platform:
Proposed way forward:
Refinement of Retrieval Mechanism: Enhance the AI's ability to accurately source and retrieve relevant data.
Re-visit Architecture
Evaluate vectorstores (Vectorstore/Knowledge Graph)
Set up Semantic Retrieval
Enrichment of Sources: Expand and diversify the data sources to improve the tool's comprehensiveness.
Fine-Tuning for Quality: Optimize the AI algorithms to ensure high-quality, accurate compliance term generation.
Set up the RAG Triad evaluation with a set of evaluation questions
Iterate with:
Re-ranking,
Sentence-window retrieval, and
Auto-merging retrieval
Self-evaluation by LLM
Q1 2024 Internal Beta Roll-out:
POC to Production in AWS: Transition the Proof of Concept (POC) into a full-scale production environment on AWS.
Integration with Mapper Team: Provide the tool to the mapper team for human validation, ensuring accuracy and reliability.
Q2/Q3 2024 External Beta Roll-out:
Following successful internal use and quality benchmarks, release the tool externally as a feature of the Dictionary solution.
Competition
The market is rapidly evolving with general genAI tools like Bard and ChatGPT. We focus on developing a niche, compliance-specific solution to stay ahead in this competitive space.
We see niche solutions sprouting up for academic research, legal, etc. In compliance, the number of new startups is smaller, but bound to catch up.
Unique Selling Proposition (USP)
Versus the current, manual way of doing things, the tool's USP lies in its:
speed,
accuracy, and
the comprehensive nature of the references it provides.
Leveraging advanced genAI and NLP techniques, it promises a significant improvement over traditional methods, offering quick and reliable compliance term definitions. This tool is a time-saver and a step towards more consistent and universally understandable compliance practices.
Versus more general-oriented tools, our solution’s USP lies in:
Niche Expertise Development: Continuously enhancing our AI models with the latest compliance and regulatory knowledge will keep our tool at the forefront of this niche.
Strategic Partnerships: Collaborating with regulatory bodies and compliance experts can improve our tool's capabilities and credibility.
Focused Marketing and Branding: Emphasizing our specialization in compliance term creation in marketing efforts will help distinguish our tool from broad genAI solutions.
GTM (Go-To-Market)
In-app promotion
The Term Generator will be embedded inside Unified Compliance’s platform, and available to all customers. A free test/trial would be available to customers.
Product page
The Term Generator can be embedded on the Dictionary product page and offer new visitors a limited number of free terms to create.
SEO Lead generation
For many compliance terms, there are very limited or no valid search results on Google. We can create a limited number of pages, each featuring a compliance term. Potentially this can create inbound traffic with limited effort.
Financials
The AI-based Term Creation tool is projected to be cost-neutral, delivering significant internal cost savings and serving as a valuable solution for customers.
Cost Savings (validated)
UCF Mapping Team Efficiency: The team has created 1,351 new terms year-to-date, spending 20-30 minutes per term; at roughly 25 minutes on average, that totals approximately 563 hours.
Hourly Cost Savings: With an internal cost of $60/hour, the tool offers an annual cost saving of $33,775.
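The savings figures above follow from a simple calculation; the 25-minute midpoint of the stated 20-30 minute range is the assumption that makes the totals line up:

```python
# Reproducing the cost-savings figures quoted above.
terms_created = 1351      # new terms year-to-date
minutes_per_term = 25     # assumed midpoint of the 20-30 minute range
hourly_cost = 60          # internal cost in USD per hour

hours_total = terms_created * minutes_per_term / 60
annual_saving = hours_total * hourly_cost

print(round(hours_total))    # ~563 hours
print(round(annual_saving))  # ~33,775 USD
```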
Revenue Projections
Pricing Strategy: The tool will be priced at $5/user/month, with additional charges for token credits in case of over-use.
Market Penetration Assumptions:
Existing Customer Base Penetration: Estimated at 50%, with an average of 2 users per account.
Annual Recurring Revenue (ARR): Potential ARR is projected to be $240,000
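The PRD does not state the size of the existing customer base; under the assumptions above, the $240,000 ARR figure implies roughly 4,000 accounts. A sketch of the derivation, with the account count explicitly marked as an assumption:

```python
price_per_user_month = 5   # USD per user per month
penetration = 0.50         # share of existing accounts adopting the tool
users_per_account = 2      # average users per account
accounts = 4000            # assumption: not stated in the PRD; chosen to match the ARR

arr = accounts * penetration * users_per_account * price_per_user_month * 12
print(int(arr))  # 240000
```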
Cost Projections:
Development and Operational Costs:
NLU Developer: $15,000-$25,000 for enhancing Natural Language Understanding capabilities.
AWS Migration and UI Refinement: $7,500 for development costs associated with AWS migration and user interface improvements.
Quality Assurance: $2,500 for QA processes.
This financial overview underscores the tool's potential for cost-effectiveness and revenue generation, aligning with our strategic goals of efficiency and market competitiveness.
Conclusion
This PRD outlines the development of an AI-based Term Creation tool tailored for Compliance Analysts, addressing the inefficiencies of manual compliance term creation.
Our tool differentiates itself by focusing on compliance-specific term generation, leveraging advanced genAI and NLP technologies. Targeted at larger enterprises, it promises enhanced accuracy, efficiency, and industry-wide applicability.
Moving forward, our focus can be on continuous technological refinement and market responsiveness to maintain a competitive edge against broader genAI solutions, ensuring our tool remains a specialized, valuable asset in compliance.
Strategically, this smallish genAI solution is a first step, a beachhead in our potential approach to meeting the growing demand for genAI solutions in the compliance industry.
Appendix:
Proof-of-concept
A screenshot of results from the live POC made in October 2023.
The POC provided the following:
a short definition
a slightly more complete definition
three questions which the user can ask that help explain the term
references upon which the answer is based
The POC could ingest both PDF documents and websites as source material for its compliance knowledge.
Detail on Development Roadmap
Refinement of Retrieval Mechanism
The main evaluation parameters for RAG:
Precision
RAG performance hinges very much on its ability to ensure the retrieved context is precisely related to the query.
Hallucinations
RAG is an effective way to combat ‘hallucinations’ of an LLM. However, even with RAG, problems can occur.
When the LLM does not find relevant information in the retrieved context, it still tries to produce an answer and falls back on pre-existing knowledge from its pre-training phase.
So when context relevance is low, LLMs tend to ‘fill in’ gaps with ‘general’ knowledge from the training phase. Such an answer will have low groundedness even though it might seem like a good answer and be contextually relevant.
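To make "groundedness" concrete, here is a toy proxy that scores how much of an answer is covered by the retrieved context. Real evaluations (such as the RAG Triad) use an LLM judge or an NLI model rather than word overlap; this sketch is only illustrative, and the example sentences are invented.

```python
def groundedness(answer, context):
    """Crude proxy: fraction of answer words that also appear in the retrieved
    context. A production system would use an LLM judge instead of overlap."""
    a = set(answer.lower().split())
    c = set(context.lower().split())
    return len(a & c) / len(a) if a else 0.0

context = "encryption at rest protects stored data using cryptographic keys"
grounded = "encryption at rest protects stored data"
hallucinated = "encryption was invented in 1976 by whitfield diffie"

# A grounded answer scores higher than one drawn from 'general' knowledge.
print(groundedness(grounded, context) > groundedness(hallucinated, context))  # True
```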
Precision: Some underlying problems:
Chunk size
Fragmentation
After vector retrieval, we feed a set of fragmented chunks of information into the LLM context window; the fragmentation worsens the smaller our chunk size is.
Evaluation
Additional evaluation parameters:
Solution approaches
Precision
Reranking
After a first semantic search obtained with dense retrieval, we apply an additional sorting pass over that first set of results (for example with a keyword-based scorer or a cross-encoder).
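A minimal sketch of the reranking idea, with exact keyword overlap standing in for whatever second-pass scorer is actually used (the PRD does not specify one), and with invented passages as the dense-retrieval results:

```python
def keyword_score(query, passage):
    """Second-pass score: exact query-term overlap, a stand-in for BM25
    or a cross-encoder reranker."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p)

# Hypothetical first-pass results from dense retrieval, best-first
dense_results = [
    "access control lists restrict which users may read a resource",
    "role-based access control assigns permissions to roles not users",
    "firewalls filter network traffic between trusted and untrusted zones",
]

query = "role-based access control"
# Re-sort the candidate set by the second-pass score
reranked = sorted(dense_results, key=lambda p: keyword_score(query, p), reverse=True)
print(reranked[0])
```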
Sentence-window retrieval
Retrieve not only the sentence found in the embedding lookup but also the sentences before and after.
LLMs work better with larger chunks, but vector-based retrieval delivers smaller chunks.
After embedding and before sending the chunks to the LLM, we re-rank the chunks.
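The windowing step above can be sketched as follows; the document sentences are invented for illustration:

```python
def sentence_window(sentences, hit_index, window=1):
    """Expand an embedding-lookup hit to include its neighbouring sentences,
    so the LLM receives a larger, more coherent chunk."""
    lo = max(0, hit_index - window)
    hi = min(len(sentences), hit_index + window + 1)
    return " ".join(sentences[lo:hi])

doc = [
    "The control catalog defines baseline requirements.",
    "Encryption keys must be rotated annually.",
    "Rotation events must be logged and reviewed.",
]

# Suppose the embedding lookup matched sentence 1; the window adds its neighbours.
print(sentence_window(doc, 1))
```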
Auto-merging retrieval
We create a hierarchy of larger parent nodes with smaller children nodes.
For results in the embedding lookup, child nodes are merged into, and replaced by, their parent node when a set threshold (e.g., the fraction of a parent's children retrieved) is exceeded.
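A sketch of the merging rule, using an invented two-parent hierarchy and a fraction-of-children threshold (one plausible reading of "a threshold is exceeded"):

```python
from collections import Counter

def auto_merge(hits, child_to_parent, parent_size, threshold=0.5):
    """If more than `threshold` of a parent's children were retrieved,
    replace those children with the parent node."""
    per_parent = Counter(child_to_parent[h] for h in hits if h in child_to_parent)
    merged, out = set(), []
    for parent, n in per_parent.items():
        if n / parent_size[parent] > threshold:
            merged.add(parent)
            out.append(parent)
    # Keep any hits whose parent was not merged
    out += [h for h in hits if child_to_parent.get(h) not in merged]
    return out

# Hypothetical hierarchy: parent "P1" has children c1..c3, "P2" has c4..c5
child_to_parent = {"c1": "P1", "c2": "P1", "c3": "P1", "c4": "P2", "c5": "P2"}
parent_size = {"P1": 3, "P2": 2}

# Two of P1's three children were retrieved -> merged into P1; c4 stays as-is
print(auto_merge(["c1", "c2", "c4"], child_to_parent, parent_size))
```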
“Self-evaluation” by LLM
Examples of LLM evaluations
Result exploration
Evaluation results can be visualized; below is an example of how this can be done with e.g. TruLens.
Evaluation allows drilling down into individual results
and shows the LLM's feedback on each result. This gives insight into how changing parameters influences results.
Potential metrics
Accuracy Rate:
Definition Correctness: Percentage of terms where the generated definition accurately reflects the intended meaning.
Reference Relevance: Proportion of references that are contextually relevant to the generated terms.
Error Rate:
Misinterpretation Frequency: Track the frequency of incorrect interpretations or irrelevant definitions generated.
Inconsistency Detection: Measure instances where the tool provides varying quality across similar requests.
Response Time:
Generation Speed: Monitor the average time to generate a term and its definition, ensuring it meets efficiency standards.
Usage Metrics:
Adoption Rate: Track the number of active users and frequency of use, indicating the tool's perceived value.
Repeat Usage: Measure how often users return to the tool, indicating reliance and satisfaction.
Benchmarking:
Comparison with Manual Processes: Compare the quality of terms generated by the tool against those created manually.
Competitor Comparison: Regularly compare the tool's output quality against similar offerings in the market.
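The accuracy and error metrics above reduce to simple ratios once human judgments exist. A minimal sketch; the review batch is invented, and in practice the labels would come from mapper reviews:

```python
def accuracy_rate(labels):
    """labels: list of booleans, True where the generated definition was
    judged correct (e.g. by a mapper review)."""
    return sum(labels) / len(labels)

def error_rate(labels):
    """Complement of the accuracy rate: share of incorrect definitions."""
    return 1 - accuracy_rate(labels)

# Hypothetical review batch: 8 of 10 definitions judged correct
reviews = [True] * 8 + [False] * 2
print(accuracy_rate(reviews))        # 0.8
print(round(error_rate(reviews), 2)) # 0.2
```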