...

The generator is specifically designed for larger enterprises with established compliance departments. Terms can be created by compliance officers, legal advisors or business subject-matter experts (SMEs).

Reach:

The tool will also cater to a broader audience by providing easily understandable definitions for non-compliance professionals. This accessibility broadens its utility across different organizational departments, making it a valuable tool for anyone needing clarity on compliance terms.

Confidence:

The need was validated internally with all mappers, and with AT&T.

...

Current sources in our dictionary are often nonexistent or of low quality (wikipedia.com, techsmith.com, …).

Technology

Philosophy:

AI is advancing rapidly, and the progress in tools is unprecedented. We want to maximize existing tools before diving into research to create new tools.

Our approach leverages existing advancements in AI, particularly Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG), to process extensive text volumes and generate contextually accurate information.

RAG (Retrieval-Augmented Generation)

...

  1. Loading: ingesting data from wherever it lives (PDFs, text files, websites, databases or APIs) into our pipeline.

  2. Indexing & storing: creating a data structure that allows the data to be queried. In our proof-of-concept we created vector embeddings to accurately find contextually relevant strings. The index is then stored along with other metadata.

  3. Query/answer: the user submits a query; the LLM generates a response based on the context retrieved from the index.

  4. Evaluation: objective measurement of how accurate, faithful and fast the responses are.
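The four stages above can be sketched end-to-end. The following is a minimal illustration, with a toy term-frequency "embedding" standing in for a real embedding model and plain strings standing in for loaded documents:

```python
from collections import Counter
import math

def embed(text):
    # Toy term-frequency "embedding"; a real system would call an
    # embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Loading: pull documents from wherever they live (here, plain strings).
documents = [
    "SOC 2 is an auditing procedure for service organizations.",
    "GDPR regulates the processing of personal data in the EU.",
]

# 2. Indexing & storing: embed each document and keep the vectors.
index = [(doc, embed(doc)) for doc in documents]

def query(question, top_k=1):
    # 3. Query/answer: retrieve the most relevant chunks; a real system
    # would now pass them to an LLM to draft the term definition.
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

# 4. Evaluation would then score the responses for accuracy,
# faithfulness and speed.
print(query("What does GDPR regulate?"))
```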

Proof-of-concept

The concept has been validated through a proof-of-concept, receiving positive feedback from the entire mapping team, despite limited sources and training.

Technical architecture for the POC

...

Live link:

The POC can be accessed here.

Cost:

The cost of the POC was 2,000 USD for development, plus some time from Prod/Design.

Development Roadmap

Next steps would entail:

...

Over a period of 4-6 months: going from a proof-of-concept to a production-ready platform:

Proposed way forward:

  • Refinement of Retrieval Mechanism: Enhance the AI's ability to accurately source and retrieve relevant data.

    • Re-visit Architecture

    • Evaluate vectorstores (Vectorstore/Knowledge Graph)

    • Set up Semantic Retrieval

  • Enrichment of Sources: Expand and diversify the data sources to improve the tool's comprehensiveness.

  • Fine-Tuning for Quality: Optimize the AI algorithms to ensure high-quality, accurate compliance term generation.

...

    • Set up evaluation RAG Triad with set of evaluation questions

    • Iterate with:

      • Re-rank,

      • Sentence-window and

      • Auto-merging Retrieval

    • Self-evaluation by LLM
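The RAG Triad mentioned above scores each response on three axes: context relevance (query vs. retrieved context), groundedness (answer vs. context) and answer relevance (answer vs. query). A minimal sketch, with simple word overlap standing in for an LLM judge:

```python
def overlap(a, b):
    # Fraction of words in `a` that also appear in `b`; a toy stand-in
    # for an LLM-based relevance judgment.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa) if wa else 0.0

def rag_triad(query, context, answer):
    return {
        "context_relevance": overlap(query, context),  # is the context about the query?
        "groundedness": overlap(answer, context),      # is the answer supported by the context?
        "answer_relevance": overlap(query, answer),    # does the answer address the query?
    }

scores = rag_triad(
    query="what is gdpr",
    context="gdpr is the eu regulation on personal data",
    answer="gdpr is the eu data regulation",
)
print(scores)
```

Iterating with re-ranking, sentence-window and auto-merging retrieval then becomes a matter of comparing these scores across configurations on a fixed set of evaluation questions.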

Milestone after 3 months: Internal Beta Roll-out:

  • POC to Production in AWS: Transition the Proof of Concept (POC) into a full-scale production environment on AWS.

  • Integration with Mapper Team: Provide the tool to the mapper team for human validation, ensuring accuracy and reliability.

...

Milestone after 4-5 months: External Beta Roll-out:

Following successful internal use and quality benchmarks, release the tool externally as a feature of the Dictionary solution.

...

  • Niche Expertise Development: Continuously enhancing our AI models with the latest compliance and regulatory knowledge will keep our tool at the forefront of this niche.

  • Strategic Partnerships: Collaborating with regulatory bodies and compliance experts can improve our tool's capabilities and credibility.

  • Focused Marketing and Branding: Emphasizing our specialization in compliance term creation in marketing efforts will help distinguish our tool from broad genAI solutions.

GTM (Go-To-Market)

In-app promotion

The Term Generator will be embedded inside Unified Compliance’s platform and available to all customers, with a free trial.

Product page

The Term Generator can be embedded on the Dictionary product page and offer new visitors a limited number of free terms to create.

SEO Lead generation

For many compliance terms, there are very limited or no valid search results on Google. We can create a limited number of pages, each featuring a compliance term. This could generate inbound traffic with limited effort.

Financials

The AI-based Term Creation tool is projected to be cost-neutral, delivering significant internal cost savings and serving as a valuable solution for customers.

Cost Savings

...

(validated)

  • UCF Mapping Team Efficiency: The team has created 1,351 new terms year-to-date, spending 20-30 minutes per term, totaling approximately 563 hours.

  • Hourly Cost Savings: With an internal cost of $60/hour, the tool offers an annual cost saving of $33,775.
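The savings figure follows directly from the numbers above; a quick back-of-envelope check, using the 25-minute midpoint of the stated 20-30 minute range:

```python
# Back-of-envelope check of the cost-saving figures.
terms = 1351              # new terms created year-to-date
minutes_per_term = 25     # midpoint of the 20-30 minute range
hourly_cost = 60          # internal cost in USD per hour

hours = terms * minutes_per_term / 60   # ~563 hours
savings = hours * hourly_cost           # ~$33,775 per year
print(round(hours), round(savings))
```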

Revenue Projections

...

  • Pricing Strategy: The tool will be priced at $5/user/month, with additional charges for token credits in case of over-use.

  • Market Penetration Assumptions

    • Existing Customer Base Penetration: Estimated at 50%, with an average of 2 users per account.

    • Annual Recurring Revenue (ARR): Potential ARR is projected to be $240,000
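As a sanity check, the paying user base implied by these projections can be derived from the figures above (the account count below is derived, not stated in this document):

```python
# Users and accounts implied by the ARR projection (derived figures).
price_per_user_month = 5
arr_target = 240_000

paying_users = arr_target / (price_per_user_month * 12)  # users needed for the ARR
accounts = paying_users / 2                              # at 2 users per account
print(int(paying_users), int(accounts))
```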

...

  • Development and Operational Costs:

    • NLU Developer: $15,000-$25,000 for enhancing Natural Language Understanding capabilities.

    • AWS Migration and UI Refinement: $7,500 for development costs associated with AWS migration and user interface improvements.

    • Quality Assurance: $2,500 for QA processes.

...


Strategically, this small genAI solution is a first step, a beachhead in our approach to meeting the growing demand for genAI solutions in the compliance industry.

Preliminary RICE evaluation

Reach & Impact

  • Small w/ medium impact to large w/ small impact

Confidence:

  • Validated customer and internal need

Effort:

Small to Medium

Appendix:

Proof-of-concept

...

The POC could ingest both PDF documents and websites as sources for its compliance knowledge.

Detail on Development Roadmap

Refinement of Retrieval Mechanism

The main evaluation parameters for RAG:

...

Precision

RAG performance hinges largely on ensuring the retrieved context is precisely related to the query.

Hallucinations

RAG is an effective way to combat ‘hallucinations’ of an LLM. However, even with RAG, problems can occur.

When the LLM does not find relevant information in the retrieved context, it still tries to produce an answer and falls back on pre-existing knowledge from its pre-training phase.

So when context relevance is low, LLMs tend to ‘fill in’ gaps with their ‘general’ knowledge from the training phase. Such an answer will have low groundedness, even though it may read well and appear contextually relevant.
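One practical mitigation is to gate answers on retrieval quality: when context relevance falls below a threshold, withhold the draft rather than risk an ungrounded answer. A sketch, with an illustrative threshold value:

```python
# Guard against ungrounded answers: refuse when retrieval scores are low,
# instead of letting the LLM fall back on pre-training knowledge.
RELEVANCE_THRESHOLD = 0.5  # illustrative cutoff; tune against evaluation data

def answer_or_refuse(retrieval_score, draft_answer):
    # Low context relevance means the draft likely came from the model's
    # general pre-training knowledge, so we withhold it.
    if retrieval_score < RELEVANCE_THRESHOLD:
        return "No sufficiently relevant source found."
    return draft_answer

print(answer_or_refuse(0.2, "A plausible but ungrounded definition"))
print(answer_or_refuse(0.8, "A grounded definition"))
```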

Precision: Some underlying problems:

Chunk size

...

Fragmentation

After vector retrieval, we feed a set of fragmented chunks of information into the LLM context window; the smaller our chunk size, the worse the fragmentation.
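A small sliding-window chunker makes the trade-off concrete: smaller windows produce more fragments, and overlap softens the cuts. The chunk sizes below are illustrative:

```python
# Toy illustration of the chunk-size trade-off.
def chunk(words, size, overlap):
    # Slide a window of `size` words, stepping by `size - overlap`.
    step = size - overlap
    return [words[i:i + size] for i in range(0, len(words), step)]

words = "the quick brown fox jumps over the lazy dog".split()
small = chunk(words, size=3, overlap=1)   # many fragments, precise retrieval
large = chunk(words, size=6, overlap=2)   # fewer, more coherent chunks
print(len(small), len(large))
```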

...

Evaluation

Additional evaluation parameters:

...

Solution approaches

Precision

Reranking

After the first semantic search pass obtained with dense retrieval, we re-sort that first set of results with an additional, more precise scoring step (for example a cross-encoder or keyword-based ranker).
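The two-stage retrieve-then-rerank flow can be sketched as follows; both scorers are toy stand-ins (a real system would use vector similarity for the first pass and, for example, a cross-encoder for the second):

```python
def dense_score(query, doc):
    # Cheap first-pass score: shared-word count (toy stand-in for
    # dense vector similarity).
    return len(set(query.split()) & set(doc.split()))

def rerank_score(query, doc):
    # Precision-oriented second pass: overlap relative to document
    # length (toy stand-in for a cross-encoder).
    d = doc.split()
    return len(set(query.split()) & set(d)) / len(d)

def retrieve(query, docs, first_k=3, final_k=1):
    # Stage 1: broad recall with the cheap scorer.
    candidates = sorted(docs, key=lambda d: dense_score(query, d), reverse=True)[:first_k]
    # Stage 2: re-sort only the candidate set with the precise scorer.
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:final_k]

docs = [
    "gdpr data regulation plus many unrelated compliance topics listed here",
    "gdpr data regulation",
    "soc 2 audit basics",
]
# The rerank pass promotes the focused chunk over the noisy one.
print(retrieve("gdpr data regulation", docs))
```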

Sentence-window retrieval

Retrieve not only the sentence found in the embedding lookup but also the sentences before and after it.
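A minimal sketch of the idea, assuming the corpus is already split into sentences:

```python
# Sentence-window retrieval: match single sentences, but hand the LLM
# the matched sentence plus its neighbours.
sentences = [
    "SOC 2 has five trust service criteria.",
    "GDPR applies to personal data of EU residents.",
    "Fines can reach 4% of global annual turnover.",
]

def with_window(hit_index, window=1):
    # Return the matched sentence together with `window` sentences on
    # each side, so the LLM sees more coherent context.
    lo = max(0, hit_index - window)
    hi = min(len(sentences), hit_index + window + 1)
    return " ".join(sentences[lo:hi])

# Suppose the embedding lookup matched the GDPR sentence (index 1):
print(with_window(1))
```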

...

LLMs work better with larger chunks, but vector-based retrieval delivers smaller chunks.

...

After embedding and before sending the chunks to the LLM, we re-rank the chunks.

Auto-merging retrieval

We create a hierarchy of larger parent nodes with smaller children nodes.

For results in the embedding lookup, child nodes are merged and replaced by their parent node once the number of retrieved children exceeds a threshold.
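A sketch of the merging step, with an illustrative parent/child hierarchy and threshold:

```python
# Auto-merging retrieval: if enough children of one parent are retrieved,
# replace them with the larger parent chunk.
parents = {"p1": ["c1", "c2", "c3"], "p2": ["c4", "c5", "c6"]}  # parent -> child chunk ids

def auto_merge(retrieved, threshold=2):
    merged = []
    for parent, children in parents.items():
        hits = [c for c in children if c in retrieved]
        if len(hits) >= threshold:
            merged.append(parent)   # enough children hit: use the parent chunk
        else:
            merged.extend(hits)     # otherwise keep the individual child chunks
    return merged

print(auto_merge({"c1", "c2", "c4"}))
```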

...

“Self-evaluation” by LLM

...

Examples of LLM evaluations

...

Result exploration

Evaluation results can be visualized; below is an example of how this can be done with e.g. TruLens.

...

Evaluation allows drilling down into individual results

...

And show the feedback from the LLM on the results. This gives insight into how changing parameters influences results.

...

Potential metrics

  • Accuracy Rate:

    • Definition Correctness: Percentage of terms where the generated definition accurately reflects the intended meaning.

    • Reference Relevance: Proportion of contextually relevant references to the generated terms.

  • Error Rate:

    • Misinterpretation Frequency: Track the frequency of incorrect interpretations or irrelevant definitions generated.

    • Inconsistency Detection: Measure instances where the tool provides varying quality across similar requests.

  • Response Time:

    • Generation Speed: Monitor the average time to generate a term and its definition, ensuring it meets efficiency standards.

  • Usage Metrics:

    • Adoption Rate: Track the number of active users and frequency of use, indicating the tool's perceived value.

    • Repeat Usage: Measure how often users return to the tool, indicating reliance and satisfaction.

  • Benchmarking:

    • Comparison with Manual Processes: Compare the quality of terms generated by the tool against those created manually.

    • Competitor Comparison: Regularly compare the tool's output quality against similar offerings in the market.
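Two of these metrics, definition correctness and reference relevance, can be computed directly from human-validation logs; the log format below is an assumption for illustration:

```python
# Computing accuracy-rate metrics from (hypothetical) validation logs.
reviews = [
    {"term": "access control", "definition_ok": True,  "refs_relevant": True},
    {"term": "data residency", "definition_ok": True,  "refs_relevant": False},
    {"term": "tokenization",   "definition_ok": False, "refs_relevant": True},
]

def rate(rows, key):
    # Share of reviewed terms where the given check passed.
    return sum(r[key] for r in rows) / len(rows)

accuracy = rate(reviews, "definition_ok")    # definition correctness
relevance = rate(reviews, "refs_relevant")   # reference relevance
print(round(accuracy, 2), round(relevance, 2))
```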

User feedback:
