RFP Answers Program
Automated RFP answering system built in 2016 using NLP and Latent Semantic Indexing to match new questions with historical answers, cutting RFP response time from 8-12 hours to 2-3 hours.
Project Overview
An intelligent RFP (Request for Proposal) answering system that leverages Natural Language Processing and machine learning to automate the tedious process of responding to repetitive RFP questions. Built while working at Oracle Marketing Cloud in 2016, this system analyzed thousands of previously answered questions to suggest relevant answers for new RFPs.
The Problem
The sales engineering team I was on was spending 10+ hours per week per person answering RFPs, often retyping similar answers to slightly different questions. A single RFP could contain 50-200 questions, with 60-80% being variations of previously answered ones. My goal was to cut the time spent answering questions while ensuring we reused only answers that were genuinely relevant.
The Solution
Using Latent Semantic Indexing (LSI) and TF-IDF vectorization, the system:
- Indexed historical Q&A pairs from past RFPs
- Processed new questions through NLP pipelines
- Found the 3 most semantically similar historical questions
- Suggested answers with confidence scores (0-1 similarity)
Key Features
- Semantic Matching: Matched on latent topics rather than exact keywords, so reworded versions of old questions still surfaced their answers
- Batch Processing: Processed entire RFPs (100+ questions) in seconds
- Confidence Scoring: Provided similarity scores to indicate match quality
- Web Interface: Flask-based UI for uploading RFPs and reviewing suggestions
Technology Deep Dive
NLP Pipeline
- Text Normalization: URL removal, phrase detection, punctuation handling
- Tokenization: Stopword removal, stemming (Snowball), lemmatization (WordNet)
- Phrase Detection: Bigram/trigram extraction for compound terms
- Vector Space Model: Dictionary creation and corpus representation
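The original pipeline used NLTK's Snowball stemmer and WordNet lemmatizer plus Gensim phrase models; this stdlib-only sketch approximates the same stages with a hand-picked stopword list, a crude suffix stemmer, and adjacent-pair bigrams, so its exact outputs differ from the real system's.

```python
import re

# Tiny illustrative stopword list (the real pipeline used NLTK's).
STOPWORDS = {"a", "an", "the", "is", "are", "can", "we", "do", "does", "you", "with", "to", "of"}

def normalize(text):
    """Text normalization: URL removal, lowercasing, punctuation handling."""
    text = re.sub(r"https?://\S+", " ", text)          # strip URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())   # drop punctuation
    return text

def stem(token):
    """Crude suffix stripping standing in for the Snowball stemmer."""
    for suffix in ("ing", "tion", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Normalize, drop stopwords, stem, then append bigrams for compound terms."""
    tokens = [stem(t) for t in normalize(text).split() if t not in STOPWORDS]
    bigrams = ["_".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams

tokens = preprocess("Can we perform A/B testing? See https://example.com")
# → ['perform', 'b', 'test', 'see', 'perform_b', 'b_test', 'test_see']
```

The bigram step is what lets compound terms like "email marketing" or "lead scoring" act as single vocabulary entries in the vector space model that follows.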
Machine Learning
- TF-IDF: Weighted terms by importance (rare terms weighted more heavily)
- LSI: Created 30 latent topics to capture semantic relationships
- Cosine Similarity: Measured document similarity in LSI space
- Gensim: Core similarity modeling and indexing
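The production system relied on Gensim for all of the above, but the underlying math is compact enough to show directly: TF-IDF weighting, a truncated SVD that projects documents into a low-rank "topic" space (LSI), and cosine similarity in that space. This NumPy sketch uses a toy 5-term, 4-document matrix and 2 topics in place of the real system's vocabulary and 30 topics.

```python
import numpy as np

# Term-document count matrix (terms x documents); toy data.
counts = np.array([
    [2, 0, 1, 0],   # "email"
    [1, 0, 2, 0],   # "campaign"
    [0, 3, 0, 1],   # "api"
    [0, 1, 0, 2],   # "security"
    [1, 1, 1, 1],   # "platform"
], dtype=float)

# TF-IDF: terms appearing in fewer documents get boosted.
df = np.count_nonzero(counts, axis=1)
idf = np.log(counts.shape[1] / df)
tfidf = counts * idf[:, None]

# Truncated SVD = LSI: keep the top-k singular directions as latent topics.
U, S, Vt = np.linalg.svd(tfidf, full_matrices=False)
k = 2
doc_topics = (S[:k, None] * Vt[:k]).T   # each row: one document in topic space

def to_topic_space(term_vector):
    """Fold a TF-IDF-weighted query vector into the k-dimensional topic space."""
    return term_vector @ U[:, :k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.zeros(5)
query[0] = 1.0                           # a query mentioning only "email"
q_topics = to_topic_space(query * idf)
sims = [cosine(q_topics, d) for d in doc_topics]
best = int(np.argmax(sims))              # most similar historical document
```

The cosine score in topic space is exactly the 0-1 confidence value the system attached to each suggested answer.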
Architecture
- Backend: Python 2.7, Flask, SQLAlchemy
- NLP: NLTK, Gensim
- Frontend: Bootstrap 3, jQuery, React (partial)
- Database: SQLite
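A minimal sketch of how the Flask layer could expose the matcher to the UI; the `/suggest` route, JSON shape, and stubbed `suggest` function here are hypothetical illustrations, not the original application code.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def suggest(question, top_n=3):
    """Stub for the LSI lookup; the real version queried the similarity index."""
    return [("Yes. Oracle Eloqua supports multivariate testing...", 0.876)][:top_n]

@app.route("/suggest", methods=["POST"])
def suggest_route():
    """Accept a question, return suggested answers with confidence scores."""
    question = request.get_json(force=True).get("question", "")
    matches = suggest(question)
    return jsonify([{"answer": a, "score": s} for a, s in matches])

# Exercise the route without running a server:
client = app.test_client()
resp = client.post("/suggest", json={"question": "Can we perform A/B testing?"})
```

Batch processing an uploaded RFP is just this lookup applied per question, which is why a 100-question document completes in seconds.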
Training Data
Trained on 1,000+ Q&A pairs from Oracle Eloqua Marketing Cloud RFPs, covering:
- Email marketing and automation
- CRM integration
- Campaign management
- Analytics and reporting
- API capabilities
- Security and compliance
Performance
- Accuracy: One of the top-3 suggestions was usable roughly 60% of the time
- Speed: 100 questions processed in ~10 seconds
- Time Savings: Reduced RFP response time from 8-12 hours to 2-3 hours
Example Results
Question: "Can we perform A/B Testing?"
Suggestion 1: "Yes. Oracle Eloqua supports multivariate testing..." (0.876)
Suggestion 2: "Yes. Our platform includes split testing..." (0.743)
Suggestion 3: "A/B testing is available through..." (0.621)
Impact
This system enabled the Oracle Marketing Cloud sales engineering team to:
- Respond to RFPs 4x faster
- Maintain consistency across responses
- Capture institutional knowledge
- Onboard new team members more quickly
Lessons Learned
- Domain-specific training data is crucial for accuracy
- Hybrid approaches (LSI + supervised learning) could further improve results
- User feedback loops are essential for continuous improvement
- Short questions needed special handling for good matches