RFP Answers Program

Automated RFP answering system built in 2016 using NLP and Latent Semantic Indexing to match new questions with historical answers, cutting RFP response time from 8-12 hours to 2-3 hours.

Project Overview

An intelligent RFP (Request for Proposal) answering system that leverages Natural Language Processing and machine learning to automate the tedious process of responding to repetitive RFP questions. Built while working at Oracle Marketing Cloud in 2016, this system analyzed thousands of previously answered questions to suggest relevant answers for new RFPs.

The Problem

The sales engineering team I was on was spending 10+ hours per week per person answering RFPs, often retyping similar answers to slightly different questions. A single RFP could contain 50-200 questions, 60-80% of which were variations of previously answered ones. My goal was to cut the time spent answering questions while ensuring we reused only answers that were genuinely relevant.

The Solution

Using Latent Semantic Indexing (LSI) and TF-IDF vectorization, the system:

  • Indexed historical Q&A pairs from past RFPs
  • Processed new questions through NLP pipelines
  • Found the 3 most semantically similar historical questions
  • Suggested answers with confidence scores (0-1 similarity)
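The suggestion step can be sketched in a few lines. The Q&A strings and scores below are illustrative, not from the real corpus, and the code is modern Python 3 for readability (the original ran on Python 2.7):

```python
# Given cosine similarities between a new question and every historical
# question, return the top-k historical answers with their similarity
# as a 0-1 confidence score. All data here is illustrative only.

def top_suggestions(similarities, answers, k=3):
    """Pair each answer with its similarity and keep the k best."""
    ranked = sorted(zip(answers, similarities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

answers = [
    "Yes. Oracle Eloqua supports multivariate testing...",
    "Yes. Our platform includes split testing...",
    "A/B testing is available through...",
    "Campaign reports can be exported...",
]
similarities = [0.876, 0.743, 0.621, 0.102]

for answer, score in top_suggestions(similarities, answers):
    print("{:.3f}  {}".format(score, answer))
```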

Key Features

  • Semantic Matching: Matched questions on latent topics rather than exact keywords, so rephrased questions still surfaced relevant answers
  • Batch Processing: Processed entire RFPs (100+ questions) in seconds
  • Confidence Scoring: Provided similarity scores to indicate match quality
  • Web Interface: Flask-based UI for uploading RFPs and reviewing suggestions

Technology Deep Dive

NLP Pipeline

  1. Text Normalization: URL removal, phrase detection, punctuation handling
  2. Tokenization: Stopword removal, stemming (Snowball), lemmatization (WordNet)
  3. Phrase Detection: Bigram/trigram extraction for compound terms
  4. Vector Space Model: Dictionary creation and corpus representation
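The steps above can be sketched as follows. The real pipeline used NLTK's Snowball stemmer, WordNet lemmatizer, and phrase models; the tiny stopword list and suffix stemmer here are self-contained stand-ins so the shape of the pipeline is visible (written in Python 3, while the original ran on 2.7):

```python
import re
import string

# Stand-in stopword list; the original used NLTK's English stopwords.
STOPWORDS = {"a", "an", "and", "can", "do", "does", "is", "of", "the", "to", "we", "you"}

def normalize(text):
    """Step 1: lowercase, strip URLs, strip punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    return text.translate(str.maketrans("", "", string.punctuation))

def stem(token):
    """Crude suffix stripping as a stand-in for Snowball stemming."""
    for suffix in ("ing", "tion", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token

def tokenize(text):
    """Step 2: split, drop stopwords, stem."""
    return [stem(t) for t in normalize(text).split() if t not in STOPWORDS]

def add_bigrams(tokens):
    """Step 3: keep unigrams and join adjacent pairs, e.g. "email_marketing"."""
    return tokens + ["_".join(pair) for pair in zip(tokens, tokens[1:])]

print(add_bigrams(tokenize("Can we perform A/B testing?")))
```

The resulting token lists feed step 4, where a dictionary maps each token (and detected phrase) to an integer id for the vector space model.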

Machine Learning

  • TF-IDF: Weighted terms by importance (rare terms weighted higher)
  • LSI: Created 30 latent topics to capture semantic relationships
  • Cosine Similarity: Measured document similarity in LSI space
  • Gensim: Core similarity modeling and indexing

Architecture

  • Backend: Python 2.7, Flask, SQLAlchemy
  • NLP: NLTK, Gensim
  • Frontend: Bootstrap 3, jQuery, React (partial)
  • Database: SQLite

Training Data

Trained on 1,000+ Q&A pairs from Oracle Eloqua Marketing Cloud RFPs, covering:

  • Email marketing and automation
  • CRM integration
  • Campaign management
  • Analytics and reporting
  • API capabilities
  • Security and compliance

Performance

  • Accuracy: one of the top-3 suggestions was usable for roughly 60% of questions
  • Speed: 100 questions processed in ~10 seconds
  • Time Savings: Reduced RFP response time from 8-12 hours to 2-3 hours

Example Results

Question: "Can we perform A/B Testing?"
Suggestion 1: "Yes. Oracle Eloqua supports multivariate testing..." (0.876)
Suggestion 2: "Yes. Our platform includes split testing..." (0.743)
Suggestion 3: "A/B testing is available through..." (0.621)

Impact

This system enabled the Oracle Marketing Cloud sales engineering team to:

  • Respond to RFPs 4x faster
  • Maintain consistency across responses
  • Capture institutional knowledge
  • Onboard new team members more quickly

Lessons Learned

  • Domain-specific training data is crucial for accuracy
  • Hybrid approaches (LSI + supervised learning) could further improve results
  • User feedback loops are essential for continuous improvement
  • Short questions needed special handling for good matches