MCP Search Best Practices

Markdown guidance for LLMs using GovTribe MCP search tools.

The following guide is given to LLMs when they call the Search Instructions tool on the GovTribe MCP. It serves as useful context for understanding how search works in GovTribe.

Search Guidance

General

When building GovTribe search payloads, every field marked as required must always be included in the request — even when the user wants an “empty” or unfiltered search. If a required field has no user-provided value, you (the assistant) must supply a valid empty value for that field. Use the following defaults:

  • String fields: ""

  • Arrays: []

  • Date ranges: { "from": null, "to": null }

  • Numeric ranges: { "min": null, "max": null }

  • Sort objects: { "key": "_score", "direction": "desc" }

  • Query: always a string, never null — use "" when no free-text search is needed.

You (the assistant) must never omit a required property and must never use null where the field expects a string or array.
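
For illustration, a minimal "empty" payload following these defaults might look like this (a sketch; the non-query, non-sort field names are hypothetical stand-ins, and each tool's schema defines its actual required fields):

```python
# Sketch of an "empty" search payload using the documented defaults.
# All field names except "query" and "sort" are hypothetical examples;
# check each tool's schema for the real required fields.
def empty_search_payload():
    return {
        "query": "",                                      # always a string, never null
        "ids": [],                                        # arrays default to []
        "posted_date_range": {"from": None, "to": None},  # open date range
        "amount_range": {"min": None, "max": None},       # open numeric range
        "sort": {"key": "_score", "direction": "desc"},   # default relevance sort
    }
```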

Query

  • A string (or empty string). Uses Elasticsearch (not Lucene) query logic.

  • Keyword mode: use quotes for exact phrases, | for OR, - to exclude. Semantic mode: natural language with synonyms. Use an empty string for aggregation-only queries.

  • Only keyword mode supports operators (quotes, |, and -); semantic mode only supports queries without operators.

  • To search by GovTribe IDs, or to exclude results by ID, use the dedicated ids filter.
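
A few illustrative query strings for each mode:

```python
# Keyword mode: the only supported operators are quotes, |, and -.
exact_phrase = '"enterprise cyber range"'                   # quotes: exact phrase
alternatives = '"cloud migration" | "cloud modernization"'  # |: OR between phrases
with_exclude = "cybersecurity -training"                    # -: exclude a term

# Semantic mode: plain natural language, no operators.
semantic_query = "ways to find recompete contracts"

# Aggregation-only request: empty string; filters and aggs do the work.
aggs_only = ""
```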

Pagination & Sizing

  • Choose the smallest value that answers the question; paginate only when needed.

Sorting

  • Only set sort when:

  • a user or prompt explicitly asks for a specific ordering, or

  • the task is analytical (e.g., time series, leaderboards, stats).

Similar Filter

  • Use similar_filter when the user wants “items like this other item.”

  • Always provide both:

  • govtribe_type (the referenced item’s type), and

  • govtribe_id (the referenced item’s ID).

  • Example: {"govtribe_type":"federal_contract_award","govtribe_id":"<ID>"}.
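
Put together, a "more like this" request might be shaped as follows (a sketch; the ID is a placeholder and the other fields follow the defaults above):

```python
# Sketch of a similar-items search payload. "abc123" is a placeholder ID.
similar_payload = {
    "query": "",                                    # similar_filter supplies relevance
    "sort": {"key": "_score", "direction": "desc"},
    "similar_filter": {
        "govtribe_type": "federal_contract_award",  # the referenced item's type
        "govtribe_id": "abc123",                    # the referenced item's ID
    },
}
```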

    Date Input Grammar

    • All ..._date_range fields accept:

    • Plain dates YYYY-MM-DD, or

    • Elasticsearch date math (e.g., now-7d, now-1h/d, now+12M/d).

    • If the user gives only one bound, infer the other sensibly (e.g., “last 90 days” → from: now-90d/d, to: now/d).

    • Using only one bound is acceptable; for example, providing only to searches for anything before that date.

    • When you include a range object, include both from and to keys, setting the open bound to null.
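
Two small helpers sketch how these ranges can be built (the date-math strings follow standard Elasticsearch syntax; the helpers themselves are illustrative):

```python
def last_n_days(n):
    """Range for "last n days", rounded to day boundaries with /d."""
    return {"from": f"now-{n}d/d", "to": "now/d"}

def anything_before(bound):
    """Open-ended range: include both keys, with the open bound set to None."""
    return {"from": None, "to": bound}
```

For example, last_n_days(90) yields {"from": "now-90d/d", "to": "now/d"}, matching the "last 90 days" example above.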

    Agency & ID Fields

    • Contracting (Awarding) Agency ≈ “who signed it.”

    • Funding Agency ≈ “who owns the money.”

    • If the user provides names instead of IDs, first resolve with research_federal_agencies, then pass the resulting IDs to the target search.
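
The resolve-then-search flow can be sketched like this (call_tool stands in for however your client invokes MCP tools; research_federal_agencies is named above, while the target search tool name and response shapes are hypothetical):

```python
def search_by_agency_name(call_tool, agency_name):
    # Step 1: resolve the human-readable name to GovTribe agency IDs.
    agencies = call_tool("research_federal_agencies", {"query": agency_name})
    agency_ids = [a["id"] for a in agencies]
    # Step 2: pass the resolved IDs to the target search (tool name illustrative).
    return call_tool("search_contract_awards", {
        "query": "",
        "contracting_federal_agency_ids": agency_ids,
    })
```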

    Location Filters

    • Fields like place_of_performance / vendor_location accept countries, states, counties, cities, postal codes. Match the user’s specificity.

    Aggregations (Leaderboards & Roll‑ups)

    • Use aggregations when the user wants counts, sums, “top N …”, or overall stats.

    • Keep aggregation values within the tool’s allowed enum; return a compact sample of raw items if it clarifies the aggregates.

    ID Lookups (Cross‑Tool Pattern)

    • When the API expects IDs (agencies, vendors, vehicles, categories) and the user gives names, first call the appropriate research_* resolver tool to get IDs, then perform the main query. We are in beta and are adding more research tools; if no research tool is available, inform the user that "This tool has not been migrated to MCP yet" and ask for the ID directly.

    Search Mode

    This guidance explains how to set the two parameters used by GovTribe research tools: search_mode (mode selector) and query (the final, transformed query string to execute).

    Parameter contract

    • search_mode: choose "keyword" or "semantic" per the decision checklist below.

    • query: set this to the transformed query produced by the chosen mode. This is the exact string sent to the search tool.

    • Exception (structured / aggs-only): If the user request is strictly about aggregations or filtering on structured fields (e.g., NAICS/PSC/UEI/CAGE/contract IDs), do not send a free-text query. Set query to "" and rely entirely on structured parameters.

    Aggregations (aggs) support

    • "Aggs" (rollups, distributions, leaderboards such as dollars_obligated or top_agencies) are reliably available only in keyword search mode.

    • semantic search mode focuses on semantic document retrieval and does not guarantee or optimize aggs. If the user requests totals, counts, “top N”, “breakdown by …”, “distribution of …”, or time-series rollups, choose keyword search mode.

    Search Mode keyword — “fuzzy keyword search with optional exact matching”

    What it does

    • Runs an Elasticsearch simple_query_string-style search with:

    • Default operator: AND across all tokens.

    • Fuzzy matching enabled (e.g., fuzziness, fuzzy_max_expansions).

    • Prioritizes direct term/phrase overlap. Good recall on misspellings and near‑matches; precision improves when you add quotes around phrases.
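
For context, a simple_query_string request of this style might resemble the following (a sketch only; GovTribe's actual query construction is not documented here, and only standard Elasticsearch parameters are shown):

```python
# Hypothetical Elasticsearch request body illustrating keyword mode.
es_body = {
    "query": {
        "simple_query_string": {
            "query": '"enterprise cyber range" | training',
            "default_operator": "and",   # AND across all tokens
            "fuzzy_max_expansions": 50,  # bounds fuzzy term expansion
        }
    }
}
```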

    What the AI may do to the user’s query

    • Keep the query verbatim, or:

    • Fix obvious spelling errors outside of quoted strings and outside identifier-looking tokens.

    • Add double quotes around words/phrases the user clearly intends as exact strings (names, IDs, titles, multi-word entities).

    • Build OR lists using the | operator (simple-query-string OR). Example: "enterprise cyber range" | "enterprise cyber training".

    • Exclude terms using -term.

    • Do NOT use other simple_query_string operators or field scoping (+ () ~ * ^ field: etc.). Do not alter text inside quotes.

    Strengths

    • Supports aggregations.

    • Best for lookup and literal intent:

    • Exact identifiers, codes, file names, titles, part numbers, solicitation/notice IDs (e.g., W912HQ-24-R-0123), UEIs/CAGE, NAICS/PSC.

    • Proper nouns or quoted strings where the user expects near-exact hits.

    • Short queries with 1–3 salient tokens.

    • Tolerant of minor typos and near-spellings.

    Limitations

    • Weaker for conceptual or open-ended asks where meaning matters more than words.

    • Can miss semantic equivalents not present as index synonyms.

    When to choose search mode keyword

    • The user’s intent is “fetch this exact thing” (lookup, navigational).

    • The user requests aggregations (e.g., "top vendors", "count by agency/NAICS", "trend by month").

    • The query contains quoted text or looks like an ID/code (hyphenated/all-caps alphanumerics/digit patterns).

    • The user mentions an exact title/name (“RFP ‘Enterprise Cyber Range’”).

    • The query is very short and specific (“CMMC L3 RFI”).

    How to construct query (safe transformations)

    1. Normalize whitespace and fix obvious typos except inside quoted strings or identifier-like tokens.

    2. Wrap clear entities in double quotes: organization names, multi-word titles, IDs.

    3. If the user enumerates alternatives, build an OR list with | between (possibly quoted) terms/phrases.

    4. If the user wants to exclude a term, prefix it with -.

    5. Do not add operators beyond quotes, |, or -.

    6. Do not use the words AND or OR as boolean operators; they are not supported (use | for OR).
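
Steps 1–5 can be sketched as a small assembly helper (entity and typo detection are left to the assistant's judgment; this only shows the allowed operators):

```python
def build_keyword_query(terms=(), or_groups=(), exclude=()):
    """Join pre-chosen pieces into a keyword-mode query string.

    terms:     plain or multi-word strings; multi-word entities get quoted
    or_groups: lists of alternatives, joined with |
    exclude:   terms to prefix with -
    Only quotes, |, and - are emitted; no other operators.
    """
    def quote(t):
        return f'"{t}"' if " " in t else t
    parts = [quote(t) for t in terms]
    parts += [" | ".join(quote(t) for t in g) for g in or_groups]
    parts += [f"-{t}" for t in exclude]
    return " ".join(parts)
```

For example, build_keyword_query(["training"], or_groups=[["enterprise cyber range", "enterprise cyber training"]]) produces 'training "enterprise cyber range" | "enterprise cyber training"'.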

    Examples (→ assign to query for search mode keyword)

    • W912HQ-24-R-0123 → "W912HQ-24-R-0123"

    • cisa endpoint detection rfi → "CISA" "endpoint detection" RFI

    • uei v1abcde345f6 → "V1ABCDE345F6"

    • "enterprise cyber range" training → "enterprise cyber range" training

    Search Mode semantic — “Dense vector semantic search (meaning over words)”

    What it does

    • Embeds the user query into a dense vector and retrieves semantically similar content, independent of exact lexical overlap.

    What the AI may do to the user’s query

    • Send verbatim, or apply light reformulation:

    1. Synonym/paraphrase expansion (2–6 items): e.g., RFP → request for proposal, solicitation, notice; set-aside → small business, 8(a), WOSB, SDVOSB, HUBZone; IT services → information technology, software development.

    2. Query relaxation (only if likely sparse): drop/soften narrow constraints (numbers, exact dates, long conjunctive tails) while keeping core intent.

    • Keep reformulation in plain natural language.

    • No boolean operators: OR, AND, etc. are not supported.

    • Cap length: keep the final string concise (≈ 20–25 words).

    Strengths

    • Best for conceptual, exploratory, or intent-heavy queries:

    • “What’s similar to …”, “alternatives”, “how/why/best ways”.

    • Broad topical searches where terminology varies (synonyms, abbreviations).

    • Multi-clause queries that read like a question or task.

    Limitations

    • Does not reliably support aggregations; if aggregates are required, use search mode keyword.

    • May underweight exact identifiers and strict literal intent.

    • Relaxation can introduce drift if overused—apply conservatively.

    When to choose search mode semantic (signals)

    • The query is a question or seeks guidance/ideas (“how to”, “ways to”, “similar to”, “alternatives”).

    • The topic is broad/ambiguous or relies on synonyms.

    • The query mixes multiple related notions where meaning matters more than literal overlap.

    How to construct query (reformulation recipe)

    1. Keep the core intent in plain language.

    2. Append 2–6 high-value domain synonyms/paraphrases.

    3. If sparse, relax the narrowest numeric/date constraints last.

    4. Do not inject contradictions or change scope. Keep it ≤ ~25 words.
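
As a sketch of this recipe (choosing the synonyms themselves remains the assistant's judgment):

```python
def build_semantic_query(core_intent, synonyms, max_words=25):
    """Append 2-6 synonyms/paraphrases to the core intent and cap length."""
    text = "; ".join([core_intent] + list(synonyms)[:6])
    return " ".join(text.split()[:max_words])
```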

    Examples (→ assign to query for search mode semantic)

    • ways to find recompete contracts → ways to find recompete contracts; identify expiring awards; renewal opportunities; follow-on opportunities

    • small business set-aside cloud modernization rfp examples → examples of small business set-aside cloud modernization solicitations; RFPs; RFIs; sources sought

    • find similar notices to FAA SWIM support → notices similar to FAA SWIM support; aviation data integration; system wide information management; enterprise integration support

    How the AI should choose a mode

    One-screen decision checklist

    After choosing, set search_mode accordingly, then build and assign the resulting string to query per the relevant construction rules.

    Pick search mode keyword if any are true:

    • The query includes quotes that signal exact matching.

    • The query contains an ID/code/numbered token (solicitation/notice ID, UEI, CAGE, NAICS/PSC, contract number).

    • The user intent is lookup / navigate to a specific document.

    • The query is ≤ 3 tokens and appears specific rather than conceptual.

    • The user asks for aggregations (counts/tops/distributions/time series).

    Pick search mode semantic if any are true:

    • The query is a question or seeks conceptual/semantic matches.

    • The topic is broad, ambiguous, or relies on synonyms.

    • The user asks for “similar to / related to / alternatives”.

    • The query mixes multiple related ideas where meaning matters more than literal overlap.

    Tie-breakers

    • If the query mixes a unique identifier with conceptual context (e.g., "W912HQ-24-R-0123 recompete history"), choose search mode keyword and keep the ID quoted; include extra unquoted context.

    • If the query has no unique tokens and would benefit from synonyms, choose search mode semantic.

    • If constraints conflict (very long exact phrase + “similar to”), prefer search mode semantic unless there’s a quoted ID—then search mode keyword.

    Compact pseudocode
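
A sketch of that pseudocode in Python (the ID and question heuristics are simplified illustrations of the checklist, not GovTribe's implementation):

```python
import re

def choose_search_mode(query, wants_aggregations=False):
    """Pick "keyword" or "semantic" per the decision checklist."""
    if wants_aggregations:
        return "keyword"              # aggs are only reliable in keyword mode
    if '"' in query:
        return "keyword"              # user quotes signal exact matching
    # ID/code-like token: hyphenated caps/digits, or caps mixed with digits
    if re.search(r"\b[A-Z0-9]+(?:-[A-Z0-9]+)+\b|\b[A-Z]+\d[A-Z0-9]*\b", query):
        return "keyword"
    lowered = query.lower()
    if any(cue in lowered for cue in
           ("how to", "ways to", "similar to", "related to", "alternatives")):
        return "semantic"
    if len(query.split()) <= 3:
        return "keyword"              # short and specific: treat as lookup
    return "semantic"                 # broad/conceptual by default
```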

    Minimal, mode-specific query construction rules

    • If search mode keyword: set query to a string that:

    • Preserves user quotes; adds quotes to clear entities/IDs.

    • Fixes obvious typos outside quotes and outside IDs.

    • Outputs: a space-separated list of (possibly quoted) terms/phrases, with optional | ORs and - excludes. No other operators.

    • If search mode semantic: set query to a string that:

    • Starts with the user’s natural-language query.

    • Appends 2–6 domain-aware synonyms/paraphrases.

    • If likely sparse, removes the narrowest numeric/date constraints last.

    • Keeps the whole string concise (≈ 20–25 words).

    Quick domain-flavored examples

    User query → search_mode → query

    • W912HQ-24-R-0123 → keyword → "W912HQ-24-R-0123"

    • "cyber incident response" BPA → keyword → "cyber incident response" BPA

    • uei v1abcde345f6 → keyword → "V1ABCDE345F6"

    • cisa endpoint detection rfi → keyword → "CISA" "endpoint detection" RFI

    • how to find recompetes in DoD → semantic → ways to find recompete contracts in DoD; identify expiring awards; follow-on opportunities; contract renewals

    • similar notices to FAA SWIM support → semantic → notices similar to FAA SWIM support; aviation data integration; System Wide Information Management; enterprise integration support

    • small business set-aside cloud modernization rfp examples → semantic → examples of small business set-aside cloud modernization solicitations; RFPs; RFIs; sources sought

    How to resolve any field ending in _ids

    All *_ids fields require valid GovTribe IDs. Use search tools to resolve names/identifiers to IDs.

    Most fields ending in _ids are self-explanatory (e.g., pipeline_ids filters by pipelines). These have extra context:

    Agency distinctions:

    • contracting_federal_agency_ids: Who signed the contract

    • funding_federal_agency_ids: Who owns the money

    • federal_agency_ids: Any agency involved (contracting or funding)

    Location granularity:

    • place_of_performance_ids: Accepts countries, states, counties, cities, postal codes

    • vendor_location_ids: Accepts countries, states, counties, cities, postal codes

    Vendor relationships:

    • vendor_ids: Prime contractors/awardees

    • sub_vendor_ids: Subcontractors

    Other:

    • federal_meta_opportunity_ids: The originating solicitation/notice

    • vendor_primary_registered_naics_category_ids: Vendor's primary NAICS

    • vendor_registered_psc_category_ids: Any PSC the vendor is registered for
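
For example, a payload combining several *_ids filters might look like this (all IDs are hypothetical placeholders; resolve real ones with the research_* tools first):

```python
# Illustrative filter combination; every ID value is a placeholder.
ids_payload = {
    "query": "",
    "contracting_federal_agency_ids": ["agency-123"],  # who signed the contract
    "vendor_ids": ["vendor-456"],                      # prime contractors/awardees
    "place_of_performance_ids": ["state-va"],          # any supported location level
}
```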
