Content Best Practices

Choosing a Format

Prefer the original format.
If your content already exists in a specific format, keep it as-is.

Choosing a Format for New Content

Tabular Data

  1. Standalone Resource (CSV/Excel)
    • Tables are small to medium-sized
    • The tabular data is the primary content
    • Users will ask questions about the data rather than query it directly

    Note: Currently, only tables are extracted; surrounding text is ignored.

  2. Tables Inside Documents (PDF/Word)
    • Tables are part of a larger narrative
    • Context matters (e.g., explanation + table together)
    • Tables are small to medium-sized
  3. Agents (CSV/Excel)
    • Data is large or frequently queried
    • Users need to:
      • Filter
      • Look up values
      • Aggregate data

    Examples:
      • “Find the value in column C where column F = X”
      • “What’s the average pressure for pumps in region A?”
  4. Other Data Sources
    If your data lives in:
    • Databases
    • APIs
    • Other structured systems

👉 Contact us to discuss integration options.
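Returning to the Agents option: the filter, lookup, and aggregate operations listed there correspond to simple computations over a table. The Python sketch below is purely illustrative. It does not show how the agent is implemented, and the column names (pump, region, pressure) and values are hypothetical sample data.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data standing in for a CSV/Excel export.
CSV_TEXT = """pump,region,pressure
P-1,A,4.0
P-2,A,5.0
P-3,B,7.5
"""


def load_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows, column, value):
    """Filter: keep only rows where `column` equals `value`."""
    return [row for row in rows if row[column] == value]


def lookup(rows, key_column, key_value, target_column):
    """Lookup: value of `target_column` in the first row where key_column == key_value."""
    for row in rows:
        if row[key_column] == key_value:
            return row[target_column]
    return None


def average(rows, column):
    """Aggregate: arithmetic mean of a numeric column."""
    return mean(float(row[column]) for row in rows)


rows = load_rows(CSV_TEXT)
# "Find the value in column pressure where column pump = P-3"
print(lookup(rows, "pump", "P-3", "pressure"))  # 7.5
# "What's the average pressure for pumps in region A?"
print(average(filter_rows(rows, "region", "A"), "pressure"))  # 4.5
```

On a table of a few rows this is trivial; the Agents option is recommended precisely when the data is too large to answer such questions from retrieved text alone.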

Decision Guide

Scenario                          | Recommended Option
Table with supporting context     | Table within original document
Small/medium standalone table     | CSV / Excel
Large dataset or query use case   | Agent

Frequently Changing Content

Dynamic Resources (HTML/Markdown)

Best for:

  • Content that changes frequently
  • Unstructured or miscellaneous content

Benefits:

  • Renders cleanly in the browser
  • Easy to edit and maintain

Everything Else

PDF, Word (.docx), Text (.txt), Markdown (.md), etc.

Note: PDF files open in the browser. Other file types download when opened.

Best Practices

Word

  • Use predefined styles (Heading 1, Heading 2, etc.)

HTML

  • Use semantic HTML
    • Headings (<h1>, <h2>, etc.)
    • Navigation elements (<nav>), so that menus are excluded from parsing
  • Add title attributes to images and links
  • Ensure tables include a header row
  • Avoid using tables for layout
  • Use inskill-skip (as an attribute or a class) to exclude irrelevant content (e.g., cookie banners, shopping carts, language selectors)
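A few of the HTML points above can be checked mechanically before publishing. The following is a minimal, hypothetical pre-publication check built on Python's standard-library html.parser. It is not part of the product and does not reproduce how pages are actually parsed. It flags images and links without a title attribute, tables without <th> header cells, and pages with no headings, and it counts elements marked with inskill-skip (as an attribute or a class). The class and function names are illustrative.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class ContentChecker(HTMLParser):
    """Hypothetical lint that flags a few of the HTML practices listed above."""

    def __init__(self) -> None:
        super().__init__()
        self.issues: list[str] = []
        self.heading_count = 0
        self.skipped_count = 0  # elements marked with inskill-skip
        self._table_has_th: list[bool] = []  # one flag per currently open <table>

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # valueless attributes (e.g. inskill-skip) map to None
        classes = (attr.get("class") or "").split()
        if "inskill-skip" in attr or "inskill-skip" in classes:
            self.skipped_count += 1
        if tag in HEADING_TAGS:
            self.heading_count += 1
        if tag in ("img", "a") and not attr.get("title"):
            self.issues.append(f"<{tag}> without a title attribute")
        if tag == "table":
            self._table_has_th.append(False)
        elif tag == "th" and self._table_has_th:
            self._table_has_th[-1] = True  # innermost open table has a header cell

    def handle_endtag(self, tag):
        # Pop the innermost table's flag and report it if no <th> was seen.
        if tag == "table" and self._table_has_th and not self._table_has_th.pop():
            self.issues.append("<table> without a header row (<th> cells)")


def check_html(html: str) -> ContentChecker:
    """Run the checks over one page and return the checker with its findings."""
    checker = ContentChecker()
    checker.feed(html)
    checker.close()
    if checker.heading_count == 0:
        checker.issues.append("no headings (<h1> to <h6>) found")
    return checker


page = """
<nav><a href="/" title="Home">Home</a></nav>
<h1>Pump maintenance</h1>
<div class="inskill-skip">Cookie banner</div>
<img src="pump.png" title="Pump diagram">
<table><tr><td>P-1</td><td>4.0</td></tr></table>
"""
result = check_html(page)
print(result.issues)         # ['<table> without a header row (<th> cells)']
print(result.skipped_count)  # 1
```

The check only looks for <th> cells, so a table whose first row uses <td> is reported even if it visually acts as a header; marking that row up with <th> is the simplest fix.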

PDF

  • Prefer tables with visible borders/lines for better detection

Excel

  • Named tables or filtered ranges improve table detection
  • When possible, the top row should contain the headers
  • Avoid vertical headers
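If the same data is also kept as a CSV export, Python's standard-library csv.Sniffer.has_header offers a rough check that the top row looks like a header row. It is a heuristic: it compares the first row against the type and length patterns of the rows below it, so a column of same-length text values can mislead it. The function name and sample data below are hypothetical.

```python
import csv


def top_row_looks_like_header(csv_text: str) -> bool:
    """Heuristic check (csv.Sniffer.has_header) that the first row holds column headers."""
    return csv.Sniffer().has_header(csv_text)


# Hypothetical exports of the same sheet, with and without a header row.
with_header = "pump,region,pressure\nP-1,A,4.0\nP-2,A,5.0\nP-3,B,7.5\n"
without_header = "P-1,A,4.0\nP-2,A,5.0\nP-3,B,7.5\n"

print(top_row_looks_like_header(with_header))     # True
print(top_row_looks_like_header(without_header))  # False
```

Treat a False result as a prompt to inspect the sheet rather than a verdict; the check says nothing about vertical headers.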