Content Best Practices
Choosing a Format
Prefer the original format.
If your content already exists in a specific format, keep it as-is.
Choosing a Format for New Content:
Tabular Data
- Standalone Resource (CSV/Excel)
- Tables are small to medium sized
- The tabular data is the primary content
- Users will ask questions about the data (not query it directly)
Note: Currently, only tables are extracted; surrounding text is ignored.
- Tables Inside Documents (PDF/Word)
- Tables are part of a larger narrative
- Context matters (e.g., explanation + table together)
- Tables are small to medium sized
- Agents (CSV/Excel)
- Data is large or frequently queried
- Users need to:
- Filter
- Look up values
- Aggregate data
Examples:
- “Find the value in column C where column F = X”
- “What’s the average pressure for pumps in region A?”
- Other Data Sources
If your data lives in:
- Databases
- APIs
- Other structured systems
👉 Contact us to discuss integration options.
Decision Guide
| Scenario | Recommended Option |
|---|---|
| Table with supporting context | Table within original document |
| Small/medium standalone table | CSV / Excel |
| Large dataset or query use case | Agent |
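The decision guide above can be read as a small set of rules. The sketch below is one possible encoding, not an official tool: `TableProfile`, its field names, and `recommend_option` are hypothetical, and the guide does not define what counts as "large", so that judgment is left to the caller. Checking the agent case first is also an assumption about priority.

```python
from dataclasses import dataclass


@dataclass
class TableProfile:
    """Hypothetical description of a table; field names are illustrative."""
    has_supporting_context: bool  # table is part of a larger narrative
    is_large: bool                # "large" is not defined by the guide
    needs_queries: bool           # users filter, look up, or aggregate values


def recommend_option(profile: TableProfile) -> str:
    """Map a table profile to an option from the decision guide."""
    # Assumption: the large-dataset / query case takes priority.
    if profile.is_large or profile.needs_queries:
        return "Agent"
    # Small/medium tables that rely on nearby text stay in the document.
    if profile.has_supporting_context:
        return "Table within original document"
    # Small/medium tables that stand on their own.
    return "CSV / Excel"


if __name__ == "__main__":
    profile = TableProfile(
        has_supporting_context=True, is_large=False, needs_queries=False
    )
    print(recommend_option(profile))
```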
Frequently Changing Content
Dynamic resources (HTML/Markdown)
Best for:
- Content that changes frequently
- Unstructured or miscellaneous content
Benefits:
- Renders cleanly in the browser
- Easy to edit and maintain
Everything Else
PDF, Word (.docx), Text (.txt), Markdown (.md), etc.
Note: PDF files open in the browser. Other file types download when opened.
Best Practices
Word
- Use predefined styles (Heading 1, Heading 2, etc.)
HTML
- Use semantic HTML
- Headings (<h1>, <h2>, etc.)
- Navigation elements (to exclude menus from parsing)
- Add title attributes to images and links
- Ensure tables include a header row
- Avoid using tables for layout
- Use inskill-skip (attribute or class) to exclude irrelevant content (e.g., cookie banners, shopping carts, language selectors)
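To illustrate the HTML recommendations above, here is a minimal sketch of a page that follows them, together with a toy text extractor that ignores skipped regions. The sample page and the `SkipAwareTextExtractor` class are hypothetical. The real parser's behavior is not documented here, so this only approximates the idea, assuming that menus are marked up with `<nav>` and that `inskill-skip` is recognized as either a bare attribute or a class name.

```python
from html.parser import HTMLParser

# Hypothetical page: semantic headings, <nav> for the menu, a table with a
# header row, title attributes on links, and inskill-skip on irrelevant parts.
SAMPLE_PAGE = """
<nav><a href="/" title="Home page">Home</a></nav>
<main>
  <h1>Pump Maintenance</h1>
  <div class="inskill-skip">We use cookies.</div>
  <aside inskill-skip>Language: EN | FR</aside>
  <p>Inspect seals monthly.</p>
  <table>
    <tr><th>Pump</th><th>Pressure</th></tr>
    <tr><td>P-1</td><td>40</td></tr>
  </table>
</main>
"""

# Elements without a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SkipAwareTextExtractor(HTMLParser):
    """Collect text, ignoring <nav> and anything marked inskill-skip."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    @staticmethod
    def _is_skipped(tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        return (tag == "nav"
                or "inskill-skip" in attr_map
                or "inskill-skip" in classes)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside a skipped element, count every nested tag as well.
        if self.skip_depth or self._is_skipped(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


extractor = SkipAwareTextExtractor()
extractor.feed(SAMPLE_PAGE)
print(extractor.chunks)
```

The depth counter assumes well-formed markup with matching closing tags. The menu link, the cookie notice, and the language selector are left out of the collected text, while the heading, paragraph, and table cells are kept.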
PDF
- Prefer tables with visible borders/lines for better detection
Excel
- Named tables or filtered ranges improve table detection
- Top row should be the headers when possible
- Avoid vertical headers
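If a sheet currently has vertical headers (labels down the first column), one way to follow the advice above is to transpose it before uploading. The sketch below uses CSV and the Python standard library to stay self-contained. The sample data and the `headers_to_top_row` helper are hypothetical, and it assumes every row has the same number of cells.

```python
import csv
import io

# Hypothetical export with vertical headers: labels run down column 1.
VERTICAL_CSV = """Pump,P-1,P-2
Region,A,B
Pressure,40,55
"""


def headers_to_top_row(rows):
    """Transpose rows so the first column's labels become the top row.

    zip() stops at the shortest row, so ragged rows would be truncated.
    """
    return [list(column) for column in zip(*rows)]


rows = list(csv.reader(io.StringIO(VERTICAL_CSV)))
fixed = headers_to_top_row(rows)

# Write the corrected layout back out as CSV text.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(fixed)
print(out.getvalue())
```

After the transpose, the first row holds the headers (Pump, Region, Pressure) and each following row describes one pump.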
