Knowledge

AI vision, transcription, and plan limits

How image description and audio/video transcription work on uploads and crawls, what counts toward hosted quotas, and how organization Gemini keys (BYOK) interact with limits.

Tags: image recognition, transcription, Gemini, quota, BYOK, crawl

FlexyAgents can enrich knowledge with AI beyond plain text extraction: Gemini vision describes images so semantic search finds “screenshots of the billing page,” and Gemini transcription turns podcasts or training videos into searchable text.

On hosted infrastructure, your subscription may include separate monthly caps for (1) image recognition on uploads vs. crawls and (2) media transcription on uploads vs. crawls. OCR (Tesseract-style text extraction from images) and basic metadata are separate from vision and do not replace it; the “image recognition” quotas do not measure them.

Adding a valid Google Gemini API key for your organization (Settings → LLM API Keys) routes vision and transcription calls through your Google account, so hosted FlexyAgents quotas for those operations are not incremented.

The four hosted counters (when applicable)

Administrators configure these limits on the plan. Operators can see usage in the knowledge UI: the upload view shows the upload-side counters, and the crawl UI shows the crawl-side counters when media processing is enabled.

Limits apply per calendar month and follow the same “unlimited” convention as other plan limits: a value of -1 means no cap.

  • Image recognition — knowledge base file uploads: counts successful hosted Gemini vision on uploaded images.
  • Image recognition — website crawl: counts successful hosted Gemini vision on images discovered while crawling.
  • Media transcription — uploads: counts successful hosted Gemini transcription on uploaded audio/video files.
  • Media transcription — crawl: counts successful hosted Gemini transcription on audio/video URLs fetched during a crawl.
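The counting rules above can be sketched as a small in-memory ledger. This is a minimal illustration, not FlexyAgents code: the `Counter` identifiers, the `HostedUsage` class, and the choice to treat a counter with no configured limit as 0 are all assumptions. It encodes only the rules stated on this page: windows are calendar months, -1 means unlimited, only successful calls count, and calls made with an organization Gemini key (BYOK) never increment hosted counters.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from enum import Enum

UNLIMITED = -1  # plan convention: -1 means no cap


class Counter(Enum):
    # Hypothetical identifiers for the four hosted counters.
    VISION_UPLOAD = "image_recognition_upload"
    VISION_CRAWL = "image_recognition_crawl"
    TRANSCRIBE_UPLOAD = "media_transcription_upload"
    TRANSCRIBE_CRAWL = "media_transcription_crawl"


def month_key(day: date) -> str:
    """Calendar-month window key, e.g. '2024-05'."""
    return f"{day.year:04d}-{day.month:02d}"


@dataclass
class HostedUsage:
    limits: dict[Counter, int]
    used: dict[tuple[Counter, str], int] = field(default_factory=dict)

    def remaining(self, counter: Counter, day: date) -> int | None:
        """Calls left in the month containing `day`; None means unlimited."""
        limit = self.limits.get(counter, 0)  # assumption: no configured limit -> 0
        if limit == UNLIMITED:
            return None
        spent = self.used.get((counter, month_key(day)), 0)
        return max(limit - spent, 0)

    def has_capacity(self, counter: Counter, day: date) -> bool:
        left = self.remaining(counter, day)
        return left is None or left > 0

    def record_success(self, counter: Counter, day: date, byok: bool) -> None:
        """Record one successful Gemini call; BYOK calls never touch hosted counters."""
        if byok:
            return
        key = (counter, month_key(day))
        self.used[key] = self.used.get(key, 0) + 1


if __name__ == "__main__":
    usage = HostedUsage(limits={Counter.VISION_UPLOAD: 2, Counter.VISION_CRAWL: UNLIMITED})
    may = date(2024, 5, 10)
    usage.record_success(Counter.VISION_UPLOAD, may, byok=False)
    usage.record_success(Counter.VISION_UPLOAD, may, byok=False)
    usage.record_success(Counter.VISION_UPLOAD, may, byok=True)  # org key: not counted
    print(usage.has_capacity(Counter.VISION_UPLOAD, may))               # False
    print(usage.has_capacity(Counter.VISION_UPLOAD, date(2024, 6, 1)))  # True: new month
    print(usage.remaining(Counter.VISION_CRAWL, may))                   # None: unlimited
```

Keying usage by month rather than resetting a single number means a new calendar month starts from zero without any scheduled reset job.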

Upload behavior and errors

Before processing, the platform checks whether inference will use a hosted Gemini key or your organization's key. If inference is hosted and the monthly cap has already been reached, new uploads that require vision or transcription may be rejected with a clear error (HTTP 402-style payment/quota messaging in APIs) instead of being stored silently with empty content.

When your plan is BYOK (it requires your own Gemini key) and no key is configured, vision and transcription steps are skipped; you may still get OCR text or file metadata where available.

  • Prefer adding a Gemini key early if marketing or support teams upload many screenshots or recordings.
  • If uploads fail with a quota message, do one of the following: have an administrator raise the limit, add a Gemini key, or temporarily upload text-first formats.
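The pre-upload check described in this section can be sketched as a single decision function. Everything here is hypothetical (`Org`, `plan_upload`, `QuotaExceeded`, and the returned step names are illustrative, not a FlexyAgents API). The sketch assumes an organization key takes priority over hosted inference, consistent with the BYOK routing described at the top of this page, and that `hosted_remaining` is read from the relevant upload counter.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Inference(Enum):
    ORG_KEY = "org_key"  # organization Gemini key (BYOK)
    HOSTED = "hosted"    # FlexyAgents-hosted Gemini
    NONE = "none"        # BYOK plan, but no key configured


class QuotaExceeded(Exception):
    """Hypothetical error an API layer would map to HTTP 402."""

    status_code = 402


@dataclass
class Org:
    gemini_key: str | None
    byok_required: bool  # plan requires the customer's own Gemini key


def resolve_inference(org: Org) -> Inference:
    if org.gemini_key:
        return Inference.ORG_KEY
    if org.byok_required:
        return Inference.NONE
    return Inference.HOSTED


def plan_upload(org: Org, needs_ai: bool, hosted_remaining: int | None) -> str:
    """Decide how an upload is processed before any Gemini call is made.

    hosted_remaining: calls left this month on the upload counter, None if unlimited.
    """
    if not needs_ai:
        return "extract_text"  # no vision/transcription needed, so no quota check
    source = resolve_inference(org)
    if source is Inference.ORG_KEY:
        return "gemini_org_key"  # billed to your Google account, hosted counters untouched
    if source is Inference.NONE:
        return "ocr_and_metadata_only"  # vision/transcription skipped
    if hosted_remaining is not None and hosted_remaining <= 0:
        # Reject up front instead of silently storing empty content.
        raise QuotaExceeded(
            "Monthly hosted quota reached: raise the plan limit, "
            "add a Gemini key, or upload a text-first format."
        )
    return "gemini_hosted"
```

Checking the quota before calling Gemini means an over-quota upload fails fast with an actionable message rather than producing a knowledge entry with no searchable content.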

Website crawl: separate toggles

Crawl configuration exposes two switches: image recognition (Gemini vision on discovered images) and audio/video transcription (Gemini on fetched media URLs). You can crawl HTML text only, images only, audio/video only, or any combination.

Starting a crawl that enables either switch requires a resolvable Gemini configuration: either FlexyAgents-hosted Gemini (environment-dependent) or your organization Gemini key. Text-only crawls do not require Gemini.
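The start-of-crawl requirement can be sketched as a validation step. This is a minimal illustration under assumptions: `CrawlConfig`, its two field names, `gemini_source_for_crawl`, and `GeminiNotConfigured` are hypothetical, and preferring the organization key when both sources exist is an assumption borrowed from the BYOK routing described for uploads.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CrawlConfig:
    # Hypothetical field names for the two crawl switches.
    image_recognition: bool = False
    media_transcription: bool = False


class GeminiNotConfigured(Exception):
    """Hypothetical error raised when a media-enabled crawl has no Gemini source."""


def gemini_source_for_crawl(
    cfg: CrawlConfig, hosted_available: bool, org_key: str | None
) -> str | None:
    """Return which Gemini configuration a crawl would use, or None for text-only crawls."""
    if not (cfg.image_recognition or cfg.media_transcription):
        return None  # text-only crawl: Gemini not required
    if org_key:
        return "org_key"  # assumption: organization key preferred when present
    if hosted_available:  # environment-dependent
        return "hosted"
    raise GeminiNotConfigured(
        "Enable hosted Gemini or add an organization Gemini key "
        "before turning on image recognition or transcription."
    )
```

Validating at crawl start, rather than per item, surfaces a missing configuration before any pages are fetched.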

  • Legacy “process all media” style requests are mapped to both toggles for backward compatibility.
  • When hosted crawl quotas are exhausted, later images or media in that crawl may be processed with OCR or placeholders only. This fallback is designed so that failed API calls are not charged against limits.
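The two crawl-side rules in these bullets can be sketched together for images. This is an illustration under assumptions, not the platform's implementation: `process_all_media`, `image_recognition`, and `media_transcription` are hypothetical request field names, `describe` and `ocr` stand in for the real vision and OCR steps, and a failed vision call is assumed to fall back to OCR the same way an over-quota image does. Transcription of crawled media would follow the same pattern against its own counter.

```python
from __future__ import annotations

from typing import Callable


def normalize_crawl_request(req: dict) -> dict:
    """Map a legacy 'process all media' flag onto both toggles (field names hypothetical)."""
    out = dict(req)
    if out.pop("process_all_media", False):
        out["image_recognition"] = True
        out["media_transcription"] = True
    return out


class VisionError(Exception):
    """Stand-in for a failed Gemini vision call."""


def process_crawl_images(
    images: list[str],
    hosted_remaining: int | None,
    describe: Callable[[str], str],
    ocr: Callable[[str], str],
) -> tuple[list[tuple[str, str]], int]:
    """Describe images while hosted quota lasts, then fall back to OCR or a placeholder.

    Returns (results, calls_charged). Only successful vision calls are charged.
    """
    results: list[tuple[str, str]] = []
    charged = 0
    for url in images:
        if hosted_remaining is None or charged < hosted_remaining:
            try:
                results.append(("vision", describe(url)))
                charged += 1  # count only after success
                continue
            except VisionError:
                pass  # failed call: not charged, fall through to OCR
        text = ocr(url)
        results.append(("ocr", text) if text else ("placeholder", f"[image: {url}]"))
    return results, charged
```

Incrementing `charged` only after `describe` returns is what keeps a failed call from consuming quota in this sketch.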

What operators should monitor

Pair analytics with knowledge maintenance: if users ask about visuals, make sure vision is enabled and quotas are sufficient. If transcripts are wrong, improve the source audio quality or add human-reviewed Q&A for critical wording.

Documentation → Governance → LLM API keys explains key rotation and compliance; Documentation → Knowledge → Website crawl covers politeness, seeds, and scope.
