Classifier API Performance Best Practices¶
Overview¶
The Athena Classifier Service provides two main APIs for classification:
`Classify`
: A streaming API for batch classification.

`ClassifySingle`
: A unary API for single input classification.
Choosing the right API and following best practices can significantly impact performance, scalability, and reliability.
API Tradeoffs¶
**Classify (Streaming API)**

- **Use Case**: Designed for batch processing of multiple inputs in a single request.
- **Performance**: Highly efficient for large volumes of data. Inputs are processed in parallel, reducing overall latency.
- **Resource Utilization**: Better throughput and lower per-input overhead. Suitable for high-throughput scenarios.
- **Error Handling**: Channel-level errors can abort the entire stream. Individual input errors are returned per input.
- **Best For**: Bulk classification, real-time pipelines, and scenarios where latency per input is less critical than overall throughput.
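As an illustration, a minimal streaming client might look like the sketch below. It assumes a gRPC transport and hypothetical generated modules (`classifier_pb2`, `classifier_pb2_grpc`), a `ClassifierServiceStub`, and a `ClassifyRequest` message with `deployment_id` and `input` fields; consult the proto definitions for the actual names.

```python
# Minimal sketch of the streaming path. Module, stub, and field names
# (classifier_pb2, ClassifierServiceStub, ClassifyRequest.deployment_id/input)
# are assumptions, not the actual generated API.
import grpc

import classifier_pb2
import classifier_pb2_grpc


def classify_stream(inputs, deployment_id, target="localhost:50051"):
    """Send many inputs over one Classify stream and collect the results."""
    with grpc.insecure_channel(target) as channel:
        stub = classifier_pb2_grpc.ClassifierServiceStub(channel)

        def requests():
            for item in inputs:
                # Every request in the stream reuses the same deployment ID.
                yield classifier_pb2.ClassifyRequest(
                    deployment_id=deployment_id, input=item)

        # Results arrive on the response stream as the server processes inputs.
        return list(stub.Classify(requests()))
```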
**ClassifySingle (Unary API)**

- **Use Case**: Processes a single input per request.
- **Performance**: Simpler, but incurs higher overhead per input due to connection setup and teardown.
- **Resource Utilization**: Less efficient for bulk operations. Each request is handled independently.
- **Error Handling**: Errors are returned directly in the response for the specific input.
- **Best For**: Interactive use, low-frequency requests, or when per-input latency and error isolation are critical.
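For comparison, a hedged sketch of the unary path, using the same assumed module and message names as above:

```python
# Minimal sketch of the unary path, using the same assumed names as above.
import grpc

import classifier_pb2
import classifier_pb2_grpc


def classify_one(item, deployment_id, target="localhost:50051"):
    """Classify a single input; errors surface directly on the response."""
    with grpc.insecure_channel(target) as channel:
        stub = classifier_pb2_grpc.ClassifierServiceStub(channel)
        request = classifier_pb2.ClassifySingleRequest(
            deployment_id=deployment_id, input=item)
        return stub.ClassifySingle(request, timeout=5.0)
```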
Performance Best Practices¶
**Prefer Streaming for Bulk Operations**

Use the `Classify` API when classifying multiple inputs. Streaming allows parallel processing and reduces network overhead.

**Batch Inputs Appropriately**

Group inputs into batches that balance throughput and latency. Very large batches may increase memory usage; very small batches may reduce efficiency. A batch size of 10-50 inputs is often a good starting point, but this may vary based on specific use cases and system capabilities.
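A simple batching helper might look like the following sketch; the default size of 25 is just an arbitrary point inside the 10-50 range above and should be tuned empirically.

```python
# Sketch of a batching helper. The default of 25 is an arbitrary point in the
# 10-50 range suggested above; tune it against measured latency and memory.
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = 25) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items, preserving input order."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```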
**Handle Errors Gracefully**

For `Classify`, check both global errors (e.g., deployment or channel errors) and per-input errors. For `ClassifySingle`, handle errors in the returned output object.
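The sketch below separates the two error surfaces for the streaming case, assuming a gRPC transport and a hypothetical per-input `error` message field on the response; adapt the field names to the real proto.

```python
# Sketch of error handling for the streaming path. The per-input `error`
# message field is an assumption about the response shape; check the proto.
import grpc


def run_stream(stub, requests):
    results, input_errors = [], []
    try:
        for resp in stub.Classify(requests):
            if resp.HasField("error"):       # per-input failure: keep going
                input_errors.append(resp.error)
            else:
                results.append(resp)
    except grpc.RpcError as err:             # global failure: stream aborted
        print(f"stream aborted: {err.code()} {err.details()}")
    return results, input_errors
```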
**Optimize Input Validation**

Invalid inputs are rejected early. Validate inputs client-side before sending to reduce unnecessary network and server load.
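A client-side pre-filter can be as simple as the sketch below; the size limit shown is an illustrative assumption, not a documented server limit.

```python
# Sketch of a client-side pre-filter. MAX_IMAGE_BYTES is an illustrative
# assumption, not a documented server limit.
MAX_IMAGE_BYTES = 8 * 1024 * 1024


def is_valid_input(image_bytes: bytes) -> bool:
    """Reject payloads that the server would refuse anyway."""
    if not image_bytes:
        return False              # empty payload
    if len(image_bytes) > MAX_IMAGE_BYTES:
        return False              # oversized payload wastes bandwidth
    return True


def filter_inputs(inputs):
    return [img for img in inputs if is_valid_input(img)]
```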
**Monitor and Tune Timeouts**

Streaming requests may take longer; ensure client and server timeouts are configured appropriately. For `ClassifySingle`, use the `RequestTimeout` setting to avoid hanging requests.
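One hedged interpretation, assuming a gRPC Python client, is to express both limits as per-call deadlines; the mapping from `RequestTimeout` to a deadline and the values below are assumptions to validate against your client configuration.

```python
# Sketch of per-call deadlines. The mapping from RequestTimeout to a gRPC
# deadline and the values below are assumptions to validate for your client.
import grpc

STREAM_TIMEOUT_S = 300.0   # streams may run long; keep this generous
SINGLE_TIMEOUT_S = 5.0     # unary calls should fail fast


def classify_with_deadlines(stub, stream_requests, single_request):
    try:
        batch = list(stub.Classify(stream_requests, timeout=STREAM_TIMEOUT_S))
        one = stub.ClassifySingle(single_request, timeout=SINGLE_TIMEOUT_S)
        return batch, one
    except grpc.RpcError as err:
        if err.code() == grpc.StatusCode.DEADLINE_EXCEEDED:
            print("deadline exceeded; try smaller batches or a larger timeout")
        raise
```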
**Resource Management**

Streaming requests consume server resources for the duration of the stream, so close streams promptly when done. Avoid opening multiple concurrent streams from the same client unless necessary; fewer clients can usually batch requests more effectively.
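One way to follow this guidance with a gRPC Python client is to share a single channel per process and make sure each stream is fully drained or cancelled; the target address and stub names below are assumptions.

```python
# Sketch of resource management: one shared channel per process, and each
# stream drained or cancelled promptly. Target and stub names are assumptions.
import grpc

import classifier_pb2_grpc

_channel = grpc.insecure_channel("classifier.internal:50051")
_stub = classifier_pb2_grpc.ClassifierServiceStub(_channel)


def classify_batch(requests):
    call = _stub.Classify(iter(requests))
    try:
        return list(call)    # drain the response stream completely
    finally:
        call.cancel()        # release the stream even if we exit early
```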
**Deployment ID Consistency**

For streaming, all inputs in a stream must use the same deployment ID; mismatched deployment IDs will abort the stream. Multiple instances can connect to the same deployment ID for load balancing. New deployment IDs can be created at any time without affecting existing streams, but there is a latency cost for the initial use of a deployment.
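A small client-side guard can catch mixed deployment IDs before the stream is opened; the `deployment_id` field name is assumed from the description above.

```python
# Sketch of a guard against mixed deployment IDs in a single stream.
# The deployment_id field name is assumed from the description above.
def assert_single_deployment(requests):
    deployment_ids = {req.deployment_id for req in requests}
    if len(deployment_ids) > 1:
        raise ValueError(
            f"stream would mix deployment IDs {sorted(deployment_ids)}; "
            "open one stream per deployment instead")
    return requests
```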
**Use Appropriate Encoding and Formats**

Prefer the `IMAGE_FORMAT_RAW_UINT8` format to reduce payload size. Use the `RequestEncoding` field to specify the encoding of image data; we suggest Brotli for optimal compression.
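Putting the two together, a request builder might compress raw pixel data with Brotli before setting the format and encoding fields; all enum and field names in the sketch are hypothetical, and only the `brotli.compress` call is standard.

```python
# Sketch of building a request with raw pixels compressed via Brotli.
# The enum values and field names are assumptions about the proto; only the
# brotli.compress call is standard (pip install Brotli).
import brotli

import classifier_pb2


def build_image_request(raw_pixels: bytes, deployment_id: str):
    return classifier_pb2.ClassifyRequest(
        deployment_id=deployment_id,
        image_format=classifier_pb2.IMAGE_FORMAT_RAW_UINT8,
        request_encoding=classifier_pb2.REQUEST_ENCODING_BROTLI,
        image_data=brotli.compress(raw_pixels, quality=5),
    )
```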
Summary Table¶
| Feature | `Classify` | `ClassifySingle` |
|---|---|---|
| Batch Support | Yes (streaming) | No (single input) |
| Throughput | High | Low |
| Latency | Lower per batch | Lower per input |
| Affinity | Per-deployment | Per-request |
| Error Handling | Per input & global | Per input |
| Use Case | Bulk, pipelines | Interactive, test |
References¶
- API Reference
- Raw Proto Definitions