Computer vision on a factory floor is not a proof of concept anymore. It is running in production at mid-sized manufacturers across Vietnam and Southeast Asia, catching defects that human inspectors miss, at speeds no human inspector can match.
This guide covers what a real deployment looks like, from the first camera mount to a model in production that the quality team actually trusts.
Step 1 — Define the detection task precisely
Computer vision models do not handle vague requirements. "Detect defects" is not a task. "Detect surface scratches longer than 2mm on the aluminium casing of part SKU-441" is a task. The more precisely you can define what a positive detection looks like, the smaller your dataset needs to be and the faster you get to useful accuracy.
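One way to keep the task definition honest is to write it down as data rather than prose. The sketch below is illustrative only: the `DetectionTask` class, its field names, and the `mm_per_pixel` calibration value are assumptions, not part of any particular library. It shows how a threshold such as "longer than 2mm" becomes a pixel threshold once the camera scale is known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionTask:
    """Hypothetical task spec; all field names are illustrative."""
    part_sku: str
    defect_type: str
    min_length_mm: float
    mm_per_pixel: float  # assumed to come from a camera calibration step

    def min_length_px(self) -> float:
        # Convert the physical threshold into image pixels.
        return self.min_length_mm / self.mm_per_pixel

    def is_positive(self, defect_length_px: float) -> bool:
        # The task says "longer than", so the comparison is strict.
        return defect_length_px > self.min_length_px()


# Example values: 0.05 mm per pixel is an assumed calibration, not a measured one.
task = DetectionTask(
    part_sku="SKU-441",
    defect_type="surface_scratch",
    min_length_mm=2.0,
    mm_per_pixel=0.05,
)
```

With a spec like this, labellers and the model evaluation can share the same threshold instead of each interpreting "scratch" differently.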
Step 2 — Collect and label data
You need examples of both the defective and the non-defective condition. For a single-class defect detection task on a relatively controlled production line, 300–500 labelled images are usually enough to begin. Label quality matters more than label quantity: consistent, precise bounding boxes beat a large, noisy dataset. When collecting images:
- Capture images in the same lighting conditions the model will run in.
- Include edge cases: partial defects, shadows, normal surface variation.
- Use the same camera and lens you will deploy — transfer learning does not fully compensate for optics differences.
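Label consistency can be checked before any training run. A minimal sketch, assuming two annotators have each drawn one box per image in `(x_min, y_min, x_max, y_max)` pixel format: compute the intersection over union (IoU) of their boxes and flag images where agreement is low. The function names and the 0.8 threshold are illustrative assumptions, not a standard.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def inconsistent_labels(labels_a, labels_b, threshold=0.8):
    """Return image ids where the two annotators' boxes overlap below threshold.

    labels_a and labels_b map image id -> box; only shared ids are compared.
    """
    return [
        image_id
        for image_id in labels_a
        if image_id in labels_b
        and iou(labels_a[image_id], labels_b[image_id]) < threshold
    ]
```

Images returned by `inconsistent_labels` are good candidates for a second look before they enter the training set.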
Step 3 — Choose the right model architecture
For real-time defect detection on an edge device, the YOLO family (for example YOLOv8 or YOLOv10) is a common default starting point because it balances speed and accuracy well. For applications where detection accuracy matters more than latency, a two-stage detector like Faster R-CNN can be worth the extra compute.
Do not choose a model architecture based on a public benchmark alone. Choose it based on your latency requirement and the compute available at the deployment site.
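The latency requirement can be derived from line speed and then compared with measured inference time on the target hardware. The sketch below is a rough illustration: the function names, the 95th-percentile choice, and the 50% headroom factor are assumptions, and `infer` stands in for whatever callable runs your model on one frame.

```python
import time


def per_frame_budget_ms(parts_per_minute, frames_per_part=1):
    """Time available per frame, in milliseconds, at a given line speed."""
    return 60_000.0 / (parts_per_minute * frames_per_part)


def measure_latency_ms(infer, frame, runs=50, warmup=5):
    """Approximate 95th-percentile latency of infer(frame), in milliseconds."""
    for _ in range(warmup):
        infer(frame)  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(frame)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[int(0.95 * (len(samples) - 1))]


def fits_budget(latency_ms, budget_ms, headroom=0.5):
    """True if latency uses at most the given fraction of the budget."""
    return latency_ms <= budget_ms * headroom
```

For example, a line running 120 parts per minute with one frame per part leaves 500 ms per frame; with the assumed 50% headroom, a candidate model would need to finish within 250 ms on the deployment device.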
Step 4 — Edge vs. cloud inference
If your production line generates more than a few frames per second, cloud inference is usually impractical because of latency and network cost. An NVIDIA Jetson or equivalent edge device costing under $500 can handle real-time inference for a single-camera line. Multi-camera lines may need a dedicated server at the facility.
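A quick back-of-envelope calculation makes the network cost concrete. This sketch assumes every frame is uploaded as a compressed image of roughly constant size; the function names and example figures are illustrative, not measurements from a real line.

```python
def required_uplink_mbps(fps, frame_kilobytes):
    """Sustained upload bandwidth in megabits per second for one camera."""
    bits_per_second = fps * frame_kilobytes * 1000 * 8
    return bits_per_second / 1_000_000


def monthly_upload_gb(fps, frame_kilobytes, hours_per_day, days_per_month=30):
    """Total data uploaded per month, in gigabytes (1 GB = 10**9 bytes)."""
    bytes_per_second = fps * frame_kilobytes * 1000
    seconds = 3600 * hours_per_day * days_per_month
    return bytes_per_second * seconds / 1_000_000_000
```

Under these assumptions, one camera at 10 frames per second and 200 KB per frame needs about 16 Mbps of sustained uplink, and two 8-hour shifts a day add up to roughly 3,456 GB per month, before any round-trip latency is considered.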
Step 5 — Integrate and monitor
A model that fires alerts nobody acts on is not a working system. The integration — how the detection result reaches the operator, what action it triggers, how misses and false positives are logged — matters as much as the model itself.
Plan for model drift from day one. Production lighting changes. Part tolerances change. A model that works perfectly in month one may degrade quietly over six months. Build a feedback loop: every operator override is a potential new label.
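The feedback loop can start very small. A minimal sketch, assuming each inspected part produces a model verdict and, eventually, an operator verdict: track the override rate over a rolling window and queue every disagreement for relabelling. The `OverrideMonitor` class, the window size, and the 5% alert rate are hypothetical choices for illustration.

```python
from collections import deque


class OverrideMonitor:
    """Tracks operator overrides as a simple drift signal (illustrative only)."""

    def __init__(self, window=500, alert_rate=0.05):
        self.outcomes = deque(maxlen=window)  # True where the operator overrode
        self.alert_rate = alert_rate
        self.relabel_queue = []  # (image_id, operator verdict) pairs

    def record(self, image_id, model_says_defect, operator_says_defect):
        overridden = model_says_defect != operator_says_defect
        self.outcomes.append(overridden)
        if overridden:
            # Every override is a candidate label for the next training round.
            self.relabel_queue.append((image_id, operator_says_defect))
        return overridden

    def override_rate(self):
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def drifting(self):
        # Only raise the flag once the window is full, to avoid noisy early alerts.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.override_rate() > self.alert_rate
```

A rising override rate does not say why the model is degrading, but it is a cheap signal that a lighting check or a retraining run is due.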
The first deployment is always the hardest. The second takes a fraction of the time because you have already solved the data pipeline and the integration patterns.
