NEC combines video analysis technology with generative AI to generate advice for improving work ....
NEC Corporation has utilized its video analysis and generative AI to develop a
technology that provides advice for improving worker performance by identifying
differences between model actions and actual movements. This newly developed
technology enables the automatic display of appropriate advice for improving
everything from precision tasks using one’s hands and fingers to tasks
requiring the use of the entire body. Using this technology, workers can master
tasks without supervision at worksites in manufacturing, logistics,
construction, and various other industries.
In recent years, the shortage of mentors arising from the aging of skilled
workers has made passing down skills a challenge. Moreover, there is growing
concern that work quality will decline, both because supervised training is
becoming more costly and because workers cannot be given sufficient training
amid the rise of high-mix low-volume production and the diversity and mobility
of workers.
This technology enables self-education for a wide variety of tasks by having AI
provide advice instead of an instructor. To realize this, NEC developed two
technologies: a video analysis technology that identifies subtle differences
between a worker's movements and model actions, and a technology that uses
generative AI to produce advice, based on those differences, for bringing the
worker's movements closer to the model action.
The video analysis technology for detecting subtle differences in movement
compares the model actions with actual movement and matches the sections where
the same operation is performed. In doing so, it aligns the footage by
capturing not only the movement of people but also interactions with objects,
such as grasping and holding. This enables subtle differences in the motions of
workers that were previously undetectable to be detected with a high degree of
accuracy, even when the two videos differ in length.
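
As an illustration of how sections of two videos of differing lengths can be
matched, the sketch below uses dynamic time warping (DTW), a standard sequence
alignment technique. NEC has not disclosed its actual algorithm, so DTW is an
assumption made here for illustration only. The function names `dtw_align` and
`flag_differences`, the per-frame feature vectors (for example, joint angles),
and the threshold are all hypothetical.

```python
import math


def dtw_align(model, actual):
    """Align two frame sequences with dynamic time warping.

    Each sequence is a list of frames, and each frame is a list of floats
    (hypothetical pose features such as joint angles). Returns a list of
    (model_index, actual_index) pairs describing the matched frames.
    """
    n, m = len(model), len(actual)
    # cost[i][j] = minimum cumulative distance aligning the first i model
    # frames with the first j actual frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(model[i - 1], actual[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Walk back from the end of both sequences along the cheapest path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def flag_differences(model, actual, threshold):
    """Return the matched frame pairs whose distance exceeds the threshold."""
    return [
        (i, j)
        for i, j in dtw_align(model, actual)
        if math.dist(model[i], actual[j]) > threshold
    ]
```

Because the alignment stretches or compresses time as needed, a worker who
performs a step more slowly than the model is still compared against the
corresponding model frames, and only the matched pairs that remain far apart
are flagged. This sketch covers body movement features only; it does not model
the object interactions described above.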
The technology that generates advice provides a Vision and Language Model
(VLM*) with the segments of video footage in which differences have been
detected, together with skeletal information such as hip and knee movement and
the shape of the hands and fingers. This enables the exact working posture and
actions requiring improvement to be accurately pinpointed within the video and
specific textual advice to be generated. Since the textual advice is displayed
together with the relevant segments of the video footage, workers can master
tasks such as the meticulous assembly, boxing, and transport of goods without
supervision in various industry settings, thereby contributing to significant
reductions in training costs.
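
To make the advice-generation step more concrete, the following sketch shows
one way a flagged video segment and its skeletal differences might be turned
into a text prompt for a vision and language model. Everything in it is an
assumption for illustration: the `FlaggedSegment` fields, the
`build_advice_prompt` function, and the prompt wording are hypothetical, and
no real model API is called. The video frames themselves would be passed to
the model separately.

```python
from dataclasses import dataclass


@dataclass
class FlaggedSegment:
    """A hypothetical record for a video segment where a difference was found."""

    start_s: float  # segment start time in seconds
    end_s: float  # segment end time in seconds
    joint_deltas: dict  # joint name -> angle difference in degrees (actual minus model)


def build_advice_prompt(task: str, segment: FlaggedSegment) -> str:
    """Assemble a text prompt summarizing the skeletal differences."""
    lines = [
        f"Task: {task}",
        f"Video segment: {segment.start_s:.1f}s to {segment.end_s:.1f}s",
        "Joint angle differences from the model action (degrees):",
    ]
    # List the largest deviations first so they are most prominent.
    ranked = sorted(segment.joint_deltas.items(), key=lambda kv: -abs(kv[1]))
    for joint, delta in ranked:
        lines.append(f"- {joint}: {delta:+.1f}")
    lines.append("Describe briefly how the worker should adjust their posture.")
    return "\n".join(lines)
```

Supplying numeric skeletal differences alongside the footage reflects the idea
in the paragraph above: the model is told where in the video to look and which
body parts deviate, rather than being asked to find the problem unaided.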