Control device, image forming system, and recording medium

Document No.: 1188157 | Publication date: 2020-09-22

Reading note: This technology, "Control device, image forming system, and recording medium" (控制装置、图像形成系统以及记录介质), was created by 小林美奈子 on 2020-03-10. Abstract: Provided are a control device, an image forming system, and a recording medium that can reduce the time required from the issuing of a voice instruction to the start of processing. When the user gives a voice instruction such as "copy", the smart speaker transmits the voice data to the voice AI server. The voice AI server extracts the instruction content for the multi-function peripheral from the voice data by voice recognition processing and natural language analysis processing, and transmits the obtained instruction content to the MFP control server. The MFP control server generates job data corresponding to the instruction content and transmits it to the multi-function peripheral. When the multi-function peripheral cannot accept a job start instruction from the user via the operation panel, text data corresponding to the operating state of the multi-function peripheral is transmitted to the voice AI server. The voice AI server synthesizes voice data from the response text and causes the smart speaker to output it as voice.

1. A control device for causing an image forming apparatus to execute a process in accordance with a process content instructed by a voice, the control device comprising:

a display control unit that displays the processing content on a display device of the image forming apparatus before the processing is executed;

an operation state acquisition unit that acquires an operation state of the display device; and

a determination unit configured to determine a content of a voice to be output for prompting confirmation of the processing content, based on the acquired operation state of the display device.

2. The control device according to claim 1,

wherein the operation state includes whether or not the display device is in a sleep mode in which at least a part of its power supply is turned off.

3. The control device according to claim 2,

wherein the sleep mode includes a 1st-stage sleep mode in which the period of time until display of the processing content is enabled is a 1st predetermined period, and a 2nd-stage sleep mode in which the period of time until display is enabled is a 2nd predetermined period longer than the 1st predetermined period,

the operation state acquisition unit acquires an operation state indicating which of the 1st stage and the 2nd stage the sleep mode is in, and

the determination unit determines the content of the voice to be output based on the operation state, acquired by the operation state acquisition unit, indicating which of the 1st stage and the 2nd stage the sleep mode is in.

4. The control device according to claim 1,

further comprising an estimation unit configured to estimate a start-up time until display by the display device is enabled, based on the operation state acquired by the operation state acquisition unit,

wherein the determination unit determines the voice content according to the start-up time.

5. The control device according to claim 4,

wherein the determination unit:

determines the voice content so as to include all of predetermined items when the start-up time is longer than an output time required for voice output of the predetermined items, and

determines the voice content so as to be composed of items excluding at least some of the predetermined items when the start-up time is shorter than the output time.

6. The control device according to claim 4 or 5,

further comprising a voice adding unit configured to determine additional content other than the voice content, based on a remaining time obtained by subtracting, from the start-up time, an output time required for voice output of the voice content determined by the determination unit.

7. The control device according to claim 1, comprising:

a use state determination unit configured to determine whether or not a user other than the user who has performed the voice input of the processing content is using the display device; and

a voice output control unit that outputs the voice content as speech during a period after the determination unit determines the voice content and before the determination result of the use state determination unit becomes negative, and that stops the voice output when the determination result of the use state determination unit becomes negative.

8. The control device according to claim 7,

wherein the voice content determined by the determination unit is divided into a plurality of portions, and

the voice output control unit, when the determination result of the use state determination unit becomes negative, stops the voice output at the point in time when the voice output of the portion currently being output is completed.

9. The control device according to claim 7,

wherein the voice output control unit does not stop the voice output when the determination result of the use state determination unit becomes negative but the output time of the not-yet-output portion of the voice content determined by the determination unit is shorter than a predetermined time.

10. The control device according to any one of claims 7 to 9,

further comprising a prohibition unit that prohibits the display of the processing content on the display device when the voice output control unit does not stop the voice output.

11. The control device according to any one of claims 7 to 9,

further comprising a confirmation unit configured to confirm, with the user who has performed the voice input, whether or not to cause the display device to display the processing content, when the voice output control unit does not stop the voice output,

wherein the display control unit displays the processing content when the confirmation result is affirmative.

12. The control device according to claim 11,

further comprising a process start unit that, when the result of the confirmation is negative, causes the image forming apparatus to start the process without waiting for the user who has performed the voice input to instruct the start of the process.

13. The control device according to any one of claims 7 to 12,

further comprising a display notification unit that outputs a voice notification when the processing content is displayed on the display device after the stop of the voice output.

14. An image forming system is characterized by comprising:

an image forming apparatus; and

the control device of any one of claims 1 to 13.

15. A computer-readable recording medium storing a program for causing a computer to execute control for causing an image forming apparatus to execute a process in accordance with a process content instructed by a voice,

the program causes a computer to execute:

a display control step of displaying the processing content on a display device of the image forming apparatus before the processing is executed;

an operation state acquisition step of acquiring an operation state of the display device; and

a determining step of determining a content of a voice to be output for prompting confirmation of the processing content, based on the acquired operation state of the display device.

16. The recording medium of claim 15,

wherein the operation state includes whether or not the display device is in a sleep mode in which at least a part of its power supply is turned off.

17. The recording medium of claim 16,

wherein the sleep mode includes a 1st-stage sleep mode in which the period of time until display of the processing content is enabled is a 1st predetermined period, and a 2nd-stage sleep mode in which the period of time until display is enabled is a 2nd predetermined period longer than the 1st predetermined period,

in the operation state acquisition step, an operation state is acquired that indicates which of the 1st stage and the 2nd stage the sleep mode is in, and

in the determining step, the content of the voice to be output is determined based on the operation state, acquired in the operation state acquisition step, indicating which of the 1st stage and the 2nd stage the sleep mode is in.

18. The recording medium of claim 15,

wherein the program causes the computer to further execute an estimation step of estimating a start-up time until display by the display device is enabled, based on the operation state acquired in the operation state acquisition step, and

in the determining step, the voice content is determined according to the start-up time.

19. The recording medium of claim 18,

wherein, in the determining step,

the voice content is determined so as to include all of predetermined items when the start-up time is longer than an output time required for voice output of the predetermined items, and

the voice content is determined so as to be composed of items excluding at least some of the predetermined items when the start-up time is shorter than the output time.

20. The recording medium according to claim 18 or 19,

wherein the program causes the computer to further execute a voice adding step of determining additional content other than the voice content, based on a remaining time obtained by subtracting, from the start-up time, an output time required for voice output of the voice content determined in the determining step.

21. The recording medium of claim 15,

the program causes the computer to further execute:

a use state determination step of determining whether or not a user other than the user who has performed the voice input of the processing content is using the display device; and

a voice output control step of outputting the voice content as speech during a period after the voice content is determined in the determining step and before the determination result of the use state determination step becomes negative, and of stopping the voice output when the determination result of the use state determination step becomes negative.

22. The recording medium according to claim 21,

wherein the voice content determined in the determining step is divided into a plurality of portions, and

in the voice output control step, when the determination result in the use state determination step is negative, the voice output is stopped at a point in time when the voice output of the portion being output is completed.

23. The recording medium according to claim 21,

wherein, in the voice output control step, the voice output is not stopped when the determination result in the use state determination step is negative but the output time of the not-yet-output portion of the voice content determined in the determining step is shorter than a predetermined time.

24. The recording medium according to any one of claims 21 to 23,

the program causes the computer to further execute:

a prohibiting step of prohibiting the display of the processing content on the display device, when the voice output is not stopped in the voice output control step.

25. The recording medium according to any one of claims 21 to 23,

the program causes the computer to further execute:

a confirmation step of confirming, with the user who has performed the voice input, whether or not to cause the display device to display the processing content, when the voice output is not stopped in the voice output control step,

in the display control step, the processing content is displayed when the confirmation result is affirmative.

26. The recording medium according to claim 25,

the program causes the computer to further execute:

a process start step of causing the image forming apparatus to start the process without waiting for the user who has performed the voice input to instruct the start of the process when the confirmation result is negative.

27. The recording medium according to any one of claims 21 to 26,

the program causes the computer to further execute:

a display notification step of outputting, by voice, a notification that the processing content is displayed on the display device after the stop of the voice output.

Technical Field

The present invention relates to a control device, an image forming system, and a recording medium for giving a voice instruction to an image forming apparatus, and more particularly to a technique for suppressing an increase in waiting time of a user when giving a voice instruction to an image forming apparatus.

Background

In recent years, smart speakers and IoT devices have become widespread, and the number of devices that can be operated by voice through a smart speaker is increasing; for image forming apparatuses as well, it is desirable to improve user convenience by supporting such voice operation. Specifically, a voice input is received with a microphone to generate voice data, the voice data is converted into text data by voice recognition, and the text data is subjected to natural language analysis to specify the instruction content. The instruction content is then converted into a command for the image forming apparatus and input to the image forming apparatus, whereby the voice instruction is executed.
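As a rough, non-authoritative illustration of this flow, the following Python sketch runs a toy utterance through the three stages; the helper names (recognize_speech, parse_intent, to_mfp_command) are invented for the example and do not refer to any actual service API:

```python
# Toy sketch of the voice-operation pipeline: voice data -> text -> instruction
# content -> device command. All function bodies are illustrative placeholders.

def recognize_speech(voice_data: bytes) -> str:
    # A real system would run speech recognition here; the sketch just decodes text.
    return voice_data.decode("utf-8")

def parse_intent(text: str) -> dict:
    # Placeholder natural-language analysis: pick out a job type and a few options.
    intent = {"job": None, "options": {}}
    if "copy" in text:
        intent["job"] = "copy"
    if "double-sided" in text:
        intent["options"]["duplex_print"] = True
    return intent

def to_mfp_command(intent: dict) -> dict:
    # Convert the extracted instruction content into a command for the image forming apparatus.
    return {"command": "set_job", "job_type": intent["job"], "settings": intent["options"]}

print(to_mfp_command(parse_intent(recognize_speech(b"please copy double-sided"))))
# {'command': 'set_job', 'job_type': 'copy', 'settings': {'duplex_print': True}}
```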

A job of the image forming apparatus has a large number of setting items, and it is preferable that the user can confirm the setting contents at the time of setting or immediately before the job is executed. This applies not only to manual setting but also to setting by voice. When the setting contents of a job are confirmed, there may be omissions in the user's settings, so items that the user did not explicitly set may also need to be confirmed.

Therefore, if the image forming apparatus tries to have the user confirm the setting contents by voice output, a considerable time is required, and even more time is needed if the confirmation speech must be repeated because the user missed part of it. From this viewpoint, it is effective to display the setting contents as a list on the operation panel and let the user confirm them there.

In this way, when the content of an instruction accepted from the user by voice is confirmed with the user before it is executed, displaying the instruction content on the operation panel for confirmation is sometimes more efficient than confirming it by voice.

However, when the operation panel is turned off, such as when the image forming apparatus is in a power saving mode, or when another user is using the operation panel, the instruction content (in the above example, the list of job setting contents) cannot be displayed for the user to confirm.

In order to cope with the case where the operation panel is turned off in the power saving mode, for example, a technique has been proposed in which the contents of the voice command are examined in detail to restore the image forming apparatus from the sleep state to a level at which the requested function is available (see patent document 1), or in which the image forming apparatus is restored only when the requested function is present (see patent document 2). This allows the operation panel to return to an operable state, and therefore, the instruction content can be displayed and confirmed by the user.

Disclosure of Invention

However, in the above-described conventional technique, since the instruction content cannot be displayed until the operation panel is returned to the operable state, there is a problem in that it takes too much time until the user confirmation is started.

In addition, there is a problem that it is necessary to wait until the operation panel becomes usable even when another user uses the operation panel.

The present invention has been made in view of the above-described problems, and an object thereof is to provide a control device, an image forming system, and a program that can efficiently confirm the processing content instructed by voice according to the operating state of an operation panel.

In order to achieve the above object, a control device according to an aspect of the present invention is a control device for causing an image forming apparatus to execute a process in accordance with a process content instructed by a voice, the control device including: a display control unit that displays the processing content on a display device of the image forming apparatus before the processing is executed; an operation state acquisition unit that acquires an operation state of the display device; and a determination unit configured to determine a content of a voice to be output for prompting confirmation of the processing content, based on the acquired operation state of the display device.

In this case, it is preferable that the operation state includes whether or not the display device is in a sleep mode in which at least a part of its power supply is turned off.

In addition, the sleep mode may include a 1st-stage sleep mode in which the period of time until display of the processing content is enabled is a 1st predetermined period, and a 2nd-stage sleep mode in which the period until display is enabled is a 2nd predetermined period longer than the 1st; the operation state acquisition unit may acquire an operation state indicating which of the 1st stage and the 2nd stage the sleep mode is in, and the determination unit may determine the content of the voice to be output based on the acquired operation state.

Further, the control device may further include an estimation unit configured to estimate an activation time until display by the display device is enabled, based on the operation state acquired by the operation state acquisition unit, and the determination unit may determine the voice content based on the activation time.

The determination unit may determine the voice content so as to include all of predetermined items when the activation time is longer than an output time required for voice output of the predetermined items, and may determine the voice content so as to be composed of items excluding at least some of the predetermined items when the activation time is shorter than the output time.

Further, the control device may further include a voice adding unit that determines additional content other than the voice content, based on a remaining time obtained by subtracting, from the activation time, the output time required for voice output of the voice content determined by the determination unit.

Further, the control device may include: a use state determination unit configured to determine whether or not a user other than the user who has performed the voice input of the processing content is using the display device; and a voice output control unit that outputs the voice content as speech during a period after the determination unit determines the voice content and before the determination result of the use state determination unit becomes negative, and stops the voice output when the determination result of the use state determination unit becomes negative.

The voice content determined by the determination unit may be divided into a plurality of portions, and when the determination result of the use state determination unit becomes negative, the voice output control unit may stop the voice output at the point in time when voice output of the portion currently being output is completed.

Preferably, the voice output control unit does not stop the voice output when the determination result of the use state determination unit becomes negative but the output time of the not-yet-output portion of the voice content determined by the determination unit is shorter than a predetermined time.

Further, the control device may further include a prohibition unit that prohibits the display of the processing content on the display device when the voice output control unit does not stop the voice output.

It is preferable that the control device includes a confirmation unit for confirming, with the user who has performed the voice input, whether or not the processing content is to be displayed on the display device when the voice output control unit does not stop the voice output, the processing content being displayed when the confirmation result is affirmative.

Further, the control device may include a process start unit that, when the confirmation result is negative, causes the image forming apparatus to start the process without waiting for the user who has performed the voice input to instruct the start of the process.

Further, the control device may include a display notification unit that outputs a voice notification when the processing content is displayed on the display device after the stop of the voice output.

An image forming system according to an aspect of the present invention includes: an image forming apparatus; and a control device according to an aspect of the present invention.

A program according to an aspect of the present invention is a program for causing a computer to execute control for causing an image forming apparatus to execute a process in accordance with a process content instructed by a voice, the program causing the computer to execute: a display control step of displaying the processing content on a display device of the image forming apparatus before the processing is executed; an operation state acquisition step of acquiring an operation state of the display device; and a determination step of determining the content of the voice to be output for prompting confirmation of the processing content, based on the acquired operation state of the display device.

This makes it possible to efficiently confirm the processing content instructed by the voice in accordance with the operation state of the operation panel.

Drawings

Fig. 1 is a diagram showing a main configuration of an image forming system.

Fig. 2 is a timing chart illustrating an operation of the image forming system.

Fig. 3 is a block diagram showing the main structure of the smart speaker 111.

Fig. 4 is a block diagram showing the main structure of the voice AI server 102.

Fig. 5 is a block diagram showing the main configuration of the MFP control server 101.

Fig. 6 is an external perspective view showing the main structure of the multifunction peripheral 112.

Fig. 7 is a block diagram showing the main structure of the multifunction peripheral 112.

Fig. 8 is a flowchart showing the main operation of the MFP control server 101.

Fig. 9 is a flowchart showing the setting content confirmation processing executed by the MFP control server 101.

Fig. 10 is a flowchart showing the in-startup processing executed by the MFP control server 101.

Fig. 11 is a flowchart showing the in-panel-operation processing executed by the MFP control server 101.

(symbol description)

1: an image forming system; 101: an MFP control server; 102: a voice AI server; 103: the internet; 111: an intelligent speaker; 112: a multi-functional peripheral; 113: a LAN; 600: an operation panel.

Detailed Description

Embodiments of a control device, an image forming system, and a program according to the present invention will be described below with reference to the drawings.

[1] Structure of image forming system

First, the configuration of the image forming system according to the present embodiment will be described.

As shown in fig. 1, the image forming system 1 includes a cloud system 100 and a user system 110. In the user system 110, a smart speaker (SS) 111 and a multi-function peripheral (MFP) 112 are connected to a LAN (Local Area Network) 113.

In the cloud system 100, two cloud servers, namely an MFP control server 101 and a voice AI (artificial intelligence) server 102, are connected to the internet 103. The LAN113 is also connected to the internet 103. The smart speaker 111, the voice AI server 102, and the MFP control server 101 constitute a voice interface through which a user gives voice instructions to the multi-functional peripheral 112.

As shown in fig. 2, when the user of the multi-functional peripheral 112 gives the smart speaker 111 a voice instruction such as "copy", the smart speaker 111 generates voice data from the voice signal and transmits the voice data to the voice AI server 102 via the LAN113 and the internet 103.

The voice AI server 102 generates text data from the voice data by voice recognition processing, and extracts instruction contents for the multifunction peripheral 112 from the text data by natural language analysis processing. The voice AI server 102 may execute the voice recognition processing and the natural language parsing processing using a known AI technique, or may use a technique other than the AI technique. The voice AI server 102 transmits the extracted instruction content to the MFP control server 101.

The MFP control server 101 is a control device that controls the multi-functional peripheral 112, and generates a command corresponding to the instruction content if the instruction content is received from the voice AI server 102, and transmits the command to the multi-functional peripheral 112 associated with the smart speaker 111 that has received the voice instruction. The instruction is, for example, an instruction to execute a job such as a scan job or a print job, an instruction to change the setting content of the job, or the like. Further, the MFP control server 101 monitors the operating state of the multi-functional peripheral 112, and transmits a response text (text data) corresponding to the operating state of the multi-functional peripheral 112 to the voice AI server 102.

The voice AI server 102, if receiving the answer text from the MFP control server 101, synthesizes voice data from the answer text by a voice synthesis process, and streams the voice data to the smart speaker 111. The smart speaker 111 sequentially outputs the received voice data.

[2] Structure of smart speaker 111

Next, the structure of the smart speaker 111 will be described.

As shown in fig. 3, the smart speaker 111 includes a voice processing unit 301 and a communication control unit 302, and a microphone 311 and a speaker 312 are connected to the voice processing unit 301.

The voice processing unit 301 performs AD (analog-to-digital) conversion on the analog voice signal picked up by the microphone 311 and compression-encodes it to generate voice data; conversely, it restores an analog voice signal from voice data received by the communication control unit 302 and outputs the voice from the speaker 312. The communication control unit 302 executes communication processing for transmitting and receiving voice data and the like to and from the voice AI server 102 via the internet 103.

[3] Structure of voice AI server 102

Next, the configuration of the voice AI server 102 will be explained.

As shown in fig. 4, the voice AI server 102 includes a CPU (Central Processing Unit) 400, a ROM (Read Only Memory) 401, a RAM (Random Access Memory) 402, and the like, and the CPU400 reads out a boot program from the ROM401 after reset, starts up the boot program, reads out an OS (Operating System) and other programs from an HDD (Hard Disk Drive) 403 using the RAM402 as a work storage area, and executes the OS and other programs.

A NIC (Network Interface Card) 404 executes communication processing for connecting the smart speaker 111 and the MFP control server 101 to each other via the internet 103.

The voice processing unit 405 executes voice recognition processing of the voice data received from the smart speaker 111 and voice synthesis processing of the voice data transmitted to the smart speaker 111.

The language processing unit 406 executes natural language analysis processing of the text data generated by the voice processing unit 405. Thus, for example, when the user utters a specific keyword toward the smart speaker 111, the voice AI server 102 recognizes the keyword, shifts to the voice instruction reception mode, recognizes the voice instruction of the next user, and specifies the instruction content.
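As a rough illustration of this keyword-triggered behaviour, the following sketch switches into an instruction-reception mode when an assumed wake word is recognized; the wake word and class name are invented for the example and do not correspond to any actual product:

```python
from typing import Optional

# Illustrative wake-word handling: a specific keyword shifts the server into a mode
# where the next recognized utterance is treated as the instruction content.

WAKE_WORD = "hello printer"   # assumed wake word, for illustration only

class VoiceFrontEnd:
    def __init__(self) -> None:
        self.accepting_instruction = False

    def on_recognized_text(self, text: str) -> Optional[str]:
        if not self.accepting_instruction:
            if WAKE_WORD in text.lower():
                self.accepting_instruction = True   # shift to voice-instruction reception mode
            return None                             # nothing to forward yet
        self.accepting_instruction = False
        return text                                 # forward this utterance for analysis

front_end = VoiceFrontEnd()
assert front_end.on_recognized_text("Hello printer") is None
assert front_end.on_recognized_text("make a copy") == "make a copy"
```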

[4] MFP control server 101 configuration

Next, the configuration of the MFP control server 101 will be described.

As shown in fig. 5, the MFP control server 101 includes a CPU500, a ROM501, a RAM502, and the like; after reset, the CPU500 reads out a boot program from the ROM501 and starts it, and then reads out and executes programs such as an OS from the HDD503 using the RAM502 as a work storage area. The NIC504 performs communication processing for connecting with the voice AI server 102 and the multi-functional peripheral 112 via the internet 103.

With such a configuration, it is possible to generate a response text and transmit it to the voice AI server 102, or generate an instruction and transmit it to the multi-functional peripheral 112.

[5] Structure of multifunctional peripheral 112

Next, the structure of the multifunction peripheral 112 is explained. The multifunction peripheral 112 is an image forming apparatus having a monochrome and color image forming function, a copy function, a facsimile function, and the like.

As shown in fig. 6, the multifunction peripheral 112 includes an image reading portion 610, an image forming portion 620, and a paper feeding portion 630. The image reading unit 610 feeds documents one by one from a bundle of documents placed on a document tray 611 using an Automatic Document Feeder (ADF) 612, reads the documents in a so-called sheet-through method, and then discharges the documents to a discharge tray 613. Thereby, image data is generated.

The image forming unit 620 includes an image forming unit that forms a toner image and transfers the toner image to a recording sheet, and a fixing unit that thermally fixes the toner image to the recording sheet, and executes image forming processing using image data generated by the image reading unit 610 or image data received via the LAN113 or the internet 103. The paper feed unit 630 accommodates recording sheets, and feeds the recording sheets in parallel with the process of forming a toner image by the image forming unit 620. The recording sheet on which the toner image is thermally fixed is discharged to a paper discharge tray 621 provided in the internal space of the multifunction peripheral 112.

The image forming unit 620 includes an operation panel 600, which presents information to the user of the multifunction peripheral 112 and receives instruction inputs from the user. The image forming unit 620 also includes a control unit 622 (not shown), which controls the operation of the multi-function peripheral 112.

As shown in fig. 7, the control unit 622 includes a CPU700, a ROM701, a RAM702, and the like; after reset, the CPU700 reads out a boot program from the ROM701 and starts it, and then reads out and executes programs such as an OS from the HDD703 using the RAM702 as a work storage area. The NIC704 performs communication processing for connecting with the MFP control server 101 via the LAN113 and the internet 103.

With such a configuration, the controller 622 controls the operations of the image reading unit 610, the image forming unit 620, and the paper feeding unit 630. The operation panel 600 includes a liquid crystal display 601, a touch panel 602, hard keys 603, and a panel control section 604; the liquid crystal display 601 and the touch panel 602 together form a touch-panel display. The panel control section 604 detects operations of the touch panel 602 and the hard keys 603 and controls display on the liquid crystal display 601.

In addition, the hard keys 603 comprise a plurality of keys including a start key. The user of the multi-functional peripheral 112 can instruct the start of job execution by pressing the start key.

In the multi-functional peripheral 112, in the case where there is no job to be executed next after the execution of the job is completed, a transition is made from the job execution mode to the standby mode. After the transition to the standby mode, if a predetermined time has elapsed without accepting a job to be executed next, the transition is made to the sleep mode of stage 1. In the sleep mode in stage 1, for example, the amount of power consumption is reduced by stopping the temperature adjustment of the fixing device, and the backlight of the liquid crystal display 601 is turned off.

After the transition to the sleep mode of stage 1, the multifunction peripheral 112 transitions to the sleep mode of stage 2 when a predetermined time has elapsed without accepting a job to be executed next. The sleep mode of stage 2 reduces power consumption further than the sleep mode of stage 1; for example, the panel control section 604 that controls the operation panel 600 is also placed in a power saving state.

Therefore, the multi-functional peripheral 112 takes little time (e.g., 5 seconds) to return from the sleep mode of stage 1 to a state in which the operation panel 600 can be used, but takes considerably longer (e.g., 1 minute) to return from the sleep mode of stage 2 to that state.
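The mode transitions and return times described above can be summarized in a small sketch; the idle-timeout values are invented for the example, while the 5-second and 1-minute wake times follow the figures quoted in the text:

```python
import time

# Illustrative power-state model following the transitions described above.
# Idle timeouts are assumptions; wake times use the example values from the text.

WAKE_TIME_SEC = {"standby": 0, "sleep_stage1": 5, "sleep_stage2": 60}

class PowerState:
    STANDBY_TO_SLEEP1 = 120    # assumed: standby -> stage-1 sleep after 2 min idle
    SLEEP1_TO_SLEEP2 = 600     # assumed: stage-1 -> stage-2 sleep after a further 10 min

    def __init__(self) -> None:
        self.state = "standby"
        self.idle_since = time.monotonic()

    def tick(self) -> None:
        """Deepen the sleep state according to how long no job has been accepted."""
        idle = time.monotonic() - self.idle_since
        if self.state == "standby" and idle > self.STANDBY_TO_SLEEP1:
            self.state = "sleep_stage1"   # backlight off, fixing-unit temperature control stopped
        elif self.state == "sleep_stage1" and idle > self.STANDBY_TO_SLEEP1 + self.SLEEP1_TO_SLEEP2:
            self.state = "sleep_stage2"   # panel control section also powered down

    def panel_wake_time(self) -> int:
        """Seconds until the operation panel 600 can be used again."""
        return WAKE_TIME_SEC[self.state]
```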

When the multifunction peripheral 112 receives a command requesting display of the job setting contents from the MFP control server 101 while in either the sleep mode of stage 1 or the sleep mode of stage 2, it returns from the sleep mode to the standby mode. Therefore, regardless of whether the multi-functional peripheral 112 is in the sleep mode of stage 1 or the sleep mode of stage 2, the MFP control server 101 can transmit a command to the multi-functional peripheral 112 to display the job setting contents and have the display performed.

Similarly, for commands other than the command requesting display of the job setting contents, the multifunction peripheral 112 can receive a command from the MFP control server 101 and execute the corresponding processing regardless of whether it is in the sleep mode of stage 1 or the sleep mode of stage 2.

[6] Operation of image Forming System 1

Next, the operation of the image forming system 1 will be described centering on the operation of the MFP control server 101.

(6-1) Main Process

As shown in fig. 8, when the MFP control server 101 receives text data of a voice instruction from the voice AI server 102 (S801: "YES"), it determines whether the instruction is a job setting instruction (S802). If the instruction is a job setting instruction (S802: yes), the job setting is recorded in the MFP control server 101 (S811). Specifically, the MFP control server 101 stores a default setting value for each setting item of a job (copy job, scan job, etc.) in advance, and the stored setting value is replaced with the instructed setting value. After that, each time a setting instruction is similarly received, the setting value of the corresponding setting item is changed in accordance with the instruction.

If it is determined in step S802 that the instruction is not a setting instruction for the job (S802: no), it is determined whether the instruction is an execution instruction for the job (S803). If it is determined that the instruction is an instruction to execute the job (yes in S803), the process of confirming the setting contents of the job is executed before the job is executed (S804). This processing is processing for presenting the setting contents of the job to the user and confirming whether or not the setting contents are acceptable, and specific processing contents will be described later.

On the other hand, when it is determined in step S803 that the instruction is not an instruction to execute the job (S803: NO), a process other than executing the job is executed in accordance with the instruction from the user (S822). The processing other than the job execution is, for example, processing of replying to the remaining amount or the like in response to an inquiry of the toner remaining amount.

After the job setting confirmation process has been executed (S804), if the user replies via the smart speaker 111 that the setting contents are acceptable and text data to that effect is received from the voice AI server 102 (S805: OK), an instruction to execute the job is transmitted to the multi-functional peripheral 112 (S806). If the user instead replies via the smart speaker 111 that there is a problem with the setting contents, text data to that effect is transmitted from the voice AI server 102 to the MFP control server 101; in this case (S805: NG), the process returns to step S801, and processing such as job setting is performed again in accordance with the user's instructions.
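This main flow (steps S801-S806, S811, S822) can be condensed into the following sketch; the callables passed in are hypothetical stand-ins for the actual message handling, not part of any real server implementation:

```python
# Sketch of the main flow in fig. 8. The four callables are hypothetical stand-ins
# for receiving text data, confirming settings, and issuing commands.

DEFAULT_SETTINGS = {"copies": 1, "color": "full color", "duplex_print": True}

def main_loop(receive_instruction, confirm_settings, send_execute_command, handle_other):
    settings = dict(DEFAULT_SETTINGS)
    while True:
        kind, payload = receive_instruction()           # S801: text data from the voice AI server
        if kind == "setting":                           # S802: job setting instruction
            settings.update(payload)                    # S811: overwrite the stored setting values
        elif kind == "execute":                         # S803: job execution instruction
            if confirm_settings(settings) == "OK":      # S804/S805: setting content confirmation
                send_execute_command(settings)          # S806: execution command to the MFP
            # on "NG" fall through and accept corrected settings on the next iteration (S801)
        else:
            handle_other(kind, payload)                 # S822: e.g. answer a toner-level inquiry
```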

(6-2) setting content confirmation processing (S804)

Fig. 9 is a flowchart showing the specific processing contents of the setting content confirmation processing in step S804. The MFP control server 101 first refers to the operating state of the operation panel 600 of the multi-functional peripheral 112 (S901). That is, it can be determined whether the operation panel is in a state in which the liquid crystal display 601 can display the setting contents, in a sleep state, or in a state in which it is being used by another person (hereinafter referred to as "in panel operation"). As described above, the depth of the sleep state increases in the order of the sleep mode of stage 1, in which the backlight of the liquid crystal display 601 is turned off, and the sleep mode of stage 2, in which the power supplies other than the control section 622 (including the panel control section 604) are turned off, and the time until the operation panel 600 becomes able to display increases accordingly.

The MFP control server 101 always monitors the operating state of the operation panel 600. For example, the multifunction peripheral 112 notifies the MFP control server 101 of the operation state each time the operation state is changed, and if the MFP control server 101 receives the notification, the operation state of the operation panel 600 is recorded in the operation state table. In step S901, the operation state table is referred to.
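A minimal sketch of such an operation state table, updated from the peripheral's notifications and consulted in step S901, might look like the following (the state names and field names are assumptions):

```python
# Operation state table kept on the MFP control server, keyed by the peripheral.
# State names are assumed: "ready", "sleep_stage1", "sleep_stage2", "in_panel_operation".

operation_state_table: dict[str, dict] = {}

def on_state_notification(mfp_id: str, panel_state: str, timestamp: float) -> None:
    """Record the latest operating state pushed by the multi-functional peripheral."""
    operation_state_table[mfp_id] = {"panel_state": panel_state, "updated_at": timestamp}

def current_panel_state(mfp_id: str) -> str:
    """Referenced in step S901 before deciding how to confirm the job settings."""
    return operation_state_table.get(mfp_id, {}).get("panel_state", "unknown")
```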

If the operating state of the operation panel 600 is the sleep state (S902: "YES"), the in-startup processing is executed (S911), and when the in-startup processing is completed, the flow returns to the main flow.

When the operating state of the operation panel 600 is not the sleep state (S902: NO) but is in panel operation (S903: YES), the in-panel-operation processing is executed (S912). When the in-panel-operation processing is completed, the flow returns to the main flow.

When the operating state of the operation panel 600 is not in panel operation (S903: no), the job setting contents are transmitted to the multi-functional peripheral 112 (S904), and a command to display the job setting contents is transmitted (S905). The multifunction peripheral 112 displays the received job setting contents on the liquid crystal display 601 in accordance with the command.

In the present embodiment, the setting items that can be set by voice are limited to the main setting items that are frequently used (hereinafter also referred to as "main setting items") among all the functions of the multi-functional peripheral, and the job setting contents transmitted in step S904 and the job setting contents displayed on the liquid crystal display 601 are the setting contents of all the main setting items. Further, only the setting contents of the setting items changed from the default setting values may be displayed.

In step S906, a response text #0 prompting the user to confirm the setting contents displayed on the operation panel 600 is transmitted to the voice AI server 102. The response text #0 is a text in the form of a question that asks for an answer such as "OK" or "not OK", for example, "Are the settings displayed on the operation panel acceptable?"

(6-3) Start-Up middle processing (S911)

In the in-startup processing (S911), as shown in fig. 10, the job setting contents are first transmitted to the multifunction peripheral 112 (S1001), and a command to display the job setting contents is transmitted (S1002). Upon receiving the command, the multifunction peripheral 112 starts the process of returning from the sleep mode.

Next, it is determined whether the sleep mode of the operation panel 600 is the sleep mode of stage 1 or the sleep mode of stage 2, and if it is the sleep mode of stage 2 (S1003: "NO"), the activation time T0 is estimated based on the depth of the sleep mode of the operation panel 600 (S1004). As described above, the sleep state comprises the sleep mode of stage 1 and the sleep mode of stage 2, which differ in the time until the operation panel 600 can display the job setting contents, so the MFP control server 101 estimates the activation time T0 by referring, for example, to a table in which the activation time T0 is stored for each depth of the sleep state. The activation time T0 also differs depending on the model of the multi-functional peripheral, so the activation time T0 is stored in the table for each model, and in step S1004 the activation time T0 corresponding to the model of the multi-functional peripheral 112 is referred to.
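For illustration, such a lookup might be sketched as follows; the model names and all numeric values here are invented, not actual specifications:

```python
# Illustrative activation-time (T0) table: seconds until the operation panel can
# display the job settings, keyed by MFP model and sleep depth.

ACTIVATION_TIME_TABLE = {
    ("model-A", "sleep_stage1"): 5,
    ("model-A", "sleep_stage2"): 60,
    ("model-B", "sleep_stage1"): 4,
    ("model-B", "sleep_stage2"): 45,
}

def estimate_activation_time(model: str, sleep_depth: str, default: int = 60) -> int:
    # Step S1004: look up T0 for this model and sleep depth, with a conservative fallback.
    return ACTIVATION_TIME_TABLE.get((model, sleep_depth), default)
```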

Next, a response text #1 is generated (S1005). The response text #1 is a text listing the main setting contents that the multifunction peripheral 112 has been instructed to display on the liquid crystal display 601. For example, when the main setting items for the copy function are the number of copies, the color setting, the single-sided/double-sided setting of the document, the single-sided/double-sided setting for printing, the page aggregation setting, and the staple setting, the setting contents (default settings or settings changed by the user) are referred to, and a text such as "Copying with 1 copy, full color, single-sided reading, double-sided printing, 2-in-1, and 1-point stapling. Is this acceptable?" is generated. Here, too, a text in the form of a question asking for an answer such as "acceptable" or "not acceptable" is used.
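Assembling a response text of this kind from the main setting items could be sketched as follows (the item names and wording are illustrative):

```python
# Sketch of building response text #1 from the main copy-job setting items.

def build_response_text_1(settings: dict) -> str:
    copies = settings["copies"]
    parts = [
        f'{copies} {"copies" if copies > 1 else "copy"}',
        settings["color"],
        "double-sided reading" if settings["duplex_scan"] else "single-sided reading",
        "double-sided printing" if settings["duplex_print"] else "single-sided printing",
        settings["page_aggregation"],
        settings["staple"],
    ]
    return "Copying with " + ", ".join(parts) + ". Is this acceptable?"

print(build_response_text_1({
    "copies": 1, "color": "full color", "duplex_scan": False,
    "duplex_print": True, "page_aggregation": "2-in-1", "staple": "1-point stapling",
}))
# Copying with 1 copy, full color, single-sided reading, double-sided printing,
# 2-in-1, 1-point stapling. Is this acceptable?
```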

Once the response text #1 has been generated, the time T1 required for voice output of the response text #1 is estimated (S1006). For example, the voice output time T1 may be estimated by multiplying the number of syllables Ns contained in the response text #1 by an appropriate coefficient.

The voice output time T1 of the response text #1 is compared with the activation time T0 of the operation panel 600, and when the voice output time T1 is shorter than the activation time T0 (S1007: "yes"), the response text #1 is transmitted to the voice AI server 102 (S1023). Upon receiving the response text #1, the voice AI server 102 synthesizes response voice data from it using the voice processing unit 405 and transmits the response voice data to the smart speaker 111. The smart speaker 111 outputs the response voice data as voice.

In this way, compared with the case where the user is given no information until the operation panel 600 returns from the sleep state and must check whether the settings are as desired by looking at the display only after the panel has started up, the user can learn by voice whether the settings are as desired while the operation panel 600 is still returning from the sleep state, so the setting contents can be confirmed efficiently in a short time. Furthermore, if the settings are as the user desires, execution of the job can be started immediately.

In addition, even when the settings are not as the user desires, the user learns this before the operation panel 600 has finished starting up, and can therefore quickly correct the setting contents and execute the job after start-up. In this sense, wasteful waiting time for the user can be suppressed, and smooth use of the multi-function peripheral 112 can be promoted.

When the comparison of the voice output time T1 of the response text #1 with the activation time T0 of the operation panel 600 shows that the voice output time T1 is equal to or longer than the activation time T0 (S1007: NO), a response text #2 is generated (S1008). The response text #2 is shorter than the response text #1, and is, for example, a text listing only the setting contents changed from the default values in accordance with the user's instruction. Alternatively, in addition to the setting contents changed from the default values in accordance with the user's instruction, some of the other setting items may also be included.

Once the response text #2 has been generated, the time T2 required for voice output of the response text #2 is estimated (S1009). In this case as well, the voice output time T2 can be estimated by multiplying the number of syllables Ns contained in the response text #2 by an appropriate coefficient.

The voice output time T2 is compared with the activation time T0, and when the voice output time T2 is shorter than the activation time T0 (S1010: yes), a difference time ΔT is calculated by subtracting the voice output time T2 from the activation time T0 (S1011), and an additional text whose voice output time equals the difference time ΔT is generated (S1012). In this case, for example, an additional text whose length corresponds to the number of syllables Ns obtained by dividing the difference time ΔT by the appropriate coefficient may be generated.
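The selection among the full response text #1, the shorter response text #2 plus padding, and padding alone (steps S1006 to S1012, together with the fallback described further below) can be condensed into the following sketch; the coefficient relating syllables to seconds is an assumed value:

```python
# Sketch of the text-selection logic around steps S1006-S1012.
SEC_PER_SYLLABLE = 0.2   # assumed coefficient relating syllable count to speaking time

def estimate_output_time(syllable_count: int) -> float:
    return syllable_count * SEC_PER_SYLLABLE

def choose_texts(t0: float, text1_syllables: int, text2_syllables: int):
    """Return (main_text_id, padding_seconds) given activation time T0."""
    t1 = estimate_output_time(text1_syllables)        # S1006
    if t1 < t0:                                       # S1007 yes: full text #1 fits before start-up
        return "#1", 0.0
    t2 = estimate_output_time(text2_syllables)        # S1009
    if t2 < t0:                                       # S1010 yes: shortened text #2 plus padding
        return "#2", t0 - t2                          # S1011: remaining time dT filled by additional text
    return None, t0                                   # S1010 no: additional text only; panel shows the rest
```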

As the additional text, for example, a text such as "It looks like rain from late this afternoon. Do you have an umbrella with you?" can be generated. Such a text, having an appropriate number of syllables Ns, may be selected from texts prepared in advance. If the additional text is output by voice before the response text #2, the response text #2 can be output by voice immediately before the multifunction peripheral 112 finishes starting up, while keeping the user's attention on the output voice of the smart speaker 111. This prevents the user from having forgotten the contents of the response text #2 by the time the multi-functional peripheral has started up, and thus promotes smooth use of the multi-functional peripheral 112.

The generated additional text is transmitted to the voice AI server 102 together with the response text #2 (S1013). Upon receiving these texts, the voice AI server 102 synthesizes response voice data using the voice processing unit 405 and transmits the response voice data to the smart speaker 111. The smart speaker 111 outputs the response voice data as voice.

In this way, the time until the multifunction peripheral 112 starts up can be used to check whether the user's voice instruction was correctly recognized, so if it was not, the setting error can be corrected quickly by operating the operation panel 600 once it has started up.

When the comparison of the voice output time T2 of the response text #2 with the activation time T0 of the operation panel 600 shows that the voice output time T2 is equal to or longer than the activation time T0 (S1010: "no"), the activation time T0 is short, so there is no problem with the user confirming the displayed setting contents after the operation panel 600 has started up. In the same way as for the sleep mode of stage 1 (S1003: "YES"), an additional text matching the activation time T0 is generated and transmitted to the voice AI server 102 (S1024), and the flow returns to the main flow.

When the sleep state of the multi-functional peripheral 112 is the sleep mode of stage 1 (S1003: "yes"), an additional text whose voice output time equals the activation time T0 of the stage-1 sleep mode (5 seconds in the present embodiment) is generated in the same manner as in step S1012 (S1021), and only the generated additional text is transmitted to the voice AI server 102 (S1022). Upon receiving the additional text, the voice AI server 102 synthesizes response voice data using the voice processing unit 405, transmits it to the smart speaker 111, and has it output as voice.

In the sleep mode of stage 1, the activation time T0 is short, so the user's waiting time is short and there is no problem with the user confirming the displayed setting contents after the operation panel 600 has started up; therefore, the job setting contents are not read aloud. In addition, announcing the additional text makes it easy for the user to know that the image forming system 1 is operating normally. The length of the additional text may also be zero; that is, the voice output may be omitted as necessary. The same applies to the response texts #1 and #2.

(6-4) Panel operation center processing (S912)

In the in-panel-operation processing (S912), as shown in fig. 11, a response text #1 is first generated (S1101). As in the in-startup processing (S911), the response text #1 is a text listing the main setting contents that the multifunction peripheral 112 has been instructed to display on the liquid crystal display 601. Once generated, the response text #1 is transmitted to the voice AI server 102 (S1102). Upon receiving the response text #1, the voice AI server 102 synthesizes response voice data from it using the voice processing unit 405 and transmits the data to the smart speaker 111. The smart speaker 111 outputs the response voice data as voice.

After that, the MFP control server 101 monitors the operating state of the operation panel 600, and when the other user's operation of the operation panel 600 has ended (S1103: "yes"), it instructs the voice AI server 102 to stop the voice output by the smart speaker 111 (S1104) and transmits to the voice AI server 102 an end notification text notifying that the other user's operation of the operation panel 600 has ended (S1105). The job setting contents are then transmitted to the multifunction peripheral 112 (S1106), and a command to display them is transmitted (S1107). The multifunction peripheral 112 displays the received job setting contents on the liquid crystal display 601 in accordance with the command.

Whether or not the operation of the operation panel 600 by another user is finished can be determined based on whether or not a start key of the operation panel 600 is pressed, whether or not a predetermined time has elapsed since the last input operation using the operation panel 600, whether or not the other user has logged off from the multi-function peripheral 112, and the like.
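For illustration, this check might be condensed as follows; the idle-timeout value and parameter names are assumptions:

```python
# Sketch of the S1103 check: has the other user's panel operation ended?
IDLE_TIMEOUT_SEC = 30.0   # assumed: this long without panel input counts as "finished"

def panel_operation_ended(start_key_pressed: bool,
                          seconds_since_last_panel_input: float,
                          other_user_logged_in: bool) -> bool:
    if start_key_pressed:                    # the other user started their own job
        return True
    if not other_user_logged_in:             # the other user logged off from the MFP
        return True
    return seconds_since_last_panel_input > IDLE_TIMEOUT_SEC
```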

The end notification text is, for example, a text such as "The MFP is now free, so the setting contents are displayed on the operation panel. Please confirm the setting contents shown there." Furthermore, if an operation is required before the job can be executed, a text prompting that operation may be added, for example, "Please set the document."

Upon receiving the instruction and the end notification text, the voice AI server 102 instructs the smart speaker 111 to immediately stop voice output, generates voice data of the end notification from the end notification text, transmits the voice data to the smart speaker 111, and performs voice output. In addition, the multi-functional peripheral 112 displays the setting contents of the job on the operation panel 600.

If the voice output of the response text #1 finishes (S1111: "yes") before the other user's operation of the operation panel 600 has ended (S1103: "no"), a notification is transmitted from the voice AI server 102 to the MFP control server 101, and the MFP control server 101 ends the in-panel-operation processing and returns to the setting content confirmation processing. As a result, the flow returns to the main flow as described above.

In this way, compared with the case where the user is given no information until the other user finishes operating the panel and must check whether the settings are as desired by looking at the display of the operation panel 600 only after that operation is completed, the user can learn by voice whether the settings are as desired while the other user is still operating the panel; therefore, if the settings are as desired, the user can press the start key and begin execution of the job as soon as the other user's panel operation ends.

In addition, even when the settings are not as the user desires, the user learns this before the other user's panel operation ends, and can therefore quickly correct the setting contents and execute the job once the panel operation is finished. In this sense, wasteful waiting time for the user can be suppressed, and smooth use of the multi-function peripheral 112 can be promoted.

[7] Modification example

The present invention has been described above with reference to the embodiments, but it is obvious that the present invention is not limited to the above embodiments, and the following modifications can be implemented.

(7-1) In the above embodiment, the case where the voice output of the response text #1 is stopped (S1104) when the other user's panel operation ends (S1103: "yes") has been described as an example, but the present invention is not limited to this; the end notification text may instead be transmitted (S1105) after the response text #1 has been output by voice to the end.

In addition, when the response text #1 has been output by voice to the end, it can be judged that the user has grasped the setting contents, so the setting contents need not be displayed on the operation panel 600 of the multifunction peripheral 112. Alternatively, even when the response text #1 has been output by voice to the end, the setting contents may still be displayed on the operation panel 600 of the multifunction peripheral 112 for convenience.

(7-2) In the above embodiment, the case (S1106, S1107) where the multifunction peripheral 112 displays the job setting contents on the operation panel 600 when the other user's panel operation ends (S1103: "yes") has been described as an example, but the present invention is not limited to this and may be replaced with the following. That is, when the ratio of the portion of the response text #1 that has been output by voice to the entire response text #1 is equal to or higher than a predetermined ratio, or when the portion of the response text #1 that has not been output by voice consists only of default settings, it may be judged that the user has sufficiently confirmed the job setting contents, and the setting contents need not be displayed on the operation panel 600 of the multifunction peripheral 112.

On the other hand, when the portion of the response text #1 that has been output by voice is smaller than the predetermined ratio of the entire response text #1, or when the portion that has not been output by voice includes a setting other than the default settings, it may be judged that the user has not sufficiently confirmed the job setting contents, and the setting contents may be displayed on the operation panel 600 of the multifunction peripheral 112.
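The display decision described in this modification might be sketched as follows; the threshold standing in for the "predetermined ratio" is an assumed value:

```python
# Sketch of the modified display decision in (7-2).
COVERAGE_THRESHOLD = 0.8   # assumed value for the "predetermined ratio"

def should_display_settings(total_syllables: int,
                            spoken_syllables: int,
                            unspoken_part_is_all_default: bool) -> bool:
    coverage = spoken_syllables / total_syllables if total_syllables else 1.0
    if coverage >= COVERAGE_THRESHOLD or unspoken_part_is_all_default:
        return False   # the user is considered to have confirmed the settings by voice
    return True        # otherwise also show the settings on the operation panel
```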

(7-3) In the above embodiment, the case (S1106, S1107) where the multifunction peripheral 112 displays the job setting contents on the operation panel 600 when the other user's panel operation ends (S1103: "yes") has been described as an example, but the present invention is not limited to this and may be replaced with the following. That is, in conjunction with the voice output of the end notification text, the user may be asked whether or not to display the setting contents on the operation panel 600 of the multifunction peripheral 112, and whether or not the setting contents are displayed may be switched according to the user's answer.

(7-4) In the above embodiment, execution of the job is started when the user replies, as a result of confirming the setting contents, that there is no problem; however, execution of the job may instead be started by the user pressing the start key on the operation panel 600 of the multi-functional peripheral 112.

(7-5) In the above embodiment, the case of determining whether the panel operation by another user has ended (S1103) was described as an example, but the present invention is, of course, not limited to this. Instead, it may be determined whether an error state such as paper exhaustion, toner exhaustion, or an open door has been resolved; after the error state is resolved, the voice output of the response text #1 may be stopped, the setting contents may be displayed on the operation panel 600 of the multi-function peripheral 112, and execution of the job may be started upon receiving the user's instruction to execute the job.

Even in this case, since the setting contents of the job are output by voice and can be heard before the error state is resolved, execution of the job can be started more quickly than when the user confirms the setting contents on the operation panel 600 of the multi-function peripheral 112 only after the error state has been resolved.
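A minimal sketch of the waiting logic in modification (7-5) is shown below (Python). The polling interface and the set of error names are hypothetical stand-ins for the status query between the MFP control server 101 and the multi-function peripheral 112.

```python
# Sketch of modification (7-5): wait for the error state (paper out, toner
# out, door open) to be resolved instead of waiting for the end of another
# user's panel operation.

import time
from typing import Callable

ERROR_STATES = {"paper_empty", "toner_empty", "door_open"}


def wait_for_error_cleared(get_status: Callable[[], set],
                           poll_interval: float = 1.0) -> None:
    """Block until none of the known error states is reported."""
    while get_status() & ERROR_STATES:
        # Response text #1 keeps playing in the meantime, so the user can
        # hear the job settings while, e.g., refilling paper.
        time.sleep(poll_interval)


if __name__ == "__main__":
    statuses = iter([{"paper_empty"}, {"paper_empty"}, set()])
    wait_for_error_cleared(lambda: next(statuses), poll_interval=0.01)
    print("Error cleared: stop voice output, show settings, await start.")
```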

(7-6) In the above embodiment, the case where voice input/output is performed using the smart speaker 111 was described as an example, but the present invention is, of course, not limited to this; voice input/output may be performed using a device other than the smart speaker 111, such as a smartphone. When such a device is used, it may also provide the functions of both the smart speaker 111 and the voice AI server 102.

(7-7) In the above embodiment, the case where the MFP control server 101 and the voice AI server 102 are cloud servers was described as an example, but the present invention is not limited to this; other server apparatuses may be used. For example, a server device connected to the LAN 113, or a server device integrated with the multi-function peripheral 112, may serve as the MFP control server 101 or the voice AI server 102.

(7-8) In the above embodiment, the response text #1 is a text that lists the main setting contents as they are displayed on the liquid crystal display 601 of the multi-function peripheral 112, and the response text #2 is a text containing only the setting contents that the user has changed from the default values by voice instruction. The present invention is, of course, not limited to this: as long as the time required to output the response text #1 by voice is longer than that of the response text #2, the response texts #1 and #2 may have contents different from those of the above embodiment.

The main setting contents may be, for example, the default settings of the setting items displayed on the topmost setting screen for configuring the job on the operation panel 600 of the multi-function peripheral 112. For a copy job, for example, these setting items are "density", "background adjustment", "paper", "magnification", "document > output", and "page aggregation".

The setting contents included in the response text #1 may be limited to the setting contents instructed by the user by voice, or limited to the default settings. The default settings may include setting contents related to setting items displayed on screens other than the topmost setting screen, or, conversely, may not include all of the setting items displayed on the topmost setting screen.

Similarly, the response text #2 may be only the default setting, or may be a mixture of the default setting and the setting contents of the voice instruction by the user.

Regardless of the contents of the response texts #1 and #2, the effect of the present invention is obtained as long as the voice output time of the response text #1 is longer than that of the response text #2.
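A minimal sketch of one way to build the two response texts under modification (7-8) is shown below (Python). The setting items, the default values, and the use of text length as a proxy for voice output time are assumptions for illustration.

```python
# Sketch of modification (7-8): response text #1 lists all main setting
# items; response text #2 lists only the items changed from their defaults.

DEFAULTS = {"density": "auto", "paper": "auto", "magnification": "100%",
            "page aggregation": "off"}


def build_response_texts(settings):
    """Return (response_text_1, response_text_2) for the given job settings."""
    # Response text #1: every main setting item, default or not.
    text1 = ", ".join(f"{k}: {v}" for k, v in settings.items())
    # Response text #2: only items changed from their defaults.
    changed = {k: v for k, v in settings.items() if DEFAULTS.get(k) != v}
    text2 = ", ".join(f"{k}: {v}" for k, v in changed.items()) or "default settings"
    return text1, text2


if __name__ == "__main__":
    job = dict(DEFAULTS)
    job["paper"] = "A4"
    job["page aggregation"] = "2-in-1"
    t1, t2 = build_response_texts(job)
    # Condition for the effect of the invention: speaking #1 takes longer
    # than speaking #2 (approximated here by text length).
    assert len(t1) > len(t2)
    print(t1)
    print(t2)
```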

(7-9) In the above embodiment, the case where the response text #1 is output by voice while another user performs a panel operation was described as an example, but the present invention is, of course, not limited to this; additional text may be output by voice in addition to the response text #1. For example, if the panel operation by the other user has not ended by the time the voice output of the response text #1 is finished, additional text such as "Another user is currently operating the image forming apparatus. In the meantime, allow us to introduce our company's new services..." may be output by voice.

(7-10) In the above embodiment, the case where a voice instruction is given to the color multi-function peripheral 112 was described as an example, but the present invention is, of course, not limited to this; a voice instruction may instead be given to a monochrome multi-function peripheral. The same effects can also be obtained when the present invention is applied to a single-function machine such as a printer apparatus, a copying apparatus provided with a scanner, or a facsimile apparatus provided with a facsimile communication function.

(7-11) In the above embodiment, the MFP control server 101 stores default setting values for each setting item of a job (copy job, scan job, and the like) in advance, changes the stored setting values to the values instructed by voice, and transmits the setting contents to the multi-function peripheral 112 in a lump after receiving an instruction to execute the job. Alternatively, the MFP control server 101 may transmit each voice-instructed setting to the multi-function peripheral 112 as it is received, so that it is reflected in the job settings of the multi-function peripheral 112 one at a time.
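The difference between the batch transmission of the embodiment and the incremental transmission of modification (7-11) can be sketched as follows (Python). The class, the `send_to_mfp` function, and the default values are hypothetical stand-ins for the server-to-MFP communication of the embodiment.

```python
# Sketch of modification (7-11): forward each voice-instructed setting to
# the multi-function peripheral 112 immediately, instead of sending all
# settings in a lump when the execute instruction arrives.

def send_to_mfp(payload: dict) -> None:
    print("-> MFP 112:", payload)


class JobSettings:
    def __init__(self, incremental: bool):
        self.incremental = incremental
        self.values = {"paper": "auto", "magnification": "100%"}  # defaults

    def apply_voice_setting(self, item: str, value: str) -> None:
        self.values[item] = value
        if self.incremental:
            # Modification (7-11): reflect each setting as soon as received.
            send_to_mfp({item: value})

    def execute(self) -> None:
        if not self.incremental:
            # Embodiment: transmit all settings in a lump on execution.
            send_to_mfp(self.values)
        send_to_mfp({"command": "start_job"})


if __name__ == "__main__":
    job = JobSettings(incremental=True)
    job.apply_voice_setting("paper", "A4")
    job.execute()
```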

(7-12) In the above embodiment, the smart speaker 111, the MFP control server 101, and the voice AI server 102 are devices independent of the multi-function peripheral 112, but all or some of them may be built into the multi-function peripheral 112.

When the multi-function peripheral 112 is provided with a voice input/output interface, devices for receiving manual input, such as the touch panel 602 and the hard keys 603, may be omitted from the operation panel 600, leaving only the liquid crystal display 601. In this case, a microphone and a speaker for voice input and output may be provided.

(7-13) In the above embodiment, the case where the setting contents related to the processing executed by the multi-function peripheral 112 are displayed on the liquid crystal display 601 of the operation panel 600 was described as an example, but the present invention is, of course, not limited to this; the setting contents may be displayed on a display device provided separately from the operation panel 600. For example, a PC (Personal Computer) or a portable terminal device connected to the multi-function peripheral 112 via a communication network may be used as the display device to display the setting contents related to the voice instruction received by the smart speaker 111. In this case as well, the same effect can be obtained if the content to be output by voice is determined in accordance with the operation state of that display device.

(7-14) As described above, the image forming system 1 and the MFP control server 101 are computer systems each provided with a microprocessor and a memory. The memory may store a computer program, and the microprocessor may operate in accordance with the computer program.

Here, the computer program is configured by combining a plurality of command codes indicating instructions for a computer in order to realize a predetermined function.

The computer program may be recorded in a computer-readable recording medium such as a flexible disk, a hard disk, an optical disk, or a semiconductor memory.

The computer program may be transmitted via a wired or wireless electric communication line, a network typified by the internet, data broadcasting, or the like.

(7-15) the above embodiment and the above modification may be combined.

[Industrial applicability]

The control device, image forming system, and program according to the present invention are applicable as techniques for suppressing an increase in the user's waiting time when a voice instruction is given to an image forming apparatus.
