Image forming system, control method thereof, and storage medium


Description: This technology, "Image forming system, control method thereof, and storage medium", was created by 福田真人 on 2021-05-11. The invention provides an image forming system, a control method thereof, and a storage medium. The image forming system can reduce the time and effort required for the user to set a display language. The image forming system includes the following components. An image forming device forms an image on a sheet. A display device displays information. A microphone obtains speech. An obtaining unit obtains a plurality of word information based on audio information on a phrase obtained by the microphone. A specifying unit specifies a language using the plurality of word information. An updating unit updates the display language of the display device based on the language specified by the specifying unit.

1. An image forming system, comprising:

an image forming device configured to form an image on a sheet;

a display device configured to display information;

a microphone configured to obtain speech;

an obtaining unit configured to obtain a plurality of word information based on audio information on a phrase obtained by the microphone;

a specifying unit configured to specify a language using the plurality of word information; and

an updating unit configured to update the display language of the display device based on the language specified by the specifying unit.

2. The image forming system according to claim 1, wherein one apparatus constituting the image forming system has the image forming device, the display device, and the microphone.

3. The image forming system according to claim 1, wherein one apparatus constituting the image forming system has one of the image forming device, the display device, and the microphone, and

wherein an apparatus other than the one apparatus constituting the image forming system has the one or more of the image forming device, the display device, and the microphone that the one apparatus does not have.

4. The image forming system according to claim 3, further comprising an image forming apparatus and a voice obtaining device,

wherein the voice obtaining device has the microphone, and

wherein the image forming apparatus has the image forming device and the display device.

5. The image forming system according to claim 4, wherein the image forming apparatus has the obtaining unit.

6. The image forming system according to claim 1, further comprising a server,

wherein the server has the obtaining unit.

7. The image forming system according to claim 1, wherein, in a case where the plurality of word information includes word information that cannot specify a language, the specifying unit specifies the language using word information other than the word information that cannot specify a language among the plurality of word information.

8. The image forming system according to claim 1, wherein, in a case where the word information obtained based on the audio information on a previous phrase includes only word information that cannot specify a language, the updating unit updates the display language of the display device based on a language specified using word information obtained from audio information on a phrase obtained after the previous phrase.

9. A control method of an image forming system having an image forming device that forms an image on a sheet, a display device that displays information, and a microphone that obtains a voice, the control method comprising:

obtaining a plurality of word information based on audio information on a phrase obtained through the microphone;

specifying a language using the plurality of word information; and

updating a display language of the display device based on the specified language.

10. A non-transitory computer-readable storage medium storing a control program that causes a computer to execute a control method of an image forming system having an image forming device that forms an image on a sheet, a display device that displays information, and a microphone that obtains a voice, the control method comprising:

obtaining a plurality of word information based on audio information on a phrase obtained through the microphone;

specifying a language using the plurality of word information; and

updating a display language of the display device based on the specified language.

Technical Field

The present invention relates to an image forming system allowing voice operation, a control method thereof, and a storage medium storing a control program thereof.

Background

An image forming apparatus, such as a printer, that cooperates with a smart speaker is known (see, for example, Japanese Patent Laid-Open No. 2019-18394 (JP 2019-18394A)). The user can make various settings of the image forming apparatus by inputting voice into the smart speaker. For example, the image forming apparatus may be shared in an office by a plurality of users who use different languages. Accordingly, a comfortable operating environment differs from user to user. A user changes a language used in the image forming apparatus, such as the display language of a display device of the image forming apparatus, by inputting voice to the smart speaker.

However, this takes time and effort because the user is required to change the setting of the display language of the display device each time the user starts using such a conventional image forming apparatus.

Disclosure of Invention

The present invention provides an image forming system capable of reducing time and effort for a user to set a display language, a control method thereof, and a storage medium storing a control program thereof.

Accordingly, a first aspect of the present invention provides an image forming system comprising: an image forming device configured to form an image on a sheet; a display device configured to display information; a microphone configured to obtain speech; an obtaining unit configured to obtain a plurality of word information based on audio information on a phrase obtained by the microphone; a specifying unit configured to specify a language using the plurality of word information; and an updating unit configured to update a display language of the display device based on the language specified by the specifying unit.

Accordingly, a second aspect of the present invention provides a control method of an image forming system having an image forming device, a display device, and a microphone, the control method comprising: obtaining a plurality of word information based on audio information on a phrase obtained through the microphone; specifying a language using the plurality of word information; and updating a display language of the display device based on the specified language.

Accordingly, a third aspect of the present invention provides a non-transitory computer-readable storage medium storing a control program that causes a computer to execute the control method of the second aspect.

According to the present invention, the time and effort of the user to set the display language can be reduced.

Further features of the invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.

Drawings

Fig. 1 is a configuration diagram showing an image forming system according to an embodiment of the present invention.

Fig. 2 is a block diagram schematically showing the hardware configuration of the MFP in fig. 1.

Fig. 3 is a block diagram schematically showing a hardware configuration of the smart speaker in fig. 1.

Fig. 4 is a block diagram schematically showing a hardware configuration of a controller of the cloud server in fig. 1.

Fig. 5 is a block diagram schematically showing the configuration of a device control module as a software module of the MFP in fig. 1.

Fig. 6 is a block diagram schematically showing the configuration of an audio control module as a software module of the smart speaker in fig. 1.

Fig. 7A, 7B, and 7C are diagrams for describing an audio data conversion control module as a software module of the cloud server in fig. 1.

Fig. 8 is a diagram illustrating an example of job information with language setting (hereinafter referred to as "language setting job information") generated by the cloud server in fig. 1.

Fig. 9 is a diagram illustrating an example of language setting job information generated by the cloud server in fig. 1.

Fig. 10 is a sequence diagram showing a procedure of processing executed when the image forming system in fig. 1 receives a job execution instruction by voice input.

Fig. 11 is a flowchart showing a procedure of a voice operation service execution process performed by the cloud server in fig. 1.

Fig. 12 is a flowchart showing the procedure of the language determination processing of step S1102 when the first speech recognition method is used for conversion of text data in step S1101 of fig. 11.

Fig. 13 is a flowchart showing the procedure of the language determination processing of step S1102 when the second speech recognition method is used for conversion of text data in step S1101 of fig. 11.

Fig. 14 is a flowchart showing the procedure of the operation determination processing of step S1103 of fig. 11.

Fig. 15 is a flowchart showing the procedure of the job execution processing of step S1105 of fig. 11.

Fig. 16 is a flowchart showing the procedure of the job information generation processing of step S1502 in fig. 15.

Fig. 17 is a diagram schematically illustrating a flow of generation of language setting job information of Japanese in the embodiment.

Fig. 18 is a diagram schematically illustrating a flow of generation of language setting job information of English in the embodiment.

Fig. 19 is a flowchart showing language setting switching processing executed by the MFP which receives language setting job information from the cloud server.

Fig. 20 is a diagram illustrating screen transition of an operation panel of the MFP when execution of a copy job is instructed by voice input.

Fig. 21 is a diagram showing screen transition of an operation panel of the MFP when execution of a job of EMAIL SEND (e-mail transmission) is instructed by voice input.

Fig. 22 is a sequence diagram showing a procedure of processing executed when the image forming system of fig. 1 receives a job setting change instruction by voice input.

Fig. 23 is a flowchart showing the procedure of the job setting processing of step S1107 in fig. 11.

Fig. 24 is a flowchart illustrating the procedure of the job setting information generation processing of step S2301 of fig. 23.

Fig. 25 is a diagram schematically illustrating a flow of generation of job setting information with Japanese language setting in the embodiment.

Fig. 26 is a diagram schematically illustrating a flow of generation of job setting information with English language setting in the embodiment.

Fig. 27 is a flowchart showing language setting switching processing executed by the MFP which receives the language setting job information from the cloud server.

Fig. 28 is a diagram showing a screen transition of an operation panel of the MFP when setting of a copy job is input by the voice of the user.

Fig. 29 is a diagram showing screen transition of an operation panel of the MFP when setting of an EMAIL SEND job is input by the voice of the user.

Fig. 30 is a diagram showing an example of a language determination result in the embodiment.

Detailed Description

Embodiments according to the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the following embodiments do not limit the present invention according to claims, and all combinations of characteristic features described in the embodiments are not always indispensable to the aspect of the present invention.

Fig. 1 is a configuration diagram showing an image forming system 100 according to an embodiment of the present invention. As shown in fig. 1, the image forming system 100 is provided with an MFP (multifunction peripheral) 101 as an image forming apparatus, a smart speaker 102 as a voice obtaining device, and a cloud server 103. The MFP101 and the smart speaker 102 are connected to a network 104, and the cloud server 103 is connected to the network 104 through a gateway 105. Thereby, the MFP101, the smart speaker 102, and the cloud server 103 can communicate through the network 104.

The image forming system 100 can control the MFP101 to execute processing corresponding to a user voice operation obtained by the smart speaker 102. For example, when a user gives a copy job execution instruction such as "copy this", the smart speaker 102 transmits audio data (audio information) corresponding to the copy job execution instruction to the cloud server 103 through the network 104. Upon receiving the audio data, the cloud server 103 generates device operation data corresponding to the audio data and transmits the device operation data to the MFP101 through the network 104. The MFP101 executes the copy job as processing corresponding to the received device operation data, and transmits a response indicating that the copy job has been executed to the cloud server 103 via the network 104. When receiving the response, the cloud server 103 generates response message data and transmits the response message data to the smart speaker 102 through the network 104. The smart speaker 102 outputs an audio message of "copying now" corresponding to the received response message data.

The MFP101 is a multi-function device equipped with a plurality of functions such as a printing function and a scanning function. The MFP101 is provided with own device data 106 of the MFP and other device data 107 of the MFP. The own device data 106 of the MFP includes the IP address and MAC address of the MFP101 used in data communication through the network 104. The other device data 107 of the MFP includes, for example, account information used when the MFP101 uses the service of the cloud server 103 and URL information about a response notification notifying the cloud server 103 of the execution result of the processing corresponding to the device operation data received from the cloud server 103.

The smart speaker 102 is a voice obtaining device equipped with a voice assistant function, and holds own device data 108 of the smart speaker and other device data 109 of the smart speaker. The own device data 108 of the smart speaker includes the IP address and the MAC address of the smart speaker 102 used in data communication through the network 104. The other device data 109 of the smart speaker includes account information used when the smart speaker 102 uses the service of the cloud server 103, and a service URL of the cloud server 103 corresponding to a wakeup word described later.

The cloud server 103 is provided with own device data 110 of the cloud server and other device data 111 of the cloud server. The cloud server's own device data 110 includes service URL information used when the MFP101 or the smart speaker 102 uses the service of the cloud server through the network 104 and the above-described URL information on the response notification. The other device data 111 of the cloud server includes account information issued to the MFP101 and the smart speakers 102, and IP addresses and MAC addresses of the MFP101 and the smart speakers 102. The cloud server 103 communicates with the MFP101 and the smart speakers 102 through the network 104 by using an IP address and a MAC address included in the other device data 111 of the cloud server.

Various data, such as audio data generated by the smart speaker 102 and device operation data generated by the cloud server 103, are transmitted and received through the network 104. The gateway 105 is, for example, a wireless LAN router based on the IEEE802.11 standards, such as IEEE802.11a and IEEE802.11b. It should be noted that the gateway 105 may be a device based on a wireless communication standard other than the IEEE802.11 standards. In addition, the gateway 105 may be a wired LAN router based on the Ethernet standards (such as 10BASE-T, 100BASE-T, and 1000BASE-T).

Fig. 2 is a block diagram schematically showing the hardware configuration of the MFP101 in fig. 1. As shown in fig. 2, the MFP101 is provided with a controller 200, an operation panel 209, a print engine (image forming device) 211, and a scanner 213. The controller 200 is connected to the operation panel 209, the print engine 211, and the scanner 213. Further, the controller 200 is provided with a CPU (central processing unit) 202, a RAM203, a ROM204, a storage unit 205, a network I/F206, a display controller 207, an operation I/F208, a print controller 210, and a scan controller 212. The CPU 202, RAM203, ROM204, storage unit 205, network I/F206, display controller 207, operation I/F208, print controller 210, and scan controller 212 are connected to each other via a system bus 201.

The CPU 202 controls the operation of the entire MFP101. The CPU 202 reads a control program stored in the ROM204 or the storage unit 205, and performs various control processes such as a read control process and a print control process. The RAM203 is a main memory of the CPU 202. The RAM203 is used as a work area for the CPU 202, and as a temporary storage area into which a control program stored in the ROM204 or the storage unit 205 is expanded. The ROM204 stores a control program run by the CPU 202. The storage unit 205 stores print data, image data, programs, setting information, and the like. Although the MFP101 of the present embodiment is configured such that the single CPU 202 executes the processing described later using the single memory (RAM203), the configuration of the MFP101 is not limited to this configuration. For example, the MFP101 may be configured such that a plurality of CPUs, RAMs, ROMs, and storage units cooperatively execute the processing described later. Further, the MFP101 may perform some processing using a hardware circuit such as an ASIC (application specific integrated circuit) or an FPGA (field programmable gate array).

The network I/F206 is used when the MFP101 communicates with other devices via the network 104. For example, the MFP101 analyzes print data received through the network I/F206 with a PDL analysis module (not shown). The PDL analysis module is a software module for analyzing print data, and generates image data to be printed by the print engine 211 based on print data expressed in various page description languages. A program for starting the PDL analysis module is stored in the storage unit 205 or the ROM204.

The display controller 207 and the operation I/F208 are connected to the operation panel 209. The display controller 207 controls display of a screen on the operation panel 209. When the user operates the operation panel 209, the MFP101 obtains an event corresponding to the user operation through the operation I/F208.

The print controller 210 is connected to the print engine 211. The print controller 210 transmits the image data generated by the PDL analysis module described above to the print engine 211. The print engine 211 forms an image on a sheet based on the received image data. An electrophotographic system, an inkjet system, or the like is used as the printing system of the print engine 211. When an electrophotographic system is used, an image is formed on a sheet by developing an electrostatic latent image formed on a photosensitive member, transferring the developed toner image onto the sheet, and fixing the transferred toner image. When an inkjet system is used, an image is formed on a sheet by discharging ink.

The scan controller 212 is connected to the scanner 213. The scanner 213 reads an image on a sheet and generates image data. The image data generated by the scanner 213 is stored in the storage unit 205. Further, an image is formed on the sheet using the image data generated by the scanner 213. The scanner 213 has a document feeder (not shown), and can read documents by conveying documents stacked on the document feeder one by one.

Fig. 3 is a block diagram schematically showing a hardware configuration of the smart speaker in fig. 1. As shown in fig. 3, the smart speaker 102 is equipped with a controller 300, a microphone 308, a loudspeaker 310, and an LED312. The controller 300 is connected to the microphone 308, the loudspeaker 310, and the LED312. Further, the controller 300 is provided with a CPU302, a RAM303, a ROM 304, a storage unit 305, a network I/F306, a microphone I/F307, an audio controller 309, and a display controller 311. The CPU302, RAM303, ROM 304, storage unit 305, network I/F306, microphone I/F307, audio controller 309, and display controller 311 are connected to each other via the system bus 301.

The CPU302 is a central processing unit that controls the operation of the entire controller 300. The RAM303 is a volatile memory. The ROM 304 is a nonvolatile memory, and stores a start-up program of the CPU 302. The storage unit 305 is a storage device having a larger memory capacity than the RAM303, and may be an SD card. It should be noted that the storage unit 305 may be a flash ROM instead of the SD card, or may be another storage device having a function equivalent to that of the SD card. For example, the storage unit 305 stores a control program of the smart speaker 102 executed by the controller 300.

When the smart speaker 102 is started in response to a power-on operation by the user, the CPU302 runs a start-up program stored in the ROM 304. The startup program reads the control program stored in the storage unit 305 and develops the relevant control program onto the RAM 303. The CPU302 runs a control program developed onto the RAM303 and performs various control processes. Further, the CPU302 stores data used when running the control program into the RAM303 or the storage unit 305. The CPU302 communicates with other devices on the network 104 through a network I/F306.

The network I/F306 includes circuitry and antennas capable of communicating in accordance with a wireless communication system based on the IEEE802.11 standard. It should be noted that the network I/F306 may employ a wired communication system based on the Ethernet standard instead of the wireless communication system. The microphone I/F307 is connected to the microphone 308. The microphone I/F307 converts the user speech received through the microphone 308 into encoded audio data according to an instruction from the CPU302, and stores the converted audio data into the RAM303.

The microphone 308 is, for example, a compact MEMS microphone mounted in a smartphone or the like. It should be noted that the microphone 308 is not limited to a MEMS microphone and may be other devices capable of obtaining user speech. In this embodiment, three or more microphones 308 are preferably arranged at predetermined positions so as to specify the arrival direction of the user's voice.

The audio controller 309 is connected to the loudspeaker 310. The audio controller 309 converts audio data into an analog voice signal according to an instruction from the CPU302, and outputs sound through the loudspeaker 310.

The loudspeaker 310 reproduces an audio response indicating that the smart speaker 102 is responding, and also reproduces sound synthesized by the cloud server 103. The loudspeaker 310 is a general-purpose device for reproducing sound.

The display controller 311 is connected to the LED 312. The display controller 311 controls light emission of the LED312 according to an instruction from the CPU 302. In an embodiment, display controller 311 controls the illumination of LED312 to indicate that smart speaker 102 is obtaining the user's voice. The LED312 is, for example, a blue LED visible to the user. The LED312 is a general purpose device. In an embodiment, smart speaker 102 may be equipped with a display device capable of displaying text and pictures indicating that smart speaker 102 is obtaining the user's voice, instead of the illumination of LED 312.

Fig. 4 is a block diagram schematically showing a hardware configuration of the controller 400 of the cloud server 103 in fig. 1. As shown in fig. 4, the controller 400 is provided with a CPU402, a RAM403, a ROM 404, a storage unit 405, and a network I/F406. The CPU402, RAM403, ROM 404, storage unit 405, and network I/F406 are connected to each other via a system bus 401.

The CPU402 is a central processing unit that controls the operation of the entire controller 400. The RAM403 is a volatile memory. The ROM 404 is a nonvolatile memory, and stores a startup program and the like of the CPU402. The storage unit 405 is a storage device having a larger memory capacity than the RAM403, and may be a hard disk drive (HDD). It should be noted that the storage unit 405 may be a solid state drive (SSD), or may be another storage device having a function equivalent to that of the HDD. The storage unit 405 stores, for example, a control program of the cloud server 103 executed by the controller 400.

When the cloud server 103 is started, the CPU402 runs a startup program stored in the ROM 404. The startup program reads the control program stored in the storage unit 405 and develops the relevant control program onto the RAM403. The CPU402 runs the control program developed onto the RAM403 and performs various control processes. Further, the CPU402 stores data used when running the control program into the RAM403 or the storage unit 405. The CPU402 communicates with other devices on the network 104 through the network I/F406.

Fig. 5 is a block diagram schematically showing the configuration of a device control module 500 as a software module of the MFP101 in fig. 1. As shown in fig. 5, the device control module 500 includes a data transmission/reception module 501, a data analysis module 502, a job control module 503, a data management module 504, a display module 505, an operation object determination module 506, a scanning module 507, and a printing module 508. Since the CPU 202 runs the control program developed from the ROM204 to the RAM203, the processing performed by these modules is realized.

The data transmission/reception module 501 controls data transmission and reception between the MFP101 and other devices on the network 104 through the network I/F206 according to TCP/IP. For example, the data transmission/reception module 501 controls reception of device operation data generated by the cloud server 103. Further, the data transmission/reception module 501 controls transmission of various notifications from the MFP101 to the cloud server 103. The various notifications include, for example, a notification indicating a job execution result and a notification indicating a job execution state.

The data analysis module 502 converts the device operation data received by the data transmission/reception module 501 into commands that can be interpreted by the modules of the device control module 500, and transmits the corresponding commands to the job control module 503, the data management module 504, and the display module 505.

The job control module 503 gives instructions to the print controller 210 and the scan controller 212 to control the print engine 211 and the scanner 213, respectively. The data management module 504 stores data on processing performed by the device control module 500 to a predetermined area of the RAM203 and the storage unit 205, and manages the data. Data on processing by the device control module 500 includes, for example, job data which is a combination of setting items and setting values of a job executed by the job control module 503, and language setting data showing the language of text displayed on the operation panel 209. Further, the data management module 504 stores authentication information necessary for communication with the gateway 105, device information necessary for communication with the cloud server 103, and the like in the RAM203 or the storage unit 205, and manages the information. Further, the data management module 504 stores screen control information used by the display module 505 for display control of a screen, and operation object determination information used by the operation object determination module 506 for determining an operation object. The screen control information and the operation object determination information are managed for each screen displayed by the display module 505.

The display module 505 gives instructions on display control of the operation panel 209 to the display controller 207. For example, when receiving an instruction from the display module 505, the display controller 207 displays UI members (buttons, pull-down lists, check boxes, and the like) operable by the user on the operation panel 209. The screen is updated based on the screen control information. For example, the display module 505 obtains a language dictionary corresponding to the language setting data managed by the data management module 504 from the storage unit 205, and displays text data generated based on the language dictionary on the operation panel 209.
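For illustration only, the following minimal Python sketch shows how a display language setting might select a text string from per-language dictionaries; the dictionary names, keys, and strings are assumptions and not part of the MFP101 implementation described here.

```python
# Illustrative per-language UI dictionaries; contents and keys are assumed.
LANGUAGE_DICTIONARIES = {
    "Japanese": {"copy_button": "コピー", "start_message": "コピーを開始します"},
    "English":  {"copy_button": "Copy",   "start_message": "Copy will be started"},
}

def ui_text(language_setting: str, key: str) -> str:
    """Return the display text for the given key in the configured display language."""
    dictionary = LANGUAGE_DICTIONARIES[language_setting]
    return dictionary[key]

print(ui_text("English", "copy_button"))  # -> "Copy"
```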

The operation object determination module 506 obtains coordinates showing a position where the user touches the operation panel 209 through the operation I/F208, and determines a UI member displayed on the operation panel 209 at the position where the user touches as an operation object. The operation object determination module 506 reads screen control information corresponding to the UI member determined as the operation object, and determines the content of processing based on the relevant screen control information. The operation object determination module 506 instructs the modules of the device control module 500 to perform the determined processing. For example, the operation object determination module 506 instructs the display module 505 to update the display contents of the screen, and instructs the job control module 503 to start a job using the job parameters set by the user operation.

The scan module 507 controls the scanner 213 to perform scanning by the scan controller 212 based on the scan setting received from the job control module 503, and controls the data management module 504 to store the read image data. The print module 508 controls the print engine 211 to print via the print controller 210 based on the print setting received from the job control module 503.

Fig. 6 is a block diagram schematically showing the configuration of an audio control module 600 as a software module of the smart speaker 102 in fig. 1. As shown in fig. 6, the audio control module 600 includes a data transmission/reception module 601, a data management module 602, a control module 603, a voice obtaining module 604, an audio reproduction module 605, a display module 606, a voice operation start detection module 607, and an utterance end determination module 608. Since the CPU302 runs the control program developed from the storage unit 305 to the RAM303, the processing performed by these modules is realized.

The data transmission/reception module 601 controls data transmission and reception between the smart speaker 102 and other devices on the network 104 through the network I/F306 according to TCP/IP. For example, the data transmission/reception module 601 controls transmission of audio data of the user voice obtained by the voice obtaining module 604 to the cloud server 103. Further, the data transmission/reception module 601 controls reception of synthetic audio data (described later) from the cloud server 103.

The data management module 602 stores data related to the processing of the audio control module 600 to a predetermined area of the storage unit 305. The data on the processing of the audio control module 600 includes, for example, volume setting data of sound reproduced by the audio reproduction module 605, authentication information necessary for communication with the gateway 105, and device information necessary for communication with the MFP101 and the cloud server 103.

The voice obtaining module 604 generates audio data by converting the analog voice of the user picked up by the microphone 308 into a digital signal of a predetermined format such as MP3 and by encoding the digital signal, and temporarily stores the relevant audio data into the RAM 303. The control module 603 manages the start and end timings of the processing of the voice obtaining module 604. It should be noted that the format of the audio data may be a general stream format. The encoded audio data may then be transmitted to the data transmission/reception module 601.

The audio reproduction module 605 controls the audio controller 309 to reproduce the synthesized audio data (audio message) received by the data transmission/reception module 601 using the microphone 310. The control module 603 manages execution timing of audio reproduction processing by the audio reproduction module 605.

The display module 606 controls the lighting of the LEDs 312 through the display controller 311. For example, when the voice operation start detection module 607 detects a voice operation, the display module 606 lights the LED312 through the display controller 311. The control module 603 manages execution timing of processing of the display module 606.

When a wakeup word spoken by the user or a pressing operation of an operation start key (not shown) of the smart speaker 102 is detected, the voice operation start detection module 607 transmits an operation start notification showing that the wakeup word or the pressing operation has been detected to the control module 603. The wakeup word is a voice word for activating the voice assistant function of the smart speaker 102, and is registered in advance. The voice operation start detection module 607 detects the wakeup word from the analog voice of the user picked up by the microphone 308. The user can operate the MFP101 by speaking a phrase corresponding to an instruction after speaking the wakeup word.

The utterance end determination module 608 determines an end timing of the processing of the voice obtaining module 604. For example, the utterance end determination module 608 determines that the utterance of the user has ended when the speech of the user pauses for a predetermined period (e.g., three seconds). Then, the utterance end determination module 608 transmits an utterance end notification showing the determination result to the control module 603. It should be noted that the end of the utterance of the user may be determined based on the speaking of a predetermined word registered in advance instead of the no-utterance period (referred to as a "pause period"). For example, when the user speaks a predetermined word registered in advance (such as "yes", "no", "OK", "cancel", "end", and "start"), the utterance end determination module 608 may determine that the utterance of the user has ended without waiting for the predetermined period. Further, not the smart speaker 102 but the cloud server 103 may determine the end of the utterance, and the cloud server 103 may determine the end of the utterance of the user based on the meaning and context of the content of the utterance of the user.
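As a rough Python sketch of this end-of-utterance rule (the three-second pause value and the terminal-word list are taken from the examples above; the function itself is illustrative and not the module's actual code):

```python
import time

TERMINAL_WORDS = {"yes", "no", "ok", "cancel", "end", "start"}  # example registered words
PAUSE_PERIOD_SEC = 3.0                                          # example pause period

def utterance_ended(last_voice_time: float, latest_word: str | None = None) -> bool:
    """Judge the end of an utterance by a registered word or by a pause period."""
    if latest_word is not None and latest_word.lower() in TERMINAL_WORDS:
        return True  # a registered word ends the utterance without waiting
    return time.time() - last_voice_time >= PAUSE_PERIOD_SEC

# Example: no terminal word was spoken and the last voice was heard 4 seconds ago.
print(utterance_ended(time.time() - 4.0))  # -> True
```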

The control module 603 controls the other modules in the audio control module 600 to operate in cooperation with each other. Specifically, the control module 603 controls the start and end of the processing of the voice obtaining module 604, the audio reproducing module 605, and the display module 606. Further, after the voice obtaining module 604 obtains the audio data, the control module 603 controls the data transmission/reception module 601 to transmit the audio data to the cloud server 103. Further, after receiving the synthetic audio data from the cloud server 103, the control module 603 controls the audio reproduction module 605 to reproduce the synthetic audio data.

The start and end timings of the processing of the voice obtaining module 604, the audio reproducing module 605, and the display module 606 will be described.

When receiving the operation start notification from the voice operation start detection module 607, the control module 603 starts the processing of the voice obtaining module 604. Further, when receiving an utterance end notification from the utterance end determination module 608, the control module 603 ends the processing of the voice obtaining module 604. For example, when the user speaks a wakeup word, the voice operation start detection module 607 detects the wakeup word and transmits an operation start notification to the control module 603. When receiving the operation start notification, the control module 603 controls the voice obtaining module 604 to start processing. The voice obtaining module 604 obtains the user's voice after the wakeup word (for example, "I want to copy"), converts the voice into audio data, and temporarily stores the audio data. When a pause period of a predetermined length has continued after the voice of "I want to copy", the utterance end determination module 608 transmits an utterance end notification to the control module 603. When receiving the utterance end notification, the control module 603 controls the voice obtaining module 604 to end the processing. Hereinafter, the state between the start and the end of the processing of the voice obtaining module 604 will be referred to as the "speech processing state". The display module 606 lights the LED312 as a notification indicating that the smart speaker 102 is in the speech processing state.

After determining that the utterance of the user has ended, the control module 603 instructs the data transmission/reception module 601 to transmit the audio data temporarily stored by the voice obtaining module 604 to the cloud server 103 and waits for a response from the cloud server 103. The response from the cloud server 103 includes, for example, a header portion indicating a response and a response message composed of synthetic audio data. When the data transmission/reception module 601 receives the above-described response, the control module 603 controls the audio reproduction module 605 to reproduce the synthetic audio data. The synthetic audio data is an audio message such as "The copy screen will be displayed". It should be noted that the state between the determination that the utterance of the user has ended and the end of reproduction of the synthetic audio data will be referred to as the "response processing state". The display module 606 blinks the LED312 as a notification indicating that the smart speaker 102 is in the response processing state.

After the reproduction of the synthetic audio data finishes, the user can give an instruction by speaking a phrase corresponding to the instruction, without speaking the wakeup word, while the interactive session with the cloud server 103 continues. The end of the interactive session is determined when the cloud server 103 sends an interactive session end notification to the smart speaker 102. The state between the end of one interactive session and the beginning of another interactive session will be referred to as the "standby state". That is, the smart speaker 102 is in the standby state until the control module 603 receives an operation start notification from the voice operation start detection module 607. During the standby state, the display module 606 turns off the LED312 as a notification indicating that the smart speaker 102 is in the standby state.

Figs. 7A, 7B, and 7C are diagrams for describing the audio data conversion control module 700 as a software module of the cloud server 103 in fig. 1. Fig. 7A is a block diagram schematically showing the configuration of the audio data conversion control module 700. Fig. 7B shows an example of a Japanese group ID list used by the group ID determination module 707, described later, for determining a group ID. Fig. 7C shows an example of an English group ID list used by the group ID determination module 707 for determining a group ID. In a group ID list, words having the same meaning or intention related to the user's operation of the MFP101 are grouped under the same ID. It should be noted that the words listed here are results of speech recognition of words spoken by the user to the smart speaker 102. In addition, the group ID list shows a language determination exception flag, described later, that is set for each registered word. In the group ID list, YES is set as the language determination exception flag for a word, such as the katakana word "kopi", for which it cannot be specified whether the word is an English word or a Japanese word. A word whose language determination exception flag is set to YES is not used for the language determination described later. Meanwhile, in the group ID list, NO is set as the language determination exception flag for words other than such katakana words, and a word whose flag is set to NO is used for the language determination described later.
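Since Figs. 7B and 7C themselves are not reproduced in this text, the following Python sketch only illustrates the assumed shape of such a group ID list; the exact entries, IDs, and flags are taken from the examples in this description, not from the figures.

```python
# Assumed shape of a group ID list: each recognized word maps to one or more
# group IDs plus a language determination exception flag ("YES" = excluded
# from language determination).
GROUP_ID_LIST_JA = {
    "yon":  {"ids": ["NUM00004"], "lang_exception": "NO"},
    "bu":   {"ids": ["CNF00001"], "lang_exception": "NO"},
    "kopi": {"ids": ["FNC00001"], "lang_exception": "YES"},  # katakana word, excluded
}

GROUP_ID_LIST_EN = {
    "four":   {"ids": ["NUM00004"], "lang_exception": "NO"},
    "copies": {"ids": ["CNF00001", "FNC00001"], "lang_exception": "NO"},  # one word, two IDs
}
```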

In fig. 7A, the audio data conversion control module 700 includes a data transmission/reception module 701, a data management module 702, a device operation data generation module 703, and an audio data conversion module 710. The audio data conversion module 710 includes a speech recognition module 705, a morphological analysis module 706, a group ID determination module 707, and an audio synthesis module 708. Since the CPU402 runs the control program developed from the storage unit 405 to the RAM403, the processing performed by the above-described modules is realized.

The data transmission/reception module 701 controls data transmission and reception between the cloud server 103 and other devices on the network 104 through the network I/F406 according to TCP/IP. For example, the data transmission/reception module 701 receives audio data of the user from the smart speaker 102. Further, the data transmission/reception module 701 transmits the group ID determined by the group ID determination module 707 and the determination result of the text data by the voice recognition processing performed by the voice recognition module 705 to the MFP 101.

The data management module 702 stores data related to the processing of the audio data conversion control module 700 to a predetermined area of the storage unit 405. The data related to the processing of the audio data conversion control module 700 includes, for example: an acoustic model and a language model for converting audio data received by the data transmission/reception module 701 into text data, a dictionary used when the morphological analysis module 706 performs morphological analysis on text, a group ID list used when the group ID determination module 707 determines a group ID, an audio database used when the audio synthesis module 708 performs audio synthesis processing, and device information necessary for communication with the smart speaker 102 or the MFP 101.

The voice recognition module 705 performs a voice recognition process to convert the audio data of the user received by the data transmission/reception module 701 into text. The speech recognition process converts the user's audio data into phonemes using an acoustic model and also converts the phonemes into actual text data using a language model. It should be noted that the audio data of the user may include words in several different languages. In an embodiment, the speech recognition process may employ a first speech recognition method that determines a language of input audio data and converts the audio data into text data of the determined language. Further, the speech recognition process may employ a second speech recognition method that converts input audio data into phonemes using acoustic models of a plurality of languages and converts the audio data into text data of each language using a corresponding language model. Since the second speech recognition method converts audio data into text data in a plurality of language forms, the speech recognition module 705 generates speech recognition data composed of text and language settings as a result of execution of the speech recognition processing.

In the embodiment, the languages of the input speech are Japanese and English. Speech recognition data for Japanese is data composed of a language setting "Japanese" and text composed of one or more kana characters. Speech recognition data for English is data composed of a language setting "English" and text composed of one or more letters. It should be noted that the voice recognition processing of converting audio data into voice recognition data is not limited to the method described above in the embodiment, and other methods may be used.

The morphological analysis module 706 morphologically analyzes the voice recognition data converted by the voice recognition module 705 based on the language setting. The morphological analysis module 706 derives a morpheme string from a dictionary having information about the grammar and part-of-speech of the language and determines the part-of-speech of each morpheme (word information) constituting the relevant morpheme string. The morphological analysis module 706 may be implemented by using well-known morphological analysis software such as JUMAN, Web-Chamame, or MeCab.
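As one concrete possibility (not prescribed by this description), the mecab-python3 binding of MeCab, one of the analyzers named above, can be used to obtain such a morpheme string; the phrase below corresponds to the Japanese example "yonbukopishite", and a default dictionary such as ipadic is assumed to be installed.

```python
import MeCab  # mecab-python3 binding; assumes a default dictionary is configured

tagger = MeCab.Tagger()
# parse() returns one line per morpheme: "surface<TAB>part-of-speech,features,..."
print(tagger.parse("4部コピーして"))
```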

An operation example of the morphological analysis module 706 will be described. For example, the morphological analysis module 706 analyzes the speech recognition data {text "yonbukopishite" (Japanese for "make four copies"), language setting "Japanese"} converted by the speech recognition module 705 into a morpheme string of "yon", "bu", "kopi", "wo", and "shite". In addition, the morphological analysis module 706 analyzes the speech recognition data {text "four copies", language setting "English"} into a morpheme string of "four" and "copies".

The group ID determination module 707 specifies group IDs by matching the result of the morphological analysis by the morphological analysis module 706 with the group ID list corresponding to the language setting of the voice recognition data, that is, the Japanese group ID list in fig. 7B or the English group ID list in fig. 7C. Then, the group ID determination module 707 generates a group ID determination result indicating the specified group IDs. For example, the group ID determination module 707 matches the morpheme string of "yon", "bu", "kopi", "wo", and "shite" with the Japanese group ID list in fig. 7B, specifies "NUM00004", "CNF00001", and "FNC00001" as the group IDs of "yon", "bu", and "kopi", and generates {ID: NUM00004, ID: CNF00001, ID: FNC00001} as the group ID determination result. Further, the group ID determination module 707 matches the morpheme string of "four" and "copies" with the English group ID list in fig. 7C, specifies "NUM00004", "CNF00001", and "FNC00001" as the group IDs of "four" and "copies", and generates {ID: NUM00004, ID: CNF00001, ID: FNC00001} as the group ID determination result.

When the group ID determination result includes a plurality of group IDs, the group IDs are set in the order of the results of the voice recognition and the morphological analysis. For example, when the results of the voice recognition and the morphological analysis are "yon", "bu", "kopi", "wo", and "shite", the result is expressed as {ID: NUM00004, ID: CNF00001, ID: FNC00001} as the group ID determination result. Further, when there are different group IDs corresponding to the same morpheme, the group ID determination result may include all of the different group IDs. For example, in the English group ID list of fig. 7C, "CNF00001" and "FNC00001" are associated with the same morpheme "copies". When the results of the voice recognition and the morphological analysis are "four" and "copies", the group ID determination result is generated as {ID: NUM00004, ID: CNF00001, ID: FNC00001}.
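A minimal Python sketch of this matching step, reusing the illustrative group ID lists sketched after the description of Figs. 7A to 7C (the function and variable names are assumptions, not the module's actual implementation):

```python
def determine_group_ids(morphemes, group_id_list):
    """Match morphemes against a group ID list, keeping the spoken order."""
    result = []
    for morpheme in morphemes:
        entry = group_id_list.get(morpheme)
        if entry is None:
            continue                 # morphemes without an entry contribute no group ID
        result.extend(entry["ids"])  # a single morpheme may map to several group IDs
    return result

# Reusing GROUP_ID_LIST_EN from the earlier sketch, ["four", "copies"] yields
# ["NUM00004", "CNF00001", "FNC00001"], matching the example in the text.
```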

The audio synthesis module 708 performs audio synthesis processing based on the notification received from the MFP 101. In the audio synthesis process, the previously registered text corresponding to the received notification is converted into audio data in a predetermined format such as MP 3. In the audio synthesis process, for example, audio data is generated based on an audio database stored in the data management module 702. The audio database is, for example, a database that collects sounds of regular contents such as words. Although the audio synthesis process is performed using the audio database in the embodiment, the method of the audio synthesis process is not limited to this method. Other methods may be used.

The device operation data generation module 703 determines the operation of the MFP101 based on the group ID determination result generated by the group ID determination module 707 and the language setting of the voice recognition data generated by the voice recognition module 705. The device operation data generation module 703 generates a file in a predetermined data format corresponding to the determined operation.

For example, when the language setting of the voice recognition data is "Japanese" and the group ID determination result is {ID: NUM00004, ID: CNF00001, ID: FNC00001}, the device operation data generation module 703 determines to set Japanese as the language setting of the MFP101 based on "Japanese", and generates a character string {"language": "Japanese"}. The device operation data generation module 703 determines, based on "FNC00001", to instruct the MFP101 to execute a copy job, and generates character strings {"operation": "jobStart"} and {"jobName": "copy"} for executing the copy job. The device operation data generation module 703 generates a character string {"copies": "4"} for designating "4" as the number of copies of the copy job based on "NUM00004" and "CNF00001". The device operation data generation module 703 generates data in the JSON format shown in fig. 8 by combining these character strings.

Further, when the language setting of the voice recognition data is "English" and the group ID determination result is {ID: FNC00001, ID: NUM00004, ID: CNF00002, ID: FNC00003}, the device operation data generation module 703 determines to set English as the language setting of the MFP101 based on "English", and generates a character string {"language": "English"}. The device operation data generation module 703 determines to execute job setting of the MFP101 based on "FNC00001" and "FNC00003", and generates a character string {"operation": "jobSetting"} for executing the job setting. The device operation data generation module 703 generates a character string {"density": "4"} based on "NUM00004" and "CNF00002". The device operation data generation module 703 generates data in the JSON format shown in fig. 9 by combining these character strings.
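Figs. 8 and 9 are not reproduced in this text; the following Python snippet is only an assumed reconstruction of what the combined JSON-format device operation data for the Japanese copy job example above might look like, built from the character strings described for the device operation data generation module 703.

```python
import json

# Assumed combination of the character strings generated above for the
# "yonbukopishite" (four copies) example; Fig. 8 itself is not shown here.
language_setting_job_information = {
    "language": "Japanese",
    "operation": "jobStart",
    "jobName": "copy",
    "copies": "4",
}
print(json.dumps(language_setting_job_information, indent=2))
```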

Fig. 10 is a sequence diagram showing a procedure of processing executed when the image forming system 100 of fig. 1 receives a job execution instruction by voice input. It should be noted that, in fig. 10, the smart speaker 102, the MFP101, and the cloud server 103 should be able to communicate with each other. Further, a home screen 2001 in fig. 20 should be displayed on the operation panel 209 of the MFP101, and on the home screen 2001, functions such as copying, scanning, and printing can be called.

In fig. 10, the user first gives an instruction to start a voice operation to the smart speaker 102 in step S1001. When the user speaks a wakeup word or when the user presses an operation start key (not shown) of the smart speaker 102, an instruction to start a voice operation is given. The instruction to start the voice operation is detected by the voice operation start detection module 607.

When an instruction to start a voice operation is detected, in step S1002, in the smart speaker 102, the display module 606 of the audio control module 600 lights the LED312 as a notification indicating that it is in a speech processing state. Further, the processing of the speech acquisition module 604 begins in the smart speaker 102.

In step S1003, the user gives a function call instruction to the smart speaker 102. For example, the user speaks a phrase such as "yonbukopishite" or "four copies" as a job execution instruction, which is a function call instruction following the wakeup word detected in step S1001. Audio data is generated based on the user speech obtained by the voice obtaining module 604. The utterance end determination module 608 determines that the utterance has ended when a pause period of a predetermined length continues.

In step S1004, the display module 606 of the audio control module 600 blinks the LED312 as a notification indicating that it is in a response processing state according to the utterance end determination. Further, the processing of the speech obtaining module 604 is completed. In step S1005, the data transmission/reception module 601 transmits the generated audio data to the cloud server 103.

In step S1006, the audio data conversion control module 700 in the cloud server 103 executes voice operation service execution processing of fig. 11 described later. Details of the voice operation service execution process will be described later. In the voice operation service execution process, for example, language setting job information as device operation data for executing a job is transmitted to the MFP101, and an audio message described later is transmitted to the smart speaker 102.

In step S1007, the device control module 500 in the MFP101 executes language setting switching processing of fig. 19 described later based on the language setting job information received from the cloud server 103.

In step S1008, the data transmission/reception module 601 in the smart speaker 102 receives the audio message from the cloud server 103. In the next step S1009, the audio reproduction module 605 reproduces the synthetic audio data converted from the audio message received in step S1008. For example, the audio reproduction module 605 reproduces the synthetic audio data "copy will be started" through the loudspeaker 310.

In step S1010, the data transmission/reception module 601 receives an audio message different from the audio message received in step S1008 from the cloud server 103. Further, the data transmission/reception module 601 receives an interactive session end notification from the cloud server 103, the notification ending the interactive session with the user.

In step S1011, the audio reproduction module 605 reproduces the synthetic audio data converted from the audio message received in step S1010. For example, the audio reproduction module 605 reproduces the synthetic audio data "copy has ended" through the loudspeaker 310.

In step S1012, the display module 606 turns off the LED312 as a notification showing that the smart speaker 102 is in a standby state in response to the data transmission/reception module 601 receiving the interactive session end notification in step S1010.

In step S1013, in response to the data transmission/reception module 601 receiving the interactive session end notification in step S1010, the audio control module 600 ends the interactive session and transitions the smart speaker 102 to the standby state.

In the sequence of fig. 10, the user can input a wakeup word into the smart speaker 102 even while the LED312 blinks as a notification indicating the response processing state. When the user says "cancel" or "stop" after the wakeup word, the interactive session can be forcibly ended.

Fig. 11 is a flowchart showing a procedure of a voice operation service execution process executed by the cloud server 103 in fig. 1. Since the CPU402 runs the control program developed from the storage unit 405 to the RAM403, the voice operation service execution process is realized. When the data transmission/reception module 701 receives the audio data of the function call instruction transmitted from the smart speaker 102 in step S1005, the voice operation service execution process of fig. 11 is executed.

As shown in fig. 11, the CPU402 performs a voice recognition process of converting audio data into text data by the voice recognition module 705 (step S1101). As described above, in the voice recognition process, the voice recognition module 705 may employ the first voice recognition method that determines the language of the input audio data and converts the audio data into text data of the determined language. In addition, the speech recognition module 705 may employ a second speech recognition method that converts input audio data into phonemes using acoustic models of a plurality of languages and converts the audio data into text data of each language using a corresponding language model.

Next, the CPU402 executes language determination processing based on the text data converted in step S1101 and the language determination result (step S1102). It should be noted that the content of the language determination processing of step S1102 differs based on the method (the first speech recognition method or the second speech recognition method) used for the conversion of the text data in step S1101. For example, when the first speech recognition method is used for conversion of text data in step S1101, the CPU402 executes first language determination processing of fig. 12 described later. Meanwhile, when the second speech recognition method is used for conversion of text data in step S1101, the CPU402 executes a second language determination process of fig. 13 described later.

Next, the CPU402 executes the operation determination processing of fig. 14 described later (step S1103) and stores, into the RAM403, operation information that is the determination result of the type of the user's function call instruction. Next, the CPU402 determines whether the operation information stored in the RAM403 is "job execution" (step S1104).

As a result of the determination in step S1104, when the operation information is "job execution", the CPU402 executes the job execution processing of fig. 15 described later (step S1105) and ends the voice operation service execution processing. As a result of the determination in step S1104, when the operation information is not "job execution", the CPU402 determines whether the operation information is "job setting" (step S1106).

As a result of the determination in step S1106, when the operation information is "job setting", the CPU402 executes the job setting processing of fig. 23 described later (step S1107), and ends the voice operation service execution processing. As a result of the determination in step S1106, when the operation information is not "job setting", the CPU402 generates an operation guidance message, which is a text message for urging input of an operation keyword (step S1108). Then, the CPU402 stores the operation guidance message into the audio data storage area in the RAM403. The operation guidance message is, for example, "Please give the operation to be executed, such as COPY or EMAILSEND.". Next, the CPU402 controls the data transmission/reception module 701 to transmit the operation guidance message stored in the RAM403 to the smart speaker 102 through the network I/F406 (step S1109), and ends the voice operation service execution process.
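For reference, the dispatch in steps S1104 to S1109 can be summarized as in the following minimal sketch. The function name and the returned strings are illustrative assumptions only and are not part of the actual implementation of the cloud server 103.

```python
# Minimal sketch of the operation dispatch in steps S1104-S1109 of fig. 11.
# Only the branching on the stored operation information follows the text.

def dispatch(operation_info: str) -> str:
    if operation_info == "job execution":
        return "run the job execution processing (fig. 15)"   # step S1105
    if operation_info == "job setting":
        return "run the job setting processing (fig. 23)"     # step S1107
    # Steps S1108-S1109: prompt the user for an operation keyword.
    return 'send guidance: "Please give the operation to be executed, such as COPY or EMAILSEND."'


for op in ("job execution", "job setting", "unknown"):
    print(op, "->", dispatch(op))
```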

Fig. 12 is a flowchart showing the procedure of the first language determination processing executed in step S1102 when the first speech recognition method is used for conversion of text data in step S1101 of fig. 11.

As shown in fig. 12, the CPU402 clears a temporary storage area that is a part of the storage area of the RAM403 (step S1201). The temporary storage area is a storage area used in the first language determination processing, and includes, for example, a language determination result temporary storage area, a morpheme string storage area, a group ID storage area, and a language determination result storage area. Next, the CPU402 stores the language determination result of the audio data obtained in the voice recognition process of step S1101 into the language determination result temporary storage area of the RAM403 (step S1202). Next, the CPU402 analyzes the above-described text data by the morphological analysis module 706 to extract a morpheme string corresponding to the determined language stored in the language determination result temporary storage area, and converts the morphemes constituting the morpheme string into group IDs by the group ID determination module 707. Next, the CPU402 stores the morpheme string in the morpheme string storage area and stores the group IDs in the group ID storage area (step S1203).

Next, the CPU402 obtains language determination exception flags for the morphemes constituting the above-described morpheme string from the group ID lists 711, 712, and 713 in fig. 7B and the group ID lists 721, 722, and 723 in fig. 7C. The CPU402 determines that a morpheme whose language determination exception flag is "yes" is a determination exception morpheme (word information that cannot specify a language). The CPU402 then determines whether all the morphemes constituting the morpheme string are determination exception morphemes (step S1204).

As a result of the determination in step S1204, when at least one morpheme is not the determination exception morpheme, the CPU402 stores the language determination result stored in the language determination result temporary storage area into the language determination result storage area (step S1205). The language determination result stored in the language determination result temporary storage area is a language determination result of the audio data obtained in the speech recognition process of step S1101. After that, the language determination processing ends.

As a result of the determination in step S1204, when all the morphemes are determination exception morphemes, the CPU402 stores "unknown" showing that language determination cannot be made in the language determination result storage area (step S1206). After that, the language determination processing ends.
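The decision in steps S1204 to S1206 can be expressed as in the following sketch. It assumes that each morpheme carries the language determination exception flag obtained from the group ID lists; the function name, the flag table, and the example morphemes are illustrative only.

```python
# Sketch of the first language determination processing (fig. 12, steps S1204-S1206).
# exception_flags maps a morpheme to True when its language determination
# exception flag is "yes" (a determination exception morpheme).

def first_language_determination(detected_language, morphemes, exception_flags):
    """Return "unknown" when every morpheme is a determination exception morpheme
    (steps S1204 and S1206); otherwise keep the language determination result of
    the voice recognition process (step S1205)."""
    if all(exception_flags.get(m, False) for m in morphemes):
        return "unknown"
    return detected_language


flags = {"kopi": True, "yonbu": False, "shite": False}  # True = determination exception
print(first_language_determination("Japanese", ["kopi"], flags))                    # unknown
print(first_language_determination("Japanese", ["yonbu", "kopi", "shite"], flags))  # Japanese
```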

Fig. 13 is a flowchart showing the procedure of the second language determination processing executed in step S1102 when the second speech recognition method is used for conversion of text data in step S1101 of fig. 11.

As shown in fig. 13, the CPU402 clears a temporary storage area that is a part of the storage area of the RAM403 (step S1301). The temporary storage area is used for the second language determination process, and includes a japanese speech recognition data storage area, an english speech recognition data storage area, a japanese morpheme string storage area, a japanese group ID storage area, an english morpheme string storage area, an english group ID storage area, a language determination result storage area, and a group ID storage area.

Next, the CPU402 stores speech recognition data including the language setting "japanese" (hereinafter referred to as "japanese speech recognition data") into a japanese speech recognition data storage area. The japanese speech recognition data includes text data obtained as a result of the speech recognition module 705 applying the speech recognition processing to the audio data of japanese (step S1302). Further, the CPU402 stores speech recognition data including the language setting "english" (hereinafter referred to as "english speech recognition data") into an english speech recognition data storage area. The english speech recognition data includes text data obtained as a result of the speech recognition module 705 applying the speech recognition processing to the audio data in english (step S1302).

Next, the CPU402 analyzes text data included in the japanese speech recognition data by the morphological analysis module 706 to extract a morpheme string corresponding to japanese, and converts the morphemes constituting the morpheme string into a group ID by the group ID determination module 707. The CPU402 stores a morpheme string (hereinafter referred to as "japanese morpheme string") in the japanese morpheme string storage area, and stores a group ID (hereinafter referred to as "japanese group ID") in the japanese group ID storage area (step S1303).

Next, the CPU402 analyzes the text data included in the english speech recognition data by the morphological analysis module 706 to extract a morpheme string corresponding to english, and converts the morphemes constituting the morpheme string into a group ID by the group ID determination module 707. The CPU402 stores the morpheme string (hereinafter referred to as "english morpheme string") in the english morpheme string storage area and stores the group ID (hereinafter referred to as "english group ID") in the english group ID storage area (step S1304).

Next, the CPU402 determines whether the japanese group ID storage area is empty (step S1305). In step S1305, when the group ID is not stored in the japanese group ID storage area, the CPU402 determines that the japanese group ID storage area is empty. Meanwhile, when at least one group ID is stored in the japanese group ID storage area, the CPU402 determines that the japanese group ID storage area is not empty.

As a result of the determination in step S1305, when the japanese group ID storage area is not empty, the CPU402 obtains language determination exception flags for the morphemes constituting the japanese morpheme string from the group ID lists 711, 712, and 713 of fig. 7B. The CPU402 determines whether all the morphemes constituting the japanese morpheme string are determination exception morphemes (step S1306).

As a result of the determination in step S1306, when at least one morpheme constituting the japanese morpheme string is not the determination exception morpheme, the CPU402 stores the group ID stored in the japanese group ID storage area into the group ID storage area (step S1307). It should be noted that the group ID stored in the japanese group ID storage area is a japanese group ID. Next, the CPU402 stores the language setting "japanese" in the language determination result storage area (step S1308). After that, the language determination processing ends.

When all the morphemes constituting the japanese morpheme string are determination exception morphemes as a result of the determination in step S1306, or when the japanese group ID storage area is empty as a result of the determination in step S1305, the CPU402 determines whether the english group ID storage area is empty (step S1309).

As a result of the determination in step S1309, when the english group ID storage area is not empty, the CPU402 obtains the language determination exception flag of the morphemes constituting the english morpheme string from the group ID lists 721, 722, and 723 in fig. 7C. The CPU402 determines whether all the morphemes constituting the english morpheme string are determination exception morphemes (step S1310).

As a result of the determination in step S1310, when at least one morpheme constituting the english morpheme string is not the determination exception morpheme, the CPU402 stores the group ID stored in the english group ID storage area into the group ID storage area (step S1311). It should be noted that the group ID stored in the english group ID storage area is an english group ID. Next, the CPU402 stores the language setting "english" in the language determination result storage area (step S1312). After that, the language determination processing ends.

As a result of the determination in step S1310, when all the morphemes constituting the english morpheme string are determination exception morphemes, or when the english group ID storage area is empty as a result of the determination in step S1309, the CPU402 stores "unknown" showing that language determination is impossible in the language determination result storage area (step S1313). After that, the language determination processing ends. In this embodiment, "unknown" is stored in the language determination result storage area in this manner when the phrase spoken by the user is composed of only determination exception morphemes such as "kopi", "copy", "kopisetteingu", and "copy setting" as shown in fig. 30. Further, when the user speaks a phrase including a morpheme other than the determination exception morphemes (morphemes other than word information that cannot specify a language), such as "yonbukopishite", "noudoseteiyon", "four copies", and "set density 4", "English" or "Japanese" is stored in the language determination result storage area.
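The second language determination processing of fig. 13 can likewise be summarized by the following sketch. The per-language inputs are represented here as lists of (group ID, exception flag) pairs; this data layout and the group ID values are assumptions made for illustration.

```python
# Sketch of the second language determination processing (fig. 13).
# ja_entries / en_entries: (group_id, is_exception) pairs obtained from the
# Japanese and English recognition results (steps S1303-S1304).

def second_language_determination(ja_entries, en_entries):
    # Steps S1305-S1308: use the Japanese result if it has a non-exception morpheme.
    if ja_entries and not all(is_exc for _, is_exc in ja_entries):
        return "Japanese", [gid for gid, _ in ja_entries]
    # Steps S1309-S1312: otherwise fall back to the English result.
    if en_entries and not all(is_exc for _, is_exc in en_entries):
        return "English", [gid for gid, _ in en_entries]
    # Step S1313: only determination exception morphemes (e.g. "kopi" / "copy").
    return "unknown", []


print(second_language_determination([("FNC00001", True)], [("FNC00001", True)]))
# -> ('unknown', [])
print(second_language_determination([("NUM00004", False), ("FNC00001", True)], []))
# -> ('Japanese', ['NUM00004', 'FNC00001'])
```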

Fig. 14 is a flowchart illustrating the procedure of the operation determination processing of step S1103 in fig. 11.

As shown in fig. 14, the CPU402 determines whether only one group ID (hereinafter referred to as "job type-specifying group ID") specifying the job type is stored in the group ID storage area of the RAM403 (step S1401). The job type designation group ID is, for example, "FNC 00001" corresponding to the job type "COPY" and "FNC 00004" corresponding to the job type "EMAILSEND".

As a result of the determination in step S1401, when only one job type-specifying group ID is stored in the group ID storage area, the CPU402 determines whether a group ID specifying "setting" (hereinafter referred to as "setting-specifying group ID") is stored in the group ID storage area (step S1402). The setting designation group ID is, for example, "FNC 00003" corresponding to "setting".

As a result of the determination in step S1402, when the setting-specifying group ID is stored in the group ID storage area, the CPU402 stores "job setting" showing that the type of the function call instruction of the user is "setting" as operation information into the RAM403 (step S1403), and the operation determination process ends.

As a result of the determination in step S1402, when the setting-specifying group ID is not stored in the group ID storage area, the CPU402 stores "job execution", showing that the type of the function call instruction of the user is "execution of job", as operation information in the RAM403 (step S1404), and the operation determination process ends.

As a result of the determination in step S1401, when a plurality of job type-specifying group IDs are stored in the group ID storage area, or when no job type-specifying group ID is stored in the group ID storage area, the CPU402 stores "unknown" showing that the type of the function call instruction of the user is unknown as operation information into the RAM403 (step S1405), and the operation determination process ends.
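The branching of fig. 14 can be sketched as follows, using the example group IDs quoted in the text ("FNC 00001" for COPY, "FNC 00004" for EMAILSEND, "FNC 00003" for "setting"); the ID sets are not exhaustive and are assumptions for illustration.

```python
# Sketch of the operation determination processing (fig. 14).
JOB_TYPE_GROUP_IDS = {"FNC00001", "FNC00004"}  # COPY, EMAILSEND (examples from the text)
SETTING_GROUP_IDS = {"FNC00003"}               # "setting"

def operation_determination(group_ids):
    job_types = [gid for gid in group_ids if gid in JOB_TYPE_GROUP_IDS]
    if len(job_types) != 1:
        return "unknown"                                    # step S1405
    if any(gid in SETTING_GROUP_IDS for gid in group_ids):
        return "job setting"                                # step S1403
    return "job execution"                                  # step S1404


print(operation_determination(["NUM00004", "CNF00001", "FNC00001"]))              # job execution
print(operation_determination(["FNC00001", "NUM00004", "CNF00002", "FNC00003"]))  # job setting
```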

Fig. 15 is a flowchart showing the procedure of the job execution processing of step S1105 in fig. 11.

As shown in fig. 15, the CPU402 determines whether the necessary job setting group IDs are complete in the group ID storage area of the RAM403 (step S1501). A necessary job setting group ID is a group ID corresponding to a setting that the user must set to execute the job. For example, the necessary job setting group ID of the job type "EMAILSEND" is "CNF 00004" showing the destination. The necessary job setting group IDs differ depending on the job type. Some job types need no job setting group ID, and some job types have a plurality of necessary job setting group IDs.
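The completeness check of step S1501 amounts to a per-job-type lookup of required group IDs, as in the sketch below. The table is an assumption; the text only names the EMAILSEND destination ("CNF 00004") as a concrete example, and the COPY entry with no required setting merely illustrates a job type that needs none.

```python
# Sketch of the necessary job setting group ID check (fig. 15, step S1501).
NECESSARY_JOB_SETTING_GROUP_IDS = {
    "FNC00001": [],             # example of a job type that needs no job setting group ID
    "FNC00004": ["CNF00004"],   # EMAILSEND: the destination must be specified
}

def necessary_settings_complete(job_type_group_id, stored_group_ids):
    """True when every required group ID is already in the group ID storage area."""
    required = NECESSARY_JOB_SETTING_GROUP_IDS.get(job_type_group_id, [])
    return all(gid in stored_group_ids for gid in required)


print(necessary_settings_complete("FNC00004", ["FNC00004", "CNF00001"]))  # False -> step S1507 guidance
print(necessary_settings_complete("FNC00004", ["FNC00004", "CNF00004"]))  # True  -> step S1502 job info
```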

As a result of the determination in step S1501, when the necessary job setting group IDs are complete in the group ID storage area, the CPU402 executes the job information generation processing of fig. 16 described later (step S1502) to generate language setting job information, which is device operation data for executing a job by the MFP101. Next, the CPU402 transmits the relevant language setting job information to the MFP101 through the network I/F406 (step S1503). Next, the CPU402 determines whether a job execution end notification is received from the MFP101 (step S1504). In this embodiment, when a job is completed or is suspended due to an error, the MFP101 transmits a job execution end notification including information indicating such a job end state to the cloud server 103. The CPU402 waits until receiving a job execution end notification from the MFP101. When receiving a job execution end notification from the MFP101 (yes in step S1504), the CPU402 generates a job end audio message that is a text message corresponding to the received job execution end notification (step S1505). In step S1505, for example, "job completed" is generated as the message at the time of normal end, or "ended due to error" is generated as the message when a paper jam or another error occurs in the MFP101.

Next, the CPU402 stores the relevant job end audio message in the audio message storage area in the RAM 403. Next, the CPU402 transmits the audio message stored in the audio message storage area to the smart speaker 102 through the network I/F406 (step S1506) and ends the job execution process.

As a result of the determination in step S1501, when the necessary job setting group IDs are incomplete in the group ID storage area, the CPU402 generates a job setting guidance audio message (step S1507). The job setting guidance audio message is a text message for prompting the user to input the settings necessary for executing the job. For example, when a destination is not specified in a state where the user specifies "EMAILSEND", a job setting guidance audio message "please input a transmission destination" is generated. The CPU402 stores the generated job setting guidance audio message in the audio message storage area, and executes the process of step S1506.

Fig. 16 is a flowchart showing the procedure of the job information generation processing of step S1502 in fig. 15.

As shown in fig. 16, the CPU402 clears the temporary storage area for the job information generation processing on the RAM403 (step S1601). The temporary storage area includes a language-determination character string storage area, a job character string storage area, and a job-setting character string storage area. Next, the CPU402 parameterizes the language setting (step S1602). Specifically, the CPU402 generates a parameter character string corresponding to the language determination result stored in the language determination result storage area in the RAM403. For example, when "japanese" is stored as the language determination result in the language determination result storage area, as shown in fig. 17, the CPU402 generates a character string {"language": "Japanese"} showing that the language setting is Japanese, and stores the relevant character string in the language determination character string storage area. Further, when "english" is stored as the language determination result in the language determination result storage area, as shown in fig. 18, the CPU402 generates a character string {"language": "English"} showing that the language setting is English, and stores the relevant character string in the language determination character string storage area.

Next, the CPU402 parameterizes the job type (step S1603). Specifically, the CPU402 extracts the job type-specifying group ID from the group ID storage area in the RAM403, and generates a parameter character string corresponding to the relevant job type-specifying group ID. For example, as shown in fig. 17 or fig. 18, when "NUM 00004", "CNF 00001", and "FNC 00001" are stored in the group ID storage area, the CPU402 extracts "FNC 00001" as the job type designation group ID therefrom, and generates a character string { "jobName": "copy" } as a parameter string corresponding to "FNC 00001". The CPU402 stores the generated character string in the job character string storage area.

Next, the CPU402 determines in order from the head address of the group ID storage area whether the stored group ID is the setting designation group ID (step S1604).

As a result of the determination in step S1604, when the stored group ID is the setting-specified group ID, the CPU402 parameterizes the job setting (step S1605). Specifically, the CPU402 generates a character string corresponding to the group ID determined as the setting designation group ID, and stores the relevant character string in the job setting character string storage area in the RAM 403. After that, the job information generation processing returns to step S1604. In this way, in the present embodiment, a character string corresponding to the setting designation group ID stored in the group ID storage area is generated. For example, when "NUM 00004", "CNF 00001", and "FNC 00001" are stored in the group ID storage area, as shown in fig. 17 and 18, the CPU402 generates "copies" as a character string corresponding to "CNF 00001" as the setting specifying group ID. Further, as shown in fig. 17 or fig. 18, the CPU402 generates "4" as a character string corresponding to "NUM 00004" as the setting specifying group ID. The CPU402 stores these generated character strings in the job setting character string storage area.

As a result of the determination in step S1604, when the stored group ID is not the setting designation group ID, the CPU402 determines whether the determination of step S1604 has been made for all the group IDs stored in the group ID storage area (step S1606).

As a result of the determination in step S1606, when the determination in step S1604 has not been performed for all the group IDs stored in the group ID storage area, the job information generation processing returns to step S1604. As a result of the determination in step S1606, when the determination in step S1604 has been made for all the group IDs stored in the group ID storage area, the CPU402 generates language setting job information as device operation data for instructing the MFP101 to execute a job, based on the character strings stored in the language determination character string storage area, the job character string storage area, and the job setting character string storage area (step S1607). The language setting job information is, for example, data in JSON format as shown in fig. 8. The data format of the language setting job information is not limited to the JSON format. The data format may be other formats such as an XML format. After that, the job information generation processing ends.
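The following is a minimal sketch of how the character strings of figs. 16 to 18 could be combined into JSON-format language setting job information. The key names follow the fragments quoted in the text ("language", "operation", "jobName", "copies", "density"); the exact layout of fig. 8 and the pairing of the setting name with the numeric value are assumptions.

```python
# Sketch of the job information generation processing (fig. 16, steps S1602-S1607).
import json

GROUP_ID_TO_JOB_NAME = {"FNC00001": "copy", "FNC00004": "emailSend"}
SETTING_NAME_IDS = {"CNF00001": "copies", "CNF00002": "density"}
NUMBER_IDS = {"NUM00004": "4"}

def generate_language_setting_job_info(language, group_ids, operation="jobStart"):
    info = {"language": language, "operation": operation}   # step S1602 (language parameter)
    setting_name = setting_value = None
    for gid in group_ids:
        if gid in GROUP_ID_TO_JOB_NAME:
            info["jobName"] = GROUP_ID_TO_JOB_NAME[gid]     # step S1603 (job type parameter)
        elif gid in SETTING_NAME_IDS:
            setting_name = SETTING_NAME_IDS[gid]            # step S1605 (setting name, e.g. "copies")
        elif gid in NUMBER_IDS:
            setting_value = NUMBER_IDS[gid]                 # step S1605 (setting value, e.g. "4")
    if setting_name is not None and setting_value is not None:
        info[setting_name] = setting_value
    return json.dumps(info, ensure_ascii=False)             # step S1607 (JSON-format job information)


print(generate_language_setting_job_info("Japanese", ["NUM00004", "CNF00001", "FNC00001"]))
# {"language": "Japanese", "operation": "jobStart", "jobName": "copy", "copies": "4"}
```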

Fig. 19 is a flowchart showing the language setting switching processing executed by the MFP101 that receives language setting job information from the cloud server 103. The language setting switching process of fig. 19 is realized by the CPU 202 of the MFP101 running a control program loaded from the ROM204 into the RAM203.

As shown in fig. 19, the CPU 202 obtains the language setting from the received language setting job information through the data analysis module 502, and determines whether the obtained language setting is "unknown" (step S1901).

As a result of the determination in step S1901, when the obtained language setting is "unknown", the language setting switching process proceeds to step S1903 described later. As a result of the determination in step S1901, when the obtained language setting is not "unknown", the CPU 202 updates the display language of the operation panel 209 (step S1902). Specifically, the CPU 202 stores the obtained language setting in the MFP language setting storage area in the storage unit 205. Next, the CPU 202 obtains the job type and the job setting from the received language setting job information. The CPU 202 generates job principal information corresponding to the obtained job type (step S1903), and stores the relevant job principal information in the RAM 203. Further, the CPU 202 sets a parameter corresponding to the obtained job setting in the above-described job principal information.

Next, the CPU 202 determines whether the job is executable (step S1904). In step S1904, for example, when the MFP101 cannot execute a new job due to execution of another job or occurrence of an error, the CPU 202 determines that the job cannot be executed. Meanwhile, when the MFP101 can execute a new job, the CPU 202 determines that the job is executable.

As a result of the determination in step S1904, when the job is not executable, the language setting switching process proceeds to step S1907 described later. As a result of the determination in step S1904, when the job is executable, the CPU 202 transmits a job execution start notification to the cloud server 103 through the network 104 via the data transmission/reception module 501 (step S1905). Next, the CPU 202 executes the job based on the job principal information generated in step S1903 (step S1906). Next, the CPU 202 transmits a job execution end notification to the cloud server 103 through the data transmission/reception module 501 via the network 104 (step S1907). The job execution end notification includes a job execution result. For example, when the job executed in step S1906 is normally completed, the job execution end notification includes a job execution result showing that the job has been normally completed. In addition, when it is determined in step S1904 that the job is not executable, or when the job executed in step S1906 abnormally ends due to a paper jam or the like, the job execution end notification includes a job execution result showing an error. The cloud server 103 generates an audio message corresponding to the job execution result included in the received job execution end notification. After the CPU 202 executes the processing of step S1907, the language setting switching processing ends.
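The MFP-side handling of figs. 19 can be sketched as below, assuming the language setting job information has already been parsed into a dictionary. The stand-ins for the MFP language setting storage area and for job execution are illustrative only.

```python
# Sketch of the language setting switching processing on the MFP (fig. 19).

def language_setting_switching(job_info, mfp_settings, job_executable=True):
    language = job_info.get("language", "unknown")
    if language != "unknown":                               # steps S1901-S1902
        mfp_settings["display_language"] = language         # update the display language
    job = {"type": job_info.get("jobName"),                 # step S1903 (job principal information)
           "settings": {k: v for k, v in job_info.items()
                        if k not in ("language", "operation", "jobName")}}
    if not job_executable:                                  # step S1904
        return "error"                                      # reported in the end notification (S1907)
    return f"executed {job['type']} with {job['settings']}" # steps S1905-S1907, simplified


settings = {"display_language": "English"}
print(language_setting_switching(
    {"language": "Japanese", "operation": "jobStart", "jobName": "copy", "copies": "4"}, settings))
print(settings)  # {'display_language': 'Japanese'}
```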

Fig. 20 is a diagram illustrating a screen transition of the operation panel 209 of the MFP101 when execution of a copy job is instructed by voice input.

When the MFP101 receives, from the cloud server 103, language setting job information including {"language": "Japanese"}, {"operation": "jobStart"}, and {"jobName": "copy"}, the MFP101 sets the language setting to Japanese and starts executing the copy job. When the copy job is executed in a state in which the language setting is set to Japanese, a copy screen 2002 whose display language is Japanese is displayed on the operation panel 209.

Further, when the MFP101 receives, from the cloud server 103, language setting job information including {"language": "English"}, {"operation": "jobStart"}, and {"jobName": "copy"}, the MFP101 sets the language setting to English and starts executing the copy job. When the copy job is executed in a state in which the language setting is set to English, a copy screen 2003 whose display language is English is displayed on the operation panel 209.

Fig. 21 is a diagram showing a screen transition of the operation panel 209 of the MFP101 when execution of an EMAILSEND job is instructed by voice input.

When the MFP101 receives, from the cloud server 103, language setting job information including {"language": "Japanese"}, {"operation": "jobStart"}, and {"jobName": "emailSend"}, the MFP101 sets the language setting to Japanese and starts executing the EMAILSEND job. When the EMAILSEND job is executed in a state where the language setting is set to Japanese, a scan screen 2101 whose display language is Japanese is displayed on the operation panel 209.

Further, when the MFP101 receives, from the cloud server 103, language setting job information including {"language": "English"}, {"operation": "jobStart"}, and {"jobName": "emailSend"}, the MFP101 sets the language setting to English and starts executing the EMAILSEND job. When the EMAILSEND job is executed in a state where the language setting is set to English, a scan screen 2102 whose display language is English is displayed on the operation panel 209. Although the home screen 2001 is described as one example of the job executable screen in the present embodiment, the job executable screen is not limited to the home screen 2001. Further, when the MFP101 receives language setting job information from the cloud server 103 in the power saving mode, in which the job executable screen is not displayed and the power of the operation panel 209 and the print engine 211 is turned off, the MFP101 may set the language setting based on the received language setting job information as described above and may execute the job.

Fig. 22 is a sequence diagram showing the procedure of processing executed when the image forming system 100 of fig. 1 receives a job setting change instruction by voice input. It should be noted that, in fig. 22, as with the description about fig. 10, the smart speaker 102, the MFP101, and the cloud server 103 should be communicable with each other. Further, the process of fig. 22 assumes that the home screen 2801 in fig. 28, from which functions such as copying, scanning, and printing can be called, is displayed on the operation panel 209 of the MFP101.

In step S2201 in fig. 22, the user gives an instruction to the smart speaker 102 to start a voice operation as in step S1001.

When the start instruction of the voice operation is detected, as in step S1002, the display module 606 of the audio control module 600 in the smart speaker 102 lights the LED312 as a notification showing that it is in the speech processing state in step S2202. Further, the processing of the voice obtaining module 604 is started.

In step S2203, the user makes a function call instruction to the smart speaker 102. For example, the user speaks a phrase such as "kopinoudosetteiyon" or "set copy density 4" as a job setting change instruction as a function call instruction following the wakeup word detected in step S2201. The audio data is generated based on the user speech obtained by the speech obtaining module 604. The end-of-utterance determination module 608 determines that the utterance is ended when a pause period of a predetermined period of time continues.

In step S2204, the display module 606 of the audio control module 600 blinks the LED312 as a notification indicating that it is in the response processing state according to the utterance end determination, as in step S1004. Further, the processing of the speech obtaining module 604 is completed. In step S2205, the data transmission/reception module 601 transmits the generated audio data to the cloud server 103, as in step S1005.

In step S2206, the audio data conversion control module 700 in the cloud server 103 executes the voice operation service execution process of fig. 11 described above. In the voice operation service execution process, for example, the language setting job information described later is transmitted to the MFP101.

In step S2207, the device control module 500 in the MFP101 executes the language setting switching process of fig. 27 described later, based on the language setting job information received from the cloud server 103.

In step S2208, the data transmission/reception module 601 in the smart speaker 102 receives the audio message from the cloud server 103. In the next step S2209, the audio reproduction module 605 reproduces the synthetic audio data converted from the audio message received in step S2208. For example, the audio reproduction module 605 reproduces the synthetic audio data "density setting will be started" through the speaker 310.

In step S2210, the data transmission/reception module 601 receives an audio message different from the audio message received in step S2208 from the cloud server 103. Further, the data transmission/reception module 601 receives an interactive session end notification from the cloud server 103, the notification ending the interactive session with the user.

In step S2211, the audio reproduction module 605 reproduces the synthetic audio data converted from the audio message received in step S2210. For example, the audio reproduction module 605 reproduces the synthetic audio data "density setting has ended" through the speaker 310.

In step S2212, the display module 606 turns off the LED312 as a notification showing that the smart speaker 102 is in a standby state in response to the data transmission/reception module 601 receiving the interactive session end notification in step S2210.

In step S2213, in response to the data transmission/reception module 601 receiving the interactive session end notification in step S2210, the audio control module 600 transitions the smart speaker 102 to a standby state.

Fig. 23 is a flowchart showing the procedure of the job setting processing of step S1107 in fig. 11. When the cloud server 103 receives audio data generated based on the user voice as a job setting change instruction from the smart speaker 102, the job setting process of fig. 23 is executed.

As shown in fig. 23, the CPU402 generates language setting job information including the setting values used when the MFP101 executes a job, by executing the job setting information generation processing of fig. 24 described later (step S2301). Next, the CPU402 transmits the relevant language setting job information to the MFP101 via the network I/F406 (step S2302). Next, the CPU402 determines whether a job setting end notification is received from the MFP101 (step S2303). In this embodiment, when job setting is normally completed or job setting is suspended due to occurrence of an error, the MFP101 transmits a job setting end notification including information indicating the job setting end state to the cloud server 103. The CPU402 waits until a job setting end notification is received from the MFP101. When receiving the job setting end notification from the MFP101 (yes in step S2303), the CPU402 generates a job setting end audio message that is a text message corresponding to the received job setting end notification (step S2304). In step S2304, for example, the CPU402 generates "job setting completed" as the message at the time of normal end, or "job setting disabled" as the message when a paper jam or another error occurs in the MFP101.

Next, the CPU402 stores the job setting end audio message in the audio message storage area in the RAM 403. Next, the CPU402 transmits the audio message stored in the audio message storage area to the smart speaker 102 through the network I/F406 (step S2305) and ends the job setting process.

Fig. 24 is a flowchart showing the procedure of the job setting information generation processing of step S2301 in fig. 23.

As shown in fig. 24, the CPU402 clears the temporary storage area for the job setting information generation processing on the RAM403 (step S2401). The temporary storage area includes a language-determination character string storage area, a job character string storage area, and a job-setting character string storage area. Next, the CPU402 parameterizes the language setting (step S2402). Specifically, the CPU402 generates a parameter character string corresponding to the language determination result stored in the language determination result storage area in the RAM403. For example, when "japanese" is stored as the language determination result in the language determination result storage area, as shown in fig. 25, the CPU402 generates a character string {"language": "Japanese"} showing that the language setting is Japanese, and stores the relevant character string in the language determination character string storage area. Further, when "english" is stored as the language determination result in the language determination result storage area, as shown in fig. 26, the CPU402 generates a character string {"language": "English"} showing that the language setting is English, and stores the relevant character string in the language determination character string storage area.

Next, the CPU402 parameterizes the job type (step S2403). Specifically, the CPU402 extracts the job type-specifying group ID from the group ID storage area in the RAM403, and generates a parameter character string corresponding to the relevant job type-specifying group ID. For example, as shown in fig. 25 or fig. 26, when "FNC 00001", "NUM 00004", "CNF 00002", and "FNC 00003" are stored in the group ID storage area, the CPU402 extracts "FNC 00001" as the job type designation group ID therefrom, and generates a character string { "jobName": "copy" } as a parameter string corresponding to "FNC 00001". The CPU402 stores the generated character string in the job character string storage area.

Next, the CPU402 determines in order from the head address of the group ID storage area whether the stored group ID is the setting designation group ID (step S2404).

As a result of the determination in step S2404, when the stored group ID is the setting-specified group ID, the CPU402 parameterizes the job setting (step S2405). Specifically, the CPU402 generates a character string corresponding to the group ID determined as the setting designation group ID, and stores the relevant character string in the job setting character string storage area in the RAM 403. After that, the job setting information generation processing returns to step S2404. In this way, in the present embodiment, a character string corresponding to the setting designation group ID stored in the group ID storage area is generated. For example, when "FNC 00001", "NUM 00004", "CNF 00002", and "FNC 00003" are stored in the group ID storage area, as shown in fig. 25 and 26, the CPU402 generates "density" as a character string corresponding to "CNF 00002" as the setting designation group ID. Further, as shown in fig. 25 or fig. 26, the CPU402 generates "4" as a character string corresponding to "NUM 00004" as the setting specifying group ID. The CPU402 stores these generated character strings in the job setting character string storage area.

As a result of the determination in step S2404, when the stored group ID is not the setting designation group ID, the CPU402 determines whether the determination of step S2404 has been made for all the group IDs stored in the group ID storage area (step S2406).

As a result of the determination in step S2406, when the determination in step S2404 is not performed for all the group IDs stored in the group ID storage area, the job setting information generation processing returns to step S2404. As a result of the determination in step S2406, when the determination in step S2404 has been made for all the group IDs stored in the group ID storage area, the CPU402 generates language setting job information as device operation data for instructing the MFP101 to execute a job, based on the character strings stored in the language determination character string storage area, the job character string storage area, and the job setting character string storage area (step S2407). The language setting job information is, for example, data in JSON format as shown in fig. 9. The data format of the language setting job information is not limited to the JSON format. The data format may be other formats such as an XML format.

Fig. 27 is a flowchart showing the language setting switching processing executed by the MFP101 that receives language setting job information from the cloud server 103. The language setting switching process of fig. 27 is realized by the CPU 202 of the MFP101 running a control program loaded from the ROM204 into the RAM203.

As shown in fig. 27, the CPU 202 obtains the language setting from the received language setting job information through the data analysis module 502, and determines whether the obtained language setting is "unknown" (step S2701).

As a result of the determination in step S2701, when the obtained language setting is "unknown", the language setting switching process proceeds to step S2703 described later. As a result of the determination in step S2701, when the obtained language setting is not "unknown", the CPU 202 updates the display language of the operation panel 209 (step S2702). Specifically, the CPU 202 stores the obtained language setting in the MFP language setting storage area in the storage unit 205. Next, the CPU 202 obtains the job type and the job setting from the received language setting job information. The CPU 202 generates job principal information corresponding to the obtained job type (step S2703), and stores the relevant job principal information in the RAM 203. Further, the CPU 202 sets a parameter corresponding to the obtained job setting in the above-described job principal information.

Next, the CPU 202 determines whether the screen of the operation panel 209 can be shifted to the job setting screen (step S2704). The user can set the setting values required to execute the job on the job setting screen. In step S2704, for example, when the screen of the operation panel 209 cannot be transitioned because the MFP101 is executing another job or an error has occurred, the CPU 202 determines that the screen of the operation panel 209 cannot be shifted to the job setting screen. Meanwhile, when the screen of the operation panel 209 can be transitioned, the CPU 202 determines that the screen of the operation panel 209 can be shifted to the job setting screen.

As a result of the determination in step S2704, when the screen of the operation panel 209 cannot be transitioned to the job setting screen, the language setting switching process proceeds to step S2706 described later. As a result of the determination in step S2704, when the screen of the operation panel 209 can be transitioned to the job setting screen, the CPU 202 displays the job setting screen on the operation panel 209 (step S2705). Next, the CPU 202 transmits a job setting end notification to the cloud server 103 through the data transmission/reception module 501 via the network 104 (step S2706). The job setting end notification includes a job setting result. For example, when the screen transition normally completes, the job setting end notification includes a job setting result showing that the screen transition has normally completed. Further, when it is determined in step S2704 that the screen of the operation panel 209 cannot be transitioned to the job setting screen, the job setting end notification includes a job setting result indicating an error. After the CPU 202 executes the process of step S2706, the language setting switching processing ends.

Fig. 28 is a diagram illustrating a screen transition of the operation panel 209 of the MFP101 when setting of a copy job is input by the voice of the user.

As shown in fig. 28, when the MFP101 receives, from the cloud server 103, language setting job information including {"language": "Japanese"}, {"operation": "jobSetting"}, and {"jobName": "copy"} in a state where the home screen 2801 is displayed on the operation panel 209, the MFP101 sets the language setting to Japanese. A copy setting screen 2802 whose display language is Japanese is displayed on the operation panel 209. Thereafter, when the user gives a job execution instruction from the smart speaker 102 or the operation panel 209, the job control module 503 executes a copy job, and displays a copy execution screen 2803 whose display language is Japanese on the operation panel 209.

Further, when the MFP101 receives, from the cloud server 103, language setting job information including {"language": "English"}, {"operation": "jobSetting"}, and {"jobName": "copy"}, the MFP101 sets the language setting to English. A copy setting screen 2804 whose display language is English is displayed on the operation panel 209. Thereafter, when the user gives a job execution instruction from the smart speaker 102 or the operation panel 209, the job control module 503 executes a copy job, and displays a copy execution screen 2805 whose display language is English on the operation panel 209.

Fig. 29 is a diagram showing a screen transition of the operation panel 209 of the MFP101 when setting of an EMAILSEND job is input by the voice of the user.

As shown in fig. 29, when the MFP101 receives, from the cloud server 103, language setting job information including {"language": "Japanese"}, {"operation": "jobSetting"}, and {"jobName": "emailSend"} in a state where the home screen 2801 is displayed on the operation panel 209, the MFP101 sets the language setting to Japanese. A scan setting screen 2901 whose display language is Japanese is displayed on the operation panel 209. Thereafter, when the user gives a job execution instruction from the smart speaker 102 or the operation panel 209, the job control module 503 executes the EMAILSEND job and displays a scan screen 2902 whose display language is Japanese on the operation panel 209.

Further, when the MFP101 receives, from the cloud server 103, language setting job information including {"language": "English"}, {"operation": "jobSetting"}, and {"jobName": "emailSend"}, the MFP101 sets the language setting to English. A scan setting screen 2903 whose display language is English is displayed on the operation panel 209. Thereafter, when the user gives a job execution instruction from the smart speaker 102 or the operation panel 209, the job control module 503 executes the EMAILSEND job and displays a scan screen 2904 whose display language is English on the operation panel 209.

According to the above-described embodiment, a morpheme string composed of a plurality of morphemes is obtained based on a phrase obtained by the smart speaker 102, a language is specified using the relevant morpheme string, and the display language of the operation panel 209 is updated based on the specified language. That is, the display language of the operation panel 209 is changed to the use language of the user without giving a setting operation about the display language of the operation panel each time the user starts using the MFP 101. This can reduce the time and effort of the user to set the display language.

Further, in the above-described embodiment, the cloud server 103 obtains the morpheme string based on the user voice obtained by the smart speaker 102. Therefore, the cloud server 103 can quickly perform processing to specify a language using the obtained morpheme string.

In the above-described embodiment, when the determination exception morpheme is included in the morpheme string, the language is specified using morphemes other than the determination exception morpheme in the morpheme string. Therefore, the accuracy of the specified language is improved.

Although the present invention has been described using the above embodiments, the present invention is not limited to the above embodiments. For example, when the MFP101 is provided with the microphone 308, the MFP101 may transmit audio data generated based on a user voice obtained with the microphone 308 to the cloud server 103.

The MFP101 may be configured without the operation panel 209 and may instead be connectable to an external display apparatus.

Further, when the external display device is provided with the microphone 308, the external display device may transmit audio data generated based on the user voice obtained with the microphone 308 to the cloud server 103.

Further, when the MFP101 connectable to an external display apparatus without the operation panel 209 is provided with the microphone 308, the MFP101 can transmit audio data generated based on the user voice obtained with the microphone 308 to the cloud server 103.

The MFP101 may obtain a morpheme string based on audio data generated by the user voice obtained with the microphone 308 of the MFP101 or audio data obtained from the smart speaker 102, and may transmit the obtained morpheme string to the cloud server 103. This can disperse the load on the cloud server 103 required to perform the processing of obtaining the morpheme string.

When only determination exception morphemes are included in the morpheme string obtained based on the obtained phrase, the display language of the operation panel 209 may be updated based on the language specified using a morpheme string obtained from another phrase obtained by the smart speaker 102 after the previous phrase. For example, even if language determination cannot be made because the phrase spoken by the user is composed of only determination exception morphemes such as "kopi", when the user then speaks a phrase including morphemes other than the determination exception morphemes (such as "sanbukopishite") and the language is specified as Japanese, the display language of the operation panel 209 is changed to Japanese. Thus, the display language can be switched at the timing when the language is specified during the interactive session, without the user setting the display language.
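This behavior amounts to keeping the current display language while each determination result is "unknown" and switching as soon as a later phrase in the session yields a concrete language, as in the following illustrative sketch (the function and variable names are assumptions).

```python
# Sketch of switching the display language during an interactive session.

def update_display_language(current_language, per_phrase_results):
    """per_phrase_results: language determination results for successive phrases
    in one interactive session, e.g. ["unknown", "Japanese"]."""
    for result in per_phrase_results:
        if result != "unknown":
            current_language = result  # switch at the moment a language is specified
    return current_language


print(update_display_language("English", ["unknown"]))              # English (only "kopi" was spoken)
print(update_display_language("English", ["unknown", "Japanese"]))  # Japanese ("sanbukopishite")
```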

Other embodiments

Embodiments of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer-executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a "non-transitory computer-readable storage medium") to perform the functions of one or more of the above-described embodiments, and/or that includes one or more circuits (e.g., an application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiments, and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer-executable instructions from the storage medium to perform the functions of one or more of the above-described embodiments, and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiments. The computer may comprise one or more processors (e.g., a central processing unit (CPU) or a micro processing unit (MPU)) and may include a separate computer or a network of separate processors to read out and execute the computer-executable instructions. The computer-executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random access memory (RAM), a read-only memory (ROM), a memory of a distributed computing system, an optical disc (such as a compact disc (CD), a digital versatile disc (DVD), or a Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.

The embodiments of the present invention can also be realized by a method in which software (a program) that performs the functions of the above-described embodiments is supplied to a system or an apparatus through a network or various storage media, and a computer (or a central processing unit (CPU) or micro processing unit (MPU)) of the system or apparatus reads out and executes the program.

While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.

Priority is claimed in this application from Japanese Patent Application No. 2020-.
