Sound information transmission method and device

Document No.: 193354; Publication date: 2021-11-02

Reading note: this technology, Sound information transmission method and device (声音信息传输方法和装置), was designed and created by 林永楷, 俞凯, 樊帅 and 朱成亚 on 2021-07-30. Main content: The invention discloses a sound information transmission method and device. The sound information transmission method, used for a sending device, comprises: in response to an interaction intention of the sending device, encoding the interaction intention into natural language text; acquiring meta information corresponding to the natural language text, and converting the natural language text into voice information; and transmitting the voice information to a receiving device through sound waves based on a first sound frequency, and transmitting the meta information to the receiving device through sound waves based on a second sound frequency. Because the natural language text is converted into voice information and sent through sound waves at the first sound frequency, and the meta information corresponding to the natural language text is sent through sound waves at the second sound frequency, devices need only a single sound-based interface to interact through sound, which further improves the security and universality of the devices.

1. A sound information transmission method for a transmitting device, comprising:

in response to an interaction intention of the transmitting device, encoding the interaction intention into natural language text;

acquiring meta information corresponding to the natural language text, and converting the natural language text into voice information;

transmitting the voice information to a receiving device through a sound wave based on a first sound frequency, and transmitting the meta information to the receiving device through the sound wave based on a second sound frequency, wherein the first sound frequency can be heard by a human being, and the second sound frequency cannot be heard by the human being.

2. The method of claim 1, wherein the meta information comprises: the ID of the sending device, the ID of the receiving device, and verification information of the voice information.

3. The method of claim 1, wherein said transmitting the voice information through sound waves based on a first sound frequency and transmitting the meta information through sound waves based on a second sound frequency comprises:

synchronously transmitting the voice information of the first sound frequency and the meta information of the second sound frequency to the receiving device.

4. A sound information transmission method for a receiving device, comprising:

in response to receiving voice information and meta information corresponding to the voice information, respectively acquiring the voice information and the meta information, wherein the voice information is transmitted through sound waves and can be heard by a human being, and the meta information is transmitted through sound waves and cannot be heard by the human being;

determining, based on the meta information, whether the voice information is voice information sent to the receiving device;

and if so, decoding the voice information into an interaction intention and executing the interaction intention.

5. The method of claim 4, wherein the meta information includes at least a receiving device ID, and wherein said determining, based on the meta information, whether the voice information is voice information sent to the receiving device comprises:

determining whether the receiving device ID in the meta information matches the receiving device ID of the current receiving device.

6. The method of claim 4, further comprising, after said decoding the voice information into an interaction intention and executing the interaction intention:

determining whether execution of the interaction intention of the voice information has been completed;

if the execution has been completed, feeding back the execution result to the sending device through sound waves;

and if the execution has not been completed, feeding back the execution result to the sending device through sound waves, so that the sending device interacts with the receiving device again.

7. A sound information transmission apparatus for a transmission device, comprising:

a coding program module configured to code an interaction intention of the transmitting device into a natural language text in response to the interaction intention;

an acquisition and conversion program module configured to acquire meta information corresponding to the natural language text and convert the natural language text into voice information;

a transmitting program module configured to transmit the voice information to a receiving apparatus through a sound wave based on a first sound frequency that is audible to a human being, and transmit the meta information to the receiving apparatus through the sound wave based on a second sound frequency that is not audible to the human being.

8. A sound information transmission apparatus for a receiving device, comprising:

a receiving and acquiring program module configured to acquire voice information and meta information corresponding to the voice information, respectively, in response to receiving the voice information and the meta information, wherein the voice information is transmitted through sound waves and can be heard by a human being, and the meta information is transmitted through sound waves and cannot be heard by the human being;

a judgment program module configured to determine, based on the meta information, whether the voice information is voice information sent to the receiving device;

and a decoding and execution program module configured to, if so, decode the voice information into an interaction intention and execute the interaction intention.

9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1 to 6.

10. A storage medium having stored thereon a computer program, characterized in that the program, when being executed by a processor, is adapted to carry out the steps of the method of any one of claims 1 to 6.

Technical Field

The invention belongs to the technical field of data transmission, and particularly relates to a sound information transmission method and device.

Background

With the development and popularization of the Internet of Things and artificial intelligence technology, the number of devices keeps increasing, devices are becoming more intelligent, and the need for interaction between devices, such as device discovery, automatic networking and device control, is becoming more urgent. Designing a machine-to-machine interaction method that is universal, fast, highly extensible and independent of the public network therefore has high application value.

Conventional internet-based technologies include Wifi-based networking and cellular communication technologies, e.g., 5G, 4G, 3G, NB-IoT, LoRa.

Further, there are Internet-plus-LAN technologies, e.g. connecting to Zigbee from a WIFI/wired network, going from cellular to WIFI, and going from sound waves to WIFI. Point-to-point technologies include Bluetooth, contactless radio frequency (NFC near field communication, RFID) and acoustic wave communication. As for transmission protocols, Internet-based communication between processes mainly relies on the TCP/IP and UDP protocols, while communication based on point-to-point technologies and Zigbee generally uses self-designed dedicated protocols.

In terms of networking, each networking mode has its own advantages and disadvantages; no technology is simply the best or the worst, and the technologies can only be compared in the context of a specific application scenario. WIFI-based networking, currently the most popular mode in the home environment, is complex for users to configure: before any IOT device can access the network, the SSID and the password must be configured, so WIFI-based networking is unfriendly to IOT devices without a screen.

For example, cellular communication technologies (5G, 4G) have the disadvantages of high cost, high power consumption, and no direct communication between the smartphone and the device (communication must pass through the base station). A drawback of cellular communication technology (NB-IoT) is that the current performance indicators are exaggerated and the network coverage is poor. WIFI technology has the disadvantages of complex configuration, poor stability and high power consumption.

Further, the inventor finds in the process of implementing the present application that the prior art solution has at least one or more of the following drawbacks:

The drawbacks of the Internet-plus-LAN technologies include the following. Connecting to Zigbee from a WIFI/wired network requires a gateway as a bridge between the Internet and the Zigbee network; in addition, Zigbee has a low data transmission rate, poor anti-interference performance, and complex interfacing with the IP protocol. Going from cellular to WIFI presupposes simultaneous connections to both the cellular network and WIFI, which is costly and complex to configure. Going from sound waves to WIFI has the drawback that, after the handover, WIFI must still be connected to the Internet before the device can be operated.

The drawbacks of the point-to-point technologies include the following. The various versions of Bluetooth are incompatible with each other and have poor networking capabilities. Contactless radio frequency (NFC near field communication, RFID) requires special hardware. Acoustic wave communication has a short propagation distance and a low data transmission rate, cannot specify the receiver, and is susceptible to interference. The drawbacks in data transmission include the following: information for specific functions can only be exchanged between devices designed for it in advance, so universality is lacking, and whenever devices need to transmit and understand more content, the data format of a new functional interface must be defined in advance.

The following problems still stand in the way of a highly flexible and user-friendly standard for machine-to-machine interaction:

1. Machine-to-machine interaction requires the Internet. Information interaction between machines currently has to be carried out by means of the Internet. For example, when controlling smart home devices in a household, most operations go through a gateway device over the Internet, and a new device generally has to be provisioned onto the network first. Data interaction in the prior art is therefore inconvenient and difficult.

2. Service interfaces for different tasks must be defined in advance. When one device interacts with another, an interface usually has to be agreed on beforehand in order to exchange specific content or perform a given operation, and each function in the interaction requires its own interface or parameter definition.

Disclosure of Invention

An embodiment of the present invention provides a method and an apparatus for transmitting sound information, which are used to solve at least one of the above technical problems.

In a first aspect, an embodiment of the present invention provides a sound information transmission method, used for a sending device, including: in response to an interaction intention of the transmitting device, encoding the interaction intention into natural language text; acquiring meta information corresponding to the natural language text, and converting the natural language text into voice information; transmitting the voice information to a receiving device through a sound wave based on a first sound frequency, and transmitting the meta information to the receiving device through the sound wave based on a second sound frequency, wherein the first sound frequency can be heard by a human being, and the second sound frequency cannot be heard by the human being.

In a second aspect, an embodiment of the present invention provides a sound information transmission method, used for a receiving device, including: in response to receiving voice information and meta information corresponding to the voice information, respectively acquiring the voice information and the meta information, wherein the voice information is transmitted through sound waves and can be heard by a human being, and the meta information is transmitted through sound waves and cannot be heard by the human being; determining, based on the meta information, whether the voice information is voice information sent to the receiving device; and if so, decoding the voice information into an interaction intention and executing the interaction intention.

In a third aspect, an embodiment of the present invention provides a sound information transmission apparatus for a sending device, including: a coding program module configured to code an interaction intention of the sending device into a natural language text in response to the interaction intention; an acquisition and conversion program module configured to acquire meta information corresponding to the natural language text and convert the natural language text into voice information; and a transmitting program module configured to transmit the voice information to a receiving device through sound waves based on a first sound frequency that is audible to a human being, and transmit the meta information to the receiving device through sound waves based on a second sound frequency that is not audible to the human being.

In a fourth aspect, an embodiment of the present invention provides a sound information transmission apparatus for a receiving device, including: a receiving and acquiring program module configured to acquire voice information and meta information corresponding to the voice information, respectively, in response to receiving the voice information and the meta information, wherein the voice information is transmitted through sound waves and can be heard by a human being, and the meta information is transmitted through sound waves and cannot be heard by the human being; a judgment program module configured to determine, based on the meta information, whether the voice information is voice information sent to the receiving device; and a decoding and execution program module configured to, if so, decode the voice information into an interaction intention and execute the interaction intention.

In a fifth aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of transmitting voice information according to any of the embodiments of the present invention.

In a sixth aspect, the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-volatile computer-readable storage medium, and the computer program includes program instructions, which, when executed by a computer, make the computer execute the steps of the sound information transmission method according to any embodiment of the present invention.

According to the method and the device, the interaction intention of the sending device is encoded into natural language text, the natural language text is converted into voice information and sent to the receiving device through sound waves based on the first sound frequency, and the meta information corresponding to the natural language text is acquired and sent to the receiving device through sound waves based on the second sound frequency. Devices can therefore interact through sound without a network, which makes them convenient to use, and the security and universality of the devices are further improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a method for transmitting voice information according to an embodiment of the present invention, which is used in a sending device;

fig. 2 is a flowchart of a method for transmitting voice information, which is used in a receiving device according to an embodiment of the present invention;

fig. 3 is a flowchart of another method for transmitting audio information according to an embodiment of the present invention;

fig. 4 is a diagram of sound waves at two different frequencies in a specific example of the sound information transmission method according to an embodiment of the present invention;

fig. 5 is a flowchart of a specific example of a method for transmitting sound information according to an embodiment of the present invention;

fig. 6 is a block diagram of an apparatus for transmitting sound information for a transmitting device according to an embodiment of the present invention;

fig. 7 is a block diagram of an apparatus for transmitting sound information for a receiving device according to an embodiment of the present invention;

fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, a flowchart of an embodiment of a sound information transmission method according to the present application is shown, where the sound information transmission method is used for a sending device, where the sending device may be a device with a sound generation and/or pickup function, such as a smart speaker, a smart phone, a tablet, a computer, and so on.

As shown in fig. 1, in step 101, in response to an interaction intention of the sending device, encoding the interaction intention into a natural language text;

in step 102, obtaining meta information corresponding to the natural language text, and converting the natural language text into voice information;

in step 103, the speech information is transmitted to a receiving device by sound waves based on a first sound frequency that can be heard by a human being, and the meta information is transmitted to the receiving device by sound waves based on a second sound frequency that cannot be heard by the human being.

In the present embodiment, for step 101, the sound information transmission apparatus, in response to the interaction intention of the sending device, encodes the interaction intention into a natural language text. For example, the apparatus acquires a voice instruction of the user, generates the interaction intention based on the voice instruction, and encodes the interaction intention into the natural language text: it selects words conforming to the interaction intention from a preset vocabulary, and combines those words into the natural language text based on a grammar rule.
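The vocabulary-and-grammar encoding of step 101 can be sketched as follows. This is only an illustrative sketch: the vocabulary entries, the intention fields (`action`, `target`, `value`) and the grammar rule `<action> <target> [to <value>]` are assumptions made for the example and are not specified in the present disclosure.

```python
# Hypothetical preset vocabulary; the disclosure does not define its contents.
VOCABULARY = {
    "action": {"turn_on": "turn on", "turn_off": "turn off", "set": "set"},
    "target": {"air_conditioner": "the air conditioner", "lamp": "the lamp"},
}


def encode_intent(intent: dict) -> str:
    """Encode an interaction intention as natural language text.

    Assumed grammar rule: <action> <target> [to <value>].
    """
    # Select the words that conform to the intention from the vocabulary.
    action = VOCABULARY["action"][intent["action"]]
    target = VOCABULARY["target"][intent["target"]]
    words = [action, target]
    # Optional parameter, appended according to the assumed grammar rule.
    if "value" in intent:
        words += ["to", str(intent["value"])]
    return " ".join(words)
```

For instance, an intention with action `turn_on` and target `air_conditioner` is rendered as the sentence "turn on the air conditioner", which can then be passed to speech synthesis in step 102.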

Then, for step 102, the sound information transmission apparatus acquires meta information corresponding to the natural language text, and converts the natural language text into voice information, for example by performing speech synthesis on the natural language text so that the resulting voice information can be heard and recognized by the user. The meta information is information about information, that is, information that allows the server to describe the transmitted data; with the meta information, the receiving side can determine whether the device is the intended recipient of the audio.

Finally, for step 103, the sound information transmission apparatus transmits the voice information to the receiving device through sound waves based on a first sound frequency that can be heard by a human being, and transmits the meta information to the receiving device through sound waves based on a second sound frequency that cannot be heard by the human being. For example, the voice information, as the main body, is sent to the receiving device at an audible sound frequency, so that the user can hear and recognize it, while the encoded short meta information is sent at an inaudible sound frequency. The meta information may be transmitted on a preset fixed frequency, and a plurality of mutually non-interfering frequencies may be used simultaneously as carriers of the inaudible sound.
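One possible way to place short meta information on inaudible carriers is binary frequency-shift keying, sketched below. The disclosure only states that preset fixed frequencies are used; the modulation scheme, the 48 kHz sample rate, the two carrier frequencies (chosen above the nominal 20 kHz limit of human hearing) and the 10 ms symbol length are all assumptions of this example.

```python
import math

SAMPLE_RATE = 48_000    # Hz, assumed; must exceed twice the highest carrier
FREQ_ZERO = 20_500.0    # Hz, hypothetical carrier for bit 0
FREQ_ONE = 21_500.0     # Hz, hypothetical carrier for bit 1
SYMBOL_SAMPLES = 480    # 10 ms per bit at 48 kHz (assumed)


def bytes_to_bits(data: bytes) -> list[int]:
    """Expand bytes into bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def modulate_meta(data: bytes, amplitude: float = 0.2) -> list[float]:
    """Return audio samples carrying `data`, one sine burst per bit."""
    samples = []
    for bit in bytes_to_bits(data):
        freq = FREQ_ONE if bit else FREQ_ZERO
        for n in range(SYMBOL_SAMPLES):
            samples.append(amplitude * math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
    return samples
```

With these values each symbol contains a whole number of carrier cycles (205 or 215), so the two carriers do not leak into each other within a symbol, which is one simple way to obtain frequencies that do not interfere with each other.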

According to the method, the interaction intention is encoded into natural language text, the natural language text is converted into voice information and sent to the receiving device through sound waves based on the first sound frequency, and the meta information corresponding to the natural language text is acquired and sent to the receiving device through sound waves based on the second sound frequency. Devices therefore need only a single sound-based interface to operate each other remotely through sound, without a network, which further improves the security and universality of the devices.

In the method of the above embodiment, the meta information may include: the ID of the sending device, the ID of the receiving device, and verification information of the voice information. By using the ID of the sending device, the ID of the receiving device and the verification information of the voice information contained in the meta information, the method of this embodiment allows the interaction intention to be delivered accurately to the receiving device.
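A compact binary layout for such meta information might look like the sketch below. The field widths, the byte order and the use of a CRC-32 over the natural language text as the verification information are assumptions of this example; the disclosure does not prescribe a particular encoding.

```python
import struct
import zlib

# Hypothetical layout: 16-bit sender ID, 16-bit receiver ID, 32-bit CRC-32.
META_FORMAT = ">HHI"


def build_meta(sender_id: int, receiver_id: int, text: str) -> bytes:
    """Pack the device IDs and a checksum of the spoken text into 8 bytes."""
    checksum = zlib.crc32(text.encode("utf-8"))
    return struct.pack(META_FORMAT, sender_id, receiver_id, checksum)


def parse_meta(payload: bytes) -> dict:
    """Unpack the 8-byte meta information built by build_meta."""
    sender_id, receiver_id, checksum = struct.unpack(META_FORMAT, payload)
    return {"sender_id": sender_id, "receiver_id": receiver_id, "checksum": checksum}


def verify_text(meta: dict, recognized_text: str) -> bool:
    """Check recognized text against the checksum carried in the meta information."""
    return zlib.crc32(recognized_text.encode("utf-8")) == meta["checksum"]
```

Keeping the payload to a few bytes matters here because the inaudible channel is slow; at an assumed 10 ms per bit, 8 bytes take roughly 0.64 s to send.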

In the method according to the above embodiment, transmitting the voice information through sound waves based on a first sound frequency and transmitting the meta information through sound waves based on a second sound frequency includes: synchronously transmitting the voice information of the first sound frequency and the meta information of the second sound frequency to the receiving device. For example, sound waves are emitted at the audible sound frequency and the inaudible sound frequency at the same time to complete the transmission of the information. Besides synchronous transmission, staggered transmission is also possible: for example, one of the sound waves may be transmitted a few hundred milliseconds ahead of the other, or the two may be transmitted one after the other. By synchronously transmitting the voice information of the first sound frequency and the meta information of the second sound frequency to the receiving device, the method of this embodiment increases the amount of information transmitted without increasing the amount of audible sound.
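Synchronous and staggered transmission differ only in where the meta signal starts relative to the voice signal. A minimal sketch, assuming both signals are already available as lists of samples at the same sample rate (the function name `mix` and its `meta_offset` parameter are hypothetical):

```python
def mix(voice: list[float], meta: list[float], meta_offset: int = 0) -> list[float]:
    """Sum two sample streams into one output stream.

    meta_offset = 0 gives synchronous transmission; a positive offset
    delays the meta signal by that many samples (staggered transmission).
    """
    length = max(len(voice), meta_offset + len(meta))
    out = [0.0] * length
    for i, sample in enumerate(voice):
        out[i] += sample
    for i, sample in enumerate(meta):
        out[meta_offset + i] += sample
    return out
```

At an assumed sample rate of 48 kHz, a stagger of 200 ms corresponds to `meta_offset=9600`. Because the two signals occupy different frequency ranges, summing them does not prevent the receiver from separating them again.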

With further reference to fig. 2, a flowchart of a sound information transmission method provided by an embodiment of the present application is shown, which is used for a receiving device, where the receiving device may be a device with sound pickup and/or sound generation functions, such as a smart speaker, a smart phone, a tablet, a computer, and so on.

As shown in fig. 2, in step 201, in response to receiving voice information and meta information corresponding to the voice information sent by a sending device, the voice information and the meta information are respectively acquired, wherein the voice information is transmitted through sound waves and can be heard by a human being, and the meta information is transmitted through sound waves and cannot be heard by the human being;

in step 202, judging whether the voice information is the voice information sent to the receiving device or not based on the meta information;

in step 203, if yes, the voice information is decoded into the interactive intention and the interactive intention is executed.

In this embodiment, for step 201, the sound information transmission apparatus respectively acquires the voice information and the meta information corresponding to the voice information in response to receiving the voice information and the meta information sent by the sending device. For example, after the sound waves are received, the voice information is separated from the corresponding meta information by a preset frequency threshold, yielding a voice information portion and a meta information portion, wherein the voice information is transmitted through sound waves and can be heard by a human being, and the meta information is transmitted through sound waves and cannot be heard by the human being.
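One way to recover the meta information portion is to measure the signal energy at the inaudible carrier frequencies, which lie above the frequency threshold, and ignore everything below it. The sketch below uses the Goertzel algorithm for this and assumes a two-tone scheme with one carrier per bit value; the sample rate, carrier frequencies, symbol length and threshold are assumptions of the example, not values from the disclosure.

```python
import math

SAMPLE_RATE = 48_000          # Hz, assumed
FREQ_THRESHOLD = 20_000.0     # Hz, assumed boundary between voice and meta bands
FREQ_ZERO = 20_500.0          # Hz, hypothetical carrier for bit 0
FREQ_ONE = 21_500.0           # Hz, hypothetical carrier for bit 1
SYMBOL_SAMPLES = 480          # samples per bit (assumed)


def goertzel_power(samples: list[float], freq: float) -> float:
    """Signal power at a single frequency, computed with the Goertzel algorithm."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / SAMPLE_RATE)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def demodulate_bits(samples: list[float]) -> list[int]:
    """Decide each bit by comparing the power at the two inaudible carriers."""
    bits = []
    for start in range(0, len(samples) - SYMBOL_SAMPLES + 1, SYMBOL_SAMPLES):
        window = samples[start:start + SYMBOL_SAMPLES]
        one = goertzel_power(window, FREQ_ONE)
        zero = goertzel_power(window, FREQ_ZERO)
        bits.append(1 if one > zero else 0)
    return bits
```

The audible voice portion would be obtained separately, for example with a low-pass filter below the threshold, and passed to speech recognition; that part is omitted here.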

Then, in step 202, the sound information transmission apparatus determines, based on the meta information, whether the voice information is voice information sent to the receiving device. For example, where the meta information includes at least the receiving device ID, the apparatus decodes the meta information to obtain the receiving device ID and determines whether the voice information was sent to the current receiving device.

Finally, for step 203, if the voice information was sent to the current receiving device, the voice information is decoded into the interaction intention and the interaction intention is executed. For example, the voice information is converted into natural language text, and the natural language text is decoded into the interaction intention based on the vocabulary and the grammar rule and then executed.
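Decoding the recognized text back into an interaction intention is the inverse of the sender-side encoding. The sketch below assumes that speech recognition has already produced the text, and uses a hypothetical vocabulary and the grammar rule `<action> <target>`; neither is specified in the disclosure.

```python
# Hypothetical vocabulary shared with the sending device.
ACTIONS = {"turn on": "turn_on", "turn off": "turn_off"}
TARGETS = {"the air conditioner": "air_conditioner", "the lamp": "lamp"}


def decode_text(text: str):
    """Decode natural language text into an intention, or None if it does not parse.

    Assumed grammar rule: <action> <target>.
    """
    text = text.strip().lower()
    for phrase, action in ACTIONS.items():
        if text.startswith(phrase + " "):
            target = TARGETS.get(text[len(phrase) + 1:])
            if target is not None:
                return {"action": action, "target": target}
    return None
```

Returning `None` for text outside the vocabulary gives the receiving device a clear signal that the intention could not be executed, which it can report back through sound.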

According to the method, the voice information and the meta information corresponding to the voice information are obtained, and whether the voice information is the voice information sent to the receiving equipment or not is judged based on the meta information, so that remote interaction can be achieved through sound, and the safety and the universality of the equipment are improved.

In the method of the above embodiment, the meta information may include at least a receiving device ID, and determining, based on the meta information, whether the voice information is voice information sent to the receiving device includes: determining whether the receiving device ID in the meta information matches the receiving device ID of the current receiving device. For example, if the two IDs match, the current device responds to the information sent to it; if they do not match, the interaction ends directly.

The method described in this embodiment can reduce the power consumption of other receiving devices while the current receiving device accurately responds to the sending device by determining whether the receiving device ID in the meta information matches the receiving device ID of the current receiving device.
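The ID check can act as a cheap gate in front of the more expensive decoding step, as in the sketch below. The names `should_handle` and `on_receive`, the dictionary form of the meta information and the `decode_and_execute` callback are hypothetical and serve only to illustrate the control flow.

```python
from typing import Callable, Optional


def should_handle(meta: dict, own_device_id: int) -> bool:
    """True only if the meta information names this device as the recipient."""
    return meta.get("receiver_id") == own_device_id


def on_receive(
    meta: dict,
    voice_samples: list,
    own_device_id: int,
    decode_and_execute: Callable[[list], str],
) -> Optional[str]:
    """Run the costly decode step only for messages addressed to this device."""
    if not should_handle(meta, own_device_id):
        return None  # end the interaction without touching the voice portion
    return decode_and_execute(voice_samples)
```

Devices that are not addressed never invoke speech recognition, which is where the power saving described above would come from under this design.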

Further referring to fig. 3, a flowchart of another sound information transmission method according to an embodiment of the present application is shown. The flowchart mainly shows steps that further define the flow after "decoding the voice information into the interaction intention and executing the interaction intention" in fig. 2.

As shown in fig. 3, in step 301, it is determined whether execution of the interaction intention of the voice information has been completed;

in step 302, if the execution is completed, the execution result is fed back to the sending device through sound waves;

in step 303, if the execution is not completed, the execution result is fed back to the sending device by sound wave, so as to interact with the receiving device again through the sending device.

In the present embodiment, for step 301, the sound information transmission apparatus determines whether execution of the interaction intention of the voice information has been completed. Then, in step 302, if the execution has been completed, the execution result is fed back to the sending device through sound waves. Finally, in step 303, if the execution has not been completed, the execution result is fed back to the sending device through sound waves, so that the sending device interacts with the receiving device again. Take the interaction intention of turning on the air conditioner as an example, where the receiving device is the air conditioner. If the air conditioner has been turned on, it announces that the air conditioner is on; if it has not been turned on or does not respond, it announces that the air conditioner is not on. After receiving the feedback that the air conditioner is on, the sending device ends the interaction with the receiving device; if the sending device receives the feedback that the air conditioner is not on, it regenerates the interaction intention of turning on the air conditioner and interacts with the air conditioner again.

By determining whether execution of the interaction intention of the voice information has been completed, the method of this embodiment allows the interaction intention of the sending device to be fulfilled more reliably and makes the devices more intelligent.
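On the sending side, the feedback flow amounts to a send, listen and retry loop. In the sketch below, `send` stands for the whole acoustic round trip (emit the command, then return the recognized feedback text); this callback, the exact feedback wording and the retry limit are assumptions of the example, since the disclosure does not bound the number of retries.

```python
from typing import Callable


def interact_until_done(
    send: Callable[[str], str],
    command: str,
    success_reply: str,
    max_attempts: int = 3,
) -> int:
    """Send a command and resend it until the success feedback is heard.

    Returns the number of attempts used, or 0 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        feedback = send(command)  # acoustic round trip, supplied by the caller
        if feedback == success_reply:
            return attempt
    return 0
```

In the air conditioner example, `command` would be "turn on the air conditioner" and `success_reply` would be "the air conditioner is on"; any other feedback causes the sending device to regenerate and resend the command.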

It should be noted that the above method steps are not intended to limit the execution order of the steps, and in fact, some steps may be executed simultaneously or in the reverse order of the steps, which is not limited herein.

The following description is provided to enable those skilled in the art to better understand the present disclosure by describing some of the problems encountered by the inventors in implementing the present disclosure and by describing one particular embodiment of the finally identified solution.

In the process of implementing the present application, the inventor found that the defects in the prior art are mainly caused by the following. First, information interaction between machines needs some kind of carrier or medium, and at present the most popular technology in the home environment is WIFI. Owing to the popularity of the Internet and WIFI, technicians working on IOT devices in home scenarios tend to operate the devices directly over the Internet, for example controlling a household lamp through a smart speaker. However, the data link of most products is quite complex, and existing schemes are roughly implemented as follows:

The smart speaker sends a request to the speaker vendor's network service, which in turn sends a request to the bulb company's network service; finally, the bulb company's network service relays the operation instruction to the lamp at home over the Internet. The speaker does not send the instruction to the lamp directly.

Therefore, to control the lamp, the lamp must be connected to a WIFI network that can access the Internet. The WIFI protocol stipulates that an IOT device can join the network only after network provisioning succeeds, and first-time provisioning of an IOT device requires entering a password, which makes network provisioning of IOT devices cumbersome.

In terms of data transmission, if WIFI is used as the networking technology, devices generally interact with one another over the TCP/IP protocol. With TCP/IP or a similar protocol, a protocol family, a network address and a transport-layer port must be defined before content (data packets) can be transmitted.

Content transmitted over the TCP/IP protocol is parsed by a program running on the receiving side, which requires the content sent by the sender to have exactly the structure that the receiving side expects; otherwise the receiving side cannot parse the content correctly.

At present, because devices lack a general communication medium comparable to human language, the content transmitted over TCP/IP is usually content with a specific structure agreed in advance, such as a piece of JSON or XML.

As a result, the interfaces used by different companies are often not identical even when they provide the same functionality. Therefore, for the intelligent gateways of manufacturer A and manufacturer B to both operate bulb X, the bulb manufacturer has to be compatible with the interfaces of both manufacturer A and manufacturer B.

These are long standing problems in the field.

In the related art, to reduce the complexity of WIFI network provisioning and make it easier for a device to join a wireless network, there are generally two optimization approaches.

1. Optimize the process by which an IOT device connects to WIFI for the first time. For example, the SSID and password of the WIFI network are sent to the new IOT device through an APP; the transmission mechanism may be based on Bluetooth, on acoustic transmission, or on another point-to-point networking technology.

2. Control the IOT device directly with another networking technology that is easier to provision, such as Zigbee.

In the process of implementing the present application, the inventor found the following. The first optimization approach simplifies network provisioning to some extent, but provisioning is still required in essence, which remains inconvenient. As for the second approach, some manufacturers currently use Zigbee as the networking technology in the home environment, but a Zigbee gateway can only control IOT devices of the manufacturer's own company, so it has not been widely adopted. Moreover, neither approach solves the problem that data transmission between devices depends on a network interface agreed in advance.

The inventor also found that in the Internet era the Internet is the underlying carrier of most application scenarios and technicians are familiar with it, so it has become difficult to think outside the Internet infrastructure.

There is currently no mature protocol for acoustic networking, and using sound waves to transmit data remains a challenge. Compared with electromagnetic waves, sound waves have a small bandwidth and a low transmission speed, so it is difficult for technicians to transmit a large amount of data with sound waves. In addition, sound waves in common frequency bands are easily disturbed by environmental noise, which makes data transmission harder. Unless these problems are solved, it is difficult to persuade technicians that a networking and data transmission scheme based on sound waves is feasible.

The scheme of the application is mainly designed and optimized from the following aspects:

The method is suitable for sound-wave-based data exchange between devices. It emits sound waves at an audible frequency and at an inaudible frequency synchronously to complete data transmission. Emitting the two synchronously increases the amount of data transmitted without increasing the audible sound content.

The content emitted at the audible sound frequency is synthesized speech corresponding to text generated by a specially designed natural language encoding and decoding system, which effectively reduces the difficulty of speech recognition and semantic understanding for the device. Using natural language as the message body makes it possible to be concise yet complete in meaning by relying on the rich semantics of language, which addresses the problem that encoded data is too large to be suitable for transmission over sound waves.
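A minimal sketch of such a natural language encoding and decoding step is given below. It assumes a tiny template grammar with one fixed sentence per action; the `TEMPLATES` table, the action names and the functions `encode_intent` and `decode_text` are hypothetical, and the speech synthesis and recognition stages are left out.

```python
from typing import Optional, Tuple

# Hypothetical grammar: one fixed sentence template per action.
TEMPLATES = {
    "turn_on": "please turn on the {target}",
    "turn_off": "please turn off the {target}",
}


def encode_intent(action: str, target: str) -> str:
    """Encode an interaction intention as text that follows the grammar."""
    return TEMPLATES[action].format(target=target)


def decode_text(text: str) -> Optional[Tuple[str, str]]:
    """Decode recognized text back into (action, target), or None."""
    for action, template in TEMPLATES.items():
        prefix, suffix = template.split("{target}")
        fits = text.startswith(prefix) and text.endswith(suffix)
        if fits and len(text) > len(prefix) + len(suffix):
            target = text[len(prefix):len(text) - len(suffix)]
            return action, target
    return None  # the text does not follow the agreed grammar
```

Because both sides share the same restricted grammar, the receiver only has to recognize a small set of sentence shapes, which is one way to read the reduced recognition difficulty described above.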

The content emitted at the inaudible sound frequency is short encoded meta information, including the ID of the sending device, the ID of the receiving device and verification information for the audible sound. This solves the problem that, because sound waves propagate in all directions, several devices may receive a request during transmission even though the sender may want only one particular device to respond. The inaudible sound is emitted at a fixed frequency, and several mutually non-interfering frequencies can be used simultaneously as carriers of the inaudible sound.
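One possible way to pack this meta information is sketched below. The 8-byte layout (16-bit sender ID, 16-bit receiver ID, 32-bit check value) and the use of CRC-32 as the verification information are assumptions for illustration only; the embodiment does not fix a frame format or a checksum, and modulation onto the inaudible carrier is not shown.

```python
import struct
import zlib
from typing import NamedTuple

# Assumed layout: big-endian 16-bit sender ID, 16-bit receiver ID,
# 32-bit check value computed over the audible text.
META_LAYOUT = ">HHI"


class Meta(NamedTuple):
    sender_id: int
    receiver_id: int
    check: int


def build_meta(sender_id: int, receiver_id: int, spoken_text: str) -> bytes:
    """Pack the meta information that accompanies one spoken message."""
    check = zlib.crc32(spoken_text.encode("utf-8"))
    return struct.pack(META_LAYOUT, sender_id, receiver_id, check)


def parse_meta(frame: bytes) -> Meta:
    """Unpack a frame produced by build_meta."""
    return Meta(*struct.unpack(META_LAYOUT, frame))


def should_respond(frame: bytes, my_id: int, recognized_text: str) -> bool:
    """Respond only if addressed to this device and the text checks out."""
    meta = parse_meta(frame)
    text_ok = meta.check == zlib.crc32(recognized_text.encode("utf-8"))
    return meta.receiver_id == my_id and text_ok
```

In this sketch, a device whose ID differs from the receiver ID ignores the message, and a device whose speech recognition result does not match the check value treats the audible part as corrupted.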

When the receiver receives the sound wave, it separates the sound into an audible-frequency part and an inaudible-frequency part based on a preset frequency threshold, obtaining the natural language part and the meta information data part respectively. The audible-frequency speech is recognized and decoded with the natural language encoding and decoding tool to obtain the recognition result and the semantic result. The decoding tool provided by the method of the embodiment of the application decodes the inaudible-frequency sound to obtain the metadata, from which the device can tell whether the current message is addressed to it, and the device responds only to messages addressed to it.
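The threshold-based separation can be illustrated with a small frequency analysis. The sketch below uses a naive discrete Fourier transform to find the strong frequency components of a received buffer and sorts them by a threshold. The 48 kHz sample rate, the 18 kHz threshold, the pure test tones and all function names are assumptions for the example; a practical receiver would filter real speech and a modulated carrier rather than two sine waves.

```python
import cmath
import math
from typing import List, Tuple

SAMPLE_RATE = 48_000    # assumed sampling rate in Hz
THRESHOLD_HZ = 18_000   # assumed audible/inaudible boundary in Hz


def tone(freq_hz: float, n: int) -> List[float]:
    """Generate n samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def dominant_frequencies(samples: List[float], min_magnitude: float) -> List[float]:
    """Return frequencies whose DFT magnitude reaches min_magnitude."""
    n = len(samples)
    peaks = []
    for k in range(1, n // 2):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        if abs(coeff) >= min_magnitude:
            peaks.append(k * SAMPLE_RATE / n)
    return peaks


def split_by_threshold(freqs: List[float]) -> Tuple[List[float], List[float]]:
    """Sort frequency components into (audible, inaudible) groups."""
    audible = [f for f in freqs if f < THRESHOLD_HZ]
    inaudible = [f for f in freqs if f >= THRESHOLD_HZ]
    return audible, inaudible
```

For a 480-sample buffer containing a 1 kHz tone plus a 19 kHz tone, `dominant_frequencies` with `min_magnitude=100.0` finds both components, and `split_by_threshold` places the first in the audible group and the second in the inaudible group.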

Referring to fig. 4, a diagram of sound waves at two different frequencies in a specific example of the sound information transmission method according to an embodiment of the present invention is shown.

As shown in fig. 4, first, natural language (human language) is transmitted with audible sound waves as the message body of device interaction, just as in human-to-human communication. For example, with the phrase "turn on the lamp", the information exchange between devices is completed through the rich information carried by the language itself.

Secondly, additional meta information is transmitted using ultrasound, or frequencies close to ultrasound. It includes at least information about the receiving party, optionally information about the sending party, and information for verification and error correction.

Finally, the two different frequencies of sound are combined together for transmission.

By combining the sound waves of the two channels, a human-to-human communication mode can be achieved between machines without connecting to the Internet.
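Combining the two channels can be as simple as adding the sample streams. The following sketch assumes both channels are already available as lists of floating-point samples in the range -1.0 to 1.0 at the same sample rate; the function name `mix_channels`, the zero padding of the shorter channel and the 0.5 scaling used to avoid clipping are choices made for this example only.

```python
from typing import List


def mix_channels(voice: List[float], meta: List[float]) -> List[float]:
    """Add the audible and inaudible sample streams into one signal.

    The shorter stream is padded with silence so both start together,
    and the sum is halved so the result stays within [-1.0, 1.0].
    """
    length = max(len(voice), len(meta))
    padded_voice = voice + [0.0] * (length - len(voice))
    padded_meta = meta + [0.0] * (length - len(meta))
    return [0.5 * (v + m) for v, m in zip(padded_voice, padded_meta)]
```

Since the meta information is short, its samples typically end well before the spoken sentence does, which is why padding rather than truncation is used here.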

Further referring to fig. 5, a flowchart of a specific example of the method for transmitting the sound information according to an embodiment of the present invention is shown.

As shown in fig. 5, step 1: the device 100 receives a request for turning on the light from the user (it should be noted that the request here is not necessarily a voice request, but may also be an operation based on APP or a timing condition, and the application is not limited here).

Step 2: based on lexical and grammatical rules, the device 100 generates a piece of speech that is audible to the human ear and conforms to those rules; it encodes the information of the device 200, as the message receiver, according to the protocol of the embodiment of the present application and generates a sound at a frequency inaudible to the human ear; it then combines the two sounds.

Step 3: the device 100 issues the instruction through its sound-generating equipment, and the sound propagates as sound waves to reach the device 200.

Step 4: the device 200 receives a valid sound (one conforming to the natural language based sound transmission method).

Step 5: the device 200 confirms that the message is addressed to the device 200; otherwise it does not need to parse the message.

Step 6: the device 200 obtains the recognition result and the semantic information using the natural language encoding and decoding tool.

Step 7: the device 200 fulfills the request to turn on the light and at the same time returns, as feedback, a sound conforming to the natural language based sound transmission method.
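The receiving side of this flow (steps 5 to 7) can be sketched as below. The sketch assumes the two channels have already been separated and decoded into a receiver ID and a recognized sentence; the `SoundMessage` and `Lamp` types, the sentence wording and the feedback phrases are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SoundMessage:
    receiver_id: int  # decoded from the inaudible channel
    text: str         # recognized from the audible channel


class Lamp:
    """Hypothetical device 200: a lamp that understands one sentence."""

    def __init__(self, device_id: int) -> None:
        self.device_id = device_id
        self.is_on = False

    def on_sound(self, message: SoundMessage) -> Optional[str]:
        # Step 5: ignore messages addressed to another device.
        if message.receiver_id != self.device_id:
            return None
        # Step 6: decode the recognized text into an intent.
        if message.text == "please turn on the lamp":
            # Step 7: execute the intent and answer with spoken feedback.
            self.is_on = True
            return "the lamp is turned on"
        return "the request was not understood"
```

Returning `None` models a device that stays silent when the message is meant for someone else, so that only the addressed device produces feedback.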

In the process of implementing the invention, the inventor found that it has at least the following beneficial effects. Data can be transmitted using only the most basic sound pickup and sound-generating equipment, without adding extra equipment, which effectively reduces device cost, and data exchange between devices no longer depends on a wireless network. Furthermore, because the devices understand each other's intention through the communication itself, they do not need to call each other through a network interface defined in advance; communication between devices becomes freer and more flexible, and the interaction cost and difficulty of use are reduced. Cross-device and cross-brand linkage and communication are supported naturally. In addition, because the interaction is voice-based, people can know what the devices are communicating, which improves users' sense of security and their experience. It is foreseeable that, especially for the humanoid intelligent robots that often appear in science fiction films and for smart home scenarios, the design of the embodiment of the application can make communication between devices more convenient, efficient and accurate.

In summary, the embodiments of the present application describe a natural language based sound transmission method that can solve at least one of the following problems: the encoded data is too large to be suitable for transmission over sound waves; and, when sound is transmitted, a device receiving the sound wave cannot confirm whether the sound is addressed to it unless audible sound content is added.

In the process of implementing the invention, the inventor also found that the embodiment of the application can achieve further effects. Communication is based on sound waves, so a device can respond to natural language requests as long as it has a sound pickup device. For current IOT devices, network provisioning has always been a headache for users, especially children and elderly people who are unfamiliar with Internet devices, so not requiring network provisioning undoubtedly improves the user experience of IOT devices. Meanwhile, a device exposed to the Internet may leak the user's privacy if it suffers a network attack. For some common household IOT devices, the fact that they can be operated remotely without connecting to the Internet gives users a stronger sense of security and makes the product more attractive to them.

Compared with radio waves, ordinary users are undoubtedly more familiar with sound. If interaction between devices is also based on sound, users can perceive the information when the devices interact, which reduces the sense of unfamiliarity that the technology brings to users.

In the related art, when devices interact through network interfaces, a device can interact only if it knows the other party's interface address in advance. For security reasons, a device generally does not expose its interfaces to third parties, and even when interfaces are exposed, only a small portion is opened for secondary development. The current way of integrating through network interfaces means that the interfaces form a finite set: for example, a story machine needs one interface for playing music, one for skipping to the next item, one for setting an alarm clock and one for deleting an alarm clock.

However, if the voice protocol system based on the natural language codec of the embodiment of the present application is used, a device can perform an operation directly as long as it correctly parses and understands the information it receives. The device only needs to provide one voice-based interface to interact with other devices, and even when it receives a request beyond its capability it can explicitly reject the request and state the reason.
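The contrast with per-function network interfaces can be sketched as a single entry point that accepts any recognized sentence. The `StoryMachine` class, its skill table and the reply wording below are illustrative assumptions; matching exact phrases stands in for the semantic understanding that the codec would provide.

```python
from typing import Callable, Dict


class StoryMachine:
    """Hypothetical device exposing one voice-based entry point."""

    def __init__(self) -> None:
        # Capabilities live behind the single entry point instead of
        # being published as separate network interfaces.
        self._skills: Dict[str, Callable[[], str]] = {
            "play music": lambda: "playing music",
            "next": lambda: "skipping to the next item",
            "set an alarm clock": lambda: "alarm clock set",
            "delete the alarm clock": lambda: "alarm clock deleted",
        }

    def handle_utterance(self, text: str) -> str:
        skill = self._skills.get(text.strip().lower())
        if skill is None:
            # A request beyond the device's capability is rejected
            # explicitly, together with the reason.
            return f"rejected: this device cannot '{text}'"
        return skill()
```

A caller does not need to know which capabilities exist in advance: it sends a sentence and learns from the spoken reply whether the request was carried out.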

Referring to fig. 6, a block diagram of an apparatus for transmitting sound information according to an embodiment of the present invention is shown.

As shown in fig. 6, the sound information transmission device 600 includes an encoding program module 610, an acquisition conversion program module 620, and a transmission program module 630.

The encoding program module 610 is configured to encode the interaction intention into a natural language text in response to the interaction intention of the transmitting device; the acquisition conversion program module 620 is configured to obtain meta information corresponding to the natural language text and convert the natural language text into voice information; and the transmission program module 630 is configured to transmit the voice information to a receiving device via sound waves based on a first sound frequency that is audible to a human being and to transmit the meta information to the receiving device via sound waves based on a second sound frequency that is not audible to the human being.

Referring to fig. 7, a block diagram of an apparatus for transmitting sound information for a receiving device according to an embodiment of the present invention is shown.

As shown in fig. 7, the sound information transmission device 700 includes a receiving and acquiring program module 710, a determining program module 720 and a decoding and executing program module 730.

The receiving and acquiring program module 710 is configured to, in response to receiving voice information and meta information sent by a transmitting device, respectively acquire the voice information and the meta information corresponding to the voice information, wherein the voice information is transmitted by sound waves that can be heard by a human being and the meta information is transmitted by sound waves that cannot be heard by the human being; the determining program module 720 is configured to determine, based on the meta information, whether the voice information is voice information sent to the receiving device; and the decoding and executing program module 730 is configured to, if so, decode the voice information into the interaction intention and execute the interaction intention.

It should be understood that the modules recited in fig. 6 and 7 correspond to various steps in the methods described with reference to fig. 1, 2, and 3. Thus, the operations and features described above for the method and the corresponding technical effects are also applicable to the modules in fig. 6 and 7, and are not described again here.

It is to be noted that the modules in the embodiments of the present disclosure are not intended to limit the aspects of the present disclosure, and for example, the encoding program module may be described as a module that encodes the interaction intention into a natural language text in response to the interaction intention of the transmitting device. In addition, related functional modules may also be implemented by a hardware processor, for example, the coding program module may also be implemented by a processor, which is not described herein again.

In other embodiments, the present invention further provides a non-volatile computer storage medium, where the computer storage medium stores computer-executable instructions, and the computer-executable instructions may execute the sound information transmission method in any of the above method embodiments;

as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:

in response to an interaction intention of the transmitting device, encoding the interaction intention into natural language text;

acquiring meta information corresponding to the natural language text, and converting the natural language text into voice information;

As another embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:

judging whether the voice information is the voice information sent to the receiving equipment or not based on the meta information;

and if so, decoding the voice information into an interaction intention and executing the interaction intention.

The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created according to the use of the sound information transmission device, and the like. Further, the non-volatile computer-readable storage medium may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memory located remotely from the processor, which may be connected to the sound information transmission device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Embodiments of the present invention also provide a computer program product, which includes a computer program stored on a non-volatile computer-readable storage medium, where the computer program includes program instructions that, when executed by a computer, cause the computer to execute any one of the above-mentioned sound information transmission methods.

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 8, the electronic device includes one or more processors 810 and a memory 820, with one processor 810 taken as an example in fig. 8. The electronic device for the sound information transmission method may further include an input device 830 and an output device 840. The processor 810, the memory 820, the input device 830, and the output device 840 may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 8. The memory 820 is a non-volatile computer-readable storage medium as described above. The processor 810 executes the various functional applications and data processing of the device by running the non-volatile software programs, instructions and modules stored in the memory 820, thereby implementing the sound information transmission method of the above-described method embodiments. The input device 830 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the sound information transmission device. The output device 840 may include a display device such as a display screen.

The product can execute the method provided by the embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the method provided by the embodiment of the present invention.

As an embodiment, the electronic device is applied to a sound information transmission device, and is used for a client, and includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to:

in response to an interaction intention of the transmitting device, encoding the interaction intention into natural language text;

acquiring meta information corresponding to the natural language text, and converting the natural language text into voice information;

As another embodiment, the electronic device is applied to a sound information transmission apparatus for a client, and includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to:

judging whether the voice information is the voice information sent to the receiving equipment or not based on the meta information;

and if so, decoding the voice information into an interaction intention and executing the interaction intention.

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) A mobile communication device: such devices are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communication. Such terminals include smart phones (e.g., iPhone), multimedia phones, feature phones, and low-end phones, among others.

(2) An ultra mobile personal computer device: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.

(3) A portable entertainment device: such devices can display and play multimedia content. Such devices include audio and video players (e.g., iPod), handheld game consoles, e-book readers, as well as smart toys and portable car navigation devices.

(4) A server: a server is similar in architecture to a general-purpose computer, but has higher requirements on processing capability, stability, reliability, security, scalability, manageability and the like because it needs to provide highly reliable services.

(5) Other electronic devices with data interaction functions.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
