Data processing method, mobile terminal and storage medium

Document No.: 154802  Published: 2021-10-26

Reading note: "Data processing method, mobile terminal and storage medium" (数据处理方法、移动终端及存储介质) was designed by 张小龙 on 2021-07-09. The application discloses a data processing method comprising: when an audio data generation instruction is triggered on a mobile terminal, acquiring target text data corresponding to the audio data generation instruction and initial audio data associated with the target text data; and generating target audio data corresponding to the target text data according to the association relationship between the target text data and the initial audio data. The application also provides a mobile terminal and a storage medium. The application implements conversion between text data and audio data, so that when the text is modified the corresponding audio data is modified accordingly, and the text data and the audio data remain consistent.

1. A data processing method, characterized in that the processing method comprises:

when an audio data generation instruction is triggered on a mobile terminal, acquiring target text data corresponding to the audio data generation instruction and initial audio data associated with the target text data;

and generating target audio data corresponding to the target text data according to the association relationship between the target text data and the initial audio data.

2. The data processing method of claim 1, wherein the target text data is text data obtained by editing initial text data associated with the initial audio data.

3. The data processing method of claim 2, wherein the association relationship between the initial audio data and the initial text data is generated when the initial audio data is converted into the initial text data.

4. The data processing method according to any one of claims 1 to 3, wherein the step of generating target audio data corresponding to the target text data according to the association relationship between the target text data and the initial audio data comprises:

determining audio data segments to be edited in the initial audio data and the editing types of the audio data segments to be edited based on the association relationship between the target text data and the initial audio data;

and editing the audio data segment to be edited corresponding to the initial audio data based on the editing type to generate target audio data corresponding to the target text data.

5. The data processing method according to claim 4, wherein the step of editing the audio data segment to be edited in the initial audio data based on the editing type to generate target audio data corresponding to the target text data comprises:

when the editing type is deleting, deleting the audio data segment to be edited from the initial audio data, and splicing the audio data remaining after the deletion to generate target audio data corresponding to the target text data; or

when the editing type is copying and pasting, copying the audio data segment to be edited in the initial audio data, and pasting the copied audio data segment into the initial audio data to generate target audio data corresponding to the target text data; or

when the editing type is moving, determining a target position of the audio data segment to be edited according to the target text data, wherein the target position comprises at least one of a starting time point and an ending time point, and moving the audio data segment to be edited to the target position in the initial audio data to generate target audio data corresponding to the target text data.

6. A data processing method, characterized in that the processing method comprises:

determining initial audio data based on a text conversion instruction triggered by the mobile terminal;

converting the initial audio data into corresponding initial text data;

and storing the initial text data and the initial audio data in a correlated mode.

7. The data processing method of claim 6, wherein the step of saving the initial text data in association with the initial audio data comprises:

storing the text data segments in the initial text data and the audio data segments in the target audio data in one-to-one correspondence.

8. The data processing method of claim 7, wherein the step of storing the text data segments in the initial text data in one-to-one correspondence with the audio data segments in the target audio data comprises:

dividing the initial text into at least two text segments according to the generation sequence of the initial text data;

dividing the target audio data into audio data segments corresponding to the text segments one by one according to the time sequence of the initial audio data;

and storing each text segment and each audio data segment in a one-to-one association manner.

9. A mobile terminal, characterized in that the mobile terminal comprises: memory, processor, wherein the memory has stored thereon a processing program which, when executed by the processor, implements the steps of the data processing method according to any one of claims 1 to 8.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the data processing method according to any one of claims 1 to 8.

Technical Field

The present application relates to the field of electronic technologies, and in particular, to a data processing method, a mobile terminal, and a storage medium.

Background

As the pace of work and life accelerates, people seek efficiency in every situation. For example, meetings, video conferences and audio conferences are increasingly recorded, and the audio data is later converted into text that serves as the meeting summary. However, in the course of conceiving and implementing the present application, the inventors found at least the following problem: conversion works in one direction only, from audio data to text. When the text is modified, the audio data cannot be modified accordingly, so the audio data becomes inconsistent with the text, which reduces the usefulness of converting audio data into text.

The foregoing description is provided for general background information and is not admitted to be prior art.

Disclosure of Invention

In view of the above technical problems, the present application provides a data processing method, a mobile terminal, and a storage medium, so that text and audio data can be converted into each other, and the audio data can be modified in time after the text is modified.

In order to solve the above technical problem, the present application provides a data processing method, where the data processing method includes:

when an audio data generation instruction is triggered on a mobile terminal, acquiring target text data corresponding to the audio data generation instruction and initial audio data associated with the target text data;

and generating target audio data corresponding to the target text data according to the association relationship between the target text data and the initial audio data.

Optionally, the target text data is text data obtained by editing initial text data associated with the initial audio data.

Optionally, the association relationship between the initial audio data and the initial text data is generated based on the initial audio data being converted into the initial text data.

Optionally, the step of generating target audio data corresponding to the target text data according to the association relationship between the target text data and the initial audio data includes:

determining audio data segments to be edited in the initial audio data and the editing types of the audio data segments to be edited based on the association relationship between the target text data and the initial audio data;

and editing the audio data segment to be edited corresponding to the initial audio data based on the editing type to generate target audio data corresponding to the target text data.
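The determination step above can be illustrated with a minimal Python sketch. It is not the method of the application: the `Segment` record, the `find_edits` function, and the choice of comparing segment texts with `difflib.SequenceMatcher` are all assumptions made for illustration. The sketch also assumes that segment texts are unique and ignores text that has no counterpart in the initial audio data.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical association record: a text segment and its span in the initial audio."""
    text: str
    start: float  # seconds
    end: float    # seconds


def find_edits(initial, target_texts):
    """Return (editing_type, segment) pairs implied by the edited text segments."""
    initial_texts = [seg.text for seg in initial]
    by_text = {seg.text: seg for seg in initial}  # assumes unique segment texts
    matcher = SequenceMatcher(a=initial_texts, b=target_texts, autojunk=False)

    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(initial[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(target_texts[j1:j2])

    edits = []
    for seg in removed:
        if seg.text in added:
            # Removed in one place and re-inserted elsewhere: treat as a move.
            added.remove(seg.text)
            edits.append(("move", seg))
        else:
            edits.append(("delete", seg))
    for text in added:
        # Inserted text that still exists in the initial data: treat as copy and paste.
        if text in by_text:
            edits.append(("copy_paste", by_text[text]))
    return edits


initial = [Segment("hello", 0.0, 1.0), Segment("world", 1.0, 2.0), Segment("again", 2.0, 3.0)]
# Dropping "world" from the text reports the "world" segment with editing type "delete".
print(find_edits(initial, ["hello", "again"]))
```

Each returned pair names an editing type and the audio span it applies to, which is the input the subsequent editing step needs.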

Optionally, the step of editing the audio data segment to be edited in the initial audio data based on the editing type to generate target audio data corresponding to the target text data includes:

when the editing type is deleting, deleting the audio data segment to be edited from the initial audio data, and splicing the audio data remaining after the deletion to generate target audio data corresponding to the target text data; or

when the editing type is copying and pasting, copying the audio data segment to be edited in the initial audio data, and pasting the copied audio data segment into the initial audio data to generate target audio data corresponding to the target text data; or

when the editing type is moving, determining a target position of the audio data segment to be edited according to the target text data, wherein the target position comprises at least one of a starting time point and an ending time point, and moving the audio data segment to be edited to the target position in the initial audio data to generate target audio data corresponding to the target text data.
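The three editing types can be sketched on raw audio samples as follows. This is an illustrative Python sketch under stated assumptions, not the implementation of the application: audio is modelled as a plain sequence of samples at a fixed sample rate, the function names are hypothetical, the target position is given as a starting time point only, and for a move that time point is measured on the timeline left after the segment has been removed.

```python
def _to_index(seconds, rate):
    """Convert a time point in seconds to a sample index."""
    return int(round(seconds * rate))


def delete_segment(samples, rate, start, end):
    """Remove [start, end) and splice the remaining audio together."""
    a, b = _to_index(start, rate), _to_index(end, rate)
    return samples[:a] + samples[b:]


def copy_paste_segment(samples, rate, start, end, target_start):
    """Copy [start, end) and paste the copy at target_start."""
    a, b = _to_index(start, rate), _to_index(end, rate)
    t = _to_index(target_start, rate)
    return samples[:t] + samples[a:b] + samples[t:]


def move_segment(samples, rate, start, end, target_start):
    """Cut [start, end) and re-insert it at target_start of the remaining audio."""
    a, b = _to_index(start, rate), _to_index(end, rate)
    clip = samples[a:b]
    rest = samples[:a] + samples[b:]
    t = _to_index(target_start, rate)
    return rest[:t] + clip + rest[t:]


# Toy audio: six samples at 1 sample per second.
audio = [0, 1, 2, 3, 4, 5]
print(delete_segment(audio, 1, 2, 4))         # [0, 1, 4, 5]
print(copy_paste_segment(audio, 1, 0, 2, 4))  # [0, 1, 2, 3, 0, 1, 4, 5]
print(move_segment(audio, 1, 0, 2, 2))        # [2, 3, 0, 1, 4, 5]
```

Slicing keeps the sketch independent of any audio library; a real implementation would also need to handle encoded formats and smoothing at the splice points, which the source does not describe.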

The application also provides a data processing method, which comprises the following steps:

determining initial audio data based on a text conversion instruction triggered by the mobile terminal;

converting the initial audio data into corresponding initial text data;

and storing the initial text data and the initial audio data in a correlated mode.

Optionally, the step of saving the initial text data in association with the initial audio data includes:

storing the text data segments in the initial text data and the audio data segments in the target audio data in one-to-one correspondence.

Optionally, the step of performing one-to-one correspondence storage on the text data segment in the initial text data and the audio data segment in the target audio data includes:

dividing the initial text into at least two text segments according to the generation sequence of the initial text data;

dividing the target audio data into audio data segments corresponding to the text segments one by one according to the time sequence of the initial audio data;

and storing each text segment and each audio data segment in a one-to-one association manner.
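The segmentation and one-to-one storage described above might look like the following Python sketch. The function name `associate_segments`, the use of a list of time boundaries to divide the audio, and the dictionary used as the store are assumptions for illustration; the source does not specify how segment boundaries are obtained or how the association is persisted.

```python
def associate_segments(text_segments, boundaries):
    """Pair each text segment (in generation order) with one audio time span.

    text_segments: list of strings in the order the text was generated.
    boundaries: increasing time points in seconds, one more than the number
        of text segments, dividing the audio in time order.
    Returns {segment_index: (text, start_seconds, end_seconds)}.
    """
    if len(text_segments) < 2:
        raise ValueError("at least two text segments are required")
    if len(boundaries) != len(text_segments) + 1:
        raise ValueError("need exactly one audio span per text segment")
    if any(b <= a for a, b in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be strictly increasing")
    return {
        i: (text, boundaries[i], boundaries[i + 1])
        for i, text in enumerate(text_segments)
    }


table = associate_segments(["Good morning", "Let us begin"], [0.0, 1.5, 3.2])
print(table[1])  # ('Let us begin', 1.5, 3.2)
```

Keying the store by segment index preserves both the generation order of the text and the time order of the audio, so a later edit to one text segment can be traced to exactly one audio span.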

The present application further provides a mobile terminal, the mobile terminal including a memory and a processor, wherein the memory stores a processing program, and the processing program, when executed by the processor, implements the steps of the data processing method described above.

The present application also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the data processing method as described above.

As described above, the data processing method of the present application implements conversion between text data and audio data, so that when the text is modified the corresponding audio data is modified accordingly, and consistency between the text data and the audio data is maintained.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application. To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly described below; those skilled in the art can obtain other drawings from these drawings without inventive effort.

Fig. 1 is a schematic hardware structure diagram of a mobile terminal implementing various embodiments of the present application;

Fig. 2 is a communication network system architecture diagram according to an embodiment of the present application;

Fig. 3 is a flowchart illustrating a data processing method according to the first embodiment;

Fig. 4 is a schematic diagram of a recording and audio conversion system of a mobile terminal implementing various embodiments of the present application;

Fig. 5 is a flowchart of a detailed embodiment of step S20 in Fig. 3;

Fig. 6 is a flowchart illustrating a data processing method according to the second embodiment.

The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings. The above figures show specific embodiments of the present application, which are described in more detail below. These drawings and the written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the recitation of an element by the phrase "comprising a …" does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element. Optionally, identically named components, features, and elements in different embodiments of the present application may have different meanings, as determined by their interpretation in the embodiment or by their further context within the embodiment.

It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope herein. The word "if" as used herein may be interpreted as "when", "upon" or "in response to a determination", depending on the context. Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups thereof. The terms "or," "and/or," "including at least one of the following," and the like, as used herein, are to be construed as inclusive, meaning any one or any combination. For example, "includes at least one of: A, B, C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C"; as a further example, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C". An exception to this definition occurs only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.

It should be understood that, although the steps in the flowcharts in the embodiments of the present application are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times and in different orders, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

The word "if", as used herein, may be interpreted as "when" or "upon" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.

It should be noted that step numbers such as S10 and S20 are used herein for the purpose of more clearly and briefly describing the corresponding content, and do not constitute a substantial limitation on the sequence, and those skilled in the art may perform S20 first and then S10 in specific implementation, which should be within the scope of the present application.

It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for the convenience of description of the present application, and have no specific meaning in themselves. Thus, "module", "component" or "unit" may be used mixedly.

The mobile terminal may be implemented in various forms. For example, the mobile terminal described in the present application may include mobile terminals such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), a navigation device, a wearable device, a smart band, a pedometer, and the like, and fixed terminals such as a Digital TV, a desktop computer, and the like.

The following description will be given taking a mobile terminal as an example, and it will be understood by those skilled in the art that the configuration according to the embodiment of the present application can be applied to a fixed type terminal in addition to elements particularly used for mobile purposes.

Referring to Fig. 1, which is a schematic diagram of a hardware structure of a mobile terminal for implementing various embodiments of the present application, the mobile terminal 100 may include: RF (Radio Frequency) unit 101, WiFi module 102, audio output unit 103, A/V (audio/video) input unit 104, sensor 105, display unit 106, user input unit 107, interface unit 108, memory 109, processor 110, and power supply 111. Those skilled in the art will appreciate that the mobile terminal architecture shown in Fig. 1 is not intended to be limiting of mobile terminals, which may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

The following describes each component of the mobile terminal in detail with reference to fig. 1:

The radio frequency unit 101 may be configured to receive and transmit signals during information transmission and reception or during a call. Specifically, it receives downlink information from a base station and forwards it to the processor 110 for processing, and it transmits uplink data to the base station. Typically, the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. Optionally, the radio frequency unit 101 may also communicate with a network and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplex Long Term Evolution), and TDD-LTE (Time Division Duplex Long Term Evolution).

WiFi belongs to short-distance wireless transmission technology, and the mobile terminal can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 102, and provides wireless broadband internet access for the user. Although fig. 1 shows the WiFi module 102, it is understood that it does not belong to the essential constitution of the mobile terminal, and may be omitted entirely as needed within the scope not changing the essence of the invention.

The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the WiFi module 102 or stored in the memory 109 into an audio signal and output as sound when the mobile terminal 100 is in a call signal reception mode, a call mode, a recording mode, a voice recognition mode, a broadcast reception mode, or the like. Also, the audio output unit 103 may also provide audio output related to a specific function performed by the mobile terminal 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 103 may include a speaker, a buzzer, and the like.

The A/V input unit 104 is used to receive audio or video signals. The A/V input unit 104 may include a Graphics Processing Unit (GPU) 1041 and a microphone 1042. The graphics processor 1041 processes image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or other storage medium) or transmitted via the radio frequency unit 101 or the WiFi module 102. The microphone 1042 may receive sound in a phone call mode, a recording mode, a voice recognition mode, or the like, and can process such sound into audio data. In the phone call mode, the processed audio (voice) data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 101 and output. The microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the course of receiving and transmitting audio signals.

The mobile terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Optionally, the light sensor includes an ambient light sensor that may adjust the brightness of the display panel 1061 according to the brightness of ambient light, and a proximity sensor that may turn off the display panel 1061 and/or the backlight when the mobile terminal 100 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.

The display unit 106 is used to display information input by a user or information provided to the user. The Display unit 106 may include a Display panel 1061, and the Display panel 1061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.

The user input unit 107 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile terminal. Optionally, the user input unit 107 may include a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, may collect a touch operation performed by a user on or near the touch panel 1071 (e.g., an operation performed by the user on or near the touch panel 1071 using a finger, a stylus, or any other suitable object or accessory), and drive a corresponding connection device according to a predetermined program. The touch panel 1071 may include two parts: a touch detection device and a touch controller. Optionally, the touch detection device detects the touch orientation of the user, detects a signal caused by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 110, and can receive and execute commands sent by the processor 110. Optionally, the touch panel 1071 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 1071, the user input unit 107 may include other input devices 1072. Optionally, the other input devices 1072 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.

Alternatively, the touch panel 1071 may cover the display panel 1061, and when the touch panel 1071 detects a touch operation thereon or nearby, the touch panel 1071 transmits the touch operation to the processor 110 to determine the type of the touch event, and then the processor 110 provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although the touch panel 1071 and the display panel 1061 are shown in fig. 1 as two separate components to implement the input and output functions of the mobile terminal, in some embodiments, the touch panel 1071 and the display panel 1061 may be integrated to implement the input and output functions of the mobile terminal, and is not limited herein.

The interface unit 108 serves as an interface through which at least one external device is connected to the mobile terminal 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the mobile terminal 100 or may be used to transmit data between the mobile terminal 100 and external devices.

The memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a program storage area and a data storage area. Optionally, the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, and the like), and the like; the data storage area may store data created through use of the mobile phone (such as audio data, a phonebook, etc.), and the like. Optionally, the memory 109 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.

The processor 110 is the control center of the mobile terminal. It connects the various parts of the entire mobile terminal using various interfaces and lines, and performs the functions of the mobile terminal and processes data by running or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby monitoring the mobile terminal as a whole. The processor 110 may include one or more processing units. Preferably, the processor 110 may integrate an application processor and a modem processor; optionally, the application processor mainly handles the operating system, user interfaces, application programs, etc., and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may also not be integrated into the processor 110.

The mobile terminal 100 may further include a power supply 111 (e.g., a battery) for supplying power to the various components. Preferably, the power supply 111 may be logically connected to the processor 110 via a power management system, so that charging, discharging, and power consumption are managed through the power management system.

Although not shown in fig. 1, the mobile terminal 100 may further include a bluetooth module or the like, which is not described in detail herein.

In order to facilitate understanding of the embodiments of the present application, a communication network system on which the mobile terminal of the present application is based is described below.

Referring to Fig. 2, which is an architecture diagram of a communication network system according to an embodiment of the present application, the communication network system is an LTE system of universal mobile telecommunications technology. The LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and an IP service 204 of an operator, which are communicatively connected in sequence.

Optionally, the UE201 may be the terminal 100 described above, and is not described herein again.

The E-UTRAN202 includes an eNodeB2021 and other eNodeBs2022, among others. Optionally, the eNodeB2021 may be connected with the other eNodeBs2022 through a backhaul (e.g., X2 interface), the eNodeB2021 is connected to the EPC203, and the eNodeB2021 may provide the UE201 with access to the EPC203.

The EPC203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving Gateway) 2034, a PGW (PDN Gateway) 2035, a PCRF (Policy and Charging Rules Function) 2036, and the like. Optionally, the MME2031 is a control node that handles signaling between the UE201 and the EPC203, providing bearer and connection management. The HSS2032 provides registers, such as a home location register (not shown), for management functions, and holds subscriber-specific information about service characteristics, data rates, etc. All user data may be sent through the SGW2034. The PGW2035 may provide IP address assignment and other functions for the UE201. The PCRF2036 is the policy decision point for policy and charging control of service data flows and IP bearer resources; it selects and provides available policy and charging control decisions for a policy and charging enforcement function (not shown).

The IP services 204 may include the internet, intranets, IMS (IP Multimedia Subsystem), or other IP services, among others.

Although the LTE system is described as an example, it should be understood by those skilled in the art that the present application is not limited to the LTE system, but may also be applied to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems.

Based on the above mobile terminal hardware structure and communication network system, various embodiments of the present application are provided.

First embodiment

Referring to fig. 3, the data processing method provided in the present application includes the following steps:

step S10, when an audio data generation instruction is triggered based on the mobile terminal, acquiring target text data corresponding to the audio data generation instruction and initial audio data associated with the target text data;

step S20, generating target audio data corresponding to the target text data according to the association relationship between the target text data and the initial audio data.

Optionally, the target text data is text data obtained by editing initial text data associated with the initial audio data.

The mobile terminal of the embodiment can be a mobile phone, a palm computer, a recording pen with a display screen, and the like.

Referring to fig. 4 in combination, the mobile terminal includes a recording module 10, an audio data conversion module 20, a text editing module 60, a mapping module 30, and an audio data clipping module 50.

Optionally, the recording interface of the recording module 10 includes a recording control, a playing control, a pause control, and an adjusting anchor point, and the recording is controlled by the recording control, and in the recording process, the recording can be paused based on the pause control. After the recording is finished, the recording can be played based on the playing control, and in the playing process, the recording can be controlled to be paused based on the pause control. In the playing process, the playing progress can be adjusted based on the adjustment anchor point.

Optionally, the audio data conversion module 20 is configured to convert the recorded audio data into text data. The audio data conversion module 20 is connected to the recording module 10 to realize transmission of the audio data recorded by the recording module 10. After receiving the audio data, the audio data conversion module 20 converts the audio data into text data.

Optionally, the mapping module 30 is connected to the audio data conversion module 20, and the mapping module 30 is configured to establish a mapping relationship between the audio data and the text data in the process of converting the audio data into the text data. Optionally, the mapping module 30 establishes the mapping relationship by mapping audio data segments and text data segments of a preset length one to one. Optionally, the mapping relationship includes content mapping and position mapping, that is, identical content in the text data and in the audio data is mapped, and/or the position of that content in the text data is mapped to its position in the audio data. Optionally, the preset length may be one character or at least two characters, or the preset length may be determined based on sentence length, for example one sentence of audio mapped to one sentence of text, or the audio of one character mapped to one character of text.
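The one-to-one mapping described above can be pictured as a list of records, each pairing a text data segment with the time span of its audio data segment. The following is a minimal sketch only; the names `SegmentMapping` and `build_mapping`, the millisecond units, and the `(text, start, end)` input format are assumptions made for illustration and are not specified in the present application.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SegmentMapping:
    """One text data segment paired with the audio span it was converted from."""
    index: int      # position mapping: order of the segment in the text
    text: str       # content mapping: the converted text
    start_ms: int   # start of the audio data segment (assumed unit: ms)
    end_ms: int     # end of the audio data segment (assumed unit: ms)


def build_mapping(recognized: List[Tuple[str, int, int]]) -> List[SegmentMapping]:
    """Build the mapping from (text, start_ms, end_ms) triples.

    The triples stand in for whatever the conversion step reports;
    their format is an assumption of this sketch.
    """
    return [
        SegmentMapping(index=i, text=text, start_ms=start, end_ms=end)
        for i, (text, start, end) in enumerate(recognized)
    ]


mapping = build_mapping([
    ("Good morning.", 0, 1200),
    ("Let us begin.", 1200, 2500),
])
```

Here each record carries both the content (the text) and the position (the index and time span), so either kind of mapping can be looked up later.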

Optionally, the text editing module 60 is connected to the audio data conversion module 20, and is configured to edit the text data. For example, after the audio data conversion module 20 converts the recorded audio data into text data, the user can edit the text data, such as deleting or adjusting the position or copying a certain segment of text, so that the text data better meets the user's requirements and has stronger readability.

Optionally, the text editing module 60 supports cutting, copying, pasting, and sharing operations on the text.

Optionally, the audio data clipping module 50 is configured to clip the audio data. The audio data clipping module 50 is connected to the text editing module 60. After editing the text data, the text editing module 60 transmits the edited target text data to the audio data clipping module 50, and the audio data clipping module 50 clips the initial audio data according to the mapping relationship between the target text data and the initial audio data to obtain the target audio data corresponding to the target text data. In this way, even if the text data is modified, the audio data and the text data remain consistent.
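As an illustration of the clipping step, the sketch below keeps only selected time ranges of an uncompressed WAV recording and joins them in order, using Python's standard `wave` module. The WAV container, the function name `clip_wav`, and the millisecond ranges are assumptions of this sketch; the present application does not specify an audio format.

```python
import io
import wave
from typing import List, Tuple


def clip_wav(wav_bytes: bytes, keep_ms: List[Tuple[int, int]]) -> bytes:
    """Return a new WAV holding only the kept time ranges, joined in order."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        chunks = []
        for start_ms, end_ms in keep_ms:
            # Convert milliseconds to frame positions before seeking.
            src.setpos(start_ms * rate // 1000)
            chunks.append(src.readframes((end_ms - start_ms) * rate // 1000))

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)
        # The frame count in the header is corrected when the writer closes.
        dst.writeframes(b"".join(chunks))
    return out.getvalue()
```

The ranges passed as `keep_ms` would come from the audio data segments still mapped to text data segments in the target text data.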

Optionally, after the audio data conversion module 20 of the mobile terminal converts the audio data into text data, a text corresponding to the converted text data may be displayed based on a display interface. And after the user edits the text, in order to enable the text data to be consistent with the audio data, the audio data corresponding to the edited text data can be generated based on an audio generation control of a display interface.

Optionally, the process of processing data when converting text into audio in this embodiment includes, but is not limited to, one of the following:

When the mobile terminal detects that the "generate audio" control is triggered, the mobile terminal determines that an audio data generation instruction has been received, and determines the target text data according to the audio data generation instruction. Optionally, each display interface of text data has a "generate audio" control; when the control is triggered on the mobile terminal, the text data displayed in the interface where the control is located is the target text data. That is, in this embodiment, the "generate audio" control is associated with each piece of text data, and the associated target text data can be obtained based on the triggered "generate audio" control.

After determining target text data, the mobile terminal calls initial audio data associated with the target text data based on the mapping relation between the text data and the audio data, and then generates target audio data corresponding to the target text data based on the association relation between the target text data and the initial audio data.

Optionally, the association relationship between the initial audio data and the initial text data is generated based on the initial audio data being converted into the initial text data.

Optionally, the association relationship includes an overall association between the initial audio data and the initial text data, or the association relationship includes a one-to-one association between the audio data segments in the initial audio data and the text data segments in the initial text data. Optionally, the initial audio data may be audio data generated during recording, or may be audio data generated by a previous edit of the text data. That is, when the text data modified last time is modified again, the audio data generated by the last modification may be used as the initial audio data to be cut, so as to reduce the data that needs to be processed.

Optionally, the manner of generating the target audio data corresponding to the target text data based on the association relationship between the target text data and the initial audio data includes, but is not limited to, the following manners:

For example, the audio corresponding to each character in the target text data is extracted based on the preset mapping relationship between characters and audio, and the extracted audio is combined according to the arrangement sequence of the characters to form the audio data corresponding to the target text data; the initial audio data is then cut according to that audio data to form the target audio data.

Or, for example, based on the association relationship established when the initial text data was generated from the initial audio data, the corresponding audio data is extracted according to the target text data, and the target audio data is then generated from the extracted audio data.

Or as shown in the second embodiment.
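The first manner above, extracting the audio mapped to each text data segment and combining it in text order, can be sketched as follows. This is an illustrative sketch only: the function name `generate_target_audio`, the use of sample index ranges, and keying the mapping by segment text (which assumes each segment text is unique) are assumptions and not part of the present application.

```python
from typing import Dict, List, Tuple


def generate_target_audio(
    target_segments: List[str],
    initial_audio: List[int],
    mapping: Dict[str, Tuple[int, int]],
) -> List[int]:
    """Concatenate the audio slices mapped to each target text segment, in text order.

    `mapping` maps a text segment to a (start, end) sample range in
    `initial_audio`. Segments without a mapping are skipped.
    """
    target_audio: List[int] = []
    for segment in target_segments:
        span = mapping.get(segment)
        if span is None:
            continue  # no recorded audio for this text; nothing to extract
        start, end = span
        target_audio.extend(initial_audio[start:end])
    return target_audio


initial_audio = [1, 1, 2, 2, 2, 3]                  # toy samples
mapping = {"A": (0, 2), "B": (2, 5), "C": (5, 6)}   # hypothetical mapping
target_audio = generate_target_audio(["C", "A"], initial_audio, mapping)
```

In this toy run, segment "B" was deleted from the text and "C" was moved ahead of "A", so the result holds the samples of "C" followed by those of "A".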

Optionally, the text data includes text data segments and space data segments, and converting the text data into audio data includes converting both the text data segments and the space data segments. When processing an audio file, a user often needs to edit it, for example to delete large blanks and useless information from a recording file so as to improve the transmission efficiency of the audio information. In the exemplary technique, audio data segments cannot be deleted. In this embodiment, because text data can be converted into audio data, the user can convert the audio into text through the audio data conversion module 20, delete the blanks and useless information in the text, and then clip the audio through the mapping relationship between the text and the audio. Blank sections in the audio data are thereby reduced, the audio clipping efficiency is improved, and the user experience is improved.

In this embodiment, the text data is associated with the initial audio data, so that if the text data is modified, the target audio data corresponding to the modified text data can be generated based on the association relationship between the text data and the initial audio data. Conversion between the text data and the audio data is thus realized: when the text is modified, the corresponding audio data is modified accordingly, and the consistency of the text data and the audio data is maintained.

Second embodiment

Referring to fig. 5, a second embodiment of the processing method is provided based on the above embodiments, where the step of generating target audio data corresponding to the target text data according to the association relationship between the target text data and the initial audio data includes:

step S21, determining audio data segments to be edited in the initial audio data and the editing type of each audio data segment to be edited based on the incidence relation between the target text data and the initial audio data;

step S22, editing the audio data segment to be edited corresponding to the initial audio data based on the editing type, and generating target audio data corresponding to the target text data.

The present embodiment is one of embodiments in which target audio data corresponding to the target text data is generated according to the association relationship between the target text data and the initial audio data. In this embodiment, when the initial audio data is converted into the initial text data, the audio data segments with preset lengths in the audio data are in one-to-one correspondence with the text data segments of the initial text data, so that when the initial text data segments are edited, the audio data segments corresponding to the edited text data segments can be determined based on the association relationship between the audio data segments and the text data segments, and thus, the audio data segments can be edited, so that the text data and the audio data are modified synchronously.

Optionally, the edit type includes, but is not limited to, deletion, copy-paste, and position adjustment (moving). Different edit types correspond to different edits of the audio data. Optionally, the edit type applied to the audio data is the same as the edit type the user applied to the text data. Therefore, before the target audio data is generated, the text data segments edited by the user and the edit type of each are obtained; when the target audio data is generated, the audio data segment to be edited is determined according to the mapping relationship between text data segments and audio data segments, and that audio data segment is edited with the same edit type. The initial audio data is edited in sequence based on the target text data, and the results are spliced to form the target audio data.
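One way to recover the edited text data segments and their edit types is to compare the initial and target segment lists. The sketch below uses `difflib.SequenceMatcher` from the Python standard library for that comparison; this is a swapped-in technique chosen for illustration, not the detection method of the present application, and the function name `classify_edits` and the label strings are hypothetical. It assumes segments can be matched by their text.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def classify_edits(initial: List[str], target: List[str]) -> List[Tuple[str, str]]:
    """Label each changed segment as 'delete', 'move', 'copy-paste' or 'add'."""
    deleted: List[str] = []
    inserted: List[str] = []
    matcher = SequenceMatcher(a=initial, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(initial[i1:i2])
        if tag in ("insert", "replace"):
            inserted.extend(target[j1:j2])

    edits: List[Tuple[str, str]] = []
    for seg in inserted:
        if seg in deleted:
            deleted.remove(seg)              # removed in one place, added in another
            edits.append(("move", seg))
        elif seg in initial:
            edits.append(("copy-paste", seg))  # still present at its original place
        else:
            edits.append(("add", seg))       # text with no recorded audio
    edits.extend(("delete", seg) for seg in deleted)
    return edits
```

For example, comparing `["A", "B", "C"]` with `["A", "C"]` reports that "B" was deleted, and each reported edit can then be applied to the audio data segment mapped to that text data segment.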

Optionally, the following process of processing the initial audio data based on different editing types is exemplified:

in an embodiment, when the edit type is deletion, the editing the to-be-edited audio data segment corresponding to the initial audio data based on the edit type to generate the target audio data corresponding to the target text data includes:

deleting the audio data segment to be edited in the initial audio data;

and splicing the audio data remaining after the audio data segment to be edited is deleted, so as to generate target audio data corresponding to the target text data.

If the user deletes a text data segment at a given position in the text data, then when the target audio data is generated from the target text data, the initial audio data still contains the audio data segment corresponding to the deleted text data segment. Accordingly, if it is detected that an audio data segment in the initial audio data has no corresponding text data segment in the target text data, the text data segment originally mapped to that audio data segment is determined to be the deleted text data segment.

The deleted text data segment is acquired, and the audio data segment associated with it is taken as the audio data segment to be edited. The audio data segment to be edited is then deleted, and the audio data segments immediately before and after it are spliced so that the audio data remains continuous.

Optionally, in this embodiment, after the audio data segment to be edited is deleted, the remaining audio data segments are spliced as follows. After the audio data segment to be edited is determined based on the mapping relationship between the deleted text data segment and the audio data segment, its starting time and ending time are obtained. The next audio data segment, whose starting time equals the ending time of the audio data segment to be edited, has its starting time adjusted to the starting time of the audio data segment to be edited. After the audio data segment to be edited is deleted, the other audio data is spliced to form continuous target audio data.
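The deletion and splicing described above can be sketched on a timeline of segments: the segment to be edited is removed and every later segment is shifted earlier by the removed duration, so that the next segment starts where the deleted one started. The names `AudioSegment` and `delete_segment` and the millisecond units are assumptions of this sketch, which tracks only the timeline and not the audio samples.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class AudioSegment:
    text: str
    start_ms: int
    end_ms: int


def delete_segment(segments: List[AudioSegment], index: int) -> List[AudioSegment]:
    """Remove segments[index] and close the gap it leaves on the timeline."""
    removed = segments[index]
    shift = removed.end_ms - removed.start_ms
    result = list(segments[:index])
    for seg in segments[index + 1:]:
        # Later segments move earlier by the duration of the deleted segment.
        result.append(replace(seg, start_ms=seg.start_ms - shift, end_ms=seg.end_ms - shift))
    return result


timeline = [
    AudioSegment("A", 0, 1000),
    AudioSegment("B", 1000, 1500),
    AudioSegment("C", 1500, 3000),
]
after_delete = delete_segment(timeline, 1)
```

After the call, segment "C" starts at 1000 ms, the former starting time of the deleted segment "B", and the timeline is continuous again.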

Optionally, in another embodiment, when the editing type is copy-paste, the editing the to-be-edited audio data segment corresponding to the initial audio data based on the editing type, and the generating the target audio data corresponding to the target text data includes:

copying the audio data segment to be edited in the initial audio data;

and pasting the audio data segment to be edited in the initial audio data to generate target audio data corresponding to the target text data.

If the user copies a text data segment and pastes it at a given position in the text data, the pasted text is referred to as the first text data segment. When the target audio data is generated from the target text data, the initial audio data contains no audio data segment corresponding to the position of the first text data segment. Accordingly, if it is detected that no audio data segment in the initial audio data corresponds to the first text data segment, the first text data segment is determined to be a copied and pasted text data segment.

The text data segment to be copied, which is identical to the first text data segment, is acquired from the target text data, and the audio data segment associated with it is taken as the audio data segment to be edited. The audio data segment to be edited is then pasted into the initial audio data at the position corresponding to the first text data segment, so as to generate target audio data corresponding to the target text data.

Optionally, the first text data segment is a text data segment in the target text data. For example, if the user copies the first sentence of the first line and pastes it at the head of the third line, the text at the head of the third line is the first text data segment, and the first sentence of the first line is the text data segment to be copied.

Optionally, the audio data segment to be edited is determined based on the text data segment to be copied and is copied. The starting time of the copy is adjusted to the starting time of the audio data segment corresponding to the position of the first text data segment, the ending time of the copy is adjusted to the ending time of the audio data segment corresponding to that position, and the copy is pasted at the corresponding position.
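A copy-paste edit can be sketched on the same kind of timeline: a copy of the source segment is placed at the paste position and every segment from that position onward is shifted later by the copy's duration. The names `AudioSegment` and `paste_copy` are hypothetical, and the sketch tracks only starting and ending times; copying the underlying samples is left out.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class AudioSegment:
    text: str
    start_ms: int
    end_ms: int


def paste_copy(segments: List[AudioSegment], source_index: int, insert_at: int) -> List[AudioSegment]:
    """Insert a copy of segments[source_index] before position insert_at."""
    source = segments[source_index]
    duration = source.end_ms - source.start_ms
    # The copy starts where the segment currently at the paste position starts,
    # or at the end of the timeline when pasting after the last segment.
    if insert_at < len(segments):
        start = segments[insert_at].start_ms
    else:
        start = segments[-1].end_ms
    pasted = AudioSegment(source.text, start, start + duration)
    shifted = [
        replace(seg, start_ms=seg.start_ms + duration, end_ms=seg.end_ms + duration)
        for seg in segments[insert_at:]
    ]
    return list(segments[:insert_at]) + [pasted] + shifted


timeline = [
    AudioSegment("A", 0, 1000),
    AudioSegment("B", 1000, 1500),
    AudioSegment("C", 1500, 3000),
]
after_paste = paste_copy(timeline, source_index=0, insert_at=2)
```

In this run the copy of "A" occupies 1500 ms to 2500 ms and "C" is pushed back to start at 2500 ms.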

Optionally, in another embodiment, when the editing type is moving, the editing the audio data segment to be edited corresponding to the initial audio data based on the editing type, and the generating the target audio data corresponding to the target text data includes:

determining a target position of the audio data segment to be edited according to the target text data, wherein the target position comprises at least one of a starting time point and an ending time point;

and moving the audio data segment to be edited to the target position in the initial audio data to generate target audio data corresponding to the target text data.

If the user moves the second text data segment in the initial text data to before the first text data segment, the position of the second text data segment in the target text data differs from its position in the initial text data. When the target audio data is generated from the target text data, if the position of a text data segment in the target text data differs from the position of its audio data segment in the audio data, the position mapping between the text data segment and the audio data segment does not match. Accordingly, if it is detected that a text data segment in the target text data does not match the position of its audio data segment in the initial audio data, that text data segment is determined to be the moved text data segment.

The moved text data segment in the target text data is obtained, and the audio data segment associated with it in content is taken as the audio data segment to be edited. The target position of the audio data segment to be edited is determined based on the position of the moved text data segment in the target text data, the audio data segment to be edited is moved to the target position, and the audio data segment at the target position and the audio data segments after it are adjusted accordingly to form continuous target audio data.

Optionally, in this embodiment, the manner of moving the audio data segment to be edited includes, but is not limited to, the following. For example, the starting time and ending time of the first audio data segment, located at the position corresponding to the moved text data segment, are taken as the basis for adjusting the starting time and ending time of the audio data segment to be edited; the starting time of the first audio data segment is then adjusted based on the modified ending time of the audio data segment to be edited, and the ending time of the first audio data segment is modified correspondingly. The starting times and ending times of the audio data segments after the first audio data segment are then modified in sequence until all audio data segments are spliced into the target audio data.

Optionally, when the user exchanges the positions of the second text data segment and the first text data segment in the initial text data, then in generating the target audio data, the starting time of the audio data segment to be edited mapped by the second text data segment is directly modified to the starting time of the audio data segment to be edited mapped by the first text data segment, and the starting time of the audio data segment to be edited mapped by the first text data segment is modified to the starting time of the audio data segment to be edited mapped by the second text data segment. Similarly, the ending time of the audio data segment to be edited mapped by the second text data segment is modified to the ending time of the audio data segment to be edited mapped by the first text data segment, and the ending time of the audio data segment to be edited mapped by the first text data segment is modified to the ending time of the audio data segment to be edited mapped by the second text data segment.
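The move described above, with the starting and ending times of the following segments modified in sequence, can be sketched by reordering the segment list and then laying the segments out end to end. The names `AudioSegment`, `relayout`, and `move_segment` are hypothetical, and the sketch tracks only the timeline.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class AudioSegment:
    text: str
    start_ms: int
    end_ms: int


def relayout(segments: List[AudioSegment]) -> List[AudioSegment]:
    """Recompute start and end times so the segments play back to back."""
    result: List[AudioSegment] = []
    cursor = 0
    for seg in segments:
        duration = seg.end_ms - seg.start_ms
        result.append(replace(seg, start_ms=cursor, end_ms=cursor + duration))
        cursor += duration
    return result


def move_segment(segments: List[AudioSegment], from_index: int, to_index: int) -> List[AudioSegment]:
    """Move one segment to a new position and re-time the whole timeline."""
    reordered = list(segments)
    moved = reordered.pop(from_index)
    reordered.insert(to_index, moved)
    return relayout(reordered)


timeline = [
    AudioSegment("A", 0, 1000),
    AudioSegment("B", 1000, 1500),
    AudioSegment("C", 1500, 3000),
]
after_move = move_segment(timeline, from_index=2, to_index=0)
```

Recomputing the times from durations keeps the timeline continuous even when the moved or exchanged segments differ in length, a case in which directly exchanging starting and ending times would presumably leave gaps or overlaps.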

In this embodiment, based on the mapping relationship between text segments and audio segments, when a text segment is adjusted, the audio segment that needs to be adjusted is determined based on the mapping relationship and is then adjusted accordingly, so that after the text data is updated, the corresponding audio data is synchronously updated.

Optionally, the edit type further includes addition of a non-audio conversion field. If the edit type is the addition of a non-audio conversion field, the initial audio data contains no mapping for that field. In this case, the audio data segment corresponding to the field is mapped to empty, that is, an empty audio data segment corresponding to the field length is placed in the audio data. In this way, the user can adjust the audio data by adjusting the text data.
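An empty audio data segment for an added field can be sketched as a run of zero-valued PCM samples whose duration depends on the field length. The function name `silence_for_field`, the 200 ms per character ratio, the 16 kHz sample rate, and the 16-bit sample width are all assumptions of this sketch; the present application only states that the empty segment corresponds to the field length.

```python
def silence_for_field(
    field: str,
    ms_per_char: int = 200,     # assumed duration per character
    sample_rate: int = 16000,   # assumed sample rate in Hz
    sample_width: int = 2,      # assumed bytes per sample (16-bit PCM)
) -> bytes:
    """Return silent PCM audio whose duration scales with the field length."""
    duration_ms = len(field) * ms_per_char
    n_samples = sample_rate * duration_ms // 1000
    # Zero bytes decode as silence in signed 16-bit PCM.
    return b"\x00" * (n_samples * sample_width)


empty_segment = silence_for_field("abc")
```

The resulting bytes can be spliced into the audio at the position of the added field, in the same way as any other audio data segment.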

Optionally, referring to fig. 6, based on the foregoing embodiment, the present application further provides a third embodiment of a data processing method, where the data processing method includes:

step S110, determining initial audio data when a text conversion instruction is triggered based on the mobile terminal;

step S120, converting the initial audio data into corresponding initial text data;

step S130, storing the initial text data in association with the initial audio data.

The embodiment is applied to a mobile terminal, and referring to fig. 4, the mobile terminal includes a recording module 10, an audio data conversion module 20, and a mapping module 30.

Optionally, the recording interface of the recording module 10 includes a recording control, a playing control, a pause control, and an adjusting anchor point, and the recording is controlled by the recording control, and in the recording process, the recording can be paused based on the pause control. After the recording is finished, the recording can be played based on the playing control, and in the playing process, the recording can be controlled to be paused based on the pause control. In the playing process, the playing progress can be adjusted based on the adjustment anchor point.

Optionally, the audio data conversion module 20 is configured to convert the recorded audio data into text data. The audio data conversion module 20 is connected to the recording module 10 to realize transmission of the audio data recorded by the recording module 10. After receiving the audio data, the audio data conversion module 20 converts the audio data into text data.

Optionally, the mapping module 30 is connected to the audio data conversion module 20, and the mapping module 30 is configured to establish a mapping relationship between the audio data and the text data in the process of converting the audio data into the text data. Optionally, the mapping module 30 establishes the mapping relationship by mapping audio data segments and text data segments of a preset length one to one. Optionally, the mapping relationship includes content mapping and position mapping, that is, identical content in the text data and in the audio data is mapped, and/or the position of that content in the text data is mapped to its position in the audio data. Optionally, the preset length may be one character or at least two characters, or the preset length may be determined based on sentence length, for example one sentence of audio mapped to one sentence of text, or the audio of one character mapped to one character of text.

Optionally, a text conversion control is arranged in the playing interface of the audio data of the mobile terminal, and when the user triggers the text conversion control, the audio data conversion module 20 is started to convert the audio data into text data. Optionally, the conversion may directly retrieve the words corresponding to the audio data to form the text data.

In the process of converting the initial audio data into the corresponding initial text data, the mapping module 30 is started, a mapping relationship between the initial audio data and the initial text data is established, and the initial text data is stored.

Optionally, the manner of storing the initial text data and the initial audio data in association includes storing the text data segments in the initial text data and the audio data segments in the initial audio data in one-to-one association.

Therefore, when the text data is modified, the audio data can be correspondingly cut from the modified text data based on the mapping relation between the text data segment and the audio data segment, and the conversion between the text data and the audio data is realized.

Optionally, the step of storing the text data segments in the initial text data and the audio data segments in the initial audio data in one-to-one association includes:

dividing the initial text data into at least two text segments according to the generation sequence of the initial text data;

dividing the initial audio data into audio data segments corresponding to the text segments one by one according to the time sequence of the initial audio data;

and storing each text segment and each audio data segment in a one-to-one association manner.

Optionally, in the dividing process, the text data may be divided based on punctuation marks, for example with each sentence forming one text segment. Alternatively, the division may be based on the number of characters in the text data, for example with each character forming one text segment. Specific division manners include, but are not limited to, one or more of those listed above.
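The division and association steps above can be sketched as follows: the text is split into sentences at punctuation marks, and each sentence is paired with one audio time span in order. The function names `split_sentences` and `associate`, the punctuation set, and the `(start_ms, end_ms)` span format are assumptions of this sketch.

```python
import re
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    """Split text into segments, each ending at a sentence punctuation mark."""
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [part.strip() for part in parts if part.strip()]


def associate(text: str, spans: List[Tuple[int, int]]) -> List[Tuple[str, Tuple[int, int]]]:
    """Pair each text segment with one audio span, in order of occurrence."""
    sentences = split_sentences(text)
    if len(sentences) != len(spans):
        raise ValueError("each text segment needs exactly one audio span")
    return list(zip(sentences, spans))


stored = associate(
    "Good morning. Let us begin!",
    [(0, 1200), (1200, 2500)],
)
```

Splitting per character instead would only change `split_sentences`; the pairing step stays the same.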

After the text data segments and the audio data segments are stored in association, when the initial text data is edited, the initial audio data may be edited correspondingly based on the editing operation and the association relationship between each text data segment and the audio data segment, where the specific editing manner is as described in the first to second embodiments, which is not repeated herein.

Optionally, in this embodiment, before the initial audio data is converted into the corresponding initial text data, a language type selected by the user may be acquired, and the initial audio data is converted into initial text data of that language type. For example, if the recording is in English and the user wants to convert it into Chinese text, the language type may be selected as Chinese; during conversion of the audio data, the speech in the audio data is recognized, and the recognized speech is then translated into the selected language to form the text data. This embodiment enables the audio data to be converted into text of different language types and enriches the text conversion function.

Optionally, during the process of converting audio data into text data, blank audio segments that are not successfully converted into text are automatically deleted.

In this embodiment, when the initial audio data is converted into corresponding initial text data, the initial text data and the initial audio data are stored in an associated manner, so that the text data and the audio data are associated, and based on the association relationship, the text data is edited and then converted into target audio data, thereby implementing a function of converting text into audio.

Optionally, based on the mobile terminal, this embodiment provides the following operation process as displayed on the mobile terminal:

First, after recording, the user clicks the play button of the audio bar to start playing the audio.

Second, the user drags the anchor point to adjust the playing progress.

Third, the user clicks the text conversion button, and the background converts the audio into text through the audio data conversion module.

Fourth, after the audio is successfully converted into text, the background determines, through the mapping module, the mapping relationship between text fields and audio clips, as well as the starting time and ending time of the audio clip mapped to each text field. The text conversion button may then change to a view text button.

Fifth, the user clicks the view text button to enter the audio text page and edits the text content through the text editing module in the foreground. The text editing module supports cutting, copying, pasting, and sharing operations on the text.

Sixth, after editing is finished, the user clicks the generate audio button, and a new audio file is generated according to the mapping relationship between the edited text fields and the audio clips.

The page then displays a toast prompting that a new audio file has been generated, and returns to the upper-level page.

The application also provides a mobile terminal, which comprises a memory and a processor, wherein the memory stores a processing program, and the processing program realizes the steps of the processing method in any embodiment when being executed by the processor.

The present application further provides a computer-readable storage medium, on which a processing program is stored, and when the processing program is executed by a processor, the processing program implements the steps of the processing method in any of the above embodiments.

In the embodiments of the mobile terminal and the computer-readable storage medium provided in the present application, all technical features of the embodiments of the processing method are included, and the expanding and explaining contents of the specification are basically the same as those of the embodiments of the method, and are not described herein again.

Embodiments of the present application also provide a computer program product, which includes computer program code, when the computer program code runs on a computer, the computer is caused to execute the method in the above various possible embodiments.

Embodiments of the present application further provide a chip, which includes a memory and a processor, where the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that a device in which the chip is installed executes the method in the above various possible embodiments.

It is to be understood that the foregoing scenarios are only examples, and do not constitute a limitation on application scenarios of the technical solutions provided in the embodiments of the present application, and the technical solutions of the present application may also be applied to other scenarios. For example, as can be known by those skilled in the art, with the evolution of system architecture and the emergence of new service scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.

The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.

The steps in the methods of the embodiments of the present application may be reordered, combined, or deleted according to actual needs.

The units in the devices of the embodiments of the present application may be merged, divided, or deleted according to actual needs.

In the present application, identical or similar terms, concepts, technical solutions, and/or application scenarios are generally described in detail only at their first occurrence; for brevity, the detailed description is generally not repeated when they appear again. Where such items are not described in detail later in the text, reference may be made to the earlier detailed description when interpreting the technical solutions of the present application.

In the present application, the description of each embodiment has its own emphasis; for parts that are not described or illustrated in a given embodiment, reference may be made to the descriptions of the other embodiments.

The technical features of the technical solutions of the present application may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the embodiments are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope described in the present application.

From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, a controlled terminal, or a network device) to execute the methods of the embodiments of the present application.

The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a network of computers, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, wireless, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, storage disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)), among others.

The above description is only a preferred embodiment of the present application and is not intended to limit the scope of the present application. All equivalent structures or equivalent process modifications made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of the present application.
