Keyboard musical instrument and computer-implemented method of keyboard musical instrument
Reading note: this technique, "Keyboard musical instrument and computer-implemented method of keyboard musical instrument", was created by 橘敏之 on 2020-03-16. Its main content is as follows. The invention provides a keyboard musical instrument and a computer-implemented method of the keyboard musical instrument. The keyboard musical instrument includes: a keyboard including a plurality of keys; a plurality of operation elements provided on the top surface of the instrument case on the rear side of the keys in their longitudinal direction, including a 1 st operation element and a 2 nd operation element, the 1 st operation element corresponding to 1 st section data of output voice data from a 1 st timing to before a 2 nd timing, and the 2 nd operation element corresponding to 2 nd section data of the voice data from the 2 nd timing to before a 3 rd timing; and at least 1 processor. The at least 1 processor determines an intonation of a 1 st pattern according to a 1 st user operation on the 1 st operation element and instructs pronunciation of the voice corresponding to the 1 st section data with the determined intonation of the 1 st pattern, and determines an intonation of a 2 nd pattern according to a 2 nd user operation on the 2 nd operation element and instructs pronunciation of the voice corresponding to the 2 nd section data with the determined intonation of the 2 nd pattern.
1. A keyboard musical instrument is provided with:
a keyboard comprising a plurality of keys;
a plurality of operation elements provided on the rear side of the keys in the longitudinal direction and on the top surface of the instrument case, including a 1 st operation element and a 2 nd operation element, the 1 st operation element corresponding to 1 st section data of the output voice data from the 1 st timing to before the 2 nd timing, the 2 nd operation element corresponding to 2 nd section data of the voice data from the 2 nd timing to before the 3 rd timing; and
at least 1 processor,
said at least 1 processor determining a tone of a 1 st graphic according to a 1 st user operation to said 1 st operating element and indicating pronunciation of a voice corresponding to said 1 st section data by said determined tone of said 1 st graphic,
the at least 1 processor determines a tone of a 2 nd graphic according to a 2 nd user operation to the 2 nd operating element, and instructs pronunciation of a voice corresponding to the 2 nd section data by the determined tone of the 2 nd graphic.
2. The keyboard musical instrument according to claim 1,
when the number of pieces of section data included in the speech data is larger than the number of the plurality of operation elements, the at least 1 processor outputs 1 st section data corresponding to the 1 st operation element, and then changes the section data corresponding to the 1 st operation element from the 1 st section data to section data after the 1 st section.
3. The keyboard musical instrument according to claim 2,
when the number of the plurality of operation elements is assumed to be 8, the at least 1 processor associates the 1 st section data to the 8 th section data in the voice data with the plurality of operation elements at a certain timing,
the at least 1 processor outputs the 1 st section data corresponding to the 1 st operation element, and then changes the section data corresponding to the 1 st operation element from the 1 st section data to the 9 th section data following the 8 th section data.
4. The keyboard musical instrument according to claim 3,
the at least 1 processor adjusts at least one of the output of the 1 st section data and the output of the 2 nd section data so that the final pitch of the speech of the 1 st section data is continuously connected to the first pitch of the speech of the 2 nd section data.
5. The keyboard musical instrument according to claim 1,
the plurality of operating elements include a sliding operating element,
the at least 1 processor determines a certain intonation pattern from among a plurality of intonation patterns set in advance based on a slide operation amount of a slide operation on the slide operation element.
6. The keyboard musical instrument according to claim 1,
the at least 1 processor emits speech at a pitch specified by an operation to the keyboard.
7. The keyboard musical instrument according to any one of claims 1 to 6,
the keyboard musical instrument includes a memory storing a learned acoustic model obtained by machine learning processing of voice data of a singer, the learned acoustic model outputting data representing an acoustic feature amount of the voice of the singer in response to input of arbitrary lyric data and arbitrary pitch data,
the at least 1 processor deduces the voice of the singer based on data representing an acoustic feature amount of the voice of the singer, the data being output by the learned acoustic model in response to the input of the arbitrary lyric data and the arbitrary pitch data to the learned acoustic model,
the at least 1 processor adds the determined intonation pattern to the voice of the singer for the 1 st section data, and outputs the 1 st section data.
8. A computer-implemented method of a keyboard musical instrument,
the keyboard instrument includes:
a keyboard comprising a plurality of keys;
a plurality of operation elements provided on the rear side of the keys in the longitudinal direction and on the top surface of the instrument case, including a 1 st operation element and a 2 nd operation element, the 1 st operation element corresponding to 1 st section data of the output voice data from the 1 st timing to before the 2 nd timing, the 2 nd operation element corresponding to 2 nd section data of the voice data from the 2 nd timing to before the 3 rd timing; and
at least 1 processor,
the at least 1 processor carries out the following steps:
deciding an intonation of a 1 st graph according to a 1 st user operation on the 1 st operation element,
indicating pronunciation of the voice corresponding to the 1 st section data by the determined intonation of the 1 st graph,
deciding an intonation of a 2 nd graph according to a 2 nd user operation on the 2 nd operation element, and
indicating pronunciation of the voice corresponding to the 2 nd section data by the determined intonation of the 2 nd graph.
9. The method of claim 8,
when the number of pieces of section data included in the speech data is larger than the number of the plurality of operation elements, the at least 1 processor outputs 1 st section data corresponding to the 1 st operation element, and then changes the section data corresponding to the 1 st operation element from the 1 st section data to section data after the 1 st section.
10. The method of claim 9,
when the number of the plurality of operation elements is assumed to be 8, the at least 1 processor associates the 1 st section data to the 8 th section data in the voice data with the plurality of operation elements at a certain timing,
the at least 1 processor outputs the 1 st section data corresponding to the 1 st operation element, and then changes the section data corresponding to the 1 st operation element from the 1 st section data to the 9 th section data following the 8 th section data.
11. The method of claim 10,
the at least 1 processor adjusts at least one of the output of the 1 st section data and the output of the 2 nd section data so that the final pitch of the speech of the 1 st section data is continuously connected to the first pitch of the speech of the 2 nd section data.
12. The method of claim 8,
the plurality of operating elements include a sliding operating element,
the at least 1 processor determines a certain intonation pattern from among a plurality of intonation patterns set in advance based on a slide operation amount of a slide operation on the slide operation element.
13. The method of claim 8,
the at least 1 processor emits speech at a pitch specified by an operation to the keyboard.
14. The method according to any one of claims 8 to 13,
the keyboard musical instrument includes a memory storing a learned acoustic model obtained by machine learning processing of voice data of a singer, the learned acoustic model outputting data representing an acoustic feature amount of the voice of the singer in response to input of arbitrary lyric data and arbitrary pitch data,
the at least 1 processor also carries out the following steps:
deducing the voice of the singer from data representing an acoustic feature amount of the voice of the singer, the data being output by the learned acoustic model in response to the input of the arbitrary lyric data and the arbitrary pitch data to the learned acoustic model,
adding the determined intonation pattern to the voice of the singer for the 1 st section data, and outputting the 1 st section data.
Technical Field
The present invention relates to a keyboard musical instrument capable of performing a vocal performance such as rap, and to a computer-implemented method of the keyboard musical instrument.
Background
There is a singing style called rap. Rap is a style of music in which words are vocalized in spoken language or the like in accordance with the rhythm or the temporal progression of the melody line of a piece of music. In rap in particular, an individualized musical expression can be achieved by improvisationally changing the intonation of the voice.
Rap thus requires performing both the lyrics and the flow (rhythm and melody line), so the hurdle for anyone trying to sing it is very high. However, if at least some of the musical elements included in the flow of rap are automated, even a beginner can become familiar with rap, as long as the remaining musical elements can be played on an electronic musical instrument or the like in accordance with the automated elements.
As a 1 st prior art for automating rap, there is known an electronic musical instrument that outputs a singing voice synthesized by a segment-concatenation synthesis method, in which recorded voice segments are joined and processed (for example, Japanese Patent Laid-Open No. 9-050287).
However, in the above-described prior art, although pitches can be designated on the electronic musical instrument in accordance with the automatic progression of the singing based on synthesized speech, the intonation unique to rap cannot be controlled in real time. Moreover, not only in rap, it has been difficult to add such intonation to a musical instrument performance.
Disclosure of Invention
According to the present invention, there is an advantage that a desired intonation can be added by a simple operation in musical instrument performance and singing.
In an example of the embodiment, a keyboard musical instrument includes: a keyboard comprising a plurality of keys; a plurality of operation elements provided on the rear side of the keys in the longitudinal direction and on the top surface of the instrument case, including a 1 st operation element and a 2 nd operation element, the 1 st operation element corresponding to 1 st section data of the output voice data from the 1 st timing to before the 2 nd timing, the 2 nd operation element corresponding to 2 nd section data of the voice data from the 2 nd timing to before the 3 rd timing; and at least 1 processor, wherein the at least 1 processor determines a tone of a 1 st graphic according to a 1 st user operation to the 1 st operating element, and indicates a pronunciation of a voice corresponding to the 1 st section data by the determined tone of the 1 st graphic, and the at least 1 processor determines a tone of a 2 nd graphic according to a 2 nd user operation to the 2 nd operating element, and indicates a pronunciation of a voice corresponding to the 2 nd section data by the determined tone of the 2 nd graphic.
Drawings
Fig. 1 shows an example of an appearance of an embodiment of an electronic keyboard instrument.
Fig. 2 is a block diagram showing an example of a hardware configuration of an embodiment of a control system of an electronic keyboard instrument.
Fig. 3 is a block diagram showing the main functions of the embodiment.
Fig. 4 is an explanatory diagram of the operation of specifying the bend slider (bend slider), the bend switch (bend switch), and the bend curve (bend curve) according to the embodiment.
Fig. 5 shows an example of the data structure of the embodiment.
Fig. 6 shows an example of the data configuration of the curved sound curve setting table according to the embodiment.
Fig. 7 shows an example of the data structure of the bending curve table according to the embodiment.
Fig. 8 is a main flowchart showing an example of control processing of the electronic musical instrument according to the present embodiment.
Fig. 9 is a flowchart showing a detailed example of the initialization process, the music tempo change process, and the rap start process.
Fig. 10 is a flowchart showing a detailed example of the switching process.
Fig. 11 is a flowchart showing a detailed example of the curved sound curve setting process.
Fig. 12 is a flowchart showing a detailed example of the automatic performance interruption process.
Fig. 13 is a flowchart showing a detailed example of the rap playback process.
Fig. 14 is a flowchart showing a detailed example of the bend sound processing.
Detailed Description
Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings. Fig. 1 shows an example of the external appearance of an electronic keyboard instrument 100 in which an automatic playing device as an information processing device is mounted. The electronic keyboard instrument 100 includes a keyboard composed of a plurality of keys, a 1 st switch panel 102, a 2 nd switch panel 103, an LCD104, and a plurality of operation elements (slide operation elements 105).
As shown in fig. 1, the plurality of operation elements (slide operation elements 105) are provided on the rear side of the keys in their length direction (the user playing the keyboard instrument is located on the front side of the keys in the length direction) and on the top surface (upper side) of the instrument case.
The plurality of operation elements may be rotary operation elements (knob operation elements) or push-button operation elements instead of slide operation elements.
Fig. 2 shows an example of a hardware configuration of an embodiment of a control system of the electronic keyboard instrument 100 of fig. 1.
The CPU201 executes the automatic playing control program stored in the ROM202 while using the RAM203 as a work memory, thereby executing the control operation of the electronic keyboard instrument 100 of fig. 1. Further, the ROM202 stores music data including lyric data and accompaniment data in addition to the control program and various fixed data described above.
The CPU201 is provided with a timer used in the present embodiment.
The sound source LSI204 reads musical tone waveform data from, for example, a waveform ROM, not shown, in accordance with a sound emission control instruction from the CPU201, and outputs the musical tone waveform data to a D/A converter.
When the text data of the lyrics and the information on the pitch are given as the rap data 215 from the CPU201, the voice synthesis LSI205 synthesizes the voice data of the corresponding rap voice and outputs it to a D/A converter.
The LCD controller 609 is an IC (integrated circuit) that controls the display state of the LCD 505.
Fig. 3 is a block diagram showing the main functions of the present embodiment.
(non-patent document 1)
Kei Hashimoto and Shinji Takaki, "Statistical speech synthesis based on deep learning", Journal of the Acoustical Society of Japan, Vol. 73, No. 1 (2017), pp. 55-62.
In accordance with the input of the above-described learning rap data 311, a learning acoustic feature quantity sequence 314 is extracted from the learning rap voice data 312 and used for the machine learning.
For example, as shown in fig. 3, the learning result 315 (model parameters) may be stored in the ROM202 of the control system of the electronic keyboard instrument 100 of fig. 2 when the electronic keyboard instrument 100 of fig. 1 is shipped, and may be loaded from the ROM202 of fig. 2 to the voice synthesis LSI205.
As a result of a performance by a player matching the automatic performance, the text analysis section 307 receives and analyzes the rap data 215, which includes information on the phonemes, pitch, and the like of the lyrics specified by the CPU201 of fig. 2. As a result, the text analysis section 307 analyzes and outputs a speech feature quantity sequence 316 representing the phonemes, parts of speech, words, and the like corresponding to the rap data 215.
The acoustic features expressed by the learning acoustic feature quantity sequence 314 and the acoustic feature quantity sequence 317 include spectral data modeling the human vocal tract and sound source data modeling the human vocal cords. As the spectral parameters, for example, mel cepstrum (Mel cepstrum) or line spectral pairs (Line Spectral Pairs, LSP) can be adopted. As the sound source data, a fundamental frequency (F0) indicating the pitch frequency of the human voice and a power value can be used.
The sampling frequency of the learning rap voice data 312 is, for example, 16 kHz (kilohertz). In addition, for example, when mel-cepstral parameters obtained by mel-cepstral analysis processing are used as the spectral parameters included in the learning acoustic feature quantity sequence 314 and the acoustic feature quantity sequence 317, the update frame period is, for example, 5 msec (milliseconds). When the mel-cepstral analysis processing is performed, the analysis window length is 25 msec, the window function is a Blackman window, and the analysis order is 24.
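The frame arithmetic implied by these analysis parameters can be sketched as follows. This is an illustrative calculation only, not code from the patent; the function and constant names are assumptions.

```python
# Frame/window arithmetic for the acoustic analysis parameters given above
# (16 kHz sampling, 5 ms frame period, 25 ms analysis window).
SAMPLE_RATE_HZ = 16_000   # sampling frequency of the learning rap voice data
FRAME_PERIOD_S = 0.005    # update frame period: 5 ms
WINDOW_LENGTH_S = 0.025   # analysis window length: 25 ms

def frame_layout(num_samples: int):
    """Return (hop size, window size, number of analysis frames)."""
    hop = int(SAMPLE_RATE_HZ * FRAME_PERIOD_S)    # 80 samples per hop
    win = int(SAMPLE_RATE_HZ * WINDOW_LENGTH_S)   # 400 samples per window
    if num_samples < win:
        return hop, win, 0
    frames = 1 + (num_samples - win) // hop
    return hop, win, frames

# One second of audio (16000 samples) yields (16000 - 400) // 80 + 1 = 196 frames.
```

A mel-cepstral analyzer of order 24 would then produce one 24-dimensional coefficient vector per 5 ms frame.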
Next, a 1 st embodiment of the statistical speech synthesis process, which uses an HMM (hidden Markov model) acoustic model, will be described.
(non-patent document 2)
Shinji Sako, Keijiro Saino, Yoshihiko Nankaku, Keiichi Tokuda, and Tadashi Kitamura, "A singing voice synthesis system capable of automatically learning voice quality and singing style", IPSJ SIG Technical Report, Music and Computer (MUS), 2008(12(2008-MUS-074)), pp. 39-44, 2008-02-08.
In the 1 st embodiment of the statistical speech synthesis process, the HMM acoustic model is used to learn how the characteristic parameters of vocal cord vibration and vocal tract characteristics change over time when a singer utters lyrics along a certain melody. More specifically, the HMM acoustic model models, in units of phonemes, the spectrum and fundamental frequency obtained from the learning rap data, together with their temporal structures.
Next, a 2 nd embodiment of the statistical speech synthesis process, which uses an acoustic model based on deep learning, will be described.
The automatic performance operation of songs including rap according to the embodiment of the electronic keyboard instrument 100 of figs. 1 and 2, using the statistical speech synthesis process described with fig. 3, will be described in detail below. Fig. 4 is an explanatory diagram of a bend curve specifying operation using the bend slider and the bend switch.
The user specifies bend curves and adds bends based on the specified bend curves in real time for the automatically progressing rap song, for example for every 16 consecutive beats (4 bars in the case of a 4/4 song), using adjustment of the bend slider and operation of the bend switch.
The above-described setting of the bend curve for each of the consecutive 16 beats can thus be performed with the bend slider and the bend switch.
As described above, in the present embodiment, the progression of the lyrics and timing of the rap song is handed over to the automatic performance, and the user can specify the intonation pattern of the pitch curve of the rap for each progression unit (for example, each beat), so that the user can easily enjoy a rap performance.
In particular, in this case, the user can specify a bend curve for each beat of the rap voice in real time, for each 16 beats of the automatically progressing automatic performance, using, for example, the bend slider and the bend switch.
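One way a slide operation amount could select one of the preset intonation patterns (as in claim 5) is by simple quantization. This is a hedged sketch: the thresholding scheme, value range, and names are illustrative assumptions, not taken from the patent.

```python
# Quantize a slide operation amount into one of the preset intonation
# (bend curve) patterns 401(#0)..401(#3).
NUM_PATTERNS = 4  # the bend curve table of this embodiment holds 4 patterns

def pattern_from_slide(slide_amount: float, slide_max: float = 127.0) -> int:
    """Map a slide operation amount (0..slide_max) to a pattern index 0..3."""
    ratio = max(0.0, min(slide_amount, slide_max)) / slide_max
    # min() guards the top of the range so slide_max maps to the last pattern
    return min(int(ratio * NUM_PATTERNS), NUM_PATTERNS - 1)
```

With a 0..127 slider, the four quarters of the travel select patterns #0 through #3 respectively.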
Further, the user can also specify and store in advance a bend curve for each beat in association with, for example, a rap song to be automatically performed, and the stored specification can be used during the automatic performance.
Thus, the user can take time to add intonation of the rap voice to the rap song.
However, since voice data (including various data forms such as music data, lyric data, and text data) generally has a larger number of sections than the number of the plurality of operation elements (the slide operation elements 105), the section data associated with each operation element is changed during the performance.
Assuming that the number of the plurality of operation elements (slide operation elements 105) is 8, the 1 st section data to the 8 th section data in the voice data are associated with the plurality of operation elements at a certain timing as follows:
1 st operation element … … 1 st section data (section of the 1 st beat of the 1 st bar)
2 nd operation element … … 2 nd section data (section of the 2 nd beat of the 1 st bar)
3 rd operation element … … 3 rd section data (section of the 3 rd beat of the 1 st bar)
4 th operation element … … 4 th section data (section of the 4 th beat of the 1 st bar)
5 th operation element … … 5 th section data (section of the 1 st beat of the 2 nd bar)
6 th operation element … … 6 th section data (section of the 2 nd beat of the 2 nd bar)
7 th operation element … … 7 th section data (section of the 3 rd beat of the 2 nd bar)
8 th operation element … … 8 th section data (section of the 4 th beat of the 2 nd bar)
After the keyboard musical instrument outputs the 1 st section data corresponding to the 1 st operation element, the section data corresponding to the 1 st operation element is changed from the 1 st section data to the 9 th section data following the 8 th section data.
That is, during the performance, the section data assigned to the 1 st operation element is changed in the order 1 st section data → 9 th section data → 17 th section data → … …. For example, at the timing when the sound emission of the singing voice has been completed up to the 4 th beat of the 1 st bar, the section data assigned to each operation element is as follows.
1 st operation element … … 9 th section data (section of the 1 st beat of the 3 rd bar)
2 nd operation element … … 10 th section data (section of the 2 nd beat of the 3 rd bar)
3 rd operation element … … 11 th section data (section of the 3 rd beat of the 3 rd bar)
4 th operation element … … 12 th section data (section of the 4 th beat of the 3 rd bar)
5 th operation element … … 5 th section data (section of the 1 st beat of the 2 nd bar)
6 th operation element … … 6 th section data (section of the 2 nd beat of the 2 nd bar)
7 th operation element … … 7 th section data (section of the 3 rd beat of the 2 nd bar)
8 th operation element … … 8 th section data (section of the 4 th beat of the 2 nd bar)
According to the present invention, even if the number of operation elements is limited, since the section of the voice data allocated to each operation element is changed during the performance, there is an advantage that voice data of any length can be sung satisfactorily.
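The section-data rotation described above can be sketched as follows. This is a minimal illustration of the reassignment rule (each element advances by 8 sections after its section is output); the function names are assumptions, not from the patent.

```python
# With 8 operation elements, after an element's section has been output it is
# re-assigned the section 8 positions later (1st -> 9th -> 17th -> ...).
NUM_ELEMENTS = 8

def initial_assignment():
    """Element i (0-based) initially holds section i (0-based)."""
    return list(range(NUM_ELEMENTS))

def advance(assignment, element_index):
    """Called after the section held by element_index has been output."""
    assignment[element_index] += NUM_ELEMENTS
    return assignment

# After bar 1 (beats 1-4, elements 0-3) has been output, elements 0-3 hold
# sections 8-11 (the 9th-12th sections, i.e. bar 3), while elements 4-7
# still hold sections 4-7 (bar 2).
a = initial_assignment()
for e in range(4):
    advance(a, e)
# a == [8, 9, 10, 11, 4, 5, 6, 7]
```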
In addition, regarding the combination of the intonation patterns assigned to the respective operation elements, for example the combination of the intonation pattern (1 st pattern) 401(#0) assigned to the 1 st operation element and the intonation pattern (2 nd pattern) 401(#1) assigned to the 2 nd operation element is not changed unless the user changes it.
Of course, if the user performs an operation to change the assignment, the intonation pattern assigned to an operation element can be changed.
In the present embodiment, the rap voices are synthesized according to the pitch data specified by the user operating the keyboard.
In the present embodiment, fig. 5 shows an example of the data structure of music data read from the ROM202 to the RAM203 in fig. 2. The data structure is based on a standard MIDI file format which is one of file formats for MIDI (Musical Instrument Digital Interface). The music data is constituted by blocks of data called chunks (chunk). Specifically, the music data is composed of a title block (header chunk) located at the head of the file, a track block (track chunk)1 following the title block and storing lyric data for lyric parts, and a track block 2 storing accompaniment data for accompaniment parts.
The header chunk is composed of the values ChunkID, ChunkSize, FormatType, NumberOfTrack, and TimeDivision. The ChunkID is the 4-byte ASCII code "4D546864" (hexadecimal) corresponding to the 4 half-width characters "MThd" of the header chunk. ChunkSize is 4-byte data indicating the data length of the FormatType, NumberOfTrack, and TimeDivision parts of the header chunk other than ChunkID and ChunkSize; this data length is fixed to 6 bytes, "00000006" (hexadecimal). In the case of the present embodiment, FormatType is the 2-byte data "0001" (hexadecimal), which indicates a format using a plurality of tracks.
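The header chunk layout described above follows the standard MIDI file format, so it can be parsed as below. This is a generic sketch of the standard layout, not code from the patent.

```python
# Parse the standard MIDI file header chunk: "MThd", ChunkSize fixed to 6,
# then FormatType / NumberOfTrack / TimeDivision as big-endian 16-bit values.
import struct

def parse_header_chunk(data: bytes):
    chunk_id, chunk_size = struct.unpack(">4sI", data[:8])
    if chunk_id != b"MThd" or chunk_size != 6:
        raise ValueError("not a standard MIDI file header")
    fmt, ntrk, division = struct.unpack(">HHH", data[8:14])
    return fmt, ntrk, division

# Example: format 1, 2 tracks, TimeDivision 480 (the resolution used later in
# the text); parse_header_chunk(header) returns (1, 2, 480).
header = b"MThd" + struct.pack(">IHHH", 6, 1, 2, 480)
```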
The track chunks 1 and 2 are each composed of a ChunkID, a ChunkSize, and performance data pairs DeltaTime_1[i] and Event_1[i] (in the case of track chunk 1) or DeltaTime_2[i] and Event_2[i] (in the case of track chunk 2).
DeltaTime_1[i] is variable-length data of 1 to 4 bytes indicating the waiting time (relative time) from the execution time of the immediately preceding Event_1[i-1]. Similarly, DeltaTime_2[i] is variable-length data of 1 to 4 bytes indicating the waiting time (relative time) from the execution time of the immediately preceding Event_2[i-1]. Event_1[i] is a meta event (meta-event) indicating the timing and pitch of the utterance of the lyrics in track chunk 1.
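The 1-to-4-byte variable-length DeltaTime values follow the standard MIDI variable-length-quantity encoding: 7 data bits per byte, with the high bit set on every byte except the last. A sketch of the decoder (illustrative names, standard algorithm):

```python
def read_delta_time(data: bytes, pos: int):
    """Return (value, new position) for a variable-length quantity at pos."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80 == 0:        # high bit clear: last byte of the quantity
            return value, pos

# The two bytes 0x81 0x48 encode 0xC8 = 200 ticks; a single 0x00 encodes 0.
```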
Fig. 6 shows an example of the data configuration of the bend curve setting table 600, which stores the setting of the bend curve for each beat specified by the user with the bend slider and the bend switch.
Fig. 7 shows a bend curve table 700, which stores bend curves of, for example, 4 patterns corresponding to the intonation patterns 401(#0) to 401(#3) of the bend curves of fig. 4. This bend curve table 700 is stored in the ROM202 of fig. 2 by factory setting, for example. In fig. 7, 401(#0), 401(#1), 401(#2), and 401(#3) correspond to the bend curve graphs shown in fig. 4, and their respective head storage addresses on the ROM202 are, for example, BendCurve[0], BendCurve[1], BendCurve[2], and BendCurve[3]. R is the resolution of the bend curve, e.g., R = 48. In each bend curve, the address offset represents an offset value with respect to each head storage address, and each offset from 0 to R-1 stores a value of the bend curve.
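A table with this structure can be sketched as below. The 4-pattern/R-entry layout follows the text, but the actual curve shapes and value range are illustrative assumptions; the patent stores the real curves in ROM as the table of fig. 7.

```python
# Bend curve table with resolution R = 48: each of the 4 patterns stores
# R values, addressed by an offset 0..R-1 within one beat.
R = 48  # resolution of the bend curve (entries per beat)

bend_curve = [
    [i * 100 // (R - 1) for i in range(R)],        # pattern #0: rising (assumed)
    [100 - i * 100 // (R - 1) for i in range(R)],  # pattern #1: falling (assumed)
    [0] * (R // 2) + [100] * (R - R // 2),         # pattern #2: step up (assumed)
    [0] * R,                                       # pattern #3: flat (assumed)
]

def bend_value(pattern: int, offset: int) -> int:
    """Look up the bend value at the given address offset (0..R-1)."""
    return bend_curve[pattern][offset]
```

During playback, the offset is advanced once per bend-processing period so that one full curve spans exactly one beat.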
Fig. 8 is a main flowchart showing an example of the control processing of the electronic musical instrument according to the present embodiment. This control processing is, for example, an operation in which the CPU201 of fig. 2 executes a control processing program loaded from the ROM202 into the RAM203.
The CPU201 first executes initialization processing (step S801), and then repeatedly executes a series of processes of steps S802 to S807.
In this iterative process, the CPU201 first executes switch processing (step S802). Here, the CPU201 executes processing corresponding to each switch operation of the 1 st switch panel 102, the 2 nd switch panel 103, the bend slider, and the bend switch of fig. 1.
Next, the CPU201 executes keyboard processing, that is, processing of determining whether or not a key of the keyboard of fig. 1 has been operated and performing processing corresponding to the operation (step S803).
Next, the CPU201 executes display processing, that is, processes the data that should be displayed on the LCD104 of fig. 1 and displays the data on the LCD104 via the LCD controller (step S804).
Next, the CPU201 executes rap reproduction processing (step S805). In this processing, the CPU201 executes control processing based on the music data described with fig. 5 in accordance with the performance of the player, generates the rap data 215, and outputs it to the voice synthesis LSI205.
Next, the CPU201 executes sound source processing (step S806). In the sound source processing, the CPU201 executes control processing such as envelope control of musical sounds being generated in the sound source LSI204.
Finally, the CPU201 determines whether or not the player has pressed a shutdown switch (not particularly shown) and has shut down (step S807). If the determination at step S807 is no, the CPU201 returns to the process at step S802. If the determination in step S807 is yes, the CPU201 ends the control process shown in the flowchart of fig. 8, and turns off the power supply to the electronic keyboard instrument 100.
Fig. 9(a), (b), and (c) are flowcharts showing detailed examples of the initialization process of step S801 in fig. 8, the music tempo change process of step S1002 in fig. 10 described later in the switching process of step S802 in fig. 8, and the rap start process of step S1006 in fig. 10.
First, in fig. 9(a), which shows a detailed example of the initialization processing of step S801 of fig. 8, the CPU201 executes initialization processing of the TickTime. In the present embodiment, the progression of the lyrics and the automatic accompaniment proceeds in units of a time called TickTime. The time reference value designated as the TimeDivision value in the header chunk of the music data of fig. 5 indicates the resolution of a quarter note; if this value is, for example, 480, a quarter note has a duration of 480 TickTime. The waiting times DeltaTime_1[i] and DeltaTime_2[i] in the track chunks of the music data of fig. 5 are also counted in units of TickTime. How many seconds 1 TickTime actually is differs depending on the tempo specified for the music data. With the tempo value Tempo [beats/minute] and the time reference value TimeDivision, the number of seconds of one TickTime is calculated by the following equation (1).
TickTime[sec] = 60 / Tempo / TimeDivision  (1)
Therefore, in the initialization processing illustrated in the flowchart of fig. 9(a), the CPU201 first calculates the TickTime[sec] by arithmetic processing corresponding to the above equation (1) (step S901). As for the tempo value Tempo, it is assumed that a predetermined value, for example 60 [beats/minute], is stored in the ROM202 of fig. 2 in the initial state. Alternatively, the tempo value at the time of the last termination may be stored in a nonvolatile memory.
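Equation (1) above can be written directly as code (the function name is illustrative):

```python
# TickTime[sec] = 60 / Tempo / TimeDivision  -- equation (1)
def tick_time_seconds(tempo_bpm: float, time_division: int) -> float:
    """Real-time length of one TickTime for the given tempo and resolution."""
    return 60.0 / tempo_bpm / time_division

# With the default tempo of 60 beats/minute and TimeDivision = 480, one
# TickTime is 60/60/480 s, so a quarter note (480 TickTime) lasts exactly 1 s.
```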
Next, the CPU201 sets, for the timer, a timer interrupt based on the TickTime[sec] calculated in step S901 (step S902).
In addition, the later-described bend processing is executed in a time unit obtained by multiplying the TickTime by D. This D is calculated by the following equation (2) using the time reference value TimeDivision indicating the resolution per quarter note described with fig. 5 and the resolution R of the bend curve table 700 described with fig. 7.
D = TimeDivision / R  (2)
For example, as described above, a quarter note (1 beat in the case of 4/4 time) is 480 TickTime, so when R = 48, the bend processing is performed every D = 480/48 = 10 TickTime.
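Equation (2) and the worked example above can likewise be expressed as code (illustrative name; integer division is assumed because the text's values divide evenly):

```python
# D = TimeDivision / R  -- equation (2): the bend processing period in
# TickTime units, i.e. how many ticks elapse between bend curve updates.
def bend_period_ticks(time_division: int, r: int) -> int:
    return time_division // r

# With TimeDivision = 480 and R = 48, the bend processing runs every
# D = 10 TickTime, i.e. 48 times per quarter note.
```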
Next, the CPU201 executes other initialization processing such as initialization of the RAM203 of fig. 2 (step S903). After that, the CPU201 ends the initialization processing of step S801 of fig. 8 illustrated in the flowchart of (a) of fig. 9.
The flowcharts of (b) and (c) of fig. 9 will be described later. Fig. 10 is a flowchart showing a detailed example of the switching process in step S802 in fig. 8.
The CPU201 first determines whether or not the lyric progression and the tempo of the music for automatic accompaniment are changed by the tempo change switch in the first switch panel 102 of fig. 1 (step S1001). If the determination is yes, the CPU201 executes a music tempo change process (step S1002). Details of this processing will be described later with reference to fig. 9 (b). If the determination at step S1001 is no, the CPU201 skips the processing at step S1002.
Next, the CPU201 determines whether a certain rap song has been selected on the 2 nd switch panel 103 of fig. 1 (step S1003). If the determination is yes, the CPU201 executes rap song reading processing (step S1004). This processing reads music data having the data structure described with fig. 5 from the ROM202 into the RAM203 of fig. 2. Thereafter, data access is performed on the music data read into the RAM203.
Next, the CPU201 determines whether or not the rap start switch is operated in the first switch panel 102 of fig. 1 (step S1005). If the determination is yes, the CPU201 executes the rap start processing (step S1006). Details of this processing will be described later with reference to (c) of fig. 9. If the determination in step S1005 is no, the CPU201 skips the process in step S1006.
Then, the CPU201 determines whether or not the bend curve setting start switch is operated on the first switch panel 102 in fig. 1 (step S1007). If the determination is yes, the CPU201 executes the bend curve setting process (step S1008). Details of this processing will be described later with reference to fig. 11. If the determination in step S1007 is no, the CPU201 skips the process in step S1008.
Finally, the CPU201 determines whether or not another switch is operated in the first switch panel 102 or the second switch panel 103 of fig. 1, and executes a process corresponding to each switch operation (step S1009). After that, the CPU201 ends the switching process of step S802 of fig. 8 illustrated in the flowchart of fig. 10.
Fig. 9(b) is a flowchart showing a detailed example of the music tempo change process in step S1002 in fig. 10. As described above, when the music tempo value is changed, the TickTime [ second ] is also changed. In the flowchart of fig. 9(b), the CPU201 executes control processing relating to the change of the TickTime [ sec ].
First, as in the case of step S901 in fig. 9(a) executed in the initialization processing in step S801 in fig. 8, the CPU201 calculates the TickTime [ sec ] by the arithmetic processing corresponding to the above expression (1) (step S911). The music Tempo value Tempo is stored in the RAM203 or the like after being changed by the music Tempo change switch in the first switch panel 102 in fig. 1.
Next, as in the case of step S902 of fig. 9(a) executed in the initialization process of step S801 of fig. 8, the CPU201 sets a timer interrupt based on the TickTime [sec] calculated in step S911 (step S912). After that, the CPU201 ends the music tempo change process of step S1002 in fig. 10 illustrated in the flowchart of fig. 9(b).
Fig. 9 (c) is a flowchart showing a detailed example of the rap start processing in step S1006 in fig. 10.
First, the CPU201 initially sets, in units of TickTime, the value of the variable ElapseTime on the RAM203, which indicates the elapsed time from the start of the automatic performance, to 0. Similarly, the values of the variables DeltaT_1 (track block 1) and DeltaT_2 (track block 2) on the RAM203, which count the relative time from the occurrence time of the immediately preceding event, are both initially set to 0 in units of TickTime. Next, the CPU201 initially sets to 0 the value of the variable AutoIndex_1 on the RAM203 for specifying the index i of the performance data sets DeltaTime_1[i] and Event_1[i] (1 ≤ i ≤ L-1) in the track block 1 of the music data, and likewise the value of the variable AutoIndex_2 for the track block 2 (the above is step S921). Further, as referred to later in the bend processing of fig. 14, the values of the variables DividingTime and BendAddressOffset on the RAM203 are also initially set in step S921 to D-1 and R-1, respectively.
Next, the CPU201 initially sets the value of the variable SongIndex on the RAM203, which indicates the current rap position, to 0 (step S922).
Then, the CPU201 initially sets to 1 (progressing) the value of the variable SongStart on the RAM203, which indicates whether the progression of the lyrics and accompaniment is to be performed (=1) or not performed (=0) (step S923).
After that, the CPU201 determines whether the player has performed setting for accompaniment reproduction matching the reproduction of the rap lyrics through the first switch panel 102 of fig. 1 (step S924).
If the determination at step S924 is yes, the CPU201 sets the value of the variable Bansou on the RAM203 to 1 (accompanied) (step S925). In contrast, if the determination at step S924 is no, the CPU201 sets the value of the variable Bansou to 0 (no accompaniment) (step S926). After the processing of step S925 or S926, the CPU201 ends the rap start processing of step S1006 of fig. 10 illustrated in the flowchart of (c) of fig. 9.
Fig. 11 is a flowchart showing a detailed example of the bend curve setting process in step S1008 in fig. 10. First, the CPU201 specifies a setting start position (bar number) in units of, for example, 16 beats (4 bars in the case of 4/4 time) (step S1101). Since the bend curve setting process can be executed in real time along with the progress of the automatic performance, the initial value is, for example, the 0th bar, and every time the setting for one run of 16 beats is completed, the 16th, 32nd, … bars can be automatically designated in sequence. To change the setting for a beat in the current automatic performance, the user can also designate, as the setting start position, a run of 16 consecutive beats including the beat currently being performed, by a switch (not particularly shown) on the first switch panel 102, for example.
Next, the CPU201 acquires the lyric data of the 16 beats (4 measures) of rap designated in step S1101 from the ROM202 (step S1102). The CPU201 displays the lyric data of the rap thus obtained on, for example, the LCD104 in fig. 2, in order to assist the user in specifying the bend curve.
Next, the CPU201 initially sets the value of the variable i on the RAM203, which indicates the beat position within the 16 consecutive beats, to 0 (step S1103). The CPU201 then repeatedly executes steps S1104 and S1105 (#0 to #3) for the 16 beats, incrementing the value of i by 1 at a time in step S1106, until it is determined in step S1107 that the value of i exceeds 15.
In the above-described repetitive processing, first, the CPU201 reads the slider value s of the slider at the beat position i (step S1104).
Next, when the slider value at the beat position i is s=0, the CPU201 stores the number 0 of the bend curve 401(#0) in fig. 4 or fig. 7 in the bend curve number entry in the bend curve setting table 600 in fig. 6. The value of each item of the bar number and the beat number at this time is calculated by the following equations (3) and (4), and the value is stored (step S1105(#0)).
Bar number = (bar number designated in S1101) + (integer part of i/4) (3)
Beat number = (remainder of i/4) (4)
When the slider value at the beat position i is s=1, the CPU201 stores the number 1 of the bend curve 401(#1) in fig. 4 or fig. 7 in the bend curve number entry in the bend curve setting table 600 in fig. 6. The value of each item of the bar number and the beat number at this time is calculated by the above-described equations (3) and (4), and the value is stored (step S1105(#1)).
When the slider value at the beat position i is s=2, the CPU201 stores the number 2 of the bend curve 401(#2) in fig. 4 or fig. 7 in the bend curve number entry in the bend curve setting table 600 in fig. 6. The value of each item of the bar number and the beat number at this time is calculated by the above-described equations (3) and (4), and the value is stored (step S1105(#2)).
When the slider value at the beat position i is s=3, the CPU201 stores the number 3 of the bend curve 401(#3) in fig. 4 or fig. 7 in the bend curve number entry in the bend curve setting table 600 in fig. 6. The value of each item of the bar number and the beat number at this time is calculated by the above-described equations (3) and (4), and the value is stored (step S1105(#3)).
In the repetition of the above-described processing, when it is determined in step S1107 that the value of the variable i exceeds 15, the CPU201 ends the processing in the flowchart of fig. 11, and thereby ends the bend curve setting process of step S1008 in fig. 10.
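Assuming the per-beat mapping described above (the slider value s at beat position i selects bend curve number s, placed at the bar and beat given by equations (3) and (4)), the loop of steps S1103 to S1107 can be sketched as follows; the dict-based table and the function name are illustrative, not from the patent.

```python
def set_bend_curves(start_bar, slider_values, table):
    """Steps S1103-S1107, simplified: map 16 per-beat slider values
    (4/4 time, 4 bars) to entries of the bend curve setting table.
    `table` is a dict keyed by (bar number, beat number)."""
    for i, s in enumerate(slider_values[:16]):
        bar = start_bar + i // 4      # equation (3): integer part of i/4
        beat = i % 4                  # equation (4): remainder of i/4
        table[(bar, beat)] = s        # slider value s selects bend curve #s
    return table
```

Running it with slider values 0 to 15 fills four bars of four beats each, in order.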
Fig. 12 is a flowchart showing a detailed example of the automatic performance interruption process executed on each interrupt generated every TickTime [second] (see step S902 of fig. 9(a) or step S912 of fig. 9(b)).
First, the CPU201 executes a series of processing corresponding to the track block 1 (steps S1201 to S1206). First, the CPU201 determines whether or not the SongStart value is 1, that is, whether the progression of the lyrics and accompaniment is instructed (step S1201).
If it is determined that the progression of the lyrics and the accompaniment is not instructed (no in step S1201), the CPU201 does not proceed with the lyrics and the accompaniment and directly ends the automatic performance interruption process illustrated in the flowchart of fig. 12.
When determining that the progression of the lyrics and accompaniment is instructed (the determination at step S1201 is yes), the CPU201 first increments by 1 the value of the variable ElapseTime on the RAM203, which indicates the elapsed time in TickTime units from the start of the automatic performance. Since the automatic performance interruption process of fig. 12 occurs every TickTime [second], adding 1 at every interrupt yields the value of ElapseTime. The value of the variable ElapseTime is used to calculate the current bar number and beat number in step S1406 of the bend processing of fig. 14, which will be described later.
Next, the CPU201 determines whether or not the DeltaT_1 value, which indicates the relative time from the occurrence time of the previous event for the track block 1, coincides with the waiting time DeltaTime_1[AutoIndex_1] of the performance data set to be executed next, which is indicated by the AutoIndex_1 value (step S1203).
If the determination at step S1203 is no, the CPU201 increments the DeltaT_1 value indicating the relative time from the occurrence time of the previous event by +1 for the track block 1, and advances the time by 1 TickTime unit corresponding to the current interrupt (step S1204). After that, the CPU201 proceeds to the processing for the track block 2 (step S1208).
If the determination at step S1203 is yes, the CPU201 executes the Event_1[AutoIndex_1] of the performance data set on the track block 1 indicated by the AutoIndex_1 value (step S1205). This event is a rap event containing lyric data.
Next, the CPU201 stores the AutoIndex_1 value, which indicates the position of the next rap event to be executed within the track block 1, in the variable SongIndex on the RAM203.
Also, the CPU201 increments by +1 the AutoIndex_1 value for referring to the performance data sets within the track block 1.
Further, the CPU201 resets to 0 the DeltaT_1 value indicating the relative time from the occurrence time of the rap event referred to this time for the track block 1.
Next, the CPU201 executes a series of processing corresponding to the track block 2 (steps S1208 to S1214). First, the CPU201 determines whether or not the DeltaT _2 value indicating the relative time from the occurrence time of the previous event with respect to the track block 2 coincides with the waiting time DeltaTime _2[ AutoIndex _2] of the performance data set desired to be executed from now on, which is indicated by the AutoIndex _2 value (step S1208).
If the determination at step S1208 is no, the CPU201 increments the DeltaT_2 value indicating the relative time from the occurrence time of the previous event by +1 for the track block 2, and advances the time by 1 TickTime unit corresponding to the current interrupt (step S1209). After that, the CPU201 advances to the bend processing of step S1211.
If the determination in step S1208 is yes, the CPU201 determines whether or not the value of the variable Bansou on the RAM203 instructing accompaniment reproduction is 1 (accompanied by accompaniment) (step S1210) (refer to steps S924 to S926 of fig. 9 (c)).
If the determination at step S1210 is yes, the CPU201 executes the EVENT_2[AutoIndex_2] related to the accompaniment on the track block 2, which is indicated by the AutoIndex_2 value (step S1211). If the EVENT_2[AutoIndex_2] executed here is, for example, a note-on event, a sounding command for an accompaniment musical tone is issued to the sound source LSI204 of fig. 2 with the key number and velocity specified by the note-on event. If, on the other hand, the EVENT_2[AutoIndex_2] is, for example, a note-off event, a mute command for the accompaniment musical tone being sounded is issued to the sound source LSI204 of fig. 2 with the key number and velocity specified by the note-off event.
On the other hand, if the determination at step S1210 is no, the CPU201 skips step S1211 and does not execute the EVENT_2[AutoIndex_2] related to the current accompaniment; it proceeds to the next step S1212 and executes only the control processing that advances the events, so as to keep the accompaniment progression synchronized with the lyrics.
After step S1211, or in the case where the determination of step S1210 is no, the CPU201 increments by +1 the AutoIndex_2 value for referring to the performance data sets of the accompaniment data on the track block 2 (step S1212).
Further, the CPU201 resets the DeltaT _2 value indicating the relative time from the occurrence time of the event executed this time to 0 for the track block 2 (step S1213).
Then, the CPU201 determines whether or not the waiting time DeltaTime_2[AutoIndex_2] of the performance data set to be executed next on the track block 2, indicated by the AutoIndex_2 value, is 0, that is, whether or not it is an event to be executed simultaneously with the event of this time (step S1214).
If the determination at step S1214 is no, the CPU201 proceeds to the bend processing of step S1211.
If the determination at step S1214 is yes, the CPU201 returns to step S1210 and repeats the control process for the EVENT_2[AutoIndex_2] of the performance data set to be executed next on the track block 2, indicated by the AutoIndex_2 value. The CPU201 repeatedly executes the processing of steps S1210 to S1214 as many times as the number of events to be executed simultaneously this time. The above processing sequence is executed when a plurality of note-on events are to be sounded at synchronized timing, as in a chord, for example.
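The wait-counter logic described above for the two track blocks can be sketched for a single track as follows; the list-based performance data and the `execute` callback are illustrative stand-ins for the DeltaTime/Event arrays and the event execution of steps S1205 and S1211.

```python
def tick_track(delta_t, auto_index, delta_times, events, execute):
    """One TickTime interrupt for one track (cf. steps S1203-S1207 and
    S1208-S1214). The relative-time counter DeltaT advances by 1 per tick
    until it reaches the next event's DeltaTime; the event then fires and
    the counter resets. Events whose DeltaTime is 0 fire in the same tick
    (simultaneous events such as chords). Returns (delta_t, auto_index)."""
    if auto_index >= len(events):          # no events left on this track
        return delta_t, auto_index
    if delta_t != delta_times[auto_index]:
        return delta_t + 1, auto_index     # waiting time not yet elapsed
    while auto_index < len(events):
        execute(events[auto_index])        # fire the due event
        auto_index += 1
        delta_t = 0                        # reset the relative-time counter
        if auto_index >= len(events) or delta_times[auto_index] != 0:
            break                          # next event is not simultaneous
    return delta_t, auto_index
```

With waiting times [2, 0, 3], the second event fires in the same tick as the first, and the third fires three ticks later.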
After the process of step S1209, or in the case where the determination of step S1214 is no, the CPU201 executes the bend processing (step S1211). Here, the processing corresponding to the bend curves set in the bend curve setting table 600 of fig. 6 is executed; details will be described later with reference to fig. 14.
Fig. 13 is a flowchart showing a detailed example of the rap reproduction processing in step S805 in fig. 8.
First, the CPU201 determines whether a value other than the Null value is set in the variable SongIndex on the RAM203, which is set in step S1205 of the automatic performance interrupt process of fig. 12 (step S1301). The SongIndex value indicates whether or not the current timing is a rap voice reproduction timing.
If the determination in step S1301 is yes, that is, if the current time point is a rap playback timing, the CPU201 determines whether or not a new key press by the user on the keyboard of fig. 1 is detected (step S1302).
If the determination of step S1302 is yes, the CPU201 sets the pitch specified by the player's key press as the pitch of utterance in a register (not particularly shown) or a variable on the RAM203 (step S1303).
Next, the CPU201 reads out the rap lyric character string from the rap event Event_1[SongIndex] on the track block 1 of the music data on the RAM203 (step S1304).
On the other hand, when it is determined in step S1301 that the current time point is a rap playback timing but it is determined in step S1302 that no new key press is detected at the current time point, the CPU201 reads out the pitch data from the rap event Event_1[SongIndex] on the track block 1 and sets it as the pitch of utterance.
When the music is reproduced as rap, the pitch may or may not be linked to the pitch of the melody.
After that, the CPU201 generates the rap data 215 for uttering the rap lyric character string read in step S1304 at the pitch set in step S1303 or step S1304, and instructs the speech synthesis LSI205 to perform the utterance processing (step S1305).
After the process of step S1305, the CPU201 stores the reproduced rap position, which is indicated by the variable SongIndex on the RAM203, in the variable SongIndex_pre on the RAM203 (step S1306).
Then, the CPU201 clears the value of the variable SongIndex to the Null value, so that subsequent timings are treated as other than rap reproduction timings (step S1307). After that, the CPU201 ends the rap reproduction processing of step S805 in fig. 8 illustrated in the flowchart of fig. 13.
If the determination of step S1301 is no, that is, if the current time point is not a rap playback timing, the CPU201 determines whether or not a new key press by the player on the keyboard of fig. 1 is detected (step S1308).
When the determination of step S1308 is no, the CPU201 directly ends the rap playback processing of step S805 in fig. 8 illustrated in the flowchart of fig. 13.
When the determination of step S1308 is yes, the CPU201 generates the rap data 215 instructing a change of the pitch of the rap voice currently being uttered to the pitch of the newly pressed key, and outputs it to the voice synthesis LSI205 (step S1309).
By the above processing of step S1309, the pitch of utterance of the rap voice being sounded is changed to the pitch of the key newly pressed by the player.
After the process of step S1309, the CPU201 ends the rap reproduction process of step S805 of fig. 8 illustrated in the flowchart of fig. 13.
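The decision flow of fig. 13 can be sketched as follows; the function signature, the callback names, and the use of `None` for the Null value are illustrative simplifications, not from the patent.

```python
def rap_playback(song_index, new_key, event_pitch, utter, repitch):
    """Simplified decision flow of fig. 13 (steps S1301-S1309).
    At a rap reproduction timing (SongIndex non-Null), the pitch of a
    newly pressed key takes precedence over the pitch recorded in the
    event (steps S1302-S1305); outside a rap timing, a new key press only
    re-pitches the voice already sounding (steps S1308-S1309).
    Returns the new SongIndex value (cleared to None, cf. step S1307)."""
    if song_index is not None:                         # step S1301: rap timing?
        pitch = new_key if new_key is not None else event_pitch
        utter(pitch)                                   # step S1305: sound the rap voice
        return None                                    # step S1307: clear SongIndex
    if new_key is not None:                            # step S1308: key outside rap timing
        repitch(new_key)                               # step S1309: re-pitch current voice
    return song_index
```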
Fig. 14 is a flowchart showing a detailed processing example of the bend processing of step S1211 in the automatic performance interruption processing of fig. 12. First, the CPU201 increments the value of the variable DividingTime in the RAM203 by 1 (step S1401).
After that, the CPU201 determines whether or not the value of the variable DividingTime matches the value D calculated by the above equation (2) (step S1402). If the determination at step S1402 is no, the CPU201 directly ends the bend processing of step S1211 of fig. 12 illustrated in the flowchart of fig. 14. Since D indicates a multiple of TickTime, the automatic performance interruption process of fig. 12 is executed every 1 TickTime, while the essential part of the bend processing of fig. 14 called from it is executed only every D TickTime. For example, if D is set to 10, the bend processing is performed every 10 TickTime. Since the value of the variable DividingTime is initially set to D-1 in step S921 of the rap start processing of fig. 9(c), the determination in step S1402 is necessarily yes after the processing of step S1401 when the first automatic performance interruption process at the start of the automatic performance is executed.
If the determination in step S1402 is yes, the CPU201 resets the value of the variable DividingTime to 0 (step S1403).
Next, the CPU201 determines whether or not the value of the variable BendAddressOffset on the RAM203 coincides with the final address R-1 within one bend curve (step S1404). Here, it is determined whether or not the bend processing for one beat has finished. Since the value of the variable BendAddressOffset is initially set to R-1 in step S921 of the above-described rap start processing of fig. 9(c), the determination in step S1404 is necessarily yes when the first automatic performance interruption process at the start of the automatic performance is executed.
If the determination in step S1404 is yes, the CPU201 resets the value of the variable BendAddressOffset to 0, the value indicating the head of the bend curve (see fig. 7) (step S1405).
After that, the CPU201 calculates the current bar number and beat number from the value of the variable ElapseTime (step S1406). In 4/4 time, the number of TickTime in 1 beat is given by the value of TimeDivision, so the variable ElapseTime is divided by the value of TimeDivision, and the result is further divided by 4 (the number of beats per bar), whereby the current bar number and beat number can be calculated.
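The bar/beat arithmetic of step S1406 amounts to two integer divisions, sketched below under the 4/4-time assumption stated above; the helper name is illustrative.

```python
def bar_and_beat(elapse_time: int, time_division: int, beats_per_bar: int = 4):
    """Step S1406: in 4/4 time one beat lasts TimeDivision ticks, so the
    total beat count is ElapseTime // TimeDivision; dividing that again by
    the beats per bar splits it into (bar number, beat number)."""
    beats = elapse_time // time_division
    return beats // beats_per_bar, beats % beats_per_bar
```

For example, with TimeDivision = 480, an elapsed time of 2400 ticks is 5 beats into the piece: bar 1, beat 1.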
Next, the CPU201 acquires the bend curve number corresponding to the bar number and the beat number calculated in step S1406 from the bend curve setting table 600 illustrated in fig. 6, and sets the value to the variable CurveNum on the RAM203 (step S1407).
On the other hand, if the value of the variable BendAddressOffset on the RAM203 has not reached the final address R-1 within one bend curve and the determination in step S1404 is no, the CPU201 increments by 1 the value of the variable BendAddressOffset, which indicates the offset address within the bend curve (step S1409).
Next, the CPU201 determines whether or not a bend curve number has been obtained in the variable CurveNum by the execution of step S1407 in the current or a previous automatic performance interruption process (step S1408).
If the determination of step S1408 is yes, the CPU201 acquires a pitch bend value (see fig. 7) from the address of the bend curve table 700 obtained by adding the offset value held in the variable BendAddressOffset to the head address BendCurve[CurveNum] of the bend curve data in the ROM202 corresponding to the bend curve number held in the variable CurveNum (step S1410).
Finally, as in the case described in step S1309 of fig. 13, the CPU201 generates the rap data 215 for instructing a change of the pitch of the rap voice being uttered in accordance with the acquired pitch bend value, outputs it to the speech synthesis LSI205, and then ends the bend processing of step S1211 of fig. 12 illustrated in the flowchart of fig. 14.
If no bend curve number has been obtained in the variable CurveNum and the determination in step S1408 is no, the user has invalidated the bend curve setting for that beat, and therefore the CPU201 directly ends the bend processing of step S1211 of fig. 12 illustrated in the flowchart of fig. 14.
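The per-interrupt flow of fig. 14 (steps S1401 to S1410) can be sketched roughly as follows; the dictionary-based state and the table keyed by (bar, beat) are illustrative simplifications of the RAM variables and the bend curve setting table 600, and each curve is held as a list of R values rather than a ROM address range.

```python
def bend_tick(state, d, r, time_division, bend_curves, curve_table, apply_bend):
    """One call of the bend processing of fig. 14, invoked every TickTime.
    The body runs only every D-th call (steps S1401-S1403); at each beat
    boundary the offset wraps to 0 and the curve for the new (bar, beat)
    is looked up (steps S1404-S1407); otherwise the offset advances
    (step S1409) and the current curve value is applied (step S1410)."""
    state['DividingTime'] += 1                           # step S1401
    if state['DividingTime'] != d:                       # step S1402
        return
    state['DividingTime'] = 0                            # step S1403
    if state['BendAddressOffset'] == r - 1:              # step S1404: beat finished?
        state['BendAddressOffset'] = 0                   # step S1405: restart curve
        beats = state['ElapseTime'] // time_division     # step S1406: bar/beat
        bar, beat = beats // 4, beats % 4
        state['CurveNum'] = curve_table.get((bar, beat)) # step S1407: look up curve
    else:
        state['BendAddressOffset'] += 1                  # step S1409: advance offset
    if state['CurveNum'] is not None:                    # step S1408: curve set?
        apply_bend(bend_curves[state['CurveNum']][state['BendAddressOffset']])
```

Starting from the step-S921 initial values DividingTime = D-1 and BendAddressOffset = R-1, the very first interrupt wraps the offset and looks up the curve for beat (0, 0), as the text describes.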
As described above, in the present embodiment, bend processing corresponding to a bend curve specified by the user, in real time or in advance, can be executed for each beat.
In addition to the above-described embodiment, when different bend curves are designated at the connection portion between adjacent beats, the pitch bend values may be connected smoothly so that the intonation does not change abruptly.
In the above-described embodiment, the user sets a bend curve for each beat in, for example, 16 consecutive beats (4 bars in the case of 4/4 time), but a user interface may be adopted in which a combination of bend curves for 16 beats is designated at once. In this way, the vocal delivery of a well-known rap singer can be imitated and designated easily.
Further, an emphasis unit may be provided that emphasizes the intonation by changing the bend curve in real time, or over a predetermined number of consecutive beats (for example, 4 beats) such as those at the beginning of each bar. This enables a more colorful rap presentation.
In the above-described embodiment, the bend processing is applied to the pitch of the rap voice as a pitch bend, but the bend processing may also be applied to attributes of the sound other than pitch, such as intensity or tone color. This enables a more colorful rap presentation.
In the above-described embodiment, the intonation pattern is specified for the rap voice, but an intonation pattern may also be specified for music information of musical instrument sounds other than the rap voice.
In the first embodiment of the statistical speech synthesis process using the HMM acoustic model described with reference to fig. 3 and fig. 4, subtle musical expressions of a specific singer, such as that singer's style, can be reproduced, and smooth speech quality free of connection distortion can be realized. Moreover, by converting the learning result 315 (model parameters), it is possible to adapt to other singers and to express various voices and emotions. Further, all the model parameters of the HMM acoustic model can be machine-learned from the learning rap data 311 and the learning rap speech data 312, so that the characteristics of a specific singer are acquired as an HMM acoustic model and a synthesis system expressing these characteristics at synthesis time can be constructed automatically. The fundamental frequency and duration of speech follow the melody of the musical score and the tempo of the musical composition, and the time structure of pitch and rhythm can be uniquely determined from the score. In an actual rap voice, however, the voice is not produced exactly as written in the score; there is a style unique to each singer in the voice quality, the pitch, and their temporal structural changes. In the first embodiment of the statistical speech synthesis process using the HMM acoustic model, the time-series changes of the spectral information and the pitch information in the rap voice can be modeled according to the context, and by additionally considering the score information, speech reproduction closer to an actual rap voice can be performed. The HMM acoustic model used in the first embodiment of the statistical speech synthesis process corresponds to a generative model of how the acoustic feature sequence of the singer's vocal cord vibration and vocal tract characteristics changes over time when lyrics are uttered according to a certain melody.
In the first embodiment of the statistical speech synthesis process, by using an HMM acoustic model that includes the context of the "deviation" between notes and speech, rap voice synthesis that accurately reproduces singing techniques tending to change in complicated ways depending on the vocal characteristics of the singer is realized. By combining the technique of the first embodiment of the statistical speech synthesis process using the HMM acoustic model with, for example, the technique of real-time performance on the electronic keyboard instrument 100, the singing style and voice quality of the model singer, which cannot be realized by a conventional electronic musical instrument of the segment synthesis method or the like, can be accurately reflected, and a rap performance can be realized as if a certain rap singer were actually rapping in accordance with the keyboard performance of the electronic keyboard instrument 100.
In the second embodiment of the statistical speech synthesis process using the DNN acoustic model described with reference to fig. 3 and fig. 5, the context-dependent HMM acoustic models based on decision trees in the first embodiment are replaced with a DNN as the expression of the relationship between the linguistic feature sequence and the acoustic feature sequence. Thus, this relationship can be expressed by a complex nonlinear transformation function that is difficult to express with decision trees. Further, in the context-dependent HMM acoustic models based on decision trees, the corresponding learning data is also partitioned by the decision trees, so the learning data assigned to each context-dependent HMM acoustic model is reduced. In contrast, in the DNN acoustic model, a single DNN is learned from all the learning data, so the learning data can be used efficiently. Therefore, the DNN acoustic model can predict the acoustic feature amounts with higher accuracy than the HMM acoustic model and can significantly improve the naturalness of the synthesized speech. In addition, the DNN acoustic model can use linguistic feature amounts related to frames. That is, since the temporal correspondence between the acoustic feature sequence and the linguistic feature sequence is determined in advance in the DNN acoustic model, linguistic feature amounts related to frames, such as "the number of continuing frames of the current phoneme" and "the intra-phoneme position of the current frame", which are difficult to take into account in the HMM acoustic model, can be used. By using such frame-related linguistic feature amounts, more detailed features can be modeled, and the naturalness of the synthesized speech can be improved.
By combining the technique of the second embodiment of the statistical speech synthesis process using the DNN acoustic model with, for example, the technique of real-time performance on the electronic keyboard instrument 100, a rap performance by keyboard playing or the like can approximate the singing style and voice quality of the model rap singer even more naturally.
In the above-described embodiment, by adopting the technique of statistical speech synthesis processing as the speech synthesis method, an extremely small memory capacity can be realized compared with the conventional segment synthesis method. For example, while an electronic musical instrument of the segment synthesis method needs a memory with a storage capacity of several hundred megabytes for the speech segment data, a memory with a storage capacity of only several megabytes suffices in the present embodiment for storing the model parameters of the learning result 315 shown in fig. 3. Therefore, a lower-priced electronic musical instrument can be realized, and the high-quality rap performance system can be enjoyed by a wider range of users.
Further, in the conventional segment data method, since the segment data needs to be adjusted by hand, an enormous amount of time (on the order of years) and labor is required to create data for rap performance, whereas the generation of the model parameters of the learning result 315 for the HMM acoustic model or the DNN acoustic model in the present embodiment requires almost no data adjustment, so only a fraction of that generation time and labor is needed. In this respect as well, a lower-priced electronic musical instrument can be realized. Further, a general user can also learn his or her own voice, a family member's voice, a celebrity's voice, or the like using a learning function built into the electronic keyboard instrument 100, and have the electronic musical instrument perform rap with that voice as a model voice.
In the embodiments described above, the present invention was implemented for an electronic keyboard musical instrument, but the present invention can also be applied to other electronic musical instruments such as an electronic stringed musical instrument.
The speech synthesis method that can be used by the embodiments described above is not limited to the methods exemplified.
In the above-described embodiment, the speech synthesis method of the first embodiment of the statistical speech synthesis process using the HMM acoustic model or the subsequent second embodiment using the DNN acoustic model has been described, but the present invention is not limited to this, and any speech synthesis method such as an acoustic model combining HMM and DNN may be employed as long as the technique using the statistical speech synthesis process is used.
In the above-described embodiment, the lyric information is given in advance as music data, but text data obtained by speech-recognizing, in real time, the content sung by the player may also be given as the lyric information in real time.