Data object English coding method and device

Document No.: 782688. Publication date: 2021-04-09.

Abstract: This technology, "Data object English coding method and device", was created by 张冀兰, 姚昊, 杨加东, 郭强, 刘华, 熊伟, 富会佳, 肖薇 and 杨沥铭 on 2020-10-30. The disclosure belongs to the technical field of nuclear power and relates in particular to a method and device for the English encoding of data objects. The dictionary is divided into multiple mutually disjoint lexicons, and the lexicon calling order for a data object is determined from the category of that data object. The method can therefore adapt flexibly to the encoding characteristics of different data objects and encode them more efficiently, with strong operability and high encoding quality. In addition, the knowledge of business-domain experts is abstracted into a knowledge base for the English encoding of nuclear power data objects, which turns that knowledge into information that can be shared and improves the quality of the English codes. Building on the dictionary and the English coding specification reduces dependence on business-domain experts and improves working efficiency. A unified dictionary and a standard English coding specification yield standardized English codes for data objects and eliminate multiple translations of the same word.

1. A method for encoding a data object in English, the method comprising:

acquiring a data object to be encoded;

determining, according to the category associated with the data object and a correspondence between categories and lexicon calling orders, the lexicon calling order required for encoding the data object, wherein the contents of the lexicons do not overlap;

performing word segmentation on the data object to obtain a plurality of words;

and calling the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words until all of the plurality of words are encoded, thereby forming an encoding result.

2. The method of claim 1, wherein calling the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words until all of the plurality of words are encoded, thereby forming an encoding result, comprises:

each time a lexicon is called according to the determined lexicon calling order, if an unencoded word among the plurality of words matches the called lexicon, encoding the matched word according to the called lexicon, until all of the plurality of words are encoded, thereby forming an encoding result.

3. The method of claim 1, further comprising:

determining the character length of the encoding result;

judging whether the character length of the encoding result meets a preset condition;

and, if the character length of the encoding result does not meet the preset condition, repeatedly replacing the longest not-yet-abbreviated word among the plurality of words with its abbreviation according to the called Chinese-English abbreviation comparison library to form a new encoding result, until the character length of the new encoding result meets the preset condition.

4. The method of claim 1, further comprising:

if the category of the data object is determined to be a table name, acquiring the business domain associated with the data object;

wherein calling the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words until all of the plurality of words are encoded, thereby forming an encoding result, comprises:

calling the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words and the business domain of the data object until the plurality of words and the business domain of the data object are all encoded, thereby forming an encoding result.

5. An apparatus for encoding a data object in English, the apparatus comprising:

a first acquisition module, configured to acquire a data object to be encoded;

a first determining module, configured to determine, according to the category associated with the data object and a correspondence between categories and lexicon calling orders, the lexicon calling order required for encoding the data object, wherein the contents of the lexicons do not overlap;

a word segmentation module, configured to perform word segmentation on the data object to obtain a plurality of words;

and an encoding module, configured to call the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words until all of the plurality of words are encoded, thereby forming an encoding result.

6. The apparatus of claim 5, wherein the encoding module comprises:

a first encoding sub-module, configured to, each time a lexicon is called according to the determined lexicon calling order, encode any unencoded word among the plurality of words that matches the called lexicon according to that lexicon, until all of the plurality of words are encoded, thereby forming an encoding result.

7. The apparatus of claim 5, further comprising:

a second determining module, configured to determine the character length of the encoding result;

a judging module, configured to judge whether the character length of the encoding result meets a preset condition;

and a reduction module, configured to, if the character length of the encoding result does not meet the preset condition, repeatedly replace the longest not-yet-abbreviated word among the plurality of words with its abbreviation according to the called Chinese-English abbreviation comparison library to form a new encoding result, until the character length of the new encoding result meets the preset condition.

8. The apparatus of claim 5, further comprising:

a second acquisition module, configured to acquire the business domain associated with the data object if the category of the data object is determined to be a table name;

wherein the encoding module comprises:

a second encoding sub-module, configured to call the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words and the business domain of the data object until the plurality of words and the business domain of the data object are all encoded, thereby forming an encoding result.

9. An apparatus for encoding a data object in English, the apparatus comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the method of any one of claims 1 to 4.

10. A non-transitory computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of any of claims 1 to 4.

Technical Field

The invention belongs to the technical field of nuclear power, and particularly relates to a data object English coding method and device.

Background

As domestic nuclear power groups come to manage multiple reactor types and the nuclear power business becomes international, a number of domestically developed nuclear power production management systems are being developed rapidly. When a nuclear power information project is built, the English codes of data objects in small and medium-sized information systems are usually written by the business personnel of the project contractor, while those in large information systems are written by the contractor's business experts in the corresponding fields. Coding methods differ between reactor types and between projects, so the same Chinese data object receives inconsistent English codes, which hampers data exchange between systems.

The 2010 edition of the English-Chinese Nuclear Power Technology Dictionary defines 62,889 standard English terms in the nuclear power field and 3,754 common nuclear power abbreviations. Using this dictionary directly in information system design nevertheless runs into several problems: (1) Most entries in the dictionary are phrases, and many Chinese names of data objects cannot be matched directly to those phrases. (2) The vocabulary is insufficient: the dictionary lacks entries in fields such as inventory, artificial intelligence, information technology and file management. (3) The requirements of information systems are not met: the IT infrastructure on which an information system depends has its own requirements, such as limits on name length and on special characters. (4) English encoding places high demands on personnel: an information system has more than ten thousand data objects, and encoding their Chinese names into English names requires a rich business background and cross-disciplinary skills, so English naming depends entirely on the experience of a few business experts, which in turn affects the construction schedule of the information system. An efficient encoding method is therefore needed.

Disclosure of Invention

In order to overcome the problems in the related art, a data object English coding method and a data object English coding device are provided.

According to an aspect of the embodiments of the present disclosure, there is provided a method for encoding a data object in English, the method including:

acquiring a data object to be encoded;

determining, according to the category associated with the data object and a correspondence between categories and lexicon calling orders, the lexicon calling order required for encoding the data object, wherein the contents of the lexicons do not overlap;

performing word segmentation on the data object to obtain a plurality of words;

and calling the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words until all of the plurality of words are encoded, thereby forming an encoding result.

In one possible implementation, calling the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words until all of the plurality of words are encoded, thereby forming an encoding result, includes:

each time a lexicon is called according to the determined lexicon calling order, if an unencoded word among the plurality of words matches the called lexicon, encoding the matched word according to the called lexicon, until all of the plurality of words are encoded, thereby forming an encoding result.

In one possible implementation, the method further includes:

determining the character length of the encoding result;

judging whether the character length of the encoding result meets a preset condition;

and, if the character length of the encoding result does not meet the preset condition, repeatedly replacing the longest not-yet-abbreviated word among the plurality of words with its abbreviation according to the called Chinese-English abbreviation comparison library to form a new encoding result, until the character length of the new encoding result meets the preset condition.

In one possible implementation, the method further includes:

if the category of the data object is determined to be a table name, acquiring the business domain associated with the data object;

in which case calling the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words until all of the plurality of words are encoded, thereby forming an encoding result, includes:

calling the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words and the business domain of the data object until the plurality of words and the business domain of the data object are all encoded, thereby forming an encoding result.

According to another aspect of the embodiments of the present disclosure, there is provided an apparatus for encoding a data object in English, the apparatus including:

a first acquisition module, configured to acquire a data object to be encoded;

a first determining module, configured to determine, according to the category associated with the data object and a correspondence between categories and lexicon calling orders, the lexicon calling order required for encoding the data object, wherein the contents of the lexicons do not overlap;

a word segmentation module, configured to perform word segmentation on the data object to obtain a plurality of words;

and an encoding module, configured to call the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words until all of the plurality of words are encoded, thereby forming an encoding result.

In one possible implementation, the encoding module includes:

a first encoding sub-module, configured to, each time a lexicon is called according to the determined lexicon calling order, encode any unencoded word among the plurality of words that matches the called lexicon according to that lexicon, until all of the plurality of words are encoded, thereby forming an encoding result.

In one possible implementation, the apparatus further includes:

a second determining module, configured to determine the character length of the encoding result;

a judging module, configured to judge whether the character length of the encoding result meets a preset condition;

and a reduction module, configured to, if the character length of the encoding result does not meet the preset condition, repeatedly replace the longest not-yet-abbreviated word among the plurality of words with its abbreviation according to the called Chinese-English abbreviation comparison library to form a new encoding result, until the character length of the new encoding result meets the preset condition.

In one possible implementation, the apparatus further includes:

a second acquisition module, configured to acquire the business domain associated with the data object if the category of the data object is determined to be a table name;

in which case the encoding module includes:

a second encoding sub-module, configured to call the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words and the business domain of the data object until the plurality of words and the business domain of the data object are all encoded, thereby forming an encoding result.

According to another aspect of the embodiments of the present disclosure, there is provided an apparatus for encoding a data object in English, the apparatus including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the method described above.

According to another aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.

The beneficial effects of the present disclosure are as follows. The dictionary is divided into multiple mutually disjoint lexicons, and the lexicon calling order for a data object is determined from its category, so the method adapts flexibly to the encoding characteristics of different data objects and encodes them more efficiently, with strong operability and high encoding quality. In addition, the knowledge of business-domain experts is abstracted into a knowledge base for the English encoding of nuclear power data objects, which turns that knowledge into information that can be shared and improves the quality of the English codes. Building on the dictionary and the English coding specification reduces dependence on business-domain experts and improves working efficiency. A unified dictionary and a standard English coding specification yield standardized English codes for data objects and eliminate multiple translations of the same word.

Drawings

Fig. 1 is a flowchart illustrating a method for encoding a data object in English according to an exemplary embodiment.

Fig. 2 is a flowchart of an application example of a data object English encoding method.

Fig. 3 is a flowchart of an application example of a data object English encoding method.

Fig. 4 is a block diagram illustrating an apparatus for encoding a data object in English according to an exemplary embodiment.

Fig. 5 is a block diagram illustrating an apparatus for encoding a data object in English according to an exemplary embodiment.

Detailed Description

The invention is described in further detail below with reference to the figures and the embodiments.

Fig. 1 is a flowchart illustrating a method for encoding a data object in English according to an exemplary embodiment. The method may be executed by a terminal device, for example a server, a desktop computer, a notebook computer or a tablet computer; the terminal device may also be user equipment, a vehicle-mounted device, a wearable device or the like. The embodiments of the present disclosure do not limit the type of terminal device. As shown in Fig. 1, the method may include:

Step 10: acquiring a data object to be encoded;

Step 11: determining, according to the category associated with the data object and a correspondence between categories and lexicon calling orders, the lexicon calling order required for encoding the data object, wherein the contents of the lexicons do not overlap;

Step 12: performing word segmentation on the data object to obtain a plurality of words;

Step 13: calling the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words until all of the plurality of words are encoded, thereby forming an encoding result.

In the present disclosure, a plurality of lexicons may be preset, and the contents of the lexicons do not overlap. The lexicons may be pre-stored in the terminal device, or in one or more other terminal devices with which the terminal device can establish a communication connection, so that the terminal device can call the lexicons in sequence when encoding is required. The plurality of lexicons may include: the business-domain word Chinese-English abbreviation comparison library, the specific word abbreviation library, the general-domain word English abbreviation comparison library, the term library, the business domain code library and the common data attribute code library.

In one possible implementation, the business-domain word Chinese-English abbreviation comparison library may be used to store Chinese and English words of professional domains; it does not store general-purpose Chinese and English words. Each mapping entry consists of a Chinese word, an English word, an English abbreviation and a flag indicating whether the abbreviation is standard. A word may or may not have an English abbreviation. The data format of the library may be as shown in Table 1.

TABLE 1

Chinese word | English word | English abbreviation | Standard or not
Chemistry | Chemistry | chem | N
Analysis | Analysis | anls | Y
Effluent liquid | emission | emis | Y
Sampling | Sampling | smpl | Y

The creation specification for the business-domain word Chinese-English abbreviation comparison library may include any one or more of the following:

1. A corresponding number of business-domain sub-libraries is established for the 98 business domains of nuclear power.

2. Selection specification for English abbreviations

(1) If the English-Chinese abbreviation dictionary contains an English abbreviation for the target word, that abbreviation is adopted.

(2) If the English-Chinese abbreviation dictionary does not contain an English abbreviation for the target word, a custom English abbreviation is formed by taking the first letter corresponding to each consonant in the English word. A custom abbreviation does not exceed 5 English letters.

3. Selection specification for English words

(1) The more commonly used English word is chosen over the more precise one. For example, for the Chinese word for "purchasing", "buy" is used rather than "purchase"; for the Chinese word for "gaseous", "gas" is used rather than "vapor".

(2) Shorter nouns or verbs are used rather than adjectives. For example, for the Chinese word for "welding", "weld" is used rather than "welding".

(3) If a Chinese word contains a character such as "table", "single" or "item" that carries the meaning of form, list or entry, and the word is longer than 2 Chinese characters, that character is not encoded. For example, "application form" is encoded as "request", not "request sheet"; "analysis list" and "analysis item" are both encoded as "analysis", not "analysis order".

In one possible implementation, the specific word abbreviation library may be used to store abbreviations of specific, high-frequency words that appear in data attributes. These specific words are given special codes in order to shorten the English codes and improve the efficiency of data interaction. A specific word is one that appears only as the first or the last word of the Chinese name of a data attribute. The specific word abbreviation library is stored as triples of Chinese word, English translation and English abbreviation. The data format of the specific word abbreviation library may be as shown in Table 2.

TABLE 2

Chinese word | English translation | English abbreviation
Encoding/coding | Code | c
Numbering | Identification | id
Status | Status | s
Serial number | Sequence | seq
Description | Description | des
Title | Title | ti
Date | Date | d
Time | Datetime | dt
Whether (i.e. Boolean type) | Boolean | b

The creation specification for the specific word abbreviation library may include any one or more of the following:

1. The total number of entries in the library does not exceed 30.

2. An English abbreviation is preferably 1 letter long and at most 3 letters long.

3. An English abbreviation is the initial letter of the English word or a combination of its letters.

4. English abbreviations must not be repeated.
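As a non-limiting illustration, the four creation rules above could be checked mechanically. The sketch below is only one possible reading: the function names are hypothetical, the first column uses English glosses in place of the Chinese words, and rule 3 is interpreted (as an assumption) to mean that the abbreviation starts with the initial of the English word and uses its letters in order.

```python
from collections import Counter

MAX_ENTRIES = 30    # rule 1: at most 30 entries
MAX_ABBR_LEN = 3    # rule 2: at most 3 letters

# (gloss of Chinese word, English translation, English abbreviation), taken from Table 2.
TABLE_2 = [
    ("code", "Code", "c"),
    ("number", "Identification", "id"),
    ("status", "Status", "s"),
    ("serial number", "Sequence", "seq"),
    ("description", "Description", "des"),
    ("title", "Title", "ti"),
    ("date", "Date", "d"),
    ("time", "Datetime", "dt"),
    ("whether", "Boolean", "b"),
]


def is_subsequence(short, long):
    """True if the letters of `short` occur in `long` in the same order."""
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def validate_specific_word_library(entries):
    """Return a list of rule violations; an empty list means the library passes."""
    problems = []
    if len(entries) > MAX_ENTRIES:
        problems.append(f"{len(entries)} entries, limit is {MAX_ENTRIES}")
    for _, english, abbr in entries:
        abbr, word = abbr.lower(), english.lower()
        if not 1 <= len(abbr) <= MAX_ABBR_LEN:
            problems.append(f"{abbr!r}: length must be 1 to {MAX_ABBR_LEN} letters")
            continue
        # Assumed reading of rule 3: same initial, letters drawn in order from the word.
        if not word or abbr[0] != word[0] or not is_subsequence(abbr, word):
            problems.append(f"{abbr!r}: not built from the letters of {english!r}")
    # Rule 4: no abbreviation may be used twice.
    counts = Counter(abbr.lower() for _, _, abbr in entries)
    for abbr, n in counts.items():
        if n > 1:
            problems.append(f"{abbr!r}: used {n} times")
    return problems


print(validate_specific_word_library(TABLE_2))
```

Under these assumptions the entries of Table 2 produce no violations, while adding a second entry abbreviated "d" would be reported as a repetition.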

In one possible implementation, the general-domain word English abbreviation comparison library may be used to store abbreviations of common English words that are not covered by the business-domain word Chinese-English abbreviation comparison library or the specific word abbreviation library; it stores abbreviations of single English words, not of English phrases. Entries are stored as two-tuples of English word and English word abbreviation. The data format of the library may be as shown in Table 3:

TABLE 3

English word | English word abbreviation
scale | scal
schema | schm
scope | scp
screen | scrn

Under its creation specification, the general-domain word English abbreviation comparison library may comprise 2,635 words and abbreviations, selected from the English-Chinese abbreviation dictionary, that are used for nuclear power business data.

In one possible implementation, nuclear power terms and common, fixed Chinese phrases may be encoded in English to form a term library. The term library is stored as triples of Chinese term, English translation and English term abbreviation. An English term abbreviation consists of the initial letter of each word of the English term. The term library is based on the appendix of common nuclear power abbreviations in the 2010 edition of the English-Chinese Nuclear Power Technology Dictionary and is extended according to business needs.

The data format of the term library may be as shown in Table 4.

TABLE 4

Chinese term | English translation | English term abbreviation
Corrective action | corrective action | ca
Quality defect reporting | quality deficiency report | qdr
Report of non-compliant items | non-conformance report | ncr
Technical specification book | technical specification | ts
Description of the technology | technical specification | ts

The creation specification for the term library may include any one or more of the following:

1. An English term abbreviation does not exceed 4 letters.

2. Chinese terms that have the same meaning but different names use the same English term abbreviation.

3. The repetition rate of English term abbreviations does not exceed 2%, that is, at most 2 of every 100 English term abbreviations are identical.

4. When the Chinese phrase is a number plus a quantifier, the English term abbreviation is the number plus the abbreviation of the quantifier.
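As a non-limiting illustration, the initial-letter rule and the repetition-rate limit could be sketched as follows. The function names are hypothetical, hyphens are treated as word separators (an assumption made so that "non-conformance report" yields "ncr" as in Table 4), and the repetition rate is computed over distinct English terms, which is only one possible reading of the 2% rule. The number-plus-quantifier rule is not covered.

```python
from collections import Counter

MAX_TERM_ABBR_LEN = 4  # an English term abbreviation does not exceed 4 letters


def term_abbreviation(english_term):
    """Join the initial letter of each word; hyphens are treated as word breaks (assumption)."""
    words = english_term.replace("-", " ").split()
    return "".join(word[0] for word in words).lower()


def repetition_rate(english_terms):
    """Share of distinct English terms whose abbreviation collides with another term."""
    distinct = {term.lower() for term in english_terms}
    if not distinct:
        return 0.0
    counts = Counter(term_abbreviation(term) for term in distinct)
    colliding = sum(n for n in counts.values() if n > 1)
    return colliding / len(distinct)


# English translations from Table 4; the last two rows share one English term.
TABLE_4_TERMS = [
    "corrective action",
    "quality deficiency report",
    "non-conformance report",
    "technical specification",
    "technical specification",
]

for term in TABLE_4_TERMS:
    abbr = term_abbreviation(term)
    print(term, "->", abbr, "(ok)" if len(abbr) <= MAX_TERM_ABBR_LEN else "(too long)")
print("repetition rate:", repetition_rate(TABLE_4_TERMS))
```

Because synonymous Chinese terms map to the same English term, they are counted once here, so the shared "ts" entries of Table 4 do not count as a repetition.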

In one possible implementation, the business domain code library may be used to store the Chinese names and English codes of all business domains in the nuclear power field. The library is stored as triples of Chinese name, English name and English code. The data format of the business domain code library may be as shown in Table 5:

TABLE 5

Chinese name | English name | English code
Debugging work order | Commissioning work order | cw
Debugging work bag | Commissioning work package | cp
Radiation management | Radiation Management | rm
Work application | Work request | wr
Quality plan | quality plan | qp
Nuclear safety | nuclear safety | ns
Workflow arrangement | Workflow configuration | wc
Chemical management | Chemical Management | cm

The creation specification for the business domain code library may include any one or more of the following:

1. The English code of a business domain has a fixed length of 2 characters.

2. The English code of a business domain consists of the first letters of the first 2 words of its English name.

3. Business domain English codes must not be repeated. If the combination from the first 2 words is already in use, the first letters of the 1st and 3rd words are used; if that is also in use, the first letter of the 1st word and the last letter of the 2nd word are used.
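As a non-limiting illustration, the three rules above amount to trying a short list of candidate codes in order and taking the first one not yet in use. In the sketch below the function name and the `taken` set are hypothetical, and the behaviour when all candidates are exhausted (raising an error) is an assumption, since the specification does not address that case.

```python
def domain_code(english_name, taken):
    """Return a 2-letter business domain code and record it in `taken`."""
    words = english_name.lower().split()
    candidates = []
    if len(words) >= 2:
        candidates.append(words[0][0] + words[1][0])    # first letters of words 1 and 2
    if len(words) >= 3:
        candidates.append(words[0][0] + words[2][0])    # first letters of words 1 and 3
    if len(words) >= 2:
        candidates.append(words[0][0] + words[1][-1])   # first of word 1, last of word 2
    for code in candidates:
        if code not in taken:
            taken.add(code)
            return code
    # Not covered by the specification; failing loudly is an assumption of this sketch.
    raise ValueError(f"no free code for {english_name!r}")


# English names from Table 5, processed in table order.
names = [
    "Commissioning work order",
    "Commissioning work package",
    "Radiation Management",
    "Work request",
    "quality plan",
    "nuclear safety",
    "Workflow configuration",
    "Chemical Management",
]
taken = set()
for name in names:
    print(name, "->", domain_code(name, taken))
```

Processed in this order, "Commissioning work order" takes "cw", so "Commissioning work package" falls back to the 1st and 3rd words and receives "cp", which agrees with Table 5.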

In one possible implementation, a common data attribute code library is established to store data attributes that occur frequently across data entities, so that the code of the same data attribute is uniform across different data entities, which facilitates data exchange between different systems. The library is stored as two-tuples of Chinese name and English code. The data format of the common data attribute code library may be as shown in Table 6:

TABLE 6

Chinese name | English code
Remarks | Memo
Updated by | Update_by
Update time | Update_dt
Created by | Create_by
Creation time | Create_dt

The creation specification for the common data attribute code library may include: 1. only data attributes whose frequency of occurrence exceeds 2% may be included in the library; 2. the English codes conform to the data object English coding specification of the present disclosure.

As an example of this embodiment, in step 10 the terminal device may obtain one or more data objects to be encoded from pre-stored data objects, or may obtain the data objects to be encoded from other devices or systems when encoding is needed.

In step 11, each data object may have a pre-associated category; the categories may include, for example, table name and table attribute. Different categories may correspond to different lexicon calling orders.

In step 12, word segmentation may be performed on the data object to obtain a plurality of words. Any suitable word segmentation technique may be used, and the embodiments of the present disclosure do not limit the type of word segmentation technique.

In step 13, each time a lexicon is called according to the determined lexicon calling order, the terminal device may, if an unencoded word among the plurality of words matches the called lexicon, encode the matched word according to the called lexicon, until all of the plurality of words are encoded, thereby forming an encoding result.
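As a non-limiting illustration, steps 11 and 13 could be sketched as a lookup of the calling order by category followed by one pass per lexicon. Everything below is hypothetical: the lexicon names, the calling order assigned to each category, the toy lexicon contents, and the use of English glosses in place of segmented Chinese words. Raising an error for a word that no lexicon covers is likewise an assumption.

```python
# Hypothetical correspondence between category and lexicon calling order (step 11).
CALL_ORDER = {
    "table_name": ["business_domain_words", "general_domain_words"],
    "table_attribute": ["specific_words", "business_domain_words", "general_domain_words"],
}

# Toy lexicons with disjoint contents; keys are glosses standing in for Chinese words.
LEXICONS = {
    "specific_words": {"status": "s", "date": "d"},
    "business_domain_words": {"seawater": "seawater", "sampling": "sampling"},
    "general_domain_words": {"monitoring": "monitor", "analysis item": "analysis"},
}


def encode_words(words, lexicons, call_order):
    """Call each lexicon in order and encode every still-unencoded word it contains."""
    encoded = {}
    for name in call_order:
        lexicon = lexicons[name]
        for word in words:
            if word not in encoded and word in lexicon:
                encoded[word] = lexicon[word]
        if len(encoded) == len(set(words)):
            break  # every word is encoded, later lexicons need not be called
    missing = [word for word in words if word not in encoded]
    if missing:
        raise KeyError(f"no lexicon covers: {missing}")
    return [encoded[word] for word in words]


words = ["seawater", "monitoring", "analysis item"]
print(encode_words(words, LEXICONS, CALL_ORDER["table_name"]))
```

Because the lexicons do not overlap, the calling order affects only how early the loop can stop, not which code a word receives.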

In one possible implementation, the data object may also be associated in advance with a business domain (for example, the chemical domain). If the category of the data object is determined to be a table name, the terminal device may acquire the business domain associated with the data object.

In that case, calling the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words until all of the plurality of words are encoded, thereby forming an encoding result, includes:

calling the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words and the business domain of the data object until the plurality of words and the business domain of the data object are all encoded, thereby forming an encoding result.

In one possible implementation, after forming the encoding result, the terminal device may further determine the character length of the encoding result and judge whether that length meets a preset condition. If it does not, the terminal device repeatedly replaces the longest not-yet-abbreviated word among the plurality of words with its abbreviation according to the called Chinese-English abbreviation comparison library to form a new encoding result, until the character length of the new encoding result meets the preset condition.
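As a non-limiting illustration, the length-reduction loop could be sketched as below. The function name is hypothetical, the preset condition is assumed to be "at most 30 characters including underscores", and the loop simply stops if no remaining word has an abbreviation, a case the description does not address. The abbreviations are taken from Tables 1 and 3.

```python
ABBREVIATIONS = {
    "chemistry": "chem",
    "sampling": "smpl",
    "analysis": "anls",
    "emission": "emis",
    "scope": "scp",
    "screen": "scrn",
}


def shorten(words, abbreviations, max_len=30):
    """Abbreviate the longest unabbreviated word until the joined code fits max_len."""
    parts = list(words)
    abbreviated = [False] * len(parts)
    while len("_".join(parts)) > max_len:
        candidates = [
            i for i, word in enumerate(parts)
            if not abbreviated[i] and word in abbreviations
        ]
        if not candidates:
            break  # nothing left to abbreviate (assumed behaviour)
        longest = max(candidates, key=lambda i: len(parts[i]))
        parts[longest] = abbreviations[parts[longest]]
        abbreviated[longest] = True
    return "_".join(parts)


words = ["chemistry", "sampling", "analysis", "emission", "scope", "screen"]
print(shorten(words, ABBREVIATIONS, max_len=40))
print(shorten(words, ABBREVIATIONS, max_len=30))
```

Only as many words are abbreviated as the limit requires, so a looser limit leaves more full words in the code.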

In one possible implementation, the method may further include:

if the category of the data object is determined to be a table name, acquiring the business domain associated with the data object;

in which case step 13 may further include: calling the lexicons in sequence according to the determined lexicon calling order to encode the plurality of words and the business domain of the data object until the plurality of words and the business domain of the data object are all encoded, thereby forming an encoding result.

In one possible implementation, the terminal device may process the obtained codes according to the data object English coding specification to obtain the encoding result. The data object English coding specification may include the following items.

1. An English code consists of Arabic numerals, English letters and underscores, and its first and last characters must be letters;

2. lowercase English letters are used;

3. words are joined by underscores;

4. the encoding result does not exceed 30 characters;

5. when a data entity is encoded, full English words are combined to form the English code if the requirements of items 3 and 4 are met; otherwise English abbreviations are combined to form the English code;

6. the English code of a data entity consists of: business domain code + English code;

7. for the English encoding of data attributes, some data attributes (Table 2) appear in many data entities, and the English encoding rule for such data attributes is: data entity code (without the business domain code) + the English abbreviation from Table 2, joined by an underscore.
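As a non-limiting illustration, items 1 to 4 of the specification lend themselves to a simple format check. The function name is hypothetical, and item 1 is read here (as an assumption) to require that both the first and the last character be letters.

```python
import re


def violations(code, max_len=30):
    """Return the format rules (items 1, 2 and 4) that an English code breaks."""
    problems = []
    if not re.fullmatch(r"[A-Za-z0-9_]+", code):
        problems.append("only digits, English letters and underscores are allowed")
    if not re.fullmatch(r"[A-Za-z]", code[:1]) or not re.fullmatch(r"[A-Za-z]", code[-1:]):
        problems.append("first and last characters must be letters")
    if code != code.lower():
        problems.append("letters must be lowercase")
    if len(code) > max_len:
        problems.append(f"length {len(code)} exceeds {max_len}")
    return problems


for code in ["cm_csp_seawater", "Cm_csp", "1st_sample", "update_dt_"]:
    print(code, violations(code))
```

A check of this kind could run after encoding to decide whether the abbreviation step of item 5 is needed.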

Fig. 2 is a flowchart of an application example of a data object English encoding method. As shown in Fig. 2, the method may include:

S101: Start. A data object is obtained; it is a table name whose Chinese name is "chemical sampling plan-seawater monitoring analysis item".

S102: Obtain the domain code from the business domain to which the data object belongs and the "business domain code library". Each data object belongs to exactly one business domain. This data object belongs to the chemical management domain, and the business domain code "cm" is obtained from the business domain code library.

S103: Check the data object against the "term library"; any term present in the data object is encoded directly. The term "chemical sampling plan" is present in this data object, and its code "csp" is obtained from the term library.

S104: Segment the Chinese name of the data object. Chinese word segmentation is performed on the remaining Chinese part of the data object, yielding the Chinese words: seawater, monitoring, analysis item.

S105: Map and encode the Chinese words according to a sub-library (e.g., chemical management) of the "business domain word Chinese-English abbreviation comparison library". Because this data object belongs to the chemical management domain, the Chinese words from step S104 are each looked up in the chemical management sub-library, which yields the English word "seawater" for the Chinese word meaning seawater.

S106: Map the Chinese words according to the "general domain word Chinese-English abbreviation comparison library". The Chinese words not encoded in S105 are encoded here, yielding the English word "monitor" for "monitoring" and "analysis" for "analysis item".

S107: Generate the code according to rule 3 of the data object English coding specification, obtaining "cm_csp_seawater_monitor_analysis".

S108: Check the code length. If the code is too long, replace the English words with abbreviations one at a time, using the general domain word and business domain word Chinese-English abbreviation comparison libraries, and check the length again. The code length is 33 characters, which exceeds the limit in the specification. In the two comparison libraries, no abbreviations are found for "seawater" or "monitor", but the abbreviation "anls" is found for "analysis". After replacement, the new code "cm_csp_seawater_monitor_anls" is obtained. The length of the new code is checked again; at 29 characters, it complies with the data object English coding specification.

S109: Output the code in accordance with rule 4 of the data object English coding specification. If the code is still too long, create abbreviations for the non-abbreviated English words according to the Chinese-English comparison dictionary library and the abbreviation creation specification, and add them to the general domain word and business domain word Chinese-English abbreviation comparison libraries.
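Steps S102 to S107 of this example can be traced with a short sketch. Everything here is a hypothetical stand-in: the dictionaries hold only the entries named in the example and are keyed by English glosses of the Chinese words, `toy_segment` merely splits on "-" and "/" where a real system would call a Chinese word segmenter, and the name string uses "/" separators purely so that the toy segmenter has something to split on.

```python
import re

DOMAIN_CODES = {"chemical management": "cm"}                          # S102
TERM_LIBRARY = {"chemical sampling plan": "csp"}                      # S103
BUSINESS_LIBRARY = {"chemical management": {"seawater": "seawater"}}  # S105
GENERAL_LIBRARY = {"monitoring": "monitor", "analysis item": "analysis"}  # S106


def toy_segment(text):
    """Stand-in for Chinese word segmentation (S104): split on '-' and '/'."""
    return [w.strip() for w in re.split(r"[-/]", text) if w.strip()]


def encode_table_name(name, domain, segment=toy_segment):
    parts = [DOMAIN_CODES[domain]]                       # S102: domain prefix
    rest = name
    for term, code in TERM_LIBRARY.items():              # S103: encode whole terms first
        if term in rest:
            parts.append(code)
            rest = rest.replace(term, " ")
    words = segment(rest)                                # S104: segment what is left
    sub_library = BUSINESS_LIBRARY.get(domain, {})
    encoded = {w: sub_library[w] for w in words if w in sub_library}   # S105
    for w in words:                                      # S106: general library for the rest
        if w not in encoded and w in GENERAL_LIBRARY:
            encoded[w] = GENERAL_LIBRARY[w]
    missing = [w for w in words if w not in encoded]
    if missing:
        raise KeyError(f"no library entry for: {missing}")
    parts.extend(encoded[w] for w in words)
    return "_".join(parts)                               # S107: join with underscores


print(encode_table_name("chemical sampling plan-seawater/monitoring/analysis item",
                        "chemical management"))
```

The business domain sub-library is consulted before the general library, so a domain-specific translation takes precedence whenever both libraries could encode the same word.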

Fig. 3 is a flowchart of an application example of a data object English encoding method. As shown in Fig. 3, the method may include:

S201: Check the data attribute against the "common data attribute coding library"; if the attribute belongs to the library, encode it directly. In this embodiment, the data attribute "creator" belongs to the library, and its English code is "create_by";

S202: Check the data attribute against the "term library"; if a term is present in the data attribute, encode it directly. In this embodiment, the data attribute "parameter unit" belongs to the library, and its English code is "uom";

S203: Segment the Chinese name of the data attribute. In this embodiment, the result of Chinese word segmentation is as follows:

Material code (material, code)

Parameter unit -> uom

Chemical class (chemical, class)

Chemical technology approval number (chemical, technology, approval, number)

Creator -> create_by

S204: Map and encode the Chinese words according to rule 7 of the data object English coding specification and "4. specific word abbreviation library". In this embodiment, "code" and "number" belong to the library, and the result is as follows:

Material code (material, code -> c)

Parameter unit -> uom

Chemical class (chemical, class)

Chemical technology approval number (chemical, technology, approval, number -> id)

Creator -> create_by

S206: Map and encode the Chinese words according to a sub-library of "2. business domain word Chinese-English abbreviation comparison library". In this embodiment, the "chemical management" sub-library is used, and the result of encoding the Chinese word "chemical" is as follows:

Material code (material, code -> c)

Parameter unit -> uom

Chemical class (chemical -> chemical, class)

Chemical technology approval number (chemical -> chemical, technology, approval, number -> id)

Creator -> create_by

S207: chinese words are mapped according to "4. comparison library of English abbreviations in common Domain words". In this embodiment, the results of encoding the Chinese words "material", "technique", "approval" are as follows:

material code (material- > material, code- > c)

Parameter unit- > uom

Chemical class (chemical, class- > type)

Chemical technology approval number (chemical- > chemical, technology- > technology, approval- > amine, number- > id)

Create person- > create _ by

S208: Generate the codes according to rule 3 of the data object English coding specification. In this embodiment, the result after encoding is as follows:

Material code -> material_c

Parameter unit -> uom

Chemical class -> chemical_type

Chemical technology approval number -> chemical_technology_examine_id

Creator -> create_by

S209: Check the code length against rule 4 of the data object English coding specification. If a code is too long, replace the English words with abbreviations one at a time, using the general domain word and business domain word Chinese-English abbreviation comparison libraries, and check the length again. In this embodiment, the code length of "chemical_technology_examine_id" is 31, which exceeds the 30 characters allowed by the rule, and the result after re-encoding the first word according to the comparison library is as follows:

Material code -> material_c

Parameter unit -> uom

Chemical class -> chemical_type

Chemical technology approval number -> chem_technology_examine_id

Creator -> create_by

S210: Output the codes in accordance with rule 4 of the data object English coding specification. If a code is still too long, create abbreviations for the non-abbreviated English words as per article 2 of the "English abbreviation selection specification". In this embodiment, this situation does not arise.
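The library calling order used for data attributes in this example (S201 to S208) can be sketched as an ordered list of lookups. This is a hedged illustration only: all dictionary names and the function `encode_attribute` are hypothetical, the entries are limited to those named in the example and keyed by English glosses, the term library is simplified to a whole-name match, and "approval" is mapped to "examine", the spelling that appears in the S209 result.

```python
# Whole-attribute libraries, tried first.
COMMON_ATTRIBUTES = {"creator": "create_by"}        # S201
TERM_LIBRARY = {"parameter unit": "uom"}            # S202

# Word-level libraries, called in this order.
SPECIFIC_WORDS = {"code": "c", "number": "id"}      # S204
BUSINESS_SUBLIBRARY = {"chemical": "chemical"}      # S206, "chemical management" sub-library
GENERAL_LIBRARY = {"material": "material", "class": "type",
                   "technology": "technology", "approval": "examine"}  # S207
WORD_LIBRARIES = [SPECIFIC_WORDS, BUSINESS_SUBLIBRARY, GENERAL_LIBRARY]


def encode_attribute(name, words):
    """Encode one data attribute from its name and its segmented words (S203)."""
    for library in (COMMON_ATTRIBUTES, TERM_LIBRARY):
        if name in library:
            return library[name]
    encoded = [None] * len(words)
    for library in WORD_LIBRARIES:
        # Each call only touches words that earlier libraries left unencoded.
        for i, word in enumerate(words):
            if encoded[i] is None and word in library:
                encoded[i] = library[word]
    if None in encoded:
        raise KeyError(f"no library entry for some word in {words}")
    return "_".join(encoded)                         # S208: join with underscores


print(encode_attribute("chemical technology approval number",
                       ["chemical", "technology", "approval", "number"]))
```

Because each library only fills positions that are still empty, libraries earlier in the list take precedence, which is the effect of a per-category calling order.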

Fig. 4 is a block diagram illustrating an apparatus for encoding a data object in english according to an exemplary embodiment. As shown in fig. 4, the apparatus may include:

a first obtaining module 40, configured to obtain a data object to be encoded;

a first determining module 41, configured to determine, according to the category associated with the data object and a correspondence between the category and a lexicon calling order, a lexicon calling order required in a process of encoding the data object;

a word segmentation module 42, configured to perform word segmentation processing on the data object to obtain multiple words;

and an encoding module 43, configured to sequentially call the word banks in the determined calling order to encode the plurality of words until all of the words are encoded, to form an encoding result.

In one possible implementation, the encoding module includes:

a first encoding sub-module, configured to, each time a word bank is called in the determined calling order, encode any not-yet-encoded words among the plurality of words that match the called word bank according to that word bank, until all of the words are encoded, to form an encoding result.

In one possible implementation, the apparatus further includes:

a second determining module, configured to determine the character length of the encoding result;

a judging module, configured to judge whether the character length of the encoding result meets a preset condition;

and a reduction module, configured to, when the character length of the encoding result does not meet the preset condition, repeatedly replace the longest non-abbreviated word among the plurality of words with its abbreviation according to the called Chinese-English abbreviation comparison library to form a new encoding result, until the character length of the new encoding result meets the preset condition.

In one possible implementation, the apparatus further includes:

a second acquisition module, configured to acquire the fields associated with the data object when the type of the data object is determined to be a table name;

the encoding module includes:

a second encoding sub-module, configured to sequentially call the word banks in the determined calling order to encode the plurality of words and the fields of the data object, until the plurality of words and the fields of the data object are all encoded, to form an encoding result.

It should be noted that the operation of the above apparatus has been described in detail in the description of the method above and is not repeated here.

Fig. 5 is a block diagram illustrating an apparatus for encoding a data object in English according to an exemplary embodiment. For example, the apparatus 1900 may be provided as a server. Referring to Fig. 5, the apparatus 1900 includes a processing component 1922, which further includes one or more processors, and memory resources, represented by memory 1932, for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in memory 1932 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the above-described method.

The apparatus 1900 may also include a power component 1926 configured to perform power management of the apparatus 1900, a wired or wireless network interface 1950 configured to connect the apparatus 1900 to a network, and an input/output (I/O) interface 1958. The apparatus 1900 may operate based on an operating system stored in memory 1932, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the apparatus 1900 to perform the above-described methods.

The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.

The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, the electronic circuitry that can execute the computer-readable program instructions implements aspects of the present disclosure by utilizing the state information of the computer-readable program instructions to personalize the electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA).

Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements to the techniques in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
