Generating configurable text strings based on raw genomic data

Document No.: 1256521; Publication date: 2020-08-21

Note: The technology "Generating configurable text strings based on raw genomic data" was devised by A. Zehir and J. S. Ziegler on 2019-01-09. Abstract: A genomic data translation system may be configured to process next generation sequencing information. The system may receive an output file that includes raw genomic data. The system may parse the output file to determine segments corresponding to each chromosome. The system may identify nucleotide ranges and determine a first set of genes, included in a human reference genome list, that fall within each range. The system may also maintain a gene list and determine a matched set of genes, that is, genes included both in the gene list and in the first set of genes. The system may generate a configurable text string that includes a non-configurable region and a configurable region. The configurable region may be populated with text based on the raw genomic data, a set of translation rules, and a set of translated text strings.
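The parsing step described in the abstract can be sketched as follows. This is a minimal illustration only: the actual layout of the sequencer output file (FIG. 3) is not reproduced here, so the tab-delimited columns `chrom`, `start`, `end`, `cytoband`, and `copy_numbers`, the toy coordinates, and the `Segment` and `parse_segments` names are hypothetical assumptions.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """One segment read from a (hypothetical) sequencer output row."""
    chromosome: str           # chromosome number, without a "chr" prefix
    start: int                # first nucleotide of the range
    end: int                  # last nucleotide of the range
    cytoband: str             # cytogenetic band notation, e.g. "p21.3"
    copy_numbers: List[float]


def parse_segments(text: str) -> Dict[str, List[Segment]]:
    """Group rows of an assumed tab-delimited output file by chromosome.

    Assumed header: chrom, start, end, cytoband, copy_numbers, where
    copy_numbers holds one or more comma-separated values.
    """
    by_chromosome: Dict[str, List[Segment]] = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        chrom = row["chrom"]
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # normalize "chr9" to "9"
        segment = Segment(
            chromosome=chrom,
            start=int(row["start"]),
            end=int(row["end"]),
            cytoband=row["cytoband"],
            copy_numbers=[float(v) for v in row["copy_numbers"].split(",")],
        )
        by_chromosome.setdefault(chrom, []).append(segment)
    return by_chromosome


# Toy input with made-up coordinates, for illustration only.
SAMPLE = (
    "chrom\tstart\tend\tcytoband\tcopy_numbers\n"
    "chr7\t100\t5000\tp22.3-p21.1\t3,3\n"
    "chr9\t21000\t23000\tp21.3\t1\n"
    "chr9\t40000\t90000\tq21.11-q34.3\t2\n"
)

if __name__ == "__main__":
    for chrom, segs in parse_segments(SAMPLE).items():
        print(chrom, [(s.start, s.end, s.copy_numbers) for s in segs])
```

Grouping by chromosome up front keeps later per-segment steps (gene lookup, text generation) independent of the file's row order.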

1. A system for processing next generation sequencing information, comprising:

one or more processors; and

one or more memory elements comprising instructions that, when executed, cause the one or more processors to:

receive, via a user interface, an output file generated by a next generation sequencer;

determine at least one fragment in the output file, the at least one fragment comprising a chromosome number, cytogenetic band (cytoband) information, a nucleotide range, and a set of copy numbers;

determine a first set of genes within the nucleotide range, the first set of genes included in a human reference genome list;

determine a matched set of genes, the matched set of genes comprising at least one gene that appears in a list of genes and matches a subset of the first set of genes;

generate a configurable text string comprising a non-configurable text region, a first configurable text region, a second configurable text region, and a third configurable text region;

include a first text in the first configurable text region based on the chromosome number, a second text in the second configurable text region based on the set of copy numbers, and a third text in the third configurable text region based on the matched set of genes; and

provide the configurable text string to an output interface.

2. The system of claim 1, wherein the next generation sequencer comprises at least one of an Illumina sequencer, an Ion Torrent sequencer, or a 454 pyrosequencer.

3. The system of claim 1, wherein the one or more memory elements comprise instructions that, when executed, cause the one or more processors to:

determine a starting position and an ending position of the at least one fragment, the starting position comprising the chromosome number, and the ending position indicating the set of copy numbers.

4. The system of claim 3, wherein the at least one fragment corresponds to at least one of a short arm (p), a long arm (q), or a combination of the short and long arms of a chromosome identified by the chromosome number.

5. The system of claim 1, wherein the one or more memory elements comprise instructions that, when executed, cause the one or more processors to:

include the second text in the second configurable text region based on the set of copy numbers and a gene loss-gain rule stored in memory, the gene loss-gain rule designating the second text as "lost" when the set of copy numbers includes numbers less than 2.

6. The system of claim 1, wherein the human reference genome list comprises at least one of GRCh38, GRCh37, NCBI Build 36.1, NCBI Build 35, NCBI Build 34, hg38, hg19, hg18, hg17, or hg16.

7. The system of claim 1, wherein the list of genes includes at least one cancer-associated gene.

8. A method of processing next generation sequencing information, comprising:

receiving, at one or more processors via a user interface, an output file generated by a next generation sequencer;

determining, at the one or more processors, at least one segment in the output file, the at least one segment comprising a chromosome number, cytogenetic band (cytoband) information, a nucleotide range, and a set of copy numbers;

determining, at the one or more processors, a first set of genes within the nucleotide range, the first set of genes included in a human reference genome list;

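determining, at the one or more processors, a matched set of genes, the matched set of genes including at least one gene that appears in a list of genes and matches a subset of the first set of genes;

generating, at the one or more processors, a configurable text string comprising a non-configurable text region, a first configurable text region, a second configurable text region, and a third configurable text region;

including, at the one or more processors, a first text in the first configurable text region based on the chromosome number, a second text in the second configurable text region based on the set of copy numbers, and a third text in the third configurable text region based on the matched set of genes; and

providing, by the one or more processors, the configurable text string to an output interface.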

9. The method of claim 8, wherein the next generation sequencer comprises at least one of an Illumina sequencer, an Ion Torrent sequencer, or a 454 pyrosequencer.

10. The method of claim 8, further comprising:

determining, at the one or more processors, a starting location and an ending location of the at least one segment, the starting location comprising the chromosome number, and the ending location indicating the set of copy numbers.

11. The method of claim 10, wherein the at least one segment corresponds to at least one of a short arm (p), a long arm (q), or a combination of the short and long arms of a chromosome identified by the chromosome number.

12. The method of claim 8, further comprising:

13. The method of claim 8, wherein the human reference genome list comprises at least one of GRCh38, GRCh37, NCBI Build 36.1, NCBI Build 35, NCBI Build 34, hg38, hg19, hg18, hg17, or hg16.

14. The method of claim 8, wherein the list of genes includes at least one cancer-associated gene.

15. A computer-readable storage medium storing processor-executable instructions that, when executed by at least one processor, cause the at least one processor to:

receive, via a user interface, an output file generated by a next generation sequencer;

determine at least one segment in the output file, the at least one segment comprising a chromosome number, cytogenetic band information, a nucleotide range, and a set of copy numbers;

determine a first set of genes within the nucleotide range, the first set of genes included in a human reference genome list;

determine a matched set of genes, the matched set of genes comprising at least one gene that appears in a list of genes and matches a subset of the first set of genes;

generate a configurable text string comprising a non-configurable text region, a first configurable text region, a second configurable text region, and a third configurable text region;

include a first text in the first configurable text region based on the chromosome number, a second text in the second configurable text region based on the set of copy numbers, and a third text in the third configurable text region based on the matched set of genes; and

provide the configurable text string to an output interface.

16. The computer-readable storage medium of claim 15, wherein the next generation sequencer comprises at least one of an Illumina sequencer, an Ion Torrent sequencer, or a 454 pyrosequencer.

17. The computer-readable storage medium of claim 15, further comprising instructions that, when executed by at least one processor, cause the at least one processor to:

18. The computer-readable storage medium of claim 15, wherein the at least one segment corresponds to at least one of a short arm (p), a long arm (q), or a combination of the short arm and long arm of a chromosome identified by the chromosome number.

19. The computer-readable storage medium of claim 15, further comprising instructions that, when executed by at least one processor, cause the at least one processor to:

20. The computer-readable storage medium of claim 15, wherein the human reference genome list comprises at least one of GRCh38, GRCh37, NCBI Build 36.1, NCBI Build 35, NCBI Build 34, hg38, hg19, hg18, hg17, or hg16.
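Two of the dependent claims describe small rule-based steps: claim 5 designates the second text as "lost" when the set of copy numbers includes a value less than 2, and claims 4, 11, and 18 relate a segment to the short arm (p), the long arm (q), or both. A minimal sketch of both rules follows. Only the "lost" rule comes from the claims; the "gained" and "unchanged" outcomes, the function names, and reading the arm from the p/q letters in the cytoband string are assumptions.

```python
import re
from typing import Iterable


def loss_gain_text(copy_numbers: Iterable[float]) -> str:
    """Map a set of copy numbers to the second configurable text."""
    values = list(copy_numbers)
    if any(cn < 2 for cn in values):
        return "lost"        # rule stated in claim 5
    if any(cn > 2 for cn in values):
        return "gained"      # assumption: not stated in the claims
    return "unchanged"       # assumption: all values equal 2 (diploid)


def arm_of(cytoband: str) -> str:
    """Classify a cytoband string such as "9p21.3" or "p22.3-q36.3" by arm."""
    # A 'p' or 'q' followed by a digit marks an arm; a leading chromosome
    # label such as "X" or "9" is not matched by this pattern.
    arms = set(re.findall(r"[pq](?=\d)", cytoband))
    if arms == {"p"}:
        return "p"
    if arms == {"q"}:
        return "q"
    if arms == {"p", "q"}:
        return "p and q"
    raise ValueError(f"no arm designation found in {cytoband!r}")


if __name__ == "__main__":
    print(loss_gain_text([1.0, 2.0]), arm_of("9p21.3"))
```

The loss check runs first so that a segment containing any value below 2 is reported as "lost" even if other values exceed 2, which matches the wording of claim 5.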

Technical Field

The present disclosure relates generally to converting raw genomic data into readable text output.

Background

Genomic data processing may include graphically displaying genomic output received from a next generation sequencer. The graphical representation may include read frequencies showing changes to a particular gene in the tested nucleic acid sequence. However, such graphical representations do not convey additional useful information that is contained in the raw genomic data generated by the next generation sequencer.

Disclosure of Invention

In one aspect, the disclosure includes a system for processing next generation sequencing information. The system includes one or more processors and one or more memory elements including instructions that, when executed, cause the one or more processors to perform a plurality of actions. The actions include receiving, via a user interface, an output file generated by a next generation sequencer. The actions further include determining at least one segment in the output file, the at least one segment including a chromosome number, cytogenetic band (cytoband) information, a nucleotide range, and a set of copy numbers. The actions also include determining a first set of genes within the nucleotide range, the first set of genes included in a human reference genome list. The actions further include determining a matched set of genes, the matched set of genes including at least one gene that appears in a list of genes and matches a subset of the first set of genes. In some embodiments, the at least one gene present in the gene list is associated with cancer. The actions also include generating a configurable text string that includes a non-configurable text region, a first configurable text region, a second configurable text region, and a third configurable text region. The actions further include populating the first configurable text region with a first text based on the chromosome number, the second configurable text region with a second text based on the set of copy numbers, and the third configurable text region with a third text based on the matched set of genes. The actions additionally include providing the configurable text string to an output interface.
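The gene-selection steps above (a first set of genes within the nucleotide range, then a matched set that also appears in a curated gene list) amount to an interval-overlap query followed by a set intersection. The sketch below assumes a reference annotation reduced to `(symbol, chromosome, start, end)` records; the `ReferenceGene` type, the function names, the placeholder gene symbols, and the coordinates are hypothetical. Whether "within the nucleotide range" means full containment or any overlap is not specified here, so both readings are offered through a flag.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ReferenceGene:
    """One gene record from an assumed reference annotation."""
    symbol: str
    chromosome: str
    start: int
    end: int


def genes_in_range(reference: Iterable[ReferenceGene], chromosome: str,
                   start: int, end: int,
                   require_containment: bool = False) -> List[str]:
    """Return symbols of reference genes on `chromosome` that fall in [start, end]."""
    hits = []
    for gene in reference:
        if gene.chromosome != chromosome:
            continue
        if require_containment:
            inside = start <= gene.start and gene.end <= end  # whole gene inside
        else:
            inside = gene.start <= end and gene.end >= start  # any overlap
        if inside:
            hits.append(gene.symbol)
    return hits


def matched_genes(first_set: Iterable[str], gene_list: Set[str]) -> List[str]:
    """Genes present both in the first set and in the gene list, order preserved."""
    return [g for g in first_set if g in gene_list]


# Made-up reference records, for illustration only.
REFERENCE = [
    ReferenceGene("GENE_A", "9", 100, 200),
    ReferenceGene("GENE_B", "9", 150, 400),
    ReferenceGene("GENE_C", "9", 500, 600),
    ReferenceGene("GENE_D", "7", 100, 200),
]
CURATED = {"GENE_B", "GENE_X"}  # hypothetical cancer-associated gene list

if __name__ == "__main__":
    first = genes_in_range(REFERENCE, "9", 120, 450)
    print(first, matched_genes(first, CURATED))
```

A linear scan is enough for a sketch; a production system working against a full reference genome list would more likely index genes per chromosome (for example with an interval tree) to avoid scanning every record for every segment.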

In another aspect, the disclosure includes a method of processing next generation sequencing information. The method includes receiving, at one or more processors via a user interface, an output file generated by a next generation sequencer. The method also includes determining, at the one or more processors, at least one segment in the output file, the at least one segment including a chromosome number, cytogenetic band (cytoband) information, a nucleotide range, and a set of copy numbers. The method further includes determining, at the one or more processors, a first set of genes within the nucleotide range, the first set of genes included in a human reference genome list. The method further includes determining, at the one or more processors, a matched set of genes, the matched set of genes including at least one gene that appears in a list of genes and matches a subset of the first set of genes. The method additionally includes generating, at the one or more processors, a configurable text string that includes a non-configurable text region, a first configurable text region, a second configurable text region, and a third configurable text region. The method further includes populating the first configurable text region with a first text based on the chromosome number, the second configurable text region with a second text based on the set of copy numbers, and the third configurable text region with a third text based on the matched set of genes. The method also includes providing, by the one or more processors, the configurable text string to an output interface.
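A configurable text string can be modeled as a template whose literal text is the non-configurable region and whose named placeholders are the three configurable regions. The following sketch uses Python's built-in `str.format`; the template wording, the function names, and the gene-list phrasing are illustrative assumptions, not the wording used by the disclosed system.

```python
from typing import Sequence

# Hypothetical template: the literal words form the non-configurable region;
# {chromosome}, {change} and {genes} are the first, second and third
# configurable regions.
TEMPLATE = ("Copy number analysis: a region of chromosome {chromosome} "
            "is {change}, including {genes}.")


def join_genes(genes: Sequence[str]) -> str:
    """Render gene symbols as readable English: "A", "A and B", "A, B, and C"."""
    if not genes:
        return "no listed genes"
    if len(genes) == 1:
        return genes[0]
    if len(genes) == 2:
        return f"{genes[0]} and {genes[1]}"
    return ", ".join(genes[:-1]) + f", and {genes[-1]}"


def build_text(chromosome: str, change_text: str, genes: Sequence[str],
               template: str = TEMPLATE) -> str:
    """Fill the configurable regions; the template's literal text is not altered."""
    return template.format(chromosome=chromosome, change=change_text,
                           genes=join_genes(genes))


if __name__ == "__main__":
    print(build_text("9", "lost", ["GENE_A", "GENE_B"]))
```

Keeping the fixed wording in the template, rather than in the fill logic, lets the report phrasing be changed without touching the code that computes the chromosome, copy-number, and gene texts.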

In yet another aspect, the present disclosure is directed to a computer-readable storage medium storing processor-executable instructions that, when executed by at least one processor, cause the at least one processor to perform a plurality of actions. The actions include receiving, via a user interface, an output file generated by a next generation sequencer. The actions further include determining at least one segment in the output file, the at least one segment including a chromosome number, cytogenetic band (cytoband) information, a nucleotide range, and a set of copy numbers. The actions also include determining a first set of genes within the nucleotide range, the first set of genes included in a human reference genome list. The actions further include determining a matched set of genes, the matched set of genes including at least one gene that appears in a list of genes and matches a subset of the first set of genes. In some embodiments, the at least one gene present in the gene list is associated with cancer. The actions also include generating a configurable text string that includes a non-configurable text region, a first configurable text region, a second configurable text region, and a third configurable text region. The actions further include populating the first configurable text region with a first text based on the chromosome number, the second configurable text region with a second text based on the set of copy numbers, and the third configurable text region with a third text based on the matched set of genes. The actions additionally include providing the configurable text string to an output interface.

Drawings

The foregoing and other objects, aspects, features, and advantages of the disclosure will become more apparent and better understood by reference to the following description taken in conjunction with the accompanying drawings in which:

FIG. 1A is a block diagram depicting an embodiment of a network environment including a client device in communication with a server device;

FIG. 1B is a block diagram depicting a cloud computing environment including a client device in communication with a cloud service provider;

FIGS. 1C and 1D are block diagrams depicting embodiments of computing devices useful in conjunction with the methods and systems described herein;

FIG. 2 illustrates a computer environment for translating raw genomic data generated by a next generation sequencer into human-readable text strings;

FIG. 3 shows exemplary raw genomic data generated by a next generation sequencer;

FIG. 4 shows a flow chart of the process of translating the raw genomic data;

FIG. 5 shows various segments identified by the genomic data translation system from the raw genomic data shown in FIG. 3;

FIG. 6 illustrates an exemplary configurable text string; and

FIG. 7 shows exemplary translation output of a translation engine based on raw genomic data, translation rules, and a list of genes.

Detailed Description

For purposes of reading the description of the various embodiments below, the following descriptions of the sections of the specification and their respective contents may be helpful:

Section A describes network environments and computing environments that may be used to practice the embodiments described herein.

Section B describes embodiments of systems and methods for translating raw genomic data generated by a next generation sequencer into human-readable text.

A. Computing and Network Environment

Before discussing particular embodiments of the present technology, it may be helpful to describe aspects of the operating environment and associated system components (e.g., hardware elements) in connection with the methods and systems described herein. Referring to FIG. 1A, an embodiment of a network environment is depicted. Briefly, the network environment includes one or more clients 102a-102n (also generally referred to as local machines 102, clients 102, client nodes 102, client computers 102, client devices 102, endpoints 102, or endpoint nodes 102) in communication with one or more servers 106a-106n (also generally referred to as servers 106, nodes 106, or remote machines 106) via one or more networks 104. In some embodiments, a client 102 may function both as a client node seeking access to resources provided by a server and as a server providing access to hosted resources for other clients 102a-102n.

Although FIG. 1A shows a network 104 between the clients 102 and the servers 106, the clients 102 and the servers 106 may be on the same network 104. In some embodiments, there are multiple networks 104 between the clients 102 and the servers 106. In one of these embodiments, a network 104' (not shown) may be a private network and a network 104 may be a public network. In another of these embodiments, a network 104 may be a private network and a network 104' may be a public network. In still another of these embodiments, networks 104 and 104' may both be private networks.

The network 104 may be connected via wired or wireless links. Wired links may include Digital Subscriber Line (DSL), coaxial cable lines, or optical fiber lines. Wireless links may include Bluetooth, Wi-Fi, Worldwide Interoperability for Microwave Access (WiMAX), an infrared channel, or a satellite band. The wireless links may also include any cellular network standard used to communicate among mobile devices, including standards that qualify as 1G, 2G, 3G, or 4G. The network standards may qualify as one or more generations of mobile telecommunication standards by fulfilling one or more specifications, such as the specifications maintained by the International Telecommunication Union. For example, the 3G standards may correspond to the International Mobile Telecommunications-2000 (IMT-2000) specification, and the 4G standards may correspond to the International Mobile Telecommunications Advanced (IMT-Advanced) specification. Examples of cellular network standards include AMPS, GSM, GPRS, UMTS, LTE Advanced, Mobile WiMAX, and WiMAX-Advanced. Cellular network standards may use various channel access methods, e.g., FDMA, TDMA, CDMA, or SDMA. In some embodiments, different types of data may be transmitted via different links and standards. In other embodiments, the same types of data may be transmitted via different links and standards.

The network 104 may be any type and/or form of network. The geographical scope of the network 104 may vary widely, and the network 104 may be a Body Area Network (BAN), a Personal Area Network (PAN), a Local Area Network (LAN) (e.g., an intranet), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), or the Internet. The topology of the network 104 may be of any form and may include, e.g., any of the following: point-to-point, bus, star, ring, mesh, or tree. The network 104 may be an overlay network that is virtual and sits on top of one or more layers of other networks 104'. The network 104 may have any such network topology known to those of ordinary skill in the art that is capable of supporting the operations described herein. The network 104 may utilize different technologies and layers or stacks of protocols, including, e.g., the Ethernet protocol, the Internet protocol suite (TCP/IP), ATM (Asynchronous Transfer Mode) technology, the SONET (Synchronous Optical Networking) protocol, or the SDH (Synchronous Digital Hierarchy) protocol. The TCP/IP Internet protocol suite may include an application layer, a transport layer, an internet layer (including, e.g., IPv6), or a link layer. The network 104 may be a broadcast network, a telecommunications network, a data communication network, or a computer network.

In some embodiments, the system may include multiple, logically grouped servers 106. In one of these embodiments, the logical group of servers may be referred to as a server farm 38 or a cluster 38. In another of these embodiments, the servers 106 may be geographically dispersed. In other embodiments, the cluster 38 may be administered as a single entity. In still other embodiments, the cluster 38 includes a plurality of clusters 38. The servers 106 within each cluster 38 may be heterogeneous: one or more of the servers 106 or machines 106 may operate according to one type of operating system platform (e.g., WINDOWS NT, manufactured by Microsoft Corporation of Redmond, Washington), while one or more of the other servers 106 may operate according to another type of operating system platform (e.g., Unix, Linux, or Mac OS X).

In one embodiment, the servers 106 in the cluster 38 may be stored in high-density rack systems, along with associated storage systems, and located in an enterprise data center. In this embodiment, consolidating the servers 106 in this way may improve system manageability, data security, the physical security of the system, and system performance by locating the servers 106 and high-performance storage systems on localized high-performance networks. Centralizing the servers 106 and storage systems and coupling them with advanced system management tools allows more efficient use of server resources.

The servers 106 of each cluster 38 do not need to be physically proximate to another server 106 in the same cluster 38. Thus, the group of servers 106 logically grouped as a cluster 38 may be interconnected using a Wide Area Network (WAN) connection or a Metropolitan Area Network (MAN) connection. For example, a cluster 38 may include servers 106 physically located on different continents or in different regions of a continent, country, state, city, campus, or room. Data transmission speeds between servers 106 in the cluster 38 may be increased if the servers 106 are connected using a Local Area Network (LAN) connection or some form of direct connection. Additionally, a heterogeneous cluster 38 may include one or more servers 106 operating according to a type of operating system, while one or more other servers 106 execute one or more types of hypervisors rather than operating systems. In these embodiments, hypervisors may be used to emulate virtual hardware, partition physical hardware, virtualize physical hardware, and execute virtual machines that provide access to computing environments, allowing multiple operating systems to run concurrently on a host computer. Native hypervisors may run directly on the host computer. Hypervisors may include VMware ESX/ESXi, manufactured by VMware, Inc. of Palo Alto, California; the Xen hypervisor, an open source product whose development is overseen by Citrix Systems, Inc.; and the HYPER-V hypervisors provided by Microsoft, or others. Hosted hypervisors may run within an operating system on a second software level. Examples of hosted hypervisors may include VMware Workstation and VIRTUALBOX.

Management of the cluster 38 may be decentralized. For example, one or more servers 106 may include components, subsystems, and modules that support one or more management services for the cluster 38. In one of these embodiments, the one or more servers 106 provide functionality for managing dynamic data, including techniques for handling failover, data replication, and increasing the robustness of the cluster 38. Each server 106 may be in communication with persistent storage and, in some embodiments, dynamic storage.

The server 106 may be a file server, an application server, a web server, a proxy server, an appliance, a network appliance, a gateway server, a virtualization server, a deployment server, an SSL VPN server, or a firewall. In one embodiment, the server 106 may be referred to as a remote machine or a node. In another embodiment, a plurality of nodes 290 may be in the path between any two communicating servers.

Referring to FIG. 1B, a cloud computing environment is depicted. The cloud computing environment may provide one or more resources provided by the network environment to the client 102. The cloud computing environment may include one or more clients 102a-102n in communication with the cloud 108 over one or more networks 104. The clients 102 may include, for example, thick clients, thin clients, and zero clients. A thick client may provide at least some functionality even when disconnected from the cloud 108 or server 106. A thin client or zero client may rely on a connection to the cloud 108 or server 106 to provide functionality. The zero client may rely on the cloud 108 or other network 104 or server 106 to retrieve operating system data for the client device. Cloud 108 may include a backend platform such as server 106, storage, a server farm, or a data center.

The cloud 108 may be public, private, or hybrid. Public clouds may include public servers 106 that are maintained by third parties to the clients 102 or the owners of the clients. The servers 106 may be located off-site in remote geographical locations as disclosed above or otherwise. Public clouds may be connected to the servers 106 over a public network. Private clouds may include private servers 106 that are physically maintained by the clients 102 or the owners of the clients. Private clouds may be connected to the servers 106 over a private network 104. Hybrid clouds 108 may include both private and public networks 104 and servers 106.

The cloud 108 may also include cloud-based delivery, e.g., Software as a Service (SaaS) 110, Platform as a Service (PaaS) 112, and Infrastructure as a Service (IaaS) 114. IaaS may refer to a user renting the use of infrastructure resources that are needed during a specified time period. IaaS providers may offer storage, networking, servers, or virtualization resources from large pools, allowing users to quickly scale up by accessing more resources as needed. Examples of IaaS include infrastructure and services (e.g., EG-32) provided by OVH HOSTING of Montreal, Quebec, Canada, and AMAZON WEB SERVICES provided by Amazon. PaaS providers may offer the functionality provided by IaaS, including, e.g., storage, networking, servers, or virtualization, as well as additional resources such as the operating system, middleware, or runtime resources. Examples of PaaS include WINDOWS, provided by Microsoft Corporation of Redmond, Washington; Google App Engine, provided by Google; and Heroku, provided by Heroku, Inc. of San Francisco, California. SaaS providers may offer the resources that PaaS provides, including storage, networking, servers, virtualization, operating system, middleware, or runtime resources. In some embodiments, SaaS providers may offer additional resources, including, e.g., data and application resources. Examples of SaaS include GOOGLE APPS, provided by Google Inc., and SALESFORCE, provided by Salesforce. Examples of SaaS may also include data storage providers, e.g., DROPBOX provided by Dropbox, Inc. of San Francisco, California; Microsoft SKYDRIVE provided by Microsoft Corporation; Google Drive provided by Google Inc.; or Apple ICLOUD provided by Apple Inc.

The clients 102 may access IaaS resources using one or more IaaS standards, including, e.g., Amazon Elastic Compute Cloud (EC2), Open Cloud Computing Interface (OCCI), Cloud Infrastructure Management Interface (CIMI), or OpenStack standards. Some IaaS standards may allow clients to access resources over HTTP and may use the Representational State Transfer (REST) protocol or the Simple Object Access Protocol (SOAP). The clients 102 may access PaaS resources using different PaaS interfaces. Some PaaS interfaces use HTTP packages, standard Java APIs, the JavaMail API, Java Data Objects (JDO), the Java Persistence API (JPA), Python APIs, web integration APIs for different programming languages (including, e.g., a Ruby framework, WSGI for Python, or PSGI for Perl), or other APIs that may be built on REST, HTTP, XML, or other protocols. The clients 102 may access SaaS resources through a web-based user interface provided by a web browser (e.g., GOOGLE CHROME, Microsoft INTERNET EXPLORER, or Mozilla Firefox provided by the Mozilla Foundation of Mountain View, California). The clients 102 may also access SaaS resources through smartphone or tablet applications, including, e.g., the Salesforce Sales Cloud or Google Drive applications. The clients 102 may also access SaaS resources through the client operating system, including, e.g., the Windows file system for DROPBOX.

In some embodiments, access to IaaS, PaaS, or SaaS resources may be authenticated. For example, the server or authentication server may authenticate the user through a secure certificate, HTTPS, or API key. The API key may include various encryption standards such as Advanced Encryption Standard (AES). The data resources may be sent over Transport Layer Security (TLS) or Secure Sockets Layer (SSL).

The clients 102 and the servers 106 may be deployed as and/or executed on any type and form of computing device, e.g., a computer, network device, or appliance capable of communicating on any type and form of network and performing the operations described herein. FIGS. 1C and 1D depict block diagrams of a computing device 100 useful for practicing an embodiment of the client 102 or the server 106. As shown in FIGS. 1C and 1D, each computing device 100 includes a central processing unit 121 and a main memory unit 122. As shown in FIG. 1C, a computing device 100 may include a storage device 128, an installation device 116, a network interface 118, an I/O controller 123, display devices 124a-124n, a keyboard 126, and a pointing device 127 (e.g., a mouse). The storage device 128 may include, without limitation, an operating system, software, and the software of the genomic data translation system 120. As shown in FIG. 1D, each computing device 100 may also include additional optional elements, e.g., a memory port 103, a bridge 170, one or more input/output devices 130a-130n (generally referred to using reference numeral 130), and a cache memory 140 in communication with the central processing unit 121.

The central processing unit 121 is any logic circuitry that responds to and processes instructions fetched from the main memory unit 122. In many embodiments, the central processing unit 121 is provided by a microprocessor unit, e.g., those manufactured by Intel Corporation of Mountain View, California; those manufactured by Motorola Corporation of Schaumburg, Illinois; the ARM processor and TEGRA system on a chip (SoC) manufactured by Nvidia of Santa Clara, California; the POWER7 processor manufactured by International Business Machines of White Plains, New York; or those manufactured by Advanced Micro Devices of Sunnyvale, California. The computing device 100 may be based on any of these processors, or any other processor capable of operating as described herein. The central processing unit 121 may utilize instruction-level parallelism, thread-level parallelism, different levels of cache, and multi-core processors. A multi-core processor may include two or more processing units on a single computing component. Examples of multi-core processors include the AMD PHENOM IIX2, INTEL CORE i5, and INTEL CORE i7.

The main memory unit 122 may include one or more memory chips capable of storing data and allowing any storage location to be directly accessed by the microprocessor 121. The main memory unit 122 may be volatile and faster than the storage 128 memory. The main memory unit 122 may be Dynamic Random Access Memory (DRAM) or any variant, including Static Random Access Memory (SRAM), Burst SRAM or Synchronous Burst SRAM (BSRAM), Fast Page Mode DRAM (FPM DRAM), Enhanced DRAM (EDRAM), Extended Data Output RAM (EDO RAM), Extended Data Output DRAM (EDO DRAM), Burst Extended Data Output DRAM (BEDO DRAM), Single Data Rate Synchronous DRAM (SDR SDRAM), Double Data Rate SDRAM (DDR SDRAM), Direct Rambus DRAM (DRDRAM), or Extreme Data Rate DRAM (XDR DRAM). In some embodiments, the main memory 122 or the storage 128 may be non-volatile, e.g., non-volatile random access memory (NVRAM), flash non-volatile static RAM (nvSRAM), Ferroelectric RAM (FeRAM), Magnetoresistive RAM (MRAM), Phase-change memory (PRAM), Conductive-Bridging RAM (CBRAM), Silicon-Oxide-Nitride-Oxide-Silicon (SONOS), Resistive RAM (RRAM), racetrack memory, Nano-RAM (NRAM), or millipede memory. The main memory 122 may be based on any of the above-described memory chips, or any other available memory chips capable of operating as described herein. In the embodiment shown in FIG. 1C, the processor 121 communicates with the main memory 122 via a system bus 150 (described in more detail below). FIG. 1D depicts an embodiment of a computing device 100 in which the processor communicates directly with the main memory 122 via a memory port 103. For example, in FIG. 1D the main memory 122 may be DRDRAM.

FIG. 1D depicts an embodiment in which the main processor 121 communicates directly with cache memory 140 via a secondary bus, sometimes referred to as a backside bus. In other embodiments, the main processor 121 communicates with cache memory 140 using the system bus 150. Cache memory 140 typically has a faster response time than main memory 122 and is typically provided by SRAM, BSRAM, or EDRAM. In the embodiment shown in FIG. 1D, the processor 121 communicates with various I/O devices 130 via a local system bus 150. Various buses may be used to connect the central processing unit 121 to any of the I/O devices 130, including a PCI bus, a PCI-X bus, a PCI-Express bus, or a NuBus. For embodiments in which the I/O device is a video display 124, the processor 121 may use an Advanced Graphics Port (AGP) to communicate with the display 124 or with the I/O controller 123 for the display 124. FIG. 1D depicts an embodiment of the computer 100 in which the main processor 121 communicates directly with I/O device 130b or other processors 121' via HYPERTRANSPORT, RAPIDIO, or INFINIBAND communications technology. FIG. 1D also depicts an embodiment in which local buses and direct communication are mixed: the processor 121 communicates with I/O device 130a using a local interconnect bus while communicating with I/O device 130b directly.

A variety of I/O devices 130a-130n may be present in the computing device 100. Input devices may include keyboards, mice, trackpads, trackballs, touchpads, touch mice, multi-touch touchpads and touch mice, microphones, multi-array microphones, drawing tablets, cameras, single-lens reflex cameras (SLR), digital SLRs (DSLR), CMOS sensors, accelerometers, infrared optical sensors, pressure sensors, magnetometer sensors, angular rate sensors, depth sensors, proximity sensors, ambient light sensors, gyroscopic sensors, or other sensors. Output devices may include video displays, graphical displays, speakers, headphones, inkjet printers, laser printers, and 3D printers.

The devices 130a-130n may include a combination of multiple input or output devices, including, e.g., Microsoft KINECT, the Nintendo Wiimote for the WII, the Nintendo WII U GAMEPAD, or the Apple IPHONE. Some devices 130a-130n allow gesture recognition inputs by combining some of their inputs and outputs. Some devices 130a-130n provide facial recognition, which may be used as an input for different purposes, including authentication and other commands. Some devices 130a-130n provide voice recognition and inputs, including, e.g., Microsoft KINECT, SIRI for the Apple IPHONE, Google Now, or Google Voice Search.

Additional devices 130a-130n have both input and output capabilities, including, e.g., haptic feedback devices, touchscreen displays, or multi-touch displays. Touchscreens, multi-touch displays, touchpads, touch mice, or other touch sensing devices may use different technologies to sense touch, including, e.g., capacitive, surface capacitive, projected capacitive touch (PCT), in-cell capacitive, resistive, infrared, waveguide, dispersive signal touch (DST), in-cell optical, surface acoustic wave (SAW), bending wave touch (BWT), or force-based sensing technologies. Some multi-touch devices may allow two or more contact points with the surface, enabling advanced functionality including, e.g., pinch, spread, rotate, scroll, or other gestures. Some touchscreen devices, including, e.g., Microsoft PIXELSENSE or a multi-touch collaboration wall, may have larger surfaces, such as on a table-top or on a wall, and may also interact with other electronic devices. Some of the I/O devices 130a-130n, display devices 124a-124n, or groups of devices may be augmented reality devices. As shown in FIG. 1C, the I/O devices may be controlled by an I/O controller 123. The I/O controller may control one or more I/O devices, such as a keyboard 126 and a pointing device 127, e.g., a mouse or optical pen. Furthermore, an I/O device may also provide storage and/or an installation medium 116 for the computing device 100. In still other embodiments, the computing device 100 may provide USB connections (not shown) to receive handheld USB storage devices. In further embodiments, an I/O device 130 may be a bridge between the system bus 150 and an external communication bus, e.g., a USB bus, a SCSI bus, a FireWire bus, an Ethernet bus, a Gigabit Ethernet bus, a Fibre Channel bus, or a Thunderbolt bus.

In some embodiments, display devices 124a-124n may be connected to I/O controller 123. The display device may include, for example, a liquid crystal display, a thin film transistor LCD (TFT-LCD), a blue phase LCD, an electronic paper (electronic ink) display, a flexible display, a light emitting diode display (LED), a Digital Light Processing (DLP) display, a Liquid Crystal On Silicon (LCOS) display, an organic light emitting diode display (OLED), an Active Matrix Organic Light Emitting Diode (AMOLED) display, a liquid crystal laser display, a time division multiplexed optical shutter (TMOS) display, or a 3D display. Examples of 3D displays may use, for example, stereo vision, polarized filters, active shutters, or auto-stereoscopy. The display devices 124a-124n may also be Head Mounted Displays (HMDs). In some embodiments, the display devices 124a-124n or corresponding I/O controllers 123 may be controlled through or with hardware support for OPENGL or DIRECTX APIs or other graphics libraries.

In some embodiments, computing device 100 may include or be connected to multiple display devices 124a-124n, each of which may be of the same or different type and/or form. As such, any of the I/O devices 130a-130n and/or the I/O controller 123 may include any type and/or form of suitable hardware, software, or combination of hardware and software to support, enable, or provide for the connection and use of multiple display devices 124a-124n by the computing device 100. For example, the computing device 100 may include any type and/or form of video adapter, video card, driver, and/or library to interface, communicate, connect, or otherwise use the display devices 124a-124n. In one embodiment, a video adapter may include multiple connectors to interface with multiple display devices 124a-124n. In other embodiments, the computing device 100 may include multiple video adapters, with each video adapter connected to one or more of the display devices 124a-124n. In some embodiments, any portion of the operating system of the computing device 100 may be configured for using multiple displays 124a-124n. In other embodiments, one or more of the display devices 124a-124n may be provided by one or more other computing devices 100a or 100b connected to the computing device 100 via the network 104. In some embodiments, software may be designed and constructed to use another computer's display device as a second display device 124a for the computing device 100. For example, in one embodiment, an Apple iPad may connect to the computing device 100 and use the display of the device 100 as an additional display screen that may be used as an extended desktop. One of ordinary skill in the art will recognize and appreciate the various ways and embodiments in which a computing device 100 may be configured to have multiple display devices 124a-124n.

Referring again to FIG. 1C, the computing device 100 may include a storage device 128 (e.g., one or more hard disk drives or a redundant array of independent disks) for storing an operating system or other related software, and for storing application software programs, such as any program related to the software of the genomic data translation system 120. Examples of the storage device 128 include, e.g., a hard disk drive (HDD); an optical drive, including a CD drive, DVD drive, or BLU-RAY drive; a solid-state drive (SSD); a USB flash drive; or any other device suitable for storing data. Some storage devices may include multiple volatile and non-volatile memories, including, e.g., solid-state hybrid drives that combine hard disks with solid-state cache. Some storage devices 128 may be non-volatile, mutable, or read-only. Some storage devices 128 may be internal and connect to the computing device 100 via a bus 150. Some storage devices 128 may be external and connect to the computing device 100 via an I/O device 130 that provides an external bus. Some storage devices 128 may connect to the computing device 100 via the network interface 118 over the network 104, including, e.g., the REMOTE DISK for the Apple MACBOOK AIR. Some client devices 100 may not require a non-volatile storage device 128 and may be thin clients or zero clients 102. Some storage devices 128 may also be used as an installation device 116 and may be suitable for installing software and programs. Additionally, the operating system and the software may be run from a bootable medium, e.g., a bootable CD such as KNOPPIX, a bootable GNU/Linux distribution.

The client device 100 may also install software or applications from an application distribution platform. Examples of application distribution platforms include the App Store for iOS provided by Apple, Inc., the Mac App Store provided by Apple, Inc., GOOGLE PLAY for Android OS provided by Google, Inc., the Chrome Webstore for CHROME OS provided by Google, Inc., and the Amazon Appstore for Android OS and KINDLE FIRE provided by Amazon.com, Inc. An application distribution platform may facilitate the installation of software on a client device 102. An application distribution platform may include a repository of applications on a server 106 or a cloud 108, which the clients 102a-102n may access over the network 104. An application distribution platform may include applications developed and provided by various developers. A user of a client device 102 may select, purchase, and/or download an application via the application distribution platform.

Furthermore, the computing device 100 may include a network interface 118 to connect to the network 104 through a variety of connections, including, but not limited to, standard telephone lines, LAN or WAN links (e.g., 802.11, T1, T3, Gigabit Ethernet, InfiniBand), broadband connections (e.g., ISDN, Frame Relay, ATM, Gigabit Ethernet, Ethernet-over-SONET, ADSL, VDSL, BPON, GPON, fiber optical including FiOS), wireless connections, or some combination of any or all of the above. Connections can be established using a variety of communication protocols (e.g., TCP/IP, Ethernet, ARCNET, SONET, SDH, Fiber Distributed Data Interface (FDDI), IEEE 802.11a/b/g/n/ac, CDMA, GSM, WiMax, and direct asynchronous connections). In one embodiment, the computing device 100 communicates with other computing devices 100' via any type and/or form of gateway or tunneling protocol, e.g., Secure Sockets Layer (SSL), Transport Layer Security (TLS), or the Citrix Gateway Protocol manufactured by Citrix Systems, Inc. of Ft. Lauderdale, Florida. The network interface 118 may include a built-in network adapter, network interface card, PCMCIA network card, EXPRESSCARD network card, card bus network adapter, wireless network adapter, USB network adapter, modem, or any other device suitable for interfacing the computing device 100 to any type of network capable of communication and performing the operations described herein.

The computing device 100 of the type depicted in FIGS. 1B and 1C may operate under the control of an operating system, which controls the scheduling of tasks and access to system resources. The computing device 100 may run any operating system, such as any version of the MICROSOFT WINDOWS operating systems, different releases of the Unix and Linux operating systems, any version of MAC OS for Macintosh computers, any embedded operating system, any real-time operating system, any open source operating system, any proprietary operating system, any operating system for mobile computing devices, or any other operating system capable of running on the computing device and performing the operations described herein. Typical operating systems include, but are not limited to: WINDOWS 2000, WINDOWS Server 2012, WINDOWS CE, WINDOWS Phone, WINDOWS XP, WINDOWS VISTA, WINDOWS 7, WINDOWS RT, and WINDOWS 8, all of which are manufactured by Microsoft Corporation of Redmond, Washington; MAC OS and iOS, manufactured by Apple, Inc. of Cupertino, California; Linux, a freely available operating system, e.g., the Linux Mint distribution ("distro") or Ubuntu, distributed by Canonical Ltd. of London, United Kingdom; Unix or other Unix-like derivative operating systems; and Android, designed by Google of Mountain View, California, among others. Some operating systems, including, e.g., the CHROME OS by Google, may be used on zero clients or thin clients, including, e.g., CHROMEBOOKS.

Computer system 100 may be any workstation, telephone, desktop computer, laptop or notebook computer, netbook, ultrabook, tablet, server, handheld computer, mobile telephone, smartphone or other portable telecommunications device, media playing device, gaming system, mobile computing device, or any other type and/or form of computing, telecommunications, or media device that is capable of communication. Computer system 100 has sufficient processor power and memory capacity to perform the operations described herein. In some embodiments, the computing device 100 may have different processors, operating systems, and input devices consistent with the device. For example, the Samsung GALAXY smartphones operate under the control of the Android operating system developed by Google, Inc., and receive input via a touch interface.

In some embodiments, computing device 100 is a gaming system. For example, the computer system 100 may comprise a PLAYSTATION 3, PLAYSTATION PORTABLE (PSP), or PLAYSTATION VITA device manufactured by Sony Corporation of Tokyo, Japan; a NINTENDO DS, NINTENDO 3DS, NINTENDO WII, or NINTENDO WII U device manufactured by Nintendo Co., Ltd. of Kyoto, Japan; or an XBOX 360 device manufactured by Microsoft Corporation of Redmond, Washington.

In some embodiments, the computing device 100 is a digital audio player, such as the Apple IPOD, IPOD Touch, and IPOD NANO lines of devices manufactured by Apple Computer of Cupertino, California. Some digital audio players may have other functionality, including, e.g., a gaming system or any functionality made available by an application from a digital application distribution platform. For example, the IPOD Touch may access the Apple App Store. In some embodiments, the computing device 100 is a portable media player or digital audio player supporting file formats including, but not limited to, MP3, WAV, M4A/AAC, WMA Protected AAC, AIFF, audiobook, and Apple Lossless audio file formats, and .mov, .m4v, and .mp4 MPEG-4 (H.264/MPEG-4 AVC) video file formats.

In some embodiments, computing device 100 is a tablet computer, such as the IPAD line of devices by Apple, Inc.; the GALAXY TAB family of devices by Samsung; or the KINDLE FIRE by Amazon.com, Inc. In other embodiments, the computing device 100 is an eBook reader, such as the KINDLE family of devices by Amazon.com, Inc., or the NOOK family of devices by Barnes & Noble, Inc. of New York City, New York.

In some embodiments, the communication device 102 includes a combination of devices, such as a smartphone combined with a digital audio player or portable media player. For example, one of these embodiments is a smartphone, such as the IPHONE family of smartphones manufactured by Apple, Inc.; the Samsung GALAXY family of smartphones manufactured by Samsung, Inc.; or a Motorola DROID family of smartphones. In yet another embodiment, the communication device 102 is a laptop or desktop computer equipped with a web browser and a microphone and speaker system (e.g., a telephony headset). In these embodiments, the communication device 102 is web-enabled and can receive and initiate phone calls. In some embodiments, the laptop or desktop computer is also equipped with a webcam or other video capture device that enables video chat and video calling.

In some embodiments, the status of one or more machines 102, 106 in the network 104 is monitored, typically as part of network management. In one of these embodiments, the state of the machine may include an identification of load information (e.g., number of processes on the machine, CPU and memory utilization), an identification of port information (e.g., number of available communication ports and port addresses), or an identification of session state (e.g., duration and type of process, and whether the process is active or idle). In another of these embodiments, the information may be identified by a plurality of metrics, and the plurality of metrics may be applied, at least in part, to decisions in load distribution, network traffic management, and network failure recovery, as well as any aspect of the operation of the present solution described herein. Aspects of the above-described operating environment and components will become apparent in the context of the systems and methods disclosed herein.

B. Processing of raw genomic data

FIG. 2 shows a genomic data translation system 200, similar to the genomic data translation system 120 shown in FIG. 1C. As described below, the genomic data translation system 200 can receive raw genomic data (e.g., in a spreadsheet or comma-separated text file) and generate data indicating the gene-level and chromosome-level abnormalities identified in the raw genomic data. The genomic data translation system 200 includes a translation engine 202, a graphical user interface (GUI) engine 204, and a data store 218. The data store 218 may store a gene list 206, translation rules 208, a reconfigurable text store 210, and a human reference genome list 212. The GUI engine 204 may provide a GUI for display on a monitor or other display device. The GUI engine 204 may also receive user input from one or more input devices, such as a keyboard, mouse, touch screen, gesture detector, or other input device. The GUI engine 204 can provide an interactive interface that allows a user to provide input controlling the operation of the genomic data translation system 200. The genomic data translation system 200 can also be coupled to a computer network 214, which can include one or more wired or wireless networks, such as an Ethernet network, the Internet, a Wi-Fi network, a Bluetooth network, and the like. The genomic data translation system 200 may be implemented using the computing systems discussed above in connection with FIGS. 1A-1D.

The genomic data translation system 200 can receive data from a next generation genome sequencer ("NG sequencer") 216, such as an Illumina sequencer, an Ion Torrent sequencer, or a 454 pyrosequencer. The NG sequencer 216 may provide detailed chromosome analysis and may employ techniques such as array comparative genomic hybridization (CGH), microarrays, oligonucleotide arrays, single nucleotide polymorphism (SNP) arrays, whole genome arrays (WGA), and the like. The NG sequencer 216 can provide the raw genomic data to the genomic data translation system 200. In particular, the NG sequencer 216 can generate raw genomic data that includes cytogenetic band information. In some embodiments, rather than receiving raw genomic data directly from the NG sequencer 216, the genomic data translation system 200 can allow raw genomic data generated by the NG sequencer 216 to be uploaded through the GUI engine 204.

FIG. 3 shows example raw genomic data 300 generated by a next generation sequencer. In particular, the raw genomic data 300 may include cytogenetic band information. The cytogenetic band information may correspond to one or more chromosomes that exhibit abnormalities; as such, the raw genomic data 300 may include cytogenetic band information only for chromosomes exhibiting genetic alterations. The raw genomic data 300 may also include chromosome identification data, nucleotide ranges, and copy numbers, which indicate how many copies of the corresponding gene regions within each chromosomal nucleotide range are present.

FIG. 4 shows a flow diagram of a process 400 for translating raw genomic data. The process 400 can be used, for example, to translate the raw genomic data 300 shown in FIG. 3, and may be performed by, for example, the genomic data translation system 200, and in particular the translation engine 202, shown in FIG. 2. The process 400 includes receiving an output file, generated by the NG sequencer, that includes raw genomic data (stage 402). Referring again to FIG. 2, the genomic data translation system 200 can receive the raw genomic data 300 directly from the NG sequencer 216. For example, the genomic data translation system 200 can include one or more serial or parallel communication ports connected to the NG sequencer 216 and can receive the raw genomic data 300 from the NG sequencer 216 through those ports. In some embodiments, the genomic data translation system 200 may instead receive a file, e.g., a data file, including the raw genomic data 300 from a user via the GUI engine 204.

The process 400 also includes determining at least one segment in the output file, the at least one segment including a chromosome number, cytogenetic band information, a nucleotide range, and a set of copy numbers (stage 404). A segment (also referred to as a fragment) may include the genomic data associated with one chromosome. The raw genomic data 300 includes genomic data associated with several genes. The translation engine 202 can parse the raw genomic data 300 to identify the chromosomes for which genomic abnormality information exists. The translation engine 202 may determine the start of the file by searching for a file start identifier, such as "arr[hg19]", which may be specific to the NG sequencer 216 used and may vary with the type of NG sequencer 216. In the raw genomic data 300 shown in FIG. 3, the identifier "arr[hg19]" indicates that the genomic analysis was done using array technology (e.g., array CGH or a SNP array) and encoded using human genome build 19 (hg19). Other builds, such as "hg38", "hg18", or "hg17", may also be used to generate the raw genomic data 300. The translation engine 202 can parse the remainder of the raw genomic data 300 after the file start identifier to determine the start of each segment. For example, the translation engine 202 may search for an integer between 1 and 22, or the letter "X" or "Y", followed by the letter "p" or "q". The integers 1 through 22 correspond to chromosome numbers, "X" and "Y" correspond to the X and Y chromosomes, and "p" and "q" correspond to the short and long arms of the chromosome, respectively. The translation engine 202 may determine the end of a segment by searching for copy number information, indicated by the letter "x" followed by one or more integers (e.g., "x2" or "x1-2").
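The stage 404 search described above can be sketched in Python with regular expressions. This is a minimal sketch, assuming the raw data is an ISCN-style string in which each segment is written as chromosome, cytoband(s), a parenthesized "start-end" range, and an "x" copy number; the `Segment` type, the patterns, and the sample string in the usage note are hypothetical and are not taken from FIG. 3.

```python
import re
from dataclasses import dataclass

# Hypothetical record for one parsed segment (field names are illustrative).
@dataclass
class Segment:
    chromosome: str   # "1".."22", "X" or "Y"
    bands: str        # cytoband text, e.g. "p36.33p11.2"
    start: int        # first base pair of the nucleotide range
    end: int          # last base pair of the nucleotide range
    copy_number: tuple  # (low, high): "x1" -> (1, 1), "x1-2" -> (1, 2)

# File start identifier such as "arr[hg19]"; the genome build is captured.
HEADER_RE = re.compile(r"arr\[(hg\d+)\]")

# Segment start: chromosome (two-digit numbers tried first) followed by a
# "p"/"q" band; segment end: the "x" copy-number suffix.
SEGMENT_RE = re.compile(
    r"(?P<chrom>1\d|2[0-2]|[1-9]|X|Y)"
    r"(?P<bands>[pq][\d.]+(?:[pq][\d.]+)?)"
    r"\((?P<start>[\d,]+)-(?P<end>[\d,]+)\)"
    r"x(?P<cn>\d+(?:-\d+)?)"
)

def parse_raw_genomic_data(text):
    """Return (build, segments) parsed from an ISCN-style array string."""
    header = HEADER_RE.search(text)
    if header is None:
        raise ValueError("file start identifier (e.g. 'arr[hg19]') not found")
    segments = []
    # Only scan the text that follows the file start identifier.
    for m in SEGMENT_RE.finditer(text, header.end()):
        low, _, high = m.group("cn").partition("-")
        segments.append(Segment(
            chromosome=m.group("chrom"),
            bands=m.group("bands"),
            start=int(m.group("start").replace(",", "")),
            end=int(m.group("end").replace(",", "")),
            copy_number=(int(low), int(high or low)),
        ))
    return header.group(1), segments
```

For example, `parse_raw_genomic_data("arr[hg19] 1p36.33p11.2(849,466-121,343,783)x1")` would yield the build `"hg19"` and one chromosome 1 segment with copy number `(1, 1)`. A production parser would also need to handle spacing variants and annotations such as "hmz", which this sketch ignores.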

FIG. 5 shows the segments identified by the genomic data translation system 200 from the raw genomic data shown in FIG. 3. Specifically, the translation engine 202 identifies 15 segments: chromosome 1 segment 501, chromosome 3 segment 503, chromosome 5 segment 505, chromosome 6 segment 506, chromosome 7 segment 507, chromosome 9 segment 509, chromosome 11 segment 511, chromosome 12 segment 512, chromosome 16 segment 516, chromosome 17 segment 517, chromosome 19 segment 519, chromosome 20 segment 520, chromosome 21 segment 521, chromosome X segment 522, and chromosome Y segment 524.

Each fragment includes a chromosome number, e.g., the leading integer "1" of the fragment. Each fragment also includes cytogenetic band information, such as "1p36.33p11.2" and "1q21.1q44", which identify the cytogenetic bands within the short and long arms of chromosome 1. Each fragment further includes a nucleotide range, e.g., "(849,466-121,343,783)", which represents a range of base pairs that are abnormal compared with the reference genome build. In addition, each fragment includes a copy number, such as "x1", which means that the base pairs within the corresponding nucleotide range are observed only once, rather than twice as expected in a normal subject. Other copy numbers, such as "x1-2", indicate that the base pairs within the corresponding nucleotide range are observed once or twice.
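The fragment fields described above can be decomposed with two small helpers. This is a sketch under the assumption that a cytoband descriptor is a chromosome followed by one or two band labels, each starting with its arm letter, and that a copy number is "x" followed by a single integer or a "low-high" range; the function names are hypothetical.

```python
import re

# Assumed cytoband layout: chromosome, then one or two band labels, each
# beginning with "p" (short arm) or "q" (long arm), e.g. "1p36.33p11.2".
CYTOBAND_RE = re.compile(r"^(1\d|2[0-2]|[1-9]|X|Y)([pq][\d.]+)([pq][\d.]+)?$")

def parse_cytoband(text):
    """Return (chromosome, band labels, set of arms) for a cytoband string."""
    m = CYTOBAND_RE.match(text.replace(" ", ""))  # tolerate "1 p36.33p11.2"
    if m is None:
        raise ValueError(f"unrecognized cytoband: {text!r}")
    chromosome, first, second = m.groups()
    bands = [first] if second is None else [first, second]
    arms = {band[0] for band in bands}  # {"p"}, {"q"} or {"p", "q"}
    return chromosome, bands, arms

def parse_copy_number(text):
    """'x1' -> (1, 1): seen once; 'x1-2' -> (1, 2): seen once or twice."""
    low, _, high = text.replace(" ", "").lstrip("x").partition("-")
    return int(low), int(high or low)
```

Representing every copy number as a `(low, high)` pair keeps single values and ranges in one shape, so later rules can compare either bound against the expected value of 2.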

The process 400 also includes determining a first set of genes within the nucleotide range, wherein the first set of genes is included in the human reference genome list (stage 406). The translation engine 202 can look up the human reference genome list 212 to determine the genes present in each nucleotide range. There are several versions, or builds, of the human reference genome. The translation engine 202 may determine which version to use based on the identifier "arr[hg19]", which in the example shown in FIG. 3 refers to the hg19 version of the human reference genome list. For example, the translation engine 202 can look up, in the human reference genome list 212, the nucleotide ranges (849,466-121,343,783), (882,802-121,339,317), and (143,932,349-249,224,684) present in the first segment 501. The human reference genome list 212 can return a first set of genes present within each of these nucleotide ranges.
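The stage 406 lookup amounts to an interval query against a build-specific gene table. The sketch below stands in for the human reference genome list 212 with a tiny hypothetical table: the gene names and coordinates are placeholders, not real hg19 annotations. It also assumes that a gene belongs to a range when the two intervals overlap; requiring full containment instead would be an equally plausible reading.

```python
from typing import NamedTuple

class GeneRecord(NamedTuple):
    name: str
    chromosome: str
    start: int
    end: int

# Placeholder stand-in for human reference genome list 212, keyed by build.
REFERENCE_GENES = {
    "hg19": [
        GeneRecord("GENE_A", "1", 1_000_000, 1_050_000),
        GeneRecord("GENE_B", "1", 130_000_000, 130_100_000),
        GeneRecord("GENE_C", "1", 150_000_000, 150_020_000),
        GeneRecord("GENE_D", "2", 1_000_000, 1_010_000),
    ],
}

def genes_in_range(build, chromosome, start, end, reference=REFERENCE_GENES):
    """Names of genes on `chromosome` overlapping [start, end] in `build`."""
    return [
        gene.name
        for gene in reference[build]
        # Two closed intervals overlap when each starts before the other ends.
        if gene.chromosome == chromosome and gene.start <= end and gene.end >= start
    ]
```

A linear scan is enough for illustration; a full reference list would more likely be sorted by coordinate per chromosome so that a query touches only nearby genes.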

Process 400 also includes determining a matching set of genes that includes at least one gene that appears in gene list 206 that matches a subset of the first set of genes (stage 408). The gene list 206 includes an identification of genes of interest to the clinician. The gene list 206 may include genes associated with certain diseases or abnormalities. For example, genes including, but not limited to, TNFRSF14, TP53, NOTCH4, DAXX, and LTB can be included in gene list 206. The gene list 206 may also include genes such as tumor suppressor genes, oncogenes, cell signaling proteins, adaptor proteins, cell surface receptors, soluble and/or membrane bound ligands, enzymes (e.g., proteases), chaperones, transcription factors, structural proteins, cytoskeletal proteins, proteins that regulate angiogenesis, cell division, cell adhesion, and cell cycle progression, and the like. The gene list 206 may also include cancer-related genes and/or non-cancer-related genes. In some embodiments, the gene list 206 includes genes that affect the function of specific organs including, but not limited to, lung, skin, heart, liver, kidney, pancreas, intestine, brain, eyes, ears, nose, and the like. In some embodiments, gene list 206 includes genes that affect the function of specific cell types including, but not limited to, neurons, epithelial cells, endothelial cells, striated muscle cells, smooth muscle cells or cardiac muscle cells, kidney cells, pancreatic cells, intestinal cells, eye cells, blood cells, sensory cells, mesenchymal cells, germ cells, extracellular matrix cells, secretory epithelial cells, hormone secreting cells, glial cells, and the like. 
In some embodiments, gene list 206 includes at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, at least 95, at least 100, at least 150, at least 200, at least 250, at least 300, at least 350, at least 400, at least 450, or at least 500 genes.

As described above, the first set of genes is determined using the nucleotide ranges and the human reference genome list 212. The translation engine 202 can compare the gene list 206 with the first set of genes to determine whether any genes in the gene list 206 are present in the first set of genes. For example, the translation engine 202 can look up each gene of the gene list 206 in the first set of genes and, if there is a match, add the identity of that gene to the matched set of genes. An abnormality in a gene of the first set that also appears in the gene list 206 constitutes a clinically relevant genetic marker and can indicate the nature and/or prognosis of a patient's disease state (e.g., cancer), based on the raw genomic output for a patient nucleic acid sample sequenced using the NG sequencer 216. Genetic abnormalities include deletions, insertions, translocations, minor clones, copy number variations, and the like.
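The stage 408 comparison described above is a set intersection. The following sketch mirrors the per-gene lookup, iterating over the gene list and keeping the genes found in the first set; converting the first set to a Python `set` for constant-time membership tests is an implementation choice of this sketch, not something the text specifies.

```python
def match_genes(gene_list, first_set):
    """Genes of interest (gene list 206) that also occur in the first set."""
    present = set(first_set)  # O(1) membership tests
    # Iterate over the gene list so the result keeps its clinician-defined order.
    return [gene for gene in gene_list if gene in present]
```

For instance, with the gene list `["TNFRSF14", "TP53", "NOTCH4"]` and a hypothetical first set `["GENE_X", "TNFRSF14"]`, the matched set is `["TNFRSF14"]`.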

The process 400 additionally includes generating a configurable text string that includes a non-configurable text region and a configurable text region (stage 410). FIG. 6 illustrates an example configurable text string 600. The configurable text string 600 includes a first non-configurable text region 602, a second non-configurable text region 614, and configurable text regions, namely a chromosome # field 604, a minor clone field 606, a gain/loss field 608, a matched genes field 610, and a chromosome/fragment identifier field 612. The first non-configurable text region 602 includes the text "chromosome", and the second non-configurable text region 614 includes the text ": ". The text of the first and second non-configurable text regions 602 and 614 does not change with the data in the raw genomic data 300, although the translation engine 202 may use other text in place of the text shown in FIG. 6. The translation engine 202 may populate the configurable text regions based on the raw genomic data 300 and the translation rules 208 (FIG. 2). The translation rules 208 may include one or more translation rules associated with each configurable region. The translation rules 208 for a configurable region identify the text to be entered into that region based on the raw genomic data 300. That text may be drawn from the reconfigurable text store 210, which may include a list of text that can be inserted into each configurable region.
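One way to picture a configurable text string is as fixed literal text with named placeholders. The sketch below uses Python's `string.Template`; the field names and their order in the template are assumptions for illustration, since the exact layout of FIG. 6 is not reproduced here. Literal text plays the role of the non-configurable regions, and each `$placeholder` is a configurable region.

```python
from string import Template

# Hypothetical layout: "Chromosome" and ":" are non-configurable text; each
# $placeholder is a configurable region filled per segment by the rules.
CONFIGURABLE_STRING = Template(
    "Chromosome $chromosome: $minor_clone $gain_loss $region $matched_genes"
)

def render(fields):
    """Fill every configurable region; a missing field raises KeyError."""
    text = CONFIGURABLE_STRING.substitute(fields)
    # Collapse the gaps left by regions that were filled with no text.
    return " ".join(text.split())
```

Using `substitute` rather than `safe_substitute` makes a forgotten field fail loudly instead of leaking a raw `$placeholder` into a report.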

The process 400 further includes populating the configurable text regions based on the chromosome number, the set of copy numbers, and the matched set of genes (stage 412). FIG. 7 shows an example translation output 700 of the translation engine 202 based on the raw genomic data 300, the translation rules 208, and the gene list 206. In particular, the translation output 700 includes a configurable text string for each chromosome identified in the raw genomic data 300, or for each segment identified in FIG. 5.

The chromosome # field 604 may be populated with text corresponding to the chromosome number, e.g., "1" or "6". The translation rule for the chromosome # field may specify that the field contains the number of the chromosome to which the fragment belongs. As shown in FIG. 7, the translation output 700 includes the appropriate number for each chromosome in the chromosome # field.

The minor clone field 606 may be populated with the text "minor clone with", or with no text at all, based on the absence of a "p" or "q" arm in the chromosome. For example, in the segment 509 of chromosome 9 shown in FIG. 5, the long arm "q" is deleted. As a result, the translation engine 202 may include the text "minor clone with" in the minor clone field, as shown in the configurable text string for chromosome 9 in the translation output 700.

Based on the copy number, the gain/loss field 608 may be populated with the text "loss of", the text "gain of", or no text at all. For example, the translation rule for the gain/loss field 608 may specify that the field is filled with "loss of" if the copy number is less than 2, and with "gain of" if the copy number is greater than 2. Referring to the segment 509 of chromosome 9 shown in FIG. 5, the copy number is "1-2", which is less than 2. Thus, the gain/loss field 608 may be populated with the text "loss of".
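The gain/loss rule can be written as a small function over a `(low, high)` copy-number pair. This is a sketch under one stated assumption: for a range such as "1-2", the low bound is compared first, so any value below 2 yields "loss of", which reproduces the chromosome 9 example. The sketch also assumes two expected copies everywhere and does not handle the sex chromosomes, where fewer copies may be normal in a male sample.

```python
EXPECTED_COPY_NUMBER = 2  # two copies expected in a normal subject (autosomes)

def gain_loss_text(copy_number):
    """Text for the gain/loss field from a (low, high) copy-number pair."""
    low, high = copy_number
    if low < EXPECTED_COPY_NUMBER:
        return "loss of"   # e.g. "x1", "x0", or the range "x1-2"
    if high > EXPECTED_COPY_NUMBER:
        return "gain of"   # e.g. "x3"
    return ""              # exactly two copies: leave the field empty
```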

The matched genes field 610 may be populated with text corresponding to the matched genes. For example, for the first fragment 501 of chromosome 1 shown in FIG. 5, the matched set includes the gene "TNFRSF14"; in addition, a fragment containing "hmz", indicating loss of heterozygosity, is associated with the "p" arm. Thus, the matched genes field 610 may be filled with text indicating that the loss of heterozygosity on 1p overlaps the TNFRSF14 gene. The translation output 700 shown in FIG. 7 includes several examples of text inserted into the matched genes field 610, including text for chromosome 1 and chromosome 17.

The chromosome/fragment identifier field 612 identifies the chromosome, fragment, or cytogenetic band that exhibits a gain or loss. This field may be populated with the chromosome number, a long-arm/short-arm identifier, or a cytogenetic band identifier. For example, referring again to the segment 509 of chromosome 9 shown in FIG. 5, the copy number is less than 2, and thus the chromosome/fragment identifier field 612 is populated with the text "chromosome 9", as shown in FIG. 7. In another example, the fragment for chromosome 6 (segment 506 in FIG. 5) shows copy number "x0", indicating a complete deletion of the "q" arm. Thus, the chromosome/fragment identifier field 612 for chromosome 6 may be populated with "6q", as shown in the translation output 700 in FIG. 7.
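The choice among the three identifier forms could be expressed as follows. The selection rule is hypothetical, since the text lists the forms but not how one is picked: a change spanning both arms is named by chromosome, a change covering a whole arm by chromosome plus arm, and anything smaller by its cytoband. The `whole_arm` flag is a hypothetical input that a caller would have to compute from arm boundary coordinates, which this sketch does not model, and the band labels in the usage note are illustrative.

```python
def region_identifier(chromosome, bands, whole_arm=False):
    """Text for the chromosome/fragment identifier field (hypothetical rule).

    bands: cytoband labels such as ["q21.1", "q44"].
    whole_arm: hypothetical flag, set by the caller after checking arm bounds.
    """
    arms = {band[0] for band in bands}
    if arms == {"p", "q"}:
        return f"chromosome {chromosome}"    # change spans both arms
    if whole_arm:
        return f"{chromosome}{bands[0][0]}"  # whole-arm change, e.g. "6q"
    return chromosome + "".join(bands)       # sub-arm change, e.g. "1q21.1q44"
```

For example, `region_identifier("6", ["q12", "q27"], whole_arm=True)` returns `"6q"`, matching the form of the chromosome 6 example above.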

It should be understood that translation engine 202 is not limited to generating the number and types of configurable and non-configurable fields shown in fig. 6 and 7, and more configurable fields or fewer configurable fields may also be used.

In some embodiments, the translation engine 202 may determine the content of the configurable text based on the number of base pairs in the nucleotide range of the chromosome. For example, if the number of base pairs in a nucleotide range is less than 5 × 10⁶ base pairs (5 Mb), the translation engine 202 may forgo providing the translation output in the form shown in the first portion 702 and may instead provide it in the form shown in the second portion 704. In the second portion 704, the translation engine 202 can provide a list of the genes in the matched list together with their corresponding segments.
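The size test reduces to counting base pairs in the range and comparing against 5 Mb. In this sketch the range is assumed to be inclusive at both ends, and the return labels `"summary"` (standing for the first portion 702) and `"gene_list"` (standing for the second portion 704) are hypothetical names.

```python
LARGE_SEGMENT_BP = 5 * 10**6  # 5 Mb threshold from the text

def output_form(start, end):
    """Pick the translation-output form for the nucleotide range [start, end]."""
    length = end - start + 1  # inclusive count of base pairs
    # Ranges under 5 Mb are reported as a gene list (second portion 704).
    return "summary" if length >= LARGE_SEGMENT_BP else "gene_list"
```

With the first range of segment 501, (849,466-121,343,783), the length is roughly 120 Mb, so the summary form applies.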
