Design of abnormal data detection system for protein gene library based on data mining technology
Corresponding Author(s): Cuixia Liu
Cellular and Molecular Biology,
Vol. 66 No. 7: Issue 7
Abstract
To address the shortcomings of current abnormal data detection systems for protein gene libraries, namely a low detection rate and a high false-detection rate, an abnormal data detection system for protein gene libraries based on data mining technology is designed. Protein gene data first enter the firewall module of the system; data that do not match the firewall rules are passed on to the immune module. In the immune module, the protein gene is first presented to the memory detector. If the memory detector does not match it, the protein gene is presented to the mature detector. If the mature detector does not match it either, the data are judged to be a normal protein gene data packet; if a match is found, the data are treated as abnormal protein gene data and are processed by the collaborative stimulation module, while a control module built around the C8051F060 chip carries out the detection of abnormal data in the protein gene library. The immune module generates new protein gene sequences through an immature detector, simulates the immune mechanism of the protein gene through a mature detector module, and simulates the secondary response of the detection system through the memory detector. The system introduces data mining technology into the detection process and uses a two-level dynamic optimization algorithm to calculate the ASG similarity value of the protein gene secondary structure arrangement. Based on this value, abnormal data detection for the protein gene library is carried out through gene expression in four steps: random generation of protein genes, negative selection, clone selection, and copying of memory cells. The experimental results show that the system detects abnormal data in the protein gene library quickly, maintains detection efficiency, and reaches a detection accuracy of 97.1%. The system also reduces the rate at which normal protein genes are mistakenly detected as abnormal.
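The decision flow described in the abstract (firewall, then memory detector, then mature detector), together with the negative-selection step, can be sketched roughly as follows. This is only an illustrative sketch and not the authors' implementation: the function names, the H/E/C secondary-structure alphabet, and the r-contiguous matching rule are assumptions introduced here. In particular, r-contiguous matching stands in for the ASG similarity value, whose definition the abstract does not give, and the outcome for data that do match a firewall rule (labelled "filtered" below) is likewise an assumption.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


def r_contiguous_match(detector: str, sequence: str, r: int) -> bool:
    """Assumed matching rule: True if the two strings agree on at least
    r contiguous aligned positions (a stand-in for the ASG similarity)."""
    run = 0
    for a, b in zip(detector, sequence):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


@dataclass
class Detection:
    label: str  # "filtered", "abnormal", or "normal"
    stage: str  # stage that made the decision


def classify(
    sequence: str,
    firewall_rules: Sequence[Callable[[str], bool]],
    memory_detectors: Sequence[str],
    mature_detectors: Sequence[str],
    r: int = 3,
) -> Detection:
    """Firewall first, then memory detectors, then mature detectors."""
    # Assumption: data matching a firewall rule are handled by the firewall.
    if any(rule(sequence) for rule in firewall_rules):
        return Detection("filtered", "firewall")
    # Memory detectors model the fast secondary response.
    if any(r_contiguous_match(d, sequence, r) for d in memory_detectors):
        return Detection("abnormal", "memory")
    # Mature detectors are consulted only if no memory detector matched.
    if any(r_contiguous_match(d, sequence, r) for d in mature_detectors):
        return Detection("abnormal", "mature")
    return Detection("normal", "mature")


def negative_selection(
    self_set: Sequence[str],
    n_detectors: int,
    length: int,
    alphabet: str,
    r: int,
    rng: random.Random,
    max_tries: int = 10_000,
) -> List[str]:
    """Randomly generate candidate detectors and keep only those that
    match none of the known-normal ("self") sequences."""
    detectors: List[str] = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors


if __name__ == "__main__":
    # Hypothetical data: H/E/C strings standing in for secondary structure.
    normal = ["HHHHH", "EEEEE"]
    mature = negative_selection(normal, 5, 5, "HEC", 3, random.Random(0))
    print(classify("HHHHH", [], [], mature))
```

Because every surviving detector fails to match the normal set, a sequence taken from that set passes both detector stages and is labelled normal. Clone selection and the promotion of mature detectors to memory detectors are left out to keep the sketch short.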