A semantic structuring of educational research using ontologies

This article presents the semantic interoperability of research and scientific results through an ontological taxonomy. To achieve this, the principles of systematization and structuration of scientific and research results in scientometrics databases have been analysed. We use the existing cognitive IT platform Polyhedron and, as our main contribution, extend it with an ontology-based information model. As a proof of concept, we have modelled two ontological graphs, “Development of a rational way for utilization of methane tank waste at LLC Vasylkivska poultry farm” and “Development a method for utilization of methane tank effluent”. In addition, to demonstrate the prospects of ontological systems for the systematization of research and scientific results, the “Hypothesis test system” ontological graph has been created.


Introduction
Now, more than ever, science affects all aspects of human life. The latest scientific developments are often quickly implemented in industry. However, scientific results are usually presented in human-readable rather than machine-readable form, so it is hard to process the knowledge using automated information technologies.
The basic structure of a typical research paper is the sequence of Introduction, Methods, Results, and Discussion (sometimes noted as IMRAD) [30]. Each section addresses a different objective. The Introduction section motivates the research problem that was discovered or states the known facts about the problem; the Methods section states what the authors did to discover the problem and address it in a new solution; what they achieved as results in experiments is written in the Results section; and what they observed is discussed in the Discussion section.
The most common form of science reporting is a written paper. Depending on the purpose, there are a few different types of papers: Analytical Research Paper, Argumentative (Persuasive) Research Paper, Definition Paper, Compare and Contrast Paper, Cause and Effect Paper, Interpretative Paper, Experimental Research Paper, and Survey Research Paper.
• Argumentative (Persuasive) Research Paper: presents two sides of a controversial question in one paper.
• Definition Paper (5000+): describes facts or objective arguments without using any personal emotion or opinion of the author.
• Compare and Contrast Paper (5000+): analyses the difference between two viewpoints, authors, subjects or stories.
• Cause and Effect Paper (3000+): traces probable or expected results from a specific action and answers the main questions "Why?" and "What?".
• Interpretative Paper (3000+*): requires the author to use knowledge gained from a particular case study.
• Experimental Research Paper (3000+*): describes a particular experiment in detail.
• Survey Research Paper (5000+*): requires conducting a survey that includes asking questions to respondents.
* Depends on the purpose of the article and the requirements of the journal, institute, or teacher.
Most papers (but not all of them) are nowadays systematized using scientometric databases. However, educational research reports, which also use scientific methods, have not been systematized at all. Besides, scientists, unlike pupils, already know their field of research in detail and can formulate and further analyse their research hypotheses by themselves. Students, in contrast, cannot. Automated information tools can help students with these scientific discovery and analysis tasks.
The scientific method is often used in the educational process within the STEM approach, in the form of educational research. This approach has only recently been applied in countries such as Ukraine [47]. There are various school competitions for scientific works, such as the competition on scientific articles of the Junior academy of sciences of Ukraine and international competitions (for example, Intel ISEF). The scientific method can also be used when writing thesis papers (for a master's degree, bachelor's degree, etc.), pupils' research reports (for the events noted above), or the simpler but more common form of essays. In addition, students can report their results in the form of scientific papers if the quality of their work satisfies the scientific requirements. An overview of the types of educational research reports is presented in table 2. The focus of this paper is on the systematization and processing of educational research reports. The problem to be addressed is the lack of a structuration mechanism, which complicates the automated processing of the reports.

Literature review
To increase the convenience and efficiency of scientific data processing, structuration, and systematization of research and scientific results, the active dissemination and use of different scientometrics databases continues [44]. Specialized databases for structural science information are an integral part of the information-support system for any scientist. Scientometrics is the "quantitative study of science, communication in science, and science policy" [41], commonly referred to as the "science of science". Scientometrics is essential to help academic disciplines understand various aspects of their research efforts, including (but not limited to) the productivity of their scholars [1,41], the emergence of specializations [38], collaborative networks [28], patterns of scientific communications [7], and quality of research products [17]. Metric studies have developed over time as a subsidiary branch of Library and Information Science (LIS) [13]. In most cases, scientometrics relies on bibliometrics, which measures the impact of publications.
To increase the quality and performance of scientometrics, the ten principles of the "Leiden Manifesto of Scientometrics" have been stated [13]:
• Quantitative evaluation should support qualitative expert assessment.
Today, all existing scientometrics databases can be divided into two major groups: international and national [13,15,24,34,37,42,43]. The most well-known international databases are Springer, Scopus, Web of Science, CiteseerX, Microsoft Academic, aminer, refseek, BASE (Bielefeld Academic Search Engine), WorldWideScience, JURN, Google Scholar, Google patent and others. National databases incorporate a variety of bibliographic databases and a variety of library and university repositories. International scientometric databases are characterized by a larger scale and mandatory support for various languages, including English. Another characteristic feature of such databases is that they provide and work with various internationally recognized indices, for example the h-index [14].
As the number of scientific publications continues to grow exponentially, the number of academic and scientometrics databases also increases, which supports gaining insights into the structure and processes of science [37]. Accordingly, many scientific publications are devoted to the working principles of scientometrics databases, and their number is growing. Thanks to them, concepts such as the "metadata" of scientific articles began to be actively used in scientometrics [13,15,24,34,37,42,43]. Metadata is essentially data about data; it provides information such as titles, authors, abstracts, keywords, cited references, sources, bibliography, and other data. Metadata does not substitute for the corresponding article, but it explicitly describes valuable information about the article.
Using scientometrics systems, the contributions of researchers in the field of informatics and scientometrics were previously quantified [24]. The principal metadata indicators are the indicators and citation indices of journals, the number of authors, the number of publications, and the degree of cooperation based on affiliation data. The disadvantage of this research is that it is devoted only to scientific articles. The authors noted that their study could not cover students' and pupils' research reports because there is no single database where they are all located [24].
In [15], the application of the principles of the "Leiden Manifesto of Scientometrics" is stated and substantiated; it provides for transparent monitoring and support of research and encourages constructive dialogue between the scientific community and the public. In that work, a bibliometric base that corresponds to the principles of the "Leiden Manifesto of Scientometrics" has been created. The proposed bibliometric centre did not address the systematization of students' and pupils' research reports, but the authors noted the necessity of including such reports in their bibliometric centre [15].
The approach of co-word analysis has been introduced and its application in scientometrics substantiated in [43]. The trends and patterns of scientometrics in journals have been revealed by measuring the association strength of selected keywords that represent the concepts and ideas produced in the field of scientometrics. The authors have also developed a web system for the manual extraction of keywords from the title and abstract of an article. However, the web system they propose cannot work with research reports of students and pupils.
Another concept of analysis is iMetrics, or "information metrics". Its application in scientometrics is substantiated in [19]. iMetrics is devoted to the scientometrics of scientific journals in the field of informatics. The authors note the possibility of applying their approach to the systematization of the scientific works of students and pupils. The research related to scientometrics databases is shown in table 3.

Table 3. The research related to scientometrics databases (columns: Subject of study; The general result of the study; Authors).

Previously, ontological graphs were used to systematize scientific articles [3,6,32,36]. Systematization and structuration in such graphs is based on different approaches, such as a scientific article recommendation system [3], a Scientific Articles Tagging system [6], machine learning [36], and automatic summarization [32]. Ontologies can also be used to provide interoperability through semantic technologies [2]. However, none of the proposed ontological approaches for systematization and structuration addresses the structuration of research reports of students and pupils.
None of the scientometrics database systems previously proposed [13,15,24,34,37,42,43] can offer a universal solution for the systematization and structured presentation of research and scientific results to pupils and students. Another disadvantage of all these systems is that they completely lack many parameters that are useful for processing information about scientific works: the scientific novelty of the article, the practical value of the study, the hypothesis of the study, and the subject and object of the research. Moreover, existing solutions do not allow research reports to be compared with each other.
This work aims to propose and justify the use of an ontological system that permits the systematization of scientific articles with all the advantages of existing scientometrics systems and without their disadvantages. At the same time, the system should retain the functionality of current scientometrics systems and meet the Leiden Manifesto for Scientometrics.
We propose to use the existing cognitive IT-platform Polyhedron as the technical basis for solving this problem. The core of the Polyhedron system consists of advanced and improved functions of the TODOS IT-platform described in previous works. Polyhedron is a multi-agent system that supports transdisciplinarity and acts as an interactive component in any educational and scientific research [52]. Besides, the cognitive IT-platform Polyhedron contains a function for comparison with standards, which is called auditing [9,10,52]. Polyhedron provides semantic web support, information systematization and ranking [11], transdisciplinary support, and internal search [45]; it has all the advantages of ontological interface tools [40] and ensures the construction of all chains of the process of transdisciplinary integrated interaction [56]. Because its active states are a hyper-ratio of plural partial ordering [29,60], the cognitive IT-platform Polyhedron is an innovative IT technology for the ontological management of knowledge and information resources. The user of the Polyhedron IT system can use an internal search function that is more protected and reliable than an external one, because it returns information created by experts.

Ontology creation mechanism
To create ontologies in Polyhedron, Google Sheets were used to collect and structure the information (see example in figure 1). The sheets with research report data (a structure file and a numeric/semantic data file) have been downloaded and saved in .xls format. The files have been loaded into "editor.stemua.science", which is part of Polyhedron. After that, the graph nodes (in .xls) and their characteristics have been generated using the data structures in the file. The obtained graphs have been saved in .xml format and placed in the database. The graphs have been filled with semantic and numeric information for ranking and filtering. Ontological edges (relations) have been formed using predicate equations, as described previously in [56].
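The step from tabular sheets to an XML graph can be illustrated with a minimal sketch. It is not the Polyhedron editor's actual format: the element names `graph`, `node` and `metadata`, the row layouts, and the function `build_graph` are assumptions made for illustration, and the rows are given in memory instead of being read from .xls files.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows of the "structure" sheet: (node name, parent name or None).
# Parents are assumed to be listed before their children.
STRUCTURE_ROWS = [
    ("Research report A", None),
    ("Abstract", "Research report A"),
    ("Materials and methods", "Research report A"),
]

# Hypothetical rows of the "data" sheet: (node name, metadata name, value).
DATA_ROWS = [
    ("Abstract", "Object of the study", "disposal of anaerobic effluent"),
]


def build_graph(structure_rows, data_rows):
    """Build an XML tree with nested <node> elements and <metadata> children."""
    root = ET.Element("graph")
    nodes = {}
    for name, parent in structure_rows:
        parent_element = root if parent is None else nodes[parent]
        nodes[name] = ET.SubElement(parent_element, "node", name=name)
    for node_name, meta_name, value in data_rows:
        meta = ET.SubElement(nodes[node_name], "metadata", name=meta_name)
        meta.text = value
    return root


graph = build_graph(STRUCTURE_ROWS, DATA_ROWS)
xml_text = ET.tostring(graph, encoding="unicode")  # serialized graph, ready to save
```

Keeping the hierarchy in one sheet and the metadata in another mirrors the two files mentioned above, so the structure can be edited without touching the numeric and semantic values.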

Ranking tools
Given that, for example, the proposed reports "A" and "B" are technical, their results can be used to analyse how rational it would be to implement the proposal in a concrete project. To this end, research reports "A" and "B" were compared with each other using the ranking tool with the following criteria: "Short-term economic perspective" and "Long-term economic prospects". The ranking was created with the module "Alternative", which is described in our previous works [11]. To provide this ranking, the nodes of a graph have been filled with semantic data grouped into semantic classes. The ranking uses a grade scale from one to ten points to express the importance coefficient.
Under the "Economic attractiveness" criterion, projects with a payback period of more than 25 years have been evaluated with 1 point, 20-25 years with 2 points, 15-20 years with 3 points, 10-15 years with 4 points, 6-10 years with 5 points, and 1-5 years with 6-10 points, respectively. A more detailed evaluation is provided for projects with a payback period of 1-5 years because this range is of the utmost interest to the investor: the payback time determines the expediency of investment.
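The scale above can be written as a small function. This is a sketch under stated assumptions, not the implementation of the "Alternative" module: the function name is hypothetical, a boundary value (for example exactly 20 years) is assigned to the higher-scoring band, periods between 5 and 6 years fall into the 5-point band, and within 1-5 years a shorter payback is assumed to earn more points (one year gives 10 points, five years give 6), with partial years rounded up.

```python
import math


def economic_attractiveness(payback_years: float) -> int:
    """Map a payback period in years to points on the 1-10 scale."""
    if payback_years <= 0:
        raise ValueError("payback period must be positive")
    if payback_years > 25:
        return 1
    if payback_years > 20:
        return 2
    if payback_years > 15:
        return 3
    if payback_years > 10:
        return 4
    if payback_years > 5:
        return 5
    # Detailed band (assumption): 1 year -> 10 points ... 5 years -> 6 points.
    whole_years = max(1, math.ceil(payback_years))
    return 11 - whole_years
```

Under these assumptions, a five-year payback period yields 6 points and a payback period of more than 25 years yields 1 point.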

Auditing tools
To audit the hypotheses of works "A" and "B", a "standard" graph (against which the comparison is done) and a "comparison" graph (which is compared with the "standard") have been created. The "standard" ontology graph contains the data on hypotheses, subjects, objects of research, keywords, and other parameters of research reports done before. In the "standard" graph, each parameter is presented in a separate node. The content of the "standard" ontological graph is updated and supplemented constantly.
The nodes of the "comparison" graph are named after the works that need to be audited against the "standard" graph. The parameters of each work to be audited are stored in the metadata of its node. The metadata type names are identical to the names of the nodes of the "standard" graph in order to enable interaction between the graphs.
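Because the two graphs interact only through identical names, a name mismatch silently excludes a parameter from the audit. The following sketch shows one way to detect such mismatches before auditing. The dictionary layout, the sample names, and the function `unmatched_metadata` are illustrative assumptions and not part of Polyhedron.

```python
# Names of the nodes of a hypothetical "standard" graph (one node per parameter).
STANDARD_NODES = {"Hypothesis", "Object of the study", "Subject of study", "Keywords"}

# Hypothetical "comparison" graph: one node per work, parameters kept as metadata.
COMPARISON_GRAPH = {
    "Work A": {"Hypothesis": "<hypothesis text>", "Object of the study": "<object text>"},
    # "Object of study" deliberately differs from the standard node name.
    "Work B": {"Hypothesis": "<hypothesis text>", "Object of study": "<object text>"},
}


def unmatched_metadata(standard_nodes, comparison_graph):
    """Return, per work, the metadata names that have no same-named standard node."""
    problems = {}
    for work, metadata in comparison_graph.items():
        missing = sorted(set(metadata) - set(standard_nodes))
        if missing:
            problems[work] = missing
    return problems


problems = unmatched_metadata(STANDARD_NODES, COMPARISON_GRAPH)
```

Here only "Work B" is reported, because its "Object of study" metadata has no counterpart among the standard node names.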

Results and discussion
The proposed ontology-based graph model for Polyhedron rests on the observation that research reports have a specific, logically connected structure and can therefore be represented as an ontology. After structuration, the content of the reports can be shown in a form that is simpler to understand. Besides, most results are domain specific for each industry, and if the current standards are correctly identified, these values will be easy to compare. Also, most research in one field often uses the same equipment, materials, chemicals, standard methods of analysis, literature, etc., which allows these works to be compared with each other and structured correctly.
However, the main advantage of the proposed approach (besides the structuration of the research) is that results are processed as separate result parameters of the reports. This supports data analysis, further processing using ranking, and semantic data interoperability. Separating the numeric data and placing it in a metadata class is possible because reports that address the same field describe the process and its results using the same (or similar) parameters. For example, for most reports on anaerobic digestion, the process parameters are temperature, type of substrate, reactor volume, moisture content, and initial pH; the characteristics of the efficiency of the process are biogas yield, methane content, average pH during the process, destruction process, etc. [12].
As all research reports will be presented in a simplified form, this approach will be especially relevant for pupils and novice researchers; it can also be used in the educational process or to simplify the literature review for new educational research.

Description of scientific works used to provide structuration
As an example, the object of study of research report "A" is the disposal of anaerobic effluent. The subject of the research is the cultivation of Chlorella Vulgaris microalgae on effluent obtained after methane fermentation. The study aims to develop a method of growing Chlorella Vulgaris in effluent after methane fermentation. The practical significance of this scientific work is that its results will contribute to the spread of biogas technologies. Also, the proposed approach makes it possible to increase the economic benefits from the utilization of chicken manure by converting the anaerobic digestion effluent into microalgae, which have a wide range of applications. The scientific novelty of the research report is a method of utilizing anaerobic digestion effluent with microalgae; in addition, cultures of Chlorella Vulgaris adapted to the anaerobic digestion effluent were obtained. The working hypothesis was that the effluent obtained after anaerobic digestion can be used as a nutrient medium for microalgae Chlorella Vulgaris.
The object of study of research report "B" is the disposal of anaerobic digestion effluent. The subject of the research is the processing of anaerobic digestion effluent into humates by the autocatalytic catalysis method. The study aims to establish the regularities of processing the solid fraction obtained during methane fermentation of chicken manure by the autocatalytic catalysis method. The practical significance of this scientific work is that the study indicates the possibility of obtaining salts of humic and fulvic acids by the autocatalytic catalysis method. This approach makes it possible to increase the economic benefits from the disposal of chicken manure by converting the anaerobic digestion effluent into a more valuable product with a wide range of applications. Its scientific novelty is that potassium humate was obtained from anaerobic digestion effluent for the first time, and that, also for the first time, the efficiency of obtaining humates from the solid fraction of anaerobic digestion was investigated and the main regularities of the process were determined. The working hypothesis was that the solid fraction of methane fermentation of chicken manure can be recycled by the autocatalytic catalysis method.
For both research reports "A" and "B", chicken manure from the same poultry farm was used as the substrate for anaerobic digestion. The chicken manure and the effluent obtained from it by anaerobic digestion were analysed by the same methods and indicators. These indicators were "ash and dry content", "Determination of volatile fatty acids content" (in terms of acetic acid), and "Determination of ammonium nitrogen content with Nessler's reagent". The equipment used to determine these indicators was also the same. Therefore, we considered how these works can be structured and integrated using the cognitive IT-platform Polyhedron. All examples of the use of the ontological nodes of the obtained graphs for further potential information processing are presented in table 4.

Structuration of the scientific works using ontologies
To present the possibilities of systematizing research reports, we have applied an ontological taxonomy to students' works "A" and "B". The general view of the obtained graphs is shown in figure 2 [56]. A separate node called "Abstract" has been created, which contains all the necessary metadata of the work, such as "Object of the study", "Subject of study", "The aim of the study", "Practical value", "Scientific novelty", "Keywords" and "Hypothesis of scientific works", in the form of attributes. All metadata have been used to provide filtering and ranking.
The "Materials and methods" node contains all the materials and methods that were used to perform the experiments. Every approach has been placed in a separate attribute of the node. This focuses the reader's attention and helps to process the data of different works against each other. This mechanism will be described in detail in further research. The general view of the "Materials and methods" node of both works is shown in figure 3 [56].
For each ontological node that duplicates a section of the research report and contains specific indicators obtained by analysis, additional separate leaf nodes with these results have been created. In such a leaf node, all the results are held in the form of semantic and numeric data. These results are automatically available for filtering, auditing and ranking. An example of this leaf node is shown in figure 4.
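How numeric leaf-node data becomes available for filtering can be sketched as follows. The `LeafNode` class, the function `filter_by_range`, the report names, and all values are hypothetical placeholders chosen for illustration; they are neither taken from the reports nor from the Polyhedron filtering tool.

```python
from dataclasses import dataclass, field


@dataclass
class LeafNode:
    """A leaf node holding the results of one report section."""
    report: str
    numeric: dict = field(default_factory=dict)   # parameter name -> number
    semantic: dict = field(default_factory=dict)  # parameter name -> text


def filter_by_range(leaves, parameter, minimum=float("-inf"), maximum=float("inf")):
    """Keep the leaves whose numeric parameter lies within [minimum, maximum]."""
    return [
        leaf for leaf in leaves
        if parameter in leaf.numeric and minimum <= leaf.numeric[parameter] <= maximum
    ]


# Placeholder data, not measurements from any report.
LEAVES = [
    LeafNode("Report X", {"initial pH": 7.2}, {"type of substrate": "chicken manure"}),
    LeafNode("Report Y", {"initial pH": 6.1}, {"type of substrate": "chicken manure"}),
    LeafNode("Report Z", {}, {"type of substrate": "chicken manure"}),
]

selected = filter_by_range(LEAVES, "initial pH", minimum=7.0)
```

Leaves that do not carry the requested parameter are skipped rather than treated as zero, so an incomplete report cannot distort the selection.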

Information processing of the research report using Polyhedron tools

Using an audit tool to test a hypothesis
The audit tool [9,10,52] can be used to compare the hypotheses, subjects, objects of research, keywords, and other parameters of the research reports. To demonstrate the capabilities of the audit tool, the focus here is on auditing hypotheses only. A model version of the "standard" ontology has been created, which contains metadata from the "Abstract" node of the ontological graph of research report "A". This ontology has a simple structure without branches, with the parent node named "Abstract". The child nodes duplicate the metadata from the "Abstract" node of research report "A". The "comparison" ontology has been created with child nodes that contain the following hypotheses: the effluent obtained after anaerobic digestion can be used as a nutrient medium for microalgae Spirulina Platensis (hypothesis 1), the effluent obtained after anaerobic digestion can be used as a nutrient medium for microalgae Chlorella Vulgaris (hypothesis 2), and the effluent obtained after anaerobic digestion cannot be used as a nutrient medium for microalgae Chlorella Vulgaris (hypothesis 3). The hypothesis 2 node also contains some metadata. This ontology also has a simple structure without branches, with the parent node named "Hypothesis test system". The general view of the obtained comparison ontology and standard ontology in taxonomic form is shown in figure 5. Using the audit function, the system has checked whether each hypothesis is true or false. The indicators that do not correspond to the standard have been marked in red. Thus, this solution makes it possible not only to test the hypotheses of these scientific works but also to check other metadata that have already been set, using information from the "Abstract" node (see figure 6).
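The hypothesis check described above can be approximated with a short sketch. How the Polyhedron audit actually matches values is not specified here, so the comparison of normalized text is an assumption, as are the names `normalise` and `audit` and the dictionary layout; the hypothesis texts are those listed above.

```python
PARAMETER = "Hypothesis of scientific works"

# "Standard" ontology: the working hypothesis of research report "A".
STANDARD = {
    PARAMETER: "The effluent obtained after anaerobic digestion can be used "
               "as a nutrient medium for microalgae Chlorella Vulgaris",
}

# "Comparison" ontology: one child node per hypothesis to be tested.
COMPARISON = {
    "hypothesis 1": {PARAMETER: "the effluent obtained after anaerobic digestion can be "
                                "used as a nutrient medium for microalgae Spirulina Platensis"},
    "hypothesis 2": {PARAMETER: "the effluent obtained after anaerobic digestion can be "
                                "used as a nutrient medium for microalgae Chlorella Vulgaris"},
    "hypothesis 3": {PARAMETER: "the effluent obtained after anaerobic digestion cannot be "
                                "used as a nutrient medium for microalgae Chlorella Vulgaris"},
}


def normalise(text):
    """Ignore letter case and repeated whitespace when comparing values."""
    return " ".join(text.casefold().split())


def audit(standard, comparison):
    """Return, per comparison node, the parameters that differ from the standard."""
    return {
        node: [
            name for name, value in metadata.items()
            if name in standard and normalise(value) != normalise(standard[name])
        ]
        for node, metadata in comparison.items()
    }


report = audit(STANDARD, COMPARISON)
```

In this sketch, a non-empty list plays the role of the red marking: hypotheses 1 and 3 are flagged, while hypothesis 2 matches the standard.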

Analysis of the research report results by practical value
Research reports "A" and "B" have been compared with each other by the following criteria: "Short-term economic perspective" and "Long-term economic prospects". According to section 2 of research report "A", the payback period of project "A" is five years, which corresponds to 6 points according to the criterion "Economic attractiveness". This parameter is better for the project described in report "B", with a payback period of four years and three months, which corresponds to 5 points on "Economic attractiveness". The system provides a ranking of the results. With a large amount of data, the instrument will be useful for evaluating projects on "Economic attractiveness" quickly and effectively. Besides, in further research, other criteria will be justified and used for data management of educational research, which will make the tool more functional. The general view of the ranking result is presented in figure 7.

Discussion
The proposed database follows the "Leiden Manifesto of Scientometrics". In the obtained ontological database, quantitative evaluation can be supported by qualitative expert assessment. Additionally, this ontological database can unite the research missions of an institution, group, or researcher and protect excellence in internally relevant research. The ontological form of research reports can keep data collection and analytical processes open, transparent, and simple, because all metadata is contained in a separate node that can be expanded and supplemented. Thus, the obtained ontological database can also account for variations, e.g. in publication and citation practices, and it can base the assessment of individual researchers on a qualitative judgment of their portfolio. Because all ontological graphs are validated by experts, it is possible to avoid misplaced concreteness, including false precision, and to recognize the systemic effects of assessment and indicators. In addition, the indicators in the obtained ontological database can be scrutinized regularly and updated. Furthermore, the proposed ontology-based research reports can be integrated in a single environment (ontology repositories), as was proposed before [33].
The process starts with the creation of the paper; at this stage, various text editors can be used, for example Microsoft Word or Google Docs. Then an expert or the author of the paper formulates the metadata that is necessary for the ontology. For this purpose, the author uses Microsoft Excel or Google Sheets. Next, an editor adds the information to the graph, in our case in the IT-platform Polyhedron. Last but not least, it is possible to use the "Alternative" system, which includes the Audit, Filtering and Ranking instruments. All proposed instruments are illustrated in the workflow diagram in figure 8.

Conclusions
An ontological approach for the systematization of scientific works has been proposed, which also ensures their interoperability. A method of structuring research reports using digital taxonomies (ontologies) has been developed. It uses the native structure of the reports to define the hierarchical relations of the nodes. Concrete parameters were added as metadata of the nodes (semantic, numeric, pictures and links) to enable processing with Polyhedron tools. Ranking and filtering were used to process semantic and numeric metadata. The obtained results provide interoperability between different research reports (including educational ones). The obtained ontological approach follows the "Leiden Manifesto of Scientometrics". Further research will be devoted to providing even better interoperability between research works by generating a single taxonomy that organizes reports hierarchically by shared methods, literature and results, and by processing it using both the methods proposed in this research and newly developed ones.