Some experience in maintenance of an academic cloud

The article systematizes the authors' experience in deploying, maintaining, and servicing a private academic cloud. It describes the model of the authors' cloud infrastructure, developed at Ternopil Volodymyr Hnatiuk National Pedagogical University (Ukraine) on the basis of the Apache CloudStack platform. The authors identify the main tasks of maintaining a private academic cloud: making changes to the cloud infrastructure; monitoring virtual machines (VMs) to determine performance and migrate VM instances; working with VMs; and backing up the entire cloud infrastructure. The performance of the cloud and its ability to provide students with computing resources are analysed. The main types of VMs used in training are given, and the number and characteristics of VMs that a private academic cloud can serve are calculated. Approaches and schemes for performing backups are analysed, and theoretical and practical experience of using cloud services for backup is studied. Several scripts have been developed for archiving the platform database and its repositories; they allow uploading backups to the Google Drive cloud service. The performance of these scripts was evaluated for the authors' deployment of the private cloud infrastructure.


The problem statement
Today, many universities are creating their own cloud-based learning environments (CBLE). Although there is currently no single concept of a CBLE, and scientists understand it through similar concepts [3,9,14,17], in general it can be understood as an IT system consisting of cloud services that provides learning mobility and group collaboration of teachers and students to achieve educational goals [15].
As the analysis of the literature shows, university CBLEs are usually deployed according to the hybrid model [16,23,28]. One of the most important components in the structure of this environment is the private academic cloud [8], typically built on the IaaS service model. A hybrid cloud is a cost-effective way to solve the problem of insufficient computing resources: it allows the university to meet the peak demand of students and faculty through a combination of local infrastructure and one or more public clouds [29].
Various commercial and free platforms are used in universities to build private academic clouds [1,2,6,12]. A productive method of deploying private academic clouds is to use solutions from leading cloud vendors such as Google Inc., Microsoft, Amazon and others. Google Inc. offers researchers, universities, faculties, teachers, and students grants and credits for teaching and research. In particular, leading European educational institutions can access Google Cloud within the Internet2 project [10]. Unfortunately, these opportunities are not currently available in our country (Ukraine). For example, the Microsoft Educator Grant is a program designed specifically to provide access to Microsoft Azure to college and university professors teaching advanced courses. As part of this program, faculty teaching Azure in their curricula are awarded subscriptions to support their courses [18]. In general, these programs are very useful and productive. However, they are usually provided on a temporary basis and therefore cannot completely replace the cloud-based IT infrastructure of universities and colleges.
Among the free platforms for cloud infrastructure deployment, CloudStack, OpenStack, Proxmox, and Eucalyptus are the most suitable. Each of them has its advantages and disadvantages, and there have been many attempts to compare them. The authors of articles comparing these platforms state [7,18]:
• OpenStack has a large community and offers wide integration with storage, network, and compute technologies, but is too complex to deploy and configure;
• Eucalyptus, the longest-standing open source project, is banking on its very tight technical ties to Amazon Web Services (AWS); the platform is configurable, but not very customizable;
• Proxmox is an open source platform that provides an easy way to deploy cloud infrastructure, but it is not very suitable as a platform for a private cloud in the CBLE;
• CloudStack has a well-rounded GUI and can provide an advanced cloud infrastructure, but it is very GUI-centric and built on a single Java core.
We have deployed a private academic cloud based on the Apache CloudStack platform. It contains a management server, 4 hosts, 4 primary storages, and 1 secondary storage. We decided to use hypervisors instead of containers, because the former are more versatile and their use is safer. To save computing resources, we installed the primary storages on the hosts. We used VLANs to distribute traffic across individual networks; these networks can be allocated to groups or to individual students.
In general, our private academic cloud provides [7]:
• development and execution of student virtual machines;
• aggregation of computing resources of several hosts;
• VM migration between repositories;
• VM connections to each other through guest networks;
• launching VMs within other VMs;
• integration with Active Directory;
• distribution of student accounts according to their academic groups.
Many problems and corresponding tasks arise in the process of using a private academic cloud.
The purpose of the article is to systematize the authors' experience in maintaining the cloud-based learning environment.
The following tasks are required to achieve the goal of the research: 1. Analysis of the maintenance tasks of academic clouds in foreign and Ukrainian universities. 2. Definition of the maintenance tasks of the academic cloud deployed by the authors. 3. Listing and systematization of the authors' experience in maintaining the academic cloud of Ternopil Volodymyr Hnatiuk National Pedagogical University.

Private cloud maintenance tasks
As the experience of cloud infrastructure maintenance shows, it is an ongoing process that requires constant attention from engineers, network and system administrators, teachers, and students. Scientists describe the experience of deploying an academic private cloud, including determining the performance of hypervisors and storage [25]. The biggest challenge for researchers was the transition from a prototype academic cloud to a production one. In this context, they addressed the problems of load balancing, hypervisor elasticity, and security threats. Storage backup tasks are also important for such clouds. The authors of the book "Data Backups and Cloud Computing" offer the concept of backup to cloud storage. They note that both the cloud provider and the cloud consumers have to take comprehensive steps to ensure appropriate configuration, hardening of the CBLE, appropriate design and development, interoperability, and adequate testing [13].
Scientists from the Institute of Physics and Mechanics of the National Academy of Sciences of Ukraine and Lviv Polytechnic National University have developed an effective method of deduplication and distribution of data in cloud storage during the creation of backups. The researchers developed an intelligent system for such deduplication and tested it [21]. Junfeng Tian, Zilong Wang, and Zhen Li also studied cloud data backup [26]. The authors propose a scheme for data separation, backup, and encryption. They state that their scheme resolves the conflict between data security and the survivability of the IT infrastructure with the help of encrypted backups.
The backup model for the Apache CloudStack cloud infrastructure developed by Paul Angus is very useful for our study. It adds a vendor-agnostic backup API and UI to CloudStack for end users. The author's framework abstracts the specifics of solutions, so that through the use of a plugin a third-party solution can deliver backup and recovery functionality [22].

Definition of the maintenance tasks of the authors' academic cloud
Here are the main tasks of servicing our private academic cloud: 1. Work with student accounts; 2. Making changes to the cloud infrastructure; 3. Creating VM templates; 4. VM service (system, student, teacher); 5. Determining the performance of individual hosts and the cloud as a whole; 6. Migration of VM instances; 7. Stopping and restarting physical hosts; 8. Cloud infrastructure backup.
The first task involves creating student and faculty accounts. We authenticate users of the academic cloud against a centralized database, an LDAP directory (Microsoft Active Directory). This approach makes it possible to use a single set of credentials to access all hybrid IT infrastructure services. We used CloudStack domains to distribute students according to academic groups. Adding users to them is possible in automatic mode (using links at the first successful authentication) and in manual mode. Unfortunately, due to the incompatibility of our users' logins with the Apache CloudStack platform, we had to choose the manual mode. To reduce the technical work involved in finding LDAP directory entries, we created several queries to filter user account data.
Maintenance of our cloud infrastructure involves tasks such as:
• changing the parameters of the components of the cloud infrastructure (zones, clusters, hosts, storages);
• creating and routing virtual networks for individual groups or students;
• creating and modifying compute offering templates that determine the performance of VMs;
• creating and modifying network offering templates for services such as VPN, DHCP, DNS, Firewall, Load Balancer, and others;
• creating projects for VM sharing by students.
When creating the service offering templates, we compared the characteristics of the hardware hosts (CPU frequency, RAM) with the minimum guest OS requirements and the number of students. To do this, we used the inequality

$f_{VM} = N \cdot f_{min} \le f_{host}$,

where $f_{VM}$ is the total frequency of the VM processors; $N$ is the number of students; $f_{min}$ is the minimum frequency recommended for the guest OS; $f_{host}$ is the total frequency of the hardware host processors. The last value can be found from the ratio

$f_{host} = \sum_{i=1}^{h} n_i \cdot f_i$,

where $n_i$ is the number of cores in the processor of the $i$-th host, $f_i$ is the CPU frequency of the $i$-th host, and $h$ is the number of hosts.
It is well known that the frequency of a modern processor is not constant: it can increase or decrease depending on the mode of operation of the CPU. That is why we use the Processor Base Frequency in the tables and formulas above. The base frequency describes the rate at which the processor's transistors open and close, and it is this frequency that each hypervisor reports to the Apache CloudStack platform.
Similarly, to determine the required amount of memory, we used the inequality

$m_{VM} = N \cdot m_{min} \le m_{host} = \sum_{i=1}^{h} m_i$,

where $m_{min}$ is the minimum amount of RAM recommended for the guest OS and $m_i$ is the RAM of the $i$-th host.

As table 1 shows, the private academic cloud has a total frequency of about 50 GHz and a total amount of memory of about 90 GB. Regarding the frequency, two opposite factors should be taken into account:
• table 1 shows the base frequency, and processors can run faster thanks to Turbo Boost technology;
• hosts run other software (OS, databases, management servers, hypervisors), which also consumes resources.
Comparing the data in table 2 and table 1, we can conclude that our academic cloud provides about 50 VMs with Linux without a graphical user interface (GUI), more than 40 VMs with Windows Workstation, and about 35 VMs with Windows Server or Linux with a GUI. We use the EVE-NG platform for modelling in the study of computer networks. It launches its own VMs inside the main Apache CloudStack VM [24]. Such nested virtualization requires more resources; therefore, table 1 has a row named OSAdvLinux. For this OS, our cloud can run about 20 instances. Based on these data, we created several compute offering templates in our cloud infrastructure.
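The capacity estimate above can be sketched as a short script. The host figures below are illustrative (four hosts totalling about 49 GHz and 90 GB, close to the totals of table 1, but not the exact rows of that table), and the per-VM minimums of 1.4 GHz and 2 GB correspond roughly to a Windows Server guest:

```shell
# Illustrative capacity estimate: sum n_i * f_i and RAM over the hosts,
# then divide by the per-VM minimums and take the smaller of the two limits.
cat > hosts.txt <<'EOF'
# cores  base_GHz  RAM_GB
4 3.2 24
4 3.0 24
4 3.1 20
4 3.0 22
EOF
awk '!/^#/ { f += $1 * $2; m += $3 }
     END {
       f_min = 1.4; m_min = 2      # assumed minimums for one guest VM
       printf "total: %.1f GHz, %d GB\n", f, m
       n_cpu = int(f / f_min); n_ram = int(m / m_min)
       printf "max VMs: %d\n", (n_cpu < n_ram ? n_cpu : n_ram)
     }' hosts.txt
```

With these illustrative numbers the CPU limit (35 VMs) is tighter than the RAM limit (45 VMs), which matches the order of magnitude given in the text for Windows Server guests.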
Regarding VM migration, we used the approach described in [26]. Its authors propose to evaluate the efficiency of the cloud infrastructure as an integrated indicator of the use of resources of each VM instance, and to migrate a specific instance to another host to resolve a detected problem. They propose the concept of non-uniformity (skewness), which is determined by the ratio

$skewness(p) = \sqrt{\sum_{i=1}^{n}\left(\frac{r_i}{\bar{r}} - 1\right)^2}$,

where $n$ is the number of resources, $r_i$ is the projected use of the $i$-th resource, and $\bar{r}$ is the average predicted value of the use of all resources of the $p$-th server. Academic cloud administrators should minimize this value. To define an overloaded host, the concept of a "hot spot" is used: a host is "hot" if at least one of its resources exceeds a limit value. The host's "temperature" aggregates the amount by which the use of all its resources exceeds that limit.
If the temperature of a host is greater than zero, then virtual machines should be migrated from it. The Apache CloudStack system implements the appropriate functionality: the root or a domain administrator can transfer both the disks of virtual machines and the running instances to another host.
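The non-uniformity measure can be illustrated with a one-liner. The three utilization values below (CPU, RAM, network, as fractions of capacity) are made up for the example:

```shell
# Illustrative skewness calculation: sqrt(sum((r_i/avg - 1)^2)) over the
# predicted utilization of a host's resources. A host with even utilization
# (e.g. "0.5 0.5 0.5") would score 0.
echo "0.9 0.3 0.3" | awk '{
  n = NF; avg = 0
  for (i = 1; i <= n; i++) avg += $i
  avg /= n                          # average predicted utilization
  s = 0
  for (i = 1; i <= n; i++) s += ($i/avg - 1)^2
  printf "skewness = %.2f\n", sqrt(s)
}'
```

The higher the value, the more unevenly the host's resources are loaded, and the better a candidate it is for rebalancing by migration.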
Another way to solve the problem of insufficient computing resources is CPU and RAM overcommit. In this case, the Apache CloudStack administrator sets a multiplier by which the total CPU frequency or amount of RAM is multiplied. However, this method should not be abused: it can lead to unpredictable consequences, such as denial of service to virtual machines.
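The arithmetic behind overcommit is simple; the numbers below are illustrative. In CloudStack the multiplier corresponds to the global settings cpu.overprovisioning.factor and mem.overprovisioning.factor:

```shell
# With a multiplier of 2, a cloud with ~50 GHz of physical CPU capacity
# is scheduled as if it had 100 GHz; actual contention is pushed onto
# the hypervisors, which is why the factor must be chosen conservatively.
PHYS_GHZ=50
FACTOR=2
SCHED_GHZ=$((PHYS_GHZ * FACTOR))
echo "schedulable CPU: $SCHED_GHZ GHz"
```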

Design and implementation of an academic cloud backup model
Experience shows that the task of backup is very important and time-consuming. This is primarily due to the large amounts of student VM data in the private academic cloud. Large companies develop a disaster recovery plan in this case, but in educational institutions such tasks fall to the IT services. Therefore, they need to develop a model, identify potential risks in the IT infrastructure, and design and implement an appropriate backup system.
The development of a backup strategy requires the definition of the main goals and objectives of the backup, its tools, and its regulations. In general, the problem of backup is relevant for almost all IT infrastructures. When choosing a backup method, the following criteria are important [4]:
• backup time to the storage;
• recovery time from backup;
• the number of copies that can be stored;
• risks due to inconsistency of backups, imperfection of the backup method, and complete or partial loss of backups;
• overhead costs: the level of load on the servers when performing copying, reduction in the speed of service response, etc.;
• the cost of renting all services and storage.
Currently, there are 3 main backup schemes:
• Full. This type of backup creates a complete copy of all data.
• Incremental. Only files that have changed since the previous backup are copied; each subsequent incremental backup adds only the files modified since the previous one.
• Differential. The backup program copies each file that has changed since the last full backup. Differential copying speeds up the recovery process.
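The difference between the schemes can be shown with GNU tar's --listed-incremental (-g) mode; the directories and file names below are temporary stand-ins, not our production paths:

```shell
# Sketch of full / incremental / differential backups with GNU tar.
SRC=$(mktemp -d); BK=$(mktemp -d)
echo "template data" > "$SRC/template1"

# Full: archive everything; the snapshot (.snar) file records file state.
tar -c -z -g "$BK/state.snar" -f "$BK/full.tgz" -C "$SRC" .
cp "$BK/state.snar" "$BK/state-after-full.snar"   # keep the post-full state

echo "new iso" > "$SRC/image2"                    # a new file appears later

# Incremental: compares to the previous run, so only image2 is added.
tar -c -z -g "$BK/state.snar" -f "$BK/incr.tgz" -C "$SRC" .

# Differential: always compares to the state right after the full backup,
# so each differential archive contains everything changed since then.
cp "$BK/state-after-full.snar" "$BK/diff.snar"
tar -c -z -g "$BK/diff.snar" -f "$BK/diff.tgz" -C "$SRC" .
```

Restoring from incrementals requires replaying the full archive plus every incremental in order, while a differential restore needs only the full archive and the latest differential, which is the speed-up mentioned above.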
To save material costs, we use almost no dedicated server equipment or powerful high-speed network storage in our academic cloud installation. Instead, we decided to use cloud services. For example, the Google Drive service within the G Suite for Education package offers virtually unlimited disk space [20]. The disadvantage of such a repository is the significant time needed to upload or download backups, limited by the bandwidth of the university's Internet channel. This limitation can be considered acceptable, as our implementation of the academic cloud is used primarily for training rather than for production.
To use Google Drive in our own scripts, we need the API of this service. This interface is accessible through the Google Developers Console, a service for software developers. First, you need to create your own project; then credentials are created to access it. We chose OAuth 2.0 access. OAuth is an open authorization standard that allows a user or application to grant access to data without having to enter a login and password. Access tokens are used for this purpose: each access token gives a specific client access to specific resources for a specified period of time [20]. After adding a new project, we created new authentication data, selected the application type (desktop), and activated the appropriate API (Google Drive API).
Our research was performed at the Joint Laboratory of the Institute of Information Technologies and Learning Tools of the National Academy of Educational Sciences of Ukraine, and Ternopil Volodymyr Hnatiuk National Pedagogical University.
Our academic cloud deployment contains the following objects: • one management server; • four hosts for running VM instances; • four primary repositories containing the disks of these VMs; • one secondary repository for saving templates and ISO images.
Because templates and ISO images do not change, but only new ones are added, we chose the incremental method to back up the secondary storage. Its implementation was based on a ready-made utility for synchronizing storage files. Unfortunately, there is currently almost no Linux utility of the same quality as Google Backup and Sync, which is developed for Windows. We analysed several tools:
• Gdrive (grive2) is a Google Drive client with support for the new Drive REST API and partial sync. It cannot continuously watch for changes in the file system or in Google Drive and upload them.
• Gnome-online-accounts is a system utility located within the system settings of the Gnome GUI, but it can only be used in a graphical interface.
• GoSync is a Google Drive client with GUI support for Linux, released under the GNU General Public License. The client is not mature enough; for example, it performs automatic synchronization only every 10 minutes.
• Google-drive-ocamlfuse is a FUSE (Filesystem in Userspace) filesystem for Google Drive, written in OCaml. FUSE is a free module for the kernel of Unix-like operating systems that allows users to mount new types of file systems, including Google Drive on Linux, without root privileges.
We used the last utility. Here are its main features [11]:
• full read/write access to ordinary files and folders;
• read-only access to Google docs, sheets, and slides;
• multiple account support;
• duplicate file handling;
• access to trash;
• storing of Unix permissions and ownership;
• support for symbolic links;
• streaming through read-ahead buffers.
One problem was that the utility requires authorization using a browser in a graphical interface. Therefore, we used an alternative authorization mode. Since we already had our own OAuth2 client ID and client secret, we specified them in the command:

google-drive-ocamlfuse -id 12345678.apps.googleusercontent.com -secret abcde12345

As this command tries to start a browser on a server where there is no GUI, we formed the necessary URL as described in the documentation of the Google Developers Console. After going to this address, we received a verification code, which gave access for synchronizing folders to Google Drive.
For security reasons, we decided to sync not the secondary storage itself, but a copy of it from the backup drive (Backup_Secondary task, see figure 1). So we first synchronized the local folders with the command:

rsync -azvh /export/secondary /export/sync_secondary/arch_cloud

where /export/secondary is the secondary storage of the Apache CloudStack infrastructure, and /export/sync_secondary/arch_cloud is the local copy of this storage. To synchronize the /export/sync_secondary/arch_cloud folder, the following command has been added to the server task scheduler:

google-drive-ocamlfuse /export/sync_secondary

It runs every time the server with the secondary storage boots. A backup of all databases is required to restore the Apache CloudStack cloud infrastructure. These are the following databases:
• cloud, which contains all objects of the cloud infrastructure;
• cloud_usage, which contains generalized data on resource consumption by end users and is used to obtain statistics and compile reports.
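The scheduler entries described above can be written in crontab syntax roughly as follows (the nightly time is illustrative; only the paths come from our setup):

```shell
# @reboot re-mounts the Google Drive folder every time the storage server
# boots; the nightly job refreshes the local copy of the secondary storage.
@reboot     google-drive-ocamlfuse /export/sync_secondary
0 2 * * *   rsync -azvh /export/secondary /export/sync_secondary/arch_cloud
```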
Since the backup of these databases is quite small, we decided to store all their backups in the cloud storage (Backup_Database task, see figure 1). The traditional database for the Apache CloudStack platform is MySQL, and the main utility for backing up MySQL databases is mysqldump. Its syntax involves entering a login name and password. Because a shell script in Linux is a plain-text file, it would contain the password of the database user (usually root), which is a potential security risk for the entire server. In order not to leave the database user's authorization data in the open, we used the "login path" option. A login path is an option group containing options that specify which MySQL server to connect to and which account to authenticate as. To create or modify the login path file, we used the mysql_config_editor utility. In general, the commands for creating and archiving a database dump are as follows:

/usr/bin/mysqldump --login-path=DailyBackup -u root -A > $BACKUP_DIR/"archive_cloud_all_""$date_daily"".sql"
tar -czf $BACKUP_DIR/"archive_cloud_all_""$date_daily"".sql.tgz" $BACKUP_DIR/"archive_cloud_all_""$date_daily"".sql"

The variable $date_daily contains the current date of the archive, which makes the archiving date visible directly in the file name.
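Creating the login path is a one-time, interactive step; the host and user below match the typical single-server setup described in the text, but are illustrative:

```shell
# mysql_config_editor prompts for the password and stores it encrypted in
# ~/.mylogin.cnf, so it never appears in the plain-text backup script.
mysql_config_editor set --login-path=DailyBackup \
    --host=localhost --user=root --password
# Inspect what was stored (the password is shown obfuscated):
mysql_config_editor print --login-path=DailyBackup
```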
Backing up the primary repositories (Backup_Primary task (250, 251, 252, 253)) presents some difficulties. An analysis of Internet sources, the management server databases, and the storage files showed that the Apache CloudStack platform does not typically keep full copies of disk templates for each VM. This means that full backups should be made to reduce the risk of inconsistencies in the primary repository archives.
Additionally, it would be good to prepare the cloud platform by stopping all VMs. Of course, students need to form an understanding of the need to turn off their own VMs; however, in practice this does not always happen. Therefore, it is necessary to stop all VMs programmatically, by means of a script. This can be done using the API of the Apache CloudStack platform. API functions allow the developer to access data about cloud infrastructure objects and to change the state of these objects. To generate a query that contains API functions, you must specify:
• the URL of the management server;
• the service construct "api?", which contains the path to a certain API function and indicates the beginning of the parameters transmitted using the GET method;
• the command, i.e. the name of the API function;
• the apiKey, a key that can be generated for each user account;
• additional query options, separated as in GET queries using the "&" character;
• the response format (JSON or XML);
• the signature of the request.
Regardless of the protocol (HTTP or HTTPS) used to access the Apache CloudStack API functions, the request must be signed. This allows the platform to confirm that the request was sent from a trusted account that has the authority to execute the appropriate command. To sign a request, the developer must have an API key and an account secret key, which are generated by the platform administrator [27].
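The signing procedure (sort the parameters, lowercase the query string, HMAC-SHA1 it with the secret key, base64-encode, URL-encode) can be sketched with standard tools; the URL and both keys below are placeholders, not our real deployment:

```shell
# Hedged sketch of signing a CloudStack API request.
CS_URL="http://management-server:8080/client/api"
APIKEY="PLACEHOLDER_API_KEY"
SECRET="PLACEHOLDER_SECRET_KEY"
PARAMS="command=listVirtualMachines&response=json&apikey=$APIKEY"

# 1. Sort the parameters alphabetically.
SORTED=$(printf '%s' "$PARAMS" | tr '&' '\n' | sort | tr '\n' '&' | sed 's/&$//')
# 2. Lowercase the sorted string, compute HMAC-SHA1 with the secret key,
#    and base64-encode the binary digest.
SIG=$(printf '%s' "$SORTED" | tr '[:upper:]' '[:lower:]' \
      | openssl dgst -sha1 -hmac "$SECRET" -binary | base64)
# 3. URL-encode the signature and append it to the request.
SIG_ENC=$(printf '%s' "$SIG" | sed 's/+/%2B/g; s/\//%2F/g; s/=/%3D/g')
echo "$CS_URL?$SORTED&signature=$SIG_ENC"
```

The printed URL can then be requested with curl or wget; a wrong signature makes the management server reject the call with a 401 error.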
Here is our bash script to stop all running user VMs.
mysql --login-path=DailyBackup -D cloud -e "SELECT uuid FROM vm_instance WHERE type = \"User\" and state = \"running\";" > uuid.txt
sed -i '1d' uuid.txt
while read LINE; do php -q cloudstackapi.php "$LINE"; done < uuid.txt

The first line writes to a file the list of user VMs in the running state, obtained from the database. The next command removes the first line of the file, because it contains the column header rather than a VM uuid. The third line runs the cloudstackapi.php script for each uuid; it generates a signature and calls the stopVirtualMachine API function.
Another way to back up the current state of VMs is to create snapshots of them. The Apache CloudStack platform provides 2 types of images [22]:
• VM snapshot, a hypervisor-driven point-in-time image of a virtual machine's disks; the exact mechanism depends on the hypervisor.
• Volume snapshot, a point-in-time image of a specific volume; the process usually involves taking a VM snapshot, copying the required volume to secondary storage, and then deleting the VM snapshot.
This approach requires additional space on the secondary storage or copying the data to the user's local disk. Students can take such images from the web interface of the Apache CloudStack platform. Performing this action, and turning off their own VMs after use, are important components of a student's ICT competence.
However, experience shows that not all students perform these actions. Therefore, they are also worth automating with scripts; among the API functions of the CloudStack platform there are relevant snapshot commands [19].
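Such automation can reuse the uuid list produced by the stop script. In the sketch below, call_api is a stand-in for a signed request to CloudStack's createVMSnapshot API command (createVMSnapshot is a real API command; the helper function and the uuid values are illustrative):

```shell
# Hedged sketch: take a snapshot of every running user VM from a uuid list.
call_api() {
  # A real implementation would sign this query and send it to the
  # management server; here we only print the query string.
  echo "command=createVMSnapshot&virtualmachineid=$1"
}
# uuid.txt: one VM uuid per line (illustrative values).
printf '%s\n' "aaaa-1111" "bbbb-2222" > uuid.txt
while read -r UUID; do
  call_api "$UUID"
done < uuid.txt
```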
Another task of backing up our academic cloud is to estimate the time required to upload data to the cloud storage. Currently (October 2020), the sizes of our academic cloud storages are approximately as follows:
• primary250: 120 GB;
• primary251: 80 GB;
• primary252: 140 GB;
• primary253: 80 GB;
• secondary: 100 GB.
Since we make a full copy of the primary storages, we need to upload about 400 GB to the cloud storage each time. Let the speed of the Internet channel at night be 80 Mbit/s (10 MB per second). Then it will take about 11 hours to upload 400·1024 MB, which is a lot. Therefore, we balanced Internet access between 2 providers: at backup time, our router routes hosts cloud0 and cloud1 through the first provider, and cloud2 and cloud3 through the second. In this case, a full backup takes about 5 hours and 30 minutes. This time is also significant, but acceptable.
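The arithmetic behind this estimate (integer hours, rounding the fractions down):

```shell
# ~400 GB at 10 MB/s over one channel, and the same volume split in two.
TOTAL_MB=$((400 * 1024))
SPEED=10                               # 80 Mbit/s ~= 10 MB/s
ONE=$((TOTAL_MB / SPEED / 3600))       # hours over a single channel
TWO=$((TOTAL_MB / 2 / SPEED / 3600))   # hours when traffic is split in two
echo "single provider: ~$ONE h, two providers: ~$TWO h"
```

A single channel needs about 11 hours; splitting the hosts across two providers halves the volume per channel, giving the roughly five-and-a-half-hour window mentioned above.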
Another disadvantage of our scheme is the significant time required to download backups from the Google Drive service. This time becomes critical if the management or storage servers fail. It means that we must also back up the entire OS of the management server to fast local area network storage.

Conclusions
Private academic clouds should be used in cloud-based learning environments, as they are necessary for the education of future ICT specialists. Virtualization is one of the most up-to-date and advanced technologies for modelling many ICT objects. Despite the availability of educational grants from leading cloud vendors, many universities deploy their own private academic clouds. During the production phase, administrators have a lot of work to do to maintain and support these clouds. Among these tasks, one of the most important is to ensure the productivity and elasticity of the cloud; solving it makes it possible to run the maximum number of VMs in the cloud infrastructure.
An important task in the maintenance of an academic private cloud is the backup of its components. To solve it effectively, different backup schemes (full, incremental, differential) should be used. To save data, it is advisable to use both cloud and local storage. In any case, administrators should determine how long it will take to back up and restore the entire cloud infrastructure. It is also advisable to use the API functions of the cloud platform, which allows automating some maintenance tasks.
We see the prospects for further research on our private academic cloud installation in the development of more efficient scripts based on a differential scheme. They should reduce the time it takes to create and copy all backups and, accordingly, the time to recover data from them. The study of new versions of cloud platforms with regard to the emergence of ready-made backup modules is also relevant; they will probably solve many of the current problems.