A symbolic-arithmetic for teaching double-black node removal in red-black trees

A red-black (RB) tree is a data structure whose nodes are colored red or black. The node colors are the principal means of balancing a RB tree: a balanced tree has an equal number of black nodes on every simple path from the root to a leaf. When a black leaf node is deleted, a double-black (DB) node is formed, the black height is reduced, and the tree becomes unbalanced. Rebalancing a RB tree that contains a DB node is a fairly complex process, and teaching and learning the removal of DB nodes is correspondingly challenging. This paper introduces a simplified method, a symbolic-algebraic arithmetic procedure, for removing DB nodes and rebalancing black heights in RB trees. This approach has enhanced student learning of DB node removal in RB trees. Feedback from students showed the learnability, workability and acceptance of the symbolic-algebraic method for balancing RB trees after a delete operation.


Introduction
Red-black (RB) trees are binary trees and, more specifically, binary search trees [7, 14, 19]. In a RB tree, the color of a node is either red or black. Any node, whether red or black, can be deleted when the data in the structure is updated or modified. The deletion of a red node and the subsequent rebalancing is straightforward, because a deleted red node is replaced by a black node: an operation that we explain as Red + Black = Black. The complexity of recoloring and rebalancing the RB tree lies in the deletion of a black node, or of the replacement node that was moved from its position to the place of a deleted red node. This paper considers a new approach for handling this complexity. Our approach is a step-by-step simulation of an algebraic operation that uses the symbols R (for red), B (for black), and DB (for double-black). Addition and subtraction of these symbolic colors are applied, as the case may be, in balancing the tree.
In classroom teaching and learning, the complexity of red-black tree recoloring and rebalancing poses a challenge to tutors and students alike [23]. This is because: 1) the deletion of a black node reduces the number of black nodes along a simple path from the root node to a descendant leaf node; 2) a double-black node is created in place of the deleted black node, which we explain as B + NULL-LEAF = DB; and 3) the double-black node must then be removed from the red-black tree. The removal of this double-black node is where the complexity lies.
The deletion of a black node and its replacement by a double-black node, symbolically B + NULL-LEAF = DB, reduces black heights in the tree. Black height refers to the number of black nodes on any path from the root node to any descendant leaf node. When the black height is reduced and the number of black nodes along one path differs from that along other paths, the RB tree becomes unbalanced.
That is, the tree requires rebalancing such that the DB node is removed and every path in the tree again has an equal number of black nodes. Double-black nodes have no place in RB trees. Yet when a black node is deleted, a DB node is formed. The conventional algorithm for removing DB nodes poses a strong challenge to students. While there are several scholarly works on node recoloring and balancing of the RB tree, not many have addressed the problem of DB removal. This paper addresses that problem by using symbolic arithmetic computation to remove the DB node, and presents a new teaching approach to ease DB removal and the subsequent tree rebalancing.

Contributions
If a DB node occurs as a result of a deleted node in a RB tree, how is the DB node removed? How are the nodes reassigned their colors after the removal of a DB node? Following from these questions, the contributions of this paper are: 1) to present a simplified symbolic-arithmetic procedure for teaching and learning DB node removal in the computer science curriculum, 2) to demonstrate a symbolic-arithmetic approach for the recoloring and balancing of the RB tree, and 3) to present the supporting algorithm for the symbolic-algebraic arithmetic procedure for DB node removal and tree rebalancing.
The article continues as follows. Section 2 reviews related works on the RB tree data structure. Section 3 presents our symbolic-arithmetic methodology, which is further simplified by an algorithm. Section 4 discusses our symbolic arithmetic operation using illustrative RB tree diagrams, together with students' feedback on our symbolic-algebra method in comparison to the conventional RB tree DB removal algorithm. Section 5 gives conclusions and further works.

Statement of the problem
The deletion of a black node in a RB tree causes a reduction in black heights, tree rotation and restructuring, as well as recoloring of nodes. Balancing a RB tree in this way is a fairly complex and challenging algorithm to learn or teach. The statement of the problem is thus: there is a mathematically-based approach that can be used to teach, and ease the learning of, the conventional algorithm of double-black removal and the subsequent recoloring of nodes and balancing of the RB tree.

Related works
A RB tree is one of several binary search trees. In data structures, every binary tree has its own properties that define its structure, height and balancing. One of the attributes of the RB tree is the "color" field, by which every node in the tree is assigned a color that is either red or black [7, 11, 14].

Properties of red-black trees
a) Each node is either red or black.
b) The root node is black.
c) Each leaf node is black.
d) The children of a red node are black.
e) For each node, all simple paths from the node to descendant leaves contain the same number of black nodes.
f) Two consecutive nodes cannot both be red.
g) A red-black tree is a binary search tree.
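The color properties listed above can be checked mechanically. The following is a minimal sketch, not taken from the paper: the `Node` class and the function names `black_height` and `is_red_black` are hypothetical, null leaves are represented by `None` and counted as black, and the binary-search-tree ordering of property g) is assumed rather than verified.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "R", "B"


@dataclass
class Node:
    key: int
    color: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def black_height(node: Optional[Node]) -> int:
    """Count the black nodes on every path from `node` down to a null leaf.

    Returns -1 if two paths disagree (property e) or if a red node has a
    red child (properties d and f). A null leaf counts as one black node.
    """
    if node is None:
        return 1
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return -1
    left_height = black_height(node.left)
    right_height = black_height(node.right)
    if left_height == -1 or right_height == -1 or left_height != right_height:
        return -1
    return left_height + (1 if node.color == BLACK else 0)


def is_red_black(root: Optional[Node]) -> bool:
    """Check the color properties a) to f); key ordering is not checked."""
    if root is not None and root.color != BLACK:
        return False  # property b: the root must be black
    return black_height(root) != -1


if __name__ == "__main__":
    tree = Node(20, BLACK, Node(10, BLACK), Node(30, BLACK))
    print(is_red_black(tree))   # True
    tree.right = None           # remove a black leaf without any repair
    print(is_red_black(tree))   # False: one path now has fewer black nodes
```

The second call illustrates why a deleted black leaf cannot simply be dropped: the path through the missing node is one black node short, which is the deficit that the double-black marker records.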
When all of these requirements are met, the tree is a red-black tree. Furthermore, if the insert or delete operation (or method) is called for a given node, the structure of the tree and the colors of its nodes change to restore the requirements, which is how the tree rebalances itself [3]. In terms of node coloration, the children (or subtrees) of a black node can be i) both black, ii) one red and one black, or iii) both red (figure 1). The two children of a red node, however, must always be black [23]. This is because a red node must have a black parent as well as black child nodes. Otherwise, the property that states "no two nodes that are connected side-by-side can be red" would be violated.
After an operation such as insertion, update or deletion, data structures like the AVL and RB trees must be balanced [8]. A RB tree is balanced if every simple path from the root node to a descendant leaf node has an equal number of black nodes (figure 1). It is unbalanced if the number of black nodes differs between any two such paths. The RB tree is a well-known data structure for the storage and management of data. In the non-teaching fields, several studies, e.g. [10, 13, 18, 25], have analysed the impact of RB trees on memory management and performance. Other work has examined the application of RB trees in wireless sensor networks [17] and the use of RB trees for optimising costs in network trees with a prescribed algorithm [9].
The RB tree is an important topic in data structures in the study of computer science, and one that students can find very challenging and difficult to learn [9, 15, 22].
Fig. 1. A balanced red-black tree showing null leaves, drawn in a red-black visualizer [4].
As an abstract data type aimed at understanding computer storage and software engineering, a RB tree supports the operations of insertion, rotation, deletion, and recoloring of nodes. In looking for new approaches to teaching RB trees, Wu et al. [22] proposed a customised platform for learning data structures after stating a series of steps for learning data structures with ease, but with no mention of topical areas in the subject. To improve the efficiency of teaching data structures, Seidametova [21] conducted experiments with different student groups (e.g. Group1 vs. Group2) to compare the effectiveness of a teaching strategy using a visualization tool against a flipped classroom in the areas of hashing and trees (BST, RBT, AVL). The scores obtained were used to determine which group learned the most. King [12] reported how their university's data structures and algorithms course was redesigned to reflect experimental analysis. The revised curriculum and practice thereafter enabled students to include experimental analysis in their studies and to connect computer science theory with software engineering practice. The reported methodology concluded with the use of a Likert rating scale for the collection of metrics. Nipkow [16] presented a list of most of the conventional topics in data structures and algorithms and argued that algorithms are logic; thus, a data structures course needs to be supported with critical computational thinking, formal proofs and logic. In [23] a top-down insertion method was proposed to address the problem of single and double rotation through a granularity approach in order to balance the RB tree after insertion. The granularity approach prescribed a step-by-step selection of rules for students to follow in balancing the RB tree. Several approaches to the teaching and learning of RB trees have been presented, but most of them concern the rotation and recoloring of nodes, e.g. [2, 23], and did not make the problem any less difficult.
The insertion of a new node in a RB tree is the simpler of two hard problems; the harder is the deletion of a black node or of the replacement node [6], together with the subsequent rebalancing of the tree. To address the removal of the DB node, Zegour [24] used formalized statements to describe node color, recoloring and tree balancing, without simplifying the challenges students face in learning the DB removal algorithm. In [6] a parity-seeking delete algorithm was introduced with a goal similar to the aim of our paper: to introduce a pedagogically sound and easy-to-understand algorithm for RB tree deletion. Its rationale, which is to balance the tree by repairing a defective subtree, still left some complexity in understanding. Sedgewick [20] presented 2-3 tree variants of the RB tree, called left-leaning red-black (LLRB) trees, and proposed concise deletion algorithms. However, the deletion algorithm of the LLRB tree is still complex and not well suited to education. Given the complexity that remains in the deletion algorithm of the RB tree, this paper is not about node insertion, on which several studies have been carried out and which is fairly straightforward, but about DB node removal using basic arithmetic operations after a delete operation. For the deletion of a black node, the conventional DB removal algorithm is not deterministic in approach; see, e.g., the relevant chapters in [7, 14] and visualization tools [4]. Hence the quest for a new mathematically-based algebraic model of operation.

Methodology
This paper presents a simple arithmetic process for removing the DB node and rebalancing the RB tree. Our strategy is technically a simple addition or subtraction of the red (R) or black (B) color to or from the existing color of a given node, in the manner of (-) × (-) = (+) in an algebraic operation. Symbolically, we subtract a single black color, -B, from each of two sibling nodes, one of which is a DB node, and then add a single black color, +B, to the parent of the two siblings. We thus have -B(leftChild), -B(rightChild) and +B(parent) (figure 2).

The algebraic arithmetic of double-black node removal and tree balancing
The symbolic-algebraic arithmetic rules. Our symbolic-arithmetic operations for removing a DB node from a RB tree after a delete operation on a black node, so as to rebalance the black height of the tree, are given as Eq. (1) to Eq. (6). Of these, the text below cites B + B = DB (Eq. (1)), DB - B = B (Eq. (2)) and B + R = B (Eq. (4)). The given operations imply the simple addition of a red or black color to an existing node's color in order to obtain the resultant node color in the process of DB removal and black-height rebalancing of the tree. It should be noted that the color operands in the formulas cannot be moved to the other side of the equal sign. Otherwise, conflicts may occur.
The change factor is the color added to or subtracted from the original color. For example, if a node is initially black and an extra black color is added to it, we have, as stated in Eq. (1), B + B = DB.
Fig. 3. An example of the change factor in equation (1).
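One way to make the change factor concrete is to read each symbolic color as a count of black units. The sketch below rests on that assumed encoding (R = 0, B = 1, DB = 2), which is not stated in the paper, and the helper names `plus` and `minus` are hypothetical. It reproduces only the rules quoted in the text (B + B = DB, DB - B = B, B - B = R, B + R = B and Red + Black = Black); how it treats any other combination is a consequence of the encoding and may differ from the paper's own formulas.

```python
# Assumed encoding (not from the paper): each symbolic color is a number of
# black units, so adding or subtracting a color is ordinary integer arithmetic.
WEIGHT = {"R": 0, "B": 1, "DB": 2}
COLOR = {weight: name for name, weight in WEIGHT.items()}


def plus(color: str, added: str) -> str:
    """Symbolic addition of a change factor, e.g. plus("B", "B") gives "DB"."""
    total = WEIGHT[color] + WEIGHT[added]
    if total not in COLOR:
        raise ValueError(f"{color} + {added} has no symbolic color")
    return COLOR[total]


def minus(color: str, removed: str) -> str:
    """Symbolic subtraction of a change factor, e.g. minus("DB", "B") gives "B"."""
    total = WEIGHT[color] - WEIGHT[removed]
    if total not in COLOR:
        raise ValueError(f"{color} - {removed} has no symbolic color")
    return COLOR[total]


if __name__ == "__main__":
    print("B + B  =", plus("B", "B"))     # DB
    print("DB - B =", minus("DB", "B"))   # B
    print("B - B  =", minus("B", "B"))    # R
    print("B + R  =", plus("B", "R"))     # B
```

Under this reading, the restriction that operands cannot be moved across the equal sign corresponds to the guard in `plus` and `minus`: a result outside the three known colors is rejected instead of being rearranged.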

The transmission rules of the change factor:
a) The path, or order, of the recoloring process travels from the DB node to its adjacent nodes, either along or against the directed edges to the deleted node.
b) Start the symbolic-algebraic rule application at the level of the DB node.
c) The symbolic-algebraic rule application always proceeds upward towards the root.
d) The traversal stops when either of these conditions is reached: a DB node is generated at the root node, or the tree is already balanced.

Delete operation and double-black node
In a RB tree, every simple path from the root node to any descendant leaf node must have an equal number of black nodes. This means that the deletion of a red leaf node does not affect the number of black nodes along any simple path in the tree. There is, however, a problem if an external black leaf node is deleted, or if an internal red node is deleted and then replaced by an external black leaf node. To correct a DB node occurrence, Besa and Eterovic [1] described the removal of a black unit from, say, node A and passing it to a different node B; if node B is black, it will then have an extra black. A tree that has a DB node is not a RB tree [5]. Thus, to remove DB nodes, Germane and Might [5] introduced the operation of DB node rotation.

Discussion
In this section we present our technical but simple algebraic rules for the removal of DB nodes, which subsequently lead to the balancing of any RB tree as we recur up the tree from children to parent. A parent node has two subtrees, namely the left and right subtrees.

Recursive subtraction and addition of black color
Irrespective of whether a DB node is a left subtree or a right subtree, our algebraic algorithm subtracts one black color, -B, from both subtrees and adds one black color, +B, to the parent node. Working bottom-up along the path from a given DB node towards the root of the tree, this process continues recursively whenever another DB node occurs (by virtue of the +B addition to a black parent node), unless that node is the root.
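The recursion described above can be sketched as a loop that walks from the DB node towards the root. This is an illustrative sketch under stated assumptions, not the paper's algorithm: the `Node` class, the black-unit encoding (R = 0, B = 1, DB = 2) and the function `remove_double_black` are hypothetical, a DB null leaf is modelled as a node whose key is `None`, and the sketch handles only a black sibling with black children, raising an error in the cases that would need a rotation.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed encoding (not from the paper): colors as counts of black units.
WEIGHT = {"R": 0, "B": 1, "DB": 2}
COLOR = {weight: name for name, weight in WEIGHT.items()}


def shift(color: str, delta: int) -> str:
    """Apply the change factor +B (delta=+1) or -B (delta=-1) to a color."""
    return COLOR[WEIGHT[color] + delta]


@dataclass(eq=False)
class Node:
    key: Optional[int]              # None marks a null leaf
    color: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    parent: Optional["Node"] = None

    def __post_init__(self) -> None:
        # Wire up parent pointers so the walk towards the root is possible.
        for child in (self.left, self.right):
            if child is not None:
                child.parent = self


def is_black(node: Optional[Node]) -> bool:
    return node is None or node.color == "B"


def remove_double_black(u: Node) -> List[str]:
    """Apply -B(U), -B(S), +B(P) upward until no DB node remains."""
    trace = []
    while u.color == "DB":
        p = u.parent
        if p is None:
            u.color = shift(u.color, -1)            # DB(root) - B = B
            trace.append("DB(root) - B = B")
            break
        s = p.right if p.left is u else p.left
        if s is None or s.color != "B" or not (is_black(s.left) and is_black(s.right)):
            raise NotImplementedError("sibling case needs a rotation")
        before = p.color
        u.color = shift(u.color, -1)                # DB(U) - B = B
        s.color = shift(s.color, -1)                # B(S) - B = R
        p.color = shift(p.color, +1)                # R + B = B, or B + B = DB
        trace.append(f"-B(U), -B(S={s.key}), +B(P={p.key}): {before} + B = {p.color}")
        u = p                                       # continue from the parent
    return trace


if __name__ == "__main__":
    db_leaf = Node(None, "DB")                      # left behind by a deleted black leaf
    sibling = Node(10, "B")
    root = Node(20, "B", sibling, db_leaf)
    for step in remove_double_black(db_leaf):
        print(step)
    print(root.color, sibling.color, db_leaf.color)  # B R B
```

The loop stops as soon as the parent absorbs the extra black without becoming DB, or when the DB reaches the root and is reduced to black, which mirrors the stopping conditions in the transmission rules.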

Deletion of a red leafnode.
As shown in figure 4 (a & b), the deletion of the leaf nodes 15 and 33, respectively, is a simple case. This deletion operation does not affect black heights in the tree, nor does it make the tree unbalanced. However, as shown in the following sections, a delete operation on an internal red node or on any black node can cause an imbalance in a RB tree. For our illustrations, we use the following parameters to represent the nodes in any given subtree and their parent: P for the parent of the double-black node, U for the double-black node, and S for the sibling of the double-black node.
For discussion purposes, we use notations such as B(P) for a black parent node, DB(U) for the double-black node, and B(S) or R(S) for a black or red sibling of the DB node, respectively. In addition, the parameters P, U and S are substituted, where necessary, with their respective node values. It is important to note that DB nodes are not produced only by the deletion of black leaf nodes; they are also formed in the position of a black leaf node that replaces a deleted internal red node (as explained in section "Deletion of an internal red node", figure 7).
Deletion of a black leafnode. Since node 30 is a root node, and every root node must be black, node 30 is black. Otherwise, we recur up the tree by reapplying the symbolic expression -B to both the DB node and its sibling S, and +B to their parent P, as shown in figure 2.
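The condition under which a deletion leaves a DB node can be stated compactly in terms of the node that is physically unlinked from the tree, which under standard binary-search-tree deletion is either the target itself or the node whose value replaces it. The helper below is a hypothetical sketch based on that standard deletion scheme rather than on a rule stated in the paper; the function name and its parameters are assumptions.

```python
from typing import Optional


def creates_double_black(unlinked_color: str, child_color: Optional[str] = None) -> bool:
    """Decide whether unlinking a node leaves a double-black behind.

    unlinked_color: color ("R" or "B") of the node physically removed, which
        has at most one non-null child after the usual value replacement.
    child_color: color of that single child, or None when both children are
        null leaves.
    """
    if unlinked_color == "R":
        return False            # no black unit is lost on any path
    if child_color == "R":
        return False            # the red child moves up and turns black: R + B = B
    return True                 # B + NULL-LEAF = DB


if __name__ == "__main__":
    print(creates_double_black("R"))        # False: red leaf
    print(creates_double_black("B"))        # True: black leaf
    print(creates_double_black("B", "R"))   # False: red child absorbs the black
```

Read this way, deleting an internal red node whose value is replaced by a black leaf is the `("B", None)` case, because the node that actually leaves the tree is the black leaf.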

Table 1. Steps of balancing and color assignment for case "Deletion of a black leafnode", illustration 1.

Deletion of an internal red node.
If the internal red node 25 is deleted, the deleted position takes the value of the left child 15 while keeping the red color of the deleted node; this leaves a DB node in the position of the replacement node. At this point, structurally, this is equivalent to deleting a black left child 15 and thereby producing a DB node. In both scenarios, our symbolic-algebraic arithmetic holds: -B(U), -B(S) and +B(P). Therefore, the problem reduces to the case in section "Deletion of a black leafnode", illustration 2, above.

Deletion of an internal black node.
According to the binary search tree deletion rule, the deleted node 10 is replaced by the rightmost node of its left subtree; in this case it is the left child 5, which is the only subtree. As shown in figure 8, the deletion of the internal black node 10 is structurally the same as deleting a red leaf node, on the ground that there is no DB formation; this is the simple case rule B + R = B (Eq. (4)). Without a DB formation, our symbolic-algebraic color rule B + R = B applies only to the deleted node 10 in this case, and the replacement node 5 becomes a NULL LEAF. Thus, the problem reduces to the situation in section "Deletion of a red leafnode".

Deletion of a root node.
By the deletion rule of the binary search tree, as applied in figure 8, the deleted node 20 is replaced by the rightmost (largest) node of its left subtree, node 15, with its color unchanged. The reason is not only that node 20 is the root but also the application of our color rule B + R = B. As shown in figure 9, the deletion of the root node 20 is structurally the same as deleting a red leaf node, which is again the simple case rule B + R = B. Therefore, the problem again reduces to the situation in section "Deletion of a red leafnode".
[...] root node, which is always black, yet we can apply our symbolic-algebraic rule. Firstly, B(root) + B = DB(root) after the delete operation on node 15 (Eq. (1)). Secondly, applying Eq. (2), we arrive at a black root node from DB(root) - B = B. Here we prove the application of the symbolic-algebraic rules, as given in table 4, in the process of balancing the tree. At this point, structurally, it is equivalent to deleting a black left child to produce a double-black NULL LEAF.
To apply our rules, we call DB(U) - B = B, B(30) - B = R, and B(10) + B = DB(10), respectively. As a root node, DB(10) of course becomes black. The problem is similar to the situation in section "Deletion of a black leafnode", illustration 2, except that here in table 4 (steps 3 and 4) we compare the use of two steps, involving the color operation +R and its inverse -R, against the single step of figure 5 (step 3), which is B - B = R. Our position is to de-emphasize the use of additional steps, as they incur overhead. Table 5 removes this overhead: it is more efficient to apply B - B = R, which takes fewer steps.
This is the algorithmic representation of the symbolic-algebraic operation. Using first-order logic (FOL), the notations are as follows: B(s(x)) and R(s(x)) are the black and red sibling of the deleted node; B(p(x)) and R(p(x)) are the black and red parent of the deleted node, respectively; and DB(x) is the double-black node. As we recur up the tree, a new DB is created, and the new DB gets a new sibling and parent.

Feedback from the symbolic algebraic teaching approach
For over three academic years, this symbolic-algebraic (SA) method has been used to assist students with the learning of double-black (DB) removal and RB tree rebalancing. In and out of class, two fundamental questions were put to students; they are stated below as Question 1 and Question 2. Below each question, tables 6 and 7, respectively, show the students' feedback together with our analysis of it. In the feedback analysis we use the acronyms SA = symbolic-algebra and TA = traditional (conventional) algorithm.

Question 1: What is your opinion on this new symbolic method in the understanding of the removal of DB and balancing of the tree?
Table 6. Feedback on the understanding of the symbolic-algebraic method.

Student feedback: For me, the new [symbolic] algebra (SA) algorithm is easier to understand compared to conventional approach, with less classifications and more symbolic visual aids. But it still requires practicing to get the hang of it.
Analysis:
* SA approach is simpler.
* SA is visual and animated in process.
* SA has less classification.
* SA is needed to be learned too.

Student feedback: Symbolic algebraic application to tree balance after rotations and node coloration has a fixed pattern to follow which is lacking in the traditional algorithm.
Analysis:
* SA provides a fixed length of steps.

Student feedback: The symbols method provides better understanding for node coloration. It explains what node needs a color change, why it needs to change and how it will change.
Analysis:
* SA gives clearer understanding.
* SA gives the step and the exact node to color.

Question 2: What are the perceived differences to other approaches in literature and the original DB removal algorithm?
Table 7. Feedback on the conventional RB tree algorithm vs. the symbolic-algebraic method.

Student feedback: This [symbolic approach] is definitely simpler for me because it simplifies the process inherent in a traditional algorithm (TA). Both the traditional red-black tree algorithm and the new symbolic approach algorithm involves remembering several conditions for deletion and rotation before carrying out corresponding operations. In this case [symbolic approach], less is definitely better, because it's not as complicated as it used to be.

Student feedback: The deletion operations of traditional red-black [algorithm] are badly explained in many resources I read including my textbook. These operations are not obvious to understand and confuse many students when they first see it. The new symbolic method provides steps to memorize the operations and makes removal of DB easier to learn.
Analysis:
* TA is not clear to understand.
* SA is a step-by-step operation.
* SA makes DB removal easier to memorize and learn.

Research findings
Therefore, from our methodology, illustrations, and application of the symbolic-algebra operation, we affirm the statement of the problem in section 1.2: there is a mathematically-based symbolic-algebraic method that has eased the learning and teaching of double-black node removal, coloring and balancing of RB trees.
https://doi.org/10.31812/educdim.7629

Conclusions and further work
The deletion of a black node, or the deletion of an internal red node and its subsequent replacement by a black node, causes a double-black (DB) node formation in red-black (RB) trees. In a RB tree, the DB node has no place. Removing the DB node and subsequently rebalancing the tree with the allowed red and black color assignments is fairly complex and poses challenges to students. After several years of teaching and research, this paper has addressed this difficulty in the teaching and learning of DB node removal and node recoloring by presenting a simplified and systematic algebraic-arithmetic procedure for both the removal of the DB node and the subsequent recoloring of its sibling and adjacent (parent) nodes. Our procedure has shown that there is a mathematically-based technique to ease the learning of DB removal in RB trees. The procedure is a bottom-up approach that starts from the point of the deleted black node or the replacement node. This research work has stated six symbolic-algebraic formulas and demonstrated with illustrations the symbolic addition of the color black and its inverse, +B and -B, and the symbolic addition and inverse of the color red, +R and -R, respectively. This was then supported by an operational algorithm after testing with several RB tree structures. The illustrations and testing showed that the symbolic-algebraic method, firstly, conforms with the conventional DB node removal and node recoloring algorithm in RB tree visualization tools and, secondly, supported students' in-depth learning and understanding of DB removal. The next stage of this work is to continue the research on the use of the symbolic-arithmetic formulas on RB trees that involve node rotation with different cases, and to examine the computational time of the symbolic-algebraic algorithm relative to the conventional algorithm for DB node removal and tree balancing.

Fig. 2. Symbolic subtraction of the color black (represented as -B) from two siblings and the addition of color +B to their parent v.

Fig. 5. (a, b, c, d): Deletion of black node 35 which left a double-black and black-height rebalancing.

Fig. 6. (a, b, c): Deletion of black node 20 which left a double-black and black-height rebalancing.

Fig. 7. (a, b, c): Deletion of red node 25 and replacement by node 15 left a double-black in the position of node 15 and subsequent black-height rebalancing.

Fig. 9. Deletion of the root node 20; and replacement by red node 15 which is the rightmost of the root's left subtree.

Table 2. Steps of balancing and color assignment for case "Deletion of a black leafnode", illustration 2.

Table 3. Steps of balancing and color assignment for case "Deletion of an internal red node".

Table 4. Steps of balancing and color assignment for case "Deletion of a root node", illustration 2.

Table 5. More efficient steps for balancing black height and color assignment for case "Deletion of a root node", illustration 2.