An unordered collection of elements is sorted by placing them in a monotonically ascending (or descending) order. The efficiency of sorting large data sets, both in terms of time and memory used, requires advances in sorting algorithms and their implementations in modern computing. Machine learning models could be used to improve these sorting algorithms. By analyzing experimental data, machine learning enables the creation of adaptable algorithms. In order to choose an algorithm based on the properties of the dataset, this article reviews evolutionary algorithms. Here are the topics to discuss.

## Contents

1. Sorting in machine learning
2. Genetic algorithms used to optimize sorting
3. Benefits of Using Genetic Algorithm

Since the dawn of computers, sorting has attracted a lot of attention as a fundamental data activity. Let’s understand sorting algorithms.

## Sorting in machine learning

A sort operation consists of placing data in a specific order. The sorting algorithm defines the technique for sorting data into a given order. Searching for data can be very efficient when the data is kept in a sorted fashion, which is why sorting is important. Representing data in a more understandable way is another use of sorting.

Data sorting algorithms may require a bit more room for comparison and short-term storage of a few data components. These algorithms are supposed to sort in place, for example, in the array itself, and they no longer take up space. This is called on-site sorting. An illustration of in-place sorting is bubble sorting. But for some sorting algorithms, the amount of space used by the program is greater than or equal to the number of elements to be sorted. Out-of-place sorting is defined as sorting with an equal or greater space requirement. An illustration of out-of-place sorting is merge sort.

Based on the effects after sorting the data, these sorting algorithms are classified as stable and unstable. For example, when we want to maintain the sequence of elements in a tuple, the stability of the algorithm is important.

If a sorting algorithm uses previously “sorted” elements in the list to be sorted, it is said to be adaptive. In other words, when sorting, adaptive algorithms will aim to avoid reordering items if the source list already has a sorted part of them. An algorithm that ignores previously sorted elements is said to be non-adaptive. To make sure the items are sorted correctly, they attempt to push each item into a new order.

### Why is optimization necessary?

Although program optimization has been largely automated by compiler technology, a great deal of human involvement is still required to produce high-quality code. There are two justifications for this assertion:

• Inconsistent compiler implementations.
• Traditional compilers do not contain semantic information, which limits their ability to modify data.

Are you looking for a comprehensive repository of Python libraries used in data science, check here.

## Genetic algorithms used to optimize sorting

The optimal algorithm can only be found by searching because there are no analytical models of the performance of sorting algorithms in terms of machine architectural parameters. Additionally, previous research on sorting complexity was based on the mistaken assumption that accessing each room takes the same amount of time given modern technology.

In a genetic algorithm, the values ​​of the parameters of the sorting and selection primitives, as well as their composition, determine the search space. The purpose of the search is to find the hierarchical sort best suited to the architectural characteristics of the machine and to the characteristics of the input game.

An analogy between the genetic structure and the behavior of chromosomes in the population serves as the basis for genetic algorithms. The basis for GAs based on this comparison is as follows.

• Members of the population compete for resources and mate.
• The successive individuals (the fittest) then mate to have more children than the other individuals.
• Parents sometimes have children who are better than either parent; this is because the genes of the “fittest” parents have been passed down through the generation.
• As a result, each subsequent generation becomes more environmentally friendly.

Sort and select primitives are used as tree nodes in the diagram. To create new children and modify the population, genetic operators are used. The two operations that the majority of genetic algorithms employ are crossover and mutation.

### crossing

Subtrees of various trees are exchanged during crossover. Crossover’s goal is to produce new offspring that perform better than its parents. When new children inherit the best subtrees from parents, this is likely to happen. In most cases, a single-point crossover is employed, with the crossover point being chosen at random.

### Mutation

This operator performs adjustments on a single tree. It gives the population variance. The population cannot continue to be the same after a given generation due to mutation. This strategy allows the search to partially escape local optima. In an effort to identify better values, the mutation changes the parameter values. The following modifications are possible with the mutation operator:

• Randomly change parameter values ​​when sorting and choosing base nodes.
• Switch two subtrees.
• Including a new subtree.
• Output a subtree. With this technique, unnecessary subtrees can be eliminated.

### fitness function

The probability of an individual reproducing is determined by the fitness function. The likelihood of an organism reproducing and evolving increases with fitness. The fitness function will be performance. But when designing the fitness feature, the following two factors were considered:

• A sorting algorithm that works well with all potential inputs is the goal. Therefore, the basic fitness of a tree is its average performance. A penalty is applied to variable-performance trees by increasing the base fitness by a number based on the standard deviation of their performance when sorting test inputs; however, the sorting algorithm should work consistently across all inputs is also a goal.
• The fitness variance of the population is significant in the first generations. In other words, some sort trees do much better than others. Since these few trees would have a considerably higher probability of reproducing, most children would be their offspring if our fitness function were directly proportional to the performance of the tree. These descendants will eventually make up the majority of the population. Premature convergence could result, which would prevent the algorithm from studying parts of the search space outside the neighborhood of highly fit trees. To solve this problem, our fitness function takes advantage of the performance order or rank of the population sorting trees. The absolute performance difference between trees is ignored when applying performance ranking, therefore trees with lower performance have a higher probability of reproducing than trees with higher performance. This eliminates the problem of early convergence and convergence to a local optimum.

The use of genetic algorithms over other optimization methods has certain advantages.

• Sorting algorithms can be expressed as a tree using primitives, each primitive representing a node. As a result, genetic algorithms can simply be used to explore the universe of potential trees for the best tree structure and the best parameter values ​​associated with each node.
• Genetic algorithms maintain the finest subtrees and increase their chances of reproduction. Because a subtree is also a sorting algorithm, sorting algorithms can make use of it.

A genetic algorithm is a promising problem-solving tool, but it has some flaws that could lead to inefficiency.

• Difficult to fine-tune the settings. This requires the determination of many parameters such as population size, mutation rate and maximum execution time, as well as the creation of selection, recombination and mutation algorithms. Finding viable alternatives for these is a difficult task with little or no theoretical support.
• There is no guarantee that convergence will occur. There is no guarantee that the algorithm will find a global optimum. It is possible that it will be taken in one of the local optima.

## Conclusion

Choosing an appropriate evolution algorithm is a critical decision. The evolution algorithm decides how many children will be produced, how many members of the current generation will be replaced, etc. With this article, we understood the optimization requirement in sorting and the use of GA to optimize sorting algorithms.

## References

Share. 