What is Unsupervised Classification?

The technique of extracting information from satellite images is known as image classification. The aim of image classification is to assign every pixel in a digital image to one of several land use classes or themes. The vast amount of digital data that can be extracted from satellite imagery needs to be processed into a form that is appropriate for the end user, and for many projects this step involves classifying the land according to its various use functions. Classification procedures fall into two categories based on how the interpreter and the computer interact during the process: supervised and unsupervised classification are the two primary categories used to produce classified output.

An efficient technique for partitioning remote sensor image data in multispectral feature space and extracting land-cover information is unsupervised classification, often known as clustering.

Unsupervised classification typically requires only a small amount of initial analyst input compared with supervised classification, because clustering generally does not require training data. Instead, numerical operations search for natural groupings of the spectral properties of pixels in multispectral feature space.

The end product of the clustering procedure is a classification map with m spectral classes. The analyst then attempts, after the fact, to assign or convert these spectral classes into meaningful thematic information classes (for example, agriculture or forest).

Because the software does most of the processing on its own, unsupervised classification typically produces more categories than the user is interested in. At this point the user has to decide which categories can be grouped together into a single land use category. In either case, further image processing may be applied to determine which approach is more appropriate for a particular situation. It must be kept in mind that maps are rudimentary attempts to portray what truly exists on the ground and are never totally accurate.
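To make this regrouping step concrete, the sketch below (Python with NumPy) recodes a hypothetical clustered raster of spectral class IDs into a smaller set of land use classes using an analyst-chosen lookup table. The class numbers and labels are illustrative only and are not taken from the example scene.

```python
import numpy as np

# Hypothetical clustered output: each cell holds a spectral class ID (1-16).
classified = np.array([[1, 2, 2, 7],
                       [3, 3, 9, 7],
                       [14, 14, 9, 16]])

# Analyst-defined lookup: spectral class ID -> land use (information) class.
# Several spectral classes are merged into a single land use category.
spectral_to_landuse = {
    1: "water", 2: "water",
    3: "forest", 7: "forest",
    9: "grassland", 14: "cropland", 16: "bare soil",
}

# Recode the raster of spectral classes into thematic land use classes.
landuse = np.vectorize(spectral_to_landuse.get, otypes=[object])(classified)
print(landuse)
```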

The image used to illustrate the point is a Landsat 5 Thematic Mapper (TM) scene with a ground resolution of 30 meters. Landsat TM records data in seven spectral bands, which cover the visible, infrared, and thermal infrared portions of the electromagnetic spectrum.

This composite comes from the same Landsat scene, displayed using bands 3, 4, and 5 (shown in blue, green, and red, respectively). Most of the grassland in this scene appears in a combination of green and pink, and some of the agricultural fields, which still have a significant amount of bare soil, appear in a pink hue.

The patterns of land use can be clearly observed in the second composite. Along the Kansas River at the bottom of the picture is a sizable area of exposed earth. Tilled fields can also be seen throughout the picture, with the highest density in the northwest corner. Many small lakes and ponds (dark blue) can be identified in the larger image. Most of these small ponds lie in the grassland areas; they are probably located in pastures and are used by cattle as watering holes. Other significant land features are the woodland regions, Perry Reservoir’s dam, and its outflow.

The man-made dam at Perry Reservoir’s south end releases water into the Kansas River, while the forest stretches outward from the reservoir and follows tiny drainages. Several attempts were made to categorize the land uses into distinct groups in order to transform this image into a more useful format.

Concept of unsupervised classification

Unsupervised classification algorithms in Idrisi were used in an effort to categorize the different land uses. With unsupervised classification techniques, the user is not required to provide any information about the features present in the images. The ISOCLUST module in Idrisi was used to carry out this example. When using ISOCLUST, the user only needs to specify which bands Idrisi should employ to generate the classifications and how many classes to assign to the land cover features.

The resulting image is now challenging to interpret. Converting spectral groups into feature or theme classes requires decisions based on the image and the classified output. Additional resources and local expertise are helpful in making these choices. Ground truthing, that is, comparing what is seen in the digital image with what was actually on the ground when the image was taken, makes this work more accurate and efficient. In the absence of such information, the different categories may be grouped into land use categories using scientific reasoning. As seen in the illustration, six of the sixteen land cover categories were distinguished in the demonstration example.

Two main approaches to unsupervised classification exist.

Clustering

  1. K-means clustering
  2. ISODATA clustering

1.  K-means: K-means is one of the most straightforward unsupervised learning algorithms for solving the well-known clustering problem. The procedure partitions a given data set into a predetermined number of clusters (say, k clusters) in a simple and direct manner. The key concept is determining k centroids, one for each cluster. These centroids should be positioned carefully, because different starting locations yield different results; placing them as far apart as possible is therefore the preferred option. The next step is to associate each point in the data set with the closest centroid. When no point is pending, the first step is completed and an initial grouping is done. The k new centroids must then be recalculated as the barycenters of the clusters that resulted from the previous step. Once these k new centroids are available, a new binding is made between the same data set points and the closest new centroid. A loop has been generated: with each pass, the k centroids gradually shift position until no more adjustments are made. Stated differently, the centroids no longer move.

A distance measure between each data point and its cluster center indicates how far the n data points lie from their respective cluster centers; K-means seeks the assignment that minimizes the total of these (squared) distances.
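Stated in notation (this is the standard K-means objective, supplied here as a clarifying assumption rather than quoted from the original text), the quantity being minimized is the within-cluster sum of squared distances:

```latex
J \;=\; \sum_{j=1}^{k} \sum_{x_i \in S_j} \left\lVert x_i - c_j \right\rVert^{2}
```

where S_j is the set of data points assigned to cluster j and c_j is that cluster’s center.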

The following steps make up the algorithm:

  1. Place K points in the space represented by the objects being clustered. These points represent the initial group centroids.
  2. Assign each object to the group whose centroid is closest.
  3. Once every object has been assigned, recalculate the positions of the K centroids.
  4. Repeat steps 2 and 3 until the centroids stop moving. This divides the objects into groups, from which the metric to be minimized can be computed.

Although the procedure can be shown to always terminate, the k-means algorithm does not necessarily find the optimal configuration corresponding to the global minimum of the objective function. The procedure is also quite sensitive to the randomly chosen initial cluster centers; to lessen this effect, the k-means algorithm may be run several times. K-means is a straightforward technique that has been adapted to a wide variety of problem domains.
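The sketch below is a minimal NumPy implementation of the four steps above (the function and parameter names are illustrative choices, not taken from any particular package). It also runs several random restarts and keeps the solution with the lowest within-cluster sum of squares, reflecting the sensitivity to initial centers noted above.

```python
import numpy as np

def kmeans(points, k, n_restarts=5, max_iter=100, seed=0):
    """Minimal K-means sketch following the four steps listed above."""
    rng = np.random.default_rng(seed)
    best_labels, best_centers, best_sse = None, None, np.inf

    for _ in range(n_restarts):
        # Step 1: place k initial centroids at randomly chosen data points.
        centers = points[rng.choice(len(points), size=k, replace=False)]

        for _ in range(max_iter):
            # Step 2: assign each point to the group with the closest centroid.
            dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)

            # Step 3: recalculate each centroid as the mean (barycenter) of its members.
            new_centers = np.array([
                points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])

            # Step 4: stop once the centroids no longer move.
            if np.allclose(new_centers, centers):
                break
            centers = new_centers

        # Keep the restart with the smallest within-cluster sum of squares.
        sse = ((points - centers[labels]) ** 2).sum()
        if sse < best_sse:
            best_labels, best_centers, best_sse = labels, centers, sse

    return best_labels, best_centers

# Example: cluster pixels described by two spectral bands into k = 3 groups.
pixels = np.random.default_rng(1).normal(size=(300, 2))
labels, centers = kmeans(pixels, k=3)
```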

2. ISODATA Clustering: The Iterative Self-Organizing Data Analysis Technique (ISODATA) is an extensive collection of heuristic procedures integrated into an iterative classification algorithm. Many of the algorithm’s phases are the product of testing and experience.
The ISODATA algorithm is an adaptation of the k-means clustering algorithm and adds the following steps: a) rules for merging clusters if their separation distance in multispectral feature space is less than a user-specified threshold, and b) rules for splitting a single cluster into two.

ISODATA is iterative because it makes a large number of passes through the remote sensing dataset, rather than just two passes, until specified results are obtained.

ISODATA does not allocate its initial mean vectors based on an analysis of pixels in the first lines of data in the same manner as the two-pass chain method. Instead, all Cmax clusters are initially assigned arbitrarily along an n-dimensional vector that passes between specific locations in feature space. The region in feature space is defined using the mean and standard deviation of each band in the analysis. By automatically seeding the original Cmax vectors in this way, the technique ensures that the cluster-building process is not skewed by the first few lines of data.
Because it requires relatively little human input, ISODATA is self-organizing. To use a typical ISODATA algorithm, the analyst must normally specify the following criteria:

• Cmax: the maximum number of clusters the algorithm may identify (for example, 20 clusters). After splitting and merging, however, it is not unusual to find fewer clusters on the final classification map.
• T: the maximum percentage of pixels whose class values are allowed to remain unchanged between iterations. When this figure is reached, the ISODATA algorithm stops. Some datasets may never reach the target percentage; if this occurs, processing must be stopped and the parameter edited.
• M: the maximum number of times ISODATA may recalculate cluster mean vectors and reclassify pixels. When this figure is reached, the ISODATA algorithm stops.
• Minimum number of members in a cluster (%): if a cluster has fewer members than the required percentage, it is eliminated and its members are assigned to another cluster. This also affects whether a class may be split (see maximum standard deviation). A default minimum of 0.01 is frequently specified.
• Maximum standard deviation (σmax): a cluster is split into two when its standard deviation exceeds the specified maximum and its membership is more than twice the required minimum number of members. The mean vectors of the two new clusters are the former class center ±1 standard deviation (σ). Maximum standard deviation values typically fall between 4.5 and 7.
• Split separation value: if this value differs from 0.0, it replaces the standard deviation in determining the locations of the new mean vectors (plus and minus the split separation value).
• Minimum separation of cluster means (C): clusters are merged if their weighted distance is less than this threshold. A default of 3.0 is frequently chosen.
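The skeleton below sketches how these parameters interact, written in Python with NumPy. It is an illustrative simplification built around a k-means-style assignment loop, not Idrisi’s ISOCLUST or any particular package’s ISODATA implementation; the parameter names mirror the list above, and the threshold values shown are arbitrary defaults.

```python
import numpy as np

def isodata_sketch(pixels, c_max=20, max_iter=10, t_unchanged=0.95,
                   min_members=0.01, sigma_max=5.0, min_separation=3.0):
    """Simplified ISODATA-style clustering of an (n_pixels, n_bands) array."""
    n, _ = pixels.shape

    # Seed Cmax initial mean vectors along the line mean - sigma ... mean + sigma
    # in feature space, rather than reading them from the first lines of data.
    mu, sigma = pixels.mean(axis=0), pixels.std(axis=0)
    centers = mu + np.linspace(-1.0, 1.0, c_max)[:, None] * sigma

    labels = np.zeros(n, dtype=int)
    for _ in range(max_iter):                                   # M: iteration cap
        # Assign each pixel to the nearest cluster mean.
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        unchanged = np.mean(new_labels == labels)               # used for the T test
        labels = new_labels

        # Delete clusters with too few members and recompute the remaining means.
        keep = [j for j in range(len(centers)) if np.mean(labels == j) >= min_members]
        centers = np.array([pixels[labels == j].mean(axis=0) for j in keep])
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Split: a cluster with a large spread becomes two clusters at +/- 1 sigma.
        new_centers = []
        for j, c in enumerate(centers):
            members = pixels[labels == j]
            if (len(members) > 2 * min_members * n and len(centers) < c_max
                    and members.std(axis=0).max() > sigma_max):
                spread = members.std(axis=0)
                new_centers.extend([c + spread, c - spread])
            else:
                new_centers.append(c)
        centers = np.array(new_centers)

        # Merge: combine pairs of clusters whose means are closer than the threshold.
        merged, used = [], set()
        for i in range(len(centers)):
            if i in used:
                continue
            partner = next((j for j in range(i + 1, len(centers))
                            if j not in used
                            and np.linalg.norm(centers[i] - centers[j]) < min_separation),
                           None)
            if partner is None:
                merged.append(centers[i])
            else:
                merged.append((centers[i] + centers[partner]) / 2.0)
                used.add(partner)
            used.add(i)
        centers = np.array(merged)

        # T: stop when a large enough share of pixel labels stayed the same.
        if unchanged >= t_unchanged:
            break

    # Final assignment against the last set of cluster means.
    dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

# Example: cluster 500 synthetic pixels with 4 spectral bands.
data = np.random.default_rng(0).normal(size=(500, 4))
labels, centers = isodata_sketch(data)
```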
