Binarization

MaheswaraReddy
Mar 13, 2021

Binarization converts a numerical feature vector into a binary vector. It is a common operation on count data, where the data scientist decides to consider only the presence or absence of a characteristic rather than the number of times it occurs. It can also be used as a pre-processing step for estimators that consider Boolean random variables. In a raw implementation it is just an if-else condition applied to each value.
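As a small illustration of the count-data case, the sketch below turns word counts into presence/absence flags. The sentence, the vocabulary and the variable names are made up for this example, and only the Python standard library is used.

```python
from collections import Counter

# Hypothetical example text and vocabulary, chosen only for illustration.
text = "the cat sat on the mat the end"
vocabulary = ["the", "cat", "dog", "mat"]

# Count how many times each vocabulary word occurs in the text.
counts = Counter(text.split())
count_vector = [counts[word] for word in vocabulary]

# Keep only presence (1) or absence (0): any count above 0 becomes 1.
binary_vector = [1 if c > 0 else 0 for c in count_vector]

print(count_vector)   # [3, 1, 0, 1]
print(binary_vector)  # [1, 1, 0, 1]
```

Here the threshold is 0, so the binary vector only records whether a word appears at all, not how often.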

Suppose we have the following data:

data = [[3, -0.5, 2, 1],
        [2.2, 3, 0, 1.4],
        [3.1, 1.5, 0, 1]]

If we apply a binarizer to this data with a threshold of 1.5, the matrix is converted to:

[[1, 0, 1, 0],
 [1, 1, 0, 0],
 [1, 0, 0, 0]]

Let me take a few of the values to explain binarization.

data[row 0, column 0] = 3, which is greater than the threshold of 1.5, so it is converted to 1.

data[row 0, column 1] = -0.5, which is less than the threshold of 1.5, so it is converted to 0.

data[row 0, column 2] = 2, which is greater than the threshold of 1.5, so it is converted to 1.

data[row 0, column 3] = 1, which is less than the threshold of 1.5, so it is converted to 0.

data[row 2, column 1] = 1.5, which is equal to the threshold of 1.5, so it is converted to 0, because only values strictly greater than the threshold become 1.
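The rule above can be sketched as a raw if-else loop in plain Python. This is only my own minimal version for illustration, not the scikit-learn code, and the function name binarize is an assumption of this sketch.

```python
def binarize(matrix, threshold=0.0):
    """Return a new matrix with values > threshold set to 1 and the rest to 0."""
    result = []
    for row in matrix:
        new_row = []
        for value in row:
            if value > threshold:
                new_row.append(1)
            else:
                # Values equal to the threshold also end up here.
                new_row.append(0)
        result.append(new_row)
    return result


data = [[3, -0.5, 2, 1],
        [2.2, 3, 0, 1.4],
        [3.1, 1.5, 0, 1]]

binarized = binarize(data, threshold=1.5)
for row in binarized:
    print(row)
# [1, 0, 1, 0]
# [1, 1, 0, 0]
# [1, 0, 0, 0]
```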

We can get the same result in Python with scikit-learn.

from sklearn import preprocessing

data = [[3, -0.5, 2, 1],
        [2.2, 3, 0, 1.4],
        [3.1, 1.5, 0, 1]]

sklearn.preprocessing.Binarizer is a class in scikit-learn's preprocessing module.

sklearn.preprocessing.Binarizer(threshold=0.0)

where threshold is a float. Values less than or equal to the threshold are mapped to 0, and values above the threshold are mapped to 1. The default threshold is 0.

data_binarized = preprocessing.Binarizer(threshold=1.5).transform(data)

print("\nBinarized data =", data_binarized)

Binarized data = [[1. 0. 1. 0.]
 [1. 1. 0. 0.]
 [1. 0. 0. 0.]]

Binarization is most widely used in image processing.

IMAGE BINARIZATION

Image binarization is the process of taking a grayscale image and converting it to black-and-white, essentially reducing the information contained within the image from 256 shades of gray to 2: black and white, a binary image. This is also known as image thresholding.

The threshold value can be calculated with different statistical techniques such as the mean, the median, entropy, or the image histogram. Otsu's algorithm is one of the most widely used methods for finding the threshold automatically; it chooses the value that best separates the two groups of pixels in terms of variance. There are many other algorithms for finding a threshold, and I will discuss them in a later post on my blog.
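To give a feel for how a variance-based threshold can be chosen, here is a minimal sketch of Otsu's idea in plain Python: try every gray level and keep the one that maximizes the between-class variance. It assumes 8-bit integer pixels passed as a flat list. The function name and the sample pixels are mine, and this is not OpenCV's implementation (OpenCV exposes Otsu's method through the cv2.THRESH_OTSU flag).

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    # Histogram of gray levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(levels):
        weight_low += hist[t]            # number of pixels <= t
        sum_low += t * hist[t]           # sum of gray values <= t
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue                     # one class is empty, nothing to compare
        mean_low = sum_low / weight_low
        mean_high = (total_sum - sum_low) / weight_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        between_var = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


# Made-up pixels: three dark values and three bright values.
pixels = [10, 12, 14, 200, 205, 210]
t = otsu_threshold(pixels)
binary = [1 if p > t else 0 for p in pixels]
print(t)       # 14
print(binary)  # [0, 0, 0, 1, 1, 1]
```

When several thresholds give the same variance, this sketch keeps the lowest one; other implementations may break ties differently.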

The code and images below are taken from the official OpenCV site. In this example, 127 is used as the threshold.

Code:

import cv2
from matplotlib import pyplot as plt

# Read the image as grayscale (flag 0 is the same as cv2.IMREAD_GRAYSCALE)
img = cv2.imread('gradient.png', 0)

# Apply the five basic thresholding types with threshold 127 and max value 255
ret, thresh1 = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
ret, thresh2 = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
ret, thresh3 = cv2.threshold(img, 127, 255, cv2.THRESH_TRUNC)
ret, thresh4 = cv2.threshold(img, 127, 255, cv2.THRESH_TOZERO)
ret, thresh5 = cv2.threshold(img, 127, 255, cv2.THRESH_TOZERO_INV)

titles = ['Original Image', 'BINARY', 'BINARY_INV', 'TRUNC', 'TOZERO', 'TOZERO_INV']
images = [img, thresh1, thresh2, thresh3, thresh4, thresh5]

# Show the original image and the five results in a 2x3 grid
for i in range(6):
    plt.subplot(2, 3, i + 1)
    plt.imshow(images[i], 'gray')
    plt.title(titles[i])
    plt.xticks([])
    plt.yticks([])

plt.show()
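To make the five thresholding types easier to compare, the sketch below applies each rule to a single row of gray values in plain Python. It does not call OpenCV; the function name, the mode strings and the sample row are my own, and the rules follow the behaviour documented for cv2.threshold (a pixel counts as "above" only when it is strictly greater than the threshold).

```python
def apply_threshold(pixel, thresh, maxval, mode):
    """Apply one thresholding rule to a single gray value."""
    above = pixel > thresh
    if mode == "BINARY":
        return maxval if above else 0
    if mode == "BINARY_INV":
        return 0 if above else maxval
    if mode == "TRUNC":
        return thresh if above else pixel
    if mode == "TOZERO":
        return pixel if above else 0
    if mode == "TOZERO_INV":
        return 0 if above else pixel
    raise ValueError("unknown mode: " + mode)


row = [0, 100, 127, 128, 255]  # made-up gray values
modes = ["BINARY", "BINARY_INV", "TRUNC", "TOZERO", "TOZERO_INV"]
results = {m: [apply_threshold(p, 127, 255, m) for p in row] for m in modes}
for m in modes:
    print(m, results[m])
# BINARY [0, 0, 0, 255, 255]
# BINARY_INV [255, 255, 255, 0, 0]
# TRUNC [0, 100, 127, 127, 127]
# TOZERO [0, 0, 0, 128, 255]
# TOZERO_INV [0, 100, 127, 0, 0]
```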

Thanks for reading my page. Appreciate your claps.
