This page announces the latest scientific and research achievements of the members of the Iranian Society of Machine Vision and Image Processing. Members are invited to send links to their latest research publications to info@ismvip.ir with the subject line "Research Achievement". Thank you.

Adaptive exploitation of pre-trained deep convolutional neural networks for robust visual tracking
Owing to automatic feature extraction via multi-layer nonlinear transformations, deep learning-based visual trackers have recently achieved great success in challenging tracking scenarios. Although many of these trackers utilize feature maps from pre-trained convolutional neural networks (CNNs), the effects of selecting different models and exploiting various combinations of their feature maps have not yet been compared thoroughly. To the best of our knowledge, all those methods use a fixed number of convolutional feature maps without considering the scene attributes (e.g., occlusion, deformation, and fast motion) that might occur during tracking. As a prerequisite, this paper proposes adaptive discriminative correlation filter (DCF)-based methods that can exploit CNN models with different topologies. First, the paper provides a comprehensive analysis of four commonly used CNN models to determine the best feature maps of each model. Second, with the analysis results serving as attribute dictionaries, an adaptive exploitation of deep features is proposed to improve the accuracy and robustness of visual trackers with respect to video characteristics. Third, the generalization of the proposed method is validated on various tracking datasets as well as on CNN models with similar architectures. Finally, extensive experimental results demonstrate the effectiveness of the proposed adaptive method compared with state-of-the-art visual tracking methods.

Seyed Mojtaba Marvasti-Zadeh, Hossein Ghanei-Yakhdan, Shohreh Kasaei
https://doi.org/10.1007/s11042-020-10382-x
Facial expression recognition using a combination of enhanced local binary pattern and pyramid histogram of oriented gradients features extraction
Automatic facial expression recognition, which has many applications such as recognizing the emotions of drivers, patients, and criminals, is a challenging task. This is due to the variety of individuals and the variability of facial expressions under different conditions, for instance, gender, race, colour and changing illumination. In addition, a face image contains many regions, such as the forehead, mouth, eyes, eyebrows, nose, cheeks and chin, and extracting features from all of these regions is expensive in terms of computational time. Each of the six basic emotions of anger, disgust, fear, happiness, sadness and surprise affects some regions more than others. The goal of this study is to evaluate the performance of the enhanced local binary pattern and pyramid histogram of oriented gradients feature‐extraction algorithms, and of their combination, in terms of recognition accuracy, feature vector length and computational time on one, two and three combined regions of a face image. Our experimental results show that the combination of both feature‐extraction algorithms yields an average recognition accuracy of 95.33% using three regions, that is, the mouth, nose and eyes, on the Cohn–Kanade dataset. Moreover, the mouth region contributes more to accuracy than the eyes, the nose, or the combination of the eyes and nose regions.
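
As a rough illustration of the kind of texture feature the abstract refers to, the sketch below computes the basic 8-neighbour local binary pattern (LBP) and its 256-bin histogram in plain Python. It is not the enhanced LBP variant or the pyramid histogram of oriented gradients evaluated in the paper, whose details the abstract does not give. The function names, the neighbour ordering, and the tiny 3x3 patch are assumptions chosen for this example.

```python
from collections import Counter

# Neighbour offsets visited clockwise, starting at the top-left pixel.
# The ordering is an assumption; any fixed ordering gives a valid LBP code.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Return the basic 8-neighbour LBP code of the pixel at (row, col)."""
    center = image[row][col]
    code = 0
    for bit, (d_row, d_col) in enumerate(OFFSETS):
        # A neighbour at least as bright as the centre sets its bit.
        if image[row + d_row][col + d_col] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Return the 256-bin histogram of LBP codes over all interior pixels."""
    counts = Counter(
        lbp_code(image, row, col)
        for row in range(1, len(image) - 1)
        for col in range(1, len(image[0]) - 1)
    )
    return [counts.get(code, 0) for code in range(256)]


# Hypothetical 3x3 grey-level patch with a single interior pixel.
patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 8, 3],
]
histogram = lbp_histogram(patch)
```

In a region-based setup like the one described, such a histogram would be computed separately for each face region (for example the mouth, nose and eyes) and the per-region histograms concatenated into one feature vector, which is why the feature vector length grows with the number of regions.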

Maede Sharifnejad, Asadollah Shahbahrami, Alireza Akoushideh, Reza Zare Hassanpour
https://doi.org/10.1049/ipr2.12037
CapsNet Regularization and its Conjugation with ResNet for Signature Identification
We propose a new regularization term for CapsNet that significantly improves the generalization power of the original method from small training data while requiring far fewer parameters, making it suitable for large input images. We also propose a very efficient DNN architecture that integrates CapsNet with ResNet to obtain the advantages of the two architectures. CapsNet allows a powerful understanding of the objects’ components and their positions, while ResNet provides efficient feature extraction and description. Our approach is general, and we demonstrate it on the problem of signature identification from images. To show the superiority of our approach, we provide several evaluations with different protocols. We also show, with thorough experiments on three publicly available datasets (CEDAR, MCYT, and UTSig), that our approach outperforms the state of the art on this problem.

Mahdi Jampour, Saeid Abbaasi, Malihe Javidi
https://doi.org/10.1016/j.patcog.2021.107851
Electrical fault detection in three-phase induction motor using deep network-based features of thermograms
In this paper, an automatic method is proposed for detecting operating faults in three-phase induction motors based on thermal images. If these faults are not detected or fixed in time, they can lead to permanent motor failure. This is why non-invasive and non-destructive tests are of considerable interest. First, the region of interest is detected in the thermograms using SIFT-based key-point matching. Then, these images are transformed into representative feature vectors by a pre-trained convolutional neural network. Next, the training feature vectors are grouped into cold and hot clusters by K-means. For each cluster, an SVM-based classifier is trained. The test feature vectors are assigned to a cluster and mapped into classes using the corresponding trained SVM-based classifier. Evaluation of the proposed method on datasets of real thermal images shows that the algorithm can detect 100% of the induction motor faults.
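
The cluster-then-classify routing described above can be sketched as follows. This is a minimal pure-Python illustration under loud assumptions: the feature vectors are made-up 2-D points rather than CNN features of thermograms, K-means is seeded with the first and last training sample for determinism, and a nearest-class-mean rule stands in for the SVM classifiers used in the paper. All function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def nearest(point, candidates):
    """Index of the candidate vector closest to point."""
    return min(range(len(candidates)),
               key=lambda i: squared_distance(point, candidates[i]))


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd iterations starting from the given centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for point in points:
            groups[nearest(point, centroids)].append(point)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [mean_vector(group) if group else old
                     for group, old in zip(groups, centroids)]
    return centroids


def fit(samples, labels):
    """Cluster the samples into two groups, then fit one model per cluster."""
    # Assumption: seed the "cold" and "hot" clusters with the first and last sample.
    centroids = kmeans(samples, [samples[0], samples[-1]])
    models = []
    for index in range(len(centroids)):
        by_label = {}
        for sample, label in zip(samples, labels):
            if nearest(sample, centroids) == index:
                by_label.setdefault(label, []).append(sample)
        # Stand-in for an SVM: store the mean vector of each class in the cluster.
        models.append({label: mean_vector(group)
                       for label, group in by_label.items()})
    return centroids, models


def predict(sample, centroids, models):
    """Route the sample to its cluster, then classify with that cluster's model."""
    model = models[nearest(sample, centroids)]
    class_labels = list(model)
    return class_labels[nearest(sample, [model[name] for name in class_labels])]


# Made-up 2-D features: the first coordinate separates cold from hot images,
# the second separates healthy from faulty motors within each cluster.
train_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [1.0, 3.0],
                 [10.0, 0.0], [11.0, 0.0], [10.0, 3.0], [11.0, 3.0]]
train_labels = ["healthy", "healthy", "fault", "fault",
                "healthy", "healthy", "fault", "fault"]
centroids, models = fit(train_samples, train_labels)
cold_prediction = predict([0.4, 2.8], centroids, models)
hot_prediction = predict([10.2, 0.3], centroids, models)
```

The point of the routing is that each classifier only has to separate classes within one temperature regime; swapping the nearest-class-mean rule for a real SVM would not change the structure of `fit` and `predict`.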

Majid Khanjani, Mehdi Ezoji
https://doi.org/10.1016/j.measurement.2020.108622
Architecture to improve the accuracy of automatic image annotation systems
Automatic image annotation (AIA) is an image retrieval mechanism that extracts relevant semantic tags from visual content. So far, accuracy improvements in newly developed methods of this kind have been about 1 or 2% in F1-score, and the architectures seem to have room for improvement. Therefore, the authors designed a more detailed architecture for AIA and suggested new algorithms for its main parts. The proposed architecture has three main parts: feature extraction, learning, and annotation. They designed a novel learning method based on machine learning and probability. In the annotation part, they suggest a novel method that gains the maximum benefit from the learning part. The combination of the proposed architecture, algorithms, and novel ideas resulted in new accuracy milestones in F1-score on the most commonly used datasets. In their architecture, the N+ measure, which counts the tags with non-zero recall, showed that all tags could be recalled for the IAPRTC-12 and ESP-Games datasets.
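
To make the two reported measures concrete, the sketch below computes per-tag precision and recall, an F1-score from their means, and N+ (the number of tags with non-zero recall). Averaging precision and recall over tags before taking the harmonic mean is a common convention in image annotation benchmarks and is assumed here; the abstract does not state the exact protocol. The tag vocabulary and the three annotated images are invented for the example.

```python
def per_tag_scores(ground_truth, predictions, vocabulary):
    """Return {tag: (precision, recall)} over a list of annotated images."""
    scores = {}
    for tag in vocabulary:
        relevant = sum(1 for tags in ground_truth if tag in tags)
        retrieved = sum(1 for tags in predictions if tag in tags)
        correct = sum(1 for truth, predicted in zip(ground_truth, predictions)
                      if tag in truth and tag in predicted)
        precision = correct / retrieved if retrieved else 0.0
        recall = correct / relevant if relevant else 0.0
        scores[tag] = (precision, recall)
    return scores


def summarize(scores):
    """Return (mean precision, mean recall, F1, N+) from per-tag scores."""
    tag_count = len(scores)
    mean_precision = sum(p for p, _ in scores.values()) / tag_count
    mean_recall = sum(r for _, r in scores.values()) / tag_count
    denominator = mean_precision + mean_recall
    f1 = 2 * mean_precision * mean_recall / denominator if denominator else 0.0
    # N+ counts the tags that were correctly recalled at least once.
    n_plus = sum(1 for _, recall in scores.values() if recall > 0)
    return mean_precision, mean_recall, f1, n_plus


# Invented example: three images, four tags.
vocabulary = ["sky", "sea", "tree", "car"]
ground_truth = [{"sky", "sea"}, {"tree"}, {"sky", "car"}]
predictions = [{"sky", "tree"}, {"tree"}, {"sky"}]

mean_precision, mean_recall, f1, n_plus = summarize(
    per_tag_scores(ground_truth, predictions, vocabulary)
)
```

An N+ equal to the vocabulary size means that every tag was recalled for at least one image, which is the property reported for the two datasets above.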

Artin Ghostan Khatchatoorian, Mansour Jamzad
https://doi.org/10.1049/iet-cvi.2019.0500
A deep learning framework for Text-independent Writer Identification
Handwriting Writer Identification (HWI) refers to the analysis of handwritten text images to identify the authorship of documents. It has yielded promising results in various applications, including digital forensics, criminal purposes, exploring the writer of historical documents, etc. The complexity of the text image, especially in images with various handwriting, makes writer identification difficult. In this work, we propose an end-to-end system that relies on a straightforward yet well-designed deep network and very efficient feature extraction, emphasizing feature engineering. Our system is an extended version of ResNet that conjugates deep residual networks with a new traditional yet high-quality handwriting descriptor for handwriting analysis. Our descriptor analyzes handwriting thickness as a preliminary and essential feature of human handwriting characteristics. Our approach also provides text-independent writer identification, meaning that the model can be trained without requiring the same handwriting content. The proposed approach was evaluated and achieved consistent results on four well-known public datasets: IAM, Firemaker, CVL, and CERUG-EN. We empirically demonstrate that our conjugated network outperforms the original ResNet and that it works well for real-world applications in which patches with few letters exist.

Malihe Javidi, Mahdi Jampour
https://doi.org/10.1016/j.engappai.2020.103912
Effectiveness of “rescue saccades” on the accuracy of tracking multiple moving targets: An eye-tracking study on the effects of target occlusions
Occlusion is one of the main challenges in tracking multiple moving objects. In almost all real-world scenarios, a moving object or a stationary obstacle occludes targets partially or completely for a short or long time during their movement. A previous study (Zelinsky & Todor, 2010) reported that subjects make timely saccades toward the object in danger of being occluded. Observers make these so-called “rescue saccades” to prevent target swapping. In this study, we examined whether these saccades are helpful. To this end, we used recorded videos of the natural movement of zebrafish larvae swimming freely in a circular container as stimuli. We considered two main types of occlusion: object-object occlusions that naturally exist in the videos, and object-occluder occlusions created by adding a stationary doughnut-shaped occluder to some videos. Four different scenarios were studied: (1) no occlusions, (2) only object-object occlusions, (3) only object-occluder occlusions, or (4) both object-object and object-occluder occlusions. For each condition, two set sizes (two and four) were applied. Participants’ eye movements were recorded during tracking, and rescue saccades were extracted afterward. The results showed that rescue saccades are helpful in handling object-object occlusions but had no reliable effect on tracking through object-occluder occlusions. The presence of occlusions generally increased visual sampling of the scenes; nevertheless, tracking accuracy declined due to occlusion.

Shiva Kamkar, Hamid Abrishami Moghaddam, Reza Lashgari, Lauri Oksama, Jie Li, Jukka Hyönä
https://jov.arvojournals.org/article.aspx?articleid=2771970
Histogram modification based enhancement along with contrast-changed image quality assessment
Contrast is the difference in visual characteristics that makes an object more recognizable. Despite the significance of contrast enhancement (CE) in image processing applications, few attempts have been made to assess contrast change. In this paper, a visual information fidelity-based contrast change metric (VIF-CCM) is presented, which includes visual information fidelity (VIF), local entropy, correlation coefficient, and mean intensity measures. The validation results of the presented VIF-CCM show its efficiency and superiority over state-of-the-art image quality assessment metrics. A histogram modification based contrast enhancement (HMCE) method is also proposed in this paper. The proposed HMCE comprises four steps: segmenting the input image, employing a set of weighting constraints, applying a combination of adaptive gamma correction and equalization to the modified histogram, and optimizing the values of the constraint weights with the particle swarm optimization (PSO) algorithm. Experimental results demonstrate that the proposed HMCE outperforms other existing CE methods subjectively and objectively.
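
For readers unfamiliar with the two building blocks named in the third HMCE step, the sketch below shows plain gamma correction and plain histogram equalization on a flat list of 8-bit grey levels. It is only an illustration of those standard operations: the segmentation, the weighting constraints, the adaptive choice of gamma, and the PSO-based weight optimization from the paper are not reproduced, and the function names and sample values are assumptions.

```python
def gamma_correct(pixels, gamma, max_level=255):
    """Apply out = max_level * (in / max_level) ** gamma to each grey level."""
    return [round(max_level * (value / max_level) ** gamma) for value in pixels]


def equalize(pixels, levels=256):
    """Classic histogram equalization of a flat list of integer grey levels."""
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    # Cumulative distribution of grey levels.
    cumulative = []
    running = 0
    for count in histogram:
        running += count
        cumulative.append(running)

    total = len(pixels)
    lowest = next(count for count in cumulative if count > 0)
    if total == lowest:
        # Constant image: nothing to stretch.
        return list(pixels)
    return [round((cumulative[value] - lowest) / (total - lowest) * (levels - 1))
            for value in pixels]


# Hypothetical low-contrast, dark pixel values.
dark_pixels = [10, 10, 20, 30, 30]
brightened = gamma_correct([0, 64, 255], gamma=0.5)
stretched = equalize(dark_pixels)
```

A gamma below 1 brightens mid-tones while leaving black and white fixed, and equalization spreads the occupied grey levels over the full range; a method like HMCE combines such mappings on a modified histogram rather than applying either one alone.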

Ayub Shokrollahi, Babak Mazloom-Nezhad Maybodi, Ahmad Mahmoudi-Aznaveh
https://link.springer.com/article/10.1007%2Fs11042-020-08830-9
Human mental search-based multilevel thresholding for image segmentation
Multilevel thresholding is one of the principal methods of image segmentation. These methods use the image histogram for segmentation. The quality of segmentation depends on the values of the selected thresholds. Because conventional multilevel thresholding methods search exhaustively for the optimum value of the objective function, they are computationally time-consuming, especially as the number of thresholds increases. The use of evolutionary algorithms has attracted a lot of attention under such circumstances. The human mental search algorithm is a population-based evolutionary algorithm inspired by the manner of human mental search in online auctions. This algorithm has three interesting operators: (1) clustering to find the promising areas, (2) mental search to explore the surroundings of every solution using a Levy distribution, and (3) moving the solutions toward the promising area. In the present study, multilevel thresholding for image segmentation is proposed using the human mental search algorithm. The Kapur (entropy) and Otsu (between-class variance) criteria were used for this purpose. The advantages of the proposed method are demonstrated on twelve images in comparison with other existing approaches, including genetic algorithm, particle swarm optimization, differential evolution, firefly algorithm, bat algorithm, gravitational search algorithm, and teaching-learning-based optimization. The obtained results indicated that the proposed method is highly efficient in multilevel image thresholding in terms of objective function value, peak signal-to-noise ratio, structural similarity index, feature similarity index, and the curse of dimensionality. In addition, two nonparametric statistical tests statistically verified the efficiency of the proposed algorithm.
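
The cost of the conventional approach can be seen in a small sketch of the Otsu criterion. The code below evaluates the between-class variance for a tuple of thresholds and finds the best tuple by brute force; it is the exhaustive baseline that the abstract describes as time-consuming, not the human mental search algorithm itself. The function names, the convention that class k holds the grey levels from one threshold up to (but not including) the next, and the 8-level toy histogram are assumptions.

```python
from itertools import combinations


def between_class_variance(histogram, thresholds):
    """Otsu objective: weighted squared distance of class means to the global mean."""
    total = sum(histogram)
    global_mean = sum(level * count for level, count in enumerate(histogram)) / total
    # Class k covers grey levels bounds[k] .. bounds[k + 1] - 1.
    bounds = [0, *thresholds, len(histogram)]
    variance = 0.0
    for low, high in zip(bounds, bounds[1:]):
        count = sum(histogram[low:high])
        if count == 0:
            continue  # an empty class contributes nothing
        class_mean = sum(level * histogram[level] for level in range(low, high)) / count
        variance += (count / total) * (class_mean - global_mean) ** 2
    return variance


def exhaustive_otsu(histogram, num_thresholds):
    """Try every increasing tuple of thresholds and keep the best one."""
    candidates = combinations(range(1, len(histogram)), num_thresholds)
    return max(candidates, key=lambda t: between_class_variance(histogram, t))


# Toy histogram with 8 grey levels and three visible modes.
toy_histogram = [5, 5, 0, 4, 4, 0, 6, 6]
best_thresholds = exhaustive_otsu(toy_histogram, 2)
best_value = between_class_variance(toy_histogram, best_thresholds)
```

For a histogram with L grey levels and k thresholds, the brute-force search evaluates every k-element subset of the L - 1 possible cut points, so the number of candidates grows combinatorially with k. A population-based optimizer instead evaluates the same objective on a much smaller set of candidate tuples.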

Seyed Jalaleddin Mousavirad, Hossein Ebrahimpour-Komleh
https://www.sciencedirect.com/science/article/abs/pii/S1568494619301838
A new approach for oil tank detection using deep learning features with control false alarm rate in high-resolution satellite imagery
Oil tank detection is a challenging task, primarily because it is highly time-consuming. This paper investigates this challenge further and proposes a new hierarchical approach to detecting oil tanks, with particular attention to reducing false alarm rates. The proposed approach is divided into four stages: region of interest (ROI) extraction, circular object detection, feature extraction, and classification. The first stage, which is the key component for reducing false alarms and processing time, uses an improved faster region-based convolutional neural network (Faster R-CNN) to extract oil depots. In the second stage, a number of candidate target objects are selected from the extracted ROIs by a fast circle detection method. In the third stage, a robust feature extractor, which combines the output feature vectors of a convolutional neural network (CNN) as a high-level feature extractor and the histogram of oriented gradients (HOG) as a low-level feature extractor, is used to represent the features of various targets. Finally, a support vector machine (SVM) is employed for classification. The experimental results confirm that the proposed approach has good prediction accuracy and is able to reduce the false alarm rates.

Moein Zalpour, Gholamreza Akbarizadeh, Navid Alaei-Sheini
https://www.tandfonline.com/doi/abs/10.1080/01431161.2019.1685720
Sample complexity of classification with compressed input
One of the most studied problems in machine learning is finding reasonable constraints that guarantee the generalization of a learning algorithm. These constraints are usually expressed as some simplicity assumptions on the target. For instance, in the Vapnik–Chervonenkis (VC) theory the space of possible hypotheses is considered to have a limited VC dimension. One way to formulate the simplicity assumption is via information theoretic concepts. In this paper, the constraint on the entropy of the input variable X is studied as a simplicity assumption. It is proven that the sample complexity to achieve an ∊- Probably Approximately Correct (PAC) hypothesis is bounded by ∊∊ which is sharp up to the ∊ factor (a and c are constants). Moreover, it is shown that if a feature learning process is employed to learn the compressed representation from the dataset, this bound no longer exists. These findings have important implications for the Information Bottleneck (IB) theory, which has been used to explain the generalization power of Deep Neural Networks (DNNs), but whose applicability for this purpose is currently under debate by researchers. In particular, this is a rigorous proof for the previous heuristic that compressed representations are exponentially easier to learn. However, our analysis pinpoints two factors preventing the IB, in its current form, from being applicable in studying neural networks. The first is the exponential dependence of sample complexity on ∊, which can lead to a dramatic effect on the bounds in practical applications when ∊ is small. Second, our analysis reveals that arguments based on input compression are inherently insufficient to explain the generalization of methods like DNNs in which the features are also learned using available data.

Hassan Hafez-Kolahi, Shohreh Kasaei, Mahdiyeh Soleymani-Baghshah
https://www.sciencedirect.com/science/article/abs/pii/S0925231220311516
Action recognition in freestyle wrestling using silhouette-skeleton features
Despite many advances made in Human Action Recognition (HAR), there are still challenges encouraging researchers to explore new methods. In this study, a new feature descriptor based on the silhouette skeleton, called Histogram of Graph Nodes (HGN), is proposed. Unlike similar methods, which are strictly based on the articulated human body model, we extracted discriminative features solely from the foreground silhouettes. To this end, the skeletons of the silhouettes are first converted into a graph that approximately represents the articulated human body skeleton. By partitioning the region of the graph, the HGN is calculated in each frame. After that, we obtain the final feature vector by combining the HGNs over time. In addition, the recognition of two-person sports techniques is one of the areas that has not received adequate attention. We therefore investigate the recognition of techniques in wrestling as a new computer vision application. In this regard, a dataset of Freestyle Wrestling techniques (FSW) is introduced. We conducted extensive experiments using the proposed method on the provided dataset. In addition, we examined the proposed feature descriptor on the SBU and THETIS datasets, and the MHI-based features on the FSW dataset. We achieved 84.9% accuracy on the FSW dataset, while the results are 90.8% for the SBU dataset and 44% for the THETIS dataset. The fact that the experimental results are superior or comparable to those of other similar methods indicates the effectiveness of the proposed approach.

Ali Mottaghi, Mohsen Soryani, Hamid Seifi
https://www.sciencedirect.com/science/article/pii/S2215098619303052