This page announces the latest scientific and research achievements of members of the Iranian Society of Machine Vision and Image Processing. Members are kindly invited to send links to their latest scientific publications to email@example.com with the subject line “Research Achievement”. Thank you.
Architecture to improve the accuracy of automatic image annotation systems
Automatic image annotation (AIA) is an image retrieval mechanism that extracts relevant semantic tags from visual content. So far, the accuracy improvement of newly developed methods has been about 1–2% in F1-score, and the architectures appear to have room for improvement. The authors therefore designed a more detailed architecture for AIA and proposed new algorithms for its main parts. The proposed architecture has three main parts: feature extraction, learning, and annotation. They designed a novel learning method combining machine learning and probabilistic foundations. In the annotation part, they propose a novel method that gains the maximum benefit from the learning part. The combination of the proposed architecture, algorithms, and novel ideas set new F1-score accuracy milestones on the most commonly used datasets. In their architecture, the N+ measure, which counts the tags with non-zero recall, showed that they could recall all tags on the IAPRTC-12 and ESP-Games datasets.
Artin Ghostan Khatchatoorian, Mansour Jamzad
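The F1 and N+ measures cited above can be computed directly from per-tag precision and recall. The sketch below is a minimal, hypothetical evaluation helper (not the authors' code); per-image tag sets are assumed to be plain Python sets keyed by image id.

```python
def evaluate_annotation(ground_truth, predictions):
    """Compute mean F1 over tags and the N+ measure (number of tags
    with non-zero recall) for an automatic image annotation system."""
    tags = set()
    for t in ground_truth.values():
        tags.update(t)
    per_tag = {}
    for tag in tags:
        # Per-tag true positives, false negatives, false positives.
        tp = sum(1 for img, gt in ground_truth.items()
                 if tag in gt and tag in predictions.get(img, ()))
        fn = sum(1 for img, gt in ground_truth.items()
                 if tag in gt and tag not in predictions.get(img, ()))
        fp = sum(1 for img, pr in predictions.items()
                 if tag in pr and tag not in ground_truth.get(img, ()))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_tag[tag] = (precision, recall)
    n_plus = sum(1 for _, r in per_tag.values() if r > 0)
    mean_p = sum(p for p, _ in per_tag.values()) / len(per_tag)
    mean_r = sum(r for _, r in per_tag.values()) / len(per_tag)
    f1 = 2 * mean_p * mean_r / (mean_p + mean_r) if mean_p + mean_r else 0.0
    return f1, n_plus
```

An N+ equal to the full tag vocabulary size, as reported above for IAPRTC-12 and ESP-Games, means every tag was recalled at least once.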
Effectiveness of “rescue saccades” on the accuracy of tracking multiple moving targets: An eye-tracking study on the effects of target occlusions
Occlusion is one of the main challenges in tracking multiple moving objects. In almost all real-world scenarios, a moving object or a stationary obstacle occludes targets partially or completely, for a short or long time, during their movement. A previous study (Zelinsky & Todor, 2010) reported that subjects make timely saccades toward an object in danger of being occluded. Observers make these so-called “rescue saccades” to prevent target swapping. In this study, we examined whether these saccades are helpful. To this aim, we used recorded videos of zebrafish larvae swimming freely in a circular container as stimuli. We considered two main types of occlusion: object-object occlusions, which naturally exist in the videos, and object-occluder occlusions, created by adding a stationary doughnut-shaped occluder to some videos. Four scenarios were studied: (1) no occlusions, (2) only object-object occlusions, (3) only object-occluder occlusions, or (4) both object-object and object-occluder occlusions. For each condition, two set sizes (two and four targets) were used. Participants’ eye movements were recorded during tracking, and rescue saccades were extracted afterward. The results showed that rescue saccades help in handling object-object occlusions but had no reliable effect on tracking through object-occluder occlusions. The presence of occlusions generally increased visual sampling of the scenes; nevertheless, tracking accuracy declined due to occlusion.
Shiva Kamkar, Hamid Abrishami Moghaddam, Reza Lashgari, Lauri Oksama, Jie Li, Jukka Hyönä
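Extracting saccades from raw gaze recordings, as done above, is commonly performed with a velocity-threshold (I-VT) pass. The sketch below illustrates that generic idea only; the pixel-based velocity and the threshold value are illustrative assumptions, not the study's actual eye-tracking pipeline.

```python
import math

def detect_saccades(gaze, dt, vel_threshold=100.0):
    """I-VT sketch: mark consecutive gaze samples whose velocity
    exceeds vel_threshold as saccadic; return (start, end) index
    pairs into the inter-sample velocity sequence."""
    velocities = [math.hypot(x1 - x0, y1 - y0) / dt
                  for (x0, y0), (x1, y1) in zip(gaze, gaze[1:])]
    saccades, start = [], None
    for i, v in enumerate(velocities):
        if v > vel_threshold and start is None:
            start = i                     # saccade onset
        elif v <= vel_threshold and start is not None:
            saccades.append((start, i))   # saccade offset
            start = None
    if start is not None:                 # saccade running at end of trace
        saccades.append((start, len(velocities)))
    return saccades
```

A rescue-saccade analysis would then test whether detected saccades land near targets that are about to be occluded.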
Histogram modification based enhancement along with contrast-changed image quality assessment
Contrast is the difference in visual characteristics that makes an object distinguishable. Despite the significance of contrast enhancement (CE) in image processing applications, few attempts have been made to assess contrast change. In this paper, a visual information fidelity-based contrast change metric (VIF-CCM) is presented, which combines visual information fidelity (VIF), local entropy, correlation coefficient, and mean intensity measures. The validation results of the presented VIF-CCM show its efficiency and superiority over state-of-the-art image quality assessment metrics. A histogram modification based contrast enhancement (HMCE) method is also proposed in this paper. The proposed HMCE comprises four steps: segmenting the input image, employing a set of weighting constraints, applying a combination of adaptive gamma correction and equalization to the modified histogram, and optimizing the constraint weights with the PSO algorithm. Experimental results demonstrate that the proposed HMCE outperforms existing CE methods both subjectively and objectively.
Ayub Shokrollahi, Babak Mazloom-Nezhad Maybodi, Ahmad Mahmoudi-Aznaveh
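One step of the HMCE pipeline applies adaptive gamma correction to a modified histogram. As a hedged illustration of that idea only (in the spirit of weighted-histogram adaptive gamma correction, without the segmentation, constraint-weighting, and PSO steps described above), a minimal sketch on a flat list of 8-bit gray levels:

```python
def agc_weighted(pixels, levels=256, alpha=0.5):
    """Adaptive gamma correction driven by a weighted histogram:
    each gray level l is remapped with a per-level gamma of
    1 - cdf_w(l), so dark, frequent regions are brightened more."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    n = len(pixels)
    pdf = [h / n for h in hist]
    pmax, pmin = max(pdf), min(pdf)
    # Weight the pdf (alpha < 1 tempers over-represented levels).
    wpdf = [pmax * ((p - pmin) / (pmax - pmin)) ** alpha if pmax > pmin else p
            for p in pdf]
    wsum = sum(wpdf)
    cdf, acc = [], 0.0
    for w in wpdf:
        acc += w / wsum
        cdf.append(acc)
    # Look-up table: out = (levels-1) * (l / (levels-1)) ** gamma(l).
    lut = [round((levels - 1) * (l / (levels - 1)) ** (1.0 - cdf[l])) if l else 0
           for l in range(levels)]
    return [lut[p] for p in pixels]
```

Dark levels (low cumulative weight) get gamma near 1, while bright levels (cumulative weight near 1) get gamma near 0 and map toward white.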
Human mental search-based multilevel thresholding for image segmentation
Multilevel thresholding is one of the principal methods of image segmentation. These methods use the image histogram for segmentation, and the quality of segmentation depends on the values of the selected thresholds. Since an exhaustive search is required to find the optimum of the objective function, conventional multilevel thresholding methods are computationally expensive, especially as the number of thresholds increases. Under such circumstances, evolutionary algorithms have attracted much attention. The human mental search algorithm is a population-based evolutionary algorithm inspired by the way humans search in online auctions. This algorithm has three interesting operators: (1) clustering to find promising areas, (2) mental search to explore the neighborhood of each solution using a Lévy distribution, and (3) moving the solutions toward the promising area. In the present study, multilevel thresholding using the human mental search algorithm is proposed for image segmentation. The Kapur (entropy) and Otsu (between-class variance) criteria were used for this purpose. The advantages of the proposed method are demonstrated on twelve images in comparison with other existing approaches, including the genetic algorithm, particle swarm optimization, differential evolution, the firefly algorithm, the bat algorithm, the gravitational search algorithm, and teaching-learning-based optimization. The results indicate that the proposed method is highly efficient for multilevel image thresholding in terms of objective function value, peak signal-to-noise ratio, structural similarity index, feature similarity index, and the curse of dimensionality. In addition, two nonparametric statistical tests statistically verified the efficiency of the proposed algorithm.
Seyed Jalaleddin Mousavirad, Hossein Ebrahimpour-Komleh
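The Otsu (between-class variance) criterion mentioned above can be made concrete, and the brute-force search below is exactly the costly exhaustive evaluation that evolutionary algorithms such as human mental search are meant to replace. A minimal sketch over a gray-level histogram (not the paper's implementation):

```python
from itertools import combinations

def between_class_variance(hist, thresholds):
    """Otsu objective: sum over classes of w_k * (mu_k - mu_total)^2,
    where thresholds split the gray levels into k+1 classes."""
    n = sum(hist)
    mu_t = sum(i * h for i, h in enumerate(hist)) / n
    bounds = [0] + list(thresholds) + [len(hist)]
    sigma = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        w = sum(hist[lo:hi]) / n               # class probability
        if w == 0:
            continue
        mu = sum(i * hist[i] for i in range(lo, hi)) / (w * n)
        sigma += w * (mu - mu_t) ** 2
    return sigma

def exhaustive_multilevel_otsu(hist, k):
    """Brute-force maximization over all k-threshold combinations;
    cost grows combinatorially with k, which is why population-based
    optimizers are used instead."""
    best = max(combinations(range(1, len(hist)), k),
               key=lambda t: between_class_variance(hist, t))
    return list(best)
```

An evolutionary optimizer would evaluate `between_class_variance` (or the Kapur entropy) on candidate threshold vectors instead of enumerating all combinations.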
A new approach for oil tank detection using deep learning features with control false alarm rate in high-resolution satellite imagery
Oil tank detection is a challenging task, primarily due to its high computational cost. This paper further investigates this challenge and proposes a new hierarchical approach to detect oil tanks, with particular attention to reducing false alarm rates. The proposed approach is divided into four stages: region of interest (ROI) extraction, circular object detection, feature extraction, and classification. The first stage, a key component of the approach for reducing false alarms and processing time, uses an improved faster region-based convolutional neural network (Faster R-CNN) to extract oil depots. In the second stage, candidate target objects are selected from the extracted ROIs by a fast circle detection method. In the third stage, a robust feature extractor, which combines the output feature vectors of a convolutional neural network (CNN) as high-level features with a histogram of oriented gradients (HOG) as low-level features, is used to represent the various targets. Finally, a support vector machine (SVM) is employed for classification. The experimental results confirm that the proposed approach has good prediction accuracy and reduces false alarm rates.
Moein Zalpour, Gholamreza Akbarizadeh, Navid Alaei-Sheini
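The low-level half of the fused descriptor above is a HOG-style orientation histogram, which is then concatenated with CNN features before the SVM. A simplified single-cell sketch (illustrative only: real HOG uses cell grids and block normalization, and the CNN vector here is just a placeholder argument):

```python
import math

def hog_cell(patch, bins=9):
    """Unsigned-gradient orientation histogram for one cell of a
    grayscale patch (list of rows) — the low-level feature half."""
    hist = [0.0] * bins
    h, w = len(patch), len(patch[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(ang // (180.0 / bins)) % bins] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0  # L2 normalize
    return [v / norm for v in hist]

def fuse(cnn_vec, hog_vec):
    """Concatenate high-level (CNN) and low-level (HOG) features
    into the vector fed to the SVM classifier."""
    return list(cnn_vec) + list(hog_vec)
```

A patch with a purely horizontal intensity ramp puts all gradient energy in the first orientation bin.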
Sample complexity of classification with compressed input
One of the most studied problems in machine learning is finding reasonable constraints that guarantee the generalization of a learning algorithm. These constraints are usually expressed as simplicity assumptions on the target. For instance, in the Vapnik–Chervonenkis (VC) theory, the space of possible hypotheses is assumed to have a limited VC dimension. One way to formulate the simplicity assumption is via information-theoretic concepts. In this paper, a constraint on the entropy of the input variable X is studied as a simplicity assumption. It is proven that the sample complexity to achieve an ∊-Probably Approximately Correct (PAC) hypothesis is bounded by a·2^(c·H(X)/∊), which is sharp up to the ∊ factor (a and c are constants). Moreover, it is shown that if a feature learning process is employed to learn the compressed representation from the dataset, this bound no longer holds. These findings have important implications for the Information Bottleneck (IB) theory, which has been used to explain the generalization power of Deep Neural Networks (DNNs), although its applicability for this purpose is currently debated by researchers. In particular, this is a rigorous proof of the previous heuristic that compressed representations are exponentially easier to learn. However, our analysis pinpoints two factors preventing the IB, in its current form, from being applicable to studying neural networks. First, the exponential dependence of the sample complexity on 1/∊ can have a dramatic effect on the bounds in practical applications when ∊ is small. Second, our analysis reveals that arguments based on input compression are inherently insufficient to explain the generalization of methods such as DNNs, in which the features are also learned from the available data.
Hassan Hafez-Kolahi, Shohreh Kasaei, Mahdiyeh Soleymani-Baghshah
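The intuition that compression helps exponentially can be grounded in standard finite-hypothesis PAC reasoning: sample complexity scales with log|H|, and the number of binary labelings of a support of size s is 2^s, so shrinking the (roughly 2^H(X)-sized) typical support shrinks the hypothesis class exponentially in the entropy reduction. A small illustration of the two quantities involved (textbook PAC intuition, not the paper's theorem):

```python
import math
from collections import Counter

def entropy(samples):
    """Empirical Shannon entropy H(X) in bits."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def hypothesis_count(support_size):
    """Number of binary labelings of a finite support: 2^support_size.
    Finite-class PAC bounds scale with the log of this count, so a
    representation whose typical support is smaller (lower entropy)
    yields an exponentially smaller hypothesis class."""
    return 2 ** support_size
```

For example, halving the support from 8 to 4 points drops the labeling count from 2^8 to 2^4, a factor of 16.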
Action recognition in freestyle wrestling using silhouette-skeleton features
Despite the many advances made in Human Action Recognition (HAR), challenges remain that encourage researchers to explore new methods. In this study, a new feature descriptor based on the silhouette skeleton, called the Histogram of Graph Nodes (HGN), is proposed. Unlike similar methods, which are strictly based on an articulated human body model, we extract discriminative features solely from the foreground silhouettes. To this end, the skeletons of the silhouettes are first converted into a graph that approximately represents the articulated human body skeleton. By partitioning the graph region, the HGN is computed in each frame, and the final feature vector is then obtained by combining the HGNs over time. Furthermore, the recognition of two-person sports techniques is an area that has not received adequate attention. We therefore investigate the recognition of wrestling techniques as a new computer vision application and introduce a dataset of Freestyle Wrestling techniques (FSW). We conducted extensive experiments with the proposed method on this dataset. In addition, we evaluated the proposed feature descriptor on the SBU and THETIS datasets, and MHI-based features on the FSW dataset. We achieved 84.9% accuracy on the FSW dataset, with 90.8% on SBU and 44% on THETIS. The fact that the experimental results are superior or comparable to those of similar methods indicates the effectiveness of the proposed approach.
Ali Mottaghi, Mohsen Soryani, Hamid Seifi
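A drastically simplified view of the HGN idea is to partition the region containing the skeleton-graph nodes and histogram the node counts per partition cell. The uniform grid partition below is an illustrative assumption; the paper's actual partitioning of the graph region may differ.

```python
def histogram_of_graph_nodes(nodes, rows=2, cols=2):
    """Toy HGN sketch: split the bounding box of skeleton-graph node
    coordinates into a rows x cols grid and return the normalized
    count of nodes per cell as a flat feature vector."""
    xs = [x for x, _ in nodes]
    ys = [y for _, y in nodes]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    hist = [0] * (rows * cols)
    for x, y in nodes:
        # Map each node to a grid cell (clamp the max coordinate).
        c = min(cols - 1, int((x - x0) / ((x1 - x0) or 1) * cols))
        r = min(rows - 1, int((y - y0) / ((y1 - y0) or 1) * rows))
        hist[r * cols + c] += 1
    total = sum(hist)
    return [h / total for h in hist]
```

Per-frame vectors like this would then be combined over time into the final action descriptor, as the abstract describes.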