Members' Research Achievements | Iranian Society of Machine Vision and Image Processing

This page announces the latest scientific and research achievements of the members of the Iranian Society of Machine Vision and Image Processing. Members are invited to send links to their latest scientific and research publications to info@ismvip.ir with the subject "Research Achievement". Thank you.

A Novel Approach for Real-Time Estimation of State of Charge in Li-Ion Battery Through Hybrid Methodology

Rechargeable batteries are essential components for modern energy systems and electric vehicles (EVs). Accurate estimation of State of Charge (SOC) plays a pivotal role in the reliable operation and efficiency of battery systems. While various methods have been developed to improve SOC estimation, there remains significant potential for further enhancement. This paper presents a hybrid SOC estimation technique specifically designed for EV battery management systems (BMS). The proposed method effectively mitigates the impact of cell deterioration, achieving high-precision SOC estimation. SOC serves as a critical parameter in BMS decision-making. This study integrates the Adaptive Extended Kalman Filter (AEKF) with a Li-ion cell model and the Coulomb Counting technique. Given the computational complexity inherent to AEKF and the susceptibility of the Coulomb Counting method to noise, their combination offers a novel approach characterized by improved accuracy and reduced complexity. The method was validated through extensive simulations in MATLAB-Simulink and experimental testing using a hardware test bench. The results were compared to those of the unscented Kalman filter-based SOC estimation, adaptive integral correction-based methods, and machine learning-based methods. The proposed adaptive strategy shows a 70% reduction in complexity compared to DEKF while achieving an SOC estimation accuracy of up to 1.02%.

Armin Emami; Gholamreza Akbarizadeh; Alimorad Mahmoudi

10.1109/ACCESS.2024.3475636
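As a rough illustration of the Coulomb Counting technique named in the abstract above, the sketch below integrates a sampled current over time to track SOC. It is not the authors' hybrid method: the AEKF correction and the cell model are omitted, and the function name, the sign convention (positive current means discharge), and the constant-capacity assumption are all illustrative choices.

```python
def coulomb_counting(soc0, currents_a, dt_s, capacity_ah):
    """Track state of charge (SOC, in [0, 1]) by integrating current.

    soc0        -- initial SOC (assumed known; errors here are never corrected)
    currents_a  -- current samples in amperes, positive = discharge (assumption)
    dt_s        -- sampling interval in seconds
    capacity_ah -- nominal capacity in ampere-hours (assumed constant)
    """
    capacity_as = capacity_ah * 3600.0  # convert Ah to ampere-seconds
    soc = soc0
    trace = []
    for current in currents_a:
        # Remove the charge drawn during this sample from the running SOC.
        soc -= current * dt_s / capacity_as
        # Keep the estimate inside its physical range.
        soc = min(1.0, max(0.0, soc))
        trace.append(soc)
    return trace


# Hypothetical usage: a 2 Ah cell discharged at 2 A for 30 minutes.
trace = coulomb_counting(soc0=1.0, currents_a=[2.0] * 1800, dt_s=1.0, capacity_ah=2.0)
print(round(trace[-1], 6))
```

Because every noisy current sample is accumulated and never corrected, the estimate drifts over time, which is the weakness the abstract attributes to Coulomb Counting on its own.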

RI-ViT: A Multi-scale Hybrid Method Based on Vision Transformer for Breast Cancer Detection in Histopathological Images

Breast cancer is one of the most significant health threats to women worldwide. This disease manifests through abnormal proliferation of cells and the formation of tumors in breast tissue. Definitive breast cancer diagnosis is usually determined by analyzing tissue samples obtained from biopsies and reviewing them by pathologists. However, this method is highly dependent on the knowledge and experience of pathologists and may lead to errors due to the subjective nature of human interpretation and the high volume of cases. This study presents a multi-scale hybrid model based on Vision Transformer and residual networks for breast cancer detection in histopathological images, abbreviated as RI-ViT. In this approach, local features are extracted through a combination of residual stages and multi-scale learning, while global features are obtained using the attention mechanism in transformers. This combination enables simultaneous extraction of both local and global features from histopathological images, effectively improving the model’s performance in detecting complex cases. We have used an imbalanced and publicly available dataset called BreakHis to evaluate the performance of the RI-ViT model. The experimental results of the proposed model show that it achieves accuracies of 99.75%, 98.80%, 98.01%, and 97.53% at magnifications of 40X, 100X, 200X, and 400X, respectively. The RI-ViT model can also perform well in a magnification-independent mode. Results show that, regardless of the magnification level, it achieves an accuracy of 99.37%, demonstrating its superiority over other state-of-the-art models.

10.1109/ACCESS.2024.3514322

CapsNet Regularization and its Conjugation with ResNet for Signature Identification
We propose a new regularization term for CapsNet that significantly improves the generalization power of the original method from small training data while requiring far fewer parameters, making it suitable for large input images. We also propose a very efficient DNN architecture that integrates CapsNet with ResNet to obtain the advantages of the two architectures. CapsNet allows a powerful understanding of the objects’ components and their positions, while ResNet provides efficient feature extraction and description. Our approach is general, and we demonstrate it on the problem of signature identification from images. To show our approach's superiority, we provide several evaluations with different protocols. We also show that our approach outperforms the state of the art on this problem with thorough experiments on three publicly available datasets: CEDAR, MCYT, and UTSig.

Mahdi Jampour, Saeid Abbaasi, Malihe Javidi
https://doi.org/10.1016/j.patcog.2021.107851

Electrical fault detection in three-phase induction motor using deep network-based features of thermograms
In this paper, an automatic method is proposed for detecting operating faults in three-phase induction motors based on thermal images. If these faults are not detected or fixed in time, they can lead to permanent motor failure, which is why non-invasive and non-destructive tests are of particular interest. First, the region of interest is detected in the thermograms using SIFT-based key-point matching. Then, these images are transformed into representative feature vectors using a pre-trained convolutional neural network. Next, the training vectors are clustered into cold and hot clusters by K-means, and an SVM-based classifier is trained for each cluster. Each test feature vector is assigned to a cluster and mapped to a class by the corresponding trained SVM-based classifier. Evaluation of the proposed method on datasets of real thermal images shows that the algorithm can detect 100% of the induction motor faults.

Majid Khanjani, Mehdi Ezoji
https://doi.org/10.1016/j.measurement.2020.108622

Architecture to improve the accuracy of automatic image annotation systems
Automatic image annotation (AIA) is an image retrieval mechanism that extracts relevant semantic tags from visual content. So far, newly developed methods have improved accuracy by only about 1 or 2% in F1-score, and the architectures seem to have room for improvement. Therefore, the authors designed a more detailed architecture for AIA and suggested new algorithms for its main parts. The proposed architecture has three main parts: feature extraction, learning, and annotation. They designed a novel learning method based on machine learning and probability. In the annotation part, they suggest a novel method that gains the maximum benefit from the learning part. The combination of the proposed architecture, algorithms, and novel ideas resulted in new accuracy milestones in F1-score on the most commonly used datasets. In their architecture, the N+ measure, which gives the number of tags with non-zero recall, showed that they could recall all tags for the IAPRTC-12 and ESP-Games datasets.

Artin Ghostan Khatchatoorian, Mansour Jamzad
https://doi.org/10.1049/iet-cvi.2019.0500
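The F1-score and N+ measures mentioned in the abstract above can be sketched as follows. This is a minimal sketch under a common convention in the annotation literature (per-tag precision and recall averaged over tags, with F1 computed from the two averages); whether the paper uses exactly this averaging is an assumption, and the function name and toy data are hypothetical.

```python
def annotation_scores(truth, predicted):
    """Score predicted tags against ground truth.

    truth, predicted -- dicts mapping image id -> set of tags.
    Returns (mean precision, mean recall, F1, N+), where N+ is the
    number of ground-truth tags recalled at least once.
    Assumes truth contains at least one tag.
    """
    tags = set().union(*truth.values())
    precisions, recalls = [], []
    n_plus = 0
    for tag in sorted(tags):
        relevant = {img for img, t in truth.items() if tag in t}
        retrieved = {img for img, t in predicted.items() if tag in t}
        hits = len(relevant & retrieved)
        # A tag that is never predicted contributes zero precision.
        precisions.append(hits / len(retrieved) if retrieved else 0.0)
        recall = hits / len(relevant)
        recalls.append(recall)
        if recall > 0:
            n_plus += 1
    p = sum(precisions) / len(tags)
    r = sum(recalls) / len(tags)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1, n_plus


# Hypothetical toy data: the tag "tree" is never predicted, so N+ is 2 of 3.
truth = {"img1": {"sky", "sea"}, "img2": {"sky", "tree"}, "img3": {"tree"}}
predicted = {"img1": {"sky", "sea"}, "img2": {"sky"}, "img3": {"sky"}}
p, r, f1, n_plus = annotation_scores(truth, predicted)
print(n_plus, round(f1, 3))
```

An N+ equal to the vocabulary size corresponds to the abstract's claim that every tag was recalled at least once.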

A deep learning framework for Text-independent Writer Identification
Handwriting Writer Identification (HWI) refers to the analysis of handwritten text images to identify the authorship of documents. It has yielded promising results in various applications, including digital forensics, criminal investigation, and identifying the writers of historical documents. The complexity of the text image, especially in images containing varied handwriting, makes writer identification difficult. In this work, we propose an end-to-end system that relies on a straightforward yet well-designed deep network and very efficient feature extraction, emphasizing feature engineering. Our system extends ResNet by conjugating deep residual networks with a new traditional yet high-quality handwriting descriptor for handwriting analysis. Our descriptor analyzes handwriting thickness as a preliminary and essential feature of human handwriting. Our approach also provides text-independent writer identification, meaning that the model does not need the same handwriting content for learning. The proposed approach is evaluated and achieves consistent results on four public and well-known datasets: IAM, Firemaker, CVL, and CERUG-EN. We empirically demonstrate that our conjugated network outperforms the original ResNet and that it can work well for real-world applications in which patches with few letters exist.

Malihe Javidi, Mahdi Jampour
https://doi.org/10.1016/j.engappai.2020.103912

Effectiveness of “rescue saccades” on the accuracy of tracking multiple moving targets: An eye-tracking study on the effects of target occlusions
Occlusion is one of the main challenges in tracking multiple moving objects. In almost all real-world scenarios, a moving object or a stationary obstacle occludes targets partially or completely for a short or long time during their movement. A previous study (Zelinsky & Todor, 2010) reported that subjects make timely saccades toward the object in danger of being occluded. Observers make these so-called “rescue saccades” to prevent target swapping. In this study, we examined whether these saccades are helpful. To this end, we used as stimuli recorded videos of the natural movement of zebrafish larvae swimming freely in a circular container. We considered two main types of occlusion: object-object occlusions that naturally exist in the videos, and object-occluder occlusions created by adding a stationary doughnut-shaped occluder to some videos. Four different scenarios were studied: (1) no occlusions, (2) only object-object occlusions, (3) only object-occluder occlusions, or (4) both object-object and object-occluder occlusions. For each condition, two set sizes (two and four) were applied. Participants’ eye movements were recorded during tracking, and rescue saccades were extracted afterward. The results showed that rescue saccades are helpful in handling object-object occlusions but had no reliable effect on tracking through object-occluder occlusions. The presence of occlusions generally increased visual sampling of the scenes; nevertheless, tracking accuracy declined due to occlusion.

Shiva Kamkar, Hamid Abrishami Moghaddam, Reza Lashgari, Lauri Oksama, Jie Li, Jukka Hyönä
https://jov.arvojournals.org/article.aspx?articleid=2771970

Histogram modification based enhancement along with contrast-changed image quality assessment
Contrast is the difference in visual characteristics that makes an object more recognizable. Despite the significance of contrast enhancement (CE) in image processing applications, few attempts have been made to assess contrast change. In this paper, a visual information fidelity-based contrast change metric (VIF-CCM) is presented, which combines visual information fidelity (VIF), local entropy, correlation coefficient, and mean intensity measures. The validation results of the presented VIF-CCM show its efficiency and superiority over state-of-the-art image quality assessment metrics. A histogram modification based contrast enhancement (HMCE) method is also proposed in this paper. The proposed HMCE comprises four steps: segmenting the input image, employing a set of weighting constraints, applying a combination of adaptive gamma correction and equalization to the modified histogram, and optimizing the values of the constraint weights with the PSO algorithm. Experimental results demonstrate that the proposed HMCE outperforms other existing CE methods subjectively and objectively.

Ayub Shokrollahi, Babak Mazloom-Nezhad Maybodi, Ahmad Mahmoudi-Aznaveh
https://link.springer.com/article/10.1007%2Fs11042-020-08830-9
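The two basic operations that the HMCE abstract above combines, gamma correction and histogram equalization, can be sketched in their plain textbook forms. This is only an illustration under simplifying assumptions: the gamma value is fixed rather than adaptive, the equalization is global, and the paper's segmentation, weighting constraints, and PSO tuning are not reproduced. Both function names are hypothetical.

```python
def equalize(pixels, levels=256):
    """Global histogram equalization of a flat list of integer gray levels."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative histogram (unnormalized CDF).
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        return list(pixels)  # constant image: nothing to stretch
    # Map each level so that the occupied range spans 0 .. levels - 1.
    lut = [round(max(0, c - cdf_min) / (n - cdf_min) * (levels - 1)) for c in cdf]
    return [lut[p] for p in pixels]


def gamma_correct(pixels, gamma, levels=256):
    """Fixed (non-adaptive) gamma correction; gamma < 1 brightens."""
    top = levels - 1
    return [round(top * (p / top) ** gamma) for p in pixels]


# Hypothetical low-contrast strip of pixels.
strip = [100, 100, 101, 102]
print(equalize(strip))
print(gamma_correct(strip, gamma=0.5))
```

Equalization stretches the narrow occupied range to the full scale, while gamma correction reshapes brightness without changing the ordering of gray levels.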

Human mental search-based multilevel thresholding for image segmentation
Multilevel thresholding is one of the principal methods of image segmentation. These methods use the image histogram for segmentation, and the quality of segmentation depends on the values of the selected thresholds. Because conventional multilevel thresholding methods search exhaustively for the optimum value of the objective function, they are computationally time-consuming, especially as the number of thresholds increases. The use of evolutionary algorithms has attracted a lot of attention under such circumstances. The human mental search algorithm is a population-based evolutionary algorithm inspired by the manner of human mental search in online auctions. This algorithm has three interesting operators: (1) clustering for finding the promising areas, (2) mental search for exploring the surroundings of every solution using the Levy distribution, and (3) moving the solutions toward the promising area. In the present study, multilevel thresholding is proposed for image segmentation using the human mental search algorithm. The Kapur (entropy) and Otsu (between-class variance) criteria were used for this purpose. The advantages of the proposed method are demonstrated on twelve images and in comparison with other existing approaches, including genetic algorithm, particle swarm optimization, differential evolution, firefly algorithm, bat algorithm, gravitational search algorithm, and teaching-learning-based optimization. The obtained results indicated that the proposed method is highly efficient in multilevel image thresholding in terms of objective function value, peak signal-to-noise ratio, structural similarity index, feature similarity index, and the curse of dimensionality. In addition, two nonparametric statistical tests verified the efficiency of the proposed algorithm statistically.

Seyed Jalaleddin Mousavirad, Hossein Ebrahimpour-Komleh
https://www.sciencedirect.com/science/article/abs/pii/S1568494619301838
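To make the cost of the conventional approach described above concrete, the sketch below evaluates the Otsu between-class variance for a set of thresholds and finds the best set by exhaustive search. It illustrates the baseline that the abstract calls time-consuming, not the human mental search algorithm itself; the function names and the toy histogram are hypothetical, and a threshold t is assumed to place levels up to and including t in the lower class.

```python
from itertools import combinations


def between_class_variance(hist, thresholds):
    """Otsu criterion for a histogram split at the given sorted thresholds."""
    total = sum(hist)
    mean_all = sum(level * count for level, count in enumerate(hist)) / total
    # Class k covers levels bounds[k] .. bounds[k + 1] - 1.
    bounds = [0, *[t + 1 for t in thresholds], len(hist)]
    variance = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        count = sum(hist[lo:hi])
        if count == 0:
            continue  # an empty class contributes nothing
        mean = sum(level * hist[level] for level in range(lo, hi)) / count
        variance += (count / total) * (mean - mean_all) ** 2
    return variance


def exhaustive_otsu(hist, k):
    """Try every combination of k thresholds and keep the best one."""
    candidates = combinations(range(len(hist) - 1), k)
    return max(candidates, key=lambda ts: between_class_variance(hist, ts))


# Hypothetical 10-level histogram with three well-separated modes.
hist = [4, 4, 0, 0, 4, 4, 0, 0, 4, 4]
print(exhaustive_otsu(hist, k=2))
```

With L gray levels and k thresholds the search visits C(L - 1, k) combinations, so the cost grows rapidly with k on a 256-level image; this is the motivation the abstract gives for turning to population-based search.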

A new approach for oil tank detection using deep learning features with control false alarm rate in high-resolution satellite imagery
Oil tank detection is a challenging task, primarily because it is time-consuming. This paper aims at further investigating this challenge and proposes a new hierarchical approach to detect oil tanks, especially with respect to how false alarm rates are reduced. The proposed approach is divided into four stages: region of interest (ROI) extraction, circular object detection, feature extraction, and classification. The first stage, which is a key component of this approach to reduce false alarms and processing time, uses an improved faster region-based convolutional neural network (Faster R-CNN) to extract oil depots. In the second stage, a number of candidate objects of the target are selected from the extracted ROIs by a fast circle detection method. Afterwards, in the third stage, a robust feature extractor based on a combination of the output feature vectors from a convolutional neural network (CNN), as a high-level feature extractor, and histogram of oriented gradients (HOG), as a low-level feature extractor, is used to represent the features of various targets. Finally, the support vector machine (SVM) is employed for classification. The experimental results confirm that the proposed approach has good prediction accuracy and is able to reduce the false alarm rates.

Moein Zalpour, Gholamreza Akbarizadeh, Navid Alaei-Sheini
https://www.tandfonline.com/doi/abs/10.1080/01431161.2019.1685720

Sample complexity of classification with compressed input
One of the most studied problems in machine learning is finding reasonable constraints that guarantee the generalization of a learning algorithm. These constraints are usually expressed as some simplicity assumptions on the target. For instance, in the Vapnik–Chervonenkis (VC) theory the space of possible hypotheses is considered to have a limited VC dimension. One way to formulate the simplicity assumption is via information theoretic concepts. In this paper, the constraint on the entropy of the input variable X is studied as a simplicity assumption. It is proven that the sample complexity to achieve an ε-Probably Approximately Correct (PAC) hypothesis is bounded by a function of the input entropy and ε (a and c are constants in the bound), which is sharp up to a factor that depends on ε. Moreover, it is shown that if a feature learning process is employed to learn the compressed representation from the dataset, this bound no longer exists. These findings have important implications for the Information Bottleneck (IB) theory, which had been utilized to explain the generalization power of Deep Neural Networks (DNNs), but whose applicability for this purpose is currently under debate by researchers. In particular, this is a rigorous proof for the previous heuristic that compressed representations are exponentially easier to learn. However, our analysis pinpoints two factors preventing the IB, in its current form, from being applicable in studying neural networks. Firstly, the exponential dependence of sample complexity on ε can have a dramatic effect on the bounds in practical applications when ε is small. Secondly, our analysis reveals that arguments based on input compression are inherently insufficient to explain generalization of methods like DNNs in which the features are also learned using available data.

Hassan Hafez-Kolahi, Shohreh Kasaei, Mahdiyeh Soleymani-Baghshah
https://www.sciencedirect.com/science/article/abs/pii/S0925231220311516

Action recognition in freestyle wrestling using silhouette-skeleton features
Despite many advances made in Human Action Recognition (HAR), there are still challenges encouraging researchers to explore new methods. In this study, a new feature descriptor based on the silhouette skeleton, called Histogram of Graph Nodes (HGN), is proposed. Unlike similar methods, which are strictly based on the articulated human body model, we extracted discriminative features solely using the foreground silhouettes. To this end, first, the skeletons of the silhouettes are converted into a graph that approximately represents the articulated human body skeleton. By partitioning the region of the graph, the HGN is calculated in each frame. After that, we obtain the final feature vector by combining the HGNs in time. Separately, the recognition of two-person sports techniques is an area that has not received adequate attention. We therefore investigate the recognition of techniques in wrestling as a new computer vision application. In this regard, a dataset of Freestyle Wrestling techniques (FSW) is introduced. We conducted extensive experiments using the proposed method on the provided dataset. In addition, we examined the proposed feature descriptor on the SBU and THETIS datasets, and the MHI-based features on the FSW dataset. We achieved 84.9% accuracy on the FSW dataset, while the results are 90.8% for the SBU and 44% for the THETIS datasets. The fact that the experimental results are superior or comparable to other similar methods indicates the effectiveness of the proposed approach.

Ali Mottaghi, Mohsen Soryani, Hamid Seifi
https://www.sciencedirect.com/science/article/pii/S2215098619303052
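The per-frame histogram and its combination over time, as described in the abstract above, might look roughly like the following. This is a loose sketch and not the authors' HGN definition: the abstract does not say how the graph region is partitioned, so a regular grid over the bounding box of the nodes is assumed here, and the function names, grid size, and concatenation over frames are all hypothetical choices.

```python
def histogram_of_nodes(nodes, grid=(2, 2)):
    """Normalized count of skeleton-graph nodes per grid cell for one frame.

    nodes -- list of (x, y) node coordinates
    grid  -- (rows, cols) partition of the nodes' bounding box (assumption)
    """
    rows, cols = grid
    hist = [0.0] * (rows * cols)
    if not nodes:
        return hist
    xs = [x for x, _ in nodes]
    ys = [y for _, y in nodes]
    x0, y0 = min(xs), min(ys)
    # Guard against a degenerate (zero-size) bounding box.
    width = max(max(xs) - x0, 1e-9)
    height = max(max(ys) - y0, 1e-9)
    for x, y in nodes:
        col = min(int((x - x0) / width * cols), cols - 1)
        row = min(int((y - y0) / height * rows), rows - 1)
        hist[row * cols + col] += 1
    return [count / len(nodes) for count in hist]


def sequence_descriptor(frames, grid=(2, 2)):
    """Concatenate the per-frame histograms in temporal order."""
    feature = []
    for nodes in frames:
        feature.extend(histogram_of_nodes(nodes, grid))
    return feature


# Hypothetical two-frame clip with four nodes per frame.
frames = [
    [(0, 0), (10, 0), (0, 10), (10, 10)],
    [(0, 0), (1, 1), (2, 2), (10, 10)],
]
print(sequence_descriptor(frames))
```

Normalizing by the node count keeps the descriptor comparable across frames whose skeleton graphs have different numbers of nodes.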