Article information

2026, Volume 31, No. 1, p. 92-105

Berikov V.B., Kutnenko O.A.

Weakly supervised multiple instance learning based on informative feature selection and sample filtering

Purpose. The paper addresses the problem of weakly supervised multiple instance learning, where sets of objects referred to as “bags” are analyzed. Each object is represented by a set of observations of certain features. A binary classification case is considered: one class is conventionally labelled as positive, and the other as negative. A bag is labelled as positive if it contains at least one positive object (the specific object is unknown); otherwise, the bag is labelled as negative. The goal is to predict classes for new bags to achieve the best quality metrics.
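
The bag-labelling rule stated above can be sketched in a few lines of Python. This is only an illustration of the rule itself, not of the authors' method: the function name bag_label and the 0/1 label encoding are assumptions made for the example, and in the actual problem the object-level labels are unknown, so only the resulting bag label is observed.

```python
from typing import Sequence


def bag_label(object_labels: Sequence[int]) -> int:
    """Label a bag from its (normally hidden) object labels.

    Hypothetical helper: labels are assumed to be encoded as
    1 (positive) and 0 (negative). A bag is positive if at least
    one of its objects is positive; otherwise it is negative.
    """
    return int(any(label == 1 for label in object_labels))


# One positive object is enough to make the whole bag positive.
print(bag_label([0, 0, 1, 0]))
# A bag with no positive objects is negative.
print(bag_label([0, 0, 0]))
```

The asymmetry is the source of the weak supervision: a negative bag label fixes the label of every object in it, while a positive bag label says nothing about which object is responsible.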

Methodology. Machine learning methods adapted to the problem are employed: 𝑘-nearest neighbours (𝑘NN), informative feature selection, sample filtering, and an ensemble approach to constructing decision functions. Additionally, the function of rival similarity (FRiS) is used to evaluate the degree of “unusualness” of bags. An experimental study and comparison with existing methods are conducted on the real-world problem of identifying proteins containing structures with a thioredoxin fold.
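
For readers unfamiliar with FRiS, the sketch below shows the commonly used object-level form of the function: the difference between the distance to the nearest rival-class object and the distance to the nearest own-class object, normalized by their sum. The abstract does not state how the function is extended to bags, so the Euclidean distance, the function name fris, and the argument layout are assumptions made for illustration only.

```python
import math
from typing import Sequence

Point = Sequence[float]


def fris(z: Point, own: Sequence[Point], rival: Sequence[Point]) -> float:
    """Object-level function of rival similarity (illustrative sketch).

    Assumptions: Euclidean distance; `own` holds objects sharing the
    label of z (excluding z itself), `rival` holds objects of the
    other class. The result lies in [-1, 1].
    """
    r_own = min(math.dist(z, a) for a in own)        # nearest own-class object
    r_rival = min(math.dist(z, b) for b in rival)    # nearest rival object
    total = r_own + r_rival
    if total == 0.0:
        # z coincides with objects of both classes: no preference.
        return 0.0
    return (r_rival - r_own) / total


# z is 1 unit from its own class and 3 units from the rival class.
print(fris((0.0, 0.0), own=[(1.0, 0.0)], rival=[(3.0, 0.0)]))
```

Values close to 1 mean the object sits well inside its own class; values close to -1 mean it lies much nearer to the rival class, which is one natural way to read "unusualness" relative to the assigned label.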

Findings. A method for solving the problem was developed that combines selection of an informative feature space, filtering (self-correction) of the training set, and voting over a set of decision functions. The results of solving the protein identification problem were compared with those of a number of well-known algorithms using quality metrics.

Originality/value. The developed method enables the selection of the most informative feature sets, which is crucial for improving the quality and interpretability of solutions, as well as self-correction of the training set, which reduces the impact of errors associated with inaccurate labelling, outliers, etc. In numerical experiments with protein structure recognition data, comparison with a number of well-known algorithms confirmed the sufficiently high efficiency of the proposed method according to the balanced accuracy metric.
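
Balanced accuracy, the metric named above, is the mean of the true positive rate and the true negative rate, which keeps a classifier from scoring well simply by favouring the larger class. The sketch below uses the standard binary definition; the function name, the 0/1 encoding of bag labels, and the toy labels in the usage lines are assumptions for illustration and are not data from the paper.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (TPR) and specificity (TNR) for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in y_true")
    return 0.5 * (tp / n_pos + tn / n_neg)


# Toy example (not from the paper): 2 positive and 4 negative bags.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))  # TPR = 1/2, TNR = 3/4
```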


Keywords: weakly supervised learning, multi-instance classification, informative feature, filtering of sample objects

Author(s):
Berikov Vladimir Borisovich
Dr., Associate Professor
Position: General Scientist
Office: Sobolev Institute of Mathematics, Siberian Branch of the Russian Academy of Sciences
Address: 630090, Russia, Novosibirsk, 4, Acad. Koptyug Avenue
Phone Office: (383) 3297575
E-mail: berikov@math.nsc.ru
SPIN-code: 8108-2591

Kutnenko Olga Andreevna
PhD, Associate Professor
Position: Senior Research Scientist
Office: Sobolev Institute of Mathematics, Siberian Branch of the Russian Academy of Sciences
Address: 630090, Russia, Novosibirsk, 4, Acad. Koptyug Avenue
E-mail: olga@math.nsc.ru
SPIN-code: 7600-1424


Bibliography link:
Berikov V.B., Kutnenko O.A. Weakly supervised multiple instance learning based on informative feature selection and sample filtering // Computational technologies. 2026. V. 31. No. 1. P. 92-105
ISSN 1560-7534
© 2026 FRC ICT