Title: Active Learning Yields Poor Results in Multi-Label Task · Issue #191 · modAL-python/modAL · GitHub
URL: https://github.com/modAL-python/modAL/issues/191
{"@context":"https://schema.org","@type":"DiscussionForumPosting","headline":"Active Learning Yields Poor Results in Multi-Label Task","articleBody":"I am using modAL for an active learning project in multi-label classification. My implementation is in PyTorch, and I use DinoV2 as the backbone model.\r\n For the same dataset, I apply both active learning (using minimum confidence and average confidence strategies) and random sampling. I select the same number of samples in both strategies, but the results from random sampling are significantly better than those from the active learning approach. I would like to know if this discrepancy might be due to an issue with my code or the modAL library's handling of multi-label classification. Below is my active learning loop:\r\n\r\n```\r\nfor i in range(n_queries):\r\n if i == 12:\r\n n_instances = X_pool.shape[0]\r\n else:\r\n n_instances = batch(int(np.ceil(np.power(10, POWER))), BATCH_SIZE)\r\n\r\n print(f\"\\nQuery {i + 1}: Requesting {n_instances} samples from a pool of size {X_pool.shape[0]}\")\r\n\r\n if X_pool.shape[0] \u003c n_instances:\r\n print(\"Not enough samples left in the pool to query the desired number of instances.\")\r\n break\r\n\r\n query_idx, _ = learner.query(X_pool, n_instances=n_instances)\r\n query_idx = np.unique(query_idx)\r\n\r\n if len(query_idx) == 0:\r\n print(\"No indices were selected, which may indicate an issue with the query function or pool.\")\r\n continue\r\n\r\n # Add the newly selected samples to the cumulative training set\r\n cumulative_X_train.append(X_pool[query_idx])\r\n cumulative_y_train.append(y_pool[query_idx])\r\n\r\n # Concatenate all the samples to form the cumulative training data\r\n X_train_cumulative = np.concatenate(cumulative_X_train, axis=0)\r\n y_train_cumulative = np.concatenate(cumulative_y_train, axis=0)\r\n\r\n learner.teach(X_train_cumulative, y_train_cumulative)\r\n\r\n # Log the selected sample names\r\n selected_sample_names = train_df.loc[query_idx, 
\"image\"].tolist()\r\n print(f\"Selected samples in Query {i + 1}: {selected_sample_names}\")\r\n with open(samples_log_file, mode='a', newline='') as f:\r\n writer = csv.writer(f)\r\n writer.writerow([i + 1] + selected_sample_names)\r\n\r\n # Remove the selected samples from the pool\r\n X_pool = np.delete(X_pool, query_idx, axis=0)\r\n y_pool = np.delete(y_pool, query_idx, axis=0)\r\n\r\n # Evaluate the model\r\n y_pred = learner.predict(X_test_np)\r\n accuracy = accuracy_score(y_test_np, y_pred)\r\n f1 = f1_score(y_test_np, y_pred, average='macro')\r\n acc_test_data.append(accuracy)\r\n f1_test_data.append(f1)\r\n print(f\"Accuracy after query {i + 1}: {accuracy}\")\r\n print(f\"F1 Score after query {i + 1}: {f1}\")\r\n\r\n # Early stopping logic\r\n if f1 \u003e best_f1_score:\r\n best_f1_score = f1\r\n wait = 0\r\n else:\r\n wait += 1\r\n if wait \u003e= patience:\r\n print(f\"Stopping early after {i + 1} queries due to no improvement in F1 score.\")\r\n break\r\n\r\n total_samples += len(query_idx)\r\n print(f\"Total samples used for training after query {i + 1}: {total_samples}\")\r\n POWER += 0.25\r\n torch.cuda.empty_cache()\r\n\r\n```\r\n ","author":{"url":"https://github.com/shadikhamsehh","@type":"Person","name":"shadikhamsehh"},"datePublished":"2024-09-10T20:16:12.000Z","interactionStatistic":{"@type":"InteractionCounter","interactionType":"https://schema.org/CommentAction","userInteractionCount":0},"url":"https://github.com/191/modAL/issues/191"}