Title: Proof of concept for allowing non-sklearn estimators by adelevie · Pull Request #160 · modAL-python/modAL · GitHub
Description: Not sure if there is any desire for this feature, but in this PR I have sketched out a way to use virtually any estimator type with the `ActiveLearner` and `BayesianOptimizer` classes.

## Motivation

Allow modAL to use other training and inference facilities, such as HuggingFace models trained with the `Trainer` class, AWS SageMaker Estimators, etc. With this added flexibility, training and inference do not even need to run on the same hardware as the modAL code. This opens the suite of sampling methods here to many new applications, particularly resource-intensive deep learning models that don't fit well under the sklearn interface.

## Implementation

Rather than calling the classic sklearn estimator methods (`fit`, `predict`, `predict_proba`, and `score`) directly, this PR adds a layer of callables that can be overridden: `fit_func`, `predict_func`, `predict_proba_func`, and `score_func`.

```python
def __init__(self,
             estimator: BaseEstimator,
             query_strategy: Callable = uncertainty_sampling,
             X_training: Optional[modALinput] = None,
             y_training: Optional[modALinput] = None,
             bootstrap_init: bool = False,
             on_transformed: bool = False,
             force_all_finite: bool = True,
             fit_func: FitFunction = SKLearnFitFunction(),
             predict_func: PredictFunction = SKLearnPredictFunction(),
             predict_proba_func: PredictProbaFunction = SKLearnPredictProbaFunction(),
             score_func: ScoreFunction = SKLearnScoreFunction(),
             **fit_kwargs
             ) -> None:
    ...  # body elided; this is an excerpt of the learner's constructor
```

I added sklearn implementations of each as the defaults (and included their corresponding `Protocol` classes as well). Here's how `fit` works:

```python
from typing import Protocol

from sklearn.base import BaseEstimator


class FitFunction(Protocol):
    # GenericEstimator is the PR's generic estimator type (definition not shown here).
    def __call__(self, estimator: GenericEstimator, X, y, **kwargs) -> GenericEstimator:
        raise NotImplementedError

# ...

class SKLearnFitFunction(FitFunction):
    def __call__(self, estimator: BaseEstimator, X, y, **kwargs) -> BaseEstimator:
        # Delegate to the estimator's own sklearn fit(), which returns the fitted estimator.
        return estimator.fit(X=X, y=y, **kwargs)
```

Note that the changes in this PR don't break any of the existing tests.
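To make the dispatch pattern concrete without needing modAL, SageMaker, or scikit-learn installed, here is a minimal, self-contained sketch. `MiniLearner`, `SKLearnStyleFit`, `SKLearnStylePredictProba`, and `MidpointModel` are hypothetical names, not part of modAL or this PR. The learner never calls estimator methods directly; it only goes through the `fit_func` and `predict_proba_func` callables, whose defaults simply forward to sklearn-style `fit`/`predict_proba` methods.

```python
import math
from typing import Any, List, Protocol, Sequence


class FitFunction(Protocol):
    """Any callable that trains `estimator` on (X, y) and returns the trained estimator."""

    def __call__(self, estimator: Any, X: Sequence, y: Sequence, **kwargs: Any) -> Any: ...


class PredictProbaFunction(Protocol):
    """Any callable that returns one row of class probabilities per instance in X."""

    def __call__(self, estimator: Any, X: Sequence, **kwargs: Any) -> List[List[float]]: ...


class SKLearnStyleFit:
    """Hypothetical default: forward to the estimator's own sklearn-style fit()."""

    def __call__(self, estimator, X, y, **kwargs):
        return estimator.fit(X, y, **kwargs)


class SKLearnStylePredictProba:
    """Hypothetical default: forward to the estimator's own sklearn-style predict_proba()."""

    def __call__(self, estimator, X, **kwargs):
        return estimator.predict_proba(X, **kwargs)


class MiniLearner:
    """Hypothetical stand-in for ActiveLearner that talks to the estimator only via callables."""

    def __init__(self, estimator: Any,
                 fit_func: FitFunction = SKLearnStyleFit(),
                 predict_proba_func: PredictProbaFunction = SKLearnStylePredictProba()) -> None:
        self.estimator = estimator
        self.fit_func = fit_func
        self.predict_proba_func = predict_proba_func

    def teach(self, X: Sequence, y: Sequence) -> None:
        self.estimator = self.fit_func(self.estimator, X, y)

    def query(self, X_pool: Sequence) -> int:
        # Uncertainty sampling: the least confident instance has the smallest top probability.
        proba = self.predict_proba_func(self.estimator, X_pool)
        uncertainty = [1.0 - max(row) for row in proba]
        return max(range(len(uncertainty)), key=uncertainty.__getitem__)


class MidpointModel:
    """Toy sklearn-style binary classifier on 1-D inputs; illustrative only."""

    def fit(self, X, y):
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        # Decision threshold halfway between the two class means.
        self.threshold_ = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict_proba(self, X):
        rows = []
        for x in X:
            p1 = 1.0 / (1.0 + math.exp(-(x - self.threshold_)))  # logistic curve around the threshold
            rows.append([1.0 - p1, p1])  # columns: [P(class 0), P(class 1)]
        return rows


learner = MiniLearner(MidpointModel())
learner.teach([0.0, 1.0, 9.0, 10.0], [0, 0, 1, 1])
print(learner.query([0.0, 4.8, 10.0]))  # prints 1: 4.8 sits closest to the threshold of 5.0
```

Because the callable types are `typing.Protocol`s, a type checker accepts any callable with a compatible signature, including a plain function, so explicitly subclassing the protocol is optional in this sketch.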
## Usage

When using SageMaker, we might implement `fit` and `predict_proba` like this:

```python
from typing import Dict, List, Union

import numpy as np
from modAL.models import ActiveLearner
from sagemaker.huggingface import HuggingFace, HuggingFacePredictor
from sagemaker.predictor import Predictor

# FitFunction and PredictProbaFunction are the Protocol classes added in this PR.


class CustomEstimator:
    hf_predictor: Union[HuggingFacePredictor, Predictor]
    hf_estimator: HuggingFace

    def __init__(self, hf_predictor: HuggingFacePredictor, hf_estimator: HuggingFace):
        self.hf_predictor = hf_predictor
        self.hf_estimator = hf_estimator


class CustomFitFunction(FitFunction):
    def __call__(self, estimator: CustomEstimator, X, y, **kwargs) -> CustomEstimator:
        # Notice we don't use `y`: the label is baked into the HuggingFace Dataset.
        # SageMaker's Estimator.fit() takes the training data as `inputs` and returns None,
        # so return the wrapper explicitly to honour the FitFunction contract.
        estimator.hf_estimator.fit(inputs=X, **kwargs)
        return estimator


class CustomPredictProbaFunction(PredictProbaFunction):
    @staticmethod
    def hf_prediction_to_proba(predictions: Union[List[Dict], object],
                               positive_class_label: str = 'LABEL_1',
                               negative_class_label: str = 'LABEL_0') -> np.ndarray:
        label_key: str = 'label'
        score_key: str = 'score'
        p = []
        for prediction in predictions:
            score = prediction[score_key]
            # Columns follow sklearn's sorted-class convention: [P(negative), P(positive)].
            if prediction[label_key] == positive_class_label:
                p.append([1.0 - score, score])
            elif prediction[label_key] == negative_class_label:
                p.append([score, 1.0 - score])
            else:
                # Skipping an unknown label would misalign rows with X, so fail loudly.
                raise ValueError(f"Unexpected label: {prediction[label_key]!r}")
        return np.array(p)

    def __call__(self, estimator: CustomEstimator, X, **kwargs) -> np.ndarray:
        return self.hf_prediction_to_proba(
            predictions=estimator.hf_predictor.predict(dict(inputs=X))
        )


# hf_predictor, hf_estimator and train_dataset are assumed to have been created beforehand.
estimator = CustomEstimator(hf_predictor=hf_predictor, hf_estimator=hf_estimator)
learner = ActiveLearner(
    estimator=estimator,
    fit_func=CustomFitFunction(),
    predict_proba_func=CustomPredictProbaFunction(),
    X_training=train_dataset  # a standard HuggingFace Dataset instead of the usual sklearn types for `X`
)
```

If you've made it this far, please forgive the clunkiness: this is a rough sketch of an idea I wanted to write down before I forgot it. Anyway, I'd love some feedback, and if you think this PR is worth finishing, let me know. For me, this would unlock a lot of really useful applications.
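The SageMaker usage example depends on a live endpoint, which makes a custom `predict_proba_func` awkward to test. A minimal sketch of one way around that, assuming the endpoint returns the `[{"label": ..., "score": ...}]` payload the usage code expects: swap in an offline stub for `hf_predictor` and exercise the callable locally. `StubPredictor`, `prediction_to_proba`, and `StubbablePredictProba` are hypothetical names, and this sketch orders columns as `[P(LABEL_0), P(LABEL_1)]`, matching scikit-learn's convention of sorted `classes_`.

```python
from types import SimpleNamespace
from typing import Dict, List, Sequence


class StubPredictor:
    """Hypothetical offline stand-in for a deployed HuggingFacePredictor.

    Returns [{"label": ..., "score": ...}] dicts looked up from a fixed table
    instead of calling a live endpoint.
    """

    def __init__(self, canned: Dict[str, Dict[str, float]]) -> None:
        self.canned = canned

    def predict(self, data: Dict[str, Sequence[str]]) -> List[Dict]:
        return [dict(self.canned[text]) for text in data["inputs"]]


def prediction_to_proba(predictions: List[Dict],
                        positive_label: str = "LABEL_1",
                        negative_label: str = "LABEL_0") -> List[List[float]]:
    """Turn top-label predictions into [P(negative), P(positive)] rows."""
    rows = []
    for pred in predictions:
        score = pred["score"]
        if pred["label"] == positive_label:
            rows.append([1.0 - score, score])
        elif pred["label"] == negative_label:
            rows.append([score, 1.0 - score])
        else:
            raise ValueError(f"unexpected label: {pred['label']!r}")
    return rows


class StubbablePredictProba:
    """predict_proba_func-style callable: (estimator, X) -> probability rows."""

    def __call__(self, estimator, X, **kwargs):
        return prediction_to_proba(estimator.hf_predictor.predict(dict(inputs=list(X))))


# Like CustomEstimator, the wrapper only needs an `hf_predictor` attribute.
estimator = SimpleNamespace(hf_predictor=StubPredictor({
    "great": {"label": "LABEL_1", "score": 0.75},
    "meh": {"label": "LABEL_0", "score": 0.5},
    "awful": {"label": "LABEL_0", "score": 0.875},
}))
proba = StubbablePredictProba()(estimator, ["great", "meh", "awful"])
print(proba)  # [[0.25, 0.75], [0.5, 0.5], [0.875, 0.125]]
```

Under least-confidence sampling, the `"meh"` row (`[0.5, 0.5]`) would be queried first. The same callable runs unchanged against a real predictor, because it only touches `estimator.hf_predictor.predict`.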
Opengraph URL: https://github.com/modAL-python/modAL/pull/160