Title: Discussion: Pushing batches of data to online store: Should `conn.commit()` happen in the for loop or after? · Issue #4036 · feast-dev/feast · GitHub
Description: This piece of code in the online_write_batch function in the Postgres online store pushes data in batches to the online store. I was wondering whether it makes more sense to put the conn.commit() inside the for loop, or after the for loo...
Opengraph URL: https://github.com/feast-dev/feast/issues/4036
Author: job-almekinders (https://github.com/job-almekinders)
Date published: 2024-03-23T10:55:18.000Z
Comments: 4
URL: https://github.com/feast-dev/feast/issues/4036

[This](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/online_stores/contrib/postgres.py#L80-L101) piece of code in the `online_write_batch` function in the Postgres online store pushes data in batches to the online store.

I was wondering whether it makes more sense to put the `conn.commit()` inside the for loop, or after the for loop. I would love to hear the different trade-offs between the two options!

Copy of the code snippet here:

```python
batch_size = 5000
for i in range(0, len(insert_values), batch_size):
    cur_batch = insert_values[i : i + batch_size]
    execute_values(
        cur,
        sql.SQL(
            """
            INSERT INTO {}
            (entity_key, feature_name, value, event_ts, created_ts)
            VALUES %s
            ON CONFLICT (entity_key, feature_name) DO
            UPDATE SET
                value = EXCLUDED.value,
                event_ts = EXCLUDED.event_ts,
                created_ts = EXCLUDED.created_ts;
            """,
        ).format(sql.Identifier(_table_id(project, table))),
        cur_batch,
        page_size=batch_size,
    )
    conn.commit()  # << This is the point of interest. Should this be placed in the loop or after?
```
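One way to make the atomicity side of this trade-off concrete is a small experiment. The sketch below is not the Feast implementation: it uses Python's built-in `sqlite3` module as a stand-in for Postgres and psycopg2, and the `features` table, `make_conn`, `write_batches`, and `count_rows` are hypothetical names introduced only for illustration. It writes six rows in batches of two, where the first row of the third batch violates a `NOT NULL` constraint, and compares what remains in the table under each commit placement.

```python
import sqlite3

# Plain INSERT is enough to illustrate the commit placement;
# the real Feast query is an upsert against Postgres.
INSERT_SQL = "INSERT INTO features (entity_key, feature_name, value) VALUES (?, ?, ?)"


def make_conn():
    """Create an in-memory database with a hypothetical features table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE features ("
        "entity_key TEXT, feature_name TEXT, value TEXT NOT NULL, "
        "PRIMARY KEY (entity_key, feature_name))"
    )
    conn.commit()
    return conn


def write_batches(conn, rows, batch_size, commit_per_batch):
    """Write rows in batches; return True if every batch was committed."""
    cur = conn.cursor()
    try:
        for i in range(0, len(rows), batch_size):
            cur.executemany(INSERT_SQL, rows[i : i + batch_size])
            if commit_per_batch:
                conn.commit()  # option 1: commit inside the loop
        if not commit_per_batch:
            conn.commit()  # option 2: one commit after the loop
        return True
    except sqlite3.Error:
        conn.rollback()  # discards only work that has not been committed yet
        return False


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]


# Six rows; the fifth has a NULL value and fails the NOT NULL constraint.
rows = [(f"entity_{i}", "feature", "v") for i in range(6)]
rows[4] = ("entity_4", "feature", None)

conn_a = make_conn()
write_batches(conn_a, rows, batch_size=2, commit_per_batch=True)
per_batch_count = count_rows(conn_a)

conn_b = make_conn()
write_batches(conn_b, rows, batch_size=2, commit_per_batch=False)
single_commit_count = count_rows(conn_b)

print(per_batch_count)      # 4: the two batches committed before the failure remain
print(single_commit_count)  # 0: the whole write was rolled back
```

In this sketch, committing inside the loop keeps each transaction short but can leave the table partially updated when a later batch fails, while a single commit after the loop makes the whole write all-or-nothing. Performance and locking behaviour in Postgres are not covered by this example and would need to be measured separately.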