Title: index.find() tries to reshape and fails · Issue #1822 · docarray/docarray · GitHub
Description: Initial Checks I have read and followed the docs and still think this is a bug Description Apologies the title of this is not the best. I have a very odd case and can't seem to understand what is causing it. I have also failed at recreat...
Opengraph URL: https://github.com/docarray/docarray/issues/1822
Author: nikhilmakan02 (https://github.com/nikhilmakan02)
Date published: 2023-10-12T05:28:01.000Z
Comments: 10

### Initial Checks

- [X] I have read and followed [the docs](https://docs.docarray.org/) and still think this is a bug

### Description

Apologies, the title of this issue is not the best. I have a very odd case and can't work out what is causing it. I have also failed to recreate the issue in a simpler example.

I have a `DocList` in which every document is built by the same process, although the data obviously differs from doc to doc. I am using the hnswlib backend.

The `DocList` builds without any problems. The issue appears when I run `.find()` on its individual elements: some calls succeed and some fail. The error raised by the failing calls is shown in the traceback below.

Code Snippet:
```python
import logging
import time

from docarray import BaseDoc, DocList
from docarray.index import HnswDocumentIndex
from docarray.typing import NdArray

logger = logging.getLogger(__name__)


class AddressDoc(BaseDoc):
    ELID: int
    FULL_ADDRESS: str
    EMBEDDINGS: NdArray[768]


def build_doc_list(data):
    st = time.time()
    dl = DocList[AddressDoc](
        AddressDoc(
            ELID=0000000,
            FULL_ADDRESS="",
            EMBEDDINGS=d["EMBEDDINGS"],
        )
        for d in data
    )
    logger.info(f"Doc list created... {time.time()-st}")
    return dl


# `db_path` and `data` are defined elsewhere and are not shown in this report.
doc_index = HnswDocumentIndex[AddressDoc](work_dir=db_path)
dl = build_doc_list(data)

# This works!
results = doc_index.find(dl[2], search_field="EMBEDDINGS", limit=1)

# This doesn't!
results = doc_index.find(dl[3], search_field="EMBEDDINGS", limit=1)

type(dl[2].EMBEDDINGS) == type(dl[3].EMBEDDINGS)  # returns True
type(dl[2].EMBEDDINGS.shape) == type(dl[3].EMBEDDINGS.shape)  # returns True
```

I have compared `dl[2]` and `dl[3]` left, right, and center and can't find the problem. The embeddings arrays in both documents have the same shape, which I have checked with numpy (`.shape`, `.ndim`, `.size`). I can't see what difference between the two causes the error below.

Traceback below:
```
File /usr/local/lib/python3.11/site-packages/docarray/index/abstract.py:503, in BaseDocIndex.find(self, query, search_field, limit, **kwargs)
    501 query_vec = query
    502 query_vec_np = self._to_numpy(query_vec)
--> 503 docs, scores = self._find(
    504 query_vec_np, search_field=search_field, limit=limit, **kwargs
    505 )
    507 if isinstance(docs, List) and not isinstance(docs, DocList):
    508 docs = self._dict_list_to_docarray(docs)

File /usr/local/lib/python3.11/site-packages/docarray/index/backends/hnswlib.py:328, in HnswDocumentIndex._find(self, query, limit, search_field)
    324 def _find(
...
--> 197 return cls._docarray_from_native(x.reshape(source.shape))
    198 elif len(source.shape) > 0:
    199 return cls._docarray_from_native(np.zeros(source.shape))

ValueError: cannot reshape array of size 768 into shape (768,768)
```

### Example Code

_No response_

### Python, DocArray & OS Version

```Text
0.39.0
```

### Affected Components

- [X] [Vector Database / Index](https://docs.docarray.org/user_guide/storing/docindex/)
- [ ] [Representing](https://docs.docarray.org/user_guide/representing/first_step)
- [ ] [Sending](https://docs.docarray.org/user_guide/sending/first_step/)
- [ ] [storing](https://docs.docarray.org/user_guide/storing/first_step/)
- [ ] [multi modal data type](https://docs.docarray.org/data_types/first_steps/)
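
For readers unfamiliar with the final `ValueError` in the traceback above, here is a minimal sketch of the arithmetic behind the message. It is not a diagnosis of the bug and says nothing about why a `(768, 768)` target shape appears in the first place. It only shows that a reshape needs the element count to equal the product of the target shape. The helper `can_reshape` is a hypothetical name introduced for illustration, and it ignores NumPy's `-1` placeholder dimension.

```python
import math


def can_reshape(size: int, target_shape: tuple) -> bool:
    """Hypothetical helper: a reshape is possible only when element counts match."""
    return size == math.prod(target_shape)


embedding_size = 768  # element count reported in the error message
required = math.prod((768, 768))  # elements a (768, 768) array would have to hold

print(required)                                  # 589824
print(can_reshape(embedding_size, (768,)))       # True
print(can_reshape(embedding_size, (768, 768)))   # False
```

In other words, the failing call is trying to fit 768 values into a shape that needs 589,824 of them, so the open question in this report is where the second `768` dimension comes from.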