Title: Loading audio tensors fails: ValueError: all input arrays must have the same shape · Issue #1875 · docarray/docarray · GitHub
Description: Initial Checks I have read and followed the docs and still think this is a bug Description I have created subclips of a video in .mp4 using ffmpeg (through moviepy): # moviepy.video.io.ffmpeg_tools.ffmpeg_extract_subclip def ffmpeg_extra...
Opengraph URL: https://github.com/docarray/docarray/issues/1875
Author: chrisammon3000 (https://github.com/chrisammon3000)
Date Published: 2024-03-06T20:13:45.000Z
Comments: 4

### Initial Checks

- [X] I have read and followed [the docs](https://docs.docarray.org/) and still think this is a bug

### Description

I have created subclips of a video in .mp4 using ffmpeg (through moviepy):
```py
# moviepy.video.io.ffmpeg_tools.ffmpeg_extract_subclip

def ffmpeg_extract_subclip(filename, t1, t2, targetname=None):
    """ Makes a new video file playing video file ``filename`` between
    the times ``t1`` and ``t2``. """
    name, ext = os.path.splitext(filename)
    if not targetname:
        T1, T2 = [int(1000*t) for t in [t1, t2]]
        targetname = "%sSUB%d_%d.%s" % (name, T1, T2, ext)

    cmd = [get_setting("FFMPEG_BINARY"),"-y",
           "-ss", "%0.2f"%t1,
           "-i", filename,
           "-t", "%0.2f"%(t2-t1),
           "-map", "0", "-vcodec", "copy", "-acodec", "copy", targetname]

    subprocess_call(cmd)
```

Output:
<img width="500" alt="image" src="https://github.com/docarray/docarray/assets/20426965/cf828df3-82e7-4eea-9e67-81838a1e8ecc">

The subclip path is passed to `VideoUrl`:
```py
subclip = VideoUrl("<subclip_path>")
```

Trying to load the tensors fails:
```py
tensors = subclip.load()
```
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[21], line 1
----> 1 tensors = subclip.load()

File ~/Projects/chrisammon3000/experiments/docarray/docarray-test/.venv/lib/python3.11/site-packages/docarray/typing/url/video_url.py:96, in VideoUrl.load(self, **kwargs)
     33 """
     34 Load the data from the url into a `NamedTuple` of
     35 [`VideoNdArray`][docarray.typing.VideoNdArray],
   (...)
     93 [`NdArray`][docarray.typing.NdArray] of the key frame indices.
     94 """
     95 buffer = self.load_bytes(**kwargs)
---> 96 return buffer.load()

File ~/Projects/chrisammon3000/experiments/docarray/docarray-test/.venv/lib/python3.11/site-packages/docarray/typing/bytes/video_bytes.py:86, in VideoBytes.load(self, **kwargs)
     84     audio = parse_obj_as(AudioNdArray, np.array(audio_frames))
     85 else:
---> 86     audio = parse_obj_as(AudioNdArray, np.stack(audio_frames))
     88 video = parse_obj_as(VideoNdArray, np.stack(video_frames))
     89 indices = parse_obj_as(NdArray, keyframe_indices)

File ~/Projects/chrisammon3000/experiments/docarray/docarray-test/.venv/lib/python3.11/site-packages/numpy/core/shape_base.py:449, in stack(arrays, axis, out, dtype, casting)
    447 shapes = {arr.shape for arr in arrays}
    448 if len(shapes) != 1:
--> 449     raise ValueError('all input arrays must have the same shape')
    451 result_ndim = arrays[0].ndim + 1
    452 axis = normalize_axis_index(axis, result_ndim)

ValueError: all input arrays must have the same shape
```

Stepping through the code shows that the first audio frame contains only 16 samples:
<img width="1000" alt="image" src="https://github.com/docarray/docarray/assets/20426965/26bad5b4-9077-4dd7-be5c-1bc2c6cb653a">

The second and all subsequent frames have 1024 samples:
<img width="1246" alt="image" src="https://github.com/docarray/docarray/assets/20426965/83e963b4-bafb-4a28-8fdc-6702f49d7106">

So the audio frames end up as arrays with different shapes, which `np.stack` rejects.

What I've tried:
- Adjusting the ffmpeg options, such as converting the audio to AAC and specifying the audio channels. This does fix the problem, but it takes about 10 times longer to create the subclips.
- Padding the arrays in a preprocessing step before reading them into DocArray. This would require reading and writing each subclip again.

If there is a way to handle the shape mismatch inside DocArray, that would be great, because it would let me create the subclips and model them as quickly as possible. It would need to be added to this block:
https://github.com/docarray/docarray/blob/f71a5e6af58b77fdeb15ba27abd0b7d40b84fd09/docarray/typing/bytes/video_bytes.py#L83-L86

### Example Code

```Python
import os
from pathlib import Path
import numpy as np
from docarray.typing import VideoUrl
from moviepy.video.io.ffmpeg_tools import ffmpeg_extract_subclip


def generate_subclips(parent_path, video_id, video_uri, video_duration, duration=60):
    # Subclips are written to <parent_path>/subclips
    subclips_path = Path(parent_path) / "subclips"
    subclips_path.mkdir(exist_ok=True)

    start_times = np.arange(0, video_duration, duration)
    end_times = np.append(start_times[1:], video_duration)
    clip_times = list(zip(start_times, end_times))

    for start_time, end_time in clip_times:
        # filename should have start_end seconds as part of the name
        output_file_path = subclips_path / f"{video_id}__{start_time}_{end_time}.{video_uri.suffix[1:]}"
        ffmpeg_extract_subclip(video_uri, start_time, end_time, targetname=output_file_path)

# Example usage
# parent_path = 'path/to/parent/directory'
# video_id = 'example_video_id'
# video_uri = Path('path/to/video.mp4')
# video_duration = 1200  # for example, 20 minutes
# generate_subclips(parent_path, video_id, video_uri, video_duration, duration=60)

def sort_key(path):
    """Sorts by the start time in the subclip file name
    For example: Fu7YkoRWKB8_Y__0_60.mp4 will sort by `0`
    """
    # Extract the integer after "__" from the filename
    return int(path.stem.split('__')[1].split('_')[0])

subclips_dir = Path(os.getcwd()).parent / "subclips"

# create subclips
# generate_subclips appends "subclips" itself, so pass the parent directory
generate_subclips(subclips_dir.parent, <video_id>, <video_uri>, <video_duration>, duration=60)
subclips_paths = sorted(subclips_dir.iterdir(), key=sort_key)
video_urls = [VideoUrl(str(subclip)) for subclip in subclips_paths]

# load tensors
# the first subclip might work...
subclip0 = VideoUrl(str(subclips_paths[0]))
subclip0_tensors = subclip0.load()

# but the second and other subclips throw the shape mismatch error
subclip1 = VideoUrl(str(subclips_paths[1]))
subclip1_tensors = subclip1.load()
```


### Python, DocArray & OS Version

```Text
0.40.0
```


### Affected Components

- [ ] [Vector Database / Index](https://docs.docarray.org/user_guide/storing/docindex/)
- [X] [Representing](https://docs.docarray.org/user_guide/representing/first_step)
- [ ] [Sending](https://docs.docarray.org/user_guide/sending/first_step/)
- [ ] [storing](https://docs.docarray.org/user_guide/storing/first_step/)
- [ ] [multi modal data type](https://docs.docarray.org/data_types/first_steps/)
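
For illustration, the shape mismatch reported above (a 16-sample first audio frame followed by 1024-sample frames) could be worked around before the `np.stack` call roughly as follows. This is only a sketch, not DocArray's actual behavior or an agreed fix: `pad_and_stack` is a hypothetical helper, and it assumes each audio frame is a 2-D array of shape `(channels, samples)`.

```python
import numpy as np


def pad_and_stack(frames, pad_value=0.0):
    """Hypothetical helper (not part of DocArray): right-pad each
    (channels, samples) frame to the longest sample count, then stack."""
    if not frames:
        raise ValueError("need at least one frame")
    arrays = [np.atleast_2d(np.asarray(f)) for f in frames]

    # Padding only addresses differing sample counts, not channel counts.
    channels = {a.shape[0] for a in arrays}
    if len(channels) != 1:
        raise ValueError(f"inconsistent channel counts: {sorted(channels)}")

    target = max(a.shape[1] for a in arrays)
    padded = [
        np.pad(a, ((0, 0), (0, target - a.shape[1])), constant_values=pad_value)
        for a in arrays
    ]
    return np.stack(padded)


# Mimic the reported situation: one short frame, then full-size frames.
frames = [np.ones((2, 16), dtype=np.float32)] + [
    np.ones((2, 1024), dtype=np.float32) for _ in range(3)
]
audio = pad_and_stack(frames)
print(audio.shape)  # (4, 2, 1024)
```

Note that zero-padding inserts silence after the short frame, so the result is not sample-accurate. If a per-frame axis is not required, joining the frames along the sample axis with `np.concatenate(frames, axis=-1)` would avoid the mismatch without adding samples.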