| og:image:alt | What
This PR adds a series of 2-6 bit quantization methods, along with quantization mixes, as proposed in #1240 and #1256. Scalar, AVX2, ARM_NEON, and CUDA implementations are provided.
Why
This is... |
| og:author:username | ikawrakow |
|
ggml-org
| https://github.com/ggml-org |
| llama.cpp | https://github.com/ggml-org/llama.cpp |
| ggerganov | https://github.com/ggerganov |
| master | https://github.com/ggml-org/llama.cpp/tree/master |
| ik/k_quants | https://github.com/ggml-org/llama.cpp/tree/ik/k_quants |
| Conversation | https://github.com/ggml-org/llama.cpp/pull/1684 |
| Commits (32) | https://github.com/ggml-org/llama.cpp/pull/1684/commits |
| Checks | https://github.com/ggml-org/llama.cpp/pull/1684/checks |
| Files changed | https://github.com/ggml-org/llama.cpp/pull/1684/files |
| k-quants | https://github.com/ggml-org/llama.cpp/pull/1684#top |
|
| https://github.com/ikawrakow |
| ikawrakow | https://github.com/ikawrakow |
| Jun 3, 2023 | https://github.com/ggml-org/llama.cpp/pull/1684#issue-1739619305 |
| #1240 | https://github.com/ggml-org/llama.cpp/issues/1240 |
| #1256 | https://github.com/ggml-org/llama.cpp/issues/1256 |
| https://private-user-images.githubusercontent.com/48489457/243093269-07aa49f0-4951-407f-9789-0b5a01ce95b8.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzEyMzQ0OTMsIm5iZiI6MTc3MTIzNDE5MywicGF0aCI6Ii80ODQ4OTQ1Ny8yNDMwOTMyNjktMDdhYTQ5ZjAtNDk1MS00MDdmLTk3ODktMGI1YTAxY2U5NWI4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNjAyMTYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjYwMjE2VDA5Mjk1M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQ0N2VlMDQ2ZTYxNDIzOTVjMTdmNjhmNDNjMmFlMTY0NmYyYzZkN2E0NzVkMGUyZjE2ODI3MDBjNWQ4ODM5NDcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.kIgAi_ZzqVgEgImmBvGY8HJXlC1LqgZlDnsYdwni_ms |
| this table on the main page | https://github.com/ggerganov/llama.cpp#quantization |
| https://private-user-images.githubusercontent.com/48489457/243100011-365b503c-086a-4f41-8a7a-3c0957f75219.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzEyMzQ0OTMsIm5iZiI6MTc3MTIzNDE5MywicGF0aCI6Ii80ODQ4OTQ1Ny8yNDMxMDAwMTEtMzY1YjUwM2MtMDg2YS00ZjQxLThhN2EtM2MwOTU3Zjc1MjE5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNjAyMTYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjYwMjE2VDA5Mjk1M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTI0ZGU5ZTRiYTIwNDQ3ZGU5NGM4OTY2MTE1MTg1YWM3MzkwMDFjMjE0NzM5Y2M2MWMwZDRkZGU3Yzk3OGUzMWImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.N3qzMTH4BSb7TosQPzxa9gt4_Dcrng8n3NcqpcpFneE |
| @ggerganov | https://github.com/ggerganov |
| main page | https://github.com/ggerganov/llama.cpp#quantization |
| June 3, 2023 14:43 | https://github.com/ggml-org/llama.cpp/pull/1684#commits-pushed-8673a41 |
| Starting to add k-quantization to ggml | https://github.com/ggml-org/llama.cpp/pull/1684/commits/8673a41385048f2b1089b8fc29df1e1020ab683a |
| 8673a41 | https://github.com/ggml-org/llama.cpp/pull/1684/commits/8673a41385048f2b1089b8fc29df1e1020ab683a |
| Adding Q3_K and Q8_K (de)-quantization | https://github.com/ggml-org/llama.cpp/pull/1684/commits/b4f71347ff51a90c842d6240f9a8628d03c3d2ac |
| b4f7134 | https://github.com/ggml-org/llama.cpp/pull/1684/commits/b4f71347ff51a90c842d6240f9a8628d03c3d2ac |
| Q3_K now working on CUDA and AVX2/scalar | https://github.com/ggml-org/llama.cpp/pull/1684/commits/c93cce3a450f23c1678135929caf0f177052f132 |
| c93cce3 | https://github.com/ggml-org/llama.cpp/pull/1684/commits/c93cce3a450f23c1678135929caf0f177052f132 |
| Some improvement for Q3_K on CUDA | https://github.com/ggml-org/llama.cpp/pull/1684/commits/a3c06730890f7a2c3521724131db458865f99d05 |
| a3c0673 | https://github.com/ggml-org/llama.cpp/pull/1684/commits/a3c06730890f7a2c3521724131db458865f99d05 |
| Some more CUDA optimizations for Q3_K | https://github.com/ggml-org/llama.cpp/pull/1684/commits/3d8b1de3f765ecc50df608db2e0e95870e8cf9b2 |
| 3d8b1de | https://github.com/ggml-org/llama.cpp/pull/1684/commits/3d8b1de3f765ecc50df608db2e0e95870e8cf9b2 |
| Adding Q4_K - scalar, AVX2, CUDA | https://github.com/ggml-org/llama.cpp/pull/1684/commits/a0b8e9f3c90e482dbe0ca82f45f585de24f1ba67 |
| a0b8e9f | https://github.com/ggml-org/llama.cpp/pull/1684/commits/a0b8e9f3c90e482dbe0ca82f45f585de24f1ba67 |
| Adding Q6_K - scalar, AVX2, CUDA | https://github.com/ggml-org/llama.cpp/pull/1684/commits/cf221afb555a945be4d1e4153e38808a9d21a4cb |
| cf221af | https://github.com/ggml-org/llama.cpp/pull/1684/commits/cf221afb555a945be4d1e4153e38808a9d21a4cb |
| Adding Q5_K - scalar, AVX2, CUDA | https://github.com/ggml-org/llama.cpp/pull/1684/commits/b835d0f49f2e06e1ee918e799cd1316073ee8db7 |
| b835d0f | https://github.com/ggml-org/llama.cpp/pull/1684/commits/b835d0f49f2e06e1ee918e799cd1316073ee8db7 |
| Per convention, all QX_K quantizations use Q5_K for output.weight | https://github.com/ggml-org/llama.cpp/pull/1684/commits/5c5191ab68f28bc24ae26303d73a7ad08015880a |
| 5c5191a | https://github.com/ggml-org/llama.cpp/pull/1684/commits/5c5191ab68f28bc24ae26303d73a7ad08015880a |
| Adding quantization mixes | https://github.com/ggml-org/llama.cpp/pull/1684/commits/d537b97cb812e896b0319d69838fa1adb8e48585 |
| d537b97 | https://github.com/ggml-org/llama.cpp/pull/1684/commits/d537b97cb812e896b0319d69838fa1adb8e48585 |
| Quantization mixes: didn't quite get what I wanted in the last commit | https://github.com/ggml-org/llama.cpp/pull/1684/commits/54f808db2bae036a370a4b990e8fabe8aa8aced0 |
| 54f808d | https://github.com/ggml-org/llama.cpp/pull/1684/commits/54f808db2bae036a370a4b990e8fabe8aa8aced0 |
| Q4_K dot product for ARM_NEON | https://github.com/ggml-org/llama.cpp/pull/1684/commits/a2533a72a3d9032a8e45b38e9e50d190be242cfe |
| a2533a7 | https://github.com/ggml-org/llama.cpp/pull/1684/commits/a2533a72a3d9032a8e45b38e9e50d190be242cfe |
| Q6_K dot product for ARM_NEON | https://github.com/ggml-org/llama.cpp/pull/1684/commits/5ca15ce1551728495e2a8dc01fe40b51292421ad |
| 5ca15ce | https://github.com/ggml-org/llama.cpp/pull/1684/commits/5ca15ce1551728495e2a8dc01fe40b51292421ad |
| Q5_K dot product for ARM_NEON | https://github.com/ggml-org/llama.cpp/pull/1684/commits/a197eb50d1f5739a76cedd2e824cd30e46bcfcad |
| a197eb5 | https://github.com/ggml-org/llama.cpp/pull/1684/commits/a197eb50d1f5739a76cedd2e824cd30e46bcfcad |
| Adding Q3_K dot for ARM_NEON | https://github.com/ggml-org/llama.cpp/pull/1684/commits/13264fa067e200fe891977d48862ef610ad24daa |
| 13264fa | https://github.com/ggml-org/llama.cpp/pull/1684/commits/13264fa067e200fe891977d48862ef610ad24daa |
| A very slightly faster ARM_NEON Q3_K dot | https://github.com/ggml-org/llama.cpp/pull/1684/commits/4faa040c20e2f92d2c7e44cf24146400200b89fa |
| 4faa040 | https://github.com/ggml-org/llama.cpp/pull/1684/commits/4faa040c20e2f92d2c7e44cf24146400200b89fa |
| Adding Q2_K - just CUDA for now | https://github.com/ggml-org/llama.cpp/pull/1684/commits/b439efb7129c5f2eca243116c158d2a056322273 |
| b439efb | https://github.com/ggml-org/llama.cpp/pull/1684/commits/b439efb7129c5f2eca243116c158d2a056322273 |
| Adding scalar and AVX2 Q2_K dot | https://github.com/ggml-org/llama.cpp/pull/1684/commits/8516fdf728d90462c48982fd5e8f56dad07aa823 |
| 8516fdf | https://github.com/ggml-org/llama.cpp/pull/1684/commits/8516fdf728d90462c48982fd5e8f56dad07aa823 |
| Adding ARM_NEON Q2_K dot | https://github.com/ggml-org/llama.cpp/pull/1684/commits/6ec70579cb266fbac560bac8dc053a176cab381c |
| 6ec7057 | https://github.com/ggml-org/llama.cpp/pull/1684/commits/6ec70579cb266fbac560bac8dc053a176cab381c |
| A slightly faster ARM_NEON Q2_K dot | https://github.com/ggml-org/llama.cpp/pull/1684/commits/7bcc37676ad08c8574b1a8afc4d6c7ac56a86c5d |
| 7bcc376 | https://github.com/ggml-org/llama.cpp/pull/1684/commits/7bcc37676ad08c8574b1a8afc4d6c7ac56a86c5d |
| Fixed bug in Q2_K CUDA dot product kernel | https://github.com/ggml-org/llama.cpp/pull/1684/commits/e51ce72e03fe487f5b1d614287a6724559882afe |
| e51ce72 | https://github.com/ggml-org/llama.cpp/pull/1684/commits/e51ce72e03fe487f5b1d614287a6724559882afe |
| Don't print zeros/NaNs when no count histogram has been collected | https://github.com/ggml-org/llama.cpp/pull/1684/commits/c5959d53ffa1ce52c69ec983ad40a570af559551 |
| c5959d5 | https://github.com/ggml-org/llama.cpp/pull/1684/commits/c5959d53ffa1ce52c69ec983ad40a570af559551 |
| A 10% faster CUDA vector dot kernel for Q3_K | https://github.com/ggml-org/llama.cpp/pull/1684/commits/9a9c5a0c80ea0a279a56214cc10a1578f51bb672 |
| 9a9c5a0 | https://github.com/ggml-org/llama.cpp/pull/1684/commits/9a9c5a0c80ea0a279a56214cc10a1578f51bb672 |
| A slightly faster Q4_K AVX2 dot product | https://github.com/ggml-org/llama.cpp/pull/1684/commits/894210a3519eb39148189ea7a4094aa076bee2d7 |
| 894210a | https://github.com/ggml-org/llama.cpp/pull/1684/commits/894210a3519eb39148189ea7a4094aa076bee2d7 |
| A slightly faster ARM_NEON Q4_K dot product | https://github.com/ggml-org/llama.cpp/pull/1684/commits/abd99a89a780843cd803fa75d318f98370601432 |
| abd99a8 | https://github.com/ggml-org/llama.cpp/pull/1684/commits/abd99a89a780843cd803fa75d318f98370601432 |
| Minor | https://github.com/ggml-org/llama.cpp/pull/1684/commits/8f5d42db9b4a5f9b86ffc503d510fb6a5ce06434 |
| 8f5d42d | https://github.com/ggml-org/llama.cpp/pull/1684/commits/8f5d42db9b4a5f9b86ffc503d510fb6a5ce06434 |
| https://github.com/ikawrakow |
| ikawrakow | https://github.com/ikawrakow |
| ggerganov | https://github.com/ggerganov |
| June 3, 2023 15:24 | https://github.com/ggml-org/llama.cpp/pull/1684#event-9421648120 |
| Fix quantization error test | https://github.com/ggml-org/llama.cpp/pull/1684/commits/6ef13823b81ce9d0f15e58e947d25a82dad83fd3 |
| 6ef1382 | https://github.com/ggml-org/llama.cpp/pull/1684/commits/6ef13823b81ce9d0f15e58e947d25a82dad83fd3 |
| https://github.com/mofosyne |
| mofosyne | https://github.com/mofosyne |
|
Tensor Encoding Scheme
| https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22Tensor%20Encoding%20Scheme%22 |
|
Review Complexity : High
| https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22Review%20Complexity%20%3A%20High%22 |
| May 25, 2024 | https://github.com/ggml-org/llama.cpp/pull/1684#event-12930819627 |
| https://github.com/Seedmanc |
| Seedmanc | https://github.com/Seedmanc |
| May 29, 2024 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2137025238 |
| https://github.com/mofosyne |
| mofosyne | https://github.com/mofosyne |
|
May 29, 2024
| https://github.com/ggml-org/llama.cpp/pull/1684#ref-pullrequest-2319448482 |
|
Performance improvements on Arm for legacy and k-quants
mozilla-ai/llamafile#453
| https://github.com/mozilla-ai/llamafile/pull/453 |
| https://github.com/kaizizzzzzz |
| kaizizzzzzz | https://github.com/kaizizzzzzz |
| Jul 30, 2024 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2257278836 |
| https://github.com/Green-Sky |
| Green-Sky | https://github.com/Green-Sky |
| Jul 30, 2024 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2257773570 |
| @kaizizzzzzz | https://github.com/kaizizzzzzz |
| https://github.com/kaizizzzzzz |
| kaizizzzzzz | https://github.com/kaizizzzzzz |
| Jul 30, 2024 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2258513294 |
| @Green-Sky | https://github.com/Green-Sky |
| https://github.com/kaizizzzzzz |
| kaizizzzzzz | https://github.com/kaizizzzzzz |
| Jul 31, 2024 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2259565236 |
|
A Visual Guide to Quantization
guevara/read-it-later#11678
| https://github.com/guevara/read-it-later/issues/11678 |
|
A Visual Guide to Quantization
guevara/read-it-later#11692
| https://github.com/guevara/read-it-later/issues/11692 |
| https://github.com/asomoza |
| asomoza | https://github.com/asomoza |
|
Aug 19, 2024
| https://github.com/ggml-org/llama.cpp/pull/1684#ref-issue-2462431761 |
|
NF4 Flux params in diffusers
huggingface/diffusers#9165
| https://github.com/huggingface/diffusers/issues/9165 |
| https://github.com/fedric95 |
| fedric95 | https://github.com/fedric95 |
| Sep 8, 2024 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2336822836 |
| https://github.com/BarfingLemurs |
| BarfingLemurs | https://github.com/BarfingLemurs |
|
Sep 26, 2024
| https://github.com/ggml-org/llama.cpp/pull/1684#ref-issue-2549262713 |
|
about precision loss
microsoft/T-MAC#52
| https://github.com/microsoft/T-MAC/issues/52 |
| https://github.com/HAOYON-666 |
| HAOYON-666 | https://github.com/HAOYON-666 |
| Sep 29, 2024 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2381062705 |
| https://github.com/HAOYON-666 |
| HAOYON-666 | https://github.com/HAOYON-666 |
| Sep 29, 2024 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2381062870 |
| https://github.com/QingtaoLi1 |
| QingtaoLi1 | https://github.com/QingtaoLi1 |
|
Nov 5, 2024
| https://github.com/ggml-org/llama.cpp/pull/1684#ref-pullrequest-2634781604 |
|
Introduce New Lookup-Table(LUT)-Based Matrix Multiplication Method
#10181
| https://github.com/ggml-org/llama.cpp/pull/10181 |
| https://github.com/SamuelHafner |
| SamuelHafner | https://github.com/SamuelHafner |
| Nov 13, 2024 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2473738608 |
| https://github.com/Green-Sky |
| Green-Sky | https://github.com/Green-Sky |
| Nov 13, 2024 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2474094689 |
| @SamuelHafner | https://github.com/SamuelHafner |
| @ikawrakow | https://github.com/ikawrakow |
| https://github.com/SamuelHafner |
| SamuelHafner | https://github.com/SamuelHafner |
| Nov 13, 2024 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2474235194 |
| @Green-Sky | https://github.com/Green-Sky |
| https://github.com/ikawrakow |
| ikawrakow | https://github.com/ikawrakow |
| Nov 13, 2024 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2474462323 |
| https://github.com/Bearsaerker |
| Bearsaerker | https://github.com/Bearsaerker |
|
Mar 12, 2025
| https://github.com/ggml-org/llama.cpp/pull/1684#ref-issue-2913690728 |
|
Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache.
#12352
| https://github.com/ggml-org/llama.cpp/issues/12352 |
| https://github.com/jiafatom |
| jiafatom | https://github.com/jiafatom |
|
Apr 15, 2025
| https://github.com/ggml-org/llama.cpp/pull/1684#ref-pullrequest-2986449850 |
|
k quant
intel/neural-compressor#2169
| https://github.com/intel/neural-compressor/pull/2169 |
| https://github.com/lgyStoic |
| lgyStoic | https://github.com/lgyStoic |
| Apr 19, 2025 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2816490255 |
| #1256 | https://github.com/ggml-org/llama.cpp/issues/1256 |
| @ikawrakow | https://github.com/ikawrakow |
| https://github.com/user-attachments/assets/4bb05eb0-f8ed-46c4-8b1f-251de1a77599 |
| https://github.com/jrudolph |
| jrudolph | https://github.com/jrudolph |
| Apr 19, 2025 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2816577012 |
| https://github.com/ggml-org/llama.cpp/blob/6408210082cc0a61b992b487be7e2ff2efbb9e36/ggml/src/ggml-common.h#L175 | https://github.com/ggml-org/llama.cpp/blob/6408210082cc0a61b992b487be7e2ff2efbb9e36/ggml/src/ggml-common.h#L175 |
| https://github.com/nickfraser |
| nickfraser | https://github.com/nickfraser |
|
May 28, 2025
| https://github.com/ggml-org/llama.cpp/pull/1684#ref-pullrequest-3077269885 |
|
Feat (brevitas_examples/llm): GGUF export
Xilinx/brevitas#1291
| https://github.com/Xilinx/brevitas/pull/1291 |
| https://github.com/mmwillet |
| mmwillet | https://github.com/mmwillet |
|
May 31, 2025
| https://github.com/ggml-org/llama.cpp/pull/1684#ref-issue-2998177116 |
|
Add and test quantization from Kokoro
mmwillet/TTS.cpp#15
| https://github.com/mmwillet/TTS.cpp/issues/15 |
| https://github.com/MaoJianwei |
| MaoJianwei | https://github.com/MaoJianwei |
| Jun 23, 2025 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2995316277 |
| https://github.com/ikawrakow |
| ikawrakow | https://github.com/ikawrakow |
|
Jun 27, 2025
| https://github.com/ggml-org/llama.cpp/pull/1684#ref-pullrequest-3178789455 |
|
Use cuBLAS for large batches and quants with block size 16
ikawrakow/ik_llama.cpp#559
| https://github.com/ikawrakow/ik_llama.cpp/pull/559 |
| https://github.com/jiangshibiao |
| jiangshibiao | https://github.com/jiangshibiao |
| Sep 7, 2025 | https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-3263707425 |
| https://github.com/michalharakal |
| michalharakal | https://github.com/michalharakal |
|
Nov 23, 2025
| https://github.com/ggml-org/llama.cpp/pull/1684#ref-issue-3656240522 |
|
[R] reasearch quantization
SKaiNET-developers/SKaiNET#208
| https://github.com/SKaiNET-developers/SKaiNET/issues/208 |
|
| https://github.com/apps/github-actions |
|
github-actions[bot]
| https://github.com/apps/github-actions |
|
| https://github.com/ggml-org/llama.cpp/pull/1684/files/af275faececaefb2e479001a579814a51d4f0067 |
|
| https://github.com/ggerganov |
|
ggerganov
| https://github.com/ggerganov |
|
| https://github.com/ggml-org/llama.cpp/pull/1684/files/af275faececaefb2e479001a579814a51d4f0067 |
|
| https://github.com/fleszar1 |
|
fleszar1
| https://github.com/fleszar1 |
|
| https://github.com/ggml-org/llama.cpp/pull/1684/files/af275faececaefb2e479001a579814a51d4f0067 |
|
high priority
| https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22high%20priority%22 |
|
Less than 4 bits
| https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22Less%20than%204%20bits%22 |
|
research 🔬
| https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22research%20%F0%9F%94%AC%22 |
|
Review Complexity : High
| https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22Review%20Complexity%20%3A%20High%22 |
|
Tensor Encoding Scheme
| https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22Tensor%20Encoding%20Scheme%22 |
|
| https://github.com/ikawrakow |
|
| https://github.com/KerfuffleV2 |
|
| https://github.com/Midaychi |
|
| https://github.com/EwoutH |
|
| https://github.com/0cc4m |
|
| https://github.com/TheBloke |
|
| https://github.com/AlvL1225 |
|
| https://github.com/shouyiwang |
|
| https://github.com/alankila |
|
| https://github.com/Alumniminium |
|
| https://github.com/bbecausereasonss |
|
| https://github.com/mirek190 |
|
| https://github.com/okpatil4u |
|
| https://github.com/x4080 |
|
| https://github.com/pbronez |
|
| https://github.com/cosmic-snow |
|
| https://github.com/viperwasp |
|
| https://github.com/zhaohb |
|
| https://github.com/JohannesGaessler |
|
| https://github.com/RonanKMcGovern |
|