René's URL Explorer Experiment

Title: k-quants by ikawrakow · Pull Request #1684 · ggml-org/llama.cpp · GitHub

Open Graph Title: k-quants by ikawrakow · Pull Request #1684 · ggml-org/llama.cpp

X Title: k-quants by ikawrakow · Pull Request #1684 · ggml-org/llama.cpp

Description: LLM inference in C/C++. Contribute to ggml-org/llama.cpp development by creating an account on GitHub.

Open Graph Description: What: This PR adds a series of 2-6 bit quantization methods, along with quantization mixes, as proposed in #1240 and #1256. Scalar, AVX2, ARM_NEON, and CUDA implementations are provided. Why: This is...

X Description: What: This PR adds a series of 2-6 bit quantization methods, along with quantization mixes, as proposed in #1240 and #1256. Scalar, AVX2, ARM_NEON, and CUDA implementations are provided. Why: This is...

Opengraph URL: https://github.com/ggml-org/llama.cpp/pull/1684

X: @github

Domain: github.com

twitter:image	https://opengraph.githubassets.com/79b184c3bf64df24abea9e326abf9e384ebcb580c1fda3ff8e0c20e638fe514e/ggml-org/llama.cpp/pull/1684
twitter:card	summary_large_image
og:image	https://opengraph.githubassets.com/79b184c3bf64df24abea9e326abf9e384ebcb580c1fda3ff8e0c20e638fe514e/ggml-org/llama.cpp/pull/1684
og:image:alt	What: This PR adds a series of 2-6 bit quantization methods, along with quantization mixes, as proposed in #1240 and #1256. Scalar, AVX2, ARM_NEON, and CUDA implementations are provided. Why: This is...
og:image:width	1200
og:image:height	600
og:site_name	GitHub
og:type	object
og:author:username	ikawrakow
go-import	github.com/ggml-org/llama.cpp git https://github.com/ggml-org/llama.cpp.git
octolytics-dimension-user_id	134263123
octolytics-dimension-user_login	ggml-org
octolytics-dimension-repository_id	612354784
octolytics-dimension-repository_nwo	ggml-org/llama.cpp
octolytics-dimension-repository_public	true
octolytics-dimension-repository_is_fork	false
octolytics-dimension-repository_network_root_id	612354784
octolytics-dimension-repository_network_root_nwo	ggml-org/llama.cpp

Links:

ggml-org	https://github.com/ggml-org
llama.cpp	https://github.com/ggml-org/llama.cpp
Notifications	https://github.com/login?return_to=%2Fggml-org%2Fllama.cpp
Fork 14.9k	https://github.com/login?return_to=%2Fggml-org%2Fllama.cpp
Star 95.1k	https://github.com/login?return_to=%2Fggml-org%2Fllama.cpp
Code	https://github.com/ggml-org/llama.cpp
Issues 406	https://github.com/ggml-org/llama.cpp/issues
Pull requests 719	https://github.com/ggml-org/llama.cpp/pulls
Discussions	https://github.com/ggml-org/llama.cpp/discussions
Actions	https://github.com/ggml-org/llama.cpp/actions
Projects 1	https://github.com/ggml-org/llama.cpp/projects
Wiki	https://github.com/ggml-org/llama.cpp/wiki
Security 10	https://github.com/ggml-org/llama.cpp/security
Insights	https://github.com/ggml-org/llama.cpp/pulse
ggerganov	https://github.com/ggerganov
master	https://github.com/ggml-org/llama.cpp/tree/master
ik/k_quants	https://github.com/ggml-org/llama.cpp/tree/ik/k_quants
Conversation	https://github.com/ggml-org/llama.cpp/pull/1684
Commits (32)	https://github.com/ggml-org/llama.cpp/pull/1684/commits
Checks	https://github.com/ggml-org/llama.cpp/pull/1684/checks
Files changed	https://github.com/ggml-org/llama.cpp/pull/1684/files
k-quants	https://github.com/ggml-org/llama.cpp/pull/1684#top
	https://github.com/ikawrakow
ikawrakow	https://github.com/ikawrakow
Jun 3, 2023	https://github.com/ggml-org/llama.cpp/pull/1684#issue-1739619305
#1240	https://github.com/ggml-org/llama.cpp/issues/1240
#1256	https://github.com/ggml-org/llama.cpp/issues/1256
	https://private-user-images.githubusercontent.com/48489457/243093269-07aa49f0-4951-407f-9789-0b5a01ce95b8.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzEyMzQ0OTMsIm5iZiI6MTc3MTIzNDE5MywicGF0aCI6Ii80ODQ4OTQ1Ny8yNDMwOTMyNjktMDdhYTQ5ZjAtNDk1MS00MDdmLTk3ODktMGI1YTAxY2U5NWI4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNjAyMTYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjYwMjE2VDA5Mjk1M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQ0N2VlMDQ2ZTYxNDIzOTVjMTdmNjhmNDNjMmFlMTY0NmYyYzZkN2E0NzVkMGUyZjE2ODI3MDBjNWQ4ODM5NDcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.kIgAi_ZzqVgEgImmBvGY8HJXlC1LqgZlDnsYdwni_ms
this table on the main page	https://github.com/ggerganov/llama.cpp#quantization
	https://private-user-images.githubusercontent.com/48489457/243100011-365b503c-086a-4f41-8a7a-3c0957f75219.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzEyMzQ0OTMsIm5iZiI6MTc3MTIzNDE5MywicGF0aCI6Ii80ODQ4OTQ1Ny8yNDMxMDAwMTEtMzY1YjUwM2MtMDg2YS00ZjQxLThhN2EtM2MwOTU3Zjc1MjE5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNjAyMTYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjYwMjE2VDA5Mjk1M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTI0ZGU5ZTRiYTIwNDQ3ZGU5NGM4OTY2MTE1MTg1YWM3MzkwMDFjMjE0NzM5Y2M2MWMwZDRkZGU3Yzk3OGUzMWImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.N3qzMTH4BSb7TosQPzxa9gt4_Dcrng8n3NcqpcpFneE
@ggerganov	https://github.com/ggerganov
main page	https://github.com/ggerganov/llama.cpp#quantization
June 3, 2023 14:43	https://github.com/ggml-org/llama.cpp/pull/1684#commits-pushed-8673a41
Starting to add k-quantization to ggml	https://github.com/ggml-org/llama.cpp/pull/1684/commits/8673a41385048f2b1089b8fc29df1e1020ab683a
8673a41	https://github.com/ggml-org/llama.cpp/pull/1684/commits/8673a41385048f2b1089b8fc29df1e1020ab683a
Adding Q3_K and Q8_K (de)-quantization	https://github.com/ggml-org/llama.cpp/pull/1684/commits/b4f71347ff51a90c842d6240f9a8628d03c3d2ac
b4f7134	https://github.com/ggml-org/llama.cpp/pull/1684/commits/b4f71347ff51a90c842d6240f9a8628d03c3d2ac
Q3_K now working on CUDA and AVX2/scalar	https://github.com/ggml-org/llama.cpp/pull/1684/commits/c93cce3a450f23c1678135929caf0f177052f132
c93cce3	https://github.com/ggml-org/llama.cpp/pull/1684/commits/c93cce3a450f23c1678135929caf0f177052f132
Some improvement for Q3_K on CUDA	https://github.com/ggml-org/llama.cpp/pull/1684/commits/a3c06730890f7a2c3521724131db458865f99d05
a3c0673	https://github.com/ggml-org/llama.cpp/pull/1684/commits/a3c06730890f7a2c3521724131db458865f99d05
Some more CUDA optimizations for Q3_K	https://github.com/ggml-org/llama.cpp/pull/1684/commits/3d8b1de3f765ecc50df608db2e0e95870e8cf9b2
3d8b1de	https://github.com/ggml-org/llama.cpp/pull/1684/commits/3d8b1de3f765ecc50df608db2e0e95870e8cf9b2
Adding Q4_K - scalar, AVX2, CUDA	https://github.com/ggml-org/llama.cpp/pull/1684/commits/a0b8e9f3c90e482dbe0ca82f45f585de24f1ba67
a0b8e9f	https://github.com/ggml-org/llama.cpp/pull/1684/commits/a0b8e9f3c90e482dbe0ca82f45f585de24f1ba67
Adding Q6_K - scalar, AVX2, CUDA	https://github.com/ggml-org/llama.cpp/pull/1684/commits/cf221afb555a945be4d1e4153e38808a9d21a4cb
cf221af	https://github.com/ggml-org/llama.cpp/pull/1684/commits/cf221afb555a945be4d1e4153e38808a9d21a4cb
Adding Q5_K - scalar, AVX2, CUDA	https://github.com/ggml-org/llama.cpp/pull/1684/commits/b835d0f49f2e06e1ee918e799cd1316073ee8db7
b835d0f	https://github.com/ggml-org/llama.cpp/pull/1684/commits/b835d0f49f2e06e1ee918e799cd1316073ee8db7
Per convention, all QX_K quantizations use Q5_K for output.weight	https://github.com/ggml-org/llama.cpp/pull/1684/commits/5c5191ab68f28bc24ae26303d73a7ad08015880a
5c5191a	https://github.com/ggml-org/llama.cpp/pull/1684/commits/5c5191ab68f28bc24ae26303d73a7ad08015880a
Adding quantization mixes	https://github.com/ggml-org/llama.cpp/pull/1684/commits/d537b97cb812e896b0319d69838fa1adb8e48585
d537b97	https://github.com/ggml-org/llama.cpp/pull/1684/commits/d537b97cb812e896b0319d69838fa1adb8e48585
Quantization mixes: didn't quite get what I wanted in the last commit	https://github.com/ggml-org/llama.cpp/pull/1684/commits/54f808db2bae036a370a4b990e8fabe8aa8aced0
54f808d	https://github.com/ggml-org/llama.cpp/pull/1684/commits/54f808db2bae036a370a4b990e8fabe8aa8aced0
Q4_K dot product for ARM_NEON	https://github.com/ggml-org/llama.cpp/pull/1684/commits/a2533a72a3d9032a8e45b38e9e50d190be242cfe
a2533a7	https://github.com/ggml-org/llama.cpp/pull/1684/commits/a2533a72a3d9032a8e45b38e9e50d190be242cfe
Q6_K dot product for ARM_NEON	https://github.com/ggml-org/llama.cpp/pull/1684/commits/5ca15ce1551728495e2a8dc01fe40b51292421ad
5ca15ce	https://github.com/ggml-org/llama.cpp/pull/1684/commits/5ca15ce1551728495e2a8dc01fe40b51292421ad
Q5_K dot product for ARM_NEON	https://github.com/ggml-org/llama.cpp/pull/1684/commits/a197eb50d1f5739a76cedd2e824cd30e46bcfcad
a197eb5	https://github.com/ggml-org/llama.cpp/pull/1684/commits/a197eb50d1f5739a76cedd2e824cd30e46bcfcad
Adding Q3_K dot for ARM_NEON	https://github.com/ggml-org/llama.cpp/pull/1684/commits/13264fa067e200fe891977d48862ef610ad24daa
13264fa	https://github.com/ggml-org/llama.cpp/pull/1684/commits/13264fa067e200fe891977d48862ef610ad24daa
A very slightly faster ARM_NEON Q3_K dot	https://github.com/ggml-org/llama.cpp/pull/1684/commits/4faa040c20e2f92d2c7e44cf24146400200b89fa
4faa040	https://github.com/ggml-org/llama.cpp/pull/1684/commits/4faa040c20e2f92d2c7e44cf24146400200b89fa
Adding Q2_K - just CUDA for now	https://github.com/ggml-org/llama.cpp/pull/1684/commits/b439efb7129c5f2eca243116c158d2a056322273
b439efb	https://github.com/ggml-org/llama.cpp/pull/1684/commits/b439efb7129c5f2eca243116c158d2a056322273
Adding scalar and AVX2 Q2_K dot	https://github.com/ggml-org/llama.cpp/pull/1684/commits/8516fdf728d90462c48982fd5e8f56dad07aa823
8516fdf	https://github.com/ggml-org/llama.cpp/pull/1684/commits/8516fdf728d90462c48982fd5e8f56dad07aa823
Adding ARM_NEON Q2_K dot	https://github.com/ggml-org/llama.cpp/pull/1684/commits/6ec70579cb266fbac560bac8dc053a176cab381c
6ec7057	https://github.com/ggml-org/llama.cpp/pull/1684/commits/6ec70579cb266fbac560bac8dc053a176cab381c
A slightly faster ARM_NEON Q2_K dot	https://github.com/ggml-org/llama.cpp/pull/1684/commits/7bcc37676ad08c8574b1a8afc4d6c7ac56a86c5d
7bcc376	https://github.com/ggml-org/llama.cpp/pull/1684/commits/7bcc37676ad08c8574b1a8afc4d6c7ac56a86c5d
Fixed bug in Q2_K CUDA dot product kernel	https://github.com/ggml-org/llama.cpp/pull/1684/commits/e51ce72e03fe487f5b1d614287a6724559882afe
e51ce72	https://github.com/ggml-org/llama.cpp/pull/1684/commits/e51ce72e03fe487f5b1d614287a6724559882afe
Don't print zeros/NaNs when no count histogram has been collected	https://github.com/ggml-org/llama.cpp/pull/1684/commits/c5959d53ffa1ce52c69ec983ad40a570af559551
c5959d5	https://github.com/ggml-org/llama.cpp/pull/1684/commits/c5959d53ffa1ce52c69ec983ad40a570af559551
A 10% faster CUDA vector dot kernel for Q3_K	https://github.com/ggml-org/llama.cpp/pull/1684/commits/9a9c5a0c80ea0a279a56214cc10a1578f51bb672
9a9c5a0	https://github.com/ggml-org/llama.cpp/pull/1684/commits/9a9c5a0c80ea0a279a56214cc10a1578f51bb672
A slightly daster Q4_K AVX2 dot product	https://github.com/ggml-org/llama.cpp/pull/1684/commits/894210a3519eb39148189ea7a4094aa076bee2d7
894210a	https://github.com/ggml-org/llama.cpp/pull/1684/commits/894210a3519eb39148189ea7a4094aa076bee2d7
A slightly faster ARM_NEON A4_K dot product	https://github.com/ggml-org/llama.cpp/pull/1684/commits/abd99a89a780843cd803fa75d318f98370601432
abd99a8	https://github.com/ggml-org/llama.cpp/pull/1684/commits/abd99a89a780843cd803fa75d318f98370601432
Minor	https://github.com/ggml-org/llama.cpp/pull/1684/commits/8f5d42db9b4a5f9b86ffc503d510fb6a5ce06434
8f5d42d	https://github.com/ggml-org/llama.cpp/pull/1684/commits/8f5d42db9b4a5f9b86ffc503d510fb6a5ce06434
	https://github.com/ikawrakow
ikawrakow	https://github.com/ikawrakow
ggerganov	https://github.com/ggerganov
June 3, 2023 15:24	https://github.com/ggml-org/llama.cpp/pull/1684#event-9421648120
	https://github.com/apps/github-actions
Fix quantization error test	https://github.com/ggml-org/llama.cpp/pull/1684/commits/6ef13823b81ce9d0f15e58e947d25a82dad83fd3
6ef1382	https://github.com/ggml-org/llama.cpp/pull/1684/commits/6ef13823b81ce9d0f15e58e947d25a82dad83fd3
	https://github.com/apps/github-actions
	https://github.com/mofosyne
mofosyne	https://github.com/mofosyne
Tensor Encoding Scheme	https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22Tensor%20Encoding%20Scheme%22
Review Complexity : High	https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22Review%20Complexity%20%3A%20High%22
May 25, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#event-12930819627
	https://github.com/Seedmanc
Seedmanc	https://github.com/Seedmanc
May 29, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2137025238
	https://github.com/mofosyne
mofosyne	https://github.com/mofosyne
May 29, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#ref-pullrequest-2319448482
Performance improvements on Arm for legacy and k-quants mozilla-ai/llamafile#453	https://github.com/mozilla-ai/llamafile/pull/453
	https://github.com/kaizizzzzzz
kaizizzzzzz	https://github.com/kaizizzzzzz
Jul 30, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2257278836
	https://github.com/Green-Sky
Green-Sky	https://github.com/Green-Sky
Jul 30, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2257773570
@kaizizzzzzz	https://github.com/kaizizzzzzz
	https://github.com/kaizizzzzzz
kaizizzzzzz	https://github.com/kaizizzzzzz
Jul 30, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2258513294
@Green-Sky	https://github.com/Green-Sky
	https://github.com/kaizizzzzzz
kaizizzzzzz	https://github.com/kaizizzzzzz
Jul 31, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2259565236
A Visual Guide to Quantization guevara/read-it-later#11678	https://github.com/guevara/read-it-later/issues/11678
A Visual Guide to Quantization guevara/read-it-later#11692	https://github.com/guevara/read-it-later/issues/11692
	https://github.com/asomoza
asomoza	https://github.com/asomoza
Aug 19, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#ref-issue-2462431761
NF4 Flux params in diffusers huggingface/diffusers#9165	https://github.com/huggingface/diffusers/issues/9165
	https://github.com/fedric95
fedric95	https://github.com/fedric95
Sep 8, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2336822836
	https://github.com/BarfingLemurs
BarfingLemurs	https://github.com/BarfingLemurs
Sep 26, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#ref-issue-2549262713
about precision loss microsoft/T-MAC#52	https://github.com/microsoft/T-MAC/issues/52
	https://github.com/HAOYON-666
HAOYON-666	https://github.com/HAOYON-666
Sep 29, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2381062705
	https://github.com/HAOYON-666
HAOYON-666	https://github.com/HAOYON-666
Sep 29, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2381062870
	https://github.com/QingtaoLi1
QingtaoLi1	https://github.com/QingtaoLi1
Nov 5, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#ref-pullrequest-2634781604
Introduce New Lookup-Table(LUT)-Based Matrix Multiplication Method #10181	https://github.com/ggml-org/llama.cpp/pull/10181
	https://github.com/SamuelHafner
SamuelHafner	https://github.com/SamuelHafner
Nov 13, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2473738608
	https://github.com/Green-Sky
Green-Sky	https://github.com/Green-Sky
Nov 13, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2474094689
@SamuelHafner	https://github.com/SamuelHafner
@ikawrakow	https://github.com/ikawrakow
	https://github.com/SamuelHafner
SamuelHafner	https://github.com/SamuelHafner
Nov 13, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2474235194
@Green-Sky	https://github.com/Green-Sky
	https://github.com/ikawrakow
ikawrakow	https://github.com/ikawrakow
Nov 13, 2024	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2474462323
	https://github.com/Bearsaerker
Bearsaerker	https://github.com/Bearsaerker
Mar 12, 2025	https://github.com/ggml-org/llama.cpp/pull/1684#ref-issue-2913690728
Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352	https://github.com/ggml-org/llama.cpp/issues/12352
	https://github.com/jiafatom
jiafatom	https://github.com/jiafatom
Apr 15, 2025	https://github.com/ggml-org/llama.cpp/pull/1684#ref-pullrequest-2986449850
k quant intel/neural-compressor#2169	https://github.com/intel/neural-compressor/pull/2169
	https://github.com/lgyStoic
lgyStoic	https://github.com/lgyStoic
Apr 19, 2025	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2816490255
#1256	https://github.com/ggml-org/llama.cpp/issues/1256
@ikawrakow	https://github.com/ikawrakow
	https://private-user-images.githubusercontent.com/4526646/435352222-4bb05eb0-f8ed-46c4-8b1f-251de1a77599.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzEyMzQ0OTQsIm5iZiI6MTc3MTIzNDE5NCwicGF0aCI6Ii80NTI2NjQ2LzQzNTM1MjIyMi00YmIwNWViMC1mOGVkLTQ2YzQtOGIxZi0yNTFkZTFhNzc1OTkucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI2MDIxNiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNjAyMTZUMDkyOTU0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9YzA2MjA2Nzk3MzI2ODliYTliYjM3M2Y1MjQ1NGQ1ZDQ5MjhjNWM2YzZiODMyMTg4NzBlZjQ1NzkxY2VjMzQ2NCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.tSgh3UijWK2I3Veu3_b2ulHO6Au84Wpx2oA7kW8i3-s
	https://github.com/jrudolph
jrudolph	https://github.com/jrudolph
Apr 19, 2025	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2816577012
https://github.com/ggml-org/llama.cpp/blob/6408210082cc0a61b992b487be7e2ff2efbb9e36/ggml/src/ggml-common.h#L175	https://github.com/ggml-org/llama.cpp/blob/6408210082cc0a61b992b487be7e2ff2efbb9e36/ggml/src/ggml-common.h#L175
	https://github.com/nickfraser
nickfraser	https://github.com/nickfraser
May 28, 2025	https://github.com/ggml-org/llama.cpp/pull/1684#ref-pullrequest-3077269885
Feat (brevitas_examples/llm): GGUF export Xilinx/brevitas#1291	https://github.com/Xilinx/brevitas/pull/1291
	https://github.com/mmwillet
mmwillet	https://github.com/mmwillet
May 31, 2025	https://github.com/ggml-org/llama.cpp/pull/1684#ref-issue-2998177116
Add and test quantization from Kokoro mmwillet/TTS.cpp#15	https://github.com/mmwillet/TTS.cpp/issues/15
	https://github.com/MaoJianwei
MaoJianwei	https://github.com/MaoJianwei
Jun 23, 2025	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2995316277
	https://github.com/ikawrakow
ikawrakow	https://github.com/ikawrakow
Jun 27, 2025	https://github.com/ggml-org/llama.cpp/pull/1684#ref-pullrequest-3178789455
Use cuBLAS for large batches and quants with block size 16 ikawrakow/ik_llama.cpp#559	https://github.com/ikawrakow/ik_llama.cpp/pull/559
	https://github.com/jiangshibiao
jiangshibiao	https://github.com/jiangshibiao
Sep 7, 2025	https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-3263707425
	https://github.com/michalharakal
michalharakal	https://github.com/michalharakal
Nov 23, 2025	https://github.com/ggml-org/llama.cpp/pull/1684#ref-issue-3656240522
[R] reasearch quantization SKaiNET-developers/SKaiNET#208	https://github.com/SKaiNET-developers/SKaiNET/issues/208
	https://github.com/apps/github-actions
github-actions[bot]	https://github.com/apps/github-actions
	https://github.com/ggml-org/llama.cpp/pull/1684/files/af275faececaefb2e479001a579814a51d4f0067
	https://github.com/ggerganov
ggerganov	https://github.com/ggerganov
	https://github.com/ggml-org/llama.cpp/pull/1684/files/af275faececaefb2e479001a579814a51d4f0067
	https://github.com/fleszar1
fleszar1	https://github.com/fleszar1
	https://github.com/ggml-org/llama.cpp/pull/1684/files/af275faececaefb2e479001a579814a51d4f0067
high priority	https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22high%20priority%22
Less than 4 bits	https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22Less%20than%204%20bits%22
research 🔬	https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22research%20%F0%9F%94%AC%22
Review Complexity : High	https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22Review%20Complexity%20%3A%20High%22
Tensor Encoding Scheme	https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22Tensor%20Encoding%20Scheme%22
	https://github.com/ikawrakow
	https://github.com/KerfuffleV2
	https://github.com/Midaychi
	https://github.com/EwoutH
	https://github.com/0cc4m
	https://github.com/TheBloke
	https://github.com/AlvL1225
	https://github.com/shouyiwang
	https://github.com/alankila
	https://github.com/Alumniminium
	https://github.com/bbecausereasonss
	https://github.com/mirek190
	https://github.com/okpatil4u
	https://github.com/x4080
	https://github.com/pbronez
	https://github.com/cosmic-snow
	https://github.com/viperwasp
	https://github.com/zhaohb
	https://github.com/JohannesGaessler
	https://github.com/RonanKMcGovern

Viewport: width=device-width