René's URL Explorer Experiment


Title: k-quants by ikawrakow · Pull Request #1684 · ggml-org/llama.cpp · GitHub

Open Graph Title: k-quants by ikawrakow · Pull Request #1684 · ggml-org/llama.cpp

X Title: k-quants by ikawrakow · Pull Request #1684 · ggml-org/llama.cpp

Description: LLM inference in C/C++. Contribute to ggml-org/llama.cpp development by creating an account on GitHub.

Open Graph Description: What This PR adds a series of 2-6 bit quantization methods, along with quantization mixes, as proposed in #1240 and #1256. Scalar, AVX2, ARM_NEON, and CUDA implementations are provided. Why This is...

X Description: What This PR adds a series of 2-6 bit quantization methods, along with quantization mixes, as proposed in #1240 and #1256. Scalar, AVX2, ARM_NEON, and CUDA implementations are provided. Why This is...

Open Graph URL: https://github.com/ggml-org/llama.cpp/pull/1684

X: @github

direct link

Domain: github.com

route-pattern: /_view_fragments/voltron/pull_requests/show/:user_id/:repository/:id/pull_request_layout(.:format)
route-controller: voltron_pull_requests_fragments
route-action: pull_request_layout
fetch-nonce: v2:b9653658-9fdf-112e-8089-889a140b69e8
current-catalog-service-hash: ae870bc5e265a340912cde392f23dad3671a0a881730ffdadd82f2f57d81641b
request-id: B654:D9BB8:2811AF3:356F6E0:6992E391
html-safe-nonce: eb758a0cd0ef3bea30d4b0e2005e065f957c61f8e9816730f8219f348df189fe
visitor-payload: eyJyZWZlcnJlciI6IiIsInJlcXVlc3RfaWQiOiJCNjU0OkQ5QkI4OjI4MTFBRjM6MzU2RjZFMDo2OTkyRTM5MSIsInZpc2l0b3JfaWQiOiI2OTY4MjYzOTMxODkxNzM3NDg5IiwicmVnaW9uX2VkZ2UiOiJpYWQiLCJyZWdpb25fcmVuZGVyIjoiaWFkIn0=
visitor-hmac: e898e54c2d96952d26810aa8eecc9cc1b1ff6800c887fedd656fe632e53b06d3
hovercard-subject-tag: pull_request:1377064656
github-keyboard-shortcuts: repository,pull-request-list,pull-request-conversation,pull-request-files-changed,copilot
google-site-verification: Apib7-x98H0j5cPqHWwSMm6dNU4GmODRoqxLiDzdx9I
octolytics-url: https://collector.github.com/github/collect
analytics-location: /<user-name>/<repo-name>/voltron/pull_requests_fragments/pull_request_layout
fb:app_id: 1401488693436528
apple-itunes-app: app-id=1477376905, app-argument=https://github.com/_view_fragments/voltron/pull_requests/show/ggml-org/llama.cpp/1684/pull_request_layout
twitter:image: https://opengraph.githubassets.com/79b184c3bf64df24abea9e326abf9e384ebcb580c1fda3ff8e0c20e638fe514e/ggml-org/llama.cpp/pull/1684
twitter:card: summary_large_image
og:image: https://opengraph.githubassets.com/79b184c3bf64df24abea9e326abf9e384ebcb580c1fda3ff8e0c20e638fe514e/ggml-org/llama.cpp/pull/1684
og:image:alt: What This PR adds a series of 2-6 bit quantization methods, along with quantization mixes, as proposed in #1240 and #1256. Scalar, AVX2, ARM_NEON, and CUDA implementations are provided. Why This is...
og:image:width: 1200
og:image:height: 600
og:site_name: GitHub
og:type: object
og:author:username: ikawrakow
hostname: github.com
expected-hostname: github.com
turbo-cache-control: no-cache
go-import: github.com/ggml-org/llama.cpp git https://github.com/ggml-org/llama.cpp.git
octolytics-dimension-user_id: 134263123
octolytics-dimension-user_login: ggml-org
octolytics-dimension-repository_id: 612354784
octolytics-dimension-repository_nwo: ggml-org/llama.cpp
octolytics-dimension-repository_public: true
octolytics-dimension-repository_is_fork: false
octolytics-dimension-repository_network_root_id: 612354784
octolytics-dimension-repository_network_root_nwo: ggml-org/llama.cpp
turbo-body-classes: logged-out env-production page-responsive
disable-turbo: false
browser-stats-url: https://api.github.com/_private/browser/stats
browser-errors-url: https://api.github.com/_private/browser/errors
release: 84dcb133269e3cfe6e0296cc85fbacb92cae92bb
ui-target: full
theme-color: #1e2327
color-scheme: light dark
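
The Open Graph description above summarizes the PR itself: a family of 2-6 bit "k-quant" formats (proposed in #1240 and #1256) with scalar, AVX2, ARM_NEON, and CUDA implementations, organized as large super-blocks of weights with per-sub-block scales. As an illustration only (the field names and packing below are assumptions in the spirit of the block_q4_K definition in ggml-common.h linked later in the thread, not the verbatim ggml struct), a 4-bit super-block in this style could look like:

#include <stdint.h>

/* Hedged sketch of a k-quant style 4-bit super-block (not the exact ggml
 * struct). Assumption: 256 weights per super-block, split into 8 sub-blocks
 * of 32, each sub-block carrying a 6-bit scale and a 6-bit min packed into
 * `scales`. */
#define SUPER_BLOCK_SIZE 256

typedef struct {
    uint16_t d;                        /* super-block scale (fp16 bits)        */
    uint16_t dmin;                     /* super-block scale for mins (fp16)    */
    uint8_t  scales[12];               /* 8 x (6-bit scale, 6-bit min), packed */
    uint8_t  qs[SUPER_BLOCK_SIZE / 2]; /* 256 4-bit quants, two per byte       */
} block_q4_k_sketch;

/* Conceptually, weight i reconstructs as
 *   w[i] = d * scale[i/32] * q[i] - dmin * min[i/32]                          */

Amortized over 256 weights, the 144 bytes of quants plus metadata in this sketch come to about 4.5 bits per weight, which is why the k-quant types land between the nominal bit widths.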

Links:

ggml-org https://github.com/ggml-org
llama.cpphttps://github.com/ggml-org/llama.cpp
Notifications https://github.com/login?return_to=%2Fggml-org%2Fllama.cpp
Fork 14.9k https://github.com/login?return_to=%2Fggml-org%2Fllama.cpp
Star 95.1k https://github.com/login?return_to=%2Fggml-org%2Fllama.cpp
Code https://github.com/ggml-org/llama.cpp
Issues 406 https://github.com/ggml-org/llama.cpp/issues
Pull requests 719 https://github.com/ggml-org/llama.cpp/pulls
Discussions https://github.com/ggml-org/llama.cpp/discussions
Actions https://github.com/ggml-org/llama.cpp/actions
Projects 1 https://github.com/ggml-org/llama.cpp/projects
Wiki https://github.com/ggml-org/llama.cpp/wiki
Security 10 https://github.com/ggml-org/llama.cpp/security
Insights https://github.com/ggml-org/llama.cpp/pulse
ggerganovhttps://github.com/ggerganov
masterhttps://github.com/ggml-org/llama.cpp/tree/master
ik/k_quantshttps://github.com/ggml-org/llama.cpp/tree/ik/k_quants
Conversationhttps://github.com/ggml-org/llama.cpp/pull/1684
Commits32 (32)https://github.com/ggml-org/llama.cpp/pull/1684/commits
Checkshttps://github.com/ggml-org/llama.cpp/pull/1684/checks
Files changedhttps://github.com/ggml-org/llama.cpp/pull/1684/files
k-quantshttps://github.com/ggml-org/llama.cpp/pull/1684#top
https://github.com/ikawrakow
ikawrakowhttps://github.com/ikawrakow
Jun 3, 2023https://github.com/ggml-org/llama.cpp/pull/1684#issue-1739619305
#1240https://github.com/ggml-org/llama.cpp/issues/1240
#1256https://github.com/ggml-org/llama.cpp/issues/1256
https://private-user-images.githubusercontent.com/48489457/243093269-07aa49f0-4951-407f-9789-0b5a01ce95b8.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzEyMzQ0OTMsIm5iZiI6MTc3MTIzNDE5MywicGF0aCI6Ii80ODQ4OTQ1Ny8yNDMwOTMyNjktMDdhYTQ5ZjAtNDk1MS00MDdmLTk3ODktMGI1YTAxY2U5NWI4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNjAyMTYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjYwMjE2VDA5Mjk1M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQ0N2VlMDQ2ZTYxNDIzOTVjMTdmNjhmNDNjMmFlMTY0NmYyYzZkN2E0NzVkMGUyZjE2ODI3MDBjNWQ4ODM5NDcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.kIgAi_ZzqVgEgImmBvGY8HJXlC1LqgZlDnsYdwni_ms
this table on the main pagehttps://github.com/ggerganov/llama.cpp#quantization
https://private-user-images.githubusercontent.com/48489457/243100011-365b503c-086a-4f41-8a7a-3c0957f75219.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzEyMzQ0OTMsIm5iZiI6MTc3MTIzNDE5MywicGF0aCI6Ii80ODQ4OTQ1Ny8yNDMxMDAwMTEtMzY1YjUwM2MtMDg2YS00ZjQxLThhN2EtM2MwOTU3Zjc1MjE5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNjAyMTYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjYwMjE2VDA5Mjk1M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTI0ZGU5ZTRiYTIwNDQ3ZGU5NGM4OTY2MTE1MTg1YWM3MzkwMDFjMjE0NzM5Y2M2MWMwZDRkZGU3Yzk3OGUzMWImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.N3qzMTH4BSb7TosQPzxa9gt4_Dcrng8n3NcqpcpFneE
@ggerganovhttps://github.com/ggerganov
main pagehttps://github.com/ggerganov/llama.cpp#quantization
June 3, 2023 14:43https://github.com/ggml-org/llama.cpp/pull/1684#commits-pushed-8673a41
Starting to add k-quantization to ggmlhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/8673a41385048f2b1089b8fc29df1e1020ab683a
8673a41https://github.com/ggml-org/llama.cpp/pull/1684/commits/8673a41385048f2b1089b8fc29df1e1020ab683a
Adding Q3_K and Q8_K (de)-quantizationhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/b4f71347ff51a90c842d6240f9a8628d03c3d2ac
b4f7134https://github.com/ggml-org/llama.cpp/pull/1684/commits/b4f71347ff51a90c842d6240f9a8628d03c3d2ac
Q3_K now working on CUDA and AVX2/scalarhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/c93cce3a450f23c1678135929caf0f177052f132
c93cce3https://github.com/ggml-org/llama.cpp/pull/1684/commits/c93cce3a450f23c1678135929caf0f177052f132
Some improvement for Q3_K on CUDAhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/a3c06730890f7a2c3521724131db458865f99d05
a3c0673https://github.com/ggml-org/llama.cpp/pull/1684/commits/a3c06730890f7a2c3521724131db458865f99d05
Some more CUDA optimizations for Q3_Khttps://github.com/ggml-org/llama.cpp/pull/1684/commits/3d8b1de3f765ecc50df608db2e0e95870e8cf9b2
3d8b1dehttps://github.com/ggml-org/llama.cpp/pull/1684/commits/3d8b1de3f765ecc50df608db2e0e95870e8cf9b2
Adding Q4_K - scalar, AVX2, CUDAhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/a0b8e9f3c90e482dbe0ca82f45f585de24f1ba67
a0b8e9fhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/a0b8e9f3c90e482dbe0ca82f45f585de24f1ba67
Adding Q6_K - scalar, AVX2, CUDAhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/cf221afb555a945be4d1e4153e38808a9d21a4cb
cf221afhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/cf221afb555a945be4d1e4153e38808a9d21a4cb
Adding Q5_K - scalar, AVX2, CUDAhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/b835d0f49f2e06e1ee918e799cd1316073ee8db7
b835d0fhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/b835d0f49f2e06e1ee918e799cd1316073ee8db7
Per convention, all QX_K quantizations use Q5_K for output.weighthttps://github.com/ggml-org/llama.cpp/pull/1684/commits/5c5191ab68f28bc24ae26303d73a7ad08015880a
5c5191ahttps://github.com/ggml-org/llama.cpp/pull/1684/commits/5c5191ab68f28bc24ae26303d73a7ad08015880a
Adding quantization mixeshttps://github.com/ggml-org/llama.cpp/pull/1684/commits/d537b97cb812e896b0319d69838fa1adb8e48585
d537b97https://github.com/ggml-org/llama.cpp/pull/1684/commits/d537b97cb812e896b0319d69838fa1adb8e48585
Quantization mixes: didn't quite get what I wanted in the last commithttps://github.com/ggml-org/llama.cpp/pull/1684/commits/54f808db2bae036a370a4b990e8fabe8aa8aced0
54f808dhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/54f808db2bae036a370a4b990e8fabe8aa8aced0
Q4_K dot product for ARM_NEONhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/a2533a72a3d9032a8e45b38e9e50d190be242cfe
a2533a7https://github.com/ggml-org/llama.cpp/pull/1684/commits/a2533a72a3d9032a8e45b38e9e50d190be242cfe
Q6_K dot product for ARM_NEONhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/5ca15ce1551728495e2a8dc01fe40b51292421ad
5ca15cehttps://github.com/ggml-org/llama.cpp/pull/1684/commits/5ca15ce1551728495e2a8dc01fe40b51292421ad
Q5_K dot product for ARM_NEONhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/a197eb50d1f5739a76cedd2e824cd30e46bcfcad
a197eb5https://github.com/ggml-org/llama.cpp/pull/1684/commits/a197eb50d1f5739a76cedd2e824cd30e46bcfcad
Adding Q3_K dot for ARM_NEONhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/13264fa067e200fe891977d48862ef610ad24daa
13264fahttps://github.com/ggml-org/llama.cpp/pull/1684/commits/13264fa067e200fe891977d48862ef610ad24daa
A very slightly faster ARM_NEON Q3_K dothttps://github.com/ggml-org/llama.cpp/pull/1684/commits/4faa040c20e2f92d2c7e44cf24146400200b89fa
4faa040https://github.com/ggml-org/llama.cpp/pull/1684/commits/4faa040c20e2f92d2c7e44cf24146400200b89fa
Adding Q2_K - just CUDA for nowhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/b439efb7129c5f2eca243116c158d2a056322273
b439efbhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/b439efb7129c5f2eca243116c158d2a056322273
Adding scalar and AVX2 Q2_K dothttps://github.com/ggml-org/llama.cpp/pull/1684/commits/8516fdf728d90462c48982fd5e8f56dad07aa823
8516fdfhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/8516fdf728d90462c48982fd5e8f56dad07aa823
Adding ARM_NEON Q2_K dothttps://github.com/ggml-org/llama.cpp/pull/1684/commits/6ec70579cb266fbac560bac8dc053a176cab381c
6ec7057https://github.com/ggml-org/llama.cpp/pull/1684/commits/6ec70579cb266fbac560bac8dc053a176cab381c
A slightly faster ARM_NEON Q2_K dothttps://github.com/ggml-org/llama.cpp/pull/1684/commits/7bcc37676ad08c8574b1a8afc4d6c7ac56a86c5d
7bcc376https://github.com/ggml-org/llama.cpp/pull/1684/commits/7bcc37676ad08c8574b1a8afc4d6c7ac56a86c5d
Fixed bug in Q2_K CUDA dot product kernelhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/e51ce72e03fe487f5b1d614287a6724559882afe
e51ce72https://github.com/ggml-org/llama.cpp/pull/1684/commits/e51ce72e03fe487f5b1d614287a6724559882afe
Don't print zeros/NaNs when no count histogram has been collectedhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/c5959d53ffa1ce52c69ec983ad40a570af559551
c5959d5https://github.com/ggml-org/llama.cpp/pull/1684/commits/c5959d53ffa1ce52c69ec983ad40a570af559551
A 10% faster CUDA vector dot kernel for Q3_Khttps://github.com/ggml-org/llama.cpp/pull/1684/commits/9a9c5a0c80ea0a279a56214cc10a1578f51bb672
9a9c5a0https://github.com/ggml-org/llama.cpp/pull/1684/commits/9a9c5a0c80ea0a279a56214cc10a1578f51bb672
A slightly faster Q4_K AVX2 dot producthttps://github.com/ggml-org/llama.cpp/pull/1684/commits/894210a3519eb39148189ea7a4094aa076bee2d7
894210ahttps://github.com/ggml-org/llama.cpp/pull/1684/commits/894210a3519eb39148189ea7a4094aa076bee2d7
A slightly faster ARM_NEON Q4_K dot producthttps://github.com/ggml-org/llama.cpp/pull/1684/commits/abd99a89a780843cd803fa75d318f98370601432
abd99a8https://github.com/ggml-org/llama.cpp/pull/1684/commits/abd99a89a780843cd803fa75d318f98370601432
Minorhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/8f5d42db9b4a5f9b86ffc503d510fb6a5ce06434
8f5d42dhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/8f5d42db9b4a5f9b86ffc503d510fb6a5ce06434
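
The commit sequence above adds, for each QX_K type, the (de)quantization routines plus dot-product kernels for scalar, AVX2, ARM_NEON, and CUDA. As a hedged, self-contained sketch of the dequantize-and-dot idea those kernels share (the simplified layout with float scales and unpacked per-sub-block codes is an assumption for readability, not the packed ggml format; the real kernels also quantize activations to 8 bits and use SIMD/CUDA intrinsics):

#include <stddef.h>
#include <stdint.h>

#define SB_SIZE  256   /* weights per super-block */
#define SUB_SIZE 32    /* weights per sub-block   */

/* Simplified super-block: assumptions only, not the packed ggml layout. */
typedef struct {
    float   d, dmin;                     /* super-block scales         */
    uint8_t scales[SB_SIZE / SUB_SIZE];  /* per-sub-block scale codes  */
    uint8_t mins[SB_SIZE / SUB_SIZE];    /* per-sub-block min codes    */
    uint8_t qs[SB_SIZE / 2];             /* 4-bit quants, two per byte */
} sb_q4_sketch;

/* Reference dot product of a quantized row against float activations:
 * each weight dequantizes as d * scales[i/32] * q - dmin * mins[i/32]. */
static float dot_q4_sketch(const sb_q4_sketch *blk, size_t nblk, const float *x)
{
    float sum = 0.0f;
    for (size_t b = 0; b < nblk; ++b) {
        const sb_q4_sketch *s = &blk[b];
        for (int i = 0; i < SB_SIZE; ++i) {
            int   q   = (i & 1) ? (s->qs[i / 2] >> 4) : (s->qs[i / 2] & 0x0F);
            int   sub = i / SUB_SIZE;
            float w   = s->d * s->scales[sub] * (float) q - s->dmin * s->mins[sub];
            sum += w * x[b * SB_SIZE + i];
        }
    }
    return sum;
}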
https://github.com/ikawrakow
ikawrakowhttps://github.com/ikawrakow
ggerganovhttps://github.com/ggerganov
June 3, 2023 15:24https://github.com/ggml-org/llama.cpp/pull/1684#event-9421648120
https://github.com/apps/github-actions
Fix quantization error testhttps://github.com/ggml-org/llama.cpp/pull/1684/commits/6ef13823b81ce9d0f15e58e947d25a82dad83fd3
6ef1382https://github.com/ggml-org/llama.cpp/pull/1684/commits/6ef13823b81ce9d0f15e58e947d25a82dad83fd3
https://github.com/apps/github-actions
https://github.com/mofosyne
mofosynehttps://github.com/mofosyne
Tensor Encoding Scheme https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22Tensor%20Encoding%20Scheme%22
Review Complexity : High https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22Review%20Complexity%20%3A%20High%22
May 25, 2024https://github.com/ggml-org/llama.cpp/pull/1684#event-12930819627
https://github.com/Seedmanc
Seedmanchttps://github.com/Seedmanc
May 29, 2024https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2137025238
https://github.com/mofosyne
mofosynehttps://github.com/mofosyne
May 29, 2024 https://github.com/ggml-org/llama.cpp/pull/1684#ref-pullrequest-2319448482
Performance improvements on Arm for legacy and k-quants mozilla-ai/llamafile#453 https://github.com/mozilla-ai/llamafile/pull/453
https://github.com/kaizizzzzzz
kaizizzzzzzhttps://github.com/kaizizzzzzz
Jul 30, 2024https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2257278836
https://github.com/Green-Sky
Green-Skyhttps://github.com/Green-Sky
Jul 30, 2024https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2257773570
@kaizizzzzzzhttps://github.com/kaizizzzzzz
https://github.com/kaizizzzzzz
kaizizzzzzzhttps://github.com/kaizizzzzzz
Jul 30, 2024https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2258513294
@Green-Skyhttps://github.com/Green-Sky
https://github.com/kaizizzzzzz
kaizizzzzzzhttps://github.com/kaizizzzzzz
Jul 31, 2024https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2259565236
A Visual Guide to Quantization guevara/read-it-later#11678 https://github.com/guevara/read-it-later/issues/11678
A Visual Guide to Quantization guevara/read-it-later#11692 https://github.com/guevara/read-it-later/issues/11692
https://github.com/asomoza
asomozahttps://github.com/asomoza
Aug 19, 2024 https://github.com/ggml-org/llama.cpp/pull/1684#ref-issue-2462431761
NF4 Flux params in diffusers huggingface/diffusers#9165 https://github.com/huggingface/diffusers/issues/9165
https://github.com/fedric95
fedric95https://github.com/fedric95
Sep 8, 2024https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2336822836
https://github.com/BarfingLemurs
BarfingLemurshttps://github.com/BarfingLemurs
Sep 26, 2024 https://github.com/ggml-org/llama.cpp/pull/1684#ref-issue-2549262713
about precision loss microsoft/T-MAC#52 https://github.com/microsoft/T-MAC/issues/52
https://github.com/HAOYON-666
HAOYON-666https://github.com/HAOYON-666
Sep 29, 2024https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2381062705
https://github.com/HAOYON-666
HAOYON-666https://github.com/HAOYON-666
Sep 29, 2024https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2381062870
https://github.com/QingtaoLi1
QingtaoLi1https://github.com/QingtaoLi1
Nov 5, 2024 https://github.com/ggml-org/llama.cpp/pull/1684#ref-pullrequest-2634781604
Introduce New Lookup-Table(LUT)-Based Matrix Multiplication Method #10181 https://github.com/ggml-org/llama.cpp/pull/10181
https://github.com/SamuelHafner
SamuelHafnerhttps://github.com/SamuelHafner
Nov 13, 2024https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2473738608
https://github.com/Green-Sky
Green-Skyhttps://github.com/Green-Sky
Nov 13, 2024https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2474094689
@SamuelHafnerhttps://github.com/SamuelHafner
@ikawrakowhttps://github.com/ikawrakow
https://github.com/SamuelHafner
SamuelHafnerhttps://github.com/SamuelHafner
Nov 13, 2024https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2474235194
@Green-Skyhttps://github.com/Green-Sky
https://github.com/ikawrakow
ikawrakowhttps://github.com/ikawrakow
Nov 13, 2024https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2474462323
https://github.com/Bearsaerker
Bearsaerkerhttps://github.com/Bearsaerker
Mar 12, 2025 https://github.com/ggml-org/llama.cpp/pull/1684#ref-issue-2913690728
Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352 https://github.com/ggml-org/llama.cpp/issues/12352
https://github.com/jiafatom
jiafatomhttps://github.com/jiafatom
Apr 15, 2025 https://github.com/ggml-org/llama.cpp/pull/1684#ref-pullrequest-2986449850
k quant intel/neural-compressor#2169 https://github.com/intel/neural-compressor/pull/2169
https://github.com/lgyStoic
lgyStoichttps://github.com/lgyStoic
Apr 19, 2025https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2816490255
#1256https://github.com/ggml-org/llama.cpp/issues/1256
@ikawrakowhttps://github.com/ikawrakow
https://private-user-images.githubusercontent.com/4526646/435352222-4bb05eb0-f8ed-46c4-8b1f-251de1a77599.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzEyMzQ0OTQsIm5iZiI6MTc3MTIzNDE5NCwicGF0aCI6Ii80NTI2NjQ2LzQzNTM1MjIyMi00YmIwNWViMC1mOGVkLTQ2YzQtOGIxZi0yNTFkZTFhNzc1OTkucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI2MDIxNiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNjAyMTZUMDkyOTU0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9YzA2MjA2Nzk3MzI2ODliYTliYjM3M2Y1MjQ1NGQ1ZDQ5MjhjNWM2YzZiODMyMTg4NzBlZjQ1NzkxY2VjMzQ2NCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.tSgh3UijWK2I3Veu3_b2ulHO6Au84Wpx2oA7kW8i3-s
https://github.com/jrudolph
jrudolphhttps://github.com/jrudolph
Apr 19, 2025https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2816577012
https://github.com/ggml-org/llama.cpp/blob/6408210082cc0a61b992b487be7e2ff2efbb9e36/ggml/src/ggml-common.h#L175https://github.com/ggml-org/llama.cpp/blob/6408210082cc0a61b992b487be7e2ff2efbb9e36/ggml/src/ggml-common.h#L175
https://github.com/ggml-org/llama.cpp/pull/1684
#1256https://github.com/ggml-org/llama.cpp/issues/1256
#1256https://github.com/ggml-org/llama.cpp/issues/1256
@ikawrakowhttps://github.com/ikawrakow
https://github.com/ikawrakowhttps://github.com/ikawrakow
https://github.com/user-attachments/assets/4bb05eb0-f8ed-46c4-8b1f-251de1a77599https://github.com/user-attachments/assets/4bb05eb0-f8ed-46c4-8b1f-251de1a77599
#1684 (comment)https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2816490255
https://github.com/notifications/unsubscribe-auth/AAACNDALZMI6MLOE5KYUM4L22G5CHAVCNFSM6AAAAAAYZLHGCCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDQMJWGQ4TAMRVGUhttps://github.com/notifications/unsubscribe-auth/AAACNDALZMI6MLOE5KYUM4L22G5CHAVCNFSM6AAAAAAYZLHGCCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDQMJWGQ4TAMRVGU
ggml-org/llama.cpp#1684https://github.com/ggml-org/llama.cpp/pull/1684
#1684 (comment)https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2816490255
#1256https://github.com/ggml-org/llama.cpp/issues/1256
#1256https://github.com/ggml-org/llama.cpp/issues/1256
@ikawrakowhttps://github.com/ikawrakow
https://github.com/ikawrakowhttps://github.com/ikawrakow
https://github.com/user-attachments/assets/4bb05eb0-f8ed-46c4-8b1f-251de1a77599https://github.com/user-attachments/assets/4bb05eb0-f8ed-46c4-8b1f-251de1a77599
#1684 (comment)https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2816490255
https://github.com/notifications/unsubscribe-auth/AAACNDALZMI6MLOE5KYUM4L22G5CHAVCNFSM6AAAAAAYZLHGCCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDQMJWGQ4TAMRVGUhttps://github.com/notifications/unsubscribe-auth/AAACNDALZMI6MLOE5KYUM4L22G5CHAVCNFSM6AAAAAAYZLHGCCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDQMJWGQ4TAMRVGU
https://github.com/nickfraser
nickfraserhttps://github.com/nickfraser
May 28, 2025 https://github.com/ggml-org/llama.cpp/pull/1684#ref-pullrequest-3077269885
Feat (brevitas_examples/llm): GGUF export Xilinx/brevitas#1291 https://github.com/Xilinx/brevitas/pull/1291
https://github.com/mmwillet
mmwillethttps://github.com/mmwillet
May 31, 2025 https://github.com/ggml-org/llama.cpp/pull/1684#ref-issue-2998177116
Add and test quantization from Kokoro mmwillet/TTS.cpp#15 https://github.com/mmwillet/TTS.cpp/issues/15
https://github.com/MaoJianwei
MaoJianweihttps://github.com/MaoJianwei
Jun 23, 2025https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-2995316277
https://github.com/ikawrakow
ikawrakowhttps://github.com/ikawrakow
Jun 27, 2025 https://github.com/ggml-org/llama.cpp/pull/1684#ref-pullrequest-3178789455
Use cuBLAS for large batches and quants with block size 16 ikawrakow/ik_llama.cpp#559 https://github.com/ikawrakow/ik_llama.cpp/pull/559
https://github.com/jiangshibiao
jiangshibiaohttps://github.com/jiangshibiao
Sep 7, 2025https://github.com/ggml-org/llama.cpp/pull/1684#issuecomment-3263707425
https://github.com/michalharakal
michalharakalhttps://github.com/michalharakal
Nov 23, 2025 https://github.com/ggml-org/llama.cpp/pull/1684#ref-issue-3656240522
[R] reasearch quantization SKaiNET-developers/SKaiNET#208 https://github.com/SKaiNET-developers/SKaiNET/issues/208
https://github.co/hiddenchars
https://github.com/apps/github-actions
github-actions[bot] https://github.com/apps/github-actions
https://github.com/ggml-org/llama.cpp/pull/1684/files/af275faececaefb2e479001a579814a51d4f0067
https://github.com/ggerganov
ggerganov https://github.com/ggerganov
https://github.com/ggml-org/llama.cpp/pull/1684/files/af275faececaefb2e479001a579814a51d4f0067
https://github.com/fleszar1
fleszar1 https://github.com/fleszar1
https://github.com/ggml-org/llama.cpp/pull/1684/files/af275faececaefb2e479001a579814a51d4f0067
high priority https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22high%20priority%22
Less than 4 bits https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22Less%20than%204%20bits%22
research 🔬 https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22research%20%F0%9F%94%AC%22
Review Complexity : High https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22Review%20Complexity%20%3A%20High%22
Tensor Encoding Scheme https://github.com/ggml-org/llama.cpp/issues?q=state%3Aopen%20label%3A%22Tensor%20Encoding%20Scheme%22
https://github.com/ikawrakow
https://github.com/KerfuffleV2
https://github.com/Midaychi
https://github.com/EwoutH
https://github.com/0cc4m
https://github.com/TheBloke
https://github.com/AlvL1225
https://github.com/shouyiwang
https://github.com/alankila
https://github.com/Alumniminium
https://github.com/bbecausereasonss
https://github.com/mirek190
https://github.com/okpatil4u
https://github.com/x4080
https://github.com/pbronez
https://github.com/cosmic-snow
https://github.com/viperwasp
https://github.com/zhaohb
https://github.com/JohannesGaessler
https://github.com/RonanKMcGovern

Viewport: width=device-width


URLs of crawlers that visited me.