* handle partially quantized models
- fixes #53, #71, #69, #74
- added a default prompt of an appropriate form so the models can be tested easily
- while working on the model configuration, also added additional stop tokens (#74)
- fixed the repetitionPenalty code (#71)
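For context, a repetition penalty typically rescales the logits of already-generated tokens so they are less likely to be sampled again. A minimal sketch of that idea in Swift (function and parameter names are illustrative, not the repository's exact code):

```swift
// Hypothetical sketch: scale down the logits of previously generated tokens.
// A penalty > 1 reduces the probability of repeating a token; positive logits
// are divided and negative logits multiplied, so probability always decreases.
func applyRepetitionPenalty(
    logits: inout [Float], previousTokens: Set<Int>, penalty: Float
) {
    for token in previousTokens where token >= 0 && token < logits.count {
        let value = logits[token]
        logits[token] = value > 0 ? value / penalty : value * penalty
    }
}
```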
* implement LoRA / QLoRA
- example of using MLX to fine-tune an LLM with low rank adaptation (LoRA) for a target task
- see also https://arxiv.org/abs/2106.09685
- based on https://github.com/ml-explore/mlx-examples/tree/main/lora
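Conceptually, LoRA (per the paper above) keeps the base weight W frozen and adds a trainable low-rank update, computing Wx + (alpha/r) * B(Ax) with A of shape r x in and B of shape out x r. A minimal Swift sketch of that math, using illustrative names rather than the repository's API:

```swift
// Hypothetical sketch of a LoRA-augmented linear layer. Only loraA and loraB
// would be trained; weight stays frozen. loraB starts at zero, so the layer
// initially behaves exactly like the base linear layer.
func matVec(_ m: [[Float]], _ v: [Float]) -> [Float] {
    m.map { row in zip(row, v).map(*).reduce(0, +) }
}

struct LoRALinear {
    let weight: [[Float]]  // frozen base weight, out x in
    var loraA: [[Float]]   // r x in, trainable
    var loraB: [[Float]]   // out x r, trainable, initialized to zero
    let scale: Float       // alpha / r

    func callAsFunction(_ x: [Float]) -> [Float] {
        let base = matVec(weight, x)
        let lowRank = matVec(loraB, matVec(loraA, x))
        return zip(base, lowRank).map { $0 + scale * $1 }
    }
}
```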
* add some command line flags that proved useful in practice
- --quiet -- don't print decorator text, just the generated text
- --prompt @/tmp/file.txt -- load prompt from file
* user can specify either a local path to a model OR a Hugging Face model identifier
* update mlx-swift reference
Co-authored-by: Ashraful Islam <ashraful.meche@gmail.com>
Co-authored-by: JustinMeans <46542161+JustinMeans@users.noreply.github.com>
- remove async LLM generation -- it duplicates work
- and does not match the style used in the example applications
- package generation parameters into a struct
- refactor command line arguments into distinct pieces based on their use
- this will be reusable in the lora commands
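A struct of generation parameters like the one described above might look roughly like this; the field names and defaults here are assumptions for illustration, not the exact ones in the commit:

```swift
// Hypothetical sketch: sampling knobs packaged into one value type so they
// can be passed between the CLI, the UI, and the lora commands.
struct GenerateParameters {
    var temperature: Float = 0.6          // softmax temperature for sampling
    var topP: Float = 1.0                 // nucleus sampling cutoff
    var repetitionPenalty: Float? = nil   // optional penalty on repeated tokens
    var repetitionContextSize: Int = 20   // how many recent tokens to penalize
}
```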
* Add Package.swift for LLM and MNIST
* Make ModelType properties public
* Make ModelType method createModel public
* Add installation instructions to readme
* Feat: LLMEval UI Improvements
1. adds Markdown rendering in the UI
2. adds init time and tokens/second stats
3. Minor UI enhancements
* feat: adds a copy to clipboard button for llm outputs
* adds scrollviewreader to sync with main
* ran pre-format to resolve formatting issues
* updates the missing dependency in project definition
* feat: switch between plain text and markdown
adds a segmented picker to switch between plain text and markdown
* switch swift-tokenizers to main, remove some workarounds
- swift-tokenizers is getting a lot of updates and fixes, let's track main for now
- remove some workarounds that are no longer needed
- https://github.com/huggingface/swift-transformers/issues/63
* add buffer cache limit
* swift-format
* use a more reasonable buffer cache size
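mlx-swift exposes the GPU buffer cache limit via `MLX.GPU`; a sketch of setting it, where the specific byte count is illustrative rather than the value chosen in these commits:

```swift
import MLX

// Cap the GPU buffer cache so reclaimed buffers above this size are freed
// instead of retained for reuse. The 20 MB figure is an example, not the
// value used in the commit.
GPU.set(cacheLimit: 20 * 1024 * 1024)
```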
* add memory stats to command line tool, update to final api
* add note about changing models