Commit 8cdc33f

update README and setup
1 parent 83164dd commit 8cdc33f

2 files changed

Lines changed: 19 additions & 3 deletions

README.md

Lines changed: 11 additions & 1 deletion
@@ -2,6 +2,8 @@
 In this work, we formulate tokenization as an optimization objective, show that it is NP-hard via a simple reduction from vertex cover, and propose a polynomial-time greedy algorithm **GreedTok**.
 Our formulation naturally relaxes to the well-studied weighted maximum coverage problem which has a simple $(1 - 1/e)$-approximation greedy algorithm.

+To do: Huggingface AutoTokenizer interface
+
 ### GreedTok
 1. If using python wrapper

@@ -59,4 +61,12 @@ Our formulation naturally relaxes to the well-studied weighted maximum coverage
 Evaluations in [eval_notebook.ipynb](https://github.com/PreferredAI/aoatt/blob/main/eval_notebook.ipynb)

 ### Citation
-TBD
+```
+@article{lim2025partition,
+  title={A partition cover approach to tokenization},
+  author={Lim, Jia Peng and Choo, Davin and Lauw, Hady W.},
+  year={2025},
+  journal={arXiv preprint arXiv:2501.06246},
+  url={https://arxiv.org/abs/2501.06246},
+}
+```
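
The README context in this diff says the formulation relaxes to weighted maximum coverage, which has a simple $(1 - 1/e)$-approximation greedy algorithm. As a rough illustration of that textbook greedy only (this is not GreedTok and not code from this repository), the sketch below repeatedly picks the set that adds the most uncovered weight. The function name `greedy_weighted_max_coverage` and the toy data are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Set


def greedy_weighted_max_coverage(
    sets: Dict[str, Set[Hashable]],
    weights: Dict[Hashable, float],
    k: int,
) -> List[str]:
    """Pick up to k sets; each round takes the set whose not-yet-covered
    elements have the largest total weight (textbook greedy)."""
    covered: Set[Hashable] = set()
    chosen: List[str] = []
    for _ in range(k):
        best_name: Optional[str] = None
        best_gain = 0.0
        for name, members in sets.items():
            if name in chosen:
                continue
            # Marginal gain: weight of elements this set would newly cover.
            gain = sum(weights[e] for e in members - covered)
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:  # no remaining set adds positive weight
            break
        chosen.append(best_name)
        covered |= sets[best_name]
    return chosen


# Toy instance with made-up data: element 4 is heavy, so sets covering it win early.
toy_sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}}
toy_weights = {1: 1.0, 2: 1.0, 3: 1.0, 4: 5.0, 5: 1.0, 6: 1.0}
picked = greedy_weighted_max_coverage(toy_sets, toy_weights, k=2)
```

On the toy instance the first round selects `C` (gain 7) and the second selects `A` (gain 3), since `B` then only adds element 3.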

setup.py

Lines changed: 8 additions & 2 deletions
@@ -1,5 +1,6 @@
 from sysconfig import get_path
 from setuptools import setup, Extension
+from pathlib import Path

 PATH_PREFIX = get_path('data')
 module1 = Extension(f'greedy_builder',
@@ -13,15 +14,20 @@
     libraries = ['tbb'],
     sources = ['pcatt/greedy_builder.cpp'])

+this_directory = Path(__file__).parent
+long_description = (this_directory / "README.md").read_text()
+
 setup(
     name="greedtok",
-    version="0.1",
+    version="0.13",
     description="Partition Cover Approach to Tokenization",
     author="JP Lim",
     author_email="jiapeng.lim.2021@phdcs.smu.edu.sg",
     license = "MIT",
     setup_requires=['pybind11', 'tbb-devel'],
     url = "https://github.com/PreferredAI/pcatt/",
     download_url = "https://github.com/PreferredAI/pcatt/archive/refs/tags/v0.13.tar.gz",
-    ext_modules = [module1]
+    ext_modules = [module1],
+    long_description=long_description,
+    long_description_content_type='text/markdown'
 )
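
The `setup.py` change reads `README.md` into `long_description` so the package index can render it as Markdown. A minimal sketch of the same pattern follows, with two assumptions that are not part of this commit: the file is read with an explicit UTF-8 encoding (so the result does not depend on the platform default), and a missing README falls back to an empty string. The helper name `read_long_description` is hypothetical.

```python
from pathlib import Path


def read_long_description(project_dir: Path, filename: str = "README.md") -> str:
    """Return the README text for use as `long_description`.

    Falls back to an empty string when the file is absent (an assumption
    made for this sketch, not behavior taken from the commit).
    """
    readme = project_dir / filename
    if not readme.is_file():
        return ""
    # Explicit encoding: Path.read_text() otherwise uses the locale default.
    return readme.read_text(encoding="utf-8")
```

In a `setup.py` this could be called as `read_long_description(Path(__file__).parent)` and the result passed to `setup(long_description=..., long_description_content_type="text/markdown")`.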
