Merge. Synthesize. Create. Dialektik generates new content by fusing ideas from diverse sources, revealing unexpected insights and perspectives through a dialectical process.
- Loads and processes datasets from multiple sources
- Summarizes text into concise bullet points
- Performs semantic search for topic-based content selection
- Generates thesis, antithesis, and synthesis from summarized content
- Uses phi-3-vision-mlx for text generation and embeddings
- Python 3.8+
datasets
huggingface_hub
phi_3_vision_mlx
mlx
fire
To install Dialektik with all required dependencies:
pip install dialektik
Note: Make sure you have the necessary system requirements to run phi-3-vision-mlx and mlx.
Dialektik can be used from the command line after installation. Here are some example usages:
-
Generate a synthesis with default settings:
python dialektik.py
-
Specify a topic for semantic search:
python dialektik.py --topic "AI agents"
-
Specify sources and exclude certain terms:
python dialektik.py --list_source arxiv --list_exclude MIRI "Machine Intelligence Research Institute"
-
Set the number of books and bullet points per book:
python dialektik.py --num_book 5 --per_book 4
-
Use a different language model via API:
python dialektik.py --llm_model "mistralai/Mistral-Nemo-Instruct-2407"
-
For a full list of options, use:
python dialektik.py --help
You can also use Dialektik in your Python scripts:
from dialektik import synthesize
# Generate a synthesis with default settings
thesis, antithesis, synthesis = synthesize()
# Customize the synthesis process
output = synthesize(
topic="AI agents",
list_source=['arxiv'],
list_exclude=['MIRI', 'Machine Intelligence Research Institute'],
num_book=3,
per_book=3
)
The default dataset at 'JosefAlbers/StampyAI-alignment-research-dataset' is publicly available. You don't need to set up any environment variables to use dialektik
with this dataset.
If you want to use your own dataset:
- Prepare your dataset according to the required format.
- Modify the
PATH_DS
variable in the code to point to your dataset. - If your dataset is private or requires authentication, set up the following environment variables:
HF_WRITE_TOKEN
: Hugging Face write token (for pushing datasets)HF_READ_TOKEN
: Hugging Face read token (for accessing private datasets)
Dialektik now uses phi-3-vision-mlx for text generation and embeddings. This model is not easily swappable, but you can modify the pv.load()
and pv.generate()
calls in the code if you need to use a different model.
The synthesize()
function generates three outputs:
- Thesis: An article exploring the main themes and insights from the selected sources.
- Antithesis: A text presenting alternative perspectives and counterarguments to the thesis.
- Synthesis: A reconciliation of the thesis and antithesis, presenting a new, unified viewpoint.
All outputs are saved in the 'syntheses' folder with timestamps for easy reference.
This project is licensed under the MIT License.
Contributions to Dialektik are always welcome! Please feel free to submit a Pull Request.