Text Generation Examples
The anote-generate SDK supports various text generation tasks. Here are practical examples for different use cases:
1. Customer Support Conversations
Generate synthetic customer support conversations for training chatbots:
from anotegenerate.core import AnoteGenerate
sdk = AnoteGenerate(api_key="your-api-key")
# Generate customer support conversations
result = sdk.generate(
task_type="text",
prompt="Generate customer support conversations about product issues",
num_rows=5,
columns=["customer_message", "agent_response", "issue_type", "resolution"]
)
print(result)
Generated Output:
Customer Message | Agent Response | Issue Type | Resolution |
---|---|---|---|
I can't log into my account | I'm sorry to hear that. Let me help you troubleshoot. Can you try clearing your browser cache? | login_problem | cache_cleared |
My order hasn't arrived yet | I'll check the status of your order. Can you provide your order number? | delivery_delay | tracking_provided |
The app keeps crashing | Let me help you with that. What device and operating system are you using? | app_crash | device_info_collected |
I want to cancel my subscription | I understand. Let me help you with the cancellation process. | subscription_cancel | cancellation_initiated |
How do I reset my password? | I can help you reset your password. I'll send a reset link to your email. | password_reset | reset_link_sent |
2. Product Reviews
Generate synthetic product reviews for sentiment analysis training:
# Generate product reviews
result = sdk.generate(
task_type="text",
prompt="Generate realistic product reviews for electronics with sentiment labels",
num_rows=4,
columns=["product_name", "review_text", "rating", "sentiment", "category"]
)
Generated Output:
Product Name | Review Text | Rating | Sentiment | Category |
---|---|---|---|---|
Wireless Headphones | Great sound quality and comfortable for long listening sessions. Battery life is impressive. | 5 | positive | audio |
Smartphone | Camera quality is disappointing and the battery drains too quickly. | 2 | negative | mobile |
Gaming Laptop | Excellent performance for gaming, but the fan noise is quite loud during heavy use. | 4 | positive | computers |
Smart Watch | The fitness tracking features are accurate, but the battery life could be better. | 3 | neutral | wearables |
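Generated labels can disagree with each other, for example a low rating paired with a positive sentiment. The sketch below shows one way to flag such rows before training. It assumes the result has already been converted to a list of dicts keyed by column name (the actual return type of `sdk.generate` is not documented here), and the rating-to-sentiment mapping and the `Tablet` row are hypothetical, added only for illustration.

```python
# Hypothetical rows mirroring the columns above; a list of dicts is an assumption,
# not the documented return type of sdk.generate().
rows = [
    {"product_name": "Wireless Headphones", "rating": 5, "sentiment": "positive"},
    {"product_name": "Smartphone", "rating": 2, "sentiment": "negative"},
    {"product_name": "Gaming Laptop", "rating": 4, "sentiment": "positive"},
    {"product_name": "Smart Watch", "rating": 3, "sentiment": "neutral"},
    # Deliberately inconsistent row, invented for this example.
    {"product_name": "Tablet", "rating": 1, "sentiment": "positive"},
]


def expected_sentiment(rating):
    """Map a 1-5 rating to a coarse sentiment label (an assumed convention)."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"


def find_label_mismatches(rows):
    """Return rows whose sentiment label disagrees with their rating."""
    return [
        row for row in rows
        if expected_sentiment(int(row["rating"])) != row["sentiment"]
    ]


mismatches = find_label_mismatches(rows)
for row in mismatches:
    print(f"Check {row['product_name']}: rating={row['rating']}, sentiment={row['sentiment']}")
```

Flagged rows are candidates for manual review rather than automatic deletion, since the mapping itself is only a heuristic.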
3. Named Entity Recognition (NER) Data
Generate synthetic text with labeled entities for NER model training:
# Generate NER training data
result = sdk.generate(
task_type="text",
prompt="Generate sentences with person names, locations, and organizations for NER training",
num_rows=3,
columns=["text", "entities"]
)
Generated Output:
Text | Entities |
---|---|
John Smith works at Microsoft in Seattle. | [{'start': 0, 'end': 10, 'label': 'PERSON'}, {'start': 20, 'end': 29, 'label': 'ORG'}, {'start': 33, 'end': 40, 'label': 'LOCATION'}] |
Dr. Sarah Johnson visited Paris last month. | [{'start': 0, 'end': 17, 'label': 'PERSON'}, {'start': 26, 'end': 31, 'label': 'LOCATION'}] |
The CEO of Apple, Tim Cook, announced new products in San Francisco. | [{'start': 11, 'end': 16, 'label': 'ORG'}, {'start': 18, 'end': 26, 'label': 'PERSON'}, {'start': 54, 'end': 67, 'label': 'LOCATION'}] |
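Character offsets in generated entity annotations are easy to get wrong by a position or two, so it is worth checking them against the text before training. The following is a minimal sketch of such a check. It assumes offsets are zero-based with an exclusive `end` (Python slice style), which is an assumption about the output format; `span_problem` is a hypothetical helper, not part of the SDK.

```python
def span_problem(text, entity):
    """Return a description of what is wrong with an entity span, or None if it looks fine.

    Assumes zero-based offsets with an exclusive end, as in text[start:end].
    """
    start, end = entity["start"], entity["end"]
    if not (0 <= start < end <= len(text)):
        return "offsets out of range"
    surface = text[start:end]
    if surface != surface.strip():
        return "leading or trailing whitespace"
    if start > 0 and text[start - 1].isalnum():
        return "starts mid-word"
    if end < len(text) and text[end].isalnum():
        return "ends mid-word"
    return None


text = "John Smith works at Microsoft in Seattle."
entities = [
    {"start": 0, "end": 10, "label": "PERSON"},
    {"start": 20, "end": 29, "label": "ORG"},
    {"start": 33, "end": 40, "label": "LOCATION"},
]

for entity in entities:
    surface = text[entity["start"]:entity["end"]]
    print(entity["label"], repr(surface), span_problem(text, entity))

# A span shifted by one character is caught by the boundary checks.
shifted = {"start": 21, "end": 30, "label": "ORG"}
print(span_problem(text, shifted))
```

The check only looks at word boundaries; it cannot tell whether the label itself is right, so a manual spot check is still useful.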
4. Question-Answering Pairs
Generate synthetic QA pairs for training question-answering models:
# Generate QA training data
result = sdk.generate(
task_type="text",
prompt="Generate question-answer pairs about machine learning concepts",
num_rows=3,
columns=["question", "answer", "topic", "difficulty"]
)
Generated Output:
Question | Answer | Topic | Difficulty |
---|---|---|---|
What is supervised learning? | Supervised learning is a type of machine learning where the model learns from labeled training data to make predictions on new, unseen data. | machine_learning_basics | beginner |
How does gradient descent work? | Gradient descent is an optimization algorithm that iteratively adjusts model parameters to minimize the loss function by following the negative gradient. | optimization | intermediate |
What is overfitting in machine learning? | Overfitting occurs when a model learns the training data too well, including noise and irrelevant patterns, leading to poor generalization on new data. | model_evaluation | intermediate |
5. Intent Classification Data
Generate synthetic utterances for intent classification training:
# Generate intent classification data
result = sdk.generate(
task_type="text",
prompt="Generate user utterances for different chatbot intents",
num_rows=4,
columns=["utterance", "intent", "confidence"]
)
Generated Output:
Utterance | Intent | Confidence |
---|---|---|
What's the weather like today? | weather_inquiry | 0.95 |
I need help with my account | support_request | 0.88 |
Book a flight to New York | booking_request | 0.92 |
What time does the store close? | business_hours | 0.87 |
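Before training an intent classifier on generated utterances, it can help to look at how many examples each intent received. The sketch below counts intents with the standard library, assuming the rows are available as a list of dicts (an assumption about the result format); `underrepresented` and its threshold are hypothetical.

```python
from collections import Counter

# Rows copied from the table above, assumed to be available as a list of dicts.
rows = [
    {"utterance": "What's the weather like today?", "intent": "weather_inquiry", "confidence": 0.95},
    {"utterance": "I need help with my account", "intent": "support_request", "confidence": 0.88},
    {"utterance": "Book a flight to New York", "intent": "booking_request", "confidence": 0.92},
    {"utterance": "What time does the store close?", "intent": "business_hours", "confidence": 0.87},
]

intent_counts = Counter(row["intent"] for row in rows)


def underrepresented(counts, min_examples):
    """Return intents with fewer than min_examples utterances, sorted by name."""
    return sorted(intent for intent, n in counts.items() if n < min_examples)


print(intent_counts)
print(underrepresented(intent_counts, min_examples=2))
```

Intents that fall below the threshold are candidates for another generation run with a prompt that names them explicitly.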
6. Multi-turn Conversations
Generate synthetic multi-turn conversations for dialogue systems:
# Generate multi-turn conversations
result = sdk.generate(
task_type="text",
prompt="Generate multi-turn conversations between a user and a travel assistant",
num_rows=4,
columns=["conversation_id", "turn_number", "speaker", "message", "intent"]
)
Generated Output:
Conversation ID | Turn Number | Speaker | Message | Intent |
---|---|---|---|---|
conv_001 | 1 | user | I want to book a flight to Paris | booking_request |
conv_001 | 2 | assistant | I'd be happy to help you book a flight to Paris. When would you like to travel? | clarification |
conv_002 | 1 | user | What's the weather like in Tokyo? | weather_inquiry |
conv_002 | 2 | assistant | The weather in Tokyo is currently sunny with a temperature of 22°C. | information_provided |
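Because each row holds a single turn, dialogue training usually needs the rows regrouped into whole conversations. The sketch below shows one way to do that, assuming the rows are available as a list of dicts keyed by column name (an assumption about the result format); `group_conversations` is a hypothetical helper.

```python
from collections import defaultdict

# Rows from the table above, deliberately out of order to show the sorting step.
rows = [
    {"conversation_id": "conv_001", "turn_number": 2, "speaker": "assistant",
     "message": "I'd be happy to help you book a flight to Paris. When would you like to travel?"},
    {"conversation_id": "conv_002", "turn_number": 1, "speaker": "user",
     "message": "What's the weather like in Tokyo?"},
    {"conversation_id": "conv_001", "turn_number": 1, "speaker": "user",
     "message": "I want to book a flight to Paris"},
    {"conversation_id": "conv_002", "turn_number": 2, "speaker": "assistant",
     "message": "The weather in Tokyo is currently sunny with a temperature of 22°C."},
]


def group_conversations(rows):
    """Group turn-level rows by conversation_id and order each group by turn_number."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["conversation_id"]].append(row)
    return {
        conv_id: sorted(turns, key=lambda turn: int(turn["turn_number"]))
        for conv_id, turns in grouped.items()
    }


conversations = group_conversations(rows)
for conv_id, turns in conversations.items():
    print(conv_id)
    for turn in turns:
        print(f"  {turn['speaker']}: {turn['message']}")
```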
Advanced Usage: Custom Prompts
You can customize the generation by providing more specific prompts:
# Generate domain-specific content
result = sdk.generate(
task_type="text",
prompt="Generate medical symptom descriptions with severity levels",
num_rows=3,
columns=["symptom", "description", "severity", "body_part"]
)
Generated Output:
Symptom | Description | Severity | Body Part |
---|---|---|---|
headache | Dull, persistent pain in the forehead region | moderate | head |
fever | Elevated body temperature with chills and sweating | high | systemic |
cough | Dry, irritating cough that worsens at night | mild | respiratory |
Tips for Better Generation
- Be Specific: Use detailed prompts to get more targeted results
- Iterate: Start with small batches and refine your prompts based on results
- Validate: Always review generated data for quality and relevance
- Combine with Human Data: Use synthetic data to augment real human-annotated datasets
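The validation tip can be partly automated. The sketch below drops rows with missing or empty required columns and case-insensitive duplicates, and reports what it removed. It assumes the generated rows are available as a list of dicts (the SDK's actual return type is not documented here); `validate_rows` is a hypothetical helper, and the duplicate and empty rows are invented for illustration.

```python
def is_blank(value):
    """True for None or a value that is empty after stripping whitespace."""
    return value is None or not str(value).strip()


def validate_rows(rows, required_columns):
    """Split rows into (clean_rows, problems); problems holds (row_index, reason) pairs."""
    clean, problems, seen = [], [], set()
    for index, row in enumerate(rows):
        missing = [col for col in required_columns if is_blank(row.get(col))]
        if missing:
            problems.append((index, f"missing or empty: {', '.join(missing)}"))
            continue
        # Compare rows case-insensitively so trivial variants count as duplicates.
        key = tuple(str(row[col]).strip().lower() for col in required_columns)
        if key in seen:
            problems.append((index, "duplicate row"))
            continue
        seen.add(key)
        clean.append(row)
    return clean, problems


columns = ["symptom", "description", "severity", "body_part"]
rows = [
    {"symptom": "headache", "description": "Dull, persistent pain in the forehead region",
     "severity": "moderate", "body_part": "head"},
    {"symptom": "cough", "description": "Dry, irritating cough that worsens at night",
     "severity": "mild", "body_part": "respiratory"},
    # Invented for this example: a duplicate that differs only in capitalization.
    {"symptom": "Headache", "description": "Dull, persistent pain in the forehead region",
     "severity": "moderate", "body_part": "head"},
    # Invented for this example: a row with an empty severity.
    {"symptom": "fever", "description": "Elevated body temperature with chills and sweating",
     "severity": "", "body_part": "systemic"},
]

clean, problems = validate_rows(rows, columns)
print(f"kept {len(clean)} of {len(rows)} rows")
for index, reason in problems:
    print(f"row {index}: {reason}")
```

Checks like these catch structural problems only; judging whether the content is accurate or relevant still requires human review.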
Output Formats
The API returns data in multiple formats:
- JSON: Structured data with all columns and metadata
- CSV: Tabular format for easy import into data analysis tools
- Downloadable Files: Direct file downloads for large datasets
Each generated dataset includes metadata about the generation parameters. After review, the data can be used for model training or evaluation.
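If the rows are already in memory, they can also be converted to JSON or CSV locally with the Python standard library. The sketch below does not use any export feature of the SDK; it assumes the rows are a list of dicts keyed by column name, and `to_json` and `to_csv` are hypothetical helpers.

```python
import csv
import io
import json

columns = ["utterance", "intent", "confidence"]
rows = [
    {"utterance": "What's the weather like today?", "intent": "weather_inquiry", "confidence": 0.95},
    {"utterance": "Book a flight to New York", "intent": "booking_request", "confidence": 0.92},
]


def to_json(rows):
    """Serialize rows as a JSON array, keeping non-ASCII characters readable."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def to_csv(rows, columns):
    """Serialize rows as CSV text with a header line in the given column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(rows)
csv_text = to_csv(rows, columns)
print(json_text)
print(csv_text)
```

Writing either string to disk is then a plain `open(path, "w", encoding="utf-8")` call; for CSV files opened directly, the `csv` module documentation recommends passing `newline=""`.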