Text Generation Examples

The anote-generate SDK supports various text generation tasks. Here are practical examples for different use cases:

1. Customer Support Conversations

Generate synthetic customer support conversations for training chatbots:

from anotegenerate.core import AnoteGenerate

sdk = AnoteGenerate(api_key="your-api-key")

# Generate customer support conversations
result = sdk.generate(
    task_type="text",
    prompt="Generate customer support conversations about product issues",
    num_rows=5,
    columns=["customer_message", "agent_response", "issue_type", "resolution"]
)

print(result)

Generated Output:

Customer Message	Agent Response	Issue Type	Resolution
I can't log into my account	I'm sorry to hear that. Let me help you troubleshoot. Can you try clearing your browser cache?	login_problem	cache_cleared
My order hasn't arrived yet	I'll check the status of your order. Can you provide your order number?	delivery_delay	tracking_provided
The app keeps crashing	Let me help you with that. What device and operating system are you using?	app_crash	device_info_collected
I want to cancel my subscription	I understand. Let me help you with the cancellation process.	subscription_cancel	cancellation_initiated
How do I reset my password?	I can help you reset your password. I'll send a reset link to your email.	password_reset	reset_link_sent

2. Product Reviews

Generate synthetic product reviews for sentiment analysis training:

# Generate product reviews
result = sdk.generate(
    task_type="text",
    prompt="Generate realistic product reviews for electronics with sentiment labels",
    num_rows=4,
    columns=["product_name", "review_text", "rating", "sentiment", "category"]
)

Generated Output:

Product Name	Review Text	Rating	Sentiment	Category
Wireless Headphones	Great sound quality and comfortable for long listening sessions. Battery life is impressive.	5	positive	audio
Smartphone	Camera quality is disappointing and the battery drains too quickly.	2	negative	mobile
Gaming Laptop	Excellent performance for gaming, but the fan noise is quite loud during heavy use.	4	positive	computers
Smart Watch	The fitness tracking features are accurate, but the battery life could be better.	3	neutral	wearables

3. Named Entity Recognition (NER) Data

Generate synthetic text with labeled entities for NER model training:

# Generate NER training data
result = sdk.generate(
    task_type="text",
    prompt="Generate sentences with person names, locations, and organizations for NER training",
    num_rows=3,
    columns=["text", "entities"]
)

Generated Output:

Text	Entities
John Smith works at Microsoft in Seattle.	[{'start': 0, 'end': 10, 'label': 'PERSON'}, {'start': 21, 'end': 30, 'label': 'ORG'}, {'start': 34, 'end': 41, 'label': 'LOCATION'}]
Dr. Sarah Johnson visited Paris last month.	[{'start': 0, 'end': 13, 'label': 'PERSON'}, {'start': 22, 'end': 27, 'label': 'LOCATION'}]
The CEO of Apple, Tim Cook, announced new products in San Francisco.	[{'start': 12, 'end': 16, 'label': 'ORG'}, {'start': 18, 'end': 26, 'label': 'PERSON'}, {'start': 47, 'end': 60, 'label': 'LOCATION'}]

4. Question-Answering Pairs

Generate synthetic QA pairs for training question-answering models:

# Generate QA training data
result = sdk.generate(
    task_type="text",
    prompt="Generate question-answer pairs about machine learning concepts",
    num_rows=3,
    columns=["question", "answer", "topic", "difficulty"]
)

Generated Output:

Question	Answer	Topic	Difficulty
What is supervised learning?	Supervised learning is a type of machine learning where the model learns from labeled training data to make predictions on new, unseen data.	machine_learning_basics	beginner
How does gradient descent work?	Gradient descent is an optimization algorithm that iteratively adjusts model parameters to minimize the loss function by following the negative gradient.	optimization	intermediate
What is overfitting in machine learning?	Overfitting occurs when a model learns the training data too well, including noise and irrelevant patterns, leading to poor generalization on new data.	model_evaluation	intermediate

5. Intent Classification Data

Generate synthetic utterances for intent classification training:

# Generate intent classification data
result = sdk.generate(
    task_type="text",
    prompt="Generate user utterances for different chatbot intents",
    num_rows=4,
    columns=["utterance", "intent", "confidence"]
)

Generated Output:

Utterance	Intent	Confidence
What's the weather like today?	weather_inquiry	0.95
I need help with my account	support_request	0.88
Book a flight to New York	booking_request	0.92
What time does the store close?	business_hours	0.87

6. Multi-turn Conversations

Generate synthetic multi-turn conversations for dialogue systems:

# Generate multi-turn conversations
result = sdk.generate(
    task_type="text",
    prompt="Generate multi-turn conversations between a user and a travel assistant",
    num_rows=2,
    columns=["conversation_id", "turn_number", "speaker", "message", "intent"]
)

Generated Output:

Conversation ID	Turn Number	Speaker	Message	Intent
conv_001	1	user	I want to book a flight to Paris	booking_request
conv_001	2	assistant	I'd be happy to help you book a flight to Paris. When would you like to travel?	clarification
conv_002	1	user	What's the weather like in Tokyo?	weather_inquiry
conv_002	2	assistant	The weather in Tokyo is currently sunny with a temperature of 22°C.	information_provided

Advanced Usage: Custom Prompts

You can customize the generation by providing more specific prompts:

# Generate domain-specific content
result = sdk.generate(
    task_type="text",
    prompt="Generate medical symptom descriptions with severity levels",
    num_rows=3,
    columns=["symptom", "description", "severity", "body_part"]
)

Generated Output:

Symptom	Description	Severity	Body Part
headache	Dull, persistent pain in the forehead region	moderate	head
fever	Elevated body temperature with chills and sweating	high	systemic
cough	Dry, irritating cough that worsens at night	mild	respiratory

Tips for Better Generation

Be Specific: Use detailed prompts to get more targeted results
Iterate: Start with small batches and refine your prompts based on results
Validate: Always review generated data for quality and relevance
Combine with Human Data: Use synthetic data to augment real human-annotated datasets

Output Formats

The API returns data in multiple formats:

JSON: Structured data with all columns and metadata
CSV: Tabular format for easy import into data analysis tools
Downloadable Files: Direct file downloads for large datasets

Each generated dataset includes metadata about the generation parameters and can be directly used for model training or evaluation.