Skip to content

Text Generation Examples

The anote-generate SDK supports various text generation tasks. Here are practical examples for different use cases:

1. Customer Support Conversations

Generate synthetic customer support conversations for training chatbots:

from anotegenerate.core import AnoteGenerate

sdk = AnoteGenerate(api_key="your-api-key")

# Generate customer support conversations
result = sdk.generate(
    task_type="text",
    prompt="Generate customer support conversations about product issues",
    num_rows=5,
    columns=["customer_message", "agent_response", "issue_type", "resolution"]
)

print(result)

Generated Output:

Customer Message Agent Response Issue Type Resolution
I can't log into my account I'm sorry to hear that. Let me help you troubleshoot. Can you try clearing your browser cache? login_problem cache_cleared
My order hasn't arrived yet I'll check the status of your order. Can you provide your order number? delivery_delay tracking_provided
The app keeps crashing Let me help you with that. What device and operating system are you using? app_crash device_info_collected
I want to cancel my subscription I understand. Let me help you with the cancellation process. subscription_cancel cancellation_initiated
How do I reset my password? I can help you reset your password. I'll send a reset link to your email. password_reset reset_link_sent

2. Product Reviews

Generate synthetic product reviews for sentiment analysis training:

# Generate product reviews
result = sdk.generate(
    task_type="text",
    prompt="Generate realistic product reviews for electronics with sentiment labels",
    num_rows=4,
    columns=["product_name", "review_text", "rating", "sentiment", "category"]
)

Generated Output:

Product Name Review Text Rating Sentiment Category
Wireless Headphones Great sound quality and comfortable for long listening sessions. Battery life is impressive. 5 positive audio
Smartphone Camera quality is disappointing and the battery drains too quickly. 2 negative mobile
Gaming Laptop Excellent performance for gaming, but the fan noise is quite loud during heavy use. 4 positive computers
Smart Watch The fitness tracking features are accurate, but the battery life could be better. 3 neutral wearables

3. Named Entity Recognition (NER) Data

Generate synthetic text with labeled entities for NER model training:

# Generate NER training data
result = sdk.generate(
    task_type="text",
    prompt="Generate sentences with person names, locations, and organizations for NER training",
    num_rows=3,
    columns=["text", "entities"]
)

Generated Output:

Text Entities
John Smith works at Microsoft in Seattle. [{'start': 0, 'end': 10, 'label': 'PERSON'}, {'start': 21, 'end': 30, 'label': 'ORG'}, {'start': 34, 'end': 41, 'label': 'LOCATION'}]
Dr. Sarah Johnson visited Paris last month. [{'start': 0, 'end': 13, 'label': 'PERSON'}, {'start': 22, 'end': 27, 'label': 'LOCATION'}]
The CEO of Apple, Tim Cook, announced new products in San Francisco. [{'start': 12, 'end': 16, 'label': 'ORG'}, {'start': 18, 'end': 26, 'label': 'PERSON'}, {'start': 47, 'end': 60, 'label': 'LOCATION'}]

4. Question-Answering Pairs

Generate synthetic QA pairs for training question-answering models:

# Generate QA training data
result = sdk.generate(
    task_type="text",
    prompt="Generate question-answer pairs about machine learning concepts",
    num_rows=3,
    columns=["question", "answer", "topic", "difficulty"]
)

Generated Output:

Question Answer Topic Difficulty
What is supervised learning? Supervised learning is a type of machine learning where the model learns from labeled training data to make predictions on new, unseen data. machine_learning_basics beginner
How does gradient descent work? Gradient descent is an optimization algorithm that iteratively adjusts model parameters to minimize the loss function by following the negative gradient. optimization intermediate
What is overfitting in machine learning? Overfitting occurs when a model learns the training data too well, including noise and irrelevant patterns, leading to poor generalization on new data. model_evaluation intermediate

5. Intent Classification Data

Generate synthetic utterances for intent classification training:

# Generate intent classification data
result = sdk.generate(
    task_type="text",
    prompt="Generate user utterances for different chatbot intents",
    num_rows=4,
    columns=["utterance", "intent", "confidence"]
)

Generated Output:

Utterance Intent Confidence
What's the weather like today? weather_inquiry 0.95
I need help with my account support_request 0.88
Book a flight to New York booking_request 0.92
What time does the store close? business_hours 0.87

6. Multi-turn Conversations

Generate synthetic multi-turn conversations for dialogue systems:

# Generate multi-turn conversations
result = sdk.generate(
    task_type="text",
    prompt="Generate multi-turn conversations between a user and a travel assistant",
    num_rows=2,
    columns=["conversation_id", "turn_number", "speaker", "message", "intent"]
)

Generated Output:

Conversation ID Turn Number Speaker Message Intent
conv_001 1 user I want to book a flight to Paris booking_request
conv_001 2 assistant I'd be happy to help you book a flight to Paris. When would you like to travel? clarification
conv_002 1 user What's the weather like in Tokyo? weather_inquiry
conv_002 2 assistant The weather in Tokyo is currently sunny with a temperature of 22°C. information_provided

Advanced Usage: Custom Prompts

You can customize the generation by providing more specific prompts:

# Generate domain-specific content
result = sdk.generate(
    task_type="text",
    prompt="Generate medical symptom descriptions with severity levels",
    num_rows=3,
    columns=["symptom", "description", "severity", "body_part"]
)

Generated Output:

Symptom Description Severity Body Part
headache Dull, persistent pain in the forehead region moderate head
fever Elevated body temperature with chills and sweating high systemic
cough Dry, irritating cough that worsens at night mild respiratory

Tips for Better Generation

  1. Be Specific: Use detailed prompts to get more targeted results
  2. Iterate: Start with small batches and refine your prompts based on results
  3. Validate: Always review generated data for quality and relevance
  4. Combine with Human Data: Use synthetic data to augment real human-annotated datasets

Output Formats

The API returns data in multiple formats:

  • JSON: Structured data with all columns and metadata
  • CSV: Tabular format for easy import into data analysis tools
  • Downloadable Files: Direct file downloads for large datasets

Each generated dataset includes metadata about the generation parameters and can be directly used for model training or evaluation.