# OTEL Trace Ingestion

[OpenTelemetry](https://opentelemetry.io/docs/what-is-opentelemetry/) (OTEL) Trace Ingestion allows the richest data to be uploaded to your Project, but it requires some off-platform coding and does not support backfilling data. This guide provides step-by-step instructions for instrumenting your AI agent application to send OTEL traces to DBNL.

## Prerequisites

**DBNL Credentials**: You'll need:

* DBNL API URL (e.g., `http://localhost:8080/api`)
* API Token (a Bearer token for [authentication](https://github.com/dbnlAI/docs/blob/main/platform/authentication/README.md); tokens can be generated at `DBNL_API_URL/tokens`)
* Project ID (your DBNL project identifier, typically starts with `proj_` and is part of the URL for your project)
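The examples below read these values from environment variables, which you can set before starting your application (the values shown are placeholders; substitute your own):

```shell
export DBNL_API_URL="http://localhost:8080/api"   # your DBNL API URL, including the scheme
export DBNL_API_TOKEN="<your-api-token>"          # placeholder: generate at DBNL_API_URL/tokens
export DBNL_PROJECT_ID="proj_xxxxxxxx"            # placeholder: your project ID
```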

{% hint style="warning" %}
OTEL Trace Ingestion needs to be enabled during [Deployment](https://docs.dbnl.com/v0.29.x/platform/deployment) so that the required Clickhouse database is provisioned and initialized.
{% endhint %}

You will need to install the required OpenTelemetry packages:

```bash
pip install "opentelemetry-sdk>=1.20.0"
pip install "opentelemetry-exporter-otlp>=1.20.0"
```

For LangChain applications, also install [OpenInference](https://arize-ai.github.io/openinference/) instrumentation:

```bash
pip install "openinference-instrumentation-langchain>=0.1.0"
```

## Implementation

### Basic Setup

Create a telemetry initialization module (`telemetry.py`) in your application:

```python
import os
import logging
from typing import Optional
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def create_dbnl_exporter() -> Optional[OTLPSpanExporter]:
    """Create OTLP exporter for DBNL"""
    # Get configuration from environment
    api_url = os.environ.get("DBNL_API_URL", "").strip()
    api_token = os.environ.get("DBNL_API_TOKEN", "").strip()
    project_id = os.environ.get("DBNL_PROJECT_ID", "").strip()
    
    # Validate configuration
    if not all([api_url, api_token, project_id]):
        logger.info("DBNL configuration incomplete. Set DBNL_API_URL, DBNL_API_TOKEN, and DBNL_PROJECT_ID.")
        return None
    
    try:
        # Create headers
        headers = {
            "Authorization": f"Bearer {api_token}",
            "x-dbnl-project-id": project_id,
            "Content-Type": "application/x-protobuf",
        }
        
        # Build the OTLP traces endpoint from the configured API URL
        # (DBNL_API_URL already includes the scheme, e.g. http://localhost:8080/api)
        endpoint = f"{api_url}/otel/v1/traces"
        exporter = OTLPSpanExporter(
            endpoint=endpoint,
            headers=headers
        )
        
        logger.info(f"✅ DBNL exporter configured: {endpoint}")
        return exporter
        
    except Exception as e:
        logger.error(f"❌ Failed to configure DBNL exporter: {e}")
        return None

def initialize_telemetry():
    """Initialize OpenTelemetry with DBNL exporter"""
    # Reuse the existing provider if telemetry was already initialized
    # (e.g. this module was imported and the app calls this again at startup)
    current_provider = trace.get_tracer_provider()
    if isinstance(current_provider, TracerProvider):
        return current_provider

    # Create tracer provider with resource attributes
    resource = Resource.create({
        "service.name": os.environ.get("OTEL_SERVICE_NAME", "my-agent"),
    })
    
    tracer_provider = TracerProvider(resource=resource)
    trace.set_tracer_provider(tracer_provider)
    
    # Add DBNL exporter
    dbnl_exporter = create_dbnl_exporter()
    if dbnl_exporter:
        processor = BatchSpanProcessor(dbnl_exporter)
        tracer_provider.add_span_processor(processor)
        logger.info("📊 DBNL OTEL tracing enabled")
    else:
        logger.info("ℹ️  DBNL OTEL tracing not configured")
    
    return tracer_provider

# Initialize on import
tracer_provider = initialize_telemetry()
tracer = trace.get_tracer(__name__)
```

### LangChain Integration

For LangChain applications, add OpenInference instrumentation:

```python
from openinference.instrumentation.langchain import LangChainInstrumentor

def initialize_telemetry():
    """Initialize OpenTelemetry with DBNL exporter and LangChain instrumentation"""
    # ... (previous code) ...
    
    # Add LangChain instrumentation
    try:
        instrumentor = LangChainInstrumentor()
        instrumentor.instrument(tracer_provider=tracer_provider)
        logger.info("🔧 LangChain OpenInference instrumentation enabled")
    except Exception as e:
        logger.error(f"❌ Failed to instrument LangChain: {e}")
    
    return tracer_provider
```

### Application Integration

Initialize telemetry early in your application startup:

**FastAPI Example:**

```python
from fastapi import FastAPI
from telemetry import initialize_telemetry

app = FastAPI()

@app.on_event("startup")
async def startup_event():
    initialize_telemetry()
    print("✅ Telemetry initialized")
```

**Standalone Script Example:**

```python
from telemetry import initialize_telemetry

if __name__ == "__main__":
    initialize_telemetry()
    # Your application code here
```

## Required Trace Fields

The following fields are required regardless of which ingestion method you are using:

* **`input`**: The text input to the LLM as a `string`
* **`output`**: The text response from the LLM as a `string`
* **`timestamp`**: The UTC timecode associated with the LLM call as a `timestamptz`

You may choose to track other attributes such as `total_token_count` or `feedback_score` which are part of the [DBNL semantic convention](https://docs.dbnl.com/v0.29.x/configuration/dbnl-semantic-convention).

### Custom Attributes

You can add custom attributes for better analytics:

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("agent_execution") as span:
    # Set semantic attributes
    span.set_attribute("input.value", user_query)
    span.set_attribute("output.value", agent_response)
    
    # Add custom metadata
    span.set_attribute("session.id", session_id)
    span.set_attribute("conversation.id", conversation_id)
    span.set_attribute("tool.name", "search_symbol")
    span.set_attribute("tool.success", True)
    span.set_attribute("deployment.type", "web-application")
```

## Advanced Configuration

### Batch Processing

The setup in this guide uses `BatchSpanProcessor` for efficient trace export. It batches spans before sending them, reducing network overhead:

```python
from opentelemetry.sdk.trace.export import BatchSpanProcessor

processor = BatchSpanProcessor(dbnl_exporter)
tracer_provider.add_span_processor(processor)
```

For immediate export (useful for debugging), use `SimpleSpanProcessor`:

```python
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

processor = SimpleSpanProcessor(dbnl_exporter)
tracer_provider.add_span_processor(processor)
```

## Verification

### Test Trace Export

Create a test span to verify traces are being sent:

```python
from opentelemetry import trace
from telemetry import tracer_provider

tracer = trace.get_tracer(__name__)

# Create a test span
with tracer.start_as_current_span("test_dbnl_export") as span:
    span.set_attribute("input.value", "test input")
    span.set_attribute("output.value", "test output")
    span.set_attribute("test", True)

# Force flush to ensure export
tracer_provider.force_flush()
print("✅ Test span exported to DBNL")
```

### View Traces in DBNL

After sending traces, verify that they appear in your DBNL dashboard. By default, traces are processed into logs nightly, so you will not see them right away.

1. Log into your DBNL deployment and go to your project
2. Check the Status page to confirm that they have been processed
3. Navigate to the Explorer or Logs section
4. Filter by your project ID or service name
5. Verify traces are appearing with the expected attributes

## Troubleshooting

### Traces Not Appearing in DBNL

1. **Check Environment Variables**: Verify all required variables are set:

   ```bash
   echo $DBNL_API_URL
   echo $DBNL_API_TOKEN
   echo $DBNL_PROJECT_ID
   ```
2. **Verify API Endpoint**: Test connectivity to DBNL:

   ```bash
   curl -H "Authorization: Bearer $DBNL_API_TOKEN" \
        -H "x-dbnl-project-id: $DBNL_PROJECT_ID" \
        "$DBNL_API_URL/health"
   ```
3. **Check Logs**: Look for DBNL exporter configuration messages:

   ```
   ✅ DBNL exporter configured: https://api.dev.dbnl.com/otel/v1/traces
   📊 DBNL OTEL tracing enabled
   ```
4. **Verify URL Formatting**: Ensure the endpoint is correctly formatted:
   * Format: `{DBNL_API_URL}/otel/v1/traces` (the API URL includes the scheme)
   * Example: `https://api.dev.dbnl.com/otel/v1/traces`
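The checks above can also be scripted. A small hedged pre-flight check (the function name `check_dbnl_config` is illustrative, not part of any DBNL SDK):

```python
import os
from urllib.parse import urlparse

def check_dbnl_config(env: dict) -> list:
    """Return a list of configuration problems; an empty list means the basics look OK."""
    problems = []
    for name in ("DBNL_API_URL", "DBNL_API_TOKEN", "DBNL_PROJECT_ID"):
        if not env.get(name, "").strip():
            problems.append(f"missing {name}")
    url = env.get("DBNL_API_URL", "").strip()
    if url and urlparse(url).scheme not in ("http", "https"):
        problems.append("DBNL_API_URL should include http:// or https://")
    return problems

print(check_dbnl_config(dict(os.environ)))
```

Run this before starting your application to catch missing or malformed settings early.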

### Common Issues

**Issue: "DBNL configuration incomplete"**

* **Solution**: Ensure `DBNL_API_URL`, `DBNL_API_TOKEN`, and `DBNL_PROJECT_ID` are all set

**Issue: "Failed to configure DBNL exporter"**

* **Solution**: Check that the API URL is valid and the token has proper permissions

**Issue: Traces appear but missing attributes**

* **Solution**: Ensure you're using OpenInference semantic conventions or manually setting required attributes (`input`, `output`, `timestamp`)

**Issue: High latency or performance impact**

* **Solution**: Use `BatchSpanProcessor` (default) instead of `SimpleSpanProcessor` for better performance

For issues or questions:

1. Check the troubleshooting section above
2. Review DBNL documentation
3. Verify your DBNL deployment has OTEL Trace Ingestion enabled
4. Contact DBNL support at <support@distributional.com> with your project ID and API endpoint

## Best Practices

1. **Use Batch Processing**: Always use `BatchSpanProcessor` in production for better performance
2. **Use Semantic Conventions**: Follow OpenInference conventions for automatic attribute mapping
3. **Error Handling**: Wrap exporter creation in try-except blocks to prevent application failures
4. **Graceful Degradation**: Allow your application to function even if DBNL configuration is incomplete

## Example: Complete Integration

Here's a complete example combining all the concepts:

```python
# telemetry.py
import os
import logging
from typing import Optional
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from openinference.instrumentation.langchain import LangChainInstrumentor

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def create_dbnl_exporter() -> Optional[OTLPSpanExporter]:
    """Create OTLP exporter for DBNL"""
    api_url = os.environ.get("DBNL_API_URL", "").strip()
    api_token = os.environ.get("DBNL_API_TOKEN", "").strip()
    project_id = os.environ.get("DBNL_PROJECT_ID", "").strip()
    
    if not all([api_url, api_token, project_id]):
        logger.info("DBNL configuration incomplete")
        return None
    
    try:
        headers = {
            "Authorization": f"Bearer {api_token}",
            "x-dbnl-project-id": project_id,
            "Content-Type": "application/x-protobuf",
        }
        
        # DBNL_API_URL already includes the scheme, so use it as-is
        endpoint = f"{api_url}/otel/v1/traces"
        exporter = OTLPSpanExporter(endpoint=endpoint, headers=headers)
        logger.info(f"✅ DBNL exporter configured: {endpoint}")
        return exporter
    except Exception as e:
        logger.error(f"❌ Failed to configure DBNL exporter: {e}")
        return None

def initialize_telemetry():
    """Initialize OpenTelemetry with DBNL exporter"""
    # Reuse the existing provider if telemetry was already initialized
    current_provider = trace.get_tracer_provider()
    if isinstance(current_provider, TracerProvider):
        return current_provider

    resource = Resource.create({
        "service.name": os.environ.get("OTEL_SERVICE_NAME", "my-agent"), # Optional: identifies your service
    })
    
    tracer_provider = TracerProvider(resource=resource)
    trace.set_tracer_provider(tracer_provider)
    
    # Add DBNL exporter
    dbnl_exporter = create_dbnl_exporter()
    if dbnl_exporter:
        processor = BatchSpanProcessor(dbnl_exporter)
        tracer_provider.add_span_processor(processor)
        logger.info("📊 DBNL OTEL tracing enabled")
    
    # Add LangChain instrumentation
    try:
        instrumentor = LangChainInstrumentor()
        instrumentor.instrument(tracer_provider=tracer_provider)
        logger.info("🔧 LangChain instrumentation enabled")
    except Exception as e:
        logger.error(f"❌ Failed to instrument LangChain: {e}")
    
    return tracer_provider

# Initialize
tracer_provider = initialize_telemetry()
tracer = trace.get_tracer(__name__)
```

## Additional Resources

* [DBNL Semantic Convention](https://docs.dbnl.com/configuration/data-pipeline/dbnl-semantic-convention) - Learn about semantic conventions for better analytics
* [OpenTelemetry Python Documentation](https://opentelemetry.io/docs/instrumentation/python/) - Official OpenTelemetry Python docs
* [OpenInference Documentation](https://github.com/Arize-ai/openinference) - OpenInference semantic conventions
