
SmallCon: A Virtual Conference for GenAI Builders

December 11, 2024 | 10:00 AM - 2:30 PM PT

About

SmallCon is the first virtual conference dedicated to exploring the potential of Small Language Models (SLMs) in production environments. Industry leaders from prominent tech companies share insights, best practices, and real-world implementation experiences.

Full Conference Summary

Key Themes

Technical Definition of SLMs

  • Models under 3-4B parameters
  • Deployable on laptops/mobile devices
  • Sub-second inference times (0.1s target)
  • Focused on specific, bounded tasks
  • Cost-efficient scaling capabilities

Implementation Patterns

  • LoRA adaptation (60+ adapters per GPU)
  • Hybrid deployment architectures
  • API-first approaches
  • Private VPC + managed scaling
  • Continuous fine-tuning pipelines

Performance Benchmarks

  • 10x cost reduction vs traditional approaches
  • 8% higher F1 scores
  • 80% higher throughput
  • $20 per training cycle
  • 85% organizational adoption rates

Emerging Technologies

  • Solar LLM family (Upstage)
  • Hamba Language Model (1.5B params)
  • Agentforce (Salesforce)
  • Gretel Navigator
  • Guardrails validation framework


Sessions & Materials

Morning Sessions

  1. Keynote on SLM Future
  2. Enterprise Implementation Case Study
  3. Customer Service Analytics
  4. GenAI Future Panel

Afternoon Sessions

  1. Agentforce Platform Deep Dive
  2. Production AI Panel
  3. Synthetic Data Generation
  4. Solar LLMs Implementation
  5. Continuous Fine-tuning
  6. Model Evaluation Best Practices

Key Technical Patterns

Infrastructure Evolution

  1. Initial Phase (2022-2023):

    • API-based access
    • Individual model scaling
    • Basic prompting and RAG
  2. Current Phase (2024):

    • Multi-adapter architectures
    • Hybrid deployment models
    • Continuous fine-tuning
    • Integrated validation

Validation & Quality

  • Human-in-the-loop evaluation
  • Fact-checking modules
  • Automated guardrails
  • Continuous monitoring
  • Performance metrics tracking

Cost Optimization

  • Linear scaling with adapters
  • Shared infrastructure
  • Pay-per-call models
  • Efficient fine-tuning
  • Resource pooling
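
As a rough illustration of why adapter-based serving scales linearly, the sketch below compares per-task dedicated deployments with adapters packed onto shared GPUs; the GPU price and adapter density are placeholder assumptions, not figures quoted at the conference.

# Illustrative cost model: N task-specific models as dedicated deployments
# vs. as LoRA adapters sharing base-model GPUs. Prices are assumptions.
GPU_HOUR_USD = 2.00            # assumed hourly cost of one serving GPU
HOURS_PER_MONTH = 730

def dedicated_cost(num_tasks: int) -> float:
    """One dedicated GPU deployment per fine-tuned model."""
    return num_tasks * GPU_HOUR_USD * HOURS_PER_MONTH

def shared_adapter_cost(num_tasks: int, adapters_per_gpu: int = 60) -> float:
    """Adapters share base-model GPUs; cost tracks GPU count, not task count."""
    gpus_needed = -(-num_tasks // adapters_per_gpu)   # ceiling division
    return gpus_needed * GPU_HOUR_USD * HOURS_PER_MONTH

for n in (5, 30, 60, 120):
    print(f"{n:>3} tasks: dedicated ${dedicated_cost(n):>9,.0f}/mo, "
          f"shared adapters ${shared_adapter_cost(n):>7,.0f}/mo")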

Data Management

  • Synthetic data generation
  • Document processing pipelines
  • Versioned datasets
  • Quality-focused curation
  • Privacy-preserving techniques

Featured Organizations

  • Meta
  • Hugging Face
  • Mistral AI
  • Salesforce
  • Upstage
  • NVIDIA
  • DoorDash
  • Marsh & McLennan
  • Predibase
  • Gretel
  • Guardrails AI

Impact & Metrics

  • Enterprise adoption rates: 85%
  • Request volumes: 25M annually
  • Time saved: 1M+ hours
  • Training costs: ~$20/cycle
  • Inference times: 0.1s achieved

The conference highlighted the industry's rapid move toward practical, efficient AI implementations using small language models, with particular emphasis on reliability, cost-effectiveness, and real-world validation strategies.

💡 Key Technical Implementation Details

AI Stack Evolution:

  1. Initial Stack (AWS-based):

    • SageMaker powering 60+ indicators
    • First LLM implementation in 2019
    • Transition to Longformer in 2021 for extended context
    • Individual auto-scaling infrastructure per model
  2. Current Architecture (Predibase + LoRA):

    • Base Model: Llama 3.1 8B
    • 60+ LoRA adapters on a single GPU (see the multi-adapter sketch below)
    • Hybrid setup with private VPC + managed scaling
    • Sub-second inference times (0.1s achieved)
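
The multi-adapter pattern can be approximated locally with Hugging Face's peft library, as sketched below; the base-model and adapter paths are placeholders, and this is a single-process illustration rather than a reproduction of Predibase's serving stack, which also batches requests across adapters.

# Sketch: several LoRA adapters sharing one base model in a single process.
# Model id and adapter paths are placeholders, not real checkpoints.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.1-8B-Instruct"         # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")

model = PeftModel.from_pretrained(base, "adapters/call-summary", adapter_name="call-summary")
model.load_adapter("adapters/sentiment", adapter_name="sentiment")

def generate(task: str, prompt: str) -> str:
    model.set_adapter(task)                        # swap adapters without reloading the base
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(out[0], skip_special_tokens=True)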

📈 Performance Metrics

Comparative Analysis:

  • Cost: 10x reduction vs OpenAI
  • Accuracy: 8% higher F1 score
  • Throughput: 80% higher than alternatives
  • Latency: 0.1 second inference time (vs 2s target)
  • Scale: Hundreds of inferences per second

Infrastructure Requirements:

  • Rapid scaling (within 1 minute)
  • On-demand GPU provisioning
  • Support for variable text lengths (2min - 1hr calls)
  • Handling unpredictable traffic patterns

🤖 Technical Improvements

Training Pipeline:

  1. Data Preparation:

    • Versioned datasets
    • Curated training data
    • Smaller but high-quality datasets
  2. Model Training:

    • Configurable parameters (learning rate, target modules)
    • Runs on commodity hardware
    • Hours/days reduced to minutes
    • ~$20 per training cycle
  3. Deployment:

    • Configuration-based deployment
    • Simultaneous version running
    • Easy A/B testing
    • Zero marginal cost per adapter
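
Because a new adapter version is effectively just configuration, A/B testing can reduce to weighted routing between adapter names; the sketch below is a generic illustration with made-up names, not any particular platform's API.

# Weighted A/B router between two adapter versions; records are tagged with
# the adapter used so downstream metrics can be compared per version.
import random

ROUTES = {
    "intent-classifier@v7": 0.9,   # current production adapter (placeholder name)
    "intent-classifier@v8": 0.1,   # candidate trained on newer data
}

def pick_adapter(routes: dict[str, float]) -> str:
    r, acc = random.random(), 0.0
    for name, weight in routes.items():
        acc += weight
        if r < acc:
            return name
    return name                    # fall back to the last route on rounding error

def handle(request_text: str) -> dict:
    adapter = pick_adapter(ROUTES)
    # call_model(adapter, request_text) would hit the shared deployment here
    return {"adapter": adapter, "input": request_text}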

📋 Monitoring & Operations

System Monitoring:

  • Throughput tracking
  • Latency measurements
  • Model drift detection
  • Combined dashboard system (Predibase + Converza)

Cost Analysis:

  • Linear cost scaling with Predibase
  • Exponential cost increase avoided
  • Near-zero marginal cost per adapter
  • Infrastructure costs primarily tied to throughput/latency requirements

The implementation demonstrates successful migration to small language models while achieving better performance metrics and significant cost savings, particularly in scaling scenarios.

🎯 Conference Overview

  • Name: SmallCon
  • Date: December 11, 2024
  • Focus: Small Language Models (SLMs) and Enterprise AI Implementation
  • Format: Virtual conference with mixed session types

👥 Session Details

Fireside Chat with Paul Beswick [~13:20-13:37]

  • Type: Fireside Chat
  • Speaker: Paul Beswick, Global CIO, Marsh McLennan
  • Role: Manages 5000+ technologists globally
  • Session Goal: Share enterprise Gen AI implementation insights and evolution of their approach

💡 Key Technical Insights

Architecture Evolution:

  • Initial Approach (Early 2023):
    • Started with API-based access (April 2023)
    • Secured APIs by June 2023
    • Launched organization-wide LLM assistant in August/September 2023
    • Current scale: ~25 million requests annually
    • 85% organizational adoption rate

Infrastructure Strategy:

  • Rent models by API call instead of self-hosting
  • Uses fine-tuned small models for specific tasks
  • Current volume: ~500,000 requests/week through fine-tuned model
  • Training costs: ~$20 per training cycle
  • Achieving accuracy exceeding GPT-4 with better response times

Technical Evolution:

  1. Initial Phase:

    • Focus on prompting and RAG
    • API-based implementation
    • Minimal infrastructure complexity
  2. Current Phase:

    • Implementation of fine-tuned models
    • Shared infrastructure approach
    • Low-cost training cycles
    • Specialized model targeting

🤖 Technical Implementation Details

Infrastructure Management:

  • Avoided self-hosting large language models
  • Implemented pay-per-call model architecture
  • Security managed through API access controls
  • Conservative estimate: Over 1 million hours saved through implementation

Cost Economics:

  • Training cost: ~$20 per cycle
  • Infrastructure sharing across use cases
  • Focus on ROI for specific task automation
  • Economy of scale through shared resources

📈 Industry Trends

Evolution of Enterprise AI:

  • Movement from general-purpose to task-specific models
  • Shift toward automated fine-tuning processes
  • Focus on fragmenting models for specialized subtasks
  • Trend toward job augmentation over replacement

📋 Follow-up Actions

Technical Focus Areas:

  • Investigation of automated fine-tuning pipelines
  • Research on model specialization approaches
  • Review of infrastructure sharing strategies
  • Analysis of automation vs. augmentation use cases

Future Development (2025):

  1. Continued office suite integration
  2. Enhanced AI-powered helper applications
  3. Direct efficiency improvements through automation
  4. Increased focus on specialized, task-specific models
  5. Implementation of staged approach: LLM prompting → data collection → fine-tuning (see the sketch below)
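
A minimal sketch of that staged approach: serve with a general-purpose model first, log every prompt/response pair, and only fine-tune once enough real examples have accumulated. File names and the example threshold are illustrative assumptions.

# Stage 1: answer with a general-purpose LLM.  Stage 2: log each exchange.
# Stage 3: export the accumulated log as a fine-tuning dataset.
import json
from pathlib import Path

LOG_PATH = Path("prompt_log.jsonl")      # illustrative location

def log_example(prompt: str, response: str) -> None:
    with LOG_PATH.open("a") as f:
        f.write(json.dumps({"prompt": prompt, "completion": response}) + "\n")

def export_training_set(min_examples: int = 1000) -> list[dict]:
    rows = [json.loads(line) for line in LOG_PATH.read_text().splitlines()]
    if len(rows) < min_examples:
        raise RuntimeError(f"only {len(rows)} examples collected; keep prompting")
    return rows                          # hand off to whatever fine-tuning tool is in use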

The session provided valuable insights into enterprise-scale AI implementation, particularly highlighting the evolution from initial skepticism about fine-tuning to successful large-scale deployment through innovative infrastructure approaches and careful economic consideration.

🎯 Session Details

Converza Case Study [~13:37-13:45]

  • Type: Technical Presentation
  • Speakers:
    • Mo (CTO, Converza)
    • Giuseppe (VP of AI, Converza)
  • Session Goal: Share insights on implementing AI for call analytics at scale

🤖 Technical Implementation

System Evolution:

  1. 2001: Initial Analog System

    • Hardware-based recording via PBX
    • Manual human analysis
    • Basic coaching data generation
  2. 2014: Digital AI Transformation

    • Transition to AI-driven analysis
    • Automated call monitoring
    • Expanded data point collection
  3. 2024: Platform Enhancement

    • Integration with Predibase
    • Scaled to analyze millions of calls monthly
    • Customizable data point tracking

💡 Key Technical Metrics

Scale of Operations:

  • Total calls analyzed: Over 1 billion
  • Current volume: Millions of calls/month
  • Implementation results: 78% conversion increase in 90 days (Wheeler/Caterpillar case study)

Data Analysis Capabilities:

  • Agent Performance Metrics:

    • Proper greeting detection
    • Business offering tracking
    • Appointment scheduling
    • Customer service quality
  • Client Side Analysis:

    • Buying signal detection
    • Lead quality scoring
    • Prospect qualification
    • Customer sentiment analysis

📈 Technical Architecture

Call Processing Pipeline:

  1. Call Recording
  2. AI Analysis
  3. Data Point Extraction
  4. Insight Generation
  5. Action Recommendation

Key Features:

  • Custom data point configuration
  • Real-time analysis capabilities
  • Integrated coaching systems
  • Revenue impact tracking
  • Automated quality scoring

This session demonstrated a practical implementation of AI technologies for large-scale audio analysis and business intelligence, showing how the architecture evolved from manual processes to sophisticated AI-driven analysis, with particular emphasis on the role of PredaBase in enabling scalable, real-time processing capabilities.

🎯 Conference Overview

  • Name: SmallCon
  • Date: December 11, 2024
  • Focus: Applied AI and LLMs in Production
  • Format: Virtual

👥 Session Details

  • Time: 15:19
  • Type: Panel
  • Speaker(s):
    • Travis Addair, CTO and Co-founder, Predibase
    • Daniel Han, Head of AI Engineering, Harvey
    • Moasati, CTO, Converza
    • Atindriyo Sanyal, CTO and Co-founder, Galileo
    • Abhishek, Senior Staff Engineer, Nubank
  • Session Goal: Discuss the shift from LLM experimentation to production-grade systems and real-world applications.

💡 Key Technical Insights

Production Challenges:

  • Scaling costs with large prompts
  • Limitations of fine-tuning large models
  • Need to modularize complex workflows
  • Gap between controlled and real-world environments

Evaluation and Quality:

  • Traditional NLP metrics insufficient for long-form text
  • Limitations of automated LLM Judge techniques at scale
  • Critical role of human-in-the-loop evaluation
  • Importance of continuous feedback loops
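
One common way to reconcile automated LLM-judge scoring with human-in-the-loop review, sketched below, is to score everything automatically but route a random sample plus all low-scoring outputs to humans; the judge function here is a stand-in, not a metric any panelist described.

# Triage outputs: everything gets an automated score; a sample plus all
# low-confidence cases go to human reviewers. judge_score() is a placeholder.
import random

def judge_score(output: str) -> float:
    return min(1.0, len(output) / 200)   # toy heuristic standing in for an LLM judge

def triage(outputs: list[str], human_rate: float = 0.05, threshold: float = 0.6):
    for out in outputs:
        score = judge_score(out)
        to_human = score < threshold or random.random() < human_rate
        yield {"output": out, "auto_score": score, "route": "human" if to_human else "auto"}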

Implementation Strategy:

  • Gradual release process emphasis
  • Focus on smaller, task-specific models
  • Cost and throughput optimization
  • Modularization of complex workflows

Risk Mitigation:

  • Building user confidence in high-stakes industries
  • Addressing unpredictable hallucinations
  • Bias mitigation through human evaluation
  • Quality assurance through feedback loops

The panel highlighted the growing sophistication in LLM deployment practices, with particular emphasis on the role of human feedback in ensuring quality and reliability in production systems. The discussion underscored a clear trend toward smaller, specialized models with robust evaluation frameworks.

🎯 Conference Overview

  • Name: SmallCon
  • Date: December 11, 2024
  • Focus: Small Language Models (SLM)
  • Format: Virtual

👥 Session Details

  • Time: 14:01
  • Type: Panel
  • Speaker(s):
    • Dev Rishi, CEO and Co-founder, Predibase
    • Margaret, Head of Product, Mistral AI
    • Pablo, Distinguished Scientist and Research Manager, NVIDIA
    • Luna, Lead of the Small Language Model team, Hugging Face
    • Diego, Head of Generative AI Partnerships, Meta
  • Session Goal: Discuss the future of generative AI, focusing on the training and serving of small language models.

💡 Key Technical Insights

Definition and Characteristics:

  • Small language models can run on laptops and mobile phones with low latency
  • Typically less than 3-4 billion parameters
  • Optimized through quantization and compression techniques
  • Best suited for tasks not requiring extensive world knowledge:
    • Rephrasing
    • Summarization
    • Dialogue generation
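
Quantization is the usual lever for fitting such models on laptops; the sketch below loads a small instruction-tuned model in 4-bit via the transformers/bitsandbytes integration. The model id is only an example of a sub-2B checkpoint, and bitsandbytes assumes a CUDA GPU.

# Load a small model with 4-bit weights to cut memory roughly 4x vs fp16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "HuggingFaceTB/SmolLM2-1.7B-Instruct"   # example small checkpoint
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quant, device_map="auto")

prompt = "Rephrase: the meeting has been moved to Thursday."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0], skip_special_tokens=True))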

Implementation Strategies:

  • Hybrid approaches combining small and large models
  • Small models for simpler tasks
  • Large models for complex queries
  • Fine-tuning on synthetic data from larger models
  • Focus on agentic workflows for task automation
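
In practice the hybrid pattern is a router: bounded tasks stay on the small model and everything else escalates to a larger one. The sketch below is illustrative; the two model functions are placeholders for real clients.

# Route bounded tasks to a small local model, escalate the rest.
BOUNDED_TASKS = {"rephrase", "summarize", "dialogue"}

def small_model(prompt: str) -> str:
    return "[small-model reply]"          # placeholder for an on-device SLM call

def large_model(prompt: str) -> str:
    return "[large-model reply]"          # placeholder for a hosted frontier-model call

def answer(task: str, prompt: str, max_small_chars: int = 2000) -> str:
    if task in BOUNDED_TASKS and len(prompt) < max_small_chars:
        return small_model(prompt)        # cheap, low-latency path
    return large_model(prompt)            # complex or open-ended queries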

🤖 Technical Announcements

Hamba Language Model:

  • Specifications:
    • 1.5 billion parameters
    • MMLU score: 50
  • Use Cases:
    • On-device deployment
    • Rephrasing
    • Summarization
    • Dialogue generation

📈 Industry Trends

Technology Shifts:

  • Movement toward efficient, device-deployable models
  • Growing focus on agentic workflows and automation
  • Heavy investment in open-source model development
  • Expected acceleration of adoption across industries

Future Outlook (2025):

  • Significant advancements in generative AI
  • More sophisticated agentic workflows
  • Better reasoning engines
  • Deeper understanding of workflow construction

The panel established foundational definitions for small language models while highlighting the industry's shift toward more efficient, task-specific implementations. The discussion emphasized the complementary role of small and large models in creating effective AI systems.

🎯 Conference Overview

  • Name: SmallCon
  • Date: December 11, 2024
  • Focus: Synthetic data for small language models
  • Format: Virtual

👥 Session Details

  • Time: 16:05
  • Type: Technical Presentation
  • Speaker: Maarten Van Segbroeck, Head of Applied Science, Gretel
  • Session Goal: Introduce the Gretel platform and demonstrate how to generate high-quality synthetic data for training or fine-tuning small language models.

💡 Key Technical Insights

Platform Architecture:

  • Transformer-based architecture
  • Built-in differential privacy techniques
  • Multiple agent system with custom elements
  • Comprehensive evaluation reporting

Operational Modes:

  1. Data Design Mode:

    • Design datasets from scratch
    • Configure statistical properties
    • Define data characteristics
  2. Fine Tune Mode:

    • Train on existing datasets
    • Generate secure synthetic variants
    • Maintain statistical properties
    • Ensure privacy compliance

🤖 Technical Implementation

Gretel Navigator Platform:

  • Core Features:
    • Automated data generation
    • Statistical property preservation
    • Privacy-preserving techniques
    • Quality validation tools

Deployment Options:

  • Platform access
  • YAML configuration
  • SDK integration
  • Comprehensive documentation

📈 Key Considerations

Data Quality:

  • Statistical fidelity to source data
  • Validation metrics
  • Quality assessments
  • Dataset statistics

Privacy and Security:

  • Differential privacy integration
  • Compliance mechanisms
  • Cybersecurity protections
  • Privacy-preserving features

📋 Use Cases

Primary Applications:

  • Training data generation for SLMs
  • Sensitive data synthesis
  • Dataset augmentation
  • Privacy-compliant testing

Industry Impact:

  • Reduced compliance costs
  • Enhanced data security
  • Improved model training
  • Efficient data processing

The session demonstrated how synthetic data generation can address both data quality and privacy concerns in SLM training, while providing practical tools for implementation through the Gretel Navigator platform.

🎯 Conference Overview

  • Name: SmallCon
  • Date: December 11, 2024
  • Focus: Mitigating Volatility in Generative AI
  • Format: Virtual

👥 Session Details

  • Time: 17:13
  • Type: Technical Talk
  • Speaker: Shreya Rajpal, CEO and Co-founder, Guardrails AI
  • Session Goal: Discuss the challenges of volatility in generative AI and how to mitigate them using technical tools.

💡 Key Technical Insights

Volatility Sources:

  • Development stage issues
  • Deployment challenges
  • Runtime inconsistencies
  • Insufficient model capabilities
  • Improper context handling
  • Hallucinations
  • Edge case behaviors
  • Model jitter

Validation Framework:

  • Explicit validation at every step
  • Verification of system behavior
  • Multiple validator types:
    • Rules-based
    • Heuristic approaches
    • Fine-tuned ML models
    • Secondary LLM calls
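
The "explicit validation at every step" idea maps onto a chain of independent validators applied to each output; the sketch below is a generic illustration of that pattern with toy rules, not the Guardrails AI API.

# Generic validation chain: rules-based and heuristic validators run over each
# LLM output; model-based or secondary-LLM validators would slot in the same way.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    ok: bool
    reason: str = ""

def no_obvious_pii(text: str) -> Verdict:            # rules-based (toy)
    return Verdict("ssn:" not in text.lower(), "possible SSN mention")

def within_length_budget(text: str) -> Verdict:      # heuristic (toy)
    return Verdict(len(text) < 4000, "output exceeds length budget")

VALIDATORS: list[Callable[[str], Verdict]] = [no_obvious_pii, within_length_budget]

def validate(output: str) -> list[str]:
    """Return reasons for every failed validator; an empty list means it passed."""
    return [v(output).reason for v in VALIDATORS if not v(output).ok]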

🤖 Technical Implementation

Guardrails AI Platform:

  • Open-source validator library
  • Risk category coverage
  • Volatility mitigation tools
  • Comprehensive validation suite

Use Cases:

  • Input prompt validation
  • LLM output verification
  • Sensitive content detection
  • Factuality enforcement
  • Application constraint management
  • Edge case monitoring
  • Out-of-distribution detection

📈 Industry Impact

Technology Evolution:

  • Increased focus on reliability
  • Enhanced risk management
  • Validation-centric development
  • Lifecycle-wide verification

Market Benefits:

  • Improved application reliability
  • Increased enterprise adoption
  • Reduced system failure risk
  • Enhanced trustworthiness

📋 Implementation Guide

Getting Started:

  • Explore Guardrails AI open-source project
  • Implement appropriate validators
  • Select validation strategies based on volatility sources
  • Integrate with existing applications

Resources:

  • GitHub repository
  • Documentation
  • Validator examples
  • Integration guides

This session highlighted the growing importance of systematic validation in generative AI applications, providing practical tools and strategies for improving model reliability and reducing operational risks.

🎯 Conference Overview

  • Name: SmallCon
  • Date: December 11, 2024
  • Focus: AI Agents, LLMs, Fine-tuning, and Agent Development Lifecycle
  • Format: Virtual

👥 Session Details

  • Time: 14:50
  • Type: Technical Talk
  • Speaker: Manji, Leader of AI Platforms for the Agentforce team at Salesforce (Einstein)
  • Session Goal: Share lessons learned from building LLM-based AI agents and workflows at scale

🤖 Platform Overview

Salesforce Agentforce:

  • Low-code/no-code platform for AI agents
  • Studio and builder interface
  • Workflow integration capabilities
  • Focus on trust and responsible AI
  • Comprehensive testing and monitoring tools

Use Cases:

  • Service automation
  • Sales assistance
  • Marketing content generation
  • Customer interaction management

💡 Key Technical Insights

Agent Evolution:

  • Progression from rule-based chatbots to reasoning agents
  • Integration beyond conversational interfaces
  • Enterprise knowledge grounding
  • Action-taking capabilities

Implementation Best Practices:

  • Define evaluation metrics (evals) upfront
  • Conduct thorough batch testing
  • Assess knowledge quality
  • Optimize retrieval/generation pipelines
  • Fine-tune for specific tasks

Key Challenges:

  • Hallucination mitigation
  • Cost/performance optimization
  • Complex system evaluation
  • Knowledge silos
  • Trust building

📈 Technical Architecture

Platform Components:

  • Agent builder
  • Testing suite
  • Deployment tools
  • Monitoring systems

Performance Metrics:

  • Accuracy
  • Precision
  • Latency
  • Token count efficiency

💬 Notable Insights

Key Quotes:

"Don't just think about the conversational interface because these agents are going to be embedded into any workflow you can think of."

"Building the flashy demo is only 10% of the work, and the rest 90% of the hard work is actually building the trust that this solution works."

"Teams that are doing very fast iterations... are moving fast and that's why I think this ability to tune things, make changes and test is super super important."

📋 Industry Impact

Technology Shifts:

  • Movement toward reasoning-capable agents
  • Emphasis on fine-tuned applications
  • Focus on comprehensive toolchains
  • Integration across workflows

The session emphasized the importance of systematic development approaches and trust-building in AI agent deployment, highlighting the evolution from simple chatbots to sophisticated workflow automation tools.

🎯 Conference Overview

  • Name: SmallCon
  • Date: December 11, 2024
  • Focus: Strategies for updating machine learning models in production
  • Format: Virtual

👥 Session Details

  • Time: 16:53
  • Type: Technical Presentation
  • Speaker: Arnav Garg, ML Engineering Lead, Predibase
  • Session Goal: Discuss strategies for updating machine learning models in production using data collected from production.

💡 Key Technical Insights

Training Strategies:

  • Continuous model quality improvement for production LLMs
  • Incremental fine-tuning for cost-effective updates
  • Rehearsal learning for performance enhancement
  • Hybrid approach combining:
    • Incremental updates
    • Periodic full retraining
    • Performance/cost balance
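
The hybrid strategy boils down to a scheduling decision: continue incrementally from the last adapter version by default, and fall back to a full retrain on a fixed cadence or when drift appears. The sketch below is a generic illustration with assumed thresholds, not the Predibase SDK (where continue_from_version plays the incremental role).

# Choose between a cheap incremental update and an occasional full retrain.
def plan_update(versions_since_full: int, drift_score: float,
                full_every: int = 10, drift_limit: float = 0.2) -> str:
    if drift_score > drift_limit or versions_since_full >= full_every:
        return "full_retrain"      # reset on drift or every Nth cycle
    return "incremental"           # continue from the previous adapter version

# Example: four incremental updates so far and mild drift -> stay incremental.
print(plan_update(versions_since_full=4, drift_score=0.05))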

🤖 Technical Implementation

Predibase Platform:

  • SDK and UI components
  • 100+ base models for LoRA fine-tuning
  • Incremental training via continue_from_version
  • Configurable retraining interface

Deployment Options:

  • SDK integration
  • UI-based configuration
  • LoRA parameter customization
  • Learning configuration flexibility

📈 Performance Benefits

Efficiency Gains:

  • Improved precision and accuracy
  • Reduced training costs
  • Faster update cycles
  • Better data utilization

Production Advantages:

  • Continuous model improvement
  • Cost-effective updates
  • Rapid knowledge incorporation
  • User feedback integration

📋 Best Practices

Implementation Strategy:

  1. Start with Predibase platform exploration
  2. Experiment with incremental training
  3. Implement rehearsal learning
  4. Develop hybrid training approach
  5. Monitor performance metrics
  6. Optimize cost efficiency

Resources:

  • Predibase SDK documentation
  • Platform guidelines
  • Integration examples
  • Training configurations

The session highlighted how continuous model updates can be practically implemented in production environments, with particular emphasis on balancing performance improvements with operational costs through incremental training approaches.

🎯 Conference Overview

  • Name: SmallCon
  • Date: December 11, 2024
  • Focus: Small Language Models (SLM)
  • Format: Virtual

👥 Session Details

  • Time: 14:01
  • Type: Panel
  • Speaker(s):
    • Dev Rishi, CEO and Co-founder, Predibase
    • Margaret, Head of Product, Mistral AI
    • Pablo, Distinguished Scientist and Research Manager, NVIDIA
    • Luna, Lead of the Small Language Model team, Hugging Face
    • Diego, Head of Generative AI Partnerships, Meta
  • Session Goal: Discuss the future of generative AI, focusing on the training and serving of small language models

💡 Key Technical Insights

Defining SLMs:

  • Models running on laptops/mobile devices with low latency
  • Typically less than 3-4 billion parameters
  • Optimized through quantization and compression
  • Suited for specific tasks not requiring extensive world knowledge:
    • Rephrasing
    • Summarization
    • Dialogue generation

Implementation Strategies:

  • Hybrid deployment combining small and large models
  • Task-based model selection
  • Fine-tuning with synthetic data from larger models
  • Focus on agentic workflows
  • Development of reasoning engines

🤖 Featured Technology

Hamba Language Model:

  • 1.5 billion parameters
  • MMLU score: 50
  • Designed for:
    • On-device deployment
    • Rephrasing
    • Summarization
    • Dialogue generation

📈 Future Outlook

2025 Predictions:

  • Advanced generative AI capabilities
  • Sophisticated agentic workflows
  • Improved reasoning engines
  • Enhanced deployment strategies

Industry Direction:

  • Investment in open-source development
  • Focus on device-deployable models
  • Growing agentic workflow adoption
  • Widespread industry implementation

The panel established core definitions and characteristics of SLMs while highlighting their role in the future of AI deployment. Key emphasis was placed on the practical advantages of small models and their complementary relationship with larger systems.

SmallCon 2024: A Virtual Conference for GenAI Builders

December 11, 2024 | 10:00 AM - 2:30 PM PT


About SmallCon

A first-of-its-kind virtual conference focused on small language models (SLMs) and their practical implementation in enterprise environments. The event brought together industry leaders to share insights on deploying, scaling, and optimizing SLMs for production use cases.

Key Metrics & Achievements

  • Model Size Target: Under 3-4 billion parameters
  • Response Time: Sub-second inference (0.1s)
  • Cost Efficiency: 10x reduction vs traditional approaches
  • Performance: 8% higher F1 scores, 80% higher throughput
  • Enterprise Success: 85% adoption rate, 1M+ hours saved

Featured Technologies

Solar LLM Family (Upstage)

Hamba Language Model

Agentforce (Salesforce)

Core Sessions

Morning Sessions

  1. Opening Keynote [10:00-10:15 AM PT]

    • Speaker: Devvret Rishi (Predibase)
    • Focus: The future of small language models
  2. Enterprise Implementation [10:15-10:35 AM PT]

    • Speaker: Paul Beswick (Marsh & McLennan)
    • Achievement: 25M annual requests, 85% adoption

Technical Sessions

  1. Call Analytics at Scale (Converza)
  2. Future of GenAI Panel
  3. Agentforce Platform (Salesforce)
  4. Production AI Panel

Lightning Demos

  1. Synthetic Data (Gretel)
  2. Solar LLMs (Upstage)
  3. Continuous Fine-Tuning (Predibase)
  4. Model Evaluation (Guardrails AI)

Key Technical Themes

  • Practical SLM implementation
  • Fine-tuning and adaptation
  • Synthetic data generation
  • Model evaluation frameworks
  • Continuous deployment
  • Human feedback integration

Major Trends

  • Shift to production-ready systems
  • Focus on agentic workflows
  • Emphasis on synthetic data
  • Importance of evaluation
  • Cost optimization strategies

Technical Priorities

  • 60+ adapter architectures
  • Synthetic data generation
  • Continuous fine-tuning
  • Evaluation frameworks
  • Human feedback systems

Participating Organizations

  • Meta
  • Hugging Face
  • Mistral AI
  • Salesforce
  • NVIDIA
  • DoorDash
  • Marsh & McLennan
  • Predibase
  • Gretel
  • Guardrails AI

For detailed performance metrics and implementation details, see Technical Metrics.

#!/bin/bash
# Live transcription of the SmallCon stream: record short audio segments with
# ffmpeg (macOS avfoundation input) and transcribe them with whisper.cpp.

# Enable error handling
set -euo pipefail

# Configuration
TRANSCRIPT_DIR="smallcon_transcripts"
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
COMBINED_LOG="${TRANSCRIPT_DIR}/smallcon_${TIMESTAMP}_full.txt"
SEGMENT_LENGTH=30  # Length of each audio segment in seconds
CURRENT_SESSION=""

# ANSI colors
CYAN='\033[1;36m'
YELLOW='\033[1;33m'
GREEN='\033[0;32m'
BLUE='\033[0;34m'
GRAY='\033[0;37m'
NC='\033[0m'

# Create transcript directory
mkdir -p "$TRANSCRIPT_DIR"

# Clear screen and move cursor to top
clear

# Initialize combined log with session info
{
  printf "${CYAN}%s\n" "==================================================="
  printf "${CYAN}SmallCon Conference Transcription - Started at $(date)\n"
  printf "${CYAN}==================================================${NC}\n\n"
} | tee "$COMBINED_LOG"

# Cleanup function
cleanup() {
  echo
  printf "${CYAN}Cleaning up and saving final transcript...\n"
  printf "${CYAN}==================================================="
  echo "Session ended at $(date)" | tee -a "$COMBINED_LOG"
  printf "${CYAN}===================================================${NC}\n"
  rm -f temp_stream.wav
  printf "${CYAN}Transcript saved to: $COMBINED_LOG${NC}\n"
  exit 0
}

# Set up cleanup on script exit
trap cleanup EXIT INT TERM

# Function to format timestamp
format_timestamp() {
  printf "${GRAY}[%s]${NC}" "$1"
}

# Function to detect and format session changes
detect_session() {
  local text="$1"
  if [[ $text =~ "Session" ]] || [[ $text =~ "Panel" ]] || [[ $text =~ "Keynote" ]]; then
    printf "\n${BLUE}>>> New Session Detected: ${text}${NC}\n\n" | tee -a "$COMBINED_LOG"
  fi
}

# Main transcription loop
while true; do
  printf "${CYAN}\nRecording ${SEGMENT_LENGTH}s segment...${NC}\n"

  # Record audio segment with preprocessing (band-pass filter, 16 kHz mono PCM)
  ffmpeg -v quiet -f avfoundation -i ":0" -t "$SEGMENT_LENGTH" \
    -af "highpass=f=50,lowpass=f=3000" \
    -ar 16000 -ac 1 -c:a pcm_s16le temp_stream.wav

  # Add timestamp to display and log
  CURRENT_TIME=$(date '+%H:%M:%S')
  echo -e "${YELLOW}--- ${CURRENT_TIME} ---${NC}" | tee -a "$COMBINED_LOG"

  # Transcribe with whisper.cpp (expects ./main and models/ggml-base.bin)
  ./main -m models/ggml-base.bin -f temp_stream.wav -np -otxt 2>/dev/null | \
    while IFS= read -r line; do
      if [ ! -z "$line" ]; then
        # Detect session changes
        detect_session "$line"
        # Format and output the line
        if [[ $line =~ ^\[.*\] ]]; then
          # This is a timestamp line
          printf " ${GRAY}%s${NC}\n" "$line" | tee -a "$COMBINED_LOG"
        else
          # This is transcript content
          printf " %s\n" "$line" | tee -a "$COMBINED_LOG"
        fi
      fi
    done

  echo >> "$COMBINED_LOG"
  rm -f temp_stream.wav
done

#!/bin/bash
# Summarize each saved SmallCon transcript into a one-sentence summary using a
# local Llama 3.2 model served by Ollama.

# Enable strict error handling
set -euo pipefail

# Configuration
CONF_DIR="smallcon_transcripts"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
OUTPUT_FILE="smallcon_summaries_${TIMESTAMP}.json"
WIP_FILE="temp_summary.json"

# JSON schema for structured output (requires an Ollama build with structured outputs)
JSON_SCHEMA='{"type":"object","properties":{"summary":{"type":"string"}},"required":["summary"]}'

# Initialize output file
{
  echo "# SmallCon Session Summaries"
  echo "# Generated: $(date)"
  echo "# ----------------------------------------"
} | tee "$OUTPUT_FILE"

# Process each transcript
for transcript in "$CONF_DIR"/smallcon_*_full.txt; do
  session_time=$(basename "$transcript" | cut -d'_' -f2)
  prompt="Given this transcript from a SmallCon session about Small Language Models (SLMs),
provide a single-sentence summary capturing the core message:
$(cat "$transcript")"

  {
    # Generate and parse summary
    ollama run llama3.2 --format "$JSON_SCHEMA" "$prompt" | tee "$WIP_FILE"
    echo "Session Time: $session_time"
    echo "Summary: $(jq -r .summary < "$WIP_FILE")"
    echo "----------------------------------------"
  } | tee -a "$OUTPUT_FILE"
done

# Cleanup
rm -f "$WIP_FILE"
echo "Summaries saved to: $OUTPUT_FILE"

🎯 Conference Overview

  • Name: SmallCon
  • Date: December 11, 2024
  • Focus: Upstage's Solar LLMs and Document AI for workspace intelligence
  • Format: Virtual

👥 Session Details

  • Time: 16:28
  • Type: Technical Presentation and Demo
  • Speakers:
    • Lucy Park, Upstage
    • Siddharth Ghatti, Marsh McLennan
  • Session Goal: Showcase Upstage's Solar LLMs and Document Parse technology in enterprise solutions

💡 Key Technical Insights

Solar LLM Family:

  • Designed for workspace tasks and human-AI collaboration
  • Optimized for finance, legal, and healthcare domains
  • Two variants:
    • Solar Mini: Optimized for fine-tuning
    • Solar Pro: Single-GPU deployment focus

Document Parse Technology:

  • Complex document conversion to structured formats
  • Advanced table structure recognition
  • Fast processing for lengthy documents
  • Fact-checking module for hallucination reduction
  • HTML/markdown output for LLM processing

🤖 Implementation Details

System Components:

  • Solar LLM models
  • Document Parse processor
  • Fact-checking module
  • Tool routing system

Use Cases:

  • Workspace intelligence
  • Insurance claims processing
  • Enterprise search
  • Document analysis

📈 Real-World Deployment

Marsh McLennan Implementation:

  • LenAI personal assistant
  • Tool routing system
  • 500,000 requests/week
  • High accuracy in tool selection
  • Integration with Predibase serverless

Performance Focus:

  • Processing speed optimization
  • Accuracy prioritization
  • Hallucination reduction
  • Document structure preservation

📋 Industry Impact

Technology Adoption:

  • Workspace AI integration
  • Task-specific fine-tuning
  • Enhanced document processing
  • Automated workflows

Business Benefits:

  • Increased productivity
  • Task automation
  • Improved decision-making
  • Enhanced document handling

The session demonstrated how combining Small Language Models with specialized document processing technology can create effective enterprise solutions, particularly highlighting the successful implementation at Marsh McLennan through the integration of Solar LLMs with Predibase's infrastructure.


jwalsh commented Dec 12, 2024

Analysis of Compute vs. Disk Tradeoffs

In this number transformation problem, we face a classic tradeoff between computation time and disk space (or memory) usage. Here's a breakdown of the different approaches and their tradeoffs:

1. Brute Force (Pure Computation)

  • Approach: Calculate all transformations from scratch every time, without any memoization.
  • Compute: Extremely high. The number of stones can grow exponentially with each level, leading to a massive computational load for higher levels.
  • Disk/Memory: Minimal. No storage needed for memoization.
  • Suitable for: Very small inputs and low transformation levels. Quickly becomes infeasible for larger problems.

2. Full Memoization (In-memory)

  • Approach: Memoize all intermediate results in a dictionary held in memory.
  • Compute: Significantly lower than brute force. Memoization avoids redundant calculations, dramatically speeding up the process.
  • Disk/Memory: Very high. The memoization dictionary can grow rapidly, requiring a large amount of memory, especially for higher levels.
  • Suitable for: Moderate inputs and levels. Can become memory-bound for very large problems.

3. Disk-Based Memoization

  • Approach: Memoize intermediate results and store them in files on disk.
  • Compute: Lower than brute force, but typically higher than full in-memory memoization due to disk access overhead.
  • Disk/Memory: High, but more manageable than full in-memory memoization. Disk space is generally more abundant than memory.
  • Suitable for: Large inputs and high levels where memory is a constraint. Disk access can become a bottleneck.

4. Hybrid Approach (Incremental Memoization)

  • Approach: Combine in-memory and disk-based memoization. Calculate and memoize in increments, saving the memo to disk periodically.
  • Compute: Balances compute and disk access. Offers a good compromise between the two.
  • Disk/Memory: Moderate. Uses both memory and disk space, allowing for better management of resources.
  • Suitable for: A wide range of inputs and levels. Offers flexibility and scalability.

Full Options

  1. Brute Force:

    • No memoization, all calculations done from scratch.
    • Simple to implement but extremely inefficient for larger problems.
  2. Full In-memory Memoization:

    • Store all memoization data in a dictionary in memory.
    • Significant speedup, but can become memory-bound.
  3. Disk-based Memoization:

    • Store all memoization data in files on disk.
    • Manages memory better but introduces disk access overhead.
  4. Incremental Memoization:

    • Calculate and memoize in increments (e.g., 5 levels at a time).
    • Save the memo to disk periodically.
    • Balances compute and disk access.
  5. Variations of Incremental Memoization:

    • Adjust the increment size (e.g., 10 levels, 20 levels) based on available memory.
    • Use different data structures for in-memory memoization (e.g., a more memory-efficient dictionary implementation).
    • Optimize disk access patterns to reduce overhead.
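
A minimal sketch of option 4 (incremental memoization with periodic persistence), assuming the memo is keyed by (value, remaining levels); the transformation rule itself is problem-specific and left as a stub, and the file path and increment size are illustrative.

# Incremental memoization: the memo lives in memory, is reused across
# increments, and is flushed to disk every few levels so a run can resume.
import json
from pathlib import Path

MEMO_PATH = Path("memo.json")              # illustrative persistence location
memo: dict[tuple[int, int], int] = {}      # (value, levels_remaining) -> item count

def transform(value: int) -> list[int]:
    raise NotImplementedError("problem-specific rewrite rule goes here")

def count(value: int, levels: int) -> int:
    if levels == 0:
        return 1
    key = (value, levels)
    if key not in memo:
        memo[key] = sum(count(v, levels - 1) for v in transform(value))
    return memo[key]

def run(values: list[int], total_levels: int, step: int = 5) -> int:
    if MEMO_PATH.exists():                 # resume from an earlier run
        memo.update({tuple(map(int, k.split(","))): v
                     for k, v in json.loads(MEMO_PATH.read_text()).items()})
    result = len(values)                   # level 0: each starting value counts once
    done = 0
    while done < total_levels:
        done = min(done + step, total_levels)
        result = sum(count(v, done) for v in values)
        MEMO_PATH.write_text(json.dumps({f"{k[0]},{k[1]}": v for k, v in memo.items()}))
    return result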

Choosing the Right Approach

The best approach depends on the specific constraints of your problem:

  • Input Size: Larger inputs generally favor memoization.
  • Transformation Levels: Higher levels require more memoization and potentially disk-based storage.
  • Memory Availability: If memory is limited, disk-based or incremental memoization is preferred.
  • Compute Resources: If compute time is critical, full in-memory memoization might be the fastest option.

By carefully considering these factors, you can choose the most appropriate approach that balances compute time, memory usage, and disk space to efficiently solve the number transformation problem.
