Eric Hartford (ehartford) · GitHub Gists
CAD-B: Confidence-Aware Decision Benchmark

Universal evaluation of uncertainty-guided adaptive behavior via prompting + logprobs


Overview

CAD-B tests whether LLMs exhibit prospective uncertainty monitoring and adaptive decision-making using only standard text generation and logprob extraction; no custom interfaces are required. It is based on comparative cognition paradigms (Smith et al., 2003; Hampton, 2001; Kornell et al., 2007).
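
As a concrete illustration, here is a minimal sketch (not the benchmark's actual harness) of probing uncertainty-guided behavior with logprobs alone: offer the model an explicit opt-out option and read the probability mass on each option token. The endpoint, model name, and question are assumptions for illustration.

import math
from openai import OpenAI

# Assumption: an OpenAI-compatible server (e.g., vLLM) at this address.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

PROMPT = (
    "Is the following statement true?\n"
    "'The 1,000th prime is 7919.'\n"
    "Reply with exactly one letter:\n"
    "A) True\n"
    "B) False\n"
    "C) Decline (small guaranteed reward)\n"
)

resp = client.chat.completions.create(
    model="my-model",  # assumption: whatever the server hosts
    messages=[{"role": "user", "content": PROMPT}],
    max_tokens=1,
    temperature=0.0,
    logprobs=True,
    top_logprobs=5,
)

# Distribution over the first answer token.
top = resp.choices[0].logprobs.content[0].top_logprobs
probs = {t.token.strip(): math.exp(t.logprob) for t in top}
print(probs)

# An uncertainty-monitoring model should route mass to the opt-out
# option "C" precisely on trials where "A" and "B" are near-equiprobable.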

#!/usr/bin/env python3
"""
OpenAI API benchmark script that replicates llama-bench behavior exactly.
Uses random tokens for both prompt and generation, no sampling.
Works with OpenAI-compatible endpoints like vLLM.
"""
import time
import numpy as np
import argparse
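
The preview ends at the imports. A minimal sketch of the measurement loop the docstring describes, under the assumption that the server is vLLM, whose completions endpoint accepts raw token IDs as the prompt (mirroring llama-bench's random-token inputs) and supports the ignore_eos extension:

import time
import random
import requests
import numpy as np

def bench(url="http://localhost:8000/v1/completions", model="my-model",
          n_prompt=512, n_gen=128, reps=5):
    rates = []
    for _ in range(reps):
        # Random token IDs, as llama-bench does (assumes vocab >= 32000).
        prompt_ids = [random.randrange(1000, 32000) for _ in range(n_prompt)]
        t0 = time.perf_counter()
        r = requests.post(url, json={
            "model": model,
            "prompt": prompt_ids,
            "max_tokens": n_gen,
            "temperature": 0.0,   # greedy decoding: "no sampling"
            "ignore_eos": True,   # vLLM extension: never stop early
        })
        r.raise_for_status()
        rates.append(n_gen / (time.perf_counter() - t0))
    print(f"gen: {np.mean(rates):.1f} +/- {np.std(rates):.1f} tok/s")

if __name__ == "__main__":
    bench()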

Qwen‑3 Refine‑Loop (Q3‑RL): a pragmatic HRM‑style hybrid

What changes vs. the original HRM idea?

Keep: the outer iterative refinement loop with optional ACT (halt/continue), data augmentation during training, and majority voting at inference (a sketch follows this list). These were the biggest drivers of ARC performance in ablations.

Drop or make optional: the internal H/L hierarchy and inner recurrent loop. ARC Prize found that a matched-size transformer plus the same refinement pipeline comes within a few points; the hierarchy adds only a small edge at higher loop counts.

Remove: reliance on puzzle_id embeddings; replace with context‑derived task conditioning that generalizes to unseen tasks. (ARC Prize notes puzzle_id is a strong, limiting dependency.)
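
A minimal sketch of the retained outer loop, with hypothetical names throughout (`model` is any predictor that also emits a halt logit; `augment`/`deaugment` are an invertible transform pair, e.g. dihedral plus color permutation):

import torch
from collections import Counter

def refine_loop(model, x, y, max_loops=16, halt_thresh=0.5):
    # Outer refinement: feed the current answer back in each pass.
    for _ in range(max_loops):
        y, halt_logit = model(x, y)
        if torch.sigmoid(halt_logit) > halt_thresh:  # ACT-style halt/continue
            break
    return y

def predict(model, augment, deaugment, x, n_views=8):
    # Majority vote over augmented views at inference.
    votes = Counter()
    for seed in range(n_views):
        xa, inv = augment(x, seed)                   # transform the task
        ya = refine_loop(model, xa, torch.zeros_like(xa))
        y_hat = deaugment(ya, inv)                   # map the answer back
        votes[tuple(y_hat.flatten().tolist())] += 1  # canonical key for voting
    return votes.most_common(1)[0][0]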

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Chat with Dolphin</title>
    <style>
        * { margin: 0; padding: 0; box-sizing: border-box; }
        body {
            font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
ehartford / default.md (created July 8, 2025 20:34, forked from cablej/default.md)

Cluely System prompt

<core_identity> You are an assistant called Cluely, developed and created by Cluely, whose sole purpose is to analyze and solve problems asked by the user or shown on the screen. Your responses must be specific, accurate, and actionable. </core_identity>

<general_guidelines>

  • NEVER use meta-phrases (e.g., "let me help you", "I can see that").
  • NEVER summarize unless explicitly requested.
  • NEVER provide unsolicited advice.
  • NEVER refer to "screenshot" or "image" - refer to it as "the screen" if needed.
  • ALWAYS be specific, detailed, and accurate.
ehartford / Pfl.md (created June 25, 2025 17:39)

https://notebooklm.google.com/notebook/602120a1-ae97-4316-88ca-b43617dc9aa8/audio

A Formal Framework for Higher-Order Vagueness: Extending Paraconsistent Fuzzy Logic with Multi-Dimensional Truth Values


Abstract

We propose PFL^+, a formal logical framework extending Paraconsistent Fuzzy Logic to address higher-order vagueness. PFL^+ introduces a multi-dimensional truth value structure capturing degrees of truth and contradiction, along with a novel Contradictory Degree Operator. This enables rigorous modeling of vagueness and contradictions. Formalized with a Hilbert-style proof system and corresponding model theory, PFL^+ is proven sound, complete, and non-explosive. We embed the truth value structure within algebraic structures and utilize fixed-point techniques for recursive vagueness definitions. A computational complexity analysis demonstrates PFL^+'s efficiency. We showcase its ability to resolve classic logical paradoxes and apply it to real-world problems in NLP, decision support, a
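
The abstract leaves the truth value structure abstract; purely as an illustration (an assumption of this note, not the paper's actual definitions), a two-dimensional truth value and the Contradictory Degree Operator could be written as:

% Truth values as pairs: t = degree of truth, c = degree of contradiction.
v = (t, c) \in [0,1]^2

% Paraconsistent negation flips truth but preserves the contradiction degree:
\neg(t, c) = (1 - t,\; c)

% Contradictory Degree Operator: projection onto the second coordinate.
\mathcal{C}(t, c) = c

% Non-explosion: \{A, \neg A\} \not\vdash B \text{ whenever } \mathcal{C}(A) > 0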

TEMPLATE """{{- range $index, $_ := .Messages }}
{{- if eq .Role "system" }}[SYSTEM_PROMPT]{{ .Content }}[/SYSTEM_PROMPT]
{{- else if eq .Role "user" }}
{{- if and (le (len (slice $.Messages $index)) 2) $.Tools }}[AVAILABLE_TOOLS]{{ $.Tools }}[/AVAILABLE_TOOLS]
{{- end }}[INST]{{ .Content }}[/INST]
{{- else if eq .Role "assistant" }}
{{- if .Content }}{{ .Content }}
{{- if not (eq (len (slice $.Messages $index)) 1) }}</s>
{{- end }}
{{- else if .ToolCalls }}[TOOL_CALLS][
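
The preview cuts off inside the template. For orientation, given a system message and a final user turn with tools attached, the template renders a prompt shaped roughly like this (contents abbreviated):

[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT][AVAILABLE_TOOLS]{tools JSON}[/AVAILABLE_TOOLS][INST]{user}[/INST]

Note the `le (len (slice $.Messages $index)) 2` guard: [AVAILABLE_TOOLS] is injected only when the user message is among the last two messages, i.e. on the final turn rather than on every turn.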
# make_multi_metric_head.py
# ------------------------------------------------------------
# Replace WorldPM-72B’s 1-unit reward head with a 15-unit head
# and save the result so you can fine-tune from it later.
# ------------------------------------------------------------
import torch
from transformers import AutoConfig, AutoModelForSequenceClassification
# Metrics you want separate scores for
METRICS = [
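
The preview stops mid-list. A minimal sketch of the head swap the banner describes (the checkpoint path and output directory below are placeholders), relying on transformers' ignore_mismatched_sizes to re-initialize only the classification head:

import torch
from transformers import AutoConfig, AutoModelForSequenceClassification

BASE = "WorldPM-72B"  # placeholder path to the 1-unit reward model
N_METRICS = 15

config = AutoConfig.from_pretrained(BASE)
config.num_labels = N_METRICS  # 1-unit reward head -> 15-unit multi-metric head

# ignore_mismatched_sizes re-initializes the (now 15-unit) score head
# while keeping every other weight from the base checkpoint.
model = AutoModelForSequenceClassification.from_pretrained(
    BASE,
    config=config,
    torch_dtype=torch.bfloat16,
    ignore_mismatched_sizes=True,
)

model.save_pretrained("WorldPM-72B-multimetric")  # fine-tune from this later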
RUBRICS = {
    "structural_coherence": {
        "name": "Structural Coherence & Progression",
        "description": "Evaluates the overall organization, logical progression, and effective shaping of the content.",
        "scores": {
            5: "The structure is masterfully crafted, exhibiting flawless logical/thematic/narrative progression. All parts are intrinsically linked, contributing to a powerful and unified whole, perfectly suited to the work's purpose and form.",
            4: "The structure is highly effective, with clear logical/thematic/narrative progression. Most parts are well-integrated, contributing to a cohesive work.",
            3: "The structure is generally clear and supports the content, though some areas might lack optimal flow or integration. Progression is mostly logical/thematic/narrative.",
            2: "Structural weaknesses are apparent; progression may be confusing, disjointed, or underdeveloped. Connections between parts are often unclear.",
FROM ./mmproj-F16.gguf
FROM ./Devstral-Small-2505-UD-Q4_K_XL.gguf
TEMPLATE """{{- range $index, $_ := .Messages }}
{{- if eq .Role "system" }}[SYSTEM_PROMPT]{{ .Content }}[/SYSTEM_PROMPT]
{{- else if eq .Role "user" }}
{{- if and (le (len (slice $.Messages $index)) 2) $.Tools }}[AVAILABLE_TOOLS]{{ $.Tools }}[/AVAILABLE_TOOLS]
{{- end }}[INST]{{ .Content }}[/INST]
{{- else if eq .Role "assistant" }}
{{- if .Content }}{{ .Content }}