All Data and AI Weekly #196 - June 30, 2025

Tim Spann

Jun 30, 2025

https://bsky.app/profile/paasdev.bsky.social

NiFi + AI + AI Data Cloud + Iceberg.

https://www.reddit.com/r/DataEngineeringForAI/hot/

Monthly NYC and YouTube Events

https://lu.ma/PINSAI

Join Hex and me in New York City for a hands-on hackathon with food, AI, and prizes.

https://lu.ma/prjumowa

This Week's Focus

Cursor AI + MCP! I got access to the pro-level Cursor AI IDE, and it is really cool. We have an MCP server for Snowflake that can list databases, schemas, and tables, describe tables, run read queries, create tables, build queries, and more.

See: https://github.com/isaacwasserman/mcp-snowflake-server

IDE: https://www.cursor.com/features
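For readers new to MCP, here is a minimal sketch of the kind of message a client such as Cursor sends to an MCP server. The JSON-RPC 2.0 envelope and the `tools/list` and `tools/call` method names come from the MCP specification. The tool name `read_query` and its `query` argument are assumptions based on the capability list above, so check them against the server's README before relying on them. The helper functions are hypothetical and only build the payloads; they do not open a connection.

```python
import itertools
import json

# Monotonic request ids, as JSON-RPC expects each request to carry its own id.
_ids = itertools.count(1)


def tools_list_request() -> dict:
    """Build the request a client sends to discover the tools a server offers."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": "tools/list"}


def tools_call_request(tool_name: str, arguments: dict) -> dict:
    """Build the request a client sends to invoke one tool with arguments."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumed tool name and argument; the real names are defined by the server.
discover = tools_list_request()
query = tools_call_request("read_query", {"query": "SELECT CURRENT_VERSION()"})

print(json.dumps(discover))
print(json.dumps(query, indent=2))
```

In practice the IDE builds and sends these messages for you; seeing the payload is mainly useful when debugging why a tool call did or did not fire.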

Cool MCP Stuff:

Other AI, Data and Snowflake News!!!

https://sanjmo.medium.com/snowflake-summit-2025-unifying-the-data-universe-a07f399b04d7

https://www.linkedin.com/pulse/scaling-new-heights-snowflake-summit-edition-snowflake-computing-a0r9c/

https://medium.com/snowflake/build-a-natural-language-data-assistant-in-vs-code-with-copilot-mcp-and-snowflake-cortex-ai-04a22a3b0f17

https://kilocode.ai/

https://medium.com/snowflake/building-a-summarization-engine-with-snowflake-triggered-tasks-1c6a2ee4ec05

https://medium.com/@matiasmaquieira96/your-data-team-is-still-stuck-in-2019-and-how-cortex-analyst-changes-everything-edfdc840df29

	import json
	import pandas as pd
	from pydantic import BaseModel, Field
	from snowflake.cortex import complete, CompleteOptions

	class Judgement(BaseModel):
	    match: bool = Field(description="1 Judged as Match; 0 Judged as non-match")
	    reasoning: str = Field(description="Why the LLM made this Judgement")

	judgement_response_model = Judgement.model_json_schema()

	# Assumes `df` is an existing pandas DataFrame whose columns are prefixed with src_ or tgt_.
	# Identify which columns belong to each entity
	cols_src = [c for c in df.columns if c.lower().startswith("src_")]
	cols_tgt = [c for c in df.columns if c.lower().startswith("tgt_")]

	instruction = """
	Compare the columns prefixed with 'src_' to those prefixed with 'tgt_' in the provided data.
	Treat these sets of columns (those with src_ and those with tgt_ prefixes) as if they were two different entities.
	For a given pair of entities, respond whether they are likely to match, along with a brief reason.
	Consider all factors and common differences in naming convention e.g. 123 Main St. is the same as 123 Main Street.

	Also consider industry, company size, and everything available in the data provided for a given record.
	Given the descriptions of the two entities, return a 1 if they are likely the same entity, 0 if they are not.

	Respond only with a JSON object, without backticks.
	"""

	def build_prompt(row) -> str:
	    entity_a = ", ".join(f"{col}: {row[col]}" for col in cols_src)
	    entity_b = ", ".join(f"{col}: {row[col]}" for col in cols_tgt)
	    return (
	        instruction.strip()
	        + f"\n\nEntity_A → {entity_a}\nEntity_B → {entity_b}\n"
	        + "Return the JSON object as specified above."
	    )

	df["prompt"] = df.apply(build_prompt, axis=1)

	options = CompleteOptions(
	    max_tokens=256,
	    temperature=0.0,
	    response_format={"type": "json", "schema": judgement_response_model},
	)

	def claude_complete(prompt: str):
	    return complete("claude-4-sonnet", prompt, options=options)

	df["judgement"] = df["prompt"].apply(claude_complete)

	# Validate the complete() response and extract its fields into a Series
	def parse_match_json(json_str):
	    payload = json.loads(json_str)  # avoid shadowing the built-in `dict`
	    judgement = Judgement(**payload)
	    return pd.Series({"match": judgement.match, "reasoning": judgement.reasoning})

	# Apply the function to the DataFrame
	df[["match", "reasoning"]] = df["judgement"].apply(parse_match_json)

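One thing to watch in the parsing step above: the prompt asks the model for 1 or 0 while the schema declares `match` as a boolean, and a single malformed response raises an exception that aborts the whole `apply`. The following is a minimal, standard-library-only sketch of a more tolerant parser. It is not the gist author's method; `ParsedJudgement`, `coerce_match`, and `parse_judgement` are hypothetical names introduced here for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedJudgement:
    match: Optional[bool]  # None when the response could not be interpreted
    reasoning: str


def coerce_match(value) -> Optional[bool]:
    """Map true/false, 1/0, and "1"/"0"/"true"/"false" onto a bool."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)) and value in (0, 1):
        return bool(value)
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"1", "true"}:
            return True
        if text in {"0", "false"}:
            return False
    return None


def parse_judgement(raw) -> ParsedJudgement:
    """Parse one model response without raising on bad input."""
    try:
        payload = json.loads(raw)
    except (TypeError, json.JSONDecodeError) as exc:
        return ParsedJudgement(None, f"unparseable response: {exc}")
    if not isinstance(payload, dict):
        return ParsedJudgement(None, "response was not a JSON object")
    return ParsedJudgement(
        coerce_match(payload.get("match")),
        str(payload.get("reasoning", "")),
    )


# Stubbed responses standing in for real model output.
samples = [
    '{"match": 1, "reasoning": "Same street address"}',
    '{"match": false, "reasoning": "Different industry"}',
    "Sorry, I cannot answer that.",
]
results = [parse_judgement(s) for s in samples]
for result in results:
    print(result)
```

Rows that come back with `match` set to `None` can then be filtered out and retried or reviewed by hand instead of stopping the run.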

https://github.com/cline/cline

https://developer.nvidia.com/blog/run-google-deepminds-gemma-3n-on-nvidia-jetson-and-rtx/

https://prss.co/

https://github.com/sfc-gh-jreini/llama-parse-cortex-search

https://medium.com/@komal1491/semantic-views-in-snowflake-making-business-self-service-60acf88b386a

https://hex.tech/blog/snowflake-summit-2025-recap/

https://medium.com/@tim.spann_50517/real-time-enrichment-of-air-quality-data-26564464b2a5

Code and Open Source Projects

Apache NiFi + AI Agents + Cortex AI + Snowflake AISQL

https://github.com/tspannhw/TrafficAI/tree/main/Agents

AI Tools

https://openrouter.ai/

Editor

https://micro-editor.github.io/

Build great documents

https://mermaid.js.org/

New Models

❄️ https://www.snowflake.com/en/blog/anthropic-claude-3-7-sonnet-cortex/

❄️ https://docs.snowflake.com/en/release-notes/2025/other/2025-05-30-complete-multimodal-new-models

Tutorials

❄️ https://github.com/Snowflake-Labs/sfguide-build-data-agents-using-snowflake-cortex-ai

❄️ https://candf.com/our-insights/articles/snowflake-summit-2025-key-takeaways-for-data-leaders/

❄️ https://careers.snowflake.com/us/en/blogarticle/five-key-takeaways-from-snowflake-summit

Marketplace

⚡️ https://app.snowflake.com/marketplace/listing/GZSYZ10SV6W/matillion-matillion-data-productivity-cloud?search=matillion

Upcoming Events, Conferences, Meetups, Webinars and More

July 16 - AI in Action https://www.snowflake.com/events/ai-in-action-how-to-accelerate-to-production/

July 17 - Build an ML Model https://www.snowflake.com/webinars/virtual-hands-on-labs/build-an-ml-model-to-crack-the-code-of-customer-conversions-2025-07-17/

July 30 - Dev Day AI https://www.snowflake.com/emea-dev-day-skill-up-with-ai-2025-07-30/

In-Person

🔹 Hex + Snowflake Hackathon: Solving NYC’s Biggest Data Challenges

📅 Date: July 15, 2025

🕘 Time: 9:00 AM – 1:00 PM

📍 Location: 44 W 18th St, New York

https://lu.ma/prjumowa

July 16 https://aws.amazon.com/events/summits/new-york/

https://sessionize.com/tspann

https://github.com/timothyspann

FLaNK Stack Weekly
