All Data and AI Weekly #196 - June 30, 2025

https://bsky.app/profile/paasdev.bsky.social
NiFi + AI + AI Data Cloud + Iceberg.

https://www.reddit.com/r/DataEngineeringForAI/hot/
Monthly NYC and YouTube Events
Join Hex and me in New York City for a hands-on hackathon with food, AI, and prizes.
This Week's Focus
Cursor AI + MCP! I got access to the Pro tier of the Cursor AI IDE, and it is really cool. We have an MCP server for Snowflake that can list databases, schemas, and tables, describe tables, run read queries, create tables, build queries, and more.
See: https://github.com/isaacwasserman/mcp-snowflake-server
IDE: https://www.cursor.com/features
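Under the hood, an MCP client such as Cursor talks to a server like this one using JSON-RPC 2.0 messages: tools/list to discover the available tools and tools/call to invoke one. The sketch below only builds those request messages with the Python standard library. It assumes the stdio transport (one JSON message per line) and skips the initialize handshake that a real session begins with. The tool names and arguments it uses (read_query with a query argument, describe_table with table_name) are assumptions, so check the server's own tools/list reply for the real ones.

```python
import json
from itertools import count
from typing import Optional

# Monotonically increasing JSON-RPC request ids (each request needs a unique id).
_request_ids = count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object, the message format MCP uses."""
    msg = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def tools_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP 'tools/call' request that invokes one named tool."""
    return jsonrpc_request("tools/call", {"name": tool_name, "arguments": arguments})


def to_stdio_line(msg: dict) -> str:
    """Serialize a message for the stdio transport: compact JSON, one message per line."""
    return json.dumps(msg, separators=(",", ":")) + "\n"


if __name__ == "__main__":
    # Hypothetical tool name and argument; the real ones come from tools/list.
    requests = [
        jsonrpc_request("tools/list"),
        tools_call("read_query", {"query": "SELECT CURRENT_VERSION()"}),
    ]
    for request in requests:
        print(to_stdio_line(request), end="")
```

Roughly speaking, once the server is registered in the IDE's MCP settings, the IDE launches it and exchanges lines like these over the process's stdin and stdout.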

Cool MCP Stuff:
https://github.com/Klavis-AI/klavis/tree/main/mcp_servers/discord
https://github.com/quarkiverse/quarkus-mcp-servers/tree/main/jdbc
Other AI, Data and Snowflake News!!!
https://sanjmo.medium.com/snowflake-summit-2025-unifying-the-data-universe-a07f399b04d7
https://kilocode.ai/
Example: entity matching in Python with Snowflake Cortex complete() and structured JSON output

```python
import json

import pandas as pd
from pydantic import BaseModel, Field
from snowflake.cortex import complete, CompleteOptions

# Assumes `df` is an existing pandas DataFrame with one candidate pair per row:
# columns for the source entity are prefixed "src_", columns for the target "tgt_".


class Judgement(BaseModel):
    match: bool = Field(description="1 Judged as Match; 0 Judged as non-match")
    reasoning: str = Field(description="Why the LLM made this Judgement")


# JSON schema handed to Cortex so the model's reply follows the Judgement shape
judgement_response_model = Judgement.model_json_schema()

# Identify which columns belong to each entity
cols_src = [c for c in df.columns if c.lower().startswith("src_")]
cols_tgt = [c for c in df.columns if c.lower().startswith("tgt_")]

instruction = """
Compare the columns prefixed with 'src_' to those prefixed with 'tgt_' in the provided data.
Treat these sets of columns (those with src_ and those with tgt_ prefixes) as if they were two different entities.
For a given pair of entities, respond if they are likely not to match, along with a brief reason.
Consider all factors and common differences in naming convention e.g. 123 Main St. is the same as 123 Main Street.
Also consider industry, company size, and everything available in the data provided for a given record.
Given the descriptions of the two entities, return a 1 if they are likely the same entity, 0 if they are not.
Respond only with a JSON object, without backticks.
"""


def build_prompt(row: pd.Series) -> str:
    """Render one row as two entity descriptions appended to the instruction."""
    entity_a = ", ".join(f"{col}: {row[col]}" for col in cols_src)
    entity_b = ", ".join(f"{col}: {row[col]}" for col in cols_tgt)
    return (
        instruction.strip()
        + f"\n\nEntity_A → {entity_a}\nEntity_B → {entity_b}\n"
        + "Return the JSON object as specified above."
    )


df["prompt"] = df.apply(build_prompt, axis=1)

# Deterministic, short, schema-constrained completions
options = CompleteOptions(
    max_tokens=256,
    temperature=0.0,
    response_format={"type": "json", "schema": judgement_response_model},
)


def claude_complete(prompt: str) -> str:
    # One Cortex COMPLETE call per row
    return complete("claude-4-sonnet", prompt, options=options)


df["judgement"] = df["prompt"].apply(claude_complete)


# Validate each response against Judgement and split it into two columns
def parse_match_json(json_str: str) -> pd.Series:
    data = json.loads(json_str)  # named `data` to avoid shadowing the builtin `dict`
    judgement = Judgement(**data)
    return pd.Series({"match": judgement.match, "reasoning": judgement.reasoning})


# Apply the function to the DataFrame
df[["match", "reasoning"]] = df["judgement"].apply(parse_match_json)
```
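The prompt asks the model to answer without backticks, and parse_match_json raises an exception on any reply that is not valid JSON or does not fit the Judgement model, which stops the whole apply. As a hedged, standard-library-only alternative (not part of the original example), the sketch below strips stray code fences, accepts either true/false or 1/0 for match, and returns match=None instead of raising, so failed rows can be retried or reviewed by hand. The names ParsedJudgement and parse_judgement are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedJudgement:
    match: Optional[bool]  # None when the response could not be interpreted
    reasoning: str


def _strip_code_fences(text: str) -> str:
    """Remove a leading ``` (or ```json) fence and a trailing ``` if present."""
    t = text.strip()
    if t.startswith("```"):
        t = t[3:]
        # Drop an optional "json" language tag after the opening fence.
        if t[:4].lower() == "json":
            t = t[4:]
        if t.endswith("```"):
            t = t[:-3]
    return t.strip()


def parse_judgement(text: str) -> ParsedJudgement:
    """Parse one model reply without raising; bad replies get match=None."""
    try:
        data = json.loads(_strip_code_fences(text))
    except json.JSONDecodeError as exc:
        return ParsedJudgement(match=None, reasoning=f"unparseable response: {exc.msg}")
    if not isinstance(data, dict):
        return ParsedJudgement(match=None, reasoning="response is not a JSON object")
    raw = data.get("match")
    if isinstance(raw, bool):
        match = raw
    elif raw in (0, 1):  # the prompt asks for 1/0, the schema for true/false
        match = bool(raw)
    else:
        match = None
    return ParsedJudgement(match=match, reasoning=str(data.get("reasoning", "")))
```

With the DataFrame from the example, something like df["judgement"].apply(lambda s: pd.Series(vars(parse_judgement(s)))) could replace the final parse_match_json step.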
https://github.com/cline/cline
https://developer.nvidia.com/blog/run-google-deepminds-gemma-3n-on-nvidia-jetson-and-rtx/
https://prss.co/
https://github.com/sfc-gh-jreini/llama-parse-cortex-search
https://medium.com/@komal1491/semantic-views-in-snowflake-making-business-self-service-60acf88b386a
https://hex.tech/blog/snowflake-summit-2025-recap/
https://medium.com/@tim.spann_50517/real-time-enrichment-of-air-quality-data-26564464b2a5
Code and Open Source Projects
Apache NiFi + AI Agents + Cortex AI + Snowflake AISQL
https://github.com/tspannhw/TrafficAI/tree/main/Agents
AI Tools
https://openrouter.ai/
Editor
https://micro-editor.github.io/
Build great documents
https://mermaid.js.org/
New Models
❄️ https://www.snowflake.com/en/blog/anthropic-claude-3-7-sonnet-cortex/
❄️ https://docs.snowflake.com/en/release-notes/2025/other/2025-05-30-complete-multimodal-new-models
Tutorials
❄️ https://github.com/Snowflake-Labs/sfguide-build-data-agents-using-snowflake-cortex-ai
❄️ https://candf.com/our-insights/articles/snowflake-summit-2025-key-takeaways-for-data-leaders/
❄️ https://careers.snowflake.com/us/en/blogarticle/five-key-takeaways-from-snowflake-summit
Marketplace
Upcoming Events, Conferences, Meetups, Webinars and More
July 16 - AI in Action https://www.snowflake.com/events/ai-in-action-how-to-accelerate-to-production/
July 17 - Build an ML Model https://www.snowflake.com/webinars/virtual-hands-on-labs/build-an-ml-model-to-crack-the-code-of-customer-conversions-2025-07-17/
July 30 - Dev Day AI https://www.snowflake.com/emea-dev-day-skill-up-with-ai-2025-07-30/
In-Person
Details:
🔹 Hex + Snowflake Hackathon: Solving NYC’s Biggest Data Challenges
📅 Date: July 15, 2025
🕘 Time: 9:00 AM – 1:00 PM
📍 Location: 44 W 18th St, New York
https://lu.ma/prjumowa

July 16 - AWS Summit New York https://aws.amazon.com/events/summits/new-york/

https://github.com/timothyspann
© 2020-2025 Tim Spann https://www.youtube.com/@FLaNK-Stack
