# Quickstart

This guide will show you how to launch a cluster, import data, and execute SQL queries in a few simple steps. More information on other developer tools, such as the Python Client, can be found in the Developer Tools section.

# How to Launch Your First Cluster

Launch a first cluster

Before performing any data operations, you need to launch a cluster. The Clusters page lets you create a cluster and configure computing and storage resources to meet your requirements.

Follow these steps to launch a new cluster:

Go to the Clusters page and click the +New Cluster button to launch a new cluster.
Name your cluster.
Click Launch to run the cluster.

TIP

The development tier of MyScale is limited to the default configuration and does not support multiple replicas. Refer to the Changing Your Billing Plan section and upgrade to the standard plan for a more powerful configuration.

# Environment Setup

You have the option of connecting to a MyScale database using one of the following developer tools:

However, let's use Python to get started.

# Using Python

Before getting started with Python, you need to install the ClickHouse client (opens new window), as described in the following shell script:

pip install clickhouse-connect

Once the ClickHouse client has been installed successfully, the next step is to connect to your MyScale cluster from within a Python app by providing the following details:

Cluster host
Username
Password

To find these details, navigate to the MyScale Clusters page, click the Action dropdown link, and select Connection Details.

Actions

As seen in the following image, a Connection Details dialog will display with the necessary code required to access MyScale. Click on the copy icon to copy the corresponding code and paste it into your Python app.

Python

TIP

See Connection Details for more information on connecting to your MyScale cluster.

# Using the MyScale Console

To use the MyScale console to import data into your database and to run queries, navigate to the SQL Workspace page. Your cluster will be selected automatically, as shown below:

Enter the SQL Workspace page

# How to Import Data into the Database

Follow these steps to import data into MyScale:

Create a table
Insert data into the table
Build a vector index

# Create a Table

Creating a database table in MyScale before importing any data is mandatory.

TIP

To learn more, navigate to the document describing how to Create a Database and how to Create a Table in the SQL Reference section.

Using the following code samples, let's write a SQL statement (both in Python and SQL) to create a new table named default.myscale_categorical_search.

Python
SQL

# Create a table with 128 dimensional vectors.
client.command("""
CREATE TABLE default.myscale_categorical_search
(
    id    UInt32,
    data  Array(Float32),
    CONSTRAINT check_length CHECK length(data) = 128,
    date  Date,
    label Enum8('person' = 1, 'building' = 2, 'animal' = 3)
)
ORDER BY id""")

# Insert Data Into the Table

TIP

MyScale supports data imports from Amazon s3 and other cloud services using s3-compatible APIs. See the s3 Table Function for more information on importing data from Amazon s3.

As described in the following code snippets, let's use SQL to import data into the default.myscale_categorical_search table.

Supported file formats include:

TIP

See Formats for Input and Output Data (opens new window) for a detailed description of all supported formats.

Python
SQL

client.command("""
INSERT INTO default.myscale_categorical_search
    SELECT * FROM s3(
        'https://d3lhz231q7ogjd.cloudfront.net/sample-datasets/quick-start/categorical-search.csv',
        'CSVWithNames',
        'id UInt32, data Array(Float32), date Date, label Enum8(''person'' = 1, ''building'' = 2, ''animal'' = 3)'
    )""")

# Build a Vector Index

In addition to creating traditional indexes on structured data, you can also create a Vector Index in MyScale for vector embeddings. Follow this step-by-step guide to see how:

# Create MSTG Vector Index

As the following code snippets show, the first step is to create an MSTG vector index, a vector index using our proprietary algorithm, MSTG.

Python
SQL

client.command("""
ALTER TABLE default.myscale_categorical_search
    ADD VECTOR INDEX categorical_vector_idx data
    TYPE MSTG
""")

TIP

The index's build time depends on the size of your data import.

# Check Vector Index Build Status

The following code snippets describe how to use SQL to check the vector index's build status.

Python
SQL

# Query the 'vector_indices' system table to check the status of the index creation.
get_index_status="SELECT status FROM system.vector_indices WHERE table='myscale_categorical_search'"
# Print the status of the index creation.  The status will be 'Built' if the index was created successfully.
print(f"index build status is {client.command(get_index_status)}")

The output is seen in the following examples:

Python
SQL

index build status is Built

database	table	name	type	status
default	myscale_categorical_search	vector_idx	MSTG	Built

TIP

See the Vector Search for further information on vector indexes.

# Executing SQL Queries

Once you have imported data into a MyScale table and built a vector index, you can query the data using the following search types:

Vector Search
Filtered Search

TIP

The most significant benefit of building an MSTG vector index is its lightning-quick search speed.

# Vector Search

Conventionally, text or images, such as “a blue car” or images of a blue car, is queried. However, MyScale treats all queries as vectors and returns a response to the query based on the similarity (“distance“) between the query and the existing data within the table.

Use the following code snippets to retrieve data using a vector as the query:

Python
SQL

 # pick a random row from the table as the target
random_row = client.query("SELECT * FROM default.myscale_categorical_search ORDER BY rand() LIMIT 1")
assert random_row.row_count == 1
target_row_id = random_row.first_item["id"]
target_row_label = random_row.first_item["label"]
target_row_date = random_row.first_item["date"]
target_row_data = random_row.first_item["data"]
print("currently selected item id={}, label={}, date={}".format(target_row_id, target_row_label, target_row_date))
# Fetch the result of the query.
result = client.query(f"""
SELECT id, date, label, 
    distance(data, {target_row_data}) as dist FROM default.myscale_categorical_search ORDER BY dist LIMIT 10
""")
# Iterate through the rows of the query result and print the 'id', 'date',
# 'label', and distance for each row.
print("Top 10 candidates:")
for row in result.named_results():
    print(row["id"], row["date"], row["label"], row["dist"])

The result set containing the ten most similar results is as follows:

id	date	label	dist
0	2030-09-26	person	0
2	1975-10-07	animal	60,088
395,686	1975-05-04	animal	70,682
203,483	1982-11-28	building	72,585
597,767	2020-09-10	building	72,743
794,777	2015-04-03	person	74,797
591,738	2008-07-15	person	75,256
209,719	1978-06-13	building	76,462
608,767	1970-12-19	building	79,107
591,816	1995-03-20	building	79,390

TIP

These results are in vector embeddings, which you can use to retrieve the original data by referencing the results' id.

# Filtered Search

Not only can we query data using a vector search (using vector embeddings), but we can also execute SQL queries using a combination of structured and vector data, as the following code snippets describe:

Python
SQL

 # Fetch the result of the query.
result = client.query(f"""
SELECT id, date, label, 
    distance(data, {target_row_data}) as dist
    FROM default.myscale_categorical_search WHERE toYear(date) >= 2000 AND label = 'animal'
    ORDER BY dist LIMIT 10
""")
# Iterate through the rows of the query result and print the 'id', 'date',
# 'label', and distance for each row.
for row in result.named_results():
    print(row["id"], row["date"], row["label"], row["dist"])

The result set containing the ten most similar results is as follows:

id	date	label	dist
601,326	2001-05-09	animal	83,481
406,181	2004-12-18	animal	93,655
13,369	2003-01-31	animal	95,158
209,834	2031-01-24	animal	97,258
10,216	2011-08-02	animal	103,297
605,180	2009-04-20	animal	103,839
21,768	2021-01-27	animal	105,764
1,988	2000-03-02	animal	107,305
598,464	2003-01-06	animal	109,670
200,525	2024-11-06	animal	110,029