# Quickstart
This guide will show you how to launch a cluster, import data, and execute SQL queries in just a few simple steps. For more information on other developer tools, see the Developer Tools for more information.
# Launching a Cluster
Before you can perform any data operations, you need to launch a cluster. The Clusters page lets you configure computing and storage resources to meet your requirements. Follow these steps to launch a new cluster:
- Go to the Clusters page and click the "+ New Cluster" to launch a new cluster.
- Enter a preferred cluster name, such as "default".
- The current version of MyScale is limited to the default configuration and does not support multiple replicas. If you need a more powerful configuration, please refer to the Support & Help section and get in touch with us.
- Click 'Launch' to complete the cluster launch.
# Environment Setup
# Python
Before getting started with python, you need to install Clickhouse client (opens new window).
$ pip install clickhouse-connect
To connect to your cluster with python, you'll need to provide your cluster host, username, and password. Once the cluster creation is complete, you can find the Connection Details button in the Action column of your cluster list.
After clicking on the Connection Details button, you will see the necessary code required to access MyScale. Simply click on the copy icon to copy the corresponding code.
For more guidance on obtaining the necessary information to connect to the cluster, please refer to the Connection Details.
# MyScale Console
Navigate to the "SQL Workspace" page. Your cluster will be selected automatically, as shown below:
# Importing Data
To import data into MyScale, follow these three steps:
- Create a table
- Insert data into the table
- Build a vector index
# Creating a Table
Tables must be created in MyScale before you can import data. To learn more, see the "Create Database" and "Create Table" sub-sections in the SQL Reference section.
Create a new table named default.myscale_categorical_search
.
- Python
- SQL
# Create a table with 128 dimensional vectors.
client.command("""
CREATE TABLE default.myscale_categorical_search
(
id UInt32,
data Array(Float32),
CONSTRAINT check_length CHECK length(data) = 128,
date Date,
label Enum8('person' = 1, 'building' = 2, 'animal' = 3)
)
ENGINE = MergeTree ORDER BY id""")
# Inserting Data
MyScale currently supports data import from AWS S3 and other cloud services with S3-compatible APIs. Find out more import options here.
Use SQL to import data into the default.myscale_categorical_search
table. Supported file formats include CSV
(opens new window), CSVWithNames
(opens new window), JSONEachRow
(opens new window), and Parquet
(opens new window). Please see Formats for Input and Output Data (opens new window) for detailed description of all supported formats.
- Python
- SQL
client.command("""
INSERT INTO default.myscale_categorical_search
SELECT * FROM s3(
'https://d3lhz231q7ogjd.cloudfront.net/sample-datasets/quick-start/categorical-search.csv',
'CSVWithNames',
'id UInt32, data Array(Float32), date Date, label Enum8(''person'' = 1, ''building'' = 2, ''animal'' = 3)'
)""")
# Building a Vector Index
In addition to creating traditional indexes on structured data, you can also create a Vector Index in MyScale for vector embeddings.
- Create an MSTG vector index and check vector index build status:
- Python
- SQL
client.command("""
ALTER TABLE default.myscale_categorical_search
ADD VECTOR INDEX categorical_vector_idx data
TYPE MSTG
""")
The build time of the index will depend on the size of your data import. Further status of vector indexes can be found in Vector Reference Section
- Use SQL to check vector index build status:
- Python
- SQL
# Query the 'vector_indices' system table to check the status of the index creation.
get_index_status="SELECT status FROM system.vector_indices WHERE table='myscale_categorical_search'"
# Print the status of the index creation. The status will be 'Built' if the index was created successfully.
print(f"index build status is {client.command(get_index_status)}")
Output:
- Python
- SQL
index build status is Built
# Executing SQL Queries
After importing data into a MyScale table, queries may be performed on the data within the table. In no time, you’ll be able to experience the quick query speeds attainable with MyScale.
# Vector Search
Conventionally, queries are made using text or images, such as “a blue car” or images of a blue car. However, MyScale treats all queries as vectors, and gives a response to the query based on the similarity (“distance“) between the query and the existing data within the table.
To retrieve data using a vector as the query, use this SQL command as below, which returns the 10 most similar results:
- Python
- SQL
# pick a random row from the table as the target
random_row = client.query("SELECT * FROM default.myscale_categorical_search ORDER BY rand() LIMIT 1")
assert random_row.row_count == 1
target_row_id = random_row.first_item["id"]
target_row_label = random_row.first_item["label"]
target_row_date = random_row.first_item["date"]
target_row_data = random_row.first_item["data"]
print("currently selected item id={}, label={}, date={}".format(target_row_id, target_row_label, target_row_date))
# Fetch the result of the query.
result = client.query(f"""
SELECT id, date, label,
distance(data, {target_row_data}) as dist FROM default.myscale_categorical_search ORDER BY dist LIMIT 10
""")
# Iterate through the rows of the query result and print the 'id', 'date',
# 'label', and distance for each row.
print("Top 10 candidates:")
for row in result.named_results():
print(row["id"], row["date"], row["label"], row["dist"])
Sample Output:
id | date | label | dist |
---|---|---|---|
0 | 2030-09-26 | person | 0 |
2 | 1975-10-07 | animal | 60,088 |
395,686 | 1975-05-04 | animal | 70,682 |
203,483 | 1982-11-28 | building | 72,585 |
597,767 | 2020-09-10 | building | 72,743 |
794,777 | 2015-04-03 | person | 74,797 |
591,738 | 2008-07-15 | person | 75,256 |
209,719 | 1978-06-13 | building | 76,462 |
608,767 | 1970-12-19 | building | 79,107 |
591,816 | 1995-03-20 | building | 79,390 |
The results are in the form of vector embeddings, which also can later be used to retrieve the original data by referencing the “id“ of the results.
# Filtered Search
In the earlier demonstration, a query was executed using vector embeddings. Additionally, users can also perform queries that utilize a combination of structured and vector data in SQL.
- Python
- SQL
# Fetch the result of the query.
result = client.query(f"""
SELECT id, date, label,
distance(data, {target_row_data}) as dist
FROM default.myscale_categorical_search WHERE toYear(date) >= 2000 AND label = 'animal'
ORDER BY dist LIMIT 10
""")
# Iterate through the rows of the query result and print the 'id', 'date',
# 'label', and distance for each row.
for row in result.named_results():
print(row["id"], row["date"], row["label"], row["dist"])
Sample Output:
id | date | label | dist |
---|---|---|---|
601,326 | 2001-05-09 | animal | 83,481 |
406,181 | 2004-12-18 | animal | 93,655 |
13,369 | 2003-01-31 | animal | 95,158 |
209,834 | 2031-01-24 | animal | 97,258 |
10,216 | 2011-08-02 | animal | 103,297 |
605,180 | 2009-04-20 | animal | 103,839 |
21,768 | 2021-01-27 | animal | 105,764 |
1,988 | 2000-03-02 | animal | 107,305 |
598,464 | 2003-01-06 | animal | 109,670 |
200,525 | 2024-11-06 | animal | 110,029 |