-
Notifications
You must be signed in to change notification settings - Fork 288
Presto snowflake enhancement #187
Presto snowflake enhancement #187
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR and for the detailed instruction on testing. I have a few moments to clarify, but generally, it looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly style comments.
Please try to reduce the amount of if
s and repetitive lines.
Bad arguments should fail explicitly rather than silently.
data_diff/databases/presto.py
Outdated
kw.pop("password") | ||
|
||
if "cert" in kw: # if a certificate was specified in URI, verify session with cert | ||
cert = kw.get("cert") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be
cert = kw.pop("cert")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have corrected this
data_diff/databases/presto.py
Outdated
self.default_schema = kw.get("schema") | ||
|
||
if ( | ||
"password" in kw and "auth" in kw and kw.get("auth") == "basic" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If auth == 'basic'
, the lack of user/password should throw an exception, rather than fail silently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have implemented this
data_diff/databases/snowflake.py
Outdated
format=serialization.PrivateFormat.PKCS8, | ||
encryption_algorithm=serialization.NoEncryption(), | ||
) | ||
self._conn = snowflake.connector.connect(private_key=pkb, schema=f'"{schema}"', **kw) # replaces password |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you change this to
kw['private_key'] = pkb
There's no need to duplicate the connect()
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have implemented this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More remarks regarding style
data_diff/databases/presto.py
Outdated
kw.get("auth") == "basic" and "user" in kw and "password" in kw | ||
# if auth=basic, add basic authenticator for Presto | ||
kw["auth"] = prestodb.auth.BasicAuthentication(kw.pop("user"), kw.pop("password")) | ||
except: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please catch specific exceptions. I believe this one should be KeyError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it from ValueError
to KeyError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You misunderstood. My problem was with your use of except:
.
Raising ValueError was actually the better raising KeyError.
data_diff/databases/presto.py
Outdated
|
||
try: | ||
# checks if user and password are missing when auth=basic | ||
kw.get("auth") == "basic" and "user" in kw and "password" in kw |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't do anything, won't even throw an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It throws an exception if user or password is missing. I tested it
This is after replacing ValueError
with KeyError
:
try:
# checks if user and password are missing when auth=basic
kw.get("auth") == "basic" and "user" in kw and "password" in kw
# if auth=basic, add basic authenticator for Presto
kw["auth"] = prestodb.auth.BasicAuthentication(kw.pop("user"), kw.pop("password"))
except:
raise KeyError("User or password cannot be missing if auth==basic")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It throws an exception if user or password is missing. I tested it
The second line does. The first line doesn't do anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean having the code like this instead:
if kw.get("auth") == "basic":
kw["auth"] = prestodb.auth.BasicAuthentication(kw.pop("user"), kw.pop("password"))
So that it throws error if user or password is missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's more in line with what I'm thinking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I have corrected that now
Hi @matthiasekundayo-eb , I tried to follow your instructions for setup on Snowflake, but got stuck. Any chance you can walk me through it live? On slack/discord/zoom/hangouts, or anywhere else. |
Hi @erezsh Will you be available anytime later today? I am on EST timezone. |
@erezsh Are you in Isreal? |
@matthiasekundayo-eb I'm in Amsterdam, but close enough. These dates/hours are fine. Where can I reach you? |
@erezsh I just sent over a Google Meet invite to your inbox. Here is also a copy of the same invite: |
@matthiasekundayo-eb Okay, I tested Snowflake and it works with the ImportError: Failed to import test module: tests.test_database_types
Traceback (most recent call last):
File "C:\Python310\lib\site-packages\cryptography\utils.py", line 38, in _check_byteslike
memoryview(value)
TypeError: memoryview: a bytes-like object is required, not 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python310\lib\unittest\loader.py", line 436, in _find_test_path
module = self._get_module_from_name(name)
File "C:\Python310\lib\unittest\loader.py", line 377, in _get_module_from_name
__import__(name)
File "C:\code\datafold\xdiff\tests\test_database_types.py", line 30, in <module>
CONNS = {k: db.connect.connect(v, N_THREADS) for k, v in CONN_STRINGS.items()}
File "C:\code\datafold\xdiff\tests\test_database_types.py", line 30, in <dictcomp>
CONNS = {k: db.connect.connect(v, N_THREADS) for k, v in CONN_STRINGS.items()}
File "C:\code\datafold\xdiff\data_diff\databases\connect.py", line 204, in connect
return connect_with_dict(db_conf, thread_count)
File "C:\code\datafold\xdiff\data_diff\databases\connect.py", line 173, in connect_with_dict
return cls(**d)
File "C:\code\datafold\xdiff\data_diff\databases\snowflake.py", line 44, in __init__
p_key = serialization.load_pem_private_key(
File "C:\Python310\lib\site-packages\cryptography\hazmat\primitives\serialization\base.py", line 22, in load_pem_private_key
return ossl.load_pem_private_key(data, password)
File "C:\Python310\lib\site-packages\cryptography\hazmat\backends\openssl\backend.py", line 823, in load_pem_private_key
return self._load_key(
File "C:\Python310\lib\site-packages\cryptography\hazmat\backends\openssl\backend.py", line 1042, in _load_key
utils._check_byteslike("password", password)
File "C:\Python310\lib\site-packages\cryptography\utils.py", line 40, in _check_byteslike
raise TypeError("{} must be bytes-like".format(name))
TypeError: password must be bytes-like I think it should be able to accept string passwords, no? |
@matthiasekundayo-eb For Presto, I'm getting this error: File "C:\Python310\lib\site-packages\parameterized\parameterized.py", line 533, in standalone_func
return func(*(a + p.args), **p.kwargs)
File "C:\code\datafold\xdiff\tests\test_database_types.py", line 569, in test_types
_drop_table_if_exists(dst_conn, dst_table)
File "C:\code\datafold\xdiff\tests\common.py", line 98, in _drop_table_if_exists
conn.query(f"DROP TABLE IF EXISTS {table}", None)
File "C:\code\datafold\xdiff\data_diff\databases\base.py", line 101, in query
res = self._query(sql_code)
File "C:\code\datafold\xdiff\data_diff\databases\presto.py", line 65, in _query
c = self._conn.cursor()
File "C:\Users\erez\AppData\Roaming\Python\Python310\site-packages\prestodb\dbapi.py", line 166, in cursor
request = self._create_request()
File "C:\Users\erez\AppData\Roaming\Python\Python310\site-packages\prestodb\dbapi.py", line 141, in _create_request
return prestodb.client.PrestoRequest(
File "C:\Users\erez\AppData\Roaming\Python\Python310\site-packages\prestodb\client.py", line 256, in __init__
raise ValueError("cannot use authentication with HTTP")
ValueError: cannot use authentication with HTTP |
Hi @erezsh, For Presto, did you use |
Hi @matthiasekundayo-eb , when I add
On the server side, I'm getting this error repeatedly:
Any idea what's causing it? It only happens for https, it seems. As for snowflake, no problem, I can fix the error myself. |
Hi @erezsh Try using |
Hi @matthiasekundayo-eb , the port helped, thanks! Now I'm getting I don't really know why, but tomorrow I'll try to redo the process as you described it, from scratch, just in case I made a mistake on the way. |
@matthiasekundayo-eb I got it working. I think maybe I missed putting 'localhost' as Common Name. Either way, I think it's ready to merge. I will try to do so later today. |
Merged. Thank you for your contribution, and sorry it took a while! |
Thanks @erezsh |
Changes entail:
Snowflake:
Allow auth by PKCS8 key for Snowflake
User private key specification
Presto:
Allow verify http session with certificate for Presto
Allow using basic authentication for Presto
Presto Port specification
Presto HTTP scheme specification
Presto certificate specification
For Testing:
Snowflake:
Create a Snowflake free trial account here,
Run the following in your local machine via CLI to generate a new SSH key pair:
Then take what is on your Clipboard, and in Snowflake, associate it to a user you’re trying to log in as:
alter user <SNOWFLAKE_USER> set rsa_public_key='FILL_IN_VALUE_FROM_CLIPBOARD'
Presto:
Edit v3.ext file
vi v3.ext
Put the following in file v3.ext:
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid:always,issuer:always
basicConstraints = CA:TRUE
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment, keyAgreement, keyCertSign
subjectAltName = DNS:localhost
issuerAltName = issuer:copy
Run:
openssl x509 -req -in example.com.csr -signkey example.com.key -out example.com.crt -days 3650 -sha256 -extfile v3.ext
Combine cert and private keys into a single file:
cat example.com.crt example.com.key > combined.pem
Put that combined.pem into dev/presto-conf/standalone/
Put the example.com.crt into the data-diff project root
Creat a new user password file with the following , providing test as the password as well
Put that password.db under dev/presto-conf/standalone/
Create a new dev/presto-conf/standalone/password-authenticator.properties file with the following contents:
- '8443:8443'
After the above steps are completed, mount the data-diff directory in a docker container, load sample data and test using a
config.toml
file like below:Then run data diff in the CLI using:
poetry run python3 -m data_diff --conf <path/to/config.toml/file> --run test_diff -v
E.g.
poetry run python3 -m data_diff --conf data_diff/config.toml --run test_diff -v