MariaDB Galera: a bug in the startup script?

butler_fr · February 21, 2016, 9:45pm

Hello everyone!

I'm not entirely sure of myself, but I think I've found a bug in the MariaDB 10 startup script in a Galera Cluster configuration.
Here I want to start a new cluster (the creation step).

Here is the output of a few commands, along with the logs:

service mysql stop

service mysql start --wsrep-new-cluster

Job for mysql.service failed. See 'systemctl status mysql.service' and 'journalctl -xn' for details.

#cat /var/log/syslog

[...] ul 2 09:50:52 debian-OSimage mysqld: 150702 9:50:52 [Note] WSREP: gcomm: connecting to group 'db_wordpress', peer '192.168.8.10:,192.168.8.11:,192.168.8.12:'
Jul 2 09:50:52 debian-OSimage mysqld: 150702 9:50:52 [Warning] WSREP: (0fe9ad31, 'tcp://0.0.0.0:4567') address 'tcp://192.168.8.12:4567' points to own listening address, blacklisting
Jul 2 09:50:55 debian-OSimage mysqld: 150702 9:50:55 [Warning] WSREP: no nodes coming from prim view, prim not possible
Jul 2 09:50:55 debian-OSimage mysqld: 150702 9:50:55 [Note] WSREP: view(view_id(NON_PRIM,0fe9ad31,1) memb {
Jul 2 09:50:55 debian-OSimage mysqld: #0110fe9ad31,0
Jul 2 09:50:55 debian-OSimage mysqld: } joined {
Jul 2 09:50:55 debian-OSimage mysqld: } left {
Jul 2 09:50:55 debian-OSimage mysqld: } partitioned {
Jul 2 09:50:55 debian-OSimage mysqld: })
Jul 2 09:50:56 debian-OSimage mysqld: 150702 9:50:56 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.87359S), skipping check
Jul 2 09:51:26 debian-OSimage mysqld: 150702 9:51:26 [Note] WSREP: view((empty))
Jul 2 09:51:26 debian-OSimage mysqld: 150702 9:51:26 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
Jul 2 09:51:26 debian-OSimage mysqld: #011 at gcomm/src/pc.cpp:connect():161
Jul 2 09:51:26 debian-OSimage mysqld: 150702 9:51:26 [ERROR] WSREP: gcs/src/gcs_core.cpp:long int gcs_core_open(gcs_core_t*, const char*, const char*, bool)():206: Failed to open backend connection: -110 (Connection timed out)
Jul 2 09:51:26 debian-OSimage mysqld: 150702 9:51:26 [ERROR] WSREP: gcs/src/gcs.cpp:long int gcs_open(gcs_conn_t*, const char*, const char*, bool)():1379: Failed to open channel 'db_wordpress' at 'gcomm://192.168.8.10,192.168.8.11,192.168.8.12': -110 (Connection timed out)
Jul 2 09:51:26 debian-OSimage mysqld: 150702 9:51:26 [ERROR] WSREP: gcs connect failed: Connection timed out
Jul 2 09:51:26 debian-OSimage mysqld: 150702 9:51:26 [ERROR] WSREP: wsrep::connect() failed: 7
Jul 2 09:51:26 debian-OSimage mysqld: 150702 9:51:26 [ERROR] Aborting
Jul 2 09:51:26 debian-OSimage mysqld:
Jul 2 09:51:26 debian-OSimage mysqld: 150702 9:51:26 [Note] WSREP: Service disconnected.

Here we can see that the server is trying to connect to an existing cluster,
which it should not be doing at all (given the --wsrep-new-cluster option)!

Testing directly with the binary:
#mysqld --wsrep-new-cluster

150702 10:50:20 [Note] mysqld (mysqld 10.0.20-MariaDB-1~jessie-wsrep-log) starting as process 6763 ...
150702 10:50:20 [Note] WSREP: Read nil XID from storage engines, skipping position init
150702 10:50:20 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
150702 10:50:20 [Note] WSREP: wsrep_load(): Galera 3.9(rXXXX) by Codership Oy <info@codership.com> loaded successfully.
150702 10:50:20 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
150702 10:50:20 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
150702 10:50:20 [Note] WSREP: Passing config to GCS: base_host = 192.168.8.12; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recover
150702 10:50:21 [Note] WSREP: Service thread queue flushed.
150702 10:50:21 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
150702 10:50:21 [Note] WSREP: wsrep_sst_grab()
150702 10:50:21 [Note] WSREP: Start replication
150702 10:50:21 [Note] WSREP: 'wsrep-new-cluster' option used, bootstrapping the cluster
150702 10:50:21 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
150702 10:50:21 [Note] WSREP: protonet asio version 0
150702 10:50:21 [Note] WSREP: Using CRC-32C for message checksums.
150702 10:50:21 [Note] WSREP: backend: asio
150702 10:50:21 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)
150702 10:50:21 [Note] WSREP: restore pc from disk failed
150702 10:50:21 [Note] WSREP: GMCast version 0
150702 10:50:21 [Note] WSREP: (5f144230, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
150702 10:50:21 [Note] WSREP: (5f144230, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
150702 10:50:21 [Note] WSREP: EVS version 0
150702 10:50:21 [Note] WSREP: gcomm: bootstrapping new group 'db_wordpress'
150702 10:50:21 [Note] WSREP: start_prim is enabled, turn off pc_recovery
150702 10:50:21 [Note] WSREP: Node 5f144230 state prim
150702 10:50:21 [Note] WSREP: view(view_id(PRIM,5f144230,1) memb { 5f144230,0 } joined { } left { } partitioned { })
150702 10:50:21 [Note] WSREP: save pc into disk
150702 10:50:21 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.8.10:4567
150702 10:50:21 [Note] WSREP: discarding pending addr proto entry 0x7f10e6609500
150702 10:50:21 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.8.11:4567
150702 10:50:21 [Note] WSREP: discarding pending addr proto entry 0x7f10e6609740
150702 10:50:21 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.8.12:4567
150702 10:50:21 [Note] WSREP: discarding pending addr proto entry 0x7f10e6609800
150702 10:50:21 [Note] WSREP: gcomm: connected
150702 10:50:21 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
150702 10:50:21 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
150702 10:50:21 [Note] WSREP: Opened channel 'db_wordpress'
150702 10:50:21 [Note] WSREP: Waiting for SST to complete.
150702 10:50:21 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
150702 10:50:21 [Note] WSREP: Starting new group from scratch: 5f6bc890-2097-11e5-afbe-cb017e6de0c0
150702 10:50:21 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 5f6f4693-2097-11e5-ac28-bb5b1c3a5ee8
150702 10:50:21 [Note] WSREP: STATE EXCHANGE: sent state msg: 5f6f4693-2097-11e5-ac28-bb5b1c3a5ee8
150702 10:50:21 [Note] WSREP: STATE EXCHANGE: got state msg: 5f6f4693-2097-11e5-ac28-bb5b1c3a5ee8 from 0 (teststack-db-3374vvwqqb75)
150702 10:50:21 [Note] WSREP: Quorum results: version = 3, component = PRIMARY, conf_id = 0, members = 1/1 (joined/total), act_id = 0, last_appl. = -1, protocols = 0/7/3 (gcs/repl/appl), group UUID = 5f6bc890-2097-11e5-afbe-cb017e6de0c0
150702 10:50:21 [Note] WSREP: Flow-control interval: [16, 16]
150702 10:50:21 [Note] WSREP: Restored state OPEN -> JOINED (0)
150702 10:50:21 [Note] WSREP: Member 0.0 (teststack-db-3374vvwqqb75) synced with group.
150702 10:50:21 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)
150702 10:50:21 [Note] WSREP: New cluster view: global state: 5f6bc890-2097-11e5-afbe-cb017e6de0c0:0, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 3
150702 10:50:22 [Note] WSREP: SST complete, seqno: 0
150702 10:50:22 [Note] InnoDB: Using mutexes to ref count buffer pool pages
150702 10:50:22 [Note] InnoDB: The InnoDB memory heap is disabled
150702 10:50:22 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
150702 10:50:22 [Note] InnoDB: Memory barrier is not used
150702 10:50:22 [Note] InnoDB: Compressed tables use zlib 1.2.8
150702 10:50:22 [Note] InnoDB: Using Linux native AIO
150702 10:50:22 [Note] InnoDB: Not using CPU crc32 instructions
150702 10:50:22 [Note] InnoDB: Initializing buffer pool, size = 256.0M
150702 10:50:23 [Note] InnoDB: Completed initialization of buffer pool
150702 10:50:23 [Note] InnoDB: Highest supported file format is Barracuda.
150702 10:50:25 [Note] InnoDB: 128 rollback segment(s) are active.
150702 10:50:25 [Note] InnoDB: Waiting for purge to start
150702 10:50:25 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.24-72.2 started; log sequence number 1617051
150702 10:50:31 [Warning] WSREP: last inactive check more than PT1.5S ago (PT4.42184S), skipping check
150702 10:50:39 [Note] Plugin 'FEEDBACK' is disabled.
150702 10:50:40 [Note] Server socket created on IP: '0.0.0.0'.
150702 10:50:42 [Note] Event Scheduler: Loaded 0 events
150702 10:50:43 [Note] Reading of all Master_info entries succeded
150702 10:50:43 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
150702 10:50:43 [Note] Added new Master_info '' to hash table
150702 10:50:43 [Note] mysqld: ready for connections.
Version: '10.0.20-MariaDB-1~jessie-wsrep-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution, wsrep_25.10.r4144
150702 10:50:43 [Note] WSREP: REPL Protocols: 7 (3, 2)
150702 10:50:43 [Note] WSREP: Service thread queue flushed.
150702 10:50:43 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
150702 10:50:43 [Note] WSREP: Service thread queue flushed.
150702 10:50:43 [Note] WSREP: Synchronized with group, ready for connections
150702 10:50:43 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

Two important lines:
150702 10:50:21 [Note] WSREP: 'wsrep-new-cluster' option used, bootstrapping the cluster
and
150702 10:50:21 [Note] WSREP: gcomm: bootstrapping new group 'db_wordpress'

So at first glance this is not a configuration problem: started directly, the binary honours the option, whereas the service start does not.
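
To compare the two runs quickly, something like the following could classify a log by which gcomm message shows up in it. This is only a sketch: the function name `galera_start_mode` is made up for illustration (it is not part of MariaDB or Galera), and it looks for nothing more than the two message strings that appear in the logs quoted in this post.

```shell
#!/bin/sh
# Hypothetical helper, not part of MariaDB or Galera.
# Reads a log on stdin and prints how the node tried to start, based on
# the gcomm messages seen in the logs quoted in this post.
# If both messages appear (several start attempts in one log), "bootstrap" wins.
galera_start_mode() {
    log=$(cat)
    case "$log" in
        *"gcomm: bootstrapping new group"*) echo "bootstrap" ;;
        *"gcomm: connecting to group"*)     echo "join" ;;
        *)                                  echo "unknown" ;;
    esac
}
```

It could be fed the relevant log excerpt, for example `grep mysqld /var/log/syslog | galera_start_mode`. On the syslog excerpt from the failed service start it would report `join`, and on the output of the direct `mysqld --wsrep-new-cluster` run it would report `bootstrap`.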

Does anyone have an idea?

PS: Debian 8 jessie
uname -a
Linux teststack-db-3374vvwqqb75 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt9-3~deb8u1 (2015-04-24) x86_64 GNU/Linux

dpkg -l | grep mariadb
ii libmariadbclient18 10.0.20+maria-1~jessie amd64 MariaDB database client library
ii mariadb-client-10.0 10.0.20+maria-1~jessie amd64 MariaDB database client binaries
ii mariadb-client-core-10.0 10.0.20+maria-1~jessie amd64 MariaDB database core client binaries
ii mariadb-common 10.0.20+maria-1~jessie all MariaDB database common files (e.g. /etc/mysql/conf.d/mariadb.cnf)
ii mariadb-galera-server 10.0.20+maria-1~jessie all MariaDB database server with Galera cluster (metapackage depending on the latest version)
ii mariadb-galera-server-10.0 10.0.20+maria-1~jessie amd64 MariaDB database server with Galera cluster binaries

Thunder · February 21, 2016, 9:54pm

Hi

I have exactly the same problem as you. Did you find the cause? I also tried starting it with the option --wsrep_cluster_address='gcomm://', but no luck: that option is not taken into account either.

For now I have had to settle for starting the first node with wsrep_cluster_address='gcomm://' in the config file. I then started the other nodes, stopped the first node, put the IPs back in the config file and restarted the first node. But that is really not ideal.
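
The config juggling described above could be scripted roughly like this. It is a sketch under assumptions: the helper name `set_cluster_address` and the file path used in the comments are hypothetical (the post does not say which config file holds the setting), and `sed -i` assumes GNU sed as shipped with Debian.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the wsrep_cluster_address line in a config file.
#   $1 = config file (the path is an assumption, adjust to your setup)
#   $2 = new address, e.g. gcomm:// or gcomm://192.168.8.10,192.168.8.11,192.168.8.12
set_cluster_address() {
    conf=$1
    addr=$2
    # Replace the whole existing line; lines that do not match are left untouched.
    sed -i "s|^[[:space:]]*wsrep_cluster_address[[:space:]]*=.*|wsrep_cluster_address=$addr|" "$conf"
}

# Sequence from the post, run on the first node
# (/etc/mysql/conf.d/galera.cnf is an assumed path):
#   set_cluster_address /etc/mysql/conf.d/galera.cnf "gcomm://"
#   service mysql start     # first node forms a new cluster
#   ... start the other nodes ...
#   service mysql stop
#   set_cluster_address /etc/mysql/conf.d/galera.cnf "gcomm://192.168.8.10,192.168.8.11,192.168.8.12"
#   service mysql start     # first node rejoins the running cluster
```

The helper only edits a line that already exists, so it does nothing if the setting is missing from the file.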

Thanks in advance